The design and use of the git database is vastly different from that of other version control systems, which results in some great advantages that we will get to. First there is the git blob, which is the basic data storage unit.
For now, let’s play with the object model!
The main object in the git object store: The “blob”
The “blob” type is just a bunch of bytes that could be anything (for example a text file, source code, an executable binary file, or a picture). There are a total of 4 types of data in git’s object store: blob, tree, commit, and tag.
Let’s start with a file named README that we want to add to a git repository. We will:
- Create a git repository
- Add the README file to the git “object store” (database)
We will then show how the file is stored in the git object store.
Below are 4 diagrams.
- Diagram 1: Shows that the git add README command:
  - Computes the hash of the file.
  - Stores the contents of the README file using the hash of the file to name the file.
  - Adds a reference to the README file to the git index.
- Diagram 2: Shows the .git directory after the git add README command is done. Git stores the contents of user files in .git/objects. The first two characters of the hash of the README file are bc, so a directory named bc was created. The rest of the hash is used as the filename inside the bc directory.
- Diagram 3: This is the same diagram as diagram 2, but with notes pointing to the filenames, directories, the hash, etc.
- Diagram 4: Shows how the hash of the README file relates to the directory and filename where the contents of README are stored.
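The first step in Diagram 1, computing the hash, can be sketched in Python. This is a minimal sketch, assuming a repository that uses SHA-1 object names (the 40-character hashes shown here). Git hashes a short header (the word blob, the content length, and a NUL byte) followed by the file's bytes. The function name git_blob_hash and the sample contents are hypothetical; they are not the README from the diagrams, so the hash does not start with bc.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object name git gives a blob with these contents."""
    # Git hashes "blob <size>\0" followed by the raw bytes of the file.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents, used only for illustration.
sample = b"hello world\n"
sample_hash = git_blob_hash(sample)
print(sample_hash)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed value should match what git hash-object reports for a file with the same contents.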
The Hash Used For the Directory and Filename
The diagram below is another way to see how the git add README command uses the hash value of the README file to determine the directory and filename where the contents of the README file are stored.
- You can use the “git hash-object filename” command to see the hash value of a file.
- Git takes the first two characters of the hash of the README file (bc) for the directory name.
- Git takes the remaining 38 characters of the hash as the filename. Git then stores the README file contents in .git/objects, using the 2-character directory name and the 38-character filename.
- After the README file contents are stored in the git object store, the README filename and the hash of its contents are recorded in the git index.
- The README file is stored in the git object store by its hash, not its filename. This method of storing files based on the file’s contents (instead of the original filename) is called “content addressable storage”.
- The file contents are stored in git as a “blob” type; the filename is not part of the blob. The “blob” type is just a bunch of bytes that could be anything (source code, an executable binary file, an image, etc.). There are a total of 4 types of data in git’s object store.
- If another file with the exact same contents is added to git, git will notice that the hash is identical (identical file contents result in an identical hash) and git will not need to store another copy of the file in git’s object store.
- Git stores compressed versions of files and can also “pack” files to save disk space. With “packing”, a small change to a large file does not require two complete copies of the file to be stored separately. Instead, git can “pack” the two versions together and refer to their common portions to save disk space.
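The points above (the two-character directory, the 38-character filename, content-addressable storage, and compression) can be sketched as a toy object store. This is a minimal sketch, assuming SHA-1 object names and zlib-compressed loose objects. The names store_blob and count_objects and the temporary directory standing in for .git/objects are hypothetical, and packing is not modeled.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def store_blob(objects_dir: Path, content: bytes) -> str:
    """Store content under its hash, laid out like a loose git object."""
    payload = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(payload).hexdigest()
    # First 2 hex characters name the directory, the remaining 38 the file.
    path = objects_dir / object_id[:2] / object_id[2:]
    if not path.exists():  # identical contents always map to the same path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(payload))
    return object_id


def count_objects(objects_dir: Path) -> int:
    """Count the object files stored one level below objects_dir."""
    return sum(1 for p in objects_dir.glob("*/*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"  # stand-in for .git/objects
    first = store_blob(objects, b"hello world\n")
    second = store_blob(objects, b"hello world\n")  # same contents again
    stored = count_objects(objects)
    restored = zlib.decompress((objects / first[:2] / first[2:]).read_bytes())

print(first == second, stored)  # True 1
```

Storing the same bytes twice yields the same hash and a single file on disk, which is the behavior described for two files with identical contents.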
Poke Around The Git Object Store
Go ahead and create a git repository with git init, then git add a file to the git object store, and then poke around in the .git directory. You can see in the diagram below that you don’t need to type the whole 40-character hash value when using git commands; you can abbreviate the hash to around 6 characters, as long as the abbreviation is unambiguous.
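Why an abbreviated hash is enough can be illustrated with a small prefix lookup. This is only a sketch of the idea, not git's actual lookup code: resolve_prefix is a hypothetical helper, and the list of known hashes stands in for the contents of the object store.

```python
def resolve_prefix(prefix: str, object_ids: list[str]) -> str:
    """Expand an abbreviated hash to the single full hash that starts with it."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if len(matches) != 1:
        # Either nothing matches or the abbreviation is ambiguous.
        raise ValueError(f"prefix {prefix!r} matches {len(matches)} objects")
    return matches[0]


# Hypothetical object store contents, used only for illustration.
known = [
    "3b18e512dba79e4c8300dd08aeb37f8e728b8dad",  # blob for b"hello world\n"
    "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",  # blob for an empty file
]
print(resolve_prefix("3b18e5", known))
```

A short prefix works as long as exactly one stored object starts with it; in a repository with many objects, a longer prefix may be needed.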
Desperate for the gory details?
For the gory details, see: What Is The File Format Of A Git “Blob”?
You’ve now seen the most-used git object type: blob – a bunch of bytes.