In the first part of this article series we looked into the objects that make up Git’s datastore (blobs, trees and commits). We also saw how commits participate in a directed acyclic graph (or DAG).
In this second and final part of the series, we examine a few Git commands and see how they work with, and manipulate, the DAG.
Initial setup
If you have been playing along since we last met, you can continue to use the gitsGuts repository we set up. Otherwise, let us quickly initialize a new repository with a few commits so we have a basis to work with.
```
$ git init gitsGuts (1)
# Initialized empty Git repository in /Users/looselytyped/Documents/articles/gitsGuts/.git/
$ cd gitsGuts (2)
$ (master) echo 'Hello Git!' > README.md (3)
$ (master) mkdir src
$ (master) echo '// This is my source code' > src/Main.java (4)
$ (master) git add .
$ (master) git commit -m "Initial commit" (5)
# [master (root-commit) 3cf00f8] Initial commit
#  2 files changed, 2 insertions(+)
#  create mode 100644 README.md
#  create mode 100644 src/Main.java
$ (master) echo 'Making another commit' >> README.md (6)
$ (master) git add README.md
$ (master) git commit -m "Second commit" (7)
# [master aed7e05] Second commit
#  1 file changed, 1 insertion(+)
```

<1> Initialize a new repository
<2> Be sure to cd into it!
<3> Initialize a README file with some text
<4> Initialize another plain text file inside the src subdirectory
<5> git-add and git-commit both files
<6> Edit the README file by appending some text
<7> Make a second commit
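If you would rather script the setup than type it, the following is a minimal sketch that should produce an equivalent repository. It assumes a POSIX shell with Git on your PATH. It creates the repository in a temporary directory rather than the path shown above, sets a made-up local identity so that git-commit works on a fresh machine, and points HEAD at master explicitly in case your init.defaultBranch setting names the first branch something else. Your hashes will differ from mine.

```shell
#!/bin/sh
set -e  # stop at the first failing command

# Hypothetical scratch location; any empty directory works.
repo="$(mktemp -d)/gitsGuts"
git init -q "$repo"
cd "$repo"

# Make sure the first branch is called master, whatever init.defaultBranch says.
git symbolic-ref HEAD refs/heads/master

# Throwaway identity, stored only in this repository's .git/config.
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'Hello Git!' > README.md
mkdir src
echo '// This is my source code' > src/Main.java
git add .
git commit -q -m "Initial commit"

echo 'Making another commit' >> README.md
git add README.md
git commit -q -m "Second commit"

git log --oneline
```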
We now have a Git repository with 2 files and 2 commits. Just to be sure we are on the same page, let us inspect the directory structure using the tree command. I also display an abbreviated version of the Git log.
```
$ (master) tree (1)
# .
# ├── README.md
# └── src
#     └── Main.java
#
# 1 directory, 2 files
$ (master) git lg (2)
# * aed7e05 - (HEAD, master) Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
```

<1> Display the structure of the repository
<2> An abbreviated Git log
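git lg is not a built-in command; it is an alias for the longer git-log invocation given in the note at the end of this article. As a sketch, you could define it with git config alias.lg; the example below stores it only in the current repository (add --global to make it available everywhere) and builds a throwaway one-commit repository with a made-up identity just so there is something to log.

```shell
#!/bin/sh
set -e

# Scratch repository with a single commit (hypothetical name and email).
cd "$(mktemp -d)"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"
echo 'Hello Git!' > README.md
git add README.md
git commit -q -m "Initial commit"

# Define the alias for this repository only; add --global to use it everywhere.
# Git itself splits the alias value, honouring the single quotes around the format.
git config alias.lg "log --graph --all --full-history --color --pretty=format:'%x1b[31m%h%x09%x1b[32m %C(white)- %d%x1b[0m%x20%s %C(bold blue)<%an>%Creset'"

git lg
```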
Bear in mind that the hashes of your commits will be different from those you see in my log.
The latest commit in my repository happens to be aed7e05; be sure to remember yours.
Looking good? Then let us talk about branching.
If you have used Git for any amount of time, then you are certainly used to branching, and most likely an ardent proponent of it. You have probably also been told that branching in Git is really cheap. So how does branching in Git really work?
One way to think about branches in Git is as sticky notes. Picture each sticky note with two lines of text on it. The first line contains the name of the branch, written with a thick permanent marker. The second line is written in pencil and holds the hash of the last commit on that branch.
Let us start by examining the .git/refs/heads directory, then create a new branch using git-branch, and inspect that directory once again.
```
$ (master) tree .git/refs/heads (1)
# .git/refs/heads
# └── master
#
# 0 directories, 1 file
$ (master) cat .git/refs/heads/master (2)
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
$ (master) git branch featureBranch (3)
$ (master) tree .git/refs/heads (4)
# .git/refs/heads
# ├── featureBranch
# └── master
#
# 0 directories, 2 files
$ (master) cat .git/refs/heads/featureBranch (5)
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
```

<1> List the files under .git/refs/heads
<2> Inspect the contents of the master file
<3> Create a new branch using git-branch
<4> List the files under .git/refs/heads again to see a new file
<5> Display the contents of the newly created file
Recall that by default Git creates a master branch for our repository. Listing the files under the .git/refs/heads directory reveals a file with exactly that name. .git/refs/heads/master happens to be a plain text file that contains exactly one line of text: the hash of the latest commit on the master branch.

We then create a new branch using the git-branch command, supplying it with the name of the branch. Listing .git/refs/heads once again reveals that a new file now resides beneath it, and the name of the file just so happens to be the name of the newly created branch. Inspecting the contents of .git/refs/heads/featureBranch tells us that it too contains the same hash as the master file; in other words, the hash of the latest commit on that branch.
In this illustration I have added branches to the DAG. You will notice that this is a slightly different version of the illustration we saw in Part I of this series: I have stripped out the trees that the commits point to, along with their sub-trees and blobs. This allows us to focus on the DAG.
As you can see, Git branches are simply pointers, or references, that point to commit objects using their hashes. Each branch has two parts to it: the name of the branch, and the commit it points to.
Now, what did I mean earlier when I spoke of permanent markers and pencils and sticky notes? It turns out that we can answer that question simply by making another commit. Let us do that, shall we?
```
$ (master) git status (1)
# On branch master
# nothing to commit, working directory clean
$ (master) echo 'Making a third commit on master' >> README.md (2)
$ (master) git add README.md
$ (master) git commit -m "Third commit" (3)
# [master a509575] Third commit
#  1 file changed, 1 insertion(+)
$ (master) git lg (4)
# * a509575 - (HEAD, master) Third commit <Raju Gandhi>
# * aed7e05 - (featureBranch) Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
```

<1> git status tells us we are on the master branch and the working directory is clean
<2> Make an edit
<3> Commit the edit
<4> Display the abbreviated Git log
The abbreviated Git log tells us that the master branch is one commit ahead of featureBranch.

If you recall our discussion from Part I of this series, you know that when we made our latest commit Git created a new commit object. This commit has a calculated hash of a509575 (a509575203205931cbcfc5a21d11c395ffbdced4 to be precise) and a pointer to its parent commit, which happens to be aed7e05. Git also took the sticky note with master on it, erased the hash that was previously written on it, and replaced it with a509575. You can verify this by simply inspecting the files under .git/refs/heads:
```
$ (master) cat .git/refs/heads/master
# a509575203205931cbcfc5a21d11c395ffbdced4
$ (master) cat .git/refs/heads/featureBranch
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
```
You can visualize the net effect in the following illustration.
It really is that simple! Git simply adds to the DAG just as we expected it to, and updates the appropriate references (written in pencil). The name of the branch needs no updating, hence in our analogy the name can be seen as written with a permanent marker. 
You should also note that the commit has no knowledge of any of the references that point to it — that information is maintained outside the DAG.
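You can see that a commit knows its parent but nothing about branches by printing the raw commit object with git cat-file -p. The sketch below rebuilds the three-commit history from this section in a scratch repository (identity made up), confirms with git rev-list --count that master is exactly one commit ahead of featureBranch, and prints the latest commit: you should see tree, parent, author and committer lines plus the message, and no branch name anywhere.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

# Build Initial -> Second, branch off, then add a Third commit on master.
echo 'Hello Git!' > README.md
git add README.md && git commit -q -m "Initial commit"
echo 'Making another commit' >> README.md
git commit -q -am "Second commit"
git branch featureBranch
echo 'Making a third commit on master' >> README.md
git commit -q -am "Third commit"

# Count commits reachable from master but not from featureBranch.
git rev-list --count featureBranch..master

# The raw commit object: tree, parent, author, committer, message.
git cat-file -p master
```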
Quiz time: can you visualize what would happen if I checked out featureBranch and made a commit on that branch?
Git creates a new commit with aed7e05f8b3fc115c1c2507c79454c002383e9ee as its parent, then updates the featureBranch sticky note with the hash of the latest commit on that branch.
Take a look.
Do you see how the history on featureBranch diverges from master?
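The divergence can be checked from the command line as well. The sketch below recreates the scenario in a scratch repository (identity and file contents made up), commits once on each branch, and asks git merge-base for the commit the two branches have in common. It should be the second commit, which is the parent of both new commits.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'Hello Git!' > README.md
git add README.md && git commit -q -m "Initial commit"
echo 'Making another commit' >> README.md
git commit -q -am "Second commit"
second=$(git rev-parse HEAD)

git branch featureBranch
echo 'Third commit on master' >> README.md
git commit -q -am "Third commit"

# Switch to featureBranch (the working directory now matches the second commit)
# and commit there.
git checkout -q featureBranch
echo 'A commit on featureBranch' > feature.txt
git add feature.txt && git commit -q -m "Feature commit"

# Both tips list the second commit as their parent...
git rev-parse master^ featureBranch^
# ...which is therefore where the two lines of history split.
git merge-base master featureBranch
git log --oneline --graph --all
```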
What if we were to delete a branch, say master, using git branch -D master?
Git simply takes the sticky note with master on it, crumples it, and throws it away!
On inspecting the .git/refs/heads directory, you will see that the master file has indeed been deleted.
You might wonder about the commit that master was referencing prior to being deleted. In our particular scenario, once the master sticky note disappears, nothing references the latest commit on that branch.
Git will eventually throw that commit away as well. Note that every other commit object in the DAG still has something referencing it: either a sticky note, or a child commit that lists it as its parent. As long as a commit object is reachable through such references, Git will keep it around; otherwise it will be garbage collected.
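In practice "eventually" can take a while: Git's reflogs keep recently visited commits reachable, and by default git gc only prunes unreachable objects older than two weeks (the gc.pruneExpire setting). The sketch below, run in a disposable repository with a made-up identity, deletes master, lists the now-orphaned commit with git fsck --unreachable --no-reflogs, and then deliberately expires every reflog and prunes immediately so you can watch the commit disappear. Do not run those last two commands in a repository you care about.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'Hello Git!' > README.md
git add README.md && git commit -q -m "Initial commit"
git branch featureBranch

echo 'Only on master' >> README.md
git commit -q -am "Commit only master knows about"
orphan=$(git rev-parse master)

# -D forces the delete because master is not merged into featureBranch.
git checkout -q featureBranch
git branch -D master

# Ignoring reflogs, nothing reaches the commit any more.
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$unreachable"

# DESTRUCTIVE: drop all reflog entries, then prune unreachable objects right away.
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$orphan" 2>/dev/null; then
    echo "$orphan still exists"
else
    echo "$orphan was garbage collected"
fi
```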
In this section we saw how git-branch adds references that point into the DAG, and how operations like git-commit and deleting branches affect the DAG and those references.
One thing you might have been wondering about all along is — how does Git know which branch to work on? Let us take a look, shall we?
Whenever we wish to work on a particular branch in Git we have to check it out. What does this mean in terms of the DAG, and is there more to it than meets the eye?
Our leading character for this section is the HEAD file that resides directly beneath the .git directory.

Let us start by inspecting the HEAD file, then check out (or switch) branches and see what happens. (Please note that if you have been following along on the terminal, you should have featureBranch checked out, and we will need to create another branch just so we can switch to it, since we deleted master.)
```
$ (featureBranch) git branch master (1)
$ (featureBranch) cat .git/HEAD (2)
# ref: refs/heads/featureBranch
$ (featureBranch) git checkout master (3)
# Switched to branch 'master'
$ (master) cat .git/HEAD (4)
# ref: refs/heads/master
```

<1> Recreate master
<2> List the contents of .git/HEAD
<3> Switch branches
<4> Inspect .git/HEAD again
First things first: the .git/HEAD file tells Git what HEAD currently points to. Furthermore, it turns out that the HEAD file, unlike the files under refs, does not seem to contain a hash. Rather, it seems to point to a reference!

Another way to think about this is that HEAD is a symbolic reference: it does not directly point to a hash; rather, it points to the reference (the branch) that represents the currently checked out commit.
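Git ships a plumbing command for reading (and writing) symbolic references, git symbolic-ref. In a scratch repository like the one below (identity made up), it reports which reference HEAD names, and that answer changes when you switch branches while the branch files themselves stay untouched.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
echo 'Hello Git!' > README.md
git add README.md && git commit -q -m "Initial commit"
git branch featureBranch

git checkout -q featureBranch
cat .git/HEAD                  # ref: refs/heads/featureBranch
git symbolic-ref HEAD          # refs/heads/featureBranch

git checkout -q master
cat .git/HEAD                  # ref: refs/heads/master
git symbolic-ref --short HEAD  # master
```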
You can visualize how HEAD works as shown here (I have truncated the diagram for brevity).

featureBranch is checked out

After checking out master, this is how the DAG would look.

master is checked out
As you can see, whatever HEAD points to represents what is "checked out." But there is more to it than meets the eye.

The most important thing to bear in mind about HEAD is that HEAD will always represent the parent of the next commit. There is no exception to this rule.
Knowing this, can you see how making a commit now will work? Git will kick off all the machinery needed to calculate the hashes of the blobs, the trees, and finally the commit. It will take the commit that HEAD points to and make it the parent of the next commit.
Now that the commit is a member of the DAG, Git will simply rewrite the master sticky note with the hash of the new commit. Does HEAD need updating? No. It continues to point to the master reference, which now holds the new hash.
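To see that a commit moves the branch but leaves HEAD alone, compare .git/HEAD and the branch's hash before and after a commit. The sketch below does this in a throwaway repository with a made-up identity; it also checks that the old tip became the parent of the new commit.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
echo 'Hello Git!' > README.md
git add README.md && git commit -q -m "Initial commit"

head_before=$(cat .git/HEAD)
tip_before=$(git rev-parse master)

echo 'One more line' >> README.md
git commit -q -am "Next commit"

head_after=$(cat .git/HEAD)
tip_after=$(git rev-parse master)

# HEAD still names the same branch...
echo "HEAD before: $head_before | after: $head_after"
# ...while the branch now holds a new hash whose parent is the old tip.
echo "master before: $tip_before | after: $tip_after"
git rev-parse master^
```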
Knowing that HEAD will always be the parent of the next commit has a few implications. If you have ever committed on the wrong branch, it was because you lost track of your HEAD (pun intended).
Liberal use of git-status is a good way to avoid the aforementioned problem. An alternative is to combine git-prompt.sh with some bash prompt trickery, so that the branch you have checked out is always visible when working at the terminal.
There is yet another powerful, and often nerve-racking (especially for newcomers to Git), facet to HEAD.
For a minute, let us consider what happens when we git-checkout a branch.
Git looks in the .git/refs/heads directory to find the file that matches the name of the branch we wish to check out, and identifies the hash that the branch currently points to. It then looks in the .git/objects directory, finds the commit object that the hash represents, and "unfolds" it: it finds the tree the commit points to and recreates the working directory as represented by that tree object. Finally, it rewrites the .git/HEAD file to symbolically point to the newly checked out branch.
If you were to boil the git-checkout lookup algorithm down to its essence, you could think of Git as checking out a hash!
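The lookup chain can be walked by hand with plumbing commands. The sketch below, again in a scratch repository with a made-up identity, goes from the branch file to the commit hash, from the commit to its tree, and from the tree to the paths that a checkout would recreate. It also shows where the commit lives as a loose object under .git/objects, which assumes the objects have not been packed yet (in a freshly created repository they usually have not).

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
echo 'Hello Git!' > README.md
mkdir src
echo '// This is my source code' > src/Main.java
git add . && git commit -q -m "Initial commit"

# 1. Branch name -> commit hash.
commit=$(cat .git/refs/heads/master)

# 2. A loose object is stored at .git/objects/<first 2 hex chars>/<remaining chars>.
dir=$(echo "$commit" | cut -c1-2)
file=$(echo "$commit" | cut -c3-)
ls ".git/objects/$dir/$file"

# 3. Commit -> the tree it points to.
tree=$(git rev-parse "$commit^{tree}")
git cat-file -t "$tree"

# 4. Tree -> every path a checkout would write into the working directory.
git ls-tree -r --name-only "$tree"
```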
We are programmers, and now we are curious — what if we were to checkout a hash? What happens? Let us find out, shall we?
```
$ (master) git lg (1)
# * 40ee28b - (HEAD, master, featureBranch) Some commit <Raju Gandhi>
# * aed7e05 - Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
$ (master) git checkout aed7e05 (2)
# Note: checking out 'aed7e05'.
#
# You are in 'detached HEAD' state. You can look around, make experimental
# changes and commit them, and you can discard any commits you make in this
# state without impacting any branches by performing another checkout.
#
# If you want to create a new branch to retain commits you create, you may
# do so (now or later) by using -b with the checkout command again. Example:
#
#   git checkout -b new_branch_name
#
# HEAD is now at aed7e05... Second commit
```

<1> Abbreviated Git log
<2> Pick the second commit and check it out
We start by looking at the log (just so we can pick a commit hash at random) and then proceed to check it out. Git informs us that we are in detached HEAD state — we will see what that means in a minute.
Before we proceed, I want you to read the warning that Git emitted when we checked out aed7e05.
Moving on then …
First things first: what does HEAD point to? That one is easy; we can simply inspect the file:

```
$ ((aed7e05...)) cat .git/HEAD
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
```
Notice that .git/HEAD now points directly to a hash instead of symbolically pointing to one via a reference.
Let us attempt to visualize how this looks.
As you can see, HEAD now points to a commit directly.
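The sketch below reproduces the detached state in a throwaway repository with a made-up identity. It checks out a commit by its hash, prints .git/HEAD, and uses git symbolic-ref -q, which exits with a non-zero status instead of printing a reference when HEAD is detached, to tell the two states apart.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
echo 'Hello Git!' > README.md
git add README.md && git commit -q -m "Initial commit"
echo 'Making another commit' >> README.md
git commit -q -am "Second commit"

# Check out a commit by hash (the first commit here) instead of a branch.
target=$(git rev-parse master^)
git checkout -q "$target"

# .git/HEAD now holds a raw hash rather than "ref: refs/heads/...".
cat .git/HEAD

# symbolic-ref -q exits non-zero, without an error message, when HEAD is detached.
if git symbolic-ref -q HEAD >/dev/null; then
    echo "HEAD is on a branch"
else
    echo "HEAD is detached"
fi
```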
Knowing this, and that HEAD will always point to the parent of the next commit, can you visualize what would happen if we were to make a commit at this point?
Let us quickly make a commit, and then lay out the DAG so we can conceptualize how the DAG changed.
```
$ ((aed7e05...)) echo 'In Detached HEAD state' >> README.md (1)
$ ((aed7e05...)) git add README.md (2)
$ ((aed7e05...)) git commit -m "Making a commit in detached HEAD state" (3)
# [detached HEAD ff21829] Making a commit in detached HEAD state
#  1 file changed, 1 insertion(+)
$ ((ff21829...)) git lg (4)
# * ff21829 - (HEAD) Making a commit in detached HEAD state <Raju Gandhi>
# | * 40ee28b - (master, featureBranch) Some commit <Raju Gandhi>
# |/
# * aed7e05 - Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
```

<1> Make an edit
<2> Add the file to the index
<3> Make a commit
<4> Display the abbreviated Git log
We see that we have a new commit (ff21829), which is what HEAD now points to.
Any ideas on the DAG?
We are one step away from truly understanding what the “detached” in detached HEAD means.
Answer this question: what happens if we were to check out featureBranch (or, for that matter, any other commit)?
If we were to check out another commit, HEAD would directly or indirectly point to that commit, and leave ff21829 behind. Who points to ff21829 then? Nobody.
Which means that when Git’s garbage collector comes around (and it will) our newly created commit will disappear.
Another way to think about detached HEAD state is to think of it as being on an anonymous branch.
See, when HEAD points directly to a commit, Git continues to behave as if we were working with a "named" branch, except when we check something else out.
At that point, if we are not careful, there is a chance that the commit HEAD was pointing to has nothing else pointing to it. And we know what happens to commits that have no hard references to them, yes?
What are the chances that we will leave a commit behind? Let us check out master right now and see what happens.
```
$ ((ff21829...)) git checkout master
# Warning: you are leaving 1 commit behind, not connected to
# any of your branches:
#
#   ff21829 Making a commit in detached HEAD state
#
# If you want to keep them by creating a new branch, this may be a good time
# to do so with:
#
#  git branch new_branch_name ff21829
#
# Switched to branch 'master'
```
Git ever so nicely warns us that we are indeed leaving ff21829 behind, and that if we do wish to keep it around, it may serve us well to create a new branch. It even tells us how to go about doing it.
In essence, Git is telling us to create a sticky note as a reminder of the commit's hash!
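The whole detached-HEAD round trip can be scripted. In the sketch below (throwaway repository, made-up identity, and a hypothetical branch name, rescued), a commit made in detached HEAD state is first left behind, which git branch --contains confirms by listing no branches, and is then rescued by giving it a sticky note with git branch, just as Git's warning suggests.

```shell
#!/bin/sh
set -e

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
echo 'Hello Git!' > README.md
git add README.md && git commit -q -m "Initial commit"

# Detach HEAD at master's commit and make a commit there.
git checkout -q --detach master
echo 'In Detached HEAD state' >> README.md
git commit -q -am "Making a commit in detached HEAD state"
orphan=$(git rev-parse HEAD)

# Leave it behind: no branch contains the new commit.
git checkout -q master
echo "branches containing $orphan: [$(git branch --contains "$orphan")]"

# Write the sticky note: a new branch pointing at the abandoned commit.
git branch rescued "$orphan"
git branch --contains "$orphan"
```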
Keeping track of HEAD when working in Git is essential, since it dictates where our changes will eventually end up in the DAG. However, the fact that Git allows us to move HEAD to any arbitrary commit lets us be playful: we can check out any other state of our repository for quick and dirty experimentation or debugging. If we like what we see, we can simply create a new branch and keep our changes around a little longer, or simply check out some other commit and be on our merry way, knowing that Git's garbage collector will come around and clean up our mess for us.
Reiterating what I said about Git at the end of Part I: Git's power comes from simplicity. The DAG represents the fundamental data structure that Git uses to store our repository's history, and all the commands that we love and use in Git affect that DAG. We now understand how the DAG is built, and how a few commands operate on it.
Take a look at any of Git's man pages, for git-rebase or what-have-you, and you will see references to the DAG everywhere.
Where do we go from here? I suggest that the next time you start working with Git, you keep a picture of the DAG in your mind's eye. Before you issue a command, attempt to visualize what the DAG will look like after the command executes, then find out if you got it right.
Till next time, may the DAG be with you.
The git lg command used throughout this article is an alias for: git log --graph --all --full-history --color --pretty=format:'%x1b[31m%h%x09%x1b[32m %C(white)- %d%x1b[0m%x20%s %C(bold blue)<%an>%Creset'.

We need the -D (uppercase) flag in this case, since Git would otherwise complain that master is not fully merged.