git inside, git better, git stronger

… exploring git from the inside out …

###Preface

Apologies up front: this is long, very long, but there is food for the brain here. I will discuss git for personal use; using it with many other contributors is an extension of the principles covered here. For simplicity, I am only going to discuss you working alone and using a single "branch" of code for each project. We are also going to examine the git "database" (compressed objects stored in files named by their SHA-1 hash) as we go, because reviewing the changes to the guts of your personal git repository gives a better understanding of what each command does. We will not cover branches, tags, or the git daemon in this discussion.

###Why git? (skip this if you are less than patient)

Many years ago I used backups of my projects or just documents, notes, etc. All worked well until it didn’t. If you make a mistake and back it up, guess what: now you have two broken copies. Then I used CVS - awesome. It worked for me for years. I never lost code (as long as I put it under CVS). It maintained my work and previous versions with only very minor issues and kept my data useful and available.

I tried git but wow, so many moving parts, and I had trouble wrapping my head around the differences from older version control systems, where a “master” copy was kept in one place and all other copies were “clients” that shared and merged with the master copy. But git is different and much better - every copy is a master, fully self-standing, with its own repository residing within the project directory itself, hidden in the .git directory but always working with you as you progress. Full-blown copies can be “cloned” to any other server you desire, and any set of copies can be merged together easily. Your content is stored under the .git/objects/ directory (and, once git packs it, under .git/objects/pack/) in read-only files with weird long names, which protects you from accidental deletion.

As wonderful as this all is, alas, git has miles and miles of depth. That, like most really powerful programs, is a good bad thing. You can get lost in the features and they can trip you up, but the secret is to learn incrementally. Bite off small bits at a time. Take your time and the investment will definitely be rewarding. When you extend git into a world with multiple (even many multiples of) contributors, git really shines.

###Getting Started with git

I have, over the years, always had a “$HOME/dev/” directory where I keep all my code or even document files, graphics, etc. Really anything I want to preserve, especially if versions are involved. So this discussion will be using $HOME/dev to place our projects; each project with its own directory.

EXERCISE

mkdir -p "$HOME/dev/myprjct"
cd "$HOME/dev/myprjct"
ls -ltra

Nothing exists in this directory.

You can initialize (start) a git repository any time you desire. Because we want to be very careful, we are going to initialize the project before creating any files. This will also allow us to see the changes to the git database (really the .git directory) as we work with git.

git init
ls -ltra

Here is what your project directory looks like now.

+-----.git
|     +-----branches
|     +-----config
|     +-----description
|     +-----HEAD
|     +-----hooks
|     |     +-----applypatch-msg.sample
|     |     +-----commit-msg.sample
|     |     +-----post-update.sample
|     |     +-----pre-applypatch.sample
|     |     +-----pre-commit.sample
|     |     +-----prepare-commit-msg.sample
|     |     +-----pre-push.sample
|     |     +-----pre-rebase.sample
|     |     +-----update.sample
|     +-----info
|     |     +-----exclude
|     +-----objects
|     |     +-----info
|     |     +-----pack
|     +-----refs
|     |     +-----heads
|     |     +-----tags

Please note the objects directory, because that is where all your file changes will get recorded as you work with git.

We are going to create a test file called t.f and add the text “ver 1”.

echo "ver 1" >> t.f

No changes have occurred to git at this point. You now have one “working” file in your project directory.

Now “stage” the new file [t.f] into git with this command.

git add t.f # or stage every change at once with: git add -A (which we use going forward)

Now let’s look at your project directory again and note the changes from above.

+-----.git
|     +-----branches
|     +-----config
|     +-----description
|     +-----HEAD
|     +-----hooks
|     |     +-----applypatch-msg.sample
|     |     +-----commit-msg.sample
|     |     +-----post-update.sample
|     |     +-----pre-applypatch.sample
|     |     +-----pre-commit.sample
|     |     +-----prepare-commit-msg.sample
|     |     +-----pre-push.sample
|     |     +-----pre-rebase.sample
|     |     +-----update.sample
|     +-----index
|     +-----info
|     |     +-----exclude
|     +-----objects
|     |     +-----5b
|     |     |     +-----d657f7b8fcd822d6e44202bc33e808cdd01ee7
|     |     +-----info
|     |     +-----pack
|     +-----refs
|     |     +-----heads
|     |     +-----tags
+-----t.f

Now we can see our new file listed last [t.f], a new .git/index file (the staging area), and also a new directory [5b] under .git/objects with a file under it [d657f7b8fcd822d6e44202bc33e808cdd01ee7]. Here is where it gets a little complicated; just stick with me. Together these are one git “object” named [5bd657f7b8fcd822d6e44202bc33e808cdd01ee7]: the first two characters of the name were used to create a subdirectory and the rest of the name was used as the file name. The full name is the SHA-1 hash of the file’s contents preceded by a short header that git adds (the object type and the size). The stored copy has also been compressed with the zlib library, so you cannot simply cat it, but there is a git command that lets you see the contents, which we will use here. Just remember that the git “object name” is really subdirectory+filename, hence [5bd657f7b8fcd822d6e44202bc33e808cdd01ee7]. The git way to print the contents is a bit strange, and you do not need to remember this command; it is just so we can monitor things as we go.

git cat-file -p 5bd657f7b8fcd822d6e44202bc33e808cdd01ee7

This shows the content of the file only [ver 1].
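To see where an object name like that comes from, here is a minimal sketch. The function name `blob_id` is my own invention, not a git command, and the sketch assumes `sha1sum` (GNU coreutils) is installed and that the repository uses git's default SHA-1 object format. It hashes the same bytes git hashes for a file: the word `blob`, a space, the content length in bytes, a NUL byte, then the content itself.

```shell
# blob_id FILE: print the object name git would give FILE's contents.
# Hypothetical helper for illustration; `git hash-object FILE` is the real tool.
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')          # content length in bytes
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
echo "ver 1" > "$tmp/t.f"
blob_id "$tmp/t.f"

# compare with git's own answer when git is available
if command -v git >/dev/null 2>&1; then
    git hash-object "$tmp/t.f"
fi
```

If the file contents match byte for byte, both lines should print the same name that appeared under .git/objects. This is also why two files with identical contents share a single object.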

Now, just so you can see what can be done at this point, let’s delete your working file, list the directory to see that it is gone, and then use git to restore it!

rm t.f
ls -ltra
# your t.f file is gone!
git checkout t.f # restores t.f from the copy you staged (newer versions of git also offer: git restore t.f)
ls -ltra
# your t.f file is back!

If you examine the .git directory tree at this point you will see that there are no changes. Now let’s “commit” what we “staged” earlier and then examine the .git directory tree again to see what has occurred (we will only list the .git/objects tree for brevity).

git commit -m "commit ver 1" # -m just means add this message - without it 
                             # you would be dropped into an editor and required
                             # to add a message - a good/required habit to get into.

Now the .git/objects tree:

objects
+-----03
|     +-----f1148542c5dbf6347c7e8e6a62863258c1c3b0
+-----5b
|     +-----d657f7b8fcd822d6e44202bc33e808cdd01ee7
+-----f0
|     +-----fa0104ec53e1614f556472b6ec9e355e297d7f
+-----info
+-----pack

Our original object which was “staged” is still here under [5b] and two new ones appear. Let’s look at the contents of each.

$ git cat-file -p 03f1148542c5dbf6347c7e8e6a62863258c1c3b0
tree f0fa0104ec53e1614f556472b6ec9e355e297d7f
author Geoff McNamara <geoff.mcnamara@gmail.com> 1430154082 -0400
committer Geoff McNamara <geoff.mcnamara@gmail.com> 1430154082 -0400

commit ver 1

$ git cat-file -p 5bd657f7b8fcd822d6e44202bc33e808cdd01ee7
ver 1

$ git cat-file -p f0fa0104ec53e1614f556472b6ec9e355e297d7f
100644 blob 5bd657f7b8fcd822d6e44202bc33e808cdd01ee7    t.f

Let’s also look at the git log

$ git log
commit 03f1148542c5dbf6347c7e8e6a62863258c1c3b0
Author: Geoff McNamara <geoff.mcnamara@gmail.com>
Date:   Mon Apr 27 13:01:22 2015 -0400

    commit ver 1

$ git status
On branch master
nothing to commit, working directory clean

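The commit ID shown by git log [03f1148542c5dbf6347c7e8e6a62863258c1c3b0] is the name of the first object we examined above. That “commit” object names a “tree” object [f0fa0104ec53e1614f556472b6ec9e355e297d7f], which lists the file t.f and the “blob” object [5bd657f7b8fcd822d6e44202bc33e808cdd01ee7] holding its contents.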
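The three objects can also be looked up without copying hashes by hand. The sketch below builds a throwaway repository in a temporary directory and asks git for the type of each object behind the latest commit. The name and email values are placeholders I made up so the commit works on any machine; your object names will differ from the ones printed in this article.

```shell
# Build a throwaway repo and print the type of each object behind HEAD.
# user.name / user.email are placeholder values, not anything git requires.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
echo "ver 1" > t.f
git add t.f
git -c user.name="Demo" -c user.email="demo@example.com" -c commit.gpgsign=false \
    commit -q -m "commit ver 1"

commit_id=$(git rev-parse HEAD)             # the commit object
tree_id=$(git rev-parse 'HEAD^{tree}')      # the tree that commit points to
blob=$(git rev-parse 'HEAD:t.f')            # the blob the tree lists for t.f

for id in "$commit_id" "$tree_id" "$blob"; do
    printf '%s %s\n' "$(git cat-file -t "$id")" "$id"
done
```

`git cat-file -t` prints only the object type, which makes it a handy companion to `git cat-file -p` when you are not sure what kind of object a name refers to.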

So how are these files/objects/versions connected? Commits are chained as parent/child. Let’s examine a later (more recent) commit object and note it has a parent, which is the commit above. First we will modify the t.f file, git add it, and commit it.

$ echo "updating to ver 2" >> t.f
$ git add -A
$ git commit -m "updating to ver 2"
$ git cat-file -p 46769acc026e57212ca06ad8991643061ac429eb
tree c828f096c4a01ca3cffa690c484b5ef8ee714a2c
parent 03f1148542c5dbf6347c7e8e6a62863258c1c3b0
author Geoff McNamara <geoff.mcnamara@gmail.com> 1430157487 -0400
committer Geoff McNamara <geoff.mcnamara@gmail.com> 1430157487 -0400

updating to ver 2

This is enough for you to start to appreciate how git stores data. Obviously we only scratched the surface here.
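To make the parent/child chain concrete, here is a rough sketch that follows the `parent` lines by hand instead of letting git log do it. The function name `walk_parents` and the helper `demo_commit` are made up for this example, the name and email are placeholders, and the walk assumes a linear history with one parent per commit, as in this walkthrough.

```shell
# Walk the parent links by hand, starting from the newest commit.
# Assumes a linear history (one parent per commit).
walk_parents() {
    id=$(git rev-parse HEAD)
    while [ -n "$id" ]; do
        printf '%s %s\n' "$id" "$(git log -1 --format=%s "$id")"
        # read only the header (up to the first blank line) and take the parent ID
        id=$(git cat-file -p "$id" | sed -n -e '/^$/q' -e 's/^parent //p' | head -n 1)
    done
}

# Try it on a throwaway repo with two commits (name/email are placeholders).
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
demo_commit() {
    git -c user.name="Demo" -c user.email="demo@example.com" -c commit.gpgsign=false \
        commit -q -m "$1"
}
echo "ver 1" > t.f;              git add -A; demo_commit "commit ver 1"
echo "updating to ver 2" >> t.f; git add -A; demo_commit "updating to ver 2"

walk_parents
```

The loop stops at the first commit because it has no `parent` line. For everyday use `git log` or `git rev-list HEAD` gives the same list; the point here is only that the history is nothing more than these links.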

Git repositories exist as either a working repository or a “shared” repository - created as “bare” with “git init --bare” - more on this later.

Before we move on please take a look at your .git/config file.

cat .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true

Note the “bare = false” so by default this is not a “shared” repository. It is a “working” project directory.

###Setting up a remote shared git server

A remote shared server can be used as a personal storage/backup of your important code and files. It becomes especially useful if it is universally available from anywhere by ssh or http (read: internet accessible).

A shared git repo is just a bit different from your self-standing git project “working” directory. It contains only the git structure found in your .git directory, and by convention its name ends with the extension [.git]. Your project .git directory has these subdirectories and files: branches, config, description, HEAD, hooks, info, objects, and refs. Your local project directory also has your working files. A remote shared directory has no working files - it acts only as a git repository, so it has all the git files and subdirectories but nothing else. It is created differently [git init --bare] and is used only remotely, for sending commits to it (pushes) and fetching commits from it (pulls or clones).

Create a git user and a .ssh directory for that user on the remote server. You can use any user on the remote server instead; just understand that access to the files depends on the permissions of the user who tries to reach them. We suggest a generic user name like “git” because this repository will be shared - it will not be a “working” project directory.

 sudo adduser git
 su git
 cd
 mkdir .ssh && chmod 700 .ssh 
 touch .ssh/authorized_keys && chmod 600 .ssh/authorized_keys

To make logins easier (passwordless)

You need to add your SSH public key to the remote authorized_keys file for the git user. Take a look at your key first, just to be familiar with it.

$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NADAQABAAABAQCB007n/ww+ouN4gSLKssMxXnBOvf9LGt4L...dAv8JggJICUvax2T9va5... user@host

Keep in mind that the content of this file is all one line. Now append it to the git user’s authorized_keys …

ssh-copy-id git@remote_server  # we use the tool provided by ssh 
# This does the same as the command below,
# except it offers the advantage of making sure 
# the ~/.ssh directory exists on the remote server.
# 
cat ~/.ssh/id_rsa.pub | ssh git@remote_server 'cat >> .ssh/authorized_keys'
# the above command requires that the 
# remote user's .ssh directory already exists. 
# Best to use ssh-copy-id, provided for this purpose,
# but now you know what it is doing for you.

Lock down the git user shell

Change the git user’s shell to /usr/bin/git-shell on the remote server, and make sure that git-shell is listed in the /etc/shells file. Note: you will no longer be able to ssh into the server as the git user for an interactive login, but git commands will still work against it.
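As a cautious sketch of that step, the snippet below only checks the current state and prints the commands you would then run yourself on the remote server; it changes nothing. The helper name `shell_listed` is made up for this example, and the fallback path /usr/bin/git-shell is an assumption that may differ on your distribution.

```shell
# Sketch only: run on the remote server. Nothing here modifies the system.
# shell_listed PATH [FILE]: succeed if PATH is an exact line in FILE (default /etc/shells).
shell_listed() {
    grep -qxF "$1" "${2:-/etc/shells}" 2>/dev/null
}

# Use the installed git-shell if it is on the PATH, else assume the usual location.
git_shell=$(command -v git-shell || echo /usr/bin/git-shell)

if shell_listed "$git_shell"; then
    echo "$git_shell is already listed in /etc/shells"
else
    echo "to allow it:   echo $git_shell | sudo tee -a /etc/shells"
fi
echo "then switch:   sudo chsh -s $git_shell git"
```

Do this only after the public key has been copied over, because copying the key relies on the git user still having a normal login shell.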

Build an empty git initialized/prepared directory for every project

On the remote server you need to build out a project.git directory and initialize it with git init --bare to allow a new project to be pushed to the remote server.

NOTE: git init --bare is VERY DIFFERENT from git init bare. The first creates a shared git repository (with no working files). The second command creates a local project directory named “bare” complete with its own .git directory. The “bare” project directory is ready to receive and git manage any new working files.
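The difference is easy to check for yourself. This small sketch runs both commands in a scratch directory (the directory comes from `mktemp`, so nothing of yours is touched) and asks git which kind of repository each one produced.

```shell
# Run both commands in a scratch directory and compare the results.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir project.git
git init -q --bare project.git   # shared repo: HEAD, objects/, refs/ sit at the top level
git init -q bare                 # ordinary project directory that happens to be named "bare"

git -C project.git rev-parse --is-bare-repository   # prints: true
git -C bare rev-parse --is-bare-repository          # prints: false
ls -A bare                                          # only .git is listed
```

One stray space or missing pair of dashes is all that separates the two, so it is worth running `git rev-parse --is-bare-repository` on a new shared repo before pushing to it.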

Create an empty shared git directory (remote repository) on the remote server for the project (same process for any new shared project).

 cd /opt/gitroot # I used /data/gitroot - you can name this 
                 # git repo dir anything you want
 mkdir project.git # yes, add the .git extension - trust me here - it is convention
 cd project.git # IMPORTANT - by convention shared git repos are named with a [.git] extension!!!
 git init --bare # builds out the default git directories with the needed files (it will
                 # not create .git - instead all the directories are visible,
                 # at the project directory level). Do an `ls -l` to see the
                 # difference

git init --bare builds out all the structure and files git needs directly in the project.git directory, with no .git subdirectory.

It will look like this before any pushes are done.

+-----branches
+-----config
+-----description
+-----HEAD
+-----hooks
|     +-----applypatch-msg.sample
|     +-----commit-msg.sample
|     +-----post-update.sample
|     +-----pre-applypatch.sample
|     +-----pre-commit.sample
|     +-----prepare-commit-msg.sample
|     +-----pre-push.sample
|     +-----pre-rebase.sample
|     +-----update.sample
+-----info
|     +-----exclude
+-----objects
|     +-----info
|     +-----pack
+-----refs
|     +-----heads
|     +-----tags

Note: no objects yet and no .git subdirectory (and no working files will be stored here).

Prepare your project directory on the local computer if you haven’t already

 cd myproject # even when files already exist
 git init # to build out the .git/ directory structure and files
 git add . # stages files in the cache (staging area)
 git commit -m 'Initial base commit' # commits the staged files to your local git repo 
 # the .git/ directory and files now have all the history and needed info making up
 # your local git repo.
 # this next command adds/changes a line in the local myproject/.git/config
 git remote add origin git@remote_server:/opt/gitroot/project.git
 # the above line tells git you want a remote server, which you are naming "origin",
 # to act as another copy of your local git repo, and declares its location.
 # This information is now stored in your myproject/.git/config file - go look at it to see.
 git push origin master # this pushes your git repo up to the declared remote server
                        # which has the alias name "origin"
 # git status # note the output on this will say you are "on branch master"
 # that is the default working area.
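The whole sequence can be rehearsed on a single machine, because a remote can also be a plain directory path. The sketch below uses a bare repository in a temporary directory as a stand-in for the server. All paths, the name, and the email are placeholders of mine; a real remote would be an ssh address like the one used above. The branch name is read from git rather than assumed, since newer installations may default to something other than master.

```shell
# Rehearse init / commit / remote add / push with a local bare repo as the "server".
# Paths, name, and email are stand-ins chosen for this example.
work=$(mktemp -d)
git init -q --bare "$work/project.git"          # plays the role of the shared repo

mkdir "$work/myproject"
cd "$work/myproject" || exit 1
git init -q .
echo "ver 1" > t.f
git add .
git -c user.name="Demo" -c user.email="demo@example.com" -c commit.gpgsign=false \
    commit -q -m 'Initial base commit'

git remote add origin "$work/project.git"
branch=$(git symbolic-ref --short HEAD)         # current branch name, whatever the default is
git push -q origin "$branch"

git config --get remote.origin.url              # the value that `git remote add` stored
git ls-remote origin                            # lists the branch now present on the "server"
```

After the push, the commit ID printed by `git ls-remote origin` should match `git rev-parse HEAD` in the working project, and the bare directory still contains no working files.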

Now the remote shared project.git repository might look something like this:

+-----branches
+-----config
+-----description
+-----HEAD
+-----hooks
|     +-----applypatch-msg.sample
|     +-----commit-msg.sample
|     +-----post-update.sample
|     +-----pre-applypatch.sample
|     +-----pre-commit.sample
|     +-----prepare-commit-msg.sample
|     +-----pre-push.sample
|     +-----pre-rebase.sample
|     +-----update.sample
+-----info
|     +-----exclude
+-----objects
|     +-----03
|     |     +-----f1148542c5dbf6347c7e8e6a62863258c1c3b0
|     +-----46
|     |     +-----769acc026e57212ca06ad8991643061ac429eb
|     +-----4e
|     |     +-----f21067659fb999c8efcdb03e800245dbd80eb0
|     +-----5b
|     |     +-----d657f7b8fcd822d6e44202bc33e808cdd01ee7
|     +-----84
|     |     +-----bce25cbc083f803b75fc34f4b4577ea95c6a0b
|     +-----8a
|     |     +-----611c4dabd6b8d8654d554a8d5378656682c4f9
|     +-----c8
|     |     +-----28f096c4a01ca3cffa690c484b5ef8ee714a2c
|     +-----f0
|     |     +-----fa0104ec53e1614f556472b6ec9e355e297d7f
|     +-----f2
|     |     +-----79361d9a6ab19614d57d96cafd47a17280b02e
|     +-----info
|     +-----pack
+-----refs
|     +-----heads
|     |     +-----master
|     +-----tags

Note the additional objects and NO WORKING FILES.

OK, let’s have just a small aside on how git is different from older version control programs like CVS. git likes living all alone by itself but it is also just as happy to share life with others. When you do a git init it builds out a whole new .git/ directory - this becomes your local repository, complete in every way and all local to your project directory. When you do a git add . you “stage” files. This gives you a chance to unstage file(s) before committing them to the repository. Once you run git commit, all the staged files are recorded in the local repository as a new version, available for checkout at any time. Everything is self-contained in your local project directory: your original files and all changes, history, and the latest version live in the .git/ repository. This doesn’t seem like it offers any safety advantages until you realize that you can very easily clone everything onto any other PC … and another and another - all existing as full self-sufficient repositories.

Anyone can now clone it and push changes back up just as easily:

# from another remote machine
 git clone git@remote_server:/opt/gitroot/project.git # clone the whole
                                                      #  enchilada - notice
                                                      # the .git extension
                                                      # denoting this is
                                                      # a shared git repo
 cd project
 vim README # make any change you want
 git add README # needed if README is new; -a below only picks up tracked files
 git commit -am 'Small modification to README' # commits it locally
 git push origin master # pushes it to the server it was cloned from above
 # go look at .git/config to gain a little better understanding

There you have it - one fully functional personal repository and one shared repository.

Enjoy,

-g-

Geoff McNamara

“Do not meddle in the affairs of wizards, for they are subtle and quick to anger.” J.R.R Tolkien

Elizabeth City, NC https://www.companionway.net