Here be dragons
In Chuck Palahniuk’s words, "You must jump into disaster with both feet." Git submodules are generally avoided because:
They tend to be poorly understood, and
It’s hard to keep track of one repository, let alone several that have to sync into one master project.
But here we are.
This is a quick overview of what Git submodules are, and how they’re useful for keeping your code manageable.
Let’s say we have a parent repository named docsource-master, and that contains 3 submodules:
What are they good for?
Git submodules are used to break up large monolithic projects into smaller code-bases. This sounds great in theory — modular environments! — until you realise that juggling four repositories as opposed to managing one large one is a terrible idea.
If you can manage everything in one repository, you should never use a Git submodule to plug code into your code base. If in doubt, there are other tools that may do what you want (see git subtree).
Why submodules then? They have a few specific uses:
Sane Git histories
Git submodules allow you to plug other repositories into a parent repository while keeping both codebases isolated (technically, they live in the same .git folder as the parent repository, but let’s not go there yet).
A submodule doesn’t add its commit history to the master project. Instead, it maintains its own commit history. The parent repository only records a given commit hash from each of its child submodules.
4086d4082355feab4172ce712cee4a17), I’ve committed
I make another commit to master, bringing HEAD to
I make another commit to master, bringing HEAD to
docsource-master doesn’t change its reference to repo_1: it still remains at
This means that repo_1 is never updated on docsource-master unless I commit the new hash of master HEAD, which is currently
We then know that we can make commits to repo_1 without having to worry about breaking anything in master, and vice-versa.
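The pinning behaviour described above can be tried out in a throwaway sandbox. The following is only a sketch under stated assumptions: it builds two local repositories in a temporary directory (the names repo_1 and docsource-master are borrowed from the text; the file name, commit messages and demo identity are made up), adds one as a submodule of the other, and checks which hash the parent records before and after a new commit lands in repo_1.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so no real repository is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity, only so the demo commits succeed on a clean machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Create the child repository with one commit on master.
git init -q repo_1
git -C repo_1 symbolic-ref HEAD refs/heads/master
echo "v1" > repo_1/topic.txt
git -C repo_1 add topic.txt
git -C repo_1 commit -q -m "First commit in repo_1"

# Create the parent repository and add repo_1 as a submodule.
git init -q docsource-master
git -C docsource-master symbolic-ref HEAD refs/heads/master
# protocol.file.allow is only needed because this demo clones from a local path.
git -C docsource-master -c protocol.file.allow=always \
    submodule --quiet add "$sandbox/repo_1" repo_1
git -C docsource-master commit -q -m "Add repo_1 as a submodule"

# The parent stores a single commit hash for the submodule path.
pinned_before=$(git -C docsource-master rev-parse HEAD:repo_1)

# Make a new commit in the original repo_1.
echo "v2" > repo_1/topic.txt
git -C repo_1 commit -q -am "Second commit in repo_1"
child_head=$(git -C repo_1 rev-parse HEAD)

# The parent's recorded hash has not moved.
pinned_after=$(git -C docsource-master rev-parse HEAD:repo_1)

echo "parent pinned (before): $pinned_before"
echo "parent pinned (after):  $pinned_after"
echo "repo_1 HEAD:            $child_head"
```

The first two hashes printed should match each other and differ from the third, because the parent keeps pointing at the old commit until a new hash is committed there.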
This becomes important when maintaining stable builds that require resources from multiple independent repositories, as is the case with
With an effective submodule system in place, we know that the relationship between the parent repository and its child repositories is always stable.
Which brings us to the next benefit:
Having a modular codebase means that writers can effectively isolate their work from other collaborators, so that no one blocks anyone else’s work.
We need to produce three sets of documentation for three different products. Because some features and documentation needs overlap, resources should be shared between the three sets of docs to keep information and assets single-sourced.
docsource-master started out as a single monolithic repository for ER, CR, and DR docs. We split out the projects for easier content distribution, and had rsync synchronise resources between the projects. As long as work was done in branches, we were able to isolate work enough to prevent merge conflicts and a scattered history. But at some point, the side navigation stopped working: none of the projects were able to render the side navigation in the HTML output.
This resulted in two weeks spent trying to trace the commit that broke it. Because we were still building the site, I may have made a change that broke it without realising until more changes had been merged to the master branch. We never found the commit, because we didn’t break the side navigation: an update to Flare 2017r2 broke the layout.
But we weren’t able to rule out changes to the layout as a probable cause for the breakage, which led to two weeks of pointless troubleshooting and much gnashing of teeth.
Splitting out the projects into individual repositories makes sure that this never happens again. Any change to shared resources is tightly versioned and controlled from the docsource-master project. Any breakage on an individual project is isolated to that project itself. If the problem is found on all projects, then it’s either a toolchain issue, or a shared resource issue. Troubleshooting becomes a lot more straightforward, and the project becomes a lot more stable.
Sane Git histories
Splitting the repositories makes the project histories more readable.
Instead of having to track a specific change to a minute component on a sprawling monolithic repository, we can search each project’s history easily because it exists in its own repository.
By splitting the repos the way we have, we know that all commits to shared resources and build notes can be found in docsource-master, and notes on individual issues can be found in the project submodules themselves. Specific commit notes become easier to find and read, instead of having to wade through notes from other projects.
Sane codebase mental models
They say that if you can’t keep a mental model of your codebase in your head, then it’s too complicated.
Splitting the docsource-master project into submodules makes Flare’s complex directory structures more manageable, and easier to cache in meat-RAM.
Working with submodules
Configure Git Submodule Visibility
Before attempting anything, set up Git to be more submodule friendly:
    # include a submodule status summary when you run 'git status'.
    git config --global status.submoduleSummary true
    # improve 'git diff' for submodules.
    git config --global diff.submodule log
    # makes sure that 'git fetch' will always grab submodule refs.
    git config --global fetch.recurseSubmodules on-demand
Add Submodule to Container Repo
To add a submodule to the
    # we only want to sync the master branch of each submodule to the docsource repo
    git submodule add -b master <repository-remote-url>
Submodules have to be updated individually. Do this by running in the parent repository root:
    git pull && git submodule foreach "git checkout master && git pull"
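Pulling inside each submodule only moves the submodule's working copy. The parent still has to commit the new hash before the update is recorded. A minimal sketch of that follow-up step, assuming a sandbox built from local stand-in repositories (the names upstream, parent and child are hypothetical, as are the commit messages):

```shell
#!/bin/sh
set -e

sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for a submodule's remote repository.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo one > upstream/a.txt
git -C upstream add a.txt
git -C upstream commit -q -m "one"

# Parent repository that tracks the master branch of the submodule.
git init -q parent
git -C parent symbolic-ref HEAD refs/heads/master
# protocol.file.allow is only needed because the demo uses local paths.
git -C parent -c protocol.file.allow=always \
    submodule --quiet add -b master "$sandbox/upstream" child
git -C parent commit -q -m "Add child submodule"

# New work lands upstream after the submodule was added.
echo two > upstream/a.txt
git -C upstream commit -q -am "two"

# Update every submodule to the tip of its master branch.
cd parent
git submodule --quiet foreach \
    "git checkout -q master && git -c protocol.file.allow=always pull -q"

# The parent now sees the submodule path as modified...
status_line=$(git status --short child)
echo "status after update: $status_line"

# ...and only records the new hash once it is committed.
git add child
git commit -q -m "Bump child to latest master"
recorded=$(git rev-parse HEAD:child)
upstream_head=$(git -C "$sandbox/upstream" rev-parse HEAD)
echo "recorded in parent: $recorded"
echo "upstream HEAD:      $upstream_head"
```

In a real project the last two steps would be a normal `git add` and `git commit` in the parent repository after the submodules have been pulled.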
When you add a submodule to docsource, Git creates a .gitmodules file that holds the submodule configuration. It will look like this:
    [submodule "cr-core"]
        path = cr-core
        url = ssh://repo-man.internal.groundlabs.com:7999/doc/cr-core.git
        branch = master
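Because .gitmodules uses the same format as other Git config files, it can be queried with `git config -f` rather than parsed by hand. The sketch below recreates the example file in a temporary directory (the directory and the variable names are assumptions for illustration) and reads two values back.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Recreate the example .gitmodules file from the text.
cat > .gitmodules <<'EOF'
[submodule "cr-core"]
    path = cr-core
    url = ssh://repo-man.internal.groundlabs.com:7999/doc/cr-core.git
    branch = master
EOF

# List every setting in the file as key=value pairs.
git config -f .gitmodules --list

# Read individual values by key.
sub_path=$(git config -f .gitmodules --get submodule.cr-core.path)
sub_branch=$(git config -f .gitmodules --get submodule.cr-core.branch)
echo "cr-core lives at '$sub_path' and tracks '$sub_branch'"
```

The same commands work from the root of a real parent repository, where .gitmodules already exists.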
In addition, git will add an entry to your .git/config file. For example:
    [core]
        repositoryformatversion = 0
        filemode = false
        bare = false
        logallrefupdates = true
        symlinks = false
        ignorecase = true
    [remote "origin"]
        url = ssh://email@example.com:7999/doc/docsource-master.git
        fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "master"]
        remote = origin
        merge = refs/heads/master
    [branch "develop"]
        remote = origin
        merge = refs/heads/develop
    [submodule "cr-core"]
        url = ssh://repo-man.internal.groundlabs.com:7999/doc/cr-core.git
From Git 1.7.8 onwards, all submodule refs are stored in .git/modules. If a submodule were to be removed in a branch, it would persist in .git/modules, allowing the container repository to keep track of the submodule outside of the working directory.
This means that to remove a submodule, in addition to removing its entries in .git/config, you have to remove the refs in
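One commonly used removal sequence is sketched below as an illustration, not as this project's documented procedure. It assumes a sandbox with hypothetical repositories named upstream and parent and a submodule called child, and it relies on Git keeping each submodule's repository data under .git/modules/<name> in the parent.

```shell
#!/bin/sh
set -e

sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the submodule's remote.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo one > upstream/a.txt
git -C upstream add a.txt
git -C upstream commit -q -m "one"

# Parent repository with one submodule named "child".
git init -q parent
git -C parent symbolic-ref HEAD refs/heads/master
# protocol.file.allow is only needed because the demo uses local paths.
git -C parent -c protocol.file.allow=always \
    submodule --quiet add "$sandbox/upstream" child
git -C parent commit -q -m "Add child submodule"

cd parent

# 1. Unregister the submodule: drops its section from .git/config
#    and empties its working directory.
git submodule --quiet deinit -f child

# 2. Remove the submodule path from the index and its entry from .gitmodules.
git rm -q -f child

# 3. Delete the repository data the parent keeps for the submodule.
rm -rf .git/modules/child

git commit -q -m "Remove child submodule"

# Anything still registered for the submodule would show up here.
leftover=$(git config --get-regexp '^submodule\.child\.' || true)
echo "leftover config: '$leftover'"
```

Step 3 is the part that is easy to forget: without it, the submodule's data stays in the parent's .git directory even though the path is gone from the working tree.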
Most of this readme is derived from this fantastic article: Christophe Porteneuve, "Mastering Git submodules," published 9 Jan 2015. Available: https://medium.com/@porteneuve/mastering-git-submodules-34c65e940407