#+SETUPFILE: ../../config.org
#+TAGS: Git(g)
#+CATEGORY: Engineering
#+DATE: 2022-02-19
#+TITLE: Git Etiquette
#+DESC: A rather long essay on how to use Git in a civilized way

* Rationale :Git:
The ability to make tools and use them is one of the many things that make us special,
and the skill to use tools properly is what makes some of us elite.

As software engineers, interacting with [[https://git-scm.com/][Git]] is an important part of our daily life. These days
*Git* is the de facto standard of version control systems and almost everyone uses it. *Git* is
one of those special tools that every engineer has to be familiar with, since it's so widespread
in the tech world. It would be a big surprise to find a new project or company that is not using
*Git*.

As a free software contributor, I have spent my entire professional career in FOSS communities and
projects, and proper use of *Git* seems natural to me. But to my surprise, every now and then I
witness how some "commercial engineers" (air quotes) use Git, and it makes me sad that in a
commercial space, where you get paid to build technology, people do it so poorly. After many
incidents of this type, I decided to put together a document to help improve my team's *Git*
workflows. While there is plenty of reading material on the internet dedicated to
*Git best practices*, I thought it might be useful to publish that document to help others as well.
For lack of a better term I've chosen the title *"Git Etiquette"*. Following Git etiquette helps
teams get more out of their Git workflows and avoid frustration.

I'll try to keep it short and refer to essays from others who have explained it much better than
me. I borrowed some of the words from others and credited most of them in the resources section to
the best of my ability, but since I wrote the original document long ago, and it was supposed to be
a private doc for a few people, some of the sources might have been lost.

Also, I'll add more items to the list over time.

* Git Commits
Commits are the building blocks of version control with Git. Improving the quality of individual
commits improves the overall quality of the repository.

** Single purpose commits
Engineers often get sidetracked into doing too many things while working on one particular thing.
You are trying to fix one bug, you spot another one, and you can’t resist the urge to fix that as
well. And another one. Soon it snowballs and you end up with many unrelated changes going together
in one commit.

This is problematic. It is better to keep commits as small and focused as possible, for many
reasons, including:

- It makes it easier for other people on the team to look at your change, making code reviews
  more efficient.
- If the commit has to be rolled back completely, it’s far easier to do so.
- It's straightforward to track these changes with your ticketing system.
- It helps you mentally parse the changes you’ve made when reading =git log=.

A commit should be a wrapper for related changes. For example, fixing two different bugs should
produce two separate commits. Small commits make it easier for other team members to understand
the changes and roll them back if something goes wrong. With tools like the staging area and the
ability to stage only parts of a file, Git makes it easy to create very granular commits.

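As a minimal sketch of the idea above, the following script creates a throwaway repository and
commits two unrelated fixes separately by staging one file at a time. The file names, the identity
and the commit messages are made up for illustration. When both fixes live in the same file,
=git add --patch= lets you pick individual hunks interactively instead.

#+BEGIN_SRC shell
set -e
# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'timeout=10\n' > client.conf
printf 'retries=1\n' > worker.conf
git add client.conf worker.conf
git commit -q -m "Add initial configuration"

# Two unrelated fixes end up in the working tree at the same time.
printf 'timeout=30\n' > client.conf
printf 'retries=3\n' > worker.conf

# Stage and commit each fix on its own instead of using "git commit -a".
git add client.conf
git commit -q -m "Raise client timeout to 30 seconds"
git add worker.conf
git commit -q -m "Retry failed jobs three times"

git log --oneline
#+END_SRC

Each fix can now be reviewed, reverted or cherry-picked without dragging the other one along.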
** Commit Messages
On many occasions we need to inspect the *Git* history to find something: a commit, a specific
change, clues about an error, or even the engineer who made a certain change. I have had
bittersweet experiences dealing with commit messages in the Git history of projects. Let me
demonstrate with real examples.

I have seen many times in commercial teams that engineers don't bother to write a proper and useful
Git commit message. For some reason that is beyond my understanding, they think that having
*"I hate my life!"* as the commit message for a commit with ~1200 lines of change, in a repository
with more than 300k commits (at the time) that is used by about 200 engineers, is a cool thing to
do. I came across this commit message long ago when I was trying to figure out why a service was
malfunctioning. The commit message wasn't helpful at all and I had to read through the diff to
figure out whether or not that commit was the root of the issue. I can tell many stories like this
one, but for the sake of this essay one is enough.

But let's have a look at the real Git history of a repository that I don't like at all (using the
=--oneline= flag):

#+BEGIN_SRC
2683332a333a Update tests
3315442a4983e Remove icon from manage header
aa234e8aa83f8 test fix
29c35ba3adcee Class migration
fbde3a265ab3f Migrate header styles
01eaac4b4cc13 tests
8d004a970eef7 fix tests
d2890dfdc360 add tests
91c2aa31720f2 add test for notice variable
135a2df25e86a fix tests
3aa4101546a93 refactor
0eaae58006f51 add test for global variable
3ae7ee7297104 remove unnecessary check
#+END_SRC

These commits are taken from a repository with more than 400k commits and many active contributors
in a commercial space (don't worry, the SHAs are not the original SHAs).

On the other hand, a few weeks ago I pulled from the [[https://llvm.org/][LLVM]] repository and built it again (I do this
weekly) and tried to build the [[https://serene-lang.org][Serene compiler]] (a programming language that I'm working on) against
it. But the compilation failed with an error like "Identifier is unknown". I grepped the Git logs
of the LLVM repository, found a commit, and all of a sudden smiled and praised the author in my
mind. Here is the commit message (I removed the commit details):

#+BEGIN_SRC
Date: Wed Jan 12 11:20:18 2022 -0800

    [mlir] Finish removing Identifier from the C++ API

    There have been a few API pieces remaining to allow for a smooth transition for
    downstream users, but these have been up for a few months now. After this only
    the C API will have reference to "Identifier", but those will be reworked in a followup.

    The main updates are:
    * Identifier -> StringAttr
    * StringAttr::get requires the context as the first parameter
      - i.e. `Identifier::get("...", ctx)` -> `StringAttr::get(ctx, "...")`
#+END_SRC

It was obvious how to fix my issue by looking at this fantastic commit message.

Which one would you rather read? Which one helps you understand what happened in any specific
commit?

According to [[https://cbea.ms/git-commit/][Chris Beams]], a well-crafted Git commit message is the best way to communicate the
context of a change to other engineers (and our future selves). A diff will tell you what changed,
but only the commit message can properly tell you why.

Peter Hutterer [[https://who-t.blogspot.com/2009/12/on-commit-messages.html][makes this point]] well:

#+begin_quote
Re-establishing the context of a piece of code is wasteful. We can’t avoid it completely, so our
efforts should go to [[https://www.osnews.com/story/19266/wtfsm/][reducing it]] [as much] as possible. Commit messages can do exactly that and
as a result, a commit message shows whether a developer is a good collaborator.
#+end_quote

If you have ever used =git log= or any other Git subcommand that interacts with commits
(which many of them do), you'll understand what a valuable asset a well-written commit message
is.

The Git history is just a bunch of commits in a certain order. It's up to the engineers to make the
most of it. As a project grows, maintenance becomes an issue, and the messier your history is, the
harder it is to maintain the project. It also makes it painful for others to get involved in the
project.

There are seven easy rules that you can follow to rock your commit messages:

1. Separate subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how

I highly recommend reading the [[https://cbea.ms/git-commit/][How to Write a Git Commit Message]] post by Chris Beams, which
explains these rules in depth.

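The mechanical rules among the seven (1, 2, 3, 4 and 6) can be checked by a script. Rules 5 and 7
need human judgment. The function below is only a sketch of such a check. The name
=check_commit_message= and the sample messages are hypothetical, and projects that prefix subjects
with a component tag (like =[mlir]= in the LLVM example) would need to relax the capitalization
check. A =commit-msg= hook receives the path to the message file as its first argument, so the
same function could be called from one.

#+BEGIN_SRC shell
# Sketch of a commit message check; prints one line per violated rule.
check_commit_message() {
  # $1: path to a file holding the commit message
  subject="$(sed -n '1p' "$1")"
  second="$(sed -n '2p' "$1")"
  rc=0
  if [ "${#subject}" -gt 50 ]; then
    echo "subject is longer than 50 characters"; rc=1
  fi
  case "$subject" in
    [[:upper:]]*) ;;
    *) echo "subject does not start with a capital letter"; rc=1 ;;
  esac
  case "$subject" in
    *.) echo "subject ends with a period"; rc=1 ;;
  esac
  if [ -n "$second" ]; then
    echo "subject and body are not separated by a blank line"; rc=1
  fi
  # awk exits with 0 only if some body line is longer than 72 characters.
  if awk 'NR > 2 && length($0) > 72 { found = 1 } END { exit !found }' "$1"; then
    echo "body contains a line longer than 72 characters"; rc=1
  fi
  return "$rc"
}

good="$(mktemp)"
printf 'Fix race in session cleanup\n\nThe cleanup job could delete a session that was being\nrefreshed at the same time.\n' > "$good"
bad="$(mktemp)"
printf 'fixed some stuff.\nmore details\n' > "$bad"

check_commit_message "$good" && echo "good message passes"
check_commit_message "$bad" || echo "bad message rejected"
#+END_SRC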
** Commit early, commit often
Git works best, and works in your favor, when you commit your work often. Instead of waiting to
make the commit perfect, it is better to work in small chunks and keep committing your work.
Personally, I have found it much easier to have smaller commits that group together related
changes. This way you can easily revert commits that you don't like, cherry-pick those that you
want, and avoid dealing with the unnecessary changes that come along in a large commit.

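To illustrate, here is a small sketch in a throwaway repository. Because each commit does one
thing, a single commit can be reverted and another one cherry-picked onto a different branch
without side effects. The branch, file and commit names are invented for the example.

#+BEGIN_SRC shell
set -e
# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app"

# Two small, focused commits on a feature branch.
git checkout -q -b feature
echo "docs" > README
git add README
git commit -q -m "Add README"
readme_commit="$(git rev-parse HEAD)"
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "Add experimental flag"

# Undo only the experiment; the README commit is untouched.
git revert --no-edit HEAD > /dev/null

# Bring only the README commit over to main.
git checkout -q main
git cherry-pick "$readme_commit" > /dev/null

git log --oneline main
#+END_SRC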
If you are working on a feature branch that could take some time to finish, committing often makes
it easier to keep your branch updated with the latest changes so that you avoid conflicts.

Also, Git only takes full responsibility for your data once you commit. Committing often protects
you from losing work, makes it easy to revert changes, and helps you trace what you did with
=git-reflog=.

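As a rough illustration of the reflog as a safety net, the sketch below throws away a commit with
=git reset --hard= and then recovers it. The repository, file and branch names are hypothetical.
The only assumption about Git itself is the default behaviour of recording HEAD movements in the
reflog of a regular (non-bare) repository.

#+BEGIN_SRC shell
set -e
# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "two" >> notes.txt
git commit -q -am "Extend notes"

# Oops: the last commit is thrown away.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD pointed before the reset.
git reflog -n 3
lost="$(git rev-parse 'HEAD@{1}')"

# Put the lost commit back on a new branch.
git branch recovered "$lost"
git log --oneline -n 1 recovered
#+END_SRC

Work that was never committed has no such safety net, which is one more reason to commit early.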
** Don’t commit generated files
This one is fairly obvious, but many times I have had to look at the history to figure out who
committed an auto-generated file or a massive file to the repository.

Generally, only files that took manual effort to create and cannot be re-generated should be
committed. Generated files can be re-created at will, at any time, and they normally don’t work
well with line-based diff tracking. It is useful to add a =.gitignore= file to your repository’s
root to tell Git which files or paths you don’t want to track.

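A minimal sketch of this, assuming a hypothetical C project whose build output goes to =build/=:
the script writes a =.gitignore=, runs =git add .= and shows that the generated file is left out.
The ignore patterns are examples only and should be adapted to what your own build produces.

#+BEGIN_SRC shell
set -e
# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Ignore the build directory and any object file.
printf '%s\n' 'build/' '*.o' > .gitignore

mkdir build
echo "binary" > build/app.o
echo "int main(void) { return 0; }" > main.c

# "git add ." now skips everything matched by .gitignore.
git add .
git status --short

# Ask Git which rule makes it ignore a given path.
git check-ignore -v build/app.o
#+END_SRC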
* Don’t alter published history
Once a commit has been merged to an upstream default branch (and is visible to others), it is
strongly advised not to alter history. Git and other VCS tools allow you to rewrite branch history,
but doing so is problematic for everyone who has access to the repository. While =git-rebase= is a
useful feature, it should only be used on branches that only you are working with (private
branches).

One of the key aspects of Git is its distributed nature, meaning that everyone can have their own
repository, push commits to their own fork, and send pull requests asking others to pull from it.
These days this process is centralized via Git hosting services. While they provide forking
functionality, forking is not common in commercial, closed source projects, which causes engineers
to share feature branches. It has happened to me many times, in different roles, that someone
force-pushed to a public (within the org) branch and screwed up everyone's workflow. For the sake
of your own peace of mind and others' sanity, *DO NOT CHANGE THE PUBLIC HISTORY*.

It's kind of a joke, but if you are a public force pusher, I'll end my friendship with you.

Having said that, there will inevitably be occasions where a history rewrite on a published branch
is needed. Extreme care must be taken when doing so.

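One precaution worth knowing for those rare occasions, offered here as a suggestion rather than
as part of the original guidelines, is =git push --force-with-lease=. Unlike a plain =--force=, it
refuses to overwrite the remote branch if someone else has pushed to it since you last fetched.
The sketch below simulates this with a local bare repository standing in for the server. The
names =alice=, =bob= and =origin.git= are hypothetical.

#+BEGIN_SRC shell
set -e
# A local bare repository plays the role of the shared server.
work="$(mktemp -d)"
cd "$work"
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/main

# Alice publishes the first commit.
git clone -q origin.git alice 2> /dev/null
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "a" > file.txt
git add file.txt
git commit -q -m "Add file"
git push -q origin main
cd ..

# Bob builds on top of it and pushes.
git clone -q origin.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "b" >> file.txt
git commit -q -am "Extend file"
git push -q origin main
cd ..

# Alice rewrites her local history without fetching Bob's commit first.
cd alice
git commit -q --amend -m "Add file (reworded)"

# The lease check compares the remote branch with what Alice last saw,
# so Bob's commit is not silently destroyed.
if git push -q --force-with-lease origin main 2> /dev/null; then
  echo "remote history was rewritten"
else
  echo "push rejected: fetch and review the new commits first"
fi
#+END_SRC

This does not make rewriting shared history polite. It only turns a silent loss of someone's work
into a visible error.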
* Merge VS Rebase

The golden rule is to never rebase public branches and to always merge into public branches.
When it comes to merge vs rebase, there are two simple rules, covered in the two sections below.

*Note:* It's better to use squash and merge instead of a normal merge because in projects with
many contributors, it is easier to maintain a Git history on the main branch that contains
one commit per feature.

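Hosting services usually offer squash and merge as a button on the pull request. On the command
line, roughly the same result can be had with =git merge --squash=, as in the sketch below. The
repository, branch names and commit messages are invented for the example.

#+BEGIN_SRC shell
set -e
# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Add app"

# A feature branch with a couple of work-in-progress commits.
git checkout -q -b feature
echo "step 1" >> app.txt
git commit -q -am "WIP: start feature"
echo "step 2" >> app.txt
git commit -q -am "WIP: finish feature"

# Stage the combined result of the branch on main, then record it
# as a single commit with a proper message.
git checkout -q main
git merge -q --squash feature
git commit -q -m "Add feature X"

git log --oneline main
#+END_SRC

The main branch ends up with one commit for the feature, while the work-in-progress commits stay
on the feature branch only.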
** Don’t change other people’s history
You must never, ever destroy other people's history. You must not rebase commits other people made.
Basically, if it is not your branch, you can't rebase it. Notice that this really is about other
people's history, not about other people's code. If you want to pull some changes from other
developers into your branch, it’s fine to rebase, because it’s their code but it’s your history.
So you can go wild with rebasing, even though you didn't write the code, as long as the commit
itself is your private one.

Minor clarification: once you've published your history in a public branch, other people may be
using it, and so it's clearly not your private history anymore. So the rule is not just about it
being *your commit*, it's also about it being private to your tree, meaning you haven't pushed it
out and announced it yet.

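The allowed case can be sketched as follows: a private, unpublished branch is replayed on top of
new work that landed on main. Only the private commit is rewritten, and the colleague's commit on
main stays exactly as it was. The branch name =my-feature=, the file names and the identities are
hypothetical.

#+BEGIN_SRC shell
set -e
# Throwaway repository; every name below is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

# My private branch, never pushed anywhere.
git checkout -q -b my-feature
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "Add my change"

# Meanwhile, a colleague's work lands on main.
git checkout -q main
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q --author="Colleague <colleague@example.com>" -m "Add shared change"
shared_commit="$(git rev-parse main)"

# Replay my private commit on top of main: their code, my history.
git checkout -q my-feature
git rebase -q main

git log --oneline
#+END_SRC

After the rebase the history of =my-feature= is linear, and the commit on main still has the same
hash it had before.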
** Don’t expose your unfinished work to public
Keep your own history readable. Some people do this by just working things out in their head first,
and not making mistakes. But that's very rare, and the rest of us use =git rebase= etc. while we
work on our problems. So =git rebase= is not wrong. But it's right only if it's
*YOUR VERY OWN PRIVATE* git tree.

If you're still in the =git rebase= phase, you don't push it out. If it's not ready, you don't
tell the public at large about it. Don’t push your changes to a shared feature branch or the main
branch.

Don’t merge upstream changes at random points. If you’re working on a shared feature branch,
don’t pull down changes that are not yet verified and finalized. Doing so puts your history in an
inconsistent state: it will contain changes that might get removed upstream, and when you later
push your work you will put those removed changes back.

* Conclusion
This essay was just a superficial attempt to explain some of the Git etiquette that we need to
follow when we're collaborating on a project with others. At the end of the day, we are looking
to make it easier for ourselves to develop software, and following certain rules will help us
get there faster and make the process more pleasant.

* References and Resources
- https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html
  The kernel community is one of the biggest communities of paid and volunteer contributors
  using Git intensively, with really high traffic. In order to manage the development
  process and stay productive, it has really strict guidelines, some of which can
  be useful for us.

- https://chris.beams.io/posts/git-commit/
  Chris Beams researched the best practices around commit messages by reviewing many
  projects. His article is one of the most referenced articles in this field.

- https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
  Another short but widely referenced article on best practices around Git commit messages.

- https://yarchive.net/comp/linux/commit_messages.html
  Who could be better to follow on Git best practices than Linus Torvalds himself?

- https://lwn.net/Articles/328438/
  A famous email from Linus Torvalds describing how to maintain a Git tree from a merge vs
  rebase perspective.

- https://www.atlassian.com/git/tutorials/merging-vs-rebasing
  Atlassian's guidelines on merge vs rebase.