#+SETUPFILE: ../../config.org
#+TAGS: Git(g)
#+CATEGORY: Engineering
#+DATE: 2022-02-19
#+TITLE: Git Etiquette
#+DESC: A rather long essay on how to use Git in a civilized way

* Rationale :Git:
The ability to make tools and use them is one of the many things that make us special,
and the skill to use tools properly is what makes some of us
elite.

As software engineers, we interact with [[https://git-scm.com/][Git]] as an important part of our daily lives. These days
*Git* is the de facto standard among version control systems and almost everyone uses it. *Git* is
one of those special tools that every engineer has to be familiar with, since it is so widespread
in the tech world. It would be a big surprise to find a new project or company that is not using
*Git*.

As a free software contributor, I have spent my entire professional career in FOSS communities and projects,
and proper use of *Git* seems natural to me. But to my surprise, every now and then I witness
how some "commercial engineers" (air quotes) use Git, and it makes me sad that in a commercial space,
where you get paid to build technology, people do it so poorly. After many incidents of this type
I decided to put together a document to help improve my team's *Git* workflows. While there
is plenty of reading material on the internet dedicated to *Git best practices*, I thought it
might be useful to publish that document to help others as well. For lack of a better
word I've chosen the title *"Git Etiquette"*. Following Git etiquette helps teams get more out of
their Git workflows and avoid frustration.

I'll try to keep it short and refer to essays from others who have explained it much better than me. I
borrowed some of the words from others and have listed most of them in the resources section to the
best of my ability, but since I wrote the original document long ago and it was supposed to be a
private doc for a few people, some of the sources might have been lost.

Also, I'll add more items to the list over time.

* Git Commits
Commits are the building blocks of version control with Git. Improving the quality of individual
commits improves the overall quality of the repository.

** Single purpose commits
Engineers often get sidetracked into doing too many things while working on
one particular thing. You are trying to fix one bug and you spot another one,
and you can’t resist the urge to fix that as well. And another one. Soon it snowballs and you end
up with many unrelated changes going together in one commit.

This is problematic. It is better to keep commits as small and focused as possible for many
reasons, including:

- It makes it easier for other people on the team to read your change, making code reviews
  more efficient.
- If the commit has to be rolled back completely, it’s far easier to do so.
- It's straightforward to track these changes with your ticketing system.
- It helps you mentally parse the changes you’ve made when using =git log=.

A commit should be a wrapper for related changes. For example, fixing two different bugs should
produce two separate commits. Small commits make it easier for other team members to understand
the changes and roll them back if something goes wrong. With tools like the staging area and the
ability to stage only parts of a file, Git makes it easy to create very granular commits.

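
As a rough sketch of this idea, the script below builds a throwaway repository in which two unrelated
fixes sit in the working tree at the same time, and then commits them separately. The file names, the
identity, and the commit messages are all made up for illustration.

#+BEGIN_SRC sh
set -e

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'parser v1\n' > parser.txt
printf 'lexer v1\n' > lexer.txt
git add parser.txt lexer.txt
git commit -q -m "Add parser and lexer"

# Two unrelated fixes end up in the working tree together.
printf 'parser v2\n' > parser.txt
printf 'lexer v2\n' > lexer.txt

# Stage and commit each fix on its own instead of committing everything at once.
git add parser.txt
git commit -q -m "Fix parser crash on empty input"
git add lexer.txt
git commit -q -m "Fix lexer handling of tabs"

# Each fix now has its own entry in the history.
git log --oneline
#+END_SRC

When both fixes touch the same file, =git add --patch= lets you pick individual hunks interactively
before each commit.
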
** Commit Messages
On many occasions we need to inspect the *Git* history to find something: a commit, a specific change,
clues about an error, or even the engineer who made a certain change. I have had bittersweet
experiences with commit messages in the Git history of projects. Let me
demonstrate with real examples.

Many times in commercial teams I have seen engineers who don't bother to write a proper and useful
Git commit message. For some reason that is beyond my understanding, they think having *"I hate my
life!"* as the commit message for a commit with ~1200 changed lines, in a repository with more than
300k commits (at the time) that is used by about 200 engineers, is a cool thing to do. I came across
this commit message long ago when I was trying to figure out why a service was malfunctioning. The
message wasn't helpful at all and I had to read through the diff to figure out whether or not that
commit was the root of the issue. I could tell many stories like this one, but for the sake of this
essay one is enough.

But let's have a look at the real Git history of a repository that I don't like at all (using the =--oneline=
flag):
#+BEGIN_SRC
2683332a333a Update tests
3315442a4983e Remove icon from manage header
aa234e8aa83f8 test fix
29c35ba3adcee Class migration
fbde3a265ab3f Migrate header styles
01eaac4b4cc13 tests
8d004a970eef7 fix tests
d2890dfdc360 add tests
91c2aa31720f2 add test for notice variable
135a2df25e86a fix tests
3aa4101546a93 refactor
0eaae58006f51 add test for global variable
3ae7ee7297104 remove unnecessary check
#+END_SRC
These commits are taken from a repository with more than 400k commits and many active contributors
in a commercial space (don't worry, the SHAs are not the original SHAs).

On the other hand, a few weeks ago I pulled from the [[https://llvm.org/][LLVM]] repository and built it again (I do this weekly)
and tried to build the [[https://serene-lang.org][Serene compiler]] (a programming language that I'm working on) against it.
But the compilation failed with an error like "Identifier is unknown". I grepped the Git log of the LLVM
repository, saw a commit, and all of a sudden smiled and praised the author in my mind. Here is
the commit message (I removed the commit details):
#+BEGIN_SRC
Date:   Wed Jan 12 11:20:18 2022 -0800

    [mlir] Finish removing Identifier from the C++ API

    There have been a few API pieces remaining to allow for a smooth transition for
    downstream users, but these have been up for a few months now. After this only
    the C API will have reference to "Identifier", but those will be reworked in a followup.

    The main updates are:
    * Identifier -> StringAttr
    * StringAttr::get requires the context as the first parameter
      - i.e. `Identifier::get("...", ctx)` -> `StringAttr::get(ctx, "...")`
#+END_SRC
It was obvious how to fix my issue after reading this fantastic commit message.

Which one would you rather read? Which one helps you understand what happened in a specific commit?

According to [[https://cbea.ms/git-commit/][Chris Beams]], a well-crafted Git commit message is the best way to communicate the context
of a change to other engineers (and our future selves). A diff will tell you what changed,
but only the commit message can properly tell you why.

Peter Hutterer [[https://who-t.blogspot.com/2009/12/on-commit-messages.html][makes this point]] well:

#+begin_quote
Re-establishing the context of a piece of code is wasteful. We can’t avoid it completely, so our
efforts should go to [[https://www.osnews.com/story/19266/wtfsm/][reducing it]] [as much] as possible. Commit messages can do exactly that and
as a result, a commit message shows whether a developer is a good collaborator.
#+end_quote

If you have ever used =git log= or any other Git subcommand that interacts with commits
(which many of them do), you'll understand what a valuable asset a well-written commit message
is.

The Git history is just a bunch of commits in a certain order. It's up to the engineers to make the
most of it. As a project grows, maintenance becomes an issue, and the messier your history
is, the harder it is to maintain the project. It also makes it painful for others to get involved in the
project.

There are seven easy rules that you can follow to rock your commit messages:

1. Separate subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how

I highly recommend reading the [[https://cbea.ms/git-commit/][How to Write a Git Commit Message]] post from Chris Beams, which
explains these rules in depth.

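
To make the seven rules a bit more concrete, here is a small sketch of a commit whose message tries to
follow them. The repository, the file, and the change being described are invented for this example;
only the shape of the message matters.

#+BEGIN_SRC sh
set -e

# Throwaway repository; the project and the change are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf 'placeholder\n' > profile.txt
git add profile.txt

# Short, capitalized, imperative subject without a trailing period,
# then a blank line, then a body wrapped at 72 columns that explains
# what changed and why.
cat > message.txt <<'EOF'
Validate user email before saving profile

Profiles could be saved with malformed email addresses, which made
password-reset mails bounce. Reject the form early so that the user
can correct the address instead of discovering the problem later.
EOF

git commit -q -F message.txt

# The subject line is what shows up in "git log --oneline".
subject=$(git log -1 --format=%s)
echo "${#subject} characters: $subject"
#+END_SRC

Writing the message in a file (or in your editor, which is what plain =git commit= opens) makes it
much easier to keep the blank line and the wrapped body than squeezing everything into =-m=.
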
** Commit early, commit often
Git works best, and works in your favor, when you commit your work often. Instead of waiting to
make the commit perfect, it is better to work in small chunks and keep committing your work. Personally,
I have found it much easier to have smaller commits that group together related changes. This way
you can easily revert commits that you don't like, cherry-pick those that you want, and avoid dealing
with unnecessary changes that come along in a commit.

If you are working on a feature branch that could take some time to finish, committing often also helps you keep
your code updated with the latest changes so that you avoid conflicts.

Also, Git only takes full responsibility for your data once you commit. Committing protects you from losing work,
lets you revert changes, and helps you trace what you did with =git reflog=.

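
As a minimal sketch of that safety net, the script below commits some work, throws the latest commit
away by mistake, and then uses the reflog to get it back. The file name and messages are hypothetical,
and it assumes the default setting where Git records a reflog for normal (non-bare) repositories.

#+BEGIN_SRC sh
set -e

# Throwaway repository; names are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'draft 1\n' > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"
printf 'draft 2\n' > notes.txt
git commit -q -am "Expand notes with second draft"

# Simulate a mistake: throw away the latest commit.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD pointed before the reset.
git reflog

# HEAD@{1} is the position of HEAD one move ago, i.e. the lost commit.
git reset -q --hard 'HEAD@{1}'
cat notes.txt
#+END_SRC

This only works because the second draft was committed. Changes that were never committed are not
recorded in the reflog.
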
** Don’t commit generated files
This one is fairly obvious, but many times I have had to look at the history to figure out who committed
an auto-generated file or a massive file to the repository.

Generally, only files that took manual effort to create, and that cannot
be re-generated, should be committed. Generated files can be re-created at will at any time, and they normally don’t
work well with line-based diff tracking either. It is useful to add a =.gitignore= file to your
repository’s root to tell Git which files or paths you don’t want to track.

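
Here is a small sketch of what that can look like. The ignored paths (a =build/= directory, object
files, and logs) are only assumptions about a typical project; your own list will differ.

#+BEGIN_SRC sh
set -e

# Throwaway repository with a hypothetical project layout.
repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitignore <<'EOF'
# Build output and other generated artifacts
build/
*.o
*.log
EOF

mkdir build
touch build/app main.c main.o debug.log

# Only hand-written files show up as candidates for committing.
git status --short

# Ask Git which rule is responsible for ignoring a given path.
git check-ignore -v main.o build/app
#+END_SRC

=git check-ignore -v= is handy when a file is unexpectedly ignored (or not ignored), since it prints
the pattern that matched.
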
* Don’t alter published history
Once a commit has been merged to an upstream default branch (and is visible to others), it is strongly
advised not to alter history. Git and other VCS tools allow you to rewrite branch history, but doing so is
problematic for everyone who has access to the repository. While =git rebase= is a useful feature,
it should only be used on branches that only you are working with (private branches).

One of the key aspects of Git is its distributed nature: everyone can have their own
repository, push commits to their own fork, and send pull requests asking others to pull from
it. These days this process is centralized via Git hosting services. While they
provide forking functionality, forking is not common in commercial, closed source
projects, so engineers there end up sharing feature branches. It
has happened to me many times, in different roles, that someone force pushed to a public (within the org)
branch and screwed up everyone's workflow. For the sake of your own peace of mind and others' sanity,
*DO NOT CHANGE THE PUBLIC HISTORY*.

It's kind of a joke, but if you are a public force pusher, I'll end my friendship with you.

Having said that, there will inevitably be occasions where a history rewrite is needed
on a published branch. Extreme care must be taken when doing so.

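
One possible safety net for those rare occasions, offered here as my own suggestion rather than a rule
from the sources, is =git push --force-with-lease=. It refuses to overwrite the remote branch if
someone has pushed to it since you last fetched. The sketch below simulates this with a local bare
repository and two clones. The names =alice=, =bob=, and =shared.git= are hypothetical.

#+BEGIN_SRC sh
set -e

# Throwaway setup: one shared bare repository and two clones.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q --bare shared.git
# Name the initial branch "main" regardless of the local default.
git --git-dir=shared.git symbolic-ref HEAD refs/heads/main

git clone -q shared.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/main
printf 'v1\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin main
cd ..

# A teammate clones the branch and publishes new work on top of it.
git clone -q shared.git bob
cd bob
printf 'v1\nbob\n' > notes.txt
git commit -q -am "Extend notes"
git push -q origin main
cd ..

# Alice rewrites her old commit without fetching first, then tries to publish it.
cd alice
git commit -q --amend -m "Add project notes"
if git push -q --force-with-lease origin main 2>/dev/null; then
    result="overwritten"
else
    result="rejected"
fi
echo "push was $result"
#+END_SRC

A plain =git push --force= in the last step would have silently discarded the teammate's commit. Even
with the lease, talk to the people who use the branch before rewriting it.
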
* Merge VS Rebase

The golden rule is to never rebase public branches and to always merge into public branches.
Beyond that, when it comes to merge vs rebase, there are two simple rules, covered in the subsections below.

*Note:* It's better to use squash and merge instead of a normal merge, because in projects with
many contributors it is easier to maintain a Git history on the main branch that contains
one commit per feature.

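
The golden rule and the note above can be sketched roughly as follows: rebase only the private feature
branch, then land it on the public branch as a single squashed commit. The branch name
=feature/login=, the files, and the messages are invented for this example.

#+BEGIN_SRC sh
set -e

# Throwaway repository; all names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the initial branch "main" regardless of the local default.
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'base\n' > app.txt
git add app.txt
git commit -q -m "Add application skeleton"

# Private feature branch with two work-in-progress commits.
git checkout -q -b feature/login
printf 'login form\n' > login.txt
git add login.txt
git commit -q -m "Add login form"
printf 'login form with validation\n' > login.txt
git commit -q -am "Validate login form input"

# Meanwhile, the public branch moves forward.
git checkout -q main
printf 'base\nlogging\n' > app.txt
git commit -q -am "Add request logging"

# Rebase the private branch onto the updated main (nobody else uses it).
git checkout -q feature/login
git rebase -q main

# Land the feature on the public branch as one commit.
git checkout -q main
git merge -q --squash feature/login
git commit -q -m "Add login form with validation"

git log --oneline
#+END_SRC

The public branch is only ever moved forward by new commits here. All the rewriting happens on the
private branch before anyone else depends on it.
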
** Don’t change other people’s history
You must never destroy other people's history. You must not rebase commits that other people made.
Basically, if it is not your branch, you can't rebase it. Notice that this really is about other
people's history, not about other people's code. If you want to pull some changes from other
developers into your branch, it’s fine to rebase, because it’s their code but it’s your history.
So you can go wild with rebase on it, even though you didn't write the code, as long as
the commit itself is your private one.

Minor clarification: once you've published your history in a public branch, other people may be
using it, and so it's clearly not your private history anymore. So the minor clarification
really is that it's not just about it being *your commit*, it's also about it being private to your tree,
meaning you haven't pushed it out and announced it yet.

** Don’t expose your unfinished work to public
Keep your own history readable. Some people do this by just working things out in their head first,
and not making mistakes. But that's very rare, and the rest of us use =git rebase= and similar tools
while we work on our problems. So =git rebase= is not wrong. But it's right only if it's
*YOUR VERY OWN PRIVATE* Git tree.

If you're still in the =git rebase= phase, you don't push it out. If it's not ready, you don't
tell the public at large about it. Don’t push your changes to a shared feature branch or the main
branch.

Don’t merge upstream changes at random points. If you’re working on a shared feature branch,
don’t pull down changes that are not yet verified and finalized. Doing so puts your history
in an inconsistent state: it will contain changes that might later get
removed upstream, and when you push your own work you will put those removed
changes back again.

* Conclusion
This essay was just a superficial attempt to explain some of the Git etiquette that we need to
follow when we're collaborating on a project with others. At the end of the day we are looking
to make it easier for ourselves to develop software, and following certain rules will help us
get there faster and make the process more pleasant.

* References and Resources
- https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html
  The kernel community is one of the biggest communities of paid and volunteer contributors
  using Git intensively, with really high traffic. In order to manage the development
  process and stay productive, it has really strict guidelines, some of which can
  be useful for us.

- https://chris.beams.io/posts/git-commit/
  Chris Beams researched the best practices around commit messages
  by reviewing many projects. His article is one of the most referenced articles in this field.

- https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
  Another short but widely referenced article on best practices around Git commit messages.

- https://yarchive.net/comp/linux/commit_messages.html
  Who could be better to follow on Git best practices than Linus Torvalds himself?

- https://lwn.net/Articles/328438/
  A famous email from Linus Torvalds describing how to maintain a Git tree from a merge vs
  rebase perspective.

- https://www.atlassian.com/git/tutorials/merging-vs-rebasing
  Atlassian's guidelines on merge vs rebase.