From 39629cdf31c53ec662c5d51811bfad279498f2e0 Mon Sep 17 00:00:00 2001
From: Sameer Rahmani
Date: Sat, 19 Feb 2022 18:51:06 +0000
Subject: [PATCH] Add the git etiquette post

---
 orgs/essays/git-etiquette.org | 254 ++++++++++++++++++++++++++++++++++
 templates/index.org           |  10 +-
 2 files changed, 263 insertions(+), 1 deletion(-)
 create mode 100644 orgs/essays/git-etiquette.org

diff --git a/orgs/essays/git-etiquette.org b/orgs/essays/git-etiquette.org
new file mode 100644
index 0000000..b27497a
--- /dev/null
+++ b/orgs/essays/git-etiquette.org
@@ -0,0 +1,254 @@
+#+SETUPFILE: ../../config.org
+#+TAGS: git(g)
+#+CATEGORY: Engineering
+#+DATE: 2022-02-19
+#+TITLE: Git Etiquette
+#+DESC: A rather long essay on how to use Git in a civilized way
+
+* Rationale :git:
+The ability to make tools and use them is one of the many things that differentiate us from other
+animal species, and the skill to use tools properly is what makes some of us elite.
+
+As software engineers, we interact with [[https://git-scm.com/][Git]] every day. These days *Git* is the de facto
+standard among version control systems and almost everyone uses it. *Git* is one of those special
+tools that every engineer has to be familiar with, since it is so widespread in the tech world. It
+would be a big surprise to find a new project or company that does not use *Git*.
+
+As a free software contributor, I have spent my entire professional career in FOSS communities and
+projects, so proper use of *Git* feels natural to me. But to my surprise, every now and then I
+witness how some "commercial engineers" (air quotes) use *Git*, and it makes me sad that in a
+commercial space, where people are paid to build technology, they do it so poorly. After many
+incidents of this kind, I decided to put together a document to help improve my team's *Git* workflows.
While there
+is plenty of reading material on the internet dedicated to *Git best practices*, I thought it might
+be useful to publish that document to help others as well. For lack of a better word, I chose the
+title *"Git Etiquette"*. Following Git etiquette helps teams get more out of their Git workflows and
+avoid frustration.
+
+I'll try to keep it short and refer to essays by others who explained these ideas much better than
+I can. I borrowed some of their words and, to the best of my ability, listed their work in the
+resources section. However, since I wrote the original document a long time ago as a private doc
+for a few people, some of the sources might have been lost.
+
+Also, I'll add more items to the list over time.
+
+* Git Commits
+Commits are the building blocks of version control with Git, so improving the quality of your
+commits improves the overall quality of the repository.
+
+** Single purpose commits
+
+** Commit Messages
+On many occasions we need to inspect the *Git* history to find something: a particular commit, a
+specific change, clues about an error, or even the engineer who made a certain change. I have
+bittersweet experiences when it comes to dealing with commit messages in the Git history of
+projects. Let me demonstrate with real examples.
+
+I have seen many times in commercial teams that engineers don't bother writing a proper, useful Git
+commit message. For reasons beyond my understanding, they think that *"I hate my life!"* is a cool
+commit message for a commit with ~1200 lines of changes, in a repository with more than ~300k
+commits (at the time) that is used by about 200 engineers. I came across this commit message long
+ago while trying to figure out why a service was malfunctioning.
This commit
+message wasn't helpful at all, and I had to read through the diff to figure out whether or not that
+commit was the root of the issue. I could tell many stories like this one, but for the sake of this
+essay one will be enough.
+
+Now let's have a look at the real Git history of a repository that I don't like at all (using the
+=--oneline= flag):
+
+#+BEGIN_SRC
+ 2683332a333a Update tests
+ 3315442a4983e Remove icon from manage header
+ aa234e8aa83f8 test fix
+ 29c35ba3adcee Class migration
+ fbde3a265ab3f Migrate header styles
+ 01eaac4b4cc13 tests
+ 8d004a970eef7 fix tests
+ d2890dfdc360 add tests
+ 91c2aa31720f2 add test for notice variable
+ 135a2df25e86a fix tests
+ 3aa4101546a93 refactor
+ 0eaae58006f51 add test for global variable
+ 3ae7ee7297104 remove unnecessary check
+#+END_SRC
+
+These commits are taken from a repository with more than 400k commits and many active contributors
+in a commercial space (don't worry, the SHAs are not the original ones).
+
+On the other hand, a few weeks ago I pulled the latest changes from the [[https://llvm.org/][LLVM]] repository,
+rebuilt it (I do this weekly), and tried to build the [[https://serene-lang.org][Serene compiler]] (Serene is a programming
+language that I'm working on) against it. But the compilation failed with an error along the lines
+of "Identifier is unknown". I grepped the Git log of the LLVM repository, found the relevant commit,
+and immediately smiled and praised its author in my mind. Here is the commit message (I removed the
+commit details):
+
+#+BEGIN_SRC
+Date: Wed Jan 12 11:20:18 2022 -0800
+
+ [mlir] Finish removing Identifier from the C++ API
+
+ There have been a few API pieces remaining to allow for a smooth transition for
+ downstream users, but these have been up for a few months now. After this only
+ the C API will have reference to "Identifier", but those will be reworked in a followup.
+
+ The main updates are:
+ * Identifier -> StringAttr
+ * StringAttr::get requires the context as the first parameter
+ - i.e.
`Identifier::get("...", ctx)` -> `StringAttr::get(ctx, "...")`
+#+END_SRC
+
+After reading this fantastic commit message, it was obvious how to fix my issue.
+
+Which one would you rather read? Which one helps you understand what happened in a specific commit?
+
+According to [[https://cbea.ms/git-commit/][Chris Beams]], a well-crafted Git commit message is the best way to communicate the context
+about a change to other engineers (and our future selves). A diff will tell you what changed,
+but only the commit message can properly tell you why.
+
+Peter Hutterer [[https://who-t.blogspot.com/2009/12/on-commit-messages.html][makes this point]] well:
+
+#+begin_quote
+Re-establishing the context of a piece of code is wasteful. We can’t avoid it completely, so our
+efforts should go to [[https://www.osnews.com/story/19266/wtfsm/][reducing it]] [as much] as possible. Commit messages can do exactly that and
+as a result, a commit message shows whether a developer is a good collaborator.
+#+end_quote
+
+If you have ever used =git log= or any other Git subcommand that works with commits (and many of
+them do), you know what a valuable asset a well-written commit message is.
+
+The Git history is just a bunch of commits in a certain order, and it is up to the engineers to make
+the most of it. As a project grows, maintenance becomes an issue, and the messier the history is,
+the harder it is to maintain the project. A messy history also makes it painful for others to get
+involved in the project.
+
+There are seven easy rules that you can follow to rock your commit messages:
+
+1. Separate subject from body with a blank line
+2. Limit the subject line to 50 characters
+3. Capitalize the subject line
+4. Do not end the subject line with a period
+5. Use the imperative mood in the subject line
+6. Wrap the body at 72 characters
+7. Use the body to explain what and why vs.
how
+
+I highly recommend reading Chris Beams' post [[https://cbea.ms/git-commit/][How to Write a Git Commit Message]], which
+explains these rules in depth.
+
+** Commit early, commit often
+Git works best, and works in your favor, when you commit your work often. Instead of waiting until a
+commit is perfect, it is better to work in small chunks and keep committing. Personally, I have
+found it much easier to work with smaller commits, each grouping together related changes. This way
+you can easily revert the commits you don't like, cherry-pick the ones you want, and avoid dealing
+with unnecessary changes bundled into a single commit.
+
+If you are working on a feature branch that could take some time to finish, committing often also
+makes it easier to keep your code up to date with the latest changes, so that you avoid conflicts.
+
+Also, Git only takes full responsibility for your data once you commit it. Committing protects you
+from losing work, lets you revert changes, and helps you trace what you did with =git reflog=.
+
+** Don’t commit generated files
+This one is fairly obvious, but many times I have had to dig through the history to figure out who
+committed an auto-generated file or a massive file to the repository.
+
+Generally, you should only commit files that took manual effort to create and cannot be regenerated.
+Generated files can be recreated at will, and they normally don’t work well with line-based diff
+tracking. It is useful to add a =.gitignore= file to your repository’s root to tell Git which files
+or paths you don’t want to track.
+
+* Don’t alter published history
+Once a commit has been merged into an upstream default branch (and is visible to others), you are
+strongly advised not to alter that history. Git and other VCS tools allow you to rewrite branch
+history, but doing so is problematic for everyone who has access to the repository.
While =git-rebase= is a useful feature,
+it should only be used on branches that only you are working on (private branches).
+
+One of the key aspects of Git is its distributed nature: everyone can have their own repository,
+push commits to their own fork, and send pull requests asking others to pull from it. These days
+this process is centralized through Git hosting services, especially in the commercial space. While
+those services provide forking functionality, forking is not common in commercial, closed-source
+projects, so engineers end up sharing feature branches. Many times, in different roles, I have seen
+someone force-push to a public (within the organization) branch and break everyone's workflow. For
+the sake of your own peace of mind and everyone else's sanity, *DO NOT CHANGE THE PUBLIC HISTORY*.
+
+It's kind of a joke, but if you are a public force-pusher, I'll end my friendship with you.
+
+Having said that, there will inevitably be occasions when a history rewrite on a published branch
+is necessary. Extreme care must be taken when doing so.
+
+* Merge vs Rebase
+
+The golden rule is to never rebase public branches and to always merge into them. When it comes to
+merge vs rebase, there are two simple rules.
+
+*Note:* It's better to squash and merge instead of doing a normal merge, because in projects with
+many contributors it is easier to maintain a main-branch history that contains one commit per
+feature.
+
+** Don’t change other people’s history
+You must never ever destroy other people's history. You must not rebase commits other people made.
+Basically, if it is not your branch, you must not rebase it. Notice that this really is about other
+people's history, not about other people's code. If you want to pull down some changes from other
+developers into your branch, it’s fine to rebase, because it’s their code but it’s your history.

+So you can go wild with rebase on it, even though you didn't write the code, as long as
+the commit itself is your private one.
+
+Minor clarification: once you've published your history in a public branch, other people may be
+using it, and so now it's clearly not your private history anymore. So the clarification really
+is that it's not just about *your commit*; it's also about the commit being private to your tree,
+meaning you haven't pushed it out and announced it yet.
+
+** Don’t expose your unfinished work to the public
+Keep your own history readable. Some people do this by working things out in their heads first and
+not making mistakes, but that's very rare. The rest of us use =git rebase= and similar tools while
+we work on our problems. So =git rebase= is not wrong, but it's right only if it's
+*YOUR VERY OWN PRIVATE* Git tree.
+
+If you're still in the =git rebase= phase, you don't push it out. If it's not ready, you don't
+tell the public at large about it. Don’t push such changes to a shared feature branch or to the
+main branch.
+
+Don’t merge upstream changes at random points. If you’re working on a shared feature branch, don’t
+pull down changes that have not been verified and finalized yet. Doing so puts your history in an
+inconsistent state: it will contain changes that might later be removed upstream, and when you
+push your work you will put those removed changes back again.
+
+* Conclusion
+This essay was just a brief attempt to explain some of the Git etiquette we need to follow when
+collaborating on a project with others. At the end of the day, we want to make developing software
+easier for ourselves, and following certain rules helps us get there faster and makes the process
+more pleasant.

+
+* References and Resources
+- https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html
+ The kernel community is one of the largest communities of paid and volunteer contributors, and it
+ uses Git intensively with very high traffic. To manage the development process and stay
+ productive, it has very strict guidelines, some of which can be useful for us.
+
+- https://chris.beams.io/posts/git-commit/
+ Chris Beams researched best practices for commit messages by reviewing many projects. His article
+ is one of the most referenced articles in this field.
+
+- https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
+ Another short but widely referenced article on best practices for Git commit messages.
+
+- https://yarchive.net/comp/linux/commit_messages.html
+ Who better to follow on Git best practices than Linus Torvalds himself?
+
+- https://lwn.net/Articles/328438/
+ A famous email from Linus Torvalds describing how to maintain a Git tree from a merge vs rebase
+ perspective.
+
+- https://www.atlassian.com/git/tutorials/merging-vs-rebasing
+ Atlassian's guidelines on merge vs rebase.
diff --git a/templates/index.org b/templates/index.org
index aad3f44..bb512f5 100644
--- a/templates/index.org
+++ b/templates/index.org
@@ -5,6 +5,9 @@
 #+TITLE: The little nest of mine
 #+PAGE: true
 #+DESC: All about lxsameer's experience in science and engineering
+#+MACRO: buymebook @@html:Buy Me A Book@@
+
+
 Welcome to my little piece of the world. I'm a software engineer by day and an amateur scientist by night who lives by his [[./coh.org][Code of Honor]]. I write about my thoughts and researches. The views expressed here are my
@@ -15,8 +18,13 @@ a video series on my [[https://www.youtube.com/c/lxsameer][Youtube channel]].
If you're interested in my work and research feel free to contact me to have a friendly chat or share your thoughts -with me via email (check my [[./gpg.org][GPG]] info page). If you have a question for me, you might be able to find the answer to it +with me via email (check my [[./gpg.org][GPG]] info page) orrrr you can [[https://www.buymeacoffee.com/lxsameer][buy me a book]] :P. + +# {{{buymebook(https://www.buymeacoffee.com/lxsameer,https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png)}}} + +If you have a question for me, you might be able to find the answer to it in the [[./faq.org][FAQs]] page. + * Recent updates: <<>>