Committing code should not be a glorified Ctrl/Cmd+S. With few exceptions, each commit should be an isolated unit of work. Also, this floating to the top of HN is a bit of a mystery to me.
One of them is the "published, shared" kind; for this one, I can agree with you that they should be self-contained and complete, especially when there are many devs on the project, and/or you'd like to be able to use bisect.
The second kind, and for me actually a very important one, is more of a "local backup, dirty hacking" one. Those probably shouldn't be published (unless it's a private/one-person repo), but they allow for easy hacking, developing, testing, and experimenting. The "commit often, whatever you have, however broken or non-ideal" approach provides a safe, lightweight backup, and allows for easy switching between various paths/approaches. Then, after your design stabilizes, the dust settles, and you are completing the work, you can (and probably should) either squash everything into one final feature commit (the end result will be indistinguishable from what you'd deliver if you hadn't committed in the meantime), or retroactively remodel the intermediary commits into whatever ideal shape you'd like them to have (with "git rebase -i", including patch editing, etc.). Knowing the benefits of this approach, I don't understand why anyone would want to reject it.
edit: Also, with many small, dirty commits, it's sometimes possible to fairly quickly do things like carving out a subfeature into a separate branch (with judicious use of rebases, cherry-picking, and the occasional small edit), or removing/reverting commits marked earlier as DEBUG. Seems useful to me, as a way to speed up such operations and reduce the possibility of manual error at the same time.
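As a sketch of that whole cycle in a throwaway repo (branch and file names here are invented): commit freely while hacking, then squash everything down to the one clean commit you'd actually publish:

```shell
# Demo of "commit often, squash before publishing" in a scratch repo.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b hacking
echo "step 1" > feature.txt
git add feature.txt
git commit -q -m "WIP: first stab"
echo "step 2" >> feature.txt
git commit -q -am "WIP: broken, but backed up"
echo "step 3" >> feature.txt
git commit -q -am "WIP: seems to work"

# Collapse the three dirty commits into one clean feature commit.
git checkout -q "$main"
git merge --squash hacking >/dev/null
git commit -q -m "Add feature.txt"

git log --oneline   # base + one clean feature commit
```

The end state of the main branch is exactly as if you had made one perfect commit in the first place; the WIP commits only ever existed on the scratch branch.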
This is the workflow we've adopted at my company. There are two types of commit: either you want to release some new feature or bit of work and do it atomically, or you want to develop the feature, fix typos, and throw it between machines without a care.
Is it ever going to help anyone to `git blame` a line of a file and see a commit message like "Whoops, fixed a typo!"? No. But you can't avoid those types of commits.
What you can do is prevent those commits from going into the released history of the product. We develop all new features in feature branches (made _very_ easy with the commandline tool we built for it) and only after code review and acceptance testing do we squash merge it to master.
This lets us combine all of the commits (typo commits, WIP commits, and whatever else) into a single commit that contains all of the changes that make up that feature. We even enforce a special syntax on the commit, so when you look back in the history you can trace it back to the code review and to the actual ticket that requested the work.
This means we keep all of our dirty backup commits until release time when we throw them all away in favor of the squashed commit. It makes git blame on any line of any file a sexy experience that provides accountability and an easy way to figure out why the code is the way it is.
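The effect on git blame is easy to demonstrate in a throwaway repo (the ticket and review identifiers below are invented, not the actual enforced syntax):

```shell
# After a squash merge, every line blames to the single feature commit.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
printf 'one\n' > app.txt
git add app.txt
git commit -q -m "WIP"
printf 'two\n' >> app.txt
git commit -q -am "whoops, typo"

git checkout -q "$main"
git merge --squash feature >/dev/null
git commit -q -m "PROJ-42: add app.txt (reviewed in CR-7)"

# Every line of app.txt now blames to the one traceable commit.
git blame --line-porcelain app.txt | grep -c '^summary PROJ-42'
```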
Our commandline tool automates a lot of this for us. Its biggest benefit is the tie-in with GitHub (and, as of this week, Bitbucket) to create pull requests, and the "deliver" command that handles squash merging and provides the structure for the final commit.
We prevent force pushes to the remote, so if you need to rebase commits that are on the remote already, you'd have to delete the branch from the remote and re-push, and that closes the PR, which confuses reviewers and causes spurious notifications.
Do you merge master into feature branches between the time the feature branch is first pushed to the remote and when they are reviewed and ready to merge? Can your command line tool handle creating a squashed commit when master has been merged into the feature branch a few times?
Yes, master is merged into the feature branches before going upstream. We consider it to be the responsibility of the feature branch owner to make sure their code will work with master.
Because we don't care about the commits on the feature branches, we prefer doing a regular merge, not a rebase of the feature branch, so that there's no need for force pushing at any point. We consider a force push of any kind to be unnecessarily dangerous.
The tool will handle the merge up for you, but if there are conflicts you'll have to resolve them before you can release the feature.
Squash merging is only of the feature branch to master. If your feature branch makes it through code review and acceptance and you still need to fix a typo, then you make the commit to master directly.
It would be great if we could be perfect, but sometimes small commits to master do have to happen. All we do is make sure we subject every feature to code review and acceptance before it gets to master. If it slips through, then it's technically a new change request in our eyes.
Interestingly, I have two different version control systems for doing these two different species of commit. The published kind, I do in my regular team version control system, complete with commit comments that explain the meaning of the complete thought that is being committed. The other kind (bookmarking my state before messing around with things I may want to cleanly revert) I do using the version control that is built into IntelliJ IDEA. It is completely orthogonal to the one used by my team.
Other IDEs should offer this feature... it turns out to be quite useful in practice.
Netbeans also has a local history recording feature which is integrated pretty well alongside the git/vcs features (just hit history on any file and see a log of all your local changes intermingled with the committed versions).
I think most IDEs nowadays have a local history. The Eclipse one lets you compare revisions as if it were a local git repo, which is sometimes useful.
IntelliJ, when you combine the internal version control and the undo log, manages it so that effectively there is a commit after each piece of typing -- without any need to save or even compile. That turns out to be a nice way of working.
You could easily do this entirely in git. Create a microfeature branch, work in it, and when you've got something you want to commit as a feature, squash and merge onto the feature branch.
"Knowing the benefits of this approach, I don't understand why anyone would want to reject it."
It's always difficult to have these kinds of conversations in the abstract, but for me it's a slightly cumbersome distraction (unless you don't write commit messages). But if it works for you, you should certainly keep doing it. I tend to divide stuff into small chunks, so I never work on something for long without having something that can be committed.
Most of the time I do still write commit messages, and that's actually an important additional value of this workflow for me. That said, the commit message is often quick and raw: usually an in-progress summary of the most important work in the commit, not polished at all; just something quick, which can have errors and ignores all Commit Message Guidelines like length or whatever. One benefit, as noted in another post in this thread, is that it will remind me "what the hell was I thinking doing that" when I look at it tomorrow. The second, obviously, is to help distinguish those commits when doing any sculpting/molding later (i.e. "git rebase -i", "git cherry-pick", and "git merge --squash").
But that, as I said, is only most of the time. Sometimes, as you guessed, I do write just "WIP" or "dump of current state". The second one mostly when I already don't remember WTF I was doing (i.e. when I forgot to commit a month ago and left some stuff on the table). The first one... I'm not sure now. But I know I sometimes do, with some guilt too; usually the need to have that particular stuff committed is stronger than the guilt...
So, I still can't really understand why anyone would want to reject it :) but that said, thanks a lot for explaining your approach. As much as it still puzzles me :)
If you're using Git or another version control that can do squash/rebase, what would be the advantage of having a separate concept of "temporary commits" or something?
In addition, git has a staging area (git add), which I use to stage intermediate steps of work.
It's a good idea to have only "clean" commits in your master branch (to allow for easy blame/bisect) but you can easily do feature branches that can remain "dirty" before squashing and merging to master.
> If you're using Git or another version control that can do squash/rebase, what would be the advantage of having a separate concept of "temporary commits" or something?
It'd be nice to be able to mark commits as "squashable" as I went along (without actually squashing them yet), rather than finishing my feature branch and then having to go back and remember which commits were the "good" ones.
As another poster pointed out, Mercurial phases are the answer to this. I don't know whether many Mercurial users actually use them this way consciously. Most Mercurial people who are deliberately and intensely into this workflow learned to use Mercurial patch queues before phases existed (patch queues are basically the git staging area on steroids).
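For git users, the closest built-in equivalent is probably the fixup/autosquash machinery: you mark a commit as squashable at the moment you make it, and fold everything in one pass later. A minimal sketch in a throwaway repo (file and message names are invented):

```shell
# Mark commits as squashable with "fixup!", fold them later with --autosquash.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo "real work" > f.txt
git add f.txt
git commit -q -m "Add f.txt"
echo "typo fix" >> f.txt
git commit -q -am "fixup! Add f.txt"   # marked as squashable, not yet squashed

# One pass folds every fixup! commit into its target
# (true(1) accepts the generated todo list unchanged).
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root >/dev/null
git log --oneline   # a single "Add f.txt" commit remains
```

`git commit --fixup=<commit>` writes the `fixup!` subject for you, so you don't even have to type the target's message.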
It's more of a workflow thing. When working on a difficult feature, or rather a difficult ticket, that cannot reasonably be split in two, it makes sense to work on a separate feature branch. This branch is just for yourself and can be polluted with dirty commits. Once you're done, you can git rebase the thing into one beautiful commit.
If you're working in a feature branch I don't think it would be that much of a problem to be honest. Just be sure to squash the commits before writing the pull request later on.
You can squash or rebase history in Git, which basically means you compress all your changes into a single one. People use this to keep a clean history.
If you're on a feature branch, you can commit often and push often (effectively backing up your work); then the isolated unit of work becomes a single commit (non-FF merge) when merging with the main branch (this is the one you search for when you want to revert something or find the culprit for something).
This allows having "best of both worlds", I think.
How and when to commit should be a decision made by the contributors working on the feature, but squashing is something I always disallow for any team I work with, just to keep the history and be able to follow the programmer when chasing down a bug.
My biggest concern here is that you have to force-push because you rewrite the git history. That way you are likely to lose data.
If you need WIP commits just to switch branches, use the stash instead. That way you keep your history clean.
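For the branch-switching case, a minimal stash round-trip looks like this (throwaway repo, invented file names):

```shell
# Park uncommitted work with the stash instead of a WIP commit.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev
echo "v1" > a.txt
git add a.txt
git commit -q -m "base"

echo "half-done change" >> a.txt     # dirty working tree, not commit-worthy
git stash push -q -m "WIP on a.txt"  # park it without creating a commit
# ...switch branches, do other work, switch back...
git stash pop -q                     # restore the half-done change
grep -c "half-done" a.txt
```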
If I don't publish my intermediate commits, how are you going to tell whether I made lots of small commits and squashed or just didn't bother committing until I was finished? The objects generated are identical.
Also, the git reflog keeps references to the pre-squash commits for a time, so the information is still there if you realise you squashed too much.
My recommendation is that commits should be atomic, minimal changes that take the source into a working state. This makes working with git bisect so much easier, but it does sometimes mean reordering and squashing commits.
A safety check, for the case where you've got everything working but your history needs some tweaking, is to note the hash of your tree in its working state, and verify that the tree you created after squashing has the same hash. You can do that either by inspecting the commit or by diffing against the old reference.
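That check can be done mechanically by comparing tree hashes before and after the rewrite. A sketch in a throwaway repo (here the "squash" is simulated with a soft reset and amend; an interactive rebase would verify the same way):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev
echo a > a.txt && git add a.txt && git commit -q -m "step 1"
echo b > b.txt && git add b.txt && git commit -q -m "step 2"

before=$(git rev-parse 'HEAD^{tree}')   # tree hash of the working state

# Squash the two commits into one: soft-reset to the root commit, then amend.
git reset -q --soft "$(git rev-list --max-parents=0 HEAD)"
git commit -q --amend -m "one clean commit"

after=$(git rev-parse 'HEAD^{tree}')
[ "$before" = "$after" ] && echo "same tree: history changed, content did not"
```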
> My biggest concern here is that you have to force-push because you rewrite the git history. That way you are likely to lose data.
Not entirely true. If you only squash when you merge the branch in, there's no need for the force-push.
Rebasing already-pushed history should be flat-out banned within teams, and force-push disabled in the config. It's only in exceptional circumstances that you would actually need to do either.
But why do you need to squash while merging? The concept of a merge is that a branch is merged into the history by adding a merge commit. This commit reflects all the differences. It's done automatically if possible; you need to resolve conflicts if it's not.
Correct me if I'm wrong, but in my understanding a merge does nothing more than create a commit which is like a squashed representation.
Your team can do whatever they like on their own machines before they push to the remote (unless you really are not fun to work with ;)), so there's no reason to prefer the stash: temp commits and temp branches provide a more structured and better-documented stash. The stash should just be for little things, ideally things which are not very branch-specific (because there's only one stash, shared across all branches).
If you look at it like that, you could just as well skip the squash and keep the merge commit clean.
That way you don't push all your development commits to the remote and ensure a clean streamlined history.
A problem some teams I've worked with have is that there isn't just one person working on a feature. Especially in feature-driven development teams, at least 2-3 people actually work on it, and a review process is included. That's why teams need to push their feature branches to the remote. Once they are merged, feel free to delete them. But in my opinion it's hard for several people to work on one feature without distributing the feature branch.
You shouldn't push dysfunctional branches anyway, to keep the application working and always testable.
We have a CI server testing every branch for every push - I'd regard it as rather bad to have failing builds 20 times a day.
Worst case, some process related to continuous deployment (misconfiguration happens, in companies of every size) sends it to the live environment, and there's a chance of ruining the experience for your users.
In my experience it is extremely useful to have a bunch of "Unfinished; WIP" commits that I can look through to figure out what I was thinking yesterday when I started writing code. When I'm ready for other people to pull my commits, I'll rebase them into more meaningful commits (or often just one single commit).
Actually, a glorified save is exactly what a git commit is. I literally have a command that compiles (Haskell) and commits everything if it compiled. I sort it out into sensible commits later.
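I don't know what the poster's actual command looks like; a plausible sketch of such a "checkpoint if it builds" helper, with true/false standing in for the real build command, might be:

```shell
# Commit everything, but only when the build command succeeds.
checkpoint() {
  if "$@"; then
    git add -A
    git commit -q -m "checkpoint: builds at $(date -u +%Y-%m-%dT%H:%MZ)"
  else
    echo "build failed, not committing" >&2
    return 1
  fi
}

# Exercise it in a throwaway repo (true/false stand in for e.g. a Haskell build).
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base"
echo "hack" > f.txt
checkpoint false || true   # failing build: nothing committed
checkpoint true            # passing build: checkpoint commit created
git log --oneline | wc -l
```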
You're right. I cringe a bit when I hear people say "I've not committed for a while, I'll do it now". Ok, and if I need to revert any of it, I'll have to do it hunk by hunk?
Committing after some random number of saves or lines seems silly, yes, but I see no reason not to commit whenever you come to a stopping point, even if that stopping point is as arbitrary as the time to go home. I've never seen a case where someone would revert hunk by hunk. You can just revert to the last stopping point if you need to, or, as most people do, just keep editing the file, hitting undo, editing, hitting undo... etc. until you have another stopping point. Commit. Repeat.
When you're all done you clean up the commits with rebase, and voilà: life is good.
I'm talking more about the situation where a fellow team member has to revert something that Frank wrote that dropped production at 3am. Trawling through commits that each contain half of two small features, because he commits based on home time and coffee breaks, sounds terrible in an emergency.
That's exactly what the idea is here. If you commit very incrementally during development cycles, you're more likely to have a singular commit to revert.
Then before you push upstream, you rebase down your commits into full feature chunks.
There's nothing wrong with committing often, but pushing too often should not be encouraged. If you commit a lot, you should probably rebase before pushing.
What if, in a team-based situation, someone is away halfway through a feature? Based on your point above there are now two problems:
* Getting the stuff off the developer's machine because they didn't push their code.
* Even if the code is retrieved, the commit log is now a mess of commits based on nothing but arbitrary time lapses. It would be preferable for them to be a 'story' of the development process for the feature, and atomic in their own right.
My code folder is synced with Dropbox. Probably bandwidth-heavy, sure, but if I have Internet access then that generally means I'll lose at most 5-10 minutes of work in case of disaster.
If the concern is the computer failing, you should do backups. And it's probably simpler to use automated backups than using git push as a backup mechanism.
Once you have automated backup set up, how much work is it to maintain? If you keep your code in a few areas/directories (like for example /home/workspace), it seems like a set-it-and-forget-it thing.
PS: how is a command that demands that you write a commit message really just 'muscle memory'?
(And before the outrage begins: I always push these to my private fork of the main repo, so it's basically just a way of syncing code between machines. Other ways of achieving this with git always seem to be painful; you can push to another machine, but only if you rename branches, which is tedious.)
Well, now we are up to three arguably distinct needs:
- version control
- backup
- syncing
Syncing, like backup, can be had with software and services dedicated to that end. I just use Dropbox, though I'm not that worried about privacy. I guess something could be built "from scratch" with rsync, if the ready-made alternatives aren't satisfactory?
No. You are assuming that my backup and syncing needs overlap with my git usage, so that all I need to backup or sync is to use git push in my already-existing git directories.
I backup and/or sync things that I don't have under version control (git). I only sync my grocery list, for example, since I don't need a history of the 'evolution' of my grocery list. Actually, I may have it under a backup scheme, but that is only because of other files and directories in that directory that I need to backup, and I've at this point set (and partly forgot) that backup plan.
My /home/Dropbox contains a lot of files that I regularly need, and it seems a bit excessive to put all of that into one, monolithic git repo.
Automated backups are useful. However, how often can you back up? If you are hacking/experimenting, backing up after every change seems too resource-intensive.
Yeah. One slightly annoying thing is that git doesn't provide a built-in way to disable force pushes to particular branches. (How would such a thing even be implemented? I guess the remote would have to look at the old and new branch pointers and determine whether the change is a fast-forward for each individual branch.)
So if you want your team to be able to do what you say then someone could force push to master by mistake.
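Per-branch protection is usually done exactly the way the parent comment guesses: a server-side pre-receive hook that checks each updated ref for fast-forwardness (git's receive.denyNonFastForwards config does this too, but for all branches at once). A sketch with invented branch patterns, exercised against a throwaway bare repo:

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare origin.git
cat > origin.git/hooks/pre-receive <<'HOOK'
#!/bin/sh
# stdin: one "<old> <new> <refname>" line per pushed ref.
zero=0000000000000000000000000000000000000000
while read old new ref; do
  case $ref in
    refs/heads/master|refs/heads/main|refs/heads/release/*)
      # Allow branch creation; otherwise old must be an ancestor of new.
      if [ "$old" != "$zero" ] && ! git merge-base --is-ancestor "$old" "$new"
      then
        echo "denied: non-fast-forward push to $ref" >&2
        exit 1
      fi ;;
  esac
done
HOOK
chmod +x origin.git/hooks/pre-receive

git init -q work
cd work
git config user.email dev@example.com
git config user.name Dev
echo "one" > f.txt
git add f.txt
git commit -q -m "one"
git remote add origin ../origin.git
main=$(git symbolic-ref --short HEAD)
git push -q origin "$main"

echo "two" >> f.txt
git add f.txt
git commit -q --amend -m "rewritten"   # history rewrite: non-fast-forward
git push -q --force origin "$main" 2>/dev/null || echo "force push rejected"
```

Hosted services like GitHub expose the same idea as "protected branches", so in practice most teams configure it there rather than writing the hook themselves.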
> Committing code should not be a glorified Ctrl/Cmd+S.
What? Of course it is. Commit as many times as you want, branch when testing out a segment of code, do all the fancy things you need to do - then clean it all up when you're done.
This is the same as doing WIP commits and then rebuilding the history when done, except you can't move your WIP between machines easily, can't leave yourself notes when wrapping up for the day, etc.
How big of a team and how big of a codebase? Often massive amounts of commits during development wreak havoc on interconnected projects. At least in Git, not so much in TFS. Mercurial kind of splits the gap.
I've worked with git in small teams (~5) and with other systems like clearcase in larger teams (20+, ~100 if you count sister teams).
So long as everyone is working on their own branches, pushing/pulling to a shared stream when features are complete, and that gets pushed to a release stream when releases are being prepared... I don't see why folks couldn't commit every 5 minutes if they wanted.
It's more interesting to discuss the strategy for moving code from personal branches to development, and dev to release.
Even more complicated: moving code from branch to branch. To share needed support, or bug fixes, or test interoperability.
If you can push to the shared stream and the others pull it up to their branch, cool. But with interoperability testing of incomplete features that's not ok.
I tend to commit as I get part of a "unit of work" done that I'm happy with. Then I just amend that commit as I go until I'm done. That way if I mess something up before I'm finished, it's easy for me to go back to a state where things were in better shape.
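That "grow one commit by amending" loop looks like this in a throwaway repo (file and message names are invented):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo "first piece" > unit.txt
git add unit.txt
git commit -q -m "Implement the unit of work"

echo "next piece" >> unit.txt
git add unit.txt
git commit -q --amend --no-edit   # fold the new progress into the same commit

git log --oneline   # still a single commit, now containing both pieces
```

One caveat: since amending rewrites the commit, this only works cleanly for commits you haven't pushed yet.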
This is true, but it's often too easy to wait and commit an entire feature or other larger change that is made up of 3 or 4 reasonably well-isolated smaller changes that might be better independent commits especially for bisection purposes.
That's, to me, the exact blind spot of versioning as it is: we need some 'theory' of how to size/modularize commits, to avoid both micro-commit explosions and spaghetti patches. More than that, it relates to how you resolve, design, and test a system.
I thought it was clear enough from just the first sentence after the image: "Set the number of writes without committing before the message is shown".
Anyway, this is a cute idea, but as someone who compulsively hits save after practically every edit (i.e. after most transitions from insert to normal mode), this wouldn't be very helpful for me.
I thought it was fairly straightforward - it displays a message to the user if they haven't created a git commit after x number of writes to disk (i.e. saves). This is to try and stop commits from becoming huge I guess.
Sometimes I forget to commit, and suddenly I'm in the middle of a second feature, with the code in a non-working state. Then it's perfect to just undo back a couple of hours, save, commit, and redo back to where I am.
It's rather risky, though, if you're not confident with the undo system, so it's a good idea to save the file to a temporary location first.
Then they probably shouldn't be running this plugin in the first place. It's more for the people that can occasionally forget to commit, and their commits end up being huge.
I'm finding that I commit in units of work. Currently, I'm refactoring LibreOffice's SalGraphics class to move the mirroring from this class into an upper layer, but I'm finding it better to do it little by little.
I guess professional programmers have this sort of thing already sorted out, but it happens all too often that I reach the point where my new feature is working as it should (appears to be stable), and I keep working on yet another addition on the same branch. When I commit, I feel like I should have committed at least 2 or 3 times by now.
It happens too often, so I set this plugin down from 20 to 10 and will see how it goes! But thanks, the idea is really handy.
Heh, you'd be surprised how often even moderately experienced developers make the same mistake. I've seen months-long chains of "add feature X, it's sorta stable, start adding feature Y, whoops, gotta ship, uh-oh, X isn't stable either, work for three weeks, X is now stable but Y only seems to be, managers now calling feature Z release-critical, start working on Z, gotta ship, uh-oh, Y isn't stable yet..."
And so it goes for literally months: always some feature preventing shipping, always some feature that has to be added.
I have the same "problem". I guess I'm just too focused on fixing stuff to stop and commit. Thankfully we're alive in a time where we don't have to adapt to our tools anymore; our tools are adapting to us.
git add <file> -p
This lets you pick hunks of code from <file> that you'd like to stage for a commit. When I feel I'm done coding for a session, I run a `git diff` in one window, and in the other pick the hunks that contain changes belonging in one commit.
Oh, good to know! I didn't know that was possible :-) Thanks for the hint. I use a CLI tool called 'tig' to view the difference between commits and uncommitted changes.
We've adopted the "every feature a branch" idea from git-flow. It feels silly to make a branch for a 3-line typo fix, but it does make it nice and clean to switch between them as you're working.
Slightly related, I get a lot of value out of a bash prompt which tells me:
- if my git repo is dirty (uncommitted changes)
- which branch it is on
- if (and how many) changes I have stashed
It's not originally my work, but I find it massively useful, since it avoids the "committed but not pushed" failure mode, the "on wrong branch" failure mode, and the "stashed and forgotten about" failure mode.
I know that regularly running git status would work, but this is a useful visual reminder for me.
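A minimal version of such a prompt might look like this (the layout and function name are placeholders, not the poster's actual prompt):

```shell
# Show branch, a "*" for a dirty tree, and the stash count in the prompt.
git_prompt() {
  branch=$(git symbolic-ref --short -q HEAD 2>/dev/null) || return 0
  dirty=""
  [ -n "$(git status --porcelain 2>/dev/null)" ] && dirty="*"
  stashes=$(git stash list 2>/dev/null | wc -l | tr -d ' ')
  printf '(%s%s' "$branch" "$dirty"
  [ "$stashes" != "0" ] && printf ' stash:%s' "$stashes"
  printf ')'
}
PS1='\w $(git_prompt)\$ '

# Quick demo in a throwaway repo:
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base"
git_prompt; echo   # clean: just the branch name in parentheses
echo "edit" > f.txt
git_prompt; echo   # dirty: branch name plus "*"
```

Real-world versions of this (e.g. the git-prompt.sh script shipped with git) add upstream ahead/behind counts, which also catches the "committed but not pushed" case directly.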
This thread has included some interesting discussion of how everyone handles the different tasks of a) committing individual units of work, b) saving works-in-progress, c) syncing files, d) backing up files, and e) persistent undo editor features.
I thought I'd toss out what my setup is right now on my personal machine, for a bit of reflection/discussion...
- I compulsively save to disk after nearly every edit, so I'm a big fan of editors that provide some kind of persistent-undo functionality. Right now I'm using live-archive inside the Atom editor, and it is wonderful. I hit cmd-shift-y and it pops the current buffer open in a live-archive interface that includes VCR-like rewind/fast-forward buttons, etc. It leaves me feeling quite free to mess with code a little more dangerously than I would otherwise. Whereas before I might briefly comment out a line while I try something out, now I'll just delete it and hack away. Anything that was there is just a cmd-shift-y away.
live-archive is actually maybe the foremost reason I'm still using Atom. I'll probably switch away from it eventually, but I've grown to like the customized little hole I've built for myself inside that particular editor. I might try out a JetBrains IDE next, sometime...
- I don't do a lot of work-in-progress saving. Maybe I should be better about this? I'm not a fan of git histories that read like "went to lunch", "stuck, think on this--", etc., even when I'm the only one who's going to see them. I keep a gitignored text file where I tend to scribble down notes like this, but maybe I should play around with WIP commits.
- For syncing, I have an hourly cron job that does an rsync -av --delete-after of my important directories to several different locations. This is mostly meant as a true sync and not a backup, but I do find myself using it as a way to lose no more than an hour's work at a time. I might change it to run every 30 or 20 minutes, since it doesn't seem to tax the machine much.
- For backup, I have CrashPlan backing everything up to a local external drive and then to their cloud service too. I haven't put a terrible amount of thought into this. It only runs while I sleep. I want to play around with arq and Amazon Glacier eventually.
For git, I just do small unit-of-work commits for myself, and then clean up when necessary.
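For the curious, the hourly rsync job described above might look something like this as a crontab entry (paths, host, and log file are invented, not the poster's actual setup):

```
# m h dom mon dow  command
0 * * * *  rsync -a --delete-after /home/me/work/ backupbox:/mirrors/work/ >> /home/me/.rsync.log 2>&1
```

The trailing slashes matter to rsync (sync the directory's contents, not the directory itself), and --delete-after only removes files on the destination after the transfer completes.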
I tend to write to disk after every edit (a bad habit, I know, but it's pretty ingrained in my muscle memory), so as-is, this would spam me quite a bit.
A better criterion for determining when to display the message would be to count the non-empty lines of the diff, and only show it if it passes a threshold.
Or: only if the added/removed lines ratio is > 1, or something like that.
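A sketch of how a plugin could compute that number, using git diff --numstat to sum added and deleted lines (the threshold value is arbitrary):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name Dev
seq 1 5 > f.txt
git add f.txt
git commit -q -m "base"
printf '6\n7\n' >> f.txt   # two uncommitted lines

# --numstat prints "<added> <deleted> <file>" per file; sum both columns.
changed=$(git diff --numstat | awk '{n += $1 + $2} END {print n+0}')
threshold=50
if [ "$changed" -gt "$threshold" ]; then
  echo "time to commit ($changed changed lines)"
else
  echo "still under threshold ($changed changed lines)"
fi
```

Unlike a raw save counter, this also naturally ignores saves that didn't change anything.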
I save after every logical edit too - I think it's a good habit. You never get caught out by network/file-system failures, and can confidently reload from disk to a recent good state if you mess up or change your mind mid-edit.
I think committing should be done for a more meaningful unit of work than a single edit, something logically 'complete', akin to a transaction.
> What do you think would be a good set of criteria for what should count as a meaningful unit of work?
I'd consider a meaningful unit of work to be a bug fixed, or a feature implemented, at least to the extent that you have a runnable version of the code again (even if you intend to do further work to implement it better, e.g. replacing hard-coding). I'd consider changes short of this as not worth committing, as this would not be a good starting point for further work (I know it's cheap to branch with Git etc., but it just feels untidy to leave something so part-done).
I typically commit at the end of the day, even if it's into my private repository. Some days I think of it as backup, but on other days I use it to socialise the code.
I find that this forces me to take stock of the current state of the implementation and get into the right frame of mind for the next stage.
When pushing it into the main repository, it will be a unit of work.
I like the general idea. I sometimes also tend to code too much before committing; however, for me it's more a combination of running tests, amount of code, and of course completeness. Plus, I often start hammering ":w" when I am stuck (which would be misinterpreted by this tool).
Commit time is a good time to step back a little and think about what I wrote and why. Organizing hunks of changes into a meaningful package. I usually commit when I'm done with a feature (once or twice a day) or right after fixing a bug (a good dozen times a day).
Nice idea, but I don't think I'd want to use it, for risk of too many false positives. E.g. if I lift a 20+ line function to move it somewhere else in the file, I'd guess it would warn me to commit now.
Seems like a pretty useful idea. It would be great to also set thresholds on number of files changed and total number of lines (ignoring whitespace) changed.
I would guess (and probably get downvoted for this) that it's because Git is so widely used, is not well structured, and group source control overall is not well understood by developers.
If you were exposed to several source control systems, you would avoid Git, at least until you knew what you were doing. But so many people use it without understanding what they are doing that there are massive numbers of conversations just like this article, often between people who have never used anything other than Git.