In practice, rebasing increases conflicts, requires teams to time their merges, ...

TheLocehiliosan · on April 8, 2021

What you need to understand is people use rebase on their unshared branches. It's part of crafting your commit history to be a coherent set of atomic changes instead of the path you took while developing it all.

You rebase BEFORE you merge into the mainline branch.

fshbbdssbbgdd · on April 8, 2021

Do you run your test suite against each of the commits you create when rebasing? If not, isn’t this “coherent set of atomic changes” misleading? It seems like a lot of effort to make a fake clean-looking history.

breischl · on April 8, 2021

When I've done this, your private/dev branch may be a series of broken commits. Then you rebase onto main, squash to one commit, and test (if necessary). So what shows up on the main branch is a single, squashed, tested commit that contains one logical unit of code (usually a feature or fix).

In this model the main branch history is "real" in that it records the sequence of changes to the production code. It's "fake" in that it doesn't record the exact sequence of fumbling steps and backtracks you took to get there. But IME the latter is usually not very useful anyway.

fshbbdssbbgdd · on April 8, 2021

I like the squashed commit approach. I get there by merging upstream into my dev branch when developing, then squashing before I merge my changes into the upstream. As far as I can tell, that has the same outcome as rebase with squash. Both approaches create a simple commit graph, and both avoid fake intermediate commits.

xoudini · on April 8, 2021

In some cases I agree, but squashes can end up so large that doing a `git bisect` (which is quite useful in finding the comparatively small commit which introduced a bug) becomes unfeasible.
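To make the trade-off concrete, here is a rough sketch of what bisecting buys you with small versus squashed commits. It is a toy binary search, not `git bisect` itself; the commit lists, the line counts, and the `broken` flag are all made up for illustration.

```python
def bisect_first_bad(commits, is_bad):
    """Return (index of the first bad commit, number of test runs).

    Assumes every commit before the first bad one is good, every commit
    from it onward is bad, and the last commit is known to be bad.
    """
    lo, hi = 0, len(commits) - 1
    runs = 0
    while lo < hi:
        mid = (lo + hi) // 2
        runs += 1
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return lo, runs

# Hypothetical history: 16 small commits of 25 lines, bug introduced in commit 10.
fine = [{"lines": 25, "broken": i >= 10} for i in range(16)]
# The same work squashed four-to-one: 4 commits of 100 lines, bug lands in commit 2.
squashed = [{"lines": 100, "broken": i >= 2} for i in range(4)]

fine_idx, fine_runs = bisect_first_bad(fine, lambda c: c["broken"])
sq_idx, sq_runs = bisect_first_bad(squashed, lambda c: c["broken"])

print(fine_idx, fine_runs, fine[fine_idx]["lines"])    # 10 4 25
print(sq_idx, sq_runs, squashed[sq_idx]["lines"])      # 2 2 100
```

The squashed history takes fewer test runs, but the commit you land on is four times larger, and that is the part you have to read by hand.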

xoudini · on April 8, 2021

There shouldn't be an issue in doing so. During a rebase you'll either have no conflicts — in which case there isn't an issue — or you'll have to stop to resolve conflicts, and you might as well run tests before continuing the rebase. In both cases I'd argue that the statement "coherent set of atomic changes" applies.

fshbbdssbbgdd · on April 8, 2021

Correct me if I’m missing something here - but a lack of conflicts during rebase only means that the few lines surrounding your changes weren’t changed in the upstream. The rest of the repo changed, and this will often cause some kind of inconsistent state. I’ve encountered this situation frequently when using git bisect.

xoudini · on April 8, 2021

When you rebase, you basically replay the history of your branch since it diverged from the branch you're rebasing onto. Thus, the branch is always in a consistent state (or as consistent as it was when you originally authored the commit you're replaying). And of course this assumes the target branch is already in a consistent state.

fshbbdssbbgdd · on April 8, 2021

If the upstream is like this:

A -> B

And you branch off B and start making changes, then the upstream continues on its own:

A -> B -> C -> D

Now you rebase your dev branch off D. Your changes get replayed on top of D and create new commits. Some of those commits might not be valid, because they take code that worked in the context of B and put it in the context of D. The history seems clean if all you do is look at the diffs, but if you bisect and try to use the repo in one of the rewritten commits, you may find it doesn’t even compile (even if that commit was fully functional before rebasing).

xoudini · on April 8, 2021

Hm, you're right. The simplest example I could think of right now is the upstream having renamed/deleted something that the dev branch depends on, but didn't directly touch. That would definitely cause a "broken" history during the rebased commits, and is technically unavoidable.
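A minimal sketch of that rename case, under heavy assumptions: a "tree" is just a dict of file name to source text, a "commit" rewrites whole files, and "builds" means the files execute without error. Real git works on line-level hunks, and the file and function names (`lib.py`, `main.py`, `helper`, `compute`) are invented for the example.

```python
# Toy model, not how git stores anything: a tree maps file name -> source text,
# and a commit is just the set of files it rewrites.

def apply(tree, commit):
    """Replay a commit on top of a tree, returning the new tree."""
    new_tree = dict(tree)
    new_tree.update(commit)
    return new_tree

def has_textual_conflict(commit_a, commit_b):
    """In this model two commits conflict only if they rewrite the same file."""
    return bool(set(commit_a) & set(commit_b))

def builds(tree):
    """Stand-in for 'compiles': execute every file, in name order, in one namespace."""
    namespace = {}
    try:
        for name in sorted(tree):
            exec(tree[name], namespace)
    except Exception:
        return False
    return True

# B: the commit the dev branch forked from.
base = {"lib.py": "def helper():\n    return 1\n"}
# Upstream (C, D): renames helper() to compute().
upstream_commit = {"lib.py": "def compute():\n    return 1\n"}
# Dev branch: adds a caller of helper() without touching lib.py.
dev_commit = {"main.py": "result = helper() + 1\n"}

original = apply(base, dev_commit)                         # dev commit on top of B
rebased = apply(apply(base, upstream_commit), dev_commit)  # same commit replayed on D

print(builds(original))                                    # True
print(has_textual_conflict(upstream_commit, dev_commit))   # False
print(builds(rebased))                                     # False
```

The replay reports no conflict because the two commits never touch the same file, yet the rebased commit no longer runs.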

fshbbdssbbgdd · on April 9, 2021

A surprisingly common occurrence is that two developers independently notice and fix the same problem, but implement the fix in two different ways. The diffs might not conflict at all during a rebase. Or they might only conflict in some places, and the “behavioral conflict” remains after the diff conflict is resolved. This issue would eventually be noticed and fixed when tests fail before merging to master, but the intermediate rewritten commits are unlikely to be fixed.

xoudini · on April 9, 2021

I can see this happening, but with a reasonable bug-tracking solution in place and enforcing `fix/...` branches for fixes, these situations could mostly be avoided.

Solvitieg · on April 8, 2021

For sure, I understand that.

It tends not to be an issue when a developer is working on an isolated feature that only he or she cares about, that is reviewed in a timely manner, and gets directly committed to main.

Often this is not the case.

Espressosaurus · on April 8, 2021

In a large repo with many people merging, it helps keep things organized.

In my experience you can make an argument for a merge-based workflow up to around 6 people. By 12 it's painful and hard to track what's going on, doubly so when you have a dev branch and multiple sustaining branches or something more complicated.

By the time you get to 100 people or more committing to the same repo, it just becomes absolute chaos, and at least you can maintain a semblance of sanity in your official branches by forcing a rebase-based workflow on them.

wruza · on April 9, 2021

I feel like it’s one of these moments when people speak of git and everyone has their own version of the manual. git-rebase:

> git rebase master; Now, the snapshot pointed to by C4' is exactly the same as the one that was pointed to by C5 in the merge example. There is no difference in the end product of the integration, but rebasing makes for a cleaner history.

What does it even mean to have a rebase-based workflow? In svn-like terms, does it mean that you have to sync-merge before reintegrate-merging? If yes, why is rebase presented as if it were something that didn’t exist before and had to be invented? You do sync-merge before reintegrating in svn, otherwise you’ll apply ancient-based patches to the young trunk, which is obviously not what you want.

And if you do not use rebase, but use a merge-based workflow, does it mean that you apply ancient-based patches to master? If yes, of course it will be conflict hell, because master may have gone through a few refactorings in the meantime.

It is so confusing when people talk in a different slang, and you can’t tell if they invented something new or just missed something so damn obvious in the old tech. Can you please comment on which of these thoughts are [in]correct?

Espressosaurus · on April 9, 2021

It's not exactly congruent to what you're describing with SVN. It's all about the DAG of commits and how we're using it to describe the history of the codebase. The end code is identical between the two workflows.

A merge-based workflow maintains the work-in-progress history of commits running parallel to main before merging the two together. So your commit tree splits and reforms, with the number of branches at any one time equal to the number of people working on distinct features at one time.

It shows you how a feature evolved, which has some benefit, but at the cost of an explosion in branches that are now part of the permanent record of your codebase and the main branch you're working in. It rapidly turns into spaghetti with even just a few people working in the repo.

A rebase-based workflow will typically compress all the work in progress to a single commit, which then gets applied to the tip of the main branch. This maintains a linear flow of commits where each commit is a single PR.

Maintaining that linear flow of commits is increasingly important as the number of people committing to the repo rises and the branches rise with them.

Visually, a merge-based workflow might look like this:

   4
   |\
   | \
  /3  \
 | 2\  |
  \| |/
   |//
   |/
   1

This would represent 3 features worked on by different people, all branched off the same source (1), and then merging back in.

The same thing in a rebase-based workflow would look like this:

   4
   |
   3
   |
   2
   |
   1

All of the work in progress is collapsed into a single commit when completing the PR to maintain the linear history. Of course while it was in progress, it resembled the merge-based workflow above. The difference is that instead of merging at the end, they rebased and squashed the commits.

Again, the end result in terms of the code is the same. The difference is what you see when you're navigating the history of the repo.
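If it helps, the two shapes can be sketched as plain parent maps. This is only an illustration: the commit names are invented, and the merge version assumes every merge creates a merge commit (as with `git merge --no-ff`) rather than fast-forwarding.

```python
# Each history maps a commit id to the list of its parent ids.
# All commit names here are made up for illustration.

merge_history = {
    "base": [],
    "f1a": ["base"], "f1b": ["f1a"],   # work in progress on feature 1
    "f2a": ["base"],                   # feature 2
    "f3a": ["base"], "f3b": ["f3a"],   # work in progress on feature 3
    "merge1": ["base", "f1b"],         # merge commits have two parents
    "merge2": ["merge1", "f2a"],
    "merge3": ["merge2", "f3b"],
}

squash_history = {
    "base": [],
    "feature1": ["base"],              # one squashed commit per feature
    "feature2": ["feature1"],
    "feature3": ["feature2"],
}

def is_linear(history):
    """A history is linear when no commit has more than one parent."""
    return all(len(parents) <= 1 for parents in history.values())

def reachable(history, tip):
    """Every commit reachable from tip, i.e. what a full log of tip would list."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(history[commit])
    return seen

print(is_linear(merge_history), len(reachable(merge_history, "merge3")))      # False 9
print(is_linear(squash_history), len(reachable(squash_history, "feature3")))  # True 4
```

Both tips would contain the same code. The merge version just carries every work-in-progress commit and merge node along in the record.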

wruza · on April 9, 2021

Thanks for taking the time to write such a detailed explanation! So, the benefit of rebase is in a graph view of the repo, not in a conflict-resolution workflow (which is consistent with the manual). But why

> merge-based ... rapidly turns into spaghetti with even just a few people working in the repo

Isn’t it just a detail of how graph/report tools work? Can’t they track these merge points and “rebase in their ram”? I don’t get how a graphical representation of merge points may change the workflow.

One more thing that is unclear is why some people think that rebase is somehow superior in terms of conflicts and reintegration. Like they “had issues with svn and now that rebase is a thing, issues gone”. Maybe they didn’t understand that you have to sync-merge your branches (effectively rebasing) periodically to not diverge from trunk (or parent branch) too much?

Added: I know rebase is not congruent with what I’m asking, but my questions are more about how git folks think, not about how git works. Because I often see it compared to other VCSs, along with claims about git’s competitors that are vague or simply untrue. As if before git there was some stone age.

Espressosaurus · on April 10, 2021

Thinking about this further, SVN forces history to be a linear history of commits, which is easy to reason about.

DVCS have a DAG of commits, which can get arbitrarily complicated and difficult to reason about.

Rebase-based workflow results in a linear history of commits, which is easy to reason about.

Merge-based workflow results in an arbitrarily complicated DAG of commits which is difficult to reason about.

That's all there is to it.

Espressosaurus · on April 9, 2021

It is a detail of how the graph is stored and viewed, but you'll be looking at that graph when bisecting to find when a regression occurred, or how a particular feature came to be, or when a particular feature landed.

The repo rapidly turns into spaghetti with a merge-based workflow because all the work in progress is now part of the historical record for any particular commit to main. You're not looking at a single commit, you're looking at a chain of commits and a merge node. Now imagine there are 20 people committing to the same repo. Your branch factor is exploding! Imagine the picture of my merge-based workflow, but multiplied by 7. It rapidly becomes very difficult to navigate.

The DAG is important as a historical record because you have to go back to it on a very regular basis.

It doesn't change the workflow from a "do your work and commit to the repo, periodically sync with main" perspective. It changes the workflow from a pull request perspective.

I've found rebase is (slightly) inferior for conflict resolution and reintegration--mainly because unless you squash your commits down to a single commit, you may need to resolve the same conflict as many times as you have commits after the conflict in the worst case, which is irritating. Merge is just one and done. But that's a minor thing.
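A back-of-the-envelope sketch of that worst case, under a deliberately crude assumption: a commit is treated as conflicting whenever it touches a file that upstream also changed. Real git conflicts are hunk-level, so this overcounts. The file names are hypothetical.

```python
def rebase_stops_worst_case(upstream_files, dev_commits):
    """Rebase replays commits one at a time, so each overlapping commit may stop."""
    return sum(1 for files in dev_commits if files & upstream_files)

def merge_stops(upstream_files, dev_commits):
    """A merge combines the branch tips once, so there is at most one stop."""
    touched = set().union(*dev_commits)
    return 1 if touched & upstream_files else 0

upstream_files = {"parser.py"}
dev_commits = [
    {"parser.py"},
    {"parser.py", "cli.py"},
    {"docs.md"},
    {"parser.py"},
]
# Squashing first collapses the branch into a single commit.
squashed = [set().union(*dev_commits)]

print(rebase_stops_worst_case(upstream_files, dev_commits))  # 3
print(merge_stops(upstream_files, dev_commits))              # 1
print(rebase_stops_worst_case(upstream_files, squashed))     # 1
```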

Having used SVN (briefly) and Mercurial (a lot) and Git (a lot), SVN pushes you into dealing with a single linear history of commits, with cross-branch merges being extremely painful and error-prone if anyone else has worked in the same area. DVCS like Mercurial and Git allow you to do whatever you like for the history, and cross-branch merges are generally easy and pain-free. I can't say anything about the underlying implementations and why it is that way, but that is my lived experience.

Most of the time with Mercurial and Git I can let the merge tool resolve differences if there are conflicts with the odd line needing manual intervention. Most of the time there aren't conflicts.

And having used SVN a bit...yeah, it's the stone age in comparison. Sometimes you have to make that tradeoff because you can't store everything locally, but I haven't enjoyed the times I've had to use SVN after having used Mercurial and Git.

And Git's user interface sucks.

I need a Mercurial skin on top of Git so the commands make sense.

mekkkkkk · on April 8, 2021

This. I don't know if it's because of a lack of understanding or bad workflow, but almost all of the times I've seen bugs caused by git operations slip through the cracks, it's been because someone decided that a pretty history was of utmost priority. Rebasing is probably necessary in some cases, but it can be a real foot gun as well.