Wednesday, May 8, 2013

git-imerge: a practical introduction

git-merge on steroids / git-rebase for the masses
This article is a practical introduction to incremental merging using git-imerge: why you need it, what it does, and how to use it.
(Don't fear; this article is not as long as it looks. Most of it consists of big ASCII-art illustrations and example program output.)

Why incremental merge?

The traditional alternatives for combining two branches both have serious problems.
git merge pain
  • You need to resolve one big conflict that mixes up a lot of little changes on both sides of the merge. (Resolving big conflicts is hard!)
  • Merging is all-or-nothing:
    • There is no way to save a partly-done merge, so
      • You can't record your progress.
      • You can't switch to another branch temporarily.
      • If you make a mistake, you cannot revert only part of the merge.
      If you cannot resolve the whole conflict, there is nothing to do but start over.
    • There is no way to test a partly-done merge--the code won't even build until the conflict is completely resolved.
  • It is difficult to collaborate on a merge with a colleague.
git rebase pain
  • It is very problematic to rebase work that has been published.
    • Rebasing is unfriendly to collaboration.
  • You have to reconcile each of the branch commits with all of the changes on master.
  • Rebasing is all-or-nothing:
    • It is awkward to interrupt a rebase while it is in progress: you're not even on a branch, and rebase doesn't preserve the relationship between old commits and new.
  • Rebasing discards history (namely the old version of the branch).
  • Rebasing often requires similar conflicts to be resolved multiple times.

Incremental merge

git-imerge implements a new merging method, incremental merge, that reduces the pain of merging to its essential minimum. git-imerge:
  • Presents conflicts pairwise: you only ever need to resolve one commit from each branch at a single time.
    • Small conflicts (much easier to resolve than large conflicts).
    • You can view commit messages and individual diffs to see what each commit was trying to do.
  • Records all intermediate merges with their correct ancestry, as commits in your repository. An incremental merge that is in progress
    • ...can be interrupted.
    • ...can be pushed to a server.
    • ...can be pulled by a colleague, worked on, and pushed again.
  • Never shows the same conflict twice. Once a conflict has been resolved, it is stored in the DAG to make future merges easier.
  • Lets you test every intermediate state. If there is a problem, you can use "git bisect" to find the exact pairwise merge that was faulty. You can redo that merge and continue the incremental merge from there (retaining earlier pairwise merges).
  • Is largely automated and surprisingly efficient.
  • Allows the final merge to be simplified for the permanent record, omitting the intermediate results.

The basic idea

Suppose you want to merge "branch" into "master":
o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
     \
      A - B - C - D - E - F - G - H - I                ← branch
First draw the diagram a bit differently:
o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |
    A
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch
Now start filling it in, merging one pair at a time:
o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |
    A - A1
    |
    B
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch
"A1" is a merge between a single commit ("1") on master and a single commit ("A") on branch.
Continue...
o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |
    A - A1
    |   |
    B - B1
    |
    C
    |
    D
    |
    E
    |
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch
"B1" is a merge between "A1" and "B". Since "A1" already includes the changes from "1" and "A", this merge only has to add the changes from commit "B", which are hopefully limited in scope. If there is a conflict, it is hopefully small.
Most of these pairwise merges will not conflict, and git-imerge will let do them for you automatically until it finds a conflict:
o - 0 - 1  - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |    |
    A - A1 - A2
    |   |    |
    B - B1 - B2
    |   |    |
    C - C1 - C2
    |   |    |
    D - D1 - D2
    |   |    |
    E - E1 - E2
    |   |
    F - F1   X
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch
Now you have to help resolve the conflict that arose at "X" when merging "E2" and "F1". But Git knows that these two commits share "E1" as a common ancestor ("merge base"), so all you have to do is resolve the change originally made in master commit "2" with the change that was made in branch commit "F":
o - 0 - 1  - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11    ← master
    |   |    |
    A - A1 - A2
    |   |    |
    B - B1 - B2
    |   |    |
    C - C1 - C2
    |   |    |
    D - D1 - D2
    |   |    |
    E - E1 - E2
    |   |    |
    F - F1 - F2
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch
Continue in this manner until the diagram is complete:
o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A - A1 - A2 - A3 - A4 - A5 - A6 - A7 - A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |    |     |     |
    B - B1 - B2 - B3 - B4 - B5 - B6 - B7 - B8 - B9 - B10 - B11
    |   |    |    |    |    |    |    |    |    |     |     |
    C - C1 - C2 - C3 - C4 - C5 - C6 - C7 - C8 - C9 - C10 - C11
    |   |    |    |    |    |    |    |    |    |     |     |
    D - D1 - D2 - D3 - D4 - D5 - D6 - D7 - D8 - D9 - D10 - D11
    |   |    |    |    |    |    |    |    |    |     |     |
    E - E1 - E2 - E3 - E4 - E5 - E6 - E7 - E8 - E9 - E10 - E11
    |   |    |    |    |    |    |    |    |    |     |     |
    F - F1 - F2 - F3 - F4 - F5 - F6 - F7 - F8 - F9 - F10 - F11
    |   |    |    |    |    |    |    |    |    |     |     |
    G - G1 - G2 - G3 - G4 - G5 - G6 - G7 - G8 - G9 - G10 - G11
    |   |    |    |    |    |    |    |    |    |     |     |
    H - H1 - H2 - H3 - H4 - H5 - H6 - H7 - H8 - H9 - H10 - H11
    |   |    |    |    |    |    |    |    |    |     |     |
    I - I1 - I2 - I3 - I4 - I5 - I6 - I7 - I8 - I9 - I10 - I11

    ↑
  branch
Done!

Simplifying the results

A completed incremental merge contains all of the information you could possibly want to know about combining two branches; all you need to do now is discard any information that you don't want in your permanent record. For example,
  • Commit "I11" is the simple merge of "branch" into "master":
    o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11
        |                                                       |
        A                                                       |
        |                                                       |
        B                                                       |
        |                                                       |
        C                                                       |
        |                                                       |
        D                                                       |
        |                                                       |
        E                                                       |
        |                                                       |
        F                                                       |
        |                                                       |
        G                                                       |
        |                                                       |
        H                                                       |
        |                                                       |
        I ---------------------------------------------------- I11'  ← master
    
        ↑
      branch
    
    (Usually such a diagram is drawn like so:
    o - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - 11 - I11'  ← master
         \                                               /
          A -- B -- C --- D --- E --- F --- G --- H --- I       ← branch
    
    .)
  • The rightmost column is the rebase of "branch" onto "master":
    o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
                                                                |
                                                               A11'
                                                                |
                                                               B11'
                                                                |
                                                               C11'
                                                                |
                                                               D11'
                                                                |
                                                               E11'
                                                                |
                                                               F11'
                                                                |
                                                               G11'
                                                                |
                                                               H11'
                                                                |
                                                               I11'  ← branch
    
  • The bottommost row is the rebase of "master" onto "branch":
    o - 0
        |
        A
        |
        B
        |
        C
        |
        D
        |
        E
        |
        F
        |
        G
        |
        H
        |
        I - I1'- I2'- I3'- I4'- I5'- I6'- I7'- I8'- I9'- I10'- I11'  ← master
    
        ↑
      branch
    
  • If you have already published branch, consider converting this into a rebase with history:
    o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
        |                                                       |
        A ---------------------------------------------------- A11'
        |                                                       |
        B ---------------------------------------------------- B11'
        |                                                       |
        C ---------------------------------------------------- C11'
        |                                                       |
        D ---------------------------------------------------- D11'
        |                                                       |
        E ---------------------------------------------------- E11'
        |                                                       |
        F ---------------------------------------------------- F11'
        |                                                       |
        G ---------------------------------------------------- G11'
        |                                                       |
        H ---------------------------------------------------- H11'
        |                                                       |
        I ---------------------------------------------------- I11'
    
                                                                ↑
                                                              branch
    
    Rebase-with-history is a useful hybrid between rebasing and merging, having the advantages that it retains both the old and the new versions of the rebased commits, and that it is a valid way of rebasing already-published history without the known problems of a simple rebase.

Efficiency

In most cases git-imerge does not have to construct the full incremental merge. It uses an efficient algorithm, based on bisection, for filling in large blocks of the incremental merge quickly and homing in on the conflicts. A typical in-progress merge might look like this:
o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |   |    |    |    |    |    |    |    |    |     |     |
    A -   --   --   --   --   -- A6 -   -- A8 - A9 - A10 - A11
    |   |    |    |    |    |    |    |    |
    B -   --   --   --   --   -- B6 - B7 - B8   X
    |   |    |    |    |    |    |
    C -   --   --   --   --   -- C6   X
    |   |    |    |    |    |    |
    D -   --   --   --   --   -- D6
    |   |    |    |    |    |    |
    E - E1 - E2 - E3 - E4 - E5 - E6
    |   |
    F - F1   X
    |   |
    G - G1
    |   |
    H - H1
    |   |
    I - I1

    ↑
  branch
where the "X"s marks pairwise merges that are known to conflict (i.e., they need user attention), and the cross-hatched rectangles are merges that could be skipped because the merges on the boundaries of the rectangles were all successful.

git-imerge

git-imerge implements incremental merging for Git. This section shows how to use git-imerge to do a simple merge.
  • You begin much like with git merge: check out the destination branch and then tell git imerge what branch you want to merge into it:
    $ git checkout master
    $ git imerge start --name=merge-branch --first-parent branch
    
    Multiple imerges can be in progress simultaneously, so you have to give each one a name, in this case "merge-branch".
  • git-imerge then uses bisection to find pairwise merges that conflict, and asks user to fix them one pair at a time:
    Attempting automerge of 1-1...success.
    Attempting automerge of 1-4...success.
    Attempting automerge of 1-6...success.
    Attempting automerge of 9-6...failure.
    Attempting automerge of 5-6...failure.
    Attempting automerge of 3-6...success.
    Attempting automerge of 4-6...failure.
    Attempting automerge of 4-1...success.
    Attempting automerge of 4-4...failure.
    Attempting automerge of 4-3...failure.
    Attempting automerge of 4-2...success.
    Attempting automerge of 9-2...success.
    Autofilling 1-6...success.
    Autofilling 2-6...success.
    Autofilling 3-1...success.
    Autofilling 3-2...success.
    Autofilling 3-3...success.
    Autofilling 3-4...success.
    Autofilling 3-5...success.
    Autofilling 3-6...success.
    Autofilling 4-2...success.
    Autofilling 5-2...success.
    Autofilling 6-2...success.
    Autofilling 7-2...success.
    Autofilling 8-2...success.
    Autofilling 9-1...success.
    Autofilling 9-2...success.
    Attempting automerge of 4-3...failure.
    Switched to branch 'imerge/c-d'
    Auto-merging conflict.txt
    CONFLICT (add/add): Merge conflict in conflict.txt
    Automatic merge failed; fix conflicts and then commit the result.
    
    Original first commit:
    commit b7fe8a65d0cee2e388e971c4b29be8c6cbb25ee1
    Author: Lou User <luser@example.com>
    Date:   Fri May 3 14:03:05 2013 +0200
    
        c conflict
    
    Original second commit:
    commit bd0373cadae08d872536bcda8214c0631e19945a
    Author: Lou User <luser@example.com>
    Date:   Fri May 3 14:03:05 2013 +0200
    
        d conflict
    
    There was a conflict merging commit 4-3, shown above.
    Please resolve the conflict, commit the result, then type
    
        git-imerge continue
    
    Please note that git-imerge tells you exactly which of the original commits need to be combined in this merge, and displays their log messages.
  • You can get a diagram of the current state of the merge (it is in crude ASCII-art for now):
    $ git imerge diagram
    **********
    *??|?????|
    *--+-----+
    *??|#?????
    *??|??????
    *??|??????
    *--+??????
    
    Key:
      |,-,+ = rectangles forming current merge frontier
      * = merge done manually
      . = merge done automatically
      # = conflict that is currently blocking progress
      @ = merge was blocked but has been resolved
      ? = no merge recorded
    
  • After you fix the indicated conflict and commit the result, run git imerge continue to tell git-imerge to incorporate the result and proceed to the next conflict:
    $ git add ...
    $ git commit
    $ git imerge continue
    Merge has been recorded for merge 4-3.
    Attempting automerge of 5-3...success.
    Attempting automerge of 5-3...success.
    Attempting automerge of 9-3...success.
    Autofilling 5-3...success.
    Autofilling 6-3...success.
    Autofilling 7-3...success.
    Autofilling 8-3...success.
    Autofilling 9-3...success.
    Attempting automerge of 4-4...success.
    Attempting automerge of 4-5...success.
    Attempting automerge of 4-6...success.
    Attempting automerge of 9-6...success.
    Autofilling 4-6...success.
    Autofilling 5-6...success.
    Autofilling 6-6...success.
    Autofilling 7-6...success.
    Autofilling 8-6...success.
    Autofilling 9-4...success.
    Autofilling 9-5...success.
    Autofilling 9-6...success.
    Merge is complete!
    $ git imerge diagram
    **********
    *??.?????|
    *??......|
    *??.*....|
    *??.?????|
    *??.?????|
    *--------+
    
    Key:
      |,-,+ = rectangles forming current merge frontier
      * = merge done manually
      . = merge done automatically
      # = conflict that is currently blocking progress
      @ = merge was blocked but has been resolved
      ? = no merge recorded
    
  • When you are done, simplify the incremental merge into a simple merge, a simple rebase, or a rebase-with-history:
    $ git imerge finish --goal=merge
    
    By default, this creates a new branch NAME that points at the result, and checks out that branch:
    $ git log -1 --decorate
    commit 79afd870a52e216114596b52c800e45139badf74 (HEAD, merge-branch)
    Merge: 8453321 993a8a9
    Author: Lou User <luser@example.com>
    Date:   Wed May 8 10:08:07 2013 +0200
    
        Merge frobnicator into floobifier.
    
    (It also deletes all of the temporary data.)
Intermediate data
During an incremental merge, intermediate results are stored directly in your repository as special references:
refs/imerge/NAME/state
A blob containing a little bit of metadata.
refs/imerge/NAME/manual/M-N
Manual merge including all of the changes through commits M on master and N on branch.
refs/imerge/NAME/auto/M-N
Automatic merge including all of the changes through commits M on master and N on branch.
refs/heads/imerge/NAME
Temporary branch used when resolving merge conflicts.
refs/heads/NAME
Default reference name for storing final results.
Project status
As of this writing, git-imerge is very new and still experimental. Please try it out, but use it cautiously (e.g., on a clone of your main Git repository) and give me your feedback!
Contributing
I have lots of ideas for additional features, and am sure that you do too. See the project TODO list for inspiration, and get involved!

No comments: