Skip to content

Myers Diff Algorithm

Algorithm Introduction

The Myers diff algorithm is an efficient sequence comparison algorithm proposed by Eugene W. Myers in 1986. It finds the minimal edit operations (insertions and deletions) required to transform one sequence into another. It's widely used in version control systems (like Git), text comparison tools, and bioinformatics.

Algorithm Principles

The Myers algorithm is based on these key concepts:

  1. Edit Graph: Represents two sequences A and B as a grid where horizontal moves represent deletions, vertical moves represent insertions, and diagonal moves represent matches (no operation).
  2. Shortest Edit Script (SES): The shortest path from the top-left to bottom-right corner of the graph, representing the minimal edit operations.
  3. k-lines: Represent all points in the edit graph where x - y = k.

Algorithm Steps

  1. Construct an edit graph with sequence A on the x-axis and sequence B on the y-axis
  2. Starting from origin (0,0), calculate the shortest path to reach endpoint (N,M)
  3. For each d (edit distance), calculate the furthest reachable point
  4. When the path reaches the endpoint, backtrack to construct the diff result

Code Implementation

def myers_diff(a, b):
    """
    Compute the difference between two sequences using Myers diff algorithm

    Args:
        a: First sequence
        b: Second sequence

    Returns:
        List of edit operations where:
        - ('+', i, x) means insert element x at position i
        - ('-', i, x) means delete element x at position i
    """
    # Create edit graph
    n = len(a)
    m = len(b)
    max_edit = n + m

    # Store furthest x coordinate on each k-line
    v = {1: 0}
    # Store path
    trace = []

    # Compute shortest edit path
    for d in range(max_edit + 1):
        # Save current d's path info
        trace.append(v.copy())

        for k in range(-d, d + 1, 2):
            # Determine whether to move down or right
            if k == -d or (k != d and v.get(k - 1, -1) < v.get(k + 1, -1)):
                x = v.get(k + 1, -1)  # Move down
            else:
                x = v.get(k - 1, -1) + 1  # Move right

            # Calculate y coordinate
            y = x - k

            # Move diagonally (matches)
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1

            # Update v
            v[k] = x

            # If reached endpoint, backtrack and return edit script
            if x >= n and y >= m:
                return _backtrack(a, b, trace, n, m)

    # If no path found (shouldn't happen)
    return []

def _backtrack(a, b, trace, x, y):
    """Backtrack to construct edit script"""
    result = []
    d = len(trace) - 1

    while d > 0:
        v = trace[d]
        k = x - y

        # Determine previous k value
        if k == -d or (k != d and v.get(k - 1, -1) < v.get(k + 1, -1)):
            prev_k = k + 1
        else:
            prev_k = k - 1

        # Calculate previous position
        prev_x = v.get(prev_k, 0)
        prev_y = prev_x - prev_k

        # Handle diagonal moves (matches)
        while x > prev_x and y > prev_y:
            x -= 1
            y -= 1

        # Record edit operation
        if x == prev_x:
            # Insert operation
            result.insert(0, ('+', x, b[prev_y]))
        else:
            # Delete operation
            result.insert(0, ('-', prev_x, a[prev_x]))

        # Update position
        x = prev_x
        y = prev_y
        d -= 1

    return result

# Test
a = "ABCABBA"
b = "CBABAC"
diff = myers_diff(a, b)
print("Diff result:", diff)
import java.util.*;

public class MyersDiff {
    public static void main(String[] args) {
        String a = "ABCABBA";
        String b = "CBABAC";
        List<EditOperation> diff = myersDiff(a, b);
        System.out.println("Diff result:");
        for (EditOperation op : diff) {
            System.out.println(op);
        }
    }

    // Edit operation class
    static class EditOperation {
        char type; // '+' for insert, '-' for delete
        int position;
        char element;

        public EditOperation(char type, int position, char element) {
            this.type = type;
            this.position = position;
            this.element = element;
        }

        @Override
        public String toString() {
            return "(" + type + ", " + position + ", " + element + ")";
        }
    }

    public static List<EditOperation> myersDiff(String a, String b) {
        char[] aChars = a.toCharArray();
        char[] bChars = b.toCharArray();
        int n = aChars.length;
        int m = bChars.length;
        int maxEdit = n + m;

        // Store furthest x coordinate on each k-line
        Map<Integer, Integer> v = new HashMap<>();
        v.put(1, 0);

        // Store path
        List<Map<Integer, Integer>> trace = new ArrayList<>();

        // Compute shortest edit path
        for (int d = 0; d <= maxEdit; d++) {
            // Save current d's path info
            trace.add(new HashMap<>(v));

            for (int k = -d; k <= d; k += 2) {
                // Determine whether to move down or right
                int x;
                if (k == -d || (k != d && v.getOrDefault(k - 1, -1) < v.getOrDefault(k + 1, -1))) {
                    x = v.getOrDefault(k + 1, -1);  // Move down
                } else {
                    x = v.getOrDefault(k - 1, -1) + 1;  // Move right
                }

                // Calculate y coordinate
                int y = x - k;

                // Move diagonally (matches)
                while (x < n && y < m && aChars[x] == bChars[y]) {
                    x++;
                    y++;
                }

                // Update v
                v.put(k, x);

                // If reached endpoint, backtrack and return edit script
                if (x >= n && y >= m) {
                    return backtrack(aChars, bChars, trace, n, m);
                }
            }
        }

        // If no path found (shouldn't happen)
        return new ArrayList<>();
    }

    private static List<EditOperation> backtrack(char[] a, char[] b, List<Map<Integer, Integer>> trace, int x, int y) {
        List<EditOperation> result = new ArrayList<>();
        int d = trace.size() - 1;

        while (d > 0) {
            Map<Integer, Integer> v = trace.get(d);
            int k = x - y;

            // Determine previous k value
            int prevK;
            if (k == -d || (k != d && v.getOrDefault(k - 1, -1) < v.getOrDefault(k + 1, -1))) {
                prevK = k + 1;
            } else {
                prevK = k - 1;
            }

            // Calculate previous position
            int prevX = v.getOrDefault(prevK, 0);
            int prevY = prevX - prevK;

            // Handle diagonal moves (matches)
            while (x > prevX && y > prevY) {
                x--;
                y--;
            }

            // Record edit operation
            if (x == prevX) {
                // Insert operation
                result.add(0, new EditOperation('+', x, b[prevY]));
            } else {
                // Delete operation
                result.add(0, new EditOperation('-', prevX, a[prevX]));
            }

            // Update position
            x = prevX;
            y = prevY;
            d--;
        }

        return result;
    }
}

Complexity Analysis

  • Time complexity: O(ND), where N is the total length of both sequences and D is the edit distance (minimal edit operations)
  • Space complexity: O(ND) for storing path information

Algorithm Example

Let's understand how Myers diff works with a concrete example. Suppose we have two strings: - A = "ABCABBA" - B = "CBABAC"

Edit Graph Representation

We can represent these strings as an edit graph where: - x-axis represents string A - y-axis represents string B - diagonals represent matches (when A[i] = B[j])

    | C B A B A C
  --+-------------
    | 0 1 2 3 4 5
  A | 1
  B | 2
  C | 3
  A | 4
  B | 5
  B | 6
  A | 7

Computation Process

  1. Starting from (0,0), we try to find the shortest path to (7,6)
  2. For each edit distance d, we calculate the furthest reachable point
  3. When we find a path to the endpoint, we backtrack to construct the diff result

Final Diff Result

For our example, Myers algorithm would produce a diff result like: - Delete A[0]: A - Keep B[0]: C - Keep A[1]: B - Keep A[2]: C - Keep A[3]: A - Keep A[4]: B - Keep A[5]: B - Keep A[6]: A - Insert B[5]: C

This represents the minimal edit operations to transform "ABCABBA" to "CBABAC".

Animation Demo

Applications

  1. Version control systems: Git and other VCS use diff algorithms similar to Myers for comparing file versions
  2. Text comparison tools: Tools like diff, Beyond Compare
  3. Collaborative editing: Conflict resolution in real-time collaborative editors
  4. Bioinformatics: DNA sequence alignment

Exercises

  1. Basic implementation: Implement Myers diff algorithm to compute edit operations between two strings.

  2. Visualization: Create a simple visualization tool showing Myers algorithm execution and edit graph.

  3. Optimization: Myers algorithm can consume significant memory for large sequences. Try implementing a space-optimized version storing only necessary information.

  4. Application: Use Myers diff to implement a simple text comparison tool highlighting differences between two text files.

  5. Extension: Modify Myers algorithm to handle replacement operations (changing one element to another) in addition to insertions and deletions.

  6. Challenge: Implement a linear-space variant of Myers algorithm using O(N) space instead of O(ND), where N is total sequence length and D is edit distance.

References

  1. Myers, E. W. (1986). "An O(ND) Difference Algorithm and Its Variations". Algorithmica. 1 (1): 251–266.
  2. Miller, W.; Myers, E. W. (1985). "A File Comparison Program". Software: Practice and Experience. 15 (11): 1025–1040.
  3. Hunt, J. W.; McIlroy, M. D. (1976). "An Algorithm for Differential File Comparison". Computing Science Technical Report, Bell Laboratories 41.


📅 Created 0 days ago ✏️ Updated 0 days ago