Myers Diff Algorithm¶
Algorithm Introduction¶
The Myers diff algorithm is an efficient sequence comparison algorithm proposed by Eugene W. Myers in 1986. It finds the minimal edit operations (insertions and deletions) required to transform one sequence into another. It's widely used in version control systems (like Git), text comparison tools, and bioinformatics.
Algorithm Principles¶
The Myers algorithm is based on these key concepts:
- Edit Graph: Represents two sequences A and B as a grid where horizontal moves represent deletions, vertical moves represent insertions, and diagonal moves represent matches (no operation).
- Shortest Edit Script (SES): The shortest path from the top-left to bottom-right corner of the graph, representing the minimal edit operations.
- k-lines: Represent all points in the edit graph where x - y = k.
Algorithm Steps¶
- Construct an edit graph with sequence A on the x-axis and sequence B on the y-axis
- Starting from origin (0,0), calculate the shortest path to reach endpoint (N,M)
- For each d (edit distance), calculate the furthest reachable point
- When the path reaches the endpoint, backtrack to construct the diff result
Code Implementation¶
def myers_diff(a, b):
"""
Compute the difference between two sequences using Myers diff algorithm
Args:
a: First sequence
b: Second sequence
Returns:
List of edit operations where:
- ('+', i, x) means insert element x at position i
- ('-', i, x) means delete element x at position i
"""
# Create edit graph
n = len(a)
m = len(b)
max_edit = n + m
# Store furthest x coordinate on each k-line
v = {1: 0}
# Store path
trace = []
# Compute shortest edit path
for d in range(max_edit + 1):
# Save current d's path info
trace.append(v.copy())
for k in range(-d, d + 1, 2):
# Determine whether to move down or right
if k == -d or (k != d and v.get(k - 1, -1) < v.get(k + 1, -1)):
x = v.get(k + 1, -1) # Move down
else:
x = v.get(k - 1, -1) + 1 # Move right
# Calculate y coordinate
y = x - k
# Move diagonally (matches)
while x < n and y < m and a[x] == b[y]:
x += 1
y += 1
# Update v
v[k] = x
# If reached endpoint, backtrack and return edit script
if x >= n and y >= m:
return _backtrack(a, b, trace, n, m)
# If no path found (shouldn't happen)
return []
def _backtrack(a, b, trace, x, y):
"""Backtrack to construct edit script"""
result = []
d = len(trace) - 1
while d > 0:
v = trace[d]
k = x - y
# Determine previous k value
if k == -d or (k != d and v.get(k - 1, -1) < v.get(k + 1, -1)):
prev_k = k + 1
else:
prev_k = k - 1
# Calculate previous position
prev_x = v.get(prev_k, 0)
prev_y = prev_x - prev_k
# Handle diagonal moves (matches)
while x > prev_x and y > prev_y:
x -= 1
y -= 1
# Record edit operation
if x == prev_x:
# Insert operation
result.insert(0, ('+', x, b[prev_y]))
else:
# Delete operation
result.insert(0, ('-', prev_x, a[prev_x]))
# Update position
x = prev_x
y = prev_y
d -= 1
return result
# Test
a = "ABCABBA"
b = "CBABAC"
diff = myers_diff(a, b)
print("Diff result:", diff)
import java.util.*;
public class MyersDiff {
public static void main(String[] args) {
String a = "ABCABBA";
String b = "CBABAC";
List<EditOperation> diff = myersDiff(a, b);
System.out.println("Diff result:");
for (EditOperation op : diff) {
System.out.println(op);
}
}
// Edit operation class
static class EditOperation {
char type; // '+' for insert, '-' for delete
int position;
char element;
public EditOperation(char type, int position, char element) {
this.type = type;
this.position = position;
this.element = element;
}
@Override
public String toString() {
return "(" + type + ", " + position + ", " + element + ")";
}
}
public static List<EditOperation> myersDiff(String a, String b) {
char[] aChars = a.toCharArray();
char[] bChars = b.toCharArray();
int n = aChars.length;
int m = bChars.length;
int maxEdit = n + m;
// Store furthest x coordinate on each k-line
Map<Integer, Integer> v = new HashMap<>();
v.put(1, 0);
// Store path
List<Map<Integer, Integer>> trace = new ArrayList<>();
// Compute shortest edit path
for (int d = 0; d <= maxEdit; d++) {
// Save current d's path info
trace.add(new HashMap<>(v));
for (int k = -d; k <= d; k += 2) {
// Determine whether to move down or right
int x;
if (k == -d || (k != d && v.getOrDefault(k - 1, -1) < v.getOrDefault(k + 1, -1))) {
x = v.getOrDefault(k + 1, -1); // Move down
} else {
x = v.getOrDefault(k - 1, -1) + 1; // Move right
}
// Calculate y coordinate
int y = x - k;
// Move diagonally (matches)
while (x < n && y < m && aChars[x] == bChars[y]) {
x++;
y++;
}
// Update v
v.put(k, x);
// If reached endpoint, backtrack and return edit script
if (x >= n && y >= m) {
return backtrack(aChars, bChars, trace, n, m);
}
}
}
// If no path found (shouldn't happen)
return new ArrayList<>();
}
private static List<EditOperation> backtrack(char[] a, char[] b, List<Map<Integer, Integer>> trace, int x, int y) {
List<EditOperation> result = new ArrayList<>();
int d = trace.size() - 1;
while (d > 0) {
Map<Integer, Integer> v = trace.get(d);
int k = x - y;
// Determine previous k value
int prevK;
if (k == -d || (k != d && v.getOrDefault(k - 1, -1) < v.getOrDefault(k + 1, -1))) {
prevK = k + 1;
} else {
prevK = k - 1;
}
// Calculate previous position
int prevX = v.getOrDefault(prevK, 0);
int prevY = prevX - prevK;
// Handle diagonal moves (matches)
while (x > prevX && y > prevY) {
x--;
y--;
}
// Record edit operation
if (x == prevX) {
// Insert operation
result.add(0, new EditOperation('+', x, b[prevY]));
} else {
// Delete operation
result.add(0, new EditOperation('-', prevX, a[prevX]));
}
// Update position
x = prevX;
y = prevY;
d--;
}
return result;
}
}
Complexity Analysis¶
- Time complexity: O(ND), where N is the total length of both sequences and D is the edit distance (minimal edit operations)
- Space complexity: O(ND) for storing path information
Algorithm Example¶
Let's understand how Myers diff works with a concrete example. Suppose we have two strings: - A = "ABCABBA" - B = "CBABAC"
Edit Graph Representation¶
We can represent these strings as an edit graph where: - x-axis represents string A - y-axis represents string B - diagonals represent matches (when A[i] = B[j])
Computation Process¶
- Starting from (0,0), we try to find the shortest path to (7,6)
- For each edit distance d, we calculate the furthest reachable point
- When we find a path to the endpoint, we backtrack to construct the diff result
Final Diff Result¶
For our example, Myers algorithm would produce a diff result like: - Delete A[0]: A - Keep B[0]: C - Keep A[1]: B - Keep A[2]: C - Keep A[3]: A - Keep A[4]: B - Keep A[5]: B - Keep A[6]: A - Insert B[5]: C
This represents the minimal edit operations to transform "ABCABBA" to "CBABAC".
Animation Demo¶
Applications¶
- Version control systems: Git and other VCS use diff algorithms similar to Myers for comparing file versions
- Text comparison tools: Tools like diff, Beyond Compare
- Collaborative editing: Conflict resolution in real-time collaborative editors
- Bioinformatics: DNA sequence alignment
Exercises¶
-
Basic implementation: Implement Myers diff algorithm to compute edit operations between two strings.
-
Visualization: Create a simple visualization tool showing Myers algorithm execution and edit graph.
-
Optimization: Myers algorithm can consume significant memory for large sequences. Try implementing a space-optimized version storing only necessary information.
-
Application: Use Myers diff to implement a simple text comparison tool highlighting differences between two text files.
-
Extension: Modify Myers algorithm to handle replacement operations (changing one element to another) in addition to insertions and deletions.
-
Challenge: Implement a linear-space variant of Myers algorithm using O(N) space instead of O(ND), where N is total sequence length and D is edit distance.
References¶
- Myers, E. W. (1986). "An O(ND) Difference Algorithm and Its Variations". Algorithmica. 1 (1): 251–266.
- Miller, W.; Myers, E. W. (1985). "A File Comparison Program". Software: Practice and Experience. 15 (11): 1025–1040.
- Hunt, J. W.; McIlroy, M. D. (1976). "An Algorithm for Differential File Comparison". Computing Science Technical Report, Bell Laboratories 41.