DSA

Z-Algorithm for String Matching in Java - Interview Preparation Guide

6 min read Updated Mar 27, 2026

Prefix Match Array Techniques

The Z algorithm is the string-matching pattern for reusing previously known match intervals against the full string prefix. Strong candidates explain the [L, R] box invariant clearly, because that is what turns repeated character comparisons into a linear scan.

Interview lens

A strong explanation should name the invariant, the safe transition, and the condition that makes this pattern preferable to brute force.

Pattern Summary Table

Pattern When to Use Key Idea Example
04 15 Z Algorithm String Matching prefix matches at every position are useful for search or string analysis reuse the current prefix-match window before extending comparisons Pattern Search with Z Array

Problem Statement

Given a string or a pattern-plus-text combination, compute how much each suffix-prefix alignment matches without rescanning everything from scratch.

Note

Emphasize the constraints before coding. The real signal is often whether the brute-force search space, update volume, or graph model makes the naive solution impossible.

Pattern Recognition Signals

  • Keywords in the problem: Z array, prefix matches, [L,R] window, linear string matching.
  • Structural signal: positions inside the current match box can borrow information from the mirrored prefix region.
  • Complexity signal: the optimized version avoids repeated rescans, recomputation, or state explosion that brute force would suffer.

Important

If prefix-match lengths at every position are the useful state, think Z algorithm.

Java Template

public int[] zArray(String s) {
    int n = s.length();
    int[] z = new int[n];
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i <= r) z[i] = Math.min(r - i + 1, z[i - l]);
        while (i + z[i] < n && s.charAt(z[i]) == s.charAt(i + z[i])) z[i]++;
        if (i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; }
    }
    return z;
}

Pattern Matching with Z-Array

For finding pattern in text:

  1. build string pattern + "#" + text
  2. compute Z-array
  3. any index with Z[i] == pattern.length() is a match
public List<Integer> zMatch(String text, String pat) {
    String s = pat + "#" + text;
    int[] z = zArray(s);
    List<Integer> out = new ArrayList<>();
    int m = pat.length();
    for (int i = m + 1; i < s.length(); i++) {
        if (z[i] == m) out.add(i - m - 1);
    }
    return out;
}

Dry Run (Conceptual)

text="ababcab", pat="ab"

Combined: "ab#ababcab"

Z values equal to 2 at text offsets where "ab" starts:

  • index 0
  • index 2
  • index 5

So matches are [0,2,5].


Z-Box Invariant

Maintain [l, r] as rightmost segment where substring matches prefix. If i is inside this box, reuse previous Z information before extending.

This reuse is why algorithm is linear O(n).


Problem 1: Pattern Matching with the Z Algorithm

Problem description: Given text and pattern, return every index where pattern occurs inside text.

What we are solving actually: We want pattern matches at every text position, but recomputing common prefixes repeatedly is wasteful. The Z-box keeps the best known matching window and reuses it.

What we are doing actually:

  1. Build pattern + "#" + text so any full match has length pattern.length().
  2. Compute the Z array on the combined string.
  3. Reuse the current [l, r] matching window when possible.
  4. Convert full-match positions back into text indices.
public List<Integer> findOccurrences(String text, String pattern) {
    String combined = pattern + "#" + text;
    int[] z = buildZ(combined);
    List<Integer> ans = new ArrayList<>();

    for (int i = pattern.length() + 1; i < combined.length(); i++) {
        if (z[i] == pattern.length()) {
            ans.add(i - pattern.length() - 1); // Translate combined-string index back to the text index.
        }
    }
    return ans;
}

private int[] buildZ(String s) {
    int[] z = new int[s.length()];
    for (int i = 1, l = 0, r = 0; i < s.length(); i++) {
        if (i <= r) {
            z[i] = Math.min(r - i + 1, z[i - l]); // Reuse the known matching window when possible.
        }
        while (i + z[i] < s.length() && s.charAt(z[i]) == s.charAt(i + z[i])) {
            z[i]++; // Extend the match beyond the current window.
        }
        if (i + z[i] - 1 > r) {
            l = i;
            r = i + z[i] - 1; // Record the new rightmost Z-box.
        }
    }
    return z;
}

Debug steps:

  • print the Z array for "aba#abacaba" to inspect reused windows
  • test a case where pattern appears multiple times with overlap
  • verify the invariant that [l, r] always stores the rightmost segment matching the prefix

Problem-Fit Checklist

  • Identify whether input size or query count requires preprocessing or specialized data structures.
  • Confirm problem constraints (sorted input, non-negative weights, DAG-only, immutable array, etc.).
  • Validate that the pattern gives asymptotic improvement over brute-force under worst-case input.
  • Define explicit success criteria: value only, index recovery, count, path reconstruction, or ordering.

Invariant and Reasoning

  • Write one invariant that must stay true after every transition (loop step, recursion return, or update).
  • Ensure each step makes measurable progress toward termination.
  • Guard boundary states explicitly (empty input, singleton, duplicates, overflow, disconnected graph).
  • Add a quick correctness check using a tiny hand-worked example before coding full solution.

Complexity and Design Notes

  • Compute time complexity for both preprocessing and per-query/per-update operations.
  • Track memory overhead and object allocations, not only Big-O notation.
  • Prefer primitives and tight loops in hot paths to reduce GC pressure in Java.
  • If multiple variants exist, choose the one with the simplest correctness proof first.

Production Perspective

  • Convert algorithmic states into explicit metrics (queue size, active nodes, cache hit ratio, relaxation count).
  • Add guardrails for pathological inputs to avoid latency spikes.
  • Keep implementation deterministic where possible to simplify debugging and incident analysis.
  • Separate pure algorithm logic from I/O and parsing so the core stays testable.

Implementation Workflow

  1. Implement the minimal correct template with clear invariants.
  2. Add edge-case tests before optimizing.
  3. Measure complexity-sensitive operations on realistic input sizes.
  4. Refactor for readability only after behavior is locked by tests.

Common Mistakes

  1. Choosing the pattern without proving problem fit.
  2. Ignoring edge cases (empty input, duplicates, overflow, disconnected state).
  3. Mixing multiple strategies without clear invariants.
  4. No complexity analysis against worst-case input.

  1. Pattern matching tasks via Z-array (CP)
  2. Longest Happy Prefix (LC 1392)
    LeetCode

Key Takeaways

  • This pattern is most effective when transitions are explicit and invariants are enforced at every step.
  • Strong preconditions and boundary handling make these implementations production-safe.
  • Reuse this template and adapt it per problem constraints.

Categories

Tags

Continue reading

Previous Rabin-Karp String Matching in Java - Interview Preparation Guide