Java

Building a Custom Java Agent (Instrumentation API)

4 min read Updated Mar 27, 2026

Load Time Instrumentation and Runtime Introspection

Java agents are powerful because they operate at a boundary most application code never reaches: class loading and runtime instrumentation.

That is also why they deserve more caution than ordinary libraries. An agent can improve observability, profiling, or policy enforcement across the JVM, but it can also make startup opaque, widen blast radius, and complicate rollback if the design is too broad.


What an Agent Is Good At

Custom agents make sense when the concern is genuinely cross-cutting:

  • timing or tracing many classes consistently
  • lightweight runtime diagnostics
  • enforcement of a narrow policy at load time
  • targeted bytecode transformation without changing source

They are much weaker when used to hide behavior that should really live in normal application code.


Choose the Right Entry Mode

Agents have two main entry modes:

  • premain(String, Instrumentation) for JVM startup via -javaagent
  • agentmain(String, Instrumentation) for dynamic attach

Use premain when instrumentation should be part of the normal process startup contract. Use agentmain for diagnostics, emergency analysis, or controlled runtime attach scenarios.

The operational difference matters because startup agents participate directly in boot success and startup time.


A Narrow Transformer Is a Healthy Transformer

public final class Agent {

    public static void premain(String args, Instrumentation inst) {
        ClassFileTransformer transformer = (loader, className, classBeingRedefined, domain, bytes) -> {
            if (className == null || !className.startsWith("com/company/service/")) {
                return null;
            }
            return transformClass(className, bytes);
        };

        inst.addTransformer(transformer, true);
    }

    private static byte[] transformClass(String className, byte[] original) {
        return original;
    }
}

The most important detail here is not the lambda. It is the scope.

Agents should usually begin with:

  • a narrow package match
  • deterministic transformation rules
  • the ability to skip most classes quickly

Broad class matching is one of the fastest ways to turn a useful agent into an infrastructure incident.


Manifest Capabilities Should Be Intentional

Typical manifest entries include:

  • Premain-Class
  • Agent-Class
  • Can-Redefine-Classes
  • Can-Retransform-Classes

Only enable the capabilities you genuinely need.

Extra capabilities are not harmless decoration. They widen the runtime surface and the number of cases the agent must handle correctly.


The Main Design Questions Are Operational

Before shipping an agent, decide:

  • what exact classes are eligible for transformation
  • what happens if transformation fails
  • how the agent is disabled quickly
  • how transformed classes are identified during incidents

Those questions matter more than the bytecode details at first. An elegant transformer with no rollback story is not production-ready.

Tip

A good first rollout transforms a small, named slice of classes and makes its own behavior easy to explain in logs and startup diagnostics.


A Good Canary Pattern

For a request-timing agent, a safe rollout looks like this:

  1. instrument only one narrow package
  2. record transformed class names at startup
  3. compare startup time with and without the agent
  4. compare CPU and latency overhead in canary traffic
  5. keep a kill switch in agent args or deployment config

This treats the agent like infrastructure, which is exactly what it is.


Test the Agent Against the Runtime Matrix

Agent testing should include:

  • bytecode golden tests or transformed-class snapshots
  • startup tests with the agent enabled and disabled
  • compatibility tests across supported JDKs
  • classloader interaction tests where applicable
  • stress tests for retransformation if you support it

Agents are especially sensitive to runtime variation, so version coverage matters more than it does for ordinary application code.


Common Failure Modes

The failures to design against are predictable:

  • transforming too many classes
  • mutating semantics instead of adding orthogonal instrumentation
  • excessive logging from the transform path
  • bytecode that works on one JDK and breaks on another
  • dynamic attach workflows with no clear audit or rollback model

An agent should leave a small, understandable footprint.


Key Takeaways

  • Java agents are best for narrow, cross-cutting runtime instrumentation.
  • The first design question is scope, not clever transformation.
  • Startup behavior, kill switches, and transformed-class visibility are part of the product.
  • Roll agents out like infrastructure changes, with canaries and explicit rollback.

Categories

Tags

Continue reading

Previous ClassLoaders and Plugin Architecture in Java Next Bytecode Engineering with ASM — Practical Intro