Custom Java Agent Instrumentation API Guide

Java agents are powerful because they operate at a boundary most application code never reaches: class loading and runtime instrumentation.

That is also why they deserve more caution than ordinary libraries. An agent can improve observability, profiling, or policy enforcement across the JVM, but it can also make startup opaque, widen blast radius, and complicate rollback if the design is too broad.

What an Agent Is Good At

Custom agents make sense when the concern is genuinely cross-cutting:

timing or tracing many classes consistently
lightweight runtime diagnostics
enforcement of a narrow policy at load time
targeted bytecode transformation without changing source

They are much weaker when used to hide behavior that should really live in normal application code.

Choose the Right Entry Mode

Agents have two main entry modes:

premain(String, Instrumentation) for JVM startup via -javaagent
agentmain(String, Instrumentation) for dynamic attach

Use premain when instrumentation should be part of the normal process startup contract. Use agentmain for diagnostics, emergency analysis, or controlled runtime attach scenarios.

The operational difference matters because startup agents participate directly in boot success and startup time.

A Narrow Transformer Is a Healthy Transformer

public final class Agent {

    public static void premain(String args, Instrumentation inst) {
        ClassFileTransformer transformer = (loader, className, classBeingRedefined, domain, bytes) -> {
            if (className == null || !className.startsWith("com/company/service/")) {
                return null;
            }
            return transformClass(className, bytes);
        };

        inst.addTransformer(transformer, true);
    }

    private static byte[] transformClass(String className, byte[] original) {
        return original;
    }
}

The most important detail here is not the lambda. It is the scope.

Agents should usually begin with:

a narrow package match
deterministic transformation rules
the ability to skip most classes quickly

Broad class matching is one of the fastest ways to turn a useful agent into an infrastructure incident.

Manifest Capabilities Should Be Intentional

Typical manifest entries include:

Premain-Class
Agent-Class
Can-Redefine-Classes
Can-Retransform-Classes

Only enable the capabilities you genuinely need.

Extra capabilities are not harmless decoration. They widen the runtime surface and the number of cases the agent must handle correctly.

The Main Design Questions Are Operational

Before shipping an agent, decide:

what exact classes are eligible for transformation
what happens if transformation fails
how the agent is disabled quickly
how transformed classes are identified during incidents

Those questions matter more than the bytecode details at first. An elegant transformer with no rollback story is not production-ready.

Tip

A good first rollout transforms a small, named slice of classes and makes its own behavior easy to explain in logs and startup diagnostics.

A Good Canary Pattern

For a request-timing agent, a safe rollout looks like this:

instrument only one narrow package
record transformed class names at startup
compare startup time with and without the agent
compare CPU and latency overhead in canary traffic
keep a kill switch in agent args or deployment config

This treats the agent like infrastructure, which is exactly what it is.

Test the Agent Against the Runtime Matrix

Agent testing should include:

bytecode golden tests or transformed-class snapshots
startup tests with the agent enabled and disabled
compatibility tests across supported JDKs
classloader interaction tests where applicable
stress tests for retransformation if you support it

Agents are especially sensitive to runtime variation, so version coverage matters more than it does for ordinary application code.

Common Failure Modes

The failures to design against are predictable:

transforming too many classes
mutating semantics instead of adding orthogonal instrumentation
excessive logging from the transform path
bytecode that works on one JDK and breaks on another
dynamic attach workflows with no clear audit or rollback model

An agent should leave a small, understandable footprint.

Key Takeaways

Java agents are best for narrow, cross-cutting runtime instrumentation.
The first design question is scope, not clever transformation.
Startup behavior, kill switches, and transformed-class visibility are part of the product.
Roll agents out like infrastructure changes, with canaries and explicit rollback.

Find posts and pages

Building a Custom Java Agent (Instrumentation API)