Event-driven architecture: should the bus carry news of a state change or the data itself?

Every event-driven system makes one decision early that it lives with for years. When something happens, does the event say “this happened, here is the id” or does it say “this happened, here is everything about it”? That choice looks like a payload detail. It is really a decision about what your systems are coupled to, how fast they can change, and who gets paged when a schema moves.

Why this choice matters more than it looks

I have watched teams treat the event payload as an afterthought. They get the broker running, agree on a topic name, and then someone asks the question that actually matters: what goes inside the message?

There are two honest answers, and they sit at opposite ends of a spectrum. The first is event notification. The event is a thin signal. It says an order was placed, carries the order id and maybe a link back to the source, and nothing else. Anyone who wants the detail goes and fetches it. Martin Fowler describes this style as carrying just id information and a link back to the sender.

The second answer is event-carried state transfer. The event is fat. It carries the full picture of what changed: the customer, the line items, the prices, the shipping address. Consumers never have to call back because the data is already in their hands.
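To make the two ends of the spectrum concrete, here is a minimal sketch of the same "order placed" fact in both styles. The class names, field names and the link format are illustrative assumptions, not a schema taken from any real system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class OrderPlacedNotification:
    """Thin event: a signal, an id, and a link back to the source."""
    order_id: str
    href: str  # hypothetical link format, for illustration only


@dataclass(frozen=True)
class OrderPlacedState:
    """Fat event: the full picture of what changed travels with the message."""
    order_id: str
    customer: dict
    line_items: list
    shipping_address: dict


thin = OrderPlacedNotification(order_id="4821", href="/orders/4821")

fat = OrderPlacedState(
    order_id="4821",
    customer={"id": "c-17", "name": "A. Example"},
    line_items=[{"sku": "sku-1", "qty": 2, "unit_price": "9.99"}],
    shipping_address={"line1": "1 Example St", "city": "Exampleville"},
)

# Serialised size is one visible difference; the number of fields a consumer
# can come to depend on is the one that matters later.
thin_bytes = len(json.dumps(asdict(thin)).encode("utf-8"))
fat_bytes = len(json.dumps(asdict(fat)).encode("utf-8"))
```

Every key in the fat payload is something a consumer can start reading without asking anyone, which is the thread the next sections pull on.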

The fat event feels like the kinder option. One message, no follow-up calls, no dependency on the producer being awake. That convenience is exactly what makes it dangerous over time, because of what you are quietly handing out with all that data.

The coupling you do not see until it hurts

When you put a rich object on the bus, you publish your internal model to everyone listening. Today three consumers read the order event and use two fields each. Six months later one of them depends on a field you only added for your own bookkeeping. Now you cannot change that field without breaking a team you may not even know is consuming it.

This is the trap that gets dressed up as convenience. Thoughtworks put it plainly: when consumers receive more detail than they need, they start to rely on it, and that reliance couples them to the producer’s internals. The event stops being a business fact and becomes a copy of your database row. Ben Morris calls this a leaky event: one that mirrors a relational schema and leaks implementation detail into every subscriber.

There is a second problem that only shows up with time. Events drift. They start thin and get heavier, because adding a field to a message you already publish is the path of least resistance. Someone needs one more attribute, the producer adds it, and the payload grows. I have seen an “order created” event accumulate three teams’ worth of fields because each addition was individually reasonable and nobody owned the shape as a whole. Lean payloads are easier to evolve and version; fat ones calcify, because every consumer might depend on every field and you can no longer tell which.

The reason this matters to a business and not just an architect is change velocity. The whole point of splitting systems apart is so teams can move independently. A fat event quietly stitches them back together through the payload. You end up with the operational cost of distributed systems and the change cost of a monolith, which is the worst trade on the table.

Read the advice, then check who is selling it

Here is where I want to be careful, because the industry is not a neutral narrator on this question.

A good deal of the “just put the data in the event” guidance comes from vendors whose business is moving and storing that data. Confluent, the commercial vendor built around Apache Kafka, is the clearest example. Its platform is built around durable, replayable logs and stream processing, and both work better the more data flows through and stays inside the platform. Data-heavy events are good for the product. That does not make the advice wrong, but it does mean the incentive points one way, and you should read vendor architecture guidance with that in mind. The more of your state lives in their log, the stickier you are as a customer.

I want to be just as fair in the other direction, because the research here surprised me. The dogmatic version of the thin-event position falls over too. The claim that events should never carry rich data does not survive scrutiny, and there are real cases, which I come to later, where shipping the data is the right call. Each style has a standard objection raised against it. Against fat events, the objection is that they undermine loose coupling by tying every consumer to your data shape. Against thin events, it is that the callback forces a synchronous dependency on the producer and so breaks the very decoupling you were after. As I show in the next section, that second objection only holds if you call back naively on every read, and you do not have to. Treat neither position as a rule.

The honest picture is that thin and fat events trade different kinds of coupling. A thin event reduces the structural coupling to your data model, but it adds a dependency on availability: because the consumer has to call back for the detail, the producer needs to be up and reachable at the moment it asks. A fat event does the reverse. It removes the callback but ties everyone to your data shape. There is no free option. There is only the question of which kind of coupling you would rather carry, and for most enterprise integration the answer is to avoid coupling everyone to your internal model.

How thin events actually work

If the event only carries an id, the detail has to come from somewhere. The standard answer is notification plus callback: the event announces the change, and any consumer that needs more retrieves it through a well-defined API. The producer exposes a clean contract for “give me the current state of order 4821”, and consumers call it when they care.

This is genuinely better in one important way and genuinely worse in another, and you should know both before you commit.

It is better because the producer keeps control of its model. The API is a deliberate, versioned surface, and the versioning works in the consumer’s favour. A caller pins to the version of the contract it understands, so you can change your underlying data schema, or even how you store the data, without breaking anyone still asking for the previous version. The data schema and the published contract evolve on separate clocks. The event itself stays a stable business fact.
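A small sketch of what "separate clocks" can look like in practice. The storage shape, the two contract versions and every name here are hypothetical; the point is only that the internal row can change while callers pinned to an older version keep getting the answer they were promised.

```python
from dataclasses import dataclass


@dataclass
class OrderRow:
    """Internal storage shape. Assumed to have been refactored to minor units."""
    id: str
    total_minor_units: int
    currency: str


def to_v1(row: OrderRow) -> dict:
    # v1 callers were promised a decimal string called "total".
    whole, fraction = divmod(row.total_minor_units, 100)
    return {"order_id": row.id, "total": f"{whole}.{fraction:02d}"}


def to_v2(row: OrderRow) -> dict:
    # v2 exposes the amount and currency explicitly.
    return {
        "order_id": row.id,
        "total": {"amount_minor": row.total_minor_units, "currency": row.currency},
    }


SERIALIZERS = {1: to_v1, 2: to_v2}


def get_order(row: OrderRow, version: int) -> dict:
    """Answer 'give me the current state of this order' in the caller's version."""
    try:
        serializer = SERIALIZERS[version]
    except KeyError:
        raise ValueError(f"unsupported contract version: {version}")
    return serializer(row)


row = OrderRow(id="4821", total_minor_units=1999, currency="GBP")
v1_view = get_order(row, version=1)
v2_view = get_order(row, version=2)
```

The serialisers are the only code that has to know about both the storage shape and the published shape, so a storage change touches them and nothing downstream.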

It is worse because you have introduced a runtime dependency. If the producer is down or rate-limiting, the consumer cannot get its detail. You have also lost some visibility into the overall flow. Fowler makes this point well: with notification, the behaviour of the system is not explicit in any single piece of program text, and you often only understand it by watching a live system. A chain of thin events spread across ten services can be hard to follow when something goes wrong.

That cost is real, but it is smaller than it first looks, because the callback does not have to be synchronous or happen on every read. The thing consuming a thin event can be a local entity store that calls back once, keeps a cache of just the fields it needs, and updates that cache as new notifications arrive. Local consumers then read from the cache at memory speed and never touch the producer at all. The producer only has to be available when the cache refreshes, not every time someone wants the data.

The warm-up is the part to watch. If every consumer tries to build its whole store at once, you get a thundering herd hammering the producer’s callback API the moment a service deploys or a cache is rebuilt. Lazy loading takes the edge off: rather than backfilling everything up front, the store fetches a record the first time it is actually asked for and caches it from then on. The load spreads out across real demand instead of arriving in one spike, and records nobody reads never get fetched at all.
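The lazy-loading store described above can be sketched in a few lines. This is an illustration under assumptions: `fetch` stands in for whatever callback API the producer exposes, and the class name, event shape and refresh policy are mine rather than a prescribed design.

```python
from typing import Callable, Dict, Tuple


class LocalOrderStore:
    """Keeps only the fields this consumer needs, fetched on first use."""

    def __init__(self, fetch: Callable[[str], dict], fields: Tuple[str, ...]):
        self._fetch = fetch      # stand-in for the producer's callback API
        self._fields = fields    # the subset of fields this consumer cares about
        self._cache: Dict[str, dict] = {}

    def _load(self, order_id: str) -> None:
        full = self._fetch(order_id)
        self._cache[order_id] = {name: full[name] for name in self._fields}

    def get(self, order_id: str) -> dict:
        # Lazy load: call the producer only the first time a record is asked for.
        if order_id not in self._cache:
            self._load(order_id)
        return self._cache[order_id]

    def on_notification(self, event: dict) -> None:
        # Refresh records we already hold; records nobody has read stay unfetched.
        order_id = event["order_id"]
        if order_id in self._cache:
            self._load(order_id)


# Illustrative wiring with an in-memory stand-in for the producer.
_producer_data = {"4821": {"id": "4821", "status": "placed", "total": "19.99"}}
store = LocalOrderStore(lambda oid: dict(_producer_data[oid]), fields=("status",))
```

Refreshing on notification is one choice; dropping the cached entry and letting the next read reload it would spread the load even further, at the cost of one slower read.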

That idea is what the patterns below build on, so that thin events do not collapse into a storm of synchronous callbacks.

The patterns that let you stay thin at scale

Keeping data out of the payload is only practical if you have somewhere sensible to put it. Four patterns come up again and again across the primary sources, and between them they cover most situations.

Claim-check, for when the data really is big

Sometimes the payload genuinely is large: an image, a video, a scanned document. You do not want that on the bus, and you may not be allowed to put it there anyway. The claim-check pattern stores the payload in a shared external store and puts only a reference token on the message, so the broker never sees the data itself. Consumers read the token and fetch the payload from the store when they need it.
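As a rough sketch of the claim-check flow, with an in-memory dictionary standing in for the shared external store and a plain list standing in for the broker. The class, function and field names are hypothetical.

```python
import uuid


class BlobStore:
    """Stand-in for a shared external store such as an object store."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        token = str(uuid.uuid4())  # the claim check handed to consumers
        self._blobs[token] = data
        return token

    def get(self, token: str) -> bytes:
        return self._blobs[token]


def publish_scanned_document(store: BlobStore, bus: list, document_id: str, payload: bytes) -> None:
    # Store the large payload first, then publish only the reference token.
    token = store.put(payload)
    bus.append({"type": "document.scanned", "document_id": document_id, "claim_check": token})


def fetch_payload(store: BlobStore, message: dict) -> bytes:
    # Consumers redeem the token only when they actually need the bytes.
    return store.get(message["claim_check"])
```

The message stays a few hundred bytes regardless of how large the scan is, so broker size limits stop being a design constraint for this event.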

There is a practical reason this is not optional at the top end. Brokers are tuned for small messages and impose size limits. Azure Service Bus, Kafka and SQS all sit in the region of 256 KB to a megabyte by default. Those limits are loosening over time, but the direction of travel does not change the principle: large blobs do not belong in the event.

CQRS and read models, so consumers stop calling back

The callback dependency is the main weakness of thin events, and this is how you blunt it. With CQRS you separate the write side from the read side. The write side publishes a thin change-notification, and a query processor uses it to update a local read model or materialised view. Consumers then read from their own local view at speed, instead of calling the producer on every event.

You are still moving data, but you are moving it into a shape the consumer owns and controls, rather than coupling the consumer to the producer’s model. The trade is eventual consistency: the read model lags the write model by a little, and you have to design for that. For most reporting, search and dashboard use cases, slightly stale is perfectly acceptable.
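A minimal sketch of a query-side projection, assuming the thin notification carries only an order id and that `fetch` stands in for the producer's API. The view shape (orders grouped by status) and all names are chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, List


class OrdersByStatusView:
    """A read model in the consumer's own shape: order ids grouped by status."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch                  # stand-in for the producer's API
        self._status_of = {}                 # order_id -> last status we projected
        self._ids_by_status = defaultdict(set)

    def apply(self, notification: dict) -> None:
        # One fetch per notification, then the view answers queries locally.
        order_id = notification["order_id"]
        current = self._fetch(order_id)
        previous = self._status_of.get(order_id)
        if previous is not None:
            self._ids_by_status[previous].discard(order_id)
        self._status_of[order_id] = current["status"]
        self._ids_by_status[current["status"]].add(order_id)

    def ids_with_status(self, status: str) -> List[str]:
        # Served from the local view; may lag the write side until apply() runs.
        return sorted(self._ids_by_status.get(status, ()))
```

Between a change on the write side and the next `apply`, the view returns the old answer. That window is the eventual consistency you have to design for.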

Transactional outbox, so the event and the state agree

There is a quiet bug at the heart of many event-driven systems. A service updates its database and then publishes an event. If it crashes between the two, the state and the event disagree, and now your systems are out of sync in a way that is hard to detect.

The transactional outbox closes that gap. You write the state change and the event into the same database transaction, so either both happen or neither does. A separate process then reads the outbox table and publishes the events. Delivery is at-least-once, which means an event can arrive more than once, so consumers have to be idempotent: processing the same event twice must be safe. That is a small discipline to adopt and it removes a whole category of consistency bug.
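Here is a small sketch of the outbox using SQLite from the Python standard library, with an idempotent consumer alongside. The table layout, function names and event shape are assumptions for illustration; a production relay would also need polling, batching and error handling.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: str) -> None:
    # State change and event row share one transaction: both commit or neither.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        event = {"type": "order.placed", "order_id": order_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Publish pending outbox rows. A crash after publish() but before the
    UPDATE means the row is sent again next time: at-least-once delivery."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(row_id, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


class IdempotentConsumer:
    """Remembers event ids it has handled so a redelivery is a no-op."""

    def __init__(self):
        self.seen = set()
        self.handled = []

    def handle(self, event_id: int, event: dict) -> None:
        if event_id in self.seen:
            return
        self.seen.add(event_id)
        self.handled.append(event["order_id"])
```

Using the outbox row id as the deduplication key is one simple option; any stable event identifier that survives redelivery would serve the same purpose.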

Domain events and integration events are not the same thing

The last pattern is less a mechanism and more a boundary. Inside a service, your own events can be as rich as you like. They are private. The events you publish to the rest of the organisation should be a different, deliberately stable contract. Keeping internal domain events separate from external integration events stops your model leaking and lets you change your internals without breaking anyone downstream. The integration event is versioned and treated as a public API, because that is what it is.
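One way to make that boundary visible in code is an explicit mapping function between the two event types. The sketch below is illustrative only: the internal fields, the event type string and the version number are all invented for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OrderPlacedDomainEvent:
    """Private to the service: as rich as the internals need it to be."""
    order_id: str
    customer_id: str
    pricing_rule_ids: Tuple[str, ...]
    warehouse_hint: str
    fraud_score: float


INTEGRATION_SCHEMA_VERSION = 1  # bumped deliberately, like any public API


def to_integration_event(event: OrderPlacedDomainEvent) -> dict:
    """The published contract lists its fields explicitly, so nothing from the
    domain event reaches other teams unless someone adds it here on purpose."""
    return {
        "type": "order.placed",
        "version": INTEGRATION_SCHEMA_VERSION,
        "order_id": event.order_id,
    }
```

Because the mapping is an allow-list rather than a dump of the domain object, adding a field to the internal event changes nothing on the bus until this function is edited and reviewed.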

This is the discipline that stops the thin-to-fat drift. When there is a named boundary and an owner for the published contract, fields cannot sneak in unnoticed.

A way to compare the two

When I am helping a team decide, I find it useful to lay the trade-offs side by side rather than argue in the abstract.

Consideration	Thin event (notification)	Fat event (state transfer)
Couples consumers to	the producer being available	the producer’s data model
Payload size	small and stable	grows over time
Consumer needs producer	yes, to fetch detail later	no, data is already in the event
Versioning cost	lower, the contract is small	higher, every field is a commitment
End-to-end visibility	harder, flow is spread across calls	easier, the event tells the whole story
Best when	internal integration, where teams evolve independently	consumers that cannot call back (external or offline), plus audit and replay

The table makes the real point clear. Neither column is free. You are choosing which problem you would rather have.

When carrying the data is the right call

I default to thin, but I do not pretend it is always right, and there are cases where shipping the data is the better engineering decision.

Cross-organisation integration is one, and it is worth being clear about why. The thin-event model assumes the consumer can call back to your API for the detail. Across a company boundary that assumption often breaks: the other organisation may have no network path into your systems, no credentials to authenticate, and no claim on your rate limits or uptime, and you may not want to expose an internal API to an outside party in the first place. A fat event sidesteps all of that by carrying everything the partner needs, so they never have to reach back into your estate. Inside one organisation, where a shared API is easy, that pressure mostly disappears, which is why the same argument does not push you toward fat events internally.

Audit and replay is another case: if you need to reconstruct exactly what was known at the time something happened, a fat event captures that snapshot in a way a callback never can, because the source data has since moved on.

And consumers that must keep working while the producer is down have a genuine reason to want the data in hand rather than a promise they can fetch it later.

There is also a failure mode worth naming on the other side. If you go thin and then decompose your services too far, you can end up with chatty synchronous callbacks flying between dozens of tiny services. That is a distributed monolith: all the latency and fragility of the network with none of the independence you were promised. Thin events are a tool, not a virtue. The read-model pattern exists precisely so you do not fall into this.

What I would actually do

For internal enterprise integration, I start thin. The event announces a business fact and carries an id, and I treat that as the default until something gives me a concrete reason to do otherwise.

When a consumer needs detail, I reach for a read model before I reach for a fatter event, so the callback dependency does not turn into a bottleneck. When the data is genuinely large, claim-check keeps it off the bus. When state and events have to agree, the outbox makes that atomic. And I keep a hard line between the rich events I use inside a service and the lean, versioned ones I publish to everyone else.

When I do decide to carry the data, I make it a conscious choice with a written reason, usually cross-organisation reach, audit, or a consumer that has to survive my downtime. That note matters, because it stops the exception quietly becoming the norm.

The thread running through all of this is the same business outcome: systems that can change without waiting for each other. Fat events feel generous in the moment and charge you later in coupling. Thin events ask for a little more engineering up front and pay it back in the freedom to move. For most enterprises, most of the time, that is the better trade.

It is worth remembering which way the industry tends to push while you make that call. A lot of vendor guidance nudges you toward fat events, because the more of your data flows through and settles inside their platform, the more valuable that platform becomes to them. That does not make the advice wrong, but the incentive runs toward fat while your own interest, most of the time, runs toward thin. Weigh the advice knowing who benefits from it.