Event Logs

Most system have some sort of event logs – i.e. what has happened in the system and who did it. And sometimes it has a dual existence – once as an “audit log”, and once as event log, which is used to replay what has happened.

These are actually two separate concepts:

  • the audit log is the trace that every action leaves in the system so that the system can later be audited. It’s preferable that this log is somehow secured (will discuss that another time)
  • the event log is a crucial part of the event-sourcing model where the database only stores modifications, rather than the current state. The current state is the obtained after applying all the stored modifications until the present moment. This allows seeing the state of the data at any moment in the past.

There are a bunch of ways to get that functionality. For audit logs there is Hibernate Envers, which stores all modifications in a separate table. You can also have a custom solution using spring aspects or JPA listeners (that store modifications in an audit log table whenever a change happens). You can store the changes in multiple ways – as key/value rows (one row per modified field), or as objects serialized to JSON.

Event-sourcing can be achieved by always inserting a new record instead of updating or deleting (and incrementing a version and/or setting a “current” flag). There some event-sourcing-native databases – Datomic and Event Store. (Note: “event-sourcing” isn’t equal to “insert-only”, but the approach is very similar)

They still seem pretty similar – both track what has happened on the system in a detailed way. So can an audit log be used as an event log, and can event logs be used as audit logs? And is there a difference?

Audit logs have specific features that you don’t need for event sourcing – storing business actions and being secure. When a user logs in, or views a given item, that wouldn’t issue an update (well, maybe last_login_time or last_seen can be updated, but those are side-effects). But you still may want to see details about the event in your audit log – who logged in, when, from what IP, after how many unsuccessful attempts. You may also want to know who read a given piece of information – if its sensitive personal data, it’s important to know who has seen it and whether the access was not some form of “stalking”. In the audit log you can also have “business events” rather than individual database operations. For example “checkout a basket” involves updating the basket status, decreasing the availability of the items, updating the purchase history. And you may want to just see “user X checked out basket Y with items Z at time W”.

Another very important feature if the audit logs is their integrity. You should somehow protect them from modification (e.g. by digitally signing them). Why? Because at some point that may be evidence in court (e.g. after some employee misuses your system), so the more secure and protected it is, the better evidence it is. This isn’t needed for event-sourcing, obviously.

Event sourcing has other requirements – every modification with every field must be stored. You should also be able to have “snapshots” so that the “current state” isn’t recalculated every time. The event log isn’t just a listener or an aspect around your persistence layer – it’s at the core of your data model. So in that sense it has more impact on the whole system design. Another thing event logs are useful for is consuming events from the system. For example, if you have an indexer module and you want to index changes as the come, you can “subscribe” the indexer to the events and each insert would trigger an index operation. That is not limited to indexing, and can be used for synchronizing (parts of) the state with external systems (which a Search engine is just one example of).

But due to the many similarities – can’t we have a common solution that does both? An audit log is a poor event-sourcing tool per-se, and event-sourcing models are not sufficient for audit logs.

Since event sourcing is a design choice, I think it should take the leading role – if you use event sourcing, then you can add some additional inserts for non-data-modifying business-level operations (login, view), and have your high-level actions (e.g. checkout) represented as events. You can also get a scheduled job to sign the entries (that may mean doing updates in an insert-only model, which would be weird, but still). And you will get a full-featured audit log with a little additional effort over the event sourcing mechanism.

The other way around is a bit tricky – you can still use the audit log to see the state of a piece of data in the past, but that doesn’t mean your data model will be insert-only. If you rely too much on the audit log as if it was event sourcing, maybe you need event sourcing in the first place. If it’s just occasional interest “what happened here two days ago”, then it’s fine having just the audit log.

The two concepts are overlapping so much, that it’s tempting to have both. But in conclusion, I’d say that:

  • you always need an audit log while you don’t necessarily need event sourcing.
  • if you have decided on using event-sourcing, walk the extra mile to get a proper audit log

But of course, as usual, it all depends on the particular system and its use-cases.

The post Event Logs appeared first on Bozho's tech blog.