Audit logs

Introduction

In this article, I will present the idea of an audit log and how we can implement it in the application and database level using Java and PostgreSQL.

What is an audit log

An audit log, also known as an audit trail, is a chronological set of records that provides documentary evidence.

Martin Fowler gives a definition of an audit log:

An audit log is the simplest, yet also one of the most effective forms of tracking temporal information. 
The idea is that any time something significant happens you write some record indicating what happened and when it happened.

An audit log can be implemented in many ways. It can be just a log file or record in the database. Usually, an audit log is easy to write but much harder to read and analyze. There are also many reasons why we want to have it, e.g.:

  • to reconstruct events that can be used for intrusion detection, and problem analysis.
  • to demonstrate compliance. Many regulations, like GDPR, PSD define requirements to have some version of an audit log.

We also need to remember what is the purpose of an audit log is not. In my opinion, an audit log should not be analyzed by business analytics. It would leak internal implementation that can change over time. For this purpose we should share data efficiently, e.g. emitting business events.

From my personal experience, I have a feeling that product requirements are not strictly defined about an audit log. This can lead to misunderstanding what needs to be implemented. We need to answer detailed questions: what do we want to track? How long to store data?

Audit log on application level - Hibernate

In many applications, audit logs are created on the application side. As an example, I mention Hibernate ORM Envers. It allows auditing of all mappings defined by the JPA specification.

You can use it by:

  • adding the hibernate-envers dependency to the project,
  • using annotation @Audited on your entity or properties,
  • make sure that your entity uses immutable and unique identifiers.

The example of an entity can be presented as below:

@Audited
@Entity
public class MyEntity {
  @Id
  @GeneratedValue
  private Integer id;

  @Audited(targetAuditMode = NOT_AUDITED)
  private String notAuditedProperty;

  @Column
  private String auditedProperty;
}

Database structure requires preparation

CREATE TABLE my_entity (
  id SERIAL PRIMARY KEY,
  not_audited_property VARCHAR(255) NOT NULL,
  audited_property VARCHAR(255) NOT NULL
);
CREATE TABLE my_entity_AUD (
  id bigint NOT NULL,
  REV integer NOT NULL,
  REVTYPE TINYINT,
  audited_property VARCHAR(255) NOT NULL,
  PRIMARY KEY(id, REV)
);
CREATE TABLE REVINFO (
  REV INTEGER GENERATED BY DEFAULT AS IDENTITY,
  REVTSTMP BIGINT,
  PRIMARY KEY (REV)
);
ALTER TABLE my_entity_AUD ADD CONSTRAINT fk_my_entity_AUD_REV foreign key (REV) references REVINFO;

Every change will be logged as a revision.

Explicit audit log on application level

If any the of framework doesn’t work for us, developers can decide to make an explicit implementation. In this case, every audit log trail needs to be created “manually”. This way gives us the biggest flexibility for when and what we want to log. For example, we can log access data operations. Custom solutions come with the price of: implementation, testing and maintenance. This solution is for sure the most expensive. Additionally, this method doesn’t track changes made from SQL console.

The disadvantage of audit log on application level

For me, the main disadvantage of audit log on application level is that it is on application-level ;-) Naturally, it will not log the changes made directly from the database level. Realistically, developers from time to time do some database changes, data migrations. After such manual changes, we have no information in our audit log.

The second disadvantage is less technical, more philosophical. For me, the audit log should be immutable. If we use hibernate-envers and we change the table name, we need to write a migration script for the main table and audit the log table. This also can break some compliance rules.

Audit log on database level

In the book The Art of PostgreSQL we can find a recommendation on how to create an audit log on the database level.

hstore

Before we go further, let’s understand hstore field. hstore data type is for storing key-value pairs in a single value.

To enable PostgreSQL hstore extension execute: CREATE EXTENSION hstore;

Audit log SQL implementation

Let’s create the main table:

CREATE TABLE my_entity (
id SERIAL PRIMARY KEY,
property_1 VARCHAR(255) NOT NULL,
property_2 VARCHAR(255) NOT NULL
);

Now, the table for audit logs:

CREATE TABLE audit (
  change_date TIMESTAMPTZ DEFAULT NOW(),
  before HSTORE,
  after HSTORE
);

Now, the function will convert the record to HSTORE format and insert it into the audit table.

CREATE FUNCTION audit() RETURNS TRIGGER LANGUAGE PLPGSQL 
AS $$ 
BEGIN   
    INSERT INTO audit(before, after)
        SELECT HSTORE(OLD), HSTORE(NEW);
    RETURN NEW; 
END;
$$;

Let’s add the trigger for my_entity table.

CREATE TRIGGER audit AFTER UPDATE ON my_entity FOR EACH ROW EXECUTE PROCEDURE audit();

This trigger is dedicated to being used only for 1 table. We can modify the table by adding table_name, so all tables will write audit logs to one table, and we will be able to distinguish them.

Testing

Insert and update value:

INSERT INTO my_entity (property_1, property_2) VALUES ('value1', 'value2');
UPDATE my_entity SET property_1 = 'new_value' WHERE id = 4;

Let’s see how the audit table is structured:

SELECT * FROM audit;
change_date before (hstore) after (hstore)
2022-02-21 17:10:19.845434+01 “id”=>”4”, “property_1”=>”value1”, “property_2”=>”value2” “id”=>”4”, “property_1”=>”new_value”, “property_2”=>”value2”

Now, let’s say we want to see only changes for property_1 column:

SELECT (before -> 'property_1')::VARCHAR as property_1, after - before as diff
FROM audit
limit 1
property_1 diff
value1 “property_1”=>”new_value”


Audit log SQL implementation - pros and cons

As we see in the example, we achieved basic use cases. All changes: made from the SQL level and application level are audited in the database.

As a disadvantage, I would point, out that if we want to skip some field from the audit log, our SQL code became more complicated. As a backend developer, I prefer to take the business logic out of the database.

Audit log risks

An audit log may be very helpful in many situations, but it also provides some risks to the system, e.g.:

  • It can grow too much. We need to ask ourselves: How long do we want to store data? What are the business and technical requirements?
  • It can be the source of data leaks. If an audit log is not properly secured, somebody can use it to get secret information.

Summary

As always, every solution has its own advantages and disadvantages. The key is a collection of clear requirements what we need. Then we can choose a solution that fits us the best.

Personally, I like to have an SQL level audit log because it fulfills business needs in most cases: easy to implement and keeps changes done even via SQL console.