It’s a long time that I have contacts from the Alfresco community with a similar question like the one described below.
After some time the audit trail is enabled in my Alfresco instance, the database dimension explodes. Is the audit trail connected in some way with the data explosion? How can I solve it?
Most of the times this question is connected to the A.A.A.R. analytics that is able to collect (and analyze) audit data together with repository data and workflow data. The bad news is that this issue is completely true. Of course if you request to track every action/event, you cannot avoid to store a lot amount of data. The good news is that there are some ways to solve this problem, directly with the bundled services in Alfresco.
In this post I would like to share a way to solve the problem, deleting the “old” actions audited in a very simple manner.
Description of the development environment
Before sharing the solution, let’s see how the development environment is defined. In a vanilla installation of Ubuntu 14.04 LTS with a brand new Alfresco CE 5.1-EA on board, let’s enable the audit trail following the instructions detailed here. Last but not least, pgadmin has been installed with the simple command below.
sudo apt-get install pgadmin3
How to check the Alfresco audit table
Using pgadmin3, let’s open the alfresco database and point to the
alf_audit_entry table. As you can see, without any access or action into Alfresco, the table is empty.
At every action in Alfresco, one or more rows are stored in the table. More actions and users act on Alfresco, more data increases its dimension.
How to populate the audit trail
Now that we know where the audit trail is stored, let’s act into Alfresco accessing with a successful login several times. As you can see below, the new screenshot shows the
alf_audit_entry table filled with some rows.
From another point of view, using the bundled REST service to retrieve the audit trail in Alfresco, we can see the audit actions using a browser. Below you can see the enquiry of the alfresco-access application, used to audit every access to Alfresco.
How to clean the audit trail
Now that the audit trail is filled with lots of actions, we can start the core discussion on how to reduce it. In other words: how to clean it. Taking a look to the Alfresco documentation, you can find another bundled REST service, used in particular to delete partially or entirely the actions. The goal is clear: executing the REST clean service to delete (partially or entirely) the audit trail.
To develop the task we use a Google Chromium addon called Advanced REST Client. The reason why this tool is used is the method requested from the Alfresco REST service. Infact, the Alfresco audit service clear, requests a POST method.
Below you can see the invocation of the Alfresco audit service clear. In our case without any limit (developing in that way an entire cleaning of the actions) but you can take a look to the Alfresco documentation, to easily develop an partial cleaning of the audit trail.
As you can see from the returning result of the service, in JSON, the cleared actions are totally six but the audited actions were only five. The sixth action deleted is the one generated during the access of the REST service itself.
Now that the audit trail is cleaned, let’s check again the
alf_audit_entry table and we can see that it is empty.
Other possibile solutions
To complete the discussion, I would like to submit to the attention other solutions to reach the same goal.
Directly in the official Alfresco documentation you can find a description of how to schedule or manually trigger the propTablesCleanupTrigger script to clean up audit and property values tables (alf_audit_ and alf_prop_ tables). This is a nice approach, if you want to periodically apply the audit trail cleaning.
Axel Faust explains how to use an SQL script for properly clearing Alfresco unused audit data / property values. This solution is available for PostgreSQL and MySQL.
Probably less useful in practice but with this Alfresco REST service you can delete a single audit actions.
Impact on the A.A.A.R. analytics
Before the conclusions, I would like to briefly discuss the impact of audit trail cleaning in case of analytics using A.A.A.R. What to say is simple: the A.A.A.R. extraction of data is incremental. This means that the actions extracted are all and only the ones generated after the last extraction (based on the
id value). For this reason the cleaning of the audit trail is completely safe, under the condition that the
AAAR_Extract script is executed periodically and with success.
In this post is shared a way to delete the audit actions from an Alfresco instance, in particular from the Alfresco database. This solution is one of the possibile solutions into Alfresco and it’s very useful when the audit trail generates an explosion of data (very common in practice). This solution could be used also when A.A.A.R. analytics is installed, because A.A.A.R. Data Mart stores the historical data without the need to maintain the same data into Alfresco.