A.A.A.R. – Benchmark

After the recent release of A.A.A.R. analytics v4.0, I would like to share some tests that check the improvements of this major version during the extraction of information from Alfresco into the A.A.A.R. Data Mart.

The servers

The test environment consists of two servers: one for Alfresco and one for Pentaho. The Alfresco server is a Virtual Machine detailed below:

Processor of the physical machine: Intel i7
Number of CPU cores: 2
RAM of the VM: 5 GB
Operating System: Ubuntu 15.04
Alfresco version: CE 5.0.d
Number of documents in Alfresco:  1,000,0441
Number of folders in Alfresco: 1,057
Audit trail: disabled
Workflow instances: default

The Pentaho server is defined as a Virtual Machine detailed below:

Processor of the physical machine: Intel i7
Number of CPU cores: 2
RAM of the VM: 5 GB
Operating System: Ubuntu 14.04.02 LTS
Pentaho version: CE 5.4
A.A.A.R. version: 4.0
Pentaho Data Integration memory: 3 GB

Test n.1 – Extraction on PostgreSQL

The first test is the very first extraction into a brand new A.A.A.R. installation on a PostgreSQL database. Below are the results of the test after the execution of the AAAR_Extract command.

PostgreSQL - First extraction(*)

Get audit: 00:00:08,
Get nodes: REST extraction: 01:04:20, DATA MART loading: 00:12:03,
Get workflows: 00:00:45.

Duration: 01:17:16

(*) Time formatted in HH:MM:SS.
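As a quick sanity check on the table above, the step times can be summed and compared with the reported duration. The snippet below is only an illustrative sketch: the helper names `to_seconds` and `to_hms` are hypothetical and are not part of A.A.A.R.; the timings are copied from the table.

```python
# Illustrative helpers (hypothetical names, not part of A.A.A.R.).

def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back into an HH:MM:SS string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Step timings copied from the first PostgreSQL extraction.
postgresql_first = {
    "Get audit": "00:00:08",
    "Get nodes (REST extraction)": "01:04:20",
    "Get nodes (DATA MART loading)": "00:12:03",
    "Get workflows": "00:00:45",
}

total = to_hms(sum(to_seconds(value) for value in postgresql_first.values()))
print(total)  # 01:17:16, the same value as the reported duration
```

The breakdown also shows that the REST extraction accounts for most of the elapsed time in this run.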

After the extraction of the one million documents, and since the ETL is incremental, a second extraction was run and measured. Below are the results of the test in the same format.

PostgreSQL - Second extraction(*)

Get audit: 00:01:02,
Get nodes: REST extraction: 00:25:18, DATA MART loading: 00:06:13, 
Get workflows: 00:00:15. 

Duration: 00:32:48

(*) Time formatted in HH:MM:SS.

Test n.2 – Extraction on MySQL

The second test is the very first extraction into a brand new A.A.A.R. installation on a MySQL database. Below are the results of the test after the execution of the AAAR_Extract command.

MySQL - First extraction(*)

Get audit: 00:00:12,
Get nodes: REST extraction: 01:20:48, DATA MART loading: 00:29:33,
Get workflows: 00:00:36.

Duration: 01:52:09

(*) Time formatted in HH:MM:SS.

After the extraction of the one million documents, and since the ETL is incremental, a second extraction was run and measured. Below are the results of the test in the same format.

MySQL - Second extraction(*)

Get audit: 00:00:52,
Get nodes: REST extraction: 00:16:28, DATA MART loading: 00:00:47, 
Get workflows: 00:00:14. 

Duration: 00:18:21

(*) Time formatted in HH:MM:SS.
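To compare the first and the second (incremental) extraction on each database, the reported durations can be turned into a simple ratio. This is a rough sketch that uses only the "Duration" values listed above; the function names are hypothetical, and the ratio says nothing about other environments or repository sizes.

```python
# Rough comparison of first vs. second extraction, using the reported durations.

def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(first: str, second: str) -> float:
    """Return how many times shorter the second run is than the first."""
    return to_seconds(first) / to_seconds(second)


# (first extraction, second extraction) as reported in the tables.
reported = {
    "PostgreSQL": ("01:17:16", "00:32:48"),
    "MySQL": ("01:52:09", "00:18:21"),
}

for database, (first, second) in reported.items():
    ratio = speedup(first, second)
    print(f"{database}: second extraction is about {ratio:.2f}x shorter")
```

With these figures the second run is roughly 2.4 times shorter on PostgreSQL and roughly 6.1 times shorter on MySQL.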

Conclusion

This post shares the results of some tests on the recent release of A.A.A.R. analytics v4.0. The tests focus on the repository extraction, since one of the most important improvements of this major version is the redesign of the Alfresco extraction.
