Following the recent release of A.A.A.R. analytics v4.0, I would like to share some tests that measure the improvements in this major version during the extraction of information from Alfresco into the A.A.A.R. Data Mart.
The servers
The test environment consists of two servers: one for Alfresco and one for Pentaho. The Alfresco server is a Virtual Machine, detailed below.
Processor of the physical machine: | Intel i7 |
Number of CPU cores: | 2 |
RAM of the VM: | 5 GB |
Operating System: | Ubuntu 15.04 |
Alfresco version: | CE 5.0.d |
Number of documents in Alfresco: | 1,000,0441 |
Number of folders in Alfresco: | 1,057 |
Audit trail: | disabled |
Workflow instances: | default |
The Pentaho server is a Virtual Machine, detailed below.
Processor of the physical machine: | Intel i7 |
Number of CPU cores: | 2 |
RAM of the VM: | 5 GB |
Operating System: | Ubuntu 14.04.2 LTS |
Pentaho version: | CE 5.4 |
A.A.A.R. version: | 4.0 |
Pentaho Data Integration memory: | 3 GB |
Test n.1 – Extraction on PostgreSQL
The first test consists of the very first extraction into a brand new A.A.A.R. installation on a PostgreSQL database. Below are the results of the test after the execution of the AAAR_Extract command.
PostgreSQL - First extraction (*)
Get audit: | 00:00:08 |
Get nodes, REST extraction: | 01:04:20 |
Get nodes, DATA MART loading: | 00:12:03 |
Get workflows: | 00:00:45 |
Duration: | 01:17:16 |
(*) Time formatted in HH:MM:SS.
After the extraction of the one million documents, and considering that the ETL is incremental, a second extraction was run and measured. Below are the results of that test in the same format.
PostgreSQL - Second extraction (*)
Get audit: | 00:01:02 |
Get nodes, REST extraction: | 00:25:18 |
Get nodes, DATA MART loading: | 00:06:13 |
Get workflows: | 00:00:15 |
Duration: | 00:32:48 |
(*) Time formatted in HH:MM:SS.
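As a quick cross-check of the PostgreSQL figures above, the phase timings can be summed and compared with the reported durations. The following is a minimal Python sketch; the helper names (`to_seconds`, `to_hms`, `total`, `share`) and the dictionary layout are illustrative assumptions and are not part of A.A.A.R.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(seconds: int) -> str:
    """Convert a number of seconds back into an HH:MM:SS string."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Phase timings copied from the two PostgreSQL result tables.
postgres_runs = {
    "first": {
        "Get audit": "00:00:08",
        "REST extraction": "01:04:20",
        "DATA MART loading": "00:12:03",
        "Get workflows": "00:00:45",
    },
    "second": {
        "Get audit": "00:01:02",
        "REST extraction": "00:25:18",
        "DATA MART loading": "00:06:13",
        "Get workflows": "00:00:15",
    },
}


def total(phases: dict) -> str:
    """Sum all phase timings of one run and format the result as HH:MM:SS."""
    return to_hms(sum(to_seconds(value) for value in phases.values()))


def share(phases: dict, name: str) -> float:
    """Fraction of the summed run time spent in the phase called `name`."""
    return to_seconds(phases[name]) / sum(to_seconds(v) for v in phases.values())


for run, phases in postgres_runs.items():
    print(run, total(phases), f"REST share: {share(phases, 'REST extraction'):.0%}")
```

With these inputs, the summed phases match the reported durations of both runs, and the REST extraction accounts for roughly 83% of the first run and 77% of the second.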
Test n.2 – Extraction on MySQL
The second test consists of the very first extraction into a brand new A.A.A.R. installation on a MySQL database. Below are the results of the test after the execution of the AAAR_Extract command.
MySQL - First extraction (*)
Get audit: | 00:00:12 |
Get nodes, REST extraction: | 01:20:48 |
Get nodes, DATA MART loading: | 00:29:33 |
Get workflows: | 00:00:36 |
Duration: | 01:52:09 |
(*) Time formatted in HH:MM:SS.
After the extraction of the one million documents, and considering that the ETL is incremental, a second extraction was run and measured. Below are the results of that test in the same format.
MySQL - Second extraction (*)
Get audit: | 00:00:52 |
Get nodes, REST extraction: | 00:16:28 |
Get nodes, DATA MART loading: | 00:00:47 |
Get workflows: | 00:00:14 |
Duration: | 00:18:21 |
(*) Time formatted in HH:MM:SS.
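To compare the first and the incremental runs across the two databases, the reported durations can be turned into a simple ratio. The sketch below uses only the "Duration" values from the four result tables; the function names (`parse_hms`, `first_to_second_ratio`) and the `durations` dictionary are hypothetical names chosen for illustration.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Reported "Duration" values, copied from the result tables.
durations = {
    "PostgreSQL": {"first": "01:17:16", "second": "00:32:48"},
    "MySQL": {"first": "01:52:09", "second": "00:18:21"},
}


def first_to_second_ratio(database: str) -> float:
    """How many times longer the first extraction took than the second one."""
    runs = durations[database]
    # Dividing one timedelta by another yields a plain float.
    return parse_hms(runs["first"]) / parse_hms(runs["second"])


for database in durations:
    print(f"{database}: first run took {first_to_second_ratio(database):.2f}x the second run")
```

On the reported durations, the first extraction took about 2.4 times as long as the second on PostgreSQL and about 6.1 times as long on MySQL. These ratios describe this single test environment only.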
Conclusion
This post shares the results of some tests on the recent release of A.A.A.R. analytics v4.0. The tests focus on the repository extraction, since one of the most important improvements of this major version is the redesign of the Alfresco extraction.