23 Dec

Review of the Pentaho Data Integration video course by Itamar Steinberg

In this post I have the opportunity to share a review of a brand new Pentaho Data Integration video course by Itamar Steinberg. The full name of the course is mastering data integration (ETL) with pentaho kettle PDI, and it is available for purchase on the Udemy website.

The video course is composed of 80 lectures and more than 10 hours of content. It is a walk-through of a real ETL project using Pentaho Data Integration (also known as Kettle), starting from the very beginning of the ETL design with some easy steps that become more and more complex, layer by layer, as you go forward. This Pentaho Data Integration video course also covers some basic concepts of data integration and data warehouse techniques.

Read More

11 Dec

Uploading a mondrian schema to Pentaho using PDI

This post shares a solution for uploading a mondrian schema to Pentaho BA Server, using the REST API through a PDI transformation. If you take a look at this thread of the Pentaho forum, this seems to be a common problem, so we think it is a good idea to share the solution with the community. I hope this post will be helpful.

Development environment

The source code was developed and tested on a Windows platform and on a Linux Ubuntu 14.04 LTS platform. Pentaho BA Server and Pentaho Data Integration are both at version 5.2.

Use case

Starting from a file containing the mondrian schema (an XML file), our goal is to develop a PDI transformation that defines a Pentaho BA Server data source. Since we want to define the data source on the mondrian schema, what we need is a so-called “Analysis Data Source”.
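Before uploading, it can help to check that the file really is a mondrian schema and to read the schema name from it. The snippet below is only a minimal sketch in Python, not the PDI transformation itself: it assumes the root element of the file is <Schema> with a name attribute, and the sample schema and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def mondrian_schema_name(xml_text):
    """Return the name attribute of the root <Schema> element."""
    root = ET.fromstring(xml_text)
    if root.tag != "Schema":
        raise ValueError("not a mondrian schema: root element is <%s>" % root.tag)
    name = root.get("name")
    if not name:
        raise ValueError("the <Schema> element has no name attribute")
    return name


# Hypothetical, stripped-down schema used only for illustration.
SAMPLE = '<Schema name="SalesAnalysis"><Cube name="Sales"/></Schema>'

print(mondrian_schema_name(SAMPLE))
```

A check like this fails early on a wrong or malformed file, before any request is sent to the server.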

Read More

08 Dec

CMIS queries on Alfresco CE 5.0.c


CMIS is the most important standard for ECM interoperability. Alfresco is compliant with CMIS 1.1 through Apache Chemistry, and CMIS queries are one of the most powerful ways to use this ECM. This post shares some tests on CMIS queries run against the brand new Alfresco Community Edition v5.0.c.

Description of the test environment

I have already shared the environment I prefer for testing Alfresco and CMIS. It is composed of:

  • An Alfresco v5.0.c installation. In this case I use a vanilla installation with 10K uploaded documents.
  • A Pentaho Data Integration (Kettle) v5.2 installation.
  • CMIS Input plugin for PDI to develop queries on Alfresco.

To understand how the CMIS Input plugin works, take a look at the demonstration page here.
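To give an idea of what a CMIS query looks like, here is a minimal sketch in Python that builds a query string on cmis:document. It is not taken from the tests of this post: the helper names are made up for illustration, and the only thing it does is compose the SQL-like text, escaping backslashes and single quotes in the string literal with a backslash.

```python
def cmis_quote(value):
    """Quote a string literal, backslash-escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_name_query(pattern, columns=("cmis:objectId", "cmis:name")):
    """Build a CMIS query selecting documents whose name matches a LIKE pattern."""
    return "SELECT {} FROM cmis:document WHERE cmis:name LIKE {}".format(
        ", ".join(columns), cmis_quote(pattern)
    )


# Hypothetical pattern: every document whose name starts with "report".
print(build_name_query("report%"))
```

The resulting text is the kind of statement that can be pasted into a CMIS client to be run against the repository.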

Read More

17 Feb

Advanced tutorial on Pentaho Sparkl (three more tutorials)

A few weeks after the release of the brand new Pentaho Sparkl Application Builder, I shared a basic tutorial to start developing on that platform. Since its release, it has attracted a lot of visits and a good deal of interest, so here are three brand new articles on some advanced features.

I hope they will help you in some way! 😉

03 Jan

How to install Pentaho Data Integration 5 (aka Kettle)

In this tutorial we are going to see how to install Pentaho Data Integration 5. PDI 5 (called Kettle) is one of the most powerful tools of the Pentaho Suite, providing a pure (and complete) ETL tool. This tutorial is an extract from the complete wiki section dedicated to this amazing tool.

Whether you have a Linux-based operating system or a Windows-based platform, the tutorial should work in either case because, as you will see, the steps are very simple and do not depend on the platform.


Before starting to install PDI 5 (aka Kettle), check that Java is installed on your system. To check it, simply execute the command below from a terminal.

java -version

If you don’t have it, below you can find a tutorial on how to install it. Please remember that PDI 5 requires Java7.
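If you prefer to automate the check, the sketch below shows one possible way to do it in Python. It is only an illustration under a couple of assumptions: the version banner has the usual form (for example java version "1.7.0_79"), and the function names are made up for this example. Note that java -version writes its banner to standard error, not to standard output.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from the text printed by 'java -version'."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first = int(match.group(1))
    second = int(match.group(2) or 0)
    # Old releases are numbered 1.6, 1.7, 1.8: the major version is the second number.
    return second if first == 1 else first


def installed_java_major():
    """Run 'java -version' and parse the banner it writes to standard error."""
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return java_major_version(result.stderr)


# Parsing a sample banner, so the example also runs on a machine without Java.
print(java_major_version('java version "1.7.0_79"'))
```

On a real machine, calling installed_java_major() and comparing the result with 7 tells you whether the requirement is met.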

Download and install PDI 5 Community Edition

Once Java7 is available on your system, download the PDI 5 package from the official website or the sourceforge web page. In our case we are going to install the Pentaho Data Integration 5 Community Edition.

Once the ‘pdi-ce-5.0.1-stable.zip’ file has been downloaded, unzip it on the desktop or anywhere else you like. The whole PDI 5 tool is available in the folder described below:


You probably won’t believe me, but this is enough to install PDI 5 on your system. 😉

First run

The PDI 5 tool is composed of different executables and services (Spoon, Kitchen, Pan), each designed for a specific purpose. To create all the configuration folders and files, you have to run the Spoon tool for the first time. Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the other Kettle tools (Pan and Kitchen). To run the Spoon tool, follow the instructions described here.
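To make the split between the tools more concrete: Pan runs transformations (.ktr files) and Kitchen runs jobs (.kjb files). The Python sketch below only builds the command line for one of the two, it does not launch anything. The installation folders, the file names and the function name are made up for illustration, and the option syntax (-file= on Linux, /file: on Windows) is an assumption you should check against the documentation of your PDI version.

```python
import ntpath
import os
import posixpath

# Pan runs transformations (.ktr), Kitchen runs jobs (.kjb).
TOOLS = {".ktr": "pan", ".kjb": "kitchen"}


def kettle_command(pdi_home, path, windows=False):
    """Build the command line that runs a transformation or a job file."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in TOOLS:
        raise ValueError("expected a .ktr or .kjb file, got: " + path)
    tool = TOOLS[extension]
    if windows:
        # On Windows the scripts are Pan.bat and Kitchen.bat.
        return [ntpath.join(pdi_home, tool.capitalize() + ".bat"), "/file:" + path]
    # On Linux the scripts are pan.sh and kitchen.sh.
    return [posixpath.join(pdi_home, tool + ".sh"), "-file=" + path]


# Hypothetical installation folder and file name.
print(kettle_command("/opt/data-integration", "load_customers.ktr"))
```

The returned list can be passed as it is to subprocess.run once the first run of Spoon has created the configuration folders.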

After the first run you are ready to use it, for example with the A.A.A.R. solution. 😉

03 Dec

PDI CMIS Input plugin is now compliant with Java6 and Java7

PDI CMIS Input plugin

The CMIS Input plugin for Pentaho Data Integration is now compliant with Java6 and Java7. The v1.1 release, downloadable from the Pentaho Marketplace, is ready to be used with Pentaho Data Integration v4.3, v4.4 and v5.0.

Many thanks to Adrián Cadena (fugu.ec) for his contribution in compiling the sources and testing with Alfresco Enterprise and Community Edition.

Download CMIS Input plugin

19 Jun

News: CMIS Input plugin released in the official Pentaho marketplace

Do you want to extract information from your ECM using a standard?

Do you want to know about documents, types, metadata and everything that is relevant in your Alfresco?

The CMIS Input plugin has been released in the official Pentaho Data Integration marketplace. Below is the video showing the use of the marketplace from Pentaho Data Integration (Kettle).

Pentaho marketplace

(Pentaho marketplace screenshot)

[youtube https://www.youtube.com/watch?v=w96QLFzavtY]

(Matt Casters video)