03 Jan

How to install Pentaho Data Integration 5 (aka Kettle)

In this tutorial we are going to see how to install Pentaho Data Integration 5. PDI 5 (also called Kettle) is one of the most powerful tools of the Pentaho Suite: a pure (and complete) ETL tool. This tutorial is an extract from the complete wiki section dedicated to this amazing tool.

Whether you have a Linux-based operating system or a Windows-based platform, the tutorial should work in either case because, as you will see, the steps are very simple and do not depend on the platform.


Before you start installing PDI 5 (aka Kettle), check that Java is installed on your system. To check, simply run the command below from a terminal.

java -version

If you don’t have it, below you can find a tutorial on how to install it. Please remember that PDI 5 requires Java 7.
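The check above can also be scripted. The following is only a minimal sketch: the helper name java_major is hypothetical (it is not part of Java or PDI), and it assumes the usual first line of `java -version`, which contains the word version followed by a quoted number such as "1.7.0_80". Whether Java versions newer than 7 work with PDI 5 is not covered by this tutorial, so the script only warns when the detected version is lower than 7.

```shell
# Hypothetical helper: print the major Java version found in the text
# produced by `java -version` (passed in as the first argument).
# Handles the legacy "1.7.0_80" scheme and the newer "11.0.2" scheme.
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;
    *)   printf '%s\n' "${ver%%[.-]*}" ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, hence the 2>&1 redirection.
  major=$(java_major "$(java -version 2>&1)")
  echo "Detected Java major version: ${major:-unknown}"
  if [ "${major:-0}" -lt 7 ]; then
    echo "PDI 5 requires Java 7: please install it first."
  fi
else
  echo "Java was not found on the PATH: please install Java 7 first."
fi
```

If the script reports a version lower than 7, or no Java at all, install Java 7 before going on.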

Download and install PDI 5 Community Edition

Once Java 7 is available on your system, download the PDI 5 package from the official website or the SourceForge web page. In our case we are going to install the Pentaho Data Integration 5 Community Edition.

Once the ‘pdi-ce-5.0.1-stable.zip’ file has been downloaded, unzip it on the desktop or anywhere else you like. The whole PDI 5 tool is available in the folder described below:


You may not believe me, but this is all it takes to install PDI 5 on your system. 😉
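On Linux, the unzip step can be sketched from a terminal as follows. This is an illustration only: the function name extract_pdi and the example paths (the Downloads and Desktop folders under the home directory) are assumptions, so adjust them to wherever you saved the archive.

```shell
# Hypothetical helper: unzip the PDI archive ($1) into a target folder ($2).
extract_pdi() {
  if [ ! -f "$1" ]; then
    echo "Archive not found: $1" >&2
    return 1
  fi
  mkdir -p "$2" || return 1
  # -q keeps unzip quiet, -d selects the destination folder.
  unzip -q "$1" -d "$2"
}

# Example call (assumed locations, adjust to your system):
# extract_pdi "$HOME/Downloads/pdi-ce-5.0.1-stable.zip" "$HOME/Desktop"
```

The example call is left commented out so that nothing is extracted until you have checked the paths.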

First run

PDI 5 is composed of different executables and services (Spoon, Kitchen, Pan), each designed for a specific purpose. To create all the configuration folders and files, you have to run the Spoon tool once. Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the other Kettle tools (Pan and Kitchen). To run the Spoon tool, follow the instructions described here.

After the first run you are ready to use it, for example with the A.A.A.R. solution. 😉
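As a rough sketch, the first run of Spoon on Linux could look like the snippet below. The function name launch_spoon is hypothetical, and the folder name data-integration on the desktop is an assumption about where the archive was extracted; change PDI_HOME if your folder differs. On Windows the equivalent launcher is Spoon.bat in the same folder.

```shell
# Assumption: the archive was unzipped on the desktop into a folder
# named "data-integration". Override PDI_HOME if yours is elsewhere.
PDI_HOME="${PDI_HOME:-$HOME/Desktop/data-integration}"

# Hypothetical helper: start Spoon from the PDI folder given as $1.
launch_spoon() {
  if [ ! -f "$1/spoon.sh" ]; then
    echo "spoon.sh not found in $1" >&2
    return 1
  fi
  # Run from inside the PDI folder; going through sh means a missing
  # executable bit on spoon.sh does not matter.
  (cd "$1" && sh ./spoon.sh)
}

# Example call:
# launch_spoon "$PDI_HOME"
```

After a successful first run, Kettle normally keeps its settings in a .kettle folder under your home directory, so checking that this folder exists is a quick way to confirm that the configuration files were created.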

9 thoughts on “How to install Pentaho Data Integration 5 (aka Kettle)”

  1. Avatar

    I found a mismatch in the Pentaho install: it refers to a directory Kettle that does not seem to exist. So I tried to invoke ./spoon.sh,
    which fails with a # Problematic frame:
    # C [ld-linux-x86-64.so.2+0xe02c]

    I expect it wants some parameter, but that’s not really clear from the documents. I was hoping this could work to give me some Alfresco reporting in Share.

  2. Avatar

    Hi Francesco, I’m trying to install Pentaho on an Ubuntu Server 14.04. My problem is that a Tomcat server with the GeoServer service is already installed on the server, and when I start Pentaho it shuts down GeoServer. Do you have any idea how I can have the two programs running?

  3. Avatar

    I tried to connect to the DI Repository with the details given in the installation-summary.text file, but it gives me
    the error: Repository Url is not correct.

    Are there any other prerequisites for PDI besides JRE 1.5?

    • Francesco Corti

      Hi Avinash,
      The DI Repository is a database defined to store transformations and jobs.
      I suppose you are trying to do the connection in the wrong way.
      The installation does not depend on the DI Repository (you can use DI also without a repository).
      I hope this helps you.

    • Francesco Corti

      Hi Phillip,

      No, the Pentaho BA Server does not require Data Integration and does not need to point to it.
      To be more precise, the Pentaho BA Server has it bundled by default, but as a sort of “internal engine”.
      I hope this will help you.

  4. Avatar

    Hi Francesco,

    I have added the AAAR dashlet to the Alfresco user dashboard, but whenever I try to access the links from that dashlet I get an HTTP Status 404 error. The description for that error is that the requested resource is not available.

    I have started the Pentaho server, but I am not able to start the Alfresco services at the same time.

    Can you help me with this?

