05 Dec

How to install Pentaho Data Integration 7 (aka Kettle)

A few weeks ago, close to the annual Pentaho Community Meeting, the Pentaho Team released the brand new Pentaho Suite v7 with a complete restyle of the layout (of course, this is only one of the improvements). This is a good opportunity for me to update the step-by-step tutorial on how to install Pentaho Data Integration (aka Kettle), following the one I wrote for version 5.

The environment

This tutorial is based on an Ubuntu 16.04 LTS Operating System. Nothing really changes if you use a different Operating System (Windows platforms included) because the installation task is very straightforward.

This tutorial uses a vanilla installation of Ubuntu 16.04 LTS. The only recommendation is to execute the commands below to be sure your environment is up to date.

sudo apt-get update
sudo apt-get upgrade

Installing Java 8

Now that the Operating System is up to date, let’s install Java 8 (or check that it is correctly installed). I usually prefer the Oracle Java release, since it has long been very easy to install on an Ubuntu Operating System.

To install Oracle Java 8 in your environment, open a terminal and execute the commands below.

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install oracle-java8-set-default

After the installation finishes, you can run the following command to check if everything is working correctly:

java -version

The result should look like the following.

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
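If you prefer to check the version from a script rather than by eye, here is a minimal sketch. The helper name is_java8 is my own (it is not part of Java or PDI), and it only looks for a "1.8." version string in the first line of the banner.

```shell
# Hypothetical helper: succeed if a `java -version` banner line reports a 1.8.x release.
# $1: first line of the banner, e.g. java version "1.8.0_111"
is_java8() {
    case "$1" in
        *\"1.8.*) return 0 ;;
        *)        return 1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # java prints its version banner to stderr, so redirect it to stdout first
    banner=$(java -version 2>&1 | head -n 1)
    if is_java8 "$banner"; then
        echo "Java 8 detected: $banner"
    else
        echo "Unexpected Java version: $banner"
    fi
else
    echo "java not found on PATH"
fi
```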

Before moving to the PDI 7 installation, let’s check that the JAVA_HOME variable is correctly set in the environment. Using a terminal, execute the command below.

env | grep JAVA_HOME

If you get an empty result, execute nano ~/.bashrc and then append the line below to the end of the file.

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Save it and exit (CTRL+X and Y). Remember to execute the command below to reload the environment in the current bash shell.

source ~/.bashrc
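As an alternative to editing the file by hand, the same step can be sketched as a small function that appends the export only when it is missing, so running it twice does not duplicate the line. The function name ensure_java_home is hypothetical, and the default JDK path is simply the one used in this tutorial; adjust it if your Java lives elsewhere.

```shell
# Hypothetical helper: append the JAVA_HOME export to a startup file only if absent.
# $1: startup file (defaults to ~/.bashrc)
# $2: JDK directory (defaults to the path used in this tutorial)
ensure_java_home() {
    rc_file="${1:-$HOME/.bashrc}"
    jdk_dir="${2:-/usr/lib/jvm/java-8-oracle}"
    if grep -q '^export JAVA_HOME=' "$rc_file" 2>/dev/null; then
        echo "JAVA_HOME already set in $rc_file"
    else
        echo "export JAVA_HOME=$jdk_dir" >> "$rc_file"
        echo "JAVA_HOME added to $rc_file"
    fi
}
```

Calling ensure_java_home with no arguments targets ~/.bashrc; you still need to reload the file (or open a new terminal) before the variable is visible.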

Installing Pentaho Data Integration 7 (aka Kettle)

Once Java 8 is available in your environment, download the Pentaho Data Integration 7 (aka Kettle) package from the official website or the SourceForge web page. In our case we are going to install the Pentaho Data Integration 7 Community Edition.

Once the pdi-ce-7.0.0.0-25.zip file has been downloaded, unzip it onto the desktop or anywhere else you like (the /opt path is usually suggested). All the PDI 7 tools will be available in the folder below:

data-integration
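The extraction step can be sketched from a terminal as follows. The function name install_pdi is hypothetical, the archive path is whatever your download is called, and the sketch assumes the unzip package is installed. Writing to /opt normally requires root privileges, so you may need to run the extraction with sudo or pick a directory you own.

```shell
# Hypothetical helper: extract a PDI archive and print the resulting tool folder.
# $1: path to the downloaded zip file
# $2: target directory (defaults to /opt, as suggested above)
install_pdi() {
    archive="$1"
    target="${2:-/opt}"
    if [ ! -f "$archive" ]; then
        echo "Archive not found: $archive" >&2
        return 1
    fi
    # -q keeps unzip quiet, -d selects the directory to extract into
    mkdir -p "$target" && unzip -q "$archive" -d "$target" || return 1
    echo "$target/data-integration"
}
```

For example, install_pdi ~/Downloads/your-pdi-archive.zip "$HOME/tools" would leave the tools under $HOME/tools/data-integration.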

Before launching PDI 7 for the very first time, it’s recommended to install the package below by executing this command in a terminal.

sudo apt-get install libwebkitgtk-1.0-0

You may not believe me, but this is all it takes to install PDI 7 on your system.

Executing Pentaho Data Integration 7 (aka Kettle)

PDI 7 is composed of different executables and services (Spoon, Kitchen, Pan), each designed for a specific purpose. To create all the default configuration folders and files, you have to run the Spoon tool once. Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the other Kettle tools (Pan and Kitchen). To run the Spoon tool, follow the instructions described here.
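To make the division of labour between the three tools concrete, here is a small sketch that picks the launcher script from a file's extension: Kitchen runs jobs (.kjb files), Pan runs transformations (.ktr files), and Spoon is the designer. The function name pdi_launcher_for, the PDI_HOME variable, its /opt default and the file names my_job.kjb and my_trans.ktr are all assumptions for illustration; the sketch only prints the commands and does not start anything.

```shell
# Hypothetical helper: map a file to the PDI launcher script that handles it.
# .kjb (job) -> kitchen.sh, .ktr (transformation) -> pan.sh, no file -> spoon.sh
pdi_launcher_for() {
    case "$1" in
        *.kjb) echo "kitchen.sh" ;;
        *.ktr) echo "pan.sh" ;;
        "")    echo "spoon.sh" ;;
        *)     echo "unknown file type: $1" >&2; return 1 ;;
    esac
}

# PDI_HOME is an assumption: the data-integration folder extracted earlier.
PDI_HOME="${PDI_HOME:-/opt/data-integration}"
echo "Open the designer:    $PDI_HOME/$(pdi_launcher_for "")"
echo "Run a job:            $PDI_HOME/$(pdi_launcher_for my_job.kjb) -file=my_job.kjb"
echo "Run a transformation: $PDI_HOME/$(pdi_launcher_for my_trans.ktr) -file=my_trans.ktr"
```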

After the first run, you are ready to use Pentaho Data Integration… enjoy your next ETL.
