How to create a Kettle endpoint with Sparkl

This post is part of the Pentaho Sparkl application builder tutorial. As is my habit, the tutorial is written as a step-by-step list of tasks and commands, which in my opinion is clearer and more useful for understanding how to do things and make them work.

The tutorial was developed on a Linux Ubuntu 12.04 LTS operating system, but the description should remain valid on other platforms and help you reach the same goal.

Prerequisites

This tutorial assumes that Pentaho Business Analytics Platform 5 is correctly installed on your system together with Sparkl, and that you have already created a basic Pentaho application with it. If this is not your environment, please follow this tutorial first.

Introduction

A Pentaho application created with Sparkl is a set of dashboards developed with CDE (from the Ctools), plus Pentaho Data Integration – or Kettle – endpoints developed as PDI jobs or transformations with some specific characteristics. These two parts of the application interact to build exactly what you need. In this tutorial we are going to see how to create your first Kettle endpoint.

Create a Kettle endpoint

First of all, access the Pentaho User Console as administrator (otherwise you won’t be able to reach the Sparkl desktop) and open the Sparkl desktop.

ATTENTION: Do not open ‘MyFirstApplication’ from here, because that link leads to the content of the application and not to its administration panel.

Sparkl - Menu with application2

Once the Sparkl desktop is visible, identify your application (in our case ‘MyFirstApplication’) and open it for editing by clicking the ‘edit’ icon. Now that you are in the administration panel of the application, click the ‘Elements’ tab to manage the dashboards and Kettle endpoints.

Sparkl - Element panel

Now click ‘Add new element’ and set a few parameters:

  • Name, used to set the name of the element. In our case we set it to ‘myfirstendpoint’.
  • Type, used to declare what you are creating (a dashboard or a Kettle endpoint). In our case we set it to ‘Kettle endpoint’.
  • Template, used to declare whether to use a Kettle job or a transformation. In our case we set it to ‘Clean transformation’.
  • Admin only, a flag used to set whether the endpoint can be accessed by administrators only. In our case we set it to ‘false’ to leave the access public.

Click the plus icon to submit the request and create the endpoint. Restart the Pentaho Business Analytics Platform using the ‘stop-pentaho’ and ‘start-pentaho’ scripts, in that order (you can find them in the ‘biserver-ce’ folder). Access the Pentaho User Console again as administrator and open the application for editing; the ‘Elements’ tab should appear like the image below.

Sparkl - Elements2

With the icons on the right of the endpoint you can:

  • Run the endpoint (the arrow icon).
  • Delete the endpoint (the trashcan icon).

How to develop your own endpoint using Kettle

From a technical point of view, the endpoints created with Sparkl are ordinary Pentaho Data Integration – or Kettle – jobs or transformations that you can easily edit and develop. The scope of this tutorial is not to show how to develop Kettle jobs or transformations, but this is a very important strength of this impressive tool: it integrates what already exists instead of re-inventing the wheel.

However, regarding Kettle jobs and transformations, I would like to point out a specific feature. A Kettle endpoint in Sparkl is a particular kind of Kettle job/transformation, and you have to respect some simple rules. In other words, you cannot simply develop (and test) your own job/transformation in Spoon and then deploy it under Sparkl; you have to use a different approach. The good news is that the rules are very simple: use an explicit end step in your jobs, and use an “OUTPUT” step as the final step of your transformations.
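As a sketch of what that rule looks like on disk, here is a hypothetical fragment of a generated .ktr file (the exact XML Sparkl produces may differ; the point is only that the transformation ends in a step named ‘OUTPUT’, which you must leave in place):

```xml
<!-- Hypothetical fragment of myfirstendpoint.ktr.
     The final step generated by the 'Clean transformation' template is
     named OUTPUT: Sparkl reads the endpoint's result rows from it, so
     keep this step (and its name) unchanged when you add your own steps. -->
<transformation>
  <info>
    <name>myfirstendpoint</name>
  </info>
  <!-- ... your own steps go here, before the final one ... -->
  <step>
    <name>OUTPUT</name>
    <!-- step type is illustrative; use whatever the template generated -->
    <type>Dummy</type>
  </step>
</transformation>
```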

Rather than digging into the deep technical details (which I personally don’t know), my suggestion is a practical approach to developing your own job/transformation:

  • Create the Kettle endpoint using Sparkl.
  • Open the job/transformation with Spoon.
  • Add your steps without modifying the final ones.

I have personally tested this approach: if you leave the final steps unchanged, everything works correctly even under Sparkl management. Cool!

Below is an example of a Kettle endpoint (a transformation, in this case) opened with Spoon before any editing or development.

Sparkl - Spoon

Some technical details of the created endpoint

The endpoints created with Sparkl are physically stored in the file system as a collection of files. We saw in a past tutorial that the whole application is stored in the folder shown below (remember that ‘MyFirstApplication’ is the name of the application).

<biserver-ce>/pentaho-solutions/system/MyFirstApplication

All the endpoints of the application are stored in the folder shown below (remember that ‘myfirstendpoint’ is the name of the created endpoint).

<application>/endpoints/kettle

There you can find one file for each endpoint:

  • myfirstendpoint.ktr, containing the XML definition of a standard Kettle transformation.
  • myfirstendpoint.kjb, containing the XML definition of a standard Kettle job.

Editing those files and restarting the Pentaho Business Analytics Platform is the correct way to deploy endpoints in the Sparkl application. 😉

Most of the time, when you develop a Kettle job or transformation, you need to invoke several other jobs and transformations stored in the same repository (in this case: the file system). As a result, you will need to deploy a lot of files in the ‘kettle’ folder.

This is not bad from a technical point of view, but this way you will see a lot of “service” endpoints in your application, listed in the ‘Elements’ tab of the Sparkl application management. To avoid this, Sparkl offers a tip: Kettle endpoints whose names start with the ‘_’ character are treated as hidden by Sparkl. This feature lets your job/transformation use all the sub-jobs/sub-transformations it needs for the development, while Sparkl stays correctly organized in terms of usable dashboards and endpoints. This is a very nice feature in my opinion!
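To illustrate the naming convention, here is a small Python sketch (standard library only; the file names are hypothetical) that scans a ‘kettle’ folder like the one above and separates the visible endpoints from the hidden ‘_’ ones:

```python
import os
import tempfile

def split_endpoints(kettle_dir):
    """Return (visible, hidden) endpoint names found in the kettle folder.

    Endpoints are .ktr/.kjb files; names starting with '_' are hidden
    from the Sparkl 'Elements' tab but still usable as sub-jobs or
    sub-transformations.
    """
    visible, hidden = [], []
    for fname in sorted(os.listdir(kettle_dir)):
        base, ext = os.path.splitext(fname)
        if ext not in (".ktr", ".kjb"):
            continue
        (hidden if base.startswith("_") else visible).append(base)
    return visible, hidden

# Demonstration with a temporary folder standing in for
# <application>/endpoints/kettle (file names are illustrative):
kettle_dir = tempfile.mkdtemp()
for fname in ("myfirstendpoint.ktr", "_lookup.ktr", "_cleanup.kjb"):
    open(os.path.join(kettle_dir, fname), "w").close()

visible, hidden = split_endpoints(kettle_dir)
print(visible)  # ['myfirstendpoint']
print(hidden)   # ['_cleanup', '_lookup']
```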

Another very powerful feature is the ability to access the endpoints via a URL, directly in your brand new application. The URL you can use, from inside Pentaho or from an external custom application, is shown below (remember that ‘MyFirstApplication’ is the name of the application and ‘myfirstendpoint’ is the name of the created endpoint).

http://<server>:<port>/pentaho/plugin/MyFirstApplication/api/myfirstendpoint
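As a quick sketch, the URL can be assembled from its parts (the server and port values below are placeholders, not taken from this tutorial):

```python
def endpoint_url(server, port, app, endpoint):
    """Build a Sparkl endpoint URL following the pattern
    http://<server>:<port>/pentaho/plugin/<app>/api/<endpoint>."""
    return "http://{0}:{1}/pentaho/plugin/{2}/api/{3}".format(
        server, port, app, endpoint)

# Hypothetical server and port, with the names used in this tutorial:
url = endpoint_url("localhost", 8080, "MyFirstApplication", "myfirstendpoint")
print(url)
# http://localhost:8080/pentaho/plugin/MyFirstApplication/api/myfirstendpoint
```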

When you invoke the URL from any client (browser, web app, etc.), the result is always JSON. This is the format Sparkl requires from an endpoint, with a specific composition and syntax for the resulting JSON. Below is an example of the result when the automatically created Kettle endpoint is called from a browser.

{
 "metadata":[
  {"colIndex":0,"colType":"String","colName":"type"},
  {"colIndex":1,"colType":"Number","colName":"value"}
 ],
 "resultset":[
  ["sample",0.0],
  ["transformation",1.0],
  ["as",2.0],
  ["endpoint",null]
 ],
 "queryInfo":{"totalRows":4}
}
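The metadata/resultset structure is easy to consume from a client. For example, here is a small Python sketch (standard library only) that turns a response like the one above into a list of row dictionaries keyed by column name:

```python
import json

# The JSON returned by the endpoint, as shown above.
payload = json.loads("""
{
 "metadata":[
  {"colIndex":0,"colType":"String","colName":"type"},
  {"colIndex":1,"colType":"Number","colName":"value"}
 ],
 "resultset":[
  ["sample",0.0],
  ["transformation",1.0],
  ["as",2.0],
  ["endpoint",null]
 ],
 "queryInfo":{"totalRows":4}
}
""")

# Map every row of the resultset onto the column names from the metadata.
columns = [col["colName"] for col in payload["metadata"]]
rows = [dict(zip(columns, row)) for row in payload["resultset"]]

print(rows[0])  # {'type': 'sample', 'value': 0.0}
print(payload["queryInfo"]["totalRows"])  # 4
```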

Conclusion

In this tutorial we have seen how to create a first endpoint in a Sparkl application, together with some technical details about it. For more tutorials on this topic, see the menu list on this page.

