Building Hadoop clusters review


If you are interested in Hadoop, this is a video course you should probably evaluate. As you probably know, Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. All the modules in Hadoop are designed with the assumption that hardware failures are common and thus should be handled automatically in software by the framework.

Talking about the video course, we can divide the content into three main sections:
1. how to create and set up a three-machine cluster using Amazon EC2,
2. how to install a Hadoop cluster using Apache Ambari,
3. how to start using the Hadoop cluster, in particular with the Hadoop User Experience (HUE) interface.

The description of all the topics is clear and well done (Sean Mikha, the author, did a good job). Each relevant topic is first explained in terms of its logical structure and approach, and only then demonstrated in practice.

The creation of the virtual machines on Amazon EC2 is useful for other purposes too. The practical, step-by-step description is not limited to creating the servers: it also covers security and connection details using, for example, the PuTTY SSH client.

In my opinion the most relevant value of this video course lies in the hidden details of the Hadoop cluster installation process. As you will see if you decide to follow it, the tasks are quite easy to do (probably this is to Sean’s credit) but the configuration details and settings are very important if you want to make it work in practice. By following the hints, I’m sure every neophyte will save days of work and a lot of nights of googling. ;-)

Enjoy your Hadoop Cluster video course… as usual by Packt Publishing.

Francesco Corti

Solr doesn’t return more than 1,000 objects in Alfresco.

Once upon a time Alfresco used Apache Lucene as its search engine…

This was great until you had particular needs like, for example, a long-running query or a query that retrieves a huge number of objects. More than a year ago I wrote a post about how Alfresco returns a maximum of 1,000 results, or queries for at most a couple of minutes.

As you can read in the post, the most commonly suggested solution to the problem was to migrate the indexing engine to Apache Solr. At that time, Alfresco supported both engines and considered Solr its future.

Today Lucene and Solr are both still supported and Solr is probably the most used but, regarding the same issue, it looks like something is coming back again.

As you can read from the JIRA issue (*), in Alfresco 4.2.e Solr also returns a maximum of 1,000 results, and the suggested fix is to set the parameters below in the file.


This could have a high impact on “big” queries or “long” queries, so I would like to share this information with all of you to prevent problems or nights spent in the debugger. ;-)

I hope this will help you.

Francesco Corti

(*) Thanks to Francesco Fornasari and Christian Tiralosi for the hint.

Yet another Alfresco Community upgrade tutorial: from 4.0.d to 4.2.f.

Upgrading Alfresco (Community or Enterprise) from one version to a more recent one has to follow a clear and precise path. It is always a critical task and in some cases it can be a serious problem for organizations (of course this is more critical for the Community Editions). In some cases the only possible solution is an Alfresco-to-Alfresco migration instead of an upgrade… but this is another scenario.

This tutorial describes a step-by-step approach to upgrading Alfresco Community Edition from v4.0.d to v4.2.f in a single upgrade step. Even if the versions involved are different, the approach is always the same as the one discussed here.

Needless to say: I am not responsible for any damage that may happen after following the given instructions, which hopefully will not happen.

The (only) correct approach

Before starting I would like to share the (only) correct approach: please remember that the upgrade process for the Alfresco Community Editions is tested (and not guaranteed) only between the closest versions (for Alfresco Enterprise you can take a look here). This means that the only path you can follow to upgrade a very old version to a recent one is always to perform multiple upgrades.

For example, if you come from v4.0.d and want to go to the recent v5.0.a, it’s only written in the stars whether the direct upgrade will work. The most verified approach is to perform the upgrade with the steps described below:
– Upgrade from v4.0.d to v4.0.e,
– Upgrade from v4.0.e to v4.2.a,
– Upgrade from v4.2.a to v4.2.b,
– Upgrade from v4.2.b to v4.2.c,
– Upgrade from v4.2.c to v4.2.d,
– Upgrade from v4.2.d to v4.2.e,
– Upgrade from v4.2.e to v4.2.f,
– Upgrade from v4.2.f to v5.0.a.
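The chain above can be sketched as a tiny shell helper that prints one step for each pair of adjacent versions. This is only an illustration: the function name upgrade_steps is my own invention, and it performs no upgrade at all, it just lists the steps.

```shell
#!/bin/sh
# Hypothetical helper: print one upgrade step per pair of adjacent versions.
# It only prints text; no Alfresco installation is touched.
upgrade_steps() {
  prev=""
  for v in "$@"; do
    if [ -n "$prev" ]; then
      echo "Upgrade from v$prev to v$v"
    fi
    prev="$v"
  done
}

# Example (versions taken from the list above):
#   upgrade_steps 4.0.d 4.0.e 4.2.a 4.2.b 4.2.c 4.2.d 4.2.e 4.2.f 5.0.a
```

Keeping the list of versions in one place makes it easier to see which steps you are “jumping” when you decide to take a shortcut.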

You can take your own risks by “jumping” some steps, and in some cases it will work, but nothing is guaranteed. In this tutorial I decided to take a reasonable risk, often discussed in the forums and tutorials, and “jump” with a single upgrade process.

Preparing the upgrade

To perform the upgrade I need the Alfresco backup of my v4.0.d production installation. If you don’t know what an Alfresco backup is and how to obtain it, I strongly recommend taking a look here.

In this tutorial I chose to set up a brand new server with the recent Alfresco installation (in our case v4.2.f) but you could choose to use the same server. Of course, in that case the task is even more critical: the steps are the same but performed in different folders from the “old” version of Alfresco.

The new Alfresco installation

As introduced before, in this tutorial I work on a vanilla server running Ubuntu 14.04 LTS. Oracle Java v1.7.60u is installed on the server, as described here.

To install Alfresco you can follow this tutorial even if it describes one specific version (the installation steps don’t change too much). Alternatively you can choose to install it using the easier wizard. In either case you will install the target version of Alfresco, in our case Alfresco Community v4.2.f.

For the purpose of this post, the way you install Alfresco is not relevant, but remember that it will be your brand new server, so it’s always advisable to make it as robust and stable as possible. ;-)

If you have some customizations (custom models, behaviors, actions or something else) now it’s time to install them on the new server. The task is always the same: stop Alfresco, deploy the customizations in the way you always do (AMP, Maven, manually) and start Alfresco again.

As a final step, it is always advisable to switch off the indexing. In our case we assume Solr is used, but with Lucene it will be the same. To do this, please follow the steps below:

cd <alfresco>
./ stop
nano tomcat/shared/classes/

 #solr.port.ssl=8443 (comment it)

Save and exit.
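If you prefer not to edit the file by hand, the same change can be sketched with sed. This is only a minimal sketch: the helper name comment_property is hypothetical, and you have to pass the path of the properties file you opened above as the second argument. It keeps a copy of the original file with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper: comment out a "key=value" line in a properties file.
# Usage: comment_property <key> <properties-file>
comment_property() {
  key="$1"
  file="$2"
  # Escape the dots in the key so that they match literally in the regex.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  # Prefix the matching line with '#'; the original is saved as <file>.bak.
  sed -i.bak "s/^\(${pattern}=.*\)\$/#\1/" "$file"
}

# Example (the path is a placeholder, not a real file name):
#   comment_property solr.port.ssl <path-to-your-properties-file>
```

Lines that are already commented, or that belong to other keys, are left untouched.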

Database restore

Now it’s time to restore the Alfresco database from the backup. To do it, please be sure that PostgreSQL (or the database you use) is running. If you installed Alfresco with the wizard you can use the command below.

./ start postgresql

To delete the current Alfresco’s database use the commands below.

cd <postgresql>/bin
./psql -h localhost -U postgres -d postgres

  DROP DATABASE alfresco;
  CREATE DATABASE alfresco WITH owner = alfresco;

To restore the database dump you can use:

./pg_restore -h localhost -U postgres -d alfresco <file.dump>
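One caveat: pg_restore only reads the archive formats produced by pg_dump (for example the custom format, -Fc), while a plain SQL dump has to be loaded with psql instead. It is worth checking the dump before running the DROP DATABASE above. The sketch below relies on the fact that custom-format archives start with the bytes PGDMP; the function name is_custom_dump is my own, and it does not recognize the tar or directory formats.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file looks like a pg_dump
# custom-format archive, i.e. it starts with the 5-byte signature "PGDMP".
is_custom_dump() {
  [ -f "$1" ] || return 1
  [ "$(head -c 5 "$1")" = "PGDMP" ]
}

# Example (file.dump is the placeholder used in the command above):
#   if is_custom_dump file.dump; then
#     echo "custom-format archive: restore it with pg_restore"
#   else
#     echo "not a custom-format archive: probably plain SQL, use psql"
#   fi
```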

Filesystem restore

Once the database is restored you have to restore the documents on the file system from the backup.

cd <alfresco>/alf_data
rm -rf contentstore
rm -rf contentstore.deleted

Now it’s time to copy the ‘contentstore’ and ‘contentstore.deleted’ folders from the backup directly into ‘alf_data’.
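The delete-and-copy step can also be sketched as a single shell function. This is a minimal sketch under my own assumptions: the function name restore_contentstore is hypothetical, the backup folder is expected to contain both directories at its top level, and nothing is deleted unless both are found there.

```shell
#!/bin/sh
# Hypothetical helper: replace the content store folders in alf_data
# with the ones found in a backup folder.
# Usage: restore_contentstore <backup-dir> <alf_data-dir>
restore_contentstore() {
  backup_dir="$1"
  alf_data="$2"
  # First check that the backup is complete, before deleting anything.
  for d in contentstore contentstore.deleted; do
    if [ ! -d "$backup_dir/$d" ]; then
      echo "missing in backup: $d" >&2
      return 1
    fi
  done
  # Then remove the current folders and copy the backup ones,
  # preserving permissions and timestamps.
  for d in contentstore contentstore.deleted; do
    rm -rf "$alf_data/$d"
    cp -Rp "$backup_dir/$d" "$alf_data/" || return 1
  done
}

# Example (both paths are placeholders):
#   restore_contentstore <backup> <alfresco>/alf_data
```

Remember to run it with Alfresco stopped and as a user that can write into ‘alf_data’.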

Did you notice that the indexes are not restored? If possible, it’s always preferable to rebuild the indexes from scratch. Otherwise we suggest restoring them from the backup, hoping nothing has changed in the structure. :-)

Alfresco bootstrap

Now everything is ready to start Alfresco again.

cd <alfresco>
./ start
tail -f tomcat/logs/catalina.out

You will see that the startup process updates the database and everything else that is necessary to upgrade the system. Errors or problems will be listed here…
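The log can get very long during an upgrade, so instead of only watching it scroll you can filter it. Here is a minimal sketch, assuming that problems show up on lines containing ERROR, SEVERE or Exception. The function name scan_log and the pattern are my own assumptions, so tune them to what you actually see in your log.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) the log lines that look
# like problems. The exit status is non-zero when nothing matches.
# Usage: scan_log <log-file>
scan_log() {
  grep -n -E 'ERROR|SEVERE|Exception' "$1"
}

# Example:
#   scan_log tomcat/logs/catalina.out
```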

Indexes rebuild

As you read before, the Alfresco upgrade was performed without the indexes.
Now it’s time to rebuild them, following what you read here.

./ stop
nano <alfresco>/tomcat/shared/classes/


cd <alfresco>/alf_data/solr
rm -rf workspace/SpacesStore/*
rm -rf archive/SpacesStore/*
rm -rf workspace-SpacesStore/alfrescoModels/*
rm -rf archive-SpacesStore/alfrescoModels/*
cd <alfresco>
./ start

Enjoy your brand new Alfresco installation…

Francesco Corti

Alfresco roadmap for the next 12 months

After several requests from users, the new Alfresco roadmap has been released in the official wiki. This roadmap doesn’t seem to be like the ones of the past.

I see that there are fewer topics than in the past. On the other hand, each topic seems to be more detailed and “complete” (in the past most of the items were less specific than these). Compared with the past roadmaps, I can read a lot of “Enterprise only” on some important new features.

Form your own opinion by reading the complete roadmap below.

Francesco Corti

Review of the Alfresco CMIS book by Martin Bergljung

As you probably know (or you are reading it now for the first time), CMIS is an open standard that allows different ECMs to interoperate over the Internet through the definition of a collection of services and a powerful query language (CMIS-QL), modeled on a subset of SQL.

The goal of this book is to share and explain all the basics of CMIS, using a practical and technical approach that starts from the history (why CMIS was born), goes through the definition of the (several) services and the query language, and ends with a collection of examples describing how to use CMIS in practice.

OK, CMIS is meant to make different ECMs interoperate, but the number of different languages and examples described in this book is interesting and well done: it starts from Java (with the Apache Chemistry libraries) and continues with JavaScript + jQuery, Groovy and (the basics of) PHP. Yes, I agree with you if you are thinking that there are many more CMIS libraries than these, but the description (and explanation) of the CMIS services (and examples) is all you need to understand how to approach development in all the other supported languages (.NET, Python, etc.).

As you can read from the title, Martin Bergljung focuses his description on Alfresco. And this is true, because all the examples are developed using an Alfresco repository as the reference architecture. But inside the book you can find something more about Alfresco. Personally, I found the description of Alfresco Surf together with the CMIS standard very interesting. Probably this topic is less useful for most readers (and practical cases) but it is an interesting example related to the basics of the Alfresco Share application. The example on how to make Alfresco and Drupal interact using CMIS is quite interesting too.

Last but not least, I read the book very easily, both in the first part (the more descriptive one) and in the last (full of practical examples in the different languages). I think I will also use the book as a manual for the several CMIS services when I develop something, because I suggest you remember that…

Standard is good!


Francesco Corti

Alfresco Hack-a-thon 2014 – Brussels

The 16th of May was the first Alfresco Global Virtual Hack-a-thon day. One of the physical locations was Brussels, more precisely the CIRB-CIBG (here the post about the event).

I was there with the AAAR project and, as usual, lots of “old” and brand “new” friends. Thank you to Boriss Mejiass, Lanre Abiwon (DarkStar1), Cristina Martín Ruiz, Ole Hejlskov and all the other attendees.

Below is a short video about the nice time together.

Francesco Corti

Win your free copy of the Pentaho Reporting video course

Here is your chance to win a free download link for the Pentaho Reporting video course, just by commenting on this post!


For the contest we have 2 download copies of Pentaho Reporting video course, to be given away to 2 lucky winners.

How you can win:

To win your copy of this video course, all you need to do is come up with a comment below highlighting the reason “why you would like to win this video course”.

Duration of the contest & selection of winners:

The contest is valid for 2 weeks, until the 27th of May, and is open to everyone. Winners will be selected on the basis of the comments they post, by the author… yes, it’s me! :-)

Many thanks to Packt Publishing for the opportunity!

About the video course:

If you are a Java developer or IT professional who wants to assemble custom reporting solutions with Pentaho Reporting, this video course is ideal for you. Master the advanced concepts within Pentaho Reporting such as sub-reports, cross-tabs, data source configuration, and metadata-based reporting.

 A practical video guide, which dives directly into report generation using various techniques, offering you all of the tips and tricks needed to understand Pentaho Reporting. Learn how to create, modify, implement code, and publish professional reports that will boost your business enterprise to a completely new level.


So, don’t be shy: leave a comment here below!

Francesco Corti