28 Jul

Building Hadoop clusters review


If you are interested in Hadoop, this video course is worth evaluating. As you probably know, Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. All Hadoop modules are designed on the assumption that hardware failures are common and should therefore be handled automatically in software by the framework.

Talking about the video course, we can divide the content into three main macro-sections:
1. how to create and set up a three-machine cluster using Amazon EC2,
2. how to install a Hadoop cluster using Apache Ambari,
3. how to start using the Hadoop cluster, in particular with the Hue (Hadoop User Experience) web interface.

The description of all the topics is clear and well done (Sean Mikha, the author, did a good job). Every relevant topic is first explained in terms of its logical structure and approach, and only then demonstrated in practice.

The creation of the virtual machines on Amazon EC2 is also useful for other purposes. The practical, step-by-step description is not limited to creating the servers but also covers security and connectivity using, for example, the PuTTY SSH client.

In my opinion, the greatest value of this video course lies in the hidden details of the Hadoop cluster installation process. As you will see if you decide to follow it, the tasks are quite easy to do (probably this is to Sean’s credit), but the configuration details and settings are very important if you want to make it work in practice. By following the hints, I’m sure every neophyte will save days of work and a lot of nights of googling. 😉

Enjoy your Hadoop Cluster video course… as usual by Packt Publishing.

23 Jul

Solr doesn’t return more than 1,000 objects in Alfresco.

Once upon a time, Alfresco used Apache Lucene as its search engine…

This was great until you had particular needs such as a long-running query or a query that retrieves a huge number of objects. More than a year ago I wrote a post about how Alfresco returns a maximum of 1,000 results or runs a query for a couple of minutes at most.
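
To make the behaviour concrete, here is a minimal Python sketch of a search whose results are cut off by a check limit or a time budget. It is only a toy model, not Alfresco's code: the function `capped_results`, its parameters and the default values (1,000 checks, 120 seconds) are assumptions chosen to mirror the limits mentioned above.

```python
import time
from typing import Callable, Iterable, List, Tuple


def capped_results(
    candidates: Iterable[str],
    is_readable: Callable[[str], bool],
    max_checks: int = 1000,      # assumed limit, mirrors the 1,000 cap
    max_seconds: float = 120.0,  # assumed budget, "a couple of minutes"
) -> Tuple[List[str], bool]:
    """Return (results, truncated) for a hypothetical capped search.

    Each candidate is checked with is_readable; the loop stops as soon
    as the check limit or the time budget is exhausted.
    """
    deadline = time.monotonic() + max_seconds
    results: List[str] = []
    checks = 0
    for node in candidates:
        # Stop early: the caller gets a partial result set.
        if checks >= max_checks or time.monotonic() > deadline:
            return results, True
        checks += 1
        if is_readable(node):
            results.append(node)
    return results, False


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(5000)]
    found, truncated = capped_results(nodes, lambda n: True)
    print(len(found), truncated)
```

The point of the sketch is the `truncated` flag: a capped query does not fail, it silently returns fewer objects than exist, which is why this kind of limit is easy to miss.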

As you can read in that post, the most commonly suggested solution was to migrate the indexing engine to Apache Solr. At that time, Alfresco supported both engines and considered Solr its future.

Today both Lucene and Solr are still supported and Solr is probably the more widely used, but the same issue seems to be coming back.

>> https://issues.alfresco.com/jira/browse/ALF-20567(*) <<

As you can read in the JIRA issue, in Alfresco 4.2.e Solr also returns a maximum of 1,000 results, and the suggested fix is to set the parameters listed in the issue in the alfresco-global.properties file.


This could have a high impact on “big” or “long” queries, so I would like to share this information with all of you to prevent problems or nights spent in the debugger. 😉

I hope this will help you.

Francesco Corti

(*) Thanks to Francesco Fornasari and Christian Tiralosi for the hint.

17 Jul

Yet another Alfresco Community upgrade tutorial: from 4.0.d to 4.2.f.

Upgrading Alfresco (Community or Enterprise) from one version to a more recent one has to follow a clear and precise path. It is always a critical task and in some cases can be a serious problem for organizations (of course this is more critical for Community Editions). In some cases the only possible solution is an Alfresco-to-Alfresco migration instead of an upgrade… but that is another scenario.

This tutorial describes a step-by-step approach to upgrading from Alfresco Community Edition v4.0.d to v4.2.f in a single upgrade step. Even if the versions involved are different, the approach is always the same as the one discussed here.

Read More

04 Jul

Alfresco roadmap for the next 12 months

Following requests from several users, the new Alfresco roadmap has been published on the official wiki. This roadmap doesn’t seem to be like those of the past.

I notice that it has fewer topics than in the past. On the other hand, each topic seems to be more detailed and “complete” (in the past most of the items were less specific). Compared with past roadmaps, I see a lot of “Enterprise only” labels on some important new features.

Form your own opinion by reading the complete roadmap below.