07 Mar

Solr probe to check the index update frequency and live longer

Are you sure that the Solr Index Server of your production environment is updating your content?

As a consequence, are you sure that your Alfresco search indexes are correctly (and regularly) updated?

From a theoretical point of view, the answer is ‘yes’, but in a real-life scenario something could go wrong, and a message from your users could land in your inbox, saying:

Hey fellow, the Alfresco installation that you are maintaining cannot find the documents I uploaded a long time ago: this is not acceptable!

This post describes a project whose main goal is to avoid this awful issue. The project is called Solr probe (solr-probe) and it is available on GitHub, for Solr versions greater than 1.4 (the one used by many Alfresco versions).
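To make the idea of a freshness probe concrete, here is a minimal sketch in Python. It is not the solr-probe code: the function name `is_index_stale`, the 30-minute threshold and the sample timestamps are all hypothetical, and how the timestamp of the latest indexed item is obtained from Solr is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def is_index_stale(last_indexed_at, now, max_age=timedelta(minutes=30)):
    """Return True when the newest indexed item is older than max_age.

    last_indexed_at and now are timezone-aware datetimes; max_age is a
    hypothetical threshold that should match your expected update frequency.
    """
    return now - last_indexed_at > max_age


# Sample values for illustration only.
now = datetime(2014, 3, 7, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2014, 3, 7, 11, 50, tzinfo=timezone.utc)  # 10 minutes old
old = datetime(2014, 3, 7, 9, 0, tzinfo=timezone.utc)      # 3 hours old

print(is_index_stale(fresh, now))
print(is_index_stale(old, now))
```

A check like this could run on a schedule and raise an alert before the users notice that their documents cannot be found.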


17 Jun

Hadoop Essentials – Review

In this post I would like to share my personal review of the recent book from Packt Publishing, Hadoop Essentials by Shiva Achari.

Do you really know what Apache Hadoop is?
Are you sure you understand the meaning of “big data” in a real-world scenario?
How do big data storage and data warehouse issues meet a Hadoop implementation?
What are the main tools Apache Hadoop is based on?

If you know nothing about these topics (but are interested in them), or you want a clear and complete picture of them (and probably much more), you should read this book.

28 Jul

Building Hadoop clusters review


If you are interested in Hadoop technology, this is probably a video course worth evaluating. As you probably know, Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. All the modules in Hadoop are designed with the assumption that hardware failures are common and thus should be automatically handled in software by the framework.

As for the video course itself, the content can be divided into three main sections:
1. how to create and set up a three-machine cluster using Amazon EC2,
2. how to install a Hadoop cluster using Apache Ambari,
3. how to start using the Hadoop cluster, in particular with Hue (Hadoop User Experience).

The description of all the topics is clear and well done (Sean Mikha, the author, did a good job). Each relevant topic is first explained in terms of its logical structure and approach, and only then demonstrated in practice.

The creation of the virtual machines on Amazon EC2 is also useful for other purposes. The practical, step-by-step description is not limited to creating the servers: it also covers security and connection details using, for example, the PuTTY SSH client.

In my opinion, the most relevant value of this video course lies in the hidden details of the Hadoop cluster installation process. As you will see if you decide to follow it, the tasks are quite easy to do (probably to Sean’s credit), but the configuration details and settings are very important if you want to make it work in practice. By following the hints, I’m sure every neophyte will save days of work and a lot of nights spent googling. 😉

Enjoy your Hadoop Cluster video course… as usual by Packt Publishing.

23 Jul

Solr doesn’t return more than 1,000 objects in Alfresco.

Once upon a time Alfresco used Apache Lucene as its search engine…

This was great until you had particular needs such as a long-running query or a query that retrieves a huge number of objects. More than a year ago I wrote a post about how Alfresco returns at most 1,000 results, or queries for at most a couple of minutes.

As you can read in that post, the most commonly suggested solution to the problem was to migrate the indexing engine to Apache Solr. At that time, Alfresco supported both engines and considered Solr its future.

Today both Lucene and Solr are still supported and Solr is probably the more widely used, but the same issue seems to be coming back.

>> https://issues.alfresco.com/jira/browse/ALF-20567 (*) <<

As you can read in the JIRA issue, in Alfresco 4.2.e Solr also returns a maximum of 1,000 results; to solve the issue, the suggestion is to set the parameters below in the alfresco-global.properties file.

solr.query.maximumResultsFromUnlimitedQuery=10000
system.acl.maxPermissionChecks=10000

This could have a high impact on “big” queries or “long” queries, so I would like to share this information with all of you to prevent problems or nights spent in the debugger. 😉
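If you want to verify quickly that both parameters are really set in a given installation, a small script can help. What follows is only a sketch in Python: the helper names `parse_properties` and `missing_or_low` are hypothetical, the parser handles only simple `key=value` lines, and the 10,000 minimum is simply the value suggested above.

```python
REQUIRED = (
    "solr.query.maximumResultsFromUnlimitedQuery",
    "system.acl.maxPermissionChecks",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_or_low(text, minimum=10000):
    """Return the required keys that are absent, non-numeric or below minimum."""
    props = parse_properties(text)
    problems = []
    for key in REQUIRED:
        value = props.get(key)
        if value is None or not value.isdigit() or int(value) < minimum:
            problems.append(key)
    return problems


# Sample content for illustration; in practice, read alfresco-global.properties.
sample = (
    "# search limits\n"
    "solr.query.maximumResultsFromUnlimitedQuery=10000\n"
    "system.acl.maxPermissionChecks=1000\n"
)
print(missing_or_low(sample))
```

An empty list means both parameters are present with a value of at least the chosen minimum.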

I hope this will help you.

Francesco Corti

(*) Thanks to Francesco Fornasari and Christian Tiralosi for the hint.