In this post I would like to share my personal review of a recent book from Packt Publishing, Scaling Big Data with Hadoop and Solr (Second Edition) by Hrishikesh Vijay Karambelkar.
The goal of the book is clear from the title: to describe in practice how Apache Hadoop and Apache Solr help organizations solve the problem of extracting information from big data. Don’t you think this is a very interesting problem to face? I think so.
In my opinion, the table of contents is a good starting point for describing how the topic is treated.
Chapter 1: Processing Big Data Using Hadoop and MapReduce
Chapter 2: Understanding Apache Solr
Before presenting the solution, the author describes the main components (Apache Hadoop and Apache Solr). If you are a beginner, these chapters are probably not enough to introduce you to the two technologies, but if you are an intermediate or expert reader, they are two introductory chapters that lay the foundation. The description is very practical, with Java examples and source code (in the typical style of most Packt Publishing books).
Chapter 3: Enabling Distributed Search using Apache Solr
We are talking about Big Data, so the Apache Solr engine has to scale in both performance and data volume. Since it works alongside Apache Hadoop, Solr must be able to run in a distributed environment. This chapter covers the topic and includes an interesting addition: the use of MongoDB as a NoSQL database to accommodate data with varying data models. I personally find this solution very attractive, don’t you?
Chapter 4: Big Data Search Using Hadoop and Its Ecosystem
Chapter 5: Scaling Search Performance
Now that the right architecture for the goal has been defined, it’s time to understand how to search the information. Search capabilities are described in terms of architecture, commands, and practical examples. The brief description of advanced analytics with Solr and R is also very interesting. In the fifth and final chapter, the author looks at how to optimize a running big data search instance.
I enjoyed reading this book, especially because content retrieval (particularly on semi-structured data) together with Big Data is one of my personal goals. Of course, this is not a book for dummies, and I suggest that neophytes first read some of the many introductory books on these topics.