In this review I would like to share another interesting read about Hadoop from Packt Publishing. This time the focus is on an interesting and uncommon topic: Hadoop backup and recovery strategies.
While reading, I particularly liked the description the authors give in the extract below.
“…There is a common misconception that Hadoop protects data loss; therefore, we don’t need to back up the data in the Hadoop cluster. Since Hadoop replicates data three times by default, this sounds like a safe statement; however, it is not 100 percent safe. … Data loss may occur due to various reasons, such as Hadoop being highly susceptible to human errors, corrupted data writes, accidental deletions, rack failures, and many such instances. Any of these reasons are likely to cause data loss.”
In short, this is probably the main reason why you should read this book. 😉
Coming to the content, the book can be divided into three main parts:
- The rationale for and discussion of the backup strategy (chapters 2 and 3).
- How to put the strategy into practice (chapters 4, 5 and 6).
- Additional but no less important topics on monitoring and troubleshooting (chapters 7 and 8).
Before the chapters described above, an introductory chapter covers the basics of Hadoop (chapter 1). This first chapter is not meant to be a detailed description of the technology. In short, my suggestion is not to pick this up as your first book on Hadoop.
The first part, about the backup strategy (chapters 2 and 3), is definitely interesting. In those two chapters you will find some useful questions (and answers) about how to plan a backup/recovery strategy and why you need one. Most of the concepts are shared with other areas (for instance databases or file systems), but here they are focused on Hadoop itself and on the huge amount of data a Hadoop system is going to store.
As a final consideration, this book is certainly an advanced read on Hadoop, but it is worth considering if you plan to use Hadoop in practice, especially in an enterprise environment.