
5 Leading Books to Read on Big Data & Data Science

Lillian Pierson

If you’ve been following this 6-part series on big data, you have learned what big data is, how it can benefit your business, and what to look for when hiring big data talent. For the final post in the series, I want to leave you with some recommended reading to help guide you further on your big data journey.


Big Data: A Revolution that will Transform how we Live, Work and Think

By: Viktor Mayer-Schönberger and Kenneth Cukier


Big Data: A Revolution provides a broad overview of big data and its impact on modern society. While it’s by no means a technical book, it offers a good high-level introduction to what big data is and how it’s affecting practices in areas as diverse as fraud detection, international law enforcement, linguistics, and automated language translation. Well suited for business managers, analysts, and even C-level executives, Big Data: A Revolution provides insight and guidance on how industries should move forward in the wake of today’s information revolution.

One of the book’s central premises is the notion of “what, not why”. For example, the book states, “the era of big data challenges the way we live and interact with the world. Most strikingly, society will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what.” Later in the book, the authors return to this point when they write, “Big data is about what, not why. We don’t always need to know the cause of a phenomenon, rather, we can let the data speak for itself,” adding that “knowing what, not why is good enough.” (Excerpts from Big Data: A Revolution that will Transform how we Live, Work and Think.)

Hadoop: The Definitive Guide, 4th Edition

By: Tom White


Hadoop: The Definitive Guide is a big data book aimed at technical audiences. Originally published in 2009, it is now sold in its 4th edition. Praised by developers and data engineers the world over, the book provides how-tos on building and maintaining distributed, parallel-processing data systems with Apache Hadoop (HDFS, MapReduce, and YARN). The 4th edition also covers Hadoop 2 deployment, including technical details that you should know about YARN, HBase, Parquet, Flume, Crunch, Pig, Hive, and Spark. The book also presents interesting case studies from the healthcare industry and from genomic science.
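
For readers who have not met the MapReduce model before, here is a minimal single-machine sketch of the idea in plain Python. It is not taken from the book and does not use Hadoop's actual API; the function names and the sample input are invented for illustration. It only shows the three conceptual steps (map, shuffle, reduce) that a real Hadoop job would spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input, invented for this illustration.
sample_lines = ["big data is big", "data is everywhere"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster, the map and reduce functions would run in parallel on different nodes and the framework would handle the shuffle; here everything runs in one process so the flow is easy to follow.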

Data Smart: Using Data Science to Transform Information into Insight

By: John Foreman


Business professionals love Data Smart. It’s as simple as that! Written especially for data science newcomers, Data Smart gives readers an easy way to grasp the concepts and techniques that underlie data science. The book also provides step-by-step tutorials on how to carry out these techniques in Microsoft Excel. Some data science methods covered in Data Smart include:

  • Cluster analysis (including k-means and k-medians methods)
  • Linear programming for document classification
  • Various forms of linear regression analysis
  • Time series forecasting

Although this book won’t teach you everything you need to know to start deploying large-scale analytics projects, it will teach you the ABCs of data science and some of its core methods.
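
To give a flavor of one method from the list above, here is a rough sketch of k-means clustering. Data Smart works through its examples in Excel, so this Python version is not the book's approach; the data points and starting centroids are made up for illustration, and the stopping rule (stop when the centroids no longer move) is one simple choice among several.

```python
import math


def assign_clusters(points, centroids):
    """Assign each point to the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        for point in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_centroids.append(old)  # leave an empty cluster's centroid in place
            continue
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        updated = update_centroids(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
        labels = assign_clusters(points, centroids)
    return labels, centroids


# Hypothetical 2-D data: two loose groups, invented for this illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
initial_centroids = [(0.0, 0.0), (10.0, 10.0)]

labels, centroids = k_means(points, initial_centroids)
print(labels)
print(centroids)
```

The starting centroids are fixed here so the run is repeatable; in practice they are usually chosen at random, and results can vary between runs.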

Pattern Recognition and Machine Learning

By: Christopher Bishop


Pattern Recognition and Machine Learning is another great book about data science. In contrast to Data Smart, however, it was written for advanced information scientists and statisticians. The book introduces approximate inference algorithms, which are useful for getting fast answers to questions asked of big data sets. Although the book assumes no prior knowledge of pattern recognition or machine learning, readers should be comfortable with calculus and the basics of probability and linear algebra. Engineers and statisticians have praised the book for its readability and comprehensiveness, although critics have voiced frustration with its non-intuitive, math-heavy approach.


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.


About the author

Lillian Pierson

Lillian Pierson, P.E. is a leading expert in the field of big data and data science. She equips working professionals and students with the data skills they need to stay competitive in today's data-driven economy. She is the author of three highly referenced technical books published by Wiley & Sons: Data Science for Dummies (2015), Big Data / Hadoop for Dummies (Dell Special Edition, 2015), and Big Data Automation for Dummies (BMC Special Edition, 2016). Lillian has spent the last decade training and consulting for large technical organizations in the private sector, such as IBM, Dell, and Intel, as well as government organizations, from the U.S. Navy down to the local government level. As the founder of Data-Mania LLC, Lillian offers online and face-to-face training courses, workshops, and other educational materials in the areas of big data, data science, and data analytics.