Definitive Guide to Becoming a Data Scientist

Author: Zacharias Voulgaris, PhD

Publisher: Technics Publications, LLC; First edition (2014)

Book review by: David Haertzen

As someone focused on improving my data science skills and my understanding of data science concepts, I was grateful for the opportunity to read and review this informative book. Dr. Voulgaris's experience is inspiring for those of us who want to expand our skills in the data science field.

The book comprises 18 chapters organized into five parts. First, it introduces data science topics and data science careers:

  • Data Science - an approach to dealing with the modern challenges of big data and data analysis
  • Big Data - a type of data that cannot be readily managed with traditional relational database software due to its high volume, rapid velocity and extreme variety of formats (structured, unstructured and semi-structured)
  • Data Scientist Careers - a categorization of data scientists into the roles of data developers, data researchers, data creatives, data business people and mixed roles
  • Data Scientist Thinking - the mindset of an effective data scientist, including curiosity, experimentation, planning, research, attention to detail and rapid learning
  • Technical Qualifications - a set of data-science-specific skills that includes computer programming, quantitative tool use, database access, data visualization and big data manipulation

Second, the book explains some of the critical skills a data scientist needs. A data scientist is typically experienced with multiple development platforms, which may include Python, R, SAS, MATLAB, Java and C++. In addition, big data analytics platforms such as Hadoop and Spark belong in the data scientist's toolkit. Do not forget Microsoft Excel, which is widely used and has been extended with data add-ins such as Power Query and Power Pivot. The book also covers machine learning and the R platform in depth.

Third, the author explains the data science process by walking through the steps of a typical data science effort: data preparation, data exploration, data representation, data discovery, data learning, data visualization and the creation of data products. I suggest that readers complement this approach by studying the CRISP-DM data mining methodology, which begins with business understanding: defining the business opportunity and the problem to be solved.

Fourth, the book helps the reader prepare for the transition to a job in the data science field. It details the specific skills required and shows how to acquire them. It then explains how to find data science job opportunities and offers recommendations for job interviews. Freelance data science is presented as an alternative career path. This advice is very helpful to the up-and-coming data scientist.

Finally, the book provides extensive appendices. The glossary explains data science terminology, while lists of useful websites and data science articles enable the reader to continue learning. Throughout the book, the reader receives practical advice on learning data science, such as downloading software and finding sample problems.

I give high marks to this book, especially for its helpful recommendations on expanding data science skills. The book offers a path toward learning data science and advancing a data science career. Readers will benefit from following up on and implementing its recommendations for further study and experience. The Definitive Guide to Becoming a Data Scientist by Dr. Zacharias Voulgaris will be a valuable addition to any aspiring data scientist's bookshelf.