Dr. DJ Patil - Former First US Chief Data Scientist

As technology increasingly enables greater and more granular collections of data, that data, in turn, is changing the world. How? Watch the presentation below by Dr. DJ Patil.

It's very difficult to watch that video and not feel inspired and excited to learn about how to leverage the power of data. This book is going to walk you through broad examples of data science techniques and practices beginning with a conceptual background of the technology stack and "best practices" behind data analyses.

The core of the data science discipline comes from applied mathematics, statistics, and computer science. However, data science is very broad. I would divide all "data scientists" into two roles that are not mutually exclusive:

  1. The data science theorist. This person typically comes from one of those reference disciplines (mathematics, statistics, or computer science). This person understands (and improves) the formulas behind the analyses themselves and will push the boundaries of new models and techniques to develop data science theory.

  2. The applied data scientist. This person typically comes from a reference discipline. This means they want to use the analyses developed by data science theorists to solve real-world problems that are relevant to their discipline. Rather than create new data science theory, this person draws from existing theory--often in unexpected and unique ways--to explain "why" and "how" businesses succeed. This includes identifying the best measures/variables needed for that explanation. This person can come from ANY discipline because data analytics are relevant to nearly everything in today's world. For example, sociology might use it to explain and predict relationship patterns. Engineering could use it to create artificial intelligence. Business (our context) may use it to make an organization more efficient and effective. However, applied data scientists often don't need to understand the mathematics and programming to the same degree that data science theorists do. The applied data scientist needs to understand the rules, boundaries, and tradeoffs between various types of analyses. But their true strength and "value-add" is that they understand the real-world problems where data analytics need to be applied. They also understand enough about data analytics to validly apply data solutions and data products to solve those problems.

    The role of applied data scientist has evolved more recently, as technologies have emerged that make it possible to build data products without needing to be a core data scientist.

The purpose of this book is to help you begin to be an applied data scientist. It will teach you to use "drag and drop" tools to understand data and make predictions. You'll notice that there is almost no math in this book whatsoever. And the only coding available in this book is found in a few "supplemental" chapters that are optional depending on what level of "technicality" you want. So, this book is a great place to begin the process of becoming an applied data scientist.

However, if your goal is to become a data science theorist, then this book is also the right place to begin. I've learned from my years of teaching that it's often easier to understand what's inside the "black box" (data science theory in this case) if you first understand how the black box can and should be used to solve practical problems. So if you decide that you want to go deeper into the theory, this book will give you a nice broad overview of the many contexts of use for data science. But you should follow it up with a deeper curriculum in mathematics or statistics. Linear algebra (from mathematics) or regression (from statistics) would be a good "next step" after this course to get deeper into the theory.
