A Note about Tools

Many software tools are designed to facilitate data mining and analytics. However, many of them are expensive and complicated to install, configure, administer, and use. Simply put, they're not a good fit for learning the basics of data mining. This book instead uses OpenOffice Calc, a freely available spreadsheet similar to Microsoft Excel, in conjunction with RapidMiner and the R statistical package. These tools are included to help you see and experience a broad set of software options for analyzing data. Keep in mind that we do not attempt to teach all of the capabilities of any of these packages. They're simply good, widely available, and freely accessible tools that you can use to get started in data mining.

All examples in this book are illustrated in a Microsoft Windows environment. With some slight variation in user interface, you can also complete all examples and exercises by running RapidMiner and R on Macintosh or Linux systems. We recommend that you download and install the relevant software packages on your computer now, so that you can work along with the examples in the book. RapidMiner can be downloaded from the RapidMiner website, R from the R Project website, and OpenOffice from the Apache OpenOffice website. See the videos below for a short tutorial on how to download and set up RapidMiner and R.

As with all software, versions change over time, and such changes may cause what you see on your computer to differ from what is shown in this book. The most common observation readers have reported is that their results are slightly different from the screenshots in the book. Most of the time, this is simply because the algorithms implemented in the software have been tuned or improved since the book went to press. Generally, what you see in the book will match what you see on your computer if you complete all of the steps as described in the text.

The development team at RapidMiner is constantly working to improve its data science platform. Readers have a chance to chat with the folks at RapidMiner and to influence the design of their data science products.