Context and Perspective

Consider some of the activities you've been involved with in the past week or so. Have you purchased groceries or gasoline? Withdrawn cash at an ATM? Attended a concert, movie, or other public event? Perhaps you went out to eat at a restaurant, stopped by your local post office to mail a package, made a purchase online, or placed a phone call to a utility company. Every day, our lives are filled with interactions—encounters with companies, other individuals, the government, and various other organizations.

In today's technology-driven society, many of those encounters involve the transfer of information electronically. That information is recorded and passed across networks in order to complete financial transactions, reassign ownership or responsibility, and enable delivery of goods and services. Think about the amount of data collected each time even one of these activities occurs.

Take the grocery store, for example. If you take items off the shelf, those items will have to be replenished for future shoppers—perhaps even for yourself. After all, you'll need to make similar purchases again when that case of cereal runs out in a few weeks or when the bottle of milk runs dry. The grocery store must constantly replenish its supply of inventory, keeping the items people want to buy in stock while maintaining freshness in the products it sells. It makes sense that large databases are running behind the scenes, recording data about what people buy and how much of it as shoppers check out and pay for their groceries. All of that data must be recorded and then reported to someone whose job it is to reorder items for the store's inventory.

However, in the world of data mining, simply keeping inventory up to date is only the beginning. Does your grocery store require (or at least incentivize) you to carry a frequent shopper card or similar device, which, when scanned at checkout time, gives you the best price on each item you're buying? If so, the store can keep track of not only storewide purchasing trends but individual purchasing trends as well. The store can then market to you directly, sending mailers with coupons for the products you purchase most frequently and suggesting companion items that other shoppers tend to buy along with those same products.

Now let's take it one step further. Consider the types of information one provides when filling out the form to receive a frequent shopper card. You would probably have to indicate your address, date of birth (or at least birth year), gender, perhaps the size of your family, annual household income range, or other such information. Think about the possibilities now open to your grocery store as it analyzes the vast amount of data it collects at the cash register each day:

  • ZIP code data can help the store locate the areas of greatest customer density, perhaps informing its decision about where to build its next store.

  • Using demographic information, the store may be able to tailor marketing displays or promotions to the preferences of its specific customers. The store can avoid mailing coupons for baby food to elderly customers, or proactively provide discounts on pet items to customers whose previous buying behavior suggests they own a dog.

  • Coupling receipt time stamps with loyalty card data can even help stores know what time of day or what day of the week to promote certain items.
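
To make two of the ideas above concrete, here is a minimal sketch in Python of how purchases might be tallied by ZIP code and by hour of the day. The records, field layout, and function names are all hypothetical and invented for illustration; a real store's data would be far larger and organized differently.

```python
from collections import Counter
from datetime import datetime

# Hypothetical checkout records (made up for illustration):
# (ZIP code from the loyalty card, receipt time stamp, item purchased)
transactions = [
    ("30301", "2024-03-04 08:15", "coffee"),
    ("30301", "2024-03-04 17:40", "dog food"),
    ("30305", "2024-03-05 08:05", "coffee"),
    ("30301", "2024-03-09 11:30", "cereal"),
    ("30309", "2024-03-09 08:50", "coffee"),
]


def purchases_by_zip(records):
    """Count how many purchases came from each ZIP code."""
    return Counter(zip_code for zip_code, _, _ in records)


def busiest_hour_for_item(records, item):
    """Return the hour of day (0-23) in which the item sold most often,
    or None if the item never appears in the records."""
    hours = Counter(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        for _, stamp, name in records
        if name == item
    )
    return hours.most_common(1)[0][0] if hours else None


# Which ZIP code accounts for the most purchases?
print(purchases_by_zip(transactions).most_common(1))

# At what hour of the day does coffee sell most often?
print(busiest_hour_for_item(transactions, "coffee"))
```

Even a simple tally like this hints at how quickly loyalty card data, once combined with receipts, begins to answer questions about where customers live and when they shop.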

These are only a few of the many examples of potential uses for data mining. Perhaps as you read through this introduction, some other potential uses for data mining came to your mind. You may have also wondered how ethical some of these applications might be. This text has been designed to help you understand not only the possibilities brought about through data mining but also the techniques involved in making those possibilities a reality, while accepting the responsibility that accompanies the collection and use of such vast amounts of personal information.