How many of you have watched the movie Fight Club?  If you haven't, I strongly recommend you do.  It's a movie worth watching, replete with violence, ennui, and a lament for the narcissistic failings of the baby boomers.  Anyway, there's a scene at the beginning of the movie where Edward Norton talks about how crappy his job is.  He has to assess the risk associated with car accidents for an auto manufacturer using actuarial data.  His job is to figure out whether a recall is the right decision for the manufacturer to make – from a financial perspective.  Let's assume that there is a brake problem that will result in a financial liability associated with wrongful death.  Using actuarial tables, he puts a number to the total financial exposure that the company would have.  He then compares this to the costs associated with a recall.  If a brake problem will result in payouts of $50,000 per claim against 1,000 wrongful death claims (a total cost of $50,000,000), and the cost of a recall is $75,000,000, then the manufacturer won't authorize a recall.  Financially, it makes more sense to let people die in brake failure accidents than to correct the problem.
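The recall formula from that scene fits in a few lines of code. Here is a minimal sketch in Python, using the hypothetical figures above (the function name and numbers are illustrations, not real actuarial data):

```python
def recall_is_cheaper(claims, payout_per_claim, recall_cost):
    """Return True when recalling costs less than paying out the claims."""
    expected_liability = claims * payout_per_claim
    return recall_cost < expected_liability

# Hypothetical figures from the scene: 1,000 claims at $50,000 each
# versus a $75,000,000 recall.
print(recall_is_cheaper(claims=1_000, payout_per_claim=50_000,
                        recall_cost=75_000_000))  # False: cheaper to pay the claims
```

The moral problem is that nothing in the formula knows it is counting deaths; it only compares two dollar amounts.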

Hello, weird precursor to Big Data.

This is a small but substantive problem associated with Big Data – a company is given so much information that morality, ethics, and even the rule of law are subsumed by analytical realities.  Big Data allows managers and executives to measure every facet of the customer experience, from marketing to returns.  It arms these decision makers with the power to base policies on data that may or may not be advantageous to the consumer or to society.  Actuarial tables are a type of decision-support analytics over a small subset of data, but data that arms insurance companies with information forcing them to make ethically challenging decisions.  To a person, such a decision is appalling – why let 1,000 people die when the fix is available?  To a computer, the calculus tilts the other direction, because the cost of each life has been valued at $50,000, and the cost of correcting the problem is $25 million more than the potential payout.  I'm not going to say most shareholders of insurance companies have the empathy of a computer, but I've worked in financial services. 'Nuff said.

Ok, so actuarial tables exist for a reason; I get it.  There is a value associated with risk, even if that risk is distasteful to quantify.  But Big Data can be more insidious than cost-analyzing car accidents that kill 1,000 people a year.  Much more insidious.  And much more relevant to you as a consumer.


Let's fast forward ten years into the future.  Big Data is 10,000x more advanced than it is right now.  Companies use it to measure and manage customer satisfaction in every facet of their organizations.  An imaginary company, Qualmart, is reviewing the costs associated with customer service.  Big Data is streaming information from a thousand different sources – cash registers, Twitter, Facebook, loyalty programs, credit cards, employee files, product recalls, and anything else you can imagine.  After correlating data from Facebook, Twitter, and its own returned-merchandise receipts, Qualmart has found that Chinese consumers are 60% more likely than the rest of the population to describe negative experiences on the internet during a product return.  This finding is an algorithmic compendium of data captured from credit card transactions, age/sex/racial data compiled from loyalty programs, and complaints and negative feedback from Facebook posts, tweets, and employee-documented notes from the actual return itself.  As a result, a new corporate policy is issued for the customer service departments of Qualmarts located in the Chinatowns of major cities to ensure that the return process is as quick and painless as possible, even if there is no real reason to honor the return.  Well, as someone who is not from China, I don't think that seems fair.

Let's look at it from a different perspective.  Let's say that this retailer determines that white females under the age of 30 who live in Louisiana indicate negative experiences on the internet with respect to returns only 15% of the time.  According to sales data captured from stores located in Louisiana, the most returned products are baby carriages, medium-sized women's t-shirts that cost less than $15, and women's shoes in sizes 8 and 9.  Customer loyalty cards suggest that these sales are made by young, married white women who make less than $50,000 annually.  Sales to this demographic are 15% of total store sales; however, the costs associated with returns from this demographic exceed 75% of the cost of all returned products.  Since, in this hypothetical, most Louisiana returns come from women under the age of 30, Qualmart issues a corporate policy making the return process in Louisiana more cumbersome.
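A decision rule like that falls straight out of the segment statistics. Here is a sketch of how it might look in code – every name and threshold here is a hypothetical illustration, not a real retail system:

```python
# Hypothetical segment-friction rule: add friction to returns for a segment
# that generates far more return cost than sales AND is unlikely to complain
# publicly about it. The 2x multiplier and 30% threshold are invented.
def flag_segment_for_friction(share_of_sales, share_of_return_costs,
                              negative_post_rate, complaint_threshold=0.30):
    costly = share_of_return_costs > share_of_sales * 2
    quiet = negative_post_rate < complaint_threshold
    return costly and quiet

# The hypothetical Louisiana demographic: 15% of sales, 75% of return costs,
# posts negative feedback only 15% of the time.
print(flag_segment_for_friction(0.15, 0.75, 0.15))  # True: the policy gets issued
```

Notice that the rule never mentions race, age, or gender – it only needs the segment's numbers, which is precisely what makes the resulting policy so easy to rationalize.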

It can get uglier.  Let's take it a step further.  Let's say that our retailer finds that 50% of its returns in Texas are from African-Americans. This information is compared against the internet browsing habits of its shoppers; these customers tend to look at competing products on the website after purchasing the product in question.  The retailer decides that the returns are a matter of "buyer's remorse" – the impulse buy was undermined by the other options available on the website.  Qualmart decides that returns from African-Americans should be challenged.  To ensure compliance with this new directive, it incentivizes its Texas employees: if only 35% of all attempted product returns are ultimately accepted, they will all get $100 bonuses.  This will have a disproportionate effect on African-American consumers in the Texas region.  However, Qualmart will save millions of dollars by instituting this new policy.  And it has data to back up its decision-making.

How about Big Data that has life-or-death repercussions?  Big Data could show that Korean pilots are 35% more likely to experience a crash.  Analyzing personnel data from all crashes originating in the US might reveal that a disproportionate number of crashes can be attributed to pilots born in South Korea.  However, the correlation might have deeper roots; the cause might be something specific to training, the types of aircraft they fly, or cultural barriers.  It could be that Korean pilots fly poorly designed aircraft.  Maybe all of the aircraft shipped to Korea in a specific year came with faulty altimeters or something.


The magic we are looking at is the promise of Big Data – companies are able to predict the behavior of customers even before they exercise the power of their wallet.  If you're buying online, retailers can give you recommendations based upon your previous buying history, or the buying history of people who fit your demographic, so that you are fully satisfied with your purchase.  The insight this provides retailers is almost frightening to consider.  If 75% of people buy one type of widget, only to return it for one or two other styles of widget, what would that suggest?  There is a problem with the first widget, obviously.  Why even present it as an option?  Why not drop the product and offer only the alternatives?  Even better, why not claim it is "no longer available" and offer the top two alternatives?  After all, we can measure and manage the customer experience more precisely than ever before. We can make better predictions and smarter decisions. We can target interventions that surpass the gut instinct of the executives who came before us, guided by data rather than intuition.
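The quietly-retire-the-widget rule above is trivially mechanizable. A minimal sketch, assuming hypothetical names and the 75% swap rate from the example:

```python
# Hypothetical delisting rule: hide a product when most buyers end up
# exchanging it for an alternative. The 75% threshold comes from the
# example above; nothing here is a real retailer's logic.
def should_delist(purchases, exchanges_for_alternatives, threshold=0.75):
    if purchases == 0:
        return False  # no sales data, nothing to conclude
    return exchanges_for_alternatives / purchases >= threshold

# 400 widgets sold, 300 swapped for another style: a 75% swap rate.
print(should_delist(purchases=400, exchanges_for_alternatives=300))  # True
```

Whether the widget then shows as "no longer available" is a one-line change – which is exactly why the decision deserves more scrutiny than one line suggests.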

What we are looking at is cultural clichés validated by empirical data.  Once we overlay Big Data with predictive analytics, the result is (apologies for the pun) predictable.  Customer segmentation is no longer just a function of increasing sales; it becomes a relevant factor in driving policies that discriminate among racial, gender, age, and socio-economic classes.

In America, we have laws that mitigate the effect of relevant data on the employment process; the governing doctrine is called "disparate impact."  In fact, a slew of laws exists to mitigate the effect of data on hiring policies.  For instance, it is illegal to refuse to hire a woman because she is pregnant.  Similarly, you cannot refuse service to someone based upon membership in a protected class.  Even if you have empirical data suggesting that gay urban males are more likely to complain about service on social media than other customer segments, you cannot refuse to sell them a product.

But can you do the opposite? Is it fair (or legal) to offer a protected class advantageous policies, such as reward programs, special discounts and promotions, or other preferential treatment, because of (or despite) their protected status? Should pregnant women get mommy discounts? Should veterans get discounted fares on airlines?  Should gay urban males receive special discounts in order to mitigate the potential effects of negative social media comments? After all, there is a difference between fair and profitable.  If companies were fair, we wouldn't have to worry about jobs being exported to China.  Companies exist to make a profit.  Although it is illegal to discriminate against a protected class, should it be illegal to give preferential treatment to one if it will result in higher profits?

I don't know what the first rule of Big Data is, because there are no real rules for Big Data.  It is an endless sea of information that can be used for the forces of good or the forces of evil.  We need to start thinking about Big Data issues as they relate to how our society functions and is governed: security, privacy, and advantageous or discriminatory policies.  The potential for the abuse of Big Data is enormous, even as it promises a better tomorrow.

Will Our Data Be Used Against Us In The Future?

Jamal is a regular commentator on the Big Data industry. He is an executive and entrepreneur with over 15 years of experience driving strategy for Fortune 500 companies. In addition to technology strategy, his concentrations include digital oil fields, the geo-mechanics of multilateral drilling, well-site operations and completions, integrated workflows, reservoir stimulation, and extraction techniques. He has held leadership positions in Technology, Sales and Marketing, R&D, and M&A in some of the largest corporations in the world. He is currently a senior manager at Wipro where he focuses on emerging technologies.

