For Big Data Insights, Start Small: Follow the Data Breadcrumbs

May 9, 2019 | by Wolf Ruzicka

This article was originally published in Database Trends and Applications on April 9, 2019.

The data on big data indicates that up to 60% of analytics projects fail or are abandoned, costing companies an average of $12.5 million. That’s not the result we seek from data lakes. Instead, companies are increasingly finding themselves mired in data swamps that are overfilled and too muddy to offer any useful visibility. Or are they?

Many companies and government organizations have jumped on the data bandwagon and are hoping to analyze as much data as possible to discover something that might give them a competitive edge or change their operations. Instead of big insights, they have petabytes of big data. They’ve become compulsive data hoarders while struggling to develop an actionable plan for that data.

The motivation to gain insights from data is good, but the approach isn’t. Big, traditional analytical projects are expensive, take years to implement, and aren’t as effective as the marketing materials would lead you to believe. A survey of more than 500 executives by McKinsey Analytics found that 86% of respondents believed that their organizations were only somewhat effective at meeting the goals they set for their data and analytics initiatives.

A better approach to big data is to start small. Instead of taking a giant bite of your data, you nibble at it, keeping risks and expenses low. Time and again, we have seen some of the most valuable business insights derived from surprisingly small datasets. We call this approach “minimum viable prediction,” or MVP.

MVP is an approach replicated from best practices in software product development. The idea is to focus on delivering a minimum viable prediction as fast as you can and iterate from there.

For example, let’s say you are an ecommerce grocer. Your customer experience culminates in customers adding items to their shopping carts. You know that product affinity changes with each customer’s context, such as the time of day, day of the week, weather, and location. What if you wanted to suggest, in near-real time, items that appeal based on the customer’s specific context? The goal is not generic buying patterns drawn from all Amazon users or the like, but suggestions personalized for an individual customer, using real data about their real environment while they shop virtually.

You might start by gathering broad, historical market basket data. You may discover that, in general, diaper purchases have the highest affinity with beer purchases. The next data breadcrumb might be a customer’s gender data, if available. Another set of data could be weather data by ZIP code. Suddenly, you are starting to see some trends and you’ve barely started! Perhaps it turns out that the product affinity of diapers and beer applies to male customers, appears significantly more often during later hours of the day, and has overwhelming statistical relevance when it is unusually hot. This is your first MVP, and it can be validated immediately against the top and bottom lines.
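To make “product affinity” concrete, here is a minimal sketch of one common way to measure it: lift, the ratio of how often two items are bought together to how often you would expect that by chance. The basket records, the field names (`items`, `hour`, `hot`), and the evening cutoff are all made up for illustration; they are assumptions, not data or definitions from any real grocer.

```python
def lift(baskets, item_a, item_b):
    """Lift = P(a and b) / (P(a) * P(b)). Values above 1 hint at affinity."""
    n = len(baskets)
    if n == 0:
        return None
    count_a = sum(1 for b in baskets if item_a in b["items"])
    count_b = sum(1 for b in baskets if item_b in b["items"])
    count_ab = sum(
        1 for b in baskets if item_a in b["items"] and item_b in b["items"]
    )
    if count_a == 0 or count_b == 0:
        return None  # lift is undefined if either item never appears
    return (count_ab * n) / (count_a * count_b)


# Hypothetical toy baskets, invented purely to show the mechanics.
BASKETS = [
    {"items": {"diapers", "beer"}, "hour": 20, "hot": True},
    {"items": {"diapers", "beer"}, "hour": 21, "hot": True},
    {"items": {"diapers"}, "hour": 9, "hot": False},
    {"items": {"beer"}, "hour": 19, "hot": True},
    {"items": {"milk"}, "hour": 10, "hot": False},
    {"items": {"diapers", "milk"}, "hour": 11, "hot": False},
    {"items": {"beer", "chips"}, "hour": 22, "hot": False},
    {"items": {"milk", "chips"}, "hour": 20, "hot": True},
]

# First breadcrumb: affinity across all baskets.
overall_lift = lift(BASKETS, "diapers", "beer")

# Next breadcrumb: the same question, restricted to hot evenings
# (hour >= 18 is an assumed definition of "later hours").
hot_evenings = [b for b in BASKETS if b["hour"] >= 18 and b["hot"]]
hot_evening_lift = lift(hot_evenings, "diapers", "beer")

print(overall_lift, hot_evening_lift)
```

In this toy data the affinity only shows up once the context is added, which is the point of following one breadcrumb at a time: each new slice either sharpens the prediction or tells you to try a different path.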

If the first dataset was product affinity, customer gender, and weather data, a second tier of data may include a list of brands and their profitability to you. You dig into that data, and simple A/B testing may uncover that regional brands have higher success rates. While you’re probably not going to get rid of the major consumer packaged goods suppliers, you can keep that correlation in mind, and move on to the next dataset. The key point is to establish a methodical process that iteratively examines your top headache … or opportunity.
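For readers who want to see what “simple A/B testing” can look like in practice, the following is a rough sketch of a two-proportion z-test, one standard way to check whether two conversion rates differ by more than chance. The function name and the sample counts are hypothetical, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups need at least one observation")
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no detectable difference
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: suggestions featuring a regional brand (A)
# versus a national brand (B), each shown to 1,000 shoppers.
z_score, p_value = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z_score:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference is unlikely to be noise, but as the article notes, a correlation like this is something to keep in mind while moving on to the next dataset, not a reason to overhaul the supplier list.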

The approach is simple and straightforward, but keeping things small is actually difficult. It’s a mindset that requires organizations to answer one question at a time and follow the data path to a conclusion. Armed with these little insights, you will be tempted to dive deeper.

Here are a few tips to stay the course and embrace MVP:

Collect thoughtfully. Data becomes dated—fast. Unless your questions focus on historical changes, spring-clean your data regularly and eliminate data clutter.

Iterate. Don’t waste time by taking too big a bite. Predictive analytics is a process of trial and error. So limit your focus and iterate, trying new paths until you find one that leads you in the right direction.

Prioritize. List all the problems or questions you would like to solve with data analytics. Rank them by potential business impact and ease of assembling the first set of data. Then pick the highest-ranked problem, and follow the breadcrumbs.

Stay focused. If you hit a dead end, retrace your steps and follow the next set of data breadcrumbs. You could easily assemble another set of data but don’t—it will muddle your quest to answer question number one. The more self-discipline and focus you demonstrate, the easier it is to generate insights and achieve a better outcome. As you continue to iterate, include more data, where and when it’s relevant, and only then.

Refine. Once you’ve answered your number-one problem and have your first MVP, refine it again and again so you have an ongoing stream of actionable, affordable predictions.

The cloud is your friend. Cloud-based pay-as-you-go tools, such as Microsoft’s Power BI, are ideal companions for this approach. Data management, analysis, and visualization tools in out-of-the-box, cloud-based packages make it feasible for organizations to become data savvy, regardless of budget or the number of analysts on staff.
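The “Prioritize” tip above can be sketched in a few lines. This is only one possible scoring scheme: the 1-to-5 scales, the multiplication of impact by ease, and the candidate questions are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    question: str
    impact: int  # potential business impact, 1 (low) to 5 (high)
    ease: int    # ease of assembling the first dataset, 1 (hard) to 5 (easy)

    @property
    def score(self) -> int:
        # Assumed rule: a question must be both valuable and easy to start.
        return self.impact * self.ease


def rank(candidates):
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Hypothetical backlog of questions for the grocer example.
backlog = [
    Candidate("Which brands are most profitable by region?", impact=4, ease=3),
    Candidate("Which items should we suggest by context?", impact=5, ease=4),
    Candidate("Which customers are about to churn?", impact=5, ease=1),
]

ranked = rank(backlog)
for candidate in ranked:
    print(candidate.score, candidate.question)
```

The top-ranked question becomes question number one; the rest of the list waits until that first MVP has been delivered.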

Once you have your first MVP, your project is already more successful than 50% of the big data projects out there. We can attest to seeing results in 2–3 months—rather than 2–3 years. And that’s where the MVP approach excels. By only focusing on the relevant data breadcrumbs and starting small, it becomes much easier to gauge whether you’re on the right track and to correct any mistakes relatively cheaply.

The MVP approach requires a mindset shift inside the organization—one that values agility and accepts trial and error. It’s as much a cultural change as it is a data approach. Initially, the MVP approach’s small scope might feel too small for organizations experiencing a revenue-impacting pain point. The approach was challenged when it was first used in software development too. Have faith that starting small—with limited questions, investment, and time—will produce results faster, less expensively, and without the risk of big-bite data approaches.