Analytics Pitfalls: A Checklist
Gartner predicts that through 2017, 60% of big data projects will fail to go beyond piloting and
experimentation and will ultimately be abandoned. In this article, we’ll outline the pitfalls that account
for this low success rate and suggest ways you can address them.
There are several internal challenges that impede success: data-related, team-related, and even at the executive level.
- Disparate Data Sources and Data Silos – The typical enterprise that we encounter has many
internal databases, sometimes even in the thousands, that don’t talk to each other. They also lack a robust
API layer that enables programmatic access to data. Instead, they steer towards data warehouses, where big
data is collected and stored in one place before it’s used for analysis. Data warehouses have their place
but are often overrated. Costly and time-consuming to implement, data warehouses also add to your data glut
and introduce delays into the analytics process.
- Data Warehouses Are Not the Only Option – Data warehouses can also increase your storage
costs and make the data analytics process more laborious. Each time you need access to the data, the
warehouse extracts data into data cubes, creating duplicate data (which can quickly grow your data
surplus), sometimes referred to as data explosion. Another drawback of the warehouse approach is that the
data-cube data isn’t updated automatically, potentially leaving you with stale data that creates noise in
your system and makes the data cleaning process more protracted.
- Dirty Data – Dirty or inconsistent data is also a prevalent issue. Empty data fields,
misspellings, and so on, can mess up your insights. Data inconsistencies also create a problem. For example,
a company’s marketing system may define the northeast U.S. in a way that includes Washington, D.C. However,
the sales team categorizes D.C. as the Mid-Atlantic. When these datasets are merged, you have a problem. You
can buy tools that can pull data out of disparate systems, but these typically don’t flag data conflicts.
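To illustrate, here is a minimal Python sketch of flagging such definition conflicts before a merge. The region mappings below are hypothetical examples, not taken from any real system:

```python
# Hypothetical region definitions from two internal systems.
marketing_regions = {"NY": "Northeast", "MA": "Northeast", "DC": "Northeast"}
sales_regions = {"NY": "Northeast", "MA": "Northeast", "DC": "Mid-Atlantic"}

def find_region_conflicts(a, b):
    """Return states that the two systems assign to different regions."""
    return {state: (a[state], b[state])
            for state in a.keys() & b.keys()
            if a[state] != b[state]}

# Surfacing conflicts before the merge avoids silently mixing definitions.
conflicts = find_region_conflicts(marketing_regions, sales_regions)
```

A check this simple, run before every merge, turns a silent inconsistency into an explicit decision for the two teams to resolve.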
- Need for an Analytics Roadmap – Data sprawl is a huge problem: users generate data at
alarming rates, and without a plan things can quickly get out of control. An analytics roadmap keeps the effort focused and prioritized.
- Internal vs. External Expertise – Finding the experience to manage data analytics projects
in-house is tough. Any team embarking on data mining, predictive analytics, and predictive modeling
must be well versed in math and statistics. External experts bring valuable know-how and a better
understanding of what’s needed; generating that kind of know-how internally is typically beyond the reach of most organizations.
- Fear and Fiefdom – Data fiefdom can also get in the way of success. Data owners don’t want
to lose their relevance or control of the data they generate/own. Sharing their data sets with other teams
or external consultants may expose their mistakes or even hinder their ability to do their jobs.
Leadership Team Challenges
- Old-School Mindset – The C-suite and managerial team also have a role to play in why
analytics projects fail. Many business leaders trust the old way of doing things. Instead of basing their
decisions on actionable results, they want to review each bar of the “green bar report”, creating delays.
- Lack of Continuous Involvement – Executives are rarely involved throughout the analytics
process. They may attend the kick-off meeting but fail to re-engage until several months down the
line, when they find that the outcomes are not what they wanted or that their priorities have changed.
Meanwhile, the analytics team was kept in the dark. This approach is the antithesis of agile.
External challenges also contribute to why analytics projects fail. Consider the following.
- The Big Bang Approach – Our viewpoints are often shaped by external influencers, and it’s
challenging to go against the flow. When everyone says you need the big-bang, warehouse-centric,
multi-million-dollar approach to data analytics, other options don’t even get considered.
- Pretty Visualizations vs. Actionable Insights – Many vendors impress you with
pretty visuals that look great but aren’t actionable; sexy wins over substance. This is compounded by a
lack of leadership involvement throughout the process. Without continuous involvement, business leaders may
not understand what’s presented to them or realize that the outcome or product isn’t what they really
needed. It also means that mistakes aren’t detected along the way. With an agile
approach, by contrast, stakeholders are involved throughout the project and mistakes can be detected and
remediated in real time. A pretty visualization is nice to have, but not a must-have: it’s merely a means to an end, actionable insights.
Common Misconceptions About Predictive Analytics
Analytics is a one-time effort.
False. You can’t just set everything up and reap the benefits. Done right, analytics involves a continuous feedback loop – it never stops.
It’s a black box effort.
False. The project must happen transparently, in the open, with immediate, disciplined, regular feedback.
Actionable insights require big data.
Actionable insights don’t require big data. Some of the most valuable business insights are derived from surprisingly small data sets (it also costs less and minimizes risk to start small). Instead of focusing on big data, we recommend focusing on the right data at the right time and finding a way to ask the right questions of that data.
There must be a tool for this.
Unfortunately, not. There’s no such thing as a predictive analytics tool that you can install, press a button and marvel at your insights. Instead, many enterprises invest in more than one tool (some on-premise, some in the cloud), each of which requires customization. And not all these tools will be future-ready and may need to be switched out with time. As your business needs change and digital transformation reaches a new step on the maturity ladder, your choice of tools will also change.
My data will give me 100% certainty.
False. There is no 100% certainty. In many cases, predictions with a 30–40% margin of error may be good enough.
Actionable insights are essential.
Not always. Sometimes the goal is to receive no actionable data from your efforts. For example, if you’re using predictive analytics to alert you to potential equipment failure in a medical environment, no alerts means there’s nothing to report and all is well with the equipment.
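As a rough sketch of this alert-only pattern, consider the Python check below, where silence is the desired outcome. The sensor readings and vibration threshold are invented for illustration, not taken from any real device:

```python
VIBRATION_LIMIT = 0.8  # hypothetical tolerance for this equipment

def check_equipment(readings, limit=VIBRATION_LIMIT):
    """Return an alert per out-of-range reading; an empty list means all is well."""
    return [f"sensor {i}: reading {v} exceeds {limit}"
            for i, v in enumerate(readings) if v > limit]

healthy = check_equipment([0.2, 0.3, 0.25])  # no alerts: nothing to report
failing = check_equipment([0.2, 0.9])        # one actionable alert
```

An empty result is not a failed analysis; it is the system reporting that everything is within tolerance.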
New, Nimbler Approaches to Analytics
Though no single approach avoids every pitfall mentioned above, there are methods that help to avoid at
least some of them. We’ve introduced several best practices into our client engagements that break down these
barriers and produce rapid, iterative, actionable insights and give management what they need without alienating
data owners or breaking the bank.
1. A Software Development Approach
This is a novel way of looking at things, but data analytics projects have many parallels with software product
development. Instead of delivering a piece of software, you’re delivering a data product and product/software
development best practices still apply. Yet, most data analytics consultants and external vendors don’t have
that product mentality. They prefer to focus on maximizing billable hours. This prevents the transfer of
knowledge and intellectual property to the product owner who is going to run the tool, aka the customer.
Just as a proof of concept (POC) in software development must prove the unknown, the POC in an analytics project
uncovers any impediments to delivering value in the fastest possible way. Once the POC is
validated, you move on to your first minimal viable prediction (MVP).
An MVP approach is all about disregarding the noise and assembling only the data that correlates with your number
one problem, as fast as you can, and iterating from there. While your executives or business sponsor doesn’t
need to be involved all the time, be sure to schedule periodic, short feedback loops. Make it your goal to
deliver minuscule pieces of fast progress. This will ensure they see immediate, incremental results, get exactly
what they need, and drive greater engagement.
The MVP approach goes against the grain of an all-in, high-risk, long-haul data analytics practice. It also keeps
you focused and helps you avoid many of the pitfalls above. With this kind of discipline, you’ll find it easier
to say no to frivolous feature requests, charts, graphs, etc. because you need something actionable ASAP.
Adopting an iterative approach, rather than diving into your big data all at once, can pay dividends and put
AI-driven decision-making quickly within your reach. This sounds straightforward, but it can be tempting to jump
into the data lake in front
of you instead of just following small, relevant, iteratively assembled data breadcrumbs.
2. API-in-a-Box Approach
Unfortunately, a software development approach to analytics doesn’t solve the problem of disparate systems. You
may have the right method, but if your systems don’t talk to each other, getting to the data is challenging.
Data ownership concerns, fear and fiefdom among them, often make sharing data difficult. While you can counteract
fiefdom through evangelism (it behooves your organization to prove how sharing data creates value across the
board), we’ve come up with a concept to aid in this effort. We call it “API-in-a-Box.” API-in-a-Box has another
benefit: it allows systems to quickly talk to each other without a costly and time-consuming
traditional system integration.
API-in-a-Box breaks down data silos without data owners fearing they’ll lose control of their data. By packing all
the relevant technology into a container and giving each department API access to the data that’s relevant to them (or
that the data owner feels comfortable sharing), silos are easily overcome. An API-in-a-Box can be
spun up in days, eliminating the time-consuming data integration problem. Plus, once data errors are found and
one department’s data is merged with another’s, actionable insights start to emerge and the barriers of fear and
fiefdom start to break down.
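A bare-bones sketch of the idea in Python, using only the standard library: the data owner whitelists which fields may be shared, and the API only ever serves that projection. The field names and records here are hypothetical:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical: the only fields the data owner has agreed to expose.
SHARED_FIELDS = {"region", "quarter", "revenue"}

RECORDS = [
    {"region": "Northeast", "quarter": "Q1", "revenue": 1200, "rep_name": "internal"},
    {"region": "Mid-Atlantic", "quarter": "Q1", "revenue": 950, "rep_name": "internal"},
]

def shared_view(record):
    """Project a full internal record down to owner-approved fields only."""
    return {k: v for k, v in record.items() if k in SHARED_FIELDS}

class SharedDataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps([shared_view(r) for r in RECORDS]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# Packaged into a container, this could be served with:
# HTTPServer(("", 8080), SharedDataHandler).serve_forever()
```

Because the projection happens inside the owner’s container, the owner keeps control: other teams see only the approved fields, never the raw records.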
3. Internal vs. External Data Sources Concept
We also recommend moving away from the internal-vs.-external data source distinction. For your analytics project, it
shouldn’t really matter where your data comes from. Use as many relevant data sources as you need and as few as
you can, and try to create and maintain as little data as necessary. If someone is already collecting, cleaning,
and making that data available, why replicate their efforts? Data can be readily leased or
purchased. Weather data, for example, is available for purchase and is much cheaper than collecting your own.
Some companies generate so much data that it even creates new business opportunities.
Externally collected data can even have monetization value. One client assembles sports data from national
leagues, packages it into an API, and licenses it. All the data is created and maintained by the leagues;
the company simply utilizes that data and provides it, for a fee, to others who may not have the resources to
assemble it themselves.
New Technologies Enabling these New Approaches
Getting started on any data analytics project can take several months – all that data gathering, cleaning,
structuring, model building, enhancing, and reviewing takes time. However, emerging technology is enabling
nimbler approaches. Cloud services, for example, have revolutionized analytics. Everything you need to crunch
data is available out-of-the-box. Prior to these developments, you’d have to license and install software
yourself, manage your own data center, and deal with non-user-friendly processes. Quite frankly, it was a pain.
Here are just a few examples of the numerous cloud services for analytics:
- Microsoft: Machine Learning, Azure Analysis Services, and Video Indexer.
- Amazon Web Services: Quicksight, Deep Learning AMIs, and Rekognition.
- Google Cloud: Cloud Dataflow, Cloud Speech API, Cloud Machine Learning Engine.
- AWS GovCloud: Provides additional security protections with the same user-friendly tools.
Cloud services are democratizing analytics and putting actionable insights into the hands of IT and non-IT teams
in ways that weren’t previously possible.
Of course, data analytics is more than just best practices and data integration/storage/crunching tools. Data
scientists often need access to huge computational power quickly, on very short notice, and for a short period
of time. Likewise, proximity and access to cognitive services like artificial intelligence (AI) and machine
learning in the cloud (e.g., Microsoft Azure and Amazon Web Services) are needed at unpredictable times.
Technologies such as Kublr enable data scientists to move huge amounts of
data between clouds, the data center, or wherever your data needs to go. This provides unprecedented access to
data and compute power that can scale up and down as quickly as you need it.
What’s Next? Food for Thought as Your Data Analytics Projects Evolve
Hopefully, these best practices and technology insights have provided some food for thought about how you
approach your next data analytics project.
But don’t stop with your first MVP. As your organization’s analytics capabilities mature, you’ll be able to
incrementally feed your engine more data, in terms of both volume and diversity. First you start understanding
simple correlations, then you start getting information, then predictions, and in the next step,
recommendations. As your system matures, it will slowly turn into an AI engine.
One final area to ponder is advancements in AI and how they factor into your BI strategy.
More and more decision-making is being delegated to machines. Whereas in the past machines were relied upon for
alert-only notifications, now they enable real-time, context-aware decisions. For example, AI monitors and
auto-corrects manufacturing processes, helps the C-suite make strategic decisions, and helps marketers determine
which promotions to offer.
AI can seem overwhelming. But just as you approach your next data analytics project by starting small, iterating
constantly, and providing regular feedback to business sponsors, apply the same baby
step approach to AI.
Intrigued? To learn more, please reach out to Raymond Velez at firstname.lastname@example.org.