Five Reasons Why Data Science Projects Fail

Data science and machine learning are key drivers of digital transformation. But beyond the hype, deriving business value from data science projects is tough. A study by Dimensional Research found that just 4% of companies succeed in deploying machine learning models to a production environment. Worse, the gap between enterprises gaining value from data science and those struggling to do so is widening!

What causes such a high failure rate, and how can enterprises overcome it?

1. Data Silos

The success of AI and ML projects depends on high-quality data. The biggest impediment to high-quality enterprise data is data silos.

Data silos are insular information systems that cannot interoperate with other enterprise systems. For instance, an enterprise may set up a CMS with one vendor and later set up a field management system with another. Integrating data from these systems could reveal clear client behavior and sales trends. But the systems may use different data protocols and formats and cannot communicate with one another. Most enterprises grow organically, and each growth phase adds new databases. Many employees also set up ad-hoc systems to get things done quickly, bypassing enterprise IT. The net result is data spread across multiple databases in multiple formats. Many of these databases sit in inaccessible silos, not even connected to the Internet full-time.

Data silos cause data fragmentation, which leads to duplicated and stale data. The data fed into the analytics engine is then poor or incomplete, and business decisions based on such flawed results can be counterproductive.

The fix is to connect all data sources to a single integrated pane. This requires migrating databases or setting up API connectors, and it may also require converting data formats for compatibility. The next step is cleansing the data and weeding out duplicates.

But this is easier said than done. Many companies lack the data infrastructure, or do not have the data volumes to justify the investment.

The process usually takes months of effort and active supervision by staff. Because this time and effort come at the cost of immediate operational or customer-facing duties, such projects often stall and fail.
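As a rough illustration of the convert, merge, and de-duplicate steps, the Python sketch below normalizes records from two hypothetical silos (a CMS export with DD/MM/YYYY dates and a field management export with ISO dates) into one schema, then keeps the most recent record per email address. The record layouts, field names, and the "latest record wins" rule are assumptions for illustration; real integrations usually need fuzzier matching than an exact email key.

```python
from datetime import datetime

# Hypothetical records from two siloed systems that use different formats.
cms_records = [
    {"Email": "ANA@EXAMPLE.COM", "Name": "Ana Silva", "Updated": "03/01/2023"},
    {"Email": "raj@example.com", "Name": "Raj Patel", "Updated": "15/02/2023"},
]
field_records = [
    {"email": "ana@example.com", "full_name": "Ana Silva", "last_visit": "2023-03-10"},
    {"email": "lee@example.com", "full_name": "Lee Wong", "last_visit": "2023-01-20"},
]

def from_cms(rec):
    """Map a CMS record (DD/MM/YYYY dates) onto the common schema."""
    return {
        "email": rec["Email"].strip().lower(),
        "name": rec["Name"].strip(),
        "updated": datetime.strptime(rec["Updated"], "%d/%m/%Y").date(),
        "source": "cms",
    }

def from_field(rec):
    """Map a field-management record (ISO dates) onto the common schema."""
    return {
        "email": rec["email"].strip().lower(),
        "name": rec["full_name"].strip(),
        "updated": datetime.strptime(rec["last_visit"], "%Y-%m-%d").date(),
        "source": "field",
    }

def integrate(*normalized_batches):
    """Merge normalized batches, keeping only the newest record per email."""
    merged = {}
    for batch in normalized_batches:
        for rec in batch:
            key = rec["email"]
            if key not in merged or rec["updated"] > merged[key]["updated"]:
                merged[key] = rec
    return sorted(merged.values(), key=lambda r: r["email"])

customers = integrate(
    [from_cms(r) for r in cms_records],
    [from_field(r) for r in field_records],
)
for c in customers:
    print(c["email"], c["updated"], c["source"])
```

Running it prints one line per customer; Ana's two records (one per silo, with differently cased emails) collapse into the newer field record.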

Solutions to pre-empt data silos are: 

  • Understand the data. Not knowing the data leads to the classic “garbage in, garbage out” situation.
  • Unearth poor information-sharing or data-hoarding practices. Integrate the siloed data sets into the enterprise system. Develop policies, checklists, and other protocols to enforce proper data usage. Standardize information-sharing policies across the enterprise. At times, silos serve a purpose, such as data security. Strike the right trade-off between the inefficiency and data-quality costs of a silo and the security it provides.
  • Upgrade the infrastructure. Primitive or aging IT infrastructure often causes silos. Develop production-grade, cloud-based data pipelines.
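Understanding the data can start with something as small as a profiling pass. The sketch below uses hypothetical records and a hypothetical `profile` helper to count missing values per field, duplicated keys, and rows older than an assumed one-year staleness threshold.

```python
from collections import Counter
from datetime import date

def profile(records, key, date_field, as_of, max_age_days=365):
    """Return simple data-quality counts for a list of dict records."""
    fields = sorted({f for r in records for f in r})
    # Treat None and empty strings as missing values.
    missing = {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}
    # Keys that appear more than once point to duplicated entities.
    key_counts = Counter(r.get(key) for r in records)
    duplicates = {k: n for k, n in key_counts.items() if n > 1}
    # Rows whose date is older than the threshold are flagged as stale.
    stale = sum(
        1 for r in records
        if r.get(date_field) is not None
        and (as_of - r[date_field]).days > max_age_days
    )
    return {"rows": len(records), "missing": missing,
            "duplicate_keys": duplicates, "stale_rows": stale}

# Hypothetical extract from a siloed customer table.
records = [
    {"id": 1, "region": "North", "last_order": date(2023, 5, 1)},
    {"id": 2, "region": "", "last_order": date(2020, 1, 15)},
    {"id": 2, "region": "South", "last_order": None},
]
report = profile(records, key="id", date_field="last_order", as_of=date(2023, 6, 30))
print(report)
```

Even this toy extract surfaces one duplicated customer ID, one blank region, one missing date, and one stale row before any model sees the data.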

2. Talent Crunch

There is a severe talent crunch in the data science space. LinkedIn reported a shortage of 150,000 data scientists in 2018, and the problem has since become more acute. In 2020, QuantHub rated “data science/analytics” as the second most difficult skill set to find, after “cybersecurity.”

Skill development has not kept pace with advances in technology. Talent is scarce, even for cash-rich enterprises ready to pay for it. Often, the high HR costs make the project unviable. Many data science projects require large teams, making the job even more challenging.

A qualified data scientist needs proficiency in math, programming, and analytical tools such as Spark, Hadoop, and SQL. They also need machine learning and data visualization skills and expertise in data wrangling. On top of all this, they need to understand the business well. Many enterprises run data science projects with key team members lacking essential skills, and the result is failure.

The solution:

  • Hire individuals with basic skills. Test for aptitude rather than competencies. Offer training in-house.
  • Offer training to upgrade skills, framed as an opportunity for career advancement.
  • Explore freelancing to source talent globally. 
  • Offer inducements to keep existing talent. 

3. Lack of Business Context

An infatuation with data science solutions, or using data science because it is the “in-thing”, is a sure route to failure. At times, data science projects resemble a hammer in search of a nail, producing little more than the obvious. For instance, seasoned business managers could work out segment profitability with back-of-the-envelope calculations. They do not need the k-nearest neighbours algorithm for the task.
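To make the point concrete: segment profitability is a group-and-divide calculation, not a machine learning problem. The transactions below are invented; the sketch simply sums revenue and cost per segment and reports each segment's margin.

```python
from collections import defaultdict

# Hypothetical transactions: (segment, revenue, cost).
transactions = [
    ("Retail", 1200.0, 900.0),
    ("Retail", 800.0, 650.0),
    ("Corporate", 5000.0, 3500.0),
    ("SME", 1500.0, 1600.0),
]

# Accumulate [total revenue, total cost] per segment.
totals = defaultdict(lambda: [0.0, 0.0])
for segment, revenue, cost in transactions:
    totals[segment][0] += revenue
    totals[segment][1] += cost

# Margin = (revenue - cost) / revenue for each segment.
margins = {seg: (rev - cost) / rev for seg, (rev, cost) in totals.items()}
for seg, m in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{seg}: {m:.1%}")
```

With these made-up numbers, Corporate comes out at a 30.0% margin, Retail at 22.5%, and SME at a loss, which is exactly the kind of answer a manager can reach without any model.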

Analytical solutions work best when aligned with the business context. If the project does not deliver efficiency improvements or better products, it is a failure, even if the model itself works well.

 

Most data science projects involve building machine learning models. Bias in the training data often leads to flawed results, and the project fails. Consider the case of a leading bank that attempted to develop an algorithm for loan processing. The bank trained the algorithm on the credit approval memos of all the loan applications processed over the last 10 years. The data scientist did not know that the bank prepared credit approval memos only for loans pre-screened by experienced relationship managers. These loans already had a high chance of approval. The algorithm had no data from rejected loan applications!
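The bank example is a case of selection bias, and a toy simulation shows how it hides the very cases a model must learn. Everything below is invented for illustration: the score range, the 620 approval cut-off, and the 680 pre-screening threshold standing in for the relationship managers' judgement.

```python
def label_summary(rows):
    """Count approved vs rejected outcomes and the lowest score covered."""
    approved = sum(1 for r in rows if r["approved"])
    return {"rows": len(rows), "approved": approved,
            "rejected": len(rows) - approved,
            "min_score": min(r["score"] for r in rows)}

# Hypothetical full applicant pool: scores 500-790, approved at 620 and above.
population = [{"score": s, "approved": s >= 620} for s in range(500, 800, 10)]

# Hypothetical pre-screen: memos exist only where a relationship manager
# judged the applicant strong (score 680 and above).
training_set = [r for r in population if r["score"] >= 680]

print("population:", label_summary(population))
print("training:  ", label_summary(training_set))
```

The population summary shows 18 approvals and 12 rejections, while the training summary shows 12 approvals, zero rejections, and no applicant below 680, so any model fitted to it has never seen what a rejected application looks like.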

The solution:

  • Form a team of data scientists, IT staff, and business users upfront. Be clear about how the project will make a difference.
  • Consider the value-addition the project will deliver. Make sure the project solves a pressing business problem, based on feedback from front-line staff and customers.

4. Difficulties with Model Deployment 

Only 15% of top enterprises have deployed AI capabilities into production. The deployment gap is the top reason the other 85% falter.

The root cause is the gap between the data scientists who develop the models and the teams who implement them at the enterprise level. Most data scientists lack an architectural view of the production pipeline. They build the model first and run into difficulties only when they try to integrate it with enterprise systems.
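One common way to narrow this hand-off gap, offered here as an assumption rather than a prescribed method, is to treat the model as a versioned artifact with an explicit input contract that both the data science and platform teams code against. The sketch below uses a hypothetical `ModelArtifact` class and `predict` entry point; the linear scoring rule, field names, and JSON format are illustrative, and a real pipeline would add a model registry, monitoring, and proper serving infrastructure.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelArtifact:
    """Hypothetical bundle of everything the production side needs."""
    version: str
    features: tuple  # expected input fields, in order
    weights: tuple   # one weight per feature
    bias: float

    def to_json(self):
        return json.dumps({"version": self.version, "features": list(self.features),
                           "weights": list(self.weights), "bias": self.bias})

    @classmethod
    def from_json(cls, text):
        d = json.loads(text)
        return cls(d["version"], tuple(d["features"]), tuple(d["weights"]), d["bias"])

def predict(artifact, payload):
    """Validate a request against the artifact's schema, then score it."""
    missing = [f for f in artifact.features if f not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    score = artifact.bias + sum(w * float(payload[f])
                                for f, w in zip(artifact.features, artifact.weights))
    return {"model_version": artifact.version, "score": score}

# The data science team exports; the platform team loads the same artifact.
exported = ModelArtifact("1.0.0", ("income", "debt"), (0.5, -1.0), 2.0).to_json()
served = ModelArtifact.from_json(exported)
print(predict(served, {"income": 10, "debt": 3}))
```

Because the exported JSON carries the version, the expected fields, and the parameters together, the serving side can reject malformed requests up front (a payload without `debt` raises `ValueError`) instead of failing silently.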

The solution:

  • Adopt a rapid try-fail-repeat process to zero in on a workable model. Automate manual and iterative steps so that new features can be tested and models validated quickly. Automation can shorten the testing workflow from months to days.
  • Undertake white-box modelling. White-box models offer insight into the model's behaviour and the variables that influence it. They make the inner workings of the model transparent and its results easier to interpret. Traditional black-box models are hard to interpret, lack accountability, and cannot scale.
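The two recommendations combine naturally: an automated loop tries candidate features without manual effort, and each candidate is a white-box model whose parameters can be read directly. The sketch below assumes one-variable ordinary least squares as the white-box model and a single train/holdout split; the data, feature names, and `try_features` helper are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Both coefficients are readable."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def mse(a, b, xs, ys):
    """Mean squared error of the line a + b*x on held-out data."""
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def try_features(rows, target, candidates, split):
    """Automated try-fail-repeat: fit one line per candidate, rank by holdout error."""
    train, test = rows[:split], rows[split:]
    results = []
    for feat in candidates:
        a, b = fit_line([r[feat] for r in train], [r[target] for r in train])
        err = mse(a, b, [r[feat] for r in test], [r[target] for r in test])
        results.append((err, feat, a, b))
    return sorted(results)

# Hypothetical data: sales depend on ad_spend; store_age is a distractor.
rows = [{"ad_spend": x, "store_age": (x * 7) % 5, "sales": 10 + 2 * x}
        for x in range(10)]

for err, feat, a, b in try_features(rows, "sales", ["ad_spend", "store_age"], split=7):
    print(f"{feat}: sales = {a:.2f} + {b:.2f} * {feat}  (holdout MSE {err:.2f})")
```

With this toy data, `ad_spend` ranks first with the fitted line sales = 10.00 + 2.00 * ad_spend and zero holdout error, and the coefficients themselves explain each prediction. A real project would likely swap in richer white-box models, such as regularized linear models or shallow decision trees, and cross-validation instead of a single split.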

5. Overlooking Culture

Not everyone in the enterprise will be on board with data initiatives. The C-suite may draw up a strategy, but implementation depends on line managers and rank-and-file employees. Turf wars, comfortable status quos, and fear of uncertainty lead to resistance.

Implementing data science projects without factoring in the culture dooms them. Solutions include:

  • Educate the workforce and sell the project. Convince the workforce of the benefits on offer.
  • Promote internal champions who have credibility. 
  • Help and support everyone involved as they work through the change process.

Data science projects are resource-intensive, and their failures take a heavy toll on the bottom line. Follow the safeguards outlined above to avoid such outcomes. Here is a seven-point cheat sheet for the success of your wider digital transformation initiatives.
