Artificial Intelligence (AI) is fast becoming ubiquitous. Enterprises of all sizes rely on AI to make informed business decisions and to automate processes.
But AI is only as good as the data fed into it. The bulk of AI applications depend on machine-learning algorithms, which derive trends and insights from historical and live data. While the world produces about 2.5 quintillion bytes of digital data every day, roughly 75% of enterprise data is unstructured and, as such, unsuitable for training machine-learning models. The remaining 25% faces its own issues: it may be trapped in silos, carry built-in bias, lack identification, or simply be wrong.
This IT blog explains how to make your enterprise data ready for AI.
1. Build an AI-ready Data Architecture
Revamp the enterprise data architecture. Draw up rules, policies, and models to:
- Identify the type and nature of the data. Define the use, management, and integration of each data type across the enterprise. This ensures that AI draws on relevant data.
- Design data storage locations. The cloud is ideal for storing data required for analytics. But some data will have to remain isolated for security, strategic, or compliance needs.
- Store data by data type or fact. For each piece of data, identify its use cases, its owner, and who makes decisions about it (a catalogue entry along these lines is sketched at the end of this section).
- Apply “one fact in one place” and a “single source of truth” to cut redundancy and reduce data storage costs. Develop a catalogue of services, eliminating redundant tools and processes.
- Make the data accessible. There is no value in having huge swathes of data if it is not accessible when required. Data silos and AI analytics do not go hand-in-hand. Even the best machine-learning model falters when it relies on sparse or static data from remote, unreliable sources. Link data across locations using secure connectors. Create points of integration across enterprise applications.
- Develop a robust system to access data for model training and scoring. AI requires access to source data at scoring time; if that data is difficult or expensive to integrate and transform at runtime, the AI implementation falters.
- Ensure automatic cleansing and rationalisation of data for real-time analytics, as sketched below.
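To make that last point concrete, here is a minimal sketch of an automated cleansing step using pandas. The column names and rules are hypothetical, not a prescription:

```python
import pandas as pd

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic, repeatable cleansing rules before analytics."""
    df = df.drop_duplicates()                    # enforce "one fact in one place"
    df = df.dropna(subset=["customer_id"])       # discard rows missing the key
    df["region"] = df["region"].str.strip().str.upper()          # normalise categories
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # coerce bad numbers to NaN
    return df
```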
Well-designed data architecture simplifies data and streamlines its flow across the enterprise.
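As promised above, here is a rough sketch of what a data-catalogue entry might capture, tying together the type, storage location, ownership, and use cases of each data set. The fields and values are assumptions for illustration, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One record in an enterprise data catalogue (illustrative fields only)."""
    name: str                  # e.g. "customer_transactions"
    data_type: str             # structured, semi-structured, or unstructured
    location: str              # cloud bucket, on-prem database, isolated store
    owner: str                 # who owns the data
    steward: str               # who decides how it may be used
    use_cases: list[str] = field(default_factory=list)

entry = CatalogEntry(
    name="customer_transactions",
    data_type="structured",
    location="s3://analytics-lake/transactions/",   # hypothetical bucket
    owner="finance",
    steward="data-governance",
    use_cases=["churn model", "fraud scoring"],
)
```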
2. Shift the Enterprise Data to the Cloud
Big Data analytics and AI involve grappling with zettabytes of data. The cloud is the only viable repository for such data.
Cloud providers offer AI services, such as Amazon Textract, Amazon EMR, Azure AI, and Google BigQuery ML, to analyse video, text, or image files. These services offer simple point-and-click interfaces for putting unstructured data to work for AI. The alternative is to hire expensive data scientists.
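The programmatic interfaces are similarly thin. As one hedged example, the sketch below extracts text lines from an image with Amazon Textract via boto3, assuming AWS credentials are already configured and using a hypothetical file name:

```python
import boto3

client = boto3.client("textract")

with open("scanned_invoice.png", "rb") as f:     # hypothetical document
    response = client.detect_document_text(Document={"Bytes": f.read()})

# Keep only LINE blocks, i.e. the recognised lines of text
lines = [block["Text"] for block in response["Blocks"]
         if block["BlockType"] == "LINE"]
print("\n".join(lines))
```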
Most cloud providers use the object store format. The format is scalable, non-hierarchical, and easily accessible. Users access enterprise data directly, without navigating a folder structure or directory tree. Object metadata offers in-depth information on the data and supports better insights.
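For instance, object stores such as Amazon S3 let you attach user-defined metadata when writing an object and read it back without downloading the data. A minimal sketch, with a hypothetical bucket, key, and local file:

```python
import boto3

s3 = boto3.client("s3")

# Write an object with user-defined metadata attached
with open("q1.parquet", "rb") as f:              # hypothetical local file
    s3.put_object(
        Bucket="enterprise-data-lake",           # hypothetical bucket
        Key="sales/2024/q1.parquet",
        Body=f.read(),
        Metadata={"source": "crm", "owner": "sales-ops", "pii": "false"},
    )

# Read the metadata back without fetching the object body
head = s3.head_object(Bucket="enterprise-data-lake", Key="sales/2024/q1.parquet")
print(head["Metadata"])                          # {'source': 'crm', ...}
```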
But transferring unstructured enterprise data to the cloud is a challenge.
It is easy to plug in and transfer a file system a few terabytes in size. But large enterprises have millions of files, with data quantities extending to petabytes. Much of this data is spread across multiple silos, separated by vast physical distances, and moving it at that scale is slow: even with a 1 GB/second connection, transmitting 10 PB of data takes roughly four months of continuous transfer.
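The arithmetic behind that figure is straightforward, assuming decimal units (1 GB = 10^9 bytes):

```python
data_bytes = 10 * 10**15                     # 10 PB
rate_bytes_per_sec = 1 * 10**9               # 1 GB/second link
seconds = data_bytes / rate_bytes_per_sec    # 1.0e7 seconds
days = seconds / 86_400                      # ≈ 116 days, i.e. roughly four months
print(f"{days:.0f} days of continuous transfer")
```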
Clusters and frameworks such as Hadoop offer only a limited solution. Amazon Snowmobile physically ships data by semi-truck from the client's location to an AWS data centre; even then, the process takes a few weeks for large data sets.
Storing data in the cloud in the first place solves the problem of using data for analytics, but raises issues in day-to-day use. Remote data centres add latency, and object stores are a poor fit for workloads that expect file-system behaviour. Several services improve performance by keeping the master copy in the cloud and caching files locally.
3. Hire an Effective Team
Even the best policies remain ineffective without a competent team to implement them and manage exceptions.
Build a team of data scientists, project managers, machine-learning engineers, and other specialists. Talent is in short supply in these areas, so enterprises may need to engage independent contractors or train in-house talent. Make sure data scientists stick to disruptive, innovative AI work rather than cleaning enterprise data or fixing routine issues.
4. Align the Enterprise Data with the Required Use Cases
Aligning data for AI will not work without clarity of objectives.
- Determine the outcome variable upfront. Understand the purpose of deploying the AI system.
- Define consistent data standards and apply them across the board. Set up quality measures at the source to enforce the standards. Poor-quality data only speeds up poor decisions, and fixing bad data after it has been stored wastes resources.
- Check that the data is fair and free from bias. AI-powered systems scale and amplify patterns, which has serious negative implications for some use cases. For instance, if a loan-approval algorithm is trained on biased datasets, the bank risks discriminating against applicants, and the AI model will perpetuate that bias (see the sketch after this list).
- Label data accurately to apply the rules. Labelling enterprise data has traditionally been a manual, time-consuming, and repetitive task; automation makes it fast and easy.
- Capture metadata up front, and then keep it accurate and up to date, managing it as a corporate asset.
- Use provisional analytics sandboxes to discover hidden data relationships and anomalies.
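As a hedged illustration of the bias check mentioned above, the sketch below compares loan-approval rates across a protected attribute using the common "four-fifths" heuristic. The data frame and column names are hypothetical:

```python
import pandas as pd

# Hypothetical labelled training data for a loan-approval model
df = pd.DataFrame({
    "gender":   ["F", "F", "F", "M", "M", "M", "M", "F"],
    "approved": [0,    1,   0,   1,   1,   0,   1,   1],
})

rates = df.groupby("gender")["approved"].mean()
ratio = rates.min() / rates.max()

# Four-fifths rule: flag the data set if the worst-off group's approval
# rate falls below 80% of the best-off group's rate.
if ratio < 0.8:
    print(f"Potential bias: approval-rate ratio is {ratio:.2f}")
```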
A large healthcare provider revamped its data platform to process 35 trillion operations in under 20 minutes, gaining over $100 million in operational efficiency every year. AI enables thousands of such use cases for enterprises, but such implementations depend on AI-ready data.