Delivering AI Innovation with Data Architecture and Data Lakes

Many enterprises drive innovation through Artificial Intelligence (AI)-based projects. But implementing AI requires upgrading the data architecture: legacy architecture rarely delivers the agility, flexibility, and speed needed for AI-powered innovation. The potency of Machine Learning algorithms depends on the quality of the data they consume. Making business decisions on stale data leads to bias or errors, and machine learning amplifies such errors. Here are five data architecture considerations that ensure data integrity for AI-powered innovation.

1. Invest in cloud-based data platforms

The smooth rollout of AI projects requires infrastructure, such as data lakes and analytical engines. Deploying such infrastructure in-house makes the data architecture complex, and the enormous resources and investment required make it unviable for most enterprises. The cloud offers a solution to both problems.

The cloud enables rapid, cost-effective scale-up of AI tools and capabilities. Cloud service providers offer cutting-edge tools with near-infinite capacity and flexible plans. Adopting these tools makes the enterprise agile and cuts deployment times from weeks to mere minutes.

  • Serverless data platforms enable building and operating data-centric applications at near-infinite scale. Enterprises avoid the hassle of installing and configuring solutions or managing workloads.
  • Containerized data solutions automate the deployment of additional compute power and data storage. This capability enables on-demand scale-up of data platforms.

Smart businesses drive their AI agenda in innovative ways. They may, for instance, combine a cloud data platform with container technology to run microservices, using the stack to offer self-service capabilities, reduce costs, or boost analytics.
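
As a minimal sketch of the idea, the Python service below exposes a narrow, self-service data endpoint of the kind such a containerized microservice might run. The endpoint, the in-memory data, and the report shape are hypothetical stand-ins for calls to a cloud data platform.

```python
# Minimal self-service data microservice, suitable for packaging in a container.
# The in-memory REPORTS dict is a hypothetical stand-in for queries against a
# cloud data platform; swap it for your platform's client library.
from flask import Flask, jsonify, abort

app = Flask(__name__)

REPORTS = {  # sample data only
    "cust-001": {"period": "2024-01", "spend": 1240.5},
    "cust-002": {"period": "2024-01", "spend": 980.0},
}

@app.route("/reports/<customer_id>")
def get_report(customer_id):
    """Expose a single, narrow capability: a per-customer report."""
    record = REPORTS.get(customer_id)
    if record is None:
        abort(404)
    return jsonify(record)

if __name__ == "__main__":
    # In a container, this would typically sit behind a production WSGI server.
    app.run(host="0.0.0.0", port=8080)
```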

Denodo’s automated infrastructure management makes cloud migration easy for enterprises. The tool offers PaaS support for cloud and hybrid environments, and its modern data services layer supports most cloud standards, including OpenAPI, OAuth 2.0, SAML, OData 4, and GraphQL, delivering interoperability with existing cloud systems.

2. Use modular platforms

Preparing an AI proof of concept is easy. The hard part is scaling it to enterprise settings. Most enterprises do not have the talent or the infrastructure needed for the task.

Adopting modular data architecture makes it easier to scale up AI solutions. Modular components offer the flexibility to replace parts of the tech stack as needed. Other components of the data architecture remain unaffected when such change takes place.

Consider the case of a utility-services company that offers customers energy consumption reports. A modular stack that delivers this service might contain:

  • An independent data layer combining commercial databases with open-source components.
  • A proprietary enterprise service bus to sync data with back-end systems.
  • Microservices hosted in containers to run business logic on the data.
  • API-based interfaces and data pipelines to simplify integration between disparate tools and platforms (see the sketch after this list).
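
The Python sketch below illustrates the modular principle for the consumption-report example: the business logic depends only on a small data-layer interface, so the underlying store can be swapped without touching the other modules. The class and field names are illustrative assumptions, not a prescribed design.

```python
# The reporting logic sees only the ConsumptionStore interface, so the data
# layer can be replaced (open-source database, commercial database, cloud
# service) without changing the rest of the stack.
from abc import ABC, abstractmethod
from typing import Dict, List

class ConsumptionStore(ABC):
    """Interface for the independent data layer."""
    @abstractmethod
    def readings(self, customer_id: str) -> List[Dict]: ...

class InMemoryStore(ConsumptionStore):
    """Hypothetical stand-in; a real module might wrap PostgreSQL or a cloud DB."""
    def __init__(self, data: Dict[str, List[Dict]]):
        self._data = data

    def readings(self, customer_id: str) -> List[Dict]:
        return self._data.get(customer_id, [])

def consumption_report(store: ConsumptionStore, customer_id: str) -> Dict:
    """Business logic lives in its own module and only uses the interface."""
    readings = store.readings(customer_id)
    total = sum(r["kwh"] for r in readings)
    return {"customer": customer_id, "total_kwh": total, "readings": len(readings)}

if __name__ == "__main__":
    store = InMemoryStore({"cust-001": [{"kwh": 12.4}, {"kwh": 9.8}]})
    print(consumption_report(store, "cust-001"))
```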

Several analytics solutions simplify building such end-to-end solutions in a modular architecture.

Enterprises may use the Denodo platform to connect to myriad data sources, combine data from disparate sources, and publish it. Data management teams may set secure, selective access to data holdings.

3. Switch to domain-based architecture

Artificial Intelligence works by harvesting data and requires data warehouses and data lakes. But the traditional approach of centralised enterprise warehouses and data lakes is out of favour. Agile enterprises adopt distributed, domain-driven designs that enable decoupled data access: data sets reside with business domains, and APIs and integrators offer access to such data on an as-needed basis. For instance, the accounts team may access data stored by the manufacturing team through an API built for the purpose.

While this approach is simpler and speeds up time-to-market, it also carries risks. Data may become fragmented, the process may become inefficient without proper attention, and IT teams often end up reconciling fragmented data sources. Success depends on:

  • Incentivising domain owners to keep the data analytics-ready.
  • Developing APIs that enable secure, automated access to data at the source, and offering employees access through those APIs (a minimal sketch follows this list).
  • Adopting data virtualization to integrate distributed data assets and organise access to them.
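
To make the accounts-and-manufacturing example concrete, here is a minimal Python sketch of a consumer in one domain reading a data product that another domain owns and exposes through its own API. The URL, endpoint, token handling, and response shape are illustrative assumptions.

```python
# Hypothetical consumer in the accounts domain pulling data owned and served
# by the manufacturing domain. Access is read-only and need-based; the domain
# team remains responsible for keeping the data analytics-ready.
import requests

MANUFACTURING_API = "https://manufacturing.example.internal/api/v1"  # assumed URL

def fetch_production_costs(month: str, token: str) -> list:
    """Fetch one month of production-cost records from the owning domain's API."""
    response = requests.get(
        f"{MANUFACTURING_API}/production-costs",
        params={"month": month},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    costs = fetch_production_costs("2024-01", token="dev-token")  # placeholder token
    print(sum(item["amount"] for item in costs))
```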

Denodo helps enterprises integrate data from disparate databases and silos. The Denodo data virtualization platform enables enterprise-wide data governance, automated multi-cloud infrastructure management, AI-powered smart query acceleration, embedded data preparation for self-service analytics, and a host of other capabilities. It enables enterprises to set up a single source of truth.

4. Set up real-time processing

Tech advances have made real-time data processing viable and cost-effective for enterprises. Enterprises use AI-based real-time processing for predictive analytics, personalized marketing, real-time alerting, and more. These capabilities unlock many business possibilities. For instance, fleet businesses such as Uber make accurate ETA predictions. Insurance companies analyze real-time behavioural data from smart devices and offer custom rates. Streaming companies such as Netflix process user preferences in real time to customize offerings. Platforms such as Graphite or Splunk trigger business actions, such as sending alerts to sales representatives.

Central to such efforts are enabling tools. Data lakes retain granular transactions. APIs extract data from disparate systems to data lakes and integrate insights into front-end applications. Supporting tools dip into the data lakes to offer real-time analytics.
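
As a rough sketch of the alerting pattern described above, the Python snippet below consumes a stream of events and triggers an alert when a threshold is crossed. It assumes a Kafka topic named "transactions" and the kafka-python client; the broker address, message shape, and threshold are illustrative assumptions.

```python
# Sketch of a real-time alerting consumer. The topic name, broker address,
# message fields, and threshold are assumptions for illustration only.
import json
from kafka import KafkaConsumer

ALERT_THRESHOLD = 10_000  # e.g. flag unusually large transactions

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def send_alert(event: dict) -> None:
    """Stand-in for notifying a sales representative or an ops channel."""
    print(f"ALERT: {event['customer']} transaction of {event['amount']}")

for message in consumer:
    record = message.value
    if record.get("amount", 0) > ALERT_THRESHOLD:
        send_alert(record)
```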

Denodo offers in-memory processing and an active data catalogue to speed up data processing. A dynamic query optimizer enables intelligent query execution strategy.

5. Adopt flexible, extensible data schemas

Successful AI rollout requires flexible data models capable of handling fast-paced changes.

Conventional data models are rigid. Most enterprises use pre-defined data models, and it is difficult to add new data elements or data sources to such models. Changes may compromise the integrity of the data.

  • New “schema-light” approaches offer greater flexibility, reduce complexity, and make data management agile.
  • Storing unstructured data becomes easy. Unstructured data makes up an estimated 80% of enterprise data, and most conventional data models leave out the valuable information it contains.
  • Data-point modelling makes data models extensible. Adding or removing data elements becomes easy and disruption-free.
  • NoSQL-based graph databases enable easy scale-up. Developers may use these databases to model data relationships in flexible ways. JavaScript Object Notation (JSON) allows changing database structures without changing business information models (see the sketch after this list).
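
A small Python sketch of the schema-light idea: records are stored as JSON documents, so a new attribute (here, a hypothetical "solar_kwh" field) can appear on newer records without a schema migration, and downstream code tolerates both shapes.

```python
# Older and newer documents coexist; no schema migration is needed when a
# field is added, and consumers default the missing value.
import json

documents = [
    '{"customer": "cust-001", "period": "2023-12", "kwh": 310}',
    '{"customer": "cust-001", "period": "2024-01", "kwh": 295, "solar_kwh": 40}',
]

for raw in documents:
    record = json.loads(raw)
    net = record["kwh"] - record.get("solar_kwh", 0)  # missing field defaults to 0
    print(record["period"], "net consumption:", net)
```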

Denodo’s lifecycle management features automate most data management tasks and help enterprises deploy suitable data models.

Developing a competitive edge through AI-powered innovation rests on a fresh approach to defining, implementing, and integrating data stacks. State-of-the-art platforms such as Denodo become strategic enablers, allowing the enterprise to leverage the cloud, virtualization, and other techniques to these ends.

