GenAI is now a household phenomenon. Large Language Models (LLM) models interpret complex datasets to discover patterns. But genAI does not find use in business applications that depend on real-time proprietary data feeds. Integrating genAI capabilities with live business data is still a challenge.
The Importance of Streaming Data
Today, the bulk of relevant enterprise data is streaming data. This includes data generated by POS, social media feeds, sensors, IoT devices, and more. Such data flows in real-time and changes constantly. Business systems derive insights from such streams.
The Limitations of GenAI Models
Conventional AI is based on predefined models. The data scientists hardcode problem-specific issues into the model. The model performs the task at hand but nothing else. The developer has to engineer a new model for every new problem.
Data scientists build LLMs on huge general data sets and train the algorithms once. The data engineering happens at prompt time instead of model creation time, making the model reusable.
Generative Pre-Trained Transformer (GPT) co-opts the transformer architecture. The underlying neural network can weigh the importance of different input parts and generate outputs for each prompt.
But even GPTs come with limitations. The LLM models are trained on huge datasets. But such datasets are static. It doesn’t not reflect the latest updates.
If the model has no access to the latest enterprise data, the model cannot answer queries related to it. When the data is sparse or ambiguous, the model hallucinates.
Consider an airline deploying AI-powered chatbots for customer support. The ChatGPT bot can answer generic, static questions such as “What is the check-in luggage limit for my flight.” The ChatGPT algorithms can scour for such information from the airline’s website or the OpenWeb.
But when a customer asks the chatbot, “Is my flight on time?” the bot falters. GenAI cannot answer such questions since it lacks access to the customer’s data. Such data remains locked up in enterprise systems and inaccessible to the genAI application.
For the GPT-powered chatbot to work, the airline company has to provide data from their internal data stores to the genAI application.
The Advanced Capabilities of GPT-4o
OpenAI’s latest GPT-4 Omni (GPT-4o) overcomes much of the limitations of the earlier models.
The model is capable of advanced reasoning. A larger context window that supports up to 128,000 tokens allows the model to sustain coherence over a longer time. A big limitation of the first-generation genAI models was the inability to maintain coherence over time.
Support for file uploads means the LLM capabilities are no longer dated or limited to any knowledge cut-off date. Users can analyse any specific data set.
Improved natural language processing allows better interpretation of unstructured data.
Automating data processing and pattern recognition enables faster trend identification. The accuracy improves, and time-intensive tasks speed up. The model comes with an average response time of 320 milliseconds.
The model also introduces voice capabilities into GPT. GPT-4o can understand inputs in any combination of text, audio, and image. It can also generate outputs in any of those forms.
How to Make GPT-4 Understand Business Data
Users can integrate search to make the model understand the latest, real-time, contextual business data.
The user can prepend relevant information to the prompt and instruct GPT to use such information as a prefix to the prompt.
The pre-requisite, however, is creating ready-to-use and ready-to-access information sets. Most businesses face challenges in collecting all information relevant to each customer.
The relevant information depends on different factors that change for each specific situation. Such information often lies scattered across different databases and systems.
The trick is to create:
- Comprehensive Information feeds related to specific customers and
- General business policies relevant to the situation.
Creating Comprehensive Customer Feeds
Using GPT 4-o for business customer-facing applications requires customer feeds that the model can access at any time.
For instance, to answer a customer query regarding a specific flight, the bot would need access to:
- The customer’s identity.
- The flights booked by the customer.
- The details of the aircraft assigned to the flight.
Consolidating such information from multiple sources and getting a unified view of the data is not easy. And LLM models do not allow querying such information instantly.
Enter event streaming.
Event streaming platforms tap into information feeds as soon as new data emerges or the existing data changes.
The platform aggregates raw data from information feeds. It then transforms and filters it into suitable views and saves it for future use. Data scientists can program to connect the stored feeds with GPT prompts.
Stream processing makes it viable to construct a unified view of each customer that is easy to query with low latency. It also unlocks several other use cases. For instance, it can deliver product suggestions based on real-time user behaviour.
Real-time databases or message queues also get the work done. But event streaming platforms offer several USPs to such options. Event streaming platforms can handle high data volumes with low latency. The LLMs can use these platforms to process and respond to a large number of requests. Event streaming platforms also offer robust security.
Robust platforms such as Confluent make the task effortless. Confluent’s connectors, for instance, make it easy to read and aggregate data from isolated systems.
Connecting the Changing Enterprise Knowledge Base to GPT-4
Prepending all information to each prompt works for unique data, such as customer information. But the method is a wasteful approach for generic data sets that remain constant for all situations. The process entails the unnecessary exchange of tokens that inflate the usage bill.
Consider the example of the airline customer service bot. When the bot has to answer a question such as “Is my flight on time,” it needs unique information such as the customer name and the flight booking details. Prepending the information to the search works best to offer the GPT prompt such information. But when the query is generic, such as, “How much baggage can I carry on the flight,” the answer comes from enterprise-level policies. Such information does not change with the customer. But such information can also be dynamic and change fast. It is still important to make sure the GPT application accesses the most relevant and latest real-time information.
A more efficient way to make enterprise-level knowledge available to GPT is through embeddings.
Embeddings transform words or phrases in prompts into dense vectors of numbers. These vectors capture the semantic meaning and relationships between different pieces of text.
OpenAI’s embedding API allows data scientists to calculate the embeddings. When the user submits a piece of text, the embedding comes back as a vector of numbers.
A vector database organises and stores these vectors.
Retrieval Plugin, a proxy layer between ChatGPT and the vector database, provides the glue to make the two talk to each other.
Applications
With GPT-4 paired with streaming data, the model can interpret business-relevant insights. They can track customer sentiment and emerging trends well. Such architecture can enhance insights in many ways.
In today’s fast-paced world, brand reputation can change within hours. Businesses can add social media streams to GPT-4o to understand sentiments. When negative sentiment spikes, the GPT triggers alerts. Business managers to make interventions to address the issues. The business can also understand the sentiments surrounding competitors for comparison.
Analysing transaction data allows businesses to predict customer behaviour and personalise shopping experiences. They can also use such insights to orchestrate targeted and relevant marketing campaigns.
Another emerging area of application is dynamic pricing. Combining GPT-4 with streaming transaction data allows businesses to adjust demand-based pricing.
The Limitations
The above methods enable using genAI applications that solve live business problems.
But as of now, LLMs have a limit to the input that the prompts can accept. Such limits, or context window, include the input data and also all possible outputs. GPT is expanding the context window fast. But in the near term, the limited context window limits the possibilities of prepending relevant information to the feeds.
There is always the challenge of maintaining data quality in the information streams and embeddings. Inaccuracies or biases in data streams make the insights unreliable. Robust validation and fine-tuning of data are crucial to ensure accuracy and reduce the risk of unreliable outputs.
The business also needs to maintain transparency in data usage, catering to the growing trend of Responsible AI.
A bigger worry is the risk of prompt injection attacks. Enterprising hackers can use the same technique of prepending to get GPT to act in malicious ways. The success of the architecture would depend on implementing strong controls against injection.
Side-by-side, the business cannot lose sight of data protection regulations such as GDPR.
State-of-the-art streaming platforms such as Confluent enable businesses to build robust data feeds. These feeds make live business data accessible to GPT models in double-quick time.