
Mastering Generative AI Scaling: Are You the Right CIO?

Reality has started to catch up with the gen AI hype. Enterprises have been launching gen AI pilot projects left, right and centre. But just over one in ten enterprises succeeds in scaling such pilots into workable business models, and only 15% have derived meaningful bottom-line impact from gen AI.

In many enterprises, gen AI risks becoming a solution in search of a problem. Here is how CIOs can build the ability to scale their gen AI pilots as needed and deliver meaningful bottom-line impact.

Set Priorities

A major reason gen AI projects fail to scale is that CIOs spread resources too thin across several initiatives. Prioritisation requires a determined effort to avoid random, unfocused experiments.

Effective prioritisation requires CIOs to work with business leaders to identify the use cases where gen AI can have the greatest impact on business growth or profitability. CIOs should let business leaders make the business choices and focus themselves on the technical implications of those choices.

CIOs should also take the lead in cost-benefit analysis to reconcile potential with viability. Not all gen AI projects cost the same. For instance, a chatbot with live response capabilities needs low latency and will consume substantial resources. A code documentation tool, on the other hand, does not need high responsiveness and hence needs fewer resources. The former, while costing more, may deliver better business value and returns; the latter, while delivering less business value, may be more affordable and easier to implement. Make the trade-off considering ROI, budget, and business utility.

When scaling up multiple pilots, CIOs must create a strategic roadmap. The best practice is to prioritise easy wins and pilots that cause minimal business disruption. Early success makes it easier to roll out more complex projects.

Get Budgeting Right

It isn’t easy to budget for gen AI, as it is a new technology. Costs often spiral out of control due to the sheer scale of gen AI data usage and model interactions.

There is no workaround to a definite budgetary allocation, though. 

To ensure proper budgeting, get a perspective on costs. Scaling up gen AI pilots involves hardware, software, talent, and infrastructure investments. The figures vary depending on model complexity, data volumes, infrastructure, and use cases. As basic costs, gen AI projects need high-performance GPUs, TPUs, or specialised AI accelerators, and these costs are high: a single high-end GPU such as the NVIDIA H100 currently costs over $40,000. The costs of cloud infrastructure for computing, storage, and networking also add up fast. LLM costs, however, have dropped over time and continue to decline, driven by advancements in hardware and increased competition among providers.

Factor in associated costs. Many businesses underestimate the incidental costs associated with gen AI projects. For instance, cleaning and labelling a large dataset can cost millions. Ensuring legal compliance and licensing add to the cost. Hiring and training a skilled workforce is also costly. As a rule of thumb, the enterprise will spend $3 as additional costs for every $1 spent on building gen AI applications.
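The $3-per-$1 rule of thumb above can be turned into a rough planning calculation. The sketch below is a minimal illustration, assuming the article's 3x multiplier; the function name and breakdown categories are hypothetical, and real budgets would itemise data preparation, compliance, and talent separately.

```python
# Hypothetical rule-of-thumb estimator: for every $1 spent building a gen AI
# application, budget roughly $3 in associated costs (data preparation,
# compliance, talent). The 3x multiplier is the article's heuristic.

def estimate_total_budget(build_cost: float, overhead_multiplier: float = 3.0) -> dict:
    """Return a rough budget breakdown for a gen AI project."""
    associated = build_cost * overhead_multiplier
    return {
        "build": build_cost,
        "associated": associated,
        "total": build_cost + associated,
    }

# e.g. a $500k build implies roughly $1.5m of associated costs, $2m in total
budget = estimate_total_budget(500_000)
```

The point of such a back-of-the-envelope model is not precision but expectation-setting: it forces the conversation about incidental costs before the pilot is approved.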

Do not underestimate run costs. For many gen AI projects, the run costs are often greater than the build costs. Deploying and managing gen AI models in production requires robust MLOps infrastructure. Optimal performance depends on ongoing updates and retraining. Maintaining data pipelines also incurs huge costs, and training and running large-scale gen AI models incur huge energy costs as well.

CIOs can adopt several methods to cut costs and make the scale-up financially viable. They could, for instance:

  • Focus on the architecture. Well-architected models reduce computational requirements during training and inference.
  • Consider specific infrastructure management strategies. For instance, preloading embeddings avoids calculation every time, saving processing time and energy. The huge reduction in computational load results in lower hardware costs and energy consumption. But these savings come with a trade-off of increased storage and active memory management costs.
  • Ensure code reusability. Identify common needs or functions across multiple use cases. Build these common elements as reusable assets or modules. Reusable code, tools, and frameworks reduce costs and increase the development speed of gen AI use cases by 30% to 50%.
  • Keep tech sprawl in check. Development teams often set up their environments for each pilot or project. The downside of such an approach is the presence of multiple infrastructures, LLMs, tools, and scaling approaches. The ensuing tech sprawl leads to complexities, higher costs and inefficiencies.
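The embedding-preloading strategy from the list above can be sketched as a simple in-memory cache. This is a minimal illustration, assuming a hypothetical embed() stand-in for a real (and expensive) embedding model call; the trade-off it demonstrates is exactly the one described: compute once up front, pay for storage and memory instead.

```python
# Minimal sketch of preloading embeddings to avoid recomputing them on every
# request. embed() is a deterministic stand-in for a real model call.
import hashlib

def embed(text: str) -> list[float]:
    # Stand-in for an expensive embedding-model call; deterministic so the
    # example is self-contained and repeatable.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:4]]

class EmbeddingCache:
    """Precompute embeddings once at startup, then serve them from memory."""

    def __init__(self, corpus: list[str]):
        # Preload: pay the compute cost once, up front.
        self._store = {text: embed(text) for text in corpus}

    def get(self, text: str) -> list[float]:
        # Cache hit avoids recomputation; a miss falls back to the model.
        if text not in self._store:
            self._store[text] = embed(text)
        return self._store[text]

cache = EmbeddingCache(["refund policy", "shipping times"])
vector = cache.get("refund policy")  # served from memory, no recompute
```

In production the cache would typically live in a vector database rather than a Python dict, which is where the increased storage and memory-management costs mentioned above come in.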

Make Sure the Technology Supports Scale-Up

Effective scale-up cannot happen unless the underlying technology supports it in the first place.

Integrating individual components into a wider, scalable system is a major roadblock to scaling gen AI. The main challenge is orchestrating multiple components, interactions, integrations, and dependencies at scale. Gen AI systems involve many components, which can include machine learning models, deep learning models, and natural language processing. Each component comes with its own complexities and creates a web of dependencies. These components are also interconnected, so changes in one area affect the performance of others.

To resolve these challenges, ensure each use case project can access models, databases, prompt libraries, and applications.

Set up a robust API gateway to route requests to the best-suited models, including suitable third-party models. The gateway also helps to authenticate users, log request-and-response pairs, and bill teams for usage.
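The gateway's routing and logging duties can be sketched in a few lines. This is an illustrative outline, not a production gateway: the model names, task labels, and in-memory log are all hypothetical, and a real deployment would sit in front of actual model endpoints and an authentication service.

```python
# Illustrative sketch of an API gateway's core duties: route each request to
# an appropriate model and log request metadata for billing and audit.
import time

# Hypothetical routing table mapping task types to model endpoints.
MODEL_ROUTES = {
    "chat": "fast-chat-model",   # latency-sensitive use cases
    "docs": "batch-doc-model",   # throughput over responsiveness
}

request_log: list[dict] = []  # stands in for a billing/audit store

def route_request(team: str, task: str, payload: str) -> str:
    # Pick the best-suited model for the task, with a safe fallback.
    model = MODEL_ROUTES.get(task, "default-model")
    response = f"[{model}] processed: {payload}"
    # Log request/response pairs so usage can be billed back to teams.
    request_log.append({"team": team, "model": model, "ts": time.time()})
    return response

reply = route_request("marketing", "chat", "summarise this ticket")
```

The same chokepoint is also where authentication and rate limiting would be enforced, which is why centralising it pays off as the number of models grows.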

Invest in end-to-end automation. Many companies automate only some elements of the workflow. Value comes by automating the entire solution. It could include data cleansing, integration, and pipeline construction. End-to-end automation accelerates time to production and enables efficient use of cloud resources.

Embed testing and validation into the release process for each model. The probabilistic nature of gen AI models means results always carry a risk of inconsistency, which necessitates frequent changes. Robust observability and triaging capabilities make it possible to automate those changes as needed.
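Embedding validation in the release process can be as simple as a gate of automated checks that every candidate output must pass. The sketch below is a minimal illustration under stated assumptions: the generate() stub and the specific checks (non-empty output, length limit, a required business detail) are hypothetical examples, not a standard test suite.

```python
# Minimal sketch of release-time validation for a gen AI model: run candidate
# outputs through automated checks before promoting the model.

def generate(prompt: str) -> str:
    # Stub standing in for a call to the deployed model under test.
    return "Refunds are processed within 14 days."

def validate(output: str) -> list[str]:
    """Return a list of failed checks; an empty list means the output passes."""
    failures = []
    if not output.strip():
        failures.append("empty output")
    if len(output) > 500:
        failures.append("output too long")
    if "14 days" not in output:  # example business-rule check
        failures.append("missing required policy detail")
    return failures

failures = validate(generate("What is the refund window?"))
release_ok = not failures  # gate the release on a clean validation run
```

Because model outputs are probabilistic, such checks are typically run over a batch of prompts rather than a single one, with a pass-rate threshold deciding promotion.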

Keep flexibility in mind when setting up the environment. Adopt standards used by major providers to make it easy to switch providers or models. For instance, KServe (formerly KFServing) is a widely adopted serverless solution for deploying gen AI models, and Terraform is the most popular tool for infrastructure as code. Over-engineering for flexibility, however, adds overhead and diminishes returns. The onus is on the CIO and the implementation team to make the optimal trade-off for the enterprise's circumstances.

Create a single source of truth. In large companies, development teams often create different versions of the same information set. This leads to version conflicts and confusion. Making each team access and update information from a central source pre-empts these issues.

Have strict governance structures in place to ensure consistency and integrity.

Ensure implementation of effective risk protocols. Map potential risks associated with each use and devise mitigating strategies.

Involve Everyone

Many enterprises make the mistake of treating gen AI as a technology program rather than a broad business priority.

Gen AI can make an impact only with broad-based support from the rank-and-file workforce. If gen AI remains a predominantly tech function, with CIOs dictating how the enterprise should go about adoption, it will fail.

Success depends on approaching gen AI as a team sport, with the CIO as the head coach. Establish cross-functional teams to develop protocols and standards that support scale. These teams can take the form of centres of excellence that prioritise use cases, or strategic delivery units that ease implementation.

The onus is on the CIO to manage change, especially resistance to change. They need to:

  • Adapt the scaled-up model to fit the company’s internal knowledge base and workflows to reduce friction on new system adoption.
  • Communicate the benefits of the new models and assuage any misgivings.
  • Incentivise the adoption of the new models.

The hype around gen AI continues, but the honeymoon period is over. Only enterprises with a proactive CIO who can scale gen AI pilots into workable projects will capture the benefits of gen AI, and those enterprises will enjoy significant competitive advantages and strategic dominance.
