How to avoid common AI pitfalls in the workplace

At a Pizza Hut in Plano, Texas, customers place orders by talking to a voice-enabled AI model. Machine-learning algorithms work out which orders the kitchen should make first. A screen shows AI-synthesised customer feedback from review sites and social-media platforms. Fast-food restaurants tend to have high staff turnover: new joiners here can query a chatbot to see how much of each ingredient ought to go on a medium-size pizza.

The Plano Pizza Hut is a small parable of generative-AI adoption by firms. The technology is making its way into all corners of the workplace. But it still feels incremental, not transformative. AI boosters talk of superintelligence, the end of work and of data centres in space. Here on planet Earth, the technology merely increases the chances of having the right number of pepperoni slices on your next takeaway.

Humble experiments such as these raise important questions for companies trying to use generative AI in the workplace: are the benefits just incremental and, if so, what is holding up progress? Last, what should companies do to make the most of it? All these questions are tackled in the new season of “Boss Class”, our subscriber-only podcast on work and management, released on January 29th. It finds that although AI models are improving rapidly, adoption still takes time. Organisations and employees have to adjust to make the technology work.

On the first question, as to whether AI is worth its salt, boosters will rightly argue that a pizza restaurant is not the best test of the technology. But that is another way of saying that its impact is very unevenly distributed. A recent analysis by Indeed, a jobs site, found that a large majority of skills mentioned in a typical posting for a software-development role could be profoundly affected by AI; most of the listed skills in a typical nursing job are currently beyond the technology.

The firms behind the AI models point to rising volumes of activity and claim these lead to hefty productivity gains. In December OpenAI reported that ChatGPT Enterprise was saving users an average of 40-60 minutes on each day they used it. The newest models are approaching parity with industry experts on many real-world tasks, according to GDPval, an evaluation published by OpenAI in September. Their capabilities are improving all the time.


Chart: The Economist

But many firms are still waiting for the benefits to materialise. A big new survey of executives in America, Australia, Britain and Germany, conducted by researchers from the Federal Reserve Bank of Atlanta, Macquarie University, the Bank of England and the Bundesbank, shows that almost three-quarters of businesses are using AI in some way. Yet 86% of bosses across these four countries report the technology has had no impact on labour productivity over the past three years (see chart).

If this makes for a confusing picture, it echoes the experience of actually using the technology. An AI model can outperform the world’s best mathematicians while still being stumped by the number of “r”s in “strawberry”. Its confidence in asserting things that are completely wrong would make an economist proud. Working with AI involves a mixture of achievement, sycophancy and disappointment. This is a faithful reflection of office life, but not exactly what was promised.

As for the question of why progress is halting, the best answer is that general-purpose technologies, from electricity to the internet, all take time to have their full impact. The era of generative AI is still in its infancy. “It’s like we’re all accountants and Microsoft Excel was invented last weekend,” says Bret Taylor, the chairman of OpenAI and a co-founder of Sierra, a startup that builds customer-service AI agents (tools which act autonomously).

The firms behind the AI models—the likes of OpenAI, Anthropic and others—are all trying to make their products more useful to organisations. Mike Krieger, who works on new products at Anthropic, the firm behind Claude, makes a distinction between models’ horizontal and vertical capabilities. Horizontal capabilities are the kinds of generic activities that are useful to almost all white-collar workers: writing, conducting research, making a PowerPoint slide without becoming homicidal.

Vertical capabilities are harder to get right because they involve specific skills: building a cashflow model in banking, say. The big AI firms are trying to amass more industry expertise by hiring specialists, among other things. But working out what it is that people do all day is hard enough when you sit right next to them, let alone if you’re a software engineer with no experience of the outside world.

A host of AI startups is trying to plug gaps of this sort, but it also takes time for markets to mature. Mr Taylor recalls that in the early days of the internet, firms spent shedloads of money to make their websites work. Now they can get much of what they need off the shelf. In time, he says, the same will be true of AI agents. “I’m hopeful that five years from now, it’ll be a very mature landscape of vendors who sell agents as solutions to problems rather than people selling models and saying, ‘Here’s a bunch of wood, build a house.’”

In other words, companies are still having to make sense of the technology for themselves. And that leads to the third question: how to manage all the problems that generative AI throws up in firms. These problems are behavioural, technical and organisational.

Behavioural problems can affect the average worker and the corner office. Employees are best placed to come up with uses for AI, says Ethan Mollick, a professor at the Wharton School at the University of Pennsylvania. But they also have lots of reasons to avoid AI, or to keep quiet about using it. They might want to take credit for work done by machines, or avoid advertising that they have more free time. Above all, they might not want to signal that their jobs can be done by AI. (“Look, boss, I’m redundant.” “Yes, you are.”)

Wage against the machine

Firms encourage adoption in all sorts of ways. Some offer cash bonuses to employees who automate tasks. Some have dashboards that show how each department uses the technology. Performance reviews can specifically call out AI adoption.

But carrots and sticks of this sort get you only so far if trust is lacking between employees and executives. Being honest about the uncertainty that lies ahead might sound like a bromide, but it is vital. “There are jobs that are going to disappear,” says Nimish Panchmatia, the head of AI at DBS, South-East Asia’s largest bank. “But new jobs are going to get created as well.” The bank runs programmes to help its employees learn new skills that might, for instance, help turn a customer-service agent into a salesperson.

Often the behavioural problem to solve is not apprehension but convenience. Glowforge, a Seattle-based manufacturer of desktop laser-cutting machines, tried a third-party AI sales-coaching tool that emailed summaries of sales calls to its staff. “Every single sales rep had routed it directly into the bin,” says Dan Shapiro, its CEO. “It was too noisy and it didn’t have a place in the rhythm of the team.”

Glowforge has since built its own tool. It, too, automatically listens in to sales calls and emails its views on what went well and badly. But now the AI’s feedback forms part of a weekly discussion between the salesperson and their managers; the expectation that it will be talked about means people pay much more attention to the tool. “You can have a superior product, but if it doesn’t fit into somebody’s workflow, if it doesn’t fit into their day, it’s tough to get adoption,” agrees Cameron Davies, the head of AI for Yum! Brands, the owner of Pizza Hut, Taco Bell and other chains.

Overenthusiasm is another behavioural problem to solve. The unpredictability of AI’s strengths and weaknesses—what Mr Mollick and others have christened the “jagged frontier”—means that it takes time to develop intuition for how to use the technology. Painful lessons are learned in the initial rush to adopt AI. Last year the Australian arm of Deloitte, a consultancy, issued a partial refund to the federal government for writing a report littered with AI-generated errors. This month the West Midlands police force in Britain admitted that a decision to ban Israeli fans from a football match in Birmingham was partly based on an AI hallucination about a match that never took place.

Avoiding horror stories like these also means solving a variety of technical issues. Yet the hidden costs of doing so are easily overlooked, says Rama Ramakrishnan, a former tech executive who now teaches at the Massachusetts Institute of Technology. The first cost is to adapt the model to the specific use case. This means training it on the right data, fine-tuning it and driving hallucinations down. Mr Davies of Yum! Brands says that by drawing on small language models, which are trained on subsets of data and focused on specific tasks, voice-ordering applications at the firm’s restaurants have less scope to hallucinate. “I don’t need the model that you’re ordering a pizza from to be able to tell you about the most famous economist in the world.” (Nice idea, though.)
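
The design choice is easier to see in code. Below is a minimal sketch of the narrow-model idea, in Python; `parse_order` is a hypothetical call to a small, task-specific model and the menu is invented. The point is that the model’s output is validated against a fixed schema, so an off-topic question cannot turn into an order.

```python
# Illustrative sketch of a narrow voice-ordering flow, not Yum!'s system.
# `parse_order` is a hypothetical call to a small, task-specific model.

MENU = {
    "pepperoni pizza": ["small", "medium", "large"],  # made-up menu
    "garlic bread": ["regular"],
}

def validate_order(order: dict) -> bool:
    """Accept only items and sizes that actually exist on the menu."""
    item, size = order.get("item"), order.get("size")
    return item in MENU and size in MENU.get(item, [])

def take_order(transcript: str, parse_order) -> dict:
    """Run the narrow model, then check its output before acting on it."""
    order = parse_order(transcript)  # small model: transcript -> order dict
    if not validate_order(order):
        # Anything off-menu (or off-topic, such as a question about famous
        # economists) fails validation and triggers a clarifying question
        # rather than a hallucinated order.
        return {"status": "clarify", "message": "Sorry, could you repeat that?"}
    return {"status": "confirmed", "order": order}
```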

Still, sometimes even hallucinations can be valuable. Brice Challamel, the head of AI strategy at OpenAI, describes AI as a teammate capable of playing several different roles—an assistant that helps with repetitive tasks, an expert that explains complex concepts, a coach that provides feedback and a creative partner that comes up with ideas. What counts as a hallucination if it comes from the expert persona could count as imagination when the AI is being asked to brainstorm.

Glowforge’s sales-coach tool is a good example of how errors can be tolerated, or even turned to advantage. The AI often gets its feedback wrong—asserting that a sales opportunity has been missed, say, when the call was designed to tend a client relationship. But the tool has also been engineered to be “low conviction” in its judgments: its views are deliberately designed to be fodder for discussion.

Because generative AI works on the basis of probabilities, you can never know for sure what it is going to come up with. So the second hidden cost is to put safeguards in place for those use cases where errors matter. Sierra, for example, uses a “supervisor model” to monitor real-time interactions between customers and its AI agents, with humans on hand to step in if needed. Another model evaluates conversations after the fact and pushes tricky cases towards human reviewers.
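
In outline the pattern looks something like the hedged sketch below, which is not Sierra’s code: `agent_reply` and `supervisor_score` stand in for two separate model calls, and `escalate` and `send` for whatever plumbing routes a conversation to a human or back to the customer.

```python
# A sketch of the supervisor pattern described above (not Sierra's code).
from dataclasses import dataclass

@dataclass
class Turn:
    customer: str   # what the customer said
    draft: str      # the agent's proposed reply
    score: float    # the supervisor model's confidence in that reply

def handle(message: str, agent_reply, supervisor_score,
           escalate, send, threshold: float = 0.8) -> Turn:
    draft = agent_reply(message)              # first model drafts an answer
    score = supervisor_score(message, draft)  # second model judges the draft
    if score < threshold:
        escalate(message, draft)  # a human steps in, in real time
    else:
        send(draft)
    return Turn(message, draft, score)  # logged for after-the-fact review
```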

Problems become much more tractable when the tasks given to agents are narrow, says Mr Taylor. Retailers have standard criteria for returning items, for example, which means a customer-service agent can ask specific questions about when the item was bought and whether it has been used, before working out what to do.
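
Rendered as code, the returns example is almost trivial, which is rather the point. The policy below is invented for illustration; a real retailer’s rules would differ.

```python
# The narrow returns task as explicit rules (thresholds are hypothetical).
from datetime import date

RETURN_WINDOW_DAYS = 30  # assumed policy, not any particular retailer's

def returns_decision(purchased: date, used: bool, today: date) -> str:
    """Apply a fixed policy to the two facts the agent has collected."""
    if (today - purchased).days > RETURN_WINDOW_DAYS:
        return "refuse: outside the return window"
    if used:
        return "offer: store credit only"
    return "approve: full refund"

# e.g. returns_decision(date(2026, 1, 2), used=False, today=date(2026, 1, 20))
# -> "approve: full refund"
```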

The same kind of thinking is visible at Garfield, a British startup that was the first firm in the world to be regulated to provide AI legal services. Garfield helps creditors pursue small claims, defined as unpaid debts below £10,000 ($13,800). Taking people to court for unpaid bills is a daunting process for most creditors; if a lawyer gets involved, it quickly becomes uneconomic. Generative AI can make this much more affordable. Businesses can connect their accounting software to Garfield, which ingests invoices and tells them whether they have a valid claim; it can then send out letters before action, which are often enough to prompt debtors to cough up, and help claimants in court, too.

Philip Young, one of the firm’s co-founders and the only lawyer on the team, says that the idea works in part because the small-claims process has “relatively well-defined inputs and outputs and has a relatively finite universe of possibilities”. More complex litigation claims would have to cope with many more permutations, which would increase the potential for errors.

As well as behavioural and technical issues, firms must also solve a variety of organisational problems to make AI work for them. Finding the right talent is an obvious issue. Failing to give the machines access to the right data is another common pitfall.

Models also have to be evaluated to ensure that their output is high-quality. For some tasks, this is quite simple. Sarah Guo, an AI investor in Silicon Valley, says that one reason software engineering is in the vanguard of AI adoption is that verifying whether a bit of code works is relatively easy. In other areas, judging whether something is up to scratch is much harder. Trying to make a model funny, she says, is a struggle because funniness is “soft and fuzzy”.
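
Ms Guo’s point about code can be made concrete. The harness below is a toy, and `generate_solution` is a stand-in for a model call, but it shows why the task is so verifiable: the generated code either passes the reference tests or it does not.

```python
# Toy evaluation harness for generated code; `generate_solution` is a
# hypothetical model call that returns Python source defining `solution`.

def evaluate(generate_solution, tests: list) -> float:
    """Return the fraction of reference test cases the generated code passes."""
    namespace = {}
    exec(generate_solution(), namespace)  # sketch only; sandbox this in practice
    fn = namespace["solution"]            # assumed entry-point name
    passed = sum(1 for args, want in tests if fn(*args) == want)
    return passed / len(tests)

# There is no equivalent oracle for "is this joke funny?" -- hence human judges.
```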

Lots of corporate tasks fall into this fuzzier category. So human experts are needed to define what counts as good enough. They are also needed to supply unwritten knowledge about how to get stuff done (the GDPval evaluation, which suggests that frontier models can rival industry experts, excludes tasks that depend on tacit knowledge). Harnessing this kind of in-house expertise is, in part, an organisational challenge. Mr Mollick points to the example of one large firm in which senior engineers and subject-matter experts are being put into small cross-departmental teams to move fast on specific projects.

Moving faster in one area can cause bottlenecks in another, however. Vibe-coding, a slangy term for using natural-language prompts to get an AI to write a computer program, makes it much easier for novices to create apps and features. In one way, this approach is a boon. Coding tools like Claude Code and platforms like Lovable or Replit allow end users and product managers to show what it is they want to build, rather than wasting endless hours on PowerPoint decks and lengthy documents. The phrase “demo, don’t memo” is now circulating inside some tech firms.

But that leads to a new problem. “You’ve stopped having the bottleneck at how quickly can you write code, and now you’ve got the bottleneck at how quickly can you review the code,” says Hannah Calhoon, the head of AI at Indeed. Jim Swanson, the chief information officer of Johnson & Johnson, a pharmaceutical firm, says that he used to hear managers in different territories rave about how they had used AI to improve the invoicing process, forgetting that meant more work piling up for the finance team.

J&J is an example of how the early rush to experiment with AI has evolved into something more measured. The firm started off with a let-a-thousand-flowers-bloom ethos. That led to a lot of weeds, too. According to Mr Swanson, 85% of the value generated was attributable to just 15% of these applications. J&J has now switched to a more focused approach, in which a central AI council and a data council ensure that the most fruitful projects are being nodded through and that the right data are available to make them work.

Metrics are also maturing, away from crude targets for AI usage and towards things that matter to the business. “One of the most important things you can do…is specify a business outcome you’re trying to drive more than a technical outcome,” says Mr Taylor. His startup, Sierra, uses outcomes-based pricing, which means clients are charged only when the AI agent actually solves a customer’s problem; if a human has to get involved, it’s free.
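
The billing rule is simple enough to put in code. A minimal sketch, assuming each conversation record notes whether the agent resolved it and whether a human had to step in; the field names are invented.

```python
# Outcomes-based pricing in miniature (illustrative, not Sierra's terms).

def invoice(conversations: list, price_per_resolution: float) -> float:
    """Bill only the conversations the AI resolved without human help."""
    billable = [c for c in conversations
                if c["resolved"] and not c["human_involved"]]
    return len(billable) * price_per_resolution
```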

None of this is to downplay how remarkable generative AI is, or how quickly it is advancing into the workplace. As it makes further technical advances, tasks that were beyond it will become feasible. New business models and organisational forms will follow. Bosses in America, Australia, Britain and Germany may not have seen much impact from AI yet, but the new survey shows they expect large job losses and productivity gains over the next three years.

It also helps not to get too carried away by the idea of an alien intelligence. To make AI work within organisations, a prosaic set of management problems needs to be solved. These include well-designed incentives for adoption, guardrails to mitigate problems, and systems for choosing, measuring and implementing applications. You need a mixture of pragmatism and ambition, says Mr Swanson. You need to be “a cynical optimist”.


