The Problem

Never before in human history could we use English to ask a computer to do general purpose things until now with AI (ChatGPT, LLM, etc)

However if you ask it to do something new it often doesn’t follow the rules!

AI is good at transforming (the T in GPT) text from one form to another.

But if you try to give it rules it will fail to follow them reliably.

The dark ages

We tried numerous things trying to bring accuracy and reliability up

Each had incremental but diminishing gains, but the accuracy was not acceptable for our film-research purposes.

Transformers, robots in disguise

AI seems like it’s capable of Logic but I suspect it’s just lots of transformations of its internal representations of concepts (neural net weights) and any provided to it (the ‘prompt’).

Getting business logic out of the prompts and limiting them purely to transformations, then putting logic in code has made a huge difference to our sanity, with improvements in:

  • Accuracy
  • Testability
  • Prompt volatility (less butterfly effects of prompt changes causing large unexpected failures)

Show, don’t tell

Like a toddler, AI loves to be shown examples (few-shot in-context learning)

So any time it misbehaves or does good, we take the data and turn them into examples to use next time.

Taming the complexity

Data flows in and out of code/LLMs in our setup, so errors can compound and there’s plenty to go wrong. This is why we weren’t impressed with hyped team-of-bots (agentic) approaches.

Keeping code and prompts close together, we have built tight unit and integration prompt (Jest) tests that run automatically (Github Actions) across our pipeline, if (and only if) the prompts or code related to them change.

This means we can:

  • Understand overall accuracy increase/decrease
  • Swap in new AI models and see effects on performance
  • Make code changes and do releases with confidence

Wrapping up

This diagram outlines what we were doing the painful way, and what we look like now

We have a large number of these running in a pipeline alongside regular API calls and logic

What are we building?

We are building an AI Researcher to super-charge the Development Process :

  • Identifying potential industry/subject rumours and risks for ideas
  • Finding direct and representation contact details
  • Develop the pitch and story opportunities
Share