drsezzer | @drsezzer.bsky.social | @FanOfJavi
Sarah Mercer
smercer[at]turing.ac.uk
The propensity of LLMs to portray human-like behaviour fascinates me. Since the publication of the Willowbrook report, I have continued to explore the capacity of generative agents to mimic human behaviour: their ability to maintain believable and consistent personas, their capacity to make human-like mistakes, and their (in)ability to get angry!
DALL·E 3 generated image of Willowbrook
Inspired by the Stanford Smallville paper, I developed a simulation comprising 12 characters and 10 locations - including a library, cafe, farm shop, village green and various residences - to further explore the capacity of generative agents to portray human-like behaviours.
Unlike Smallville, the Willowbrook simulation does not maintain a shared representation of the agents’ environment, meaning that the agents’ reality is purely LLM-generated (with the exception of the initial character and location descriptions). This reality is held within each agent’s memory of what it observed, did and heard. As such, any error in the way the LLM directs the agents is magnified as the simulation progresses: the agents retain memories of such inaccuracies and may even act upon them. This gives us a novel way to evaluate the influence different LLMs have on the lives of the Willowbrook residents.
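To make this concrete, here is a minimal sketch of how a memory-held reality like this might look. It is illustrative only: the Agent class, the prompt wording and the llm_complete callable are assumptions for the sake of the example, not the actual Willowbrook code.

```python
class Agent:
    """Minimal persona-agent sketch: the agent's 'reality' is only what
    its own memory holds -- there is no shared world model."""

    def __init__(self, name: str, biography: str):
        self.name = name
        self.biography = biography
        self.memory: list[str] = []  # everything observed, done or heard

    def remember(self, time: str, kind: str, text: str) -> None:
        # Memories are retained verbatim, so any inaccuracy the LLM
        # introduces persists and may be acted upon in later steps.
        self.memory.append(f"[{time}] {kind}: {text}")

    def act(self, time: str, llm_complete) -> str:
        # Condition the LLM on the biography plus recent memories only;
        # llm_complete is any prompt-in, text-out completion function.
        prompt = (
            f"You are {self.name}. {self.biography}\n"
            "Your recent memories:\n" + "\n".join(self.memory[-20:]) + "\n"
            f"It is {time}. What do you do next? Answer in one sentence."
        )
        action = llm_complete(prompt)
        self.remember(time, "did", action)  # the action itself becomes memory
        return action
```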
A few people have asked me what a generative agent actually is, how they are implemented, and which frameworks I use. I don’t use a framework, as I did not want an additional layer of abstraction between me and the LLM. The key to designing a good persona-agent lies in its initial biography and its memory-retention mechanism. I expand on this here.
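As one plausible retention mechanism (a sketch only, assuming older memories are periodically compressed by the LLM itself; this is not necessarily the scheme Willowbrook uses):

```python
def retain(memory: list[str], llm_complete,
           keep_recent: int = 10, max_entries: int = 30) -> list[str]:
    """Once memory grows past max_entries, compress the older entries
    into a single LLM-written summary, keeping recent entries verbatim."""
    if len(memory) <= max_entries:
        return memory
    older, recent = memory[:-keep_recent], memory[-keep_recent:]
    summary = llm_complete(
        "Summarise these memories in at most three sentences, preserving "
        "names, places and commitments:\n" + "\n".join(older)
    )
    return [f"(summary) {summary}"] + recent
```

The trade-off in a scheme like this is that summarisation itself can distort memories, which matters in a simulation where memory is the agent's only reality.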
Prior to working at the Turing, I was a researcher in cyber security. The interest garnered by LLMs at the beginning of 2023 obviously had an impact on the cyber security community. The paper below was my attempt to bring some evidence-based thinking to the (at the time) fairly polarised debate, given my familiarity with developing LLM-based applications and my intuition for their strengths and weaknesses. Note: technical readers may prefer the unedited version of the paper, as linked below.
A Brief Analysis of DeepSeek-R1 and its Implications for Generative AI - arXiv - a quick-turnaround report covering the release of DeepSeek’s R1 model and its implications for the wider ecosystem, 3rd February 2025.
My thoughts on Coding Assistants - a short piece on my first experience with Cursor, and how such tools are starting to impact (benefit) experienced coders.
Alongside my own research into the human-like capacities of generative agents, I also provide technical expertise to the CETaS team, specifically on Generative AI.
Psychometric testing for generative agents: is it a good idea to use generative agents as replacements for human participants in social science? HEXACO-Rep (public repo), draft paper.
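For a rough sense of how a questionnaire item might be administered to an agent, here is a hypothetical sketch; the prompt wording, scale and scoring below are placeholders, not the HEXACO-Rep implementation.

```python
def administer_item(persona: str, item: str, llm_complete) -> int:
    """Ask a persona agent to rate one Likert-scale statement.
    Placeholder wording: a real inventory needs validated items,
    randomised ordering, reverse-keyed scoring and robust parsing."""
    prompt = (
        f"{persona}\n"
        "Rate the following statement from 1 (strongly disagree) "
        "to 5 (strongly agree). Answer with the number only.\n"
        f"Statement: {item}"
    )
    reply = llm_complete(prompt).strip()
    return int(reply[0])  # naive parse of the first character

# Hypothetical usage with a placeholder item:
# score = administer_item("You are Alice, a cautious librarian.",
#                         "I enjoy meeting new people.", llm_complete)
```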