drsezzer | @drsezzer.bsky.social | @FanOfJavi
Sarah Mercer
smercer[at]turing.ac.uk
The propensity of LLMs to portray human-like behaviour fascinates me. Since the publication of the Willowbrook report, I have continued to explore the capacity of generative agents to mimic human behaviour: their ability to maintain believable and consistent personas, their capacity to make human-like mistakes, and their (in)ability to get angry!
DALL·E 3 generated image of Willowbrook
Inspired by the Stanford Smallville paper, I developed a simulation comprising 12 characters and 10 locations - including a library, cafe, farm shop, village green and various residences - to further explore the capacity of generative agents to portray human-like behaviours.
Unlike Smallville, the Willowbrook simulation does not maintain a shared representation of the agents’ environment, meaning that the agents’ reality is purely LLM-generated (with the exception of the initial character and location descriptions). This reality is held within each agent’s memory of what it observed, did and heard. As such, any error in the way the LLM directs the agents is magnified as the simulation progresses: the agents retain memories of such inaccuracies and may even act upon them. This gives us a novel way to evaluate the influence different LLMs have on the lives of the Willowbrook residents.
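To make this concrete, here is a minimal sketch of how a memory-held reality like this might look. It is illustrative only: the Agent class, the prompt wording and the llm_complete callable are assumptions for the sake of the example, not the actual Willowbrook code.

```python
class Agent:
    """Minimal persona-agent sketch: the agent's 'reality' is only what
    its own memory holds -- there is no shared world model."""

    def __init__(self, name: str, biography: str):
        self.name = name
        self.biography = biography
        self.memory: list[str] = []  # everything observed, done or heard

    def remember(self, time: str, kind: str, text: str) -> None:
        # Memories are retained verbatim, so any inaccuracy the LLM
        # introduces persists and may be acted upon in later steps.
        self.memory.append(f"[{time}] {kind}: {text}")

    def act(self, time: str, llm_complete) -> str:
        # Condition the LLM on the biography plus recent memories only;
        # llm_complete is any prompt-in, text-out completion function.
        prompt = (
            f"You are {self.name}. {self.biography}\n"
            "Your recent memories:\n" + "\n".join(self.memory[-20:]) + "\n"
            f"It is {time}. What do you do next? Answer in one sentence."
        )
        action = llm_complete(prompt)
        self.remember(time, "did", action)  # the action itself becomes memory
        return action
```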
A few people have asked me what a generative agent actually is, how they are implemented, and which frameworks I use. I don’t use a framework, as I did not want an additional layer of abstraction between me and the LLM. The key to designing a good persona-agent lies in its initial biography and its memory-retention mechanism. I expand on this here.
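As one plausible retention mechanism (a sketch only, assuming older memories are periodically compressed by the LLM itself; this is not necessarily the scheme Willowbrook uses):

```python
def retain(memory: list[str], llm_complete,
           keep_recent: int = 10, max_entries: int = 30) -> list[str]:
    """Once memory grows past max_entries, compress the older entries
    into a single LLM-written summary, keeping recent entries verbatim."""
    if len(memory) <= max_entries:
        return memory
    older, recent = memory[:-keep_recent], memory[-keep_recent:]
    summary = llm_complete(
        "Summarise these memories in at most three sentences, preserving "
        "names, places and commitments:\n" + "\n".join(older)
    )
    return [f"(summary) {summary}"] + recent
```

The trade-off in a scheme like this is that summarisation itself can distort memories, which matters in a simulation where memory is the agent's only reality.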
Prior to working at the Turing, I was a researcher in cyber security. The interest garnered by LLMs at the beginning of 2023 obviously had an impact on the cyber security community. The paper below was my attempt to bring some evidence-based thinking to the (at the time) fairly polarised debate, given my familiarity with developing LLM-based applications and my intuition for their strengths and weaknesses. Note: technical readers may prefer the unedited version of the paper, as linked below.
A Brief Analysis of DeepSeek-R1 and its Implications for Generative AI - arXiv - a quick-turnaround report covering the release of DeepSeek’s R1 model and its implications for the wider ecosystem, 3rd February 2025.
My thoughts on Coding Assistants - a short piece on my first experience with Cursor, and how such tools are starting to impact (benefit) experienced coders.
Alongside my own research into the human-like capacities of generative agents, I also provide technical expertise to the CETaS team, specifically on Generative AI.
Psychometric testing for generative agents: is it a good idea to use generative agents as replacements for human participants in social science? HEXACO-Rep (public repo), draft paper.
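For a rough sense of how a questionnaire item might be administered to an agent, here is a hypothetical sketch; the prompt wording, scale and scoring below are placeholders, not the HEXACO-Rep implementation.

```python
def administer_item(persona: str, item: str, llm_complete) -> int:
    """Ask a persona agent to rate one Likert-scale statement.
    Placeholder wording: a real inventory needs validated items,
    randomised ordering, reverse-keyed scoring and robust parsing."""
    prompt = (
        f"{persona}\n"
        "Rate the following statement from 1 (strongly disagree) "
        "to 5 (strongly agree). Answer with the number only.\n"
        f"Statement: {item}"
    )
    reply = llm_complete(prompt).strip()
    return int(reply[0])  # naive parse of the first character

# Hypothetical usage with a placeholder item:
# score = administer_item("You are Alice, a cautious librarian.",
#                         "I enjoy meeting new people.", llm_complete)
```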