I’m driven to stay ahead of evolving threats to national security, researching innovative approaches that blend human expertise with artificial intelligence (AI) capabilities.
I’m currently working with a deep-tech start-up where our mission is to develop AI agents capable of operating at (and beyond) the level of seasoned human experts. Recently, I had the privilege of conducting studies to explore the potential of human-machine teaming with these agents.
I began my research by capturing the expertise and decision-making processes of security professionals, using a methodology based on the critical decision method. The cognitive processes underlying their actions and choices were then coded into mathematical models that could be understood by both humans and machines.
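To make that concrete, here’s a toy illustration of what such a human-and-machine-readable encoding might look like. The structure below is my own assumption for this sketch – the models the team actually built are more sophisticated and aren’t described here:

```python
# Hypothetical encoding of a captured expert decision rule. The fields and
# example content are illustrative assumptions, not the team's actual model.
from dataclasses import dataclass

@dataclass
class ExpertRule:
    cues: list[str]    # what the expert noticed in the situation
    assessment: str    # how they judged it
    action: str        # what they chose to do
    rationale: str     # why, in their own words

RULES = [
    ExpertRule(
        cues=["repeated failed logins", "off-hours access"],
        assessment="likely credential-stuffing attempt",
        action="lock the account and review auth logs",
        rationale="failed logins at scale rarely come from a forgetful user",
    ),
]

def recommend(observed_cues: set[str]) -> ExpertRule | None:
    """Return the first rule whose cues all appear in the observation."""
    for rule in RULES:
        if set(rule.cues) <= observed_cues:
            return rule
    return None
```

A human can read the rationale directly, while a program can match cues and retrieve the recommended action – the same record serves both audiences.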
A team of machine learning engineers then translated these models into a form AI systems could consume, allowing agents to emulate the captured expert behaviour. Using reinforcement learning techniques, the team trained several AI agents to navigate complex scenarios.
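For readers who like code, a minimal sketch of one common approach, reward shaping, illustrates how an expert model can guide a reinforcement learner. This is a generic illustration with made-up states and numbers, not the team’s actual pipeline:

```python
# Generic reward-shaping sketch: a tabular Q-learner earns a small bonus
# whenever its action agrees with the captured expert model. The states,
# actions, and environment are all illustrative assumptions.
import random
from collections import defaultdict

N_STATES = 5
ACTIONS = ["monitor", "isolate", "escalate"]

def expert_action(state: int) -> str:
    """Stand-in for the expert model: a fixed state -> action rule."""
    return ACTIONS[state % len(ACTIONS)]

def step(state: int, action: str) -> tuple[int, float]:
    """Toy environment: random transition, sparse task reward."""
    next_state = random.randrange(N_STATES)
    task_reward = 1.0 if (state == 4 and action == "escalate") else 0.0
    return next_state, task_reward

q = defaultdict(float)  # Q-values keyed by (state, action)
alpha, gamma, epsilon, bonus = 0.1, 0.9, 0.1, 0.5

state = 0
for _ in range(5000):
    # Epsilon-greedy action selection.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    next_state, reward = step(state, action)
    # Shaping: a bonus when the agent agrees with the expert model.
    if action == expert_action(state):
        reward += bonus
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    state = next_state
```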
The exact process of training and tweaking AI is mysterious to me, but the results were illuminating: AI augmented with the expert model consistently outperformed AI counterparts lacking this human guidance.
But would the augmented AI agents outperform their human counterparts?
Effective human-machine teaming requires AI agents that actually perform their intended functions, are dependable, and align with the task requirements. We cannot realise the potential of human-AI collaboration, combining human insight with AI computational power, if the agents cannot be trusted.
So, to facilitate collaboration between human experts and AI, the team and I designed interactions based on the shared expert model. These interactions enabled AI agents to articulate their goals, assess situations, evaluate risks and rewards, and justify their actions, which is a critical step towards transparency and bridging the gap between human intuition and machine logic.
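To give a flavour of what that looks like, here’s a hypothetical structured explanation an agent might return. The field names and values are my own assumptions for illustration, not the team’s interface:

```python
# Hypothetical structured explanation surfaced alongside an agent's action.
from dataclasses import dataclass

@dataclass
class AgentExplanation:
    goal: str           # what the agent is trying to achieve
    situation: str      # its assessment of the current state
    risk: float         # estimated downside of the chosen action (0-1)
    reward: float       # estimated payoff of the chosen action (0-1)
    justification: str  # rationale, phrased in expert-model terms

def explain(action: str) -> AgentExplanation:
    """Toy example: attach a rationale to a chosen action."""
    return AgentExplanation(
        goal="contain the suspected intrusion",
        situation="unusual outbound traffic from a single host",
        risk=0.2,
        reward=0.8,
        justification=f"'{action}' matches the expert rule for lateral-movement cues",
    )

print(explain("isolate"))
```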
I conducted a study to evaluate the usability, workload, and trustworthiness of the augmented agent. Drawing on a combination of interviews and standardised scales, I gathered valuable feedback from seasoned security experts and from novices with less than a year’s experience.
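As an illustration of how standardised scales work, this is how the widely used System Usability Scale (SUS) turns ten 1–5 responses into a single 0–100 score. I use SUS here as an example of the family of instruments, not necessarily the scale we administered:

```python
# Scoring for the System Usability Scale (SUS), shown purely to illustrate
# how standardised scales work; not necessarily the instrument used here.
def sus_score(responses: list[int]) -> float:
    """Score a 10-item SUS questionnaire (each response on a 1-5 scale)."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 sum to 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```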
Quantitative evaluations revealed high levels of trust in the AI system, excellent usability scores, and low workload ratings. Qualitative feedback highlighted strengths, such as the system’s usefulness for novices seeking to grasp fundamental concepts, and surfaced priorities: clear user interface design, intuitive (or at the very least explorable) agent interactions, and training tailored to users with different skill levels.
However, challenges remained, particularly in understanding the rationale behind AI-generated suggestions and aligning explanations with users’ expectations. Participants also expressed a desire for richer agent interaction features, such as the ability to ask for more information or for clarification – just as we would ask our human teammates if they did something confusing or piqued our curiosity.
In response to this feedback, the team outlined a roadmap for future enhancements, including weighting or ranking AI suggestions, conversational interfaces for more natural, narrative-style interactions, and personalised training tailored to individual user profiles so that users can learn side by side with the agents.
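As a flavour of the first roadmap item, here’s a hypothetical sketch of ranking suggestions by weighing estimated reward against risk. The scoring terms are illustrative assumptions, not the team’s design:

```python
# Hypothetical ranking of agent suggestions: highest expected value first.
def rank_suggestions(suggestions: list[dict]) -> list[dict]:
    """Order suggestions by estimated reward minus estimated risk."""
    return sorted(suggestions, key=lambda s: s["reward"] - s["risk"], reverse=True)

options = [
    {"action": "monitor",  "reward": 0.3, "risk": 0.1},
    {"action": "isolate",  "reward": 0.8, "risk": 0.2},
    {"action": "escalate", "reward": 0.9, "risk": 0.6},
]
for s in rank_suggestions(options):
    print(s["action"], round(s["reward"] - s["risk"], 2))
```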
This is the first in a series of studies with Tulpa exploring the potential of pairing human wisdom with trustworthy AI to secure our nation.