Design patterns for effective human machine teaming

After a decade or so of researching human machine teaming, I know that user interface (UI) design plays a key role in building understanding and trust and in optimising performance. I want to share some key UI patterns that I have used to nurture confidence in people’s interactions with AI systems.

I’ve grouped the patterns by purpose. First are those that empower people with greater control over their interactions with AI-driven systems. Following these are patterns designed to promote transparency and accountability, giving people visibility into AI-generated processes and outcomes. Next come patterns that elevate the overall user experience, fostering trust and optimising engagement. Lastly, I offer patterns that support human machine teaming more directly, optimising teamwork and productivity, raising risk awareness and equipping people with the information they need to assess AI appropriately.

It’s not an exhaustive list. I’m continually revising, refreshing and generating novel UI patterns in light of further research insights. For now, I hope to provide practitioners with a few proven patterns to leverage when designing human-centred AI interfaces.

User Interaction and Control

Predictable AI content region

AI-generated content should occupy discernible UI regions and integrate into the interface without monopolising the entire space, allowing people to engage with it when they choose to. A predictable UI is particularly important when outcomes are critical and tasks are time-sensitive, as consistency encourages familiarity and habituation.

Using AI to automate tasks can inadvertently disrupt previously learned information architecture and force people to visually search the UI for recognisable content. This costs time, increases the likelihood of errors and creates additional cognitive workload.

To facilitate interactions between humans and machines, it’s important to ensure that people can clearly distinguish between the contributions of the AI and the choices they have made themselves. Clarity in delineating between machine-generated content and our own actions enhances confidence and efficiency.

Allow users to switch the AI on or off

The sparkle emoji represents AI capability in a lot of applications. You could use this or a simple toggle switch to allow people to control whether they interact with AI or not. 

Disabling AI should be non-destructive, meaning any content already generated using AI should remain until it is deleted as usual. 
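As a rough sketch of how a non-destructive toggle could look in code (the names here are illustrative, not from any particular library), disabling AI only blocks new generations; existing AI-generated blocks stay in the document until someone deletes them in the usual way.

```typescript
// Minimal sketch of a non-destructive AI toggle (illustrative names only).

type Source = "human" | "ai";

interface DocumentBlock {
  id: string;
  source: Source;      // who produced this block
  text: string;
}

interface EditorState {
  aiEnabled: boolean;        // the toggle the person controls
  blocks: DocumentBlock[];   // the document content
}

// Turning AI off must not touch existing content.
function setAiEnabled(state: EditorState, enabled: boolean): EditorState {
  return { ...state, aiEnabled: enabled }; // blocks are left untouched
}

// New AI generations are only allowed while the toggle is on.
function appendAiSuggestion(state: EditorState, text: string): EditorState {
  if (!state.aiEnabled) return state; // ignore, or surface a hint in the UI
  const block: DocumentBlock = { id: crypto.randomUUID(), source: "ai", text };
  return { ...state, blocks: [...state.blocks, block] };
}

// Deletion works the same way for human and AI content.
function deleteBlock(state: EditorState, id: string): EditorState {
  return { ...state, blocks: state.blocks.filter(b => b.id !== id) };
}
```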

Highlighting AI generated content

In many interfaces, AI-generated content is highlighted in purple so that people can quickly identify it as they visually scan. Highlighting AI-generated content and differentiating it from human-generated content promotes transparency. Transparency builds trust by making people aware of the source of the information or action, so that they can appropriately attribute responsibility for errors or inaccuracies.

Why am I seeing this?

Adding a “Why” hint provides people with contextual information about the AI-generated content. Informing people of the rationale behind specific AI-generated content allows them to decide whether to trust it, or to adjust their own behaviour to receive more desirable results.
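The highlight and the “Why” hint can share one mechanism. In the sketch below (hypothetical names, not a specific library), each piece of content carries source and rationale metadata; the UI uses the source to style AI content distinctly and the rationale to populate the hint.

```typescript
// Illustrative sketch: one metadata record drives both the visual
// highlight and the "Why am I seeing this?" hint.

interface ContentItem {
  id: string;
  text: string;
  source: "human" | "ai";
  rationale?: string; // plain-language reason the AI produced or ranked this item
}

// Map the source to a CSS class; the stylesheet decides what "AI purple" looks like.
function highlightClass(item: ContentItem): string {
  return item.source === "ai" ? "content--ai-generated" : "content--human";
}

// Text for the "Why am I seeing this?" affordance, with a safe fallback.
function whyHint(item: ContentItem): string {
  if (item.source !== "ai") return "You created this content.";
  return item.rationale ?? "This was generated by the AI based on your recent activity.";
}

// Example
const suggestion: ContentItem = {
  id: "s1",
  text: "Draft reply: 'Thanks, I will review the report today.'",
  source: "ai",
  rationale: "Suggested because the incoming email asks for a review deadline.",
};
console.log(highlightClass(suggestion), "-", whyHint(suggestion));
```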

Allow user input and adjustment

Allowing people to provide feedback enables the underlying models to learn about user preferences and better recognise when system responses do not meet expectations. Letting people make adjustments contributes to improvements in the accuracy of the AI model and gives them a sense of control.

Depending on the architecture, models can adjust their responses automatically or collect the feedback into a catalogue to be analysed and used for fine-tuning later.

For example, TextFX lets us adjust the amount of creativity in an output and Google Translate seeks our feedback through like/dislike buttons and also allows us to suggest a better translation.
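The shape of that feedback loop can be as simple as the sketch below (illustrative types, not TextFX’s or Google Translate’s actual APIs): a rating, an optional suggested correction and the generation settings are recorded against each output, so they can either nudge the next response immediately or be batched for later analysis and fine-tuning.

```typescript
// Illustrative feedback loop: capture ratings and suggested corrections
// against each AI output, for immediate adjustment or later fine-tuning.

interface GenerationSettings {
  creativity: number; // 0 = conservative, 1 = very creative (slider value)
}

interface FeedbackRecord {
  outputId: string;
  rating: "like" | "dislike";
  suggestedCorrection?: string; // e.g. a better translation offered by the person
  settings: GenerationSettings; // the settings in force when the output was produced
  timestamp: number;
}

const feedbackLog: FeedbackRecord[] = [];

function recordFeedback(record: FeedbackRecord): void {
  feedbackLog.push(record); // catalogued for later analysis / fine-tuning
}

// Optionally nudge settings straight away in response to a dislike.
function adjustSettings(current: GenerationSettings, rating: "like" | "dislike"): GenerationSettings {
  if (rating === "like") return current;
  // A dislike gently reduces creativity; the right policy is model- and product-specific.
  return { creativity: Math.max(0, current.creativity - 0.1) };
}

// Example
recordFeedback({
  outputId: "t-42",
  rating: "dislike",
  suggestedCorrection: "A better translation suggested by the user",
  settings: { creativity: 0.7 },
  timestamp: Date.now(),
});
```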

Allow users to select models

Different AI models may excel in different areas or produce content with different styles, quality or accuracy. Letting people select the model provides greater control and flexibility, allowing them to tailor the experience to their specific needs and preferences.

I have observed that allowing people to select from a range of models encourages exploration and experimentation and builds trust and confidence in incorporating AI into their workflow.

Criteria sliders for AI presets

In instances where a product analyses data sets that we cannot directly interact with, the implementation of a data criteria slider can help us to adjust and fine-tune AI presets based on meaningful criteria. 

The data and algorithms that the machine uses may be more complicated than non-experts can understand, so categories need to be carefully researched and validated to ensure alignment between the adjustable criteria and the machine’s underlying algorithms. 

For instance, in a past study I generated a taxonomy of hundreds of different behavioural criteria to allow software developers to create models to define how crowds and individual AI agents behaved in a simulated training environment. Providing the trainers with control over that level of detail proved to be overly complicated so I conducted interviews to understand their needs and desired level of interaction with the AI. 

I employed affinity mapping and card sorting to help to define ‘invisible clusters’ of criteria that they found meaningful when adjusting the presets and then conducted A/B testing to measure the effectiveness of the criteria in helping the trainers to achieve their desired outcomes.
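Concretely, a validated criterion can map onto several underlying parameters at once. The sketch below is a hypothetical illustration (the criterion and parameter names are invented), showing how a human-facing slider value is translated into the lower-level settings the model actually consumes.

```typescript
// Hypothetical mapping from a human-facing criterion slider (0..1)
// to the lower-level parameters the underlying model actually consumes.

interface ModelParameters {
  crowdDensity: number;
  reactionTimeMs: number;
  aggressionWeight: number;
}

type Criterion = "hostility" | "congestion";

// Each validated criterion adjusts several underlying parameters at once.
const criterionMappings: Record<Criterion, (value: number, p: ModelParameters) => ModelParameters> = {
  hostility: (v, p) => ({
    ...p,
    aggressionWeight: v,                        // more hostile crowds
    reactionTimeMs: 1200 - Math.round(800 * v), // and quicker to react
  }),
  congestion: (v, p) => ({
    ...p,
    crowdDensity: 0.2 + 0.8 * v, // sparse to packed
  }),
};

function applySlider(criterion: Criterion, value: number, params: ModelParameters): ModelParameters {
  const clamped = Math.min(1, Math.max(0, value));
  return criterionMappings[criterion](clamped, params);
}

// Example: a trainer drags the "hostility" slider to 0.8.
const base: ModelParameters = { crowdDensity: 0.5, reactionTimeMs: 1000, aggressionWeight: 0.3 };
console.log(applySlider("hostility", 0.8, base));
```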

Transparency and Accountability

Offer informative modals

Many AI applications require people to input information, for example during profile setup. Informative modals provide guidance about how these choices impact the accuracy and reliability of the performance of the AI system. Without explanations and information, people might create their own theories for how AI systems work and why outcomes change, which undermines understanding and explainability. 

In a human-AI collaboration study, I invited participants to “think aloud” as they completed tasks with and without additional informative modals about the AI. Observing how people with different levels of technical knowledge and experience engage with informative modals, what information they find helpful and how it informs their understanding of the AI system helped me to create better patterns.

Provide explanations for AI generated content

Explanations help people to understand how the system operates. This involves clarifying the data sources used and the filtering methods and functions applied to generate the end results. Transparency empowers people to make informed decisions about the suitability of AI for their specific needs, and it enhances confidence and fosters trust in AI-driven solutions.

In certain cases, product owners may be reluctant to share detailed technical information about AI-powered products due to commercial and intellectual property concerns. In such situations, explanations can still provide valuable insights while respecting confidentiality. For example, explanations could offer selective transparency by providing a high-level overview of the general principles and processes behind the AI system’s operation, or by using metaphors to convey the overall methodology without revealing proprietary algorithms or data sources.

I have found that metaphors and high-level explanations are better than no explanation at all: they help people to build useful mental models of how an AI system operates without overwhelming them with technical detail. You can test their effectiveness through wizard of oz sessions, where a researcher simulates the generation of counterfactual explanations in response to participants’ queries and notes their reactions and whether the explanations usefully help them to build a mental model of the AI system.

For example, Netflix provides an explanation in plain language of how their recommender system works.

Display confidence indicators

Incorporating a confidence indicator within the interface effectively provides the AI with a voice to convey messages such as “Yes, I’m confident about this” or “Hmm, I’m unsure.” This feature helps people to assess the reliability of AI suggestions, ultimately building trust. 

However, while confidence scores convey the fallibility of AI systems, they can be subjective and potentially mislead people into a false sense of security. It’s important that people understand the implications of varying confidence levels and know how to react if the system expresses less than full confidence.

When designing for analysts, I use the Probability Yardstick, a measure that is familiar to them, which segments the probability scale into seven distinct numerical ranges, each assigned a descriptive term. Informed by academic research and tailored to the average reading ability, the yardstick simplifies complex probabilities into terms that are easily understandable. Notably, the scale isn’t continuous, to avoid giving people a false impression of accuracy.
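As a worked example, the sketch below maps a raw model confidence onto yardstick-style terms. The bands shown only approximate the published PHIA yardstick and should be treated as illustrative; substitute the exact ranges and wording your analysts are trained on. The gap-handling policy is also an assumption, not a standard.

```typescript
// Illustrative mapping from a model confidence (0..1) to a Probability
// Yardstick term. Bands approximate the published PHIA yardstick; confirm
// the exact ranges and wording used in your own domain before shipping.

interface YardstickBand {
  term: string;
  min: number; // inclusive lower bound of the band
  max: number; // inclusive upper bound of the band
}

const yardstick: YardstickBand[] = [
  { term: "Remote chance",         min: 0.00, max: 0.05 },
  { term: "Highly unlikely",       min: 0.10, max: 0.20 },
  { term: "Unlikely",              min: 0.25, max: 0.35 },
  { term: "Realistic possibility", min: 0.40, max: 0.50 },
  { term: "Likely / probable",     min: 0.55, max: 0.75 },
  { term: "Highly likely",         min: 0.80, max: 0.90 },
  { term: "Almost certain",        min: 0.95, max: 1.00 },
];

// The scale is deliberately non-continuous. Here, confidences that fall in a
// gap are reported against the nearest band (a design choice, not a standard).
function toYardstickTerm(confidence: number): string {
  const c = Math.min(1, Math.max(0, confidence));
  let best = yardstick[0];
  let bestDistance = Number.POSITIVE_INFINITY;
  for (const band of yardstick) {
    if (c >= band.min && c <= band.max) return band.term;
    const distance = Math.min(Math.abs(c - band.min), Math.abs(c - band.max));
    if (distance < bestDistance) {
      bestDistance = distance;
      best = band;
    }
  }
  return best.term;
}

console.log(toYardstickTerm(0.62)); // "Likely / probable"
console.log(toYardstickTerm(0.07)); // falls in a gap -> nearest band
```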

A previous study I conducted on a beta version of an AI-powered software tool highlighted the importance of providing explanations alongside confidence indicators. The AI confidently offered information that was nonetheless inaccurate, which confused the heck out of analysts. Therefore, it is important to supplement confidence indicators with explanatory narratives to ensure people can make informed decisions about AI outputs.

Show proficiency

Some AI systems change over time, potentially affecting the usefulness and accuracy of their outputs. Showing the proficiency of an AI system in a specific task helps people to manage their expectations and highlights the possibility that the AI might need further training or the task may require manual intervention. 

However, proficiency can be subjective and people may be misled by false positives or negatives. Therefore, you must thoroughly evaluate the context of use, impact of errors and accountability for potential risks.

I conduct usability testing sessions where people are deliberately presented with error scenarios or degraded performance from the AI system to observe how they react to these situations and to assess whether proficiency indicators effectively prepare them for potential errors or inaccuracies in AI outputs. 

In the real world, some applications clearly signify beta versions of AI functions to help us to incorporate proficiency information into our workflow. However, I know from experience (because I’m one of them) that people using beta versions may assume a higher level of safety and quality than has actually been tested.

Display activity logs

An activity log offers a window into the AI system’s operations, allowing people to track their actions and decisions. It fosters accountability by enabling people to back track and identify the causes of mistakes or inaccuracies. Additionally, I have found that activity logs allow people to analyse past actions and outcomes, to identify patterns, trends and areas for improvement.
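A minimal activity log can be little more than an append-only list of structured entries, as in the sketch below (field names are illustrative); the important design decisions are what to record for each AI action and how people filter and trace back through the log.

```typescript
// Illustrative append-only activity log for AI actions and human responses.

type Actor = "human" | "ai";

interface ActivityEntry {
  timestamp: string;        // ISO 8601, e.g. "2024-05-01T09:30:00Z"
  actor: Actor;
  action: string;           // e.g. "generated-summary", "accepted-suggestion"
  target?: string;          // the item the action applied to
  modelVersion?: string;    // only for AI actions; supports later failure analysis
  detail?: string;          // short human-readable description
}

class ActivityLog {
  private entries: ActivityEntry[] = [];

  record(entry: ActivityEntry): void {
    this.entries.push(entry); // append-only: entries are never edited in place
  }

  // Trace back through recent AI actions when investigating a mistake.
  recentAiActions(limit = 20): ActivityEntry[] {
    return this.entries.filter(e => e.actor === "ai").slice(-limit);
  }
}

// Example
const log = new ActivityLog();
log.record({
  timestamp: new Date().toISOString(),
  actor: "ai",
  action: "generated-summary",
  target: "report-17",
  modelVersion: "summariser-2.3",
  detail: "Summarised the incident report into five bullet points",
});
```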

Provenance

Provenance encompasses the origin and complete history of the data, the data scientist responsible for model training, and the specific machine learning model employed within an AI application. Transparency in how the model changes over time empowers people to make well-informed decisions about using the application. 

While it may be feasible for companies to develop and deploy AI without stringent provenance tracking in non-safety-critical scenarios, in high-risk contexts legal, ethical and failure analysis procedures will look to answer questions like:

  • Who trained the model and when?
  • Which versions of training and test data were used?
  • What factors contributed to model performance?
  • Was the dataset updated, and by whom?
  • Could biases have been introduced?

By understanding the provenance of the model, people can pinpoint errors or biases to specific models or datasets, allowing them to evaluate its suitability for their needs, improving trust. 
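A provenance record can be captured as a small structured object alongside each deployed model, shaped around the questions above. The sketch below uses hypothetical field names and example values to show one possible shape; it is not a standard schema.

```typescript
// Hypothetical provenance record, shaped around the questions a failure
// analysis or compliance review is likely to ask.

interface DatasetVersion {
  name: string;
  version: string;
  updatedBy?: string;        // who last changed the dataset
  updatedAt?: string;        // ISO 8601 timestamp
  knownBiasNotes?: string[]; // documented sampling or labelling concerns
}

interface ModelProvenance {
  modelId: string;
  modelVersion: string;
  trainedBy: string;          // who trained the model
  trainedAt: string;          // and when
  trainingData: DatasetVersion;
  testData: DatasetVersion;
  performanceNotes: string[]; // factors that contributed to model performance
}

// Example record as it might be surfaced in a UI timeline or table.
const provenance: ModelProvenance = {
  modelId: "route-risk-classifier",
  modelVersion: "1.4.0",
  trainedBy: "J. Example (data science team)",
  trainedAt: "2024-03-12T10:00:00Z",
  trainingData: {
    name: "incident-reports",
    version: "2024-02",
    updatedBy: "data-ops",
    updatedAt: "2024-02-28T16:00:00Z",
    knownBiasNotes: ["Under-represents night-time incidents"],
  },
  testData: { name: "incident-reports-holdout", version: "2024-02" },
  performanceNotes: ["Accuracy improved after class rebalancing"],
};
console.log(JSON.stringify(provenance, null, 2));
```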

For example, I studied data scientists, compliance officers and business owners who work with AI applications in high-risk contexts and analysed their workflow to understand how they currently track and access provenance information. I identified key touch points, pain points and gaps in their workflows relating to model selection, evaluation, and decision-making processes.

I then used card sorting exercises to understand how different types of end users prioritise different aspects of provenance information (e.g., data origin, model training details, performance metrics) depending on their goals, which helped to inform the organisation and hierarchy of information within the UI. I created prototypes of UI designs that presented provenance information in different formats (e.g., timelines, tables, charts) and conducted usability testing sessions with people in technical and non-technical roles to evaluate the effectiveness of each design in helping them to understand and use provenance information.

Enhanced User Experience

Enable multimodal interaction

AI’s ability to interpret context in different formats is continuously improving. For example, ChatGPT facilitates multimodal interaction by allowing us to upload images and annotate directions and questions directly. We can upload a file, image or video, provide the task intent through text or voice and then run the prompt.

To keep up with advancing AI technology, teams have to adopt an agile philosophy. The key is understanding what people want to do, so I have found that user journey mapping provides a solid basis to identify touch points where people upload files, images or videos, provide task intents and receive responses for multimodal interaction with the AI system. Keeping the end goal or job-to-be-done in mind when conducting iterative design sessions helps the team to stay focussed on innovations that might improve people’s experiences and not get drawn into over-engineering features with cool tech.

Features for Human Machine Teaming

I’ve conducted a lot of research on human machine teaming in high-risk use cases, where military personnel must make critical, time-sensitive decisions. Here I’ve just focussed on presenting UI patterns that I have found useful. However, I also know that the key to effective human machine teaming is ensuring that your AI is fit for purpose. Training human machine teams together is also essential to develop people’s understanding of the AI system, its appropriate use, limits and risks.

Allow human takeover

I’ve already discussed allowing people to choose whether to switch AI on or off. However, in some systems, automation or working with autonomous systems is integral to achieving a goal or mission. So, when the AI fails, UI patterns must give people sufficient time to acknowledge the limitations of AI systems and allow them to apply human judgement, expertise and preferences.

There are so many safety factors to consider when switching between human operator and autonomous control of a task! You need to decide whether the trigger for takeover is human- or AI-initiated. You need to ensure that context and workflow are maintained throughout the transition. You need a confirmation dialogue to help prevent accidental overrides. You need to provide system operators with feedback to indicate when the system has transitioned from AI to human control and vice versa, ensuring transparency and clarity throughout the process.
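Those safety factors translate into a small state machine around the transition itself. The sketch below uses illustrative states and names (not any specific standard or product): the trigger is explicit, control only changes hands after confirmation, and feedback events are emitted so operators always know who is in control.

```typescript
// Illustrative takeover state machine: explicit trigger, confirmation step,
// and feedback so the operator always knows who is in control.

type ControlMode = "ai" | "human";
type TakeoverTrigger = "human-initiated" | "ai-initiated";

interface TakeoverState {
  mode: ControlMode;
  pending?: { requestedMode: ControlMode; trigger: TakeoverTrigger };
}

type Notify = (message: string) => void;

// Step 1: a takeover is requested, by either the human or the AI.
function requestTakeover(
  state: TakeoverState,
  requestedMode: ControlMode,
  trigger: TakeoverTrigger,
  notify: Notify
): TakeoverState {
  if (requestedMode === state.mode) return state;
  notify(`Takeover requested (${trigger}): confirm switch to ${requestedMode} control.`);
  return { ...state, pending: { requestedMode, trigger } };
}

// Step 2: the confirmation dialogue prevents accidental overrides.
function confirmTakeover(state: TakeoverState, confirmed: boolean, notify: Notify): TakeoverState {
  if (!state.pending) return state;
  if (!confirmed) {
    notify(`Takeover cancelled. ${state.mode} remains in control.`);
    return { mode: state.mode };
  }
  notify(`Control transferred: ${state.mode} -> ${state.pending.requestedMode}.`);
  return { mode: state.pending.requestedMode };
}

// Example: the AI detects a failure and asks the operator to take over.
let state: TakeoverState = { mode: "ai" };
state = requestTakeover(state, "human", "ai-initiated", console.log);
state = confirmTakeover(state, true, console.log);
```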

I use probes based on the critical incident method to investigate near misses and accidents involving switching control between operators and automation. The methodology details timelines of accidents and what the AI and the people involved were doing at each stage. My advice is that takeover requires comprehensive, exhaustive and rigorous testing for system failures, and an understanding of what might stop the human from intervening.

Implement controls and guardrails to guide intent

People need controls and guardrails to help them understand the available functions and achieve the most useful outcomes with AI. 

Prompt builders, suggestions and explanations allow users to discover the capabilities of AI tools while learning how to craft effective prompts. They also provide guardrails and help to effectively structure our intent. 

For instance, many image generation tools provide explicit controls over the underlying model, enabling us to specify parameters such as style, relevance to reference image, model type, and negative prompts. ClickUp requires us to specify our intent by selecting a discipline before engaging with the prompt while Google Gemini offers response modification options, allowing us to adjust responses with preset options.
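A prompt builder is essentially a structured form whose fields are assembled into the final prompt, which is what gives it its guardrail quality. The sketch below is a generic illustration; the fields and the assembly format are assumptions, not any specific tool’s API.

```typescript
// Generic illustration of a prompt builder: structured fields are assembled
// into the final prompt, which constrains and guides the person's intent.

interface ImagePromptForm {
  subject: string;             // what to generate
  style?: string;              // e.g. "watercolour", "photorealistic"
  referenceStrength?: number;  // 0..1 relevance to a reference image
  negativePrompt?: string;     // what to avoid
}

function buildPrompt(form: ImagePromptForm): string {
  const parts = [form.subject];
  if (form.style) parts.push(`in a ${form.style} style`);
  if (form.referenceStrength !== undefined) {
    parts.push(`(reference influence: ${(form.referenceStrength * 100).toFixed(0)}%)`);
  }
  if (form.negativePrompt) parts.push(`avoid: ${form.negativePrompt}`);
  return parts.join(", ");
}

// Example
console.log(
  buildPrompt({
    subject: "a harbour at dawn",
    style: "watercolour",
    referenceStrength: 0.6,
    negativePrompt: "people, text",
  })
);
```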

I’ve used wizard of oz techniques to simulate the behaviour of the AI tool, providing prompts, suggestions and explanations to guide people’s interactions. This allowed me to observe their responses in real time and gave insights into their needs for guidance and support. The design of controls and guardrails needs to be continuously iterated based on feedback, refining explanations and adjusting the placement and visibility of prompts to improve usability and ensure that people can easily navigate and understand the available functions.

Risk awareness

AI predictions and assessments can be inaccurate and their consequences potentially harmful, so people incorporating AI into their workflows should be warned about the associated risks.

Ideally people should also be informed how they can mitigate the risks.

For example, I found that operators of autonomous vehicles can quickly become accustomed to alerts and that some alerts are simply pointless distractions from their current tasks, so research is required to decide which alerts are necessary for optimal performance. 

I’ve achieved this with autonomous systems in the past by observing how operators actually engage with the system and deciding whether an alert or warning is worth interrupting their flow for. Through prototyping and testing risk alerts presented in various modalities (visual, auditory, tactile), I’ve then assessed operators’ comprehension of the associated information to ensure that they can handle warnings and alerts effectively without becoming confused or overwhelmed.

I’m not a huge fan of disclaimers, which can be distracting and misleading, and in my opinion they should not be an excuse for negligence or for launching AI that is unfit for purpose or hasn’t been rigorously tested. The key is to conduct careful research to thoroughly understand the risks around the use case before introducing AI into the workflow.

Adaptive error handling

Adaptive error handling dynamically adjusts error handling strategies based on the context, operator input and system capabilities. It works by recognising and classifying errors and applying a progressive disclosure approach that starts with concise, user-friendly prompts to guide people towards resolving errors.

I have found that offering clear and contextually relevant error messages and corrective suggestions reduces people’s frustration and improves their confidence when interacting with AI systems. Something like: “Sorry, I couldn’t understand your command. Please try again in a quieter environment or type your request instead.” If the person repeats the command but still encounters issues, the system then offers alternative input options, such as typing the command or selecting from predefined options. For complex errors or technical issues, people can access more detailed explanations, troubleshooting tips or help resources.
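One way to structure that progressive disclosure is sketched below (a minimal illustration with invented error categories and messages): the first failure gets a concise, friendly prompt, a repeat failure surfaces alternative input options, and persistent or complex errors unlock more detailed troubleshooting guidance.

```typescript
// Minimal sketch of progressive disclosure for error handling:
// the message escalates as the same error recurs.

type ErrorKind = "speech-not-understood" | "network" | "internal";

interface ErrorContext {
  kind: ErrorKind;
  attempt: number; // how many times this error has occurred in a row
}

function errorResponse(ctx: ErrorContext): string {
  if (ctx.kind === "speech-not-understood") {
    if (ctx.attempt <= 1) {
      return "Sorry, I couldn't understand your command. Please try again in a quieter environment or type your request instead.";
    }
    if (ctx.attempt === 2) {
      return "Still having trouble hearing you. You can type the command, or choose one of the suggested options below.";
    }
    return "Voice input keeps failing. See the troubleshooting guide for microphone settings, or contact support.";
  }
  if (ctx.kind === "network") {
    return ctx.attempt <= 1
      ? "Connection problem. Please check your network and try again."
      : "The connection is still unstable. Detailed diagnostics are available in the help section.";
  }
  // Complex or internal errors go straight to richer guidance.
  return "Something went wrong on our side. View the error details or send a report to help us fix it.";
}

// Example: the same speech error occurs three times in a row.
for (let attempt = 1; attempt <= 3; attempt++) {
  console.log(errorResponse({ kind: "speech-not-understood", attempt }));
}
```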

Error messages need to be concise, informative, and presented in an appropriate format across all input modalities to maintain consistency and clarity. Observing people in their operational environment as they engage with AI systems using various input methods reveals how they switch between different formats, why they choose each method and the challenges they face in conveying their intentions. Asking people to help to define how to balance the level of detail in error messages with technical details can help to avoid overwhelming them while still providing sufficient guidance for error resolution.

Image matching

In pattern matching tasks, analysts team up with computer vision to identify target images within vast datasets. Analysts engage with the system by inputting their target image, triggering the AI to initiate a comparison process. 

In instances where the computer vision system detects a match, it highlights the corresponding image, aiding the analyst in their search. Additionally, further information such as similarity and confidence scores could accompany each matched image.
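The analyst-facing part of that workflow can be summarised by a result structure like the one below (hypothetical fields; the matching itself is whatever computer vision pipeline you already have): each candidate carries a similarity score, a confidence value and a region to highlight in the UI.

```typescript
// Hypothetical shape of image-match results returned to the analyst's UI.

interface BoundingBox {
  x: number;      // pixels from the left of the candidate image
  y: number;      // pixels from the top
  width: number;
  height: number;
}

interface ImageMatch {
  imageId: string;
  similarity: number;   // 0..1, how close the match is to the target image
  confidence: number;   // 0..1, the model's confidence in the match
  region?: BoundingBox; // where to draw the highlight
}

// Sort and filter matches so the analyst sees the strongest candidates first.
function rankMatches(matches: ImageMatch[], minConfidence = 0.5): ImageMatch[] {
  return matches
    .filter(m => m.confidence >= minConfidence)
    .sort((a, b) => b.similarity - a.similarity);
}

// Example
const results: ImageMatch[] = [
  { imageId: "img-204", similarity: 0.91, confidence: 0.85, region: { x: 120, y: 60, width: 80, height: 80 } },
  { imageId: "img-377", similarity: 0.74, confidence: 0.42 },
];
console.log(rankMatches(results)); // only img-204 passes the confidence threshold
```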

Spot the difference

When leveraging computer vision to enhance visual search tasks, a discreet notification could appear, accompanied by a distinct visual cue to draw attention to the detected item and a brief explanation.

Highlight bias

Bias highlighting involves signalling to analysts when content, recommendations or their behaviour may be influenced by bias, whether it’s implicit bias in data collection, algorithmic biases, or human cognitive bias that may skew outcomes.

The goal is to provide transparency and foster critical thinking by making people aware of the presence of bias within themselves or the system. People may choose to adjust the sensitivity of the bias detection or opt to receive notifications only for specific types of bias.

In previous studies involving the use of eye tracking equipment to identify and monitor bias, attention and visual search patterns while analysts worked with computer vision, an interesting observation emerged: although analysts habituated to specific areas of the UI and therefore missed detecting certain objects initially, they were prompted to re-examine the images when the computer vision correctly identified an object within a space they had overlooked. They were then able to find objects that the computer had not been able to detect or classify. 

In another study involving monitoring human behavioural data such as eye and body movements, we found consistent behavioural patterns that indicated the presence or absence of cognitive bias in decision making. Analysts who exhibited behavioural signs of cognitive bias could then be alerted to the fact via the UI. 

As you read through the patterns outlined in this article, please let me know which ones resonate with you. I’m confident that implementing these patterns will enrich your user experience and cultivate trust through transparency in your AI systems. I encourage you to put them to the test in your own projects and applications and would love to hear your feedback. I’m also keen to work with you to help design effective collaborations between people and machines. 
