A look behind the curtains on my most popular workshop — reflected in the context of a post-coronavirus world
As we look at 2021 and what technology surprises may await, I ask a simple question: when people return to the office, are they going to want to use shared whiteboard markers and conference room consoles?
The coronavirus pandemic has already shaped our world in numerous ways, but there are some impacts to the tech landscape yet to be felt. Interestingly, these impacts are threefold:
- People are more likely to be doing work in a semi-private home environment where voice might be acceptable.
- Office workers returning post-pandemic are going to find it less socially acceptable to share touchscreens, whiteboard markers, and mice/keyboards.
- Offices will move away from high-density open-plan situations, leaning back towards more spread out, semi-private workspaces.
All three of these cultural transformations open the door for new voice-enabled productivity scenarios which were previously impractical in loud, impersonal open work environments.
While it is incredibly unfortunate that COVID-19 continues to evade us with new, more transmissible variants, this also opens the door for us to think of ways to adapt prior to the broader return-to-work movement. But how might you adapt?
I’ve written extensively about the topic of voice user interface design: how you don’t need inventive use cases, how conversation factors into the equation, where the difficulties lie in wait, and much more. And I’ve been teaching voice design to classes around the world since 2017, adapting my material based on class feedback and the evolution of the industry.
Think of voice design as an additional capability, like mobile design was in the late 2000s. It’s not the same as the design you’re doing right now, and not everyone will work on it full-time. But an understanding of the technology, possibilities, and constraints will help you adapt your products to your customers’ evolving needs.
The challenge for first-time voice designers who were trained as traditional user experience designers is that you must learn to work with an invisible medium. It’s easier for folks from outside design (like writers or linguists) to make this leap, in some respects. But that’s where my workshop comes in.
Working with voice technology is unlike working with pixels, because voice technology is powered by machine-learned models. When you’re designing for experiences powered by artificial intelligence, you have to work with uncertainty. You don’t know exactly what the system will do in every circumstance.
Most of your work, in fact, will be putting up rails to help customers when things fail. After all, when things go right, there’s not much UI involved in the request “Alexa, turn off the living room lights.” It’s when things go wrong that things get interesting. To know how things might go wrong, you must understand the technology.
It’s for this reason that anyone designing for voice user interfaces should learn about the underlying technologies: automated speech recognition, natural language understanding, text-to-speech engines, entity recognition, and in some cases a variety of platform-specific systems.
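To make those layers a bit more tangible, here’s a toy, rule-based stand-in for the natural language understanding and entity recognition stages. This is purely illustrative: real systems like Alexa and Cortana use machine-learned models, and the intent names here are invented for this sketch.

```python
def understand(transcript: str) -> dict:
    """Map an ASR transcript to an intent plus any recognized entities.

    A toy rule-based stand-in for NLU and entity recognition; real
    platforms use machine-learned models for both steps.
    """
    words = transcript.lower().split()
    if "weather" in words:
        # Entity recognition: pull out a city if one follows "in"
        city = words[words.index("in") + 1] if "in" in words else None
        return {"intent": "GetWeather", "slots": {"city": city}}
    if "lights" in words:
        return {"intent": "ControlLights", "slots": {}}
    # Anything else falls through as unsupported, which is exactly the
    # kind of failure case a voice designer has to plan for
    return {"intent": "Unsupported", "slots": {}}
```

Even this toy version hints at where things break: a transcript the rules don’t anticipate lands in the unsupported bucket, and the design has to handle that gracefully.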
There are things voice is good for — and there are problems one should never rely upon voice alone to solve. A video I love to show before exploring constraints is the Scottish Elevator sketch from Burnistoun — it perfectly and hilariously illustrates what happens when you try to apply voice in an inappropriate situation.
Many of these constraints are technical in some way:
- What kind of tech: Grammar-based or natural language?
- What kind of hardware: near-field versus far-field microphones?
- Multimodality: Will you support more than just voice?
And some are more linguistic or cultural:
- Interaction length: One shot or multi-turn?
- Phrasing: How can you balance vernacular ease with acoustic uniqueness?
- Diversity: Will your system support multiple languages, genders, and dialects?
There are several books and compendiums of guidelines for voice design out there, but for the purposes of my workshop I’ve compiled the 7 concepts that drove my work on Alexa and Cortana most prominently:
- Reincorporate customer utterances.
- Go beyond GUI parity.
- Optimize responses for efficiency.
- Choose your personality battles wisely.
- Use questions to guide multi-turn interactions.
- Listen to every dialog during design.
- Consider earcons, but use sparingly.
For each of these, we dive into a bit more detail with examples in the class itself, but that level of detail would turn this into a book, not a Medium post!
When you’re dealing with invisible designs, the techniques and processes you probably learned in the past for communication and delivery (like sketching and wireframing) may no longer work. So how do you iterate on a voice design and make it real for your stakeholders?
Some existing techniques remain useful:
- Storyboards are critical. If you don’t understand the context of use, how will you know where your customer is looking? What’s in their hands? What room are they in?
- Flows become more important. Flow diagrams were a helpful tool for complex user interfaces or site maps, but in voice user interfaces they are almost always a critical part of making the invisible tangible.
And some techniques are added:
- Intents: The list of discrete customer desires that the system can turn into concrete actions, like playing music or giving the weather forecast for a particular city.
- Sample utterances: An interim deliverable where you brainstorm as many different ways a customer might ask for each intent as possible, in order to jump-start early alpha and beta versions of your natural language system.
- Sample dialogs: The process of writing out sample scripts for key customer scenarios. These scenarios should include key combinations of parameters and errors. Sample dialogs tend to be living documents — you’ll draft them once, then revise them several times for consistency along the way.
- Prompt lists: The final list of text strings to be “hooked up” by your programmers. At some point you’ll scrub your dialogs and remove redundant text to reuse as many strings as possible, resulting in a final list which you pair with unique string IDs that make it easier for your development partners to call your text in code.
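To make these deliverables concrete, here’s a minimal sketch of what intents, sample utterances, and a prompt list might look like for a hypothetical weather skill. Every name and string below is invented for illustration; real platforms (Alexa, for instance) define their own schemas for this data.

```python
# Intents paired with brainstormed sample utterances, including slot
# placeholders in braces; these jump-start early versions of the NLU model.
interaction_model = {
    "GetWeather": [
        "what's the weather",
        "what's the forecast for {city}",
        "is it going to rain in {city} today",
    ],
    "TurnOffLights": [
        "turn off the lights",
        "turn off the {room} lights",
    ],
}

# Prompt list: the final scrubbed strings, keyed by unique string IDs so
# development partners can call them in code.
prompts = {
    "WEATHER_REPORT": "It's {temperature} degrees and {condition} in {city}.",
    "LIGHTS_CONFIRM": "Okay, lights off.",
    "ERROR_NO_CITY": "Which city would you like the forecast for?",
}

def render(prompt_id: str, **slots) -> str:
    """Look up a prompt by its string ID and fill in its slots."""
    return prompts[prompt_id].format(**slots)
```

Note how the prompt list reuses one `WEATHER_REPORT` string across every city and condition — that scrubbing-for-reuse step is exactly what turns a pile of sample dialogs into a deliverable developers can build against.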
In the workshop, we walk through examples of each deliverable before providing exercise time to progress from storyboards to sample dialogs and flows. Students can either work from a common class example Alexa skill or work with their own app or skill idea.
Beyond straightforward voice design lie the more advanced concepts you may want for more robust systems:
- Common error patterns and how to handle them: As mentioned before, most of your voice UI will be handling errors. I cover 4 common patterns and how to approach them: unsupported intent, misunderstood intent, incomplete data, and inconclusive results.
- Multimodal design delivery: How to assess what additional interaction models might benefit your customers in addition to voice, and how you can layer those interactions on top of an existing voice design in your deliverables. These conversations build on content from my recent book, Design Beyond Devices: Creating Multimodal, Cross-Device Experiences.
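As a rough illustration, those four error patterns could be handled by a dispatch like the sketch below. The confidence threshold, prompt wording, and function signature are all invented for this example; in a real system these decisions live inside a dialog manager.

```python
def error_response(intent, confidence, slots, results):
    """Return an error prompt for one of the four common patterns,
    or None if the request can proceed on the happy path."""
    if intent == "Unsupported":
        # Unsupported intent: the request is out of scope for this skill
        return "Sorry, I can't help with that yet."
    if confidence < 0.5:
        # Misunderstood intent: recognition confidence is too low to act
        return "Sorry, I didn't catch that. Could you rephrase?"
    missing = [name for name, value in slots.items() if value is None]
    if missing:
        # Incomplete data: re-prompt for the first missing slot
        return f"Which {missing[0]} did you mean?"
    if len(results) != 1:
        # Inconclusive results: zero or many matches, so disambiguate
        return f"I found {len(results)} matches. Which one did you want?"
    return None  # no error; proceed with the happy path
```

The ordering matters: there’s no point asking for a missing slot if you aren’t even confident you heard the right intent, so the checks run from most to least fundamental.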
Other advanced topics that sometimes come up in Q&A: feature discovery techniques, algorithmic bias mitigation, personality tuning, context over time, and chatbots.
There are several other books that cover these and related topics in further detail — check out my Medium post Books on Speech: A Voice Designer’s Reading List.
Ben Sauer at Clearleft assembled a master list of a ton of conversational/voice design guidelines in one place, including mine — check them out at the Clearleft website.
If you’d like more hands-on experience with me, your best bet is to sign up for one of my voice design workshops. At present, the only scheduled session for “Giving Voice to Your Voice Designs” in 2021 is January 20–29 (10 hours over 4 sessions) with Rosenfeld Media. Since you’ve made it to the end of this piece, use discount code PLATZ10 to save 10% off of single workshop tickets.
Thanks for reading, and may the voice be with you!
Cheryl Platz is a world-renowned designer, author, and speaker whose experience in voice design spans over a decade of work on video games, Windows Automotive, Cortana, and Alexa. She is owner of design education firm Ideaplatz, LLC and a full-time Principal UX Designer at the Bill & Melinda Gates Foundation.