Introducing ToxMod: Voice-Native Moderation for a Safer Internet

<Cross-posted from our blog here.>

In 2019, Modulate introduced the concept of “voice skins” to the world. Transcending old-school voice changers, our VoiceWear service enabled players to take on the authentic voice of their chosen character. As we began to bring that technology to the world of gaming, we were blown away by the positive reception — both from game studios seeing the potential in a unique new customizable asset, and from players experiencing a new level of immersion and fun in voice chat.

Of all the feedback we received about voice skins, one comment intrigued us the most. Many players, from all demographics, reported that voice skins were what allowed them to participate in voice chat at all. In speaking with these players, we rapidly came to understand that many simply don’t feel comfortable putting their real voice out there, given the toxicity and harassment that are all too prevalent in voice chat. And it was clear this wasn’t simply anecdotal: studies show that 48% of all in-game toxicity now takes place through voice, and that 22% of players stop playing a game entirely when they’re faced with that sort of toxicity. Given the increasing importance of voice chat for both socializing and coordinating in-game, there was a clear need for a tool that could prevent these negative experiences.

As we discussed this issue with our partner studios, we quickly realized that the same neural network innovations that powered voice skins, which allow for real-time analysis of speech, could be leveraged to create such a moderation tool. By processing the signals our voice skin networks generate, we could assess voice chats for toxicity live as they happen, with much greater accuracy than other tools, thanks to our ability to analyze not just what is being said but also the emotion, prosody, and volume with which it’s spoken.

We’re thrilled to share that this new service, named ToxMod, is now available to all customers. ToxMod is the world’s first truly voice-native moderation service, providing a natural complement to VoiceWear’s immersive voice skins. While voice skins help everyone get in on the action and have a great time doing so, ToxMod keeps an eye out for bad actors to ensure that nobody is damaging the experience of others.

The complementary technology between these two services enables some truly unique possibilities. For instance, ToxMod uses VoiceWear’s emotional understanding to differentiate between the aggressive “F*** you!” and the enthusiastic “F*** yeah!” or to identify the wary but aggressive demeanor of someone attempting to groom a victim, in ways that just can’t be replicated with text analysis alone.
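To make the idea concrete, here is a minimal sketch of how a transcript and an emotion signal might be combined. It is not ToxMod’s actual implementation: the function name, the `aggression` score (assumed to be a 0-to-1 value produced by some upstream emotion model), the word list, and the threshold are all hypothetical and chosen purely for illustration.

```python
# Hypothetical illustration only: names, scores, and thresholds are assumptions,
# not part of ToxMod or VoiceWear.

def classify_utterance(
    transcript: str,
    aggression: float,
    profanity: frozenset = frozenset({"f***"}),
    aggression_threshold: float = 0.7,
) -> str:
    """Label an utterance using both its words and how it was spoken.

    `aggression` is assumed to be a 0..1 score from an upstream emotion model.
    """
    # Normalize words: drop trailing punctuation and lowercase them.
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    has_profanity = bool(words & profanity)

    if has_profanity and aggression >= aggression_threshold:
        return "hostile"             # profane words delivered aggressively
    if has_profanity:
        return "profane_but_benign"  # same words, enthusiastic delivery
    return "clean"


print(classify_utterance("F*** you!", aggression=0.9))
print(classify_utterance("F*** yeah!", aggression=0.2))
```

A text-only filter would see the same flagged word in both calls; only the extra emotion signal lets the two be treated differently.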

Perhaps even more importantly, ToxMod can do all of this directly on each player’s device, in real time, unlocking two unique capabilities. The first is that we can react to offensive speech as it happens: not just shutting down whole conversations, but also taking more nuanced actions such as blocking individual words like racial slurs or, in the case of younger players, personal information such as a phone number. The second is that ToxMod can preserve player privacy far better than any other voice moderation tool. Since it processes the data on-device, the only reason it would send any data where anyone else can hear it is if it detects a high probability of toxicity. Even then, the first stop for that data is Modulate’s secure servers, which run even more sophisticated algorithms to validate the suspicion of toxicity. Only when there is a strong indication that something problematic is occurring will any of your audio be shared with a human moderation team.
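The tiered flow described above (on-device check, then server validation, then human review) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not ToxMod’s real code: `route_clip`, the `Action` names, and both threshold values are hypothetical. The server check is passed in as a callable so that it is only invoked, and audio would only leave the device, once the on-device score is high enough.

```python
# Hypothetical sketch of a tiered escalation flow; all names and thresholds
# are assumptions for illustration, not ToxMod's API.
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    KEEP_ON_DEVICE = "keep_on_device"            # nothing leaves the device
    DISMISSED_BY_SERVER = "dismissed_by_server"  # server did not confirm
    SEND_TO_MODERATORS = "send_to_moderators"    # humans review the audio


@dataclass(frozen=True)
class Thresholds:
    device: float = 0.8  # assumed on-device escalation threshold
    server: float = 0.9  # assumed server-side confirmation threshold


def route_clip(
    device_score: float,
    server_scorer: Callable[[], float],
    thresholds: Thresholds = Thresholds(),
) -> Action:
    """Decide what happens to a clip given its on-device toxicity score."""
    # Tier 1: below the device threshold, the clip is never uploaded,
    # so the server scorer is never called.
    if device_score < thresholds.device:
        return Action.KEEP_ON_DEVICE

    # Tier 2: only now is the (more expensive) server model consulted.
    server_score = server_scorer()
    if server_score < thresholds.server:
        return Action.DISMISSED_BY_SERVER

    # Tier 3: both tiers agree, so the clip goes to human moderators.
    return Action.SEND_TO_MODERATORS


print(route_clip(0.30, lambda: 0.95))
print(route_clip(0.85, lambda: 0.95))
```

Keeping the server call lazy mirrors the privacy property in the text: a clip that scores low on-device is resolved without any data being sent anywhere.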

The Modulate team is thrilled to be expanding our platform with such a powerful tool for safety and equality, and we’re committed to continuing to work until voice chat truly becomes the natural, powerful, and safe communication channel it needs to be.
