Discover DeepTone™

DeepTone™ captures the diversity of emotional expressions
and provides a rich, acoustic map of the human voice.

Insights

Build a custom layer of analytics by combining different DeepTone™ models.
Here are some examples.

Speaker

Gather unique attributes from the speakers in your audio, such as voice gender and language.


Acoustic

Detect speech activity, silence, noise, and music.


Emotion

Identify the speaker's emotions and energy levels over time.

Speech Model

Speech detection

Available via Python SDK, Cloud API, On-Prem API, Swift SDK, and Android SDK.

Speech, music, silence or other 

Use case examples:

Contact Centers: speaker share, cross-talk, talk-to-listen share, hold music detection.

Conference tools: Speech, silence and noise detection.

Robotics and Voicebots: Detect speech in challenging environments.

Telemedicine: Patient-doctor talk time.

With the speech model you can detect speech activity in your audio (not the same as speaker detection) and derive meaningful insights, creating the first layer of conversational analytics.

For real-time use cases where low output latency is needed, we offer an optimized version of this model that reacts faster to changes in the audio.
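
As a sketch of that first analytics layer, the snippet below computes each label's share of the audio from timestamped speech-model output. The segment format (start, end, label) is an assumption made for this example, not the SDK's actual output schema; see the docs for the real interface.

```python
# Hypothetical speech-model output: labelled time segments in seconds.
# The exact schema of DeepTone's output may differ; check the docs.
segments = [
    {"start": 0.0, "end": 4.2, "label": "speech"},
    {"start": 4.2, "end": 6.0, "label": "silence"},
    {"start": 6.0, "end": 9.5, "label": "music"},
    {"start": 9.5, "end": 15.0, "label": "speech"},
]

def share_by_label(segments):
    """Return each label's share of the total audio duration."""
    totals = {}
    for seg in segments:
        duration = seg["end"] - seg["start"]
        totals[seg["label"]] = totals.get(seg["label"], 0.0) + duration
    total = sum(totals.values())
    return {label: duration / total for label, duration in totals.items()}

print({k: round(v, 2) for k, v in share_by_label(segments).items()})
# {'speech': 0.65, 'silence': 0.12, 'music': 0.23}
```

The same aggregation applied per speaker or per channel gives metrics such as talk-to-listen share or hold-music time.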

Read our docs

Gender Model

Gender classification

Available via Python SDK, Cloud API, On-Prem API, Swift SDK, and Android SDK.

Male, female, unknown 

Use case examples:

Gaming community: distinguish female from male gamers to prioritize toxicity detection.

Conference tools: promote inclusivity by mapping gender-wise speaker share.

Robotics and Voicebots: further personalize interactions and vocabulary used by detecting the speaker's gender.

With the gender model you can classify speakers based on their voice gender, a measure based on a combination of pitch and intonation. If the confidence in the classification is too low, the result is set to unknown.

It is important to understand that the biomarkers picked up in the human voice do not represent a person's gender identity; this is therefore often referred to as voice or bio gender.
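
The snippet below illustrates the confidence fallback described above. The per-window output format and the 0.7 threshold are assumptions made for this sketch, not the SDK's actual schema or cut-off.

```python
# Hypothetical per-window gender-model output with confidence scores.
# Field names and the threshold value are assumptions for illustration.
CONFIDENCE_THRESHOLD = 0.7

windows = [
    {"label": "female", "confidence": 0.92},
    {"label": "male", "confidence": 0.55},
    {"label": "male", "confidence": 0.81},
]

def resolve_label(window, threshold=CONFIDENCE_THRESHOLD):
    """Fall back to 'unknown' when the model's confidence is too low."""
    return window["label"] if window["confidence"] >= threshold else "unknown"

print([resolve_label(w) for w in windows])  # ['female', 'unknown', 'male']
```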

Read our docs

Arousal Model

Energy Detection

Available via Python SDK, Cloud API, On-Prem API, Swift SDK, and Android SDK.

high, medium, low 

Use case examples:

Contact Centers: track agent engagement levels and enable real-time coaching to promote engagement.


Conference tools: detect energy levels in a conversation to infer which topics lead to high engagement, and gauge the motivation level of team members.

Robotics and Voicebots:  further personalize interactions and vocabulary used by detecting the speaker's energy levels.

With the arousal model you can obtain a highly reliable classification of a speaker's voice energy. Low energy can be related to negative emotions such as sadness, lack of interest, or boredom. High energy can be related to either positive (happiness) or negative (irritation) emotions.
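
As one way to turn that classification into a real-time coaching signal, the sketch below flags sustained low energy in a stream of arousal labels. The per-sample label format and the window size are assumptions made for this example, not part of the SDK.

```python
# Hypothetical per-second arousal labels for an agent's audio stream.
# The label values ('high', 'medium', 'low') follow the classes listed above;
# the streaming format itself is an assumption for this sketch.
arousal_over_time = ["high", "medium", "low", "low", "low", "low", "medium"]

def sustained_low_energy(labels, window=3):
    """Return the first index where 'low' arousal persists for `window` samples."""
    run = 0
    for i, label in enumerate(labels):
        run = run + 1 if label == "low" else 0
        if run >= window:
            return i  # point where a coaching nudge could be triggered
    return None

print(sustained_low_energy(arousal_over_time))  # 4
```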

Read our docs

Apps that use this model

Real-Time Coaching (widget)
Powered by OTO

Emotion Model

Emotion Classification

Available via Python SDK, Cloud API, On-Prem API, Swift SDK, and Android SDK.

happy, irritated, neutral, tired

Use case examples:

Gaming community: detect negative interactions between community members (angry, hateful and aggressive) to prioritize toxicity detection.

Conference tools: detect negative interactions between team members, prioritize mental well-being, and promote work-life balance.

Robotics and Voicebots: detect moods and adapt responses to adjust empathy.

With the emotion model you can explore and experiment with emotional classification of a voice and how it can be translated into context-relevant metrics.

Keep in mind that the behavioural markers picked up in a voice might represent different things depending on the cultural background of a conversation.

Provided with sizeable and contextually relevant data, we can customize our models to adapt the output for a specific context or use case.
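
As an example of such a context-relevant metric, the sketch below derives a simple negativity ratio from a sequence of emotion labels. The per-window output format and the choice of which labels count as negative are assumptions made for illustration, not the SDK's actual schema.

```python
# Hypothetical emotion-model output: one label per analysed window.
# The label set matches the classes listed above; the format is assumed.
from collections import Counter

labels = ["happy", "neutral", "irritated", "neutral", "tired", "irritated"]

def negativity_ratio(labels, negative=("irritated", "tired")):
    """Share of windows classified with a negative emotion."""
    counts = Counter(labels)
    return sum(counts[l] for l in negative) / len(labels)

print(round(negativity_ratio(labels), 2))  # 0.5
```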

Read our docs

Language Model (Beta)

Language Detection

Available via Python SDK, Cloud API, On-Prem API, Swift SDK, and Android SDK.

EU Languages: EN, ES, FR, DE, IT, and unknown

Use case examples:

Contact Centers: detect multilingual conversations between customers and agents.

Robotics and Voicebots: detect language and adapt responses to the language when applicable.


With the language model you can detect which languages are being spoken without having to label data first.

Provided with sizeable and contextually relevant data, we can customize the model by adding more languages if you have a specific need.
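
The sketch below shows one way to flag a multilingual conversation from language-model output, for instance in a contact-center recording. The segment format and the 10-second minimum are assumptions made for this example, not the SDK's actual schema.

```python
# Hypothetical language-model output: per-segment language codes with durations.
# Field names and thresholds are assumptions for this sketch; see the docs.
segments = [
    {"language": "EN", "duration": 42.0},
    {"language": "DE", "duration": 18.5},
    {"language": "unknown", "duration": 3.0},
]

def is_multilingual(segments, min_seconds=10.0):
    """True if at least two known languages each exceed `min_seconds` of audio."""
    totals = {}
    for seg in segments:
        if seg["language"] != "unknown":
            totals[seg["language"]] = totals.get(seg["language"], 0.0) + seg["duration"]
    return sum(1 for d in totals.values() if d >= min_seconds) >= 2

print(is_multilingual(segments))  # True
```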

Read our docs

Speaker Map (Beta)

Speaker Detection

Available via Python SDK, Cloud API, On-Prem API, Swift SDK, and Android SDK.

Speaker 1, Speaker 2, etc.

Use case examples:

Conference tools: speaker separation in mono files, cross-talk.

Robotics and Voicebots: speaker recognition based on a speaker's voiceprint recording.

With the Speaker Map model, you can identify who speaks when throughout an audio file.
Prior knowledge of the number of speakers is not needed, nor do speakers need to be split into separate channels.
This beta version currently supports batch processing only, not streaming.
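
The sketch below derives each speaker's share of talk time from Speaker Map style diarization segments. The segment format is an assumption made for this example, not the SDK's actual output schema.

```python
# Hypothetical Speaker Map output: diarization segments for a mono recording.
# The segment schema is an assumption for this illustration.
segments = [
    {"speaker": "Speaker 1", "start": 0.0, "end": 12.0},
    {"speaker": "Speaker 2", "start": 12.0, "end": 20.5},
    {"speaker": "Speaker 1", "start": 20.5, "end": 30.0},
]

def speaker_share(segments):
    """Each speaker's share of the total talk time."""
    talk_time = {}
    for seg in segments:
        duration = seg["end"] - seg["start"]
        talk_time[seg["speaker"]] = talk_time.get(seg["speaker"], 0.0) + duration
    total = sum(talk_time.values())
    return {speaker: t / total for speaker, t in talk_time.items()}

print({k: round(v, 2) for k, v in speaker_share(segments).items()})
# {'Speaker 1': 0.72, 'Speaker 2': 0.28}
```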

Read our docs