When people think of accessibility and assistive technology, the disabilities that most readily come to mind are conditions like blindness, deafness, or color blindness; wheelchair users, amputees, and autistic people are also common associations. These are all conditions that are easily recognizable, or at least widely known. Speech disabilities, however, are routinely overlooked.
Thankfully, that’s changing. Several of the biggest companies in tech, including Amazon, Apple, and Google, are devoting resources from their massive war chests to making their virtual assistants more accessible to people with speech disabilities. Speech conditions such as stuttering are disabilities every bit as much as blindness or autism, yet they have largely been ignored, in part because accommodating them requires technical adaptation. Raising awareness is only half the battle; an equally large part of the problem is technical. Digital assistants like Alexa and Siri are built and trained on typical language models, which is to say, without stuttering. It’s hard enough for engineers to teach a machine to parse typical speech; it’s exponentially harder when you throw in a curveball like an atypical speech pattern. To that end, according to sources familiar with the matter, Apple has in recent years added speech-language pathologists to its Siri team to better understand the physiology of speech and make Siri more graceful at analyzing atypical speech.
Alongside the biggest companies with the deepest pockets, smaller ones like Deepgram are doing their part to make voice-first experiences more accessible. Aimed squarely at developers, Deepgram’s products include a speech-to-text API, model training services, and more. The company is blunt on its website that current automatic speech recognition “sucks,” and it claims to have built a better model that delivers faster, more accurate transcriptions.
One of Deepgram’s latest projects is Classroom Captioner. In a February blog post, Senior Developer Advocate Kevin Lewis wrote, in part, that Classroom Captioner is “intended to ease the concerns of students who need or prefer a textual representation of what’s happening in a lecture.” The idea stems from the realization that, as Lewis alludes to, many students need some form of text to complement the audio of a lecture. Perhaps taking in information both aurally and visually is beneficial, or perhaps following a live transcript of what is being said aids cognitive processing. Maybe they have a hearing impairment. Whatever the reason, Deepgram’s premise with Classroom Captioner is solid.
“Voice is our primary method of communication and the primary way we learn and communicate in the classroom or other educational settings,” Lewis told me in an email interview conducted last month. “Whether you have a hearing impairment or learn better with captions and other text-based aids, classrooms should be a place where everyone’s needs are addressed. At Deepgram, we believe everyone should have a common understanding of what is being said, and this tool helps facilitate that.”
In short, Classroom Captioner is about delivering equitable experiences. Lewis explained that it is usually the institution’s responsibility to provide accessibility tools; for lectures, a common accommodation is having another student serve as a note-taker. It works, Lewis said, but the problem is that the notes can be biased. “[Maybe] the note-taker doesn’t write something down because they consider it ‘common knowledge’, while the student who needs the notes doesn’t have the full understanding of the subject,” he said. “In addition to bias, the very process of asking for and receiving help with hearing impairments limits those who can access it.”
Deepgram’s inclination to use automatic speech recognition (ASR) to solve this lecture problem with Classroom Captioner makes sense, since most teachers and professors already use computers in their lectures. “Our API is easy to use and set up right in a web browser,” said Lewis. “Developing the experience was fairly easy; we already shared tools for developers to create live transcriptions, so this was just a process of customizing the application-specific code, adding the ability to create and join rooms, and creating roles like ‘teacher’ or ‘student’ to make the transcription clearer.”
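The room-and-role mechanics Lewis describes can be pictured with a minimal sketch. To be clear, this is not Deepgram’s actual code, and every name in it (`CaptionHub`, `Room`, `publish`, and so on) is hypothetical; it only illustrates the idea of a teacher creating a room and publishing caption lines while students join and read the running transcript:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Room:
    """One lecture room: a single teacher, any number of students."""
    name: str
    teacher: str
    students: Set[str] = field(default_factory=set)
    transcript: List[str] = field(default_factory=list)

class CaptionHub:
    """Tracks rooms and enforces the teacher/student roles."""

    def __init__(self) -> None:
        self.rooms: Dict[str, Room] = {}

    def create_room(self, name: str, teacher: str) -> Room:
        # Creating a room registers its (single) teacher.
        room = Room(name=name, teacher=teacher)
        self.rooms[name] = room
        return room

    def join_room(self, name: str, student: str) -> None:
        # Students join an existing room to follow the live transcript.
        self.rooms[name].students.add(student)

    def publish(self, name: str, speaker: str, text: str) -> None:
        # Only the room's teacher may add caption lines.
        room = self.rooms[name]
        if speaker != room.teacher:
            raise PermissionError("only the teacher can publish captions")
        room.transcript.append(text)

    def transcript_for(self, name: str) -> List[str]:
        # What a student (or a latecomer) sees: the full running transcript.
        return list(self.rooms[name].transcript)
```

In a real deployment, `publish` would be fed by a live ASR stream and `transcript_for` would push updates to connected browsers; the role check is the part that makes the transcription clearer, since readers always know the caption lines come from the lecturer.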
Overall, Lewis said there’s “always room for improvement” in how tech companies help people who need live captions. He added that the pandemic-driven need for hybrid learning environments has pushed companies to put more effort into accessibility features such as closed captioning, particularly where maintaining equal access for people with hearing impairments is concerned. As for Deepgram, Lewis noted that Classroom Captioner in particular has a bright future. It can be deployed “in just five minutes” with no technical background required, and with a little know-how, developers can customize it to match their school’s style guide and more. “Our Developer Relations team looks forward to continuing to create accessibility project examples like this one to showcase all that is possible with Deepgram,” said Lewis.
At the end of the day, Lewis and his team are all about building accessible tools.
“It’s important for my team and I that we democratize tools like this [Classroom Captioner] to make learning more accessible and equitable for all,” he said.