Speech Recognition Companies
Explore 31 Speech Recognition companies in our AI directory. Leading companies include Uniphore, Verbit, Dialpad.
Uniphore
Chennai, India
Uniphore provides an enterprise-grade Business AI Cloud platform focused on bridging the gap between consumer and business AI applications. Their core technology centers on a composable and secure AI architecture encompassing data, knowledge, models, and agents, with a strong emphasis on speech analytics and conversational AI. Uniphore targets large enterprises seeking to deploy and manage AI solutions across their operations with a focus on data sovereignty and control.
Verbit
New York, United States
Verbit is a US-based AI company specializing in highly accurate transcription and captioning services. Their core technology, the Captivate™ ASR engine enhanced by Gen.V™ generative AI, delivers rapid, customizable transcripts with automated summarization and keyword extraction. Verbit primarily serves speech-intensive industries like legal and education, offering solutions to improve accessibility, enhance productivity, and derive actionable insights from audio and video content.
Dialpad
San Francisco, United States
Dialpad is a US-based provider of an all-in-one cloud communications platform integrating voice, video, messaging, and a contact center solution. Their core technology leverages real-time Voice AI to provide features like automated call transcription, agent coaching, and autonomous workflow execution for tasks like appointment scheduling and refunds. Dialpad targets businesses seeking to improve contact center performance and streamline communications across multiple channels, with a focus on security and integration with existing CRM and collaboration tools.
Observe.AI
San Francisco, United States
Observe.AI provides AI Agents for enterprise contact centers, automating and improving customer interactions across voice channels. Their technology utilizes advanced speech recognition and natural language processing to accurately understand complex, real-world conversations – even with background noise and interruptions – and integrate with existing CRM and workflow systems. This enables businesses to automate call resolution, improve agent performance through AI-powered quality assurance, and achieve predictable outcomes in customer service operations.
Loom
San Francisco, United States
Loom is a video messaging platform that enables asynchronous communication through quick screen and camera recordings. Utilizing automatic speech recognition (ASR) technology, Loom provides searchable video transcripts and captions for improved accessibility and information retrieval. Primarily targeting professionals and teams, Loom streamlines communication and documentation workflows, offering a more efficient alternative to traditional email and meetings.
AISpeech
Suzhou, China
AISpeech is a leading Chinese conversational AI platform company specializing in large models, enabling intelligent connectivity and streamlined operations.
Cogito
Boston, United States
Cogito, now part of Verint, delivers real-time AI-powered coaching and performance analytics for contact centers. Their core technology utilizes proprietary AI models to analyze voice conversations, providing both customer experience (CX) and employee experience (EX) scoring during live calls. This enables targeted, in-the-moment guidance for agents, with a focus on improving key metrics like average handle time, customer satisfaction, and revenue generation for large enterprises in sectors like telecommunications and healthcare.
AssemblyAI
San Francisco, United States
AssemblyAI develops highly accurate speech-to-text APIs and a suite of audio intelligence features like speaker diarization, entity detection, and topic detection, alongside LeMUR, its framework for applying large language models to transcribed speech. Their key innovation lies in offering low-latency, high-accuracy transcription optimized for real-time and asynchronous applications, alongside advanced features like content moderation and redaction. Serving a diverse market including contact centers, media companies, and research institutions, AssemblyAI processes millions of minutes of audio data monthly and is recognized for consistently achieving industry-leading Word Error Rates (WER) in independent evaluations.
Otter.ai
Mountain View, United States
Otter.ai develops AI-powered meeting solutions, most notably its Otter Meeting Agent platform, which provides real-time transcription, automated summaries, and AI-driven action item detection. The platform leverages advanced speech recognition and natural language processing to create searchable meeting records and facilitate collaboration, integrating with popular video conferencing tools like Zoom, Google Meet, and Microsoft Teams. Otter.ai currently serves a broad professional market, with reported user testimonials indicating significant time savings – up to 33% according to one VP of Sales at Aiden Technologies – and increased productivity for teams reliant on frequent meetings.
Chorus.ai
San Francisco, United States
Chorus.ai, now integrated within ZoomInfo, delivers conversation intelligence software that analyzes sales calls and meetings. Their platform utilizes AI-powered speech and text analytics to identify key conversation patterns, coaching opportunities, and deal-critical insights. This technology primarily serves revenue-focused teams within B2B organizations to improve sales performance and forecasting accuracy.
Ambience Healthcare
San Francisco, United States
Ambience Healthcare provides an AI-powered platform that automates clinical documentation and coding for U.S. healthcare systems. Utilizing natural language processing and speech recognition, the platform generates structured data from patient encounters, reducing administrative burden on clinicians. Ambience targets health systems seeking to improve revenue cycle management, ensure compliance, and allow physicians to focus on patient care rather than documentation.
Deepgram
San Francisco, United States
Deepgram is a US-based provider of voice AI APIs for enterprise applications, offering unified speech-to-text, text-to-speech, and LLM orchestration. Their platform distinguishes itself through a single API designed to minimize complexity, latency, and cost compared to component-based solutions, and supports both real-time and batch processing with telephony integrations. Deepgram targets developers and businesses requiring highly accurate and scalable voice intelligence for applications like contact centers, voice assistants, and conversational AI systems.
Speechmatics
Cambridge, United Kingdom
Speechmatics is a UK-based technology company specializing in accurate, low-latency Automatic Speech Recognition (ASR) and speech-to-text solutions. Their core offering is a Speech API providing transcription, real-time translation, and text-to-speech capabilities, deployable on-device, on-premise, or in the cloud. Speechmatics targets enterprises requiring high-quality voice AI with a focus on data privacy, offering a non-logging standard deployment option.
Corti
Copenhagen, Denmark
Corti is a Danish AI infrastructure provider specializing in healthcare applications. Their core product is a highly accurate medical Automatic Speech Recognition (ASR) API that converts clinical conversations into structured data and documentation. Corti targets healthcare developers and providers seeking to rapidly build and deploy voice-enabled workflows – such as automated note-taking, report generation, and point-of-care support – without managing complex AI infrastructure.
Sanas
Palo Alto, United States
Sanas provides a real-time Speech AI platform specializing in accent and language translation for improved communication clarity. Their core technology modulates speech to neutralize accents and remove noise while preserving vocal characteristics, enabling natural-sounding conversations in over 25 languages. Sanas targets call centers and communication-heavy businesses seeking to enhance customer and employee experiences, reduce communication friction, and improve key performance indicators like CSAT and AHT.
Modulate
Cambridge, United States
Modulate is a US-based AI platform that analyzes live and recorded voice conversations to deliver real-time insights into content, intent, and emotional state. Their core technology decodes multi-dimensional voice signals – including deception, toxicity, and synthetic speech – to provide actionable alerts and APIs. Modulate targets businesses requiring enhanced fraud prevention, trust & safety measures, and customer experience improvements through proactive voice intelligence, serving sectors like gaming, contact centers, and online communities.
Fano Labs
Hong Kong, Hong Kong
Fano Labs specializes in speech recognition and NLP for Asian languages, serving financial services and customer service industries.
Fixie.ai
Seattle, United States
Fixie.ai develops the Ultravox platform, enabling developers to build and deploy AI agents powered by a next-generation, open-source Speech Language Model (SLM). Ultravox focuses on natural speech understanding to facilitate more human-like conversational AI experiences. The company targets businesses seeking to integrate scalable voice AI capabilities into their applications and workflows.
Krisp
San Francisco, United States
Krisp develops AI-powered tools to enhance the quality and productivity of virtual meetings. Their core product is an AI Meeting Assistant that combines industry-leading noise cancellation with automated transcription, summarization, and accent conversion. Krisp targets professionals and teams seeking to improve communication clarity and efficiency in remote and hybrid work environments by automating key meeting tasks.
Voiceitt
Tel Aviv, Israel
Voiceitt develops AI-powered speech recognition technology specifically designed to understand non-standard speech patterns, including those resulting from speech impairments, accents, or aging-related conditions. Their core product is a customizable API and software solution leveraging a proprietary database of atypical speech and advanced machine learning. Voiceitt primarily serves individuals with speech disabilities, as well as accessibility applications for accented speakers and those in the Deaf community, enabling greater communication independence and access to voice-controlled technologies.
SoapBox Labs
Dublin, Ireland
SoapBox Labs develops voice AI specifically designed for children, enabling speech recognition in educational apps with child privacy protection.
Speechly
Helsinki, Finland
Speechly is a Finnish company specializing in real-time Automatic Speech Recognition (ASR) technology delivered via a streaming API. Their core product is a cloud-based ASR engine optimized for low-latency transcription and understanding, particularly in demanding applications like real-time communication and interactive voice response systems. Speechly targets developers building voice-enabled applications requiring high accuracy and speed, offering a developer-friendly alternative to traditional, batch-oriented speech-to-text solutions.
Lelapa AI
Johannesburg, South Africa
Lelapa AI develops Natural Language Processing (NLP) technology specifically for African languages, originating from the Masakhane research community. Their core product, the Vulavula API, provides resource-efficient speech-to-text and transcription services for real-time call processing and analysis. Lelapa AI targets businesses operating in African markets seeking to improve customer experience, ensure compliance, and gain actionable insights from multilingual customer interactions.
Rev.com
Austin, United States
Rev.com provides AI-powered transcription and captioning services, specializing in solutions for the legal industry. Their core offering is a 96%+ accurate AI transcription engine designed for high-volume processing of legal evidence like depositions, police reports, and bodycam footage, supplemented by a network of 14,000+ human transcriptionists for 99%+ accuracy when required. Rev targets law firms and legal professionals by offering tools for evidence review, timeline creation, and secure transcript management directly within their platform.
iFlytek
Hefei, China
iFlytek develops advanced AI-powered language solutions, including its core Jieli speech recognition platform and translation tools supporting over 60 languages. The company’s innovations center on deep learning models for accurate speech-to-text, text-to-speech, and machine translation, demonstrated in products like their real-time transcription services for meetings and content creation. As China’s leading provider in this space, iFlytek increasingly focuses on international expansion and serves sectors including education, digital marketing, and professional communication.
Nuance Communications
Burlington, United States
Nuance Communications, now a Microsoft company, develops AI-powered solutions for clinical and administrative healthcare documentation. Their core technology centers on speech recognition and natural language processing applied to create tools like Dragon Medical One, which automates clinical documentation and enhances radiology reporting. Nuance primarily serves healthcare providers and aims to improve clinician productivity, reduce administrative burden, and enhance patient care through AI-driven workflows.
SoundHound
Santa Clara, United States
SoundHound AI develops and licenses voice AI technologies that enable conversational interfaces for a variety of industries, including automotive, retail, and finance. Their core offering is a fully independent voice AI platform capable of handling over 10 billion conversations annually, focusing on agentic AI solutions that automate complex tasks. SoundHound differentiates itself by offering a complete, customizable voice AI solution – rather than relying on cloud-based assistants – allowing businesses to own the entire interaction and maximize ROI through cost reduction and revenue generation.
Speak AI
Toronto, Canada
Speak AI is a Canadian company specializing in AI-powered transcription and analysis of audio and video data. Their core product utilizes Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) to convert media into searchable, transcribed text and extract key insights. Speak AI primarily serves researchers and businesses needing to efficiently process and analyze qualitative data from interviews, meetings, and other spoken content.
Whisper (OpenAI)
San Francisco, United States
OpenAI’s Whisper is an open-source automatic speech recognition (ASR) system trained on a massive, diverse 680,000-hour dataset of multilingual speech. Utilizing a Transformer-based encoder-decoder architecture, Whisper excels in robustness to accents and background noise, offering both transcription and translation capabilities across multiple languages. This technology targets developers seeking to integrate highly accurate and versatile speech-to-text functionality into a wide range of applications, particularly where diverse audio conditions or multilingual support are critical.
Acoustic.ai
Copenhagen, Denmark
Acoustic.ai develops voice AI solutions for automotive and consumer electronics, focusing on noise cancellation and voice enhancement.
Rev AI
San Francisco, United States
Rev AI provides a speech-to-text API specializing in automated transcription and speech recognition services. Their core technology centers on a diverse, large-dataset trained AI model designed for high accuracy across varied audio qualities and accents. They target developers and businesses requiring scalable, programmatic transcription solutions for applications like voice search, media monitoring, and accessibility services.