What is speech AI used for?

Speech AI powers voice assistants (Siri, Alexa), live meeting transcription, voice cloning for content creators, multilingual customer service, accessibility tools for the hearing-impaired, and dubbing for video localisation.

How realistic is AI-generated voice?

Modern TTS systems achieve Mean Opinion Scores (MOS) comparable to those of human speech. Leading systems from ElevenLabs, Play.ht, and LMNT can clone a voice from seconds of audio.
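As a rough illustration of how a Mean Opinion Score is computed, the sketch below averages listener ratings on the usual 1 to 5 scale and attaches a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the sample ratings are hypothetical and do not come from any of the systems named above.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = mean(ratings)
    # Normal approximation (an assumption; small listener panels may
    # call for a t-distribution instead).
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Made-up ratings, for illustration only.
human = [5, 4, 5, 4, 4, 5, 4, 5]
synthetic = [4, 4, 5, 4, 5, 4, 4, 5]

for label, ratings in (("human", human), ("synthetic", synthetic)):
    mos, ci = mean_opinion_score(ratings)
    print(f"{label}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Two systems are usually described as comparable when their confidence intervals overlap, which is why the interval is reported alongside the mean.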

Who are the top voice AI companies?

Top companies include Spotify AI, Uniphore, and Verbit, alongside ElevenLabs, Deepgram, AssemblyAI, and OpenAI (Whisper).

Voice & Speech AI Companies

Voice and speech AI companies build systems for automatic speech recognition (ASR), text-to-speech (TTS), voice cloning, and real-time translation. Advances in neural codec language models have made synthetic voices nearly indistinguishable from human speech, enabling new applications in accessibility, entertainment, and enterprise communications.

132 Companies 42 Countries

Showing the top 120 of 132 Voice & Speech AI companies by capital raised.

Spotify AI

Stockholm, Sweden

Spotify leverages advanced machine learning models – including collaborative filtering and natural language processing – to power personalized music and podcast recommendations through features like Discover Weekly and Release Radar. Their AI capabilities extend to audio analysis for features such as the automated DJ experience and podcast transcription, as well as content moderation systems designed to ensure platform safety. With over 574 million monthly active users globally, Spotify’s AI-driven personalization is a key differentiator in the competitive streaming market and contributes significantly to user engagement and retention.

enterprise $65.0B

Uniphore

Chennai, India

Uniphore provides an enterprise-grade Business AI Cloud platform focused on bridging the gap between consumer and business AI applications. Their core technology centers on a composable and secure AI architecture encompassing data, knowledge, models, and agents, with a strong emphasis on speech analytics and conversational AI. Uniphore targets large enterprises seeking to deploy and manage AI solutions across their operations with a focus on data sovereignty and control.

scaleup $610M

Verbit

New York, United States

Verbit is a US-based AI company specializing in highly accurate transcription and captioning services. Their core technology, the Captivate™ ASR engine enhanced by Gen.V™ generative AI, delivers rapid, customizable transcripts with automated summarization and keyword extraction. Verbit primarily serves speech-intensive industries like legal and education, offering solutions to improve accessibility, enhance productivity, and derive actionable insights from audio and video content.

scaleup $550M

Epidemic Sound

Stockholm, Sweden

Epidemic Sound is a Swedish provider of royalty-free music and sound effects for content creators. Their platform utilizes AI-powered search and recommendation algorithms to facilitate efficient content matching and discovery within a vast library of audio assets. Targeting video creators, marketers, and podcasters, Epidemic Sound offers a subscription-based licensing model providing unrestricted usage rights for their audio content globally.

scaleup $450M

Dialpad

San Francisco, United States

Dialpad is a US-based provider of an all-in-one cloud communications platform integrating voice, video, messaging, and a contact center solution. Their core technology leverages real-time Voice AI to provide features like automated call transcription, agent coaching, and autonomous workflow execution for tasks like appointment scheduling and refunds. Dialpad targets businesses seeking to improve contact center performance and streamline communications across multiple channels, with a focus on security and integration with existing CRM and collaboration tools.

scaleup $418M

Suno

Cambridge, United States

Suno is a US-based generative AI company specializing in the creation of original music from text-based prompts. Their core technology utilizes AI models to compose full songs, including lyrics and instrumentation, allowing users to rapidly prototype and produce musical content. Suno targets a broad market including musicians, content creators, and hobbyists seeking accessible tools for music production and exploration, offering a platform for both creation and discovery.

startup $375M

Mobvoi

Beijing, China

Mobvoi is a Chinese technology company specializing in voice AI and intelligent wearables. Their core technology centers around a proprietary Chinese Natural Language Processing (NLP) engine powering voice assistants and features across their product line, most notably the TicWatch series of smartwatches. Mobvoi primarily targets the Chinese market with localized AI experiences, while also offering select wearables internationally with a focus on health and fitness tracking.

scaleup $300M

Observe.AI

San Francisco, United States

Observe.AI provides AI agents for enterprise contact centers, automating and improving customer interactions across voice channels. Their technology utilizes advanced speech recognition and natural language processing to accurately understand complex, real-world conversations – even with background noise and interruptions – and integrates with existing CRM and workflow systems. This enables businesses to automate call resolution, improve agent performance through AI-powered quality assurance, and achieve predictable outcomes in customer service operations.

scaleup $213M

Loom

San Francisco, United States

Loom is a video messaging platform that enables asynchronous communication through quick screen and camera recordings. Utilizing automatic speech recognition (ASR) technology, Loom provides searchable video transcripts and captions for improved accessibility and information retrieval. Primarily targeting professionals and teams, Loom streamlines communication and documentation workflows, offering a more efficient alternative to traditional email and meetings.

enterprise $203M

AISpeech

Suzhou, China

AISpeech is a leading specialized large-model conversational AI platform company in China, enabling intelligent connectivity and streamlined operations.

scaleup $200M

Suki AI

Redwood City, United States

Suki AI develops an ambient clinical intelligence platform that utilizes voice AI and natural language processing to automate clinical documentation workflows. Their technology captures and analyzes patient-physician conversations to generate comprehensive notes, orders, and instructions directly within existing Electronic Health Record (EHR) systems. Suki AI targets healthcare providers and organizations seeking to reduce administrative burden, reduce physician burnout, and enhance revenue cycle management through streamlined documentation processes.

scaleup $165M

Cogito

Boston, United States

Cogito, now part of Verint, delivers real-time AI-powered coaching and performance analytics for contact centers. Their core technology utilizes proprietary AI models to analyze voice conversations, providing both customer experience (CX) and employee experience (EX) scoring during live calls. This enables targeted, in-the-moment guidance for agents, with a focus on improving key metrics like average handle time, customer satisfaction, and revenue generation for large enterprises in sectors like telecommunications and healthcare.

scaleup $130M

Poly AI

London, United Kingdom

Poly AI develops conversational AI solutions for enterprise contact centers, enabling fully autonomous handling of customer voice calls. Their core technology focuses on delivering highly natural, multilingual voice interactions that replicate human agent conversations, distinguishing them through a customer-led approach to AI training. Poly AI targets businesses seeking to scale customer service while maintaining a high-quality, localized brand experience, particularly within the hospitality and service industries.

scaleup $120M

AssemblyAI

San Francisco, United States

AssemblyAI develops highly accurate speech-to-text APIs, including their flagship LeMUR model, and a suite of audio intelligence features like speaker diarization, entity detection, and topic detection. Their key innovation lies in offering low-latency, high-accuracy transcription optimized for real-time and asynchronous applications, alongside advanced features like content moderation and redaction. Serving a diverse market including contact centers, media companies, and research institutions, AssemblyAI processes millions of minutes of audio data monthly and is recognized for consistently achieving industry-leading Word Error Rates (WER) in independent evaluations.

startup $115M
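For readers unfamiliar with the Word Error Rate (WER) metric mentioned in the AssemblyAI entry above, here is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical. Real evaluations typically also normalize punctuation and numerals first, which this sketch does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substitution plus one insertion.
print(word_error_rate("the cat sat", "the bat sat down"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.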

Otter.ai

Mountain View, United States

Otter.ai develops AI-powered meeting solutions, most notably its Otter Meeting Agent platform, which provides real-time transcription, automated summaries, and AI-driven action item detection. The platform leverages advanced speech recognition and natural language processing to create searchable meeting records and facilitate collaboration, integrating with popular video conferencing tools like Zoom, Google Meet, and Microsoft Teams. Otter.ai currently serves a broad professional market, with reported user testimonials indicating significant time savings – up to 33% according to one VP of Sales at Aiden Technologies – and increased productivity for teams reliant on frequent meetings.

startup $113M

ElevenLabs

New York, United States

ElevenLabs specializes in realistic voice AI, offering a platform for text-to-speech generation and voice cloning powered by proprietary models like their flagship voice agent technology. Their platform provides access to over 5,000 voices in 70+ languages, and recently expanded with the launch of the Iconic Marketplace featuring digitally-recreated voices of prominent figures such as Matthew McConaughey and Sir Michael Caine. ElevenLabs targets content creators, developers, and businesses seeking to integrate high-quality, customizable voice solutions into applications ranging from audiobooks and gaming to virtual assistants and accessibility tools.

startup $101M

Descript

San Francisco, United States

Descript develops a cross-platform audio and video editing platform centered around speech-to-text technology, enabling users to edit media by directly manipulating transcripts. Key innovations include Overdub, a realistic voice synthesis tool allowing users to correct or add to recordings using AI-generated speech, and Studio Sound, which enhances audio clarity with a single click. Targeting podcasters, video creators, and marketing teams, Descript has gained traction for its unique transcript-based workflow and recently launched Underlord, an AI-powered video editor capable of generating and editing video content from text prompts.

startup $100M

Chorus.ai

San Francisco, United States

Chorus.ai, now integrated within ZoomInfo, delivers conversation intelligence software that analyzes sales calls and meetings. Their platform utilizes AI-powered speech and text analytics to identify key conversation patterns, coaching opportunities, and deal-critical insights. This technology primarily serves revenue-focused teams within B2B organizations to improve sales performance and forecasting accuracy.

startup $100M

Ambience Healthcare

San Francisco, United States

Ambience Healthcare provides an AI-powered platform that automates clinical documentation and coding for U.S. healthcare systems. Utilizing natural language processing and speech recognition, the platform generates structured data from patient encounters, reducing administrative burden on clinicians. Ambience targets health systems seeking to improve revenue cycle management, ensure compliance, and allow physicians to focus on patient care rather than documentation.

scaleup $100M

Parloa

Berlin, Germany

Parloa delivers a generative AI-powered platform for contact center automation, enabling enterprises to deploy and manage personalized “AI agents” that handle high-volume customer interactions. Their technology orchestrates the full AI agent lifecycle – from development to deployment and optimization – focusing on complex tasks like scheduling, refunds, and personalized recommendations. Parloa targets large enterprises seeking to improve customer loyalty and efficiency, and their platform is designed for high-stakes environments requiring precision and scalability in customer communication.

startup $100M

Deepgram

San Francisco, United States

Deepgram is a US-based provider of voice AI APIs for enterprise applications, offering unified speech-to-text, text-to-speech, and LLM orchestration. Their platform distinguishes itself through a single API designed to minimize complexity, latency, and cost compared to component-based solutions, and supports both real-time and batch processing with telephony integrations. Deepgram targets developers and businesses requiring highly accurate and scalable voice intelligence for applications like contact centers, voice assistants, and conversational AI systems.

startup $86M

Liberate

San Francisco, United States

Liberate builds reasoning-driven AI agents that automate core insurance operations across sales, servicing, and claims via voice, email, SMS, and digital channels. The company raised a $50M Series B led by Battery Ventures in 2025 at a $300M valuation.

startup $72M

Speechmatics

Cambridge, United Kingdom

Speechmatics is a UK-based technology company specializing in accurate, low-latency Automatic Speech Recognition (ASR) and speech-to-text solutions. Their core offering is a Speech API providing transcription, real-time translation, and text-to-speech capabilities, deployable on-device, on-premise, or in the cloud. Speechmatics targets enterprises requiring high-quality voice AI with a focus on data privacy, offering a non-logging standard deployment option.

scaleup $70M

Giga ML

San Francisco, United States

Giga builds real-time voice AI agents for enterprise customer support, with a unified orchestration layer that listens, reasons, checks systems, and responds in under half a second. DoorDash is an early customer, and the company raised a $61M Series A led by Redpoint Ventures.

startup $65M

Corti

Copenhagen, Denmark

Corti is a Danish AI infrastructure provider specializing in healthcare applications. Their core product is a highly accurate medical Automatic Speech Recognition (ASR) API that converts clinical conversations into structured data and documentation. Corti targets healthcare developers and providers seeking to rapidly build and deploy voice-enabled workflows – such as automated note-taking, report generation, and point-of-care support – without managing complex AI infrastructure.

startup $60M

WIZ.AI

Singapore, Singapore

WIZ.AI builds human-like conversational AI 'Talkbots' that understand Southeast Asian languages and accents, including Singlish, Bahasa, Thai, and Tagalog, for enterprise customer engagement. It serves over 300 clients across the region with voice and chat automation.

scaleup $56M

Papercup

London, United Kingdom

Papercup provides AI-powered dubbing and voice-over solutions for video content, utilizing a patented technology stack trained on extensive licensed voice data. Their platform combines synthetic voices with human editorial post-editing to deliver natural-sounding, culturally nuanced audio localization. Papercup targets enterprise-level content creators and media companies seeking scalable and cost-effective methods to expand global reach without sacrificing audience engagement.

startup $50M

Infinitus Systems

San Francisco, United States

Infinitus Systems develops a voice AI platform that automates administrative and clinical phone calls for U.S. healthcare providers and payers. Their technology specifically addresses time-consuming tasks like prior authorization and routine patient communication, utilizing AI agents to handle calls without human intervention. This solution aims to reduce administrative burden, improve staff productivity, and ultimately enhance patient outcomes within the healthcare system.

startup $50M

Sanas

Palo Alto, United States

Sanas provides a real-time Speech AI platform specializing in accent and language translation for improved communication clarity. Their core technology modulates speech to neutralize accents and remove noise while preserving vocal characteristics, enabling natural-sounding conversations in over 25 languages. Sanas targets call centers and communication-heavy businesses seeking to enhance customer and employee experiences, reduce communication friction, and improve key performance indicators like CSAT and AHT.

startup $50M

Hume AI

New York, United States

Hume AI builds empathic AI that understands and responds to human emotional expressions. It provides APIs for emotion recognition in voice, face, and language.

startup $50M

Skit.ai

Bengaluru, India

Skit.ai (formerly Vernacular.ai) provides conversational voice AI for debt collections, marketing, and servicing, automating large volumes of consumer calls for the accounts-receivable industry. It supports English plus 10+ Indian languages and serves collection agencies in the US.

startup $48M

aiOla

Tel Aviv, Israel

aiOla transforms frontline speech into structured, validated data for enterprise systems. Voice-agentic workflows replace manual data entry.

startup $40M

Modulate

Cambridge, United States

Modulate is a US-based AI platform that analyzes live and recorded voice conversations to deliver real-time insights into content, intent, and emotional state. Their core technology decodes multi-dimensional voice signals – including deception, toxicity, and synthetic speech – to provide actionable alerts and APIs. Modulate targets businesses requiring enhanced fraud prevention, trust & safety measures, and customer experience improvements through proactive voice intelligence, serving sectors like gaming, contact centers, and online communities.

startup $36M

Decagon

San Francisco, United States

Decagon delivers AI-powered virtual agents for enterprise customer support, specializing in voice and chat channels. Their core technology focuses on customizable conversational AI with cross-channel memory, enabling personalized and connected customer interactions. Decagon targets companies seeking to significantly increase customer support deflection rates, scale operations to 24/7 availability, and improve key customer experience metrics like First Response Time and Customer Satisfaction.

startup $35M

Fano Labs

Hong Kong, Hong Kong

Fano Labs specializes in speech recognition and NLP for Asian languages and dialects (Cantonese, Mandarin, Thai, Bahasa and more), serving financial services and customer-service clients like HSBC and DBS. It is a 2015 University of Hong Kong spin-off founded by Miles Wen and Victor Li, and is backed by Horizons Ventures and Alibaba.

commercial $30M

PlayHT

San Francisco, United States

PlayHT is a US-based AI company specializing in realistic text-to-speech (TTS) and voice cloning technology delivered via API. Their platform offers over 200 AI voices in 40+ languages, focusing on low-latency synthesis for applications requiring natural-sounding, multi-speaker audio. PlayHT targets content creators and enterprises seeking to automate voiceovers and generate audio content at scale.

startup $29M

Cartesia

San Francisco, United States

Cartesia builds fast, realtime AI models for voice and speech. Their Sonic model enables sub-100ms latency text-to-speech for conversational AI.

startup $27M

Fixie.ai

Seattle, United States

Fixie.ai develops the Ultravox platform, enabling developers to build and deploy AI agents powered by a next-generation, open-source Speech Language Model (SLM). Ultravox focuses on natural speech understanding to facilitate more human-like conversational AI experiences. The company targets businesses seeking to integrate scalable voice AI capabilities into their applications and workflows.

startup $25M

LiveKit

San Francisco, United States

LiveKit is an open-source platform for building realtime audio and video applications. It powers voice AI agents with ultra-low-latency infrastructure.

startup $23M

PlayAI

San Francisco, United States

PlayAI develops voice cloning and text-to-speech technology. Their platform creates custom AI voice models from audio samples, enabling natural-sounding speech synthesis for content creators and businesses.

startup $21M

Krisp

San Francisco, United States

Krisp develops AI-powered tools to enhance the quality and productivity of virtual meetings. Their core product is an AI Meeting Assistant that combines industry-leading noise cancellation with automated transcription, summarization, and accent conversion. Krisp targets professionals and teams seeking to improve communication clarity and efficiency in remote and hybrid work environments by automating key meeting tasks.

startup $17M

Iconic

London, United Kingdom

Iconic is a UK-based gaming-AI startup building AI-native game worlds powered by on-device small language models that enable real-time, voice-driven interactions without an internet connection. Its voice agents act as in-world characters whose dialogue and actions are generated dynamically within designer-set bounds. It raised a $13M seed round co-led by Kindred Ventures and Northzone.

startup $17M

Intella

Cairo, Egypt

Intella is a Cairo-founded speech-intelligence startup specializing in Arabic-first AI. Its proprietary speech-recognition models support transcription, analytics and conversational tools across more than 25 Arabic dialects. It raised a $12.5M Series A led by Prosus in 2025, bringing total funding to roughly $16.9M.

startup $17M

Bland AI

San Francisco, United States

Bland AI provides enterprises with AI-powered phone agents capable of handling both inbound and outbound calls using natural language processing. Their core technology centers on customizable voice models trained on client-provided recordings and transcriptions, offering a branded conversational experience. Targeting businesses across verticals like finance, healthcare, and logistics, Bland AI differentiates itself through on-premise data security and seamless integration capabilities for automating customer support, sales, and operational communications.

startup $16M

Voiceitt

Tel Aviv, Israel

Voiceitt develops AI-powered speech recognition technology specifically designed to understand non-standard speech patterns, including those resulting from speech impairments, accents, or aging-related conditions. Their core product is a customizable API and software solution leveraging a proprietary database of atypical speech and advanced machine learning. Voiceitt primarily serves individuals with speech disabilities, as well as accessibility applications for accented speakers and those in the Deaf community, enabling greater communication independence and access to voice-controlled technologies.

startup $15M

Rinna

Tokyo, Japan

Rinna is a Japanese AI company specializing in conversational AI and virtual character development. Their core technology centers around creating highly realistic AI personalities capable of natural language interactions, initially demonstrated through integrations with LINE and evolving into AI-powered virtual YouTubers (AITubers). Rinna targets businesses and entertainment sectors seeking to leverage advanced AI for customer engagement, content creation, and immersive digital experiences, with a strong focus on the Japanese market.

startup $15M

Podcastle

Wilmington, United States

Podcastle is a US-based software company offering an all-in-one platform for video and podcast creation directly within a web browser. Their core technology centers on AI-powered tools for audio and video editing, including features for noise reduction, automatic editing, and AI voice generation. Podcastle targets long-form content creators seeking a streamlined, browser-based solution for recording, editing, and distributing professional-quality audio and video content.

startup $14M

Resemble AI

San Francisco, United States

Resemble AI develops a generative AI platform specializing in voice and audio technology, offering products like real-time voice cloning via their Chatterbox model, and audio editing tools like Edit. Their key innovations include DETECT-3B Omni, a multi-modal deepfake detection model consistently ranked among the industry’s most robust, alongside PerTh, an AI-powered watermarking solution for content provenance. Resemble AI serves enterprise and government clients – including Fortune 500 companies – with solutions for content creation, security, and speaker verification, and is trusted by over 3 million teams worldwide.

startup $12M

Vapi

San Francisco, United States

Vapi provides a platform for developers to build and deploy configurable voice AI agents. Their core technology is a comprehensive API enabling advanced conversational AI functionality for phone-based applications. Vapi targets a broad market ranging from startups to large enterprises seeking to automate phone operations and create scalable voice AI products.

startup $12M

SoapBox Labs

Dublin, Ireland

SoapBox Labs develops voice AI specifically designed for children, enabling speech recognition in educational apps with child privacy protection.

commercial $11M

GreyLabs AI

Mumbai, India

GreyLabs AI builds voice-AI agents specialized for India's banking, financial services, and insurance (BFSI) sector, powering sales, collections, and customer support at scale. It raised a roughly $10.2M Series A led by Elevation Capital in 2025.

startup $10M

Udio

New York, United States

Udio is a US-based generative AI company specializing in music creation. Their platform utilizes text-to-music AI technology, enabling users to generate complete songs from simple text prompts. Udio targets musicians, content creators, and hobbyists seeking rapid prototyping or royalty-free music generation capabilities.

startup $10M

Speechly

Helsinki, Finland

Speechly is a Finnish company specializing in real-time Automatic Speech Recognition (ASR) technology delivered via a streaming API. Their core product is a cloud-based ASR engine optimized for low-latency transcription and understanding, particularly in demanding applications like real-time communication and interactive voice response systems. Speechly targets developers building voice-enabled applications requiring high accuracy and speed, offering a developer-friendly alternative to traditional, batch-oriented speech-to-text solutions.

startup $10M

Murf AI

San Francisco, United States

Murf AI develops a text-to-speech (TTS) platform offering over 200 AI voices across 20+ languages, powering realistic voiceovers for video content, presentations, and marketing materials. Their core technology leverages advanced neural network architectures to generate highly natural-sounding speech, and they provide both a user-friendly AI Voice Generator and robust Text-to-Speech APIs & SDKs for developers. Murf AI serves a broad market including content creators, educators, and businesses seeking scalable voice solutions, and is recognized for its speed and efficiency in building voice agents.

commercial $10M

Speechify

Los Angeles, United States

Speechify develops a text-to-speech (TTS) platform leveraging advanced AI voice synthesis to convert digital text – including documents, web pages, and ebooks – into natural-sounding audio. Their core product is a cross-platform application offering both audio reading and voice typing capabilities. With over 55 million users, Speechify primarily targets individuals seeking enhanced accessibility, learning support, and increased productivity through hands-free information consumption.

startup $10M

Recall.ai

San Francisco, United States

Recall.ai develops APIs and SDKs for extracting high-fidelity audio, transcripts, and metadata from video conferencing platforms. Their core technology focuses on isolating individual speaker audio streams within meetings to improve transcript accuracy and recording quality. The company targets developers building applications requiring detailed meeting intelligence, and differentiates itself through superior audio separation compared to standard screen recording solutions.

startup $10M

Amper Music

New York, United States

Amper Music, a Shutterstock company, provides an AI-powered music composition platform that generates original, royalty-free tracks. Utilizing generative algorithms and machine learning, Amper enables content creators – including video producers, advertisers, and game developers – to quickly and affordably produce customized music tailored to specific moods, styles, and lengths. This solution streamlines the music licensing process and offers a cost-effective alternative to traditional music sourcing.

commercial $9M

Smallest.ai

Bengaluru, India

Smallest.ai develops real-time voice AI, including speech-to-text, text-to-speech, and voice agents, with its Lightning text-to-speech model and AWAAZ multilingual model for Indian languages. Its products power high-volume conversational automation for support, collections, and onboarding.

startup $8M

Bolna

Bengaluru, India

Bolna is a voice-AI platform that lets enterprises design, deploy, and monitor voice agents tuned for Indian telephony, supporting more than ten Indian languages and noisy real-world conditions. It raised a $6.3M seed round led by General Catalyst.

startup $6M

Visualfy

Valencia, Spain

Visualfy develops AI-based sound-recognition technology that identifies household and environmental sounds such as doorbells, alarms and crying babies and translates them into visual and haptic alerts for deaf and hard-of-hearing users. It offers both home devices and venue accessibility systems.

startup $5M

Retell AI

San Francisco, United States

Retell AI provides a platform for businesses to build and deploy AI-powered voice agents for automating phone calls. Their technology leverages real-time knowledge base synchronization and natural language processing to handle customer interactions, including navigating IVR systems, scheduling appointments, and facilitating warm transfers to live agents. Retell AI targets companies seeking to improve call center efficiency and customer service through scalable, automated phone solutions, as demonstrated by deployments with companies like Everise.

startup $5M

Speechki

San Francisco, United States

Speechki is a text-to-speech platform offering 500+ AI voices in 77 languages. Backed by Greycroft and Alchemist, they enable content creators to convert text to natural-sounding audio at scale.

startup $5M

Loman AI

Austin, United States

Loman AI provides a 24/7 voice-AI phone agent for restaurants that takes pickup and delivery orders, books reservations, answers FAQs and processes payments, integrating with POS systems such as Square, Toast and SpotOn. It aims to capture phone orders that busy staff would otherwise miss.

startup $4M

EzDubs

San Francisco, United States

EzDubs developed AI-powered real-time speech translation and dubbing technology that preserves a speaker's voice across languages, targeting live conversation and media localization. The company was acquired by Cisco in 2025 to bolster its multilingual communication capabilities.

startup $4M

Camb.ai

Dubai, United Arab Emirates

Camb.ai builds proprietary speech AI models (MARS and BOLI) for real-time dubbing and translation across more than 100 languages while preserving the original speaker's voice and emotion. It has dubbed live sports and film content for partners including the Australian Open.

startup $4M

Lelapa AI

Johannesburg, South Africa

Lelapa AI develops Natural Language Processing (NLP) technology specifically for African languages, originating from the Masakhane research community. Their core product, the Vulavula API, provides resource-efficient speech-to-text and transcription services for real-time call processing and analysis. Lelapa AI targets businesses operating in African markets seeking to improve customer experience, ensure compliance, and gain actionable insights from multilingual customer interactions.

startup $3M

Nanovate

Cairo, Egypt

Nanovate builds Arabic-native AI agents for businesses across the Middle East and North Africa, providing chat and voice assistants spanning 22 Arabic dialects, automation workflows and a no-code dashboard for enterprises to deploy their own Arabic-speaking AI. It targets expansion into Saudi Arabia and the UAE.

startup $2M

Wittify AI

Riyadh, Saudi Arabia

Wittify AI is a Saudi conversational AI company building Arabic-first AI agents with proprietary speech recognition and text-to-speech trained on more than 25 Arabic dialects. Its no-code platform lets businesses deploy AI agents across websites, CRMs, phone systems and contact centers. The company raised a $1.5M pre-seed round and operates across Riyadh, Dubai and Cairo.

startup $2M

Rev.com

Austin, United States

Rev.com provides AI-powered transcription and captioning services, specializing in solutions for the legal industry. Their core offering is a 96%+ accurate AI transcription engine designed for high-volume processing of legal evidence like depositions, police reports, and bodycam footage, supplemented by a network of 14,000+ human transcriptionists for 99%+ accuracy when required. Rev targets law firms and legal professionals by offering tools for evidence review, timeline creation, and secure transcript management directly within their platform.

scaleup $0M

Vatis Tech

Bucharest, Romania

Vatis Tech is a Bucharest-based speech-AI company providing automatic transcription, real-time speech-to-text and multilingual translation through an API, with high accuracy for Romanian and challenging domains such as medical, legal and call-center audio. It serves developers and enterprises needing accurate audio-to-text conversion.

startup $0M

iFlytek

Hefei, China

iFlytek develops advanced AI-powered language solutions, including its core Jieli speech recognition platform and translation tools supporting over 60 languages. The company’s innovations center on deep learning models for accurate speech-to-text, text-to-speech, and machine translation, demonstrated in products like their real-time transcription services for meetings and content creation. As China’s leading provider in this space, iFlytek increasingly focuses on international expansion and serves sectors including education, digital marketing, and professional communication.

enterprise Est. 1999

Nuance Communications

Burlington, United States

Nuance Communications, now a Microsoft company, develops AI-powered solutions for clinical and administrative healthcare documentation. Their core technology centers on speech recognition and natural language processing applied to create tools like Dragon Medical One, which automates clinical documentation and enhances radiology reporting. Nuance primarily serves healthcare providers and aims to improve clinician productivity, reduce administrative burden, and enhance patient care through AI-driven workflows.

enterprise Est. 1992

Cosito

Boston, United States

Cosito, an MIT-founded startup, makes AI-powered voice-capture tools (smart microphones/sensors) that let frontline manufacturing and warehouse teams log quality checks, inventory and maintenance data by speaking—no forms or typing—syncing structured data to ERP systems in real time.

startup Est. 2023

Emotech

London, United Kingdom

Emotech develops multimodal AI solutions focused on enhancing customer and user interactions, with key products including a multilingual speech platform and customizable generative AI avatars. Their technology specializes in realistic AI-driven speech synthesis, including Arabic chatbots with dialect support, and an AI-powered pronunciation assessment tool for language learning. Emotech targets businesses seeking to improve customer service, create immersive digital experiences, and innovate in areas like education and gaming; the company claims a 30% boost in customer satisfaction for early adopters.

startup Est. 2014

Endel

Berlin, Germany

Endel is a German technology company developing AI-powered generative audio environments designed to improve cognitive performance and wellbeing. Their core product utilizes a patented algorithm that creates personalized soundscapes adapting in real-time to user-specific data like time of day, weather, and biometrics. Endel targets individuals seeking to enhance focus, reduce stress, and improve sleep quality through scientifically-backed auditory experiences.

startup Est. 2018

Sonantic

London, United Kingdom

Sonantic develops realistic, emotionally-expressive AI voices for digital media. Their core technology utilizes a proprietary neural network trained on human performance data to generate nuanced vocal performances from text. Acquired by Spotify, Sonantic primarily serves the gaming, animation, and audiobook industries, offering a solution for scalable and high-quality voice acting.

startup Est. 2018

SoundHound

Santa Clara, United States

SoundHound AI develops and licenses voice AI technologies that enable conversational interfaces for a variety of industries, including automotive, retail, and finance. Their core offering is a fully independent voice AI platform capable of handling over 10 billion conversations annually, focusing on agentic AI solutions that automate complex tasks. SoundHound differentiates itself by offering a complete, customizable voice AI solution – rather than relying on cloud-based assistants – allowing businesses to own the entire interaction and maximize ROI through cost reduction and revenue generation.

enterprise Est. 2005

Speak AI

Toronto, Canada

Speak AI is a Canadian company specializing in AI-powered transcription and analysis of audio and video data. Their core product utilizes Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) to convert media into searchable, transcribed text and extract key insights. Speak AI primarily serves researchers and businesses needing to efficiently process and analyze qualitative data from interviews, meetings, and other spoken content.

startup Est. 2018

iZotope

Cambridge, United States

iZotope develops advanced audio processing software leveraging machine learning for tasks like mixing, mastering, and dialogue editing. Their core technology centers on neural networks trained on vast datasets of professionally produced audio to deliver intelligent assistance and automated solutions for common audio challenges. Targeting audio engineers, musicians, and post-production professionals, iZotope provides tools that streamline workflows and enhance sonic quality with data-driven precision.

scaleup Est. 2001

RingCentral

Belmont, United States

RingCentral provides a unified cloud communications platform integrating voice, video, messaging, and contact center solutions. Their core AI technology focuses on real-time conversation intelligence and automation within these communication channels, offering features like call transcription, sentiment analysis, and automated workflows. RingCentral targets businesses of all sizes seeking to improve agent productivity, enhance customer experiences, and gain actionable insights from their communications data.

enterprise Est. 1999

Teachable Machine

Mountain View, United States

Teachable Machine is a web-based platform developed by Google that enables users to rapidly create machine learning models using a no-code interface. The platform focuses on image, audio, and pose-based recognition, allowing individuals to train custom models directly within their browser. Primarily targeting educators, artists, and hobbyists, Teachable Machine lowers the barrier to entry for machine learning by eliminating the need for programming expertise and facilitating quick prototyping for integration into web applications and creative projects.

enterprise Est. 2017

Acoustic.ai

Copenhagen, Denmark

Acoustic.ai develops voice AI solutions for automotive and consumer electronics, focusing on noise cancellation and voice enhancement.

commercial Est. 2018

AIVA

Luxembourg City, Luxembourg

AIVA is a Luxembourg-based company specializing in AI-driven music composition. Their core technology is a generative AI model capable of autonomously composing original soundtracks across a variety of genres and styles. AIVA targets content creators in film, gaming, and advertising seeking royalty-free, customizable music solutions, offering an alternative to traditional music licensing and composition.

commercial Est. 2016

Fish Audio

Shanghai, China

Fish Audio offers studio-grade AI text-to-speech and instant voice cloning with 1,000+ voices in 70+ languages. Their open-source models have gained significant developer adoption.

startup Est. 2023

Fliki

Bengaluru, India

Fliki combines text-to-speech with AI video creation. Their platform converts text and blog posts into videos with lifelike AI voices, serving content creators and marketing teams worldwide.

startup Est. 2021

Fireflies.ai

San Francisco, United States

Fireflies.ai develops an AI-powered meeting assistant that automatically transcribes, summarizes, and analyzes conversational data across various video conferencing platforms. Their core technology centers on speech-to-text conversion and natural language processing to identify speakers and extract key insights from meetings. Fireflies.ai targets professional teams seeking to improve meeting productivity and knowledge management through searchable conversation archives.

commercial Est. 2016

Rev AI

San Francisco, United States

Rev AI provides a speech-to-text API for automated transcription and speech recognition. Their core technology is an AI model trained on a large, diverse dataset and designed for high accuracy across varied audio qualities and accents. They target developers and businesses requiring scalable, programmatic transcription solutions for applications like voice search, media monitoring, and accessibility services.

commercial Est. 2020

Boomy

Berkeley, United States

Boomy develops a platform enabling users to create original music tracks via artificial intelligence. Their core technology utilizes generative AI models—specifically, a combination of diffusion and transformer models—to compose music across various genres based on user-defined parameters. Targeting both amateur musicians and content creators, Boomy uniquely allows users to commercially distribute and potentially earn royalties from AI-generated compositions.

commercial Est. 2019

Soundraw

Tokyo, Japan

Soundraw develops an AI-powered music generation platform focused on providing royalty-free music for content creators. Their core technology utilizes an in-house trained AI model to compose original instrumentals, allowing for granular customization via a stem-based mixer. Soundraw uniquely targets the need for legally safe, customizable background music, enabling monetization opportunities for users without copyright concerns.

commercial Est. 2020

Fathom

San Francisco, United States

Fathom develops AI-powered note-taking and meeting assistants designed for professional use. Their core product utilizes large language models to automatically generate summaries, action items, and searchable transcripts from virtual meetings across platforms like Zoom, Google Meet, and Microsoft Teams. Fathom has gained traction among knowledge workers and teams seeking to improve meeting productivity and information retention, evidenced by integrations with platforms like Slack and Notion.

startup Est. 2021

LMNT

San Francisco, United States

LMNT develops real-time text-to-speech (TTS) technology focused on low-latency, high-fidelity voice generation. Their core product is a voice cloning and streaming API enabling developers to create custom AI voices for applications requiring conversational interfaces. Targeting game developers, virtual assistant creators, and interactive application builders, LMNT was founded by a team with prior experience at Google and emphasizes scalability for production deployments.

startup Est. 2020

Neets.ai

Copenhagen, Denmark

Neets.ai develops real-time, low-latency text-to-speech (TTS) APIs for developers. Their core technology focuses on highly customizable and expressive neural voice cloning and generation, enabling creation of unique synthetic voices from limited audio data. Targeting game developers, metaverse platforms, and interactive voice response (IVR) systems, Neets.ai recently launched a public beta program for their voice API following seed funding in late 2023.

startup Est. 2022

Rime

San Francisco, United States

Rime develops customizable text-to-speech (TTS) models for enterprise applications, focusing on accurate pronunciation and multilingual support. Their core technology centers on AI voice models designed to improve customer experience in conversational AI systems. Targeting businesses requiring high-quality voice interactions, Rime launched in 2023 with a focus on reducing call abandonment rates through natural-sounding TTS.

startup Est. 2023

Sesame AI

San Francisco, United States

Sesame AI develops specialized AI hardware and software focused on creating highly realistic and responsive voice-based conversational AI. Their core product is a novel AI chip architecture designed to efficiently run large language models for natural language processing, specifically prioritizing low-latency voice interaction. Currently in research preview, Sesame AI targets the emerging market for on-device, lifelike AI assistants and aims to advance the field of human-computer interaction.

startup Est. 2024

Gladia

Paris, France

Gladia provides a unified API for audio transcription and enrichment, enabling developers to convert spoken language into structured data. Their core technology focuses on a single, multilingual speech-to-text (STT) endpoint capable of processing over 100 languages – including mid-sentence language switching – without requiring separate models. Gladia targets developers building voice products, global support systems, and multilingual voice agents, and aims to simplify audio data processing across diverse linguistic environments.

startup Est. 2022

Neuphonic

London, United Kingdom

Neuphonic develops real-time voice AI solutions for analyzing customer interactions. Their core technology centers on a proprietary, low-latency speech-to-text and natural language understanding (NLU) platform designed for high-accuracy transcription and sentiment analysis. Targeting contact centers and financial institutions, Neuphonic launched with seed funding and focuses on providing actionable insights from voice data to improve customer experience and compliance.

startup Est. 2024

WellSaid Labs

Seattle, United States

WellSaid Labs develops a text-to-speech (TTS) platform generating realistic voiceovers from written scripts. Their core technology utilizes AI models trained on licensed recordings of professional voice actors, offering over 120 distinct voices with customizable accents and styles. The company targets content creators, training departments, and marketing teams seeking scalable and cost-effective voiceover solutions.

startup Est. 2018

Wispr Flow

San Francisco, United States

Wispr Flow develops voice-to-text software enabling hands-free dictation directly into any application on desktop and mobile devices. Their core product, “Flow,” utilizes advanced speech recognition AI to convert spoken language into formatted text with a focus on speed and accuracy across diverse writing contexts. Targeting professionals and individuals seeking increased productivity, Wispr Flow emphasizes seamless integration and real-time text placement within existing workflows.

startup Est. 2021

Gnani.ai

Bangalore, India

Gnani.ai develops conversational AI agents for automating customer experience (CX) workflows across multiple channels including voice, chat, and messaging platforms. Their core technology centers on intent-based AI agents capable of integrating with existing CRM and telephony systems to deliver automated, measurable customer interactions. Targeting enterprise clients, particularly in regions requiring multilingual support, Gnani.ai focuses on scaling personalized service while maintaining local language nuances.

startup Est. 2016

Synthflow

Berlin, Germany

Synthflow develops a Voice AI platform enabling enterprise call automation through custom AI agents. Their core technology focuses on low-latency speech recognition and generation for handling high-volume phone conversations. Targeting sales and support functions, Synthflow showcases applications like real estate lead qualification and aims to provide scalable, automated phone support solutions for businesses.

startup Est. 2023

Telli

Berlin, Germany

Telli develops an AI-powered sales and customer service platform specializing in automated outbound phone calls. Their core product is an AI Sales Agent utilizing voice AI to autonomously conduct outreach, qualify leads, and schedule meetings. Targeting sales and customer support teams, Telli claims to deliver significant improvements in engagement and cost reduction, with integrations designed for existing CRM systems.

startup Est. 2024

HappyRobot

San Francisco, United States

HappyRobot develops an AI-powered operating system designed to automate end-to-end tasks within enterprise logistics and operations. Their core technology centers on continuously learning AI agents benchmarked for both technical performance and behavioral consistency, enabling autonomous decision-making in complex, real-world environments. The company focuses on deployments within large enterprises, including high-stakes operational settings.

startup Est. 2023

Cerence

Burlington, United States

Cerence develops AI-powered conversational and agentic platforms specializing in voice and speech recognition. Their core product is the CaLLM™ family of large language models and the Cerence SDK, enabling natural language interactions primarily within the automotive industry. Cerence notably focuses on embedded, on-device AI solutions for in-cabin experiences and has achieved deployments with major automotive manufacturers.

enterprise Est. 2019

Respeecher

Kyiv, Ukraine

Respeecher is a voice AI company specializing in high-fidelity voice cloning and text-to-speech solutions for professional media production. Their core technology focuses on nuanced, emotionally-aware speech synthesis—going beyond simple text-to-speech to deliver performance-quality vocalizations. Respeecher primarily serves the film, gaming, and broader media industries, and gained prominence through work enabling voice restoration and cross-lingual dubbing, notably supporting Ukrainian humanitarian efforts through celebrity voice cloning.

startup Est. 2018

AgentPhone

San Francisco, United States

AgentPhone provides phone-number infrastructure for AI agents: an API (and MCP server) to provision US/Canadian numbers, attach them to agent personas, place and receive voice calls with real-time transcription, and manage SMS, so agents can call, text, and act without a human in the loop. Founded in 2026 in San Francisco by Manav Modi and Meet Modi.

startup Est. 2026

Elyra

San Francisco, United States

Elyra is an AI reservation system for restaurants whose voice and email agents answer every call and message 24/7 to handle bookings, modifications, and questions, connected to a reservation engine that optimizes table placement in real time based on party size, guest history, and seating rules. Y Combinator-backed (P26).

startup Est. 2025

Callab AI

San Francisco, United States

Callab AI provides AI voice agents for enterprise and legacy telephony systems (CUCM, on-prem PBX, carrier-grade SIP), replacing IVR menus and automating support, sales, and scheduling calls without replacing existing infrastructure. Features include batch calling, automatic recording/transcription with sentiment tagging, and context-aware routing to live agents. Y Combinator-backed.

startup Est. 2025

Phonely

San Francisco, United States

Phonely builds AI phone-call agents that handle inbound and outbound calls for businesses, creating an 'AI vocal twin' from existing call recordings. The Y Combinator-backed company (S24) targets call centers and receptionist workflows.

startup Est. 2023

Thoughtly

New York, United States

Thoughtly develops human-like AI voice agents that businesses can deploy with no-code tools to automate inbound and outbound calls for customer service, sales, and marketing. The platform lets teams build and launch voice agents in minutes.

startup Est. 2023

Intron Health

Lagos, Nigeria

Intron Health built Africa's first clinical speech-recognition platform, Sahara, trained on millions of audio clips to transcribe accented African speech with high accuracy. Originally for medical documentation, it now powers voice recognition across healthcare, legal, financial, and government use cases.

startup Est. 2020

Heidi Health

Melbourne, Australia

Heidi Health builds an AI medical scribe and clinical productivity assistant that transcribes patient visits, structures notes, generates referrals, and applies codes across electronic health records. It supports more than two million consults each week in 110 languages.

scaleup Est. 2019

Spitch

Lagos, Nigeria

Spitch is a Nigerian voice-AI company that builds speech-to-text and text-to-speech APIs tuned for African languages and accents, starting with Yoruba, Hausa, Igbo and Nigerian-accented English. Its tools let developers add local-language voice capabilities to call centers, media and learning platforms without machine-learning expertise.

startup Est. 2024

Vambo AI

Johannesburg, South Africa

Vambo AI builds AI infrastructure for African languages, offering translation, speech recognition and natural-language processing across more than 60 African and global languages via its Jua-Tanga model series. Its products include a translation app, a developer API platform and classroom tools for underserved-language speakers.

startup Est. 2023

Botnoi Group

Bangkok, Thailand

Botnoi Group is a Thai conversational-AI company that builds AI chatbots, voicebots, AI agents and digital humans, with its Botnoi Voice platform offering hundreds of synthetic voices across more than 20 languages. Its products are used by dozens of enterprises in Thailand including retail, banking and healthcare.

startup Est. 2017

AI4Nepal

Kathmandu, Nepal

AI4Nepal builds production-ready multilingual AI products for Nepal's languages, including automatic speech recognition, text-to-speech, translation APIs, chatbots and NLP toolkits. Founded in 2024 in Kathmandu by IIT Madras alumni, it works in strategic partnership with AI4Bharat to adapt low-resource language models for Nepali and regional languages.

startup Est. 2024

DataQueue

Groningen, Netherlands

DataQueue is a Palestinian-Dutch voice AI company building an Arabic-first voice AI orchestration platform, with products including VoiceHub for conversational agents and CallStudio for multilingual speech analytics across major Arabic dialects. Founded in 2021 by Bashir Alsaifi and headquartered in Groningen, Netherlands, it serves enterprises in MENA and Europe.

startup Est. 2021

ToumAI

Rabat, Morocco

ToumAI is a Moroccan AI company providing multilingual voice analytics and autonomous voice agents that understand African and MENA languages and dialects such as Moroccan Darija, Wolof and Swahili. Founded in 2021 in Rabat, it raised USD 1M in pre-seed funding and operates across Morocco, Senegal and Malta.

startup Est. 2021

Pindo

Kigali, Rwanda

Pindo is a Rwandan voice-AI company building speech recognition and conversational agents for African languages including Kinyarwanda, Kiswahili and Luganda. Its platform lets banks, microfinance institutions and fintechs automate customer interactions such as loan applications and balance inquiries over ordinary phone calls.

startup Est. 2020

Mideind

Reykjavik, Iceland

Mideind is an Icelandic language-technology company specializing in natural-language processing and AI for the Icelandic language. It builds neural machine translation, spelling and grammar checking, and the voice assistant Embla, and contributes core tools to Iceland's national language-technology program.

startup Est. 2015

Maqsam

Amman, Jordan

Maqsam is a Jordanian company providing an Arabic AI-powered contact-center platform for the MENA region. It combines cloud telephony with proprietary Arabic speech-to-text, AI agents, call summaries and sentiment analysis tuned to Gulf, Levantine and Egyptian dialects.

startup Est. 2019
