Voice AI Companies
Explore 33 Voice AI companies in our AI directory. Leading companies include Dialpad, Mobvoi, and Suki AI.
Dialpad
San Francisco, United States
Dialpad is a US-based provider of an all-in-one cloud communications platform integrating voice, video, messaging, and a contact center solution. Their core technology leverages real-time Voice AI to provide features like automated call transcription, agent coaching, and autonomous workflow execution for tasks like appointment scheduling and refunds. Dialpad targets businesses seeking to improve contact center performance and streamline communications across multiple channels, with a focus on security and integration with existing CRM and collaboration tools.
Mobvoi
Beijing, China
Mobvoi is a Chinese technology company specializing in voice AI and intelligent wearables. Their core technology centers around a proprietary Chinese Natural Language Processing (NLP) engine powering voice assistants and features across their product line, most notably the TicWatch series of smartwatches. Mobvoi primarily targets the Chinese market with localized AI experiences, while also offering select wearables internationally with a focus on health and fitness tracking.
Suki AI
Redwood City, United States
Suki AI develops an ambient clinical intelligence platform that utilizes voice AI and natural language processing to automate clinical documentation workflows. Their technology captures and analyzes patient-physician conversations to generate comprehensive notes, orders, and instructions directly within existing Electronic Health Record (EHR) systems. Suki AI targets healthcare providers and organizations seeking to reduce administrative burden, alleviate physician burnout, and enhance revenue cycle management through streamlined documentation processes.
Poly AI
London, United Kingdom
Poly AI develops conversational AI solutions for enterprise contact centers, enabling fully autonomous handling of customer voice calls. Their core technology focuses on delivering highly natural, multilingual voice interactions that replicate human agent conversations, distinguishing them through a customer-led approach to AI training. Poly AI targets businesses seeking to scale customer service while maintaining a high-quality, localized brand experience, particularly within the hospitality and service industries.
ElevenLabs
New York, United States
ElevenLabs specializes in realistic voice AI, offering a platform for text-to-speech generation and voice cloning powered by proprietary models like their flagship voice agent technology. Their platform provides access to over 5,000 voices in 70+ languages, and recently expanded with the launch of the Iconic Marketplace featuring digitally-recreated voices of prominent figures such as Matthew McConaughey and Sir Michael Caine. ElevenLabs targets content creators, developers, and businesses seeking to integrate high-quality, customizable voice solutions into applications ranging from audiobooks and gaming to virtual assistants and accessibility tools.
Parloa
Berlin, Germany
Parloa delivers a generative AI-powered platform for contact center automation, enabling enterprises to deploy and manage personalized “AI agents” that handle high-volume customer interactions. Their technology orchestrates the full AI agent lifecycle – from development to deployment and optimization – focusing on complex tasks like scheduling, refunds, and personalized recommendations. Parloa targets large enterprises seeking to improve customer loyalty and efficiency, and their platform is designed for high-stakes environments requiring precision and scalability in customer communication.
Deepgram
San Francisco, United States
Deepgram is a US-based provider of voice AI APIs for enterprise applications, offering unified speech-to-text, text-to-speech, and LLM orchestration. Their platform distinguishes itself through a single API designed to minimize complexity, latency, and cost compared to component-based solutions, and supports both real-time and batch processing with telephony integrations. Deepgram targets developers and businesses requiring highly accurate and scalable voice intelligence for applications like contact centers, voice assistants, and conversational AI systems.
Speechmatics
Cambridge, United Kingdom
Speechmatics is a UK-based technology company specializing in accurate, low-latency Automatic Speech Recognition (ASR) and speech-to-text solutions. Their core offering is a Speech API providing transcription, real-time translation, and text-to-speech capabilities, deployable on-device, on-premise, or in the cloud. Speechmatics targets enterprises requiring high-quality voice AI with a focus on data privacy, offering a non-logging standard deployment option.
Papercup
London, United Kingdom
Papercup provides AI-powered dubbing and voice-over solutions for video content, utilizing a patented technology stack trained on extensive licensed voice data. Their platform combines synthetic voices with human editorial post-editing to deliver natural-sounding, culturally nuanced audio localization. Papercup targets enterprise-level content creators and media companies seeking scalable and cost-effective methods to expand global reach without sacrificing audience engagement.
Infinitus Systems
San Francisco, United States
Infinitus Systems develops a voice AI platform that automates administrative and clinical phone calls for U.S. healthcare providers and payers. Their technology specifically addresses time-consuming tasks like prior authorization and routine patient communication, utilizing AI agents to handle calls without human intervention. This solution aims to reduce administrative burden, improve staff productivity, and ultimately enhance patient outcomes within the healthcare system.
Hume AI
New York, United States
Hume AI builds empathic AI that understands and responds to human emotional expressions. It provides APIs for emotion recognition in voice, face, and language.
aiOla
Tel Aviv, Israel
aiOla transforms frontline speech into structured, validated data for enterprise systems. Voice-agentic workflows replace manual data entry.
Modulate
Cambridge, United States
Modulate is a US-based AI platform that analyzes live and recorded voice conversations to deliver real-time insights into content, intent, and emotional state. Their core technology decodes multi-dimensional voice signals – including deception, toxicity, and synthetic speech – to provide actionable alerts and APIs. Modulate targets businesses requiring enhanced fraud prevention, trust & safety measures, and customer experience improvements through proactive voice intelligence, serving sectors like gaming, contact centers, and online communities.
Decagon
San Francisco, United States
Decagon delivers AI-powered virtual agents for enterprise customer support, specializing in voice and chat channels. Their core technology focuses on customizable conversational AI with cross-channel memory, enabling personalized and connected customer interactions. Decagon targets companies seeking to significantly increase customer support deflection rates, scale operations to 24/7 availability, and improve key customer experience metrics like First Response Time and Customer Satisfaction.
PlayHT
San Francisco, United States
PlayHT is a US-based AI company specializing in realistic text-to-speech (TTS) and voice cloning technology delivered via API. Their platform offers over 200 AI voices in 40+ languages, focusing on low-latency synthesis for applications requiring natural-sounding, multi-speaker audio. PlayHT targets content creators and enterprises seeking to automate voiceovers and generate audio content at scale.
Cartesia
San Francisco, United States
Cartesia builds fast, realtime AI models for voice and speech. Their Sonic model enables sub-100ms latency text-to-speech for conversational AI.
Fixie.ai
Seattle, United States
Fixie.ai develops the Ultravox platform, enabling developers to build and deploy AI agents powered by a next-generation, open-source Speech Language Model (SLM). Ultravox focuses on natural speech understanding to facilitate more human-like conversational AI experiences. The company targets businesses seeking to integrate scalable voice AI capabilities into their applications and workflows.
LiveKit
San Francisco, United States
LiveKit is an open-source platform for building realtime audio and video applications. It powers voice AI agents with ultra-low-latency infrastructure.
Bland AI
San Francisco, United States
Bland AI provides enterprises with AI-powered phone agents capable of handling both inbound and outbound calls using natural language processing. Their core technology centers on customizable voice models trained on client-provided recordings and transcriptions, offering a branded conversational experience. Targeting businesses across verticals like finance, healthcare, and logistics, Bland AI differentiates itself through on-premise data security and seamless integration capabilities for automating customer support, sales, and operational communications.
Rinna
Tokyo, Japan
Rinna is a Japanese AI company specializing in conversational AI and virtual character development. Their core technology centers around creating highly realistic AI personalities capable of natural language interactions, initially demonstrated through integrations with LINE and evolving into AI-powered virtual YouTubers (AITubers). Rinna targets businesses and entertainment sectors seeking to leverage advanced AI for customer engagement, content creation, and immersive digital experiences, with a strong focus on the Japanese market.
Resemble AI
San Francisco, United States
Resemble AI develops a generative AI platform specializing in voice and audio technology, offering products like real-time voice cloning via their Chatterbox model, and audio editing tools like Edit. Their key innovations include DETECT-3B Omni, a multi-modal deepfake detection model consistently ranked among the industry’s most robust, alongside PerTh, an AI-powered watermarking solution for content provenance. Resemble AI serves enterprise and government clients – including Fortune 500 companies – with solutions for content creation, security, and speaker verification, and is trusted by over 3 million teams worldwide.
Vapi
San Francisco, United States
Vapi provides a platform for developers to build and deploy configurable voice AI agents. Their core technology is a comprehensive API enabling advanced conversational AI functionality for phone-based applications. Vapi targets a broad market ranging from startups to large enterprises seeking to automate phone operations and create scalable voice AI products.
Murf AI
San Francisco, United States
Murf AI develops a text-to-speech (TTS) platform offering over 200 AI voices across 20+ languages, powering realistic voiceovers for video content, presentations, and marketing materials. Their core technology leverages advanced neural network architectures to generate highly natural-sounding speech, and they provide both a user-friendly AI Voice Generator and robust Text-to-Speech APIs & SDKs for developers. Murf AI serves a broad market including content creators, educators, and businesses seeking scalable voice solutions, and is recognized for its speed and efficiency in building voice agents.
Retell AI
San Francisco, United States
Retell AI provides a platform for businesses to build and deploy AI-powered voice agents for automating phone calls. Their technology leverages real-time knowledge base synchronization and natural language processing to handle customer interactions, including navigating IVR systems, scheduling appointments, and facilitating warm transfers to live agents. Retell AI targets companies seeking to improve call center efficiency and customer service through scalable, automated phone solutions, as demonstrated by deployments with companies like Everise.
iFlytek
Hefei, China
iFlytek develops advanced AI-powered language solutions, including its core Jieli speech recognition platform and translation tools supporting over 60 languages. The company’s innovations center on deep learning models for accurate speech-to-text, text-to-speech, and machine translation, demonstrated in products like their real-time transcription services for meetings and content creation. As China’s leading provider in this space, iFlytek increasingly focuses on international expansion and serves sectors including education, digital marketing, and professional communication.
Cosito
Boston, United States
Cosito is an MIT-founded startup building AI-powered microphones that let frontline teams log data by voice, with no physical forms and no typing.
Emotech
London, United Kingdom
Emotech develops multimodal AI solutions focused on enhancing customer and user interactions, with key products including a multilingual speech platform and customizable generative AI avatars. Their technology specializes in realistic AI-driven speech synthesis – notably offering Arabic chatbots with dialect support – and a unique AI-powered pronunciation assessment tool for language learning. Emotech targets businesses seeking to improve customer service, create immersive digital experiences, and innovate in areas like education and gaming, and claims a 30% boost in customer satisfaction for early adopters.
Sonantic
London, United Kingdom
Sonantic develops realistic, emotionally-expressive AI voices for digital media. Their core technology utilizes a proprietary neural network trained on human performance data to generate nuanced vocal performances from text. Acquired by Spotify, Sonantic primarily serves the gaming, animation, and audiobook industries, offering a solution for scalable and high-quality voice acting.
SoundHound
Santa Clara, United States
SoundHound AI develops and licenses voice AI technologies that enable conversational interfaces for a variety of industries, including automotive, retail, and finance. Their core offering is a fully independent voice AI platform capable of handling over 10 billion conversations annually, focusing on agentic AI solutions that automate complex tasks. SoundHound differentiates itself by offering a complete, customizable voice AI solution – rather than relying on cloud-based assistants – allowing businesses to own the entire interaction and maximize ROI through cost reduction and revenue generation.
RingCentral
Belmont, United States
RingCentral provides a unified cloud communications platform integrating voice, video, messaging, and contact center solutions. Their core AI technology focuses on real-time conversation intelligence and automation within these communication channels, offering features like call transcription, sentiment analysis, and automated workflows. RingCentral targets businesses of all sizes seeking to improve agent productivity, enhance customer experiences, and gain actionable insights from their communications data.
Acoustic.ai
Copenhagen, Denmark
Acoustic.ai develops voice AI solutions for automotive and consumer electronics, focusing on noise cancellation and voice enhancement.
Fireflies.ai
San Francisco, United States
Fireflies.ai develops an AI-powered meeting assistant that automatically transcribes, summarizes, and analyzes conversational data across various video conferencing platforms. Their core technology centers on speech-to-text conversion and natural language processing to identify speakers and extract key insights from meetings. Fireflies.ai targets professional teams seeking to improve meeting productivity and knowledge management through searchable conversation archives.
Rev AI
San Francisco, United States
Rev AI provides a speech-to-text API specializing in automated transcription and speech recognition services. Their core technology centers on a diverse, large-dataset trained AI model designed for high accuracy across varied audio qualities and accents. They target developers and businesses requiring scalable, programmatic transcription solutions for applications like voice search, media monitoring, and accessibility services.