Openai
Generate text, images, audio, and video using large language models and multimodal AI. Create chat completions, generate and edit images from text prompts, convert text to speech, transcribe and translate audio, generate video, and create text embeddings for search and retrieval. Fine-tune models on custom training data, run evaluations to measure model performance, and moderate content against policy categories. Manage vector stores for semantic file search, upload and organize files, and submit batch processing jobs for asynchronous bulk requests. Conduct real-time speech-to-speech conversations via WebRTC or SIP. Administer organizations, projects, users, API keys, and audit logs programmatically. Receive webhook notifications for background responses, batch jobs, fine-tuning jobs, eval runs, and incoming realtime calls.
Google Cloud Speech
Convert audio to text transcriptions and synthesize natural-sounding speech from text using Google's neural network models. Perform synchronous, asynchronous, and streaming speech-to-text recognition across 125+ languages. Create and manage recognizer configurations for reusable transcription settings. Adapt speech models with custom phrase sets, custom classes, and boost values to improve accuracy for domain-specific vocabulary. Identify distinct speakers via speaker diarization and recognize multi-channel audio. Generate subtitle/caption output in SRT format. Synthesize text or SSML into audio using Standard, WaveNet, Neural2, Studio, and Chirp voice types with configurable pitch, speaking rate, volume, and encoding. Produce long-form audio content asynchronously.
OfficialAiml API
Unified gateway to 400+ AI/ML models for text generation, image generation, video generation, music generation, speech-to-text, text-to-speech, content moderation, 3D model generation, vision/OCR, embeddings, and AI-powered web search. Generate chat completions and reasoning with models like GPT, Claude, Gemini, DeepSeek, and Llama. Create images from text prompts using Flux, Stable Diffusion, and DALL-E. Generate videos from text or images asynchronously. Convert speech to text and text to speech in 120+ languages. Moderate content for safety classification. Generate 3D objects from text or images. Extract text and structured data from images via OCR. Produce text embeddings for semantic search. Search the web for real-time information. Create AI Assistants for customer support and data analysis. Interact in real time via WebSocket for voice and text. Receive webhook notifications for async operation completion.
OfficialAivoov
Convert text to speech audio using 2300+ AI voices from Google, Amazon, IBM, and Microsoft. Generate audio in MP3 and WAV formats across 155+ languages. Combine multiple voices in a single request to create conversational audio. Control pitch, speaking rate, and volume per text segment using SSML. Browse and filter available voices by language.
OfficialApipie Ai
Access hundreds of AI models from multiple providers (OpenAI, Anthropic, Google, Meta, etc.) through a unified OpenAI-compatible API. Send chat completions to language models with streaming, function calling, and structured output. Generate images, convert text to speech, analyze images with vision models, and create text embeddings. Augment responses with real-time web search grounding, upload documents for RAG-based retrieval, and reduce hallucinations via integrity checking. Discover and filter available models by type, provider, pricing, and performance. Manage routing preferences (cost or performance optimized), configure model pooling for redundancy, enable persistent conversational memory, and track API usage with detailed cost and token analytics.
OfficialAstica Ai
Analyze images using computer vision for object detection, face detection, OCR, content moderation, tagging, and GPT-powered descriptions. Generate AI images from text prompts. Convert text to speech with 500+ voices, voice cloning, and multilingual support. Transcribe speech to text from audio files or streams. Generate natural language text using GPT-S for question answering, content creation, and diverse text generation. Upscale images using AI enhancement. Train and run custom AI models for vision and NLP tasks.
OfficialAssemblyai
Transcribe pre-recorded and live audio/video to text with support for 99+ languages, speaker diarization, and multichannel audio. Apply audio intelligence models to extract summaries, sentiment analysis, entity detection, topic detection, key phrases, and content moderation from transcripts. Redact personally identifiable information from text and audio. Generate SRT/VTT subtitles and segment transcripts into paragraphs, sentences, or auto-chapters. Stream real-time speech-to-text via WebSocket connections. Upload audio/video files for processing. Manage and delete transcripts. Access an LLM gateway to apply large language models (Claude, GPT, Gemini) to transcribed speech data for summarization, Q&A, and custom analysis. Translate transcripts across 99+ languages. Receive webhook notifications when transcriptions complete or fail.
OfficialBolna
Create, configure, and manage conversational Voice AI agents that make and receive phone calls. Initiate outbound calls with dynamic context variables, handle inbound calls with caller identification, and automate batch calling campaigns via CSV uploads with scheduling and auto-retry. Upload PDF documents and URLs as knowledge bases for RAG-powered conversations. Configure function calling tools for live call transfers, calendar booking, and custom API integrations. Retrieve call execution history including transcripts, recordings, cost breakdowns, and extracted data. Purchase and manage phone numbers, import or clone custom voices, and connect external LLM, TTS, ASR, and telephony providers.
OfficialFireflies
Record, transcribe, and analyze meeting conversations from platforms like Zoom, Google Meet, and Webex. Retrieve, search, and manage meeting transcripts with AI-generated summaries, action items, sentiment analysis, and keywords. Upload audio files for transcription. Ask questions about meetings using the AskFred AI assistant. Add a bot to live meetings for automatic recording, pause and resume recordings, and create live action items or soundbites. Manage users and teams, organize meetings into channels, query contacts, and receive webhook notifications when transcriptions complete.
Deepgram
Use Deepgram REST APIs for pre-recorded speech-to-text, text-to-speech, text intelligence, model discovery, project administration, usage, request troubleshooting, and billing reporting. Supports asynchronous processing via callbacks for transcription, speech synthesis, and text analysis.
Groqcloud
Run AI inference on open-source language models with ultra-low latency using Groq's LPU hardware. Generate text via chat completions and the Responses API, produce structured JSON outputs, and perform function calling with built-in, remote (MCP), or local tools. Transcribe and translate audio using Whisper models, convert text to speech, and analyze images with multimodal vision models. Support chain-of-thought reasoning, content moderation with custom policies, asynchronous chat-completions batch processing, and Groq file upload/download workflows. List and query available hosted models.