Audio Agent
The Audio Agent provides automated audio transcription and intelligent question-answering capabilities. It processes audio files, generates transcriptions using AI-powered speech recognition, and enables users to query the audio content using natural language.
Base URL
/api/agents/audio_agent
Authentication
All endpoints except the health check require authentication. Sign up at https://nextneural.superteams.ai to get your API key, then send it as a bearer token in the Authorization header of each request.
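As a minimal sketch of how a client might attach credentials, the helper below builds (but does not send) an authenticated JSON request using only Python's standard library. It assumes the key is sent as a bearer token, as the request examples in this document do. The host comes from those examples; `build_request` and the `YOUR_TOKEN` placeholder are illustrative names, not part of the API.

```python
import json
import urllib.request

# Host taken from the request examples in this document; path from "Base URL".
BASE_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent"


def build_request(path, token, payload=None, method="POST"):
    """Build (but do not send) an authenticated request. Hypothetical helper."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


req = build_request("/process-audio", "YOUR_TOKEN",
                    {"file_name": "meeting_recording.mp3", "kb_document_id": 123})
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate here so the request can be inspected first.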
How It Works
The Audio Agent performs comprehensive audio processing:
- Audio Transcription: Converts audio files to text using AI-powered speech recognition
- Intelligent Search: Finds relevant content from your audio based on natural language queries
- Answer Generation: Provides contextual answers to questions about your audio content
- Conversation Management: Maintains chat history and conversation context for seamless interactions
Endpoints
1. Health Check
Check if the Audio Agent service is running.
Endpoint: GET /health
Authentication: None required
Response:
{
"status": "healthy",
"service": "AUDIO Agent"
}
2. Process Audio File
Upload and process an audio file to generate transcription.
Endpoint: POST /process-audio
Authentication: Required
Request Body:
{
"file_name": "meeting_recording.mp3",
"kb_document_id": 123
}
Parameters:
- file_name (required): Filename of the audio in the media directory
- kb_document_id (required): Reference to the knowledge base document ID for ownership verification
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_name": "meeting_recording.mp3",
"kb_document_id": 123
}'
Response (New Processing):
{
"message": "Audio file processed and stored successfully.",
"already_processed": false,
"audio_id": 456,
"kb_document_id": 123,
"filename": "meeting_recording.mp3",
"transcript": "This is the full transcription of the audio file..."
}
Response (Already Processed):
{
"message": "Audio already processed",
"already_processed": true,
"audio_id": 456,
"kb_document_id": 123,
"filename": "meeting_recording.mp3",
"transcript": "This is the full transcription of the audio file...",
"processed_date": "2025-01-14T10:30:00"
}
Notes:
- The audio file must exist in the configured media directory
- Document ownership is verified before processing
- If audio was already processed, returns cached result without re-processing
- High-accuracy AI transcription ensures quality results
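The two response shapes above differ mainly in the `already_processed` flag and the extra `processed_date` field. A client could branch on them roughly as follows; `describe_process_result` is a hypothetical helper, and the sample dictionaries are trimmed copies of the example responses.

```python
def describe_process_result(resp):
    """Summarize a /process-audio response dict. Hypothetical helper."""
    source = "cached" if resp.get("already_processed") else "new"
    # processed_date only appears in the "already processed" example response.
    when = resp.get("processed_date", "just now")
    return (f"{resp['filename']} (audio_id={resp['audio_id']}): "
            f"{source} transcript, processed {when}")


new = {"already_processed": False, "audio_id": 456,
       "filename": "meeting_recording.mp3"}
cached = {"already_processed": True, "audio_id": 456,
          "filename": "meeting_recording.mp3",
          "processed_date": "2025-01-14T10:30:00"}

print(describe_process_result(new))
print(describe_process_result(cached))
```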
3. Re-parse Audio File
Re-process an already processed audio file by deleting old data and re-transcribing.
Endpoint: POST /reparse-audio
Authentication: Required
Request Body:
{
"file_name": "meeting_recording.mp3",
"kb_document_id": 123
}
Parameters:
- file_name (required): Filename of the audio in the media directory
- kb_document_id (required): Reference to the knowledge base document ID
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/reparse-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_name": "meeting_recording.mp3",
"kb_document_id": 123
}'
Response:
{
"message": "Audio file re-parsed and stored successfully.",
"reparsed": true,
"audio_id": 457,
"kb_document_id": 123,
"filename": "meeting_recording.mp3",
"transcript": "This is the newly generated transcription..."
}
Notes:
- Deletes existing audio record and re-processes from scratch
- Useful when transcription quality was poor or audio was updated
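Because re-parsing deletes the existing record, the example response carries a different audio_id (457) from the one returned by the original processing (456). The source does not state that the ID always changes, so the sketch below simply treats the response as authoritative and refreshes a client-side lookup. The `audio_ids` map and `apply_reparse` are illustrative, not part of the API.

```python
def apply_reparse(audio_ids, resp):
    """Update a local {kb_document_id: audio_id} map from a /reparse-audio
    response and return the previously stored audio_id (or None)."""
    doc_id = resp["kb_document_id"]
    old = audio_ids.get(doc_id)
    audio_ids[doc_id] = resp["audio_id"]
    return old


audio_ids = {123: 456}  # state remembered from the earlier /process-audio call
old_id = apply_reparse(audio_ids, {"reparsed": True, "audio_id": 457,
                                   "kb_document_id": 123})
print(old_id, "->", audio_ids[123])
```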
4. Ask Question (RAG Query)
Query the audio content using natural language. The system retrieves relevant chunks and generates contextual answers.
Endpoint: POST /ask_audio
Authentication: Required
Request Body:
{
"question": "What were the main topics discussed in the meeting?",
"kb_document_id": 123
}
Parameters:
- question (required): Natural language question about the audio content
- kb_document_id (required): Knowledge base document ID to query
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What were the main topics discussed?",
"kb_document_id": 123
}'
Response:
{
"answer": "Based on the audio transcription, the main topics discussed were: 1) Project timeline and milestones, 2) Budget allocation for Q2, 3) Team resource planning, and 4) Client feedback on the prototype.",
"kb_document_id": 123
}
Supported Query Types:
- Specific questions: "What is the project deadline?"
- Summary requests: "Give me a summary of the audio"
- Full transcript: "Show me the complete transcript"
- Key highlights: "What are the important points?"
- Topic exploration: "What topics are covered?"
Notes:
- Intelligent search finds the most relevant content from your audio
- Generates accurate, contextual answers
- Handles greetings and casual conversation naturally
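Both request fields are required, so a client may want to validate them before calling /ask_audio. The following is a small sketch under that assumption; `build_ask_payload` is a hypothetical helper, and the checks are client-side conveniences rather than documented server rules.

```python
def build_ask_payload(question, kb_document_id):
    """Validate and build the JSON body for POST /ask_audio."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    if not isinstance(kb_document_id, int):
        raise TypeError("kb_document_id must be an integer")
    return {"question": question, "kb_document_id": kb_document_id}


payload = build_ask_payload("  What were the main topics discussed?  ", 123)
print(payload)
```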
5. Create Conversation
Create a new conversation session for chat history tracking.
Endpoint: POST /conversations/create
Authentication: Required
Request Body:
{
"audio_id": 456,
"kb_document_id": 123,
"title": "Meeting Discussion"
}
Parameters:
- audio_id (optional): ID of the audio file for this conversation
- kb_document_id (optional): Knowledge base document ID
- title (optional): Custom title for the conversation
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": 123,
"title": "Q4 Planning Meeting"
}'
Response:
{
"id": 789,
"user_id": 1,
"audio_id": 456,
"kb_document_id": 123,
"title": "Q4 Planning Meeting",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:30:00"
}
Notes:
- If kb_document_id is provided without audio_id, the system finds the most recent audio for that document
- Document ownership is verified
- Conversations track chat history and context
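Since all three fields are optional, one reasonable client-side approach is to leave out anything that was not supplied, so the server can apply its own fallback (for example, resolving the most recent audio from kb_document_id). `build_conversation_payload` below is a hypothetical helper illustrating that idea.

```python
def build_conversation_payload(audio_id=None, kb_document_id=None, title=None):
    """Build the body for POST /conversations/create, omitting unset fields."""
    fields = {"audio_id": audio_id, "kb_document_id": kb_document_id, "title": title}
    return {key: value for key, value in fields.items() if value is not None}


payload = build_conversation_payload(kb_document_id=123, title="Q4 Planning Meeting")
print(payload)
```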
6. Add Message to Conversation
Add a user or assistant message to an existing conversation.
Endpoint: POST /conversations/{conversation_id}/messages
Authentication: Required
Path Parameters:
- conversation_id (required): ID of the conversation
Request Body:
{
"conversation_id": 789,
"audio_id": 456,
"role": "user",
"content": "What were the action items?"
}
Parameters:
- conversation_id (required): ID of the conversation
- audio_id (optional): Audio context for this message
- role (required): Either "user" or "assistant"
- content (required): Message content
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789/messages" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"conversation_id": 789,
"role": "user",
"content": "What were the action items?"
}'
Response:
{
"id": 1234,
"conversation_id": 789,
"audio_id": 456,
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
}
Notes:
- Conversation must belong to the authenticated user
- Updates the conversation's last_message_at timestamp
- Messages are ordered by timestamp
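The role field accepts only two values, and the conversation ID appears both in the path and in the body of the example request. A sketch of a helper that enforces the former and keeps the latter in sync might look like this; `build_message_request` and `VALID_ROLES` are illustrative names.

```python
VALID_ROLES = {"user", "assistant"}


def build_message_request(conversation_id, role, content, audio_id=None):
    """Return (path, body) for POST /conversations/{conversation_id}/messages."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}, got {role!r}")
    path = f"/conversations/{conversation_id}/messages"
    # The example request repeats conversation_id in the body, so it is mirrored here.
    body = {"conversation_id": conversation_id, "role": role, "content": content}
    if audio_id is not None:
        body["audio_id"] = audio_id
    return path, body


path, body = build_message_request(789, "user", "What were the action items?")
print(path, body)
```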
7. Get Conversation History
Retrieve all conversations for the authenticated user.
Endpoint: GET /conversations/history
Authentication: Required
Query Parameters:
- limit (optional, default: 100): Maximum number of conversations to return
Request Example:
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history?limit=50" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
[
{
"id": 789,
"fileName": "meeting_recording.mp3",
"analyzedAt": "2025-01-14T10:35:00",
"duration": "5 messages",
"audioId": 456,
"kbDocumentId": 123
},
{
"id": 788,
"fileName": "Q3 Review",
"analyzedAt": "2025-01-13T15:20:00",
"duration": "12 messages",
"audioId": 455,
"kbDocumentId": 122
}
]
Notes:
- Returns conversations in reverse chronological order (newest first)
- The "duration" field holds the message count (for example, "5 messages"), not the length of the audio
- Displays audio filename or KB document title
- Only returns user's own conversations
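Because "duration" is a display string rather than a number, a client that needs the count has to parse it. The sketch below assumes the "N messages" format seen in the example response (a singular "1 message" is also accepted, which is a guess). `message_count` is a hypothetical helper, and the sample list is copied from the example.

```python
import re
from datetime import datetime


def message_count(entry):
    """Extract the integer message count from a history entry's "duration"."""
    match = re.match(r"(\d+)\s+messages?", entry["duration"])
    return int(match.group(1)) if match else 0


history = [
    {"id": 789, "fileName": "meeting_recording.mp3",
     "analyzedAt": "2025-01-14T10:35:00", "duration": "5 messages"},
    {"id": 788, "fileName": "Q3 Review",
     "analyzedAt": "2025-01-13T15:20:00", "duration": "12 messages"},
]

# The API already returns newest first; max() just makes the choice explicit.
newest = max(history, key=lambda e: datetime.fromisoformat(e["analyzedAt"]))
print(newest["fileName"], message_count(newest))
```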
8. Get Specific Conversation
Retrieve a specific conversation with all its messages.
Endpoint: GET /conversations/{conversation_id}
Authentication: Required
Path Parameters:
- conversation_id (required): ID of the conversation to retrieve
Request Example:
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
{
"id": 789,
"user_id": 1,
"audio_id": 456,
"kb_document_id": 123,
"title": "Q4 Planning Meeting",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:35:00",
"audio": {
"audio_id": 456,
"kb_document_id": 123,
"file_name": "meeting_recording.mp3"
},
"messages": [
{
"id": 1234,
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
},
{
"id": 1235,
"role": "assistant",
"content": "The action items mentioned were...",
"timestamp": "2025-01-14T10:35:05"
}
]
}
Notes:
- Only the conversation owner can access it
- Returns 404 if the conversation doesn't exist or access is denied
- Messages are ordered chronologically
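To illustrate how the nested messages array might be consumed, the following sketch renders a conversation as a plain-text chat log. `format_messages` is a hypothetical helper; the sort is a defensive step that relies on all timestamps sharing the ISO 8601 layout shown in the example, so they order correctly as strings.

```python
def format_messages(conversation):
    """Render a conversation's messages as one line per message, oldest first."""
    ordered = sorted(conversation["messages"], key=lambda m: m["timestamp"])
    return "\n".join(
        f"[{m['timestamp']}] {m['role']}: {m['content']}" for m in ordered
    )


conversation = {
    "id": 789,
    "title": "Q4 Planning Meeting",
    "messages": [
        {"id": 1235, "role": "assistant",
         "content": "The action items mentioned were...",
         "timestamp": "2025-01-14T10:35:05"},
        {"id": 1234, "role": "user",
         "content": "What were the action items?",
         "timestamp": "2025-01-14T10:35:00"},
    ],
}

log = format_messages(conversation)
print(log)
```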
9. Delete Conversation
Delete a conversation and all its messages.
Endpoint: DELETE /conversations/{conversation_id}
Authentication: Required
Path Parameters:
- conversation_id (required): ID of the conversation to delete
Request Example:
curl -X DELETE "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
{
"success": true,
"message": "Conversation deleted successfully"
}
Notes:
- Only the conversation owner can delete it
- All messages are cascade deleted
- Returns 404 if the conversation doesn't exist or access is denied
Data Models
AudioInfo Structure
{
"id": 456,
"kb_document_id": 123,
"file_name": "meeting_recording.mp3",
"file_size": 5242880,
"total_character": 15000,
"full_text": "Complete transcription text...",
"date_time": "2025-01-14T10:30:00",
"user_id": 1
}
Field Descriptions:
- id: Unique identifier for the audio record
- kb_document_id: Reference to knowledge base document
- file_name: Original audio filename
- file_size: Size of transcript file in bytes
- total_character: Total character count in transcript
- full_text: Complete transcription text
- date_time: Processing timestamp
- user_id: Owner of the audio record
Conversation Structure
{
"id": 789,
"user_id": 1,
"audio_id": 456,
"kb_document_id": 123,
"title": "Meeting Discussion",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:35:00",
"is_active": true
}
Message Structure
{
"id": 1234,
"conversation_id": 789,
"audio_id": 456,
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
}
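The Conversation and Message structures above could be mirrored on the client with typed containers. This is only a sketch: the field names come from the JSON examples, while treating audio_id and kb_document_id as nullable is an assumption based on their being optional in the create request. `from_dict` is a hypothetical helper that ignores unknown keys so that added server fields do not break parsing.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Message:
    id: int
    conversation_id: int
    role: str
    content: str
    timestamp: str
    audio_id: Optional[int] = None  # nullability is an assumption


@dataclass
class Conversation:
    id: int
    user_id: int
    title: str
    started_at: str
    last_message_at: str
    audio_id: Optional[int] = None        # nullability is an assumption
    kb_document_id: Optional[int] = None  # nullability is an assumption
    is_active: bool = True


def from_dict(cls, data):
    """Build a dataclass instance from a JSON dict, dropping unknown keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


msg = from_dict(Message, {"id": 1234, "conversation_id": 789, "audio_id": 456,
                          "role": "user", "content": "What were the action items?",
                          "timestamp": "2025-01-14T10:35:00"})
print(msg)
```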
Error Responses
All endpoints may return the following error responses:
400 Bad Request:
{
"detail": "kb_document_id is required."
}
403 Forbidden:
{
"detail": "Access denied. Document 123 does not belong to user 1"
}
404 Not Found:
{
"detail": "File not found in media directory."
}
404 Not Found (Conversation):
{
"detail": "Conversation not found or access denied"
}
500 Internal Server Error:
{
"detail": "Transcription failed: [error message]"
}
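Every error body shown above carries a single "detail" string, so a client can map the status code to an exception type and keep the detail for display. The class names and `error_from_response` below are illustrative; only the status codes and the "detail" key come from this document.

```python
class AudioAgentError(Exception):
    """Base error carrying the HTTP status and the API's "detail" message."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


class BadRequest(AudioAgentError):
    pass


class Forbidden(AudioAgentError):
    pass


class NotFound(AudioAgentError):
    pass


_ERRORS = {400: BadRequest, 403: Forbidden, 404: NotFound}


def error_from_response(status, body):
    """Turn a status code and parsed error body into an exception instance."""
    detail = body.get("detail", "") if isinstance(body, dict) else str(body)
    return _ERRORS.get(status, AudioAgentError)(status, detail)


err = error_from_response(404, {"detail": "File not found in media directory."})
print(type(err).__name__, err.detail)
```

Statuses without a dedicated class, such as 500, fall back to the base `AudioAgentError`.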
Best Practices
Audio Quality Requirements
- Format: MP3, WAV, M4A, or other common audio formats
- Duration: Any length (longer files take more time to process)
- Audio Quality: Clear speech, minimal background noise
- Language: English (primary); other languages are supported but may be transcribed less accurately
- Bitrate: 128 kbps or higher recommended
Recording Guidelines
- Use a good quality microphone
- Record in a quiet environment
- Speak clearly and at moderate pace
- Avoid overlapping speech in multi-speaker scenarios
- Keep audio files under 100MB for optimal processing
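The format and size guidance above can be checked before a file is handed to the API. The sketch below returns warnings rather than rejecting files, since the document says other common formats are accepted and frames 100MB as a recommendation. `preflight_warnings` is a hypothetical helper, and reading "100MB" as 100 * 1024 * 1024 bytes is an assumption.

```python
import os

KNOWN_EXTENSIONS = {".mp3", ".wav", ".m4a"}   # formats named in this document
MAX_RECOMMENDED_BYTES = 100 * 1024 * 1024     # "under 100MB" (binary MB assumed)


def preflight_warnings(file_name, size_bytes):
    """Return a list of human-readable warnings; an empty list means no concerns."""
    warnings = []
    ext = os.path.splitext(file_name)[1].lower()
    if ext not in KNOWN_EXTENSIONS:
        warnings.append(f"{ext or 'no extension'}: not one of the formats named in the docs")
    if size_bytes >= MAX_RECOMMENDED_BYTES:
        warnings.append("file is 100MB or larger; processing may be slow")
    return warnings


print(preflight_warnings("meeting_recording.mp3", 5242880))
print(preflight_warnings("clip.xyz", 200 * 1024 * 1024))
```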
Query Best Practices
- Specific Questions: Ask direct questions for precise answers
- Summary Requests: Use keywords like "summary", "overview", "main points"
- Full Transcript: Request "complete transcript" or "everything said"
- Contextual Queries: Reference specific topics or speakers when possible
- Follow-up Questions: Use conversations to maintain context
Integration Workflow
# 1. Upload audio to media directory (via your file upload system)
# Audio file: meeting_recording.mp3
# 2. Process the audio file
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"file_name": "meeting_recording.mp3",
"kb_document_id": 123
}'
# 3. Create a conversation
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": 123,
"title": "Meeting Analysis"
}'
# 4. Ask questions about the audio
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What were the main action items?",
"kb_document_id": 123
}'
# 5. Retrieve conversation history
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history" \
-H "Authorization: Bearer YOUR_TOKEN"
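Steps 2 to 5 of the workflow above can also be expressed in Python. In this sketch the HTTP layer is injected as a `send(method, path, payload)` callable, which is a hypothetical interface rather than part of the API, so the sequence can be run offline against canned responses. The canned values are abbreviated from the example responses in this document.

```python
def run_workflow(send, file_name, kb_document_id, question):
    """Process audio, create a conversation, ask a question, fetch history.
    `send` is a caller-supplied transport that returns parsed JSON."""
    doc = {"kb_document_id": kb_document_id}
    processed = send("POST", "/process-audio", {"file_name": file_name, **doc})
    conversation = send("POST", "/conversations/create",
                        {**doc, "title": "Meeting Analysis"})
    answer = send("POST", "/ask_audio", {"question": question, **doc})
    history = send("GET", "/conversations/history", None)
    return {
        "audio_id": processed["audio_id"],
        "conversation_id": conversation["id"],
        "answer": answer["answer"],
        "conversation_count": len(history),
    }


# Stand-in transport with canned responses so the sketch runs without a network.
calls = []


def fake_send(method, path, payload):
    calls.append((method, path))
    canned = {
        "/process-audio": {"audio_id": 456, "already_processed": False},
        "/conversations/create": {"id": 789},
        "/ask_audio": {"answer": "The action items were...", "kb_document_id": 123},
        "/conversations/history": [{"id": 789}],
    }
    return canned[path]


result = run_workflow(fake_send, "meeting_recording.mp3", 123,
                      "What were the main action items?")
print(result)
```

In real use, `fake_send` would be replaced by a function that performs the authenticated HTTP call and decodes the JSON body.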
Performance Considerations
- Transcription Time:
- Short audio (< 5 min): 10-30 seconds
- Medium audio (5-20 min): 30-90 seconds
- Long audio (> 20 min): 1-5 minutes
- Query Response Time: Typically 1-3 seconds for standard queries
- Storage: Each minute of audio generates approximately 150-200 words
- Caching: Processed audio is cached; use /reparse-audio to force re-processing
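The figures above can be turned into rough client-side estimates, for example to choose a request timeout or to size a text field. The helpers below are a sketch: the function names are hypothetical, the numbers are the ranges quoted in this section, and placing exactly 5 and exactly 20 minutes in the medium bucket is an assumption because the source leaves those boundaries open.

```python
def transcription_time_range(minutes):
    """Return the (min_seconds, max_seconds) range quoted for this audio length."""
    if minutes < 5:
        return (10, 30)
    if minutes <= 20:
        return (30, 90)
    return (60, 300)


def estimated_words(minutes):
    """Return the (low, high) word-count estimate at 150-200 words per minute."""
    return (round(minutes * 150), round(minutes * 200))


for length in (3, 12, 45):
    print(length, transcription_time_range(length), estimated_words(length))
```

A cautious client might set its timeout somewhat above the upper bound of the range rather than at it.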
Security Features
- User Isolation: All queries are private to your account
- Document Ownership: Only you can access your documents
- Authentication: All endpoints except the health check require valid API tokens
- Conversation Privacy: Your conversations are completely private
Troubleshooting
Transcription Issues
Problem: Poor transcription quality
- Cause: Background noise, unclear speech, low audio quality
- Solution: Re-record with better audio quality, use the /reparse-audio endpoint
Problem: Transcription failed
- Cause: Unsupported audio format, corrupted file, API issues
- Solution: Convert to MP3/WAV, verify file integrity, check API key configuration
Query Issues
Problem: "No relevant information found"
- Cause: Query doesn't match audio content, audio not processed
- Solution: Verify audio was processed, rephrase query, ask for summary first
Problem: Incomplete answers
- Cause: Query doesn't match available content well
- Solution: Ask more specific questions, request full transcript
Performance Issues
Problem: Slow transcription
- Cause: Large audio file
- Solution: Split large files, retry if timeout occurs
Problem: Slow query responses
- Cause: Complex query or large audio file
- Solution: Use more specific queries
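For the "retry if timeout occurs" advice under slow transcription, one common approach is to retry with exponential backoff. The sketch below is a generic pattern, not something this API prescribes: `call_with_retry`, the attempt count, and the delays are all assumptions. It catches Python's built-in TimeoutError; depending on the HTTP library used, timeouts may surface as a different exception type that would need to be caught instead.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in call that times out twice, then succeeds.
state = {"calls": 0}
delays = []


def flaky_transcription():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated timeout")
    return {"message": "Audio file processed and stored successfully."}


# delays.append stands in for time.sleep so the demo finishes instantly.
result = call_with_retry(flaky_transcription, sleep=delays.append)
print(state["calls"], delays)
```

Since /process-audio returns the cached result for audio that was already processed, retrying it should not trigger a second transcription once an earlier attempt has completed on the server.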
Limitations
- Language Support: Optimized for English; other languages may have reduced accuracy
- Audio Length: Very long files (> 2 hours) may have processing delays
Future Enhancements
- Speaker diarization (identify different speakers)
- Multi-language support with automatic detection
- Real-time streaming transcription
- Audio quality analysis and enhancement
- Custom vocabulary and domain-specific training
- Integration with video files (extract audio track)