Audio Agent

The Audio Agent provides automated audio transcription and intelligent question-answering capabilities. It processes audio files, generates transcriptions using AI-powered speech recognition, and enables users to query the audio content using natural language.

Base URL

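/api/agents/audio_agent

All endpoint paths below are relative to this base path. The request examples use the host https://nextneural-api.superteams.ai.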

Authentication

All endpoints except the health check require authentication. Sign up at https://nextneural.superteams.ai to get your API key. The request examples pass it as a bearer token in the Authorization header.
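As a rough illustration of the bearer-token header used in the curl examples, here is a minimal Python sketch built on the standard library only. The helper name build_request is hypothetical, YOUR_TOKEN is a placeholder, and the host is the one that appears in the request examples. The function only constructs the request; nothing is sent.

```python
import json
import urllib.request

# Host taken from the curl examples; the path is the documented base URL.
BASE_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent"


def build_request(path, token, body=None, method=None):
    """Build (but do not send) an authenticated Audio Agent request."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies carry an explicit content type, as in the curl examples.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    # Assumption: default to POST when a body is present, GET otherwise.
    method = method or ("POST" if body is not None else "GET")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


req = build_request("/conversations/history?limit=50", "YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call.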

How It Works

The Audio Agent performs comprehensive audio processing:

Audio Transcription: Converts audio files to text using AI-powered speech recognition
Intelligent Search: Finds relevant content from your audio based on natural language queries
Answer Generation: Provides contextual answers to questions about your audio content
Conversation Management: Maintains chat history and conversation context for seamless interactions

Endpoints

1. Health Check

Check if the Audio Agent service is running.

Endpoint: GET /health

Authentication: None required

Response:

{
  "status": "healthy",
  "service": "AUDIO Agent"
}
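A client might interpret the health response along these lines. This is a sketch only: the function name is_healthy is hypothetical, and it assumes any status other than "healthy" (or an unparseable body) should count as unhealthy, which the source does not spell out.

```python
import json


def is_healthy(response_body):
    """Return True when a /health response body reports status "healthy"."""
    try:
        payload = json.loads(response_body)
    except (TypeError, ValueError):
        # Not JSON at all: treat as unhealthy (assumption).
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


sample = '{"status": "healthy", "service": "AUDIO Agent"}'
print(is_healthy(sample))
```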

2. Process Audio File

Upload and process an audio file to generate transcription.

Endpoint: POST /process-audio

Authentication: Required

Request Body:

{
  "file_name": "meeting_recording.mp3",
  "kb_document_id": 123
}

Parameters:

file_name (required): Filename of the audio in the media directory
kb_document_id (required): Reference to the knowledge base document ID for ownership verification

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "meeting_recording.mp3",
    "kb_document_id": 123
  }'

Response (New Processing):

{
  "message": "Audio file processed and stored successfully.",
  "already_processed": false,
  "audio_id": 456,
  "kb_document_id": 123,
  "filename": "meeting_recording.mp3",
  "transcript": "This is the full transcription of the audio file..."
}

Response (Already Processed):

{
  "message": "Audio already processed",
  "already_processed": true,
  "audio_id": 456,
  "kb_document_id": 123,
  "filename": "meeting_recording.mp3",
  "transcript": "This is the full transcription of the audio file...",
  "processed_date": "2025-01-14T10:30:00"
}

Notes:

The audio file must exist in the configured media directory
Document ownership is verified before processing
If the audio was already processed, the cached result is returned without re-processing; the response then has already_processed set to true and includes processed_date
High-accuracy AI transcription ensures quality results
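The two response shapes differ in the already_processed flag and the processed_date field. A caller could branch on them roughly as follows; describe_processing is a hypothetical helper, and the fallback text for a missing processed_date is an assumption.

```python
def describe_processing(result):
    """Summarise a /process-audio response in one line."""
    chars = len(result.get("transcript", ""))
    if result.get("already_processed"):
        # processed_date appears only in the cached response example.
        when = result.get("processed_date", "an unknown date")
        return f"audio {result['audio_id']}: cached transcript from {when} ({chars} characters)"
    return f"audio {result['audio_id']}: newly transcribed ({chars} characters)"


fresh = {"already_processed": False, "audio_id": 456, "transcript": "hello"}
cached = {
    "already_processed": True,
    "audio_id": 456,
    "transcript": "hello",
    "processed_date": "2025-01-14T10:30:00",
}
print(describe_processing(fresh))
print(describe_processing(cached))
```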

3. Re-parse Audio File

Re-process an already processed audio file by deleting old data and re-transcribing.

Endpoint: POST /reparse-audio

Authentication: Required

Request Body:

{
  "file_name": "meeting_recording.mp3",
  "kb_document_id": 123
}

Parameters:

file_name (required): Filename of the audio in the media directory
kb_document_id (required): Reference to the knowledge base document ID

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/reparse-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "meeting_recording.mp3",
    "kb_document_id": 123
  }'

Response:

{
  "message": "Audio file re-parsed and stored successfully.",
  "reparsed": true,
  "audio_id": 457,
  "kb_document_id": 123,
  "filename": "meeting_recording.mp3",
  "transcript": "This is the newly generated transcription..."
}

Notes:

Deletes the existing audio record and re-processes from scratch; in the example above the re-parsed record gets a new audio_id (457)
Useful when transcription quality was poor or the audio file was updated
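Since /process-audio and /reparse-audio take the same body, a client can pick between them with a single flag. The sketch below assumes a hypothetical helper named processing_call with a force parameter; neither name comes from the API itself.

```python
def processing_call(file_name, kb_document_id, force=False):
    """Return the (path, body) pair for processing, or for re-parsing when force is True."""
    if not file_name:
        raise ValueError("file_name is required")
    if kb_document_id is None:
        raise ValueError("kb_document_id is required")
    # Both endpoints accept the same two required fields.
    path = "/reparse-audio" if force else "/process-audio"
    return path, {"file_name": file_name, "kb_document_id": kb_document_id}


print(processing_call("meeting_recording.mp3", 123, force=True))
```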

4. Ask Question (RAG Query)

Query the audio content using natural language. The system retrieves relevant chunks and generates contextual answers.

Endpoint: POST /ask_audio

Authentication: Required

Request Body:

{
  "question": "What were the main topics discussed in the meeting?",
  "kb_document_id": 123
}

Parameters:

question (required): Natural language question about the audio content
kb_document_id (required): Knowledge base document ID to query

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What were the main topics discussed?",
    "kb_document_id": 123
  }'

Response:

{
  "answer": "Based on the audio transcription, the main topics discussed were: 1) Project timeline and milestones, 2) Budget allocation for Q2, 3) Team resource planning, and 4) Client feedback on the prototype.",
  "kb_document_id": 123
}

Supported Query Types:

Specific questions: "What is the project deadline?"
Summary requests: "Give me a summary of the audio"
Full transcript: "Show me the complete transcript"
Key highlights: "What are the important points?"
Topic exploration: "What topics are covered?"

Notes:

Intelligent search finds the most relevant content from your audio
Generates accurate, contextual answers
Handles greetings and casual conversation naturally
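The request and response for /ask_audio are both small JSON objects. One possible way to build the body and pull out the answer is sketched here; ask_body and extract_answer are hypothetical names, and rejecting blank questions on the client side is an assumption rather than documented behaviour.

```python
import json


def ask_body(question, kb_document_id):
    """Serialise an /ask_audio request body, rejecting blank questions."""
    question = question.strip()
    if not question:
        raise ValueError("question is required")
    return json.dumps({"question": question, "kb_document_id": kb_document_id})


def extract_answer(response_body):
    """Return the "answer" field from an /ask_audio response body."""
    return json.loads(response_body)["answer"]


print(ask_body("What is the project deadline?", 123))
```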

5. Create Conversation

Create a new conversation session for chat history tracking.

Endpoint: POST /conversations/create

Authentication: Required

Request Body:

{
  "audio_id": 456,
  "kb_document_id": 123,
  "title": "Meeting Discussion"
}

Parameters:

audio_id (optional): ID of the audio file for this conversation
kb_document_id (optional): Knowledge base document ID
title (optional): Custom title for the conversation

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kb_document_id": 123,
    "title": "Q4 Planning Meeting"
  }'

Response:

{
  "id": 789,
  "user_id": 1,
  "audio_id": 456,
  "kb_document_id": 123,
  "title": "Q4 Planning Meeting",
  "started_at": "2025-01-14T10:30:00",
  "last_message_at": "2025-01-14T10:30:00"
}

Notes:

If kb_document_id is provided without audio_id, the system finds the most recent audio for that document
Document ownership is verified
Conversations track chat history and context
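Because all three fields are optional, a client may prefer to leave out the ones it does not have instead of sending nulls. A small sketch, with conversation_payload as a hypothetical helper name; whether the service accepts a completely empty body is not stated in the source.

```python
def conversation_payload(audio_id=None, kb_document_id=None, title=None):
    """Build a /conversations/create body, leaving out fields that were not given."""
    fields = {"audio_id": audio_id, "kb_document_id": kb_document_id, "title": title}
    return {name: value for name, value in fields.items() if value is not None}


# Mirrors the request example: no audio_id, so the service picks the most recent audio.
print(conversation_payload(kb_document_id=123, title="Q4 Planning Meeting"))
```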

6. Add Message to Conversation

Add a user or assistant message to an existing conversation.

Endpoint: POST /conversations/{conversation_id}/messages

Authentication: Required

Path Parameters:

conversation_id (required): ID of the conversation

Request Body:

{
  "conversation_id": 789,
  "audio_id": 456,
  "role": "user",
  "content": "What were the action items?"
}

Parameters:

conversation_id (required): ID of the conversation
audio_id (optional): Audio context for this message
role (required): Either "user" or "assistant"
content (required): Message content

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789/messages" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": 789,
    "role": "user",
    "content": "What were the action items?"
  }'

Response:

{
  "id": 1234,
  "conversation_id": 789,
  "audio_id": 456,
  "role": "user",
  "content": "What were the action items?",
  "timestamp": "2025-01-14T10:35:00"
}

Notes:

Conversation must belong to the authenticated user
Updates conversation's last_message_at timestamp
Messages are ordered by timestamp
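The role field accepts only two values, and conversation_id appears in both the path and the body. A client-side check might look like this; message_call and VALID_ROLES are hypothetical names, and the empty-content check is an assumption.

```python
VALID_ROLES = ("user", "assistant")


def message_call(conversation_id, role, content, audio_id=None):
    """Return the (path, body) pair for adding a message to a conversation."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    if not content:
        raise ValueError("content is required")
    body = {"conversation_id": conversation_id, "role": role, "content": content}
    if audio_id is not None:
        # audio_id is optional, so it is only sent when provided.
        body["audio_id"] = audio_id
    return f"/conversations/{conversation_id}/messages", body


print(message_call(789, "user", "What were the action items?"))
```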

7. Get Conversation History

Retrieve all conversations for the authenticated user.

Endpoint: GET /conversations/history

Authentication: Required

Query Parameters:

limit (optional, default: 100): Maximum number of conversations to return

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history?limit=50" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

[
  {
    "id": 789,
    "fileName": "meeting_recording.mp3",
    "analyzedAt": "2025-01-14T10:35:00",
    "duration": "5 messages",
    "audioId": 456,
    "kbDocumentId": 123
  },
  {
    "id": 788,
    "fileName": "Q3 Review",
    "analyzedAt": "2025-01-13T15:20:00",
    "duration": "12 messages",
    "audioId": 455,
    "kbDocumentId": 122
  }
]

Notes:

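Returns conversations in reverse chronological order (newest first)
The "duration" field holds the message count (for example, "5 messages"), not an audio length
The "fileName" field holds either the audio filename or the KB document title
Only returns the authenticated user's own conversations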
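Since the message count arrives as text in the "duration" field, a client that needs a number has to parse it. The sketch below assumes the value always has the "N messages" shape shown in the example response; message_count is a hypothetical helper.

```python
def message_count(entry):
    """Read the leading integer from a history entry's "duration" field."""
    # Assumes the "<count> messages" format from the example response.
    return int(entry["duration"].split()[0])


history = [
    {"id": 789, "fileName": "meeting_recording.mp3", "duration": "5 messages"},
    {"id": 788, "fileName": "Q3 Review", "duration": "12 messages"},
]
print(sum(message_count(entry) for entry in history))
```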

8. Get Specific Conversation

Retrieve a specific conversation with all its messages.

Endpoint: GET /conversations/{conversation_id}

Authentication: Required

Path Parameters:

conversation_id (required): ID of the conversation to retrieve

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "id": 789,
  "user_id": 1,
  "audio_id": 456,
  "kb_document_id": 123,
  "title": "Q4 Planning Meeting",
  "started_at": "2025-01-14T10:30:00",
  "last_message_at": "2025-01-14T10:35:00",
  "audio": {
    "audio_id": 456,
    "kb_document_id": 123,
    "file_name": "meeting_recording.mp3"
  },
  "messages": [
    {
      "id": 1234,
      "role": "user",
      "content": "What were the action items?",
      "timestamp": "2025-01-14T10:35:00"
    },
    {
      "id": 1235,
      "role": "assistant",
      "content": "The action items mentioned were...",
      "timestamp": "2025-01-14T10:35:05"
    }
  ]
}

Notes:

Only the conversation owner can access it
Returns 404 if the conversation doesn't exist or access is denied
Messages are ordered chronologically
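The nested messages array can be flattened into a readable log. A minimal sketch, relying on the documented chronological ordering instead of re-sorting; render_conversation is a hypothetical name and the "Untitled conversation" fallback is an assumption.

```python
def render_conversation(conversation):
    """Format a conversation response as a title line followed by one line per message."""
    lines = [conversation.get("title") or "Untitled conversation"]
    for message in conversation.get("messages", []):
        lines.append(f"[{message['timestamp']}] {message['role']}: {message['content']}")
    return "\n".join(lines)


conversation = {
    "id": 789,
    "title": "Q4 Planning Meeting",
    "messages": [
        {"id": 1234, "role": "user", "content": "What were the action items?",
         "timestamp": "2025-01-14T10:35:00"},
        {"id": 1235, "role": "assistant", "content": "The action items mentioned were...",
         "timestamp": "2025-01-14T10:35:05"},
    ],
}
print(render_conversation(conversation))
```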

9. Delete Conversation

Delete a conversation and all its messages.

Endpoint: DELETE /conversations/{conversation_id}

Authentication: Required

Path Parameters:

conversation_id (required): ID of the conversation to delete

Request Example:

curl -X DELETE "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "success": true,
  "message": "Conversation deleted successfully"
}

Notes:

Only the conversation owner can delete it
All messages are cascade deleted
Returns 404 if the conversation doesn't exist or access is denied

Data Models

AudioInfo Structure

{
  "id": 456,
  "kb_document_id": 123,
  "file_name": "meeting_recording.mp3",
  "file_size": 5242880,
  "total_character": 15000,
  "full_text": "Complete transcription text...",
  "date_time": "2025-01-14T10:30:00",
  "user_id": 1
}

Field Descriptions:

id: Unique identifier for the audio record
kb_document_id: Reference to knowledge base document
file_name: Original audio filename
file_size: Size of transcript file in bytes
total_character: Total character count in transcript
full_text: Complete transcription text
date_time: Processing timestamp
user_id: Owner of the audio record

Conversation Structure

{
  "id": 789,
  "user_id": 1,
  "audio_id": 456,
  "kb_document_id": 123,
  "title": "Meeting Discussion",
  "started_at": "2025-01-14T10:30:00",
  "last_message_at": "2025-01-14T10:35:00",
  "is_active": true
}

Message Structure

{
  "id": 1234,
  "conversation_id": 789,
  "audio_id": 456,
  "role": "user",
  "content": "What were the action items?",
  "timestamp": "2025-01-14T10:35:00"
}
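On the client side, the Conversation structure could be mirrored with a typed class. This is a sketch under assumptions: the class and its from_json method are illustrative, the optional fields follow the "optional" markers in the create endpoint, and is_active defaults to True because the endpoint responses shown earlier omit it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Conversation:
    id: int
    user_id: int
    started_at: datetime
    last_message_at: datetime
    title: Optional[str] = None
    audio_id: Optional[int] = None
    kb_document_id: Optional[int] = None
    is_active: bool = True  # assumption: treated as active when the field is absent

    @classmethod
    def from_json(cls, data):
        """Build a Conversation from a decoded JSON object."""
        return cls(
            id=data["id"],
            user_id=data["user_id"],
            # Timestamps in the examples are ISO 8601 without a timezone.
            started_at=datetime.fromisoformat(data["started_at"]),
            last_message_at=datetime.fromisoformat(data["last_message_at"]),
            title=data.get("title"),
            audio_id=data.get("audio_id"),
            kb_document_id=data.get("kb_document_id"),
            is_active=data.get("is_active", True),
        )


conv = Conversation.from_json({
    "id": 789,
    "user_id": 1,
    "audio_id": 456,
    "kb_document_id": 123,
    "title": "Meeting Discussion",
    "started_at": "2025-01-14T10:30:00",
    "last_message_at": "2025-01-14T10:35:00",
    "is_active": True,
})
print(conv.last_message_at - conv.started_at)
```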

Error Responses

All endpoints may return the following error responses:

400 Bad Request:

{
  "detail": "kb_document_id is required."
}

403 Forbidden:

{
  "detail": "Access denied. Document 123 does not belong to user 1"
}

404 Not Found:

{
  "detail": "File not found in media directory."
}

404 Not Found (Conversation):

{
  "detail": "Conversation not found or access denied"
}

500 Internal Server Error:

{
  "detail": "Transcription failed: [error message]"
}
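Every error example carries its explanation in a "detail" field, so a client can surface that text alongside the status code. A minimal sketch; AudioAgentError and error_from_response are hypothetical names, and falling back to the raw body when it is not JSON is an assumption.

```python
import json


class AudioAgentError(Exception):
    """Error raised by a client for a non-success Audio Agent response."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def error_from_response(status, body):
    """Turn an error status and response body into an AudioAgentError."""
    try:
        detail = json.loads(body).get("detail", body)
    except (TypeError, ValueError, AttributeError):
        # Body was missing, not JSON, or not a JSON object: keep it verbatim.
        detail = body
    return AudioAgentError(status, detail)


err = error_from_response(404, '{"detail": "File not found in media directory."}')
print(err)
```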

Best Practices

Audio Quality Requirements

Format: MP3, WAV, M4A, or other common audio formats
Duration: Any length (longer files take more time to process)
Audio Quality: Clear speech, minimal background noise
Language: English (primary); other languages are supported but may be less accurate
Bitrate: 128 kbps or higher recommended

Recording Guidelines

Use a good quality microphone
Record in a quiet environment
Speak clearly and at moderate pace
Avoid overlapping speech in multi-speaker scenarios
Keep audio files under 100 MB for optimal processing

Query Best Practices

Specific Questions: Ask direct questions for precise answers
Summary Requests: Use keywords like "summary", "overview", "main points"
Full Transcript: Request "complete transcript" or "everything said"
Contextual Queries: Reference specific topics or speakers when possible
Follow-up Questions: Use conversations to maintain context

Integration Workflow

# 1. Upload audio to media directory (via your file upload system)
# Audio file: meeting_recording.mp3

# 2. Process the audio file
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "meeting_recording.mp3",
    "kb_document_id": 123
  }'

# 3. Create a conversation
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kb_document_id": 123,
    "title": "Meeting Analysis"
  }'

# 4. Ask questions about the audio
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What were the main action items?",
    "kb_document_id": 123
  }'

# 5. Retrieve conversation history
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history" \
  -H "Authorization: Bearer YOUR_TOKEN"
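The same four API steps can be sketched in Python. To keep the example runnable offline, the transport is passed in as a callable; run_workflow, the send(method, path, body) signature, and the stub_send stand-in with its canned replies are all hypothetical, not part of the API.

```python
def run_workflow(send, file_name, kb_document_id, question):
    """Run steps 2-5 above through send(method, path, body), which returns decoded JSON."""
    processed = send("POST", "/process-audio",
                     {"file_name": file_name, "kb_document_id": kb_document_id})
    conversation = send("POST", "/conversations/create",
                        {"kb_document_id": kb_document_id, "title": "Meeting Analysis"})
    answer = send("POST", "/ask_audio",
                  {"question": question, "kb_document_id": kb_document_id})
    history = send("GET", "/conversations/history", None)
    return {
        "audio_id": processed["audio_id"],
        "conversation_id": conversation["id"],
        "answer": answer["answer"],
        "history": history,
    }


calls = []


def stub_send(method, path, body):
    """Stand-in transport: records each call and returns a canned reply."""
    calls.append((method, path))
    canned = {
        "/process-audio": {"audio_id": 456},
        "/conversations/create": {"id": 789},
        "/ask_audio": {"answer": "Placeholder answer."},
        "/conversations/history": [],
    }
    return canned[path]


result = run_workflow(stub_send, "meeting_recording.mp3", 123, "What were the main action items?")
print(result)
```

In real use, send would wrap an HTTP client that adds the Authorization header and decodes the JSON response.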

Performance Considerations

Transcription Time:
- Short audio (< 5 min): 10-30 seconds
- Medium audio (5-20 min): 30-90 seconds
- Long audio (> 20 min): 1-5 minutes
Query Response Time: Typically 1-3 seconds for standard queries
Storage: Each minute of audio generates approximately 150-200 words of transcript
Caching: Processed audio is cached; use /reparse-audio to force re-processing
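The figures above can be turned into a rough planning aid, for example when choosing a client timeout. The sketch simply encodes the listed ranges; both function names are hypothetical, and the numbers are the document's estimates, not guarantees.

```python
def transcription_estimate(minutes):
    """Return the listed (low, high) transcription time in seconds for an audio length."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if minutes < 5:
        return (10, 30)
    if minutes <= 20:
        return (30, 90)
    return (60, 300)  # 1-5 minutes


def transcript_words(minutes):
    """Return the approximate (low, high) transcript word count at 150-200 words per minute."""
    return (150 * minutes, 200 * minutes)


print(transcription_estimate(12), transcript_words(12))
```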

Security Features

User Isolation: All queries are private to your account
Document Ownership: Only you can access your documents
Authentication: All endpoints except the health check require a valid API token
Conversation Privacy: Your conversations are completely private

Troubleshooting

Transcription Issues

Problem: Poor transcription quality

Cause: Background noise, unclear speech, low audio quality
Solution: Re-record with better audio quality, then re-process the file with the /reparse-audio endpoint

Problem: Transcription failed

Cause: Unsupported audio format, corrupted file, API issues
Solution: Convert to MP3/WAV, verify file integrity, check API key configuration

Query Issues

Problem: "No relevant information found"

Cause: Query doesn't match audio content, audio not processed
Solution: Verify audio was processed, rephrase query, ask for summary first

Problem: Incomplete answers

Cause: Query doesn't match available content well
Solution: Ask more specific questions, request full transcript

Performance Issues

Problem: Slow transcription

Cause: Large audio file
Solution: Split large files, retry if timeout occurs

Problem: Slow query responses

Cause: Complex query or large audio file
Solution: Use more specific queries

Limitations

Language Support: Optimized for English; other languages may have reduced accuracy
Audio Length: Very long files (> 2 hours) may have processing delays

Future Enhancements

Speaker diarization (identify different speakers)
Multi-language support with automatic detection
Real-time streaming transcription
Audio quality analysis and enhancement
Custom vocabulary and domain-specific training
Integration with video files (extract audio track)
