AI-Powered Podcast Production 2026 — From Recording to Publishing with Descript, ElevenLabs & Auphonic
Overview
Podcasting in 2026 has been transformed by AI. Work that once required expensive studio equipment, hours of manual editing, and a professional audio engineer can now be handled by three AI tools working together. This tutorial walks you through producing a complete podcast episode, from raw recording to published RSS feed, using Descript for AI-powered editing, ElevenLabs for voice cloning and dubbing, and Auphonic for broadcast-quality audio mastering. You’ll learn how to edit audio like a word processor, generate synthetic voiceovers, and automatically level your audio to professional standards. No prior podcasting or audio engineering experience is required.
Prerequisites
- A computer running macOS or Windows (16GB RAM recommended for Descript); Descript has no Linux desktop app, so on Linux use Descript in the browser
- A microphone (built-in is OK for testing, USB/XLR preferred for production)
- Descript account (free tier works; Pro tier at $24/month unlocks features)
- ElevenLabs account (free tier: 10,000 characters/month; Starter at $5/month)
- Auphonic account (free tier: 2 hours/month; Pay-as-you-go at $12/hour)
- A 5-10 minute raw audio recording (could be a solo monologue or conversation)
- Optional: hosting platform account (Buzzsprout, Transistor, or Spotify for Creators, formerly Spotify for Podcasters)
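- FFmpeg and Python 3 with the openai package (for the optional command-line steps and the show notes script in Step 6), plus Node.js 18+ if you self-host n8n in Step 8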
Step 1: Record Your Raw Audio
Before any AI magic, you need raw source material. Use any recording tool — Descript has a built-in recorder, or use QuickTime (Mac), Voice Recorder (Windows), or OBS Studio (cross-platform).
Recording best practices:
- Record in a quiet room with soft furnishings to minimize echo
- Keep your microphone 6-12 inches from your mouth
- Record at 48kHz/16-bit WAV or FLAC for best quality
- Record for 5-10 minutes — don’t worry about mistakes, filler words, or pauses
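Command-line recording with FFmpeg (optional; these commands use the macOS avfoundation input, so on Windows use -f dshow and on Linux use -f alsa or -f pulse with the matching device name):
# List audio devices and note the index of your microphone
ffmpeg -f avfoundation -list_devices true -i ""
# Record mono 48kHz/16-bit audio from audio device 0 (press q to stop)
ffmpeg -f avfoundation -i ":0" -acodec pcm_s16le -ar 48000 -ac 1 \
podcast_raw.wav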
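Before importing the file anywhere, it can help to confirm that the recording really has the format recommended above. The following is a minimal sketch using Python's standard wave module; the function names describe_wav and meets_recording_target are made up for this example, and it only handles plain PCM WAV files like the one the FFmpeg command produces.

```python
import wave


def describe_wav(path):
    """Return basic format facts for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def meets_recording_target(info, rate=48000, min_bits=16):
    """True if the file is 48 kHz and at least 16-bit, as recommended in Step 1."""
    return info["sample_rate_hz"] == rate and info["bit_depth"] >= min_bits
```

Calling describe_wav("podcast_raw.wav") returns a small dictionary you can print or pass to meets_recording_target before moving on to Step 2.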
Step 2: Import and Transcribe in Descript
Descript uses AI to transcribe your audio instantly, turning the waveform into editable text.
- Open Descript → New Project → “Import” → select your podcast_raw.wav
- Descript automatically transcribes the file. Wait 30-60 seconds for a 10-minute recording.
- Review the transcript. Descript’s speaker detection labels different voices automatically
- If speaker labels are wrong: click on a word → “Change Speaker” → assign to correct speaker
Pro tip: Click the “Transcription” menu → “Retranscribe” if you need to switch the AI model. The “Studio” model is most accurate for English podcasts.
Step 3: Edit Your Podcast Like a Word Processor
The killer feature: delete text in the transcript, and the audio is deleted too. No waveform cutting required.
Remove filler words:
- Click the “Studio” panel (wand icon) → “Remove Filler Words”
- Check: “um”, “uh”, “like”, “you know”, “actually”
- Click “Apply” — Descript removes them and uses AI to smooth the gaps
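To see why transcript-based editing works, it helps to picture the transcript as a list of words with timestamps: removing a word means cutting its time span from the audio. The sketch below illustrates that idea only and is not Descript's implementation. The (word, start, end) tuple format and the cut_ranges function are assumptions made for this example, and it handles single-word fillers only, since words such as “like” or “actually” need context to judge.

```python
FILLERS = {"um", "uh"}


def cut_ranges(words, fillers=FILLERS):
    """Return merged (start, end) time spans, in seconds, covering filler words.

    `words` is a list of (text, start_s, end_s) tuples in playback order.
    """
    cuts = []
    for text, start, end in words:
        token = text.lower().strip(".,!?")
        if token not in fillers:
            continue
        # Merge with the previous cut when the two spans touch.
        if cuts and abs(cuts[-1][1] - start) < 1e-9:
            cuts[-1] = (cuts[-1][0], end)
        else:
            cuts.append((start, end))
    return cuts


words = [
    ("So", 0.0, 0.3), ("um", 0.3, 0.6), ("uh", 0.6, 0.9),
    ("welcome", 0.9, 1.4), ("Um,", 1.4, 1.7), ("everyone", 1.7, 2.2),
]
print(cut_ranges(words))  # [(0.3, 0.9), (1.4, 1.7)]
```

An editor would then delete those spans from the waveform and smooth the joins, which is the part Descript automates.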
Cut mistakes and pauses:
- Highlight the text of a mistake in the transcript
- Press Delete — the audio is removed
- For long silences: Studio panel → “Remove Silence” → threshold at 0.5s
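The 0.5s threshold means that only quiet stretches lasting at least half a second are treated as removable silence. A rough sketch of that logic follows. It is a simplified amplitude check and not Descript's actual detector; the find_silences function, the 0.01 amplitude threshold, and the assumption that samples are floats between -1.0 and 1.0 are all illustrative.

```python
def find_silences(samples, sample_rate, threshold=0.01, min_duration=0.5):
    """Return (start_s, end_s) spans where |sample| stays below `threshold`
    for at least `min_duration` seconds."""
    spans = []
    run_start = None  # index where the current quiet run began
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / sample_rate >= min_duration:
                spans.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a quiet run that lasts until the end of the audio.
    if run_start is not None:
        if (len(samples) - run_start) / sample_rate >= min_duration:
            spans.append((run_start / sample_rate, len(samples) / sample_rate))
    return spans
```

Shorter pauses are left alone, which keeps natural breathing room between sentences.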
Rearrange sections:
- Select a paragraph in the transcript
- Cut (⌘X) and paste (⌘V) to reposition the entire audio clip
- Descript crossfades transitions automatically
Add a generated section with an AI voice:
- Place your cursor where you want new audio
- Click the “+” icon → “AI Actions” → “Generate Audio”
- Type: “Welcome back to the show. In this episode, we’re exploring how AI is reshaping…”
- Choose your voice (ElevenLabs integration if connected; otherwise Descript’s stock voices)
- Click “Generate” — synthetic audio is inserted seamlessly
Export the raw transcript for review:
- File → Export → “Transcript (TXT)” and save as episode_transcript.txt
- Share with a guest or editor for fact-checking before final polish
Step 4: Enhance with ElevenLabs AI Voice
ElevenLabs takes your podcast to the next level with voice cloning, multilingual dubbing, and ultra-realistic text-to-speech.
Option A: Voice cloning for consistent narration:
- In ElevenLabs, go to “Voice Lab” → “Voice Cloning”
- Upload a 3+ minute clean recording of your voice (no background noise)
- ElevenLabs creates a voice model (instant cloning in version 2+)
- Name it “Podcast Host Voice”
- Go to “Speech Synthesis” → select your cloned voice
- Enter any script and click “Generate”
- Download the generated audio file; with a clean training sample, the result can be hard to tell apart from your real voice
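The same generation step can be scripted against the ElevenLabs REST API instead of the web interface. The sketch below builds a text-to-speech request with Python's standard library. The helper names, the ELEVENLABS_API_KEY environment variable, and the default voice settings are assumptions for illustration, and you should check the current API reference for the models and output formats available on your plan.

```python
import json
import os
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(voice_id, text, api_key, stability=0.45, similarity_boost=0.75):
    """Build (but do not send) a text-to-speech request for a given voice."""
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(voice_id, text, out_path, api_key=None):
    """Send the request and write the returned audio bytes to out_path."""
    api_key = api_key or os.environ["ELEVENLABS_API_KEY"]  # assumed variable name
    request = build_tts_request(voice_id, text, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

A call such as synthesize("your-voice-id", "Welcome back to the show.", "intro.mp3") would save the clip locally; the voice ID placeholder stands in for the ID of your cloned voice, and the API returns MP3 unless you request another format.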
Option B: Dub your podcast into multiple languages:
- In ElevenLabs, go to “Dubbing” → “New Dubbing”
- Upload your exported Descript WAV file
- Select source language (e.g., English) and target languages (e.g., Spanish, Japanese, German)
- Speaker detection will map each voice automatically
- Click “Dub” — wait 2-5 minutes per language
- Download each dubbed track as a separate WAV file
Fix filler words at the AI level:
- ElevenLabs Pro users: Enable “Automatic Filler Word Removal” in project settings
- Set sensitivity to “High” — it catches 90%+ of umms and ahhs at this level
Step 5: Master Audio with Auphonic
Auphonic uses AI to apply broadcast-quality leveling, noise reduction, and loudness normalization.
- Go to Auphonic.com → “New Production”
- Upload your edited WAV from Descript (or the dubbed version from ElevenLabs)
- Configure these settings:
- Algorithm: “Humans” (for speech) or “Humans and Music” (if you have intro/outro music)
- Loudness target: -16 LUFS (podcast standard; -23 LUFS for broadcast)
- Leveler strength: 65% (balances volume fluctuations without sounding compressed)
- Noise reduction: Enable + set to “Medium” — removes fan hum, AC noise, traffic rumble
- Filter infrasound and ultrasound: Enable — removes sub-20Hz rumbles
- Add intro/outro music: upload your theme song, set fade-in at 0.5s, fade-out at 3s
- Click “Start Processing” — Auphonic runs AI analysis for 2-3 minutes
- Download the result: choose WAV 48kHz/24-bit for maximum quality
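If you master many episodes, the same production can be described as JSON and sent to Auphonic's API rather than configured by hand. The sketch below only builds the request body. The field names follow Auphonic's published JSON API examples as best I recall them, so treat them as assumptions and confirm them in the current API documentation. Settings such as leveler strength are left out because their API names are not confirmed here.

```python
import json


def build_production(title, loudness_target=-16, denoise=True, output_format="wav"):
    """Return a dict describing an Auphonic production (not yet submitted)."""
    return {
        "metadata": {"title": title},
        "algorithms": {
            "leveler": True,            # even out volume differences
            "normloudness": True,       # normalize to the loudness target
            "loudnesstarget": loudness_target,
            "denoise": denoise,         # background noise reduction
        },
        "output_files": [{"format": output_format}],
    }


payload = build_production("Episode 1")
print(json.dumps(payload, indent=2))
```

The resulting JSON would be posted to the productions endpoint with your account credentials, followed by an upload of the audio file and a start request.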
Verify loudness compliance:
# Measure integrated loudness and true peak with FFmpeg's ebur128 filter
ffmpeg -hide_banner -nostats -i episode_mastered.wav -af ebur128=peak=true -f null -
# In the Summary block at the end, look for "I: -16.0 LUFS" (within about ±1 LU of target) and a true peak at or below -1 dBFS
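The loudness check can also be scripted so it runs on every episode. This sketch calls FFmpeg's ebur128 filter and reads the integrated loudness from the end of its log output. The function names and the ±1 LU tolerance are choices made for this example, and the parser assumes the log format of recent FFmpeg versions, where the final "I: … LUFS" value belongs to the summary.

```python
import re
import subprocess


def parse_integrated_lufs(ffmpeg_stderr):
    """Return the last integrated loudness value (LUFS) found in ebur128 output."""
    matches = re.findall(r"\bI:\s*(-?\d+(?:\.\d+)?)\s*LUFS", ffmpeg_stderr)
    if not matches:
        raise ValueError("no integrated loudness found in FFmpeg output")
    return float(matches[-1])  # the summary is printed last


def within_target(measured, target=-16.0, tolerance=1.0):
    """True if the measured loudness is within `tolerance` LU of the target."""
    return abs(measured - target) <= tolerance


def measure_file(path):
    """Run FFmpeg on `path` and return its integrated loudness in LUFS."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
         "-af", "ebur128", "-f", "null", "-"],
        capture_output=True, text=True,
    )
    return parse_integrated_lufs(result.stderr)
```

For example, within_target(measure_file("episode_mastered.wav")) returns True when the mastered file lands between -17 and -15 LUFS.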
Step 6: Generate Show Notes and Metadata with AI
Use the transcript from Step 3 to auto-generate episode metadata.
Option: Use Descript’s AI show notes:
- In Descript, click “AI Actions” → “Generate Show Notes”
- Select format: “Timestamps + Summary + Key Takeaways”
- The AI produces a formatted document ready for your RSS feed
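Option: Use a local script with the OpenAI API (the same prompt works with other LLM APIs such as Claude):
# Requires: pip install openai, with OPENAI_API_KEY set in the environment
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load the transcript exported from Descript in Step 3
with open("episode_transcript.txt", "r", encoding="utf-8") as f:
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "system",
        "content": "Generate podcast show notes with: 1) 2-sentence summary, 2) 3 key takeaways as bullet points, 3) 5 timestamped highlights."
    }, {
        "role": "user",
        # Only the first 10,000 characters are sent; timestamps are reliable
        # only if the exported transcript itself contains them
        "content": f"Transcript:\n{transcript[:10000]}"
    }]
)

show_notes = response.choices[0].message.content
print(show_notes)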
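Sending only the first 10,000 characters means longer episodes are summarized from their opening minutes alone. One possible workaround, sketched here as an assumption and not as part of the original workflow, is to split the transcript into chunks on line boundaries, summarize each chunk, and then combine the summaries. The chunk_transcript helper is a hypothetical name.

```python
def chunk_transcript(text, max_chars=10000):
    """Split text into chunks of at most max_chars, breaking on line boundaries."""
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single overlong line is hard-split so no chunk exceeds the limit.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the same chat completion call, and the partial notes merged in a final request.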
Step 7: Upload to Your Podcast Host
- Log into your hosting platform (e.g., Buzzsprout, Transistor, Spotify for Podcasters)
- Click “New Episode”
- Upload your Auphonic-mastered episode_mastered.wav
- Paste the AI-generated show notes into the description field
- Set publish date and time
- Add episode artwork (use Canva AI or Midjourney for generated cover art)
- Click “Publish” — your hosting platform generates the RSS feed
Test the RSS feed:
# Fetch the start of your RSS feed (the URL below is a placeholder; use the feed URL your host provides)
curl -s "https://yourpodcast.libsyn.com/rss" | head -50
# Look for valid XML with <channel>, <item>, <enclosure> tags
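Reading the XML by eye works for one episode, but the same structural check is easy to script. The sketch below parses a feed with Python's standard library and reports missing elements. The check_feed function is a hypothetical helper, and it covers only the basic RSS 2.0 structure mentioned above, not the additional tags that podcast directories require.

```python
import xml.etree.ElementTree as ET


def check_feed(xml_text):
    """Return a list of problems found in an RSS 2.0 feed (empty means it looks OK)."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        return ["missing <rss>/<channel> root structure"]
    problems = []
    items = channel.findall("item")
    if not items:
        problems.append("no <item> elements")
    for index, item in enumerate(items, start=1):
        enclosure = item.find("enclosure")
        if enclosure is None:
            problems.append(f"item {index}: missing <enclosure>")
        elif not enclosure.get("url"):
            problems.append(f"item {index}: <enclosure> has no url attribute")
    return problems
```

You could feed it the output of the curl command, without the head filter so the XML is complete, and fail a publishing script whenever the returned list is not empty.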
Step 8 (Advanced): Automate the Pipeline with n8n
Set up an n8n workflow that triggers when you drop a raw WAV into a Google Drive folder:
- Trigger: Google Drive → “File Watcher” on the /Podcast/Raw/ folder
- Upload to Descript: Use Descript API → create project → import file
- Wait for transcription → use webhook to detect completion
- Export transcript: Descript API → export TXT
- Generate notes: OpenAI node → send transcript → receive show notes
- Upload to Auphonic: Auphonic API → process audio → download mastered file
- Publish: Webhook → Buzzsprout/Transistor API → create episode
Once it is running, this pipeline can cut your hands-on time per episode from 2-3 hours to roughly 15 minutes.
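If you want to prototype the trigger before building the n8n workflow, the “new file appears” logic can be tried against a local folder. This is only a stand-in for the Google Drive trigger; the new_wav_files function and the idea of tracking file names in a set are assumptions made for the sketch.

```python
from pathlib import Path


def new_wav_files(folder, seen):
    """Return WAV files in `folder` whose names are not yet in `seen`,
    and record them in `seen` so they are reported only once."""
    fresh = sorted(
        path for path in Path(folder).glob("*.wav") if path.name not in seen
    )
    seen.update(path.name for path in fresh)
    return fresh
```

Calling it on a timer with the same seen set yields each new recording exactly once, which is the behavior the rest of the pipeline depends on.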
What You’ve Built
You now have a complete, professionally produced podcast episode:
- Clean, edited audio with no filler words or awkward pauses
- Synthetic voiceovers or dubs in multiple languages
- Broadcast-quality mastering at -16 LUFS
- AI-generated show notes and metadata
- An RSS-ready episode file
The entire workflow takes under 30 minutes for a 10-minute episode, compared to 2-3 hours with traditional tools.
Troubleshooting
AI voice sounds robotic or unnatural: Ensure your ElevenLabs voice clone was trained on at least 3 minutes of clean audio. For better results, use the “Professional Voice Cloning” plan ($99/month) which supports 30+ minutes of training data. Also check that the Stability slider in ElevenLabs is set to 40-50% — too high produces monotone output.
Auphonic introduces distortion on music segments: If your podcast has intro/outro music, set Algorithm to “Humans and Music” and reduce Leveler strength to 45%. Music transitions should use short crossfades (0.5s) to avoid the leveler overcompensating on silence gaps.
Descript transcription is inaccurate for technical terms: Upload a custom vocabulary list: Descript → Project → “Custom Vocabulary” → add domain-specific terms like “LangChain”, “RAG pipeline”, “vector embedding”. The transcription engine will prioritize these terms.
Podcast audio is too quiet compared to other shows: Common issue. Your target should be -16 LUFS (integrated) with -1 dB true peak. Use the FFmpeg verification command from Step 5. If Auphonic’s output is still quiet, increase “Leveler Makeup Gain” to 60-70%.
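The arithmetic behind a loudness fix is simple: the gain you need in dB is the target minus the measured value, and the true peak rises by the same amount. The sketch below works through that with made-up measurements; the function names are illustrative.

```python
def gain_to_target(measured_lufs, target_lufs=-16.0):
    """Gain in dB that moves the measured integrated loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def peak_after_gain(true_peak_db, gain_db):
    """True peak level after applying the gain (before any limiting)."""
    return true_peak_db + gain_db


gain = gain_to_target(-22.0)           # episode measured at -22 LUFS
print(gain)                            # 6.0 dB of gain needed
print(round(db_to_linear(gain), 3))    # 1.995, roughly double the amplitude
print(peak_after_gain(-5.0, gain))     # 1.0, which is above the -1 dB ceiling
```

In this example the plain gain would push the peak past the -1 dB true peak ceiling, which is why mastering tools pair loudness normalization with a limiter.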
Dubbing produces out-of-sync audio: ElevenLabs dubbing adds some processing delay. In your podcast editor, align the dubbed track’s first word with the original’s first word, then use “Sync All” in Descript to auto-align the rest.
Next Steps
- Explore Descript’s “Eye Contact” feature for video podcasts (AI reframes your eyes to look at the camera)
- Set up ElevenLabs “Sound Effects” to add SFX from text descriptions
- Build out the automated publishing pipeline with the n8n workflow described in Step 8
- Try Auphonic’s “Multitrack Production” for interviews with remote guests
- Distribute your RSS feed to Apple Podcasts, Spotify, YouTube Music (which replaced Google Podcasts in 2024), and Amazon Music