
Audio Guide
Production Standards
Sample rates, loudness targets, dynamic range, voice recording specs, and mobile optimisation standards.
What you'll learn in this guide
−14 LUFS
Integrated loudness target for digital audio ads (streaming platform normalisation level)
Streaming Platform Loudness Normalisation (e.g. Spotify)
48 kHz
Recommended sample rate for video production audio
AES Digital Audio Standards
−60 dB
Maximum noise floor for professional voice recording
Professional Studio Standards
85%
Of social media audio is consumed on mobile phone speakers
Meta Audio Consumption Report 2024
Production Standards
Professional audio requires professional standards. This guide covers sample rates, bit depth, loudness targets for digital and broadcast, dynamic range, voice recording specifications, music licensing, and mobile optimisation.
Audio Production Specifications
| Standard | Specification | When to Use | Common Mistake |
|---|---|---|---|
| Sample Rate | 44.1 kHz (standard) · 48 kHz (video) | Always set before recording; upsampling adds nothing | Recording at 22 kHz then upsampling: adds no quality |
| Bit Depth | 16-bit minimum · 24-bit for mastering | Record in 24-bit, deliver in 16-bit or 24-bit | Recording in 8-bit for "smaller files": destroys dynamic range |
| Loudness (Digital) | −14 LUFS integrated · true peak ≤ −1 dBTP | All social media, digital ads, and web audio | Mastering to 0 dB peak: causes clipping on playback |
| Loudness (Radio/Broadcast) | −23 LUFS integrated (EBU R128) · −24 LKFS (ATSC A/85) | Radio spots and broadcast TV | Using digital loudness targets for broadcast: too loud |
| Dynamic Range | Minimum 8 LU · avoid over-compression | Music beds, voice, and mixed audio | Compressing everything to 2 LU: sounds fatiguing and unnatural |
| Voice Recording | Professional studio · noise floor −60 dB or lower | All voiceover work, narration, and dialogue | Recording in untreated rooms: echo and room noise |
| Music Licensing | Clear for territory, platform, and duration | Before any deployment, without exception | Using "royalty-free" music without reading the licence terms |
| AI Voice Quality | Review at 1.25× and 0.75× for artifacts | Every AI-generated voiceover before deployment | Approving AI voice at normal speed only: misses subtle glitches |
| Disclaimer Audio | Audibly distinct · unhurried · comparable volume | Legal disclaimers, terms, and regulatory notices | Speed-reading disclaimers at low volume: regulatory violation |
| Mobile Optimisation | Test on earbuds, phone speaker, and in a noisy environment | Every final audio asset before campaign launch | Only testing on studio monitors: 85% of the audience uses phone speakers |
Sample Rate and Bit Depth: Getting the Foundation Right
Sample rate and bit depth are the two foundational settings that determine the quality ceiling of your entire audio production. Getting them wrong at the start cannot be fixed later: no amount of post-production can add quality that was never captured.
Sample Rate: How Many Snapshots Per Second
Sample rate determines how many times per second the audio signal is captured. A recording can only hold frequencies up to half its sample rate (the Nyquist frequency), so higher sample rates capture higher frequencies.
44.1 kHz (Standard Digital Audio) The CD-quality standard. Captures frequencies up to 22.05 kHz, just above the upper limit of human hearing (roughly 20 Hz – 20 kHz). Use this for:
- Standalone audio ads (no video)
- Podcast ads and audio-only content
- Music tracks and jingles for digital distribution
48 kHz (Video Production Standard) The standard for any audio that will be paired with video. All major video editing software (Premiere Pro, DaVinci Resolve, Final Cut) defaults to 48 kHz. Use this for:
- Social media video ads (Instagram Reels, TikTok, YouTube)
- Television and broadcast spots
- Any audio that will be muxed into a video container
Critical Rule: Upsampling Adds No Quality If you record at 44.1 kHz and need 48 kHz for video, converting up does not add real frequency information: the converter only interpolates new samples from the existing ones, and nothing above the original 22.05 kHz limit is recovered. Always record at the target sample rate or higher. If you have 44.1 kHz audio that must go into a 48 kHz video project, use a professional sample-rate converter (not a simple resample) and accept the minor quality trade-off.
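The arithmetic behind this rule can be sketched in a few lines. The Python below (function names are illustrative, not taken from any tool) computes the Nyquist ceiling and the reduced up/down factors for a rational conversion; 44.1 kHz to 48 kHz works out to 160/147, the kind of ratio a polyphase converter such as SciPy's `scipy.signal.resample_poly(x, up, down)` takes. It also shows that the usable frequency ceiling after upsampling is still set by the original recording.

```python
from fractions import Fraction


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2


def resample_factors(src_hz: int, dst_hz: int) -> tuple[int, int]:
    """Smallest integer (up, down) factors for converting src_hz to dst_hz."""
    ratio = Fraction(dst_hz, src_hz)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator


def content_ceiling_hz(recorded_hz: int, delivered_hz: int) -> float:
    """Upsampling cannot raise the ceiling above the original Nyquist limit."""
    return min(nyquist_hz(recorded_hz), nyquist_hz(delivered_hz))


up, down = resample_factors(44_100, 48_000)
print(up, down)                            # 160 147
print(content_ceiling_hz(44_100, 48_000))  # 22050.0
print(content_ceiling_hz(22_050, 44_100))  # 11025.0
```

The last line is the "record at 22 kHz, then upsample" mistake from the table: the file becomes 44.1 kHz, but it still holds nothing above 11.025 kHz.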
Bit Depth: Dynamic Range Resolution
Bit depth determines how many discrete volume levels can be represented. Higher bit depth = more dynamic range = quieter noise floor.
16-bit: 96 dB dynamic range. Sufficient for final delivery. CD-quality.
24-bit: 144 dB dynamic range. Record and master in 24-bit. The extra headroom means you can record at conservative levels without losing detail, and quiet passages stay clean. Final delivery can be dithered down to 16-bit if needed.
Never record in 8-bit. 8-bit audio has only 48 dB of dynamic range, so you will hear quantisation noise (a granular, crunchy texture) on every quiet passage. This is not a "lo-fi aesthetic"; it is unusable for professional advertising.
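The dynamic-range figures above follow from the number of quantisation steps: each extra bit doubles them, adding about 6.02 dB. A quick check in Python, giving the theoretical figure only (it ignores dither and real converter noise):

```python
import math


def quantisation_dynamic_range_db(bits: int) -> float:
    """Full scale relative to one quantisation step: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 16, 24):
    print(bits, round(quantisation_dynamic_range_db(bits), 1))
# 8 48.2
# 16 96.3
# 24 144.5
```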
Loudness Standards: Digital vs. Broadcast
Incorrect loudness is the single most common production mistake in audio advertising. Too loud, and the audio clips, distorts, and fatigues the listener. Too quiet, and the ad disappears into the background. Getting loudness right requires understanding the difference between peak level and integrated loudness.
Peak Level vs. Integrated Loudness
Peak level is the loudest instantaneous moment in the audio. It is measured in dBFS (decibels relative to full scale). A peak of 0 dBFS means the audio has reached the absolute maximum β any louder and it clips (distorts).
Integrated loudness (LUFS: Loudness Units relative to Full Scale) is the average perceived loudness over the entire duration of the audio, measured with the frequency weighting and gating defined in ITU-R BS.1770. This is what matters for listener experience. Two ads can have the same peak level but wildly different integrated loudness; one may sound twice as loud as the other.
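To see why peak level says little about loudness, here is a sketch with two hypothetical test signals that share the same −1 dBFS peak. It uses plain, unweighted RMS as a stand-in for average level; a real LUFS meter adds the K-weighting filter and gating defined in ITU-R BS.1770, so these numbers are not LUFS readings.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Sample peak relative to full scale (1.0 = 0 dBFS)."""
    return 20 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples: list[float]) -> float:
    """Unweighted, ungated RMS level: a rough stand-in for average loudness."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


rate = 48_000
amp = 10 ** (-1 / 20)  # -1 dBFS peak amplitude
# One second of a steady 1 kHz tone.
steady = [amp * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
# The same tone, audible only for the first 10% of the second.
burst = [s if n < rate // 10 else 0.0 for n, s in enumerate(steady)]

for name, sig in (("steady", steady), ("burst", burst)):
    print(name, round(peak_dbfs(sig), 1), round(rms_dbfs(sig), 1))
# steady -1.0 -4.0
# burst -1.0 -14.0
```

Both signals would pass a peak check identically, yet their average levels differ by 10 dB.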
Digital Ads: −14 LUFS Integrated
This is the target for all social media platforms, digital display ads, and web-based audio. Why −14 LUFS?
- Spotify normalises to −14 LUFS. YouTube normalises to −14 LUFS. If your audio is louder, the platform turns it down automatically, so the extra loudness buys nothing, and a heavily limited master often sounds flatter than one mastered to the target
- −14 LUFS gives your audio breathing room: there is space for dynamic variation, for moments of emphasis, for emotional contrast
- Peak should not exceed −1 dBTP (true peak). This leaves a 1 dB safety margin to prevent clipping during lossy codec conversion (MP3, AAC)
Radio and Broadcast: −23 LUFS Integrated
Broadcast standards target far lower loudness: EBU R128 (Europe) specifies −23 LUFS and ATSC A/85 (North America) specifies −24 LKFS. This is significantly quieter than digital. If you submit a −14 LUFS ad for broadcast, it will be rejected or the broadcaster will compress it aggressively, destroying your careful production.
- Radio spots: master to −23 LUFS integrated
- Podcast pre-rolls distributed via RSS: target −16 to −18 LUFS (closer to podcast norms)
- Television spots: −23 LUFS to −24 LUFS per broadcaster requirements
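Once a meter reports integrated loudness and true peak, moving a master to a different target is mostly arithmetic. A minimal sketch, assuming a simple static gain change with no re-compression; the dictionary keys and function names are illustrative:

```python
# Integrated-loudness targets quoted in this guide (LUFS / LKFS).
TARGETS_LUFS = {
    "digital": -14.0,
    "radio_ebu_r128": -23.0,
    "tv_atsc_a85": -24.0,
    "podcast_preroll": -16.0,  # the guide allows -16 to -18
}
MAX_TRUE_PEAK_DBTP = -1.0


def gain_to_target(measured_lufs: float, target_lufs: float) -> float:
    """Static gain (dB) that moves integrated loudness onto the target."""
    return target_lufs - measured_lufs


def plan_delivery(measured_lufs: float, true_peak_dbtp: float, target: str) -> dict:
    gain_db = gain_to_target(measured_lufs, TARGETS_LUFS[target])
    # A static gain change shifts the true peak by the same number of dB.
    peak_after = true_peak_dbtp + gain_db
    return {
        "gain_db": gain_db,
        "true_peak_after_dbtp": peak_after,
        # Turning a quiet master up can push peaks past the ceiling;
        # that needs a true-peak limiter, not just more gain.
        "needs_limiter": peak_after > MAX_TRUE_PEAK_DBTP,
    }


# Digital master reused for radio: gain -9.0 dB, peak lands at -10.5 dBTP.
print(plan_delivery(-14.0, -1.5, "radio_ebu_r128"))
# Quiet master turned up for digital: gain +4.0 dB, peak at +1.0 dBTP, limiter needed.
print(plan_delivery(-18.0, -3.0, "digital"))
```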
Platform-Specific Loudness Quirks:
- TikTok: No official normalisation standard. Creators trend louder. Target −12 to −14 LUFS for TikTok specifically
- Instagram Reels: Follows Meta normalisation (−14 LUFS)
- YouTube Shorts: Normalised to −14 LUFS. Ads louder than −14 LUFS are automatically turned down
- Twitter/X Spaces: Live audio with no normalisation; master to −14 LUFS for consistency
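Why a louder master gains nothing on a normalising platform can be sketched as follows. This assumes a turn-down-only normaliser with a −14 LUFS reference, which is a simplification: platform behaviour differs, and some platforms also raise quiet content.

```python
def playback_gain_db(integrated_lufs: float, reference_lufs: float = -14.0) -> float:
    """Gain applied by a simplified normaliser that only ever turns content down."""
    return min(0.0, reference_lufs - integrated_lufs)


for master_lufs in (-8.0, -11.0, -14.0, -17.0):
    gain = playback_gain_db(master_lufs)
    print(f"{master_lufs:+.1f} LUFS master -> {gain:+.1f} dB at playback")
```

A −8 LUFS master is simply played back 6 dB quieter, ending up at the same level as a −14 LUFS master but with less dynamic range left.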
How to Measure: Use a loudness meter plugin in your DAW (iZotope Insight, Waves WLM Plus, or Youlean Loudness Meter, which has a free version). Measure integrated loudness over the full duration of the ad, not just a section. ZorgSocial Audio Export includes a built-in loudness readout that flags non-compliant files before upload.
Dynamic Range and Voice Recording Specifications
Dynamic range is the difference between the quietest and loudest parts of your audio. It is what makes audio sound natural, engaging, and professional; without it, audio sounds flat, fatiguing, and amateur.
Why Dynamic Range Matters for Ads
Over-compressed audio (where the dynamic range is crushed to 2–3 LU) sounds aggressive, fatiguing, and cheap. It is the sonic equivalent of an all-caps email: it may grab attention for a moment, but listeners instinctively pull away. The "loudness war" approach that dominated music production in the 2000s does not work in advertising.
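The 8 LU Minimum Rule: Maintain at least 8 LU (Loudness Units) of dynamic range in your final master, measured as Loudness Range (LRA, defined in EBU Tech 3342): the spread between the softer and louder passages, taken from the 10th to the 95th percentile of gated short-term loudness. Treat 8 LU as the default; the short-form and noisy-environment exceptions below allow slightly less. This gives your audio: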
- Natural-sounding voice dynamics (emphasis words are louder, transitions are softer)
- Musical breathing room (crescendos and decrescendos that create emotional movement)
- Listener comfort (the ear does not fatigue from constant loudness)
When More Dynamic Range Is Better:
- Storytelling ads and brand films: 10–14 LU for cinematic feel
- Podcast ads and conversational formats: 8–10 LU for natural speech
- Music-forward ads: 8–12 LU to let the music breathe
When Less Dynamic Range Is Acceptable:
- Short-form social ads (6–15 seconds): 6–8 LU, since there is less time for dynamics to develop
- Noisy-environment ads (transit, outdoor): 6–8 LU, since they need to cut through ambient noise
- Direct-response radio spots: 6–8 LU, since clarity and consistency matter more than nuance
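Loudness Range (LRA, EBU Tech 3342) is the standard way to express dynamic range in LU, and it can be sketched from a list of short-term (3-second) loudness readings, such as values logged from a loudness meter. This is a simplified reading of Tech 3342 (absolute gate at −70 LUFS, relative gate 20 LU below the energy average, then the spread between the 10th and 95th percentiles); the percentile interpolation here is an assumption, so use a compliant meter for delivery figures.

```python
import math


def _energy_mean_lufs(values: list[float]) -> float:
    """Average loudness values in the power domain, not arithmetically."""
    mean_power = sum(10 ** (v / 10) for v in values) / len(values)
    return 10 * math.log10(mean_power)


def _percentile(sorted_vals: list[float], p: float) -> float:
    """Linearly interpolated percentile of an ascending list, p in [0, 100]."""
    k = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def loudness_range_lu(short_term_lufs: list[float]) -> float:
    """Simplified LRA: gate the readings, then 95th minus 10th percentile."""
    above_abs_gate = [v for v in short_term_lufs if v > -70.0]
    if not above_abs_gate:
        return 0.0
    relative_gate = _energy_mean_lufs(above_abs_gate) - 20.0
    gated = sorted(v for v in above_abs_gate if v > relative_gate)
    return _percentile(gated, 95) - _percentile(gated, 10)


# Hypothetical readings: silence (-80), quiet passages (-20), loud passages (-10).
readings = [-80.0] + [-20.0] * 5 + [-10.0] * 5
lra = loudness_range_lu(readings)
print(lra, "LU:", "meets 8 LU default" if lra >= 8.0 else "below 8 LU default")
```

The silent reading is dropped by the absolute gate, so it does not inflate the range; compare the result against the format ranges above rather than a single fixed number.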
Voice Recording Specifications:
Voiceover is the most critical audio element in most ads. Poor voice recording ruins everything downstream.
Professional Studio Requirements:
- Noise floor: −60 dB or lower. This means you should hear NOTHING when the voice talent is silent: no air-conditioning hum, no computer fan, no traffic rumble
- Room treatment: acoustic panels or foam to eliminate reflections and echo. A voice recorded in an untreated room has a "bathroom" quality that screams amateur
- Microphone: Large-diaphragm condenser for studio (Neumann U87, AKG C414) or dynamic (Shure SM7B) for less-treated spaces. USB microphones are not acceptable for professional advertising
- Pop filter: Always. Plosives (P, B, T sounds) create low-frequency bursts that distort the recording
- Distance: 15–20 cm from the microphone. Too close creates proximity effect (boomy bass); too far captures too much room sound
Remote Recording Specifications: When talent cannot come to a studio, remote recording is acceptable IF:
- Talent uses a professional microphone (not laptop mic or phone)
- Room is treated (closet recording is better than an open room)
- Noise floor is verified before recording (send a 10-second silence test)
- Audio is recorded locally, not through a video-call codec (Zoom/Teams audio compression destroys quality)
- ZorgSocial Remote Session Manager handles real-time monitoring and local recording sync
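Checking the 10-second silence test against the −60 dB noise-floor spec can be automated once the clip is decoded to floating-point samples (full scale = 1.0). A minimal sketch, assuming the spec means unweighted RMS in dBFS; some studios quote A-weighted figures instead, and the test clips below are hypothetical:

```python
import math


def noise_floor_dbfs(silence: list[float]) -> float:
    """Unweighted RMS level of a room-tone clip, in dBFS (1.0 = full scale)."""
    mean_square = sum(s * s for s in silence) / len(silence)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def passes_noise_floor(silence: list[float], limit_dbfs: float = -60.0) -> bool:
    return noise_floor_dbfs(silence) <= limit_dbfs


# Hypothetical 10-second clips at 48 kHz: a treated room and a room with fan noise.
quiet_room = [0.0005, -0.0005] * 240_000  # about -66 dBFS
noisy_room = [0.002, -0.002] * 240_000    # about -54 dBFS
print(passes_noise_floor(quiet_room), passes_noise_floor(noisy_room))  # True False
```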
Music Licensing: Legal Compliance for Audio Ads
Music licensing is the area where brands most frequently make costly mistakes. Using unlicensed music in advertising is not a minor oversight; it is a legal liability that can result in takedowns, lawsuits, and six-figure penalties. Every piece of music in your ad must be cleared before deployment.
The Three Rights You Need:
1. Synchronisation Rights (Sync Rights) The right to pair music with visual content (video ads). This is obtained from the music publisher (who represents the songwriter). Without sync rights, your video ad is infringing copyright the moment it goes live.
2. Master Recording Rights The right to use a specific recording of a song. This is obtained from the record label (who owns the master recording). A sync licence alone is not enough β you need both sync rights AND master rights to use a commercial recording.
3. Performance Rights The right for the music to be "performed" publicly (which includes streaming, broadcasting, and public playback). This is typically handled through performing rights organisations (PROs) like ASCAP, BMI, PRS, or SACEM. In many cases, the platform (YouTube, Spotify) covers performance rights through blanket licences.
"Royalty-Free" Does Not Mean "Free"
This is the most dangerous misconception. Royalty-free music means you pay a one-time licence fee (instead of ongoing royalties), but:
- You must still purchase the licence
- The licence has territorial restrictions (it may not cover GCC countries)
- The licence has usage restrictions ("social media" may not include paid ads)
- The licence has duration restrictions (1 year? Perpetual? Check the fine print)
- Multiple advertisers may use the same track, so your "unique" jingle may appear in a competitor's ad
AI-Generated Music Licensing: Music generated by AI tools (including ZorgSocial Music Generator) has a clearer licensing path:
- ZorgSocial-generated music: full commercial rights included in your subscription. No additional licensing needed. Clear for all territories and platforms
- Third-party AI music (Suno, Udio, etc.): check the platform terms; some restrict commercial advertising use even on paid plans
- Training on licensed data reduces, but does not by itself eliminate, the risk that AI-generated music resembles existing copyrighted works; this is an evolving legal area. ZorgSocial Music Generator is trained exclusively on licensed datasets
MENA-Specific Licensing Considerations:
- Saudi Arabia and UAE have active copyright enforcement; do not assume "it is not enforced in this region"
- Arabic music rights are often fragmented across multiple publishers and collecting societies
- Traditional Arabic maqam compositions may be in public domain, but specific recordings are not
- Work with a licensing specialist who understands GCC copyright law when using regional Arabic music
The Licensing Checklist (Before Every Deployment):
- Territory: Is the licence valid for every country where the ad will run?
- Platform: Does the licence cover the specific platforms (paid social, programmatic display, streaming audio)?
- Duration: How long can you use the music? Does the licence expire?
- Exclusivity: Is the track exclusive to you, or can competitors use the same music?
- Modifications: Can you edit, remix, or shorten the track? Some licences prohibit modification
- Attribution: Does the licence require a credit or attribution? (Difficult to provide in a 15-second ad)
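The checklist maps naturally onto a small data structure that a campaign team could fill in per track. The sketch below is hypothetical: the field names, territory codes, and platform labels are illustrative, and it only flags mismatches in the terms you have recorded. It is no substitute for reading the licence itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MusicLicence:
    territories: set[str]        # e.g. {"AE", "SA"}
    platforms: set[str]          # e.g. {"paid_social", "streaming_audio"}
    expires: Optional[date]      # None means perpetual
    exclusive: bool
    modifications_allowed: bool
    attribution_required: bool


@dataclass
class Deployment:
    territories: set[str]
    platforms: set[str]
    end_date: date
    needs_edit: bool             # e.g. cutting the track down to 15 seconds


def licence_problems(lic: MusicLicence, dep: Deployment) -> list[str]:
    problems = []
    if missing := dep.territories - lic.territories:
        problems.append(f"territory not covered: {sorted(missing)}")
    if missing := dep.platforms - lic.platforms:
        problems.append(f"platform not covered: {sorted(missing)}")
    if lic.expires is not None and lic.expires < dep.end_date:
        problems.append(f"licence expires {lic.expires} before campaign ends {dep.end_date}")
    if dep.needs_edit and not lic.modifications_allowed:
        problems.append("licence prohibits modifying the track")
    if not lic.exclusive:
        problems.append("non-exclusive: competitors may use the same track")
    if lic.attribution_required:
        problems.append("attribution required: plan where the credit appears")
    return problems


lic = MusicLicence({"AE"}, {"organic_social"}, date(2025, 6, 30),
                   exclusive=False, modifications_allowed=False, attribution_required=False)
dep = Deployment({"AE", "SA"}, {"paid_social"}, date(2025, 12, 31), needs_edit=True)
for problem in licence_problems(lic, dep):
    print("-", problem)
```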
AI Voice Quality Assurance
AI-generated voiceover is increasingly indistinguishable from a human voice, but only when properly quality-checked. The most common failure mode is approving AI voice at normal playback speed, where subtle artifacts are hard to detect, and then having listeners notice them in headphones or on repeat listens.
The 1.25× / 0.75× Review Method:
Before approving any AI-generated voiceover:
Step 1: Listen at 1.25× speed. Speeding up the audio exposes timing artifacts: unnatural pauses, robotic rhythm, inconsistent pacing between sentences. If the voice sounds "off" at 1.25×, listeners will subconsciously feel it at normal speed.
Step 2: Listen at 0.75× speed. Slowing down the audio exposes tonal artifacts: metallic undertones, unnatural formant transitions, breathing that sounds synthesised. Slow playback makes subtle glitches obvious.
Step 3: Listen on phone speakers. Phone speakers have a limited frequency range and emphasise the midrange frequencies where AI artifacts tend to live. An AI voice that sounds perfect on studio monitors may sound robotic on a phone speaker, and 85% of your audience will hear it on phone speakers.
Step 4: Listen in a noisy environment. Play the audio in a coffee shop, on a busy street, or with a fan running. Background noise masks some frequencies and unmasks others, so artifacts that were hidden in quiet listening may become prominent.
Common AI Voice Artifacts to Listen For:
- Metallic resonance: A slight ringing or tinny quality, especially on sibilants (S, SH sounds)
- Breathing inconsistency: Either no breathing at all (uncanny) or breathing that sounds pasted in (rhythmically wrong)
- Emphasis errors: Wrong words emphasised in a sentence, or flat delivery on words that should carry emotional weight
- Formant shifts: The voice subtly changes character mid-sentence, sounding like a different person for a fraction of a second
- Plosive handling: P, B, T sounds that either pop excessively or are unnaturally soft
- Sentence boundary artifacts: Slight click or silence gap at the transition between sentences
AI Voice in Arabic β Special Considerations:
Arabic AI voice presents unique quality challenges:
- Tashkeel errors: Incorrect vowelisation changes meaning. "كَتَبَ" (kataba, "wrote") vs. "كُتُب" (kutub, "books"): the AI must handle diacritics correctly
- Dialect consistency: If the voice starts in Gulf Arabic, it must not drift into Egyptian or Levantine mid-sentence
- Emphatic consonants: Arabic emphatic consonants (ص، ض، ط، ظ) require distinct pharyngealisation that AI sometimes softens
- Connected speech: Arabic connected speech (liaison) patterns differ from English, so AI trained primarily on English may produce unnatural Arabic prosody
When to Use Human Voice Instead:
- Emotional storytelling that requires genuine feeling
- Brand spokesperson roles where authenticity is critical
- Regulatory disclaimers in some jurisdictions (check local requirements)
- Premium brand campaigns where any perception of "AI" would damage brand equity
Disclaimer Audio and Mobile Optimisation
Two areas are consistently overlooked in audio production, and both can sink an otherwise excellent campaign: regulatory disclaimer audio that violates compliance rules, and audio that sounds great in the studio but terrible on phone speakers.
Disclaimer Audio Standards:
Regulatory disclaimers in audio ads are legally required in many industries and jurisdictions. The rules are specific and enforcement is increasing.
Key Requirements:
- Audibly distinct: The disclaimer must be clearly distinguishable from the main ad content. It cannot be buried under music or sound effects
- Unhurried delivery: Speed-reading disclaimers is a regulatory red flag. The disclaimer must be delivered at a natural, comprehensible pace, typically no faster than 160 words per minute (vs. 140–150 WPM for the main ad content)
- Comparable volume: The disclaimer must be at a volume comparable to the main ad content. Dropping disclaimer volume by 6 dB or more (roughly halving its amplitude) is a common violation
- Full duration: Do not truncate disclaimers to fit time constraints. If the disclaimer does not fit, the ad is too long: shorten the ad content, not the disclaimer
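The pace and volume rules above reduce to a few numbers that can be checked per disclaimer. A minimal sketch, assuming you already have the disclaimer's word count and duration plus integrated loudness for both the disclaimer segment and the main content; the thresholds (160 WPM, 6 dB) come from this guide, not from any specific regulator, and the function names are illustrative:

```python
def words_per_minute(word_count: int, duration_s: float) -> float:
    return word_count / (duration_s / 60)


def min_disclaimer_seconds(word_count: int, max_wpm: float = 160.0) -> float:
    """Shortest duration that keeps the disclaimer at or below max_wpm."""
    return word_count / max_wpm * 60


def disclaimer_issues(words: int, seconds: float, disclaimer_lufs: float,
                      main_lufs: float, max_wpm: float = 160.0,
                      max_drop_db: float = 6.0) -> list[str]:
    issues = []
    wpm = words_per_minute(words, seconds)
    if wpm > max_wpm:
        needed = min_disclaimer_seconds(words, max_wpm)
        issues.append(f"too fast: {wpm:.0f} WPM (needs {needed:.2f} s)")
    drop_db = main_lufs - disclaimer_lufs
    if drop_db >= max_drop_db:
        issues.append(f"too quiet: {drop_db:.1f} dB below main content")
    return issues


# A 30-word disclaimer squeezed into 9 seconds, mixed 8 dB under the main content.
print(disclaimer_issues(30, 9.0, -22.0, -14.0))
# ['too fast: 200 WPM (needs 11.25 s)', 'too quiet: 8.0 dB below main content']
```

`min_disclaimer_seconds` also covers the full-duration rule: if the ad cannot spare that many seconds, the main content has to shrink.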
Industry-Specific Disclaimer Rules:
- Financial services (GCC): Central Bank of UAE (CBUAE) and Saudi Central Bank (SAMA) require specific risk disclosures. In audio, these must be read verbatim, not paraphrased
- Healthcare/Pharma: Side effects and contraindications must be delivered at the same pace and volume as efficacy claims. Rushed disclaimer reading is an FDA/DOH violation
- Real estate: "Prices subject to change" and regulatory authority licence numbers must be included in full
- Food and beverage: Health claims require qualifying statements delivered at comparable prominence
ZorgSocial Compliance Checker for Audio: The Audio Compliance Checker automatically analyses disclaimer segments for:
- WPM (words per minute) comparison vs. main content
- Volume level comparison (LUFS differential)
- Duration adequacy relative to word count
- Language verification (Arabic vs. English disclaimer matches the ad language)
Mobile Optimisation: The 85% Rule
85% of social media audio is consumed on mobile phone speakers, not headphones, studio monitors, or car stereos. If your audio does not sound good on a phone speaker, it does not sound good to your audience.
Phone Speaker Limitations:
- Frequency range: Most phone speakers reproduce 200 Hz – 15 kHz effectively. Below 200 Hz (bass) is essentially inaudible; above 15 kHz (air/shimmer) is reduced
- This means: bass-heavy music beds disappear. Sub-bass rumble that sounded powerful in the studio is simply absent on phone speakers
- Voice clarity is paramount: phone speakers reproduce the 1–4 kHz range (voice presence frequencies) well, so voice-forward mixes translate best
The Three-Speaker Test: Before any audio goes live, test on three devices:
1. Studio monitors or quality headphones: this is your reference. The audio should sound excellent here
2. Phone speaker (held at arm's length): simulate the most common listening scenario. Can you understand every word of the voiceover? Does the music bed muddy the voice? Is the CTA clear?
3. Phone speaker in a noisy environment: play the audio on your phone while standing near a busy road, in a coffee shop, or with a TV on in the background. This is how most of your audience will hear it. If the key message and CTA are not clear in this scenario, the audio needs remixing
Mobile Mix Tips:
- Boost voice presence (2–4 kHz) by 1–2 dB for phone speaker clarity
- Roll off the music bed below 150 Hz: it adds nothing on phone speakers and muddies the voice
- Use sidechain compression to duck the music bed whenever voice is active
- Keep the voice 6–8 dB above the music bed (not 3–4 dB as in a music mix)
- Test mono compatibility: many phone speakers play in mono, and stereo effects that cancel in mono will disappear
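Two of these tips, mono compatibility and the voice-over-bed margin, can be spot-checked numerically. A minimal sketch, assuming float sample lists and separate voice and music stems; it uses unweighted RMS as a rough stand-in for loudness, and the function names are illustrative:

```python
import math


def rms_db(samples: list[float]) -> float:
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def phase_correlation(left: list[float], right: list[float]) -> float:
    """+1: identical channels (mono-safe); 0: unrelated; -1: cancels when summed."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0


def mono_fold_change_db(left: list[float], right: list[float]) -> float:
    """Level of the (L + R) / 2 mono fold-down relative to the average channel level."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    channel_avg_ms = (sum(l * l for l in left) + sum(r * r for r in right)) / (2 * len(left))
    if channel_avg_ms == 0:
        return 0.0
    return rms_db(mono) - 10 * math.log10(channel_avg_ms)


def voice_over_bed_db(voice: list[float], bed: list[float]) -> float:
    return rms_db(voice) - rms_db(bed)


left = [0.5, -0.25, 0.1, -0.4]
inverted = [-s for s in left]  # a polarity-flipped "stereo widening" effect
print(round(phase_correlation(left, inverted), 3))  # -1.0
print(mono_fold_change_db(left, inverted))          # -inf: it vanishes in mono
print(voice_over_bed_db([0.5, -0.5], [0.25, -0.25]) >= 6.0)  # True (about 6.02 dB)
```

Correlation readings near −1, or a large negative fold-down change, point to elements that will disappear on a mono phone speaker.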
Production Quality Checklist: Pre-Launch Validation
Before any audio asset goes live, run it through this comprehensive production checklist. These checks take 10 minutes and prevent costly mistakes that are embarrassing to fix after launch.
Technical Checks:
- Sample rate matches target: 44.1 kHz (audio-only) or 48 kHz (video)
- Bit depth: 16-bit or 24-bit (never 8-bit)
- Integrated loudness: −14 LUFS (digital) or −23 LUFS (broadcast)
- True peak: does not exceed −1 dBTP
- Dynamic range: minimum 8 LU
- No clipping or distortion at any point in the audio
- Fade in/out at start and end (no abrupt cuts that cause pops)
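The measurable items in the Technical Checks list can be turned into a pass/fail report. A sketch, assuming the measurements are already available from your meter or file headers; the `AudioSpec` record and the ±1 LU loudness tolerance are assumptions, not part of any tool. Clipping and fade checks still need ears or a waveform view.

```python
from dataclasses import dataclass


@dataclass
class AudioSpec:
    sample_rate_hz: int
    bit_depth: int
    integrated_lufs: float
    true_peak_dbtp: float
    loudness_range_lu: float
    for_video: bool
    broadcast: bool


def technical_failures(a: AudioSpec, tolerance_lu: float = 1.0,
                       min_lra_lu: float = 8.0) -> list[str]:
    fails = []
    expected_rate = 48_000 if a.for_video else 44_100
    if a.sample_rate_hz != expected_rate:
        fails.append(f"sample rate {a.sample_rate_hz} Hz, expected {expected_rate} Hz")
    if a.bit_depth not in (16, 24):
        fails.append(f"bit depth {a.bit_depth}-bit, expected 16 or 24")
    target = -23.0 if a.broadcast else -14.0
    if abs(a.integrated_lufs - target) > tolerance_lu:
        fails.append(f"integrated loudness {a.integrated_lufs} LUFS, target {target} LUFS")
    if a.true_peak_dbtp > -1.0:
        fails.append(f"true peak {a.true_peak_dbtp} dBTP is above -1 dBTP")
    if a.loudness_range_lu < min_lra_lu:
        fails.append(f"loudness range {a.loudness_range_lu} LU is below {min_lra_lu} LU")
    return fails


good = AudioSpec(48_000, 24, -14.2, -1.3, 9.0, for_video=True, broadcast=False)
bad = AudioSpec(44_100, 8, -9.0, 0.2, 3.0, for_video=True, broadcast=False)
print(technical_failures(good))  # []
for failure in technical_failures(bad):
    print("FAIL:", failure)
```

For short-form social cuts, where 6–8 LU is acceptable, pass `min_lra_lu=6.0`.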
Voice Checks:
- Noise floor: −60 dB or lower (listen to silence sections)
- No mouth clicks, lip smacks, or excessive sibilance
- Pronunciation is correct for all brand names, product names, and technical terms
- Arabic diacritics/tashkeel are correct (for Arabic voiceover)
- Pacing is natural and consistent throughout
- Emphasis on the right words, especially the CTA
Music and SFX Checks:
- Music is properly licensed for territory, platform, and duration
- Music bed does not overpower the voice at any point
- SFX are appropriate volume (complement, not distract)
- No copyright-infringing musical references or samples
- Sonic logo is present and correctly placed (if applicable)
Compliance Checks:
- Disclaimer audio meets regulatory standards (WPM, volume, clarity)
- All required legal disclosures are present and complete
- Language matches target market requirements
- No misleading audio effects (e.g., fake urgency sounds for non-urgent offers)
Mobile Readiness:
- Tested on phone speaker at arm's length
- Voice is intelligible in noisy environment
- Bass-heavy elements are not relied upon for meaning
- Mono compatibility verified (no cancellation issues)
File Delivery:
- File format: WAV (uncompressed) for master, MP3/AAC (320 kbps) for delivery
- File naming convention followed (Brand_Campaign_Version_Date)
- Metadata includes copyright, usage rights, and expiry date
- Backup master file archived in brand asset library
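A naming convention is easiest to enforce with a pattern check at upload time. The sketch assumes one possible reading of Brand_Campaign_Version_Date (alphanumeric brand and campaign, a version such as v2, an ISO date, and a .wav, .mp3, or .m4a extension); adjust the pattern to your own convention.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical interpretation of Brand_Campaign_Version_Date.
NAME_PATTERN = re.compile(
    r"^(?P<brand>[A-Za-z0-9]+)_(?P<campaign>[A-Za-z0-9]+)"
    r"_(?P<version>v\d+)_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>wav|mp3|m4a)$"
)


def parse_delivery_name(filename: str) -> Optional[dict]:
    """Return the name's parts, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    try:
        datetime.strptime(match["date"], "%Y-%m-%d")  # rejects e.g. 2025-02-30
    except ValueError:
        return None
    return match.groupdict()


print(parse_delivery_name("Acme_Ramadan_v2_2025-03-01.wav"))
print(parse_delivery_name("acme ramadan FINAL final.wav"))  # None
```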
ZorgSocial Audio Quality Gate: ZorgSocial includes an automated Audio Quality Gate that runs these checks before any audio file is published through the platform. Non-compliant files are flagged with specific remediation instructions. The Quality Gate checks loudness, peak levels, noise floor, mono compatibility, and disclaimer compliance, reducing pre-launch QA from 10 minutes to 10 seconds.
Apply what you learned in ZorgSocial
Run the Audio Quality Gate on your next campaign
Every concept in this guide maps directly to ZorgSocial tools. Explore the step-by-step tutorials for hands-on application.