Production Standards

Audio Guide

Sample rates, loudness targets, dynamic range, voice recording specs, and mobile optimisation standards.

What you'll learn in this guide

Sample rate and bit depth
Loudness standards
Dynamic range
Voice recording specs
Music licensing
AI voice quality
Mobile optimisation

1. Key Statistics

–14 LUFS

Integrated loudness target for digital audio ads (streaming-platform norm; EBU R128 itself specifies –23 LUFS for broadcast)

EBU R128 Loudness Standard

48 kHz

Recommended sample rate for video production audio

AES Digital Audio Standards

–60 dB

Maximum noise floor for professional voice recording

Professional Studio Standards

85%

Of social media audio is consumed on mobile phone speakers

Meta Audio Consumption Report 2024

2. Overview

Professional audio requires professional standards. This guide covers sample rates, bit depth, loudness targets for digital and broadcast, dynamic range, voice recording specifications, music licensing, and mobile optimisation.

3. Audio Production Specifications

| Standard | Specification | When to Use | Common Mistake |
|---|---|---|---|
| Sample Rate | 44.1 kHz (standard) · 48 kHz (video) | Always set before recording, never convert up | Recording at 22 kHz then upsampling: adds no quality |
| Bit Depth | 16-bit minimum · 24-bit for mastering | Record in 24-bit, deliver in 16-bit or 24-bit | Recording in 8-bit for "smaller files": destroys dynamic range |
| Loudness (Digital) | –14 LUFS integrated (streaming-platform norm) | All social media, digital ads, and web audio | Mastering to 0 dB peak: causes clipping on playback |
| Loudness (Radio) | –23 LUFS integrated (broadcast) | Radio spots, broadcast TV, and podcast pre-rolls | Using digital loudness targets for radio: too loud for broadcast |
| Dynamic Range | Minimum 8 LU · avoid over-compression | Music beds, voice, and mixed audio | Compressing everything to 2 LU: sounds fatiguing and unnatural |
| Voice Recording | Professional studio · noise floor –60 dB | All voiceover work, narrator, and dialogue | Recording in untreated rooms: echo and room noise |
| Music Licensing | Clear for territory, platform, and duration | Before any deployment, without exception | Using "royalty-free" music without reading the licence terms |
| AI Voice Quality | Review at 1.25× and 0.75× for artifacts | Every AI-generated voiceover before deployment | Approving AI voice at normal speed only: misses subtle glitches |
| Disclaimer Audio | Audibly distinct · unhurried · comparable volume | Legal disclaimers, terms, and regulatory notices | Speed-reading disclaimers at low volume: regulatory violation |
| Mobile Optimisation | Test on earbuds, phone speaker, and noisy env. | Every final audio asset before campaign launch | Only testing on studio monitors: 85% of audience uses phone speakers |

4. Sample Rate and Bit Depth: Getting the Foundation Right

Sample rate and bit depth are the two foundational settings that determine the quality ceiling of your entire audio production. Getting them wrong at the start cannot be fixed later: no amount of post-production can add quality that was never captured.

Sample Rate: How Many Snapshots Per Second

Sample rate determines how many times per second the audio signal is captured. Higher sample rates capture higher frequencies and more detail.

44.1 kHz (Standard Digital Audio) The CD-quality standard. Captures frequencies up to 22.05 kHz, just above the upper limit of human hearing (20 Hz – 20 kHz). Use this for:

  • Standalone audio ads (no video)
  • Podcast ads and audio-only content
  • Music tracks and jingles for digital distribution

48 kHz (Video Production Standard) The standard for any audio that will be paired with video. All major video editing software (Premiere Pro, DaVinci Resolve, Final Cut) defaults to 48 kHz. Use this for:

  • Social media video ads (Instagram Reels, TikTok, YouTube)
  • Television and broadcast spots
  • Any audio that will be muxed into a video container

Critical Rule: Never Upsample. If you record at 44.1 kHz and need 48 kHz for video, converting up does not add real frequency information; the extra samples are only interpolated from what was already captured. Always record at the target sample rate or higher. If you have 44.1 kHz audio that must go into a 48 kHz video project, use a professional sample-rate converter (not a simple resample) and accept the minor quality trade-off.
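
As a rough illustration of the numbers in this section, the sketch below computes the highest frequency each sample rate can represent (half the rate) and the exact ratio a converter has to apply when 44.1 kHz audio goes into a 48 kHz project. The function names are illustrative only and are not part of any tool mentioned in this guide.

```python
from fractions import Fraction

def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2

def resample_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Exact ratio of output samples to input samples for a rate conversion."""
    return Fraction(target_hz, source_hz)

print(nyquist_hz(44_100))              # 22050.0
print(nyquist_hz(48_000))              # 24000.0
print(resample_ratio(44_100, 48_000))  # 160/147
```

The 160/147 ratio is not a whole number, which is why this conversion needs proper interpolation and filtering rather than simply repeating samples.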

Bit Depth: Dynamic Range Resolution

Bit depth determines how many discrete volume levels can be represented. Higher bit depth = more dynamic range = quieter noise floor.

16-bit: 96 dB dynamic range. Sufficient for final delivery. CD-quality.

24-bit: 144 dB dynamic range. Record and master in 24-bit. The extra headroom means you can record at conservative levels without losing detail, and quiet passages stay clean. Final delivery can be dithered down to 16-bit if needed.

Never record in 8-bit. 8-bit audio has only 48 dB of dynamic range, so you will hear quantisation noise (a granular, crunchy texture) on every quiet passage. This is not a "lo-fi aesthetic"; it is unusable for professional advertising.
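
The 48, 96, and 144 dB figures above follow from the standard rule of thumb for linear PCM: each bit adds about 6 dB. A minimal sketch of that arithmetic (the function name is illustrative):

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 8-bit: 48.2 dB
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```

These are theoretical ceilings; real converters and rooms deliver less, which is one more reason to record at 24-bit and leave headroom.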

5. Loudness Standards: Digital vs. Broadcast

Incorrect loudness is the single most common production mistake in audio advertising. Too loud and the audio clips, distorts, and fatigues the listener. Too quiet and the ad disappears into the background. Getting loudness right requires understanding the difference between peak level and integrated loudness.

Peak Level vs. Integrated Loudness

Peak level is the loudest instantaneous moment in the audio. It is measured in dBFS (decibels relative to full scale). A peak of 0 dBFS means the audio has reached the absolute maximum; any louder and it clips (distorts).

Integrated loudness, measured in LUFS (Loudness Units relative to Full Scale), is the average perceived loudness over the entire duration of the audio. This is what matters for listener experience. Two ads can have the same peak level but wildly different integrated loudness, so one may sound twice as loud as the other.
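
To make the distinction concrete, the sketch below builds two synthetic signals with the same peak level but very different average levels. It uses plain RMS as a stand-in for average level; this is an assumption for illustration only, since real LUFS measurement (ITU-R BS.1770) adds frequency weighting and gating that are not implemented here.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for samples normalised to the range -1.0..1.0."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    """RMS level in dBFS, a crude proxy for average level (not true LUFS)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

# Two synthetic one-second signals at 48 kHz, both peaking at 0.5 (about -6 dBFS).
rate = 48_000
steady_tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
# Same tone, but only for the first tenth of a second; silence afterwards.
short_burst = [s if n < rate // 10 else 0.0 for n, s in enumerate(steady_tone)]

for name, signal in (("steady tone", steady_tone), ("short burst", short_burst)):
    print(f"{name}: peak {peak_dbfs(signal):.1f} dBFS, RMS {rms_dbfs(signal):.1f} dBFS")
```

Both signals report the same peak, yet the short burst averages about 10 dB lower, which is why peak meters alone cannot tell you how loud an ad will feel.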

Digital Ads: –14 LUFS Integrated (Streaming-Platform Norm)

This is the target for all social media platforms, digital display ads, and web-based audio. Why –14 LUFS?

  • Spotify normalises to –14 LUFS. YouTube normalises to –14 LUFS. If your audio is louder, the platform will turn it down automatically, and the result often sounds worse than if you had mastered to the target
  • –14 LUFS gives your audio breathing room. There is space for dynamic variation, for moments of emphasis, for emotional contrast
  • Peak should not exceed –1 dBTP (true peak). This leaves a 1 dB safety margin to prevent clipping on lossy codec conversion (MP3, AAC)
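
The relationship between a measured loudness, the –14 LUFS target, and the –1 dBTP ceiling can be sketched as simple arithmetic. The measurements below are hypothetical values of the kind a loudness meter would report, and the function names are illustrative.

```python
def gain_to_target_db(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Gain in dB that moves a measured integrated loudness onto the target."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

def peak_after_gain_ok(true_peak_dbtp: float, gain_db: float,
                       ceiling_dbtp: float = -1.0) -> bool:
    """True if the true peak stays at or below the ceiling once gain is applied."""
    return true_peak_dbtp + gain_db <= ceiling_dbtp

# Hypothetical meter readings for an unmastered mix.
measured_lufs, measured_peak_dbtp = -18.5, -6.0
gain = gain_to_target_db(measured_lufs)
print(gain)                                          # 4.5
print(round(db_to_linear(gain), 3))                  # 1.679
print(peak_after_gain_ok(measured_peak_dbtp, gain))  # True
```

If the peak check fails, plain gain is not enough and the mix needs limiting or rebalancing before it can reach the target without exceeding the ceiling.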

Radio and Broadcast: –23 LUFS Integrated

Broadcast standards target –23 LUFS (EBU R128, Europe) or –24 LKFS (ATSC A/85, North America). This is significantly quieter than digital: if you submit a –14 LUFS ad for broadcast, it will be rejected or the broadcaster will compress it aggressively, destroying your careful production.

  • Radio spots: master to –23 LUFS integrated
  • Podcast pre-rolls distributed via RSS: target –16 to –18 LUFS (closer to podcast norms)
  • Television spots: –23 LUFS to –24 LUFS per broadcaster requirements

Platform-Specific Loudness Quirks:

  • TikTok: No official normalisation standard. Creators trend louder. Target –12 to –14 LUFS for TikTok specifically
  • Instagram Reels: Follows Meta normalisation (–14 LUFS)
  • YouTube Shorts: –14 LUFS normalised. Ads louder than –13 LUFS are automatically turned down
  • Twitter/X Spaces: Live audio, no normalisation; master to –14 LUFS for consistency

How to Measure: Use a loudness meter plugin in your DAW, such as iZotope Insight, Youlean Loudness Meter (free), or Waves WLM Plus. Measure integrated loudness over the full duration of the ad, not just a section. ZorgSocial Audio Export includes a built-in loudness readout that flags non-compliant files before upload.

6. Dynamic Range and Voice Recording Specifications

Dynamic range is the difference between the quietest and loudest parts of your audio. It is what makes audio sound natural, engaging, and professional, or flat, fatiguing, and amateur.

Why Dynamic Range Matters for Ads

Over-compressed audio (where the dynamic range is crushed to 2–3 LU) sounds aggressive, fatiguing, and cheap. It is the sonic equivalent of an all-caps email: it may grab attention for a moment, but listeners instinctively pull away. The "loudness war" approach that dominated music production in the 2000s does not work in advertising.

The 8 LU Minimum Rule: Maintain at least 8 LU (Loudness Units) of dynamic range in your final master. In other words, the loudest passages should sit at least 8 LU above the quietest ones. This gives your audio:

  • Natural-sounding voice dynamics (emphasis words are louder, transitions are softer)
  • Musical breathing room (crescendos and decrescendos that create emotional movement)
  • Listener comfort (the ear does not fatigue from constant loudness)

When More Dynamic Range Is Better:

  • Storytelling ads and brand films: 10–14 LU for cinematic feel
  • Podcast ads and conversational formats: 8–10 LU for natural speech
  • Music-forward ads: 8–12 LU to let the music breathe

When Less Dynamic Range Is Acceptable:

  • Short-form social ads (6–15 seconds): 6–8 LU, since there is less time for dynamics to develop
  • Noisy-environment ads (transit, outdoor): 6–8 LU, since the audio needs to cut through ambient noise
  • Direct-response radio spots: 6–8 LU, since clarity and consistency matter more than nuance
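
The per-format ranges above lend themselves to a simple lookup. The sketch below is one hypothetical way to encode them; the format keys and the returned labels are made up for this example, and the measured value is assumed to come from whatever loudness-range meter you already use.

```python
# Recommended dynamic range per ad format, in LU, as listed in this guide.
RECOMMENDED_RANGE_LU = {
    "storytelling": (10, 14),
    "podcast": (8, 10),
    "music_forward": (8, 12),
    "short_form_social": (6, 8),
    "noisy_environment": (6, 8),
    "direct_response_radio": (6, 8),
}

def assess_dynamic_range(ad_format: str, measured_lu: float) -> str:
    """Compare a measured dynamic range against the guide's range for a format."""
    low, high = RECOMMENDED_RANGE_LU[ad_format]
    if measured_lu < low:
        return "over-compressed"
    if measured_lu > high:
        return "wider than recommended"
    return "within range"

print(assess_dynamic_range("podcast", 9))             # within range
print(assess_dynamic_range("storytelling", 6))        # over-compressed
print(assess_dynamic_range("short_form_social", 11))  # wider than recommended
```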

Voice Recording Specifications:

Voiceover is the most critical audio element in most ads. Poor voice recording ruins everything downstream.

Professional Studio Requirements:

  • Noise floor: –60 dB or lower. This means you should hear NOTHING when the voice talent is silent: no air conditioning hum, no computer fan, no traffic rumble
  • Room treatment: acoustic panels or foam to eliminate reflections and echo. A voice recorded in an untreated room has a "bathroom" quality that screams amateur
  • Microphone: Large-diaphragm condenser for studio (Neumann U87, AKG C414) or dynamic (Shure SM7B) for less-treated spaces. USB microphones are not acceptable for professional advertising
  • Pop filter: Always. Plosives (P, B, T sounds) create low-frequency bursts that distort the recording
  • Distance: 15–20 cm from microphone. Too close creates proximity effect (boomy bass). Too far captures too much room sound
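
A noise-floor check can be approximated by measuring the RMS level of a stretch of recorded "silence". The sketch below assumes the –60 dB figure is read relative to full scale (dBFS) and uses synthetic hum in place of a real recording; both are assumptions for illustration.

```python
import math

def level_dbfs(samples) -> float:
    """RMS level in dBFS for samples normalised to -1.0..1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)

def noise_floor_ok(silence_samples, limit_db: float = -60.0) -> bool:
    """True if the silent section sits at or below the noise-floor limit."""
    return level_dbfs(silence_samples) <= limit_db

# Synthetic 50 Hz hum standing in for one second of room tone at 48 kHz.
rate = 48_000
quiet_room = [0.0005 * math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]
noisy_room = [0.005 * math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]

print(f"{level_dbfs(quiet_room):.1f} dBFS", noise_floor_ok(quiet_room))
print(f"{level_dbfs(noisy_room):.1f} dBFS", noise_floor_ok(noisy_room))
```

In this example the quieter hum passes and the louder one fails; with real recordings, run the same measurement on a section where the talent is silent.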

Remote Recording Specifications: When talent cannot come to a studio, remote recording is acceptable IF:

  • Talent uses a professional microphone (not laptop mic or phone)
  • Room is treated (closet recording is better than an open room)
  • Noise floor is verified before recording (send a 10-second silence test)
  • Audio is recorded locally (not through a video call codec; Zoom/Teams audio compression destroys quality)
  • ZorgSocial Remote Session Manager handles real-time monitoring and local recording sync

7. Music Licensing: Legal Compliance for Audio Ads

Music licensing is the area where brands most frequently make costly mistakes. Using unlicensed music in advertising is not a minor oversight; it is a legal liability that can result in takedowns, lawsuits, and six-figure penalties. Every piece of music in your ad must be cleared before deployment.

The Three Rights You Need:

1. Synchronisation Rights (Sync Rights) The right to pair music with visual content (video ads). This is obtained from the music publisher (who represents the songwriter). Without sync rights, your video ad is infringing copyright the moment it goes live.

2. Master Recording Rights The right to use a specific recording of a song. This is obtained from the record label (who owns the master recording). A sync licence alone is not enough: you need both sync rights AND master rights to use a commercial recording.

3. Performance Rights The right for the music to be "performed" publicly (which includes streaming, broadcasting, and public playback). This is typically handled through performing rights organisations (PROs) like ASCAP, BMI, PRS, or SACEM. In many cases, the platform (YouTube, Spotify) covers performance rights through blanket licences.

"Royalty-Free" Does Not Mean "Free"

This is the most dangerous misconception. Royalty-free music means you pay a one-time licence fee (instead of ongoing royalties), but:

  • You must still purchase the licence
  • The licence has territorial restrictions (it may not cover GCC countries)
  • The licence has usage restrictions ("social media" may not include paid ads)
  • The licence has duration restrictions (1 year? Perpetual? Check the fine print)
  • Multiple advertisers may use the same track, so your "unique" jingle may appear in a competitor's ad

AI-Generated Music Licensing: Music generated by AI tools (including ZorgSocial Music Generator) has a clearer licensing path:

  • ZorgSocial-generated music: full commercial rights included in your subscription. No additional licensing needed. Clear for all territories and platforms
  • Third-party AI music (Suno, Udio, etc.): check the platform terms. Some restrict commercial advertising use even on paid plans
  • AI-generated music cannot infringe existing copyrights IF the model was trained legally, but this is an evolving legal area. ZorgSocial Music Generator is trained exclusively on licensed datasets

MENA-Specific Licensing Considerations:

  • Saudi Arabia and UAE have active copyright enforcement; do not assume "it is not enforced in this region"
  • Arabic music rights are often fragmented across multiple publishers and collecting societies
  • Traditional Arabic maqam compositions may be in public domain, but specific recordings are not
  • Work with a licensing specialist who understands GCC copyright law when using regional Arabic music

The Licensing Checklist (Before Every Deployment):

  • Territory: Is the licence valid for every country where the ad will run?
  • Platform: Does the licence cover the specific platforms (paid social, programmatic display, streaming audio)?
  • Duration: How long can you use the music? Does the licence expire?
  • Exclusivity: Is the track exclusive to you, or can competitors use the same music?
  • Modifications: Can you edit, remix, or shorten the track? Some licences prohibit modification
  • Attribution: Does the licence require a credit or attribution? (Difficult to provide in a 15-second ad)

8. AI Voice Quality Assurance

AI-generated voiceover is increasingly indistinguishable from human voice, but only when properly quality-checked. The most common failure mode is approving AI voice at normal playback speed, where subtle artifacts are hard to detect, and then having listeners notice them in headphones or on repeat listens.

The 1.25× / 0.75× Review Method:

Before approving any AI-generated voiceover:

Step 1: Listen at 1.25× speed. Speeding up the audio exposes timing artifacts: unnatural pauses, robotic rhythm, inconsistent pacing between sentences. If the voice sounds "off" at 1.25×, listeners will subconsciously feel it at normal speed.

Step 2: Listen at 0.75× speed. Slowing down the audio exposes tonal artifacts: metallic undertones, unnatural formant transitions, breathing that sounds synthesised. Slow playback makes subtle glitches obvious.

Step 3: Listen on phone speakers. Phone speakers have limited frequency range and amplify certain midrange frequencies where AI artifacts tend to live. An AI voice that sounds perfect on studio monitors may sound robotic on a phone speaker, and 85% of your audience will hear it on phone speakers.

Step 4: Listen in a noisy environment. Put on the audio in a coffee shop, on a busy street, or with a fan running. Background noise masks some frequencies and unmasks others, so artifacts that were hidden in quiet listening may become prominent.
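
If your player has no speed control, review copies at both speeds can be rendered ahead of time. The sketch below only builds the command lines; it assumes the open-source ffmpeg tool, which this guide does not otherwise mention, and its `atempo` audio filter, which changes tempo while keeping pitch. The file name is a placeholder.

```python
from pathlib import Path

REVIEW_SPEEDS = (1.25, 0.75)

def review_commands(source: str):
    """Build one ffmpeg command per review speed (commands are not executed here)."""
    src = Path(source)
    commands = []
    for speed in REVIEW_SPEEDS:
        # e.g. voiceover.wav -> voiceover_x1.25.wav
        out = src.with_name(f"{src.stem}_x{speed}{src.suffix}")
        commands.append(
            ["ffmpeg", "-i", str(src), "-filter:a", f"atempo={speed}", str(out)]
        )
    return commands

for cmd in review_commands("voiceover.wav"):
    print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run` if ffmpeg is installed; whether pitch-preserving or pitch-shifting playback is better for spotting artifacts is a judgment call this sketch does not settle.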

Common AI Voice Artifacts to Listen For:

  • Metallic resonance: A slight ringing or tinny quality, especially on sibilants (S, SH sounds)
  • Breathing inconsistency: Either no breathing at all (uncanny) or breathing that sounds pasted in (rhythmically wrong)
  • Emphasis errors: Wrong words emphasised in a sentence, or flat delivery on words that should carry emotional weight
  • Formant shifts: The voice subtly changes character mid-sentence and sounds like a different person for a fraction of a second
  • Plosive handling: P, B, T sounds that either pop excessively or are unnaturally soft
  • Sentence boundary artifacts: Slight click or silence gap at the transition between sentences

AI Voice in Arabic (Special Considerations):

Arabic AI voice presents unique quality challenges:

  • Tashkeel errors: Incorrect vowelisation changes meaning. "كَتَبَ" (wrote) vs. "كُتُب" (books): the AI must handle diacritics correctly
  • Dialect consistency: If the voice starts in Gulf Arabic, it must not drift into Egyptian or Levantine mid-sentence
  • Emphatic consonants: Arabic emphatic consonants (ص، ض، ط، ظ) require distinct pharyngealisation that AI sometimes softens
  • Connected speech: Arabic connected speech (liaison) patterns differ from English, so AI trained primarily on English may produce unnatural Arabic prosody

When to Use Human Voice Instead:

  • Emotional storytelling that requires genuine feeling
  • Brand spokesperson roles where authenticity is critical
  • Regulatory disclaimers in some jurisdictions (check local requirements)
  • Premium brand campaigns where any perception of "AI" would damage brand equity

9. Disclaimer Audio and Mobile Optimisation

Two areas are consistently overlooked in audio production, and either can sink an otherwise excellent campaign: regulatory disclaimer audio that violates compliance rules, and audio that sounds great in the studio but terrible on phone speakers.

Disclaimer Audio Standards:

Regulatory disclaimers in audio ads are legally required in many industries and jurisdictions. The rules are specific and enforcement is increasing.

Key Requirements:

  • Audibly distinct: The disclaimer must be clearly distinguishable from the main ad content. It cannot be buried under music or sound effects
  • Unhurried delivery: Speed-reading disclaimers is a regulatory red flag. The disclaimer must be delivered at a natural, comprehensible pace, typically no faster than 160 words per minute (vs. 140–150 WPM for the main ad content)
  • Comparable volume: The disclaimer must be at a volume comparable to the main ad content. Dropping disclaimer volume by 6 dB or more (which halves the signal amplitude) is a common violation
  • Full duration: Do not truncate disclaimers to fit time constraints. If the disclaimer does not fit, the ad is too long, so shorten the ad content, not the disclaimer
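
The pace and volume requirements above can be expressed as two quick calculations. This is a minimal sketch, not how any compliance tool actually works: the 160 WPM and 6 dB thresholds come from this guide, while the function names and the sample measurements are hypothetical.

```python
def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Speaking rate of a segment in words per minute."""
    return word_count / duration_seconds * 60

def disclaimer_issues(word_count, duration_seconds, disclaimer_lufs, main_lufs,
                      max_wpm=160.0, max_drop_db=6.0):
    """Return a list of problems found in a disclaimer segment (empty if none)."""
    issues = []
    wpm = words_per_minute(word_count, duration_seconds)
    if wpm > max_wpm:
        issues.append(f"too fast: {wpm:.0f} WPM")
    drop = main_lufs - disclaimer_lufs  # positive when the disclaimer is quieter
    if drop >= max_drop_db:
        issues.append(f"too quiet: {drop:.1f} dB below main content")
    return issues

# A 40-word disclaimer read in 16 seconds, 1 dB quieter than the ad: passes.
print(disclaimer_issues(40, 16.0, -15.0, -14.0))
# The same disclaimer rushed into 12 seconds and dropped by 7 dB: two issues.
print(disclaimer_issues(40, 12.0, -21.0, -14.0))
```

A drop smaller than 6 dB is not automatically "comparable"; the threshold here only flags the violation the guide names explicitly.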

Industry-Specific Disclaimer Rules:

  • Financial services (GCC): Central Bank of UAE (CBUAE) and Saudi Central Bank (SAMA) require specific risk disclosures. In audio, these must be read verbatim, not paraphrased
  • Healthcare/Pharma: Side effects and contraindications must be delivered at the same pace and volume as efficacy claims. Rushed disclaimer reading is an FDA/DOH violation
  • Real estate: "Prices subject to change" and regulatory authority licence numbers must be included in full
  • Food and beverage: Health claims require qualifying statements delivered at comparable prominence

ZorgSocial Compliance Checker for Audio: The Audio Compliance Checker automatically analyses disclaimer segments for:

  • WPM (words per minute) comparison vs. main content
  • Volume level comparison (LUFS differential)
  • Duration adequacy relative to word count
  • Language verification (Arabic vs. English disclaimer matches the ad language)

Mobile Optimisation: The 85% Rule

85% of social media audio is consumed on mobile phone speakers: not headphones, not studio monitors, not car stereos. If your audio does not sound good on a phone speaker, it does not sound good to your audience.

Phone Speaker Limitations:

  • Frequency range: Most phone speakers reproduce 200 Hz – 15 kHz effectively. Below 200 Hz (bass) is essentially inaudible. Above 15 kHz (air/shimmer) is reduced
  • This means: bass-heavy music beds disappear. Sub-bass rumble that sounded powerful in the studio is simply absent on phone speakers
  • Voice clarity is paramount: phone speakers reproduce the 1–4 kHz range (voice presence frequencies) well, so voice-forward mixes translate best

The Three-Speaker Test: Before any audio goes live, test on three devices:

1. Studio monitors or quality headphones: this is your reference. The audio should sound excellent here

2. Phone speaker (held at arm's length): simulate the most common listening scenario. Can you understand every word of the voiceover? Does the music bed muddy the voice? Is the CTA clear?

3. Phone speaker in a noisy environment: play the audio on your phone while standing near a busy road, in a coffee shop, or with a TV on in the background. This is how most of your audience will hear it. If the key message and CTA are not clear in this scenario, the audio needs remixing

Mobile Mix Tips:

  • Boost voice presence (2–4 kHz) by 1–2 dB for phone speaker clarity
  • Roll off music bed below 150 Hz, since it adds nothing on phone speakers and muddies the voice
  • Use sidechain compression to duck the music bed whenever voice is active
  • Keep the voice 6–8 dB above the music bed (not 3–4 dB like a music mix)
  • Test mono compatibility: many phone speakers play back in mono or near-mono. Stereo effects that cancel in mono will disappear
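
Mono compatibility can be checked numerically by folding left and right into one channel and seeing how much level is lost. The sketch below is an illustrative approach with made-up function names: identical channels lose nothing, while channels that are exact inverses of each other cancel completely.

```python
import math

def rms(samples) -> float:
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mono_level_change_db(left, right) -> float:
    """Level change in dB when L and R are averaged to mono, relative to the stereo mix."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    stereo_rms = math.sqrt((rms(left) ** 2 + rms(right) ** 2) / 2)
    mono_rms = rms(mono)
    if mono_rms == 0:
        return float("-inf")  # complete cancellation
    return 20 * math.log10(mono_rms / stereo_rms)

rate = 48_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
inverted = [-s for s in tone]

print(mono_level_change_db(tone, tone))      # approximately 0 dB: mono-safe
print(mono_level_change_db(tone, inverted))  # -inf: the element vanishes in mono
```

Real mixes fall between these extremes; a large negative value on an important element, such as the voiceover or CTA, is the warning sign.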

10. Production Quality Checklist: Pre-Launch Validation

Before any audio asset goes live, run it through this comprehensive production checklist. These checks take 10 minutes and prevent costly mistakes that are embarrassing to fix after launch.

Technical Checks:

  • Sample rate matches target: 44.1 kHz (audio-only) or 48 kHz (video)
  • Bit depth: 16-bit or 24-bit (never 8-bit)
  • Integrated loudness: –14 LUFS (digital) or –23 LUFS (broadcast)
  • True peak: does not exceed –1 dBTP
  • Dynamic range: minimum 8 LU
  • No clipping or distortion at any point in the audio
  • Fade in/out at start and end (no abrupt cuts that cause pops)
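
The numeric items in the technical checks can be gathered into a single validation function. This is a hypothetical sketch, not a description of any ZorgSocial feature: the class, field names, and the ±1 LU loudness tolerance are assumptions, and the measurements are expected to come from your own metering tools.

```python
from dataclasses import dataclass

@dataclass
class AudioMeasurements:
    sample_rate_hz: int
    bit_depth: int
    integrated_lufs: float
    true_peak_dbtp: float
    dynamic_range_lu: float

def technical_issues(m: AudioMeasurements, *, video: bool, broadcast: bool = False,
                     loudness_tolerance_lu: float = 1.0) -> list[str]:
    """Check measurements against this guide's targets; return a list of problems."""
    issues = []
    expected_rate = 48_000 if video else 44_100
    if m.sample_rate_hz != expected_rate:
        issues.append(f"sample rate {m.sample_rate_hz} Hz, expected {expected_rate} Hz")
    if m.bit_depth not in (16, 24):
        issues.append(f"bit depth {m.bit_depth}, expected 16 or 24")
    target = -23.0 if broadcast else -14.0
    if abs(m.integrated_lufs - target) > loudness_tolerance_lu:  # tolerance is an assumption
        issues.append(f"loudness {m.integrated_lufs} LUFS, target {target} LUFS")
    if m.true_peak_dbtp > -1.0:
        issues.append(f"true peak {m.true_peak_dbtp} dBTP exceeds -1 dBTP")
    if m.dynamic_range_lu < 8.0:
        issues.append(f"dynamic range {m.dynamic_range_lu} LU is below 8 LU")
    return issues

good = AudioMeasurements(48_000, 24, -14.2, -1.5, 9.0)
bad = AudioMeasurements(44_100, 8, -9.0, -0.2, 3.0)
print(technical_issues(good, video=True))  # []
for issue in technical_issues(bad, video=True):
    print("-", issue)
```

Clipping and fades still need a listen or a waveform inspection; they are not covered by these numbers.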

Voice Checks:

  • Noise floor: –60 dB or lower (listen to silence sections)
  • No mouth clicks, lip smacks, or excessive sibilance
  • Pronunciation is correct for all brand names, product names, and technical terms
  • Arabic diacritics/tashkeel are correct (for Arabic voiceover)
  • Pacing is natural and consistent throughout
  • Emphasis on the right words, especially the CTA

Music and SFX Checks:

  • Music is properly licensed for territory, platform, and duration
  • Music bed does not overpower the voice at any point
  • SFX are appropriate volume (complement, not distract)
  • No copyright-infringing musical references or samples
  • Sonic logo is present and correctly placed (if applicable)

Compliance Checks:

  • Disclaimer audio meets regulatory standards (WPM, volume, clarity)
  • All required legal disclosures are present and complete
  • Language matches target market requirements
  • No misleading audio effects (e.g., fake urgency sounds for non-urgent offers)

Mobile Readiness:

  • Tested on phone speaker at arm's length
  • Voice is intelligible in noisy environment
  • Bass-heavy elements are not relied upon for meaning
  • Mono compatibility verified (no cancellation issues)

File Delivery:

  • File format: WAV (uncompressed) for master, MP3/AAC (320 kbps) for delivery
  • File naming convention followed (Brand_Campaign_Version_Date)
  • Metadata includes copyright, usage rights, and expiry date
  • Backup master file archived in brand asset library
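
The Brand_Campaign_Version_Date naming convention is easy to automate. In the sketch below, the two-digit version prefix, the YYYYMMDD date format, and the stripping of spaces are assumptions, since the guide does not specify them.

```python
from datetime import date

def master_filename(brand: str, campaign: str, version: int, on: date,
                    ext: str = "wav") -> str:
    """Build a Brand_Campaign_Version_Date file name (formats are assumptions)."""
    parts = [brand, campaign, f"v{version:02d}", on.strftime("%Y%m%d")]
    # Drop spaces and punctuation so underscores only ever separate the four fields.
    cleaned = ["".join(ch for ch in part if ch.isalnum()) for part in parts]
    return "_".join(cleaned) + f".{ext}"

print(master_filename("Acme", "Summer Sale", 3, date(2025, 6, 1)))
# Acme_SummerSale_v03_20250601.wav
```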

ZorgSocial Audio Quality Gate: ZorgSocial includes an automated Audio Quality Gate that runs these checks automatically before any audio file is published through the platform. Non-compliant files are flagged with specific remediation instructions. The Quality Gate checks loudness, peak levels, noise floor, mono compatibility, and disclaimer compliance, reducing pre-launch QA from 10 minutes to 10 seconds.

11. Try This in ZorgSocial

Apply what you learned in ZorgSocial

1. Open Audio Settings in ZorgSocial and set sample rate to 48 kHz and bit depth to 24-bit
2. Record or generate your voiceover, then verify noise floor is below –60 dB in the waveform view
3. Add your music bed and set voice level 6–8 dB above the music using the mixer fader
4. Run the Loudness Meter to confirm –14 LUFS integrated and –1 dBTP true peak maximum
5. Apply the 1.25× and 0.75× AI voice review if using AI-generated voiceover
6. Test playback on phone speaker and in a noisy environment using the Mobile Preview mode
7. Run the Audio Quality Gate to auto-check loudness, peak, noise floor, and compliance
8. Export final audio and archive the WAV master in your Brand Assets Library

12. In ZorgSocial

Run the Audio Quality Gate on your next campaign

Every concept in this guide maps directly to ZorgSocial tools. Explore the step-by-step tutorials for hands-on application.
