Pro Audio for Zoom / Teams / Teleconferencing software

This is a living document, being constantly updated - email me if you think I’ve missed anything out or got something wrong! - last updated 31st December 2022. If you’ve got anything that should usefully be on here, please let me know. I have no affiliation with these products or manufacturers.


Since early 2020, live performers have increasingly had to find ways to perform online. Whilst there are established platforms for broadcasting in a conventional sense, like YouTube and Instagram Live, these offer little in the way of interactivity or sense of community. Many people turned quickly to teleconferencing systems, particularly Zoom, whilst others developed bespoke systems using platforms like Frozen Mountain or Twilio.

In this article I’ll look at some general considerations for using teleconferencing software, and also some specifics of using Zoom.

General principles of pro audio and teleconferencing software

Broadcast/transmission side

Most teleconferencing software allows little control over the audio side of things. Some systems use their own bespoke code, whilst others use open-source technologies like WebRTC. Systems using WebRTC are great in that they work in almost any browser, using support that is built into the browser itself. However, different browsers implement that support slightly differently, so, for example, the noise reduction and echo cancellation algorithms sound different depending on whether someone is using Chrome or Safari.

Typically, teleconferencing software uses a form of automatic gain control (AGC), adjusting the incoming audio level so that it's around -14LUFS. This is the same loudness level used by YouTube and other streaming services, but it is a lot louder (and has a lot less dynamic range) than the -23LUFS typically used for broadcast audio.
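As a rough illustration of what that gap means, the sketch below works out the gain a simple AGC would need to apply to bring a programme delivered at broadcast loudness up to the streaming target. It assumes a fixed-target AGC (real AGCs adapt continuously) and relies on the fact that a loudness difference of 1 LU corresponds to 1 dB of gain.

```python
def agc_gain_db(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Gain in dB that a simple fixed-target AGC would apply (1 LU difference = 1 dB of gain)."""
    return target_lufs - measured_lufs


def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


# A mix delivered at the -23 LUFS broadcast level needs about +9 dB to reach -14 LUFS,
# i.e. roughly 2.8 times the signal amplitude, so peaks must be limited or compressed
# to avoid clipping.
gain = agc_gain_db(-23.0)
print(f"{gain:+.1f} dB, x{db_to_amplitude_ratio(gain):.2f}")
```

This is why a louder target tends to go hand in hand with reduced dynamic range: the extra gain has to come from somewhere, and it usually comes from squashing the peaks.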

Teleconferencing software will often adjust the bit rate of your audio transmission stream depending on your internet bandwidth, so the audio quality you broadcast can vary considerably. When you're broadcasting, it's important to disconnect as many other devices that might be using your internet connection as possible, to give your stream the maximum bandwidth. Even then, most teleconferencing software delivers pretty low quality audio. To give an example, Microsoft Teams announced its new "high quality music mode" in July 2021, which provides up to a 32kHz sample rate and up to 128kbps of data, in mono! For many of us, this is what we would consider very low quality audio. For reference, a typical sample rate is 44.1kHz or 48kHz, and the data rate for uncompressed CD-quality stereo audio is 1411kbps. Teleconferencing software always favours low latency (delay) over sound quality. It may also speed up or slow down the audio to catch up when it lags behind, which can audibly affect its pitch and timing; again, not great for music or any material with rhythmic content.
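The uncompressed figure comes straight from multiplying sample rate, bit depth and channel count. The sketch below does that arithmetic, assuming 16-bit CD-quality PCM. Bear in mind that the 128kbps figure refers to a compressed codec, so the ratio compares data rates rather than directly measuring perceived quality.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Data rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


cd_quality = pcm_bitrate_kbps(44_100, 16, 2)  # 1411.2 kbps: CD-quality stereo
teams_music_mode_kbps = 128                    # upper figure quoted above (compressed, mono)

print(f"CD-quality PCM: {cd_quality:.1f} kbps")
print(f"Ratio to {teams_music_mode_kbps} kbps: {cd_quality / teams_music_mode_kbps:.1f}x")
```

The same function gives 2304kbps for 48kHz, 24-bit stereo, which is closer to what a typical audio interface is producing before anything is sent to the teleconferencing software.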

Pretty much all teleconferencing software is mono by default.

Teleconferencing software typically lets us get audio into it either through an audio input or through a screen sharing / presentation style feature. Using the audio input is often easiest if we have audio coming in from a mixing desk, sound card, etc. Unless the software has a "music mode", "original sound", "high quality audio" or similar feature, that audio input will be treated as speech: it will have a low sample rate and low data rate, will be mono, and will have echo cancellation and noise reduction applied to it, often all to the detriment of the sound quality of anything that isn't speech.

Teleconferencing software typically distinguishes between a mic input and audio shared via screen sharing. Often the mic input will have echo cancellation and noise reduction applied, but the screen share audio won't. We can sometimes transmit better audio by forcing an audio input into the screen sharing system, using software like Rogue Amoeba's Loopback to bring in external audio so it can be shared. But this is not without issue either: I've often had crackles and glitches with this method on computers that are part of complex audio systems. We also need to be aware of the video side of things: screen shared video is typically high resolution but very low frame rate, whereas what the software sees as a webcam is typically low resolution but high frame rate.

Audience side

Systems such as WebRTC are implemented differently in each browser, and as a result your show can sound different from browser to browser, and between desktop/laptop devices and phone/tablet devices.

On many occasions we want our listeners to keep their mics muted during a performance, but for shows with more interactivity that might not be the case. To avoid echoing and harsh noise reduction artefacts, we want audience members to wear headphones wherever possible.

Teleconferencing software often uses ducking, where the audio from every participant apart from the active speaker is dipped in volume. This system falls down when someone leaves their mic open and coughs, as their cough will then duck the volume of the person we were supposed to be listening to. It also struggles when there are multiple hosts speaking at once.
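To see why a stray cough causes trouble, here is a deliberately simplified model of active-speaker ducking. It is a hypothetical illustration, not how Zoom or any other product actually implements it: whoever is loudest above a noise gate is treated as the active speaker and left at full volume, and everyone else is turned down by a fixed amount. The -12 dB duck and -50 dB gate are arbitrary assumed values.

```python
def duck_gains(levels_db: dict[str, float],
               duck_db: float = -12.0,
               gate_db: float = -50.0) -> dict[str, float]:
    """Hypothetical active-speaker ducking: the loudest participant above the gate
    stays at 0 dB gain; every other participant is turned down by duck_db."""
    active = {name: level for name, level in levels_db.items() if level > gate_db}
    if not active:
        # Nobody is above the gate, so nothing is ducked.
        return {name: 0.0 for name in levels_db}
    speaker = max(active, key=lambda name: active[name])
    return {name: (0.0 if name == speaker else duck_db) for name in levels_db}


# Performer singing at -20 dB, audience member silent with an open mic at -70 dB.
print(duck_gains({"performer": -20.0, "audience": -70.0}))
# The audience member coughs at -10 dB: now the performer is the one being ducked.
print(duck_gains({"performer": -20.0, "audience": -10.0}))
```

Real systems smooth these decisions over time and may use more sophisticated voice-activity detection, but the failure mode is the same: a short, loud noise from an open mic can win the "active speaker" decision.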

More significant is…

The Bluetooth Headset Profile Problem

There is a little-known issue with Bluetooth headphones and headsets. If an audience member uses a set of wireless/Bluetooth headphones, and selects that same device as the microphone in their teleconferencing software, then all audio in their headphones will change from stereo to mono, with the sample rate reduced to as little as 8kHz. This is because Bluetooth headphones use one profile (A2DP) for high quality, one-way stereo playback, but have to switch to a headset profile (HFP/HSP) as soon as their built-in mic is used. The headset profile has to carry audio in both directions over a limited-bandwidth link, so it runs in mono at speech quality. This is a feature of Bluetooth itself, so it applies to Macs and PCs, desktops and tablets, cheap Bluetooth headphones and the most expensive Apple headsets.
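The switch can be summarised as a simple rule. The sketch below is a simplified model, not a real Bluetooth API: the profile names (A2DP for playback, HFP for headset use) are real, but the figures are typical values assumed for illustration. HFP runs at 8kHz with its basic codec or 16kHz with wideband speech, and A2DP commonly runs at 44.1kHz or 48kHz depending on the codec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkMode:
    profile: str
    channels: int
    sample_rate_hz: int


def headphone_link_mode(headset_mic_in_use: bool, wideband_speech: bool = False) -> LinkMode:
    """Simplified model of which Bluetooth profile a headset ends up using.

    Using the headset's own mic forces a two-way headset profile (HFP), which is
    mono and speech quality; otherwise playback can use A2DP in stereo.
    Sample rates are typical assumed values, not guarantees for any device.
    """
    if headset_mic_in_use:
        return LinkMode("HFP", channels=1, sample_rate_hz=16_000 if wideband_speech else 8_000)
    return LinkMode("A2DP", channels=2, sample_rate_hz=44_100)


print(headphone_link_mode(headset_mic_in_use=False))  # listening to music
print(headphone_link_mode(headset_mic_in_use=True))   # the call app grabs the headset mic
```

The key input is only whether the headset's own mic is in use, which is why selecting the computer's built-in mic instead avoids the problem.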

You can hear how problematic this is:

  • connect some Bluetooth headphones to your computer.

  • close all applications.

  • listen to this track: https://www.youtube.com/watch?v=VnzIIhLNHqg
    it should sound super high quality, very stereo and lovely.

  • Now, open up Zoom, Meet or similar and start a new meeting, with the mic input set to your Bluetooth headphones.
    You should instantly hear the audio in your headphones flatten into mono and drop in audio quality to about 10% of what it was.

Users can get round this by using wired headphones, Bluetooth headphones that don't have a built-in mic (quite rare these days), or Bluetooth headphones with the computer's built-in microphone selected instead of the headphones' microphone. And yes, it does apply to AirPods too.

There’s no hardware way around this so it’s best dealt with by informing the users how to get round it.

Zoom

Zoom has become a very popular platform for live performance. Whilst it has been receiving updates that gradually make it better and better for this, it still has a number of quirks that make it tricky to utilise when you want to deliver high quality audio over it.

Broadcast side

Zoom provides many more audio options than most teleconferencing software: you can set your outgoing audio level manually (i.e. turn automatic gain control off) and adjust the echo cancellation and noise reduction features. You can also broadcast in stereo.

There are two approaches to sending audio into Zoom - via the microphone, and via screen sharing. Each of these systems has its own quirks.

Using the mic input

Zoom has features called Original Sound for Musicians and High Fidelity Music Mode, which broadcast the audio at maximum quality and allow you to disable noise reduction and/or echo cancellation. You can enable these in settings, and then once you're in a meeting you can turn Original Sound for Musicians on. You used to be able to set this to be on all the time, but now you have to turn it on every time you start or join a meeting. If you are on a business account, you may need to log into the Zoom website and enable the setting "Allow users to select original sound in their client settings".

There are many reasons we might want the audience's mics muted whilst we are broadcasting, and one more is that Zoom will dip the volume of your broadcast when someone new speaks.

If you want to broadcast in stereo, you need to enable "Allow users to select stereo audio in their client settings" in the settings of the meeting host's Zoom account. This setting is only accessible via the Zoom website, not in the Zoom app, and it needs to be changed before the meeting is started to take effect. Users will then be able to enable stereo audio in the settings of their client app (it is mono by default). There is an especially fiddly step where users have to log out of their Zoom account and then log back in (quitting and restarting does not work) before the "Enable stereo audio" option becomes visible in the Audio settings of their Zoom app. Note that if you are broadcasting in stereo, any recordings made with Zoom will be in mono. Also see below regarding computers versus phones and tablets.

Another quirk with Zoom and stereo audio is that the Zoom app gets very confused by multi-channel sound cards. If Zoom sees you have a mono or stereo sound card connected it seems fine. If you have a sound card with more than 2 mic/line inputs, it reverts to mono sound.
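Because stereo depends on several settings lining up at once, it can help to treat it as a pre-show checklist. The sketch below encodes the conditions described in this section and in the audience-side notes further down. The field names are hypothetical, and the two-input limit is based on the behaviour observed above.

```python
from dataclasses import dataclass


@dataclass
class StereoSetup:
    # Hypothetical field names; each maps to a step described in the text.
    host_account_allows_stereo: bool   # set on the Zoom website before the meeting starts
    sender_stereo_enabled: bool        # "Enable stereo audio" in the sender's Zoom app
    sender_input_count: int            # mic/line inputs on the sender's sound card
    listener_stereo_enabled: bool      # set in the listener's app after signing out and back in
    listener_on_computer: bool         # phone/tablet apps receive mono


def stereo_blockers(setup: StereoSetup) -> list[str]:
    """Return the reasons (if any) why a listener would end up hearing mono."""
    blockers = []
    if not setup.host_account_allows_stereo:
        blockers.append("Host account does not allow stereo audio")
    if not setup.sender_stereo_enabled:
        blockers.append("Stereo audio not enabled in the sender's app")
    if setup.sender_input_count > 2:
        blockers.append("Sound card has more than 2 inputs; Zoom may fall back to mono")
    if not setup.listener_stereo_enabled:
        blockers.append("Stereo audio not enabled in the listener's app")
    if not setup.listener_on_computer:
        blockers.append("Listener is on a phone or tablet")
    return blockers


show = StereoSetup(True, True, sender_input_count=8,
                   listener_stereo_enabled=True, listener_on_computer=True)
for problem in stereo_blockers(show):
    print("-", problem)
```

Here everything is set correctly except the 8-input sound card, so that is the only item reported.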

Screen Sharing

Zoom has some quite complex and useful features for sharing screens. When you first enable screen sharing with audio, Zoom will install a new audio driver on your computer called Zoom Audio Driver. This software has no settings you can adjust.

In Logic, QLab, Ableton, etc., select Zoom Audio Driver as the output device for that software. You could also use software like Loopback to route audio from a sound card to the Zoom Audio Driver.

In Zoom, click Share, and choose which screen or window you want to share. In the bottom left, click Share Sound > Stereo (high fidelity). You can also Share Audio only. This means your Zoom call can carry on as it normally would, but audience members will hear high quality audio from your computer too. Audience members see a banner over their Zoom screen indicating that you are sharing your audio. This is great for you to provide a soundtrack to a live performance that is happening elsewhere.

Any audio from your application that is routed to the Zoom Audio Driver is also routed to the device you have specified as Zoom's audio output, so you can hear it through your headphones or other sound device.

Whilst screen sharing is very good, I have encountered some issues with audio software not being entirely happy routing to the Zoom Audio Driver, causing glitches and clicks on the broadcast feed. Others have found it stable, though, so it's worth testing with your set-up before broadcasting.

On the video front, screen sharing provides a high resolution, low frame rate screen capture, whereas the camera input is a low resolution but high frame rate video feed. When screen sharing, you can choose to "optimise for video", which switches the screen capture to high frame rate and low resolution.

Audience side

There is a very fiddly process by which audience members can enable stereo sound in the app. It's detailed here. Essentially, each audience member has to sign out of their account, then log back in to make the change.

Currently, you cannot receive stereo audio if you are using the tablet or phone version of the Zoom app; audience members need to be on a computer to receive stereo audio.

Obviously, on many occasions we want our listeners to keep their mics muted during a performance, but for shows with more interactivity that might not be the case. To avoid echoing and harsh noise reduction artefacts, we want users to wear headphones wherever possible, but see above regarding the Bluetooth Headset Profile Problem.

ZoomOSC

ZoomOSC is a great piece of software that can be utilised to manage a Zoom broadcast. It essentially allows you to remote control many of the functions of Zoom (muting, unmuting, pinning participants, etc) using OSC from software like QLab.
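OSC messages are small UDP packets, so it is easy to see what software like QLab is actually sending. The sketch below builds an OSC 1.0 message by hand using only the Python standard library. The address `/zoom/me/mute` and port 9090 are assumptions for illustration only; check the ZoomOSC documentation and its settings for the real command list and listening port.

```python
import socket
import struct


def _osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII bytes plus a null terminator, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *args) -> bytes:
    """Build a minimal OSC 1.0 message supporting int32, float32 and string arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not supported in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # big-endian 32-bit integer
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # big-endian 32-bit float
        elif isinstance(arg, str):
            tags += "s"
            payload += _osc_string(arg)
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg)!r}")
    return _osc_string(address) + _osc_string(tags) + payload


def send_osc(address: str, *args, host: str = "127.0.0.1", port: int = 9090) -> None:
    """Send one OSC message over UDP. Port 9090 is an assumed value; check ZoomOSC's settings."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message(address, *args), (host, port))
```

Calling `send_osc("/zoom/me/mute")` would send a single hypothetical mute command to a ZoomOSC instance on the same machine. In practice you would more likely fire these from QLab's network cues; building the packet by hand just makes the format visible: a padded address string, a padded type-tag string starting with a comma, then the big-endian arguments.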

What do I use?

For simple Zoom meetings and collaborative working, I use Zoom on my laptop (with this camera and microphone running into a video capture card) and screen sharing.

For seminars that might require more complexity, I've landed on a two-computer set-up. I typically have a laptop running my audio applications. The laptop, a microphone and a camera all route into an ATEM Mini Pro video switcher, so I can use the ATEM to switch between audio inputs and video inputs independently, and control audio levels independently. A second computer runs Zoom and picks up the output of the ATEM over USB as if it were a webcam. I have Zoom set up with Original Sound and High Fidelity Music Mode on, echo cancellation and noise reduction off, and the audio input level set manually. This gives me high quality audio and video into Zoom from all my devices. The compromise with this solution is that by bringing the laptop in as if it were a webcam, I'm prioritising frame rate over screen resolution: the frame rate is high, but the resolution can be low, and if my internet connection is slow, text on my screen can be almost unreadable. To get my mic sounding nice and loud, I have a compressor on the ATEM's mic input, pushing my mic level up to -14LUFS.
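For anyone unfamiliar with what that compressor is doing, the sketch below shows the static gain curve of a simple hard-knee compressor: below the threshold the signal passes unchanged, above it the output rises by only 1/ratio dB per dB of input, and make-up gain then lifts the whole signal. It ignores attack and release timing and is not the ATEM's actual implementation; the threshold, ratio and make-up values are arbitrary examples.

```python
def compressor_output_db(input_db: float, threshold_db: float = -20.0,
                         ratio: float = 4.0, makeup_db: float = 6.0) -> float:
    """Static hard-knee compressor curve (levels in dB), ignoring attack and release."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    if input_db <= threshold_db:
        compressed = input_db  # below threshold: no gain reduction
    else:
        # above threshold: every extra dB of input adds only 1/ratio dB of output
        compressed = threshold_db + (input_db - threshold_db) / ratio
    return compressed + makeup_db


for level in (-40.0, -20.0, -10.0, 0.0):
    print(f"{level:6.1f} dB in -> {compressor_output_db(level):6.1f} dB out")
```

With these settings a 40 dB spread of input levels (-40 to 0) comes out as a 25 dB spread (-34 to -9): quiet passages are lifted by the make-up gain while peaks are held back. In practice you would adjust the make-up gain while watching a loudness meter to land near the -14LUFS target.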