Spatial Conversations in Times of Social Distancing
As we shift from the novelty of working from home to the normalization of social distancing in all aspects of life, Mile Two has started holding social events over teleconference. In addition to our regular meetings, which have all moved online, we’ve been holding team lunches, happy hours, music bingo, gaming parties, and more over video calls. All this video conferencing has not only helped us get our work done but also let us find solace in the relationships we have built with our coworkers, and it has even led to some new ones. Throughout this ordeal, many of us have found that we feel just as close to friends and family in other parts of the country or world as we do to those in our own towns, and the same goes for coworkers in different cities and states. Teleconferencing has become an equalizer that connects us regardless of location; however, it is not a substitute for the organic nature of group conversations that happen in person.
One issue we’ve found in large groups, especially at social events, is the loss of multiple conversations happening at once. For instance, one speaker mentions a topic, several smaller groups pick it up and discuss it at the same time, and their ideas eventually percolate back into the larger group. This already happens in most in-person situations; think of the last time you were at a bar with a group. Limited by your own attention and the reach of your voice, you interact with the people closest to you and hold smaller conversations that others can overhear and join. What if we could replicate this in a teleconference?
A Lesson from Gaming
Several existing services can already do this. One example is avatar-based environments such as massively multiplayer online role-playing games (MMORPGs) and their less “gamey” counterparts, such as Second Life and its open-source alternative, OpenSim. The feature these virtual worlds share is spatial audio (or 3D audio): by introducing small timing and volume differences between the left and right stereo channels, they essentially trick your mind into believing a sound originates from a specific location in the world. As of this writing, in our current period of social isolation, this feature is missing from most of the major teleconferencing vendors. I’m looking at you, Google Meet, MS Teams/Skype, and yes, even you, Zoom.
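To make the timing cue concrete, here is a minimal TypeScript sketch of delay-based panning only. It assumes Woodworth's spherical-head approximation for the interaural time difference, ITD = (r / c)(θ + sin θ), with a hypothetical head radius of 8.75 cm and a speed of sound of 343 m/s; `interauralDelaySeconds` and `spatializeMono` are illustrative names, not part of any prototype or library. Real engines (including the Web Audio API's HRTF panning) also shape level and frequency content, so a delay alone mainly conveys left/right position.

```typescript
// Hypothetical constants for a spherical-head model (assumptions, not measurements).
const HEAD_RADIUS_M = 0.0875;
const SPEED_OF_SOUND_M_PER_S = 343;

/**
 * Interaural time difference via Woodworth's approximation.
 * azimuth is in radians: 0 = straight ahead, +PI/2 = hard right, -PI/2 = hard left.
 * Returns seconds; a positive value means the sound reaches the right ear first.
 */
function interauralDelaySeconds(azimuth: number): number {
  // The approximation covers the frontal half-plane, so clamp to [-PI/2, PI/2].
  const theta = Math.max(-Math.PI / 2, Math.min(Math.PI / 2, azimuth));
  return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_PER_S) * (theta + Math.sin(theta));
}

/**
 * Turns a mono signal into a stereo pair by delaying the ear farther from the source.
 * Only the timing cue is modeled; no level or spectral (HRTF) cues are applied.
 */
function spatializeMono(
  mono: Float32Array,
  azimuth: number,
  sampleRate: number,
): { left: Float32Array; right: Float32Array; delaySamples: number } {
  const itd = interauralDelaySeconds(azimuth);
  const delaySamples = Math.round(Math.abs(itd) * sampleRate);
  // Both channels are lengthened so the delayed copy is not truncated.
  const left = new Float32Array(mono.length + delaySamples);
  const right = new Float32Array(mono.length + delaySamples);
  // Source on the right (itd > 0): the left ear hears it later, and vice versa.
  left.set(mono, itd > 0 ? delaySamples : 0);
  right.set(mono, itd < 0 ? delaySamples : 0);
  return { left, right, delaySamples };
}

// Usage: a single-sample click placed hard right at 48 kHz.
const click = new Float32Array([1]);
const { left, right, delaySamples } = spatializeMono(click, Math.PI / 2, 48000);
console.log(delaySamples); // 31 (roughly 0.66 ms)
console.log(right[0], left[delaySamples]); // 1 1
```

At 48 kHz the largest delay this model produces is only about 31 samples, which is why such a small shift is enough for the brain to place a sound to one side.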
So what do you do as a software developer with a problem and a bit of time on your hands? Yup, you mock it up! Here it is: https://github.com/MileTwo/spatial_audio.
The prototype demonstrates that it’s possible to control spatial audio on the client, with relative ease, using native browser APIs. It works as follows:
- The browser captures audio using getUserMedia and sends it out in chunks to the backend.
- The backend just forwards the audio along to any listeners.
- When playing back audio, the frontend uses an AudioContext and positions each sound in space with a PannerNode before sending it to the audio output.
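The playback step can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the prototype's code: it assumes a top-down 2D "room" where each participant has pixel coordinates, a hypothetical scale of 0.02 meters per pixel, and a listener left at the Web Audio default pose (at the origin, facing -Z). `roomToAudioSpace`, `placeSpeaker`, `PannerLike`, and `ParamLike` are hypothetical helpers; `PannerNode.positionX/Y/Z` and `AudioParam.setValueAtTime` are the standard Web Audio members they target.

```typescript
interface RoomPoint { x: number; y: number } // pixels in a top-down room view
interface Vec3 { x: number; y: number; z: number } // meters in Web Audio space

// Minimal structural types so the sketch runs outside a browser; a real
// PannerNode (whose positionX/Y/Z are AudioParams) satisfies PannerLike.
interface ParamLike { setValueAtTime(value: number, startTime: number): unknown }
interface PannerLike { positionX: ParamLike; positionY: ParamLike; positionZ: ParamLike }

const METERS_PER_PIXEL = 0.02; // hypothetical scale

/**
 * Maps a speaker's position relative to the local listener into Web Audio coordinates.
 * The listener stays at the origin facing -Z, so "up" on screen becomes "in front" (-Z)
 * and "right" on screen becomes +X. Height (Y) is kept at 0.
 */
function roomToAudioSpace(
  speaker: RoomPoint,
  listener: RoomPoint,
  metersPerPixel: number = METERS_PER_PIXEL,
): Vec3 {
  return {
    x: (speaker.x - listener.x) * metersPerPixel,
    y: 0,
    // Screen y grows downward, so a speaker above the listener gets a negative z (in front).
    z: (speaker.y - listener.y) * metersPerPixel,
  };
}

/** Schedules the panner's position change at the given AudioContext time (seconds). */
function placeSpeaker(panner: PannerLike, position: Vec3, when: number): void {
  panner.positionX.setValueAtTime(position.x, when);
  panner.positionY.setValueAtTime(position.y, when);
  panner.positionZ.setValueAtTime(position.z, when);
}

// In a browser, the wiring around these helpers might look like this
// (sourceNode, speakerPos, and myPos are hypothetical):
//   const ctx = new AudioContext();
//   const panner = new PannerNode(ctx, { panningModel: "HRTF", distanceModel: "inverse" });
//   sourceNode.connect(panner).connect(ctx.destination);
//   placeSpeaker(panner, roomToAudioSpace(speakerPos, myPos), ctx.currentTime);
```

For example, `roomToAudioSpace({ x: 150, y: 50 }, { x: 100, y: 100 })` yields `{ x: 1, y: 0, z: -1 }`: one meter to the right and one meter ahead. Keeping the listener fixed and moving every speaker relative to it keeps the math simple; the alternative is to move the AudioListener instead, which a real app could do if it also needs to rotate the listener.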
Since this is a quick prototype, it has a number of areas to improve:
- There is a huge amount of lag.
- The audio is not segmented to allow for real-time streaming, and it uses up tons of memory.
- Without server-side processing, the number of audio streams grows quadratically with the number of participants.
- The UI is rough and does not use the best frame of reference to visualize a 3D space.
- The frontend framework is not used to its full potential.
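The scaling problem is easiest to see by counting streams. Below is a hypothetical in-memory stand-in, written in TypeScript, for a forward-only backend like the prototype's, which just passes audio along to listeners: every chunk a participant publishes is delivered to every other participant, so with n people talking the server sends n(n - 1) streams. That grows with the square of the group size (90 streams for 10 people, 9,900 for 100). `ForwardingRelay` and `forwardedStreams` are illustrative names, not the prototype's backend API.

```typescript
type Chunk = Uint8Array; // an encoded audio chunk, as sent by a browser
type Listener = (fromId: string, chunk: Chunk) => void;

/** Forward-only relay: no decoding or mixing, just fan-out to everyone except the sender. */
class ForwardingRelay {
  private readonly listeners = new Map<string, Listener>();
  /** Total number of chunk deliveries made so far, for illustration. */
  deliveries = 0;

  join(id: string, onChunk: Listener): void {
    this.listeners.set(id, onChunk);
  }

  publish(fromId: string, chunk: Chunk): void {
    this.listeners.forEach((onChunk, id) => {
      if (id === fromId) return; // don't echo a speaker's own audio back
      onChunk(fromId, chunk);
      this.deliveries++;
    });
  }
}

/** Streams the server must send out when all n participants talk at once. */
function forwardedStreams(n: number): number {
  return n * (n - 1);
}

// Usage: 4 participants each publish one chunk.
const relay = new ForwardingRelay();
const received = new Map<string, number>();
for (const id of ["a", "b", "c", "d"]) {
  received.set(id, 0);
  relay.join(id, () => received.set(id, (received.get(id) ?? 0) + 1));
}
for (const id of ["a", "b", "c", "d"]) relay.publish(id, new Uint8Array([0]));
console.log(relay.deliveries, forwardedStreams(4)); // 12 12
console.log(forwardedStreams(10), forwardedStreams(100)); // 90 9900
```

By contrast, a server that mixed audio would send each listener a single stream (n in total), at the cost of doing the spatial processing per listener on the server rather than in the browser.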
Hopefully, this will motivate someone to look into adding spatial audio to modern teleconferencing software, or inspire others to explore some of the browser APIs and mess around with media.
Connect With Us
Share what lessons your team has learned while transitioning to a fully remote team.