torchaudio.load does not load WebM from BytesIO or _io.BufferedReader, but loads the same file from the hard drive

🐛 Describe the bug

I have a websocket that receives chunks of data as bytes. The browser encodes the data in audio/webm format. The code looks like the following:

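from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_bytes()
            # Write the latest message to disk, overwriting the previous one
            # (the browser sends audio/webm)
            with open('audio.wav', mode="wb") as f:
                f.write(data)
    except Exception as e:
        raise Exception(f'Could not process audio: {e}') from e
    finally:
        await websocket.close()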

Writing the data to audio.wav and then loading the file by path with the following code works without errors:

import torchaudio

array, sr = torchaudio.load("audio.wav")

However, reading the file as a file object does not work:

with open("audio.wav", mode="rb") as f:
    torchaudio.load(f)

It raises the following error:

Exception: Could not process audio: Failed to open the input "<_io.BufferedReader name='audio.wav'>" (Invalid data found when processing input).
INFO:     connection closed

PS: Wrapping the data in io.BytesIO and passing it to torchaudio.load raises the same error.

Versions

Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[conda] numpy 1.23.4 pypi_0 pypi

OS

Ubuntu 22.04, torchaudio backend: "sox_io"

PS

I tested the same process on a WebM file converted from a WAV file, and the result was the same:

torchaudio.load can read the file from the hard drive.
torchaudio.load cannot read it from a BytesIO or _io.BufferedReader.

Top GitHub Comments


mthrok commented, Oct 25, 2022

I think you could do chunk-by-chunk decoding, which is more efficient, but I am not sure this is what you want, as I do not know what application you are building.

To do chunk-by-chunk decoding, you can wrap the socket object in a synchronous file-like object.

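import asyncio

class Wrapper:
    def __init__(self, socket):
        self.socket = socket
        self.buffer = b''
        # Capture the running event loop; create the wrapper inside the async endpoint
        self.loop = asyncio.get_running_loop()

    def read(self, n):
        # Synchronous read for StreamReader. It blocks, so call it from a worker
        # thread, never from the event-loop thread itself (that would deadlock).
        while len(self.buffer) < n:
            future = asyncio.run_coroutine_threadsafe(self.socket.receive_bytes(), self.loop)
            new_data = future.result()
            if not new_data:
                break
            self.buffer += new_data
        data, self.buffer = self.buffer[:n], self.buffer[n:]
        return data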

Then pass it to StreamReader and let StreamReader pull the data.

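import asyncio
import torchaudio

def decode(wrapper):
    s = torchaudio.io.StreamReader(wrapper)
    s.add_basic_audio_stream(frames_per_chunk=4800)  # e.g. 0.1 s at 48 kHz
    for (chunk,) in s.stream():
        print(chunk.shape)

try:
    # In websocket_endpoint: read() blocks, so decode in a worker thread
    await asyncio.get_running_loop().run_in_executor(None, decode, Wrapper(websocket))
except Exception as e:
    raise Exception(f'Could not process audio: {e}') from e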
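The buffering in the wrapper's read(n) can be exercised without a live WebSocket. The sketch below uses a hypothetical synchronous stand-in, ChunkedReader (not part of torchaudio or FastAPI), that serves read(n) from an iterable of byte chunks, with each chunk playing the role of one received message. Like the wrapper, it assumes an empty chunk marks the end of the stream.

```python
from typing import Iterable, Iterator


class ChunkedReader:
    """Hypothetical file-like adapter serving read(n) from byte chunks.

    Each chunk stands in for one WebSocket message.
    """

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks: Iterator[bytes] = iter(chunks)
        self._buffer = b""

    def read(self, n: int = -1) -> bytes:
        # n < 0 means "read everything", as with ordinary file objects
        while n < 0 or len(self._buffer) < n:
            chunk = next(self._chunks, b"")
            if not chunk:  # source exhausted: return whatever is buffered
                break
            self._buffer += chunk
        if n < 0:
            n = len(self._buffer)
        data, self._buffer = self._buffer[:n], self._buffer[n:]
        return data


reader = ChunkedReader([b"abc", b"defg", b"h"])
print(reader.read(2))   # b'ab'
print(reader.read(4))   # b'cdef'
print(reader.read(10))  # b'gh'
print(reader.read(10))  # b''
```

Reads never block on more data than requested, and a short read only happens once the source is exhausted. Concatenating immutable bytes keeps the sketch simple; for long streams a bytearray would avoid some copying.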


mthrok commented, Oct 25, 2022

Hi @pooya-mohammadi

The audio you shared has a .wav extension, but it is in fact in WebM format.

with open("audio.wav", "rb") as f:
    print(f.read(50)[30:])

prints the following:

b'\x84webmB\x87\x81\x02B\x85\x81\x02\x18S\x80g\x01\xff\xff'

and ffprobe audio.wav reports:

Input #0, matroska,webm, from 'audio.wav':
  Metadata:
    encoder         : QTmuxingAppLibWebM-0.0.1
  Duration: N/A, start: -0.001000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)
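The manual header check above can be turned into a small helper. This is a minimal sketch with a hypothetical function, sniff_container. It assumes the file begins with an EBML header (magic bytes 1A 45 DF A3, shared by Matroska and WebM) whose DocType element (ID 42 82) stores its length in one byte, such as the 0x84 before "webm" in the bytes printed above. It also recognizes the RIFF/WAVE header of a genuine WAV file.

```python
EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # EBML header ID used by Matroska and WebM


def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(EBML_MAGIC):
        # DocType element: ID 0x4282, one-byte size, then the type name
        if b"\x42\x82\x84webm" in header:
            return "webm"
        if b"\x42\x82\x88matroska" in header:
            return "matroska"
        return "ebml"
    # A real WAV file starts with "RIFF", a 4-byte size, then "WAVE"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

For example, calling sniff_container(f.read(64)) on the shared audio.wav should return "webm", given the header bytes printed above. A multi-byte DocType size, which the EBML format permits, would fall through to "ebml".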

torchaudio.load first attempts to read the input with libsox, which fails because WebM is not supported. It then retries with FFmpeg, but only when the source is a file path. It cannot retry when the input is a file-like object, because the first attempt has already consumed some of the data and rewinding requires a seek method, which is not always available.
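When the caller controls the input, it can do the rewinding itself. This is a minimal sketch of that user-side workaround with a hypothetical helper, load_with_fallback, that takes the two decoders as arguments. If the object has no seek support, its bytes are first copied into an io.BytesIO, which is seekable. That costs holding the whole input in memory and gives up streaming.

```python
import io
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def load_with_fallback(
    fileobj: BinaryIO,
    primary: Callable[[BinaryIO], T],
    fallback: Callable[[BinaryIO], T],
) -> T:
    """Try `primary`; on failure, rewind and retry with `fallback`."""
    # Custom file-like objects may lack seekable(); treat that as "cannot seek"
    if not getattr(fileobj, "seekable", lambda: False)():
        fileobj = io.BytesIO(fileobj.read())
    start = fileobj.tell()
    try:
        return primary(fileobj)
    except Exception:
        # The failed attempt may have consumed data: rewind before retrying
        fileobj.seek(start)
        return fallback(fileobj)
```

With torchaudio, primary might be torchaudio.load and fallback a function that decodes through torchaudio.io.StreamReader and returns the same (waveform, sample_rate) pair. That pairing is an assumption and has not been checked against torchaudio 0.12.1.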

To handle WebM, you can use torchaudio.io.StreamReader. It works with both file paths and file-like objects, and it can also read the audio iteratively.

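import torchaudio

path = "audio.wav"  # WebM data, despite the extension
chunk_size = 4800   # frames per chunk, e.g. 0.1 s at 48 kHz

# Load from a path and read the entire audio in one go
s = torchaudio.io.StreamReader(path)
s.add_basic_audio_stream(-1)
s.process_all_packets()
waveform, = s.pop_chunks()

# Load from a file-like object and read the audio chunk by chunk
with open(path, "rb") as f:
    s = torchaudio.io.StreamReader(f)
    s.add_basic_audio_stream(chunk_size)
    for chunk, in s.stream():
        print(chunk.shape)  # process each (frames, channels) chunk here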

For detailed usage, please check out tutorials like