Skip to content

'All possible sounds': speech, music, and the emergence of machine listening

Date:2023
Format:Journal article
Credit:James E. K. Parker with Sean Dockray
Index: writing

attachments/Pasted image 20250630135527.png

Abstract

“Machine listening” is one common term for a fast-growing interdisciplinary field of science and engineering that “uses signal processing and machine learning to extract useful information from sound”. This article contributes to the critical literature on machine listening by presenting some of its history as a field. From the 1940s to the 1990s, work on artificial intelligence and audio developed along two streams. There was work on speech recognition/understanding, and work in computer music. In the early 1990s, another stream began to emerge. At institutions such as MIT Media Lab and Stanford’s CCRMA, researchers started turning towards “more fundamental problems of audition”. Propelled by work being done by and alongside musicians, speech and music would increasingly be understood by computer scientists as particular sounds within a broader “auditory scene”. Researchers began to develop machine listening systems for a more diverse range of sounds and classification tasks: often in the service of speech recognition, but also increasingly for their own sake. The soundscape itself was becoming an object of computational concern. Today, the ambition is “to cover all possible sounds”. That is the aspiration with which we must now contend politically, and which this article sets out to historicise and understand.

Download

Full text available here

Reference

attachments/sound-studies.jpg