Conference Paper/Proceeding/Abstract

Cultivating Spoken Language Technologies for Unwritten Languages

Thomas Reitmaier, Dani Kalarikalayil Raju, Ondrej Klejch, Electra Wallington, Nina Markl, Jen Pearson, Matt Jones, Peter Bell, Simon Robinson

Proceedings of the CHI Conference on Human Factors in Computing Systems

Swansea University Authors: Thomas Reitmaier, Jen Pearson, Matt Jones, Simon Robinson

  • 65595.VoR.pdf

    PDF | Version of Record

    This work is licensed under a Creative Commons Attribution International 4.0 License.

DOI (Published version): 10.1145/3613904.3642026

Published in: Proceedings of the CHI Conference on Human Factors in Computing Systems
ISBN: 979-8-4007-0330-0
Published: New York, NY, USA ACM 2024
URI: https://cronfa.swan.ac.uk/Record/cronfa65595
Abstract: We report on community-centered, collaborative research that weaves together HCI, natural language processing, linguistic, and design insights to develop spoken language technologies for unwritten languages. Across three visits to a Banjara farming community in India, we use participatory, technical, and creative methods to engage community members, collect spoken language photo annotations, and develop an information retrieval (IR) system. Drawing on orality theory, we interrogate assumptions and biases of current speech interfaces and create a simple application that leverages our IR system to match fluidly spoken queries with recorded annotations and surface corresponding photos. In-situ evaluations show how our novel approach returns reliable results and inspired the co-creation of media retrieval use-cases that are more appropriate in oral contexts. The very low (< 4 h) spoken data requirements make our approach adaptable to other contexts where languages are unwritten or have no digital language resources available.
Keywords: speech/language, zero-resource information retrieval, co-creation field study
College: Faculty of Science and Engineering