
Journal article

BabyDS: Visually Grounded Grammar Induction with Online Curriculum Learning

Arash Ashrafzadeh, Julian Hough, Arash Eshghi

Languages, Volume: 11, Issue: 5, Start page: 99

Swansea University Author: Julian Hough

  • languages-11-00099.pdf

    PDF | Version of Record

    © 2026 by the authors. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.



Published in: Languages
ISSN: 2226-471X
Published: MDPI 2026

URI: https://cronfa.swan.ac.uk/Record/cronfa72028
Abstract: Recent research in grounded language learning has seen remarkable success due to advances in large vision and language models (VLMs). However, these models (i) are extremely costly to train and update; (ii) struggle with generalisation; and (iii) do not support continual learning. In this paper, we introduce baby-ds, which integrates the Dynamic Syntax (DS) framework with automated planning, using the multimodal BabyAI platform as a testbed. We provide methods whereby DS lexicons are induced continually from teacher demonstrations within BabyAI. We study (i–iii) by varying the compositional complexity of the natural language instructions in the data, comparing the data efficiency, generalisation, and continual learning properties of baby-ds with those of a simple neural model. The results show that the baby-ds model (i) needs much less data than the neural model to reach threshold performance; (ii) generalises much faster to more complex instructions; and (iii) is a more effective continual learner. We argue that it is the attendant linguistic bias within DS and the rich inferential power of Type Theory with Records (TTR) that enable (i–iii), highlighting the importance of further research on hybrid grammar–neural approaches. Finally, we discuss several important limitations of baby-ds and sketch a path forward for further DS research.
Keywords: grammar induction; neural semantic parsing; computational semantics; grounded language learning
College: Faculty of Science and Engineering
Funders: Hough’s work is supported by the EPSRC grant EP/X009343/1 ‘FLUIDITY’.
Issue: 5
Start Page: 99