Research articles

Real World Music Object Recognition

Authors:

Abstract

We present solutions to two of the most pressing issues in contemporary optical music recognition (OMR). We improve recognition accuracy on low-quality, real-world input data (e.g. containing ageing, lighting, or dirt artefacts) and provide confidence-rated model outputs to enable efficient human post-processing. Specifically, we present (i) a sophisticated input augmentation scheme that can reduce the gap between sanitised benchmarks and realistic tasks through a combination of synthetic data and noisy perturbations of real-world documents; (ii) an adversarial discriminative domain adaptation method that can be employed to improve the performance of OMR systems on low-quality data; and (iii) a combination of model ensembles and prediction fusion, which generates trustworthy confidence ratings for each prediction. We evaluate our contributions on a newly created test set consisting of manually annotated pages of varying real-world quality, sourced from the International Music Score Library Project (IMSLP)/Petrucci Music Library. With the presented data augmentation scheme, we double detection performance on noisy real-world data from 36.0% to 73.3% compared to state-of-the-art training. This result is then combined with robust confidence ratings, paving the way for OMR to be deployed in the real world. Additionally, we show the merits of unsupervised adversarial domain adaptation for OMR, which raises the 36.0% baseline to 48.9%.

All our code and data are freely available at: https://github.com/raember/s2anet/tree/TISMIR_publication.
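To make contribution (iii) more concrete, the following is a minimal sketch of how predictions from several ensemble members could be fused into one confidence-rated output. It is an illustration under stated assumptions, not the method implemented in the repository above: the names `Detection`, `iou`, and `fuse` are hypothetical, boxes are axis-aligned for simplicity, and the confidence rule (fraction of agreeing models times their mean score) is an assumed heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2); axis-aligned (assumption)
    label: str
    score: float  # a single model's own score in [0, 1]


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(per_model, iou_threshold=0.5):
    """Greedily cluster same-label detections across models and rate each cluster.

    per_model: one list of Detection objects per ensemble member.
    Returns one fused Detection per cluster; its score is the assumed
    confidence rating: (models agreeing / models total) * mean member score.
    """
    n_models = len(per_model)
    clusters = []  # each cluster is a list of (model_index, Detection)
    tagged = [(m, d) for m, dets in enumerate(per_model) for d in dets]
    # Visit the highest-scoring detections first so they seed the clusters.
    for m, det in sorted(tagged, key=lambda t: -t[1].score):
        for cluster in clusters:
            seed = cluster[0][1]
            already_voted = any(cm == m for cm, _ in cluster)
            if (not already_voted and seed.label == det.label
                    and iou(seed.box, det.box) >= iou_threshold):
                cluster.append((m, det))
                break
        else:
            clusters.append([(m, det)])

    fused = []
    for cluster in clusters:
        dets = [d for _, d in cluster]
        agreement = len(dets) / n_models
        mean_score = sum(d.score for d in dets) / len(dets)
        # Average the member boxes coordinate by coordinate.
        box = tuple(sum(d.box[i] for d in dets) / len(dets) for i in range(4))
        fused.append(Detection(box, dets[0].label, agreement * mean_score))
    return fused
```

Under this heuristic, a symbol found by most ensemble members keeps a high rating, whereas a detection reported by a single member is down-weighted, so a human corrector could review low-rated predictions first.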

Keywords:

Optical Music Recognition, Deep Learning, Data Augmentation, Adversarial Training, Model Ensembles, Open Data
  • Year: 2024
  • Volume: 7 Issue: 1
  • Page/Article: 1–14
  • DOI: 10.5334/tismir.157
  • Submitted on 6 Dec 2022
  • Accepted on 31 Jul 2023
  • Published on 11 Jan 2024
  • Peer Reviewed