The estimation of a music recording's global tempo is a classic Music Information Retrieval (MIR) task. It is often defined as estimating the frequency with which humans tap along to the beat (Scheirer, 1998; Dixon, 2001). In contrast to beat-tracking (Allen and Dannenberg, 1990; Goto and Muraoka, 1994) or local tempo estimation (Peeters, 2005), successful global tempo estimation requires the existence of a stable tempo, as often occurs in Rock, Pop, or Dance music. To conduct a basic evaluation of a global tempo estimation system, one needs the system itself, test recordings with globally stable tempo, suitable annotations, and at least one metric.
Starting with the work of Goto and Muraoka (1994) and Scheirer (1998), the MIR research community has been conducting such evaluations for 25 years. Acknowledging the importance of making results comparable, the first systematic evaluation with a defined set of metrics and datasets was conducted in 2004 (Gouyon et al., 2006). One year later, the 2005 Music Information Retrieval Evaluation eXchange (MIREX) (Downie, 2008) established an automatic tempo extraction task, which has been conducted almost every year since. Through both the datasets and metrics established in 2004 and for MIREX, we have seen global tempo estimation systems improve and have been able to track their performance. In the meantime, new datasets have been published and another large-scale evaluation has been conducted (Zapata and Gómez, 2011), but neither applications nor metrics have been fundamentally questioned or updated. This is why recent near-perfect MIREX results (Böck et al., 2015; Schreiber and Müller, 2018b) beg the question: are we done yet?
In this work, we critically discuss the evaluation of global tempo estimation systems. We do so based on the idea that applications lead to use cases that define who the users are, how they use the system, in what context, and for what purpose (Schedl et al., 2013). The combination of these elements determines the success criteria to evaluate systems and judge whether the task is indeed solved (Sturm et al., 2014; Sturm, 2016). This kind of evaluation also allows us to acquire new knowledge and advance the field (Serra et al., 2013, p. 31), if experiments are followed by interpretation of results, learning, system improvement, and eventually re-evaluation or even re-definition of the task or the evaluation methodology (Figure 1). This is referred to as the research cycle (Urbano et al., 2013; Sturm, 2016). For it to succeed, we need to be able to conduct analyses of all parts of the evaluation process: task definition, data, metrics, systems, and analysis.
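
As an illustration of the ingredients of a basic evaluation (a system's estimates, reference annotations, and at least one metric), the following is a minimal sketch of a tolerance-based accuracy, of the kind often reported in the tempo estimation literature as Accuracy1 and Accuracy2. The text above does not specify any metric, so the 4% tolerance, the set of allowed tempo factors, the function names, and the toy tempo values are all assumptions made for this example.

```python
from typing import Sequence

# Assumption: factors that count as "octave errors" in the lenient variant.
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)


def is_correct(
    estimate: float,
    reference: float,
    tolerance: float = 0.04,  # assumption: 4% relative tolerance
    factors: Sequence[float] = (1.0,),
) -> bool:
    """Return True if the estimate lies within the relative tolerance of
    the reference tempo multiplied by any of the allowed factors."""
    return any(
        abs(estimate - f * reference) <= tolerance * f * reference
        for f in factors
    )


def accuracy(
    estimates: Sequence[float],
    references: Sequence[float],
    tolerance: float = 0.04,
    factors: Sequence[float] = (1.0,),
) -> float:
    """Fraction of recordings whose estimated tempo counts as correct."""
    if len(estimates) != len(references):
        raise ValueError("estimates and references must have equal length")
    if not references:
        raise ValueError("at least one recording is required")
    hits = sum(
        is_correct(e, r, tolerance, factors)
        for e, r in zip(estimates, references)
    )
    return hits / len(references)


# Hypothetical toy data in beats per minute (not taken from any dataset).
references = [120.0, 90.0, 140.0, 100.0]
estimates = [121.0, 180.0, 70.0, 110.0]

strict = accuracy(estimates, references)
lenient = accuracy(estimates, references, factors=OCTAVE_FACTORS)
print(f"strict accuracy:  {strict:.2f}")
print(f"lenient accuracy: {lenient:.2f}")
```

In this toy example the strict variant accepts only the first estimate, while the lenient variant also accepts the double-tempo and half-tempo estimates. A single summary number of this kind hides which errors occurred and for which recordings, which is one reason to examine the metric itself and not only the systems it ranks.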