Tuesday, 5 February 2008

Re: Bach fugue project. Kirlin and Utgoff 2005: Learning to segregate voices

As the only paper I have found which incorporates some form of learning to segregate voices by training on examples, rather than by following rules, this is the work most similar to mine. However, the authors train their system on only very small amounts of data: they use one piece (Bach's Ciaccona) for both training and testing, choosing small sections (circa 4-8 bars) from this piece to train the system on. The training sections are selected for their similarity to the testing sections chosen from the same piece. In the later experimentation there is some effort to combine the earlier training sections into one large training section. However, this still uses a relatively small amount of training music, and still uses the same piece, so there is a constant underlying tonality, style and set of basic thematic ideas. Some overfitting may well occur to that one specific piece. How would this generalise to other similar works by Bach? Or other Baroque pieces? Or more widely across the musical spectrum? Could their system cope with larger amounts of training data, and if so, why haven't they used more?

It is good to see the authors state that they have found no other voice-segregation methods that use automatic learning techniques; neither have I (yet).

This method learns to identify which voice a note belongs to by observing its pitch relationship with the notes in the previous time-slot. They examine the piece in smaller windows rather than as one large set, using a fixed window size (the exact size is varied across the experiments to see which gives the best results).
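As I understand it, the features are relational: each note is described by its pitch distance to the notes sounding in the previous time-slot, computed window by window. A minimal sketch of that kind of feature extraction, with the function names, window scheme and toy score all my own invention rather than K+U's actual representation:

```python
# Hypothetical sketch: slice the piece into fixed-size windows of
# time-slots, then for each note compute its interval (in semitones)
# to every note in the previous slot.  Invented for illustration;
# this is not the paper's actual feature set.

def windows(slots, size):
    """Split a list of time-slots into consecutive fixed-size windows."""
    return [slots[i:i + size] for i in range(0, len(slots), size)]

def interval_features(window):
    """For each note, list its intervals to the previous slot's notes."""
    feats = []
    for prev, curr in zip(window, window[1:]):
        for pitch in curr:
            feats.append([pitch - p for p in prev])
    return feats

# Toy two-voice fragment: each slot is the set of MIDI pitches sounding.
piece = [[60, 67], [62, 65], [64, 67], [60, 72], [59, 74], [60, 72]]

for w in windows(piece, 3):
    print(interval_features(w))
```

Varying `size` here mimics their experimentation with different window sizes.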

Also: this system uses decision trees, so the answer to the question "does this note belong to voice v?" is always a single, discrete one. There is no measure of the likelihood of this answer being correct, not even a non-probabilistic, general estimate. What happens in this system if two simultaneous notes are assigned to the same (monophonic) voice? The authors are unclear on this point.
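To make the discrete-versus-graded contrast concrete, here is a toy illustration of my own (the single feature and the split threshold are invented, not taken from the paper): a decision-tree leaf yields a bare yes/no, whereas even crude per-leaf training counts would give some measure of confidence.

```python
# Hypothetical illustration of the hard-decision limitation.
# Feature (absolute interval to the voice's last note) and the
# threshold are invented for the sake of the example.

def tree_decision(interval):
    """Decision-tree style: a hard yes/no, with no confidence attached."""
    return abs(interval) <= 4   # leaf says "same voice" or not, nothing more

def leaf_estimate(interval, leaf_counts):
    """Keeping per-leaf training counts would give a rough likelihood."""
    leaf = 'small' if abs(interval) <= 4 else 'large'
    yes, no = leaf_counts[leaf]
    return yes / (yes + no)

# Toy counts of (same-voice, different-voice) training examples per leaf.
counts = {'small': (18, 2), 'large': (3, 7)}

print(tree_decision(2))          # a bare True, however marginal the case
print(leaf_estimate(2, counts))  # a graded answer instead
```

A graded answer like the second would also offer a way to resolve the clash case above: when two simultaneous notes are claimed by the same monophonic voice, assign the more likely one.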

Thinking about Cambouropoulos's points (Voice identification, 2006), it is interesting to see that this system makes no use of "vertical" information in its process (harmonic structure at any particular time-point, cf. the middleground level of Schenkerian analysis). So they do not use observations about which notes are sounding at the same time to guide their working at all.

Another difference between this system and mine is in the end-goal. My focus is on identifying the route that each voice takes throughout the entire course of the piece, assuming that voices are present (but not necessarily active) for the entirety of the music. K+U, on the other hand, take a lower-level approach, identifying fragments of the voices in selected bars, but allowing the voices to vary throughout the course of the piece.
