MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss

Abstract

Automatically generating symbolic music (music scores tailored to specific human needs) can be highly beneficial for musicians and enthusiasts. Recent studies have shown promising results using extensive datasets and advanced transformer architectures. However, these state-of-the-art models generally offer only basic control over attributes such as tempo and style for the entire composition, and cannot control finer details, such as individual bars. Fine-tuning a pre-trained symbolic music generation model might seem a straightforward way to achieve this finer control, but our research shows that this approach faces challenges: the model often fails to respond adequately to new, fine-grained bar-level control signals. To address this, we propose two solutions. First, we introduce a pre-training task that links control signals directly with the corresponding musical tokens, providing a more effective initialization for subsequent fine-tuning. Second, we introduce a novel counterfactual loss that promotes better alignment between the generated music and the control prompts. Together, these techniques significantly improve bar-level control of music generation, yielding a 13.06% improvement over conventional methods. Our subjective evaluations also confirm that this enhanced control does not compromise the musical quality of the original pre-trained generative model.

Figure 1: The two-strategy framework MuseBarControl uses to improve the controllability of the network.

Comparison with MuseCoco

Group 1

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 10 respondents (62.5%) felt that MuseBarControl more closely resembles human creation, while preferring MuseCoco and viewing the two as similar each drew 3 respondents (18.8%). This indicates that most participants favor MuseBarControl in terms of human-like creation.


Group 2

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 50.0% of participants believed that MuseCoco was more in line with human creation, while 37.5% chose MuseBarControl and 12.5% felt that the two were about the same. Half of the respondents therefore considered MuseCoco closer to the style of human creation: the largest share, though not a strict majority.


Group 3

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 75.0% of participants believed that MuseCoco was more aligned with human creation, far more than those who preferred MuseBarControl (18.8%) or found the two nearly identical (6.2%). For this pair, most listeners perceived more characteristics of human creation in MuseCoco's piece.


Group 4

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 87.5% (14 people) believed that MuseBarControl was more in line with human creation, far more than those who preferred MuseCoco (6.2%, 1 person) or felt the two were about the same (6.2%, 1 person). Most respondents therefore perceived more human-like creative elements in MuseBarControl's music.
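Every percentage in these comments is a vote count divided by the 16 valid questionnaires. As a minimal sketch (the helper `vote_shares` is hypothetical and not part of any released tooling), the following reproduces the Group 4 figures. Python's float formatting rounds exact ties half-to-even, so 1/16 = 6.25% prints as 6.2% and 3/16 = 18.75% as 18.8%, which is consistent with the rounding seen throughout this section.

```python
def vote_shares(counts: dict[str, int], total: int = 16) -> dict[str, str]:
    """Turn raw questionnaire votes into one-decimal percentage strings."""
    if sum(counts.values()) != total:
        raise ValueError(f"votes sum to {sum(counts.values())}, expected {total}")
    # Exact ties are rounded half-to-even by Python's formatter:
    # 6.25 -> "6.2", 18.75 -> "18.8", 43.75 -> "43.8".
    return {option: f"{100 * n / total:.1f}%" for option, n in counts.items()}


# Group 4 as reported: 14, 1 and 1 votes out of 16.
group4 = {"MuseBarControl": 14, "MuseCoco": 1, "About the same": 1}
print(vote_shares(group4))
# {'MuseBarControl': '87.5%', 'MuseCoco': '6.2%', 'About the same': '6.2%'}
```

The explicit sum check guards against the kind of accounting gap where the reported shares do not add up to all 16 responses.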


Group 5

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 56.2% of participants believed that MuseCoco was more aligned with human creation, making it the most preferred. Additionally, 25.0% of participants thought the two were about the same, while only 18.8% felt that MuseBarControl was more in line with human creation.


Group 6

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 7 participants (43.8%) believed that MuseBarControl was more aligned with human creation, while 5 participants (31.2%) preferred MuseCoco and 4 participants (25.0%) felt that the two were similar. MuseBarControl thus received the most votes, although not a majority of them.
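In Group 6 the leading option received 7 of 16 votes, which is a plurality rather than a majority; the same holds for an exact half, as with 8 of 16. A small sketch (the function name `leading_option` is a hypothetical illustration) makes the distinction mechanical: a majority requires strictly more than half of all votes.

```python
def leading_option(counts: dict[str, int]) -> tuple[str, str]:
    """Return the top option and whether it holds a majority or only a plurality."""
    total = sum(counts.values())
    # If several options tie for first place, max() returns the first one listed.
    top, votes = max(counts.items(), key=lambda item: item[1])
    # A majority needs strictly more than half of all votes.
    kind = "majority" if 2 * votes > total else "plurality"
    return top, kind


group6 = {"MuseBarControl": 7, "MuseCoco": 5, "About the same": 4}
print(leading_option(group6))  # ('MuseBarControl', 'plurality')
```

Comparing `2 * votes` with `total` keeps the test in integers, so an exact half (8 of 16) is classified as a plurality without any floating-point edge cases.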


Group 7

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 56.2% of participants believed that MuseBarControl is more in line with human creation, making it the most favored option. Only 18.8% felt that MuseCoco was more suitable, while 25.0% chose "almost the same" as a middle ground.


Group 8

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 68.8% of participants believed that MuseBarControl was more aligned with human creation, well above those who preferred MuseCoco (12.5%) or felt the two were "about the same" (18.8%). For this pair, most listeners perceived more human-like creative characteristics in MuseBarControl's piece.


Group 9

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 11 users (68.8%) chose MuseCoco, believing it offers a better control experience. Four users (25.0%) preferred MuseBarControl, while only one user (6.2%) felt there was little difference between the two. These results suggest that more users favor MuseCoco for its control experience.


Group 10

MuseBarControl vs. MuseCoco

Comments:

Among the 16 valid questionnaires, 50.0% of participants believed that MuseCoco was more in line with human creation, making it the most popular choice. MuseBarControl was chosen by 37.5% of participants, while 12.5% felt that the two were similar.

Canon-style generation by MuseBarControl

Figure 2: An example of three pieces sharing the same chord progression.

MuseBarControl can generate a variety of pieces that share the chord progression of "Canon" by Pachelbel, "Far Away" by Jay Chou, and "Absolute Obsession" by Sam Lee. The chords of the first five bars are C, G, A:m, E:m, and F.

Global attribute values for each canon-style generation:

Generation 1: Rhythm Intensity: Intense; Key: Minor; Pitch Range: 4 octaves; Tempo: Moderato

Generation 2: Rhythm Intensity: Moderate; Key: Major; Pitch Range: 2 octaves; Tempo: Fast

Generation 3: Rhythm Intensity: Intense; Key: Major; Pitch Range: 2 octaves; Tempo: Slow
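To make the split between piece-level attributes and bar-level controls concrete, the sketch below combines the settings of the second canon-style generation (Moderate rhythm intensity, Major key, 2-octave pitch range, Fast tempo) with the five-bar chord progression C, G, A:m, E:m, F. Everything here is hypothetical: the `ControlSpec` class, the chord-syntax check, and the `<...>` token strings are illustrative assumptions, not MuseBarControl's actual input format or vocabulary.

```python
import re
from dataclasses import dataclass

# Hypothetical chord syntax: root letter, optional accidental, optional ":m" for minor.
CHORD_PATTERN = re.compile(r"[A-G][#b]?(?::m)?")


@dataclass
class ControlSpec:
    """Hypothetical container pairing global attributes with per-bar chords."""
    rhythm_intensity: str
    key: str
    pitch_range_octaves: int
    tempo: str
    bar_chords: list[str]

    def validate(self) -> None:
        # Reject any bar whose chord symbol does not match the assumed syntax.
        for i, chord in enumerate(self.bar_chords, start=1):
            if not CHORD_PATTERN.fullmatch(chord):
                raise ValueError(f"bar {i}: unrecognized chord {chord!r}")

    def to_prompt(self) -> list[str]:
        """Flatten into illustrative control tokens (not a real model vocabulary)."""
        self.validate()
        # Global controls apply to the whole piece.
        tokens = [
            f"<rhythm_intensity={self.rhythm_intensity}>",
            f"<key={self.key}>",
            f"<pitch_range={self.pitch_range_octaves}oct>",
            f"<tempo={self.tempo}>",
        ]
        # One chord token per bar: the bar-level part of the control signal.
        tokens += [f"<bar{i}_chord={c}>" for i, c in enumerate(self.bar_chords, start=1)]
        return tokens


canon_style = ControlSpec("Moderate", "Major", 2, "Fast", ["C", "G", "A:m", "E:m", "F"])
print(canon_style.to_prompt())
```

Keeping the per-bar chords as a separate list, rather than folding them into the global attributes, mirrors the distinction the abstract draws: the global fields stay fixed for the whole piece, while each bar carries its own chord constraint.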