The perception of two (or more) simultaneous musical notes can be broadly categorized as consonant or dissonant, depending on the pitch interval(s) between them. Previous literature has suggested that musicians and non-musicians adopt different strategies when judging musical intervals. Musicians rely on the frequency ratio, defined as the ratio between the two fundamental frequencies (e.g., 3:2 for C-G, a consonant perfect fifth, vs. 45:32 for C-F#, a dissonant tritone), whereas non-musicians rely on frequency differences (e.g., the presence of beats, perceived as roughness). The two groups also show distinct ERP differences in the N1 (~160 ms) and P2 (~250 ms) components along the midline electrodes. To replicate and extend these findings, we reran the previous experiment and separately collected fMRI data with the same protocol (modified for sparse sampling). The behavioral and EEG results largely corresponded to our previous findings. The fMRI results, analyzed jointly with univariate, psycho-physiological interaction, multi-voxel pattern analysis, and representational similarity analysis (RSA) approaches, further support the involvement of midline and related brain regions in consonance/dissonance judgments. The final spatio-temporal searchlight RSA provided convincing evidence that the medial prefrontal cortex, together with the bilateral superior temporal cortex, is the joint locus of the midline N1 effect, and that the dorsal anterior cingulate cortex is the locus of the P2 effect (for musicians). Together, these analyses not only reaffirm that musicians rely more on top-down knowledge for consonance/dissonance perception, but also demonstrate the advantages of multiple analyses in constraining the findings from both EEG and fMRI.
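To make the distinction between frequency ratio and frequency difference concrete, the following is a minimal Python sketch and not part of the study's stimulus or analysis code. The function names, the base frequency of 256 Hz, and the use of numerator plus denominator as a crude index of ratio simplicity are all illustrative assumptions. The sketch also considers only the two fundamentals, which is a simplification.

```python
from fractions import Fraction

# Interval ratios quoted in the text (upper note : lower note).
PERFECT_FIFTH = Fraction(3, 2)   # C-G, consonant
TRITONE = Fraction(45, 32)       # C-F#, dissonant


def upper_frequency(f0: float, ratio: Fraction) -> float:
    """Fundamental frequency (Hz) of the upper note, given the lower note f0."""
    return f0 * float(ratio)


def frequency_difference(f0: float, ratio: Fraction) -> float:
    """Difference (Hz) between the two fundamentals of the interval."""
    return upper_frequency(f0, ratio) - f0


def ratio_complexity(ratio: Fraction) -> int:
    """Hypothetical simplicity index: smaller values mean a simpler ratio."""
    return ratio.numerator + ratio.denominator


if __name__ == "__main__":
    f0 = 256.0  # assumed base frequency in Hz, chosen for exact arithmetic
    for name, ratio in [("perfect fifth", PERFECT_FIFTH), ("tritone", TRITONE)]:
        print(
            name,
            f"ratio={ratio}",
            f"complexity={ratio_complexity(ratio)}",
            f"difference={frequency_difference(f0, ratio)} Hz",
        )
```

Under these assumptions, the frequency ratio is fixed for a given interval regardless of the base frequency, whereas the frequency difference scales with it. This is one way to see why the two cues can dissociate.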