Fundamentals of Music Processing - Meinard Müller
course: Computational Musicology
chapter 1+2
music can be represented in 3 ways:
1. sheet music representation
2. symbolic representations
3. audio representations
1. sheet music representation contains:
- notes on a staff; vertical position indicates pitch, note shape indicates duration
- symbols for dynamics, articulation, tempo and harmony
- advantage: an abstract and universal representation
- disadvantage: gives instructions, but not the actual sound; Optical Music
Recognition (OMR) converts printed music into digital symbolic formats
2. symbolic representations, e.g. Musical Instrument Digital Interface (MIDI)
- = a set of instructions indicating which notes are played, with their intensity, duration
and instrument
- a protocol to transfer musical data between electronic instruments and computers. MIDI data is
commonly shown as a piano roll → horizontal position indicates time, vertical position indicates pitch
- score-based representations → enable advanced analysis and editing
- highly suitable for automatic transcription and for music and harmony synthesis,
BUT no information on timbre and expression
3. audio representations
- = direct storage and processing of sound waves (e.g. .wav, .mp3); it captures
sounds, not notes. Suitable for recording, playback and signal processing
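The piano-roll view of MIDI from the symbolic representations above can be sketched as follows. The note format `(onset_s, duration_s, midi_pitch, velocity)` and the function `piano_roll` are hypothetical simplifications. Real MIDI files store note-on/note-off messages, which would first have to be paired up into notes.

```python
from typing import List, Tuple

# Hypothetical simplified note format: (onset_s, duration_s, midi_pitch, velocity).
# Real MIDI stores note-on/note-off messages; here they are already paired into notes.
Note = Tuple[float, float, int, int]


def piano_roll(notes: List[Note], frames_per_s: float = 10.0, n_pitches: int = 128) -> List[List[int]]:
    """Rasterize notes into roll[pitch][frame] (rows = pitch, columns = time).

    Each cell holds the velocity (intensity) of the note sounding there; 0 = silence.
    """
    end_s = max((onset + dur for onset, dur, _, _ in notes), default=0.0)
    n_frames = int(round(end_s * frames_per_s))
    roll = [[0] * n_frames for _ in range(n_pitches)]
    for onset, dur, pitch, velocity in notes:
        start = int(round(onset * frames_per_s))
        stop = min(int(round((onset + dur) * frames_per_s)), n_frames)
        for t in range(start, stop):
            roll[pitch][t] = max(roll[pitch][t], velocity)
    return roll


# C major triad (C4, E4, G4) held for 1 s, then C5 for 0.5 s
notes = [(0.0, 1.0, 60, 80), (0.0, 1.0, 64, 80), (0.0, 1.0, 67, 80), (1.0, 0.5, 72, 100)]
roll = piano_roll(notes)
print(roll[60][:12])  # → [80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 0, 0]
```

The roll keeps pitch, onset, duration and intensity. Nothing in it says what the notes sound like, which matches the "no timbre" limitation.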
For music information retrieval (MIR), 4 aspects of sound matter:
1. pitch; notes, melody, harmony
2. volume; power of the sound wave, more power = louder (measured in decibels)
3. timbre; sound color, to distinguish between instruments
4. duration; how long, meter, rhythm
but processing audio files requires complex algorithms
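"More power = louder (in decibels)" can be made concrete with a minimal sketch. It samples a sine wave, which is an audio representation in miniature, and computes its level in dB. The helper names are hypothetical. The reference `ref=1.0` is an assumption that gives a digital level relative to full scale, not a calibrated physical loudness.

```python
import math


def sampled_sine(freq_hz: float, duration_s: float, sample_rate: int, amplitude: float) -> list:
    """A sine wave measured sample_rate times per second (a tiny audio signal)."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate) for k in range(n)]


def rms(x: list) -> float:
    """Root mean square amplitude, related to the average power of the signal."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def level_db(x: list, ref: float = 1.0) -> float:
    """Level in dB relative to ref (assumed full scale).

    Power ~ amplitude**2, so 10*log10(power ratio) = 20*log10(amplitude ratio).
    """
    return 20.0 * math.log10(rms(x) / ref)


quiet = sampled_sine(440.0, 1.0, 8000, amplitude=0.25)
loud = sampled_sine(440.0, 1.0, 8000, amplitude=0.5)
print(round(level_db(loud) - level_db(quiet), 2))  # → 6.02 (double amplitude = 4x power)
```

Because the scale is logarithmic, multiplying the power by 4 adds a fixed ~6 dB regardless of the starting level.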
Spectrogram = visual representation of the frequency content of a sound over time
- Fourier transform → how much of each frequency (sinusoidal component) is present in a signal.
Its frequency axis is on a linear scale. 2 types of Fourier transforms
1. Short-Time Fourier Transform (STFT): breaking the sound wave into short time
windows (typically tens of milliseconds) to analyze how frequencies change over time. Each
window is multiplied by a window function that fades in and out at its edges, which smooths
the transitions and reduces artifacts. For musical applications, Hann and Hamming windows
are common; the window length determines the balance between time and frequency resolution
2. Discrete Short-Time Fourier Transform: used in digital signal processing,
since a digital signal is discrete (sampled)
- sampling rate determines how often per second the signal is measured (in Hz)
- Nyquist rate: two times the highest frequency present in the signal; the
sampling rate must be above it to avoid aliasing
- x-axis indicates progression over time, the y-axis indicates frequency
in Hz
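The discrete STFT described above can be sketched in plain Python, assuming a periodic Hann window, a window length of 256 samples and a hop of 128 samples. At 8 kHz that window is 32 ms. The function names and parameter values are illustrative. The DFT is computed naively for readability, whereas real code would use an FFT. The returned axes match the spectrogram description: frame times on the x-axis, bin frequencies in Hz on the y-axis, up to half the sampling rate.

```python
import cmath
import math


def hann(n: int) -> list:
    """Periodic Hann window: fades in from 0 and back out towards 0 at the edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(x: list, sample_rate: int, win_len: int = 256, hop: int = 128):
    """Naive discrete STFT (O(win_len^2) per frame; an FFT would be used in practice).

    Returns (frames, times_s, freqs_hz): frames[m][k] is the magnitude of bin k in frame m.
    Only bins 0..win_len//2 are kept, i.e. 0 Hz up to sample_rate / 2.
    """
    w = hann(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = []
    for s in starts:
        seg = [x[s + n] * w[n] for n in range(win_len)]  # windowed segment
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len)))
            for k in range(win_len // 2 + 1)
        ])
    times_s = [(s + win_len / 2) / sample_rate for s in starts]  # x-axis: frame centers
    freqs_hz = [k * sample_rate / win_len for k in range(win_len // 2 + 1)]  # y-axis in Hz
    return frames, times_s, freqs_hz


sr = 8000
x = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(2048)]  # 1000 Hz sine, 0.256 s
frames, times_s, freqs_hz = stft_magnitude(x, sr)
peak_bin = max(range(len(freqs_hz)), key=lambda k: frames[0][k])
print(freqs_hz[peak_bin], freqs_hz[-1])  # → 1000.0 4000.0
```

The bins are equally spaced (here every 31.25 Hz), which is the linear frequency scale mentioned above.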
alternative frequency scales are:
- constant-Q transform: on a logarithmic scale, which makes it more musically relevant, e.g.
octaves are equally spaced
- mel & bark scales: approximately logarithmic perceptual scales that reflect how the human ear
perceives frequencies rather than their mathematical distribution
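The difference between linear and logarithmic frequency scales can be sketched with three conversions. The MIDI pitch formula (A4 = 440 Hz = pitch 69) is standard. The mel formula used here is one common variant among several. `cq_center_freqs` is a hypothetical helper that only computes constant-Q style center frequencies, not a full constant-Q transform.

```python
import math


def hz_to_midi_pitch(f_hz: float) -> float:
    """Logarithmic pitch scale: A4 = 440 Hz = MIDI pitch 69, +12 per octave (frequency doubling)."""
    return 69 + 12 * math.log2(f_hz / 440.0)


def hz_to_mel(f_hz: float) -> float:
    """One common mel-scale formula (other variants exist): roughly linear at low
    frequencies, roughly logarithmic at high frequencies."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def cq_center_freqs(f_min: float, bins_per_octave: int, n_bins: int) -> list:
    """Constant-Q style center frequencies: geometric spacing, so every octave gets
    the same number of bins (unlike the equally spaced Fourier bins)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


print(hz_to_midi_pitch(220.0), hz_to_midi_pitch(440.0), hz_to_midi_pitch(880.0))  # → 57.0 69.0 81.0
print(round(hz_to_mel(1000.0)))  # → 1000
freqs = cq_center_freqs(110.0, 12, 25)
print(freqs[0], freqs[12], freqs[24])  # → 110.0 220.0 440.0
```

Each octave, 110-220 Hz and 220-440 Hz, spans the same 12 steps on these scales, although the second octave is twice as wide in Hz.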
chapter 3
A piece of music exists in multiple versions, e.g. different representations and different performances.
→ the goal of music synchronization is aligning these different versions, e.g.
- aligning sheet music with an audio recording
- synchronizing different audio recordings
the synchronization involves 2 main steps:
1. feature extraction; convert audio into a mid-level representation that captures the key
aspects of the music
2. sequence alignment; use algorithms to match the extracted features from different
versions
to make music comparable across different formats, it must be transformed into suitable
feature representations → choice of features
- log-frequency spectrogram: frequency axis on a logarithmic scale, matching human pitch perception
- chroma features: represent the pitch content of a musical signal while ignoring
octave differences; notes in different octaves belong to the same pitch class
chroma representation → captures pitch class, while being largely robust to changes in loudness and timbre
chromagram: maps spectral content onto the 12 pitch classes by
1. summing up all energy across all octaves for each pitch class
2. assigning this energy to a chroma bin, e.g. C, C#, D
3. displaying intensity over time; brighter colors indicate a stronger presence of a
pitch class
limitations: the exact pitch (octave) is lost, and mistuning can shift energy into a neighboring chroma bin
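The chromagram steps above can be sketched for a single spectral frame. The sketch assumes equal temperament with A4 = 440 Hz, energy taken as squared magnitude, and normalization to a sum of 1. These are illustrative choices, and `chroma_vector` is a hypothetical helper. Applying it to every frame of a spectrogram would give a chromagram.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(magnitudes: list, freqs_hz: list) -> list:
    """Map one spectral frame (magnitude per frequency bin) onto 12 chroma bins.

    Each bin is assigned to its nearest MIDI pitch p (A4 = 440 Hz, equal temperament);
    its energy (magnitude squared) goes into chroma bin p % 12, so all octaves of a
    pitch class are summed. Rounding to the nearest pitch is where mistuning hurts:
    a tone about half a semitone off can land in a neighboring bin.
    """
    chroma = [0.0] * 12
    for mag, f in zip(magnitudes, freqs_hz):
        if f <= 0:
            continue  # 0 Hz (DC) has no pitch
        p = round(69 + 12 * math.log2(f / 440.0))
        chroma[p % 12] += mag ** 2
    total = sum(chroma)
    # normalize so frames with different loudness become comparable (one of several choices)
    return [c / total for c in chroma] if total > 0 else chroma


# toy frame: energy at C4 (~261.6 Hz), G4 (392 Hz) and C5 (~523.3 Hz), plus a DC bin
chroma = chroma_vector([5.0, 1.0, 1.0, 1.0], [0.0, 261.63, 392.0, 523.25])
print({PITCH_CLASSES[i]: round(c, 3) for i, c in enumerate(chroma) if c > 0})  # → {'C': 0.667, 'G': 0.333}
```

C4 and C5 end up in the same bin, which shows both the octave invariance and the loss of exact pitch.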
3 metrics to measure chroma (dis)similarity
1. Manhattan distance → sum of absolute differences in chroma values
2. Euclidean distance → square root of the sum of squared differences; large differences weigh more
3. Chebyshev distance → the largest absolute chroma difference over all bins
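The three distances can be compared on two chroma vectors. The vectors here are hypothetical binary templates for a C major and a C minor chord, which differ only in E vs. D#.

```python
import math


def manhattan(x: list, y: list) -> float:
    """Sum of absolute differences over the chroma bins."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x: list, y: list) -> float:
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chebyshev(x: list, y: list) -> float:
    """Largest absolute difference in any single chroma bin."""
    return max(abs(a - b) for a, b in zip(x, y))


#                C  C# D  D# E  F  F# G  G# A  A# B
c_major_chord = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
c_minor_chord = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
print(manhattan(c_major_chord, c_minor_chord))            # → 2
print(round(euclidean(c_major_chord, c_minor_chord), 3))  # → 1.414
print(chebyshev(c_major_chord, c_minor_chord))            # → 1
```

Manhattan counts every differing bin, Euclidean emphasizes large differences, and Chebyshev only looks at the single worst bin.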
Dynamic time warping (DTW) finds the optimal alignment between two sequences. It warps the
time axis to compensate for tempo variations between performances. 3 key rules: boundary condition
(start and end are aligned), monotonicity (no backtracking) and step size (no skipping forward)
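A minimal DTW sketch follows, assuming the classic step sizes (1,0), (0,1) and (1,1) and any local cost function, e.g. one of the chroma distances listed above. The toy "score" and "performance" sequences of MIDI pitches are made up for illustration. A real synchronization pipeline would align chroma feature sequences instead.

```python
import math
from typing import Callable, List, Sequence, Tuple


def dtw(x: Sequence, y: Sequence, cost: Callable) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW with steps (1,0), (0,1) and (1,1).

    Returns the total cost of the optimal warping path and the path as (i, j) pairs.
    - boundary condition: the path starts at (0, 0) and ends at (len(x)-1, len(y)-1)
    - monotonicity: i and j never decrease (no backtracking)
    - step size: each step advances i and/or j by at most 1 (no skipping forward)
    """
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    n, m = len(x), len(y)
    # D[i][j] = accumulated cost of aligning x[:i] with y[:j]; row/column 0 is padding
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(x[i - 1], y[j - 1]) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # backtrack from the end to recover the optimal path
    i, j = n, m
    path = [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda ij: D[ij[0]][ij[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


# a "performance" that lingers on its first and last notes (MIDI pitches as toy features)
score = [60, 62, 64]
performance = [60, 60, 62, 64, 64]
total, path = dtw(score, performance, cost=lambda a, b: abs(a - b))
print(total, path)  # → 0.0 [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
```

The path maps each score note to the stretch of the performance where it is played. Its slope shows how the tempo of the performance deviates from the score.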
music synchronization enables advanced interactive music applications, e.g. linking MIDI or sheet
music to audio, letting users switch between different performances, and deriving tempo curves