emvoice.pitch
Pitch-related voice features.
Module Contents
Classes
PitchFrames – Estimate and store pitch frames.
PitchPulseFrames – Extract and store glottal pulse frames.
PitchPeriodFrames – Create and store signal frames.
PitchHarmonicsFrames – Estimate and store voice pitch harmonics.
JitterFrames – Extract and store voice jitter frames.
ShimmerFrames – Extract and store voice shimmer frames.
- class emvoice.pitch.PitchFrames(frames: numpy.ndarray, flag: numpy.ndarray, prob: numpy.ndarray, sr: int, lower: float, upper: float, frame_len: int, hop_len: int, method: str, center: bool = True, pad_mode: str = 'constant')[source]
Bases: emvoice.frames.BaseFrames
Estimate and store pitch frames.
Estimate and store the voice pitch measured as the fundamental frequency F0 in Hz.
- Parameters:
frames (numpy.ndarray) – Voice pitch frames in Hz with shape (num_frames,).
flag (numpy.ndarray) – Boolean flags indicating which frames are voiced with shape (num_frames,).
prob (numpy.ndarray) – Probabilities for frames being voiced with shape (num_frames,).
lower (float) – Lower limit used for pitch estimation (in Hz).
upper (float) – Upper limit used for pitch estimation (in Hz).
method (str) – Method used for estimating voice pitch.
See also
librosa.pyin, librosa.yin
- classmethod from_signal(sig_obj: emvoice.signal.BaseSignal, frame_len: int, hop_len: Optional[int] = None, center: bool = True, pad_mode: str = 'constant', lower: float = 75.0, upper: float = 600.0, method: str = 'pyin')[source]
Estimate the voice pitch frames from a signal.
Currently, voice pitch can only be extracted with the pYIN method.
- Parameters:
sig_obj (BaseSignal) – Signal object.
frame_len (int) – Number of samples per frame.
hop_len (int, optional, default=None) – Number of samples between frame starting points. If None, uses frame_len // 4.
center (bool, default=True) – Whether to center the frames and apply padding.
pad_mode (str, default='constant') – How the signal is padded before framing. See numpy.pad(). Uses the default value 0 for ‘constant’ padding. Ignored if center=False.
lower (float, default=75.0) – Lower limit for pitch estimation (in Hz).
upper (float, default=600.0) – Upper limit for pitch estimation (in Hz).
method (str, default='pyin') – Method for estimating voice pitch. Only ‘pyin’ is currently available.
- Raises:
NotImplementedError – If a method other than ‘pyin’ is given.
- class emvoice.pitch.PitchPulseFrames(frames: List[Tuple], sr: int, frame_len: int, hop_len: int, center: bool = True, pad_mode: str = 'constant')[source]
Bases: emvoice.frames.BaseFrames
Extract and store glottal pulse frames.
Glottal pulses are peaks in the signal corresponding to the fundamental frequency F0.
- Parameters:
frames (list) – Pulse frames. Each frame contains a list of pulses or an empty list if no pulses are detected. Pulses are stored as tuples (pulse timestamp, T0, amplitude).
Notes
See Algorithms section for details.
- property idx: numpy.ndarray[source]
Frame indices (read-only).
- classmethod from_signal_and_pitch_frames(sig_obj: emvoice.signal.BaseSignal, pitch_frames_obj: PitchFrames)[source]
Extract glottal pulse frames from a signal and voice pitch frames.
- Parameters:
sig_obj (BaseSignal) – Signal object.
pitch_frames_obj (PitchFrames) – Voice pitch frames object.
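The nested layout of the pulse frames (one list per frame, one tuple per pulse) can be illustrated with a small sketch. The values below are made up, the helper `summarize_pulse_frames` is hypothetical and not part of emvoice, and the sketch assumes that timestamps and T0 are given in seconds, which the reference does not state.

```python
import numpy as np

# Hypothetical pulse frames: one list per frame, each pulse stored as
# (pulse timestamp, T0, amplitude). Seconds are assumed for the first two.
pulse_frames = [
    [],  # frame without detected pulses
    [(0.010, 0.005, 0.8), (0.015, 0.005, 0.9)],
    [(0.015, 0.005, 0.9), (0.020, 0.005, 0.7), (0.025, 0.005, 0.6)],
]


def summarize_pulse_frames(frames):
    """Return the pulse count and the mean T0 of each frame (NaN if empty)."""
    n_pulses = np.array([len(frame) for frame in frames])
    mean_t0 = np.array(
        [np.mean([pulse[1] for pulse in frame]) if frame else np.nan
         for frame in frames]
    )
    return n_pulses, mean_t0


n_pulses, mean_t0 = summarize_pulse_frames(pulse_frames)
```

Because frames may overlap, the same pulse can appear in more than one frame, as in the second and third frames of this example.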
- class emvoice.pitch.PitchPeriodFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, lower: float, upper: float)[source]
Bases: emvoice.frames.BaseFrames
Create and store signal frames.
A frame is an (overlapping, padded) slice of a signal for which higher-order features can be computed.
- Parameters:
frames (numpy.ndarray) – Signal frames. The first dimension should be the number of frames.
sr (int) – Sampling rate.
frame_len (int) – Number of samples per frame.
hop_len (int) – Number of samples between frame starting points.
center (bool, default=True) – Whether the signal has been centered and padded before framing.
pad_mode (str, default='constant') – How the signal has been padded before framing. See numpy.pad(). Uses the default value 0 for ‘constant’ padding.
See also
librosa.util.frame
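How frame_len, hop_len, center, and pad_mode interact can be sketched with NumPy alone. This is an illustration, not the emvoice implementation: the function `frame_signal` is hypothetical, and padding by frame_len // 2 on both sides when center=True is an assumption borrowed from the usual librosa convention.

```python
import numpy as np


def frame_signal(sig, frame_len, hop_len=None, center=True, pad_mode="constant"):
    """Slice a 1-D signal into overlapping frames of shape (num_frames, frame_len)."""
    if hop_len is None:
        hop_len = frame_len // 4  # default stated in the parameter description
    if center:
        # Assumption: pad half a frame on each side so frames are centered.
        sig = np.pad(sig, frame_len // 2, mode=pad_mode)
    num_frames = 1 + (len(sig) - frame_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    return sig[idx]


sig = np.arange(16.0)
centered = frame_signal(sig, frame_len=8)                  # hop_len becomes 2
uncentered = frame_signal(sig, frame_len=8, center=False)
```

With centering, the first frame starts with padded values, so the number of frames depends mainly on the signal length and the hop length rather than on the frame length.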
- class emvoice.pitch.PitchHarmonicsFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool = True, pad_mode: str = 'constant', n_harmonics: int = 100)[source]
Bases: emvoice.frames.BaseFrames
Estimate and store voice pitch harmonics.
Compute the energy of the signal at harmonics (nF0 for any integer n) of the fundamental frequency.
- Parameters:
frames (numpy.ndarray) – Harmonics frames with the shape (num_frames, n_harmonics).
n_harmonics (int, default=100) – Number of estimated harmonics.
See also
librosa.f0_harmonics
- classmethod from_spec_and_pitch_frames(spec_frames_obj: emvoice.spectral.SpecFrames, pitch_frames_obj: PitchFrames, n_harmonics: int = 100)[source]
Estimate voice pitch harmonics from spectrogram frames and voice pitch frames.
- Parameters:
spec_frames_obj (SpecFrames) – Spectrogram frames object.
pitch_frames_obj (PitchFrames) – Pitch frames object.
n_harmonics (int, default=100) – Number of harmonics to estimate.
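The idea of reading the spectrum at multiples of F0 can be sketched as follows. This is a simplified NumPy illustration under stated assumptions, not the emvoice or librosa code: the function `harmonic_energy` is hypothetical, the spectrogram is assumed to have shape (n_freqs, num_frames), unvoiced frames are assumed to carry NaN as their pitch, and linear interpolation between frequency bins stands in for whatever interpolation the library uses.

```python
import numpy as np


def harmonic_energy(spec, freqs, f0, n_harmonics=100):
    """Interpolate a magnitude spectrogram at n * F0 for n = 1..n_harmonics.

    spec  : (n_freqs, num_frames) magnitudes
    freqs : (n_freqs,) increasing bin frequencies in Hz
    f0    : (num_frames,) pitch in Hz, NaN for unvoiced frames
    Returns an array of shape (num_frames, n_harmonics).
    """
    num_frames = spec.shape[1]
    out = np.full((num_frames, n_harmonics), np.nan)
    multiples = np.arange(1, n_harmonics + 1)
    for t in range(num_frames):
        if np.isnan(f0[t]):
            continue  # leave unvoiced frames as NaN
        # Harmonics outside the covered frequency range become NaN.
        out[t] = np.interp(multiples * f0[t], freqs, spec[:, t],
                           left=np.nan, right=np.nan)
    return out


freqs = np.linspace(0.0, 1000.0, 11)              # bins at 0, 100, ..., 1000 Hz
spec = np.tile((freqs / 100.0)[:, None], (1, 2))  # toy spectrum, value = freq / 100
f0 = np.array([150.0, np.nan])                    # second frame is unvoiced
harmonics = harmonic_energy(spec, freqs, f0, n_harmonics=3)
```

In this toy setup the first frame yields values at 150, 300, and 450 Hz, while the unvoiced frame stays NaN.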
- class emvoice.pitch.JitterFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, rel: bool, lower: float, upper: float, max_period_ratio: float)[source]
Bases: PitchPeriodFrames
Extract and store voice jitter frames.
- Parameters:
frames (numpy.ndarray) – Voice jitter frames of shape (num_frames,).
rel (bool) – Whether the voice jitter is relative to the average period length.
lower (float) – Lower limit for periods between glottal pulses.
upper (float) – Upper limit for periods between glottal pulses.
max_period_ratio (float) – Maximum ratio between consecutive periods used for jitter extraction.
Notes
Compute jitter as the average absolute difference between consecutive fundamental periods with a ratio below max_period_ratio for each frame. If rel=True, jitter is divided by the average fundamental period of each frame. Fundamental periods are calculated as the first-order temporal difference between consecutive glottal pulses.
- classmethod from_pitch_pulse_frames(pitch_pulse_frames_obj: PitchPulseFrames, rel: bool = True, lower: float = 0.0001, upper: float = 0.02, max_period_ratio: float = 1.3)[source]
Extract voice jitter frames from glottal pulse frames.
- Parameters:
pitch_pulse_frames_obj (PitchPulseFrames) – Glottal pulse frames object.
rel (bool, optional, default=True) – Divide jitter by the average pitch period.
lower (float, optional, default=0.0001) – Lower limit for periods between glottal pulses.
upper (float, optional, default=0.02) – Upper limit for periods between glottal pulses.
max_period_ratio (float, optional, default=1.3) – Maximum ratio between consecutive periods for jitter extraction.
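The jitter computation described in the Notes can be sketched for a single frame. This is a minimal illustration under assumptions, not the emvoice implementation: the function `frame_jitter` is hypothetical, pulse timestamps are assumed to be strictly increasing and in seconds, both periods of a pair are assumed to have to lie between lower and upper, and the average period for rel=True is taken over the periods within those limits.

```python
import numpy as np


def frame_jitter(pulse_times, rel=True, lower=1e-4, upper=0.02,
                 max_period_ratio=1.3):
    """Jitter of one frame from glottal pulse timestamps (assumed seconds)."""
    # Fundamental periods: first-order difference of the pulse timestamps.
    periods = np.diff(np.asarray(pulse_times, dtype=float))
    if periods.size < 2:
        return np.nan  # at least two periods are needed for one difference
    prev, curr = periods[:-1], periods[1:]
    in_range = (periods > lower) & (periods < upper)
    ratio = np.maximum(prev, curr) / np.minimum(prev, curr)
    valid = in_range[:-1] & in_range[1:] & (ratio < max_period_ratio)
    if not valid.any():
        return np.nan
    jitter = np.mean(np.abs(curr[valid] - prev[valid]))
    if rel:
        # Assumption: normalize by the mean of the periods within the limits.
        jitter /= np.mean(periods[in_range])
    return float(jitter)


# Periods of 10 ms, 11 ms, and 10 ms.
abs_jitter = frame_jitter([0.0, 0.010, 0.021, 0.031], rel=False)
rel_jitter = frame_jitter([0.0, 0.010, 0.021, 0.031], rel=True)
# Periods of 10 ms, 15 ms, and 10 ms exceed the period ratio of 1.3.
no_jitter = frame_jitter([0.0, 0.010, 0.025, 0.035])
```

Returning NaN for frames without a valid pair of periods is a choice made for this sketch; the reference does not say how such frames are handled.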
- class emvoice.pitch.ShimmerFrames(frames: List[Tuple], sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, rel: bool, lower: float, upper: float, max_period_ratio: float, max_amp_factor: float)[source]
Bases: PitchPeriodFrames
Extract and store voice shimmer frames.
- Parameters:
frames (numpy.ndarray) – Voice shimmer frames of shape (num_frames,).
rel (bool) – Whether the voice shimmer is relative to the average amplitude.
lower (float) – Lower limit for periods between glottal pulses.
upper (float) – Upper limit for periods between glottal pulses.
max_period_ratio (float) – Maximum ratio between consecutive periods used for shimmer extraction.
max_amp_factor (float) – Maximum ratio between consecutive amplitudes used for shimmer extraction.
Notes
Compute shimmer as the average absolute difference between consecutive pitch amplitudes with a fundamental period ratio below max_period_ratio and amplitude ratio below max_amp_factor for each frame. If rel=True, shimmer is divided by the average amplitude of each frame. Fundamental periods are calculated as the first-order temporal difference between consecutive glottal pulses. Amplitudes are signal amplitudes at the glottal pulses.
- classmethod from_pitch_pulse_frames(pitch_pulse_frames_obj: PitchPulseFrames, rel: bool = True, lower: float = 0.0001, upper: float = 0.02, max_period_ratio: float = 1.3, max_amp_factor: float = 1.6)[source]
Extract voice shimmer frames from glottal pulse frames.
- Parameters:
pitch_pulse_frames_obj (PitchPulseFrames) – Glottal pulse frames object.
rel (bool, optional, default=True) – Divide shimmer by the average pulse amplitude.
lower (float, optional, default=0.0001) – Lower limit for periods between glottal pulses.
upper (float, optional, default=0.02) – Upper limit for periods between glottal pulses.
max_period_ratio (float, optional, default=1.3) – Maximum ratio between consecutive periods for shimmer extraction.
max_amp_factor (float, optional, default=1.6) – Maximum ratio between consecutive amplitudes used for shimmer extraction.
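The shimmer computation described in the Notes can be sketched for a single frame in the same spirit. Again this is an illustration under assumptions rather than the emvoice code: the function `frame_shimmer` is hypothetical, pulses are assumed to be (timestamp, T0, amplitude) tuples with increasing timestamps in seconds and positive amplitudes, each period is paired with the amplitude of the pulse that closes it, and the average amplitude for rel=True is taken over all pulses in the frame.

```python
import numpy as np


def frame_shimmer(pulses, rel=True, lower=1e-4, upper=0.02,
                  max_period_ratio=1.3, max_amp_factor=1.6):
    """Shimmer of one frame from (timestamp, T0, amplitude) pulse tuples."""
    if len(pulses) < 3:
        return np.nan  # two periods (three pulses) are needed
    times = np.array([p[0] for p in pulses], dtype=float)
    amps = np.array([p[2] for p in pulses], dtype=float)
    periods = np.diff(times)
    prev_p, curr_p = periods[:-1], periods[1:]
    # Assumption: use the amplitude at the pulse that ends each period.
    prev_a, curr_a = amps[1:-1], amps[2:]
    period_ratio = np.maximum(prev_p, curr_p) / np.minimum(prev_p, curr_p)
    amp_ratio = np.maximum(prev_a, curr_a) / np.minimum(prev_a, curr_a)
    valid = (
        (prev_p > lower) & (prev_p < upper)
        & (curr_p > lower) & (curr_p < upper)
        & (period_ratio < max_period_ratio)
        & (amp_ratio < max_amp_factor)
    )
    if not valid.any():
        return np.nan
    shimmer = np.mean(np.abs(curr_a[valid] - prev_a[valid]))
    if rel:
        # Assumption: normalize by the mean amplitude of all pulses in the frame.
        shimmer /= np.mean(amps)
    return float(shimmer)


pulses = [(0.00, 0.01, 1.0), (0.01, 0.01, 1.0), (0.02, 0.01, 1.2), (0.03, 0.01, 1.0)]
abs_shimmer = frame_shimmer(pulses, rel=False)
rel_shimmer = frame_shimmer(pulses, rel=True)
# An amplitude jump by a factor of 2 exceeds max_amp_factor and is excluded.
jump = [(0.00, 0.01, 1.0), (0.01, 0.01, 1.0), (0.02, 0.01, 2.0), (0.03, 0.01, 1.0)]
no_shimmer = frame_shimmer(jump)
```

The two ratio thresholds serve the same purpose as in the jitter sketch: pairs of pulses that differ too much in period or amplitude are treated as unreliable and left out of the average.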