emvoice.pitch

Pitch-related voice features.

Module Contents

Classes

PitchFrames

Estimate and store pitch frames.

PitchPulseFrames

Extract and store glottal pulse frames.

PitchPeriodFrames

Create and store signal frames.

PitchHarmonicsFrames

Estimate and store voice pitch harmonics.

JitterFrames

Extract and store voice jitter frames.

ShimmerFrames

Extract and store voice shimmer frames.

class emvoice.pitch.PitchFrames(frames: numpy.ndarray, flag: numpy.ndarray, prob: numpy.ndarray, sr: int, lower: float, upper: float, frame_len: int, hop_len: int, method: str, center: bool = True, pad_mode: str = 'constant')

Bases: emvoice.frames.BaseFrames

Estimate and store pitch frames.

Estimate and store the voice pitch measured as the fundamental frequency F0 in Hz.

Parameters:
  • frames (numpy.ndarray) – Voice pitch frames in Hz with shape (num_frames,).

  • flag (numpy.ndarray) – Boolean flags indicating which frames are voiced with shape (num_frames,).

  • prob (numpy.ndarray) – Probabilities for frames being voiced with shape (num_frames,).

  • lower (float) – Lower limit used for pitch estimation (in Hz).

  • upper (float) – Upper limit used for pitch estimation (in Hz).

  • method (str) – Method used for estimating voice pitch.

See also

librosa.pyin, librosa.yin

classmethod from_signal(sig_obj: emvoice.signal.BaseSignal, frame_len: int, hop_len: Optional[int] = None, center: bool = True, pad_mode: str = 'constant', lower: float = 75.0, upper: float = 600.0, method: str = 'pyin')

Estimate the voice pitch frames from a signal.

Currently, voice pitch can only be extracted with the pYIN method.

Parameters:
  • sig_obj (BaseSignal) – Signal object.

  • frame_len (int) – Number of samples per frame.

  • hop_len (int, optional, default=None) – Number of samples between frame starting points. If None, uses frame_len // 4.

  • center (bool, default=True) – Whether to center the frames and apply padding.

  • pad_mode (str, default='constant') – How the signal is padded before framing. See numpy.pad(). Uses the default value 0 for ‘constant’ padding. Ignored if center=False.

  • lower (float, default=75.0) – Lower limit for pitch estimation (in Hz).

  • upper (float, default=600.0) – Upper limit for pitch estimation (in Hz).

  • method (str, default='pyin') – Method for estimating voice pitch. Only ‘pyin’ is currently available.

Raises:

NotImplementedError – If a method other than ‘pyin’ is given.
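
The argument handling described above can be sketched in plain Python. This is only an illustration, not the library's code: resolve_pitch_args is a hypothetical helper, and the conversion of the F0 search range into period lengths is added here for orientation (a period is the reciprocal of its frequency).

```python
from typing import Optional, Tuple


def resolve_pitch_args(
    frame_len: int,
    hop_len: Optional[int] = None,
    lower: float = 75.0,
    upper: float = 600.0,
    method: str = "pyin",
) -> Tuple[int, float, float]:
    """Hypothetical helper that mirrors the documented defaults of from_signal."""
    if method != "pyin":
        # Documented behaviour: only 'pyin' is currently available.
        raise NotImplementedError(f"Unsupported pitch method: {method!r}")
    if hop_len is None:
        # Documented default: a quarter of the frame length.
        hop_len = frame_len // 4
    # An F0 range in Hz corresponds to a range of period lengths in seconds.
    shortest_period = 1.0 / upper
    longest_period = 1.0 / lower
    return hop_len, shortest_period, longest_period


hop_len, shortest_period, longest_period = resolve_pitch_args(frame_len=2048)
print(hop_len)  # 512
```

With the default limits, the search covers periods from 1/600 s to 1/75 s.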

class emvoice.pitch.PitchPulseFrames(frames: List[Tuple], sr: int, frame_len: int, hop_len: int, center: bool = True, pad_mode: str = 'constant')

Bases: emvoice.frames.BaseFrames

Extract and store glottal pulse frames.

Glottal pulses are peaks in the signal corresponding to the fundamental frequency F0.

Parameters:

frames (list) – Pulse frames. Each frame contains a list of pulses or an empty list if no pulses are detected. Pulses are stored as tuples (pulse timestamp, T0, amplitude).

Notes

See Algorithms section for details.
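
To make the frame layout concrete, here is a small sketch of the nested structure described above. The values are made up, the helper functions are hypothetical, and the assumption that timestamps and T0 are given in seconds is not confirmed by this page.

```python
from typing import List, Optional, Tuple

# One pulse: (pulse timestamp, T0, amplitude); seconds are assumed for the first two.
Pulse = Tuple[float, float, float]

# Three frames: one with two pulses, one without pulses, one with a single pulse.
pulse_frames: List[List[Pulse]] = [
    [(0.010, 0.0050, 0.82), (0.015, 0.0050, 0.80)],
    [],
    [(0.052, 0.0048, 0.61)],
]


def pulses_per_frame(frames: List[List[Pulse]]) -> List[int]:
    """Count the detected pulses in every frame."""
    return [len(frame) for frame in frames]


def mean_f0_per_frame(frames: List[List[Pulse]]) -> List[Optional[float]]:
    """Mean F0 in Hz per frame, computed as the mean of 1 / T0; None if the frame is empty."""
    result: List[Optional[float]] = []
    for frame in frames:
        if not frame:
            result.append(None)
        else:
            result.append(sum(1.0 / t0 for _, t0, _ in frame) / len(frame))
    return result


print(pulses_per_frame(pulse_frames))  # [2, 0, 1]
```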

property idx: numpy.ndarray

Frame indices (read-only).

classmethod from_signal_and_pitch_frames(sig_obj: emvoice.signal.BaseSignal, pitch_frames_obj: PitchFrames)

Extract glottal pulse frames from a signal and voice pitch frames.

Parameters:
  • sig_obj (BaseSignal) – Signal object.

  • pitch_frames_obj (PitchFrames) – Pitch frames object.

class emvoice.pitch.PitchPeriodFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, lower: float, upper: float)

Bases: emvoice.frames.BaseFrames

Create and store signal frames.

A frame is an (overlapping, padded) slice of a signal for which higher-order features can be computed.

Parameters:
  • frames (numpy.ndarray) – Signal frames. The first dimension should be the number of frames.

  • sr (int) – Sampling rate.

  • frame_len (int) – Number of samples per frame.

  • hop_len (int) – Number of samples between frame starting points.

  • center (bool, default=True) – Whether the signal has been centered and padded before framing.

  • pad_mode (str, default='constant') – How the signal has been padded before framing. See numpy.pad(). Uses the default value 0 for ‘constant’ padding.

See also

librosa.util.frame
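
The notion of an overlapping, padded frame can be illustrated without any dependencies. The sketch below is not the library's implementation: frame_signal is a hypothetical function, and padding frame_len // 2 zeros on each side when center=True is an assumption borrowed from the usual STFT centering convention.

```python
from typing import List


def frame_signal(
    sig: List[float], frame_len: int, hop_len: int, center: bool = True
) -> List[List[float]]:
    """Slice a signal into overlapping frames; the first dimension is the frame index."""
    samples = list(sig)
    if center:
        # Assumed convention: zero-pad half a frame on both sides ('constant' padding).
        pad = frame_len // 2
        samples = [0.0] * pad + samples + [0.0] * pad
    if len(samples) < frame_len:
        raise ValueError("Signal is shorter than one frame.")
    num_frames = 1 + (len(samples) - frame_len) // hop_len
    return [samples[i * hop_len : i * hop_len + frame_len] for i in range(num_frames)]


sig = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
centered = frame_signal(sig, frame_len=4, hop_len=2, center=True)
plain = frame_signal(sig, frame_len=4, hop_len=2, center=False)
print(len(centered), len(plain))  # 5 3
```

Centering adds frames at both ends that consist partly of padding, which is why the centered result has more frames than the uncentered one.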

class emvoice.pitch.PitchHarmonicsFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool = True, pad_mode: str = 'constant', n_harmonics: int = 100)

Bases: emvoice.frames.BaseFrames

Estimate and store voice pitch harmonics.

Compute the energy of the signal at the harmonics of the fundamental frequency, i.e., at integer multiples n·F0.

Parameters:
  • frames (numpy.ndarray) – Harmonics frames with the shape (num_frames, n_harmonics)

  • n_harmonics (int, default=100) – Number of estimated harmonics.

See also

librosa.f0_harmonics
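
For a single frame, reading a spectrum at the harmonics of F0 might look like the sketch below. It is an illustration only: harmonic_energies is a hypothetical function, and starting at n = 1, using linear interpolation between frequency bins, and returning 0.0 for harmonics outside the spectrum are all assumptions.

```python
from typing import List


def harmonic_energies(
    spectrum: List[float], freqs: List[float], f0: float, n_harmonics: int
) -> List[float]:
    """Interpolate one spectrum frame at f0, 2 * f0, ..., n_harmonics * f0.

    Assumes freqs is strictly increasing and has at least two bins.
    """
    energies: List[float] = []
    for n in range(1, n_harmonics + 1):
        target = n * f0
        if target < freqs[0] or target > freqs[-1]:
            # Assumed fill value for harmonics that fall outside the spectrum.
            energies.append(0.0)
            continue
        for k in range(len(freqs) - 1):
            lo, hi = freqs[k], freqs[k + 1]
            if lo <= target <= hi:
                # Linear interpolation between the two neighbouring bins.
                weight = (target - lo) / (hi - lo)
                energies.append((1.0 - weight) * spectrum[k] + weight * spectrum[k + 1])
                break
    return energies


freqs = [0.0, 100.0, 200.0, 300.0, 400.0]
spectrum = [0.0, 1.0, 4.0, 2.0, 0.5]
print(harmonic_energies(spectrum, freqs, f0=150.0, n_harmonics=3))  # [2.5, 2.0, 0.0]
```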

classmethod from_spec_and_pitch_frames(spec_frames_obj: emvoice.spectral.SpecFrames, pitch_frames_obj: PitchFrames, n_harmonics: int = 100)

Estimate voice pitch harmonics from spectrogram frames and voice pitch frames.

Parameters:
  • spec_frames_obj (SpecFrames) – Spectrogram frames object.

  • pitch_frames_obj (PitchFrames) – Pitch frames object.

  • n_harmonics (int, default=100) – Number of harmonics to estimate.

class emvoice.pitch.JitterFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, rel: bool, lower: float, upper: float, max_period_ratio: float)

Bases: PitchPeriodFrames

Extract and store voice jitter frames.

Parameters:
  • frames (numpy.ndarray) – Voice jitter frames of shape (num_frames,).

  • rel (bool) – Whether the voice jitter is relative to the average period length.

  • lower (float) – Lower limit for periods between glottal pulses.

  • upper (float) – Upper limit for periods between glottal pulses.

  • max_period_ratio (float) – Maximum ratio between consecutive periods used for jitter extraction.

Notes

Compute jitter as the average absolute difference between consecutive fundamental periods with a ratio below max_period_ratio for each frame. If rel=True, jitter is divided by the average fundamental period of each frame. Fundamental periods are calculated as the first-order temporal difference between consecutive glottal pulses.
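
The computation described in these notes can be sketched for one frame as follows. This is a minimal illustration rather than the library's code: frame_jitter is a hypothetical function, and several details are assumptions, namely that period limits are in seconds, that a pair is dropped if either period lies outside (lower, upper), that the average period is taken over the accepted pairs, and that NaN is returned when no pair qualifies.

```python
import math
from typing import List


def frame_jitter(
    pulse_times: List[float],
    rel: bool = True,
    lower: float = 0.0001,
    upper: float = 0.02,
    max_period_ratio: float = 1.3,
) -> float:
    """Jitter of one frame from the timestamps of its glottal pulses."""
    # Fundamental periods: first-order difference of consecutive pulse timestamps.
    periods = [t1 - t0 for t0, t1 in zip(pulse_times, pulse_times[1:])]
    diffs: List[float] = []
    pair_means: List[float] = []
    for p0, p1 in zip(periods, periods[1:]):
        if not (lower < p0 < upper and lower < p1 < upper):
            continue  # a period is outside the allowed range
        if max(p0, p1) / min(p0, p1) >= max_period_ratio:
            continue  # consecutive periods differ too much
        diffs.append(abs(p1 - p0))
        pair_means.append((p0 + p1) / 2.0)
    if not diffs:
        return math.nan  # assumed marker for frames without usable period pairs
    jitter = sum(diffs) / len(diffs)
    if rel:
        # Relative jitter: divide by the average period of the accepted pairs.
        jitter /= sum(pair_means) / len(pair_means)
    return jitter


times = [0.0, 0.005, 0.0101, 0.015, 0.0201]
abs_jitter = frame_jitter(times, rel=False)
rel_jitter = frame_jitter(times, rel=True)
```

Here the periods are roughly 5.0, 5.1, 4.9, and 5.1 ms, so the absolute jitter is about 0.17 ms and the relative jitter about 3.3 %.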

classmethod from_pitch_pulse_frames(pitch_pulse_frames_obj: PitchPulseFrames, rel: bool = True, lower: float = 0.0001, upper: float = 0.02, max_period_ratio: float = 1.3)

Extract voice jitter frames from glottal pulse frames.

Parameters:
  • pitch_pulse_frames_obj (PitchPulseFrames) – Glottal pulse frames object.

  • rel (bool, optional, default=True) – Divide jitter by the average pitch period.

  • lower (float, optional, default=0.0001) – Lower limit for periods between glottal pulses.

  • upper (float, optional, default=0.02) – Upper limit for periods between glottal pulses.

  • max_period_ratio (float, optional, default=1.3) – Maximum ratio between consecutive periods for jitter extraction.

class emvoice.pitch.ShimmerFrames(frames: List[Tuple], sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, rel: bool, lower: float, upper: float, max_period_ratio: float, max_amp_factor: float)

Bases: PitchPeriodFrames

Extract and store voice shimmer frames.

Parameters:
  • frames (numpy.ndarray) – Voice shimmer frames of shape (num_frames,).

  • rel (bool) – Whether the voice shimmer is relative to the average amplitude.

  • lower (float) – Lower limit for periods between glottal pulses.

  • upper (float) – Upper limit for periods between glottal pulses.

  • max_period_ratio (float) – Maximum ratio between consecutive periods used for shimmer extraction.

  • max_amp_factor (float) – Maximum ratio between consecutive amplitudes used for shimmer extraction.

Notes

Compute shimmer as the average absolute difference between consecutive pitch amplitudes with a fundamental period ratio below max_period_ratio and amplitude ratio below max_amp_factor for each frame. If rel=True, shimmer is divided by the average amplitude of each frame. Fundamental periods are calculated as the first-order temporal difference between consecutive glottal pulses. Amplitudes are signal amplitudes at the glottal pulses.
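
A per-frame sketch of this computation is given below. It is illustrative only: frame_shimmer is a hypothetical function, and the way periods are paired with amplitudes (the two periods around pulse i together with the amplitudes of pulses i and i + 1), the averaging of amplitudes over accepted pairs, the restriction to positive amplitudes, and the NaN return value are all assumptions.

```python
import math
from typing import List, Tuple

Pulse = Tuple[float, float, float]  # (pulse timestamp, T0, amplitude)


def frame_shimmer(
    pulses: List[Pulse],
    rel: bool = True,
    lower: float = 0.0001,
    upper: float = 0.02,
    max_period_ratio: float = 1.3,
    max_amp_factor: float = 1.6,
) -> float:
    """Shimmer of one frame from its glottal pulses."""
    times = [p[0] for p in pulses]
    amps = [p[2] for p in pulses]
    diffs: List[float] = []
    pair_means: List[float] = []
    for i in range(1, len(pulses) - 1):
        # Fundamental periods before and after pulse i.
        p0 = times[i] - times[i - 1]
        p1 = times[i + 1] - times[i]
        a0, a1 = amps[i], amps[i + 1]
        if not (lower < p0 < upper and lower < p1 < upper):
            continue  # a period is outside the allowed range
        if max(p0, p1) / min(p0, p1) >= max_period_ratio:
            continue  # consecutive periods differ too much
        if min(a0, a1) <= 0.0 or max(a0, a1) / min(a0, a1) >= max_amp_factor:
            continue  # consecutive amplitudes differ too much (or are not positive)
        diffs.append(abs(a1 - a0))
        pair_means.append((a0 + a1) / 2.0)
    if not diffs:
        return math.nan  # assumed marker for frames without usable pulse pairs
    shimmer = sum(diffs) / len(diffs)
    if rel:
        # Relative shimmer: divide by the average amplitude of the accepted pairs.
        shimmer /= sum(pair_means) / len(pair_means)
    return shimmer


pulses = [(0.000, 0.005, 1.0), (0.005, 0.005, 0.9), (0.010, 0.005, 1.0), (0.015, 0.005, 0.8)]
abs_shimmer = frame_shimmer(pulses, rel=False)
rel_shimmer = frame_shimmer(pulses, rel=True)
```

In this example the accepted amplitude differences are 0.1 and 0.2, giving an absolute shimmer of about 0.15 and a relative shimmer of about 0.16.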

classmethod from_pitch_pulse_frames(pitch_pulse_frames_obj: PitchPulseFrames, rel: bool = True, lower: float = 0.0001, upper: float = 0.02, max_period_ratio: float = 1.3, max_amp_factor: float = 1.6)

Extract voice shimmer frames from glottal pulse frames.

Parameters:
  • pitch_pulse_frames_obj (PitchPulseFrames) – Glottal pulse frames object.

  • rel (bool, optional, default=True) – Divide shimmer by the average pulse amplitude.

  • lower (float, optional, default=0.0001) – Lower limit for periods between glottal pulses.

  • upper (float, optional, default=0.02) – Upper limit for periods between glottal pulses.

  • max_period_ratio (float, optional, default=1.3) – Maximum ratio between consecutive periods for shimmer extraction.

  • max_amp_factor (float, optional, default=1.6) – Maximum ratio between consecutive amplitudes used for shimmer extraction.