emvoice.spectral
Spectral voice features.
Module Contents
Classes
SpecFrames – Create and store spectrogram frames.
MelSpecFrames – Calculate and store Mel spectrograms.
MfccFrames – Estimate and store Mel frequency cepstral coefficients (MFCCs).
AlphaRatioFrames – Calculate and store spectrogram alpha ratios.
HammarIndexFrames – Calculate and store the spectrogram Hammarberg index.
SpectralSlopeFrames – Estimate and store spectral slopes.
SpectralFluxFrames – Calculate and store spectral flux.
- class emvoice.spectral.SpecFrames(frames: numpy.ndarray, sr: int, window: str, frame_len: int, hop_len: int, center: bool = True, pad_mode: str = 'constant')[source]
Bases: emvoice.frames.BaseFrames
Create and store spectrogram frames.
Computes a spectrogram of a signal using the short-time Fourier transform (STFT).
- Parameters:
frames (numpy.ndarray) – Spectrogram frames.
window (str) – The window that was applied before the STFT.
Notes
Frames contain complex arrays x where np.abs(x) is the magnitude and np.angle(x) is the phase of the signal for different frequency bins.
See also
librosa.stft
- classmethod from_signal(sig_obj: emvoice.signal.BaseSignal, frame_len: int, hop_len: Optional[int] = None, center: bool = True, pad_mode: str = 'constant', window: Union[str, float, Tuple] = 'hann')[source]
Transform a signal into spectrogram frames.
- Parameters:
sig_obj (BaseSignal) – Signal object.
frame_len (int) – Number of samples per frame.
hop_len (int, optional, default=None) – Number of samples between frame starting points. If None, uses frame_len // 4.
center (bool, default=True) – Whether to center the frames and apply padding.
pad_mode (str, default='constant') – How the signal is padded before framing. See numpy.pad(). Uses the default value 0 for 'constant' padding. Ignored if center=False.
window (str, float, or tuple, default='hann') – The window that is applied before the STFT.
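The framing and transform described above can be sketched with numpy alone. This is a minimal illustration, not the emvoice implementation: `stft_frames` is a hypothetical helper, the frames-first output layout and the symmetric Hann window are assumptions, and librosa.stft may differ in details such as window periodicity.

```python
import numpy as np

def stft_frames(sig, frame_len, hop_len=None, center=True, pad_mode="constant"):
    """Hypothetical helper: complex STFT frames, shape (num_frames, frame_len // 2 + 1)."""
    if hop_len is None:
        hop_len = frame_len // 4  # documented default for hop_len
    if center:
        # Pad half a frame on both sides so that frame i is centered on sample i * hop_len.
        sig = np.pad(sig, frame_len // 2, mode=pad_mode)
    num_frames = 1 + (len(sig) - frame_len) // hop_len
    window = np.hanning(frame_len)  # assumption: symmetric Hann window
    frames = np.stack([
        sig[i * hop_len : i * hop_len + frame_len] * window
        for i in range(num_frames)
    ])
    return np.fft.rfft(frames, axis=1)

# One second of a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440.0 * t)

spec = stft_frames(sig, frame_len=512)
magnitude = np.abs(spec)   # magnitude per frame and frequency bin
phase = np.angle(spec)     # phase per frame and frequency bin

# Frequency of the strongest bin, averaged over frames.
freqs = np.fft.rfftfreq(512, d=1 / sr)
peak_hz = freqs[magnitude.mean(axis=0).argmax()]
```

With frame_len=512 the bins are 31.25 Hz apart, so the strongest bin can only approximate the 440 Hz tone to within one bin width.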
- class emvoice.spectral.MelSpecFrames(frames: numpy.ndarray, sr: int, window: str, frame_len: int, hop_len: int, center: bool, pad_mode: str, n_mels: int, lower: float, upper: float)[source]
Bases: SpecFrames
Calculate and store Mel spectrograms.
- Parameters:
frames (numpy.ndarray) – Spectrogram frames on the Mel power scale with shape (num_frames, n_mels).
n_mels (int) – Number of Mel filters.
lower (float) – Lower frequency boundary in Hz.
upper (float) – Upper frequency boundary in Hz.
See also
librosa.feature.melspectrogram
- classmethod from_spec_frames(spec_frames_obj: SpecFrames, n_mels: int = 26, lower: float = 20.0, upper: float = 8000.0)[source]
Calculate Mel spectrograms from spectrogram frames.
- Parameters:
spec_frames_obj (SpecFrames) – Spectrogram frames object.
n_mels (int, default=26) – Number of Mel filters.
lower (float, default=20.0) – Lower frequency boundary in Hz.
upper (float, default=8000.0) – Upper frequency boundary in Hz.
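To illustrate how n_mels, lower, and upper shape the result, here is a rough sketch of a triangular Mel filter bank applied to a power spectrogram. It assumes the HTK Mel formula and unnormalized filters. librosa.feature.melspectrogram uses a different default (Slaney-style, area-normalized), so the values will not match it exactly. The names `mel_filter_bank` and `mel_spec` are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    # Assumption: HTK-style Mel scale.
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)

def mel_filter_bank(sr, frame_len, n_mels=26, lower=20.0, upper=8000.0):
    """Hypothetical helper: triangular filters, shape (n_mels, frame_len // 2 + 1)."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(lower), hz_to_mel(upper), n_mels + 2))
    bank = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        left, mid, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - left) / (mid - left)
        falling = (right - freqs) / (right - mid)
        # Triangle: zero outside [left, right], peak of 1 at mid.
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spec(spec, sr, frame_len, n_mels=26, lower=20.0, upper=8000.0):
    """Map complex frames (num_frames, n_bins) to Mel power (num_frames, n_mels)."""
    power = np.abs(spec) ** 2
    return power @ mel_filter_bank(sr, frame_len, n_mels, lower, upper).T

# Random complex frames stand in for real spectrogram frames.
rng = np.random.default_rng(0)
spec = rng.normal(size=(10, 257)) + 1j * rng.normal(size=(10, 257))
bank = mel_filter_bank(sr=16000, frame_len=512)
mel = mel_spec(spec, sr=16000, frame_len=512)
```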
- class emvoice.spectral.MfccFrames(frames: numpy.ndarray, sr: int, window: str, frame_len: int, hop_len: int, center: bool, pad_mode: str, n_mels: int, lower: float, upper: float, n_mfcc: int, lifter: float)[source]
Bases: MelSpecFrames
Estimate and store Mel frequency cepstral coefficients (MFCCs).
- Parameters:
frames (numpy.ndarray) – MFCC frames with shape (num_frames, n_mfcc).
n_mfcc (int) – Number of coefficients that were estimated per frame.
lifter (float) – Cepstral liftering coefficient. Must be >= 0. If zero, no liftering is applied.
- classmethod from_mel_spec_frames(mel_spec_frames_obj: MelSpecFrames, n_mfcc: int = 4, lifter: float = 22.0)[source]
Estimate MFCCs from Mel spectrogram frames.
- Parameters:
mel_spec_frames_obj (MelSpecFrames) – Mel spectrogram frames object.
n_mfcc (int, default=4) – Number of coefficients to estimate per frame.
lifter (float, default=22.0) – Cepstral liftering coefficient. Must be >= 0. If zero, no liftering is applied.
See also
librosa.feature.mfcc
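The roles of n_mfcc and lifter can be sketched as follows: take the log of the Mel power, apply an orthonormal DCT-II, keep the first n_mfcc coefficients, and optionally weight them with a sinusoidal lifter. The lifter formula below is a commonly used one. Whether emvoice applies exactly this form is an assumption, and `mfcc` and `dct_ortho` are hypothetical names.

```python
import numpy as np

def dct_ortho(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(mel_power, n_mfcc=4, lifter=22.0):
    """Hypothetical helper: MFCCs with shape (num_frames, n_mfcc)."""
    # Mel power in dB; the floor avoids log of zero.
    log_mel = 10.0 * np.log10(np.maximum(mel_power, 1e-10))
    coefs = dct_ortho(log_mel, n_mfcc)
    if lifter > 0:
        # Assumed sinusoidal lifter; lifter == 0 means no liftering.
        idx = np.arange(1, n_mfcc + 1)
        coefs = coefs * (1.0 + (lifter / 2.0) * np.sin(np.pi * idx / lifter))
    return coefs

# A flat Mel spectrum puts all energy into the first coefficient.
flat = np.full((5, 26), 10.0)
coefs_raw = mfcc(flat, lifter=0.0)
coefs_liftered = mfcc(flat)
```

For the flat input, only the first unliftered coefficient is nonzero, which is a quick sanity check of the DCT step.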
- class emvoice.spectral.AlphaRatioFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, lower_band: Tuple[float], upper_band: Tuple[float])[source]
Bases: emvoice.frames.BaseFrames
Calculate and store spectrogram alpha ratios.
- Parameters:
frames (numpy.ndarray) – Alpha ratio frames in dB with shape (num_frames,).
lower_band (tuple) – Boundaries of the lower frequency band (start, end) in Hz.
upper_band (tuple) – Boundaries of the upper frequency band (start, end) in Hz.
Notes
The alpha ratio is the energy (sum of magnitudes) in the lower frequency band divided by the energy in the upper frequency band, converted to dB.
- classmethod from_spec_frames(spec_frames_obj: SpecFrames, lower_band: Tuple = (50.0, 1000.0), upper_band: Tuple = (1000.0, 5000.0))[source]
Calculate the alpha ratio from spectrogram frames.
- Parameters:
spec_frames_obj (SpecFrames) – Spectrogram frames object.
lower_band (tuple, default=(50.0, 1000.0)) – Boundaries of the lower frequency band (start, end) in Hz.
upper_band (tuple, default=(1000.0, 5000.0)) – Boundaries of the upper frequency band (start, end) in Hz.
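The alpha ratio calculation from the Notes can be sketched as below. Two details are assumptions because the source does not state them: bands are treated as half-open intervals [start, end), and the dB conversion uses the 20·log10 amplitude convention. `alpha_ratio_db` is a hypothetical helper.

```python
import numpy as np

def alpha_ratio_db(spec, sr, frame_len,
                   lower_band=(50.0, 1000.0), upper_band=(1000.0, 5000.0)):
    """Hypothetical helper: alpha ratio in dB, shape (num_frames,)."""
    mag = np.abs(spec)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # Assumption: bands are half-open, [start, end).
    low = (freqs >= lower_band[0]) & (freqs < lower_band[1])
    high = (freqs >= upper_band[0]) & (freqs < upper_band[1])
    ratio = mag[:, low].sum(axis=1) / mag[:, high].sum(axis=1)
    # Assumption: amplitude convention for the dB conversion.
    return 20.0 * np.log10(ratio)

# A flat spectrum: the ratio then only reflects how many bins fall in each band.
spec = np.ones((3, 257), dtype=complex)
alpha = alpha_ratio_db(spec, sr=16000, frame_len=512)
```

With sr=16000 and frame_len=512, 30 bins fall in the lower band and 128 in the upper band, so a flat spectrum gives a negative ratio in dB.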
- class emvoice.spectral.HammarIndexFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, pivot_point: float, upper: float)[source]
Bases: emvoice.frames.BaseFrames
Calculate and store the spectrogram Hammarberg index.
- Parameters:
frames (numpy.ndarray) – Hammarberg index frames in dB with shape (num_frames,).
pivot_point (float) – Point separating the lower and upper frequency regions in Hz.
upper (float) – Upper limit for the upper frequency region in Hz.
Notes
Calculate the Hammarberg index by dividing the peak magnitude in the spectrogram region below pivot_point by the peak magnitude in the region between pivot_point and upper. The ratio is then converted to dB.
- classmethod from_spec_frames(spec_frames_obj: SpecFrames, pivot_point: float = 2000.0, upper: float = 5000.0)[source]
Calculate the Hammarberg index from spectrogram frames.
- Parameters:
spec_frames_obj (SpecFrames) – Spectrogram frames object.
pivot_point (float, default=2000.0) – Point separating the lower and upper frequency regions in Hz.
upper (float, default=5000.0) – Upper limit for the upper frequency region in Hz.
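A minimal sketch of the Hammarberg index as described in the Notes. It assumes that pivot_point itself belongs to the upper region, that upper is exclusive, and that the dB conversion uses 20·log10. None of these details are confirmed by the source, and `hammarberg_index_db` is a hypothetical helper.

```python
import numpy as np

def hammarberg_index_db(spec, sr, frame_len, pivot_point=2000.0, upper=5000.0):
    """Hypothetical helper: Hammarberg index in dB, shape (num_frames,)."""
    mag = np.abs(spec)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # Peak magnitude below the pivot point.
    low_peak = mag[:, freqs < pivot_point].max(axis=1)
    # Peak magnitude between the pivot point and the upper limit (assumed half-open).
    high_peak = mag[:, (freqs >= pivot_point) & (freqs < upper)].max(axis=1)
    # Assumption: amplitude convention for the dB conversion.
    return 20.0 * np.log10(low_peak / high_peak)

# Flat spectrum with one strong low-frequency bin (bin 10 is 312.5 Hz).
mag = np.ones((3, 257))
mag[:, 10] = 10.0
hammarberg = hammarberg_index_db(mag.astype(complex), sr=16000, frame_len=512)
```

A low-frequency peak ten times larger than the high-frequency peak yields 20 dB under the assumed convention.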
- class emvoice.spectral.SpectralSlopeFrames(frames: numpy.ndarray, sr: int, frame_len: int, hop_len: int, center: bool, pad_mode: str, bands: Tuple[Tuple[float]])[source]
Bases: emvoice.frames.BaseFrames
Estimate and store spectral slopes.
- Parameters:
frames (numpy.ndarray) – Spectral slope frames with shape (num_frames, num_bands).
bands (tuple) – Frequency bands in Hz for which slopes were estimated.
Notes
Spectral slopes are estimated by fitting linear models that predict power in dB from frequency in Hz. A separate model is fitted for each frame and band.
- classmethod from_spec_frames(spec_frames_obj: SpecFrames, bands: Tuple[Tuple[float]] = ((0.0, 500.0), (500.0, 1500.0)))[source]
Estimate spectral slopes from spectrogram frames.
- Parameters:
spec_frames_obj (SpecFrames) – Spectrogram frames object.
bands (tuple, default=((0.0, 500.0), (500.0, 1500.0))) – Frequency bands in Hz for which slopes are estimated.
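The per-frame, per-band linear fit can be sketched with numpy.polyfit. The half-open band limits and the floor applied before the logarithm are assumptions, and `spectral_slopes` is a hypothetical helper rather than the emvoice code.

```python
import numpy as np

def spectral_slopes(spec, sr, frame_len, bands=((0.0, 500.0), (500.0, 1500.0))):
    """Hypothetical helper: slopes in dB per Hz, shape (num_frames, num_bands)."""
    # Power in dB; the floor avoids log of zero.
    power_db = 10.0 * np.log10(np.maximum(np.abs(spec) ** 2, 1e-10))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    slopes = np.empty((spec.shape[0], len(bands)))
    for b, (start, end) in enumerate(bands):
        in_band = (freqs >= start) & (freqs < end)  # assumption: [start, end)
        # polyfit accepts a 2-D y and fits one line per column (here, per frame).
        coefs = np.polyfit(freqs[in_band], power_db[:, in_band].T, deg=1)
        slopes[:, b] = coefs[0]  # first row holds the slope, second the intercept
    return slopes

# Synthetic spectrum whose power falls by 0.01 dB per Hz.
freqs = np.fft.rfftfreq(512, d=1.0 / 16000)
mag = 10.0 ** (-0.01 * freqs / 20.0)
spec = np.tile(mag, (4, 1)).astype(complex)
slopes = spectral_slopes(spec, sr=16000, frame_len=512)
```

Because the synthetic spectrum is exactly linear in dB, both bands recover the same slope of about -0.01 dB per Hz.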
- class emvoice.spectral.SpectralFluxFrames(frames: numpy.ndarray, sr: int, window: str, frame_len: int, hop_len: int, center: bool, pad_mode: str, lower: float, upper: float)[source]
Bases: SpecFrames
Calculate and store spectral flux.
- Parameters:
frames (numpy.ndarray) – Spectral flux frames with shape (num_frames-1,).
lower (float) – Lower limit for frequency bins.
upper (float) – Upper limit for frequency bins.
Notes
The spectral flux is computed in three steps:
1. Compute the normalized magnitudes of the frame spectra by dividing the magnitude at each frequency bin by the sum over all frequency bins.
2. Compute the first-order difference of the normalized magnitudes across frames for each frequency bin within [lower, upper).
3. Sum the squared differences for each frame.
Because of the first-order difference, the object has one frame fewer than the spectrogram from which it was computed.
- classmethod from_spec_frames(spec_frames_obj: SpecFrames, lower: float = 0.0, upper: float = 5000.0)[source]
Calculate the spectral flux from spectrogram frames.
- Parameters:
spec_frames_obj (SpecFrames) – Spectrogram frames object.
lower (float, default=0.0) – Lower limit for frequency bins.
upper (float, default=5000.0) – Upper limit for frequency bins.
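The spectral flux steps from the Notes translate almost directly into numpy. The sketch below follows them under the assumption that the input frames are laid out as (num_frames, num_bins). `spectral_flux` is a hypothetical helper.

```python
import numpy as np

def spectral_flux(spec, sr, frame_len, lower=0.0, upper=5000.0):
    """Hypothetical helper: spectral flux, shape (num_frames - 1,)."""
    mag = np.abs(spec)
    # Normalize each frame by the sum over all of its frequency bins.
    norm = mag / mag.sum(axis=1, keepdims=True)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    in_range = (freqs >= lower) & (freqs < upper)
    # First-order difference across frames, restricted to [lower, upper).
    diff = np.diff(norm[:, in_range], axis=0)
    # Sum of squared differences per frame transition.
    return (diff ** 2).sum(axis=1)

# Frame 1 is a scaled copy of frame 0; frame 2 has a different spectral shape.
mag = np.ones((3, 257))
mag[1] *= 2.0
mag[2, 10] = 50.0
flux = spectral_flux(mag.astype(complex), sr=16000, frame_len=512)
```

Because of the normalization, a pure change in loudness (frame 0 to frame 1) produces zero flux, while a change in spectral shape (frame 1 to frame 2) does not.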