This disclosure provides a method of decoding a digital speech signal, a speech decoder, a handset or mobile radio, and a base station or console. The method includes receiving a voice bit stream including at least one frame of bits that includes block codes, determining least confident bits in a first block code, generating candidates for the first block code based on the least confident bits, determining a first distance between each candidate and the first block code, and demodulating at least one other block code to obtain at least one demodulated vector. For each demodulated vector, a second distance between the demodulated vector and possible transmitted vectors is determined, and from the possible transmitted vectors, a vector corresponding to a minimum second distance is selected as a corrected demodulated vector. A minimum total distance is determined, and a candidate is selected as a corrected first block code.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
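The candidate search in this abstract lends itself to a short illustration. The Python below is a minimal sketch, not the patented method: it assumes hard-decision bits with per-bit confidence values, forms candidates by flipping combinations of the least confident positions, and uses the number of flipped bits as the first distance. All names and sizes are hypothetical.

```python
import itertools

def generate_candidates(received_bits, confidences, num_flip=3):
    """Flip combinations of the least confident bit positions to produce
    candidate codewords, each paired with its first distance (bits flipped)."""
    weakest = sorted(range(len(received_bits)),
                     key=lambda i: confidences[i])[:num_flip]
    candidates = []
    for r in range(len(weakest) + 1):
        for combo in itertools.combinations(weakest, r):
            cand = list(received_bits)
            for i in combo:
                cand[i] ^= 1                  # flip one low-confidence bit
            candidates.append((cand, r))      # r = first distance
    return candidates

received = [1, 0, 1, 1, 0, 0, 1]
conf = [0.9, 0.2, 0.8, 0.4, 0.95, 0.3, 0.7]
for cand, dist in generate_candidates(received, conf):
    print(dist, cand)
```

A full decoder would add each candidate's first distance to the minimum second distances of the corrected demodulated vectors and keep the candidate that gives the minimum total distance.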
Non-voice data is embedded in a voice bit stream that includes frames of voice bits by selecting a frame of voice bits to carry the non-voice data, placing non-voice identifier bits in a first portion of the voice bits in the selected frame, and placing the non-voice data in a second portion of the voice bits in the selected frame. The non-voice identifier bits are employed to reduce a perceived effect of the non-voice data on audible speech produced from the voice bit stream.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 15/24 - Speech recognition using non-acoustical features
G10L 19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
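As a rough sketch of the embedding step, the Python below overwrites a first portion of a frame with identifier bits and a second portion with the payload. The frame size, portion positions, and bit lengths are illustrative assumptions, and the decoder-side use of the identifier (e.g., muting or substituting the frame) is not shown.

```python
def embed_non_voice(frame_bits, identifier_bits, data_bits):
    """Place identifier bits in a first portion of the frame and the
    non-voice payload in a second portion, returning the marked frame."""
    out = list(frame_bits)
    out[:len(identifier_bits)] = identifier_bits          # first portion
    start = len(identifier_bits)
    out[start:start + len(data_bits)] = data_bits         # second portion
    return out

frame = [0] * 48                           # hypothetical 48-bit voice frame
marked = embed_non_voice(frame, [1, 0, 1, 1, 0, 1], [1, 1, 0, 0])
```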
Tone data embedded in a voice bit stream that includes frames of non-tone bits and frames of tone bits is detected and extracted by selecting a frame of bits, analyzing the selected frame of bits to determine whether it is a frame of tone bits, and, when it is a frame of tone bits, extracting tone data from it. Analyzing the selected frame includes comparing bits of the selected frame to sets of tone data to produce error criteria representative of differences between the selected frame and each of multiple sets of tone data. Based on the error criteria, a set of tone data that most closely corresponds to the bits of the selected frame is selected. When the error criteria corresponding to the selected set of tone data satisfies a set of thresholds, the selected frame is designated as a frame of tone bits.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G10L 19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm
H04W 88/18 - Service support devices; Network management devices
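A minimal sketch of the detection logic, assuming a Hamming-distance error criterion and a single threshold (the abstract does not specify the actual criteria or thresholds):

```python
def detect_tone_frame(frame_bits, tone_patterns, max_errors=4):
    """Compare a frame to each known tone bit pattern; if the closest
    pattern is within the threshold, report its index, else None."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    errors = [hamming(frame_bits, p) for p in tone_patterns]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best if errors[best] <= max_errors else None

patterns = [[0, 1, 0, 1, 0, 1, 0, 1],      # hypothetical tone data sets
            [1, 1, 0, 0, 1, 1, 0, 0]]
print(detect_tone_frame([0, 1, 0, 1, 0, 1, 1, 1], patterns, max_errors=2))  # -> 0
```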
5.
SPEECH MODEL PARAMETER ESTIMATION AND QUANTIZATION
Quantizing speech model parameters includes, for each of multiple vectors of quantized excitation strength parameters, determining first and second errors between first and second elements of a vector of excitation strength parameters and, respectively, first and second elements of the vector of quantized excitation strength parameters, and determining a first energy and a second energy associated with, respectively, the first and second errors. First and second weights for, respectively, the first error and the second error, are determined and are used to produce first and second weighted errors, which are combined to produce a total error. The total errors of each of the multiple vectors of quantized excitation strength parameters are compared and the vector of quantized excitation strength parameters that produces the smallest total error is selected to represent the vector of excitation strength parameters.
G10L 19/08 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
G10L 25/21 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, characterised by the type of extracted parameters, the extracted parameters being power information
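The weighted codebook search can be sketched as follows. The energies and the weight function are caller-supplied stand-ins, since the abstract does not define how the weights are derived from the error energies:

```python
def select_quantized_vector(target, codebook, energies, weight):
    """Search a codebook of quantized excitation strength vectors: weight
    each element's squared error by an energy-dependent factor and return
    the index of the codeword with the smallest total weighted error."""
    best_index, best_total = 0, float("inf")
    for idx, code in enumerate(codebook):
        total = 0.0
        for tgt, qnt, nrg in zip(target, code, energies):
            total += weight(nrg) * (tgt - qnt) ** 2
        if total < best_total:
            best_index, best_total = idx, total
    return best_index

codebook = [[0.1, 0.7], [0.4, 0.4], [0.8, 0.2]]   # toy two-element codebook
idx = select_quantized_vector([0.35, 0.5], codebook,
                              energies=[1.0, 0.25],
                              weight=lambda e: e ** 0.5)
```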
Tone data embedded in a voice bit stream that includes frames of non-tone bits and frames of tone bits is detected and extracted by selecting a frame of bits, analyzing the selected frame of bits to determine whether it is a frame of tone bits, and, when it is a frame of tone bits, extracting tone data from it. Analyzing the selected frame includes comparing bits of the selected frame to sets of tone data to produce error criteria representative of differences between the selected frame and each of multiple sets of tone data. Based on the error criteria, a set of tone data that most closely corresponds to the bits of the selected frame is selected. When the error criteria corresponding to the selected set of tone data satisfies a set of thresholds, the selected frame is designated as a frame of tone bits.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, specially adapted for particular use, for comparison or discrimination
G10L 19/22 - Mode decision, i.e. based on audio signal content versus external parameters
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, specially adapted for particular use, for comparison or discrimination, for processing of video signals
7.
Speech model parameter estimation and quantization
Quantizing speech model parameters includes, for each of multiple vectors of quantized excitation strength parameters, determining first and second errors between first and second elements of a vector of excitation strength parameters and, respectively, first and second elements of the vector of quantized excitation strength parameters, and determining a first energy and a second energy associated with, respectively, the first and second errors. First and second weights for, respectively, the first error and the second error, are determined and are used to produce first and second weighted errors, which are combined to produce a total error. The total errors of each of the multiple vectors of quantized excitation strength parameters are compared and the vector of quantized excitation strength parameters that produces the smallest total error is selected to represent the vector of excitation strength parameters.
G10L 19/087 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC
G10L 19/038 - Vector quantisation, e.g. TwinVQ audio
G10L 25/21 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, characterised by the type of extracted parameters, the extracted parameters being power information
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
Non-voice data is embedded in a voice bit stream that includes frames of voice bits by selecting a frame of voice bits to carry the non-voice data, placing non-voice identifier bits in a first portion of the voice bits in the selected frame, and placing the non-voice data in a second portion of the voice bits in the selected frame. The non-voice identifier bits are employed to reduce a perceived effect of the non-voice data on audible speech produced from the voice bit stream.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 15/24 - Speech recognition using non-acoustical features
G10L 19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
9.
REDUCING PERCEIVED EFFECTS OF NON-VOICE DATA IN DIGITAL SPEECH
Non-voice data is embedded in a voice bit stream that includes frames of voice bits by selecting a frame of voice bits to carry the non-voice data, placing non-voice identifier bits in a first portion of the voice bits in the selected frame, and placing the non-voice data in a second portion of the voice bits in the selected frame. The non-voice identifier bits are employed to reduce a perceived effect of the non-voice data on audible speech produced from the voice bit stream.
Compensating a speech signal for the presence of a speaker mask includes receiving a speech signal, dividing the speech signal into subframes, generating speech parameters for a subframe, and determining whether the subframe is suitable for use in detecting a mask. If the subframe is suitable for use in detecting a mask, the speech parameters for the subframe are used in determining whether a mask is present. If a mask is present, the speech parameters for the subframe are modified to produce modified speech parameters that compensate for the presence of the mask.
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
G10L 25/78 - Detection of presence or absence of voice signals
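The abstract above describes a control flow rather than specific signal processing, so the sketch below reproduces only that flow; every callable is a placeholder for processing the abstract leaves unspecified:

```python
def compensate_for_mask(speech, subframe_len, params_fn, suitable_fn,
                        mask_present_fn, modify_fn):
    """Per-subframe flow: estimate parameters, check suitability for mask
    detection, detect the mask, and modify parameters to compensate."""
    out = []
    for start in range(0, len(speech) - subframe_len + 1, subframe_len):
        params = params_fn(speech[start:start + subframe_len])
        if suitable_fn(params) and mask_present_fn(params):
            params = modify_fn(params)        # compensate for the mask
        out.append(params)
    return out
```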
Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into frames including N subframes (where N is greater than 1); computing subframe model parameters including spectral parameters; and generating a representation of the frame that includes information representing the spectral parameters of P subframes (where P < N) and information identifying the P subframes. The representation excludes information representing the spectral parameters of the N-P subframes not included in the P subframes. Generating the representation includes selecting the P subframes by, for multiple combinations of P subframes, determining an error induced by representing the frame using the spectral parameters for the P subframes and using interpolated spectral parameter values for the N-P subframes. A combination of P subframes is selected based on the determined error for the combination of P subframes.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/06 - Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
G10L 25/93 - Discriminating between voiced and unvoiced parts of speech signals
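A minimal sketch of the combination search, assuming linear interpolation between the nearest retained subframes and a squared-error measure (the abstract fixes neither choice):

```python
import itertools

def select_subframes(spectral, P):
    """For each combination of P kept subframes, measure the error of
    interpolating the omitted subframes and return the best combination."""
    N = len(spectral)
    best_combo, best_err = None, float("inf")
    for combo in itertools.combinations(range(N), P):
        err = 0.0
        for n in range(N):
            if n in combo:
                continue
            left = max((k for k in combo if k < n), default=combo[0])
            right = min((k for k in combo if k > n), default=combo[-1])
            if left == right:                 # clamp at the ends
                interp = spectral[left]
            else:                             # linear interpolation
                t = (n - left) / (right - left)
                interp = [(1 - t) * a + t * b
                          for a, b in zip(spectral[left], spectral[right])]
            err += sum((x - y) ** 2 for x, y in zip(spectral[n], interp))
        if err < best_err:
            best_combo, best_err = combo, err
    return best_combo

spectral = [[1.0], [2.0], [2.5], [4.0]]   # N=4 toy one-dimensional "spectra"
print(select_subframes(spectral, P=2))    # -> (0, 3)
```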
Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into frames including N subframes (where N is an integer greater than 1); computing model parameters for the subframes, the model parameters including spectral parameters; and generating a representation of the frame. The representation includes information representing the spectral parameters of P subframes (where P is an integer and P < N) and information identifying the P subframes.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/12 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
13.
Audio watermarking via correlation modification using an amplitude and a magnitude modification based on watermark data and to reduce distortion
To convey information using an audio channel, an audio signal is modulated to produce a modulated signal by embedding additional information into the audio signal. Modulating the audio signal includes processing the audio signal to produce a set of filter responses; creating a delayed version of the filter responses; modifying the delayed version of the filter responses based on the additional information to produce an echo audio signal; and combining the audio signal and the echo audio signal to produce the modulated signal. Modulating the audio signal may involve employing a modulation strength, and a psychoacoustic model may be used to modify the modulation strength based on a comparison between the distortion of the modulated signal, relative to the audio signal, and a target distortion.
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G11B 20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 25/21 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, characterised by the type of extracted parameters, the extracted parameters being power information
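The echo mechanism can be sketched compactly under stated simplifications: the patented method shapes the echo through a set of filter responses and adapts the strength with a psychoacoustic model, while the sketch below uses one full-band delayed copy whose polarity carries the data.

```python
import numpy as np

def add_echo_watermark(audio, bits, delay=64, strength=0.05, bit_len=1024):
    """Mix in a data-dependent echo: within each bit interval, a delayed
    copy of the signal is scaled by +/- strength and added back."""
    audio = np.asarray(audio, dtype=float)
    echo = np.zeros_like(audio)
    echo[delay:] = audio[:-delay]             # delayed version of the signal
    out = audio.copy()
    for i, bit in enumerate(bits):
        s, e = i * bit_len, min((i + 1) * bit_len, len(audio))
        sign = 1.0 if bit else -1.0           # bit sets the echo polarity
        out[s:e] += sign * strength * echo[s:e]
    return out
```

A detector would correlate each interval of the received signal with a delayed copy of itself and read the sign of the correlation at the echo lag.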
An audio watermarking system conveys information using an audio channel by modulating an audio signal to produce a modulated signal by embedding additional information into the audio signal. Modulating the audio signal includes segmenting the audio signal into overlapping time segments using a non-rectangular analysis window function to produce a windowed audio signal, processing the windowed audio signal for a time segment to produce frequency coefficients representing the windowed time segment and having phase values and magnitude values, selecting one or more of the frequency coefficients, modifying phase values of the selected frequency coefficients using the additional information to map the phase values onto a known phase constellation, and processing the frequency coefficients including the modified phase values to produce the modulated signal.
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
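A minimal sketch of the per-segment phase modification, assuming a two-point (0/π) constellation; the overlap-add across segments that a complete modulator needs is omitted:

```python
import numpy as np

def embed_phase_bits(segment, bits, bin_indices):
    """Snap the phase of selected FFT bins of one windowed segment onto a
    two-point constellation according to the bits, keeping magnitudes."""
    spec = np.fft.rfft(segment * np.hanning(len(segment)))
    for bit, k in zip(bits, bin_indices):
        target_phase = 0.0 if bit == 0 else np.pi
        spec[k] = np.abs(spec[k]) * np.exp(1j * target_phase)
    return np.fft.irfft(spec, n=len(segment))

segment = np.random.randn(256)
marked = embed_phase_bits(segment, bits=[1, 0, 1], bin_indices=[20, 21, 22])
```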
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
computer hardware for audio compression, audio analysis and audio processing; computer hardware for voice analysis and voice synthesis processing; voice compression hardware; Vocoders; integrated circuits for audio compression, audio analysis and audio processing; vocoder chips; computer software for audio compression, audio analysis and audio processing; embedded software for audio compression, audio analysis and audio processing; voice compression software; computer software for voice analysis and voice synthesis processing; software for use in vocoder chips. Design, development, programming, customization and rental of vocoders; design, development, programming, customization and rental of integrated circuits; providing temporary use of non-downloadable software for audio compression, audio analysis and audio processing; providing temporary use of non-downloadable software for voice compression, voice analysis and voice processing; digital compression, analysis and processing of digital data for others; digital compression, analysis and processing of digital voice data for others; information and advisory services relating to all the abovementioned services.
An audio watermarking system conveys information using an audio channel by modulating an audio signal to produce a modulated signal by embedding additional information into the audio signal. Modulating the audio signal includes segmenting the audio signal into overlapping time segments using a non-rectangular analysis window function to produce a windowed audio signal, processing the windowed audio signal for a time segment to produce frequency coefficients representing the windowed time segment and having phase values and magnitude values, selecting one or more of the frequency coefficients, modifying phase values of the selected frequency coefficients using the additional information to map the phase values onto a known phase constellation, and processing the frequency coefficients including the modified phase values to produce the modulated signal.
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
Methods for estimating speech model parameters are disclosed. For pulsed parameter estimation, a speech signal is divided into multiple frequency bands or channels using bandpass filters. Channel processing reduces sensitivity to pole magnitudes and frequencies and reduces impulse response time duration to improve pulse location and strength estimation performance. These methods are useful for high quality speech coding and reproduction at various bit rates for applications such as satellite and cellular voice communication.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
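The band-splitting step can be sketched with standard filters; Butterworth bandpass filters stand in for the unspecified filter bank, and the band edges are arbitrary assumptions:

```python
import numpy as np
from scipy.signal import butter, lfilter

def split_into_channels(speech, fs, band_edges):
    """Divide a speech signal into frequency channels using fourth-order
    Butterworth bandpass filters (a stand-in for the actual filter bank)."""
    channels = []
    for lo, hi in band_edges:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        channels.append(lfilter(b, a, speech))
    return channels

bands = split_into_channels(np.random.randn(8000), fs=8000,
                            band_edges=[(100, 500), (500, 1500), (1500, 3500)])
```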
A digital audio server may be used to automatically download music from a collection of audio media, such as CDs or DVDs. The server also may automatically identify the media using track offset information.
Speech enhancement in a breathing apparatus is provided using a primary sensor mounted near a breathing mask user's mouth, at least one reference sensor mounted near a noise source, and a processor that combines the signals from these sensors to produce an output signal with an enhanced speech component. The reference sensor signal may be filtered and the result may be subtracted from the primary sensor signal to produce the output signal with an enhanced speech component. A method for detecting the exclusive presence of a low air alarm noise may be used to determine when to update the filter. A triple filter adaptive noise cancellation method may provide improved performance through reduction of filter maladaptation. The speech enhancement techniques may be employed as part of a communication system or a speech recognition system.
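The core subtraction step can be illustrated with a single-filter LMS canceller; the triple-filter arrangement and the low-air-alarm gating of the filter update described above are not reproduced, so this is only a baseline sketch:

```python
import numpy as np

def lms_cancel(primary, reference, taps=32, mu=0.01):
    """Adaptively filter the reference-sensor signal to predict the noise
    in the primary (mask microphone) signal and subtract it; the error
    signal is the enhanced speech estimate."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]       # most recent reference samples
        e = primary[n] - np.dot(w, x)         # enhanced speech sample
        w += mu * e * x                       # LMS weight update
        out[n] = e
    return out
```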
Methods for estimating speech model parameters are disclosed. For pulsed parameter estimation, a speech signal is divided into multiple frequency bands or channels using bandpass filters. Channel processing reduces sensitivity to pole magnitudes and frequencies and reduces impulse response time duration to improve pulse location and strength estimation performance. These methods are useful for high quality speech coding and reproduction at various bit rates for applications such as satellite and cellular voice communication.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into one or more frames, computing model parameters for a frame, and quantizing the model parameters to produce pitch bits conveying pitch information, voicing bits conveying voicing information, and gain bits conveying signal level information. One or more of the pitch bits are combined with one or more of the voicing bits and one or more of the gain bits to create a first parameter codeword that is encoded with an error control code to produce a first FEC codeword that is included in a bit stream for the frame. The process may be reversed to decode the bit stream.
G10L 19/12 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
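A toy sketch of the codeword formation; the 6/4/2 bit split and the single-parity "encoder" are placeholders, since the abstract does not specify the split and a real system would use a stronger block code:

```python
def make_fec_codeword(pitch_bits, voicing_bits, gain_bits, encode):
    """Combine the most significant pitch, voicing, and gain bits into a
    first parameter codeword, then apply an error control code to it."""
    codeword = pitch_bits[:6] + voicing_bits[:4] + gain_bits[:2]  # illustrative 12-bit split
    return encode(codeword)

# Placeholder "encoder": append one parity bit.
parity_encode = lambda bits: bits + [sum(bits) % 2]
fec = make_fec_codeword([1, 0, 1, 1, 0, 1], [0, 1, 1, 0], [1, 0], parity_encode)
```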
First encoded voice bits are transcoded into second encoded voice bits by dividing the first encoded voice bits into one or more received frames, with each received frame containing multiple ones of the first encoded voice bits. First parameter bits for at least one of the received frames are generated by applying error control decoding to one or more of the encoded voice bits contained in the received frame, speech parameters are computed from the first parameter bits, and the speech parameters are quantized to produce second parameter bits. Finally, a transmission frame is formed by applying error control encoding to one or more of the second parameter bits, and the transmission frame is included in the second encoded voice bits.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
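The transcoding pipeline reduces to four stages, wired together below; all four callables are codec-specific placeholders, so this sketch only fixes the order of operations:

```python
def transcode_frame(received_frame, fec_decode, bits_to_params,
                    requantize, fec_encode):
    """Transcode one frame without synthesizing audio: error-control
    decode, recover speech parameters, requantize for the target format,
    and error-control encode the result as a transmission frame."""
    first_parameter_bits = fec_decode(received_frame)
    speech_parameters = bits_to_params(first_parameter_bits)
    second_parameter_bits = requantize(speech_parameters)
    return fec_encode(second_parameter_bits)
```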
Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into one or more frames and computing a set of model parameters for the frames. The set of model parameters includes at least a first parameter conveying pitch information. The voicing state of a frame is determined and the first parameter conveying pitch information is modified to designate the determined voicing state of the frame, if the determined voicing state of the frame is equal to one of a set of reserved voicing states. The model parameters are quantized to generate quantizer bits which are used to produce the bit stream.
G10L 19/12 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
G10L 11/06 - Discriminating between voiced and unvoiced parts of speech signals (G10L 11/04 takes precedence)
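A minimal sketch of the reserved-state signaling; the reserved states, their values, and the offset scheme are hypothetical, as the abstract does not enumerate them:

```python
# Hypothetical reserved pitch-parameter values, one per special voicing state.
RESERVED_PITCH = {"silence": 0, "tone": 1, "all_unvoiced": 2}

def encode_pitch(pitch_value, voicing_state):
    """If the frame's voicing state is reserved, overload the pitch
    parameter with the value designating that state; otherwise offset the
    normal pitch value past the reserved range."""
    if voicing_state in RESERVED_PITCH:
        return RESERVED_PITCH[voicing_state]
    return pitch_value + len(RESERVED_PITCH)
```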