A projection system for pixel shifting comprising a light source configured to emit light and a spatial light modulator configured to receive the light and generate modulated light. The spatial light modulator includes a plurality of micromirrors. The projection system includes a wobulation device situated between the light source and the spatial light modulator. The wobulation device is configured to shift the light from the light source by fractional pixels. The projection system includes a controller configured to, for each of a plurality of subperiods, control the light source to emit the light onto the spatial light modulator, and between each of the plurality of subperiods and with the wobulation device, shift the light from the light source by a partial pixel distance.
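For illustration only, the control sequence can be sketched in Python; the light_source, wobulator, and modulator interfaces and the four-position half-pixel pattern are assumptions, not details taken from the abstract.

HALF_PIXEL_PATTERN = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]

def run_frame(light_source, wobulator, modulator, subframes):
    """Drive one frame as a sequence of fractionally shifted subperiods."""
    for i, subframe in enumerate(subframes):
        modulator.load(subframe)      # set the micromirror states for this subperiod
        light_source.emit()           # illuminate the spatial light modulator
        dx, dy = HALF_PIXEL_PATTERN[(i + 1) % len(HALF_PIXEL_PATTERN)]
        wobulator.shift(dx, dy)       # partial-pixel shift between subperiods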
Methods and apparatus for temporally consistent video denoising directed at removing image noise. According to one example, a denoising algorithm uses a noise estimation block and an image denoising block operatively connected to one another. In various examples, the noise estimation block is implemented using a neural network or a filter arrangement including an entropy filter and is designed to analyze noisy frame sequences and generate a noise strength map. With this noise strength map, the image denoising block operates to perform the image denoising more efficiently, effectively reducing the image noise while preserving the texture of the original footage. In some examples, a joint loss objective is used to find configurations of both blocks that result in nearly optimal performance of the denoising algorithm.
The disclosure relates to a computer-implemented method of training a neural network-based language model for restoring clean speech from a distorted speech waveform. The method includes obtaining an input time-frequency representation of the distorted speech waveform; determining, from the input time-frequency representation, one or more audio features (e.g., semantic representations) associated with the distorted speech waveform using a neural network-based multi-stage speech encoder; determining conditioning information for the language model based at least in part on the one or more audio features; and jointly training the language model and the speech encoder, wherein the language model is trained based at least in part on the conditioning information.
Methods and systems for designing binaural room impulse responses (BRIRs) for use in headphone virtualizers, and methods and systems for generating a binaural signal in response to a set of channels of a multi-channel audio signal, including by applying a BRIR to each channel of the set, thereby generating filtered signals, and combining the filtered signals to generate the binaural signal, where each BRIR has been designed in accordance with an embodiment of the design method. Other aspects are audio processing units configured to perform any embodiment of the inventive method. In accordance with some embodiments, BRIR design is formulated as a numerical optimization problem based on a simulation model (which generates candidate BRIRs) and at least one objective function (which evaluates each candidate BRIR), and includes identification of a best one of the candidate BRIRs as indicated by performance metrics determined for the candidate BRIRs by each objective function.
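A minimal sketch of that optimization loop, assuming stand-in simulate and objectives callables (the abstract does not specify the simulation model or the objective functions):

import numpy as np

def design_brir(simulate, objectives, n_candidates=100, seed=0):
    """Keep the candidate BRIR with the best aggregate objective score."""
    rng = np.random.default_rng(seed)
    best_brir, best_score = None, -np.inf
    for _ in range(n_candidates):
        params = rng.uniform(0.0, 1.0, size=8)    # hypothetical model parameters
        brir = simulate(params)                   # candidate BRIR from the simulation model
        score = sum(f(brir) for f in objectives)  # each objective evaluates the candidate
        if score > best_score:
            best_brir, best_score = brir, score
    return best_brir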
6.
METHOD AND APPARATUS FOR ENCODING/DECODING IMAGES USING A PREDICTION METHOD ADOPTING IN-LOOP FILTERING
Disclosed is a prediction method adopting in-loop filtering. According to the present invention, a prediction method for encoding and decoding video comprises the following steps: generating a residual block of the current block through inverse quantization and inverse transform; generating a prediction block of the current block through intra-prediction; performing in-loop filtering on the current block in which the residual block and the prediction block are combined; and storing the current block, on which the in-loop filtering is performed, in a frame buffer for intra-prediction of the next block to be encoded. As described above, prediction is performed using an in-loop filter during encoding and decoding, thereby improving the accuracy of prediction and reducing prediction errors, thus improving the efficiency of video compression and reducing the amount of data to be transmitted.
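A rough NumPy sketch of that reconstruction path, with a simple separable smoothing kernel standing in for the unspecified in-loop filter:

import numpy as np

def reconstruct_block(pred_block, residual_block, frame_buffer, pos):
    """Combine prediction and residual, filter in-loop, store for the next intra-prediction."""
    recon = np.clip(pred_block + residual_block, 0, 255)
    kernel = np.array([0.25, 0.5, 0.25])          # stand-in in-loop filter
    recon = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, recon)
    y, x = pos
    h, w = recon.shape
    frame_buffer[y:y + h, x:x + w] = recon        # reference for the next block's intra-prediction
    return recon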
H04N 19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/126 - Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or subject of the adaptive coding, the unit being a pixel
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or subject of the adaptive coding, the unit being a colour or a chrominance component
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop
H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artefacts, e.g. of blocking artefacts
A light field data set with surface layers is received. Each surface layer includes depth, alpha, and spherical harmonics (SH) coefficient data. The depth, alpha, and SH coefficient data are converted to depth, alpha, and multiple SH coefficient images. Encoder-side operations are performed on the multiple SH coefficient images to generate an alternative SH coefficient data representation different from an input SH coefficient data representation. The alternative SH coefficient data representation is encoded into a coded bitstream along with the depth and alpha images.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
Embodiments are disclosed for hybrid near/far-field speaker virtualization. In an embodiment, a method comprises: receiving a source signal including channel-based audio or audio objects; generating near-field gain(s) and far-field gain(s) based on the source signal and a blending mode; generating a far-field signal based, at least in part, on the source signal and the far-field gain(s); rendering, using a speaker virtualizer, the far-field signal for playback of far-field acoustic audio through far-field speakers into an audio reproduction environment; generating a near-field signal based at least in part on the source signal and the near-field gain(s); prior to providing the far-field signal to the far-field speakers, sending the near-field signal to a near-field playback device or an intermediate device coupled to the near-field playback device; providing the far-field signal to the far-field speakers; and providing the near-field signal to the near-field speakers to synchronously overlay the far-field acoustic audio.
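A toy sketch of the gain-generation step, assuming a single scalar blending mode and an equal-power law (neither is specified by the abstract):

import numpy as np

def split_source(source, blend):
    """source: (channels, samples); blend in [0, 1] weights near- vs far-field."""
    g_near = np.sqrt(blend)                 # near-field gain (equal-power assumption)
    g_far = np.sqrt(1.0 - blend)            # far-field gain
    return g_near * source, g_far * source  # near-field and far-field signals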
A computer-implemented method includes providing input features representing a speech signal to an input layer of an acoustic model to generate first compact representations mapped from the input features. The input features include a lower-dimensional representation of the speech signal. The first compact representations include lower-dimensional representations of (i) phonetic content of speech mapped from the speech signal and (ii) articulatory characteristics of speech mapped from the speech signal.
An input lightfield comprising a spatial distribution of visual and geometric information in a volumetric scene in a three-dimensional (3D) space is received. A spatial sequence of two-dimensional (2D) surface layers is selected to be included in a layered surface lightfield (LSLF) set. The input lightfield is collapsed into the spatial sequence of 2D surface layers of the LSLF set. Each 2D surface layer in the spatial sequence of 2D surface layers includes a depth data portion, a transparency data portion, a color data portion, etc. The LSLF set is encoded into a coded bitstream. The coded bitstream causes a recipient device of the coded bitstream to generate a display image of the volumetric scene for rendering on an image display.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
H04N 19/33 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
Described herein is a method of determining at least one mask for use in training a deep neural network (DNN)-based mask-based audio processing model. In particular, the method may comprise obtaining a time-frequency representation of a target audio signal for use in the training. The method may further comprise determining a per-channel energy normalization (PCEN) measure for the target audio signal. The method may yet further comprise determining the at least one mask based on the PCEN measure.
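For reference, a common PCEN formulation, PCEN(t,f) = (E/(eps + M)^alpha + delta)^r - delta^r with a first-order smoother M, can be sketched in NumPy; the parameter values below are illustrative, not those of the described method:

import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """E: non-negative spectrogram, shape (frames, bins); returns the PCEN measure."""
    M = np.empty_like(E)
    M[0] = E[0]
    for t in range(1, len(E)):
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]    # smoothed energy per channel
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r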
G10L 21/0224 - Processing in the time domain
G10L 25/21 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the type of extracted parameters, the extracted parameters being power information
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique, using neural networks
G10L 25/84 - Detection of the presence or absence of voice signals for discriminating speech from noise
G10L 25/93 - Discriminating between voiced and unvoiced parts of speech signals
12.
TEXT TO SYNCHRONIZED JOINT VIDEO AND AUDIO GENERATION
Methods and apparatus for generating a synchronized video-audio pair. According to an example embodiment, a method of generating a synchronized video-audio pair includes: applying one or more text inputs to a generative model, the one or more text inputs including a speech text, the generative model including a neural network; with the generative model, converting the speech text into an audio segment for the synchronized video-audio pair; and with the generative model, generating a video segment for the synchronized video-audio pair, the video segment including a talking head having lip movements corresponding to the speech text and in synchronization with the audio segment.
G10L 13/00 - Speech synthesis; Systems for synthesising speech from text
G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
13.
SPATIAL AUDIO RENDERING ADAPTIVE TO SIGNAL LEVEL AND LOUDSPEAKER PLAYBACK LIMIT THRESHOLDS
Rendering audio signals may involve a mapping of each audio signal to the loudspeaker signals, computed as a function of the audio signal's intended perceived spatial position, the physical positions associated with the loudspeakers, and a time- and frequency-varying representation of loudspeaker signal level relative to a maximum playback limit of each loudspeaker. Each mapping may be computed to approximately achieve the intended perceived spatial position of the associated audio signal when the loudspeaker signals are played back. A representation of loudspeaker signal level relative to a maximum playback limit may be computed for each audio signal. The mapping of an audio signal into a particular loudspeaker signal may be reduced as that loudspeaker's signal level relative to its maximum playback limit increases above a threshold, while the mapping may be increased into one or more other loudspeakers whose signal levels relative to their maximum playback limits remain below the threshold.
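One way to picture that adaptation, with an illustrative threshold and a naive redistribution rule (the abstract fixes neither):

import numpy as np

def adapt_gains(gains, rel_levels, threshold=0.8):
    """gains: per-loudspeaker mapping; rel_levels: signal level / playback limit."""
    over = rel_levels > threshold
    reduced = np.where(over, gains * threshold / np.maximum(rel_levels, 1e-9), gains)
    spill = np.sum(gains - reduced)                   # mapping removed from limited speakers
    headroom = np.where(~over, threshold - rel_levels, 0.0)
    if headroom.sum() > 0:
        reduced += spill * headroom / headroom.sum()  # increase mapping where headroom remains
    return reduced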
Improved methods and/or apparatus for decoding an encoded audio signal in soundfield format for L loudspeakers. The method and/or apparatus can render an Ambisonics-format audio signal to 2D loudspeaker setups based on a rendering matrix. The rendering matrix has elements based on loudspeaker positions and is determined by weighting at least one element of a first matrix with a weighting factor g = 1/√L. The first matrix is determined based on the positions of the L loudspeakers and at least one virtual position of at least one virtual loudspeaker that is added to the positions of the L loudspeakers.
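The weighting step itself is simple; a NumPy sketch, taking the first matrix (built from the L real positions plus the virtual positions, by a design the abstract leaves open) as given:

import numpy as np

def weighted_rendering_matrix(first_matrix, L):
    """Scale the first matrix by the weighting factor from the described method."""
    g = 1.0 / np.sqrt(L)
    return g * first_matrix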
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic, of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase-shifted with respect to each other
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
15.
METHOD AND DEVICE FOR APPLYING DYNAMIC RANGE COMPRESSION TO A HIGHER ORDER AMBISONICS SIGNAL
A method for performing DRC on an HOA signal comprises transforming the HOA signal to the spatial domain, analyzing the transformed HOA signal, and obtaining, from the results of said analysis, gain factors that are usable for dynamic compression. The gain factors can be transmitted together with the HOA signal. When applying the DRC, the HOA signal is transformed to the spatial domain, and the gain factors are extracted and multiplied with the transformed HOA signal in the spatial domain, whereby a gain-compensated transformed HOA signal is obtained. The gain-compensated transformed HOA signal is transformed back into the HOA domain, whereby a gain-compensated HOA signal is obtained. The DRC may be applied in the QMF filter-bank domain.
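A compact sketch of the decoder-side application, assuming an orthonormal HOA-to-spatial transform T (so T.T recovers the HOA domain):

import numpy as np

def apply_hoa_drc(hoa, T, gains):
    """hoa: (coeffs, frames); T: (directions, coeffs); gains: (directions, frames)."""
    spatial = T @ hoa              # transform the HOA signal to the spatial domain
    compensated = gains * spatial  # multiply the extracted gain factors per direction
    return T.T @ compensated       # transform back: gain-compensated HOA signal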
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint stereo, intensity coding or matrixing
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic, of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase-shifted with respect to each other
A method of rendering audio is disclosed that may include receiving audio reproduction data including audio objects and associated metadata. The method may include receiving reproduction environment data including an indication of a number of reproduction speakers in a reproduction environment and an indication of a location of each reproduction speaker within the reproduction environment. The method may include rendering the audio objects into one or more speaker feed signals based, at least in part, on the associated metadata. Rendering the audio objects may include rendering a first audio object (i) according to a first audio object position of the first audio object, and (ii) in accordance with a positional jitter pattern that defines a periodically varying movement of the first audio object. The periodically varying movement may be separately defined with respect to any movement defined by the first audio object position of the first audio object.
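A sinusoidal overlay is one simple instance of such a positional jitter pattern (the abstract does not commit to a waveform); the amplitude and rate below are illustrative:

import numpy as np

def jittered_position(base_xyz, t, amp=0.02, rate_hz=5.0):
    """Object position at time t: authored position plus a periodic jitter overlay."""
    jitter = amp * np.array([np.sin(2 * np.pi * rate_hz * t),
                             np.cos(2 * np.pi * rate_hz * t),
                             0.0])
    return np.asarray(base_xyz) + jitter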
Disclosed are systems, methods and user interfaces for manipulating audio data. In some embodiments, a method comprises: presenting a graphical user interface on a display device, the graphical user interface including a portion for displaying an image projection, the image projection including at least one sound source; receiving, with at least one processor, first user input associating a location of a beamformer with the at least one sound source; generating, with the at least one processor, first control metadata based on the selected location; and controlling, with the at least one processor, the beamformer location with the first control metadata.
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or transforming a displayed object, image or text element, setting a parameter value or selecting a range of values, for image transformation, e.g. dragging, rotation, expansion or change of colour
A multi-input, multi-output audio process is implemented as a linear system for use in an audio filterbank to convert a set of frequency-domain input audio signals into a set of frequency-domain output audio signals. A transfer function from one input to one output is defined as a frequency-dependent gain function. In some implementations, the transfer function includes a direct component that is substantially defined as a frequency-dependent gain, and one or more decorrelated components that have a frequency-varying group phase response. The transfer function is formed from a set of sub-band functions, with each sub-band function being formed from a set of corresponding component transfer functions including the direct component and one or more decorrelated components.
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic, of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase-shifted with respect to each other
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from the monophonic signal by phase shifting, time delay or reverberation
19.
ENHANCEMENT OF GENERATIVE IMAGE MODELS BASED ON GAZE
A system may display a set of images to a user, the set of images including a plurality of synthetic images output by a generative adversarial network (GAN) that includes a generator and a discriminator, and a plurality of non-synthetic images; detect a user response to the set of images, the user response including at least a gaze of the user relative to the set of images; and train the GAN based at least on the user response, including tuning the generator based on the gaze of the user.
A method and apparatus for decompressing a Higher Order Ambisonics (HOA) signal representation is disclosed. The apparatus includes an input interface that receives an encoded directional signal and an encoded ambient signal and an audio decoder that perceptually decodes the encoded directional signal and encoded ambient signal to produce a decoded directional signal and a decoded ambient signal, respectively. The apparatus further includes an extractor for obtaining side information related to the directional signal and an inverse transformer for converting the decoded ambient signal from a spatial domain to an HOA domain representation of the ambient signal. The apparatus also includes a synthesizer for recomposing a Higher Order Ambisonics (HOA) signal from the HOA domain representation of the ambient signal and the decoded directional signal. The side information includes a direction of the directional signal selected from a set of uniformly spaced directions.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint stereo, intensity coding or matrixing
G10L 19/20 - Vocoders using multiple modes, using sound-class-specific coding, hybrid encoders or object-based coding
H04H 20/89 - Stereophonic broadcast systems using three or more audio channels, e.g. triphonic or quadraphonic
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic, of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase-shifted with respect to each other
Methods and apparatus for mitigating speckle on an image projection screen are disclosed. A non-contact shaking or vibrating of the projection screen may be achieved using lightweight material. For example, magnetic material such as iron powder (suspended in a binder material) may be coated onto the screen, and one or more electromagnetic actuators may be positioned at a distance and generate magnetic fields that cause the shaking from the distance. A controller may generate and transmit signals toward the electromagnetic actuator, the signals configured to cause generation of the magnetic fields that cause the screen to shake at one or more prescribed frequencies. Such motion may at least partly mitigate a visual perceptibility of the speckle.
Methods related to the geometric partition mode (GPM) in video coding are described. The proposed methods include: applying adaptive reordering of merge candidates with template matching (ARMC-TM) to derive GPM inter candidate lists, applying merge motion vector differences to GPM candidate lists, enabling GPM for all-intra coding units (CUs), using inter and intra template costs in intra-prediction modes of GPM, using GPM partitions to generate templates in template matching, and using neighboring reconstructed samples and an edge criterion to derive top and left-edge intercepts to generate partitioning candidates.
H04N 19/119 - Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/11 - Selection of coding mode or prediction mode among a plurality of spatial predictive coding modes
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/88 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data, or permutation of transform coefficient data among different blocks
Methods, systems, and bitstream syntax are described for sign prediction in video coding. The methods include: selection of top and left neighbors based on an image continuity check, the intra mode of the current coding unit (CU), the merge motion vector, or adaptive motion vector prediction; sign prediction based on the residue domain of the current CU or neighbor CUs; sign prediction based on approximated reconstruction samples; reducing the number of selected coefficients for sorting; simplifying the sequential search cost; and combining sign prediction with sign data hiding.
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of the position and number of pixels used for prediction
H04N 19/18 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or subject of the adaptive coding, the unit being a set of transform coefficients
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or subject of the adaptive coding, the unit being a pixel
24.
METHOD FOR GENERATING AN AUDIO-VISUAL MEDIA STREAM
An audio-visual media stream is generated from a first audio stream comprising N ≥ 2 channel signals captured by a user device, and a second audio stream comprising a pair of channel signals captured by a binaural capturing device. A set of audio objects is extracted, and spatial information is estimated comprising, for each audio object, a horizontal direction of arrival estimated based on at least three of the channel signals and on orientation data indicative of a landscape or portrait mode of the user device. Each audio object is panned in accordance with the spatial information to channels of a multichannel format to generate an upmixed audio stream for the audio-visual media stream.
Disclosed is a data transmission system that transmits data by using a relay. The relay selects a transmission terminal from among a plurality of terminals accessing a base station. A base station transmits base station data to the relay during a first time slot, and the transmission terminal transmits terminal data to the relay. The relay transmits terminal data to the base station during a second time slot, and transmits base station data to the transmission terminal.
H04N 7/173 - Analogue secrecy systems; Analogue subscription systems with two-way working, e.g. the subscriber sending a programme-selection signal
H04N 19/12 - Selection from a plurality of transforms or standards, e.g. selection between a discrete cosine transform [DCT] and a sub-band transform, or selection between H.263 and H.264
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or masking of high-frequency transform coefficients
H04N 19/147 - Data rate or amount of coded data at the encoder output according to rate-distortion criteria
H04N 19/149 - Data rate or amount of coded data at the encoder output by estimating the amount of coded data by means of a model, e.g. a mathematical or statistical model
H04N 19/19 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, tool or type used for the adaptive coding, using optimisation based on Lagrange multipliers
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/236 - Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL into a video stream or multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bitrate; Assembling of a packetised elementary stream
Disclosed herein are techniques for enhancing audio signals. In some embodiments, the techniques may involve obtaining multiple audio signals associated with multiple microphones of one or more devices. The techniques may further involve, responsive to detecting wind in at least one of the multiple audio signals, filtering the multiple audio signals to generate multiple filtered audio signals, by filtering each audio signal based on filter coefficients determined for that audio signal, which weight it based on its associated noise covariance. The techniques may further involve mixing the multiple filtered audio signals to generate a multi-channel wind-reduced audio signal, wherein the multi-channel wind-reduced audio signal comprises a summation of the multiple filtered audio signals.
In an audio encoder, for audio content received in a source audio format, default gains are generated based on a default dynamic range compression (DRC) curve, and non-default gains are generated for a non-default gain profile. Based on the default gains and non-default gains, differential gains are generated. An audio signal comprising the audio content, the default DRC curve, and differential gains is generated. In an audio decoder, the default DRC curve and the differential gains are identified from the audio signal. Default gains are re-generated based on the default DRC curve. Based on the combination of the re-generated default gains and the differential gains, operations are performed on the audio content extracted from the audio signal.
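The gain bookkeeping reduces to a subtraction on the encoder side and an addition on the decoder side; a trivial sketch (inputs may be scalars or NumPy arrays of per-frame gains):

def encode_differential_gains(default_gains, non_default_gains):
    """Encoder: transmit only the difference from the default-DRC-curve gains."""
    return non_default_gains - default_gains

def decode_non_default_gains(regenerated_default_gains, differential_gains):
    """Decoder: combine re-generated default gains with the differential gains."""
    return regenerated_default_gains + differential_gains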
H03G 3/30 - Automatic control in amplifiers having semiconductor devices
H03G 7/00 - Volume compression or expansion in amplifiers
H03G 9/00 - Combinations of two or more types of control, e.g. gain control and tone control
H03G 9/02 - Combinations of two or more types of control, e.g. gain control and tone control, in untuned amplifiers
H03G 9/12 - Combinations of two or more types of control, e.g. gain control and tone control, in untuned amplifiers having semiconductor devices
H03G 9/18 - Combinations of two or more types of control, e.g. gain control and tone control, in untuned amplifiers having semiconductor devices, for tone control and volume expansion or compression
An aspect relates to a method for generating immersive media content, comprising: by a user device: obtaining an audio stream of a scene; determining, based on audio feature information for the audio stream, that a cloud server upload process is to be performed; and responsive to determining that the cloud server upload process is to be performed, performing the cloud server upload process, wherein the cloud server upload process comprises uploading the audio stream to a cloud server; by the cloud server: performing audio source separation to extract at least one audio source from the audio stream; and generating an immersive audio stream, wherein each respective audio source of the at least one audio source is included in an audio object and is assigned a spatial location based on an estimated location of the respective audio source in the scene, or is included in at least one channel based on an estimated location of the respective audio source in the scene.
H04N 21/439 - Processing of audio elementary streams
H04N 21/233 - Processing of audio elementary streams
G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of the groups, specially adapted for a particular use
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of the groups, specially adapted for a particular use, for comparison or discrimination, for the processing of video signals
H04L 67/06 - Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
Techniques for enhancing audio signals are provided herein. In some embodiments, a method for enhancing audio signals involves obtaining an input signal obtained using multiple microphones of one or more audio capture devices. The method may further involve determining a beamforming noise covariance and a wind reduction noise covariance associated with the input signal. The method may further involve determining a weighted average of the beamforming noise covariance and the wind reduction noise covariance to determine an aggregate noise covariance. The method may further involve generating an output signal at least in part by filtering a representation of the input signal using filter weights determined based on the aggregate noise covariance, such that the output signal is enhanced relative to the input signal with respect to directionality and wind noise reduction.
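A sketch of the aggregation and filtering steps, pairing the weighted covariance with a standard minimum-variance (MVDR-style) solve; the weight alpha and the steering vector d are stand-ins for quantities the abstract leaves unspecified:

import numpy as np

def filter_weights(R_beam, R_wind, d, alpha=0.5):
    """R_*: (mics, mics) noise covariances; d: (mics,) steering vector."""
    R = alpha * R_beam + (1.0 - alpha) * R_wind   # aggregate noise covariance
    Rinv_d = np.linalg.solve(R, d)                # avoids forming an explicit inverse
    return Rinv_d / (np.conj(d) @ Rinv_d)         # minimum-variance filter weights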
Methods and apparatus for decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or soundfield. The method may include receiving a bitstream containing the compressed HOA representation and decoding, based on a determination that there are multiple layers, the compressed HOA representation from the bitstream to obtain a sequence of decoded HOA representations. A first subset of the sequence of decoded HOA representations is determined based only on corresponding ambient HOA components. A second subset of the sequence of decoded HOA representations is determined based on corresponding ambient HOA components and corresponding predominant sound components.
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint stereo, intensity coding or matrixing
G10L 19/24 - Variable-rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical or layered encoding
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
31.
IMAGING FILE FORMAT FOR MULTIPLANE IMAGES WITH HEIF
Methods, systems, and bitstream syntax are described for a file container that supports the storage and transmission of multi-plane images. Examples are provided for coding texture and opacity information using HEVC or VVC coding and the HEIF container. Examples of carrying coded MPI images according to V3C and an example HEIF-based player are also presented.
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
H04N 21/2365 - Multiplexing of several video streams
H04N 21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams or extracting additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of a packetised elementary stream
Methods and apparatus for generating an MPI representation of a scene. According to an example embodiment, a method of generating an MPI representation of a scene includes generating a first MPI representation of the scene based on a first image of the scene; constructing an occlusion map that indicates respective occluded portions for different layers of the first MPI representation; obtaining, from one or more second images of the scene, respective sets of pixel values corresponding to the occlusion map, each of the second images corresponding to a respective camera pose that is different from a camera pose corresponding to the first image; and generating a second MPI representation of the scene by selectively filling parts of the respective occluded portions of the different layers of the first MPI representation using the respective sets of the pixel values.
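A sketch of the selective filling step, assuming candidate pixel values from a second image are aligned per layer and marked invalid (NaN) where that camera pose does not observe them:

import numpy as np

def fill_occlusions(mpi_layers, occlusion_map, candidate_values):
    """mpi_layers, candidate_values: (layers, H, W); occlusion_map: boolean, same shape."""
    filled = mpi_layers.copy()
    usable = occlusion_map & ~np.isnan(candidate_values)  # occluded here, visible in the second view
    filled[usable] = candidate_values[usable]             # selectively fill those parts
    return filled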
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/21 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with binary alpha plane coding for video objects, e.g. context-based arithmetic encoding [CAE]
33.
SYSTEMS AND METHODS FOR DIGITAL LASER PROJECTION WITH INCREASED CONTRAST USING FOURIER FILTER
An optical filter to increase contrast of an image generated with a spatial light modulator includes a lens for spatially Fourier transforming modulated light from the spatial light modulator, and an optical filter mask positioned at a Fourier plane of the lens to filter the modulated light. The modulated light has a plurality of diffraction orders, and the optical filter mask transmits at least one of the diffraction orders of the modulated light and blocks a remaining portion of the modulated light. A method that improves contrast of an image generated with a spatial light modulator includes spatially Fourier transforming modulated light from the spatial light modulator onto a Fourier plane, and filtering the modulated light by transmitting at least one diffraction order of the modulated light at the Fourier plane and blocking a remaining portion of the modulated light at the Fourier plane.
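The optical chain can be simulated numerically, with the lens's spatial Fourier transform modeled by an FFT and the filter mask as a binary array at the Fourier plane; this is a simulation sketch, not the optical implementation:

import numpy as np

def fourier_filter(field, mask):
    """field: complex light field at the SLM; mask: 1 where diffraction orders pass."""
    spectrum = np.fft.fftshift(np.fft.fft2(field))    # Fourier plane of the lens
    filtered = spectrum * mask                        # transmit selected orders, block the rest
    return np.fft.ifft2(np.fft.ifftshift(filtered))   # field after the filtering stage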
G02B 26/00 - Optical devices or arrangements for the control of light using movable or deformable optical elements
G02B 26/08 - Optical devices or arrangements for the control of light using movable or deformable optical elements for controlling the direction of light
Systems and methods for context-based encoding of video data using reshaping algorithms. One method includes receiving the video data, the video data composed of a plurality of image frames, each image frame including a plurality of pixel blocks. The method includes determining, for each pixel block, a luma bin index, determining, for each luma bin, a banding risk value, and determining Gaussian function parameters based on the banding risk value. The method includes generating a differential reshaping function using the Gaussian function parameters, computing a luma-based forward reshaping function based on the differential reshaping function, and generating an output image for each image frame by applying the luma-based forward reshaping function to the respective image frame.
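A toy construction of such a curve, assuming (as one plausible reading) that the Gaussian boosts the slope, and hence the codeword allocation, around a high-banding-risk luma range; in the described method the parameters would come from the banding-risk analysis:

import numpy as np

def forward_reshaping(n_codewords=1024, mu=0.3, sigma=0.05, boost=0.5):
    """Integrate a Gaussian-modulated differential function into a monotone luma map."""
    x = np.linspace(0.0, 1.0, n_codewords)
    differential = 1.0 + boost * np.exp(-0.5 * ((x - mu) / sigma) ** 2)  # slope per luma bin
    curve = np.cumsum(differential)                      # luma-based forward reshaping
    return (curve - curve[0]) / (curve[-1] - curve[0])   # normalized to [0, 1]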
H04N 19/98 - Adaptive dynamic range coding [ADRC]
G06F 17/18 - Complex mathematical operations for the evaluation of statistical data
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a picture, frame or field
Novel methods and systems for creating a 2D floor plan visualization from video captured from sparse camera views, based on object recognition that creates latent vectors for camera pose estimation. Estimated camera poses are converted to world coordinates, which are projected onto a 2D plane. These world coordinates can be used to form the 2D floor plan, useful for user interface implementation.
H04N 23/90 - Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/77 - Processing of image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using data embedding and reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
36.
METHOD OF RENDERING ONE OR MORE CAPTURED AUDIO SOUNDFIELDS TO A LISTENER
A computer implemented system for rendering captured audio soundfields to a listener comprises apparatus to deliver the audio soundfields to the listener. The delivery apparatus delivers the audio soundfields to the listener with first and second audio elements perceived by the listener as emanating from first and second virtual source locations, respectively, and with the first audio element and/or the second audio element delivered to the listener from a third virtual source location. The first virtual source location and the second virtual source location are perceived by the listener as being located to the front of the listener, and the third virtual source location is located to the rear or the side of the listener.
An audio renderer for rendering a multi-channel audio signal having M channels to a portable device having S independent speakers, comprising a first matrix application module for applying a primary rendering matrix to the input audio signal to provide a first pre-rendered signal suitable for playback on the multiple independent speakers, a second matrix application module for applying a secondary rendering matrix to the input audio signal to provide a second pre-rendered signal suitable for playback on the multiple independent speakers, a channel analysis module configured to calculate a mixing gain according to a time-varying channel distribution, and a mixing module configured to produce a rendered output signal by mixing the first and second pre-rendered signals based on the mixing gain.
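The mixing stage reduces to a crossfade of two matrix products; a sketch, with the gain g assumed scalar per block (the abstract lets it vary over time):

import numpy as np

def render(x, M_primary, M_secondary, g):
    """x: (M, samples); M_*: (S, M) rendering matrices; g in [0, 1] from channel analysis."""
    return g * (M_primary @ x) + (1.0 - g) * (M_secondary @ x)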
Methods and systems for performing at least one audio activity (e.g., conducting a phone call or playing music or other audio content) in an environment including by determining an estimated location of a user in the environment in response to sound uttered by the user (e.g., a voice command), and controlling the audio activity in response to determining the estimated user location. The environment may have zones which are indicated by a zone map and estimation of the user location may include estimating in which of the zones the user is located. The audio activity may be performed using microphones and loudspeakers which are implemented in or coupled to smart audio devices.
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
H04R 1/40 - Arrangements for obtaining the desired frequency or directional characteristics for obtaining the desired directional characteristic only by combining a number of identical transducers
A method for reconstructing a raw image, including: generating a low-frequency image and a high-frequency image from an initial image; linearly estimating the high-frequency image to generate a reconstructed high-frequency image; sparsely interpolating the low-frequency image to generate a reconstructed low-frequency image; and generating a reconstructed raw image from the reconstructed low-frequency image and the reconstructed high-frequency image.
G06T 3/4007 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on interpolation, e.g. bilinear interpolation
Approaches for generating metadata for content to be composited and rendered using the generated metadata are described. These approaches can be used with the development and distribution of one or more web pages or other graphical user interfaces. For example, one can collect content (e.g., images, animation, text and user interface elements) to be composited together into a web page and invoke a set of APIs to generate the metadata for the content of the web page that will be composited; a metadata generation system receives the calls through the API and generates the metadata. The web page can then be distributed with the generated metadata which can be used to create the display of the web page with content that is perceptually modified based on the metadata about the individual elements on the web page and their spatial proximity.
Methods of processing audio data relating to user generated content are described. One method includes obtaining the audio data; applying frame-wise audio enhancement to the audio data; generating metadata for the enhanced audio data, based on one or more processing parameters of the frame-wise audio enhancement; and outputting the enhanced audio data together with the metadata. Another method includes obtaining the audio data and metadata for the audio data, wherein the metadata comprises first metadata indicative of one or more processing parameters of a previous frame-wise audio enhancement of the audio data; applying restore processing to the audio data, using the one or more processing parameters, to at least partially reverse the previous frame-wise audio enhancement; and applying frame-wise audio enhancement or editing processing to the restored audio data. Further described are corresponding apparatus, programs, and computer-readable storage media.
A computer-implemented method includes generating, at a feature extraction model, extracted features based on input media, the input media including a first modality and a second modality. The extracted features include a lower-dimensional numerical representation of the first modality and a lower-dimensional numerical representation of the second modality. The method includes generating, at a multi-modal fusion model, an integrated representation based on the extracted features. The integrated representation includes attention scores capturing correlations between the lower-dimensional numerical representation of the first modality and the lower-dimensional numerical representation of the second modality. The method includes generating, at a moment prediction model, a moment prediction based on the integrated representation. The moment prediction identifies a moment in the input media.
G06F 16/783 - Retrieval of data characterised by the use of metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
43.
METHODS AND DEVICES FOR GENERATION AND PROCESSING OF MODIFIED BITSTREAMS
Described herein is a method for generating a modified bitstream on a source device, wherein the method includes the steps of:
a) receiving, by a receiver, a bitstream including coded media data;
b) generating, by an embedder, payload of additional media data and embedding the payload in the bitstream for obtaining, as an output from the embedder, a modified bitstream including the coded media data and the payload of the additional media data; and
d) outputting the modified bitstream to a sink device.
Further described is a method for processing said modified bitstream on a sink device. Also described are a respective source device and sink device, a system of a source device and a sink device, and respective computer program products.
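At byte level the flow can be pictured as below; the length-prefixed trailer is a hypothetical container for illustration, not the bitstream syntax the method actually uses:

import struct

def make_modified_bitstream(coded_media: bytes, additional_media: bytes) -> bytes:
    """Source device: embed the additional-media payload after the coded media data."""
    return coded_media + additional_media + struct.pack(">I", len(additional_media))

def split_modified_bitstream(modified: bytes) -> tuple:
    """Sink device: recover the coded media data and the embedded payload."""
    n = struct.unpack(">I", modified[-4:])[0]
    return modified[:-4 - n], modified[-4 - n:-4]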
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
H04W 4/80 - Services using short-range communication, e.g. near-field communication, radio-frequency identification or low-energy communication
Audio content coded for a reference speaker configuration is downmixed to downmix audio content coded for a specific speaker configuration. One or more gain adjustments are performed on individual portions of the downmix audio content coded for the specific speaker configuration. Loudness measurements are then performed on the individual portions of the downmix audio content. An audio signal that comprises the audio content coded for the reference speaker configuration and downmix loudness metadata is generated. The downmix loudness metadata is created based at least in part on the loudness measurements on the individual portions of the downmix audio content.
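A sketch of the measurement step, with a static downmix matrix and plain RMS-in-dB standing in for a proper loudness measure (e.g., ITU-R BS.1770, which the abstract does not name):

import numpy as np

def downmix_loudness_metadata(audio, downmix_matrix, portion_len):
    """audio: (channels, samples); downmix_matrix: (out_channels, channels)."""
    mix = downmix_matrix @ audio                       # downmix to the specific configuration
    metadata = []
    for start in range(0, mix.shape[1], portion_len):
        portion = mix[:, start:start + portion_len]
        rms = np.sqrt(np.mean(portion ** 2) + 1e-12)
        metadata.append(20.0 * np.log10(rms))          # loudness per individual portion, dB
    return metadata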
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint stereo, intensity coding or matrixing
Sampled data is packaged in checkerboard format for encoding and decoding. The sampled data may be quincunx sampled multi-image video data (e.g., 3D video or a multi-program stream), and the data may also be divided into sub-images of each image which are then multiplexed, or interleaved, in frames of a video stream to be encoded and then decoded using a standardized video encoder. A system for viewing may utilize a standard video decoder and a formatting device that de-interleaves the decoded sub-images of each frame and reformats the images for a display device. A 3D video may be encoded using a most advantageous interleaving format such that a preferred quality and compression ratio are reached. In one embodiment, the invention includes a display device that accepts data in multiple formats.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
H04N 13/139 - Format conversion, e.g. of frame rate or size
H04N 13/161 - Encoding, multiplexing or demultiplexing different image signal components
H04N 19/112 - Selection of coding mode or of prediction mode according to a given display mode, e.g. for interlaced or progressive display mode
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or masking of high-frequency transform coefficients
H04N 19/16 - Coding mode being assigned, i.e. the coding mode being predefined or preselected to be further used for selecting another element or parameter for a given display mode, e.g. for interlaced or progressive display mode
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic part of the video signal being the object or subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
H04N 19/33 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/587 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding combined with predictive coding
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 21/2365 - Multiplexing of several video streams
H04N 21/2383 - Channel coding of a digital bit-stream, e.g. modulation
H04N 21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams or extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of a packetised elementary stream
H04N 21/438 - Interfacing the downstream path of the transmission network coming from a server, e.g. retrieving encoded video stream packets from an IP network
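The checkerboard packing described in this entry can be pictured with a small sketch: two views share one frame on complementary quincunx phases, and the viewing system splits them back apart before interpolation. The grayscale-only handling and the (row + col) parity convention are assumptions for illustration.

```python
import numpy as np

def checkerboard_interleave(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Pack two grayscale views into one frame: 'left' keeps the samples
    where (row + col) is even, 'right' those where it is odd."""
    rows, cols = np.indices(left.shape)
    mask = (rows + cols) % 2 == 0
    return np.where(mask, left, right)

def checkerboard_deinterleave(frame: np.ndarray):
    """Recover the two quincunx sample sets; a real formatter would then
    interpolate the missing half of each view for display."""
    rows, cols = np.indices(frame.shape)
    mask = (rows + cols) % 2 == 0
    return np.where(mask, frame, 0), np.where(mask, 0, frame)
```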
47.
METHODS, APPARATUS AND SYSTEMS FOR DIRECTIONAL AUDIO CODING-SPATIAL RECONSTRUCTION AUDIO PROCESSING
Disclosed are embodiments for audio processing that combine complementary aspects of Spatial Reconstruction (SPAR) and Directional Audio Coding (DirAC) technologies, including higher audio quality, reduced bitrate, input/output format flexibility and/or reduced computational complexity, to produce a codec (e.g., an Ambisonics codec) that has better overall performance than DirAC or SPAR codecs.
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/032 - Quantisation or dequantisation of spectral components
G10L 19/18 - Vocoders using multiple modes
A method and apparatus for obtaining a stereo recording with closely spaced physical microphones, e.g., of a consumer-grade electronic device. Various examples construct signals for two stereo channels by combining different spectral portions of variously generated signals. In one example, a stereo channel of the stereo recording is constructed using a spectral portion of a directional virtual-microphone signal (332) derived from the audio signals captured by the physical microphones (302), a modified spectral portion of one of the audio signals (352), and an unmodified spectral portion of that audio signal (328). The directional virtual-microphone signal is generated using a pressure-gradient method (318). The modification of the audio signal (350) includes imposing an interchannel level difference (ILD) that matches the ILD of the directional virtual-microphone signals (332) in the adjacent frequency range (IMF). In various examples, the corresponding algorithm can be run at the electronic device that houses the closely spaced microphones or at a remote server.
Event camera data containing raw events is received. Neural field training data is generated from the raw events in the event camera data. In a training phase, optimized values for operational parameters of a neural field are generated by training the neural field with the neural field training data based on a specific loss function. Neural field operational parameter values are encoded in a coded bitstream to enable a recipient device to use the neural field operating with the neural field operational parameter values to generate reconstructed events.
H04N 19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
H04N 19/87 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
50.
MESSAGING PARAMETERS FOR NEURAL-NETWORK POST FILTERING IN IMAGE AND VIDEO CODING
Methods, systems, and bitstream syntax are described for the carriage of neural network topology and parameters as related to neural-network-based post filtering (NNPF) in image and video coding. Examples of NNPF SEI messaging as applicable to the MPEG standards for coding video pictures are described at the sequence layer and at the picture layer.
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic part of the video signal being the object or subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a picture, frame or field
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic part of the video signal being the object or subject of the adaptive coding, the unit being a colour or a chrominance component
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
51.
MULTI-DEVICE, MULTI-CHANNEL ATTENTION FOR SPEECH AND AUDIO ANALYTICS APPLICATIONS
Some disclosed methods involve receiving sensor data, including microphone data from each of a plurality of devices in the environment, producing an input embedding vector corresponding to each sensor, producing a device-wise context vector corresponding to each device, obtaining ground truth data and comparing each device-wise context vector with the ground truth data, to produce a comparison result. The comparing may involve an attention-based process. Some disclosed methods involve generating one or more current output analytics tokens based, at least in part, on the comparison result and controlling the operation of at least one device based, at least in part, on the one or more current output analytics tokens. The controlling may involve controlling at least one of a speaker operation or a microphone operation.
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L 15/18 - Speech classification or search using natural language modelling
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
52.
METHOD FOR SEPARATION OF AUDIO SOURCES WITH DIFFERENT TIME-FREQUENCY CHARACTERISTICS
The present disclosure relates to a neural network system for residual source activity detection in an input audio signal. The neural network system comprises a feature extractor trained to predict a set of latent variables based on an input audio frame of the input audio signal, and at least two separation blocks configured to generate a main source mask for separating a respective main audio source. The neural network system further comprises a residual mask extractor configured to form a combined source mask by combining all main source masks and to determine a residual mask that complements the combined source mask, and at least one residual classifier comprising a neural network trained to predict a residual source activity confidence metric based on the residual mask, the residual source activity confidence metric indicating a likelihood of a residual audio source being active.
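The residual-mask construction lends itself to a compact sketch: sum the per-source masks, clip, and take the complement; a toy energy ratio stands in for the learned residual classifier. Mask values in [0, 1] and the normalisation are assumptions.

```python
import numpy as np

def residual_mask(main_masks: np.ndarray) -> np.ndarray:
    """Combine per-source masks (sources, freq, time), clip to [0, 1], and
    return the complementary residual mask."""
    combined = np.clip(main_masks.sum(axis=0), 0.0, 1.0)
    return 1.0 - combined

def residual_activity_score(residual: np.ndarray,
                            mixture_mag: np.ndarray) -> float:
    """Toy stand-in for the learned residual classifier: the share of
    mixture energy captured by the residual mask."""
    captured = np.sum((residual * mixture_mag) ** 2)
    total = np.sum(mixture_mag ** 2) + 1e-12
    return float(captured / total)
```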
The disclosure relates to methods of training neural networks for de-coloration, de-reverbing, and/or de-noising of speech. The neural network includes a cascade of one or more neural network stages for receiving input of a frequency-domain representation of a speech signal. The methods include, for each neural network stage: determining a loss function based on an output of the neural network stage; and adjusting neural network parameters of the neural network stage based on the determined loss function using backpropagation. The loss function includes a first contribution and a second contribution. The first contribution to the loss function relates to a frequency-domain loss function and the second contribution to the loss function relates to a time-domain loss function. The disclosure further relates to methods of de-coloration, de-reverbing, and/or de-noising of speech, and to corresponding apparatus, computer programs, and computer-readable storage media.
G06N 3/084 - Backpropagation, e.g. using gradient descent
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
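A minimal sketch of the two-term objective described above: an L1 spectral term computed on magnitude STFTs plus an L1 waveform term, blended by a weight alpha. The window, hop, norms, and weighting are illustrative choices, not the disclosed loss.

```python
import numpy as np

def stft_mag(x: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Magnitude STFT via a plain Hann-windowed frame loop."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=-1))

def joint_loss(estimate: np.ndarray, target: np.ndarray,
               alpha: float = 0.5) -> float:
    """Blend a frequency-domain spectral L1 term with a time-domain L1 term."""
    freq_term = np.mean(np.abs(stft_mag(estimate) - stft_mag(target)))
    time_term = np.mean(np.abs(estimate - target))
    return float(alpha * freq_term + (1 - alpha) * time_term)
```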
54.
POSE CORRECTION METADATA FOR INTERACTIVE HEADTRACKING
Systems, methods, and devices for encoding and decoding immersive audio content are disclosed. Some examples provide methods for encoding an input audio signal. An example method includes obtaining an input audio signal with immersive audio content, obtaining a first set of head poses, and rendering a first set of binaural signals. The method may also include obtaining a second set of head poses and rendering a second set of binaural signals. The method may include computing, in a first frequency range, a first reconstruction metadata to enable the reconstruction of both magnitude and phase of the second set of binaural signals and computing, in a second frequency range, a second reconstruction metadata to enable the reconstruction of the magnitude of the second set of binaural signals. The method may include encoding the first set of binaural signals, the first reconstruction metadata, and the second reconstruction metadata in a bitstream.
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
55.
BLOCK-BASED MULTIMEDIA PACKET ASSIGNMENT USING MULTISOURCE CODING
Methods and apparatus to perform block-based packet-assignment for delivering multimedia. According to an example embodiment, an end-user device receives descriptions (i) identifying blocks of multimedia content encoded using a rateless error correction code and stored at a plurality of network sources and (ii) specifying network conditions for the corresponding plurality of network paths. The end-user device communicates to the network sources a packet assignment plan for transmission of packets corresponding to different blocks from the network sources to the end-user device. The packet assignment plan is determined based on the received plurality of descriptions and an optimization performed with an objective function including a component representing an estimated distortion of the received multimedia content and another component representing a cost of delivering the packets. The end-user device reconstructs the multimedia content using the blocks assembled from corresponding sets of packets transmitted in accordance with the communicated packet assignment plan.
H04L 65/70 - Media network packetisation
H04L 65/75 - Media network packet handling
H04L 65/80 - Arrangements, protocols or services in data packet communication networks to support real-time applications, responding to quality of service [QoS]
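One way to picture the optimization in this entry is a greedy allocator over (block, path) pairs that scores each candidate packet by estimated distortion reduction minus delivery cost. The diminishing-returns gain model, the capacity handling, and all parameter names below are assumptions standing in for the actual objective.

```python
import math

def plan_packet_assignment(block_gain, path_cost, path_capacity,
                           total_packets):
    """Greedily assign packets to (block, path) pairs. block_gain[b] is the
    assumed per-packet distortion reduction for block b; with a rateless
    code every extra packet helps, so returns diminish with the count."""
    sent = {}                                # (block, path) -> packet count
    per_block = [0] * len(block_gain)
    per_path = [0] * len(path_cost)
    for _ in range(total_packets):
        best, best_score = None, -math.inf
        for b, g in enumerate(block_gain):
            gain = g / (1 + per_block[b])    # diminishing returns per block
            for p, cost in enumerate(path_cost):
                if per_path[p] >= path_capacity[p]:
                    continue
                score = gain - cost
                if score > best_score:
                    best, best_score = (b, p), score
        if best is None or best_score <= 0:  # no assignment worth its cost
            break
        sent[best] = sent.get(best, 0) + 1
        per_block[best[0]] += 1
        per_path[best[1]] += 1
    return sent

# e.g. plan_packet_assignment([5.0, 3.0], [0.2, 0.5], [10, 10], 12)
```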
56.
REPRESENTATION LEARNING USING INFORMED MASKING FOR SPEECH AND OTHER AUDIO APPLICATIONS
Some disclosed methods involve receiving, by a control system configured to implement at least one neural network, input audio data and feature weightings and producing, by the control system and based at least in part on the input audio data and the feature weightings, latent space embeddings. In some examples, the input audio data corresponds to an input mathematical space and the latent space embeddings may correspond with unmasked portions of the input audio data. According to some examples, the latent space embeddings may be mathematical representations of the input audio data indicated by the feature weightings in a latent space that is a different mathematical space from the input mathematical space. In some examples, the feature weightings may be, or may be based on, mask data.
G10L 21/0232 - Processing in the frequency domain
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
Methods and systems are described to create transitions between a set of video clips according to a style choice and to pick a next video clip in a sequence of ordered clips. In one embodiment, a device receives a plurality of video clips. In addition, the device may encode the plurality of video clips. The device may further compute a set of predicted transition embeddings using a target style and the encoded plurality of video clips. Furthermore, the device may determine a set of transitions for the plurality of video clips using the set of predicted transition embeddings in a recurrent manner.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
A multi-stream rendering system and method may render and play simultaneously a plurality of audio program streams over a plurality of arbitrarily placed loudspeakers. At least one of the program streams may be a spatial mix. The rendering of said spatial mix may be dynamically modified as a function of the simultaneous rendering of one or more additional program streams. The rendering of one or more additional program streams may be dynamically modified as a function of the simultaneous rendering of the spatial mix.
Methods and apparatus for relighting images. According to an example embodiment, inverse rendering is applied to an input image to extract a plurality of channels including an original lighting channel. A first neural network is used to determine a first latent feature corresponding to the input image based on a first set of channels including a shading channel generated using a replacement lighting channel. A second neural network is used to determine a second latent feature corresponding to the input image based on a different second set of channels including the replacement lighting channel. A relighted image is generated by propagating samples of a latent image map corresponding to the input image through a conditional diffusion model to which the first and second latent features are applied as first and second conditions.
Given a sequence of images in a first codeword representation, methods, processes, and systems are presented for image reshaping using rate distortion optimization, wherein reshaping allows the images to be coded in a second codeword representation which allows more efficient compression than using the first codeword representation. Syntax methods for signaling reshaping parameters are also presented.
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/119 - Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/147 - Data rate or code amount at the encoder output according to rate-distortion criteria
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic part of the video signal being the object or subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop
61.
HEADPHONE RENDERING METADATA-PRESERVING SPATIAL CODING WITH SPEAKER OPTIMIZATION
Systems and methods of clustering audio objects. Example systems and devices are described that include a cluster selection module and a speaker dropout monitoring module. Systems and devices may also include a speaker optimization process that is selectively enabled when the speaker dropout monitoring module determines that one or more speakers in the object-based audio system have a dropout issue. Example methods for clustering audio objects are described that may include the steps of receiving an input audio block that includes a plurality of audio objects and calculating object-to-speaker gains for the plurality of audio objects. The methods may include identifying a speaker as experiencing a dropout issue based on the object-to-speaker gains, and clustering the audio objects based on whether the speaker is experiencing dropout issues.
Systems and methods for controlling switched-mode power supplies. One system includes a converter including a switch and an inductor and processor to control operation of the converter. The processor is configured to determine whether a predicted value of current flowing through the inductor is greater than zero. The processor is further configured to determine the converter is operating in continuous conduction mode (CCM) when the predicted value of the current is greater than zero and control the switch using a first duty cycle when the converter is operating in CCM. The processor is further configured to determine the converter is operating in discontinuous conduction mode (DCM) when the predicted value of the current is less than zero and control the switch using a second duty cycle when the converter is operating in DCM.
H02M 3/156 - Conversion of DC power input into DC power output without intermediate conversion into AC, by static converters using discharge tubes with control electrode or semiconductor devices with control electrode, using devices of a triode or transistor type requiring continuous application of a control signal, using semiconductor devices only, with automatic control of output voltage or current, e.g. switching regulators
G06F 1/26 - Power supply means, e.g. regulation thereof
H02M 1/00 - Details of apparatus for conversion
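The mode decision in this entry maps naturally onto a couple of formulas for an idealised buck stage: predict the valley inductor current, then pick the duty cycle from the CCM conversion ratio D = Vout/Vin when the current stays positive, or from the standard DCM relation D = sqrt(2·L·Iload·Vout / (T·Vin·(Vin − Vout))) otherwise. The sketch assumes a lossless buck topology; it is not the patented controller.

```python
import math

def predicted_valley_current(i_avg, v_in, v_out, duty, period, inductance):
    """Predicted minimum (valley) inductor current over the next switching
    period: average current minus half the peak-to-peak ripple (ideal buck)."""
    ripple = (v_in - v_out) * duty * period / inductance
    return i_avg - ripple / 2.0

def select_duty_cycle(i_valley, v_in, v_out, i_load, period, inductance):
    """CCM when the predicted current stays above zero, DCM otherwise,
    with the duty cycle chosen per mode."""
    if i_valley > 0:                 # continuous conduction mode
        return v_out / v_in          # first duty cycle: ideal ratio D = Vout/Vin
    # discontinuous conduction mode duty cycle
    d = math.sqrt(2 * inductance * i_load * v_out /
                  (period * v_in * (v_in - v_out)))
    return min(d, 1.0)               # second duty cycle for DCM
```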
Systems and methods for calibrating display levels created using dual modulation digital micromirror devices. In one example, a multi-modulation display system includes a light source, a first modulator including a first plurality of mirrors to modulate light from the light source, and a second modulator including a second plurality of mirrors to modulate light from the first modulator. A first image is captured while the first modulator is off and the second modulator is on, creating a first display level. A second image is captured while the first modulator is on and the second modulator is on, also creating the same first display level. A difference between the first image and the second image is determined, and control signals for controlling the second modulator may be adjusted based on the difference.
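A toy version of the calibration comparison above: subtract the two captures that nominally show the same display level, then nudge the second modulator's control values against the residual. The per-pixel proportional update and the step size are placeholders, not the patented procedure.

```python
import numpy as np

def calibration_difference(img_first_off: np.ndarray,
                           img_first_on: np.ndarray) -> np.ndarray:
    """Difference between the two captures that nominally produce the same
    display level; a nonzero result flags second-modulator drive error."""
    return img_first_on.astype(float) - img_first_off.astype(float)

def adjust_control_signals(control: np.ndarray, diff: np.ndarray,
                           step: float = 0.1) -> np.ndarray:
    """Nudge the second modulator's control values against the measured
    difference (placeholder update rule)."""
    return control - step * diff
```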
The present disclosure relates to a method and system for processing audio for source separation. The method comprises obtaining an input audio signal (A) comprising at least two channels and processing the input audio signal (A) with a spatial cue based separation module (10) to obtain an intermediate audio signal (B). The spatial cue based separation module (10) is configured to determine a mixing parameter of the at least two channels of the input audio signal (A) and modify the channels, based on the mixing parameter, to obtain the intermediate audio signal (B). The method further comprises processing the intermediate audio signal (B) with a source cue based separation module (20) to generate an output audio signal (C), wherein the source cue based separation module (20) is configured to implement a neural network trained to predict a noise reduced output audio signal (C) given the intermediate audio signal (B).
G10L 21/0232 - Processing in the frequency domain
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
65.
METHOD AND AUDIO PROCESSING SYSTEM FOR WIND NOISE SUPPRESSION
The present disclosure relates to a method and system (1) for suppressing wind noise. The method comprises obtaining an input audio signal (100, 100′) comprising a plurality of consecutive audio signal segments (101, 102, 103, 101′, 102′, 103′) and suppressing wind noise in the input audio signal with a wind noise suppressor module (20) to generate a wind noise reduced audio signal. The method further comprises using a neural network (10) trained to predict a set of gains for reducing noise in the input audio signal (100, 100′) given samples of the input audio signal (100, 100′), wherein a noise reduced audio signal is formed by applying said set of gains to the input audio signal (100, 100′), and mixing the wind noise reduced audio signal and the noise reduced audio signal with a mixer (30) to obtain an output audio signal with suppressed wind noise.
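The final mixing stage above can be sketched in a few lines: apply the predicted gains to form the noise-reduced signal, then crossfade it with the wind-noise-suppressed signal. The confidence-weighted crossfade policy is an assumption; the abstract does not specify the mixer's rule.

```python
import numpy as np

def apply_gains(stft_mag: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Apply the predicted per-band gains to STFT magnitudes, forming the
    noise reduced signal (magnitude domain)."""
    return stft_mag * gains

def mix_outputs(wind_reduced: np.ndarray, noise_reduced: np.ndarray,
                wind_confidence: float) -> np.ndarray:
    """Crossfade the wind-noise-suppressed and noise-reduced signals."""
    w = float(np.clip(wind_confidence, 0.0, 1.0))
    return w * wind_reduced + (1.0 - w) * noise_reduced
```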
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G06F 3/14 - Digital output to a display device
G09G 5/02 - Arrangements or circuits for control of the display common to cathode-ray tube indicators and other visual indicators, characterised by the way in which colour is displayed
H04N 9/69 - Circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving reformatting operations of video signals for household redistribution, storage or real-time display
41 - Education, entertainment, sporting and cultural activities
42 - Scientific, technological and industrial services, research and design
09 - Scientific and electrical apparatus and instruments
Goods and services
Advertising services; Advertising services, namely, promoting the goods and services of others; Advertising services, namely, server guided ad insertion into broadcast and multicast networks; Online marketing and advertising services, namely, providing personalized advertising, content recommendations and product offerings; Arranging and conducting online auctions; Arranging and conducting auctions, namely, live auctions
Communication services, namely, digital and electronic transmission of voice, data, sound, music, graphics, images, audio, video, information, and messages; Video on demand (VOD) transmission services; Digital media streaming services; Transmission of network and satellite television programming; Teleconferencing and video conferencing services; Communications services, namely, interactive streaming and broadcasting services; Data streaming; Video, audio and video streaming in the fields of entertainment, films, sports, gaming and music via global and local computer networks; Streaming of audio and video via the Internet featuring music, movies, news, and sports; Streaming audio and video materials about music, movies, television shows, music videos, and news accessible on websites via global and local computer networks; Broadcasting of film and television features via the Internet and social media platforms; Broadcasting of television news programs; Providing streaming audio and video such as music, movies, television shows, music videos, news and sports webcasts via a website and social media platforms; Streaming services via the internet and social media platforms of podcasts; Providing access to a platform for real-time multimedia communications via a website on the internet and social media platforms; Broadcasting and streaming of audio and video recordings of live events via a website and social media platforms
Visual effect reproduction for videos, DVDs, television and for internet websites, and 3D sound recording and projection; Providing a website that provides information, audio, and video in the field of sports; Providing an Internet website portal featuring entertainment news and information specifically in the fields of music, sports and gambling; Entertainment services, namely, providing online casino-style games and games of chance; Entertainment services, namely, providing online slot machine-style games; Entertainment services, namely, online casino-style gaming; Gaming services in the nature of casino gambling; Entertainment services, namely, providing games of chance via the Internet; Betting, gambling, igaming in the nature of online sports betting and wagering services; Providing online betting, gambling, igaming in the nature of online sports betting and wagering services via the internet; Providing a website featuring online betting, gambling, igaming in the nature of online sports betting and wagering services; Entertainment services in the nature of sports betting, and igaming in the nature of online sports betting via the internet
Computer services, namely, providing an interactive web site featuring technology that allows users to access, consolidate and manage accounts and connections to application programming interfaces; Software as a service (SAAS) services featuring software for developing, building, and operating applications that are used to collect, publish, manage, and transform video, sound, text, visual information, and graphic works; Software as a service (SAAS) services, namely, hosting software for use by others for use for developing, building, and operating applications that are used to collect, publish, manage, and transform video, sound, text, visual information, and graphic works; Platform as a service (PAAS) featuring computer software platforms for developing, building, and operating applications that are used to collect, publish, manage, and transform video, sound, text, visual information, and graphic works; Providing temporary use of non-downloadable software for hosting and conducting live auctions for the sale of products and services via a global computer network
Downloadable and recorded software for the collection, managing, editing, organizing, modifying, transmission, sharing, and storage of data and information; Downloadable and recorded software for processing images, graphics, audio, video, and text; Digital media streaming devices; Apparatus for recording or transmission of images, sound or data; Wireless data capture and communications apparatus for transmission of data, images and sound; Interactive data transmission apparatus; Downloadable and recorded video display software
68.
METHOD AND DEVICE FOR DERIVING INTER-VIEW MOTION MERGING CANDIDATE
The present invention provides a method and a device for deriving an inter-view motion merging candidate. A method for deriving an inter-view motion merging candidate, according to an embodiment of the present invention, can comprise the steps of: determining, on the basis of encoding information of an inter-view reference block derived by means of a disparity vector of the current block, whether or not inter-view motion merging of the current block is possible; and, if inter-view motion merging of the current block is not possible, generating an inter-view motion merging candidate of the current block by using encoding information of an adjacent block that is spatially adjacent to the inter-view reference block.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of the position and number of pixels used for prediction
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or accuracy
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic part of the video signal being the object or subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G06F 3/14 - Digital output to a display device
G09G 5/02 - Arrangements or circuits for control of the display common to cathode-ray tube indicators and other visual indicators, characterised by the way in which colour is displayed
H04N 9/69 - Circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving reformatting operations of video signals for household redistribution, storage or real-time display
70.
METHODS AND APPARATUS FOR DECODING ENCODED HOA SIGNALS
There are two representations for Higher Order Ambisonics, denoted HOA: the spatial domain and the coefficient domain. The invention generates from a coefficient domain representation a mixed spatial/coefficient domain representation, wherein the number of said HOA signals can be variable. An aspect of the invention further relates to methods and apparatus for decoding multiplexed and perceptually encoded HOA signals, including transforming a vector of PCM encoded spatial domain signals of the HOA representation to a corresponding vector of coefficient domain signals by multiplying the vector of PCM encoded spatial domain signals with a transform matrix, and de-normalizing the resulting vector of PCM encoded and normalized coefficient domain signals. The methods may include combining a vector of coefficient domain signals and the vector of de-normalized coefficient domain signals to determine a combined vector of HOA coefficient domain signals that can have a variable number of HOA coefficients.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
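The core signal-path operations named in the abstract above (matrix transform, de-normalisation, combination) reduce to a few NumPy lines. Signals are shaped (channels, samples); the per-coefficient scaling scheme and the additive combine are assumptions for illustration.

```python
import numpy as np

def spatial_to_coefficient(spatial: np.ndarray,
                           transform: np.ndarray) -> np.ndarray:
    """Multiply the vector of spatial-domain HOA signals (channels, samples)
    by the transform matrix to reach the coefficient domain."""
    return transform @ spatial

def denormalize(coeffs: np.ndarray, norm_factors: np.ndarray) -> np.ndarray:
    """Undo the per-coefficient normalisation applied before coding
    (assumed here to be a simple per-row scale)."""
    return coeffs * norm_factors[:, None]

def combine(direct_coeffs: np.ndarray,
            denorm_coeffs: np.ndarray) -> np.ndarray:
    """Combine both coefficient-domain contributions into one HOA vector."""
    return direct_coeffs + denorm_coeffs
```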
71.
Scalable systems for controlling color management comprising varying levels of metadata
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G09G 5/02 - Arrangements or circuits for control of the display common to cathode-ray tube indicators and other visual indicators, characterised by the way in which colour is displayed
H04N 9/69 - Circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving reformatting operations of video signals for household redistribution, storage or real-time display
G06F 3/14 - Digital output to a display device
72.
SCALABLE SYSTEMS FOR CONTROLLING COLOR MANAGEMENT COMPRISING VARYING LEVELS OF METADATA
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G06F 3/14 - Digital output to a display device
G09G 5/02 - Arrangements or circuits for control of the display common to cathode-ray tube indicators and other visual indicators, characterised by the way in which colour is displayed
H04N 9/69 - Circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving reformatting operations of video signals for household redistribution, storage or real-time display
73.
MULTICHANNEL AND MULTI-STREAM SOURCE SEPARATION VIA MULTI-PAIR PROCESSING
A method and system for separating a target audio source from a multi-channel audio input including N audio signals, N>=3. The N audio signals are combined into at least two unique signal pairs, and pairwise source separation is performed on each signal pair to generate at least two processed signal pairs, each processed signal pair including source separated versions of the audio signals in the signal pair. The at least two processed signal pairs are combined to form the target audio source having N target audio signals corresponding to the N audio signals.
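A sketch of the pair-and-combine structure described above: cover the N channels with unique pairs, run a 2-channel separator on each pair, and average the per-channel outputs back into N target signals. The ring pairing (i, i+1 mod N), the averaging combine, and the `separator` callable are stand-ins for the disclosed components.

```python
import numpy as np

def multichannel_separate(channels: np.ndarray, separator) -> np.ndarray:
    """Pairwise source separation over unique channel pairs, recombined by
    averaging. channels has shape (N, samples); `separator` maps a
    (2, samples) pair to its source-separated (2, samples) version."""
    n = channels.shape[0]
    acc = np.zeros_like(channels, dtype=float)
    cnt = np.zeros(n)
    for i in range(n):
        j = (i + 1) % n                      # assumed ring pairing
        out = separator(np.stack([channels[i], channels[j]]))
        acc[i] += out[0]
        acc[j] += out[1]
        cnt[i] += 1
        cnt[j] += 1
    return acc / cnt[:, None]

# usage with a trivial pass-through "separator":
# target = multichannel_separate(mix, lambda pair: pair)
```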
Approaches for generating metadata for content to be composited and rendered are described. These approaches can be used with the development and distribution of one or more web pages or other graphical user interfaces. For example, a web page developer can collect content to be composited together into a web page and invoke a set of APIs to generate the metadata for the content of the web page that will be composited; a metadata generation system receives the calls through the API and generates the metadata. The web page can then be distributed with the generated metadata which can be used to create the display of the web page with content that is perceptually modified based on the metadata about the individual elements on the web page and their spatial proximity.
The present disclosure relates to a method and audio processing arrangement for extracting a target mid (and optionally a target side) audio signal from a stereo audio signal. The method comprises obtaining (S1) a plurality of consecutive time segments of the stereo audio signal and obtaining (S2), for each of a plurality of frequency bands of each time segment of the stereo audio signal, at least one of a target panning parameter (Θ) and a target phase difference parameter (Φ). The method further comprises extracting (S3), for each time segment and each frequency band, a partial mid signal representation (211, 212) based on at least one of the target panning parameter (Θ) and the target phase difference parameter (Φ) of each frequency band and forming (S4) the target mid audio signal (M) by combining the partial mid signal representations (211, 212) for each frequency band and time segment.
Optical filters for projection assemblies. One optical filter includes a transmissive portion configured to transmit modulated light toward a downstream optical element, and a reflective portion. The reflective portion is configured to receive unmodulated light from a modulator at a first angle and reflect the unmodulated light toward a light dump at a second angle, wherein an angle difference between the first angle and the second angle is between 90° and 180°. The optical filter is disposed at a Fourier plane of the modulated light.
G02B 26/08 - Optical devices or arrangements for the control of light using movable or deformable optical elements for controlling the direction of light
G02B 5/00 - Optical elements other than lenses
G02B 26/06 - Optical devices or arrangements for the control of light using movable or deformable optical elements for controlling the phase of light
Techniques for generating audio streams for immersive audio content are provided. In some embodiments, the techniques involve obtaining a first set of audio channel recordings from a first audio capture device and a second set of audio channel recordings from a second audio capture device. The techniques may involve performing beamforming using the first set of audio channel recordings to generate a spatially-processed set of audio channel recordings. The techniques may involve performing spatial matching and timbre matching on the spatially-processed set of audio channel recordings using information associated with the second set of audio channel recordings to generate a matched set of audio channel recordings. The techniques may involve combining the matched set of audio channel recordings with the second set of audio channel recordings to generate a perceptually-matched audio stream.
In one example, a method of generating binaural audio includes transforming input audio signals representing an audio scene into corresponding decorrelated audio objects and applying a head related transform filter (HRTF) to each of the decorrelated audio objects to generate respective left and right audio components. The method also includes summing the left audio components and summing the right audio components to generate left and right output binaural signals, respectively. In some examples, the decorrelation of audio objects is performed using an all-pass filter serially connected with a bank of comb filters whose delay parameters are selected based on different respective prime numbers. In some examples, different HRTFs are applied to low- and high-frequency components. In some examples, the HRTFs are normalized with respect to an HRTF corresponding to a reference head-rotation angle.
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from the monophonic signal by phase shifting, time delay or reverberation
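The decorrelator described in the preceding abstract, an all-pass stage in series with comb filters whose delays are distinct primes, can be sketched directly. The particular primes, the all-pass coefficient, and the comb feedback below are placeholders.

```python
import numpy as np

PRIME_DELAYS = [331, 337, 347, 353]   # distinct primes -> non-coinciding echoes

def allpass(x: np.ndarray, a: float = 0.7) -> np.ndarray:
    """First-order all-pass: y[n] = -a*x[n] + x[n-1] + a*y[n-1]."""
    y = np.zeros(len(x))
    x_prev = y_prev = 0.0
    for n in range(len(x)):
        y[n] = -a * x[n] + x_prev + a * y_prev
        x_prev, y_prev = x[n], y[n]
    return y

def comb(x: np.ndarray, delay: int, feedback: float = 0.5) -> np.ndarray:
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    y = x.astype(float).copy()
    for n in range(delay, len(y)):
        y[n] += feedback * y[n - delay]
    return y

def decorrelate(x: np.ndarray, obj_index: int) -> np.ndarray:
    """All-pass stage in series with a comb whose delay is a prime
    selected per audio object."""
    return comb(allpass(x), PRIME_DELAYS[obj_index % len(PRIME_DELAYS)])
```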
79.
METHODS, APPARATUS AND SYSTEMS FOR OPTIMIZING COMMUNICATION BETWEEN SENDER(S) AND RECEIVER(S) IN COMPUTER-MEDIATED REALITY APPLICATIONS
The present invention is directed to systems, methods and apparatus for processing media content for reproduction by a first apparatus. The method includes obtaining pose information indicative of a position and/or orientation of a user. The pose information is transmitted to a second apparatus that provides the media content. The media content is rendered based on the pose information to obtain rendered media content. The rendered media content is transmitted to the first apparatus for reproduction. The present invention may include a first apparatus for reproducing media content and a second apparatus storing the media content. The first apparatus is configured to obtain pose information indicative of a position and/or orientation of the user and transmit the pose information to the second apparatus; and the second apparatus is adapted to: render the media content based on the pose information to obtain rendered media content; and transmit the rendered media content to the first apparatus for reproduction.
A63F 13/428 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment, by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle, involving motion or position input signals, e.g. signals representing the rotation of an input controller or a player's arm motions detected by accelerometers or gyroscopes
A63F 13/213 - Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
H04L 67/131 - Protocols for games, networked simulations or virtual reality
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
80.
SIGNAL RESHAPING AND CODING FOR HDR AND WIDE COLOR GAMUT SIGNALS
In a method to improve the coding efficiency of high-dynamic range (HDR) images, a decoder parses sequence parameter set (SPS) data from an input coded bitstream to detect that an HDR extension syntax structure is present in the parsed SPS data. It extracts from the HDR extension syntax structure post-processing information that includes one or more of a color space enabled flag, a color enhancement enabled flag, an adaptive reshaping enabled flag, a dynamic range conversion flag, a color correction enabled flag, or an SDR viewable flag. It decodes the input bitstream to generate a preliminary output decoded signal, and generates a second output signal based on the preliminary output signal and the post-processing information.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
G06T 5/90 - Dynamic range modification of images or parts thereof
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic part of the video signal being the object or subject of the adaptive coding, the unit being a colour or a chrominance component
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
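The flag set extracted by the decoder in this entry can be modelled as a tiny parser. The field names mirror the abstract, and the six-single-bit layout is an assumed payload format, not the actual bitstream syntax.

```python
from dataclasses import dataclass

@dataclass
class HdrExtensionFlags:
    """Post-processing flags named in the abstract; field order and the
    one-bit-each layout are assumptions, not a published syntax."""
    color_space_enabled: bool
    color_enhancement_enabled: bool
    adaptive_reshaping_enabled: bool
    dynamic_range_conversion: bool
    color_correction_enabled: bool
    sdr_viewable: bool

def parse_hdr_extension(bits):
    """Read six one-bit flags, in order, from the extension payload."""
    if len(bits) < 6:
        raise ValueError("truncated HDR extension payload")
    return HdrExtensionFlags(*(b == 1 for b in bits[:6]))

# e.g. parse_hdr_extension([1, 0, 1, 0, 0, 1])
```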
81.
CODED SPEECH ENHANCEMENT BASED ON DEEP GENERATIVE MODEL
A system for generating enhanced speech data using robust audio features is disclosed. In some embodiments, a system is programmed to use a self-supervised deep learning model to generate a set of feature vectors from given audio data that contains contaminated speech and is coded. The system is further programmed to use a generative deep learning model to create improved audio data corresponding to clean speech from the set of feature vectors.
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
Some disclosed methods involve: receiving multi-channel audio data including unlabeled multi-channel audio data; extracting audio feature data from the unlabeled multi-channel audio data; applying a spatial masking process to a portion of the audio feature data; applying a contextual encoding process to the masked audio feature data, to produce predicted spatial embeddings in a latent space; obtaining reference spatial embeddings in the latent space; determining a loss function gradient based, at least in part, on a variance between the predicted spatial embeddings and the reference spatial embeddings; and updating the contextual encoding process according to the loss function gradient until one or more convergence metrics are attained.
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
83.
Neutral color preservation for single-layer backward compatible codec
Novel methods and systems for processing a single-layer backward compatible codec with multiple-channel multiple regression coefficients either provided in or pointed to in metadata such that the coefficients have been biased to prevent a shift in neutral colors. Pseudo neutral color patches are used along with a saturation weighting factor to bias the coefficients.
G06T 7/90 - Determination of colour characteristics
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic part of the video signal being the object or subject of the adaptive coding, the unit being a colour or a chrominance component
H04N 19/98 - Adaptive dynamic range coding [ADRC]
84.
SYSTEM FOR MAINTAINING REVERSIBLE DYNAMIC RANGE CONTROL INFORMATION ASSOCIATED WITH PARAMETRIC AUDIO CODERS
On the basis of a bitstream (P), an n-channel audio signal (X) is reconstructed by deriving an m-channel core signal (Y) and multichannel coding parameters (α) from the bitstream, where 1 ≤ m < n.
G10L 19/24 - Variable-rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
E21B 21/00 - Methods or apparatus for flushing boreholes, e.g. by use of exhaust air from the motor
E21B 33/138 - Plastering the borehole wall; Injecting into the formation
E21B 41/00 - Equipment or details not covered by other groups
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Volume leveling of an audio signal using a volume leveling control signal. The method comprises determining a noise reliability ratio w(n) as the ratio of noise-like frames over all frames in a current time segment, determining a PGC noise confidence score XPGN(n) indicating a likelihood that professionally generated content (PGC) noise is present in the time segment, and determining, for the time segment, whether the noise reliability ratio is above a predetermined threshold. When the noise reliability ratio is above the predetermined threshold, the volume leveling control signal is updated based on the PGC noise confidence score; when the noise reliability ratio is below the predetermined threshold, the volume leveling control signal is left unchanged. Volume leveling is improved by preventing boosting of, e.g., phone-recorded environmental noise in user-generated content (UGC), while keeping the original behavior for other types of content.
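The gating logic above reads almost directly as code: compute the noise reliability ratio w(n) over the segment, compare against the threshold, and only then update the control signal from the PGC noise confidence. How the confidence feeds the update is not specified, so the multiplicative rule below is a placeholder.

```python
def update_leveling_control(prev_control, frame_is_noise, pgc_confidence,
                            threshold=0.5):
    """w(n) is the share of noise-like frames in the segment; the control
    signal is updated from the PGC noise confidence only when w(n) exceeds
    the threshold, and held otherwise."""
    w = sum(frame_is_noise) / max(len(frame_is_noise), 1)
    if w > threshold:
        return prev_control * pgc_confidence   # placeholder update rule
    return prev_control                        # leave unchanged
```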
Systems and methods for dynamic video timing in display devices. A display control system includes an active matrix, a column driver, a row driver, and a controller. The active matrix includes a plurality of pixels forming a plurality of rows and a plurality of columns. The column driver is configured to control the plurality of columns of pixels. The row driver is configured to control the plurality of rows of pixels. The controller is configured to receive a plurality of scanlines forming a video frame, receive an indication of a scanline order, reorder the plurality of scanlines according to the scanline order, and control the row driver according to the reordered plurality of scanlines. The scanline order is indicative of an order in which the plurality of rows of pixels are controlled.
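The reorder step in the preceding abstract is a one-liner on a frame held as a row-major array; the centre-out order in the usage comment is just one example of a scanline order a panel might require.

```python
import numpy as np

def reorder_scanlines(frame: np.ndarray, scanline_order) -> np.ndarray:
    """Rearrange received scanlines into the order in which the row driver
    should address the panel's rows."""
    return frame[np.asarray(scanline_order)]

# e.g. rows arriving top-to-bottom but driven centre-out for a 5-row panel:
# reorder_scanlines(frame, [2, 1, 3, 0, 4])
```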
A dual-modulation laser projection system includes (a) a polarizing beamsplitter for splitting laser light into first and second polarized beams having mutually orthogonal polarizations, (b) a phase spatial light modulator (SLM) for beam steering the second polarized beam, (c) a mechanical amplitude SLM for amplitude modulating a combination of the first polarized beam and the second polarized beam as beam steered by the phase SLM, and (d) a filter for removing, from the amplitude modulated combination of the first and second polarized beams, one or more of a plurality of diffraction orders introduced by the mechanical amplitude SLM, to generate filtered, modulated output light.
Systems and methods for generating downsampled texture maps. One example method includes training a neural network by providing a three-dimensional model including an original texture map to the neural network, downsampling the original texture map, and iteratively differentiably rendering the original and downsampled texture maps and using differences in renderings of the original and downsampled texture maps as feedback for training the neural network. After reaching a training completion condition, the trained neural network provides a downsampled texture map that has a lower resolution than the original texture map.
Systems and methods are described for controlling adaptive filtering components. A far end signal may be filtered by each of a foreground filter and a background filter, where both filters are adaptive echo cancellation filters that output echo estimations. Control logic may halt adaptation by the background filter based on a deviation signal. To determine the deviation signal, cross-correlation coefficients for the echo estimations produced by both filters may be determined for each frequency bin of the far end signal. The determined cross-correlation coefficients may be summed across a plurality of the frequency bins. A hysteresis function may then be applied to the sum associated with the filter whose echo estimation is used to generate the filtered result. The deviation signal may be activated in response to the hysteresis function outputting a high value, which the control logic uses to turn off adaptation by the background filter.
H04M 9/08 - Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
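A sketch of the control path in the preceding abstract: per-bin normalised cross-correlation between the two echo estimates, summed across bins, then a two-threshold hysteresis whose high state activates the deviation signal and halts background adaptation. The thresholds and the assumed bin count are illustrative.

```python
import numpy as np

def summed_crosscorr(fg_echo: np.ndarray, bg_echo: np.ndarray) -> float:
    """Normalised cross-correlation coefficient per frequency bin between
    the two echo estimates (complex STFT vectors), summed across bins."""
    per_bin = np.real(fg_echo * np.conj(bg_echo)) / (
        np.abs(fg_echo) * np.abs(bg_echo) + 1e-12)
    return float(np.sum(per_bin))

class Hysteresis:
    """Two-threshold comparator: goes high above `high`, stays high until
    the value falls below `low`."""
    def __init__(self, low: float, high: float):
        self.low, self.high, self.state = low, high, False
    def __call__(self, value: float) -> bool:
        if value >= self.high:
            self.state = True
        elif value <= self.low:
            self.state = False
        return self.state

NUM_BINS = 257                                    # assumed STFT size
deviation = Hysteresis(low=0.3 * NUM_BINS, high=0.7 * NUM_BINS)

def background_should_adapt(fg_echo, bg_echo) -> bool:
    """Deviation active (hysteresis high) -> halt background adaptation."""
    return not deviation(summed_crosscorr(fg_echo, bg_echo))
```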
90.
IMAGE ENCODING AND DECODING APPARATUS, AND IMAGE ENCODING AND DECODING METHOD USING CONTOUR MODE BASED INTRA PREDICTION
According to the present invention, an adaptive scheme is applied to an image encoding apparatus that includes an inter-predictor, an intra-predictor, a transformer, a quantizer, an inverse quantizer, and an inverse transformer, wherein input images are classified into two or more different categories, and two or more modules from among the inter-predictor, the intra-predictor, the transformer, the quantizer, and the inverse quantizer are implemented to perform respective operations in different schemes according to the category to which an input image belongs. Thus, the invention has the advantage of efficiently encoding an image without the loss of important information as compared to a conventional image encoding apparatus which adopts a packaged scheme.
H04L 45/745 - Address table lookup; Address filtering
H04N 19/109 - Selection of the coding mode or the prediction mode among a plurality of temporal predictive coding modes
H04N 19/11 - Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/12 - Selection from a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
H04N 19/132 - Sampling, masking or truncating of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or masking of high-frequency transform coefficients
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an area of the image, e.g. an object
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an area of the image, e.g. an object, the area being a block, e.g. a macroblock
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for the encoding of multi-view video sequences
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding combined with predictive coding
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop
91.
METHODS, APPARATUS AND SYSTEMS FOR POSITION-BASED GAIN ADJUSTMENT OF OBJECT-BASED AUDIO
The positions of a plurality of speakers at a media consumption site are determined. Audio information in an object-based format is received. A gain adjustment value for a sound content portion in the object-based format may be determined based on the position of the sound content portion and the positions of the plurality of speakers. Audio information in a ring-based channel format is received. A gain adjustment value for each ring-based channel in a set of ring-based channels may be determined based on the ring to which the ring-based channel belongs and the positions of the speakers at the media consumption site.
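A minimal sketch of the object-based case, assuming a simple inverse-distance gain law (the abstract does not specify one), is:

import numpy as np

def object_gains(object_pos: np.ndarray, speaker_positions: np.ndarray) -> np.ndarray:
    """Per-speaker gain adjustment values for one sound content portion.
    object_pos: shape (3,); speaker_positions: shape (S, 3)."""
    dists = np.linalg.norm(speaker_positions - object_pos, axis=1)
    weights = 1.0 / (dists + 1e-6)                   # closer speakers get larger gains
    return weights / np.sqrt(np.sum(weights ** 2))   # power-normalized gains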
Methods and apparatus for representing multimedia signals using neural field networks. According to an example embodiment, a method of jointly representing an audio signal and a video signal using a neural field includes applying positional encoding to a time stamp corresponding to a video frame of the video signal and further corresponding to an associated segment of the audio signal to generate a respective higher-dimensional embedding. The method further includes reconstructing the video frame of the video signal and generating a corresponding set of audio samples representing the associated segment of the audio signal by applying the higher-dimensional embedding to a neural-field network trained to represent a sequence of video frames of the video signal and an associated plurality of segments of the audio signal.
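A minimal sketch of the positional-encoding step described above: a scalar time stamp is lifted to a higher-dimensional embedding before being applied to the neural-field network. The number of frequency bands is an assumption:

import numpy as np

def positional_encoding(t: float, num_bands: int = 8) -> np.ndarray:
    """Map a time stamp to a 2*num_bands-dimensional Fourier embedding."""
    freqs = 2.0 ** np.arange(num_bands) * np.pi
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

# The embedding is then fed to the trained neural field, which jointly outputs
# the video frame and the associated audio-segment samples for that time stamp.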
H04N 19/46 - Inclusion d’information supplémentaire dans le signal vidéo pendant le processus de compression
H04N 19/68 - Procédés ou dispositions pour le codage, le décodage, la compression ou la décompression de signaux vidéo numériques utilisant la tolérance aux erreurs mettant en œuvre l’insertion de marqueurs de resynchronisation dans le train de bits
H04N 19/85 - Procédés ou dispositions pour le codage, le décodage, la compression ou la décompression de signaux vidéo numériques utilisant le pré-traitement ou le post-traitement spécialement adaptés pour la compression vidéo
H04N 21/8547 - Création de contenu impliquant des marquages temporels pour synchroniser le contenu
Methods, systems, and media for determining sound field rotations are provided. In some embodiments, a method for determining sound field rotations involves determining an activity situation of a user. The method may involve determining a user head orientation using at least one of one or more sensors. The method may involve determining a direction of interest based on the activity situation and the user head orientation. The method may involve determining a rotation of a sound field used to present audio objects via headphones based on the direction of interest.
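A minimal sketch, assuming orientations are expressed as yaw angles in radians (the abstract does not specify a representation):

import math

def sound_field_rotation(direction_of_interest: float, head_yaw: float) -> float:
    """Yaw rotation to apply to the sound field so that audio objects stay
    anchored to the direction of interest as the head turns."""
    diff = direction_of_interest - head_yaw
    return math.atan2(math.sin(diff), math.cos(diff))  # wrapped to (-pi, pi]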
A multi-view input image covering multiple sampled views is received. A multi-view layered image stack is generated from the multi-view input image. A target view of a viewer into an image space depicted by the multi-view input image is determined based on user pose data. The target view is used to select user pose selected sampled views from among the multiple sampled views. Layered images for the user pose selected sampled views, along with alpha maps and beta scale maps for the user pose selected sampled views, are encoded into a video signal to cause a recipient device of the video signal to generate a display image for rendering on an image display.
Diffuse or spatially large audio objects may be identified for special processing. A decorrelation process may be performed on audio signals corresponding to the large audio objects to produce decorrelated large audio object audio signals. These decorrelated large audio object audio signals may be associated with object locations, which may be stationary or time-varying locations. For example, the decorrelated large audio object audio signals may be rendered to virtual or actual speaker locations. The output of such a rendering process may be input to a scene simplification process. The decorrelation, associating and/or scene simplification processes may be performed prior to a process of encoding the audio data.
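A minimal sketch of producing decorrelated copies of a large audio object's signal, using spectral phase randomization as a stand-in for the (unspecified) decorrelation process in the abstract:

import numpy as np

def decorrelate(signal: np.ndarray, num_outputs: int, seed: int = 0) -> np.ndarray:
    """Return `num_outputs` mutually decorrelated versions of `signal`
    by randomizing spectral phase while preserving magnitude."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(signal)
    outs = []
    for _ in range(num_outputs):
        phase = rng.uniform(-np.pi, np.pi, size=spectrum.shape)
        outs.append(np.fft.irfft(np.abs(spectrum) * np.exp(1j * phase),
                                 n=len(signal)))
    return np.stack(outs)  # each row may then be rendered to a speaker location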
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/008 - Multi-channel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 19/20 - Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
97.
EFFICIENT ORIENTATION TRACKING WITH FUTURE ORIENTATION PREDICTION
The present disclosure relates to a method and system for predicting a future orientation of an orientation tracker (100). The method comprises obtaining a sequence of angular velocity samples, each angular velocity sample indicating an angular velocity at a point in time, and obtaining a sequence of angular acceleration samples, each angular acceleration sample indicating an acceleration or deceleration of the angular velocity at each point in time. The method further comprises determining (S5a), for each point in time where the angular velocity is accelerating, a predicted orientation of the orientation tracker (100) based on a first order prediction of an accumulated rotation of the orientation tracker (100), and determining (S5c), for each point in time where the angular velocity is decelerating, a predicted orientation of the orientation tracker (100) based on a second order prediction of the accumulated rotation of the orientation tracker (100).
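A minimal sketch of this branching prediction rule, reduced to one dimension for clarity (the real tracker operates on three-dimensional rotations; the prediction horizon dt and the scalar state are illustrative assumptions):

def predict_rotation(angle: float, omega: float, alpha: float, dt: float) -> float:
    """Predict the accumulated rotation at time t + dt.
    omega: angular velocity sample; alpha: angular acceleration sample."""
    if alpha > 0.0:
        # Angular velocity is accelerating: first order prediction.
        return angle + omega * dt
    else:
        # Angular velocity is decelerating: second order prediction.
        return angle + omega * dt + 0.5 * alpha * dt * dt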
Systems and methods for generating color correction matrices for converting raw red-green-blue (RGB) signals to a standard color space. One example system includes an image sensor and an electronic processor. The image sensor is configured to capture a scene and generate a raw image, the raw image including a raw RGB signal. The electronic processor is configured to receive the raw RGB signal, determine white balance coefficient ratio values of the raw image, provide the white balance coefficient ratio values to a neural network, and receive color correction matrix values from the neural network. The color correction matrix values are based on the white balance coefficient ratio values. The electronic processor is configured to generate a color correction matrix using the color correction matrix values and apply the color correction matrix to the raw RGB signal to generate a corrected RGB signal.
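A minimal sketch of the processing chain, with the neural network replaced by a stub; the ratio definition (R/G, B/G) and the matrix application convention are assumptions:

import numpy as np

def white_balance_ratios(wb_gains: np.ndarray) -> np.ndarray:
    """wb_gains = (gain_R, gain_G, gain_B); return (R/G, B/G) ratio values."""
    return np.array([wb_gains[0] / wb_gains[1], wb_gains[2] / wb_gains[1]])

def predict_ccm(ratios: np.ndarray) -> np.ndarray:
    # Stand-in for the neural network that maps ratios -> nine CCM values.
    return np.eye(3)  # identity placeholder

def apply_ccm(raw_rgb: np.ndarray, ccm: np.ndarray) -> np.ndarray:
    """raw_rgb: (..., 3) raw signal; returns the corrected RGB signal."""
    return raw_rgb @ ccm.T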
H04N 23/88 - Camera processing pipelines; Components thereof for processing colour signals for colour balance, e.g. circuits for white balance or colour temperature control
99.
METHODS AND DEVICES FOR RENDERING AN AMBISONICS AUDIO SIGNAL
The present document describes a method (400) for rendering an ambisonics signal using a loudspeaker arrangement comprising S loudspeakers. The method (400) comprises converting (401) a set of N ambisonics channel signals (111) into a set of unfiltered pre-rendered signals (211), with N>1 and S>1. Furthermore, the method (400) comprises performing (402) near field compensation, referred to as NFC, filtering of M unfiltered pre-rendered signals (211) of the set of unfiltered pre-rendered signals (211) to provide a set of S filtered loudspeaker channel signals (114) for rendering using the corresponding S loudspeakers.
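A minimal sketch of the two-stage rendering: the N ambisonics channel signals are first matrixed into unfiltered pre-rendered signals, then NFC filtering produces the S loudspeaker feeds. The decode matrix, the assumption M = S, and the single shared IIR filter standing in for the NFC filters are all illustrative:

import numpy as np
from scipy.signal import lfilter

def render_ambisonics(ambi: np.ndarray, decode: np.ndarray,
                      nfc_b: np.ndarray, nfc_a: np.ndarray) -> np.ndarray:
    """ambi: (N, samples); decode: (S, N); returns (S, samples)."""
    pre_rendered = decode @ ambi              # step (401): N -> S matrixing
    out = np.empty_like(pre_rendered)
    for s in range(pre_rendered.shape[0]):    # step (402): NFC filtering
        out[s] = lfilter(nfc_b, nfc_a, pre_rendered[s])
    return out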
Systems and methods for an entropy coding system are described. The entropy coding systems include an encoding apparatus and a decoding apparatus. The encoding apparatus is configured to receive an original input stream comprising a plurality of symbols having a known entropy characteristic according to a probability distribution of each of the symbols appearing in the original input stream, determine an input and a respective state for each symbol read from the original input stream, append the determined input to the encoded output stream, and provide the encoded output stream to the decoding apparatus. The decoding apparatus is configured to receive the encoded output stream, process the encoded output stream, and for each read input: determine an output symbol and a respective output state, persist the respective output state, and append the determined output symbol to the results output stream.
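The abstract reads like a state-based entropy coder. As a concrete, hedged illustration only (not the claimed scheme), here is a minimal range-variant asymmetric numeral system (rANS) codec for a fixed symbol distribution; the precision and renormalization constants are assumptions:

def build_tables(freqs):
    """freqs: dict symbol -> frequency; total must be a power of two."""
    total = sum(freqs.values())
    assert total & (total - 1) == 0, "total frequency must be a power of two"
    cum, c = {}, 0
    for s in sorted(freqs):
        cum[s] = c
        c += freqs[s]
    return freqs, cum, total

def rans_encode(symbols, freqs, cum, total):
    L = 1 << 16
    x, out = L, bytearray()
    for s in reversed(symbols):               # rANS encodes in reverse order
        f = freqs[s]
        while x >= ((L // total) << 8) * f:   # renormalize: emit low byte
            out.append(x & 0xFF)
            x >>= 8
        x = (x // f) * total + cum[s] + (x % f)
    return x, bytes(reversed(out))            # final state + byte stream

def rans_decode(x, data, n, freqs, cum, total):
    L, pos, out = 1 << 16, 0, []
    slot_to_sym = {}
    for s, c in cum.items():
        for slot in range(c, c + freqs[s]):
            slot_to_sym[slot] = s
    for _ in range(n):
        slot = x % total
        s = slot_to_sym[slot]
        x = freqs[s] * (x // total) + slot - cum[s]
        while x < L and pos < len(data):      # renormalize: read a byte back
            x = (x << 8) | data[pos]
            pos += 1
        out.append(s)
    return out

# Round trip:
#   freqs, cum, total = build_tables({"a": 3, "b": 1})
#   x, data = rans_encode(list("abba"), freqs, cum, total)
#   assert rans_decode(x, data, 4, freqs, cum, total) == list("abba")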
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
H03M 7/40 - Conversion to, or from, variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code