Disclosed are apparatuses, systems, and techniques for automatically generating commentary to videos that capture sporting activities, computer games, artistic events, political rallies, security-sensitive scenes, and/or any other actions. The techniques include processing a video segment that includes a plurality of video frames, to obtain a description of one or more objects pictured in the video segment and generating, using the obtained description, a prompt for a language model (LM). The techniques further include causing the LM to process the prompt to generate a commentary about an action performed by the one or more objects over a time interval associated with the plurality of video frames.
H04N 21/266 - Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme conversion, prosody generation or stress or intonation determination
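The prompt-generation step described in the abstract above can be sketched as follows; the template, function name, and input format are illustrative assumptions, not the disclosed implementation:

```python
def build_commentary_prompt(descriptions, start_s, end_s):
    """Assemble an LM prompt from object descriptions obtained over a
    time interval of video frames. The template is hypothetical."""
    lines = "\n".join(f"- {d}" for d in descriptions)
    return (
        f"Objects observed between t={start_s}s and t={end_s}s:\n"
        f"{lines}\n"
        "Write one sentence of live commentary describing the action."
    )
```

The returned string would then be passed to the language model, whose output serves as the generated commentary.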
Three-dimensional Gaussian splatting mechanisms that initialize a set of 3D Gaussian distributions, un-project pixels from two-dimensional (2D) planes to 3D space by applying queries to the 3D Gaussians at expected un-projected ray depth positions, and splat the 3D Gaussian distributions on the 2D planes based on the expected un-projected ray depth positions.
In various examples, systems and methods are disclosed relating to generating an output 3D latent representation by encoding, using a text encoder, a text prompt and encoding, using a 2D/3D encoder, a 2D image of an object or a 3D representation of the object. A 3D output is generated by applying the output 3D latent representation to a decoder. A reconstruction loss and an SDS loss are determined for the 3D output. At least one of the text encoder, the 2D/3D encoder, and the decoder is updated using the reconstruction loss and the SDS loss.
In various examples, metadata associated with one or more models may be captured during a training process and stored in association with the model(s). The metadata may then be used, in some embodiments, for enforcing execution of the model(s). For instance, the model(s) may be trained during at least a portion of the training process. During at least a second portion of the training process, one or more attributes associated with the model(s) may be determined. The attribute(s) may then be stored as metadata in association with the model(s). Additionally, in some embodiments, an endpoint may request to execute the model(s). Responsive to the request, and based at least on evaluating the metadata with respect to one or more criteria associated with the endpoint, a determination may be made regarding whether or not to provide the model(s) to the endpoint for execution.
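The metadata-based gating decision described above can be sketched with a simple equality-based policy; the function name, metadata keys, and criteria format are hypothetical assumptions, not the disclosed mechanism:

```python
def may_execute(metadata, criteria):
    """Decide whether to provide a model to an endpoint: every
    criterion the endpoint imposes must be satisfied by the metadata
    captured during training (illustrative policy)."""
    return all(metadata.get(key) == value for key, value in criteria.items())
```

Under this sketch, a model trained on licensed data would be released to an endpoint requiring that attribute, and withheld from one whose criteria the metadata does not meet.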
Various embodiments include techniques for migrating points of coherence (PoCs) in a cache hierarchy. The techniques comprise receiving, at a first cache memory and from a second cache memory, a memory access request associated with a first scope group, and, in response to determining a first directory within the first cache memory includes a first entry indicating (i) a third cache memory is a child of the first cache memory in a tree associated with the first scope group, and (ii) the first cache memory is a root of the tree associated with the first scope group: performing one or more operations to acquire a PoC token from a descendant cache memory of the first cache memory in the tree associated with the first scope group, and updating the first entry to indicate that the first cache memory is a PoC of the tree associated with the first scope group.
In various examples, systems and methods are disclosed relating to the implementation of parallel and distributed topological sorting. For example, a system can receive data associated with a request, the request associated with a plurality of dependencies. In an example, the plurality of dependencies can include a first dependency and a second dependency, and the system can determine that the first dependency of the plurality of dependencies is satisfied. In examples, the system can cause an indication that the first dependency is satisfied to be provided to a system associated with the second dependency.
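The dependency-satisfaction propagation described above is the core of topological sorting. A minimal single-process sketch using Kahn's algorithm follows; the disclosed techniques are parallel and distributed, which this sketch does not attempt to reproduce:

```python
from collections import deque

def topological_order(dependencies):
    """Kahn's algorithm: `dependencies` maps each task to the set of
    tasks it depends on. Returns an order in which every task appears
    after all of its dependencies."""
    # Unmet dependencies per task.
    pending = {task: set(deps) for task, deps in dependencies.items()}
    # Reverse edges: which tasks are waiting on each task.
    waiters = {task: [] for task in dependencies}
    for task, deps in dependencies.items():
        for dep in deps:
            waiters.setdefault(dep, []).append(task)
            pending.setdefault(dep, set())
    # Tasks with no unmet dependencies are ready immediately.
    ready = deque(t for t, deps in pending.items() if not deps)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Satisfying `task` may unblock tasks waiting on it; this is the
        # "indication that the first dependency is satisfied".
        for waiter in waiters.get(task, []):
            pending[waiter].discard(task)
            if not pending[waiter]:
                ready.append(waiter)
    if len(order) != len(pending):
        raise ValueError("dependency cycle detected")
    return order
```

In a distributed setting, the "indication" step would be a message to whichever system owns the dependent task rather than an in-memory queue append.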
Various embodiments include techniques for controlling temperature and fan speed in a computing system. Conventional computing systems present the user with a very limited set of three or four curated performance mode presets, which can impose substantial trade-offs in performance, acoustic noise, and/or case temperature that the user may find to be unacceptable. By contrast, the disclosed techniques allow the user to precisely position the operation of the computing system anywhere in the two-dimensional space of fan speed (which determines acoustic noise) versus case temperature that suits the preference of the user. The disclosed techniques further provide a closed-loop feedback control system for controlling the case temperature. This closed-loop feedback control system operates in conjunction with the adjustable case temperature target to determine individual power limits for certain components, such as a CPU power limit, a GPU power limit, and/or the like.
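The closed-loop feedback control described above can be illustrated with a single proportional-integral step that raises or lowers a component power limit toward the case-temperature target; the gains, clamp limits, and function name are assumptions for illustration, not values from the disclosure:

```python
def power_limit_step(temp_target, temp_measured, limit, kp=2.0, ki=0.1,
                     integral=0.0, limit_min=50.0, limit_max=300.0):
    """One iteration of a PI control loop: when the case runs hotter
    than the target, the power limit is lowered; when cooler, the
    limit may be raised. Constants are illustrative."""
    error = temp_target - temp_measured          # positive => thermal headroom
    integral += error
    new_limit = limit + kp * error + ki * integral
    new_limit = max(limit_min, min(limit_max, new_limit))  # clamp to safe range
    return new_limit, integral
```

A real controller would run separate instances of such a loop for the CPU and GPU power limits, alongside the user-selected point on the fan-speed versus case-temperature curve.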
Apparatuses, systems, and techniques to generate one or more neural networks. In at least one embodiment, a processor comprises one or more circuits to use one or more first neural networks to generate one or more second versions of one or more second neural networks based, at least in part, on one or more first versions of the one or more second neural networks and one or more hardware resources to be used to perform the one or more second versions of the one or more second neural networks.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
9.
PARTITIONING ELEMENTS USING SPATIAL INDEXING FOR LIGHT TRANSPORT SIMULATION IN MULTI-DIMENSIONAL SPACE
In various examples, based on spatial index values corresponding to first assignments of a first set of bins to spatial elements of a space, second assignments may be determined of a second set of bins to at least two spatial elements. The second assignments may at least partially overwrite corresponding assignments of the first assignments and may be used to determine partitions of the at least two spatial elements. The second assignments may be determined based on the first assignments indicating multiple spatial elements correspond to a same subregion of the space. The first assignments may be used to determine a first portion of a partitioning of the spatial elements. The second assignments may be used to partition a subset of the spatial elements to determine a second portion of the partitioning of the spatial elements, which may further partition one or more nodes from the first portion.
G06F 30/27 - Optimisation, verification or simulation of the designed object using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
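One common way to assign spatial elements to bins keyed by subregion, as the abstract above describes, is a Z-order (Morton) spatial index; this sketch is illustrative and is not the disclosed partitioning scheme:

```python
def morton2d(x, y, bits=8):
    """Interleave the bits of integer cell coordinates (x, y) into a
    single spatial index (Morton / Z-order code). Elements whose codes
    share a prefix fall in the same subregion, which lets overlapping
    assignments to the same subregion be detected and overwritten."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bit -> even position
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bit -> odd position
    return code
```

Two elements mapping to the same code (or code prefix) signal the "same subregion" case, in which a second, finer set of bin assignments would be applied.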
In various examples, image space coordinates of an image from a video may be labeled, projected to determine 3D vehicle space coordinates, then transformed to 3D world space coordinates using known 3D world space coordinates and relative positioning between the coordinate spaces. For example, 3D vehicle space coordinates may be temporally correlated with known 3D world space coordinates measured while capturing the video. The known 3D world space coordinates and known relative positioning between the coordinate spaces may be used to offset or otherwise define a transform for the 3D vehicle space coordinates to world space. Resultant 3D world space coordinates may be used for one or more labeled frames to generate ground truth data. For example, 3D world space coordinates for left and right lane lines from multiple frames may be used to define lane lines for any given frame.
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/774 - Generating sets of training patterns; Image or video pattern feature handling in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration and reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. bagging or boosting
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/94 - Hardware or software architectures specially adapted for image or video understanding
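The vehicle-space to world-space transform described in the abstract above can be sketched with a yaw-only rigid transform using the ego pose recorded while the video was captured; a full implementation would use the complete 3D pose, so treat this as an illustrative simplification:

```python
import math

def vehicle_to_world(point_vehicle, ego_world_xy, ego_yaw):
    """Transform a 3D point from vehicle space to world space given the
    ego vehicle's known world position and heading. Rotation about the
    vertical axis only is an illustrative simplification."""
    x, y, z = point_vehicle
    c, s = math.cos(ego_yaw), math.sin(ego_yaw)
    wx = ego_world_xy[0] + c * x - s * y   # rotate, then translate
    wy = ego_world_xy[1] + s * x + c * y
    return (wx, wy, z)
```

Applying this to labeled lane-line points from several frames yields world-space coordinates that can be aggregated into ground-truth lane geometry for any given frame.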
11.
INSTANTIATION OF GRAPHICAL USER INTERFACE ELEMENTS FOR STREAMING SYSTEMS AND APPLICATIONS
In examples, a device's native input interface (e.g., a soft keyboard) may be invoked using interaction areas associated with image frames from an application, such as a game. An area of an image frame(s) from a streamed game video may be designated (e.g., by the game and/or a game server) as an interaction area. When an input event associated with the interaction area is detected, an instruction may be issued to the client device to invoke a user interface (e.g., a soft keyboard) of the client device and may cause the client device to present a graphical input interface. Inputs made to the presented graphical input interface may be accessed by the game streaming client and provided to the game instance.
A63F 13/235 - Input arrangements for video game devices for interfacing with the game device, e.g. specific interfaces between game controller and console, using a wireless connection, e.g. infrared or piconet
A63F 13/335 - Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers, using wide area network [WAN] connections using the Internet
A63F 13/355 - Performing operations on behalf of clients with restricted processing capabilities, e.g. servers transforming a changing game scene into an encoded video stream for transmitting to a mobile phone or a thin client
12.
ADAPTIVE ENSEMBLES OF SAFEGUARD MODELS FOR MODERATION OF LANGUAGE MODEL APPLICATIONS
Disclosed are apparatuses, systems, and techniques for adaptable provisioning of accurate and flexible assessments of safety of AI operations. The techniques include performing a probabilistic selection of a safeguard model, from an ensemble of safeguard models, to generate a safety assessment of a prompt to a language model, a likelihood of the probabilistic selection being determined using historical performance of the ensemble of safeguard models.
An apparatus including an integrated circuit and logic to control power consumption of the integrated circuit based on a determination of thermal contact between a cooling element and the integrated circuit. A device including an integrated circuit, a cooling element, and logic to control heat generation by the integrated circuit based on a determination of thermal resistance between the integrated circuit and the cooling element.
Systems and methods are disclosed that generate dense blob representations such as blob parameters and blob descriptions, and use the dense blob representations to generate images. For example, embodiments of the present disclosure may decompose a scene into visual primitives (e.g., dense blob representations) and based on the blob representations, embodiments of the present disclosure develop a blob-grounded text-to-image diffusion model (BlobGEN) for compositional generation. For example, in some embodiments, a new masked cross-attention module may be introduced to disentangle the fusion between blob representations and visual features. In some embodiments, to leverage the compositionality of large language models (LLMs), a new in-context learning approach may be introduced to generate blob representations from text prompts.
In various examples, animations may be generated using audio-driven body animation synthesized with voice tempo. For example, full body animation may be driven from an audio input representative of recorded speech, where voice tempo (e.g., a number of phonemes per unit time) may be used to generate a 1D audio signal for comparing to datasets including data samples that each include an animation and a corresponding 1D audio signal. One or more loss functions may be used to compare the 1D audio signal from the input audio to the audio signals of the datasets, as well as to compare joint information of joints of an actor between animations of two or more data samples, in order to identify optimal transition points between the animations. The animations may then be stitched together—e.g., using interpolation and/or a neural network trained to seamlessly stitch sequences together—using the transition points.
Systems and methods are disclosed herein for implementation of a vehicle command operation system that may use multi-modal technology to authenticate an occupant of the vehicle to authorize a command and receive natural language commands for vehicular operations. The system may utilize sensors to receive first sensor data indicative of a voice command from an occupant of the vehicle. The system may receive second sensor data to aid in determining the corresponding vehicular operation in response to the received command. The system may retrieve authentication data for the occupants of the vehicle. The system authenticates the occupant to authorize a vehicular operation command using a neural network based on at least one of the first sensor data, the second sensor data, and the authentication data. Responsive to the authentication, the system may authorize the operation to be performed in the vehicle based on the vehicular operation command.
B60R 16/037 - Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for; electric, for occupant comfort
B60R 25/01 - Fittings or systems for preventing or indicating unauthorised use or theft of vehicles operating on vehicle systems or fittings, e.g. on doors, seats or windscreens
B60R 25/25 - Means to switch the anti-theft system on or off using biometry
B60R 25/30 - Detection related to theft or to other events relevant to anti-theft systems
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to obtain optimum performance according to a predetermined criterion; electric
G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
G10L 17/00 - Speaker identification or verification techniques
G10L 17/06 - Decision making techniques; Pattern matching strategies
A storage processing unit (SPU), which may be resident in a server in a storage system, provides a boot volume to the server and provides storage services. The SPU may execute a process including taking three snapshots of the boot volume respectively after writing an operating system image into the boot volume, after writing component images or otherwise customizing contents of the boot volume, and after the server boots from the boot volume. For updates, stability, or recovery of the storage system, the SPU may promote any of the snapshots to be the boot volume before the server reboots.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
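The three-snapshot boot-volume lifecycle described above can be mimicked with a toy in-memory model; the class and method names are hypothetical, and real snapshots would of course be block-level and copy-on-write rather than whole-content copies:

```python
class BootVolume:
    """Illustrative snapshot lifecycle: snapshots are taken after the
    OS image is written, after customization, and after first boot;
    any snapshot can later be promoted to become the boot volume."""

    def __init__(self):
        self.content = None
        self.snapshots = {}

    def write(self, content):
        self.content = content

    def snapshot(self, name):
        # Record the volume state at this point in the lifecycle.
        self.snapshots[name] = self.content

    def promote(self, name):
        # Roll the boot volume back (or forward) before the server reboots.
        self.content = self.snapshots[name]
```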
18.
DISTILLING NEURAL RADIANCE FIELDS INTO SPARSE HIERARCHICAL VOXEL MODELS FOR GENERALIZABLE SCENE REPRESENTATION PREDICTION
At least one embodiment is directed towards a computer-implemented method for generating generalized scene representations. The computer-implemented method includes extracting feature information from a plurality of scene images, encoding the feature information to generate a plurality of feature images, and estimating depths of at least a plurality of pixels in each feature image included in the plurality of feature images to produce a plurality of feature frusta. The computer-implemented method also includes generating a plurality of octree voxels from the plurality of feature frusta, sampling points along a plurality of views from different proposed camera angles relative to the plurality of octree voxels to produce feature angles and depths that are subsequently aggregated into a plurality of predicted feature maps, and decoding the plurality of predicted feature maps to generate a plurality of final feature maps.
Various embodiments include techniques for controlling temperature and fan speed in a computing system. Conventional computing systems present the user with a very limited set of three or four curated performance mode presets, which can impose substantial trade-offs in performance, acoustic noise, and/or case temperature that the user may find to be unacceptable. By contrast, the disclosed techniques allow the user to precisely position the operation of the computing system anywhere in the two-dimensional space of fan speed (which determines acoustic noise) versus case temperature that suits the preference of the user. The disclosed techniques further provide a closed-loop feedback control system for controlling the case temperature. This closed-loop feedback control system operates in conjunction with the adjustable case temperature target to determine individual power limits for certain components, such as a CPU power limit, a GPU power limit, and/or the like.
Systems and methods herein are for a video encoder to be associated with a temporal filter and a coding tree and that can perform a main pass for video encoding using individual video blocks towards prediction of at least one frame associated with the media stream, where the coding tree is associated with a lookahead pass, and where the temporal filter can enable denoising within the lookahead pass to reduce an effect of noise in one or more of motion estimation or mode selection of the video encoding.
H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
H04N 19/103 - Selection of the coding mode or of the prediction mode
H04N 19/147 - Data rate or amount of coded data at the encoder output according to rate-distortion criteria
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the object or the subject of the adaptive coding; the unit being an area of the picture, e.g. an object; the area being a block, e.g. a macroblock
H04N 19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
H04N 19/96 - Tree coding, e.g. quad-tree coding
21.
SCOPE TREE CONSISTENCY PROTOCOL FOR CACHE COHERENCE
Various embodiments include techniques for migrating points of coherence (PoCs) in a cache hierarchy. The techniques comprise receiving, at a first cache memory and from a second cache memory, a memory access request associated with a first scope group, and, in response to determining a first directory within the first cache memory includes a first entry indicating (i) a third cache memory is a child of the first cache memory in a tree associated with the first scope group, and (ii) the first cache memory is a root of the tree associated with the first scope group: performing one or more operations to acquire a PoC token from a descendant cache memory of the first cache memory in the tree associated with the first scope group, and updating the first entry to indicate that the first cache memory is a PoC of the tree associated with the first scope group.
Aspects of this technical solution can allocate one or more portions of a mesh in a three-dimensional (3D) space to one or more processing units associated with the one or more circuits, the mesh associated with a surface of an object in the 3D space, transform, by the one or more of the processing units, one or more of the portions of the mesh from the 3D space into corresponding second meshes in a two-dimensional (2D) space, segment, by the one or more of the processing units, according to a determination that a distortion of a portion of the mesh among the one or more of the portions of the mesh is below a threshold of parameterization, the portion of the mesh into two further portions of the mesh, where the further portions are among the one or more portions of the mesh, and generate, according to a determination that the distortion of the portion of the mesh among the one or more of the portions of the mesh is at or above the threshold of parameterization, an output mesh including the one or more portions of the mesh.
In various examples, systems and methods are disclosed relating to determining first track point heights of a ground surface for each of a plurality of frames of a disparity image based on a plane parallax algorithm, the first track point heights including previous track point heights of the ground surface for each of the at least one previous frame of the plurality of frames of the disparity image and current track point heights of the ground surface for the current frame of the plurality of frames of the disparity image and determining second track point heights by temporally fusing the current track point heights for the current frame and the previous track point heights for each of the at least one previous frame.
One embodiment of a power delivery device includes an inductor and one or more chips that are mounted on top of the inductor. One embodiment of a graphics card includes a graphics processing unit (GPU) mounted on top of a first side of a circuit board, and one or more power delivery devices mounted on top of a second side of the circuit board. Each power delivery device included in the one or more power delivery devices includes an inductor and one or more chips disposed on top of the inductor.
H01L 25/16 - Assemblies consisting of a plurality of semiconductor or other solid-state devices, the devices being of types covered by two or more different subclasses, e.g. hybrid circuits
H01L 23/00 - Details of semiconductor or other solid-state devices
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another; the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 29/78 - Field-effect transistors with field effect produced by an insulated gate
H05K 1/18 - Printed circuits structurally associated with non-printed electric components
26.
NEURAL NETWORK TRAINING USING GROUND TRUTH DATA AUGMENTED WITH MAP INFORMATION FOR AUTONOMOUS MACHINE APPLICATIONS
In various examples, training sensor data generated by one or more sensors of autonomous machines may be localized to high definition (HD) map data to augment and/or generate ground truth data (e.g., automatically, in embodiments). The ground truth data may be associated with the training sensor data for training one or more deep neural networks (DNNs) to compute outputs corresponding to autonomous machine operations, such as object or feature detection, road feature detection and classification, wait condition identification and classification, etc. As a result, the HD map data may be leveraged during training such that the DNNs, in deployment, may aid autonomous machines in navigating environments safely without relying on HD map data to do so.
G01C 21/30 - Map- or contour-matching
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G01S 19/41 - Differential correction, e.g. DGPS [differential GPS]
G05D 1/246 - Arrangements for determining position or orientation using environment maps, e.g. simultaneous localisation and mapping [SLAM]
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
Approaches in accordance with various illustrative embodiments provide for the encryption of communications going into and out of a device, such as a chip or proprietary bus. The encryption can occur in a central Root-of-Trust (RoT), which can include agents for individual communication protocols to generate session keys used to encrypt communications for individual sessions, and the data can be sent to a crypto engine for the respective communication protocol. A key tunnel unit can be used to receive a wrapped session key over the public bus, unwrap the key in hardware, and then transmit the unwrapped session key to the corresponding crypto engine without exposing the session key to software executing on the device outside the RoT. The receiving inline crypto engine can then use that session key to encrypt session data to be transmitted to a separate device or destination.
In various examples, table classification based query join reordering for relational database systems and applications are provided. In some embodiments, a relational database system is provided that includes a join optimizer that evaluates a join clause of a query and categorizes relational database tables as either fact tables or dimension tables based on a normalized cardinality statistic. The join optimizer uses the fact and dimension tables to deconstruct the query into a plurality of deconstructed query join trees. Individual deconstructed query join trees may be generated for each respective fact table. The deconstructed query join trees may be joined to generate a reordered join solution representing a sequential join of the plurality of deconstructed query join trees. An updated query may be generated based on the reordered join solution, and a query response generated that answers the query based at least on the updated query.
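The fact/dimension categorization by normalized cardinality might look like the following sketch; the threshold value, function name, and input format are illustrative assumptions rather than the disclosed statistic:

```python
def classify_tables(row_counts, threshold=0.1):
    """Classify each table in a join as 'fact' or 'dimension' by
    normalized cardinality: the table's row count divided by the
    largest row count among the joined tables. The 0.1 threshold is
    an illustrative choice."""
    largest = max(row_counts.values())
    return {
        name: "fact" if rows / largest >= threshold else "dimension"
        for name, rows in row_counts.items()
    }
```

Once tables are labeled, each fact table can seed its own deconstructed join tree, with dimension tables attached to the fact tables they reference, before the trees are joined into the reordered solution.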
Embodiments of the present disclosure relate to using or generating a token and/or tokenized representation representative of a set of content, which may help in alleviating hallucination and other problems described herein. In operation, at inference time, some embodiments may first provide a representation of first natural language characters as an input into a machine learning model. The machine learning model may then responsively generate a tokenized representation based on the first natural language characters. The tokenized representation may not include a same character sequence as the set of content. Subsequent to the generation of the token and/or tokenized representation, some embodiments retrieve, via a data structure, the set of content.
Embodiments are directed to parallel processing of network communications on devices supporting a high degree of parallelization, such as a graphics processing unit (GPU). Generally speaking, embodiments are directed to an inline packet processing pipeline to receive packets in GPU memory without staging copies through central processing unit (CPU) memory, process the received packets in parallel with one or more kernels of the GPU, and then run inference on, evaluate, or send over the network the result of the computation. In this way, the highly parallel nature of the GPU can be leveraged to process network communications without involving other elements of the system, such as the CPU, which can otherwise be quickly consumed with processing network communications to the detriment of other processes.
In various examples, a hazard detection system plots hazard indicators from multiple detection sensors to grid cells of an occupancy grid corresponding to a driving environment. For example, as the ego-machine travels along a roadway, one or more sensors of the ego-machine may capture sensor data representing the driving environment. A system of the ego-machine may then analyze the sensor data to determine the existence and/or location of the one or more hazards within an occupancy grid—and thus within the environment. When a hazard is detected using a respective sensor, the system may plot an indicator of the hazard to one or more grid cells that correspond to the detected location of the hazard. Based, at least in part, on a fused or combined confidence of the hazard indicators for each grid cell, the system may predict whether the corresponding grid cell is occupied by a hazard.
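Per-cell fusion of hazard indicators from multiple sensors can be illustrated with a noisy-OR combination rule; the fusion rule, threshold, and function name are assumptions for illustration, as the abstract does not specify the combination method:

```python
def fuse_cell_confidences(detections, occupied_threshold=0.7):
    """detections: list of (cell, confidence) hazard indicators plotted
    by individual sensors onto occupancy-grid cells. Confidences for a
    cell are fused with a noisy-OR rule (one illustrative choice); a
    cell is predicted occupied when the fused confidence crosses the
    threshold."""
    fused = {}
    for cell, conf in detections:
        prior = fused.get(cell, 0.0)
        # noisy-OR: independent detections reinforce each other
        fused[cell] = 1.0 - (1.0 - prior) * (1.0 - conf)
    return {cell: conf >= occupied_threshold for cell, conf in fused.items()}
```

Two moderately confident detections of the same cell from different sensors can thus push that cell over the occupancy threshold, while a single weak detection does not.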
In various examples, epipolar constraint-based cross-camera calibration validation is disclosed. For a pair of cameras that have partially overlapping fields of view, a shared region of their overlapping fields of view may be extracted and used as the basis to perform an epipolar constraint-guided feature descriptor matching process. A camera calibration metric may be computed based on the degree to which a feature descriptor appearing at a pixel of the first image aligns as expected in the second image with an epipolar line associated with the pixel of the first image, where the epipolar line is computed using extrinsic camera calibration parameters associated with the pair of cameras. Epipolar matching may be performed for a plurality of feature points and an aggregate validation score computed based on measuring the computed deviations for each feature. A sensitivity analysis may be applied to better assess the usefulness of the validation score.
G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
G06V 10/44 - Local feature extraction by analysing parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
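As a rough illustration of the epipolar validation metric above, the following sketch measures how far a matched feature in the second image falls from the epipolar line induced by its match in the first image. The toy fundamental matrix and the mean-deviation aggregate are assumptions for the example, not the disclosed scoring:

```python
import math

def epipolar_deviation(F, pt1, pt2):
    """Distance (pixels) from pt2 to the epipolar line F @ pt1 in image 2.
    F is a 3x3 fundamental matrix (nested lists); pt1, pt2 are (x, y)."""
    x1 = (pt1[0], pt1[1], 1.0)
    # l = F @ x1 gives the epipolar line (a, b, c) in image 2
    a, b, c = (sum(F[i][j] * x1[j] for j in range(3)) for i in range(3))
    return abs(a * pt2[0] + b * pt2[1] + c) / math.hypot(a, b)

def validation_score(F, matches):
    """Aggregate score: mean epipolar deviation over matched feature pairs."""
    devs = [epipolar_deviation(F, p1, p2) for p1, p2 in matches]
    return sum(devs) / len(devs)

# Toy fundamental matrix for a purely horizontal stereo pair: epipolar
# lines satisfy y' = y, so the deviation is just the vertical mismatch.
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
matches = [((10.0, 20.0), (30.0, 20.0)), ((5.0, 7.0), (9.0, 9.0))]
print(validation_score(F, matches))  # (0 + 2) / 2 = 1.0
```

A large aggregate deviation would suggest the extrinsic calibration used to compute F has drifted, which is what the validation score is meant to flag.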
Apparatuses, systems, and techniques to perform an application programming interface (API). In at least one embodiment, said API is to cause information to be encrypted based, at least in part, on one or more encryption algorithm indicators.
Apparatuses, systems, and techniques to perform an application programming interface (API). In at least one embodiment, said API is to cause encrypted information to be decrypted based, at least in part, on one or more encryption algorithm indicators.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
A first synthetic image including a first pattern and a second synthetic image including a second pattern are processed to generate a stitched image comprising a modified version of the first synthetic image and a modified version of the second synthetic image. One or more features of the first pattern are deformed in the modified version of the first synthetic image and one or more features of the second pattern are deformed in the modified version of the second synthetic image. Pixel areas are determined for each feature of the modified version of the first synthetic image and for each feature of the modified version of the second synthetic image. Feature densities are determined for one or more regions of the stitched image based on the determined pixel areas.
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
G06T 7/136 - Segmentation; Edge detection involving thresholding
G06T 7/187 - Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
G06T 7/55 - Depth or shape recovery from multiple images
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/44 - Local feature extraction by analysing parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
36.
SYNCHRONIZING AN ENCRYPTED DATA STREAM ACROSS CHIP-TO-CHIP GROUND REFERENCED SIGNALING INTERCONNECT
A device includes a memory to store a session key, a transmitter (TX) physical layer to transmit frames to a second device over a link, and TX datalink logic coupled to the TX physical layer and the memory. To coordinate synchronized encryption over the link with the second device, the TX datalink logic is to cause a key synchronization frame to be transmitted to the second device, wherein the key synchronization frame comprises a frame count value. The TX datalink logic, in response to receipt of a key synchronization acknowledgement from the second device acknowledging receipt of the key synchronization frame, is further to start encrypting frame data with the session key after transmitting a number of frames corresponding to the frame count value.
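The key-synchronization handshake can be modeled schematically as follows. The frame dictionary format and method names are invented for illustration and do not reflect the actual datalink logic:

```python
class TxDatalink:
    """Illustrative model of the key-synchronization handshake: after the
    acknowledgement, plaintext transmission continues for `frame_count`
    frames, then encryption with the session key begins."""
    def __init__(self, session_key, frame_count):
        self.session_key = session_key
        self.frame_count = frame_count   # frames to send before switching
        self.acked = False
        self.sent_since_ack = 0

    def key_sync_frame(self):
        """Frame advertising the count to the receiving device."""
        return {"type": "KEY_SYNC", "frame_count": self.frame_count}

    def on_key_sync_ack(self):
        """Receiver acknowledged; start counting transmitted frames."""
        self.acked = True
        self.sent_since_ack = 0

    def transmit(self, payload):
        encrypt = self.acked and self.sent_since_ack >= self.frame_count
        if self.acked:
            self.sent_since_ack += 1
        return {"encrypted": encrypt, "data": payload}

tx = TxDatalink(session_key=b"k", frame_count=2)
tx.on_key_sync_ack()
print([tx.transmit(i)["encrypted"] for i in range(4)])  # [False, False, True, True]
```

Because both sides count the same number of frames after the acknowledged synchronization point, transmitter and receiver flip to the session key on the same frame boundary without any further in-band signaling.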
Disclosed are apparatuses, systems, and techniques for segmentation-assisted detection and tracking of objects or features in videos, across images, and/or in other 2D and/or 3D visual content. The techniques include processing a plurality of frames of a video to obtain a plurality of representations of an object depicted in the video. A first subset of the plurality of representations is obtained by processing, using an object detection model, a first subset of the plurality of frames. A second subset of the plurality of representations is obtained using visual similarity of an appearance of the object in a second subset of the plurality of frames to the appearance of the object in at least one other frame of the plurality of frames. The techniques further include obtaining, using the plurality of representations, segmentation masks for the plurality of frames and performing one or more operations based on the segmentation masks.
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
38.
MULTI-OBJECT TRACKING USING HIERARCHICAL GRAPH NEURAL NETWORKS
Various examples, systems, and methods are disclosed relating to multi-object tracking using hierarchical graph neural networks. A first computing system can update a graph neural network based at least on video data representing a plurality of first objects and a plurality of first labels corresponding to the plurality of first objects. The first computing system can cause the graph neural network to generate a plurality of second labels of a first example video and update the graph neural network based at least on the plurality of second labels and the first example video. The first computing system can cause the graph neural network to generate a plurality of third labels of a second example video. The first computing system can output a request for a modification to at least one third label responsive to an uncertainty score for the at least one third label satisfying an annotation criterion.
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 10/776 - Validation; Performance evaluation
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Hyperspectral and multispectral cameras differ from conventional cameras in that they are configured to capture and separate the light from a scene into its individual wavelengths or spectral bands. Conventional cameras, on the other hand, capture three-channel color information, i.e., the intensity of red, green, and blue. Currently, hyperspectral/multispectral cameras are expensive scientific devices (i.e., not built from off-the-shelf components), thus limiting their availability to the general population. Furthermore, currently available designs of hyperspectral/multispectral cameras tend to make trade-offs among three quantities: spectral resolution, spatial resolution, and the time to acquire an image, such that improving one area negatively impacts the others. The present disclosure provides a hyperspectral- or multispectral-type camera that can be built from off-the-shelf components and that is configured with compressive sensing, which can alleviate at least part of the three-way design trade-off present in current hyperspectral camera designs.
H04N 25/11 - Arrangement of colour filter arrays [CFA]; Filter mosaics
H04N 23/12 - Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths with one sensor only
In various examples, sensor configuration for autonomous or semi-autonomous systems and applications is described. Systems and methods are disclosed that may use image feature correspondences between camera images along with an assumption that image features are locally planar to determine parameters for calibrating an image sensor with a LiDAR sensor and/or another image sensor. In some examples, an optimization problem is constructed that attempts to minimize a geometric loss function, where the geometric loss function encodes the notion that corresponding image features are views of a same point on a locally planar surface (e.g., a surfel or mesh) that is constructed from LiDAR data generated using a LiDAR sensor. In some examples, performing such processes to determine the calibration parameters may remove structure estimation from the optimization problem.
Simulation of complex agents, such as robots with many articulation links, can be performed utilizing a pre-computed response matrix for each link. When an impulse is applied to a link of such an agent, the response matrix for a root node can be used to determine the impact of that impulse on the root node, as well as changes in velocity for any direct child node. This process can be performed recursively for each link down to the leaf links of a hierarchical agent structure. These response matrices can be solved recursively from root to leaf while visiting each hierarchical link only once. Such an approach can be used to solve a full set of constraints acting on the agent in an amount of time per solver iteration that is on the order of the number of links N, i.e., O(N) time per solver iteration.
G06F 30/27 - Design optimisation, verification or simulation of the designed object using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
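The root-to-leaf response propagation above can be illustrated with a toy model. Real response matrices are spatial operators (e.g., 6x6), so the scalar "responses" and link names below are deliberate simplifications:

```python
# Simplified sketch of root-to-leaf response propagation: each link's
# precomputed response maps its parent's velocity change to its own, and
# the recursion visits every link exactly once (O(N) per traversal).
def propagate(link, dv_parent, responses, children, out):
    dv = responses[link] * dv_parent   # apply this link's response
    out[link] = dv
    for child in children.get(link, []):
        propagate(child, dv, responses, children, out)
    return out

# Hypothetical 4-link agent: root -> a -> b, and root -> c
children = {"root": ["a", "c"], "a": ["b"]}
responses = {"root": 1.0, "a": 0.5, "b": 0.5, "c": 0.25}
deltas = propagate("root", 2.0, responses, children, {})
print(deltas)  # {'root': 2.0, 'a': 1.0, 'b': 0.5, 'c': 0.5}
```

Since each link is touched once per traversal, the per-iteration cost grows linearly with the number of links, matching the O(N) claim in the abstract.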
Approaches in accordance with various illustrative embodiments provide for the selection and reuse of lighting sample data to generate high-quality initial candidates, suitable for input to resampling techniques, that are better representative of the actual lighting of a scene for which an image, video frame, or other such representation is to be rendered. Instead of discarding important light samples whose weight or sample count may no longer be reliable, at least some of these samples can be provided as additional, unweighted candidates for use in importance sampling, in addition to those selected using a random (or semi-random) sampling process. Such an approach can help to ensure that important lights are considered when shading pixels for a scene, at least where such reuse makes sense given changes in scene or location. Samples reused between frames can relate to various prior samples, such as samples that were determined to correspond to important, close, or bright lights.
Various embodiments include a memory device that is capable of transferring both commands and data via a single clock signal input. In order to initialize the memory device to receive commands, a memory controller transmits a synchronization command to the memory device. The synchronization command establishes command start points that identify the beginning clock cycle of a command that is transferred to the memory device over multiple clock cycles. Thereafter, the memory controller transmits subsequent commands to the memory device according to a predetermined command length. The predetermined command length is based on the number of clock cycles needed to transfer each command to the memory device. Adjacent command start points are separated from one another by the predetermined command length. In this manner, the memory device avoids the need for a second lower speed clock signal for transferring commands to the memory device.
G11C 11/4096 - Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
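The start-point arithmetic implied by the synchronization scheme above is simple to sketch. The cycle numbers and bit-string commands below are hypothetical:

```python
def command_start_points(sync_cycle, command_length, num_commands):
    """Start cycles for commands following a synchronization command:
    adjacent start points are separated by the predetermined length."""
    return [sync_cycle + k * command_length for k in range(num_commands)]

def split_commands(bits, command_length):
    """Slice a serial bit stream into fixed-length commands, one bit per
    clock cycle on the single shared clock input."""
    return [bits[i:i + command_length]
            for i in range(0, len(bits), command_length)]

print(command_start_points(sync_cycle=4, command_length=3, num_commands=3))
# [4, 7, 10]
print(split_commands("101011000111", 3))  # ['101', '011', '000', '111']
```

Because every start point is derivable from the synchronization point plus a fixed stride, the device can delimit commands without a second, slower command clock.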
Apparatuses, systems, and techniques to identify an encryption algorithm. In at least one embodiment, an encryption algorithm is to be identified by one or more encryption algorithm indicators.
Systems including a first circuit and a second circuit, with a multi-data lane link between the first circuit and the second circuit. The first circuit and the second circuit are configured to determine a delay setting of a clock signal forwarded from the first circuit to the second circuit by utilizing a first distinct subset of the data lanes to communicate commands redundantly encoded in multiple unit intervals of the data lanes and by utilizing a second distinct subset of the data lanes to communicate results of the commands.
Apparatuses, systems, and techniques to perform an application programming interface (API) to indicate an amount of power to be consumed by one or more processors. As an example, one or more processors comprising one or more circuits to perform an API to indicate an amount of power to be consumed by one or more processors as a result of operating said one or more processors at a first clock frequency.
In various examples, evaluating labeled training data for machine learning systems and applications is described herein. Systems and methods described herein may determine whether labels for training data are accurate based at least on additional labels for the training data that represent a consensus of how the training data should be labeled. For instance, sensor representations (e.g., images, point clouds, etc.) may initially be labeled using one or more automatic techniques (e.g., one or more machine learning models, one or more neural networks, one or more algorithms, etc.) and then verified and/or updated by users to generate first labels for the sensor representations. Additionally, copies of the sensor representations may also be labeled using additional users to generate second labels, where these second labels are then used to generate the consensus labels for the sensor representations. The consensus labels may then be used to evaluate the first labels.
Apparatuses, systems, and techniques to generate one or more neural networks. In at least one embodiment, a processor comprises one or more circuits to use one or more first neural networks to generate one or more second versions of one or more second neural networks based, at least in part, on one or more first versions of the one or more second neural networks and one or more hardware resources to be used to perform the one or more second versions of the one or more second neural networks.
In various examples, systems and methods are disclosed relating to a system including one or more processors to generate hardware-level configurations for direct memory access (DMA) devices based on high-level descriptions of data movements. The high-level descriptions may include data flows for transferring data using the DMA device and the system may automatically generate the hardware-level configurations for the DMA device based on the data flows, simplifying the process of programming data movements and reducing the opportunity for human error.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
50.
LINE MARKING DETECTION FOR AUTONOMOUS AND SEMI-AUTONOMOUS SYSTEMS AND APPLICATIONS
In various embodiments, sensor data representing a 3D environment may be collected using one or more ego-machines while the ego-machines are navigating through the 3D environment. The sensor data may be projected into a 2D representation of the ground or other surface, and this 2D representation may form a map representing some geographic region. The map may be divided into tiles, within which detected features (e.g., road lines, road markings, surface features, etc.) may be detected and used to detect demarcated regions, such as intersections, based on the geometry and proximity of the detected features. As such, new tiles may be centered around the detected regions, and the features may be detected from each resulting centered tile. The detected features may be aggregated, de-duplicated, and/or merged, and used to label the map.
During the rendering of an image, specific pixels are identified where antialiasing would be helpful. Antialiasing, a technique that adds greater realism to a digital image by smoothing jagged edges, is then performed only on these identified pixels. This reduces the cost of antialiasing by reducing the number of pixels within an image on which it is performed.
Apparatuses, systems, and techniques to perform non-maximum suppression (NMS) in parallel to remove redundant bounding boxes. In at least one embodiment, two or more parallel circuits to perform two or more portions of a NMS algorithm in parallel to remove one or more redundant bounding boxes corresponding to one or more objects within one or more digital images.
G06V 10/94 - Hardware or software architectures specially adapted for image or video understanding
G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06V 10/40 - Extraction of image or video features
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
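For reference, a sequential version of the NMS computation that the parallel circuits in the abstract above would partition might look like this. The greedy score-ordered formulation is the textbook algorithm, not necessarily the patented partitioning:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy reference NMS: keep the highest-scoring box, suppress any
    lower-scoring box that overlaps a kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- box 1 duplicates box 0 and is suppressed
```

The pairwise IoU tests are independent of one another, which is what makes the suppression decisions amenable to the parallel split across circuits that the abstract claims.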
53.
VIDEO UPSAMPLING USING ONE OR MORE NEURAL NETWORKS
Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create a higher resolution video using upsampled frames from a lower resolution video.
G06T 3/4046 - Scaling of whole images or parts of an image, e.g. expanding or contracting, using neural networks
A63F 13/50 - Controlling the output signals based on the game progress
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
Estimating motion of a human or other object in video is a common computer task with applications in robotics, sports, mixed reality, etc. However, motion estimation becomes difficult when the camera capturing the video is moving, because the observed object and camera motions are entangled. The present disclosure provides for joint estimation of the motion of a camera and the motion of articulated objects captured in video by the camera.
In various examples, image processing techniques are presented for reducing artifacts in images for autonomous or semi-autonomous systems and applications. Systems and methods are disclosed for deferring at least a portion of a strength associated with a color correction matrix (CCM) of an image processing pipeline to one or more other stages associated with the image processing pipeline. For instance, the CCM is used to determine a first CCM and a second, deferred CCM. The first CCM may be associated with an earlier stage of the image processing pipeline while the deferred CCM is associated with a later stage of the image processing pipeline. In some examples, the deferred CCM is combined with another matrix, such as a color space conversion matrix. By deferring part of the CCM, the image processing pipeline may reduce a number of artifacts and/or eliminate the artifacts that occur when processing image data.
Systems and methods for improved media stream processing. In at least one embodiment, a first media stream is assigned to a hardware processing engine and a second media stream is assigned to a software processing engine based on a performance state of an application server, one or more parameters of the first media stream, and one or more parameters of the second media stream.
H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, or processing of multiple end-user preferences to derive collaborative data
In various embodiments, a computer-implemented method for controlling cache memory accesses comprises transmitting a first clock signal to the cache memory, where a first rising edge of the first clock signal asserts a word line, and transmitting a second clock signal to the cache memory, where a first rising edge of the second clock signal precedes a second rising edge of the first clock signal, and the first rising edge of the second clock signal de-asserts the word line.
G11C 7/22 - Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management
G11C 8/08 - Word line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits or precharging circuits, for word lines
Apparatuses, systems, and techniques to generate a memory allocation plan for a set of tensors. In at least one embodiment, tensor data corresponding to the set of tensors is stored into memory locations at run time based, at least in part, on the memory allocation plan generated at a compile time.
Apparatuses, systems, and techniques to perform a neural network. In at least one embodiment, a neural network is performed by at least combining nodes of a graph based, at least in part, on computing resources to perform operations corresponding to the nodes of the graph.
The technology disclosed herein involves using a machine learning model (e.g., CNN) to expand lower dynamic-range image content (e.g., SDR images) into higher dynamic-range image content (e.g., HDR images). The machine learning model can take as input the lower dynamic-range image and can output multiple expansion maps that are used to make the expanded image appear more natural. The expansion maps may be used by image operators to smooth color banding and to dim overexposed regions or user interface elements in the expanded image. The expanded content (e.g., HDR image content) may then be provided to one or more devices for display or storage.
Apparatuses, systems, and techniques to generate code to be performed by one or more first processors based, at least in part, on one or more indications of data to be used by one or more second processors. In at least one embodiment, a CUDA program includes host code and device code, and a linker uses references for code elements in host code to link or prune code elements from device code.
Computing system performance monitors provide on-chip control, selection, collection, coalescing and communication of behavior and other processing-indicating data of high performance single- and multi-die computing and processing systems, such as for use in multi-chip-module and/or multi-instanced graphics processing units (GPUs) and/or systems-on-chips (SOCs). Commands and data records can be forwarded between modules to abstract the processing system from profilers and other data report consumers. Quality of Service and security isolation for different command and data report streams is maintained.
Systems, computer program products, and methods are described for a hybrid quantum-classical system for enhanced combinatorial optimization. An example system segments a received task into multiple sub-tasks. For each sub-task, the system accesses a database of pre-computed solutions through the classical computing unit to identify a suitable pre-computed solution. In scenarios where a pre-computed solution is not available for a sub-task, the classical computing unit transmits the sub-task to a quantum computing unit. The quantum computing unit, utilizing a quantum optimization algorithm, computes a solution for the sub-task. This solution is then relayed back to the classical computing unit. The classical computing unit then applies each identified pre-computed and newly computed solution to the combinatorial optimization task.
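The classical dispatch loop can be sketched as follows. The sub-task keys, cache, and stand-in quantum solver are placeholders for components the abstract only names:

```python
# Schematic of the classical unit's dispatch loop: use a cached solution
# when one exists, otherwise hand the sub-task to the quantum unit and
# record the relayed-back result.
def solve_task(subtasks, precomputed, quantum_solve):
    solutions = {}
    for sub in subtasks:
        if sub in precomputed:
            solutions[sub] = precomputed[sub]
        else:
            solutions[sub] = quantum_solve(sub)
            precomputed[sub] = solutions[sub]   # relay result back to cache
    return solutions

calls = []
def fake_quantum_solve(sub):
    """Stand-in for the quantum optimization step."""
    calls.append(sub)
    return f"q-solution({sub})"

cache = {"sub1": "cached-solution"}
result = solve_task(["sub1", "sub2"], cache, fake_quantum_solve)
print(result["sub1"], calls)  # cached-solution ['sub2']
```

The expensive quantum call is made only for the cache miss, and the new solution is cached so a repeated sub-task would also be served classically next time.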
In various examples, systems and methods are disclosed relating to reconstruction and synthesis of dynamic scenes from video, such as to generate a four-dimensional (4D) representation of one or more scenes based on one or more videos (e.g., two-dimensional (2D) videos) of the one or more scenes. A system may determine, using a neural network and based on a three-dimensional (3D) representation of one or more scenes, a 4D representation of the one or more scenes, the 3D representation generated by a featurizer using a plurality of first image frames from video data of the one or more scenes. The system may determine, from the 4D representation, a target image having a target pose and a target time.
G06T 17/00 - Three-dimensional [3D] modelling for computer graphics
G06T 7/70 - Determining position or orientation of objects or cameras
G06V 10/44 - Local feature extraction by analysing parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
65.
LANE INFERENCE AND LANE GRAPH GENERATION FOR AUTONOMOUS SYSTEMS AND APPLICATIONS
In various examples, various types of sensor data from multiple ego-machines are used to infer lanes and/or generate lane graphs for use in autonomous systems and applications. In some embodiments, one or more DNNs may be used to infer lane data indicating a representation of a lane shape using sensor data from various vehicles to represent a 3D environment. The inferred lane data may include cross-section indicators that indicate cross-sections of a lane and/or connection indicators that indicate a lane channel connecting two locations (e.g., two lane portions). The inferred lane data may be used to generate a lane graph that represents lanes on a road and, in some cases, lane dividers (e.g., polyline represented as a solid line, a dashed line, a double line, etc.). A lane graph may be used, for example, to model the environment around a vehicle, facilitate localization, provide guidance for autonomous driving, etc.
In various examples, head pose prediction for automotive occupant sensing systems and applications is presented. The systems and methods described herein provide for a machine learning model trained using a dataset that comprises ground truth head pose data computed using a registered head model of a training subject. While the training subject operates a vehicle, one or more cameras and a depth sensor capture synchronized images of the training subject. To compute a ground truth 3D head pose, angular deviations between a 3D point cloud and the registered head model may be computed to obtain a 3D ground truth head pose measurement. Using an extrinsic calibration transform, the head pose measurement may be mapped into the sensor coordinate frame. Training samples comprising an optical image frame and the head pose measurement transposed into the frame of reference of that optical image frame may be produced for training the machine learning model.
In various examples, policy prediction-based motion planner systems and methods for autonomous and semi-autonomous systems and applications are provided. A scenario tree structure may be generated that represents potential behaviors of one or more peripheral agents based on perception data of a scene within which an ego vehicle operates. A joint MPC algorithm may optimize the motion of an ego vehicle within the context of the scenario tree structure to produce a policy tree structure. An MPC policy prediction model may be trained to predict the policy tree structures that a joint MPC algorithm would produce, given a set of environmental perception data. An ego vehicle may comprise a trained MPC policy prediction model that receives perception data, and based on that input predicts a policy tree structure that may be used to define a motion policy for navigating the ego vehicle through the scene.
A first request to allocate one or more memory blocks of a first plurality of memory blocks associated with a first memory is received by a processing device. A consecutive set of a first portion of bits of a first register having a first logical state is identified. The first logical state indicates that corresponding memory blocks of the one or more memory blocks are free. A first operation is performed to adjust the consecutive set of the first portion of bits of the first register to a second logical state. An allocation address comprising an index of the consecutive set of the first portion of bits of the first register is returned in response to the first request. The allocation address is usable to access the corresponding memory blocks.
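Treating the register as a Python integer bitmap, the allocate/free cycle described above might be sketched like this. The all-ones-means-free convention and the method names are assumptions for the example:

```python
# Hypothetical register-bitmap allocator: bit i set (1) means block i is
# free; allocation finds a run of free bits, clears them (the "second
# logical state"), and returns the run's starting index.
class BlockAllocator:
    def __init__(self, num_blocks):
        self.bits = (1 << num_blocks) - 1   # all blocks start free

    def allocate(self, count):
        """Return the index of `count` consecutive free blocks, or -1."""
        mask = (1 << count) - 1
        for index in range(self.bits.bit_length()):
            if (self.bits >> index) & mask == mask:   # all `count` bits free
                self.bits &= ~(mask << index)         # mark allocated
                return index
        return -1

    def free(self, index, count):
        """Set the run's bits back to the free state."""
        self.bits |= ((1 << count) - 1) << index

alloc = BlockAllocator(8)
print(alloc.allocate(3))  # 0 -- first run of 3 free blocks
print(alloc.allocate(2))  # 3 -- next free run starts after the first
alloc.free(0, 3)
print(alloc.allocate(3))  # 0 -- the freed run is reusable
```

The returned index plays the role of the allocation address: multiplying it by the block size would yield the byte offset of the allocated region.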
A first set of threads having a same address corresponding to a shared memory is identified from a group of active threads associated with an instruction to update the shared memory. A first thread of the first set of threads is selected. The instruction is executed for the first thread using the same address to access the shared memory. Attempts to execute the instruction for the remaining threads of the first set of threads are delayed until after the first thread is executed and until at least one of the remaining threads of the first set of threads is not guaranteed to fail execution of the instruction.
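A conceptual model of the same-address serialization: one thread per distinct address executes in each pass, and the delayed threads retry in later passes. The pass structure and names are illustrative only:

```python
# Conceptual model: among active threads targeting the same shared-memory
# address, exactly one executes the update per pass; the rest are delayed
# and retried, so same-address updates serialize without being lost.
def execute_update(shared_mem, active_threads, update):
    """active_threads: list of (thread_id, address). Returns the number of
    passes needed to retire every thread's update."""
    passes = 0
    pending = list(active_threads)
    while pending:
        passes += 1
        chosen = {}                      # address -> thread chosen this pass
        for tid, addr in pending:
            chosen.setdefault(addr, tid)
        for addr, tid in chosen.items():
            shared_mem[addr] = update(shared_mem.get(addr, 0), tid)
        pending = [(t, a) for t, a in pending if chosen[a] != t]
    return passes

mem = {}
threads = [(0, 0x10), (1, 0x10), (2, 0x10), (3, 0x20)]
n = execute_update(mem, threads, lambda old, tid: old + 1)
print(n, mem[0x10], mem[0x20])  # 3 3 1
```

Three threads share address 0x10, so three passes are needed for that address, while the lone thread at 0x20 retires in the first pass alongside them.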
In various examples, providing virtual assistants for content streaming systems and applications is described herein. For instance, systems and methods are disclosed that use a virtual assistant associated with an application, such as a gaming application, to at least process queries received from a user in order to provide the user with information on how to perform various tasks associated with the application. In some examples, to determine the output information, data associated with the application is processed in order to determine state information describing a current state of the application. Additionally, the query, the state information, and/or additional information may be used to determine contextual information related to the query. One or more language models may then process the query and/or the information to determine the output information associated with the query. The output information may then be provided using various techniques, such as text, graphics, and/or audio.
This disclosure describes efficiently performing matrix multiply and add (MMA) operations using narrow operands. Narrow-operand (e.g., 8-bit, 6-bit, or 4-bit operand) MMA operations utilize scale metadata in order to improve the accuracy of the MMA operation. An efficient layout for scale metadata in narrow-operand MMA operations, and its use, are described. The proposed layout provides for efficient storing and efficient use of scale metadata.
Disclosed are apparatuses, systems, and techniques that implement software-agnostic transport of messages to, from, and within managed devices. In one embodiment, a managed device has an intra-device network including a plurality of units, each unit associated with a unit controller. The managed device further includes a hub controller that receives one or more data packets jointly carrying a message from an external host. The hub controller identifies that the data packet(s) are associated with a given unit and forwards them to the corresponding unit controller. The unit controller extracts the message from the data packet(s) and stores the message in a memory associated with the unit controller.
In various examples, systems and methods described herein may determine individual components of a dashed line based at least on identifying relationships across different portions of the dashed line. For instance, input data representing a road surface may be analyzed and a representation associated with a dashed line may be determined. In some instances, the representation may be generated based at least on intensity values associated with points corresponding to the input data. Then, based at least on the representation, information associated with one or more components of the dashed line may be determined. For instance, the representation may be indicative of the relationships across the different portions of the dashed line, and these relationships may be used to determine the information associated with the one or more components of the dashed line.
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G01S 17/89 - Lidar systems, specially adapted for specific applications for mapping or imaging
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/77 - Processing of image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/88 - Image or video recognition using optical means, e.g. reference filters, holographic masks, frequency domain filters or spatial domain filters
75.
METHOD AND APPARATUS FOR SUPPORTING DISTRIBUTED GRAPHICS AND COMPUTE ENGINES AND SYNCHRONIZATION IN MULTI-DIELET PARALLEL PROCESSOR ARCHITECTURES -- MEMORY BARRIERS
This disclosure describes supporting distributed graphics and compute engines in multi-dielet processor architectures, such as, for example, a multi-dielet graphics processing unit (GPU), and synchronization in such architectures. Each multi-dielet processor includes a hardware-implemented remapping capability and/or a hardware-implemented memory barrier capability.
This disclosure describes supporting distributed graphics and compute engines in multi-dielet parallel processing system architectures, such as, for example, a multi-dielet graphics processing unit (GPU), and synchronizing memory management in such architectures. Each dielet has its own memory management unit (MMU). The processing of at least one memory-related message type is serialized by a designated MMU for messages originating at any dielet, while the processing of at least some memory-related message types is performed locally on the originating dielets.
G06F 12/1027 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
G06F 12/1009 - Address translation with page table, e.g. page table structures
77.
DEPTH-BASED VEHICLE ENVIRONMENT VISUALIZATION USING GENERATIVE AI
In various examples, systems and methods are disclosed relating to geometry estimation and dynamic object rendering for vehicle environment visualization. In embodiments, the environment surrounding an ego-machine may be visualized by extracting one or more depth maps from image data, converting the depth map(s) into a 3D surface topology of the surrounding environment, and/or texturizing the detected 3D surface topology with image data. Dynamic objects such as moving vehicles or pedestrians may be detected and masked from a first pass of texturization. Rigid dynamic objects may be visualized by warping corresponding depth values using corresponding trajectories, inserting or fusing the resulting warped 3D representation of each such object into the (e.g., texturized) 3D surface topology, and texturizing the warped 3D representation of each object using corresponding image data. Non-rigid dynamic objects may be represented as flat 2D surfaces and texturized with corresponding image data.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
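A minimal pinhole-camera sketch of the depth-to-geometry step described in entry 77 above: lifting a depth map to 3D points, with masked dynamic-object pixels skipped on the first pass. The intrinsics and the None-masking convention are illustrative assumptions, not the disclosed method.

```python
def unproject(depth, fx, fy, cx, cy):
    """Lift a depth map (rows of depth values; None = masked dynamic object)
    to a list of (x, y, z) points using pinhole intrinsics."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:            # dynamic objects are masked out
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

pts = unproject([[2.0, None], [4.0, 1.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

The resulting point set would then be meshed into a surface topology and texturized with the source image data.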
78.
FUSED VECTOR STORE FOR EFFICIENT RETRIEVAL-AUGMENTED AI PROCESSING
In various examples, systems and techniques are provided that encapsulate indexing and query operations into an application programming interface (API) that automates and coordinates calls to various local and cloud-based services. When a user has a document(s) to add to a retrieval augmented generation (RAG) database, the API may offer to the user multiple document processing pipelines (DPPs) having pre-set indexing configurations. Similarly, when a user query is received, the API may generate calls to implement query processing that does not require the user to manually configure retrieval and processing of the embeddings. The API may further implement calls that locate a relevant embedding store and provide the stored embeddings, together with the query embeddings, to a search engine that identifies the most relevant matches. The API may then access the embedding-to-text indexing and identify relevant text segments and documents to a prompt generator.
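A toy sketch of the API shape described above — preset indexing pipelines plus an automated query path over an embedding store. The bag-of-words embedding and class names are hypothetical stand-ins for the real local and cloud-based services.

```python
from collections import Counter
import math

def embed(text):
    """Toy embedding: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

class RagApi:
    PIPELINES = {"plain": str.strip}   # pre-set document processing pipelines

    def __init__(self):
        self.store = []                # (embedding, text) pairs

    def index(self, docs, pipeline="plain"):
        prep = self.PIPELINES[pipeline]
        for d in docs:
            t = prep(d)
            self.store.append((embed(t), t))

    def query(self, q, k=1):
        qe = embed(q)
        ranked = sorted(self.store, key=lambda p: cosine(qe, p[0]),
                        reverse=True)
        return [t for _, t in ranked[:k]]
```

The point of the facade is that callers never configure retrieval manually: `index` hides the pipeline choice and `query` hides embedding, search, and embedding-to-text lookup.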
In various examples, an environment visualization pipeline may determine whether to generate or otherwise enable a visualization using an environmental modeling pipeline that models an environment as a 3D bowl or using an environmental modeling pipeline that models the environment using some other 3D representation, such as a detected 3D surface topology. The determination may be made based on various factors, such as ego-machine state (e.g., one or more detected features indicative of a designated operational scenario, proximity to a detected object, speed of the ego-machine, etc.), estimated image quality of a corresponding environment visualization, and/or other factors. Accordingly, an environment around an ego-machine, such as a vehicle, robot, and/or other type of object, may be visualized in systems such as parking visualization systems, Surround View Systems, and/or others.
B60W 50/14 - Means for informing the driver, warning the driver or prompting a driver intervention
G06T 7/50 - Depth or shape recovery
G06T 17/20 - Wire-frame description, e.g. polygonalisation or tessellation
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/98 - Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
80.
DRIVER AND OCCUPANT MONITORING USING VISION LANGUAGE MODELS
Some embodiments relate to driver or occupant monitoring using vision language models (VLMs). Any number of DNNs in a detection pipeline may be replaced with a VLM, and the VLM may be prompted to determine whether a corresponding feature is present in an image or sampled frames from a video. To facilitate using the VLM(s) to control one or more downstream actions, the VLM(s) may be prompted using structured inputs, and a designated output format for a corresponding structured output may be enforced in any suitable manner. As such, any number of VLMs may be used to perform any number of driver and/or occupant monitoring tasks (e.g., driver drowsiness detection, driver distraction detection, driver or occupant out-of-position detection, driver or occupant identification, seatbelt usage detection, occupant presence detection, occupant classification, child presence detection, gesture recognition, occlusion detection, and/or others).
G06V 20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
B60W 50/14 - Means for informing the driver, warning the driver or prompting a driver intervention
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
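One way the "enforced output format" idea from entry 80 can be pictured: validate the VLM's structured response against a fixed schema and reject anything off-schema before it can drive a downstream action. The schema fields here are hypothetical, not the disclosed format.

```python
import json

# hypothetical designated output schema for a driver-monitoring prompt
REQUIRED = {"drowsy": bool, "distracted": bool, "confidence": float}

def parse_vlm_output(raw):
    """Enforce the designated output format; return None for any deviation."""
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if set(out) != set(REQUIRED):       # no missing or extra fields
        return None
    for key, typ in REQUIRED.items():
        if not isinstance(out[key], typ):
            return None
    return out
```

A caller would only trigger a downstream action (e.g., a drowsiness warning) when the parse succeeds, which is what makes a free-form model safe to wire into a control pipeline.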
81.
ENVIRONMENTAL TEXT PERCEPTION AND PARKING EVALUATION USING VISION LANGUAGE MODELS
Some embodiments relate to environmental text perception using vision language models (VLMs). For example, an Advanced Driver Assistance System (ADAS) may identify candidate parking spaces, and a VLM may be used to evaluate parking signs and determine whether it is permissible to park in a candidate parking space and/or the cost of doing so. For example, frames from corresponding (e.g., front-facing, repeater, side pillar) camera(s) may be evaluated for corresponding parking signs (e.g., using a sign recognition DNN or a VLM). If a parking sign is detected, the image of the sign may be provided as input to a VLM with a textual prompt instructing the VLM to determine whether it is permissible to park at a corresponding location (and if so, the cost). The generated response may be provided to the ADAS to confirm or invalidate the candidate parking space, and a representation of the results may be provided to the driver.
B60W 50/14 - Means for informing the driver, warning the driver or prompting a driver intervention
G06Q 30/0283 - Price estimation or determination
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
82.
SCHEDULING AND PRIORITIZATION OF VISION LANGUAGE MODEL INFERENCE REQUESTS
In some embodiments, the same vision language model (VLM) may be used to support different types of detection tasks (e.g., one foundational VLM supporting some or all detection tasks performed by an ego-machine, one VLM for interior sensing tasks and one for exterior sensing tasks, etc.), and an inference scheduler may be used to serve or handle inference requests for the VLM(s) to perform the different tasks. In some embodiments, the scheduler prioritizes inference requests based on safety (e.g., prioritizing inference requests to perform ADAS tasks such as pedestrian detection, bicycle detection, or trajectory planning over requests to perform driver or occupant monitoring tasks, prioritizing exterior sensing tasks over interior sensing tasks, etc.). As such, the scheduler may queue, manage, and distribute inference requests from different detection applications to the VLM(s), and receive and return responses to corresponding detection task managers.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/776 - Validation; Performance evaluation
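The safety-based prioritization in entry 82 can be sketched with a standard priority queue: lower rank is served first, and a monotonic counter keeps same-rank requests in FIFO order. The rank table is an illustrative assumption.

```python
import heapq
import itertools

# hypothetical safety ranking: lower value = served first
SAFETY_RANK = {"adas": 0, "exterior": 1, "interior": 2}

class InferenceScheduler:
    """Queue, manage, and distribute inference requests by safety priority."""
    def __init__(self):
        self._heap = []
        self._tie = itertools.count()   # FIFO tie-breaker within a rank

    def submit(self, task_class, request):
        heapq.heappush(self._heap,
                       (SAFETY_RANK[task_class], next(self._tie), request))

    def next_request(self):
        return heapq.heappop(self._heap)[2]

sched = InferenceScheduler()
sched.submit("interior", "drowsiness")
sched.submit("adas", "pedestrian")
sched.submit("exterior", "sign")
served = [sched.next_request() for _ in range(3)]
# ADAS requests are served before exterior, which precede interior tasks
```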
83.
THREE-DIMENSIONAL MULTI-CAMERA PERCEPTION SYSTEMS AND APPLICATIONS
In various examples, three-dimensional multi-camera perception systems and applications are described herein. Systems and methods are disclosed herein that process image data generated using multiple cameras located throughout an environment in order to directly determine three-dimensional (3D) information associated with objects located within the environment. For instance, the image data may be processed using one or more feature extractors (e.g., one or more backbones) to determine multi-view image features associated with images represented by the image data. These multi-view image features, along with calibration data associated with the cameras, may then be processed using one or more spatio-temporal transformers (e.g., one or more spatial encoders, one or more temporal encoders, etc.) in order to determine 3D locations of objects within the environment.
In photorealistic image synthesis by light transport simulation, the colors of each pixel are computed by evaluating an integral of a high-dimensional function. In practice, the pixel colors are estimated by using Monte Carlo and quasi-Monte Carlo methods to sample light transport paths that connect light sources and cameras and summing up the contributions to evaluate the integral. Because of the sampling, images appear noisy when the number of samples is insufficient. Due to the lack of information, denoising the shaded images introduces artifacts, for example, blurring the images. Denoising before material shading enables real-time light transport simulation, producing high visual quality even for low sampling rates (avoiding the blurred shading). The light transport integral operator is evaluated by a neural network, requiring data from only a single frame.
Techniques for training a machine learning model to control a robot include performing, based on a first set of data, one or more training operations to generate a first trained machine learning model to control a robot and a trained evaluation model, and performing, based on a second set of data and first feedback generated by the trained evaluation model, one or more training operations to generate a second trained machine learning model to control the robot, where the second set of data is associated with a different set of sensor modalities than the first set of data.
Disclosed are devices, systems, and techniques for evaluation of machine learning models, pipelines of machine learning models, retrieval-augmented generation (RAG) systems, and/or other artificial intelligence systems. Example techniques include receiving, from a client device, an evaluation task to evaluate a language model (LM) using a plurality of evaluation benchmarks (EBs) associated with respective EB datasets and configuring, using an evaluation API, respective sets of evaluation jobs to implement the evaluation task. An individual set of evaluation jobs is configured to evaluate, using the corresponding EB dataset, performance of the LM to obtain a set of evaluation metrics. The techniques further include executing the sets of evaluation jobs to obtain respective sets of evaluation metrics and causing, using the evaluation API, a representation of the sets of evaluation metrics to be provided to the client device.
H04L 41/22 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
87.
MULTI-BENCHMARK PLATFORMS FOR EVALUATION OF MACHINE LEARNING MODELS
Disclosed are devices, systems, and techniques for evaluation of machine learning models, pipelines of machine learning models, retrieval-augmented generation (RAG) systems, and/or other artificial intelligence systems. Example techniques include receiving, from a client device, an evaluation task to evaluate a language model (LM) using a plurality of evaluation benchmarks (EBs) associated with respective EB datasets and configuring, using an evaluation API, respective sets of evaluation jobs to implement the evaluation task. An individual set of evaluation jobs is configured to evaluate, using the corresponding EB dataset, performance of the LM to obtain a set of evaluation metrics. The techniques further include executing the sets of evaluation jobs to obtain respective sets of evaluation metrics and causing, using the evaluation API, a representation of the sets of evaluation metrics to be provided to the client device.
One embodiment of a method for controlling a robot includes receiving sensor data associated with an environment that includes an object; applying a machine learning model to a portion of the sensor data associated with the object and one or more trajectories of motion of the robot to determine one or more path lengths of the one or more trajectories; generating a new trajectory of motion of the robot based on the one or more trajectories and the one or more path lengths; and causing the robot to perform one or more movements based on the new trajectory.
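As a loose illustration of the path-length comparison above — with the machine learning model replaced by direct geometric computation, which is purely a stand-in — candidate trajectories can be ranked by summed segment length:

```python
import math

def path_length(traj):
    """Sum of straight-line segment lengths along a trajectory of waypoints."""
    return sum(math.dist(p, q) for p, q in zip(traj, traj[1:]))

def pick_trajectory(trajectories):
    # the abstract's model *predicts* lengths; here we compute them directly
    return min(trajectories, key=path_length)
```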
In a ray tracer, to prevent any long-running query from hanging the graphics processing unit, a traversal coprocessor provides a preemption mechanism that will allow rays to stop processing or time out early. The example non-limiting implementations described herein provide such a preemption mechanism, including a forward progress guarantee, and additional programmable timeout options that can be time or cycle based. Those programmable options provide a means for quality of service timing guarantees for applications such as virtual reality (VR) that have strict timing requirements.
In various examples, image data may be received that represents an image. Corner detection may be used to identify pixels that may be candidate corner points. The image data may be converted from a higher dimensional color space to a converted image in a lower dimensional color space, and boundaries may be identified within the converted image. A set of the candidate corner points may be determined that are within a threshold distance to one of the boundaries, and the set of the candidate corner points may be analyzed to determine a subset of the candidate corner points representative of corners of polygons. Using the subset of the candidate corner points, one or more polygons may be identified, and a filter may be applied to the polygons to identify a polygon as corresponding to a fiducial marker boundary of a fiducial marker.
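The boundary-distance filtering step in the abstract above can be sketched as follows; the brute-force distance test and the function name are illustrative only.

```python
def near_boundary(candidates, boundary, max_dist):
    """Keep candidate corner points within max_dist of any boundary point."""
    keep = []
    for cx, cy in candidates:
        if any((cx - bx) ** 2 + (cy - by) ** 2 <= max_dist ** 2
               for bx, by in boundary):
            keep.append((cx, cy))
    return keep
```

In a real pipeline the boundary points would come from the lower-dimensional converted image, and the surviving candidates would feed the polygon-fitting stage.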
In various examples, systems and methods are disclosed relating to geometry estimation and dynamic object rendering for vehicle environment visualization. In embodiments, the environment surrounding an ego-machine may be visualized by extracting one or more depth maps from image data, converting the depth map(s) into a 3D surface topology of the surrounding environment, and/or texturizing the detected 3D surface topology with image data. Dynamic objects such as moving vehicles or pedestrians may be detected and masked from a first pass of texturization. Rigid dynamic objects may be visualized by warping corresponding depth values using corresponding trajectories, inserting or fusing the resulting warped 3D representation of each such object into the (e.g., texturized) 3D surface topology, and texturizing the warped 3D representation of each object using corresponding image data. Non-rigid dynamic objects may be represented as flat 2D surfaces and texturized with corresponding image data.
B60R 1/27 - Dispositions de visualisation en temps réel pour les conducteurs ou les passagers utilisant des systèmes de capture d'images optiques, p. ex. des caméras ou des systèmes vidéo spécialement adaptés pour être utilisés dans ou sur des véhicules pour visualiser une zone extérieure au véhicule, p. ex. l’extérieur du véhicule avec un champ de vision prédéterminé fournissant une vision panoramique, p. ex. en utilisant des caméras omnidirectionnelles
Techniques for improving the accuracy of delay line calibration schemes. For example, an amount of offset may be determined between one or more first portions of a first clock signal and one or more second portions of a second clock signal that is delayed relative to the first clock signal. The first portion(s) may correspond to the second portion(s) based at least on the second clock signal being delayed relative to the first clock signal. In some examples, a value may be determined based at least on the amount of offset. The value may correspond to an amount to adjust the first clock signal to reduce the amount of offset. In some examples, a delay line may then be calibrated, based at least on the second value, to adjust the first clock signal.
H03L 7/081 - Details of the phase-locked loop provided with an additional controlled phase shifter
G06F 1/08 - Clock generators with changeable or programmable clock frequency
G06F 1/12 - Synchronisation of different clock signals
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
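A toy closed-loop version of the calibration described in entry 92: measure the offset between the two clock signals, compute an adjustment value, apply it, and repeat until the offset is negligible. The gain, tolerance, and measurement model are all illustrative assumptions.

```python
def calibrate(measure_offset, gain=0.5, tol=1e-3, max_iter=100):
    """Iteratively trim a delay-line setting until the measured offset
    between the first and second clock signals falls below tol."""
    setting = 0.0
    for _ in range(max_iter):
        off = measure_offset(setting)
        if abs(off) < tol:
            break
        setting += gain * off      # adjustment value derived from the offset
    return setting

# toy measurement: a fixed true skew of 0.8 time units minus the applied delay
trimmed = calibrate(lambda s: 0.8 - s)
```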
93.
PERFORMANCE TESTING FOR STEREOSCOPIC IMAGING SYSTEMS AND ALGORITHMS
Approaches presented herein provide for the testing of imaging algorithms and systems. In at least one embodiment, a stereoscopic test pattern can be obtained that includes a number of features that vary in width and separation, such as may comprise a set of radial elements that converge toward a center point. A stereoscopic image of an instance of the pattern can be analyzed, such as at a set of radial positions, to make various measurements, including a limit on the ability to distinguish between different features. A pair of synthetic images of the pattern can be generated in order to test aspects of a stereoscopic algorithm used to generate stereoscopic images, with such testing being separate from the physical system, and a physical object can be generated that includes a representation of the pattern in order to be able to test the physical stereoscopic imaging system.
H04N 13/239 - Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
In various examples, multi-sensor subject tracking for monitored environments in real-time and near-real-time systems and applications is provided. A location system performs multi-subject tracking using streaming data from multiple sensors. Subject tracking may be based on individual anchors and behavior states that are initialized for individual subjects using representations (e.g., behavior embeddings) derived from the streaming data. Clustering may be used to generate behavior clusters that individually represent a trackable subject. Behavior states for live anchors may be identified based on continuity of trajectory and tracked by iteratively propagating their behavior states forward over time. Clusters lacking continuity of trajectory may be used to initialize new anchors, or matched to dormant anchors that may be reclassified as live anchors and propagated. Propagated behavior states may be updated using behavior data represented by the behavior embeddings.
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
95.
DIRECT CONNECT BETWEEN NETWORK INTERFACE AND GRAPHICS PROCESSING UNIT IN SELF-HOSTED MODE IN A MULTIPROCESSOR SYSTEM
Various embodiments include techniques for performing data transfer operations via a direct interconnect between a network interface and a graphics processor in a multiprocessor system that also includes a central processing unit (CPU). The CPU communicates with the graphics processor via a dedicated high-bandwidth interconnect to the memory in the graphics processor and a second interconnect to the graphics processor for various utility functions. The network interface communicates with the graphics processor via an interconnect to the memory in the graphics processor. The interconnect between the network interface and the graphics processor does not impact the throughput of the high-bandwidth interconnect from the CPU to the graphics processor, thereby improving CPU to graphics processor performance. Further, the interconnect between the CPU to the graphics processor does not impact the throughput of the interconnect from the network interface to the graphics processor, thereby improving network interface to graphics processor performance.
This disclosure describes supporting distributed graphics and compute engines in a multi-dielet processor, such as, for example, a multi-dielet graphics processing unit (GPU), architectures and synchronization in such architectures. Each multi-dielet processor includes a hardware-implemented remapping capability and/or a hardware-implemented memory barrier capability.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
97.
HARDWARE ASSISTED PAGE MIGRATION IN A MULTI-DIELET PROCESSING SYSTEM
A hardware mechanism at each dielet of a multi-dielet processing system is aware of engine page-table binds at all the dielets, thereby providing accurate traffic notifications to software (e.g., a unified virtual memory driver) for on-demand page-migration between system memory and GPU memory. The mechanism broadcasts binding information to access counters on each dielet so the access counters are able to correlate engines requesting memory access with bound virtual memory pages and generate corresponding informative notifications. A flexible multi-dielet counter clear capability enables software to clear access counters.
Apparatuses, systems, and techniques to transfer data based, at least in part, on a computational graph are disclosed. In at least one embodiment, a processor causes a compiler to generate instructions to prefetch one or more data values from dynamic random access memory (DRAM) into an in-processor cache based, at least in part, on a computational graph.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/1072 - Decentralised address translation, e.g. in distributed shared memory systems
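One simple way to picture graph-driven prefetching, as in entry 98: given an execution order over the computational graph, schedule the inputs of node i+1 to be prefetched while node i executes. The data layout and names are hypothetical.

```python
def prefetch_schedule(inputs, order):
    """inputs: node -> list of tensors it reads; order: execution order.
    Returns (node, tensors_to_prefetch_during_that_node) pairs, so the
    next node's operands are already in cache when it starts."""
    plan = []
    for i, node in enumerate(order):
        upcoming = inputs[order[i + 1]] if i + 1 < len(order) else []
        plan.append((node, upcoming))
    return plan

plan = prefetch_schedule({"a": ["x"], "b": ["y", "z"], "c": ["w"]},
                         ["a", "b", "c"])
```

A compiler emitting actual prefetch instructions would do the analogous thing at the instruction level, using the graph to know which DRAM-resident values each kernel will read next.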
99.
ADDRESS TRANSLATION SERVICES TO ENABLE MEMORY COHERENCE
A first virtual address is translated into a first physical address using a first translation agent associated with a first I/O device of a system. The first physical address is associated with an address space of the first I/O device. A first address translation request is sent to a second translation agent associated with a CPU of the system. The first address translation request includes the first physical address. A first address translation response is received from the second translation agent. The first address translation response includes a second physical address. The second physical address is associated with an address space of the system.
G06F 12/0815 - Cache consistency protocols
G06F 12/1027 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
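The two-stage translation chain in entry 99 can be sketched with two toy lookup tables: the device-side agent maps the virtual address into the I/O device's address space, and the CPU-side agent maps that result into the system address space. Table contents and class names are illustrative.

```python
class TranslationAgent:
    """Toy translation agent backed by a flat lookup table."""
    def __init__(self, table):
        self.table = table

    def translate(self, addr):
        return self.table[addr]

def coherent_translate(va, device_agent, cpu_agent):
    device_pa = device_agent.translate(va)   # I/O-device address space
    return cpu_agent.translate(device_pa)    # system address space

pa = coherent_translate(0x10,
                        TranslationAgent({0x10: 0x100}),
                        TranslationAgent({0x100: 0x1000}))
```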
Apparatuses, systems, and techniques to perform an application programming interface (API) to indicate whether one or more processors are able to be controlled by two or more drivers concurrently. An API is performed that will indicate whether a compute driver and a graphics driver can concurrently control a processor.