Methods, systems, and non-transitory computer readable storage media are disclosed that utilize machine learning models for patch retrieval and deformation in completing three-dimensional digital shapes. In particular, in one or more implementations, the disclosed systems utilize a machine learning model to predict a coarse completion shape from an incomplete 3D digital shape. The disclosed systems sample coarse 3D patches from the coarse 3D digital shape and learn a shape distance function to retrieve detailed 3D shape patches in the input shape. Moreover, the disclosed systems learn a deformation for each retrieved patch and blending weights to integrate the retrieved patches into a continuous surface.
G06T 17/20 - Wire-frame description, e.g. polygonalisation or tessellation
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
2.
IDENTIFYING AND ALIGNING VIDEO CLIPS FROM LARGE-SCALE VIDEO DATASETS
Embodiments are disclosed for retrieving videos for a semantic and temporal alignment between a pair of video clips. The method may include receiving a query video clip. The method may further include determining alignment ratios between the query video clip and one or more candidate video clips. The method may further include identifying an alignable video clip from the one or more candidate video clips based on the alignment ratios. The method may further include aligning the alignable video clip with the query video clip.
Methods and systems are provided for using Shapley values to evaluate prompt generation parameters. In embodiments described herein, a selection of prompt parameters is accessed. A plurality of prompts is generated as a function of a combination of the prompt parameters. A corresponding quality metric is determined for each of the prompts. Prompt parameter contribution metrics are determined using a Shapley-value-based determination corresponding to a contribution of each of the prompt parameters to the corresponding content quality metric for each of the prompts. The prompt parameter contribution metrics are then displayed.
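The Shapley-value-based determination described above can be illustrated with a minimal sketch. It assumes a quality function that scores any subset of active prompt parameters; the parameter names (`tone`, `examples`, `length`) and the `toy_quality` scores are hypothetical and do not come from the disclosure. The computation enumerates every subset, so it is only practical for a small number of parameters.

```python
from itertools import combinations
from math import factorial


def shapley_values(parameters, quality):
    """Exact Shapley value of each parameter for a set-valued quality function."""
    n = len(parameters)
    values = {}
    for p in parameters:
        others = [q for q in parameters if q != p]
        total = 0.0
        # Average the marginal gain of adding p over all subsets of the others.
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (quality(s | {p}) - quality(s))
        values[p] = total
    return values


def toy_quality(active):
    """Hypothetical quality metric for prompts built from the active parameters."""
    score = 0.0
    if "tone" in active:
        score += 0.2
    if "examples" in active:
        score += 0.5
    if "tone" in active and "examples" in active:
        score += 0.1  # interaction bonus, shared equally by both parameters
    return score


contributions = shapley_values(["tone", "examples", "length"], toy_quality)
for name, value in contributions.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the contributions sum to the quality of the full parameter set minus the quality of the empty set, which is the efficiency property that makes Shapley values convenient for attributing a metric to individual parameters.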
The present disclosure relates to systems, methods, and non-transitory computer-readable media that perform text-to-image editing using executable code generated from natural language text input. For instance, in one or more embodiments, the disclosed systems receive, from a client device, a digital image and natural language text input providing instructions for modifying the digital image. The disclosed systems also generate, using a large language model, executable action code for modifying the digital image in accordance with the instructions of the natural language text input, the executable action code being compatible with an editing application. The disclosed systems further modify the digital image by executing the executable action code via the editing application and provide the modified digital image for display via a graphical user interface of the client device.
In implementation of techniques for sampling light directions on neural materials, a computing device implements a light direction system to receive neural features of a material and an indication of a view direction toward the material. Using a mixture of analytical lobes, a normalizing flow, or a histogram prediction, the light direction system predicts a probability density function (PDF). The light direction system then samples the PDF, calculates prominence values for each of a plurality of candidate light directions based on the PDF, and determines a light direction based on the prominence values.
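As a rough illustration of the histogram option only, the sketch below samples a bin from a one-dimensional histogram by inverting its cumulative distribution. The bin layout, the function names, and the reduction to one dimension are assumptions made for illustration; the sketch does not cover the analytical-lobe or normalizing-flow variants, nor the prominence calculation.

```python
import bisect
import random


def build_cdf(bin_weights):
    """Normalize non-negative bin weights into a cumulative distribution."""
    total = sum(bin_weights)
    if total <= 0:
        raise ValueError("at least one bin weight must be positive")
    cdf, running = [], 0.0
    for w in bin_weights:
        running += w / total
        cdf.append(running)
    cdf[-1] = 1.0  # guard against floating-point drift in the last entry
    return cdf


def sample_bin(cdf, u):
    """Map a uniform number u in [0, 1) to a bin index by inverting the CDF."""
    return bisect.bisect_right(cdf, u)


def bin_density(bin_weights, index, bin_width):
    """Probability density inside a bin, assuming equal-width bins."""
    return bin_weights[index] / (sum(bin_weights) * bin_width)


# Hypothetical histogram over four equal-width direction bins.
weights = [1.0, 0.0, 3.0, 4.0]
cdf = build_cdf(weights)
rng = random.Random(0)
chosen = sample_bin(cdf, rng.random())
print(chosen, bin_density(weights, chosen, bin_width=0.25))
```

Bins with zero weight are never selected, because their cumulative value equals that of the preceding bin.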
The present disclosure relates to systems, non-transitory computer-readable media, and methods for editing shadows in digital images. In particular, in some embodiments, the disclosed systems determine, utilizing a lighting estimation network, an environment map for a digital image, the environment map comprising a dominant light. In addition, in some embodiments, the disclosed systems generate, utilizing a lighting diffusion network, a diffused image from the digital image, the diffused image comprising smoothed shading. Moreover, in some embodiments, the disclosed systems generate, utilizing a shadow synthesis network, a shadowed image from the diffused image and a modified environment map comprising a modified dominant light. Furthermore, in some embodiments, the disclosed systems generate, from the diffused image and the shadowed image, a modified digital image comprising an edited shadow.
In implementation of techniques for three-dimensional reconstructions based on Gaussian primitives, a computing device implements a reconstruction system to receive a first digital image depicting an object from a first angle and a second digital image depicting the object from a second angle. The reconstruction system segments the first digital image and the second digital image into patches. The reconstruction system then generates, using a machine learning model, three-dimensional Gaussian primitives that predict parameters of points of the object in a three-dimensional space that correspond on a per-pixel basis to pixels of the patches. The reconstruction system then forms a three-dimensional reconstruction of the object for display in a user interface by merging the three-dimensional Gaussian primitives.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that upscale AI-generated digital content via tile-based super resolution. For instance, in one or more embodiments, the disclosed systems determine a first set of tiles from a digital image having a set of pixels to be replaced with a generated content portion. The disclosed systems further determine a second set of tiles from a first modified digital image that corresponds to the digital image and includes the generated content portion at a first resolution. Based on the first set of tiles and the second set of tiles, the disclosed systems use a super resolution neural network to generate a second modified digital image that corresponds to the digital image and includes the generated content portion at a second resolution that is higher than the first resolution.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
G06T 3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
10.
Interactive Network for Selecting, Ranking, Summarizing, and Exploring Data Insights
Insight summary and prompt generation techniques are described. In one or more examples, a plurality of insights is generated from data extracted from digital content. A network representation is produced having a plurality of nodes based on the plurality of insights and a plurality of connections between corresponding insights. A selection is received of a subset of nodes from the plurality of nodes. A prompt is formed by grouping respective insights from the subset of nodes. An insight summary of the digital content is generated based on the prompt using generative artificial intelligence as implemented using one or more machine-learning models. The insight summary is then presented for output in a user interface.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that detect, remove, and synthesize shadows in a joint framework. In particular, the disclosed systems access an object mask of an object and a digital image depicting the object and a shadow of the object. Furthermore, the disclosed systems perform object-centered shadow detection and removal to generate a modified digital image without the shadow by utilizing a shadow analyzer model. Moreover, the disclosed systems receive a user interaction to manipulate an object and generate a modified shadow utilizing a shadow synthesis model, where the shadow synthesis model is conditioned on a shadow mask generated by the shadow analyzer model.
Systems and methods for natural language processing are described. Embodiments of the present disclosure identify a task set including a plurality of pseudo tasks, wherein each of the plurality of pseudo tasks includes a support set corresponding to a first natural language processing (NLP) task and a query set corresponding to a second NLP task; update a machine learning model in an inner loop based on the support set; update the machine learning model in an outer loop based on the query set; and perform the second NLP task using the machine learning model.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image and a text prompt including an image modification request, generating a text response based on the input image and the text prompt, where the text response describes a modification to the input image corresponding to the image modification request, and generating a synthetic image based on the input image and an output embedding of a language generation model, where the synthetic image depicts the modification to the input image.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text prompt and a noise input, and then generating a synthetic image based on the text prompt and the noise input by performing a single pass with an image generation model. The image generation model is trained based on a multi-term loss comprising a positive term based on an output of a pre-trained model, and a negative term based on an output of a jointly-trained model.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating digital images with a diffusion-based generative neural network conditioned on background-extracted lighting features. The disclosed system determines, in response to a request to generate a digital image, a target background image for inserting a foreground object into the target background image. The disclosed system generates, from the target background image and utilizing a lighting conditioning neural network, a lighting feature representation indicating one or more lighting parameters of the target background image. Additionally, the disclosed system generates, utilizing a diffusion-based generative neural network conditioned on the lighting feature representation, the digital image including the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object.
G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
G06T 5/77 - Retouching; Inpainting; Scratch removal
G06V 10/56 - Extraction of image or video features relating to colour
G06V 10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
H04N 5/272 - Means for inserting a foreground image in a background image, i.e. inlay, outlay
17.
Techniques for Triangle-level Rejection Sampling in Three-dimensional Object Meshes
A graphics generation computing device applies triangle-level rejection sampling to generate a set of surface mesh point samples. A highly parallelized processor included in the graphics generation computing device generates a triangle-level sampling array that includes triangle-level sampling data for each triangle included in a 3D object mesh. Based on the data in the triangle-level sampling array, the highly parallelized processor determines a quantity of point samples in each triangle. The highly parallelized processor calculates, for each point sample, point sample location data that indicates a location of the point sample on a triangle. The highly parallelized processor modifies a set of point samples to include the location data. In some cases, the set of point samples is used to generate digital fibers or other structure data objects at the point sample locations indicated by the set of point samples.
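The per-triangle steps above (decide how many samples each triangle receives, then place each sample on its triangle) can be sketched sequentially as follows. This is an illustrative assumption rather than the disclosed method: the sample count is taken as triangle area times a hypothetical `density`, with the fractional remainder accepted or rejected at random, and sample locations use uniform barycentric coordinates. The parallel processing and the sampling-array layout described in the abstract are not modeled.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle from the cross product of two edge vectors."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_count(area, density, rng):
    """Whole samples plus one extra, accepted with the fractional probability."""
    expected = area * density
    base = math.floor(expected)
    return base + (1 if rng.random() < expected - base else 0)


def sample_point(a, b, c, rng):
    """Uniform point on the triangle; reflect (u, v) back inside if needed."""
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    return tuple(a[i] + u * (b[i] - a[i]) + v * (c[i] - a[i]) for i in range(3))


def sample_mesh(triangles, density, seed=0):
    """Return (triangle index, point) pairs for every triangle in the mesh."""
    rng = random.Random(seed)
    samples = []
    for index, (a, b, c) in enumerate(triangles):
        count = sample_count(triangle_area(a, b, c), density, rng)
        for _ in range(count):
            samples.append((index, sample_point(a, b, c, rng)))
    return samples


mesh = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
print(len(sample_mesh(mesh, density=10.0)))
```

Because each triangle's count and point locations depend only on that triangle's own data, the loop body is the kind of work that could be distributed across parallel threads.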
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image and an input mask, where the input image depicts a scene and the input mask indicates an inpainting region of the input image. A latent code is generated, using a generator network of an image generation model, based on the input image and the input mask. The latent code includes synthesized content in the inpainting region. A synthetic image is generated, using a decoder network of the image generation model, based on the latent code and the input image. The synthetic image depicts the scene from the input image outside the inpainting region and includes the synthesized content within the inpainting region, and the synthetic image comprises a seamless transition across a boundary of the inpainting region.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for removing objects from an image stream at capture time of a digital image. For example, the disclosed system contemporaneously detects and segments objects from a digital image stream being previewed in a camera viewfinder graphical user interface of a client device. The disclosed system removes selected objects from the image stream and fills a hole left by the removed object with a content aware fill. Moreover, the disclosed system displays the image stream with the removed object and content fill as the image stream is previewed by a user prior to capturing a digital image from the image stream.
A method, non-transitory computer readable medium, apparatus, and system for data processing include obtaining, by a multi-touch attribution model, individual-level user interaction data from a digital content channel, and computing, using the multi-touch attribution model, channel contribution data based on the individual-level user interaction data. Some embodiments include training, using a training component, an aggregate attribution model based on the channel contribution data. Some embodiments include generating, using a calibration component, an individual channel contribution value for the digital content channel based on the channel contribution data and the aggregate attribution model.
In some embodiments, a computing system receives a representation of an object from a client device. The computing system generates a contact representation for hand-object interaction based on the representation of the object. The object-centric contact representation includes a contact map indicating contact points on the representation of the object, a hand part map indicating hand parts contacting the object, and a direction map comprising contact directions of the hand parts contacting the object. The computing system generates a hand grasp representation with respect to the object based on the contact representation using a model-based optimization algorithm. The computing system provides the hand grasp representation to the client device.
G06F 30/23 - Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
G06T 17/10 - Volume description, e.g. cylinders, cubes or using CSG [Constructive Solid Geometry]
22.
Relightable Scene Reconstructions Using Radiance Guided Material Extraction
Techniques for relightable scene reconstructions using radiance guided material extraction are described to accurately render 3D scenes under different lighting conditions and perspectives than original source images from which the scenes are constructed. In an example, a processing device is operable to receive a plurality of digital images that depict a scene from multiple perspectives, determine a view-independent radiance of the scene based on the plurality of digital images, and determine a view-dependent radiance of the scene based on the plurality of digital images. The processing device is further operable to determine a set of lighting conditions associated with an input perspective, generate a synthesized image having a reconstruction of the scene based on the set of lighting conditions using the view-independent radiance and the view-dependent radiance, and output the synthesized image.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating and modifying databases using a fairness deduplication algorithm. In particular, in one or more embodiments, the disclosed systems generate, within an embedding space, semantic embeddings from a plurality of digital images stored in a database. In some embodiments, the disclosed systems identify, from among the semantic embeddings in the embedding space, a preservable embedding according to a preservation prototype indicating a semantic concept to preserve within the database. In one or more embodiments, the disclosed systems generate a modified database by pruning one or more digital images corresponding to semantic embeddings other than the preservable embedding from the database.
G06T 11/60 - Editing figures and text; Combining figures or text
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
24.
GENERATING VISUALLY AWARE DESIGN LAYOUTS USING A MULTI-DOMAIN DIFFUSION NEURAL NETWORK
The present disclosure relates to systems, methods, and non-transitory computer readable media that generate layouts for digital designs from image elements via multi-domain diffusion. For instance, in some embodiments, the disclosed systems receive, from a client device, a plurality of image elements for generating a digital design. The disclosed systems generate, using an encoder of a multi-domain diffusion neural network, embeddings representing visual characteristics and bounding box characteristics of the plurality of image elements. The disclosed systems further generate, using the multi-domain diffusion neural network, a layout for the digital design from the visual characteristics and bounding box characteristics of the embeddings. Additionally, the disclosed systems provide the layout for display on the client device.
Generative artificial intelligence visual effect techniques are described. A prompt, for example, is received. The prompt includes text specifying a visual effect and text specifying a shape. A mask is formed defining a portion of digital content based on an object selected from digital content. The visual effect is generated using generative artificial intelligence by one or more machine-learning models based on the text specifying the visual effect, the text specifying the shape, and the mask. The digital content is presented as having the visual effect applied to the portion of the digital content for display in a user interface.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image and an input mask, wherein the input mask indicates a region of the input image to be modified and generating, using a first image generation model, an intermediate result based on the input image and the input mask, wherein the intermediate result modifies the region of the input image indicated by the input mask. A second image generation model generates a synthetic image based on the input image and the intermediate result, wherein the synthetic image depicts the input image with content from the modified region at a higher level of detail than the intermediate result.
G06V 10/774 - Generating sets of training patterns; Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a pattern prompt and a text image, where the pattern prompt describes a visual pattern and the text image depicts text, generating a pattern image based on the pattern prompt, where the pattern image depicts the visual pattern, and generating a patterned text image based on the pattern image and the pattern prompt.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input prompt and retrieving an intermediate noise state based on a similarity between the input prompt and a candidate prompt corresponding to the intermediate noise state. An image generation model generates a synthetic image based on the input prompt and the intermediate noise state.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input texture image and a plurality of image masks, generating a plurality of image assets corresponding to the plurality of image masks based on the input texture image, and generating a combined asset including the plurality of image assets. The plurality of image assets have a consistent texture based on the input texture image.
Embodiments described herein provide methods and systems for facilitating actively-learned context modeling. In one embodiment, a subset of data is selected from a training dataset corresponding with an image to be compressed, the subset of data corresponding with a subset of data of pixels of the image. A context model is generated using the selected subset of data. The context model is generally in the form of a decision tree having a set of leaf nodes. Entropy values corresponding with each leaf node of the set of leaf nodes are determined. Each entropy value indicates an extent of diversity of context associated with the corresponding leaf node. Additional data from the training dataset is selected based on the entropy values corresponding with the leaf nodes. The updated subset of data is used to generate an updated context model for use in performing compression of the image.
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being bits, e.g. of the compressed video stream
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/91 - Entropy coding, e.g. variable length coding or arithmetic coding
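For the actively-learned context modeling entry above, the entropy-guided selection step can be sketched as follows. The sketch assumes each leaf node of the decision tree is represented simply by the list of symbols that reached it, and that leaves whose Shannon entropy exceeds a hypothetical `threshold` are the ones for which additional training data is requested. The tree construction and the compression itself are outside the sketch.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical symbol distribution."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def leaves_to_refine(leaf_symbols, threshold):
    """Leaves whose contexts are diverse enough to justify more training data."""
    return [
        leaf
        for leaf, symbols in leaf_symbols.items()
        if symbols and shannon_entropy(symbols) > threshold
    ]


# Hypothetical leaves: leaf name -> pixel symbols observed at that leaf.
leaves = {
    "leaf_a": [0, 0, 0, 0],  # a single symbol, entropy 0 bits
    "leaf_b": [0, 1, 0, 1],  # two equally likely symbols, entropy 1 bit
    "leaf_c": [0, 1, 2, 3],  # four equally likely symbols, entropy 2 bits
}
print(leaves_to_refine(leaves, threshold=0.5))
```

A leaf with low entropy already predicts its pixels well, so adding samples there is unlikely to improve the context model; high-entropy leaves are the natural candidates for further sampling.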
Methods and systems are provided for facilitating document collaboration in accordance with collaboration controls. In embodiments, an indication of a collaboration control for a collaborator of a document is obtained. The collaboration control generally indicates an edit permission for a document section of the document in relation to the collaborator. Thereafter, a set of collaboration control data for the document is generated. In embodiments, the set of collaboration control data includes the collaboration control indicating the edit permission for the document section of the document in relation to the collaborator. Based on an input (e.g., edit) by the collaborator to the document section of the document, a determination is made, using the set of collaboration control data, as to whether to enable an edit to the document section of the document.
H04L 65/401 - Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time-sensitive sessions, e.g. white board sharing or spawning of a subconference
G06F 40/166 - Editing, e.g. inserting or deleting
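For the document collaboration entry above, one possible shape for the collaboration control data is a lookup keyed by collaborator and document section. The class name, field names, and the deny-by-default behavior for missing entries are assumptions made for illustration, not details from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollaborationControl:
    """Edit permission for one collaborator on one document section."""
    collaborator: str
    section: str
    can_edit: bool


def build_control_data(controls):
    """Index the controls by (collaborator, section) for fast lookup."""
    return {(c.collaborator, c.section): c.can_edit for c in controls}


def is_edit_enabled(control_data, collaborator, section, default=False):
    """Decide whether an edit is enabled; unknown pairs use the default."""
    return control_data.get((collaborator, section), default)


controls = [
    CollaborationControl("alice", "introduction", True),
    CollaborationControl("bob", "introduction", False),
]
control_data = build_control_data(controls)
print(is_edit_enabled(control_data, "alice", "introduction"))
print(is_edit_enabled(control_data, "bob", "introduction"))
```

Denying edits when no control exists is the conservative choice; a system that treats sections as open unless restricted would pass `default=True` instead.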
A method, apparatus, non-transitory computer readable medium, and system for data processing include obtaining a text prompt and generating a first intermediate noise state based on the text prompt, retrieving a second intermediate noise state based on the text prompt and the first intermediate noise state, and generating a synthetic image based on the text prompt and the second intermediate noise state.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for hierarchical entity segmentation. In particular, in one or more embodiments, the disclosed systems receive a digital image comprising a plurality of object entities. In addition, in some embodiments, the disclosed systems generate, utilizing a segmentation model comprising parameters generated according to pseudo-labels indicating hierarchies of segmentation masks for a set of training digital images, a hierarchical segmentation indicating hierarchical relations of the plurality of object entities of the digital image. Moreover, in some embodiments, the disclosed systems generate, for the digital image, a segmentation map from the hierarchical segmentation of the plurality of object entities.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
34.
GENERATING DIGITAL CONTENT CONSISTENT WITH CONTEXT-SPECIFIC GUIDELINES UTILIZING PROMPT AUGMENTATION AND MODEL TUNING
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a contextual content generation system that trains and implements a unique machine learning architecture to generate context-specific digital content items based on a digital guideline document. In particular, the disclosed systems select a content generation method from among prompt engineering and/or updating one or more machine learning models to generate digital content. For example, the disclosed systems utilize machine learning models to extract key elements from a digital guideline document comprising context-specific guidelines for digital content. Further, the disclosed systems generate an augmented prompt comprising indications of key elements from the digital guideline document. In addition, the disclosed systems select a content generation method from among prompt engineering and/or updating machine learning models to generate the digital content item which incorporates digital content corresponding to the context-specific guidelines based on the augmented prompt.
The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement an image filter for enhancing light text and removing document shadows. In particular embodiments, the disclosed systems use a modified adaptive thresholding approach that relies on image gradients to efficiently guide the thresholding process. In addition, the disclosed systems use a machine-learning model to generate a document shadow map. The document shadow map can include text reflections. Accordingly, the disclosed systems remove text reflections from the document shadow map (e.g., by using an interpolated shadow intensity value of neighboring shadow map pixels). In turn, the disclosed systems use the document text mask and the document shadow map cleaned of text reflections to remove shadows from the digital image. Further, the disclosed systems enhance text in the shadow-removed digital image based on contrast stretching.
G06T 5/40 - Image enhancement or restoration using histogram techniques
G06T 5/60 - Image enhancement or restoration using machine learning, e.g. neural networks
G06T 5/92 - Dynamic range modification of images or parts thereof based on global image properties
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
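The contrast stretching mentioned in the light-text and shadow-removal entry above is, in its simplest form, a linear remapping of an intensity range onto the full output range. The sketch below assumes 8-bit grayscale values held in a flat list and caller-supplied `lo` and `hi` bounds; how the disclosed systems choose those bounds is not stated in the abstract.

```python
def contrast_stretch(pixels, lo, hi):
    """Linearly map intensities in [lo, hi] onto [0, 255], clipping outliers."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    stretched = []
    for p in pixels:
        value = round((p - lo) * 255.0 / (hi - lo))
        stretched.append(min(255, max(0, value)))
    return stretched


# Hypothetical low-contrast scanline: faint text on a light grey page.
scanline = [150, 160, 175, 200, 210]
print(contrast_stretch(scanline, lo=150, hi=200))
```

Values below `lo` clamp to 0 and values above `hi` clamp to 255, which makes faint strokes darker relative to the page background.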
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a style kit including a first image generation input indicating a first image attribute, a second image generation input indicating a second image attribute, and a selectability parameter indicating that the second image generation input is selectable. A third image generation input is received from a user based on the selectability parameter, wherein the third image generation input indicates a third image attribute different from the second image attribute of the second image generation input. An image generation model generates a synthetic image based on the style kit, the first image generation input, and the third image generation input, wherein the synthetic image has the first image attribute and the third image attribute.
Methods and systems are provided for using reinforcement learning to recommend data visualizations. In embodiments described herein, statistical features for each sample of corresponding samples of a dataset are determined by applying each sample of the dataset to a data visualization recommendation model. The computational cost of each of the statistical features for each of the samples is determined via a regression model. Recommended statistical features are determined by sequentially applying each sample to a reinforcement learning model with a computational budget and with the corresponding computational costs of the statistical features of each sample. A data visualization is then displayed that is generated by applying the dataset and the recommended statistical features to the data visualization recommendation model.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining an input image depicting an entity and a skeleton map depicting a pose of the entity and performing a cross-attention mechanism between image features of the input image and entity features representing the pose to obtain modified image features. An output image that depicts the entity with the pose is generated based on the modified image features.
Generative artificial intelligence (AI) content strategy techniques are described. In one or more examples, a content brief is received describing a goal to be achieved in controlling digital content output. Content brief data is extracted from the content brief and a content strategy is generated based on the content brief data using generative artificial intelligence implemented using one or more machine-learning models.
Aspects and features of the present disclosure relate to providing injective three-dimensional (3D) deformations based on two-dimensional (2D) mesh deformations. For example, a method involves defining at least one 2D mesh deformation based on a designated position of an object represented by an input neural radiance field (NeRF). The method also involves applying the 2D mesh deformation(s) to a 3D piecewise-linear map that operates over a plane and preserves a normal direction to produce prismatic maps. The method further involves composing a 3D deformation for the object from layers defined by the prismatic maps, and parameterizing the 3D piecewise-linear map. The method additionally involves storing or rendering, using the 3D piecewise-linear map, a deformed NeRF injectively representing the object in the designated position. Aspects also include computer systems, apparatus, and computer programs configured to perform the method.
Embodiments are disclosed for correlating video sequences and audio sequences by a media recommendation system using a trained encoder network. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training input including a media sequence, including a video sequence paired with an audio sequence, segmenting the media sequence into a set of video sequence segments and a set of audio sequence segments, extracting visual features for each video sequence segment and audio features for each audio sequence segment, generating, by transformer networks, contextualized visual features from the extracted visual features and contextualized audio features from the extracted audio features, the transformer networks including a visual transformer and an audio transformer, generating predicted video and audio sequence segment pairings based on the contextualized visual and audio features, and training the visual transformer and the audio transformer to generate the contextualized visual and audio features.
G06V 10/774 - Generating sets of training patterns; Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 20/40 - Scenes; Scene-specific elements in video content
G10L 25/03 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the type of extracted parameters
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of the groups, specially adapted for particular use for comparison or discrimination for processing of video signals
42.
ENHANCING ARTIFICIAL INTELLIGENCE RESPONSES WITH CONTEXTUAL USAGE INSIGHTS
Some aspects relate to technologies for an artificial intelligence (AI) system that, among other things, enhances responses to concepts questions for an application with contextual usage insights. In accordance with some aspects, a user query is determined to comprise a concepts question regarding an application. Responsive to determining the user query comprises the concepts question, documentation regarding the application relevant to the user query is identified. A generative model generates text for a response to the concepts question using the documentation regarding the application. Additionally, a determination is made to add contextual usage insights to the response. Responsive to determining to add contextual usage insights to the response, usage data relevant to the user query and/or the response is retrieved. The generative model generates text for a final response using the response and the usage data, and the final response is provided to a user device for presentation.
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a digital design interface for intuitively creating custom arrows that demonstrate both visual consistency and inherent directionality within vector-based design applications. In particular, in one or more implementations, the disclosed systems receive a request to create a custom arrow from a digital object and a path segment. In addition, the disclosed systems detect that the digital object is within a threshold distance of the path segment and combine the digital object with the path segment to create a custom arrow object. In particular, the disclosed systems utilize a bilateral segmentation machine-learning model to segment the digital object and a symmetry axis detection model to determine an axis of symmetry of the digital object. Moreover, the disclosed systems attach the digital object to an endpoint of the path segment at the axis of symmetry.
41 - Education, entertainment, sporting and cultural activities
Goods and services
Educational and training services; educational and training services in the form of classroom training, online training, web based training, and video training in the fields of computer software, cloud computing, desktop publishing, digital publishing, electronic publishing, graphic design, marketing, advertising, analytics, e-commerce, digital asset management, data management, business management, business process management, business document and forms creation, and automation of business document and forms processing and workflow; educational services; educational services in the form of arranging professional workshops and training courses, conducting classes, seminars, conferences, and workshops in the fields of computer software, cloud computing, desktop publishing, digital publishing, electronic publishing, graphic design, marketing, advertising, analytics, e-commerce, digital asset management, data management, business management, business process management, business document and forms creation, and automation of business document and forms processing and workflow; educational and training sessions in the field of organization and business matters relating to creative professionals.
45.
EDITING DIGITAL IMAGES WITH LOCAL REFINEMENT VIA SELECTIVE FEATURE TRIMMING
Methods, systems, and non-transitory computer readable storage media are disclosed for modifying digital images via a generative neural network with local refinement. The disclosed system generates, utilizing an encoder neural network, a latent feature vector of a digital image by encoding global context information of the digital image into the latent feature vector. The disclosed system also determines a modified latent feature vector by trimming the latent feature vector to a feature subset corresponding to a masked portion of the digital image. Additionally, the disclosed system generates, utilizing a generative decoder neural network on the modified latent feature vector, digital image data corresponding to the masked portion of the digital image. The disclosed system also generates a modified digital image including the digital image data corresponding to the masked portion combined with additional portions of the digital image.
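The trimming step in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the latent features are laid out on a spatial grid with the same resolution as a binary mask; the abstract does not specify the layout, and the name `trim_latent` is hypothetical.

```python
def trim_latent(latent, mask):
    """Keep only the latent features that fall inside the masked region.

    latent: H x W grid (list of lists) of feature vectors (assumed layout).
    mask:   H x W grid of 0/1 flags, 1 marking the masked portion.
    Returns a list of ((row, col), feature) pairs for the masked cells.
    """
    kept = []
    for r, mask_row in enumerate(mask):
        for c, flag in enumerate(mask_row):
            if flag:
                kept.append(((r, c), latent[r][c]))
    return kept


# Toy 2 x 2 latent grid with 2-dimensional features.
latent = [[[1, 1], [2, 2]],
          [[3, 3], [4, 4]]]
mask = [[0, 1],
        [1, 0]]
subset = trim_latent(latent, mask)
```

In this sketch a decoder would receive only `subset`, so its work scales with the size of the masked region rather than with the full image.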
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating digital posters from digital documents with multimodal content using a deep submodular function. Specifically, the disclosed systems generate embedding vectors representing multimodal content of a digital document comprising text and images. Further, disclosed systems determine, utilizing a deep submodular function on the embedding vectors, a content subset comprising one or more digital images aligned with one or more text segments representative of the digital document. Moreover, the disclosed systems generate, utilizing a large language model, a summary of the multimodal content of the digital document from a prompt based on the content subset. Additionally, the disclosed systems generate, for display at a client device, a digital poster comprising the summary of the multimodal content generated via the large language model.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining an input image and an input prompt, where the input image depicts an object and the input prompt describes a lighting condition for the object, generating relighted image features based on the input image and the input prompt, where the relighted image features represent the object with the lighting condition, and generating a synthetic image based on the relighted image features, where the synthetic image depicts the object with the lighting condition.
Embodiments are disclosed for using a neural network to optimize filter weights of an adaptive filter. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving, by a filter, an input audio signal, wherein the input audio signal is a far-end audio signal, the filter including a transfer function with adaptable filter weights, generating a response audio signal modeling the input audio signal passing through an acoustic environment, receiving a target response signal, including the input audio signal and near-end audio signals, calculating an adaptive filter loss, generating, by a trained recurrent neural network, a filter weight update using the calculated adaptive filter loss, updating the adaptable filter weights of the transfer function to create an updated transfer function, generating an updated response audio signal based on the updated transfer function, and providing the updated response audio signal as an output audio signal.
G10L 21/0224 - Processing in the time domain
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterised by the analysis technique using neural networks
49.
DOCUMENT BOUNDARY DETECTION USING THE CURVATURE OF TEXT LINES
Embodiments are disclosed for using the curvature of text lines to detect a document boundary. The method may include receiving a warped image depicting a page of a document having an incomplete document boundary, the page including a plurality of text lines. A complete document boundary may be identified based on the incomplete document boundary and the plurality of text lines. A dewarped image corresponding to the warped image may be determined using the complete document boundary. The dewarped image may then be provided for display on a client device.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input prompt describing an image element, generating, using an image generation model, an output image depicting the image element and including a watermark, and identifying a training image as a source of the output image based on the watermark. The image generation model is trained using the training image, which includes the image element and the watermark.
In one aspect, a computer-implemented method includes accessing, by a guidance module of an analysis application executing on a processor, wildcard data associated with data in a data repository. The method further includes displaying, by the guidance module based on the wildcard data, one or more wildcard elements in a graphical user interface (GUI). The method further includes receiving, by the analysis application, selection of a first wildcard element of the one or more wildcard elements. The method further includes displaying, by the guidance module, a suggestion based on the selection of the first wildcard element.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating digital images via a generative neural network with localized constraints. The disclosed system generates, utilizing one or more encoder neural networks, a sequence of embeddings comprising a prompt embedding representing a text prompt and an object text embedding representing a phrase indicating an object in the text prompt. The disclosed system generates, utilizing the one or more encoder neural networks, a visual embedding representing an object image corresponding to the object. The disclosed system determines a modified sequence of embeddings by replacing the object text embedding with the visual embedding in the sequence of embeddings. The disclosed system also generates, utilizing a generative neural network, a synthetic digital image from the modified sequence of embeddings comprising the visual embedding.
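The embedding substitution described above can be sketched as a simple list operation. This is an illustrative sketch only: it assumes the sequence is a list of embedding vectors and that the object phrase occupies a known contiguous span; `replace_object_embedding` and the toy vectors are hypothetical and not taken from the disclosure.

```python
def replace_object_embedding(sequence, object_span, visual_embedding):
    """Return a new sequence in which the embeddings at object_span
    (start inclusive, end exclusive) are replaced by one visual embedding."""
    start, end = object_span
    return sequence[:start] + [visual_embedding] + sequence[end:]


# Toy sequence: a prompt embedding followed by an object text embedding.
prompt_embedding = [0.1, 0.2]
object_text_embedding = [0.3, 0.4]
visual_embedding = [0.9, 0.8]   # stands in for an encoded object image

sequence = [prompt_embedding, object_text_embedding]
modified = replace_object_embedding(sequence, (1, 2), visual_embedding)
```

The original `sequence` is left untouched, so the text-only conditioning stays available for comparison.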
The present disclosure relates to systems, methods, and non-transitory computer-readable media that selectively utilize an image super-resolution model to upscale image patches corresponding to high-frequency portions. In particular, the disclosed systems select a set of image patches corresponding to high-frequency portions of a digital image at a first resolution. Furthermore, the disclosed systems utilize an image super-resolution model to generate upscaled image patches for the set of image patches of the high-frequency portions to a second resolution higher than the first resolution according to an upscaling factor of at least two. The disclosed systems generate a segmentation map of the digital image based on the upscaled image patches and an upscaled segmentation corresponding to low-frequency portions of the digital image. Further, the disclosed systems generate a vectorized digital image for the digital image according to the segmentation map.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
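The patch-selection step in the super-resolution abstract above can be sketched as follows. The abstract does not say how high-frequency portions are detected, so this sketch assumes per-patch intensity variance as a stand-in measure; the function name, patch size, and threshold are all hypothetical.

```python
def select_high_frequency_patches(image, patch, threshold):
    """Return (top, left) corners of non-overlapping patch x patch tiles
    whose intensity variance exceeds threshold.

    image: 2D list of grayscale values. Variance is an assumed proxy for
    high-frequency content, not the measure used in the disclosure.
    """
    height, width = len(image), len(image[0])
    selected = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            values = [image[r][c]
                      for r in range(top, top + patch)
                      for c in range(left, left + patch)]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            if variance > threshold:
                selected.append((top, left))
    return selected


# 4 x 4 image: only the top-right tile contains a checkerboard pattern.
image = [[0, 0, 0, 1],
         [0, 0, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
corners = select_high_frequency_patches(image, patch=2, threshold=0.1)
```

Only the selected tiles would then be passed to the super-resolution model, while flat tiles could be upscaled by a cheaper method.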
Three dimensional aware video compositing techniques are described. In one or more examples, subject data is produced that defines a subject depicted in frames of a subject video and viewpoint data describing movement of a viewpoint with respect to the frames of the subject video. Three-dimensional data is formed that defines a three-dimensional representation of an environment depicted in frames of an environment video. A composited video is generated by aligning the environment with the movement of the viewpoint of the subject based on the subject data and the three-dimensional data, which is then rendered, e.g., presented for display in a user interface.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a structural input indicating a target spatial structure, encoding, using a condition encoder, the structural input to obtain a structural encoding representing the target spatial structure, and generating, using an image generation model, a synthetic image based on the structural encoding, where the synthetic image depicts an object having the target spatial structure.
A method, apparatus, non-transitory computer readable medium, and system for text-to-color palette generation include encoding a text prompt to obtain a text embedding. A color embedding is generated based on the text embedding by performing a diffusion process. Then a color palette is generated based on the color embedding. The color palette includes a plurality of colors corresponding to the text prompt.
Methods, non-transitory computer readable media, apparatuses, and systems for data processing include obtaining, by a machine learning model, a user cluster and interaction data for users in the user cluster, where the interaction data relates to interactions between the users and a digital platform. Some embodiments further include generating, by the machine learning model, a directed graph based on the user cluster and the interaction data, where the directed graph represents causal relationships among the interactions. Some embodiments further include updating, by the machine learning model, the user cluster based on the directed graph. Some embodiments further include providing, by a content component, customized content to a user via the digital platform based on the updated user cluster.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for modifying a digital design by performing a selective object-level undo operation. In one or more embodiments, the disclosed systems generate a modified object by performing a series of operations on an object depicted within the digital design. In some embodiments, the disclosed systems receive a request for a selective object-level undo operation on the modified object, wherein the request specifies an operation to undo from among the series of operations performed on the object. In one or more embodiments, the disclosed systems modify the modified object by performing the selective object-level undo operation on the modified object to undo the operation from among the series of operations. In some embodiments, the disclosed systems provide an updated digital design depicting the modified object reflecting modifications from the series of operations excluding the operation undone by the selective object-level undo operation.
G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
G06F 3/04842 - Selection of displayed objects or displayed text elements
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06F 30/12 - Geometric CAD characterised by design entry means specially adapted for CAD, e.g. graphical user interfaces [GUI] specially adapted for CAD
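One way to picture the selective object-level undo described in the abstract above is to keep the object's original state plus its list of operations, and rebuild the object by replaying every operation except the one being undone. The abstract does not state that replay is the mechanism used, so this is only a sketch under that assumption; the operation helpers (`scale`, `recolor`, `move`) are hypothetical.

```python
def replay(base, operations, skip_index=None):
    """Rebuild an object's state from base, applying operations in order
    and omitting the one at skip_index (the operation to undo)."""
    state = dict(base)
    for index, operation in enumerate(operations):
        if index == skip_index:
            continue
        state = operation(state)
    return state


# Hypothetical operations; each returns a new state dictionary.
def scale(factor):
    return lambda s: {**s, "width": s["width"] * factor,
                      "height": s["height"] * factor}

def recolor(color):
    return lambda s: {**s, "fill": color}

def move(dx, dy):
    return lambda s: {**s, "x": s["x"] + dx, "y": s["y"] + dy}


base = {"x": 0, "y": 0, "width": 10, "height": 10, "fill": "red"}
history = [scale(2), recolor("blue"), move(5, 0)]

modified = replay(base, history)                  # all three operations
undone = replay(base, history, skip_index=1)      # undo only the recolor
```

Here `undone` keeps the scaling and the move but reverts the fill, which is the behaviour the abstract describes: later operations survive even though an earlier one is removed.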
Digital image visual aesthetic score generation techniques are described. In one or more examples, these techniques are implemented by a system including a training data collection module implemented by a processing device to collect training data including training digital images and user interaction data describing user interaction with the training digital images, respectively. A training module is configured to train a machine-learning model using the training data to generate an aesthetic score based on an input digital image. The aesthetic score is configured to specify an amount of visual aesthetics exhibited by the input digital image.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
60.
CONTROLLABLE VISUAL TEXT GENERATION WITH ADAPTER-ENHANCED DIFFUSION MODELS
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text content image and a text style image. The text content image is encoded to obtain content guidance information and the text style image is encoded to obtain style guidance information. Then a synthesized image is generated based on the content guidance information and the style guidance information. The synthesized image includes text from the text content image having a text style from the text style image.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that iteratively generate, utilizing a machine learning model, text responses to reduce hallucinated content. In particular, in some embodiments, the disclosed systems receive a digital query and select one or more supporting digital documents for the digital query. Furthermore, in some embodiments the disclosed systems generate a first text response from a first text prompt generated by using the digital query. Moreover, in some embodiments the disclosed systems extract a misalignment portion of the first text response by comparing the first text response and the one or more supporting digital documents. Additionally, from the misalignment portion of the first text response and the digital query, the disclosed systems further generate a second text response.
G06F 16/383 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
A method, apparatus, non-transitory computer readable medium, and system for generating suggested prompts include obtaining a sequence of text prompts associated with a user and determining a session concept for the user based on the sequence of text prompts. Embodiments then generate, using a prompt generation model, an image generation prompt based on the sequence of text prompts and the session concept. Subsequently, embodiments generate, using an image generation model, a synthetic image based on the image generation prompt.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods and services
Downloadable software using artificial intelligence for collecting, compiling, converting, organizing, consolidating, collaborating on, sharing, and editing files, links, notes, and documents and for creating an information hub; downloadable assistant and chatbot software using artificial intelligence for preparing insights, notes, and citations based on document content and user input and for collaborating on or sharing the same with other users; downloadable software using artificial intelligence for content generation and management
Software as a service (SAAS) services featuring software using artificial intelligence for collecting, compiling, converting, organizing, consolidating, collaborating on, sharing, and editing files, links, notes, and documents and for creating an information hub; software as a service (SAAS) services featuring assistant and chatbot software using artificial intelligence for preparing insights, notes, and citations based on document content and user input and for collaborating on or sharing the same with other users; software as a service (SAAS) services featuring software using artificial intelligence for content generation and management
64.
COMPLETING TEMPORAL KNOWLEDGE GRAPHS BASED ON ENHANCED ENTITY REPRESENTATION AND WEIGHTED FREQUENCY-BASED SAMPLING
The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate predicted relationships for entities of a temporal knowledge graph using enhanced entity representations. For instance, in one or more embodiments, the disclosed systems generate a query for predicting a relationship for a subject entity represented within a temporal knowledge graph. The disclosed systems further determine an enhanced entity representation generated for the subject entity by an enhancement layer of a temporal knowledge graph completion model, the enhanced entity representation including a combination of a connection-based similarity for the subject entity and a relationship-based similarity for the subject entity. Using the temporal knowledge graph completion model and based on the enhanced entity representation of the subject entity, the disclosed systems generate a predicted relationship for the subject entity.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing machine learning models to generate modified digital images. In particular, in some embodiments, the disclosed systems generate image editing directions between textual identifiers of two visual features utilizing a language prediction machine learning model and a text encoder. In some embodiments, the disclosed systems generate an inversion of a digital image utilizing a regularized inversion model to guide forward diffusion of the digital image. In some embodiments, the disclosed systems utilize cross-attention guidance to preserve structural details of a source digital image when generating a modified digital image with a diffusion neural network.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
66.
HUMAN INPAINTING UTILIZING A SEGMENTATION BRANCH FOR GENERATING AN INFILL SEGMENTATION MAP
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
67.
INTERACTIVE TREE REPRESENTING ATTRIBUTE QUALITY OR CONSUMPTION METRICS FOR DATA INGESTION AND OTHER APPLICATIONS
Embodiments provide systems, methods, and computer storage media for management, assessment, navigation, and/or discovery of data based on data quality, consumption, and/or utility metrics. Data may be assessed using attribute-level and/or record-level metrics that quantify the data's “quality,” meaning the condition of data (e.g., presence of incorrect or incomplete values); its “consumption,” meaning the tracked usage of data in downstream applications (e.g., utilization of attributes in dashboard widgets or customer segmentation rules); and/or its “utility,” meaning a quantifiable impact resulting from the consumption of data (e.g., revenue or number of visits resulting from marketing campaigns that use particular datasets, storage costs of data). This data assessment may be performed at different stages of a data intake, preparation, and/or modeling lifecycle. For example, an interactive tree view may visually represent a nested attribute schema and attribute quality or consumption metrics to facilitate discovery of bad data before ingesting into a data lake.
G06Q 10/0639 - Performance analysis of employees; Performance analysis of enterprise or organisation operations
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Certain aspects and features of this disclosure relate to providing a vector graphics entity component system that supports collaborative editing in real time or near real time. Graphical constructs are efficiently described by integer-based identifiers, and graphical constructs of the same type are stored in a definitional component. Each client maintains both a pending state representation and a synchronized state representation of the graphical design to independently track the state of the representation at a live editing server. The use of integer-based identifiers for graphical constructs provides an efficient change representation that can be communicated with minimal network traffic. All copies of the graphical design represented among clients reach a consistent state quickly even when multiple users are making changes to the same vector path, eliminating the need to track changes manually or to move large files.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a text prompt; generating, using a generator of an image generation model, a feature embedding based on the text prompt, wherein the feature embedding includes a first set of channels that encodes a first value of an image characteristic and a second set of channels that encodes a residual between the first value of the image characteristic and a second value of the image characteristic; and generating, using a decoder of the image generation model, a synthetic image corresponding to the second value of the image characteristic based on the feature embedding.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image depicting a first element, a text description of the input image, and a modification prompt describing a second element different from the first element, generating an intermediate output based on the input image and the text description, where the intermediate output represents the first element, and generating a synthetic image based on the intermediate output and the modification prompt, where the synthetic image replaces the first element from the input image with the second element from the modification prompt.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating three-dimensional meshes representing two-dimensional images for editing the two-dimensional images. The disclosed system utilizes a first neural network to determine density values of pixels of a two-dimensional image based on estimated disparity. The disclosed system samples points in the two-dimensional image according to the density values and generates a tessellation based on the sampled points. The disclosed system utilizes a second neural network to estimate camera parameters and modify the three-dimensional mesh based on the estimated camera parameters of the pixels of the two-dimensional image. In one or more additional embodiments, the disclosed system generates a three-dimensional mesh to modify a two-dimensional image according to a displacement input. Specifically, the disclosed system maps the three-dimensional mesh to the two-dimensional image, modifies the three-dimensional mesh in response to a displacement input, and updates the two-dimensional image.
09 - Scientific and electric apparatus and instruments
Goods and services
Downloadable software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation from user prompts, image editing, and for generating translations; downloadable software for using artificial intelligence models for content generation and management; downloadable application programming interface (API) software
A method, apparatus, non-transitory computer readable medium, and system for scene re-lighting using direct shading control include obtaining an input image and a lighting direction indicator that describes a lighting direction. A direct shading map is generated based on the input image and the lighting direction indicator, and a shaded image is generated based on the direct shading map, depicting an object from the input image with shading consistent with the lighting direction.
Methods and systems disclosed herein relate generally to radiance field gradient scaling for unbiased near-camera training. In a method, a computing system receives information about a 3D environment. The computing system receives a camera location and a camera direction. The computing system determines, using a machine learning model, multiple densities and colors of the 3D environment from a perspective of the camera location at a number of respective points sampled along a first projected ray from the camera location in the camera direction. The computing system aggregates the multiple densities and colors of the 3D environment to generate an output pixel comprising an integrated color that represents the 3D environment.
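The aggregation step in the abstract above can be illustrated with the standard emission-absorption quadrature used in radiance-field rendering. This is a minimal sketch under stated assumptions, not the disclosed method: the abstract does not give a formula, the function name `composite_ray` and its parameters are hypothetical, and the gradient scaling named in the abstract is not shown.

```python
import math

def composite_ray(densities, colors, deltas):
    """Integrate per-sample densities and RGB colors along one ray.

    densities: non-negative density at each sample point
    colors:    (r, g, b) tuple at each sample point
    deltas:    distance between consecutive samples
    Returns the integrated color and the remaining transmittance.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution of this sample
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance
```

Samples behind an opaque sample receive zero weight, so an empty ray returns black with a transmittance of 1.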
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods and services
Downloadable software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation and editing from user prompts, image editing, and for generating translations; downloadable software for using artificial intelligence models for content generation and management; downloadable application programming interface (API) software; Software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation and editing from user prompts and for generating translations; Software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management
76.
ANONYMIZING DIGITAL IMAGES UTILIZING A GENERATIVE NEURAL NETWORK
The present disclosure relates to systems, methods, and non-transitory computer readable media for generating anonymized digital images utilizing a face anonymization neural network. In some embodiments, the disclosed systems utilize a face anonymization neural network to extract or encode a face anonymization guide that encodes face attribute features, such as gender, ethnicity, age, and expression. In some cases, the disclosed systems utilize the face anonymization guide to inform the face anonymization neural network in generating synthetic face pixels for anonymizing a digital image while retaining attributes, such as gender, ethnicity, age, and expression. The disclosed systems learn parameters for a face anonymization neural network for preserving face attributes, accounting for multiple faces in digital images, and generating synthetic face pixels for faces in profile poses.
Techniques for generation of images based on a variety of input conditions or modalities are described, whereby one or more processing devices (300) receive a plurality of input modalities comprising multiple images (302) and a text input in a natural language (308). The processing devices (300) generate image embeddings for the multiple images (306) and a text embedding for the text input (312). The processing devices, using a machine learning model (132), generate an output image (322) based on the image embeddings (306) and the text embedding (312). The output image includes portions of the multiple images.
Techniques for personalizing multimedia content based on a knowledge graph are described. In one embodiment, a method includes receiving activity data associated with a user from a device, generating a touchpoint embedding and a decision embedding using a graph neural network (GNN) model based on the activity data, the GNN model trained using a knowledge graph, predicting a touchpoint using a first classifier based on the touchpoint embedding, predicting a decision stage using a second classifier based on the decision embedding, and generating personalized content for the touchpoint based on the decision stage using a large language model (LLM). Other embodiments are described and claimed.
Embodiments are disclosed for a process of applying and blending new textures to surfaces across frames of a video sequence. The method may include obtaining a new texture for a selected region of a first video frame of a video sequence. The method may further comprise generating a mesh for the selected region of the first video frame that includes a plurality of control points. The method may further comprise determining control point location data for each of the plurality of control points for additional video frames of the video sequence and using the control point location data to generate a plurality of warped video frames by applying the new texture to the additional video frames. The method may further comprise generating blended video frames by blending the new texture in the warped video frames and providing a modified version of the video sequence using the generated blended video frames.
G06T 7/33 - Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
Embodiments are disclosed for a process of detecting and processing curved text in a document using a digital design system. The method may include identifying, by a page segmentation model, a plurality of paragraph objects in a document. The disclosed systems and methods further comprise determining that a paragraph object of the plurality of paragraph objects includes curved text in view of positions of baselines of text runs in the paragraph object. The processing of the curved text in the paragraph object can include determining spacing data for text runs of the curved text in the paragraph object. The disclosed systems and methods further comprise presenting output data representing the curved text using the spacing data for the text runs of the curved text.
G06F 40/103 - Formatting, i.e. changing of presentation of documents
G06V 30/414 - Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Embodiments are disclosed for summary page generation using a document. The method may include receiving a text document. The method may further include generating a text summary based on the text document and a structured representation of the text summary using a document summarization model. The method may further include generating an image generation prompt based on the text summary and the structured representation of the text summary using a prompt generator. The method may further include generating a multimedia summary document corresponding to the text document using a diffusion model and the image generation prompt. The multimedia summary document includes generated background imagery based on the text summary. The multimedia summary document includes at least a portion of the text summary, which is placed within the multimedia summary document based on the structured representation of the text summary.
Digital content generation techniques are described that are performed using a text-based input. A text-based input is received and asset recommendation data is generated based on the text-based input using a machine-learning model, e.g., a large language model (LLM). A selection of a plurality of assets is received from the asset recommendation data and a selection is also received of at least one interaction from a plurality of interactions for the plurality of assets. The digital content is generated as having the interaction between the selection of the plurality of assets.
Techniques for generation of images based on a variety of input conditions or modalities are described. In one embodiment, one or more processing devices receive a plurality of input modalities comprising multiple images and a text input in a natural language. The processing devices generate image embeddings for the multiple images and a text embedding for the text input. The processing devices, using a machine learning model, generate an output image based on the image embeddings and the text embedding. The output image includes portions of the multiple images.
A method, apparatus, non-transitory computer readable medium, and system include obtaining a first image depicting a background scene and a second image depicting a foreground element, generating a guidance embedding based on the second image, and generating a synthetic image depicting the foreground element and the background scene based on the first image and the guidance embedding, wherein the image generation model determines a location of the foreground element within the synthetic image in light of the background scene.
Object identification techniques from a digital image are described. In an implementation, edges of an object are determined by analyzing gradients from a digital image. A structure of the object is computed by detecting line segments from the digital image. A boundary of the object is defined based on the edges and the structure. A display of the object is edited in a user interface based on the boundary using an edit operation.
G06V 10/94 - Hardware or software architectures specially adapted for image or video understanding
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/36 - Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/776 - Validation; Performance evaluation
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
87.
MANAGING DIGITAL ASSETS STORED AS COMPONENTS AND PACKAGED FILES
The present disclosure relates to a digital asset synchronization system that provides improved digital asset management and synchronization of a digital asset stored either within a component database or a packaged file. For example, the digital asset synchronization system enables a set of components that makes up a digital asset to appear as a singular packaged file, while also maintaining the benefits of having the digital asset made up of the components. In this manner, the digital asset synchronization system provides a bridge between a digital asset stored in a packaged file format and conventional file formats. In addition, the digital asset synchronization system also provides digital asset management and improved synchronization between a client device and a cloud storage system.
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/25 - Integrating or interfacing systems involving database management systems
H04L 67/10 - Protocols in which an application is distributed across nodes in the network
H04L 67/1095 - Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
In some embodiments, a computing system provides a graphical interface that displays one or more graphical objects including a moving object and a static object. The computing system generates an impact contour for the moving object that has a predefined distance from a first boundary of the moving object. Based on detecting that the impact contour of the moving object intersects a second boundary of the static object, the computing system determines a first snapping point on the first boundary of the moving object and a second snapping point on the second boundary of the static object. The computing system updates the graphical interface to execute a snapping operation by translating the moving object to a location where the first snapping point and the second snapping point touch each other.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating action plans utilizing a large language model with a best-first search model. The disclosed system determines a request to utilize a large language model to generate an action plan via one or more software tools. The disclosed system generates the action plan by traversing a decision tree comprising an action space involving the one or more software tools by iteratively: selecting, utilizing a best-first search model, an action from a set of possible actions in the action space of the decision tree; and expanding, utilizing the best-first search model, the action space of the decision tree to include an additional set of possible actions. The disclosed system also executes the action plan via one or more interactions with the one or more software tools according to the action.
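The iterative select-and-expand loop described above follows the general shape of best-first search. The sketch below is illustrative only: the abstract does not specify how actions are scored, so `score`, `expand`, `is_goal`, and the toy "tools" (`add3`, `double`) are hypothetical stand-ins, and no language model is involved.

```python
import heapq
from itertools import count

def best_first_plan(start, expand, score, is_goal, max_steps=1000):
    """Return the list of actions that reaches the first goal state found.

    expand(state)  -> iterable of (action, next_state) pairs
    score(state)   -> number; lower means more promising
    is_goal(state) -> bool
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(score(start), next(tie), start, [])]
    seen = {start}
    for _ in range(max_steps):
        if not frontier:
            return None
        # Select: take the most promising node in the action space.
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        # Expand: add the actions available from the selected node.
        for action, nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), next(tie), nxt, plan + [action]))
    return None

# Toy action space: reach 10 from 1 using two hypothetical tools.
plan = best_first_plan(
    start=1,
    expand=lambda s: [("add3", s + 3), ("double", s * 2)],
    score=lambda s: abs(10 - s),
    is_goal=lambda s: s == 10,
)
```

In this toy run the search always pops the state closest to the target, and `max_steps` bounds the traversal because the action space is unbounded.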
Techniques for reference image based material retrieval are described that support identification of procedural materials based on visual features of input images. A processing device, for instance, receives an input image that has a particular visual appearance. The processing device generates a histogram representation of the input image that represents a color prominence of the input image and generates a color distribution based on the color prominence. The processing device leverages a vision language model to filter candidate procedural materials by a semantic similarity to the input image. The processing device then identifies a procedural material that has a visual similarity to the particular visual appearance by comparing the color distribution for the input image to color distributions associated with the filtered candidate procedural materials. In this way, the techniques described herein support efficient retrieval of procedural materials based on color and on semantic features of the input image.
G06T 5/40 - Image enhancement or restoration using histogram techniques
G06T 7/90 - Determination of colour characteristics
G06T 17/00 - Three-dimensional [3D] modelling for computer graphics
G06V 10/56 - Extraction of image or video features relating to colour
G06V 10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
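The color-comparison step of the material retrieval entry above can be sketched as a quantized histogram plus a similarity score. This is a minimal illustration under stated assumptions: the abstract does not name a histogram resolution or a distance measure, so the 4-bins-per-channel quantization and the histogram-intersection score are assumptions, all function names are hypothetical, and the candidates are assumed to have already passed the semantic filter.

```python
from collections import Counter

def color_distribution(pixels, bins_per_channel=4):
    """Quantize 8-bit RGB pixels and return a normalized histogram."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(p, q):
    """Similarity in [0, 1]; 1 means identical distributions."""
    return sum(min(weight, q.get(bin_, 0.0)) for bin_, weight in p.items())

def retrieve(query_pixels, candidates):
    """Pick the candidate whose color distribution best matches the query.

    candidates: dict mapping a material name to its list of RGB pixels.
    """
    query = color_distribution(query_pixels)
    return max(
        candidates,
        key=lambda name: histogram_intersection(query, color_distribution(candidates[name])),
    )
```

For example, a mostly red query image would be matched to a red candidate material rather than a blue one, because their dominant histogram bins overlap.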
91.
SELF ATTENTION REFERENCE FOR IMPROVED DIFFUSION PERSONALIZATION
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a reference image and an input prompt describing an image element; identifying an object from the reference image; generating, using an image generation model, image features representing the object based on the reference image; and generating, using the image generation model, a synthetic image depicting the image element and the object based on the input prompt and the image features from the reference image.
G06V 10/774 - Generating sets of training patterns; Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
92.
MODIFYING DIGITAL IMAGES VIA MULTI-LAYERED SCENE COMPLETION FACILITATED BY ARTIFICIAL INTELLIGENCE
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via multi-layered scene completion techniques facilitated by artificial intelligence. For instance, in some embodiments, the disclosed systems receive a digital image portraying a first object and a second object against a background, where the first object occludes a portion of the second object. Additionally, the disclosed systems pre-process the digital image to generate a first content fill for the portion of the second object occluded by the first object and a second content fill for a portion of the background occluded by the second object. After pre-processing, the disclosed systems detect one or more user interactions to move or delete the first object from the digital image. The disclosed systems further modify the digital image by moving or deleting the first object and exposing the first content fill for the portion of the second object.
G06T 5/77 - Retouching; Inpainting; Scratch removal
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
In implementations of techniques for offsetting camera filter shift, a computing device implements an offset system to capture a first digital image using a filter at a first position relative to an image capture device and to capture a second digital image using the filter at a second position relative to the image capture device resulting from movement of the filter between the first position and the second position. The offset system determines a filter shift resulting from the movement by comparing the first and second digital images. The offset system then controls an offset of a portion of the image capture device based on the filter shift.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for training and implementing a vision-language model using masked distillation and contrastive image-text training. In particular, in one or more embodiments, the disclosed systems generate, utilizing a vision encoder, an image embedding from a masked digital image comprising a digital image with one or more masked patches. In some embodiments, the disclosed systems generate, utilizing a text encoder, a text embedding from a masked text phrase. In one or more embodiments, the disclosed systems generate, utilizing the vision-language model from the image embedding and the text embedding, a predicted text reconstruction of the text description and a predicted image reconstruction of the digital image. In some embodiments, the disclosed systems modify parameters of the vision-language model according to a masked distillation loss between the predicted text reconstruction and a text reconstruction generated by a pretrained large language model.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06F 40/40 - Processing or translation of natural language
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/774 - Generating sets of training patterns; Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
G06V 10/776 - Validation; Performance evaluation
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
42 - Scientific, technological and industrial services, research and design
Goods and services
Software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management, all relating to sound, audio, and music generation from user prompts and for generating translations; software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management.
97.
CHANNEL INCREMENTALITY MEASUREMENT USING CAUSAL FOREST
One or more aspects of the method, apparatus, and non-transitory computer readable medium include obtaining content presentation data; generating, using a machine learning model, predicted user interaction data by computing a plurality of decision tree regressors, wherein nodes of the decision tree regressors are trained to infer a causal relationship between a user interaction variable and a treatment variable; and presenting content to a user based on the predicted user interaction data. The causal relationship is based on maximizing a difference in a relationship between a user interaction variable and a treatment variable of a tree.
Artificial intelligence (AI) agent control and progress indicator techniques are described. A search-query type of a search query, for instance, is detected using a machine-learning model. Responsive to detecting that the search-query type is a first type, the search query is communicated for processing by an algorithmic search engine to generate a search result. Responsive to detecting that the search-query type is a second type, the search query is communicated for processing using an artificial intelligence (AI) search assistant implemented using a large language model (LLM) to generate the search result.
Digital video editing techniques are described that are based on a target digital image. In one or more implementations, inputs are received. The inputs include a target text prompt, a target digital image depicting a target object, and a source digital video having a plurality of frames depicting a source object. Regions-of-interest are identified in the plurality of frames of the source digital video, respectively, based on the target text prompt and the target digital image using a machine-learning model, e.g., a diffusion model. A plurality of frames of a target digital video are generated as having the target object using a generative machine-learning model. The generating is based on the regions-of-interest, the target digital image, the source digital video, and a source text prompt describing the source digital video.
Digital image synthesis techniques are described that leverage splatting, i.e., forward warping. In one example, a first digital image and a first optical flow are received by a digital image synthesis system. A first splat metric and a first merge metric are constructed by the digital image synthesis system that defines a weighted map of respective pixels. From this, the digital image synthesis system produces a first warped optical flow and a first warp merge metric corresponding to an interpolation instant by forward warping the first optical flow based on the splat metric and the merge metric. A first warped digital image corresponding to the interpolation instant is formed by the digital image synthesis system by backward warping the first digital image based on the first warped optical flow.