In an implementation of techniques for scene reconstruction from digital video of moving humans, a computing device implements a scene reconstruction system to receive a digital video depicting a scene including a human and an object. The scene reconstruction system then determines a depth of the human and a depth of the object in the digital video and generates a human mesh modeled from the human in the digital video. Using a machine learning model, the scene reconstruction system determines a size of the object by comparing the depth of the human, the depth of the object, and an estimated dimension of the human mesh. The scene reconstruction system then generates a scene reconstruction including the human mesh and a three-dimensional representation of the object based on the size of the object.
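The abstract leaves the size comparison to a machine-learning model. As a hedged illustration of the geometry such a comparison can exploit (not the patented model), a pinhole camera gives pixel height = focal length × real height / depth, so a human mesh of assumed real height calibrates the focal length, which then converts the object's pixel extent and depth into a metric size. The function name and the 1.75 m reference height are assumptions for this sketch.

```python
def estimate_object_height(
    human_height_m: float,
    human_pixel_height: float,
    human_depth_m: float,
    object_pixel_height: float,
    object_depth_m: float,
) -> float:
    """Estimate an object's metric height from a human of assumed height.

    Pinhole model: pixel_height = focal_px * real_height / depth, so the human
    reference yields focal_px, which is then reused for the object.
    """
    values = (human_height_m, human_pixel_height, human_depth_m, object_depth_m)
    if min(values) <= 0:
        raise ValueError("heights and depths must be positive")
    focal_px = human_pixel_height * human_depth_m / human_height_m
    return object_pixel_height * object_depth_m / focal_px


# A 1.75 m human spans 350 px at 4 m depth; a chair spans 150 px at 5 m depth.
print(estimate_object_height(1.75, 350, 4.0, 150, 5.0))  # 0.9375
```

A learned model can additionally correct for errors this baseline ignores, such as imprecise depth estimates or a mesh whose height differs from the assumed value.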
Content aware background generation techniques are described. In one or more examples, a background generation system forms a mask from a digital image and receives an input specifying one or more parameters. The background generation system then generates a background using a machine-learning model and generative artificial intelligence by predicting pixel values based on the digital image, the one or more parameters, and the mask using a loss function. The background is then applied to the digital image and presented for display in a user interface.
A method, apparatus, non-transitory computer readable medium, and system for natural language processing include obtaining a source document and a user characteristic that indicates a complexity preference of a user. A topic description is generated, using a language generation model, based on the source document and the user characteristic. The language generation model is trained based on an objective function that measures a complexity of the topic description.
G06V 30/416 - Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
A method, apparatus, non-transitory computer readable medium, and system for generating pattern data include obtaining an input image including a pattern element. Then, embodiments generate a pattern image including the pattern element based on the input image. The pattern image includes a plurality of versions of the pattern element. Subsequently, embodiments generate a pattern caption based on the pattern image. Embodiments then utilize the pattern image and the pattern caption for training an image generation model to generate pattern images based on a text prompt.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
DATA EXPLORATION USING NATURAL LANGUAGE WITH DATA SAMPLING
In various examples, an exploratory data analytics tool obtains a natural language query and generates a structured data query for execution on a sample of a dataset based on the natural language query. In an example, an intent is determined for the query and the intent is used, at least in part, to determine the most appropriate sample. In addition, the intent, in some examples, is used to generate recommended queries. A user interface of the exploratory data analytics tool, for example, can display the recommended queries and/or the results of the structured data query on the sample.
In accordance with the described techniques, a processing device receives one or more documents and one or more paragraphs formulated from content of the one or more documents. Using a text decomposition model, the processing device decomposes the one or more paragraphs into a plurality of statements. Using a natural language inference model, the processing device attributes a statement of the plurality of statements to one or more sentences of the one or more documents. Further, the processing device generates one or more annotated documents including at least one visual indication associating the statement with the one or more sentences.
Embodiments are disclosed for music generation. The method may include receiving a music prompt and one or more time-varying controls. A text-to-music generative model may generate a representation of music. The text-to-music generative model comprises a pretrained conditional generative model and an adapter control branch. The text-to-music generative model has been fine-tuned to generate the representation of music based on the music prompt and the one or more time-varying controls. The representation of music is converted to music audio and the music audio is output.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image depicting an entity and having a first quality level, adding noise to the input image based on the first quality level to obtain an intermediate noise image, and generating a restored image depicting the entity by denoising the intermediate noise image, where the restored image has a second quality level higher than the first quality level.
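The noising step can be sketched as follows, assuming a DDPM-style forward process with a linear beta schedule (x_t = sqrt(ā_t)·x_0 + sqrt(1 − ā_t)·ε) and a hypothetical linear mapping from quality level to timestep, so that a lower-quality input receives more noise. The names `quality_to_timestep` and `max_fraction` are illustrative assumptions, and the denoising model that produces the restored image is out of scope.

```python
import math
import random


def alpha_bar(t: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_i) for the first t steps of a linear beta schedule."""
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def quality_to_timestep(quality: float, num_steps: int = 1000,
                        max_fraction: float = 0.6) -> int:
    """Hypothetical mapping: quality 1.0 adds no noise; quality 0.0 uses max_fraction of the schedule."""
    q = min(max(quality, 0.0), 1.0)
    return round((1.0 - q) * max_fraction * num_steps)


def add_noise(pixels: list[float], quality: float, seed: int = 0) -> tuple[list[float], int]:
    """Forward-diffuse pixels to x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    rng = random.Random(seed)
    t = quality_to_timestep(quality)
    ab = alpha_bar(t)
    noisy = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in pixels]
    return noisy, t
```

Starting the reverse process from an intermediate timestep, rather than from pure noise, keeps the coarse content of the input while letting the denoiser replace degraded detail.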
Artificial intelligence techniques for query management are described. A method comprises generating, by a context detection module, context information for a first query comprising natural language information to request a result from one of a plurality of machine learning models, modifying, by a query modification module, the first query based on the context information to form a first modified query, determining, by an intent module, an intent type for the first modified query, selecting, by a routing module, a machine learning model from the plurality of machine learning models based on the intent type, and routing, by the routing module, the first modified query to the selected machine learning model. Other embodiments are described and claimed.
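The module pipeline can be sketched as follows, assuming that context is simply the previous query in the session and that keyword rules stand in for a learned intent module. The routing table, keyword sets, and model identifiers are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: intent type -> model identifier.
ROUTES = {
    "image": "image-generation-model",
    "code": "code-model",
    "general": "general-chat-model",
}

# Hypothetical keyword rules standing in for a learned intent module.
INTENT_KEYWORDS = {
    "image": {"draw", "picture", "image", "illustration"},
    "code": {"python", "function", "bug", "compile"},
}


@dataclass
class QueryRouter:
    history: list[str] = field(default_factory=list)

    def detect_context(self, query: str) -> str:
        # Context detection: here, simply the previous query in the session.
        return self.history[-1] if self.history else ""

    def modify_query(self, query: str, context: str) -> str:
        # Query modification: attach context so follow-up queries are self-contained.
        return f"{query} (context: {context})" if context else query

    def detect_intent(self, query: str) -> str:
        words = {w.strip(".,?!():") for w in query.lower().split()}
        for intent_type, keywords in INTENT_KEYWORDS.items():
            if words & keywords:
                return intent_type
        return "general"

    def route(self, query: str) -> tuple[str, str]:
        """Return (selected model, modified query)."""
        context = self.detect_context(query)
        modified = self.modify_query(query, context)
        model = ROUTES[self.detect_intent(modified)]
        self.history.append(query)
        return model, modified
```

Because intent is detected on the modified query, a follow-up such as "make it blue" after "Draw a cat" inherits the image intent through the attached context.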
Embodiments are disclosed for performing video instance segmentation to mask objects across frames of a video. The method may include obtaining a frame of a video sequence where the frame depicts an object. The method further includes determining a calibrated feature of the frame using temporal information associated with a past frame. The method further includes determining a pixel embedding using the calibrated feature. The method further includes determining an object token using a past object token associated with the past frame and the pixel embedding. The method further includes generating a masked frame using the object token and the pixel embedding. The masked frame includes a masked object corresponding to the object.
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 10/62 - Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
Visually Similar Variable Font Custom Instance Extraction using Differentiable Rasterizer
Variable font visual similarity search techniques are described. In an implementation, a query is received referencing an input font for performing a visual similarity search. A search result is generated specifying at least one variable font that is visually similar to the input font by searching a plurality of variable fonts based on the query. The search includes forming a plurality of instances for the at least one variable font, respectively, by adjusting a plurality of axes usable to change an appearance of the at least one variable font and identifying the at least one variable font by comparing the plurality of instances with the input font using a machine-learning model. The search result is presented for display in a user interface.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate instructions for performing a next action of a task. For instance, in some cases, the disclosed systems receive, from a client device interacting with a software application, a query for performing a task via a user interface of the application. The disclosed systems generate a lookahead prompt having an execution example corresponding to the task, the execution example including an example task and an example action sequence for the example task. The disclosed systems also generate, from the lookahead prompt using a large language model, an estimated lookahead plan describing one or more actions for performing the task. The disclosed systems also use one or more large language models to generate, from the estimated lookahead plan, instructions to perform a next action for the task via user interaction with an interactive element of the user interface.
Techniques are disclosed for direct manipulation of implicitly defined digital three-dimensional (3D) shapes. In an example method, a computing device renders a 3D shape based on an implicit definition including one or more parameters. The computing device receives an indication of an input indicating a modification to the 3D shape at a point. The computing device determines an alternative representation of the point. The computing device determines a position of the point based on the alternative representation. The computing device determines a transformation that relates the position to the one or more parameters. The computing device determines a change in at least one parameter based on the transformation and the input. The computing device re-renders the 3D shape based on the implicit definition and the change in the at least one parameter. The re-rendered 3D shape includes the modification indicated by the input.
Chain-of-thought machine-learning model debiasing techniques and systems are described. A query is received and context data is produced based on the query, e.g., from an external source. A prompt is generated that includes the context data, the query, and a chain-of-thought prompt, and the prompt is processed by a machine-learning model. A candidate result is generated based on processing of the prompt using the machine-learning model. The candidate result includes a candidate answer and a chain-of-thought result describing reasoning indicated by the machine-learning model as used in generating the candidate answer.
Methods and systems are provided for facilitating generation and/or presentation of governing label recommendations for data. In embodiments described herein, a representation of a data schema associated with a dataset having a plurality of attributes is obtained. A governing label for a particular attribute of the plurality of attributes is identified, via a machine learning model, based on the representation of the data schema associated with the dataset. Thereafter, a recommendation to assign the governing label to the particular attribute in the dataset is presented.
G06F 16/383 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
Content relevance based table query answering is described. In one or more examples, a query and a table are received. The table includes a plurality of cells. A plurality of scores are calculated that correspond to the plurality of cells based on the query. One or more machine-learning models are then leveraged to generate a search result from the query, table, and scores, which is presented in a user interface for display.
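A minimal sketch of per-cell relevance scoring follows. It assumes a simple Jaccard token overlap between the query and each cell's text plus its column header, a stand-in chosen for illustration rather than the scoring the abstract describes; the top-scoring cells would then accompany the table as input to the downstream model.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_cells(query: str, header: list[str], rows: list[list[str]]) -> dict[tuple[int, int], float]:
    """Jaccard overlap between query tokens and each cell's text plus its column header."""
    q = tokens(query)
    scores = {}
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            cell_tokens = tokens(f"{header[c]} {cell}")
            union = q | cell_tokens
            scores[(r, c)] = len(q & cell_tokens) / len(union) if union else 0.0
    return scores


def top_cells(scores: dict[tuple[int, int], float], k: int = 3) -> list[tuple[int, int]]:
    """Most relevant (row, column) positions, e.g. to pass alongside the table to a model."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


header = ["city", "population"]
rows = [["Paris", "2.1 million"], ["Berlin", "3.6 million"]]
scores = score_cells("population of Berlin", header, rows)
print(top_cells(scores, k=1))  # [(1, 0)]
```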
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide assistive guides for path tracing of raster images. In particular, in one or more implementations, the disclosed systems determine a set of outlines corresponding to boundaries of a set of segments within a raster image. The disclosed systems select, from the set of outlines, an outline corresponding to a segment in response to a client device input indicating point(s) located within a threshold distance of the outline. The disclosed systems provide, for display within a graphical user interface of a client device, a highlighted indication of the outline corresponding to the segment. The disclosed systems generate, within a vector image, a vector path based on the outline corresponding to the segment in response to a selection of the outline via the graphical user interface.
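The threshold-based outline selection can be sketched with plain point-to-segment distances, assuming each outline is a closed polygon given as an ordered list of vertices (a simplification; real segment boundaries may be curves). Function names are illustrative.

```python
import math
from typing import Optional

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def outline_distance(p: Point, outline: list[Point]) -> float:
    """Distance from p to a closed outline given as an ordered list of vertices."""
    n = len(outline)
    return min(point_segment_distance(p, outline[i], outline[(i + 1) % n]) for i in range(n))


def select_outline(p: Point, outlines: list[list[Point]], threshold: float) -> Optional[int]:
    """Index of the nearest outline if it lies within threshold of p, else None."""
    if not outlines:
        return None
    distances = [outline_distance(p, o) for o in outlines]
    best = min(range(len(outlines)), key=distances.__getitem__)
    return best if distances[best] <= threshold else None
```

A selected index would then drive the highlighted indication and, on confirmation, conversion of that outline into a vector path.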
The present disclosure relates to systems, non-transitory computer-readable media, and methods for predicting summary quality scores and determining summary generation costs of large language models to generate a digital document summary. In particular, in one or more embodiments, the disclosed systems extract one or more text segments from a digital document. Further, the disclosed systems generate, utilizing a quality prediction neural network, a predicted summary quality score for each of a plurality of large language models for the one or more text segments. Furthermore, the disclosed systems select a large language model from the plurality of large language models based on the predicted summary quality scores. Moreover, the disclosed systems generate, utilizing the selected large language model, a summary of the digital document.
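The selection step can be sketched as follows. The predicted quality scores would come from the quality prediction network and are given here as plain numbers; the per-1k-token pricing, the budget constraint, and the model names are assumptions used to show how generation cost can enter the choice.

```python
def generation_cost(num_tokens: int, price_per_1k_tokens: float) -> float:
    return num_tokens / 1000 * price_per_1k_tokens


def select_model(predicted_quality: dict[str, float],
                 price_per_1k_tokens: dict[str, float],
                 num_tokens: int, budget: float) -> str:
    """Pick the highest predicted-quality model whose estimated cost fits the budget."""
    affordable = [m for m in predicted_quality
                  if generation_cost(num_tokens, price_per_1k_tokens[m]) <= budget]
    if not affordable:
        raise ValueError("no model fits the budget")
    # Ties in predicted quality go to the cheaper model.
    return max(affordable, key=lambda m: (predicted_quality[m], -price_per_1k_tokens[m]))


quality = {"large-llm": 0.92, "medium-llm": 0.88, "small-llm": 0.75}
prices = {"large-llm": 0.03, "medium-llm": 0.005, "small-llm": 0.001}
print(select_model(quality, prices, num_tokens=20_000, budget=0.5))  # medium-llm
```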
Methods and systems are provided for evaluating edges of collapsed identity graphs for identity resolution. In embodiments described herein, a collapsed state of identity graphs, such as one resulting from an identity namespace limit being exceeded by the identity graphs, is determined by applying an identity node and edge of an incoming record to the identity graphs. A temporary state of the identity graphs is determined by pruning edges of the collapsed state. A non-collapsed state of the identity graphs that includes the edge of the incoming record is determined by applying the edge of the incoming record to the temporary state. A different edge is determined to be pruned from the non-collapsed state because, when the different edge is applied to the temporary state together with the edge of the incoming record, the temporary state collapses into the collapsed state. An identity graph is updated based on the non-collapsed state.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for providing a mixed level-of-detail rendering of a vector graphics document for display and modification via a client device while downloading and rendering the full vector graphics document. In particular, the disclosed systems download, in response to a request to load a vector graphics document at a client device, a raster image of the vector graphics document. Moreover, the disclosed systems select and download a vector graphic subunit of the vector graphics document, for example, by selecting a priority graphic design layout boundary. Furthermore, the disclosed systems provide, for display via the client device, a mixed level-of-detail rendering comprising the raster image of the vector graphics document and the vector graphic subunit as an overlay of the raster image such that the client device can modify the vector graphic subunit while downloading an additional vector graphic subunit.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that intelligently resize fill regions when generating content for a digital image. For instance, in one or more embodiments, the disclosed systems identify a fill region for a digital image. The disclosed systems intelligently derive source image bounds based on one or more parameters of a generative model. Furthermore, the disclosed systems generate, utilizing the generative model, a content fill from the source image bounds and the digital image. The disclosed systems resize the content fill and generate a modified digital image including the resized content fill in a location of the fill region of the digital image.
G06T 11/60 - Editing figures and text; Combining figures or text
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range, for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06T 11/40 - Filling a planar surface by adding surface attributes, e.g. colour or texture
09 - Scientific and electric apparatus and instruments
Goods and services
Downloadable software for using artificial intelligence models for content generation and management; downloadable software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation from user prompts, image editing, and for generating translations; downloadable application programming interface (API) software.
Vector Object Generation from Raster Objects using Semantic Vectorization
Semantic vectorization techniques are described that support generating and editing of vector objects from raster objects. A raster object, for instance, is received as an input by a semantic vectorization system. The raster object is utilized by the semantic vectorization system to generate a semantic classification for the raster object. The semantic classification identifies semantic objects in the raster image. The semantic vectorization system leverages the semantic classification to generate vector objects. As a result, the vector objects resemble the semantic objects in the raster object.
G06T 11/20 - Drawing from basic elements, e.g. lines or circles
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range, for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06F 18/22 - Matching criteria, e.g. proximity measures
G06V 30/19 - Recognition using electronic means
G06V 30/262 - Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
ALIGNED VISION-LANGUAGE MODEL FOR TEXT-RICH IMAGE UNDERSTANDING
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating and implementing a vision-language model that identifies and understands text-rich content depicted in digital images. For example, the disclosed systems determine, from among a plurality of digital images with at least a threshold probability of depicting text-rich content, a subset of digital images corresponding to a set of text-rich image classifications. In some embodiments, the disclosed systems generate a ground truth text phrase utilizing an optical character recognition model to process a digital image from the subset of digital images. In certain embodiments, the disclosed systems also generate a predicted text phrase utilizing a vision-language model and compare the ground truth text phrase with the predicted text phrase. In some embodiments, the disclosed systems modify parameters of the vision-language model based on comparing the ground truth text phrase and the predicted text phrase.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
A method, apparatus, non-transitory computer readable medium, and system for data processing include obtaining a query relating to a document and identifying metadata for the document based on the query, where the metadata describes a structure including a plurality of portions of the document. Some embodiments include generating, using a machine learning model, a retrieval command based on the query and the metadata, selectively retrieving at least one of the plurality of portions of the document based on the retrieval command, and generating, using the machine learning model, a response to the query based on the at least one of the plurality of portions of the document.
Techniques for neural based geometry in bounding volume hierarchies are described for enabling identification of properties of geometric objects of a scene. In an example, a processing device is operable to receive a bounding volume hierarchy that partitions geometric objects of a three-dimensional scene into bounding volumes individually assigned to respective nodes. At least one said node includes a neural representation encoding neural network information representing a respective said geometric object. The processing device is further operable to render the scene using the bounding volume hierarchy by constructing the respective said geometric object using the neural representation. The processing device is further operable to present the rendered scene for display in a user interface.
Some aspects relate to technologies providing a framework for generating captions from chart visualizations. In accordance with some aspects, input data for a chart is received that includes an indication of the chart type and chart data for the chart. Using the chart data, insight data is determined for each of a number of insight types defined for the chart type. The insight data can be generated using a rule set defined for each insight type. Using the insight data, a caption is generated with natural language text for each insight type. A user interface is provided that includes the chart and at least one of the captions.
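The per-insight-type rule sets can be sketched as follows, assuming a hypothetical "line" chart type with two insight types (trend and extremum). Each rule computes insight data from the chart data, and a fixed template turns that data into a caption; a production system might use a language model for the text instead of templates.

```python
def trend_insight(labels: list[str], values: list[float]) -> dict:
    return {"start": (labels[0], values[0]), "end": (labels[-1], values[-1]),
            "change": values[-1] - values[0]}


def extremum_insight(labels: list[str], values: list[float]) -> dict:
    i = max(range(len(values)), key=values.__getitem__)
    return {"label": labels[i], "value": values[i]}


def trend_caption(d: dict) -> str:
    (l0, v0), (l1, v1) = d["start"], d["end"]
    verb = "rose" if d["change"] > 0 else "fell" if d["change"] < 0 else "held steady"
    return f"Values {verb} from {v0} in {l0} to {v1} in {l1}."


def extremum_caption(d: dict) -> str:
    return f"The highest value was {d['value']} in {d['label']}."


# Hypothetical rule set: chart type -> insight type -> (insight rule, caption template).
RULES = {
    "line": {
        "trend": (trend_insight, trend_caption),
        "extremum": (extremum_insight, extremum_caption),
    },
}


def generate_captions(chart_type: str, labels: list[str], values: list[float]) -> dict[str, str]:
    captions = {}
    for insight_type, (rule, template) in RULES[chart_type].items():
        captions[insight_type] = template(rule(labels, values))
    return captions


print(generate_captions("line", ["Jan", "Feb", "Mar"], [10, 25, 18]))
```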
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining an input prompt. A customized residual is added to a base parameter of an image generation model based on an element of the input prompt to obtain an updated parameter. The customized residual is determined based on the element of the input prompt. A synthesized image is generated using the image generation model with the updated parameter. The synthesized image depicts the element based on the input prompt.
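The parameter update can be sketched as follows, assuming parameters are flat lists of floats, prompt elements are detected by whole-word match, and a `scale` factor controls residual strength. The placeholder token `<my-dog>` and the parameter names are hypothetical; a real model would apply such residuals to weight tensors.

```python
def apply_customized_residuals(
    base_params: dict[str, list[float]],
    residuals: dict[str, dict[str, list[float]]],
    prompt: str,
    scale: float = 1.0,
) -> dict[str, list[float]]:
    """Return base + scale * residual for every prompt element that has a stored residual.

    residuals maps a prompt element (e.g. a placeholder token) to per-parameter deltas.
    The base parameters are copied, never modified in place.
    """
    updated = {name: list(values) for name, values in base_params.items()}
    elements = set(prompt.lower().split())
    for element, deltas in residuals.items():
        if element.lower() in elements:
            for name, delta in deltas.items():
                updated[name] = [w + scale * d for w, d in zip(updated[name], delta)]
    return updated
```

Keeping the base parameters untouched lets several customized elements share one model, with residuals applied only for prompts that mention them.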
Methods, computer systems, computer storage media, and graphical user interfaces are provided for facilitating management of brand representations. In one implementation, a set of brand guidelines associated with various guideline categories (e.g., text and image-based guidelines) is obtained. Thereafter, a set of actionable guidelines is identified for the various guideline categories using an artificial intelligence model(s) (e.g., LLM). In accordance with obtaining brand-inclusive content associated with a brand, brand conformity data is generated, via the artificial intelligence model(s), to indicate an extent of conformity of the brand-inclusive content to at least one actionable guideline. Such brand conformity data can be provided for display to convey brand conformance of the brand-inclusive content.
Some aspects relate to technologies for using machine learning models to predict latency for executing neural networks on various hardware configurations. In accordance with some aspects, a neural network representation for a target neural network having a plurality of layers is received. A first machine learning model groups layers of the target neural network to provide a plurality of layer groups based on the neural network representation, with at least one layer group comprising multiple layers from the target neural network that can be executed by a single operation. A second machine learning model generates a latency prediction for executing the target neural network on a target hardware configuration based on the layer groups.
09 - Scientific and electric apparatus and instruments
Goods and services
(1) Downloadable software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation from user prompts, image editing, and for generating translations; downloadable software for using artificial intelligence models for content generation and management; downloadable application programming interface (API) software.
A corrective noise system receives an electronic version of a fillable form generated by a segmentation network and receives a correction to a segmentation error in the electronic version of the fillable form. The corrective noise system is trained to generate noise that represents the correction and superimpose the noise on the fillable form. The corrective noise system is further trained to identify regions in a corpus of forms that are semantically similar to a region that was subject to the correction. The generated noise is propagated to the semantically similar regions in the corpus of forms and the noisy corpus of forms is provided as input to the segmentation network. The noise causes the segmentation network to accurately identify fillable regions in the corpus of forms and output a segmented version of the corpus of forms having improved fidelity without retraining or otherwise modifying the segmentation network.
G06V 30/262 - Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
G06V 30/19 - Recognition using electronic means
G06V 30/414 - Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
RELATIONAL LOSS FOR ENHANCING TEXT-BASED STYLE TRANSFER
An image generation system accesses an input image displayed on a user interface. The image generation system receives, via the user interface, a target style text defining a target style for a stylized image to be generated based on the input image and a request to generate the stylized image. The image generation system generates the stylized image. Generating the stylized image includes applying a text guided image generation model to the input image and the target style text, wherein the text guided image generation model minimizes a loss between a first relationship between the generated stylized image and a set of style templates and a second relationship between the target style text and the set of style templates. The image generation system displays, via the user interface responsive to receiving the request, the generated stylized image.
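The relational loss can be sketched as follows, assuming that image, text, and style-template embeddings come from a shared image-text encoder (a CLIP-like model is an assumption here), that each "relationship" is the vector of cosine similarities to the templates, and that the two vectors are compared with mean squared error. In training, an autodiff framework would minimize this quantity; plain Python only evaluates it.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relational_loss(image_emb: list[float], text_emb: list[float],
                    template_embs: list[list[float]]) -> float:
    """Mean squared gap between the two relationship vectors.

    First relationship: similarity of the stylized image to each style template.
    Second relationship: similarity of the target style text to each template.
    """
    r_image = [cosine(image_emb, t) for t in template_embs]
    r_text = [cosine(text_emb, t) for t in template_embs]
    return sum((a - b) ** 2 for a, b in zip(r_image, r_text)) / len(template_embs)
```

Matching relationships, rather than pulling the image embedding directly toward the text embedding, constrains how the image relates to many styles at once.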
Methods, systems, and non-transitory computer readable storage media are disclosed for generating visualizations of mask correlations for a layer of a digital image. The disclosed system determines one or more bounding boxes corresponding to one or more hidden areas or one or more visible areas of a layer of a digital image according to a raster mask or a vector mask corresponding to the layer. The disclosed system determines display attributes for the one or more bounding boxes in response to determining that the one or more bounding boxes correspond to the one or more hidden areas or the one or more visible areas. The disclosed system generates, for display with the layer within a graphical user interface, one or more boundary highlights representing the one or more bounding boxes with the display attributes.
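Bounding boxes for visible and hidden areas of a raster mask can be sketched with a flood fill over 4-connected regions, assuming mask values above zero are visible and zero is hidden. The display attributes (solid green for visible, dashed red for hidden) are hypothetical choices, and vector masks would need rasterizing first.

```python
from collections import deque


def region_boxes(mask: list[list[int]], visible: bool = True) -> list[tuple[int, int, int, int]]:
    """Inclusive (x0, y0, x1, y1) boxes of 4-connected regions where (value > 0) == visible."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or (mask[y][x] > 0) != visible:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and (mask[ny][nx] > 0) == visible):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Hypothetical display attributes: visible areas solid green, hidden areas dashed red.
DISPLAY_ATTRIBUTES = {
    True: {"color": "green", "dashed": False},
    False: {"color": "red", "dashed": True},
}


def boundary_highlights(mask: list[list[int]]) -> list[tuple[tuple[int, int, int, int], dict]]:
    return [(box, DISPLAY_ATTRIBUTES[visible])
            for visible in (True, False)
            for box in region_boxes(mask, visible)]
```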
Embodiments include a method, apparatus, system and computer-readable medium for generating a set of input features based on user account data associated with a user account, generating a hidden Markov model based on the set of input features, generating a predicted subscription probability matrix comprising probability values representing potential account interactions between the user account and a set of computing applications, modifying one or more probability values of the predicted subscription probability matrix to form a modified predicted subscription probability matrix, and determining a predicted account interaction metric for the user account based on the modified predicted subscription probability matrix. Other embodiments are described and claimed.
Methods, systems, and non-transitory computer readable storage media are disclosed that utilize machine learning models for patch retrieval and deformation in completing three-dimensional digital shapes. In particular, in one or more implementations the disclosed systems utilize a machine learning model to predict a coarse completion shape from an incomplete 3D digital shape. The disclosed systems sample coarse 3D patches from the coarse 3D digital shape and learn a shape distance function to retrieve detailed 3D shape patches in the input shape. Moreover, the disclosed systems learn a deformation for each retrieved patch and blending weights to integrate the retrieved patches into a continuous surface.
G06T 17/20 - Wire-frame description, e.g. polygonalisation or tessellation
G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
IDENTIFYING AND ALIGNING VIDEO CLIPS FROM LARGE-SCALE VIDEO DATASETS
Embodiments are disclosed for retrieving videos for a semantic and temporal alignment between a pair of video clips. The method may include receiving a query video clip. The method may further include determining alignment ratios between the query video clip and one or more candidate video clips. The method may further include identifying an alignable video clip from the one or more candidate video clips based on the alignment ratios. The method may further include aligning the alignable video clip with the query video clip.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that perform text-to-image editing using executable code generated from natural language text input. For instance, in one or more embodiments, the disclosed systems receive, from a client device, a digital image and natural language text input providing instructions for modifying the digital image. The disclosed systems also generate, using a large language model, executable action code for modifying the digital image in accordance with the instructions of the natural language text input, the executable action code being compatible with an editing application. The disclosed systems further modify the digital image by executing the executable action code via the editing application and provide the modified digital image for display via a graphical user interface of the client device.
Methods and systems are provided for using Shapley values to evaluate prompt generation parameters. In embodiments described herein, a selection of prompt parameters is accessed. A plurality of prompts is generated as a function of a combination of the prompt parameters. A corresponding content quality metric is determined for each of the prompts. Prompt parameter contribution metrics are determined using a Shapley-value-based determination corresponding to a contribution of each of the prompt parameters to the corresponding content quality metric for each of the prompts. The prompt parameter contribution metrics are then displayed.
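The Shapley-value step in the abstract above can be illustrated with an exact computation over every coalition of prompt parameters. This is a minimal sketch, not the disclosed system: `shapley_values` and `toy_quality` are hypothetical names, and `toy_quality` stands in for whatever content quality metric is actually used. Exact enumeration visits 2^(n-1) coalitions per parameter, so it only suits a handful of parameters.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    params: Sequence[str],
    quality: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value of each prompt parameter.

    `quality` scores the prompt built from a given set of enabled
    parameters; here it is a hypothetical stand-in for the quality metric.
    """
    n = len(params)
    values: Dict[str, float] = {}
    for p in params:
        others = [q for q in params if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for a coalition of size k: k! (n - k - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of p when added to this coalition.
                total += weight * (quality(coalition | {p}) - quality(coalition))
        values[p] = total
    return values


def toy_quality(enabled: FrozenSet[str]) -> float:
    """Hypothetical metric: additive scores plus one pairwise interaction."""
    score = 0.0
    if "tone" in enabled:
        score += 1.0
    if "length" in enabled:
        score += 0.5
    if {"tone", "length"} <= enabled:
        score += 0.5  # bonus only when both parameters are present
    return score


print(shapley_values(["tone", "length", "style"], toy_quality))
# tone ≈ 1.25, length ≈ 0.75, style ≈ 0.0 (they sum to the full-prompt score 2.0)
```

Because the toy metric awards its 0.5 bonus only when `tone` and `length` appear together, the Shapley values split that bonus evenly between them, and the unused `style` parameter receives zero.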
In implementation of techniques for sampling light directions on neural materials, a computing device implements a light direction system to receive neural features of a material and an indication of a view direction toward the material. Using a mixture of analytical lobes, a normalizing flow, or a histogram prediction, the light direction system predicts a probability density function (PDF). The light direction system then samples the PDF, calculates prominence values for each of a plurality of candidate light directions based on the PDF, and determines a light direction based on the prominence values.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for editing shadows in digital images. In particular, in some embodiments, the disclosed systems determine, utilizing a lighting estimation network, an environment map for a digital image, the environment map comprising a dominant light. In addition, in some embodiments, the disclosed systems generate, utilizing a lighting diffusion network, a diffused image from the digital image, the diffused image comprising smoothed shading. Moreover, in some embodiments, the disclosed systems generate, utilizing a shadow synthesis network, a shadowed image from the diffused image and a modified environment map comprising a modified dominant light. Furthermore, in some embodiments, the disclosed systems generate, from the diffused image and the shadowed image, a modified digital image comprising an edited shadow.
In implementation of techniques for three-dimensional reconstructions based on Gaussian primitives, a computing device implements a reconstruction system to receive a first digital image depicting an object from a first angle and a second digital image depicting the object from a second angle. The reconstruction system segments the first digital image and the second digital image into patches. The reconstruction system then generates, using a machine learning model, three-dimensional Gaussian primitives that predict parameters of points of the object in a three-dimensional space that correspond on a per-pixel basis to pixels of the patches. The reconstruction system then forms a three-dimensional reconstruction of the object for display in a user interface by merging the three-dimensional Gaussian primitives.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that upscale AI-generated digital content via tile-based super resolution. For instance, in one or more embodiments, the disclosed systems determine a first set of tiles from a digital image having a set of pixels to be replaced with a generated content portion. The disclosed systems further determine a second set of tiles from a first modified digital image that corresponds to the digital image and includes the generated content portion at a first resolution. Based on the first set of tiles and the second set of tiles, the disclosed systems use a super resolution neural network to generate a second modified digital image that corresponds to the digital image and includes the generated content portion at a second resolution that is higher than the first resolution.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
G06T 3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
46.
Interactive Network for Selecting, Ranking, Summarizing, and Exploring Data Insights
Insight summary and prompt generation techniques are described. In one or more examples, a plurality of insights is generated from data extracted from digital content. A network representation is produced having a plurality of nodes based on the plurality of insights and a plurality of connections between corresponding insights. A selection is received of a subset of nodes from the plurality of nodes. A prompt is formed by grouping respective insights from the subset of nodes. An insight summary of the digital content is generated based on the prompt using generative artificial intelligence as implemented using one or more machine-learning models. The insight summary is then presented for output in a user interface.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that detect, remove, and synthesize shadows in a joint framework. In particular, the disclosed systems access an object mask of an object and a digital image depicting the object and a shadow of the object. Furthermore, the disclosed systems perform object-centered shadow detection and removal to generate a modified digital image without the shadow by utilizing a shadow analyzer model. Moreover, the disclosed systems receive a user interaction to manipulate the object and generate a modified shadow utilizing a shadow synthesis model, where the shadow synthesis model is conditioned on a shadow mask generated by the shadow analyzer model.
Systems and methods for natural language processing are described. Embodiments of the present disclosure identify a task set including a plurality of pseudo tasks, wherein each of the plurality of pseudo tasks includes a support set corresponding to a first natural language processing (NLP) task and a query set corresponding to a second NLP task; update a machine learning model in an inner loop based on the support set; update the machine learning model in an outer loop based on the query set; and perform the second NLP task using the machine learning model.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for removing objects from an image stream at capture time of a digital image. For example, the disclosed system contemporaneously detects and segments objects from a digital image stream being previewed in a camera viewfinder graphical user interface of a client device. The disclosed system removes selected objects from the image stream and fills a hole left by the removed object with a content aware fill. Moreover, the disclosed system displays the image stream with the removed object and content fill as the image stream is previewed by a user prior to capturing a digital image from the image stream.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image and a text prompt including an image modification request, generating a text response based on the input image and the text prompt, where the text response describes a modification to the input image corresponding to the image modification request, and generating a synthetic image based on the input image and an output embedding of a language generation model, where the synthetic image depicts the modification to the input image.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text prompt and a noise input, and then generating a synthetic image based on the text prompt and the noise input by performing a single pass with an image generation model. The image generation model is trained based on a multi-term loss comprising a positive term based on an output of a pre-trained model, and a negative term based on an output of a jointly-trained model.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating digital images with a diffusion-based generative neural network conditioned on background-extracted lighting features. The disclosed system determines, in response to a request to generate a digital image, a target background image for inserting a foreground object into the target background image. The disclosed system generates, from the target background image and utilizing a lighting conditioning neural network, a lighting feature representation indicating one or more lighting parameters of the target background image. Additionally, the disclosed system generates, utilizing a diffusion-based generative neural network conditioned on the lighting feature representation, the digital image including the foreground object inserted into the target background image based on a composite image comprising the foreground object and the target background image with a foreground mask corresponding to the foreground object.
G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
G06T 5/77 - Retouching; Inpainting; Scratch removal
G06V 10/56 - Extraction of image or video features relating to colour
G06V 10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
H04N 5/272 - Means for inserting a foreground image in a background image, i.e. inlay, outlay
54.
Techniques for Triangle-level Rejection Sampling in Three-dimensional Object Meshes
A graphics generation computing device applies triangle-level rejection sampling to generate a set of surface mesh point samples. A highly parallelized processor included in the graphics generation computing device generates a triangle-level sampling array that includes triangle-level sampling data for each triangle included in a 3D object mesh. Based on the data in the triangle-level sampling array, the highly parallelized processor determines a quantity of point samples in each triangle. The highly parallelized processor calculates, for each point sample, point sample location data that indicates a location of the point sample on a triangle. The highly parallelized processor modifies a set of point samples to include the location data. In some cases, the set of point samples is used to generate digital fibers or other structure data objects at the point sample locations indicated by the set of point samples.
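A sequential sketch of the two phases described above: first deciding how many point samples each triangle receives, then computing each sample's location on its triangle. It assumes that "triangle-level rejection sampling" means proposing triangles uniformly and accepting each proposal with probability proportional to its area. The disclosure runs this on a highly parallelized processor; this plain Python version uses a simple `counts` list as a stand-in for the triangle-level sampling array, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def triangle_area(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Half the length of the cross product (b - a) x (c - a)."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5


def sample_surface(
    vertices: Sequence[Vec3],
    triangles: Sequence[Tuple[int, int, int]],
    num_samples: int,
    rng: random.Random,
) -> Tuple[List[int], List[Vec3]]:
    areas = [triangle_area(*(vertices[i] for i in tri)) for tri in triangles]
    max_area = max(areas)
    if max_area <= 0.0:
        raise ValueError("mesh has no triangle with positive area")

    # Phase 1: per-triangle sample counts via triangle-level rejection sampling.
    # Propose a triangle uniformly and accept it with probability
    # area / max_area, so accepted counts are proportional to surface area.
    counts = [0] * len(triangles)
    accepted = 0
    while accepted < num_samples:
        t = rng.randrange(len(triangles))
        if rng.random() * max_area < areas[t]:
            counts[t] += 1
            accepted += 1

    # Phase 2: a location for every point sample, uniform within its triangle
    # (square-root barycentric trick).
    points: List[Vec3] = []
    for t, n in enumerate(counts):
        a, b, c = (vertices[i] for i in triangles[t])
        for _ in range(n):
            s = rng.random() ** 0.5
            r = rng.random()
            u, v, w = 1.0 - s, s * (1.0 - r), s * r
            points.append(tuple(u * a[k] + v * b[k] + w * c[k] for k in range(3)))
    return counts, points


verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2), (0, 2, 3)]  # unit square split into two equal-area triangles
counts, pts = sample_surface(verts, tris, 1000, random.Random(0))
print(counts, len(pts))  # counts are roughly 500 each; len(pts) == 1000
```

Both the acceptance test and the per-point location step are independent across samples, so either loop could be distributed across parallel threads.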
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image and an input mask, where the input image depicts a scene and the input mask indicates an inpainting region of the input image. A latent code is generated, using a generator network of an image generation model, based on the input image and the input mask. The latent code includes synthesized content in the inpainting region. A synthetic image is generated, using a decoder network of the image generation model, based on the latent code and the input image. The synthetic image depicts the scene from the input image outside the inpainting region and includes the synthesized content within the inpainting region, and the synthetic image comprises a seamless transition across a boundary of the inpainting region.
A method, non-transitory computer readable medium, apparatus, and system for data processing include obtaining, by a multi-touch attribution model, individual-level user interaction data from a digital content channel, and computing, using the multi-touch attribution model, channel contribution data based on the individual-level user interaction data. Some embodiments include training, using a training component, an aggregate attribution model based on the channel contribution data. Some embodiments include generating, using a calibration component, an individual channel contribution value for the digital content channel based on the channel contribution data and the aggregate attribution model.
In some embodiments, a computing system receives a representation of an object from a client device. The computing system generates a contact representation for hand-object interaction based on the representation of the object. The object-centric contact representation includes a contact map indicating contact points on the representation of the object, a hand part map indicating hand parts contacting the object, and a direction map comprising contact directions of the hand parts contacting the object. The computing system generates a hand grasp representation with respect to the object based on the contact representation using a model-based optimization algorithm. The computing system provides the hand grasp representation to the client device.
G06F 30/23 - Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
G06T 17/10 - Volume description, e.g. of cylinders or cubes, or using constructive solid geometry [CSG]
58.
Relightable Scene Reconstructions Using Radiance Guided Material Extraction
Techniques for relightable scene reconstructions using radiance guided material extraction are described to accurately render 3D scenes under lighting conditions and from perspectives different from those of the original source images from which the scenes are constructed. In an example, a processing device is operable to receive a plurality of digital images that depict a scene from multiple perspectives, determine a view-independent radiance of the scene based on the plurality of digital images, and determine a view-dependent radiance of the scene based on the plurality of digital images. The processing device is further operable to determine a set of lighting conditions associated with an input perspective, generate a synthesized image having a reconstruction of the scene based on the set of lighting conditions using the view-independent radiance and the view-dependent radiance, and output the synthesized image.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating and modifying databases using a fairness deduplication algorithm. In particular, in one or more embodiments, the disclosed systems generate, within an embedding space, semantic embeddings from a plurality of digital images stored in a database. In some embodiments, the disclosed systems identify, from among the semantic embeddings in the embedding space, a preservable embedding according to a preservation prototype indicating a semantic concept to preserve within the database. In one or more embodiments, the disclosed systems generate a modified database by pruning one or more digital images corresponding to semantic embeddings other than the preservable embedding from the database.
G06T 11/60 - Editing figures and text; Combining figures or text
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
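One possible reading of the preservation-prototype pruning in the fairness deduplication abstract above, sketched under stated assumptions: near-duplicate images are grouped greedily by cosine similarity of their embeddings (the `threshold` of 0.95 is an arbitrary illustrative value), and within each group only the image whose embedding is closest to the preservation prototype is kept. The function names and the grouping rule are hypothetical; the abstract does not specify them.

```python
import math
from typing import Dict, List, Sequence

Embedding = Sequence[float]


def cosine(u: Embedding, v: Embedding) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def prune_database(
    embeddings: Dict[str, Embedding],
    prototype: Embedding,
    threshold: float = 0.95,
) -> List[str]:
    """Return image ids to prune, keeping one image per near-duplicate group.

    The kept image is the one whose embedding is most similar to the
    preservation prototype (the concept the database should retain).
    """
    remaining = dict(embeddings)
    pruned: List[str] = []
    while remaining:
        seed_id = next(iter(remaining))
        seed = remaining[seed_id]
        # Greedy grouping: everything close enough to the seed is a near-duplicate.
        group = {k: v for k, v in remaining.items() if cosine(v, seed) >= threshold}
        group[seed_id] = seed  # always include the seed, even for zero vectors
        keep = max(group, key=lambda k: cosine(group[k], prototype))
        pruned.extend(k for k in group if k != keep)
        for k in group:
            del remaining[k]
    return sorted(pruned)


db = {"a": (1.0, 0.0), "b": (0.99, 0.14), "c": (0.0, 1.0)}
print(prune_database(db, prototype=(0.9, 0.44)))  # ['a']
```

Here `a` and `b` form one near-duplicate group; `b` points closer to the prototype, so `a` is pruned, while `c` is unique and survives.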
60.
GENERATING VISUALLY AWARE DESIGN LAYOUTS USING A MULTI-DOMAIN DIFFUSION NEURAL NETWORK
The present disclosure relates to systems, methods, and non-transitory computer readable media that generate layouts for digital designs from image elements via multi-domain diffusion. For instance, in some embodiments, the disclosed systems receive, from a client device, a plurality of image elements for generating a digital design. The disclosed systems generate, using an encoder of a multi-domain diffusion neural network, embeddings representing visual characteristics and bounding box characteristics of the plurality of image elements. The disclosed systems further generate, using the multi-domain diffusion neural network, a layout for the digital design from the visual characteristics and bounding box characteristics of the embeddings. Additionally, the disclosed systems provide the layout for display on the client device.
Generative artificial intelligence visual effect techniques are described. A prompt, for example, is received. The prompt includes text specifying a visual effect and text specifying a shape. A mask is formed defining a portion of digital content based on an object selected from digital content. The visual effect is generated using generative artificial intelligence by one or more machine-learning models based on the text specifying the visual effect, the text specifying the shape, and the mask. The digital content is presented as having the visual effect applied to the portion of the digital content for display in a user interface.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image and an input mask, wherein the input mask indicates a region of the input image to be modified and generating, using a first image generation model, an intermediate result based on the input image and the input mask, wherein the intermediate result modifies the region of the input image indicated by the input mask. A second image generation model generates a synthetic image based on the input image and the intermediate result, wherein the synthetic image depicts the input image with content from the modified region at a higher level of detail than the intermediate result.
G06V 10/774 - Generating sets of training patterns; Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
Methods and systems are provided for facilitating document collaboration in accordance with collaboration controls. In embodiments, an indication of a collaboration control for a collaborator of a document is obtained. The collaboration control generally indicates an edit permission for a document section of the document in relation to the collaborator. Thereafter, a set of collaboration control data for the document is generated. In embodiments, the set of collaboration control data includes the collaboration control indicating the edit permission for the document section of the document in relation to the collaborator. Based on an input (e.g., edit) by the collaborator to the document section of the document, a determination is made, using the set of collaboration control data, as to whether to enable an edit to the document section of the document.
H04L 65/401 - Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time-sensitive sessions, e.g. white board sharing or spawning of a subconference
G06F 40/166 - Editing, e.g. inserting or deleting
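The collaboration-control check in the document collaboration abstract above reduces to a permission lookup before an edit is applied. A minimal sketch, assuming permissions are stored per (collaborator, section) pair and that sections without an explicit control are locked by default; `CollaborationControlData` and `apply_edit` are hypothetical names chosen to mirror the abstract's terminology.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CollaborationControlData:
    """Edit permissions keyed by (collaborator, document section)."""

    permissions: Dict[Tuple[str, str], bool] = field(default_factory=dict)
    default_allow: bool = False  # sections without an explicit control are locked

    def set_control(self, collaborator: str, section: str, can_edit: bool) -> None:
        self.permissions[(collaborator, section)] = can_edit

    def allows_edit(self, collaborator: str, section: str) -> bool:
        return self.permissions.get((collaborator, section), self.default_allow)


def apply_edit(
    document: Dict[str, str],
    controls: CollaborationControlData,
    collaborator: str,
    section: str,
    new_text: str,
) -> bool:
    """Apply the edit only if the control data permits it; report the outcome."""
    if not controls.allows_edit(collaborator, section):
        return False
    document[section] = new_text
    return True


doc = {"intro": "Draft intro", "budget": "TBD"}
controls = CollaborationControlData()
controls.set_control("alice", "budget", True)
controls.set_control("bob", "budget", False)
print(apply_edit(doc, controls, "alice", "budget", "$10k"))  # True
print(apply_edit(doc, controls, "bob", "budget", "$99k"))    # False
print(doc["budget"])                                          # $10k
```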
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a pattern prompt and a text image, where the pattern prompt describes a visual pattern and the text image depicts text, generating a pattern image based on the pattern prompt, where the pattern image depicts the visual pattern, and generating a patterned text image based on the pattern image and the pattern prompt.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input prompt and retrieving an intermediate noise state based on a similarity between the input prompt and a candidate prompt corresponding to the intermediate noise state. An image generation model generates a synthetic image based on the input prompt and the intermediate noise state.
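The retrieval step above can be sketched as a nearest-neighbour lookup over cached candidate prompts. This is illustrative only: a bag-of-words `embed` function stands in for a real text encoder, noise states are plain lists of floats, and the `min_similarity` cut-off of 0.5 is an assumed value. On a miss the generator would presumably start from fresh noise; that branch is not shown.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def embed(prompt: str) -> Counter:
    """Hypothetical stand-in for a text encoder: a bag-of-words vector."""
    return Counter(prompt.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NoiseStateCache:
    """Stores intermediate noise states keyed by the prompt that produced them."""

    def __init__(self, min_similarity: float = 0.5) -> None:
        self.entries: List[Tuple[Counter, Sequence[float]]] = []
        self.min_similarity = min_similarity

    def add(self, candidate_prompt: str, noise_state: Sequence[float]) -> None:
        self.entries.append((embed(candidate_prompt), noise_state))

    def retrieve(self, input_prompt: str) -> Optional[Sequence[float]]:
        """Return the state of the most similar candidate prompt, or None."""
        query = embed(input_prompt)
        best, best_sim = None, self.min_similarity
        for candidate, state in self.entries:
            sim = cosine(query, candidate)
            if sim >= best_sim:
                best, best_sim = state, sim
        return best


cache = NoiseStateCache()
cache.add("a red fox in snow", [0.1, -0.3])
cache.add("a city skyline at night", [0.7, 0.2])
print(cache.retrieve("red fox in the snow"))    # [0.1, -0.3]
print(cache.retrieve("underwater coral reef"))  # None
```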
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input texture image and a plurality of image masks, generating a plurality of image assets corresponding to the plurality of image masks based on the input texture image, and generating a combined asset including the plurality of image assets. The plurality of image assets have a consistent texture based on the input texture image.
Embodiments described herein provide methods and systems for facilitating actively-learned context modeling. In one embodiment, a subset of data is selected from a training dataset corresponding to an image to be compressed, the subset of data corresponding to a subset of the pixels of the image. A context model is generated using the selected subset of data. The context model is generally in the form of a decision tree having a set of leaf nodes. Entropy values corresponding to each leaf node of the set of leaf nodes are determined. Each entropy value indicates an extent of diversity of context associated with the corresponding leaf node. Additional data from the training dataset is selected based on the entropy values corresponding to the leaf nodes. The updated subset of data is used to generate an updated context model for use in performing compression of the image.
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being bits, e.g. of the compressed video stream
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/91 - Entropy coding, e.g. variable-length coding or arithmetic coding
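The entropy-guided selection step in the context-modeling abstract above can be sketched as follows, assuming each leaf's "diversity of context" is measured as the Shannon entropy of the symbols (for example, pixel values) observed at that leaf. Building the decision tree itself is omitted, and `leaf_of`, which routes a candidate sample to a leaf, is a hypothetical placeholder for that tree.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def shannon_entropy(symbols: Sequence[Hashable]) -> float:
    """Entropy in bits of the empirical symbol distribution at one leaf."""
    total = len(symbols)
    counts = Counter(symbols)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def select_additional_data(
    leaf_symbols: Dict[str, Sequence[Hashable]],
    pool: Sequence[T],
    leaf_of: Callable[[T], str],
    num_leaves: int,
    budget: int,
) -> List[T]:
    """Pick up to `budget` pool samples routed to the highest-entropy leaves."""
    entropies = {leaf: shannon_entropy(s) for leaf, s in leaf_symbols.items() if s}
    targets = set(sorted(entropies, key=entropies.get, reverse=True)[:num_leaves])
    return [x for x in pool if leaf_of(x) in targets][:budget]


# Symbols (e.g. pixel values) seen at each leaf of the current context model.
leaves = {"L0": [0, 0, 0, 0], "L1": [0, 1, 0, 1], "L2": [0, 1, 2, 3]}
print({k: shannon_entropy(v) for k, v in leaves.items()})
# {'L0': 0.0, 'L1': 1.0, 'L2': 2.0}

# Unused training samples tagged with the leaf they would fall into.
pool = [("L0", 5), ("L2", 7), ("L1", 9), ("L2", 3)]
print(select_additional_data(leaves, pool, lambda x: x[0], num_leaves=1, budget=10))
# [('L2', 7), ('L2', 3)]
```

The selected samples would then be added to the training subset and the context model rebuilt, repeating until the budget or a stopping criterion is reached.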
68.
MULTI-MODAL RETRIEVAL USING AN INTERMEDIATE NOISE STATE
A method, apparatus, non-transitory computer readable medium, and system for data processing include obtaining a text prompt and generating a first intermediate noise state based on the text prompt, retrieving a second intermediate noise state based on the text prompt and the first intermediate noise state, and generating a synthetic image based on the text prompt and the second intermediate noise state.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for hierarchical entity segmentation. In particular, in one or more embodiments, the disclosed systems receive a digital image comprising a plurality of object entities. In addition, in some embodiments, the disclosed systems generate, utilizing a segmentation model comprising parameters generated according to pseudo-labels indicating hierarchies of segmentation masks for a set of training digital images, a hierarchical segmentation indicating hierarchical relations of the plurality of object entities of the digital image. Moreover, in some embodiments, the disclosed systems generate, for the digital image, a segmentation map from the hierarchical segmentation of the plurality of object entities.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
70.
GENERATING DIGITAL CONTENT CONSISTENT WITH CONTEXT-SPECIFIC GUIDELINES UTILIZING PROMPT AUGMENTATION AND MODEL TUNING
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a contextual content generation system that trains and implements a unique machine learning architecture to generate context-specific digital content items based on a digital guideline document. In particular, the disclosed systems select a content generation method from among prompt engineering and/or updating one or more machine learning models to generate digital content. For example, the disclosed systems utilize machine learning models to extract key elements from a digital guideline document comprising context-specific guidelines for digital content. Further, the disclosed systems generate an augmented prompt comprising indications of key elements from the digital guideline document. In addition, the disclosed systems select a content generation method from among prompt engineering and/or updating machine learning models to generate the digital content item which incorporates digital content corresponding to the context-specific guidelines based on the augmented prompt.
The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement an image filter for enhancing light text and removing document shadows. In particular embodiments, the disclosed systems use a modified adaptive thresholding approach that relies on image gradients to efficiently guide the thresholding process. In addition, the disclosed systems use a machine-learning model to generate a document shadow map. The document shadow map can include text reflections. Accordingly, the disclosed systems remove text reflections from the document shadow map (e.g., by using an interpolated shadow intensity value of neighboring shadow map pixels). In turn, the disclosed systems use the document text mask and the document shadow map cleaned of text reflections to remove shadows from the digital image. Further, the disclosed systems enhance text in the shadow-removed digital image based on contrast stretching.
G06T 5/40 - Image enhancement or restoration using histogram techniques
G06T 5/60 - Image enhancement or restoration using machine learning, e.g. neural networks
G06T 5/92 - Dynamic range modification of images or parts thereof based on global image properties
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
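Three of the post-processing steps in the document-shadow abstract above can be approximated with simple array operations: interpolating the shadow map under text pixels from neighbouring values, dividing out the shadow, and contrast stretching. This sketch assumes the text mask and the shadow map are already available (the disclosure obtains them with gradient-guided adaptive thresholding and a machine-learning model, neither of which is reproduced here). The 3x3 neighbour mean and the division-based shadow removal are common choices swapped in for illustration, not the disclosed method, and all names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def clean_shadow_map(shadow: Grid, text_mask: List[List[bool]]) -> Grid:
    """Replace shadow values under text with the mean of non-text 3x3 neighbours."""
    h, w = len(shadow), len(shadow[0])
    cleaned = [row[:] for row in shadow]
    for y in range(h):
        for x in range(w):
            if not text_mask[y][x]:
                continue
            neighbours = [
                shadow[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if not text_mask[ny][nx]
            ]
            if neighbours:  # leave the value alone if every neighbour is text
                cleaned[y][x] = sum(neighbours) / len(neighbours)
    return cleaned


def remove_shadow(image: Grid, shadow: Grid) -> Grid:
    """Divide by the shadow map so shaded paper maps back toward white (255)."""
    return [
        [min(255.0, 255.0 * p / max(s, 1.0)) for p, s in zip(img_row, sh_row)]
        for img_row, sh_row in zip(image, shadow)
    ]


def contrast_stretch(
    pixels: Sequence[float], low_pct: float = 0.02, high_pct: float = 0.98
) -> List[int]:
    """Linearly map the [low_pct, high_pct] percentile range onto [0, 255]."""
    ordered = sorted(pixels)
    n = len(ordered)
    lo = ordered[int(low_pct * (n - 1))]
    hi = ordered[int(high_pct * (n - 1))]
    if hi <= lo:  # flat input: nothing to stretch
        return [int(round(p)) for p in pixels]
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]


shadow = [[200.0, 200.0, 200.0], [200.0, 60.0, 200.0], [200.0, 200.0, 200.0]]
text = [[False, False, False], [False, True, False], [False, False, False]]
cleaned = clean_shadow_map(shadow, text)  # centre value becomes 200.0
image = [[180.0, 180.0, 180.0], [180.0, 40.0, 180.0], [180.0, 180.0, 180.0]]
flat = [p for row in remove_shadow(image, cleaned) for p in row]
print(contrast_stretch(flat, low_pct=0.0, high_pct=1.0))
# [255, 255, 255, 255, 0, 255, 255, 255, 255]
```

Without the cleaning step, the dark text value in the shadow map (60.0) would be divided out as if it were shading, washing out the very text the filter is meant to enhance.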
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a style kit including a first image generation input indicating a first image attribute, a second image generation input indicating a second image attribute, and a selectability parameter indicating that the second image generation input is selectable. A third image generation input is received from a user based on the selectability parameter, wherein the third image generation input indicates a third image attribute different from the second image attribute of the second image generation input. An image generation model generates a synthetic image based on the style kit, the first image generation input, and the third image generation input, wherein the synthetic image has the first image attribute and the third image attribute.
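The style-kit logic above, where locked inputs pass through unchanged and only inputs marked selectable may be replaced by the user, can be sketched as a merge step that runs before the image generation model is called. `GenerationInput` and `resolve_style_kit` are hypothetical names, and the generation call itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GenerationInput:
    value: str                # e.g. a style reference or a subject description
    selectable: bool = False  # the kit's selectability parameter


def resolve_style_kit(
    kit: Dict[str, GenerationInput], user_inputs: Dict[str, str]
) -> Dict[str, str]:
    """Merge user-provided inputs into a style kit, honouring selectability."""
    unknown = set(user_inputs) - set(kit)
    if unknown:
        raise KeyError(f"not in style kit: {sorted(unknown)}")
    resolved = {}
    for name, item in kit.items():
        if name in user_inputs:
            if not item.selectable:
                raise ValueError(f"{name!r} is locked by the style kit")
            resolved[name] = user_inputs[name]  # user replaces a selectable input
        else:
            resolved[name] = item.value  # fixed input is carried through
    return resolved


kit = {
    "style": GenerationInput("flat pastel illustration"),
    "subject": GenerationInput("a mountain cabin", selectable=True),
}
print(resolve_style_kit(kit, {"subject": "a lighthouse"}))
# {'style': 'flat pastel illustration', 'subject': 'a lighthouse'}
```

The resolved inputs would then condition the image generation model, so the output keeps the kit's fixed attribute (here the style) while taking the user's replacement for the selectable one.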
Methods and systems are provided for using reinforcement learning to recommend data visualizations. In embodiments described herein, statistical features for each sample of corresponding samples of a dataset are determined by applying each sample of the dataset to a data visualization recommendation model. The computational cost of each of the statistical features for each of the samples is determined via a regression model. Recommended statistical features are determined by sequentially applying each sample to a reinforcement learning model with a computational budget and with the corresponding computational costs of the statistical features of each sample. A data visualization is then displayed that is generated by applying the dataset and the recommended statistical features to the data visualization recommendation model.
Embodiments are disclosed for correlating video sequences and audio sequences by a media recommendation system using a trained encoder network. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training input including a media sequence, including a video sequence paired with an audio sequence, segmenting the media sequence into a set of video sequence segments and a set of audio sequence segments, extracting visual features for each video sequence segment and audio features for each audio sequence segment, generating, by transformer networks, contextualized visual features from the extracted visual features and contextualized audio features from the extracted audio features, the transformer networks including a visual transformer and an audio transformer, generating predicted video and audio sequence segment pairings based on the contextualized visual and audio features, and training the visual transformer and the audio transformer to generate the contextualized visual and audio features.
G06V 10/774 - Generating sets of training patterns; Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation; Bootstrap methods, e.g. "bagging" or "boosting"
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 20/40 - Scenes; Scene-specific elements in video content
G10L 25/03 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining an input image depicting an entity and a skeleton map depicting a pose of the entity, and performing a cross-attention mechanism between image features of the input image and entity features representing the pose to obtain modified image features. An output image that depicts the entity with the pose is generated based on the modified image features.
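The cross-attention step above follows the standard scaled dot-product form, softmax(Q Kᵀ / sqrt(d)) V. This dependency-free sketch makes simplifying assumptions: queries are the image features, keys and values are the pose features, learned projection matrices are omitted, and the attended result is added back to the image features as a residual. It illustrates the mechanism only and is not the disclosed model.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_feats: Matrix, pose_feats: Matrix) -> Matrix:
    """Image tokens attend to pose tokens; the result is added back residually.

    Simplifications (assumptions): queries are the image features, keys and
    values are the pose features, and no learned projection matrices are used.
    """
    d = len(image_feats[0])
    out: Matrix = []
    for q in image_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in pose_feats]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, pose_feats)) for j in range(d)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


image = [[1.0, 0.0], [0.0, 1.0]]
pose = [[0.5, 0.5]]
print(cross_attention(image, pose))  # [[1.5, 0.5], [0.5, 1.5]]
```

With a single pose token, every attention weight is 1.0, so each image feature simply receives that token; with several pose tokens, each image feature draws mostly from the pose tokens it aligns with.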
Generative artificial intelligence (AI) content strategy techniques are described. In one or more examples, a content brief is received describing a goal to be achieved in controlling digital content output. Content brief data is extracted from the content brief and a content strategy is generated based on the content brief data using generative artificial intelligence implemented using one or more machine-learning models.
Aspects and features of the present disclosure relate to providing injective three-dimensional (3D) deformations based on two-dimensional (2D) mesh deformations. For example, a method involves defining at least one 2D mesh deformation based on a designated position of an object represented by an input neural radiance field (NeRF). The method also involves applying the 2D mesh deformation(s) to a 3D piecewise-linear map that operates over a plane and preserves a normal direction to produce prismatic maps. The method further involves composing a 3D deformation for the object from layers defined by the prismatic maps, and parameterizing the 3D piecewise-linear map. The method additionally involves storing or rendering, using the 3D piecewise-linear map, a deformed NeRF injectively representing the object in the designated position. Aspects also include computer systems, apparatus, and computer programs configured to perform the method.
Some aspects relate to technologies for an artificial intelligence (AI) system that, among other things, enhances responses to concepts questions for an application with contextual usage insights. In accordance with some aspects, a user query is determined to comprise a concepts question regarding an application. Responsive to determining the user query comprises the concepts question, documentation regarding the application relevant to the user query is identified. A generative model generates text for a response to the concepts question using the documentation regarding the application. Additionally, a determination is made to add contextual usage insights to the response. Responsive to determining to add contextual usage insights to the response, usage data relevant to the user query and/or the response is retrieved. The generative model generates text for a final response using the response and the usage data, and the final response is provided to a user device for presentation.
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a digital design interface for intuitively creating custom arrows that demonstrate both visual consistency and inherent directionality within vector-based design applications. In particular, in one or more implementations, the disclosed systems receive a request to create a custom arrow from a digital object and a path segment. In addition, the disclosed systems detect that the digital object is within a threshold distance of the path segment and combine the digital object with the path segment to create a custom arrow object. In particular, the disclosed systems utilize a bilateral segmentation machine-learning model to segment the digital object and a symmetry axis detection model to determine an axis of symmetry of the digital object. Moreover, the disclosed systems attach the digital object to an endpoint of the path segment at the axis of symmetry.
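The threshold check and symmetry-axis attachment can be sketched in plain Python, assuming the object outline is a list of points and that the axis of symmetry has already been estimated (here simply passed in as two points, standing in for the symmetry axis detection model). `attach_arrowhead` and its parameters are hypothetical; the sketch rotates the object so its axis follows the path's final direction and moves the axis base onto the endpoint.

```python
import math

def attach_arrowhead(points, axis_base, axis_tip, path_prev, path_end, threshold):
    """Snap a symmetric object onto the end of a path segment.

    points: outline of the digital object; axis_base/axis_tip: two points on its
    axis of symmetry (assumed to come from a separate detection step);
    path_prev/path_end: last two points of the path segment.
    Returns the transformed outline, or None if the object is too far away.
    """
    if min(math.dist(p, path_end) for p in points) > threshold:
        return None
    # rotate so the symmetry axis points along the path's final direction
    axis_angle = math.atan2(axis_tip[1] - axis_base[1], axis_tip[0] - axis_base[0])
    path_angle = math.atan2(path_end[1] - path_prev[1], path_end[0] - path_prev[0])
    theta = path_angle - axis_angle
    c, s = math.cos(theta), math.sin(theta)
    attached = []
    for x, y in points:
        dx, dy = x - axis_base[0], y - axis_base[1]
        # rotate about axis_base, then translate axis_base onto the path endpoint
        attached.append((path_end[0] + c * dx - s * dy, path_end[1] + s * dx + c * dy))
    return attached
```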
41 - Education, entertainment, sporting and cultural activities
Goods and services
Educational and training services; educational and training services in the form of classroom training, online training, web based training, and video training in the fields of computer software, cloud computing, desktop publishing, digital publishing, electronic publishing, graphic design, marketing, advertising, analytics, e-commerce, digital asset management, data management, business management, business process management, business document and forms creation, and automation of business document and forms processing and workflow; educational services; educational services in the form of arranging professional workshops and training courses, conducting classes, seminars, conferences, and workshops in the fields of computer software, cloud computing, desktop publishing, digital publishing, electronic publishing, graphic design, marketing, advertising, analytics, e-commerce, digital asset management, data management, business management, business process management, business document and forms creation, and automation of business document and forms processing and workflow; educational and training sessions in the field of organization and business matters relating to creative professionals.
Embodiments are disclosed for using a neural network to optimize filter weights of an adaptive filter. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving, by a filter, an input audio signal, wherein the input audio signal is a far-end audio signal and the filter includes a transfer function with adaptable filter weights; generating a response audio signal modeling the input audio signal passing through an acoustic environment; receiving a target response signal that includes the input audio signal and near-end audio signals; calculating an adaptive filter loss; generating, by a trained recurrent neural network, a filter weight update using the calculated adaptive filter loss; updating the adaptable filter weights of the transfer function to create an updated transfer function; generating an updated response audio signal based on the updated transfer function; and providing the updated response audio signal as an output audio signal.
G10L 21/0224 - Processing in the time domain
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of the groups, characterized by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of the groups, characterized by the analysis technique using neural networks
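The adaptive-filter loop can be sketched as follows. The sketch substitutes a classical normalized LMS (NLMS) step for the trained recurrent neural network that the abstract uses to produce the weight update; the update is passed in as a function so that a learned model could be swapped in. `nlms_update`, `run_adaptive_filter`, and the step size `mu` are illustrative assumptions.

```python
def nlms_update(weights, x_buf, error, mu=0.5, eps=1e-8):
    """Normalized LMS step. Stand-in for the trained recurrent network that
    the abstract describes as producing the filter weight update."""
    power = sum(x * x for x in x_buf) + eps
    return [w + mu * error * x / power for w, x in zip(weights, x_buf)]

def run_adaptive_filter(far_end, target, num_taps=4, update_fn=nlms_update):
    """far_end: input (far-end) samples; target: signal containing the
    far-end signal after the acoustic path plus any near-end audio."""
    weights = [0.0] * num_taps
    x_buf = [0.0] * num_taps
    responses = []
    for x, d in zip(far_end, target):
        x_buf = [x] + x_buf[:-1]                          # newest sample first
        y = sum(w * xb for w, xb in zip(weights, x_buf))  # modeled response
        e = d - y                                         # error; e**2 is the loss
        weights = update_fn(weights, x_buf, e)            # pluggable weight update
        # updated response computed with the updated weights
        responses.append(sum(w * xb for w, xb in zip(weights, x_buf)))
    return weights, responses
```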
82.
EDITING DIGITAL IMAGES WITH LOCAL REFINEMENT VIA SELECTIVE FEATURE TRIMMING
Methods, systems, and non-transitory computer readable storage media are disclosed for modifying digital images via a generative neural network with local refinement. The disclosed system generates, utilizing an encoder neural network, a latent feature vector of a digital image by encoding global context information of the digital image into the latent feature vector. The disclosed system also determines a modified latent feature vector by trimming the latent feature vector to a feature subset corresponding to a masked portion of the digital image. Additionally, the disclosed system generates, utilizing a generative decoder neural network on the modified latent feature vector, digital image data corresponding to the masked portion of the digital image. The disclosed system also generates a modified digital image including the digital image data corresponding to the masked portion combined with additional portions of the digital image.
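A sketch of the trimming step, assuming the latent features are laid out on a spatial grid downsampled from the image by a fixed stride (an assumption; the abstract does not specify the layout). Cells whose image patch touches the mask are kept as the feature subset passed to the generative decoder. `trim_latent` is a hypothetical name.

```python
def trim_latent(latent_grid, mask, stride):
    """latent_grid: rows x cols of feature vectors on a grid downsampled by
    `stride` from the image; mask: image-resolution rows of 0/1, where 1 marks
    the region to regenerate. Returns {(row, col): feature} for grid cells
    whose image patch overlaps the mask."""
    trimmed = {}
    for i, row in enumerate(latent_grid):
        for j, feature in enumerate(row):
            patch = (mask[y][x]
                     for y in range(i * stride, (i + 1) * stride)
                     for x in range(j * stride, (j + 1) * stride))
            if any(patch):
                trimmed[(i, j)] = feature
    return trimmed
```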
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating digital posters from digital documents with multimodal content using a deep submodular function. Specifically, the disclosed systems generate embedding vectors representing multimodal content of a digital document comprising text and images. Further, the disclosed systems determine, utilizing a deep submodular function on the embedding vectors, a content subset comprising one or more digital images aligned with one or more text segments representative of the digital document. Moreover, the disclosed systems generate, utilizing a large language model, a summary of the multimodal content of the digital document from a prompt based on the content subset. Additionally, the disclosed systems generate, for display at a client device, a digital poster comprising the summary of the multimodal content generated via the large language model.
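The deep submodular function in the abstract is learned; as a stand-in, this sketch greedily maximizes facility location, a classic hand-crafted submodular coverage function, over content embeddings. Greedy selection is the standard way to optimize a monotone submodular objective under a size budget. `cosine` and `greedy_facility_location` are hypothetical names, and the embeddings are assumed to be plain lists of floats.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_facility_location(embeddings, budget):
    """Greedily maximize F(S) = sum_i max_{j in S} sim(i, j), so each pick
    adds the most new coverage of the whole document."""
    n = len(embeddings)
    sim = [[max(cosine(a, b), 0.0) for b in embeddings] for a in embeddings]
    coverage = [0.0] * n  # best similarity of each item to the selected set
    selected = []
    for _ in range(min(budget, n)):
        best, best_gain = None, -1.0
        for c in range(n):
            if c in selected:
                continue
            gain = sum(max(sim[i][c] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
        coverage = [max(coverage[i], sim[i][best]) for i in range(n)]
    return selected
```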
A method, apparatus, non-transitory computer readable medium, and system for image generation includes obtaining an input image and an input prompt, where the input image depicts an object and the input prompt describes a lighting condition for the object, generating relighted image features based on the input image and the input prompt, where the relighted image features represent the object with the lighting condition, and generating a synthetic image based on the relighted image features, where the synthetic image depicts the object with the lighting condition.
Embodiments are disclosed for using the curvature of text lines to detect a document boundary. The method may include receiving a warped image depicting a page of a document having an incomplete document boundary, the page including a plurality of text lines. A complete document boundary may be identified based on the incomplete document boundary and the plurality of text lines. A dewarped image corresponding to the warped image may be determined using the complete document boundary. The dewarped image may then be provided for display on a client device.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input prompt describing an image element, generating, using an image generation model, an output image depicting the image element and including a watermark, and identifying a training image as a source of the output image based on the watermark. The image generation model is trained using the training image, which includes the image element and the watermark.
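Attribution from a detected watermark can be sketched as a nearest-match lookup, assuming each training image carries a distinct bit-string watermark and that a separate (hypothetical) decoder has already recovered bits from the output image. `attribute_source`, the registry format, and `max_distance` are assumptions for illustration.

```python
def hamming(a, b):
    if len(a) != len(b):
        raise ValueError("watermarks must have equal length")
    return sum(x != y for x, y in zip(a, b))

def attribute_source(decoded_bits, registry, max_distance=2):
    """Return the id of the training image whose watermark is closest to the
    decoded bits, or None if nothing is within `max_distance` bit flips."""
    best_id, best_dist = None, None
    for image_id, code in registry.items():
        dist = hamming(decoded_bits, code)
        if best_dist is None or dist < best_dist:
            best_id, best_dist = image_id, dist
    if best_dist is not None and best_dist <= max_distance:
        return best_id
    return None
```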
In one aspect, a computer-implemented method includes accessing, by a guidance module of an analysis application executing on a processor, wildcard data associated with data in a data repository. The method further includes displaying, by the guidance module based on the wildcard data, one or more wildcard elements in a graphical user interface (GUI). The method further includes receiving, by the analysis application, selection of a first wildcard element of the one or more wildcard elements. The method further includes displaying, by the guidance module, a suggestion based on the selection of the first wildcard element.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating digital images via a generative neural network with localized constraints. The disclosed system generates, utilizing one or more encoder neural networks, a sequence of embeddings comprising a prompt embedding representing a text prompt and an object text embedding representing a phrase indicating an object in the text prompt. The disclosed system generates, utilizing the one or more encoder neural networks, a visual embedding representing an object image corresponding to the object. The disclosed system determines a modified sequence of embeddings by replacing the object text embedding with the visual embedding in the sequence of embeddings. The disclosed system also generates, utilizing a generative neural network, a synthetic digital image from the modified sequence of embeddings comprising the visual embedding.
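Replacing the object phrase with the visual embedding amounts to splicing a sequence. This sketch assumes one embedding per prompt token and that the object phrase's token embeddings collapse into a single visual embedding; `find_phrase` and `replace_object_embedding` are hypothetical names, and real encoders operate on tensors rather than lists.

```python
def find_phrase(tokens, phrase):
    """Return (start, end) of the first occurrence of `phrase` in `tokens`."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i, i + n
    raise ValueError("object phrase not found in prompt")

def replace_object_embedding(tokens, embeddings, phrase, visual_embedding):
    """Swap the object phrase's text embeddings for one visual embedding."""
    start, end = find_phrase(tokens, phrase)
    return embeddings[:start] + [visual_embedding] + embeddings[end:]
```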
The present disclosure relates to systems, methods, and non-transitory computer-readable media that selectively utilize an image super-resolution model to upscale image patches corresponding to high-frequency portions. In particular, the disclosed systems select a set of image patches corresponding to high-frequency portions of a digital image at a first resolution. Furthermore, the disclosed systems utilize an image super-resolution model to upscale the set of image patches of the high-frequency portions to a second resolution that is higher than the first resolution by an upscaling factor of at least two. The disclosed systems generate a segmentation map of the digital image based on the upscaled image patches and an upscaled segmentation corresponding to low-frequency portions of the digital image. Further, the disclosed systems generate a vectorized digital image for the digital image according to the segmentation map.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
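One way to pick high-frequency patches is to score each patch by the mean squared response of a discrete Laplacian and keep those above a threshold. The abstract does not say how high-frequency portions are detected, so this criterion, the patch size, and the threshold are assumptions; `select_high_frequency_patches` is a hypothetical name.

```python
def laplacian_energy(img, y0, x0, size):
    """Mean squared 4-neighbour Laplacian over one patch, skipping image-border pixels."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(max(y0, 1), min(y0 + size, h - 1)):
        for x in range(max(x0, 1), min(x0 + size, w - 1)):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                   + img[y][x + 1] - 4 * img[y][x])
            total += lap * lap
            count += 1
    return total / count if count else 0.0

def select_high_frequency_patches(img, size, threshold):
    """Return top-left corners of patches whose Laplacian energy exceeds threshold."""
    h, w = len(img), len(img[0])
    return [(y, x)
            for y in range(0, h, size)
            for x in range(0, w, size)
            if laplacian_energy(img, y, x, size) > threshold]
```

Only the selected patches would then go to the super-resolution model; the remaining low-frequency areas can be upscaled cheaply.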
Three-dimensional-aware video compositing techniques are described. In one or more examples, subject data defining a subject depicted in frames of a subject video is produced, together with viewpoint data describing movement of a viewpoint with respect to the frames of the subject video. Three-dimensional data is formed that defines a three-dimensional representation of an environment depicted in frames of an environment video. A composited video is generated by aligning the environment with the movement of the viewpoint of the subject based on the subject data and the three-dimensional data, and the composited video is then rendered, e.g., presented for display in a user interface.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a structural input indicating a target spatial structure, encoding, using a condition encoder, the structural input to obtain a structural encoding representing the target spatial structure, and generating, using an image generation model, a synthetic image based on the structural encoding, where the synthetic image depicts an object having the target spatial structure.
A method, apparatus, non-transitory computer readable medium, and system for text-to-color palette generation include encoding a text prompt to obtain a text embedding. A color embedding is generated based on the text embedding by performing a diffusion process. Then a color palette is generated based on the color embedding. The color palette includes a plurality of colors corresponding to the text prompt.
Methods, non-transitory computer readable media, apparatuses, and systems for data processing include obtaining, by a machine learning model, a user cluster and interaction data for users in the user cluster, where the interaction data relates to interactions between the users and a digital platform. Some embodiments further include generating, by the machine learning model, a directed graph based on the user cluster and the interaction data, where the directed graph represents causal relationships among the interactions. Some embodiments further include updating, by the machine learning model, the user cluster based on the directed graph. Some embodiments further include providing, by a content component, customized content to a user via the digital platform based on the updated user cluster.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for modifying a digital design by performing a selective object-level undo operation. In one or more embodiments, the disclosed systems generate a modified object by performing a series of operations on an object depicted within the digital design. In some embodiments, the disclosed systems receive a request for a selective object-level undo operation on the modified object, wherein the request specifies an operation to undo from among the series of operations performed on the object. In one or more embodiments, the disclosed systems modify the modified object by performing the selective object-level undo operation on the modified object to undo the operation from among the series of operations. In some embodiments, the disclosed systems provide an updated digital design depicting the modified object reflecting modifications from the series of operations excluding the operation undone by the selective object-level undo operation.
G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behavior or appearance
G06F 3/04842 - Selection of displayed objects or displayed text elements
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range, for image manipulation, e.g. dragging, rotation, expansion or change of color
G06F 30/12 - Geometric CAD characterized by design entry means specially adapted for CAD, e.g. graphical user interfaces [GUI] specially adapted therefor
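A selective object-level undo can be sketched by keeping a per-object history of replayable operations and re-applying that history with one entry removed. This assumes operations are pure functions of the object state, which the abstract does not state; `DesignObject` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Operation = Callable[[Dict], Dict]

@dataclass
class DesignObject:
    base: Dict
    history: List[Tuple[str, Operation]] = field(default_factory=list)

    def apply(self, name: str, op: Operation) -> None:
        # record the operation; state is derived by replaying history
        self.history.append((name, op))

    def state(self) -> Dict:
        # replay the full operation history from the base state
        current = dict(self.base)
        for _name, op in self.history:
            current = op(current)
        return current

    def selective_undo(self, index: int) -> None:
        # drop one operation; later operations stay and are replayed without it
        del self.history[index]
```

Replaying, rather than storing snapshots, lets a middle operation be removed while every later edit to the same object is kept.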
Digital image visual aesthetic score generation techniques are described. In one or more examples, these techniques are implemented by a system including a training data collection module implemented by a processing device to collect training data including training digital images and user interaction data describing user interaction with the training digital images, respectively. A training module is configured to train a machine-learning model using the training data to generate an aesthetic score based on an input digital image. The aesthetic score is configured to specify an amount of visual aesthetics exhibited by the input digital image.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 20/70 - Labeling scene content, e.g. deriving syntactic or semantic representations
96.
CONTROLLABLE VISUAL TEXT GENERATION WITH ADAPTER-ENHANCED DIFFUSION MODELS
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text content image and a text style image. The text content image is encoded to obtain content guidance information and the text style image is encoded to obtain style guidance information. Then a synthesized image is generated based on the content guidance information and the style guidance information. The synthesized image includes text from the text content image having a text style from the text style image.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that iteratively generate, utilizing a machine learning model, text responses to reduce hallucinated content. In particular, in some embodiments, the disclosed systems receive a digital query and select one or more supporting digital documents for the digital query. Furthermore, in some embodiments, the disclosed systems generate a first text response from a first text prompt generated using the digital query. Moreover, in some embodiments, the disclosed systems extract a misalignment portion of the first text response by comparing the first text response and the one or more supporting digital documents. Additionally, the disclosed systems generate a second text response from the misalignment portion of the first text response and the digital query.
G06F 16/383 - Retrieval characterized by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
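A crude version of the misalignment check flags response sentences whose words are poorly covered by the supporting documents, then builds a second-pass prompt from them. Lexical overlap is a stand-in assumption (the abstract does not say how the comparison is made, and a model-based entailment check would be more robust); `misaligned_sentences`, `build_revision_prompt`, and `min_support` are hypothetical.

```python
import re

def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def misaligned_sentences(response, documents, min_support=0.5):
    """Return sentences of `response` whose tokens are mostly absent from the documents."""
    doc_tokens = set()
    for doc in documents:
        doc_tokens |= _tokens(doc)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    flagged = []
    for sentence in sentences:
        words = _tokens(sentence)
        if words and len(words & doc_tokens) / len(words) < min_support:
            flagged.append(sentence)
    return flagged

def build_revision_prompt(query, flagged):
    """Second-pass prompt asking the model to revise the unsupported parts."""
    issues = "\n".join(f"- {s}" for s in flagged)
    return (f"Question: {query}\n"
            f"These statements are not supported by the sources:\n{issues}\n"
            "Rewrite the answer using only supported information.")
```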
A method, apparatus, non-transitory computer readable medium, and system for generating suggested prompts include obtaining a sequence of text prompts associated with a user and determining a session concept for the user based on the sequence of text prompts. Embodiments then generate, using a prompt generation model, an image generation prompt based on the sequence of text prompts and the session concept. Subsequently, embodiments generate, using an image generation model, a synthetic image based on the image generation prompt.
09 - Scientific and electrical apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods and services
Downloadable software using artificial intelligence for collecting, compiling, converting, organizing, consolidating, collaborating on, sharing, and editing files, links, notes, and documents and for creating an information hub; downloadable assistant and chatbot software using artificial intelligence for preparing insights, notes, and citations based on document content and user input and for collaborating on or sharing the same with other users; downloadable software using artificial intelligence for content generation and management; software as a service (SAAS) services featuring software using artificial intelligence for collecting, compiling, converting, organizing, consolidating, collaborating on, sharing, and editing files, links, notes, and documents and for creating an information hub; software as a service (SAAS) services featuring assistant and chatbot software using artificial intelligence for preparing insights, notes, and citations based on document content and user input and for collaborating on or sharing the same with other users; software as a service (SAAS) services featuring software using artificial intelligence for content generation and management
100.
COMPLETING TEMPORAL KNOWLEDGE GRAPHS BASED ON ENHANCED ENTITY REPRESENTATION AND WEIGHTED FREQUENCY-BASED SAMPLING
The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate predicted relationships for entities of a temporal knowledge graph using enhanced entity representations. For instance, in one or more embodiments, the disclosed systems generate a query for predicting a relationship for a subject entity represented within a temporal knowledge graph. The disclosed systems further determine an enhanced entity representation generated for the subject entity by an enhancement layer of a temporal knowledge graph completion model, the enhanced entity representation including a combination of a connection-based similarity for the subject entity and a relationship-based similarity for the subject entity. Using the temporal knowledge graph completion model and based on the enhanced entity representation of the subject entity, the disclosed systems generate a predicted relationship for the subject entity.
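The two similarity signals can be sketched with Jaccard overlap over an entity's neighbors (connection-based) and over the relation types it participates in (relationship-based), blended by a weight `alpha`. Jaccard, the blending rule, and the function names are assumptions; the abstract does not define how the enhancement layer computes or combines them.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def entity_profiles(facts):
    """facts: (subject, relation, object, timestamp) quadruples.
    Relations are recorded for both endpoints, ignoring direction."""
    neighbors, relations = {}, {}
    for s, r, o, _t in facts:
        for entity, other in ((s, o), (o, s)):
            neighbors.setdefault(entity, set()).add(other)
            relations.setdefault(entity, set()).add(r)
    return neighbors, relations

def combined_similarity(facts, a, b, alpha=0.5):
    """alpha * connection-based similarity + (1 - alpha) * relationship-based similarity."""
    neighbors, relations = entity_profiles(facts)
    connection_sim = jaccard(neighbors.get(a, set()), neighbors.get(b, set()))
    relation_sim = jaccard(relations.get(a, set()), relations.get(b, set()))
    return alpha * connection_sim + (1 - alpha) * relation_sim
```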