Systems and methods for generating hierarchical queries from text queries are described. Embodiments are configured to encode a text query to obtain a text embedding. Then, embodiments select a field of a data schema by comparing the text embedding to a field embedding corresponding to the field. Subsequently, embodiments generate a hierarchical query including a value corresponding to the selected field. Some embodiments further include one or more formatting models configured to format values included in the text query.
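A minimal sketch of the field-selection and query-construction steps described above, assuming a generic embedding comparison by cosine similarity; the encoder outputs, schema field names, and nested-query shape are hypothetical placeholders rather than the disclosed implementation.

```python
# Illustrative sketch only: select the schema field whose embedding is most
# similar to the text-query embedding, then emit a nested (hierarchical) query.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_field(text_embedding: np.ndarray,
                 field_embeddings: dict[str, np.ndarray]) -> str:
    # Compare the query embedding against each field embedding and keep the best match.
    return max(field_embeddings,
               key=lambda field: cosine_similarity(text_embedding, field_embeddings[field]))

def build_hierarchical_query(field_path: str, value: str) -> dict:
    # Expand a dotted field path such as "customer.address.city" into nested clauses.
    query: dict = {"value": value}
    for part in reversed(field_path.split(".")):
        query = {part: query}
    return query

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fields = {"customer.address.city": rng.normal(size=8),
              "order.total": rng.normal(size=8)}
    text_emb = fields["customer.address.city"] + 0.01 * rng.normal(size=8)
    field = select_field(text_emb, fields)
    print(build_hierarchical_query(field, "Seattle"))
```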
Systems and methods for upsampling low-resolution content within a high-resolution image include obtaining a composite image and a mask. The composite image includes a high-resolution region and a low-resolution region. An upsampling network identifies the low-resolution region of the composite image based on the mask and generates an upsampled composite image based on the composite image and the mask. The upsampled composite image comprises higher frequency details in the low-resolution region than the composite image.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining, via a user interface, an input image and a user input that indicates a frame for modifying the input image, the frame including a first region inside of the input image and a second region outside of the input image and excluding a third region inside of the input image. A modified image is generated using an image generation model. The modified image includes original content from the input image in the first region and generated content in the second region, and excludes content from the input image in the third region. The modified image is presented for display in the user interface.
G06T 11/60 - Editing figures and text; Combining figures or text
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06T 3/60 - Rotation of whole images or parts thereof
4.
USING GENERATIVE ARTIFICIAL INTELLIGENCE TO EVALUATE FINE-TUNED LANGUAGE MODELS
Methods and systems are provided for using generative artificial intelligence to evaluate fine-tuned language models. In embodiments described herein, natural language text snippets are generated via a generative language model based on corresponding data. A language model is fine-tuned into a fine-tuned language model via a language model fine-tuning component using the natural language text snippets and the corresponding data as training data. Independent natural language text snippets are generated via the generative language model based on the corresponding data. Each independent natural language text snippet is different than each corresponding natural language text snippet. An evaluation metric of the fine-tuned language model is generated via an evaluation component based on the independent natural language text snippets and the corresponding data.
Embodiments are disclosed for generating a rendering output using a hybrid stochastic layered alpha blending technique. In particular, in one or more embodiments, the method may include receiving a plurality of fragments for rendering into a composited output. The method may further include storing a set of fragments of the plurality of fragments for each pixel in a fragment buffer up to a per pixel fragment buffer limit. For each received fragment of the plurality of fragments in excess of the fragment buffer limit for the fragment buffer, a modified set of fragments is generated by probabilistically replacing a selected fragment in the fragment buffer with the received fragment. The fragments remaining in the fragment buffer after processing the plurality of fragments form a blending set of fragments. A composited output for the pixel is then rendered by blending the blending set of fragments.
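A simplified, single-pixel sketch of the stochastic fragment-buffer idea above: at most K fragments are kept, excess fragments probabilistically replace a stored one (a reservoir-style rule is assumed here), and the surviving set is alpha-blended front to back. The fragment fields and replacement probability are illustrative assumptions, not the disclosed technique's exact form.

```python
import random
from dataclasses import dataclass

@dataclass
class Fragment:
    depth: float
    color: tuple[float, float, float]
    alpha: float

def process_fragments(fragments: list[Fragment], limit: int, seed: int = 0) -> list[Fragment]:
    random.seed(seed)
    buffer: list[Fragment] = []
    seen = 0
    for frag in fragments:
        seen += 1
        if len(buffer) < limit:
            buffer.append(frag)          # buffer not yet full: always keep
        else:
            j = random.randrange(seen)   # reservoir sampling: keep with prob limit/seen
            if j < limit:
                buffer[j] = frag         # probabilistically replace a stored fragment
    return buffer

def blend(buffer: list[Fragment]) -> tuple[float, float, float]:
    # Front-to-back "over" compositing of the surviving (blending) set.
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for frag in sorted(buffer, key=lambda f: f.depth):
        for c in range(3):
            color[c] += transmittance * frag.alpha * frag.color[c]
        transmittance *= (1.0 - frag.alpha)
    return tuple(color)

if __name__ == "__main__":
    frags = [Fragment(d / 10.0, (d / 10.0, 0.5, 1.0 - d / 10.0), 0.4) for d in range(10)]
    print(blend(process_fragments(frags, limit=4)))
```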
In implementation of techniques for vector font generation based on cascaded diffusion, a computing device implements a glyph generation system to receive a sample glyph in a target font and a target glyph identifier. The glyph generation system generates a rasterized glyph in the target font using a raster diffusion model based on the sample glyph and the target glyph identifier, the rasterized glyph having a first level of resolution. The glyph generation system then generates a vector glyph using a vector diffusion model by vectorizing the rasterized glyph, the vector glyph having a second level of resolution different than the first level of resolution. The glyph generation system then displays the vector glyph in a user interface.
Various disclosed embodiments are directed to deriving, via a language model, a summary of data by converting or encoding table data into one or more natural language sentences, which are then used as input to the language model for generating the summary. One or more embodiments are additionally or alternatively directed to deriving, via a language model, a response to a user question or command via a chat interface by providing the language model with the generated summary as input. In this way, for example, the language model can use the summary as a prompt or other target context for providing a response.
The present disclosure relates to systems, methods, and non-transitory computer readable media that recommend editing presets based on editing intent. For instance, in one or more embodiments, the disclosed systems receive, from a client device, a user query corresponding to a digital image to be edited. The disclosed systems extract, from the user query, an editing intent for editing the digital image. Further, the disclosed systems determine an editing preset that corresponds to the editing intent based on an editing state of an edited digital image associated with the editing preset. The disclosed systems generate a recommendation for the editing preset for provision to the client device.
In implementation of techniques for removing image overlays, a computing device implements a reflection removal system to receive an input RAW digital image, the input RAW digital image including both a base image and an overlay image. Using a machine learning model, the reflection removal system segments the base image from the overlay image. The reflection removal system generates an output RAW digital image that includes the base image and displays the output RAW digital image in a user interface.
A high dynamic range editing system is configured to generate visualizations to aid digital image editing in both high dynamic ranges and standard dynamic ranges. In a first example, the visualization is generated as a histogram. In a second example, the visualization is generated to indicate high dynamic range capabilities. In a third example, the visualization is generated to indicate ranges of luminance values within a digital image. In a fourth example, the visualization is generated as a point curve that defines a mapping between detected luminance values from a digital image and output luminance values over both a standard dynamic range and a high dynamic range. In a fifth example, the visualization is generated as a preview to convert pixels from the digital image in a high dynamic range into a standard dynamic range.
Methods, computer systems, computer-storage media, and graphical user interfaces are provided for efficiently generating video insights based on text representations of videos. In embodiments, text data associated with a video is obtained. Thereafter, a model prompt to be input into a large language model is generated. The model prompt includes the text data associated with the video. As output from the large language model, a text representation that represents the video in natural language based on the text data is obtained. The text representation is provided as input into a machine learning model to generate a video insight that indicates context of the video.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
An example vector path trajectory imitation system is configured to create a new vector path or to extend an existing vector path based on a reference. In this manner, a user (e.g., artist, illustrator, or designer) does not need to tweak individual anchor points to align a trajectory of the new vector path with the trajectory of the reference. Instead, the user moves a position indicator (e.g., a mouse cursor) on a digital canvas in a freehand fashion while the vector path trajectory imitation system provides visual feedback to show the user how a resultant curve will look. When the user reaches a position on the digital canvas where a new vector path is to be drawn, the user can perform an action (e.g., releasing a mouse button) and the new vector path, which follows the trajectory of the reference, is created.
G06T 11/20 - Drawing from basic elements, e.g. lines or circles
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
13.
ENCODING IMAGE VALUES THROUGH ATTRIBUTE CONDITIONING
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text prompt and a conditioning attribute. The text prompt is encoded to obtain a text embedding. The conditioning attribute is encoded to obtain an attribute embedding. Then a synthesized image is generated using an image generation model based on the text embedding and the attribute embedding. The synthesized image has the conditioning attribute and depicts an element of the text prompt.
A method, apparatus, non-transitory computer readable medium, and system for media processing include obtaining a text prompt and a style input, where the text prompt describes image content and the style input describes an image style, generating a text embedding based on the text prompt, where the text embedding represents the image content, generating a style embedding based on the style input, where the style embedding represents the image style, and generating, using an image generation model, a synthetic image based on the text embedding and the style embedding, where the text embedding is provided to the image generation model at a first step and the style embedding is provided to the image generation model at a second step after the first step.
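A minimal sketch of the staged-conditioning idea described above: early steps of an iterative generation loop are guided by the text (content) embedding and later steps by the style embedding. The denoiser below is a stand-in function, not an actual diffusion model, and the step counts are arbitrary.

```python
import numpy as np

def stub_denoiser(x: np.ndarray, t: int, cond: np.ndarray) -> np.ndarray:
    # Placeholder "model": nudge the sample toward the conditioning vector.
    return x + 0.1 * (cond - x)

def sample(text_emb: np.ndarray, style_emb: np.ndarray,
           steps: int = 50, switch_step: int = 30, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    x = rng.normal(size=text_emb.shape)                      # start from noise
    for t in range(steps):
        cond = text_emb if t < switch_step else style_emb    # swap conditioning mid-run
        x = stub_denoiser(x, t, cond)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    out = sample(rng.normal(size=16), rng.normal(size=16))
    print(out.round(2))
```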
A method, apparatus, non-transitory computer readable medium, and system for media processing includes obtaining a variation parameter and a number of variations, identifying a first variation input and a second variation input for the variation parameter, and obtaining a first media content item and a second media content item based on the first variation input and the second variation input, respectively. The first media content item and the second media content item vary from each other with respect to the variation parameter. The method, apparatus, non-transitory computer readable medium, and system for media processing further includes displaying the first media content item and the second media content item in a grid comprising a grid size based on the number of variations.
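An illustrative helper for the "grid size based on the number of variations" step above, assuming a near-square layout; the exact layout rule used by the described system is not specified here.

```python
import math

def grid_shape(num_variations: int) -> tuple[int, int]:
    # Choose a near-square grid: columns = ceil(sqrt(N)), rows to fit the rest.
    cols = math.ceil(math.sqrt(num_variations))
    rows = math.ceil(num_variations / cols)
    return rows, cols

if __name__ == "__main__":
    for n in (2, 4, 6, 9):
        print(n, grid_shape(n))
```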
A method, apparatus, non-transitory computer readable medium, and system for generating synthetic images depicting an image element with a target composition include obtaining a content input and a composition input. The content input indicates an image element and the composition input indicates a target composition of the image element. Embodiments then encode the composition input to obtain a composition embedding representing the target composition. Subsequently, embodiments generate, using an image generation model, a synthetic image based on the content input and the composition embedding. The synthetic image depicts the image element with the target composition.
Systems and methods employ generative models to generate structured brand data and/or brand-aligned marketing content. In accordance with some aspects, brand source data for an entity is accessed. A first generative model generates structured brand data using the brand source data, wherein the structured brand data is generated to include a number of components. The first generative model or a second generative model generates brand-aligned marketing content using the structured brand data. Alignment scores are determined for the brand-aligned marketing content. Each alignment score corresponds to a component of the structured brand data. The brand-aligned marketing content and at least a portion of the alignment scores are provided for presentation.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining an input text prompt and an indication of a level of a target characteristic, where the target characteristic comprises a characteristic used to train an image generation model. Some embodiments generate an augmented text prompt comprising the input text prompt and an objective text corresponding to the level of the target characteristic. Some embodiments generate, using the image generation model, an image based on the augmented text prompt, where the image depicts content of the input text prompt and has the level of the target characteristic.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
19.
HIGH-RESOLUTION IMAGE GENERATION USING DIFFUSION MODELS
Methods, non-transitory computer readable media, apparatuses, and systems for high-resolution image generation using diffusion models include obtaining a prompt and generating, using a first diffusion model, a predicted denoised image at a first resolution based on the prompt. The predicted denoised image is generated at a first intermediate diffusion step of the first diffusion model. The predicted denoised image is upsampled to obtain an upsampled denoised image at a second resolution that is higher than the first resolution. A second diffusion model then generates an output image at the second resolution based on the prompt and the upsampled denoised image.
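A sketch of the cascaded handoff described above: a low-resolution model produces a predicted denoised image at an intermediate step, that prediction is upsampled, and a second model refines it at the higher resolution. Both "models" here are stand-in functions under assumed shapes; they only illustrate the data flow.

```python
import numpy as np

def stub_low_res_predict(noise: np.ndarray, steps: int) -> np.ndarray:
    # Stand-in for running the first diffusion model to an intermediate step
    # and returning its current prediction of the clean (denoised) image.
    return np.clip(noise * 0.1 + 0.5, 0.0, 1.0)

def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    # Simple nearest-neighbor upsampling of an H x W x C image.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def stub_high_res_refine(init: np.ndarray, steps: int) -> np.ndarray:
    # Stand-in for the second diffusion model, initialized from the upsampled prediction.
    return np.clip(init + 0.01, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = stub_low_res_predict(rng.normal(size=(64, 64, 3)), steps=20)
    high_init = upsample_nearest(low, factor=4)      # 64x64 -> 256x256
    out = stub_high_res_refine(high_init, steps=30)
    print(out.shape)
```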
A high dynamic range editing system is configured to generate visualizations to aid digital image editing in both high dynamic ranges and standard dynamic ranges. In a first example, the visualization is generated as a histogram. In a second example, the visualization is generated to indicate high dynamic range capabilities. In a third example, the visualization is generated to indicate ranges of luminance values within a digital image. In a fourth example, the visualization is generated as a point curve that defines a mapping between detected luminance values from a digital image and output luminance values over both a standard dynamic range and a high dynamic range. In a fifth example, the visualization is generated as a preview to convert pixels from the digital image in a high dynamic range into a standard dynamic range.
Methods, non-transitory computer readable media, apparatuses, and systems for image and depth map generation include receiving a prompt and encoding the prompt to obtain a guidance embedding. A machine learning model then generates, based on the guidance embedding, an image and a depth map corresponding to the image.
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
Embodiments for colorizing images, including pixel-format images and vector-format images, include obtaining input data including an outline image and a color hint. The color hint includes a colored portion corresponding to a region of the outline image. Then, embodiments process the input data using an outline encoder to obtain control guidance for an image generator. Embodiments generate a synthesized image based on the control guidance using the image generator. The synthesized image depicts an object having a shape based on the outline image and a color based on the color hint. In some cases, embodiments also transfer the colors from the synthesized image to a base vector image to produce a colorized vector image.
Three-dimensional object edit and visualization techniques and systems are described. In a first example, a content navigation control is implemented by a content editing system to aid navigation through a history of how a three-dimensional environment and a three-dimensional object included in the environment are created. In a second example, the content editing system is configured to streamline placement of a three-dimensional object within a three-dimensional environment. The content editing system, for instance, generates a manipulation visualization in support of corresponding editing operations to act as a guide, e.g., as an alignment guide or an option guide. In a third example, the content editing system implements a shadow control that is usable as part of an editing operation and as a visualization to control rendering of illumination within a three-dimensional environment.
G06T 19/00 - Manipulating 3D models or images for computer graphics
G06F 3/04815 - Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
Three-dimensional object edit and visualization techniques and systems are described. In a first example, a content navigation control is implemented by a content editing system to aid navigation through a history of how a three-dimensional environment and a three-dimensional object included in the environment are created. In a second example, the content editing system is configured to streamline placement of a three-dimensional object within a three-dimensional environment. The content editing system, for instance, generates a manipulation visualization in support of corresponding editing operations to act as a guide, e.g., as an alignment guide or an option guide. In a third example, the content editing system implements a shadow control that is usable as part of an editing operation and as a visualization to control rendering of illumination within a three-dimensional environment.
G06T 19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a sketch input depicting an object, processing the sketch input to obtain sketch guidance, and generating a synthesized image based on the sketch guidance using an image generation model, where the synthesized image depicts the object from the sketch input.
G06T 11/20 - Drawing from basic elements, e.g. lines or circles
G06F 3/04883 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
26.
VIDEO DIFFUSION USING SUPERPOSITION NETWORK ARCHITECTURE SEARCH
A method, apparatus, non-transitory computer readable medium, and system for video generation include first obtaining a training set including a training video. Then, embodiments initialize a video generation model, sample a subnet architecture from an architecture search space, and identify a subset of the weights of the video generation model based on the sampled subnet architecture. Subsequently, embodiments train, based on the training video, a subnet of the video generation model to generate synthetic video data. The subnet includes the identified subset of the weights of the video generation model.
G06T 3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
A method, apparatus, non-transitory computer readable medium, and system for image generation include encoding a text prompt to obtain a text embedding. An image prompt is encoded to obtain an image embedding. Cross-attention is performed on the text embedding and then on the image embedding to obtain a text attention output and an image attention output, respectively. A synthesized image is generated based on the text attention output and the image attention output.
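A sketch of performing cross-attention separately over a text embedding and an image-prompt embedding and then combining the two attention outputs, as the abstract above describes. The dimensions, module structure, and the simple additive combination are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualPromptCrossAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latent: torch.Tensor, text_emb: torch.Tensor,
                image_emb: torch.Tensor) -> torch.Tensor:
        text_out, _ = self.text_attn(latent, text_emb, text_emb)      # attend to text tokens
        image_out, _ = self.image_attn(latent, image_emb, image_emb)  # attend to image tokens
        return latent + text_out + image_out                          # combine both outputs

if __name__ == "__main__":
    block = DualPromptCrossAttention()
    latent = torch.randn(1, 16, 64)     # e.g. 16 spatial tokens of an image latent
    text_emb = torch.randn(1, 8, 64)    # 8 text-prompt tokens
    image_emb = torch.randn(1, 4, 64)   # 4 image-prompt tokens
    print(block(latent, text_emb, image_emb).shape)
```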
A method, apparatus, non-transitory computer readable medium, and system for generating synthetic videos include obtaining an input prompt describing a video scene. Embodiments then generate a plurality of frame-wise token embeddings corresponding to a sequence of video frames, respectively, based on the input prompt. Subsequently, embodiments generate, using a video generation model, a synthesized video depicting the video scene. The synthesized video includes a plurality of images corresponding to the sequence of video frames.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining, via a user interface, a reference image and generating, using an image generation model, a synthesized image based on the reference image. A layered image is generated including the synthesized image in a first layer of the layered image and the reference image in a second layer of the layered image. The layered image is then presented for display in the user interface.
A computing system receives a query for a three-dimensional representation of a target object. The query comprises input in the form of text describing the target object, a two-dimensional image of the target object, or a three-dimensional model of the target object. The computing system encodes the input using a machine learning model to generate an encoded representation of the input. The computing system searches a search space using nearest neighbors to identify a three-dimensional representation of the target object. The search space comprises encoded representations of multiple views of a plurality of sample three-dimensional object representations. The computing system outputs the identified three-dimensional representation of the target object.
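A sketch of the retrieval step described above: encode the query, then find the nearest encoded view among the sample objects and return the owning object. The encoder is replaced by placeholder vectors; a real system would use a learned multimodal encoder over text, images, or 3D models.

```python
import numpy as np

def nearest_object(query_vec: np.ndarray,
                   view_vecs: np.ndarray,
                   view_to_object: list[str]) -> str:
    # view_vecs: (num_views, dim) encodings of rendered views of sample 3D objects.
    dists = np.linalg.norm(view_vecs - query_vec, axis=1)
    return view_to_object[int(np.argmin(dists))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = rng.normal(size=(6, 32))
    owners = ["chair", "chair", "chair", "table", "table", "table"]
    query = views[4] + 0.05 * rng.normal(size=32)   # stands in for an encoded text/image query
    print(nearest_object(query, views, owners))
```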
Systems and methods for generating full designs from text include retrieving a plurality of document templates based on a design prompt, and filtering the document templates based on an image prompt. A document template is then selected based on the filtering, and a document is generated based on the document template and the image prompt. Embodiments are further configured to generate content from one or more of the prompts, where the content is included in the final design document.
A method, apparatus, and non-transitory computer readable medium for natural language processing are described. Embodiments of the present disclosure include obtaining a document comprising a first event mention and a second event mention. Some embodiments generate a dependency tree based on the document. The dependency tree is pruned by removing an irrelevant word to obtain a pruned dependency tree. Subevent relation information is generated for the first event mention and the second event mention based on the pruned dependency tree.
A method, apparatus, and non-transitory computer readable medium are described for obtaining an input image comprising a plurality of pixels. A machine learning model generates annotation information indicating whether each of the plurality of pixels is synthetically generated. A combined image is generated based on the annotation information. In some cases, the combined image shows a synthetically generated region of the input image.
In implementations of systems and procedures for generating surrogate curvatures for assisted vector drawings, a computing device implements acquisition of a target vector curve and compares a curvature of the target vector curve to a curvature of a reference vector curve. The computing device determines whether the curvature of the target vector curve is within a threshold tolerance of the curvature of the reference vector curve. An edited curvature of the target vector curve is generated based on the curvature of the reference vector curve.
Embodiments are disclosed for editing video using image diffusion. The method may include receiving an input video depicting a target and a prompt including an edit to be made to the target. A keyframe associated with the input video is then identified. The keyframe is edited, using a generative neural network, based on the prompt to generate an edited keyframe. A subsequent frame of the input video is edited using the generative neural network, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video.
A method, apparatus, non-transitory computer readable medium, and system for generating a design document from a text prompt include obtaining a design prompt that describes a document type and selecting a design template for the document type based on the design prompt. An image generation model generates an image for the design template based on the design prompt and a design document is generated based on the design template. The design document has the document type and includes the image at a location indicated by the design template.
Systems and methods for detecting object demands based on query images are provided. An image tagging module generates multiple image tags for a query image. An image content analyzer module analyzes the multiple image tags based on a knowledge graph associated with an online platform to create query feature data. A theme identification module identifies one or more query themes based on the aggregated query feature data. A demand analysis module generates demand data indicating user demand for an object corresponding to a query theme by comparing the query theme to catalog data of the online platform.
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/532 - Query formulation, e.g. graphical querying
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide self-supervised object discovery systems that combine motion and appearance information to generate segmentation masks from a digital image or digital video and delineate one or more salient objects within the digital image/digital video. The disclosed systems utilize a neural network encoder to generate a fully connected graph based on image patches from the digital input, incorporating image patch feature and optical flow patch feature similarities to produce edge weights. The disclosed systems partition the generated graph to produce a segmentation mask. Furthermore, the disclosed systems iteratively train a segmentation network based on the segmentation mask as a pseudo-ground truth via a bootstrapped, self-training process. By utilizing both motion and appearance information to generate a bi-partitioned graph, the disclosed systems produce high-quality object segmentation masks that represent a foreground and background of digital inputs.
In implementation of techniques for template-based behaviors in machine learning, a computing device implements a template system to receive a digital video and data executable to generate animated content. The template system determines a location within a frame of the digital video to place the animated content using a machine learning model. The template system then renders the animated content within the frame of the digital video at the location determined by the machine learning model. The template system then displays the rendered animated content within the frame of the digital video in a user interface.
Embodiments are disclosed for performing content authentication. A method of content authentication may include dividing a query video into a plurality of chunks. A feature vector may be generated, using a fingerprinting model, for each chunk from the plurality of chunks. Similar video chunks are identified from a trusted chunk database based on the feature vectors using a multi-chunk search policy. One or more original videos corresponding to the query video are then returned.
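A sketch of the chunk-level lookup described above: split a query video's chunk feature vectors, find the closest trusted chunk for each, and vote on the originating video. The fingerprinting model is replaced by placeholder vectors, and the simple majority vote is only one possible multi-chunk policy, assumed here for illustration.

```python
import numpy as np
from collections import Counter

def match_chunks(query_chunks: np.ndarray,
                 trusted_chunks: np.ndarray,
                 trusted_video_ids: list[str]) -> str:
    votes = []
    for q in query_chunks:
        dists = np.linalg.norm(trusted_chunks - q, axis=1)
        votes.append(trusted_video_ids[int(np.argmin(dists))])
    # Simple policy: the trusted video matched by the most chunks wins.
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trusted = rng.normal(size=(8, 16))               # fingerprints of trusted chunks
    ids = ["video_a"] * 4 + ["video_b"] * 4
    query = trusted[:3] + 0.02 * rng.normal(size=(3, 16))
    print(match_chunks(query, trusted, ids))
```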
A method, apparatus, non-transitory computer readable medium, and system for 3D model generation include obtaining a plurality of input images depicting an object and a set of 3D position embeddings, where each of the plurality of input images depicts the object from a different perspective, encoding the plurality of input images to obtain a plurality of 2D features corresponding to the plurality of input images, respectively, generating 3D features based on the plurality of 2D features and the set of 3D position embeddings, and generating a 3D model of the object based on the 3D features.
A system applies an object detection model to a mapped graphical spherical object including a 360 degree virtual background image mapped to a spherical graphical model to identify a light source within the mapped graphical spherical object. The system generates a light source mapped graphical spherical object by replacing, in the mapped graphical spherical object, the light source with a light source object associated with one or more light properties. The system extracts, during a video call, a human body object from a live video feed captured via a user computing device. The system generates, based on the light source mapped graphical spherical object and the human body object, a finalized spherical live image by applying one or more lighting and/or shadow effects caused by the light source object within the light source mapped graphical spherical object. The system generates, from the finalized spherical live image, a finalized rectangular live image.
System and methods for generating, validating, and augmenting question-answer pairs using generative AI are provided. An online interaction server accesses a set of digital content available at a set of designated network locations. The online interaction server further trains a pre-trained large language model (LLM) using the set of digital content to obtain a customized LLM. The online interaction server generates a set of question-answer pairs based on the set of digital content using the customized LLM and validates the set of question-answer pairs by determining if an answer in a question-answer pair is derived from the set of digital content. The online interaction server also selects a digital asset to augment an answer in a validated question-answer pair based on a semantic similarity between the validated question-answer pair and the digital asset.
Systems and methods are disclosed for reflowing documents to display semantically related content. Embodiments may include receiving a request to view a document that includes body text and one or more images. A trimodal document relationship model identifies relationships between segments of the body text and the one or more images. A linearized view of the document is generated based on the relationships and the linearized view is caused to be displayed on a user device.
In some embodiments, a contact stream is generated or modified based on configuration data received from a machine-learning model. Multiple contact items are selected for a contact stream, to be delivered to a user device via electronic communication channels. In addition, a success metric is identified indicating an engagement with the contact stream or an action performed following the engagement. A machine-learning model is applied to the contact items, where the machine-learning model is trained to identify relationships among actions in an online environment and configuration parameters that control delivery of contact streams. The machine-learning model provides an output indicating configuration data or a success probability for the contact stream. The configuration data includes configuration parameter values computed by the machine-learning model for achieving the identified success metric. The success probability indicates a probability computed by the machine-learning model for achieving the identified success metric.
Techniques for latent space based steganographic image generation are described. A processing device, for instance, receives a digital image and a secret that includes a bit string. A pretrained encoder of an autoencoder generates an embedding of the digital image that includes latent code. A secret encoder is trained and utilized to generate an embedding of the secret to act as a latent offset to the latent code. The processing device leverages a pretrained decoder of the autoencoder to generate a steganographic image based on the embedding of the secret and the embedding of the digital image. The steganographic image includes the secret and is visually indiscernible from the digital image. Further, the processing device is configured to recover the secret from the steganographic image, such as by training and leveraging a secret decoder to extract the secret.
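A sketch of the latent-offset idea described above: a secret encoder maps a bit string to a small offset that is added to the image's latent code before decoding. The modules here are untrained stand-ins for the pretrained autoencoder and the secret encoder/decoder, and the offset scale is an arbitrary assumption.

```python
import torch
import torch.nn as nn

class SecretEncoder(nn.Module):
    def __init__(self, secret_bits: int, latent_dim: int):
        super().__init__()
        self.net = nn.Linear(secret_bits, latent_dim)

    def forward(self, secret: torch.Tensor) -> torch.Tensor:
        return 0.01 * self.net(secret)   # small offset so the result stays visually close

if __name__ == "__main__":
    latent_dim, secret_bits = 128, 32
    image_latent = torch.randn(1, latent_dim)              # stand-in for encoder(image)
    secret = torch.randint(0, 2, (1, secret_bits)).float()
    offset = SecretEncoder(secret_bits, latent_dim)(secret)
    stego_latent = image_latent + offset                   # latent code carrying the secret
    # A pretrained decoder would turn stego_latent into a steganographic image,
    # and a trained secret decoder would recover the bits from that image.
    print(stego_latent.shape)
```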
A data insight generation system generates facts from a dataset. Importance scores are determined for the facts. Facts having the highest importance scores are generated for display at a user interface. A selection of a displayed fact is received. Based on the selection, dependent facts are generated by adding subspaces to the selected fact. The dependent facts are generated for display at the user interface.
This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that train a named entity recognition (NER) model with noisy training data through a self-cleaning discriminator model. For example, the disclosed systems utilize a self-cleaning guided denoising framework to improve NER learning on noisy training data via a guidance training set. In one or more implementations, the disclosed systems utilize, within the denoising framework, an auxiliary discriminator model to correct noise in the noisy training data while training an NER model through the noisy training data. For example, while training the NER model to predict labels from the noisy training data, the disclosed systems utilize a discriminator model to detect noisy NER labels and reweight the noisy NER labels provided for training in the NER model.
Embodiments of the present disclosure perform training attribution by identifying a synthesized image and a training image, where the synthesized image was generated by an image generation model that was trained with the training image. A machine learning model computes first attribution features for the synthesized image using a first mapping layer and second attribution features for the training image using a second mapping layer that is different from the first mapping layer. Then, an attribution score is generated based on the first attribution features and the second attribution features, where the attribution score indicates a degree of influence for the training image on generating the synthesized image.
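A sketch of the attribution scoring described above: separate mapping layers embed the synthesized-image features and the training-image features, and their similarity serves as the attribution score. The backbone features are faked, and cosine similarity is an assumed scoring choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributionHead(nn.Module):
    def __init__(self, feat_dim: int = 256, proj_dim: int = 64):
        super().__init__()
        self.synth_mapping = nn.Linear(feat_dim, proj_dim)   # first mapping layer
        self.train_mapping = nn.Linear(feat_dim, proj_dim)   # second, distinct mapping layer

    def forward(self, synth_feat: torch.Tensor, train_feat: torch.Tensor) -> torch.Tensor:
        a = F.normalize(self.synth_mapping(synth_feat), dim=-1)
        b = F.normalize(self.train_mapping(train_feat), dim=-1)
        return (a * b).sum(dim=-1)    # cosine similarity as the attribution score

if __name__ == "__main__":
    head = AttributionHead()
    synth_feat = torch.randn(1, 256)   # stand-in for features of the synthesized image
    train_feat = torch.randn(1, 256)   # stand-in for features of the training image
    print(head(synth_feat, train_feat).item())
```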
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06N 3/0895 - Weakly supervised learning, e.g. semi-supervised or self-supervised learning
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
50.
GENERATING NON-DESTRUCTIVE SYNTHETIC LENS BLUR WITH IN-FOCUS EDGE RENDERING
Methods, systems, and non-transitory computer readable storage media are disclosed for generating a lens blur effect in a digital image with in-focus edge rendering. The disclosed system generates a focal matte indicating an in-focus range of depth values of a digital image based on a focus region and a depth map of the digital image. The disclosed system generates a layered depth map comprising foreground depth values and background depth values of pixels across the digital image according to the depth map and the focal matte. The disclosed system also renders the digital image to include a lens blur effect by utilizing the focal matte and the layered depth map to determine a combination of the foreground depth values and the background depth values in connection with a splatting operation.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating a lens blur effect in a digital image with interactive depth map refinement. The disclosed system generates a fused depth map comprising a combination of foreground depth values and background depth values from a layered depth map of pixels of a digital image and a focal matte indicating an in-focus range of depth values of the digital image. The disclosed system generates modified depth values for one or more selected portions of the digital image by modifying the fused depth map, the foreground depth values, and the background depth values according to a selected focus range and a selected correction mode. The disclosed system also renders the digital image to include a lens blur effect utilizing the modified depth values of the one or more selected portions of the digital image.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating a lens blur effect in a digital image with interactive light source adjustment. The disclosed system determines a gradient mask by detecting edges of a luminance map comprising luminance values of pixels in a digital image. The disclosed system determines a highlight mask by thresholding the luminance map to determine a subset of pixels with luminance values meeting a threshold luminance. The disclosed system also generates a gradient-highlight mask including pixel values from a combination of the gradient mask and the highlight mask. The disclosed system further generates a highlight guide image comprising indications of one or more light sources in the digital image based on the gradient-highlight mask and the highlight mask.
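A sketch of the mask construction described above: edges of a luminance map give a gradient mask, thresholding gives a highlight mask, and the two are combined. The gradient operator, threshold value, and combination rule are illustrative choices, not the disclosed system's specifics.

```python
import numpy as np

def gradient_mask(luminance: np.ndarray) -> np.ndarray:
    # Edge strength from finite-difference gradients, normalized to [0, 1].
    gy, gx = np.gradient(luminance)
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)

def highlight_mask(luminance: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    # Pixels whose luminance meets the threshold are treated as highlights.
    return (luminance >= threshold).astype(float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lum = rng.random((32, 32))
    grad = gradient_mask(lum)
    high = highlight_mask(lum)
    gradient_highlight = np.clip(grad * high + high, 0.0, 1.0)  # simple combination
    print(gradient_highlight.shape, float(gradient_highlight.max()))
```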
Embodiments of the present disclosure include obtaining an input video depicting a change to an image, wherein the input video has a first aspect ratio. Some embodiments compute a cost function for a frame of the input video based on a location of the change. A modified frame corresponding to the frame of the input video is generated based on the cost function. In some examples, the modified frame has a second aspect ratio different from the first aspect ratio. Then, an output video including the modified frame is generated and the output video has the second aspect ratio.
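A sketch of one way a per-frame cost could drive the aspect-ratio change described above: candidate crop windows at the target aspect ratio are scored by their distance from the changed region, and the lowest-cost window is kept. The cost definition and window search are illustrative simplifications.

```python
import numpy as np

def best_crop_x(frame_w: int, frame_h: int, target_aspect: float,
                change_x: int) -> int:
    crop_w = int(round(frame_h * target_aspect))     # keep full height, crop width
    candidates = list(range(0, frame_w - crop_w + 1))
    costs = [abs((x + crop_w / 2) - change_x) for x in candidates]  # distance to the change
    return candidates[int(np.argmin(costs))]

if __name__ == "__main__":
    # 16:9 frame retargeted to 9:16, keeping a change located at x=1500 in view.
    x0 = best_crop_x(frame_w=1920, frame_h=1080, target_aspect=9 / 16, change_x=1500)
    print("crop starts at x =", x0)
```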
A method, apparatus, non-transitory computer readable medium, and system for training a text-guided vector image synthesis model include obtaining training data including a vectorizable image and a caption describing the vectorizable image and generating, using an image generation model, a predicted image with a first level of high frequency detail. Then, the training data and the predicted image are used to tune the image generation model to generate a synthetic vectorizable image based on the caption, where the synthetic vectorizable image has a second level of high frequency detail that is lower than the first level of high frequency detail of the predicted image.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
A method, apparatus, non-transitory computer readable medium, and system for generating images with a particular style that fit coherently into a scene includes obtaining a text prompt and a preliminary style image. The text prompt describes an image element, and the preliminary style image includes a region with a target style. Embodiments then extract the region with the target style from the preliminary style image to obtain a style image. Embodiments subsequently generate, using an image generation model, a synthetic image based on the text prompt and the style image. The synthetic image depicts the image element with the target style.
Embodiments are disclosed for recoloring a target graphic using color palettes generated using a stochastic color mapping process. One method of recoloring a target graphic using the stochastic color mapping process includes obtaining a target graphic to be recolored and a source color palette defining source colors for recoloring the target graphic. A target color set of target colors is extracted from the target graphic. The method includes computing a mapping from the source colors of the source color palette to the target colors of the target color set based on a transition probability. A destination color palette of destination colors is determined based on the mapping. The target graphic is modified by recoloring at least one object in the target graphic with a destination color from the destination color palette.
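A sketch of the stochastic mapping step described above: each source color is assigned to a target color by sampling from transition probabilities. The probability model here, a softmax over color distances with an assumed temperature, is an illustrative stand-in for the disclosed process.

```python
import numpy as np

def stochastic_mapping(source: np.ndarray, target: np.ndarray,
                       temperature: float = 0.1, seed: int = 0) -> list[int]:
    rng = np.random.default_rng(seed)
    assignment = []
    for s in source:
        dists = np.linalg.norm(target - s, axis=1)
        probs = np.exp(-dists / temperature)
        probs /= probs.sum()                      # transition probabilities
        assignment.append(int(rng.choice(len(target), p=probs)))
    return assignment

if __name__ == "__main__":
    source_palette = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])   # red, blue
    target_colors = np.array([[0.9, 0.1, 0.1], [0.1, 0.1, 0.9], [0.5, 0.5, 0.5]])
    print(stochastic_mapping(source_palette, target_colors))        # indices into target_colors
```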
Techniques for accessibility-enabled application switching are provided. In an example method, a processing device receives a status indication of a screen reader browsing content using a first application. The processing device receives a context switch indication from the screen reader, including a designation of a second application as a target application based on context switch accessibility code. The processing device generates a token comprising a client device identifier and identifiers of the source and target applications. The processing device then receives a status indication of the screen reader browsing content using the second application and a second context switch indication. The processing device accesses the token based on the client device identifier and determines the source application. The processing device updates the identifiers of the source and target applications. The processing device then receives a status indication of the screen reader browsing the content using the first application.
In accordance with the described techniques, a background generation system receives one or more images depicting an object, and textual information describing the object. A generative text model is employed to generate a prompt based on the one or more images and the textual information. Further, a generative image model is employed to generate an output image. To do so, the generative image model generates a background image based on the prompt, and the object is incorporated into the background image. Using a visual saliency model, the background generation system determines a visual saliency defining a degree of fixation on the object within the output image. The background generation system outputs the output image based on the visual saliency meeting a threshold.
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/86 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using graph matching
59.
SHAPE SPACE GENERATION VIA PROGRESSIVE CORRESPONDENCE ESTIMATION
In some examples, a computing system accesses a set of registered three-dimensional (3D) digital shapes. The set of registered 3D digital shapes are registered to a shape template. The computing system determines a linear model for an estimate of the shape space using a first subset of the set of registered 3D digital shapes. The computing system then determines a nonlinear deformation model for the shape space using a second subset of the set of registered 3D digital shapes. An unregistered shape can be registered to the shape space using the linear model and the nonlinear deformation model. The registration can be added to the set of registered 3D digital shapes to update the estimate of the shape space if a shape distance between the registration and the unregistered shape is below a threshold value.
Methods and systems are provided for using entitlements deployed on blockchain to manage customer experiences. In embodiments described herein, customer data for a customer is accessed via a blockchain-based entitlement generator component. A representation of an entitlement for the customer is generated based on a plurality of parameters of the entitlement via the blockchain-based entitlement generator component where the representation of the entitlement includes a portion of the customer data for the customer. The representation of the entitlement is recorded on a blockchain via the blockchain-based entitlement generator component.
The present disclosure relates to systems, methods, and non-transitory computer readable media for generating digital images utilizing a diffusion neural network to preserve color harmony and image composition from a sample digital image while modifying image content. In some embodiments, the disclosed systems receive, via user input, a text prompt defining query image content and a sample digital image depicting a color harmony. In some cases, the disclosed systems generate a blurred digital image by blurring pixels of the sample digital image while preserving the color harmony. In some embodiments, the disclosed systems generate, utilizing a diffusion neural network, a modified digital image depicting the query image content having the color harmony of the sample digital image by denoising the blurred digital image toward a noise vector of the text prompt.
G06F 16/532 - Query formulation, e.g. graphical querying
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating digital images by conditioning a diffusion neural network with input prompts. In particular, in one or more embodiments, the disclosed systems generate, utilizing a reverse diffusion model, an image noise representation from a first image prompt. Additionally, in some embodiments, the disclosed systems generate, utilizing a diffusion neural network conditioned with a first vector representation of the first image prompt, a first denoised image representation from the image noise representation. Moreover, in some embodiments, the disclosed systems generate, utilizing the diffusion neural network conditioned with a second vector representation of a second image prompt, a second denoised image representation from the image noise representation. Furthermore, in some embodiments, the disclosed systems combine the first denoised image representation and the second denoised image representation to generate a digital image.
The present disclosure relates to systems, methods, and non-transitory computer readable media that position objects across multiple perspectives within digital images to be equidistant. For instance, in some embodiments, the disclosed systems detect a user interaction for moving a first object within a first perspective of a digital image. Additionally, the disclosed systems extract a first distance between the first object within the first perspective and a joining edge between the first perspective and a second perspective of the digital image. The disclosed systems also extract a second distance between a second object within the second perspective of the digital image and the joining edge. Based on the first distance and the second distance, the disclosed systems modify the digital image by positioning the first object within the first perspective to be equidistant to the joining edge relative to the second object within the second perspective.
G06T 11/60 - Editing figures and text; Combining figures or text
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
A method, apparatus, and non-transitory computer readable medium for image processing are described. Embodiments of the present disclosure obtain an image and an input text including a subject from the image and a location of the subject in the image. An image encoder encodes the image to obtain an image embedding. A text encoder encodes the input text to obtain a text embedding. An image processing apparatus based on the present disclosure generates an output text based on the image embedding and the text embedding. In some examples, the output text includes a relation of the subject to an object from the image and a location of the object in the image.
Position-based text-to-speech model and training techniques are described. A digital document, for instance, is received by an audio synthesis service. A text-to-speech model is utilized by the audio synthesis service to generate digital audio from text included in the digital document. The text-to-speech model, for instance, is configured to generate a text encoding and a document positional encoding from an initial text sequence of the digital document. The document positional encoding is based on a location of the text encoding within the digital document. Digital audio is then generated by the text-to-speech model that includes a spectrogram having a reordered text sequence, which is different from the initial text sequence, by decoding the text encoding and the document positional encoding.
G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
G10L 13/04 - Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L 13/06 - Elementary speech units used in speech synthesisers; Concatenation rules
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
A method, apparatus, non-transitory computer readable medium, and system for generating images with an adjustable level of complexity includes obtaining a content prompt, a style prompt, and a complexity value. The content prompt describes an image element, the style prompt indicates an image style, and the complexity value indicates a level of influence of the style prompt. Embodiments then generate, using an image generation model, an output image based on the content prompt, the style prompt, and the complexity value, wherein the output image includes the image element with a level of the image style based on the complexity value.
Knowledge edit techniques for text-to-image models and other generative machine learning models are described. In an example, a location is identified within a text-to-image model by a model edit system. The location is configured to influence generation of a visual attribute by a text-to-image model as part of a digital image. An edited text-to-image model is formed by editing the text-to-image model based on the location. The edit causes a change to the visual attribute in generating a subsequent digital image by the edited text-to-image model. The subsequent digital image is generated as having the change to the visual attribute by the edited text-to-image model.
In various embodiments, systems and methods for design-aware replacement font suggestions are provided. In some embodiments, a substitute font-suggestion algorithm holistically considers how the original string of text from the original layout appears when re-rendered in a same-sized text frame using a potential replacement font. In some embodiments, the substitute font-suggestion algorithm generates a first image of a text frame including the text string using the first font and generates a plurality of second images of the text string using candidate replacement fonts. A ranking of the candidate replacement fonts is generated based on computing a score for each of the individual second images that represents similarity between the first image and the individual second images. Based on the assessed similarities, a ranked listing of substitute font suggestions is displayed.
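A sketch of only the ranking step described above, assuming the reference rendering and the candidate renderings have already been produced elsewhere: each candidate is scored by a simple pixel-level similarity and the candidates are sorted. The actual similarity measure used by the described algorithm is not specified here.

```python
import numpy as np

def rank_candidates(reference: np.ndarray,
                    candidates: dict[str, np.ndarray]) -> list[tuple[str, float]]:
    scores = {}
    for name, img in candidates.items():
        # Similarity = 1 - mean absolute pixel difference (images assumed in [0, 1]).
        scores[name] = 1.0 - float(np.mean(np.abs(reference - img)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((40, 200))                              # stand-in for the rendered text frame
    cands = {"FontA": ref + 0.02 * rng.random((40, 200)),    # close match
             "FontB": rng.random((40, 200))}                 # unrelated rendering
    print(rank_candidates(ref, cands))
```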
G06V 30/194 - References adjustable by an adaptive method, e.g. learning
G06F 40/109 - Font handling; Temporal or kinetic typography
G06V 30/244 - Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
G06V 30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
69.
GENERATIVE RECOMMENDATION MODEL LEVERAGING VERBALIZED SEQUENTIAL DATA
Systems and methods provide a generative recommendation model that leverages verbalizations generated from sequential data. In accordance with some aspects, sequential data for a trajectory comprising a plurality of steps is accessed, in which the sequential data comprises a tuple for each step of the trajectory. Verbalized sequential data is generated from the sequential data, in which the verbalized sequential data for each step of the trajectory comprises one or more natural language sentences generated from the tuple for the step. A generative model is trained on the verbalized sequential data to provide a trained generative model that generates a recommended action given a prompt specifying a current state.
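A sketch of turning per-step tuples into natural-language sentences before model training, as described above. The tuple fields and the sentence template are hypothetical; in the described approach the verbalizations may also be generated rather than templated.

```python
def verbalize_step(step: dict) -> str:
    # Render one (index, state, action, reward) tuple as a sentence.
    return (f"At step {step['index']}, the user was in state '{step['state']}', "
            f"took action '{step['action']}', and received reward {step['reward']}.")

def verbalize_trajectory(trajectory: list[dict]) -> str:
    return " ".join(verbalize_step(step) for step in trajectory)

if __name__ == "__main__":
    traj = [
        {"index": 1, "state": "browsing", "action": "view_item", "reward": 0},
        {"index": 2, "state": "cart", "action": "checkout", "reward": 1},
    ]
    print(verbalize_trajectory(traj))
```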
An edge node included in a decentralized edge computing network generates a federated partial-data aggregation machine learning model. The edge node learns one or more model parameters via machine learning techniques and receives one or more auxiliary model parameters from additional edge nodes in the decentralized edge computing network, such as from a neighbor node group. In some cases, a neighbor node is identified in response to determining that the neighbor node includes a model with a relatively high estimated relevance to the model of the edge node. The edge node modifies the model to include an aggregation of the learned model parameters and the received auxiliary parameters. Respective weights are learned for the learned model parameters and also for the received auxiliary parameters. During training to learn the respective weights, the edge node stabilizes the learned model parameters and the received auxiliary parameters.
In one aspect, a processor determines a first set of video frames of a video based on a target video frame. The first set of video frames includes the target video frame, one or more frames of the video preceding the target video frame, and one or more frames of the video subsequent to the target video frame. The first set of video frames includes a sequence of video frames of the video. An encoder neural network executing on the processor encodes the first set of video frames of a video to generate a respective feature vector for each video frame in the first set. A decoder neural network executing on the processor decodes the feature vectors to generate a mask for the target video frame.
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
H04N 19/119 - Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
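For the video mask entry above, a minimal sketch of how the first set of frames around a target frame might be gathered; the window size k is an assumption, and the encoder/decoder calls are placeholders for the neural networks.

def select_frame_window(video_frames, target_idx, k=2):
    # contiguous sequence containing the target frame plus k frames on each side
    start = max(0, target_idx - k)
    end = min(len(video_frames), target_idx + k + 1)
    return video_frames[start:end]

frames = [f"frame_{i}" for i in range(10)]
window = select_frame_window(frames, target_idx=5, k=2)
print(window)        # ['frame_3', 'frame_4', 'frame_5', 'frame_6', 'frame_7']
# features = [encoder(f) for f in window]    # one feature vector per frame
# mask = decoder(features)                   # mask for the target frame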
Embodiments of the present disclosure include obtaining a text prompt describing an element, layout information indicating a target region for the element, and a precision level corresponding to the element. Some embodiments generate a text feature pyramid based on the text prompt, the layout information, and the precision level, wherein the text feature pyramid comprises a plurality of text feature maps at a plurality of scales, respectively. Then, an image is generated based on the text feature pyramid. In some cases, the image includes an object corresponding to the element of the text prompt at the target region. Additionally, a shape of the object corresponds to a shape of the target region based on the precision level.
Methods and systems are provided for automated inference and evaluation of design relations for elements of a design. In embodiments described herein, a change, related to a type of design relation, is received to an element of a plurality of elements of a design. A corresponding type of design relation between the element and a different element of the plurality of elements is determined from a knowledge graph based on the type of design relation related to the change. A corresponding change is automatically applied to the different element based on the corresponding type of design relation between the element and the different element.
A method for generating a volume for three-dimensional rendering extracts a plurality of images from a source image input, normalizes the extracted images to have a common pixel size, and determines a notional camera placement for each normalized image to obtain a plurality of annotated normalized images, each annotated with a respective point of view through the view frustum of the notional camera. From the annotated normalized images, the method generates a first volume encompassing a first three-dimensional representation of the target object and selects a smaller subspace within the first volume that encompasses the first three-dimensional representation of the target object. The method generates, from the annotated normalized images, a second volume overlapping the first volume, encompassing a second three-dimensional representation of the target object and having a plurality of voxels, and crops the second volume to limit the second volume to the subspace.
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a differentiable tiling system that generates aesthetically plausible, periodic, and tile-able non-square imagery using machine learning and a text-guided, fully automatic generative approach. Namely, given a textual description of an object and a symmetry pattern of the 2D plane, the system produces a textured 2D mesh which visually resembles the textual description, adheres to the geometric rules which ensure it can be used to tile the plane, and contains only the foreground object. The disclosed systems generate a plausible textured 2D triangular mesh that visually matches the textual input, optimizing both the texture and the shape of the mesh so that the mesh satisfies an overlap condition and a tile-able condition. Using the described methods, the differentiable tiling system generates the mesh such that the edges and the vertices align between repeatable instances of the mesh.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating aspect-based summaries utilizing deep learning. In particular, in one or more embodiments, the disclosed systems access a transcript comprising sentences. The disclosed systems generate, utilizing a sentence classification machine learning model, aspect labels for the sentences of the transcript. The disclosed systems organize the sentences based on the aspect labels. The disclosed systems generate, utilizing a summary machine learning model, a summary of the transcript for each aspect from the organized sentences.
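A hedged end-to-end sketch of the aspect-based summarization flow: classify each sentence, group by aspect label, and summarize each group. The classify and summarize callables stand in for the sentence classification and summary models; the toy stand-ins exist only so the sketch runs.

from collections import defaultdict

def aspect_based_summaries(transcript_sentences, classify_sentence, summarize):
    grouped = defaultdict(list)
    for sentence in transcript_sentences:
        aspect = classify_sentence(sentence)        # e.g. "pricing", "support"
        grouped[aspect].append(sentence)            # organize by aspect label
    return {aspect: summarize(" ".join(sentences))  # one summary per aspect
            for aspect, sentences in grouped.items()}

toy_classify = lambda s: "pricing" if "price" in s.lower() else "support"
toy_summarize = lambda text: text[:60] + "..."
print(aspect_based_summaries(
    ["The price seems high.", "Support replied quickly.", "Can the price drop?"],
    toy_classify, toy_summarize))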
Implementations of systems and methods for determining viewpoints suitable for performing one or more digital operations on a three-dimensional object are disclosed. Accordingly, a set of candidate viewpoints is established. The set of candidate viewpoints provides views of an outer surface of the three-dimensional object, and those views provide overlapping surface data. A subset of activated viewpoints is determined from the set of candidate viewpoints, the subset of activated viewpoints providing less of the overlapping surface data. The subset of activated viewpoints is used to perform one or more digital operations on the three-dimensional object.
H04N 13/279 - Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals the virtual viewpoint locations being selected by the viewers or determined by tracking
78.
LEARNING-BASED METHOD FOR GENERATING VISUALIZATIONS VIA NATURAL LANGUAGE
Graphic visualizations, such as charts or graphs conveying data attribute values, can be generated based on natural language queries, i.e., natural language requests. To do so, a natural language request is parsed into n-grams, and from the n-grams, word embeddings are determined using a natural language model. Data attributes for the graphic visualization are discovered in the vector space from the word embeddings. The type of graphic visualization can be determined based on a request intent, which is determined using a trained intent classifier. The graphic visualization is generated to include the data attribute values of the discovered data attributes, and in accordance with the graphic visualization type.
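One way to picture the attribute-discovery step is cosine similarity between n-gram embeddings and data attribute embeddings, with an intent classifier choosing the chart type; the sketch below uses toy two-dimensional vectors and a similarity threshold that are purely illustrative.

import numpy as np

def match_attributes(ngrams, embed, attribute_embeddings, threshold=0.7):
    matched = []
    for ngram in ngrams:
        v = embed(ngram)
        for attribute, av in attribute_embeddings.items():
            similarity = np.dot(v, av) / (np.linalg.norm(v) * np.linalg.norm(av))
            if similarity >= threshold:
                matched.append(attribute)          # discovered data attribute
    return matched

toy_vectors = {"sales": np.array([1.0, 0.0]), "by region": np.array([0.0, 1.0]),
               "revenue": np.array([0.95, 0.05]), "region": np.array([0.1, 0.9])}
attribute_embeddings = {"revenue": toy_vectors["revenue"],
                        "region": toy_vectors["region"]}
print(match_attributes(["sales", "by region"], toy_vectors.get, attribute_embeddings))
# ['revenue', 'region']; a trained intent classifier would then map the request
# to a visualization type such as a bar chart over these attributes.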
A method, apparatus, and non-transitory computer readable medium for image generation are described. Embodiments of the present disclosure obtain a content input and a style input via a user interface or from a database. The content input includes a target spatial layout and the style input includes a target style. A content encoder of an image processing apparatus encodes the content input to obtain a spatial layout mask representing the target spatial layout. A style encoder of the image processing apparatus encodes the style input to obtain a style embedding representing the target style. An image generation model of the image processing apparatus generates an image based on the spatial layout mask and the style embedding, where the image includes the target spatial layout and the target style.
Embodiments of the present technology are directed to facilitating generation of experiment metric values, such as expected sample size and/or minimal detectable effect, for anytime valid confidence sequences (e.g., asymptotic confidence sequences). In one embodiment, a set of parameter values associated with an experiment using asymptotic confidence sequences is obtained. The set of parameter values includes a minimal detectable effect and an uncertainty interval. Thereafter, an expected sample size for executing the experiment is determined based on the minimal detectable effect and the uncertainty interval. The expected sample size is provided for utilization in association with the experiment using asymptotic confidence sequences.
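One reading of the expected sample size computation is the smallest n at which an assumed confidence sequence half-width falls below the minimal detectable effect. The half_width function below is purely illustrative and is not the asymptotic formula used by the embodiments; only the search pattern is the point.

import math

def half_width(n, variance=1.0, alpha=0.05):
    # stand-in for a confidence sequence half-width that shrinks with n
    return math.sqrt(2 * variance * math.log(math.sqrt(n + 1) / alpha) / n)

def expected_sample_size(mde, variance=1.0, alpha=0.05, n_max=10_000_000):
    n = 1
    while n <= n_max:
        if half_width(n, variance, alpha) <= mde:
            return n      # first n (in a doubling search) that resolves the MDE
        n *= 2
    return None

print(expected_sample_size(mde=0.05))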
A digital content management system may include a set of one or more memory components and a set of one or more processing devices coupled to the set of one or more memory components. The set of one or more processing devices performs operations comprising: obtaining one or more electronic signatures on a digital document to yield an electronically signed digital document; storing the electronically signed digital document at an off-chain address; using a trained machine learning model to create a smart contract, the smart contract comprising a reference to the off-chain address; and deploying the smart contract to a blockchain.
Embodiments are disclosed for reflowing documents to display semantically related content. The method may include receiving a request to view a document that includes body text and one or more images. A trimodal document relationship model identifies relationships between segments of the body text and the one or more images. A linearized view of the document is generated based on the relationships and the linearized view is caused to be displayed on a user device.
G06T 11/60 - Editing figures and text; Combining figures or text
G06F 3/04842 - Selection of displayed objects or displayed text elements
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
83.
UTILIZING A GENERATIVE NEURAL NETWORK TO INTERACTIVELY CREATE AND MODIFY DIGITAL IMAGES BASED ON NATURAL LANGUAGE FEEDBACK
The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement a neural network framework for interactive multi-round image generation from natural language inputs. Specifically, the disclosed systems provide an intelligent framework (i.e., a text-based interactive image generation model) that facilitates a multi-round image generation and editing workflow that comports with arbitrary input text and synchronous interaction. In particular embodiments, the disclosed systems utilize natural language feedback for conditioning a generative neural network that performs text-to-image generation and text-guided image modification. For example, the disclosed systems utilize a trained model to inject textual features from natural language feedback into a unified joint embedding space for generating text-informed style vectors. In turn, the disclosed systems can generate an image with semantically meaningful features that map to the natural language feedback. Moreover, the disclosed systems can persist these semantically meaningful features throughout a refinement process and across generated images.
In various examples, a video effect is displayed in a live video stream in response to determining that a portion of an audio stream of the live video stream corresponds to a text segment of a script associated with the video effect and detecting performance of a gesture. For example, during presentation of the script, the audio stream is obtained to determine whether a portion of the audio stream corresponds to the text segment. In response to the portion of the audio stream corresponding to the text segment, performance of a gesture is detected and the video effect is caused to be displayed.
A system accesses, during a camera scan of a 3D physical object, video feed data of a user computing device including a plurality of frames. The system generates camera scan data including a set of 2D images of the physical object generated from a subset of the plurality of frames. Generating the camera scan data can include, responsive to determining that a translation between a previous frame and a current frame is greater than a threshold, including the current frame in the camera scan data. Generating the camera scan data can include excluding the current frame from the camera scan data responsive to determining an undesired camera movement type associated with the current frame. Generating the camera scan data can include indicating, in a surface coverage preview model, points of the 3D physical object included in at least a predefined number of consecutive frames of the subset of frames.
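A minimal sketch of the frame-selection logic during the camera scan; camera poses, the translation threshold, and the undesired-motion check are assumed inputs used only to make the flow concrete (here translation is measured against the last kept frame).

import numpy as np

def build_camera_scan_data(frames, poses, translation_threshold=0.05,
                           is_undesired_motion=lambda frame: False):
    scan_data, last_kept_position = [], None
    for frame, pose in zip(frames, poses):
        position = np.asarray(pose["position"], dtype=float)
        if is_undesired_motion(frame):      # e.g. fast rotation or motion blur
            continue                        # exclude frame from scan data
        if (last_kept_position is None
                or np.linalg.norm(position - last_kept_position) > translation_threshold):
            scan_data.append(frame)         # sufficient camera translation
            last_kept_position = position
    return scan_data

frames = ["f0", "f1", "f2", "f3"]
poses = [{"position": [0.00, 0, 0]}, {"position": [0.01, 0, 0]},
         {"position": [0.10, 0, 0]}, {"position": [0.11, 0, 0]}]
print(build_camera_scan_data(frames, poses))    # ['f0', 'f2']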
The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize a text-image alignment loss to train a diffusion model to generate digital images from input text. In particular, in some embodiments, the disclosed systems generate a prompt noise representation from a text prompt with a first text concept and a second text concept using a denoising step of a diffusion neural network. Further, in some embodiments, the disclosed systems generate a first concept noise representation from the first text concept and a second concept noise representation from the second text concept. Moreover, the disclosed systems combine the first and second concept noise representations to generate a combined concept noise representation. Accordingly, in some embodiments, by comparing the combined concept noise representation and the prompt noise representation, the disclosed systems modify parameters of the diffusion neural network.
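A toy numeric sketch of the alignment comparison: noise predictions for the full prompt and for each concept are compared after combining the concept predictions. Real embodiments use a diffusion network's noise predictions; here small arrays stand in for those tensors, and simple addition stands in for the combination step.

import numpy as np

def alignment_loss(noise_prompt, noise_concept_a, noise_concept_b):
    combined = noise_concept_a + noise_concept_b      # combined concept noise
    return float(np.mean((combined - noise_prompt) ** 2))

rng = np.random.default_rng(0)
eps_prompt = rng.normal(size=(4, 4))    # stand-in for prompt noise representation
eps_a = 0.5 * eps_prompt                # stand-in for first concept representation
eps_b = 0.5 * eps_prompt                # stand-in for second concept representation
print(alignment_loss(eps_prompt, eps_a, eps_b))   # 0.0 when the concepts explain the prompt
# The gradient of this loss would then drive updates to the diffusion model's parameters.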
In implementation of techniques for generating salient regions based on multi-resolution partitioning, a computing device implements a salient object system to receive a digital image including a salient object. The salient object system generates a first mask for the salient object by partitioning the digital image into salient and non-salient regions. The salient object system also generates a second mask for the salient object that has a resolution that is different than the first mask by partitioning a resampled version of the digital image into salient and non-salient regions. Based on the first mask and the second mask, the salient object system generates an indication of a salient region of the digital image using a machine learning model. The salient object system then displays the indication of the salient region in a user interface.
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
88.
USING GENERATIVE ARTIFICIAL INTELLIGENCE FOR AUTOMATED ANALYSIS OF SMART CONTRACTS ON BLOCKCHAIN
Methods and systems are provided for using generative AI for automated analysis of smart contracts on blockchain. In embodiments described herein, smart contract code for a smart contract is accessed on a blockchain via a retriever component. The smart contract code includes a condition of the smart contract in a programming language format. A language model generates natural language content based on the smart contract code. The natural language content is then displayed.
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
The present disclosure relates to systems, methods, and non-transitory computer-readable media that recolor a digital design according to colors of a digital image and further generate an enhanced recolored digital design. In particular, in some embodiments, the disclosed systems identify a digital image for recoloring a digital design and recolor the digital design utilizing a color affine transformation algorithm to generate a recolored digital design. Further, in some embodiments, the disclosed systems generate the enhanced recolored digital design by transforming one or more colors of the recolored digital design to be within a range of the colors of the digital image utilizing a convex hull projection method. Moreover, in some embodiments, the disclosed systems further enhance the recolored digital design utilizing a contrast enhancement algorithm to modify luminance values.
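For the affine recoloring step, one standard construction, shown only as a sketch and not necessarily the claimed transformation, matches the mean and covariance of the design's colors to those of the reference image:

import numpy as np

def affine_recolor(design_rgb, image_rgb):
    d = design_rgb.reshape(-1, 3).astype(float)
    s = image_rgb.reshape(-1, 3).astype(float)
    mu_d, mu_s = d.mean(axis=0), s.mean(axis=0)
    cov_d = np.cov(d, rowvar=False) + 1e-6 * np.eye(3)
    cov_s = np.cov(s, rowvar=False) + 1e-6 * np.eye(3)
    def sqrtm(c):                      # symmetric PSD square root via eigh
        w, v = np.linalg.eigh(c)
        return v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.T
    a = sqrtm(cov_s) @ np.linalg.inv(sqrtm(cov_d))   # color affine transform
    recolored = (d - mu_d) @ a.T + mu_s
    return np.clip(recolored, 0, 255).reshape(design_rgb.shape)

design = np.random.default_rng(1).integers(0, 256, size=(8, 8, 3))
image = np.random.default_rng(2).integers(100, 200, size=(8, 8, 3))
print(affine_recolor(design, image).astype(int)[0, 0])
# A convex hull projection over the image's colors could then clamp any
# recolored values that fall outside the image's gamut.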
The present disclosure relates to systems, non-transitory computer-readable media, and methods for selectively conditioning layers of a neural network and utilizing the neural network to generate a digital image. In particular, in some embodiments, the disclosed systems condition an upsampling layer of a neural network with an image vector representation of an image prompt. Additionally, in some embodiments, the disclosed systems condition an additional upsampling layer of the neural network with a text vector representation of a text prompt without the image vector representation of the image prompt. Moreover, in some embodiments, the disclosed systems generate, utilizing the neural network, a digital image from the image vector representation and the text vector representation.
Systems and methods for generating a 3D model from a single input image are described. Embodiments are configured to obtain an input image and camera view information corresponding to the input image; encode the input image to obtain 2D features comprising a plurality of 2D tokens corresponding to patches of the input image; decode the 2D features based on the camera view information to obtain 3D features comprising a plurality of 3D tokens corresponding to regions of a 3D representation; and generate a 3D model of the input image based on the 3D features.
Retexturing items depicted in digital image data is described. An image retexturing system receives image data that depicts an item featuring a pattern. The image retexturing system identifies coarse correspondences between regions in the image data and a two-dimensional image of the pattern. Using the coarse correspondences, the image retexturing system establishes, for each pixel in the image data depicting the item, a pair of coordinates for a surface of the item featuring the pattern. The coordinate pairs are then used to generate a mesh that represents the surface of the item. The image retexturing system then applies a new texture to a surface of the item by mapping the new texture to a surface of the mesh. A shading layer and item mask are generated for the image data, which are combined with the retextured mesh to generate a synthesized image that depicts the retextured item.
Systems and methods for generating geolocation-based images for a target object are provided. A geolocation module receives a set of geolocations associated with a geographic region of interest. Each geolocation of the set of geolocations is mapped to context data associated with the geolocation. A prompt generation module generates multiple prompts based on the set of geolocations and the context data. The prompt generation module comprises a first generative artificial intelligence (AI) model. An image generation module generates multiple synthetic images based on the multiple prompts. The image generation module comprises a second generative AI model. Each synthetic image depicts the target object in a background generated based on a prompt.
Various disclosed embodiments are directed to deriving an albedo output image from an input image based on deriving an inverse shading map. For example, an input image can be a photograph of a human face (i.e., the geometric features) with RGB values representing the color values of the face as well as pixels representing shadows (i.e., the shadow features) underneath the chin of the human face. The inverse shading map may be a black and white pixel value image that contains pixels representing the same human face without the RGB values and the shadows underneath the chin. The inverse shading map thus relies on the geometric space, rather than RGB space. Geometric space, for example, allows embodiments to capture the geometric features of a face, as opposed to those geometric features' RGB or shadow details.
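Under common intrinsic-image assumptions (an interpretation offered for illustration, not necessarily the exact formulation of these embodiments), the observed image is albedo modulated by shading, so multiplying by an inverse shading map recovers an approximate albedo:

import numpy as np

albedo = np.full((4, 4, 3), 0.6)                        # true reflectance (unknown)
shading = np.linspace(0.4, 1.0, 16).reshape(4, 4, 1)    # geometry-driven shading
image = albedo * shading                                # observed RGB image

inverse_shading = 1.0 / shading                         # predicted inverse shading map
recovered_albedo = np.clip(image * inverse_shading, 0.0, 1.0)
print(np.allclose(recovered_albedo, albedo))            # True in this toy setting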
In various examples, a machine learning model determines scaling operations for a computing environment based on a state of the computing environment. For example, a first machine learning model determines a scaling operation based on a first state of a computing environment executing a service, and a second machine learning model determines an estimated value associated with a second state of the computing environment after the scaling operation is performed. A set of parameters of the first machine learning model is updated to maximize an advantage value determined based on the estimated value and a reward value.
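A compact way to picture the two-model arrangement is an actor-critic style update (an interpretation offered for illustration, not the claimed training procedure): the second model's value estimate of the post-scaling state feeds the advantage used to update the first model.

import numpy as np

rng = np.random.default_rng(0)
w_policy = rng.normal(size=(3, 2))     # 3 state features -> 2 actions (up/down)
w_value = rng.normal(size=3)           # 3 state features -> scalar value

def choose_scaling_action(state):      # first model: pick a scaling operation
    return int(np.argmax(state @ w_policy))

def estimate_value(state):             # second model: value of a state
    return float(state @ w_value)

state_before = np.array([0.9, 0.2, 0.5])   # e.g. CPU load, queue depth, latency
action = choose_scaling_action(state_before)
state_after = np.array([0.5, 0.1, 0.3])    # environment state after scaling
reward, gamma = 1.0, 0.9

advantage = reward + gamma * estimate_value(state_after) - estimate_value(state_before)
print(action, advantage)
# A gradient step on w_policy would then favor actions with positive advantage.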
A modeling system accesses a two-dimensional (2D) input image displayed via a user interface, the 2D input image depicting, at a first view, a first object. A first region of the first object is not represented by pixel values of the 2D input image. The modeling system generates, by applying a 3D representation generation model to the 2D input image, a three-dimensional (3D) representation of the first object that depicts an entirety of the first object including the first region. The modeling system displays, via the user interface, the 3D representation, wherein the 3D representation is viewable via the user interface from a plurality of views including the first view.
A material search computing system generates a joint feature comparison space by combining joint image-text features of surface material data objects. The joint feature comparison space is a consistent comparison space. The material search computing system extracts a query joint feature set from a query data object that includes text data or image data. In addition, the material search computing system compares the query joint feature set to the joint image-text features included in the joint feature comparison space. Based on the comparison, the material search computing system identifies a result joint feature set and associated result surface material data objects. The material search computing system generates material query result data describing the result surface material data objects, and provides the material query result data to an additional computing system.
G06F 16/535 - Filtering based on additional data, e.g. user or group profiles
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
The present disclosure relates to systems, non-transitory computer-readable media, and methods for providing multilingual semantic search results utilizing meta-learning and knowledge distillation. For example, in some implementations, the disclosed systems perform a first inner learning loop for a monolingual to bilingual meta-learning task for a teacher model. Additionally, in some implementations, the disclosed systems perform a second inner learning loop for a bilingual to multilingual meta-learning task for a student model. In some embodiments, the disclosed systems perform knowledge distillation based on the first inner learning loop for the monolingual to bilingual meta-learning task and the second inner learning loop for the bilingual to multilingual meta-learning task. Moreover, in some embodiments, the disclosed systems perform an outer learning loop and update parameters of a deep learning language model based on the first inner learning loop, the second inner learning loop, and the knowledge distillation.
Systems and methods for pre-deployment user journey evaluation are described. Embodiments are configured to obtain a user journey including a plurality of touchpoints; generate a simulation agent including a plurality of attributes; generate a probability score for the simulation agent for each of the plurality of touchpoints based on the plurality of attributes using a machine learning model; perform a simulation of the user journey based on the probability score; and generate a text describing the user journey based on the simulation.
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
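For the user journey entry above, a hypothetical sketch of the simulation loop: an agent with attributes receives a probability score per touchpoint from a scoring model (a toy stand-in here), and a single Monte Carlo pass yields a text description of the simulated journey.

import random

def simulate_journey(touchpoints, agent, score, seed=0):
    random.seed(seed)
    events = []
    for touchpoint in touchpoints:
        p = score(agent, touchpoint)              # probability score for engaging
        engaged = random.random() < p
        events.append((touchpoint, p, engaged))
        if not engaged:
            break                                 # agent drops out of the journey
    return events

def describe(events, agent):
    lines = [f"Simulated agent ({agent['segment']}, age {agent['age']}):"]
    for touchpoint, p, engaged in events:
        verb = "engaged with" if engaged else "dropped off at"
        lines.append(f"- {verb} '{touchpoint}' (score {p:.2f})")
    return "\n".join(lines)

toy_score = lambda agent, tp: 0.8 if agent["segment"] == "returning" else 0.4
agent = {"segment": "returning", "age": 34}
events = simulate_journey(["email", "landing page", "checkout"], agent, toy_score)
print(describe(events, agent))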
A system of the present disclosure, in one or more embodiments, receives selections of first and second points for a path. The first point is at a first position and the second point is at a second position in a digital design document. The system identifies a glyph of text nearest a location of the first position and determines a geometry of the glyph. The system determines a first parametric value of the geometry of the glyph nearest to the first position and determines a second parametric value of the geometry of the glyph nearest to the second position. Utilizing the first parametric value and the second parametric value, the system generates the path between the first position and the second position with path geometry that follows the geometry of the glyph at a consistent offset relative to the glyph.
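A simplified sketch of the path construction, with the glyph outline approximated as a polyline and the "parametric value" taken as arc length along it; real embodiments operate on the glyph's parametric (e.g. Bezier) geometry, so this only illustrates the flow of nearest-parameter lookup and consistent offsetting.

import math

def _segments(poly):
    for (x0, y0), (x1, y1) in zip(poly, poly[1:]):
        yield (x0, y0), (x1, y1), math.hypot(x1 - x0, y1 - y0)

def nearest_param(poly, p):
    best_d, best_t, acc = float("inf"), 0.0, 0.0
    for (x0, y0), (x1, y1), seg in _segments(poly):
        u = max(0.0, min(1.0, ((p[0]-x0)*(x1-x0) + (p[1]-y0)*(y1-y0)) / (seg*seg)))
        d = math.hypot(p[0] - (x0 + u*(x1-x0)), p[1] - (y0 + u*(y1-y0)))
        if d < best_d:
            best_d, best_t = d, acc + u*seg
        acc += seg
    return best_t                                 # arc-length parameter nearest p

def point_and_normal(poly, t):
    acc = 0.0
    for (x0, y0), (x1, y1), seg in _segments(poly):
        if acc + seg >= t or (x1, y1) == poly[-1]:
            u = (t - acc) / seg
            tx, ty = (x1 - x0) / seg, (y1 - y0) / seg          # unit tangent
            return (x0 + u*(x1-x0), y0 + u*(y1-y0)), (-ty, tx) # left normal
        acc += seg

def path_along_glyph(poly, p_start, p_end, offset, samples=10):
    t0, t1 = nearest_param(poly, p_start), nearest_param(poly, p_end)
    points = []
    for i in range(samples + 1):
        t = t0 + (t1 - t0) * i / samples
        (x, y), (nx, ny) = point_and_normal(poly, t)
        points.append((x + offset*nx, y + offset*ny))          # consistent offset
    return points

outline = [(0, 0), (10, 0), (10, 10)]             # toy stand-in for a glyph edge
print(path_along_glyph(outline, (2, -1), (11, 8), offset=1.0)[:3])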