A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input prompt and retrieving an intermediate noise state based on a similarity between the input prompt and a candidate prompt corresponding to the intermediate noise state. An image generation model generates a synthetic image based on the input prompt and the intermediate noise state.
Embodiments described herein provide methods and systems for facilitating actively-learned context modeling. In one embodiment, a subset of data is selected from a training dataset corresponding with an image to be compressed, the subset of data corresponding with a subset of pixels of the image. A context model is generated using the selected subset of data. The context model is generally in the form of a decision tree having a set of leaf nodes. Entropy values corresponding with each leaf node of the set of leaf nodes are determined. Each entropy value indicates an extent of diversity of context associated with the corresponding leaf node. Additional data from the training dataset is selected based on the entropy values corresponding with the leaf nodes. The subset of data, updated with the additional data, is used to generate an updated context model for use in performing compression of the image.
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
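As a rough illustration of the entropy-guided selection loop described in the abstract above, the sketch below trains a decision-tree context model on a subset of samples, scores each leaf by the Shannon entropy of its label distribution, and adds pool samples that land in high-entropy leaves. The library choices, thresholds, and toy data are assumptions made for illustration, not details from the filing.

```python
# Hypothetical sketch: entropy-guided active selection of training samples
# for a decision-tree context model (names and thresholds are illustrative).
import numpy as np
from scipy.stats import entropy
from sklearn.tree import DecisionTreeClassifier

def high_entropy_leaves(tree, X, y, threshold=0.5):
    """Return the leaf ids whose label distribution is most diverse."""
    leaf_ids = tree.apply(X)                      # leaf index for every sample
    uncertain = set()
    for leaf in np.unique(leaf_ids):
        counts = np.bincount(y[leaf_ids == leaf])
        if entropy(counts / counts.sum(), base=2) > threshold:
            uncertain.add(leaf)
    return uncertain

# Toy data standing in for per-pixel context features and coded symbols.
rng = np.random.default_rng(0)
X_pool = rng.random((5000, 8))                    # candidate context vectors
y_pool = (X_pool[:, 0] > X_pool[:, 1]).astype(int)

subset = rng.choice(len(X_pool), size=500, replace=False)
for _ in range(3):                                # a few active-learning rounds
    tree = DecisionTreeClassifier(max_depth=6).fit(X_pool[subset], y_pool[subset])
    uncertain = high_entropy_leaves(tree, X_pool[subset], y_pool[subset])
    # Add pool samples that fall into high-entropy leaves to the training subset.
    new_idx = np.where(np.isin(tree.apply(X_pool), list(uncertain)))[0]
    if len(new_idx):
        picked = rng.choice(new_idx, size=min(200, len(new_idx)), replace=False)
        subset = np.union1d(subset, picked)
```

In practice the labels would be the symbols to be entropy coded and the features would be local pixel contexts; the loop above only shows the selection mechanics.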
3.
TEXTURE BASED CONSISTENCY FOR GENERATIVE AI ASSETS, EFFECTS AND ANIMATIONS
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input texture image and a plurality of image masks, generating a plurality of image assets corresponding to the plurality of image masks based on the input texture image, and generating a combined asset including the plurality of image assets. The plurality of image assets have a consistent texture based on the input texture image.
Methods and systems are provided for facilitating document collaboration in accordance with collaboration controls. In embodiments, an indication of a collaboration control for a collaborator of a document is obtained. The collaboration control generally indicates an edit permission for a document section of the document in relation to the collaborator. Thereafter, a set of collaboration control data for the document is generated. In embodiments, the set of collaboration control data includes the collaboration control indicating the edit permission for the document section of the document in relation to the collaborator. Based on an input (e.g., edit) by the collaborator to the document section of the document, a determination is made, using the set of collaboration control data, as to whether to enable an edit to the document section of the document.
H04L 65/401 - Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a pattern prompt and a text image, where the pattern prompt describes a visual pattern and the text image depicts text, generating a pattern image based on the pattern prompt, where the pattern image depicts the visual pattern, and generating a patterned text image based on the pattern image and the pattern prompt.
The present disclosure relates to systems, non-transitory computer-readable media, and methods that implement an image filter for enhancing light text and removing document shadows. In particular embodiments, the disclosed systems use a modified adaptive thresholding approach that relies on image gradients to efficiently guide the thresholding process. In addition, the disclosed systems use a machine-learning model to generate a document shadow map. The document shadow map can include text reflections. Accordingly, the disclosed systems remove text reflections from the document shadow map (e.g., by using an interpolated shadow intensity value of neighboring shadow map pixels). In turn, the disclosed systems use the document text mask and the document shadow map cleaned of text reflections to remove shadows from the digital image. Further, the disclosed systems enhance text in the shadow-removed digital image based on contrast stretching.
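A minimal sketch of gradient-guided adaptive thresholding in the spirit of the abstract above is shown below. The window size, bias, and gradient weighting are illustrative guesses rather than the disclosed formulation, and the shadow-map stage is omitted.

```python
# Rough sketch: adaptive thresholding whose local threshold is nudged by
# image gradients so faint strokes survive (parameters are assumptions).
import numpy as np
from scipy.ndimage import uniform_filter, sobel

def threshold_text(gray, window=31, bias=0.02, grad_weight=0.5):
    gray = gray.astype(np.float32) / 255.0
    local_mean = uniform_filter(gray, size=window)           # local brightness
    grad_mag = np.hypot(sobel(gray, axis=0), sobel(gray, axis=1))
    grad_mag /= grad_mag.max() + 1e-8
    # Raise the threshold near strong gradients so faint strokes are captured.
    thresh = local_mean - bias + grad_weight * uniform_filter(grad_mag, size=window)
    return (gray < thresh).astype(np.uint8)                  # 1 = text pixel

# Example: binarize a synthetic "page" with a faint text line.
page = np.full((64, 64), 220, dtype=np.uint8)
page[30:34, 10:54] = 140
text_mask = threshold_text(page)
```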
The present disclosure relates to systems, non-transitory computer-readable media, and methods for hierarchical entity segmentation. In particular, in one or more embodiments, the disclosed systems receive a digital image comprising a plurality of object entities. In addition, in some embodiments, the disclosed systems generate, utilizing a segmentation model comprising parameters generated according to pseudo-labels indicating hierarchies of segmentation masks for a set of training digital images, a hierarchical segmentation indicating hierarchical relations of the plurality of object entities of the digital image. Moreover, in some embodiments, the disclosed systems generate, for the digital image, a segmentation map from the hierarchical segmentation of the plurality of object entities.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
8.
GENERATING DIGITAL CONTENT CONSISTENT WITH CONTEXT-SPECIFIC GUIDELINES UTILIZING PROMPT AUGMENTATION AND MODEL TUNING
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a contextual content generation system that trains and implements a unique machine learning architecture to generate context-specific digital content items based on a digital guideline document. In particular, the disclosed systems select a content generation method from among prompt engineering and/or updating one or more machine learning models to generate digital content. For example, the disclosed systems utilize machine learning models to extract key elements from a digital guideline document comprising context-specific guidelines for digital content. Further, the disclosed systems generate an augmented prompt comprising indications of key elements from the digital guideline document. In addition, the disclosed systems select a content generation method from among prompt engineering and/or updating machine learning models to generate the digital content item which incorporates digital content corresponding to the context-specific guidelines based on the augmented prompt.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a style kit including a first image generation input indicating a first image attribute, a second image generation input indicating a second image attribute, and a selectability parameter indicating that the second image generation input is selectable. A third image generation input is received from a user based on the selectability parameter, wherein the third image generation input indicates a third image attribute different from the second image attribute of the second image generation input. An image generation model generates a synthetic image based on the style kit, the first image generation input, and the third image generation input, wherein the synthetic image has the first image attribute and the third image attribute.
A method, apparatus, non-transitory computer readable medium, and system for data processing include obtaining a text prompt and generating a first intermediate noise state based on the text prompt, retrieving a second intermediate noise state based on the text prompt and the first intermediate noise state, and generating a synthetic image based on the text prompt and the second intermediate noise state.
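The retrieval step described above can be pictured as a nearest-neighbor lookup over cached intermediate latents keyed by prompt embeddings. The sketch below is a toy version of that idea; the encoder, cache contents, and blending policy are placeholders, not the disclosed method.

```python
# Illustrative sketch: retrieve a cached intermediate noise state by prompt
# similarity and combine it with the freshly sampled state.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in text encoder: a toy hash-seeded embedding."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Cache of previous runs: prompt -> intermediate latent at some step t.
cache = {
    "a castle at sunset": np.random.default_rng(1).standard_normal((4, 8, 8)),
    "a forest in the rain": np.random.default_rng(2).standard_normal((4, 8, 8)),
}
cache_keys = list(cache)
cache_embs = np.stack([embed(p) for p in cache_keys])

def retrieve_state(prompt: str, first_state: np.ndarray):
    sims = cache_embs @ embed(prompt)          # cosine similarity to cached prompts
    best = cache_keys[int(np.argmax(sims))]
    # Blend the fresh state with the retrieved one (illustrative policy).
    return 0.5 * first_state + 0.5 * cache[best], best

first = np.random.default_rng(3).standard_normal((4, 8, 8))
second_state, source_prompt = retrieve_state("an old castle at dusk", first)
```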
Methods and systems are provided for using reinforcement learning to recommend data visualizations. In embodiments described herein, statistical features are determined for each sample of a dataset by applying each sample of the dataset to a data visualization recommendation model. The computational cost of each of the statistical features for each of the samples is determined via a regression model. Recommended statistical features are determined by sequentially applying each sample to a reinforcement learning model with a computational budget and with the corresponding computational costs of the statistical features of each sample. A data visualization is then displayed that is generated by applying the dataset and the recommended statistical features to the data visualization recommendation model.
A method, apparatus, non-transitory computer readable medium, and system for image generation includes obtaining an input image depicting an entity and a skeleton map depicting a pose of the entity and performing a cross-attention mechanism between image features of the input image and entity features representing the pose to obtain modified image features. An output image is generated based on the modified image features that depicts the entity with the pose.
Generative artificial intelligence (AI) content strategy techniques are described. In one or more examples, a content brief is received describing a goal to be achieved in controlling digital content output. Content brief data is extracted from the content brief and a content strategy is generated based on the content brief data using generative artificial intelligence implemented using one or more machine-learning models.
Aspects and features of the present disclosure relate to providing injective three-dimensional (3D) deformations based on two-dimensional (2D) mesh deformations. For example, a method involves defining at least one 2D mesh deformation based on a designated position of an object represented by an input neural radiance field (NeRF). The method also involves applying the 2D mesh deformation(s) to a 3D piecewise-linear map that operates over a plane and preserves a normal direction to produce prismatic maps. The method further involves composing a 3D deformation for the object from layers defined by the prismatic maps, and parameterizing the 3D piecewise-linear map. The method additionally involves storing or rendering, using the 3D piecewise-linear map, a deformed NeRF injectively representing the object in the designated position. Aspects also include computer systems, apparatus, and computer programs configured to perform the method.
Embodiments are disclosed for correlating video sequences and audio sequences by a media recommendation system using a trained encoder network. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training input including a media sequence, including a video sequence paired with an audio sequence, segmenting the media sequence into a set of video sequence segments and a set of audio sequence segments, extracting visual features for each video sequence segment and audio features for each audio sequence segment, generating, by transformer networks, contextualized visual features from the extracted visual features and contextualized audio features from the extracted audio features, the transformer networks including a visual transformer and an audio transformer, generating predicted video and audio sequence segment pairings based on the contextualized visual and audio features, and training the visual transformer and the audio transformer to generate the contextualized visual and audio features.
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 20/40 - Scenes; Scene-specific elements in video content
G10L 25/03 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
16.
ENHANCING ARTIFICIAL INTELLIGENCE RESPONSES WITH CONTEXTUAL USAGE INSIGHTS
Some aspects relate to technologies for an artificial intelligence (AI) system that, among other things, enhances responses to concepts questions for an application with contextual usage insights. In accordance with some aspects, a user query is determined to comprise a concepts question regarding an application. Responsive to determining the user query comprises the concepts question, documentation regarding the application relevant to the user query is identified. A generative model generates text for a response to the concepts question using the documentation regarding the application. Additionally, a determination is made to add contextual usage insights to the response. Responsive to determining to add contextual usage insights to the response, usage data relevant to the user query and/or the response is retrieved. The generative model generates text for a final response using the response and the usage data, and the final response is provided to a user device for presentation.
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a digital design interface for intuitively creating custom arrows that demonstrate both visual consistency and inherent directionality within vector-based design applications. In particular, in one or more implementations, the disclosed systems receive a request to create a custom arrow from a digital object and a path segment. In addition, the disclosed systems detect that the digital object is within a threshold distance of the path segment and combine the digital object with the path segment to create a custom arrow object. In particular, the disclosed systems utilize a bilateral segmentation machine-learning model to segment the digital object and a symmetry axis detection model to determine an axis of symmetry of the digital object. Moreover, the disclosed systems attach the digital object to an endpoint of the path segment at the axis of symmetry.
Methods, systems, and non-transitory computer readable storage media are disclosed for modifying digital images via a generative neural network with local refinement. The disclosed system generates, utilizing an encoder neural network, a latent feature vector of a digital image by encoding global context information of the digital image into the latent feature vector. The disclosed system also determines a modified latent feature vector by trimming the latent feature vector to a feature subset corresponding to a masked portion of the digital image. Additionally, the disclosed system generates, utilizing a generative decoder neural network on the modified latent feature vector, digital image data corresponding to the masked portion of the digital image. The disclosed system also generates a modified digital image including the digital image data corresponding to the masked portion combined with additional portions of the digital image.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating digital posters from digital documents with multimodal content using a deep submodular function. Specifically, the disclosed systems generate embedding vectors representing multimodal content of a digital document comprising text and images. Further, disclosed systems determine, utilizing a deep submodular function on the embedding vectors, a content subset comprising one or more digital images aligned with one or more text segments representative of the digital document. Moreover, the disclosed systems generate, utilizing a large language model, a summary of the multimodal content of the digital document from a prompt based on the content subset. Additionally, the disclosed systems generate, for display at a client device, a digital poster comprising the summary of the multimodal content generated via the large language model.
A method, apparatus, non-transitory computer readable medium, and system for image generation includes obtaining an input image and an input prompt, where the input image depicts an object and the input prompt describes a lighting condition for the object, generating relighted image features based on the input image and the input prompt, where the relighted image features represent the object with the lighting condition, and generating a synthetic image based on the relighted image features, where the synthetic image depicts the object with the lighting condition.
Embodiments are disclosed for using a neural network to optimize filter weights of an adaptive filter. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving, by a filter, an input audio signal, wherein the input audio signal is a far-end audio signal, the filter including a transfer function with adaptable filter weights, generating a response audio signal modeling the input audio signal passing through an acoustic environment, receiving a target response signal including the input audio signal and near-end audio signals, calculating an adaptive filter loss, generating, by a trained recurrent neural network, a filter weight update using the calculated adaptive filter loss, updating the adaptable filter weights of the transfer function to create an updated transfer function, generating an updated response audio signal based on the updated transfer function, and providing the updated response audio signal as an output audio signal.
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
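The adaptive-filtering loop in the abstract above can be sketched as a short FIR filter whose weight update comes from a learned module rather than a fixed step rule. In the toy version below the trained recurrent network is replaced by a simple leaky-integrator stub, and the signals and room response are synthetic placeholders, not the filing's components.

```python
# Skeleton of an adaptive filter whose weight updates are produced by a
# learned module (stubbed here); everything is an illustrative stand-in.
import numpy as np

def learned_update(loss_grad, state):
    """Placeholder for the trained RNN: a leaky-integrator descent step."""
    state = 0.9 * state + 0.1 * loss_grad
    return -0.05 * state, state                   # (weight update, new RNN state)

rng = np.random.default_rng(0)
far_end = rng.standard_normal(2000)               # far-end input signal
room = np.array([0.6, 0.3, 0.1])                  # unknown acoustic path
target = np.convolve(far_end, room, mode="full")[:len(far_end)]
target += 0.01 * rng.standard_normal(len(far_end))  # near-end component

w = np.zeros(3)                                   # adaptable FIR filter weights
state = np.zeros_like(w)
for n in range(len(room) - 1, len(far_end)):
    x = far_end[n - 2:n + 1][::-1]                # most recent samples first
    err = target[n] - w @ x                       # adaptive filter loss term
    grad = -2 * err * x                           # gradient of squared error
    dw, state = learned_update(grad, state)
    w += dw                                       # update the transfer function
```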
22.
DOCUMENT BOUNDARY DETECTION USING THE CURVATURE OF TEXT LINES
Embodiments are disclosed for using the curvature of text lines to detect a document boundary. The method may include receiving a warped image depicting a page of a document having an incomplete document boundary, the page including a plurality of text lines. A complete document boundary may be identified based on the incomplete document boundary and the plurality of text lines. A dewarped image corresponding to the warped image may be determined using the complete document boundary. The dewarped image may then be provided for display on a client device.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input prompt describing an image element, generating, using an image generation model, an output image depicting the image element and including a watermark, and identifying the training image as a source of the output image based on the watermark. The image generation model is trained using a training image including the image element and the watermark.
In one aspect, a computer-implemented method includes accessing, by a guidance module of an analysis application executing on a processor, wildcard data associated with data in a data repository. The method further includes displaying, by the guidance module based on the wildcard data, one or more wildcard elements in a graphical user interface (GUI). The method further includes receiving, by the analysis application, selection of a first wildcard element of the one or more wildcard elements. The method further includes displaying, by the guidance module, a suggestion based on the selection of the first wildcard element.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating digital images via a generative neural network with localized constraints. The disclosed system generates, utilizing one or more encoder neural networks, a sequence of embeddings comprising a prompt embedding representing a text prompt and an object text embedding representing a phrase indicating an object in the text prompt. The disclosed system generates, utilizing the one or more encoder neural networks, a visual embedding representing an object image corresponding to the object. The disclosed system determines a modified sequence of embeddings by replacing the object text embedding with the visual embedding in the sequence of embeddings. The disclosed system also generates, utilizing a generative neural network, a synthetic digital image from the modified sequence of embeddings comprising the visual embedding.
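The embedding-substitution step described above amounts to splicing a visual embedding into the prompt's token-embedding sequence in place of the object phrase. A toy illustration follows; the dimensions, tokenization, and encoders are stand-ins rather than the disclosed networks.

```python
# Toy illustration: replace the object phrase's text embeddings with a
# visual embedding before conditioning the generative network.
import numpy as np

D = 16
rng = np.random.default_rng(0)
tokens = ["a", "photo", "of", "my", "dog", "on", "a", "beach"]
prompt_emb = rng.standard_normal((len(tokens), D))   # per-token text embeddings
object_span = (3, 5)                                 # tokens for "my dog"
visual_emb = rng.standard_normal((1, D))             # embedding of the object image

modified_seq = np.concatenate(
    [prompt_emb[:object_span[0]], visual_emb, prompt_emb[object_span[1]:]], axis=0
)
assert modified_seq.shape == (len(tokens) - 2 + 1, D)
# modified_seq would then condition the generative network in place of the
# original sequence of embeddings.
```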
The present disclosure relates to systems, methods, and non-transitory computer-readable media that selectively utilize an image super-resolution model to upscale image patches corresponding to high-frequency portions. In particular, the disclosed systems select a set of image patches corresponding to high-frequency portions of a digital image at a first resolution. Furthermore, the disclosed systems utilize an image super-resolution model to generate upscaled image patches for the set of image patches of the high-frequency portions at a second resolution higher than the first resolution according to an upscaling factor of at least two. The disclosed systems generate a segmentation map of the digital image based on the upscaled image patches and an upscaled segmentation corresponding to low-frequency portions of the digital image. Further, the disclosed systems generate a vectorized digital image for the digital image according to the segmentation map.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
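A rough sketch of the selective upscaling idea from the abstract above: patches with high gradient energy go through the super-resolution path while flat regions take a cheap interpolation path. The patch size, energy threshold, and bicubic fallback are assumptions made for illustration.

```python
# Sketch: route only high-frequency patches to a super-resolution model.
import numpy as np
from scipy.ndimage import sobel, zoom

def split_patches(img, p=32):
    h, w = img.shape
    return [(r, c, img[r:r + p, c:c + p]) for r in range(0, h, p) for c in range(0, w, p)]

def is_high_frequency(patch, thresh=5.0):
    g = patch.astype(float)
    grad = np.hypot(sobel(g, axis=0), sobel(g, axis=1))
    return grad.mean() > thresh                       # illustrative energy test

def upscale(img, factor=2, sr_model=None):
    out = zoom(img.astype(float), factor, order=3)    # cheap path for flat regions
    for r, c, patch in split_patches(img):
        if is_high_frequency(patch):
            sr = sr_model(patch) if sr_model else zoom(patch.astype(float), factor, order=3)
            out[r * factor:(r + patch.shape[0]) * factor,
                c * factor:(c + patch.shape[1]) * factor] = sr
    return out

img = (np.random.default_rng(0).random((128, 128)) * 255).astype(np.uint8)
upscaled = upscale(img)                               # sr_model=None falls back to bicubic
```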
Three dimensional aware video compositing techniques are described. In one or more examples, subject data is produced that defines a subject depicted in frames of a subject video and viewpoint data describing movement of a viewpoint with respect to the frames of the subject video. Three-dimensional data is formed that defines a three-dimensional representation of an environment depicted in frames of an environment video. A composited video is generated by aligning the environment with the movement of the viewpoint of the subject based on the subject data and the three-dimensional data, which is then rendered, e.g., presented for display in a user interface.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a structural input indicating a target spatial structure, encoding, using a condition encoder, the structural input to obtain a structural encoding representing the target spatial structure, and generating, using an image generation model, a synthetic image based on the structural encoding, where the synthetic image depicts an object having the target spatial structure.
A method, apparatus, non-transitory computer readable medium, and system for text-to-color palette generation include encoding a text prompt to obtain a text embedding. A color embedding is generated based on the text embedding by performing a diffusion process. Then a color palette is generated based on the color embedding. The color palette includes a plurality of colors corresponding to the text prompt.
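As a schematic of the pipeline above, the sketch below runs a DDPM-style reverse process over a small color embedding conditioned on a text embedding and maps the result to an RGB palette. The denoiser is a stub and the noise schedule is illustrative; neither reflects the filing's trained model.

```python
# Toy diffusion over a 5-color embedding conditioned on a text embedding.
import numpy as np

N_COLORS, STEPS = 5, 50
betas = np.linspace(1e-4, 0.05, STEPS)
alphas_bar = np.cumprod(1.0 - betas)

def denoiser(x_t, t, text_emb):
    """Stub for the trained noise-prediction network."""
    return x_t - np.tanh(text_emb[:x_t.size].reshape(x_t.shape))

rng = np.random.default_rng(0)
text_emb = rng.standard_normal(N_COLORS * 3)
x = rng.standard_normal((N_COLORS, 3))                 # start from pure noise
for t in reversed(range(STEPS)):
    eps = denoiser(x, t, text_emb)
    a_bar, a = alphas_bar[t], 1.0 - betas[t]
    x = (x - betas[t] / np.sqrt(1.0 - a_bar) * eps) / np.sqrt(a)   # DDPM mean step
    if t > 0:
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)      # sampling noise

palette = np.clip((x + 1) / 2, 0, 1)                   # map embedding to RGB in [0, 1]
```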
Methods, non-transitory computer readable media, apparatuses, and systems for data processing include obtaining, by a machine learning model, a user cluster and interaction data for users in the user cluster, where the interaction data relates to interactions between the users and a digital platform. Some embodiments further include generating, by the machine learning model, a directed graph based on the user cluster and the interaction data, where the directed graph represents causal relationships among the interactions. Some embodiments further include updating, by the machine learning model, the user cluster based on the directed graph. Some embodiments further include providing, by a content component, customized content to a user via the digital platform based on the updated user cluster.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for modifying a digital design by performing a selective object-level undo operation. In one or more embodiments, the disclosed systems generate a modified object by performing a series of operations on an object depicted within the digital design. In some embodiments, the disclosed systems receive a selective object-level undo operation on the modified object, wherein the request specifies an operation to undo from among the series of operations performed on the object. In one or more embodiments, the disclosed systems modify the modified object by performing the selective object-level undo operation on the modified object to undo the operation from among the series of operations. In some embodiments, the disclosed systems provide an updated digital design depicting the modified object reflecting modifications from the series of operations excluding the operation undone by the selective object-level undo operation.
G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
G06F 3/04842 - Selection of displayed objects or displayed text elements
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06F 30/12 - Geometric CAD characterised by design entry means specially adapted for CAD, e.g. graphical user interfaces [GUI] specially adapted for CAD
Digital image visual aesthetic score generation techniques are described. In one or more examples, these techniques are implemented by a system including a training data collection module implemented by a processing device to collect training data including training digital images and user interaction data describing user interaction with the training digital images, respectively. A training module is configured to train a machine-learning model using the training data to generate an aesthetic score based on an input digital image. The aesthetic score is configured to specify an amount of visual aesthetics exhibited by the input digital image.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
33.
CONTROLLABLE VISUAL TEXT GENERATION WITH ADAPTER-ENHANCED DIFFUSION MODELS
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text content image and a text style image. The text content image is encoded to obtain content guidance information and the text style image is encoded to obtain style guidance information. Then a synthesized image is generated based on the content guidance information and the style guidance information. The synthesized image includes text from the text content image having a text style from the text style image.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that iteratively generate, utilizing a machine learning model, text responses to reduce hallucinated content. In particular, in some embodiments, the disclosed systems receive a digital query and select one or more supporting digital documents for the digital query. Furthermore, in some embodiments, the disclosed systems generate a first text response from a first text prompt generated by using the digital query. Moreover, in some embodiments, the disclosed systems extract a misalignment portion of the first text response by comparing the first text response and the one or more supporting digital documents. Additionally, from the misalignment portion of the first text response and the digital query, the disclosed systems further generate a second text response.
G06F 16/383 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
A method, apparatus, non-transitory computer readable medium, and system for generating suggested prompts include obtaining a sequence of text prompts associated with a user and determining a session concept for the user based on the sequence of text prompts. Embodiments then generate, using a prompt generation model, an image generation prompt based on the sequence of text prompts and the session concept. Subsequently, embodiments generate, using an image generation model, a synthetic image based on the image generation prompt.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable software using artificial intelligence for collecting, compiling, converting, organizing, consolidating, collaborating on, sharing, and editing files, links, notes, and documents and for creating an information hub; downloadable assistant and chatbot software using artificial intelligence for preparing insights, notes, and citations based on document content and user input and for collaborating on or sharing the same with other users; downloadable software using artificial intelligence for content generation and management Software as a service (SAAS) services featuring software using artificial intelligence for collecting, compiling, converting, organizing, consolidating, collaborating on, sharing, and editing files, links, notes, and documents and for creating an information hub; software as a service (SAAS) services featuring assistant and chatbot software using artificial intelligence for preparing insights, notes, and citations based on document content and user input and for collaborating on or sharing the same with other users; software as a service (SAAS) services featuring software using artificial intelligence for content generation and management
37.
COMPLETING TEMPORAL KNOWLEDGE GRAPHS BASED ON ENHANCED ENTITY REPRESENTATION AND WEIGHTED FREQUENCY-BASED SAMPLING
The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate predicted relationships for entities of a temporal knowledge graph using enhanced entity representations. For instance, in one or more embodiments, the disclosed systems generate a query for predicting a relationship for a subject entity represented within a temporal knowledge graph. The disclosed systems further determine an enhanced entity representation generated for the subject entity by an enhancement layer of a temporal knowledge graph completion model, the enhanced entity representation including a combination of a connection-based similarity for the subject entity and a relationship-based similarity for the subject entity. Using the temporal knowledge graph completion model and based on the enhanced entity representation of the subject entity, the disclosed systems generate a predicted relationship for the subject entity.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing machine learning models to generate modified digital images. In particular, in some embodiments, the disclosed systems generate image editing directions between textual identifiers of two visual features utilizing a language prediction machine learning model and a text encoder. In some embodiments, the disclosed systems generate an inversion of a digital image utilizing a regularized inversion model to guide forward diffusion of the digital image. In some embodiments, the disclosed systems utilize cross-attention guidance to preserve structural details of a source digital image when generating a modified digital image with a diffusion neural network.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
39.
HUMAN INPAINTING UTILIZING A SEGMENTATION BRANCH FOR GENERATING AN INFILL SEGMENTATION MAP
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via scene-based editing using image understanding facilitated by artificial intelligence. For example, in one or more embodiments the disclosed systems utilize generative machine learning models to create modified digital images portraying human subjects. In particular, the disclosed systems generate modified digital images by performing infill modifications to complete a digital image or human inpainting for portions of a digital image that portrays a human. Moreover, in some embodiments, the disclosed systems perform reposing of subjects portrayed within a digital image to generate modified digital images. In addition, the disclosed systems in some embodiments perform facial expression transfer and facial expression animations to generate modified digital images or animations.
Embodiments provide systems, methods, and computer storage media for management, assessment, navigation, and/or discovery of data based on data quality, consumption, and/or utility metrics. Data may be assessed using attribute-level and/or record-level metrics that quantify data "quality" (the condition of data, e.g., presence of incorrect or incomplete values), its "consumption" (the tracked usage of data in downstream applications, e.g., utilization of attributes in dashboard widgets or customer segmentation rules), and/or its "utility" (a quantifiable impact resulting from the consumption of data, e.g., revenue or number of visits resulting from marketing campaigns that use particular datasets, or storage costs of data). This data assessment may be performed at different stages of a data intake, preparation, and/or modeling lifecycle. For example, an interactive tree view may visually represent a nested attribute schema and attribute quality or consumption metrics to facilitate discovery of bad data before ingesting into a data lake.
Certain aspects and features of this disclosure relate to providing a vector graphics entity component system that supports collaborative editing in real time or near real time. Graphical constructs are efficiently described by integer-based identifiers, and graphical constructs of the same type are stored in a definitional component. Each client maintains both a pending state representation and a synchronized state representation of the graphical design to independently track the state of the representation at a live editing server. The use of integer-based identifiers for graphical constructs provides an efficient change representation that can be communicated with minimal network traffic. All copies of the graphical design represented among clients reach a consistent state quickly even when multiple users are making changes to the same vector path, eliminating the need to track changes manually or to move large files.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a text prompt; generating, using a generator of an image generation model, a feature embedding based on the text prompt, wherein the feature embedding includes a first set of channels that encodes a first value of an image characteristic and a second set of channels that encodes a residual between the first value of the image characteristic and a second value of the image characteristic; and generating, using a decoder of the image generation model, a synthetic image corresponding to the second value of the image characteristic based on the feature embedding.
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining an input image depicting a first element, a text description of the input image, and a modification prompt describing a second element different from the first element, generating an intermediate output based on the input image and the text description, where the intermediate output represents the first element, and generating a synthetic image based on the intermediate output and the modification prompt, where the synthetic image replaces the first element from the input image with the second element from the modification prompt.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating three-dimensional meshes representing two-dimensional images for editing the two-dimensional images. The disclosed system utilizes a first neural network to determine density values of pixels of a two-dimensional image based on estimated disparity. The disclosed system samples points in the two-dimensional image according to the density values and generates a tessellation based on the sampled points. The disclosed system utilizes a second neural network to estimate camera parameters and modify the three-dimensional mesh based on the estimated camera parameters of the pixels of the two-dimensional image. In one or more additional embodiments, the disclosed system generates a three-dimensional mesh to modify a two-dimensional image according to a displacement input. Specifically, the disclosed system maps the three-dimensional mesh to the two-dimensional image, modifies the three-dimensional mesh in response to a displacement input, and updates the two-dimensional image.
09 - Scientific and electric apparatus and instruments
Goods & Services
Downloadable software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation from user prompts, image editing, and for generating translations; downloadable software for using artificial intelligence models for content generation and management; downloadable application programming interface (API) software
A method, apparatus, non-transitory computer readable medium, and system for scene re-lighting using direct shading control include obtaining an input image and a lighting direction indicator that describes a lighting direction. A direct shading map is generated based on the input image and the lighting direction indicator, and a shaded image is generated depicting an object from the input image with shading consistent with the lighting direction based on the direct shading map.
Methods and systems disclosed herein relate generally to radiance field gradient scaling for unbiased near-camera training. In a method, a computing system receives information about a 3D environment. The computing system receives a camera location and a camera direction. The computing system determines, using a machine learning model, multiple densities and colors of the 3D environment from the perspective of the camera location at a number of respective points sampled along a first projected ray from the camera location in the camera direction. The computing system aggregates the multiple densities and colors of the 3D environment to generate an output pixel comprising an integrated color that represents the 3D environment.
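The aggregation described above follows the standard volume-rendering quadrature used for radiance fields. The sketch below composites densities and colors sampled along one ray into a single pixel color; the sampled values are random placeholders for the model's outputs, and the gradient-scaling contribution of the filing is not shown.

```python
# Standard volume-rendering compositing for one ray.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 64
t = np.linspace(0.1, 5.0, n_samples)                   # depths along the projected ray
sigma = rng.random(n_samples) * 2.0                    # predicted densities
color = rng.random((n_samples, 3))                     # predicted RGB at each sample

delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))     # spacing between samples
alpha = 1.0 - np.exp(-sigma * delta)                   # per-sample opacity
trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]   # transmittance T_i
weights = trans * alpha
pixel = (weights[:, None] * color).sum(axis=0)         # integrated output color
```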
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation and editing from user prompts, image editing, and for generating translations; downloadable software for using artificial intelligence models for content generation and management; downloadable application programming interface (API) software Software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management, namely, image, video, sound, audio, and music generation and editing from user prompts and for generating translations; Software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management
49.
ANONYMIZING DIGITAL IMAGES UTILIZING A GENERATIVE NEURAL NETWORK
The present disclosure relates to systems, methods, and non-transitory computer readable media for generating anonymized digital images utilizing a face anonymization neural network. In some embodiments, the disclosed systems utilize a face anonymization neural network to extract or encode a face anonymization guide that encodes face attribute features, such as gender, ethnicity, age, and expression. In some cases, the disclosed systems utilize the face anonymization guide to inform the face anonymization neural network in generating synthetic face pixels for anonymizing a digital image while retaining attributes, such as gender, ethnicity, age, and expression. The disclosed systems learn parameters for a face anonymization neural network for preserving face attributes, accounting for multiple faces in digital images, and generating synthetic face pixels for faces in profile poses.
Techniques for generation of images based on a variety of input conditions or modalities are described, whereby one or more processing devices (300) receive a plurality of input modalities comprising multiple images (302) and a text input in a natural language (308). The processing devices (300) generate image embeddings for the multiple images (306) and a text embedding for the text input (312). The processing devices, using a machine learning model (132), generate an output image (322) based on the image embeddings (306) and the text embedding (312). The output image includes portions of the multiple images.
Techniques for personalizing multimedia content based on a knowledge graph are described. In one embodiment, a method includes receiving activity data associated with a user from a device, generating a touchpoint embedding and a decision embedding using a graph neural network (GNN) model based on the activity data, the GNN model trained using a knowledge graph, predicting a touchpoint using a first classifier based on the touchpoint embedding, predicting a decision stage using a second classifier based on the decision embedding, and generating personalized content for the touchpoint based on the decision stage using a large language model (LLM). Other embodiments are described and claimed.
Embodiments are disclosed for a process of applying and blending new textures to surfaces across frames of a video sequence. The method may include obtaining a new texture for a selected region of a video frame of a video sequence. The method may further comprise generating a mesh for the selected region of the video frame that includes a plurality of control points. The method may further comprise determining control point location data for each of the plurality of control points for additional video frames of the video sequence and using the control point location data to generate a plurality of warped video frames by applying the new texture to the additional video frames. The method may further comprise generating blended video frames by blending the new texture in the warped video frames and providing a modified version of the video sequence using the generated blended video frames.
Embodiments are disclosed for a process of detecting and processing curved text in a document using a digital design system. The method may include identifying, by a page segmentation model, a plurality of paragraph objects in a document. The disclosed systems and methods further comprise determining that a paragraph object of the plurality of paragraph objects includes curved text in view of positions of baselines of text runs in the paragraph object. The processing of the curved text in the paragraph object can include determining spacing data for text runs of the curved text in the paragraph object. The disclosed systems and methods further comprise presenting output data representing the curved text using the spacing data for the text runs of the curved text.
Embodiments are disclosed for summary page generation using a document. The method may include receiving a text document. The method may further include generating a text summary based on the text document and a structured representation of the text summary using a document summarization model. The method may further include generating an image generation prompt based on the text summary and the structured representation of the text summary using a prompt generator. The method may further include generating a multimedia summary document corresponding to the text document using a diffusion model and the image generation prompt. The multimedia summary document includes generated background imagery based on the text summary. The multimedia summary document includes at least a portion of the text summary, which is placed within the multimedia summary document based on the structured representation of the text summary.
Digital content generation techniques are described that are performed using a text-based input. A text-based input is received and asset recommendation data is generated based on the text-based input using a machine-learning model, e.g., a large language model (LLM). A selection of a plurality of assets is received from the asset recommendation data and a selection is also received of at least one interaction from a plurality of interactions for the plurality of assets. The digital content is generated as having the interaction between the selection of the plurality of assets.
Techniques for generation of images based on a variety of input conditions or modalities are described. In one embodiment, one or more processing devices receive a plurality of input modalities comprising multiple images and a text input in a natural language. The processing devices generate image embeddings for the multiple images and a text embedding for the text input. The processing devices, using a machine learning model, generate an output image based on the image embeddings and the text embedding. The output image includes portions of the multiple images.
A method, apparatus, non-transitory computer readable medium, and system include obtaining a first image depicting a background scene and a second image depicting a foreground element, generating a guidance embedding based on the second image, and generating a synthetic image depicting the foreground element and the background scene based on the first image and the guidance embedding, wherein an image generation model determines a location of the foreground element within the synthetic image in light of the background scene.
Object identification techniques from a digital image are described. In an implementation, edges of an object are determined by analyzing gradients from a digital image. A structure of the object is computed by detecting line segments from the digital image. A boundary of the object is defined based on the edges and the structure. A display of the object is edited in a user interface based on the boundary using an edit operation.
G06V 10/94 - Hardware or software architectures specially adapted for image or video understanding
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/36 - Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
The present disclosure relates to a digital asset synchronization system that provides improved digital asset management and synchronization of a digital asset stored either within a component database or a packaged file. For example, the digital asset synchronization system enables a set of components that makes up a digital asset to appear as a singular packaged file, while also maintaining the benefits of having the digital asset made up of the components. In this manner, the digital asset synchronization system provides a bridge between a digital asset stored in a packaged file format and conventional file formats. In addition, the digital asset synchronization system also provides digital asset management and improved synchronization between a client device and a cloud storage system.
G06F 16/25 - Integrating or interfacing systems involving database management systems
H04L 67/10 - Protocols in which an application is distributed across nodes in the network
H04L 67/1095 - Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
In some embodiments, a computing system provides a graphical interface that displays one or more graphical objects including a moving object and a static object. The computing system generates an impact contour for the moving object that has a predefined distance from a first boundary of the moving object. Based on detecting that the impact contour of the moving object intersects a second boundary of the static object, the computing system determines a first snapping point on the first boundary of the moving object and a second snapping point on the second boundary of the static object. The computing system updates the graphical interface to execute a snapping operation by translating the moving object to a location where the first snapping point and the second snapping point touch each other.
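A simplified, axis-aligned version of the snapping behavior described above is sketched below: the impact contour is modeled as the moving object's bounds grown by a fixed margin, and an intersection with the static object triggers a translation that brings the nearest edges into contact. The geometry and margin are illustrative assumptions.

```python
# Toy axis-aligned snapping: impact contour = bounds expanded by a margin.
from dataclasses import dataclass

@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    def right(self):
        return self.x + self.w

    def expanded(self, m):
        return Rect(self.x - m, self.y - m, self.w + 2 * m, self.h + 2 * m)

    def intersects(self, o):
        return not (self.right() < o.x or o.right() < self.x or
                    self.y + self.h < o.y or o.y + o.h < self.y)

SNAP_MARGIN = 8.0
moving = Rect(100, 40, 50, 30)
static = Rect(158, 40, 60, 60)

if moving.expanded(SNAP_MARGIN).intersects(static):
    # Snap the moving object's right edge to the static object's left edge.
    dx = static.x - moving.right()
    moving = Rect(moving.x + dx, moving.y, moving.w, moving.h)
```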
Methods, systems, and non-transitory computer readable storage media are disclosed for generating action plans utilizing a large language model with a best-first search model. The disclosed system determines a request to utilize a large language model to generate an action plan via one or more software tools. The disclosed system generates the action plan by traversing a decision tree comprising an action space involving the one or more software tools by iteratively: selecting, utilizing a best-first search model, an action from a set of possible actions in the action space of the decision tree; and expanding, utilizing the best-first search model, the action space of the decision tree to include an additional set of possible actions. The disclosed system also executes the action plan via one or more interactions with the one or more software tools according to the action.
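The plan-construction loop described above can be pictured as best-first search over partial action sequences. In the toy sketch below, a priority queue orders partial plans by a scoring stub standing in for the large language model's judgment; the tool set and goal are invented for illustration.

```python
# Minimal best-first search over a toy tool-action space.
import heapq

TOOLS = ("crop", "resize", "export")                       # toy action space
GOAL = ("crop", "resize", "export")

def score(plan):
    """Stub for LLM-based plan scoring: reward matching the goal prefix."""
    return -sum(a == g for a, g in zip(plan, GOAL))        # lower is better

frontier = [(0, ())]                                       # (priority, partial plan)
best_plan = None
while frontier:
    _, plan = heapq.heappop(frontier)                      # select best partial plan
    if plan == GOAL:
        best_plan = plan
        break
    for action in TOOLS:                                   # expand the action space
        new_plan = plan + (action,)
        if len(new_plan) <= len(GOAL):
            heapq.heappush(frontier, (score(new_plan), new_plan))
```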
Techniques for reference image based material retrieval are described that support identification of procedural materials based on visual features of input images. A processing device, for instance, receives an input image that has a particular visual appearance. The processing device generates a histogram representation of the input image that represents a color prominence of the input image and generates a color distribution based on the color prominence. The processing device leverages a vision language model to filter candidate procedural materials by a semantic similarity to the input image. The processing device then identifies a procedural material that has a visual similarity to the particular visual appearance by comparing the color distribution for the input image to color distributions associated with the filtered candidate procedural materials. In this way, the techniques described herein support efficient retrieval of procedural materials based on color and on semantic features of the input image.
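A minimal sketch of the color-matching stage described above: build a normalized color histogram for the input image and rank candidate procedural materials (assumed to have already passed the semantic filter) by histogram distance. The bin count and chi-square distance are assumptions, not the disclosed metric.

```python
# Rank candidate materials by color-histogram similarity to a query image.
import numpy as np

def color_histogram(img, bins=8):
    """3-D RGB histogram normalized to a distribution."""
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=bins, range=[(0, 1)] * 3)
    return hist.ravel() / hist.sum()

def chi2_distance(p, q, eps=1e-8):
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

rng = np.random.default_rng(0)
query = rng.random((64, 64, 3))                            # input reference image
candidates = {f"material_{i}": rng.random((64, 64, 3)) for i in range(5)}

q_hist = color_histogram(query)
ranked = sorted(candidates,
                key=lambda name: chi2_distance(q_hist, color_histogram(candidates[name])))
best_match = ranked[0]
```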
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a reference image and an input prompt describing an image element, identifying an object from the reference image, generating, using an image generation model, image features representing the object based on the reference image, and generating, using the image generation model, a synthetic image depicting the image element and the object based on the input prompt and the image features from the reference image.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images via multi-layered scene completion techniques facilitated by artificial intelligence. For instance, in some embodiments, the disclosed systems receive a digital image portraying a first object and a second object against a background, where the first object occludes a portion of the second object. Additionally, the disclosed systems pre-process the digital image to generate a first content fill for the portion of the second object occluded by the first object and a second content fill for a portion of the background occluded by the second object. After pre-processing, the disclosed systems detect one or more user interactions to move or delete the first object from the digital image. The disclosed systems further modify the digital image by moving or deleting the first object and exposing the first content fill for the portion of the second object.
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
In implementation of techniques for offsetting camera filter shift, a computing device implements an offset system to capture a first digital image using a filter at a first position relative to an image capture device and to capture a second digital image using the filter at a second position relative to the image capture device resulting from movement of the filter between the first position and the second position. The offset system determines a filter shift resulting from the movement by comparing the first and second digital images. The offset system then controls an offset of a portion of the image capture device based on the filter shift.
The present disclosure relates to systems, non-transitory computer-readable media, and methods for training and implementing a vision-language model using masked distillation and contrastive image-text training. In particular, in one or more embodiments, the disclosed systems generate, utilizing a vision encoder, an image embedding from a masked digital image comprising a digital image with one or more masked patches. In some embodiments, the disclosed systems generate, utilizing a text encoder, a text embedding from a masked text phrase. In one or more embodiments, the disclosed systems generate, utilizing the vision-language model from the image embedding and the text embedding, a predicted text reconstruction of the masked text phrase and a predicted image reconstruction of the digital image. In some embodiments, the disclosed systems modify parameters of the vision-language model according to a masked distillation loss between the predicted text reconstruction and a text reconstruction generated by a pretrained large language model.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06F 40/40 - Processing or translation of natural language
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
42 - Scientific, technological and industrial services, research and design
Goods & Services
Software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management, all relating to sound, audio, and music generation from user prompts and for generating translations; software as a service (SAAS) services featuring software for using artificial intelligence models for content generation and management.
70.
CHANNEL INCREMENTALITY MEASUREMENT USING CAUSAL FOREST
One or more aspects of the method, apparatus, and non-transitory computer readable medium include obtaining content presentation data; generating, using a machine learning model, predicted user interaction data by computing a plurality of decision tree regressors, wherein nodes of the decision tree regressors are trained to infer a causal relationship between a user interaction variable and a treatment variable; and presenting content to a user based on the predicted user interaction data. The causal relationship is based on maximizing a difference in the relationship between the user interaction variable and the treatment variable within a tree.
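As a simplified, hedged illustration of estimating per-user incremental effect from presentation data, the sketch below uses a two-model ("T-learner") approach with scikit-learn random forests of decision-tree regressors; this is a stand-in for the causal forest described in the abstract, and all variable names and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
features = rng.normal(size=(n, 4))          # content presentation data
treatment = rng.integers(0, 2, size=n)      # treatment variable (e.g., channel shown or not)
true_uplift = 0.5 * features[:, 0]          # heterogeneous effect baked into the toy data
interaction = features[:, 1] + treatment * true_uplift + rng.normal(scale=0.1, size=n)

# Fit separate forests to treated and untreated presentations, then read the
# predicted incremental interaction as the difference between their outputs.
treated = RandomForestRegressor(n_estimators=100, random_state=0)
control = RandomForestRegressor(n_estimators=100, random_state=0)
treated.fit(features[treatment == 1], interaction[treatment == 1])
control.fit(features[treatment == 0], interaction[treatment == 0])

predicted_uplift = treated.predict(features) - control.predict(features)
# Present content where the predicted incremental interaction is largest.
top_users = np.argsort(-predicted_uplift)[:10]
print(predicted_uplift[top_users])
```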
Artificial intelligence (AI) agent control and progress indicator techniques are described. A search-query type of a search query, for instance, is detected using a machine-learning model. Responsive to detecting that the search-query type is a first type, the search query is communicated for processing by an algorithmic search engine to generate a search result. Responsive to detecting that the search-query type is a second type, the search query is communicated for processing using an artificial intelligence (AI) search assistant implemented using a large language model (LLM) to generate the search result.
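A minimal routing sketch of the above: a (placeholder) query-type classifier decides whether a query goes to a conventional search engine or to an LLM search assistant. All three callables are hypothetical stand-ins, not the system's actual components.

```python
from typing import Callable

def classify_query(query: str) -> str:
    # Stand-in heuristic: conversational phrasing goes to the assistant;
    # a trained classifier would make this decision in practice.
    return "assistant" if query.strip().endswith("?") else "keyword"

def route(query: str,
          algorithmic_search: Callable[[str], str],
          ai_assistant: Callable[[str], str]) -> str:
    if classify_query(query) == "keyword":
        return algorithmic_search(query)   # first type: algorithmic search engine
    return ai_assistant(query)             # second type: LLM search assistant

result = route("how do I remove a background from a photo?",
               algorithmic_search=lambda q: f"[search results for {q!r}]",
               ai_assistant=lambda q: f"[assistant answer for {q!r}]")
print(result)
```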
Digital video editing techniques are described that are based on a target digital image. In one or more implementations, inputs are received. The inputs include a target text prompt, a target digital image depicting a target object, and a source digital video having a plurality of frames depicting a source object. Regions-of-interest are identified in the plurality of frames of the source digital video, respectively, based on the target text prompt and the target digital image using a machine-learning model, e.g., a diffusion model. A plurality of frames of a target digital video are generated as having the target object using a generative machine-learning model. The generating is based on the regions-of-interest, the target digital image, the source digital video, and a source text prompt describing the source digital video.
Digital image synthesis techniques are described that leverage splatting, i.e., forward warping. In one example, a first digital image and a first optical flow are received by a digital image synthesis system. A first splat metric and a first merge metric are constructed by the digital image synthesis system that define a weighted map of respective pixels. From this, the digital image synthesis system produces a first warped optical flow and a first warp merge metric corresponding to an interpolation instant by forward warping the first optical flow based on the splat metric and the merge metric. A first warped digital image corresponding to the interpolation instant is formed by the digital image synthesis system by backward warping the first digital image based on the first warped optical flow.
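For intuition, the sketch below forward-warps (summation-splats) an optical flow field to an interpolation instant, accumulating a merge metric as the sum of splatted weights. It rounds each pixel to its nearest destination rather than splatting bilinearly, so it is a simplified stand-in for the technique described, not the claimed method.

```python
import numpy as np

def forward_warp_flow(flow, weights, t):
    """Summation-splat an optical flow field to interpolation instant t.
    `flow` is an (H, W, 2) array of per-pixel (dy, dx) motion and `weights`
    is an (H, W) splat metric. Returns the warped flow and the accumulated
    merge metric."""
    h, w, _ = flow.shape
    warped = np.zeros_like(flow)
    merged = np.zeros((h, w))
    ys, xs = np.mgrid[0:h, 0:w]
    # Destination of each source pixel at instant t along its own flow vector.
    dst_y = np.clip(np.round(ys + t * flow[..., 0]).astype(int), 0, h - 1)
    dst_x = np.clip(np.round(xs + t * flow[..., 1]).astype(int), 0, w - 1)
    np.add.at(merged, (dst_y, dst_x), weights)
    np.add.at(warped, (dst_y, dst_x), weights[..., None] * flow)
    nonzero = merged > 0
    warped[nonzero] = warped[nonzero] / merged[nonzero][:, None]
    return warped, merged

# Toy usage: a uniform rightward flow of 4 pixels splatted to t = 0.5.
flow = np.zeros((32, 32, 2))
flow[..., 1] = 4.0
warped_flow, merge_metric = forward_warp_flow(flow, np.ones((32, 32)), t=0.5)
print(warped_flow[16, 18], merge_metric[16, 18])  # [0. 4.] and 1.0
```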
This disclosure describes one or more implementations of systems, non-transitory computer-readable media, and methods that utilize a segmentation approach that distinguishes smooth-shaded regions from high-frequency regions in an image within a vectorization pipeline to generate a vector image. For instance, the disclosed systems utilize a smoothing function to identify non-overlapping sets of pixels that include locally smooth pixels and pixels with high frequency details for an image. Furthermore, in some instances, the disclosed systems generate separate sets of fill functions (representing color-based regions) using color-based pixel clustering for the non-overlapping sets of pixels. Moreover, in one or more instances, the disclosed systems merge neighboring color-based regions in the sets of fill functions (using color similarity) to generate a set of segmented regions for an image. In some implementations, the disclosed systems utilize the set of segmented regions, from the image, to generate a vector image from the image.
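One simple way to obtain such non-overlapping smooth and high-frequency pixel sets is to threshold a local Laplacian response of a smoothed image, as in the sketch below; this is a simplified stand-in for the smoothing function described, with an arbitrary threshold.

```python
import numpy as np
from scipy import ndimage

def split_smooth_and_detail(image, threshold=8.0):
    """Split pixels into locally smooth vs. high-frequency boolean masks
    using the magnitude of a Laplacian applied to a Gaussian-smoothed image."""
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(float)
    response = np.abs(ndimage.laplace(ndimage.gaussian_filter(gray, sigma=1.0)))
    detail = response > threshold
    return ~detail, detail  # non-overlapping sets of pixels

# Toy usage: a flat image with a noisy square in the middle.
rng = np.random.default_rng(0)
img = np.full((64, 64), 128.0)
img[24:40, 24:40] += rng.normal(scale=40.0, size=(16, 16))
smooth_mask, detail_mask = split_smooth_and_detail(img)
print(smooth_mask.sum(), detail_mask.sum())
```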
A method, apparatus, non-transitory computer readable medium, and system for image processing include obtaining a text prompt describing an element and an attribute value for a continuous attribute of the element, embedding the text prompt to obtain a text embedding in a text embedding space, embedding the attribute value to obtain an attribute embedding in the text embedding space, and generating a synthetic image based on the text embedding and the attribute embedding, where the synthetic image depicts the continuous attribute of the element based on the attribute value.
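One plausible reading of embedding a continuous attribute value into the text embedding space is to interpolate between the embeddings of the attribute's two extremes, as sketched below. The `embed_text` function is a hypothetical stand-in for a real text encoder (here it just hashes words into a fixed-size vector and is not stable across Python runs), and the conditioning step is only indicated, not implemented.

```python
import numpy as np

DIM = 64

def embed_text(text: str) -> np.ndarray:
    # Hypothetical encoder: deterministic within a run, illustrative only.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=DIM)
    return vec / np.linalg.norm(vec)

def embed_attribute(value: float, low_word: str, high_word: str) -> np.ndarray:
    """Map a value in [0, 1] to a point on the segment between the embeddings
    of the attribute's two extremes in the text embedding space."""
    low, high = embed_text(low_word), embed_text(high_word)
    return (1.0 - value) * low + value * high

prompt_embedding = embed_text("a lamp on a desk")
attribute_embedding = embed_attribute(0.3, "dim lighting", "bright lighting")
conditioning = np.concatenate([prompt_embedding, attribute_embedding])
print(conditioning.shape)  # (128,) -- would be fed to the image generator
```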
Embodiments of the disclosure provide a machine learning model for generating a predicted executable command for an image. The machine learning model includes an interface configured to obtain an utterance indicating a request associated with the image, an utterance sub-model, a visual sub-model, an attention network, and a selection gate. The machine learning model generates a segment of the predicted executable command from weighted probabilities of each candidate token in a predetermined vocabulary, determined based on visual features, concept features, current command features, and utterance features extracted from the utterance or the image.
G06V 10/86 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using graph matching
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G06N 3/044 - Recurrent networks, e.g. Hopfield networks
G06N 3/088 - Non-supervised learning, e.g. competitive learning
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a sketch input and a value of a fidelity parameter indicating a level of adherence to the sketch input. The sketch input and the value of the fidelity parameter are encoded to obtain sketch guidance information. Then a synthesized image is generated based on the sketch guidance information. The synthesized image depicts an object from the sketch input based on the fidelity parameter.
Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include an object indicated by the user input. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.
Methods and systems are provided for generating and using a unified fine-tuning dataset to fine-tune a language model. In embodiments described herein, a first labeled dataset having a first format of human feedback associated with performance of a pre-trained language model is accessed. Additionally, a second labeled dataset having a second format of human feedback associated with performance of the pre-trained language model is accessed. Thereafter, a unified fine-tuning dataset is generated by converting the second labeled dataset to a refined labeled dataset having the first format of human feedback and aggregating the first labeled dataset having the first format of human feedback with the refined labeled dataset having the first format of human feedback. The pre-trained language model is fine-tuned using the unified fine-tuning dataset and output for subsequent utilization.
The present disclosure is directed toward systems, methods, and non-transitory computer readable media that provide a contextual query answering system that trains and implements a unique machine learning architecture to generate accurate domain-specific contextual responses. For example, the disclosed systems receive a contextual query indicating a software context of a computer application within a software-specific domain. The disclosed systems utilize a context retrieval model to generate query embeddings from the contextual query and data segment embeddings from data segments of stored digital documents. Further, the context retrieval model determines relevant digital documents from among the stored digital documents based on comparing the query embeddings and the data segment embeddings. The disclosed systems provide the relevant digital documents to a response generator model to generate a contextual response within the software-specific domain.
Techniques are disclosed for using a meta-machine learning (ML) model for monitoring and attribution of drift associated with an ML model. In an example method, a training module trains a meta-ML model to map an input feature drift to an output metric drift for an ML model. The meta-ML model is trained using meta training data including a number of meta training data points. Each data point includes an input feature drift value and an output metric drift value generated by determining, for a set of input features mapped to a corresponding set of output metrics, divergences between a set of baseline input features and the set of input features and between a set of baseline output metrics and the corresponding set of output metrics. An output module outputs a predicted output metric drift of the ML model for a particular input feature drift of the ML model.
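To make the drift mapping concrete, the sketch below measures divergence between a baseline window and later windows (using the population stability index as one possible divergence; the abstract does not fix a particular one), builds (input feature drift, output metric drift) meta training points, and fits a simple regressor as the meta-ML model. All data and names are synthetic and illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def psi(baseline, current, bins=10):
    """Population stability index between two samples of one variable."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    p, _ = np.histogram(baseline, bins=edges)
    q, _ = np.histogram(current, bins=edges)   # values outside the range are ignored
    p = np.clip(p / p.sum(), 1e-6, None)
    q = np.clip(q / q.sum(), 1e-6, None)
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
baseline_feature = rng.normal(0.0, 1.0, 10_000)
baseline_metric = rng.normal(0.0, 1.0, 10_000)

# Each meta training data point pairs an input feature drift value with an
# output metric drift value measured against the baseline.
meta_points = []
for shift in np.linspace(0.0, 2.0, 20):
    feature = rng.normal(shift, 1.0, 10_000)        # drifted input feature
    metric = rng.normal(0.5 * shift, 1.0, 10_000)   # the output metric drifts too
    meta_points.append((psi(baseline_feature, feature),
                        psi(baseline_metric, metric)))

X = np.array([[fd] for fd, _ in meta_points])
y = np.array([md for _, md in meta_points])
meta_model = LinearRegression().fit(X, y)
print(meta_model.predict([[0.25]]))  # predicted output-metric drift for a given feature drift
```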
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating intertwined digital designs according to the visual order of structural graph nodes. In particular, in one or more embodiments, the disclosed systems generate, by at least one processor, a structural graph of a digital design that represents overlapping surfaces of objects in the digital design as nodes and object paths between the overlapping surfaces as edges. Further, the disclosed systems assign, by the at least one processor, a visual order to the nodes based on a configuration of the structural graph. Moreover, the disclosed systems generate, by the at least one processor, an intertwined digital design by ordering the overlapping surfaces of the objects in accordance with the assigned visual order of the nodes.
A conversational branch data prediction system predicts conversational branch data that can be used in automated conversational services (e.g., “chatbots”). The conversational branch data prediction system predicts the conversational branch data based on interactions with multiple sections across multiple webpages of a website, such as sections that include particular portions of text on webpages and omit additional portions of the text. For each section, the conversational branch data prediction system determines a vector embedding of text data in the section and a topic. Based on event metrics data, the conversational branch data prediction system identifies interactions with a particular subset of the sections having a particular topic. A trained machine-learning dialogue model identifies conversational text data correlated with vector embeddings associated with the particular subset of sections. The conversational branch data prediction system provides the identified conversational text data to an additional computing system.
H04L 51/02 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
85.
NEURAL NETWORKS TO RENDER TEXTURED MATERIALS ON CURVED SURFACES
A scene modeling system accesses a three-dimensional (3D) scene including a 3D object. The scene modeling system applies a silhouette bidirectional texture function (SBTF) model to the 3D object to generate an output image of a textured material rendered as a surface of the 3D object. Applying the SBTF model includes determining a bounding geometry for the surface of the 3D object. Applying the SBTF model includes determining, for each pixel of the output image, a pixel value based on the bounding geometry. The scene modeling system displays, via a user interface, the output image based on the determined pixel values.
Methods, computer systems, computer-storage media, and graphical user interfaces are provided for facilitating generation of data story recommendations. In one implementation, a set of candidate data stories is generated. Each candidate data story can include various data visualizations. From the set of candidate data stories, a data story recommendation is determined based on an adaptive elicitation of user feedback via a set of inquiries selected in accordance with at least one potential reduction of the set of candidate data stories. Thereafter, the data story recommendation, including a set of data visualizations, is provided for display.
A method, apparatus, non-transitory computer readable medium, and system for video generation include obtaining an input image having an element depicted in a first view angle, generating a synthetic image depicting the element of the input image from a second view angle different from the first view angle, generating an intermediate image by interpolating based on the synthetic image, and generating a video based on the synthetic image and the intermediate image, where the video depicts the element of the input image from a changing view angle.
The present disclosure relates to systems, methods, and non-transitory computer readable media that generate segment labels for image segments of a digital image that have been determined via deep segmentation. For instance, in some embodiments, the disclosed systems generate, using a segment classification neural network, an image embedding for a digital image portraying a plurality of image segments. Additionally, the disclosed systems determine, using the segment classification neural network, masked segment embeddings for the plurality of image segments of the digital image based on the image embedding and a plurality of masks corresponding to the plurality of image segments. Based on the masked segment embeddings, the disclosed systems use the segment classification neural network to determine segment labels for the plurality of image segments.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
Multimodal digital audio generation techniques are described that leverage multimodal inputs such as a digital image and text to generate digital audio using machine learning. In one or more examples, a digital image and text are received. Image semantic information is extracted from the digital image using machine learning. Digital audio is generated using generative machine learning based on the text and the image semantic information. The digital audio is then rendered and output by a digital audio output device.
G10H 1/00 - Details of electrophonic musical instruments
G06F 40/40 - Processing or translation of natural language
G06V 10/40 - Extraction of image or video features
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
90.
USER INSIGHTS USING DEEP GENERATIVE FOUNDATION MODELS
Systems and methods for generating user insights include obtaining a query about a user interaction with a software application. The query can be in the form of a natural language question. Embodiments then select a task from a plurality of event prediction tasks based on the query. Next, embodiments generate, using a machine learning model, an event prediction based on the query and the task, where the machine learning model is trained to predict an event based on a sequence of user interactions with the software application. Embodiments then generate a natural language response to the query based on the task and the event prediction.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate a design representation to further construct a digital design multigraph and generate a structural representation for a digital design document from the digital design multigraph. For instance, the disclosed systems generate a design representation of a digital design document that includes design properties with multiple digital design elements. In particular, the disclosed systems construct a digital design multigraph from the design representation by generating nodes to represent digital design elements and edges based on relationships between these elements. In addition, the disclosed systems generate a structural representation based on the digital design multigraph for downstream applications. For instance, downstream applications include utilizing the structural representation to select a resizing model from a plurality of resizing models and resizing a digital design document using the structural representation.
A method, apparatus, non-transitory computer readable medium, and system for image generation includes obtaining a text prompt describing an object and a keyable background and generating an image including the object and the keyable background based on the text prompt. Some embodiments generate an alpha image by replacing the keyable background with an alpha channel.
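For the last step, replacing a keyable background with an alpha channel can be done with a simple distance-based key, as sketched below; this is a generic chroma-keying stand-in for whatever keying the abstract contemplates, with an arbitrary key color and tolerance.

```python
import numpy as np

def key_to_alpha(image, key_color=(0, 255, 0), tolerance=60):
    """Replace a keyable (e.g., solid green) background with an alpha channel.
    `image` is an (H, W, 3) uint8 array; pixels close to `key_color` become
    fully transparent, all others fully opaque."""
    diff = image.astype(float) - np.array(key_color, dtype=float)
    distance = np.linalg.norm(diff, axis=-1)
    alpha = np.where(distance > tolerance, 255, 0).astype(np.uint8)
    return np.dstack([image, alpha])  # (H, W, 4) RGBA alpha image

# Toy usage: a red square generated on a pure green keyable background.
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[..., 1] = 255                      # green keyable background
img[16:48, 16:48] = (200, 30, 30)      # the generated object
rgba = key_to_alpha(img)
print(rgba.shape, int(rgba[0, 0, 3]), int(rgba[32, 32, 3]))  # (64, 64, 4) 0 255
```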
The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating and visualizing font variation instances of a variable font using deep-aware focal points in a font variation space. In particular, in one or more embodiments, the disclosed systems determine, for a variable font, a set of predefined font variation instances defining named typeface variations of the variable font. In some embodiments, the disclosed systems generate, from the set of predefined font variation instances, a plurality of focal points representing additional typeface variations of the variable font in a font variation space. In one or more embodiments, the disclosed systems generate, from the plurality of focal points in the font variation space, a modified set of font variation instances defining intermediate typeface variations between the named typeface variations of the variable font.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify parameters of a fused feature extractor. In particular, the disclosed systems generate inferred masks from digital images and digital text prompts using a fused feature extractor. Furthermore, the disclosed systems identify a subset of the inferred masks that satisfy a validity threshold. Moreover, the disclosed systems generate an augmented training set by combining the subset of the inferred masks with a training set that includes ground truth masks. Further, the disclosed systems generate object mask predictions from the augmented training set and determine ground truth and pseudo measures of loss by comparing the object mask predictions with the inferred masks and the ground truth masks. From the ground truth and pseudo measures of loss, the disclosed systems modify parameters of the fused feature extractor.
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/74 - Image or video pattern matchingProximity measures in feature spaces
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
95.
MATCHING DIGITAL FONTS UTILIZING A RESIDUAL NEURAL NETWORK
The present disclosure relates to systems, non-transitory computer-readable media, and methods for determining predicted digital fonts for textual characters within digital images utilizing one or more machine learning models or neural networks. In particular, in one or more embodiments, the disclosed systems determine textual characters within a target digital image and determine one or more predicted fonts for the textual characters utilizing a font recognition machine learning model to extract features of the textual characters from the target digital image, the font recognition machine learning model comprising parameters learned from synthetic text data comprising sample textual images generated with a multi-attribute probabilistic model across a distribution of text attributes.
In one aspect, a query management module executing on a processor receives, from a large language model (LLM), a graph query generated by the LLM based on a natural language query (NLQ). A validation module identifies an error in the graph query. The query management module provides an indication of the error to the LLM. The query management module receives a modified graph query from the LLM. The validation module validates the modified graph query. Based on the validation of the modified graph query, the query management module executes the modified graph query against a knowledge graph to return a result as a response to the NLQ.
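A minimal sketch of that validate-and-retry loop follows. The LLM call, the validator, and the graph executor are all hypothetical callables supplied by the caller; the stub components in the usage example simply "fix" the query on the second attempt.

```python
from typing import Callable, Optional

def answer_nlq(nlq: str,
               generate_query: Callable[[str, Optional[str]], str],
               find_error: Callable[[str], Optional[str]],
               execute: Callable[[str], object],
               max_attempts: int = 3):
    error = None
    for _ in range(max_attempts):
        graph_query = generate_query(nlq, error)   # LLM drafts or repairs the graph query
        error = find_error(graph_query)            # validation module
        if error is None:
            return execute(graph_query)            # run against the knowledge graph
    raise ValueError(f"could not produce a valid graph query: {error}")

# Toy usage with stub components.
attempts = {"n": 0}
def fake_llm(nlq, error):
    attempts["n"] += 1
    return "MATCH (p:Person) RETURN p" if attempts["n"] > 1 else "MATCH (p RETURN p"

result = answer_nlq(
    "who are the people in the graph?",
    generate_query=fake_llm,
    find_error=lambda q: None if q.count("(") == q.count(")") else "unbalanced parentheses",
    execute=lambda q: ["alice", "bob"],
)
print(result)
```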
Methods, computer systems, computer storage media, and graphical user interfaces are provided for facilitating generation of data stories and data summaries in accordance with user queries. In one implementation, a user query is obtained in association with a dataset. Thereafter, a set of facts relevant to the user query is identified. A data story is generated using a portion of the set of facts relevant to the user query. The data story includes a set of visualizations corresponding with the portion of the set of facts relevant to the user query. The set of facts relevant to the user query is used to generate a data summary of the set of relevant facts. The data story and/or the data summary are provided for display.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that modify digital images using an intelligent user interface tool that determines the intent of a user interaction. For instance, in some embodiments, the disclosed systems receive, via a graphical user interface of a client device, a user interaction with a set of pixels within a digital image. The disclosed systems determine, based on the user interaction, a user intent for targeting one or more portions of the digital image for deletion, the one or more portions including an additional set of pixels that differs from the set of pixels. Based on the user intent, the disclosed systems modify the digital image to delete the one or more portions from the digital image.
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
42 - Scientific, technological and industrial services, research and design
Goods & Services
(1) Software as a service (SAAS) services featuring software for using artificial intelligence models for creating and editing of digital audio content, namely, sound and music generated from user prompts; Software as a service (SAAS) services featuring software for using artificial intelligence models for generating language translations from user prompts; Software as a service (SAAS) services featuring software for using artificial intelligence models for creating and editing of digital content being images, photos, text messages, videos, sound, and music.
100.
DETECTING DIGITAL OBJECTS AND GENERATING OBJECT MASKS ON DEVICE
The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate object masks for digital objects portrayed in digital images utilizing a detection-masking neural network pipeline. In particular, in one or more embodiments, the disclosed systems utilize detection heads of a neural network to detect digital objects portrayed within a digital image. In some cases, each detection head is associated with one or more digital object classes that are not associated with the other detection heads. Further, in some cases, the detection heads implement multi-scale synchronized batch normalization to normalize feature maps across various feature levels. The disclosed systems further utilize a masking head of the neural network to generate one or more object masks for the detected digital objects. In some cases, the disclosed systems utilize post-processing techniques to filter out low-quality masks.