A data object from a data source is received by a distributed process in a data stream. The distributed process has a sequence of categories, each category containing one or more tasks that operate on the data object. The data object includes files that can be processed by the tasks. If the task is able to operate on the data object, then the data object is passed to the task. If the task is unable to operate on the data object, then the files in the data object are passed to a file staging area of the distributed process and stored in memory. The files in the file staging area are passed, in sequence, from the file staging area to the task that was unable to operate on the data object. The data object is outputted to a next category or data sink after being operated on by the task.
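The staging behaviour described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Task` class, the `max_files` capability check, and the helpers `run_task` and `run_pipeline` are hypothetical names, and "unable to operate on the data object" is assumed to mean that the object holds more files than the task accepts at once.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical shape: a data object is a dict holding a list of file payloads.
DataObject = Dict[str, List[str]]


class Task:
    """A task that accepts whole data objects only up to `max_files` files."""

    def __init__(self, name: str, max_files: int, handle_file: Callable[[str], str]):
        self.name = name
        self.max_files = max_files
        self.handle_file = handle_file

    def can_operate_on(self, obj: DataObject) -> bool:
        # Assumed capability check; the abstract does not define one.
        return len(obj["files"]) <= self.max_files

    def run(self, obj: DataObject) -> DataObject:
        return {"files": [self.handle_file(f) for f in obj["files"]]}


def run_task(task: Task, obj: DataObject) -> DataObject:
    """Pass the object directly if possible; otherwise stage its files in memory."""
    if task.can_operate_on(obj):
        return task.run(obj)
    staging: Deque[str] = deque(obj["files"])  # in-memory file staging area
    processed: List[str] = []
    while staging:
        # Files leave the staging area in sequence, one per task invocation.
        single = {"files": [staging.popleft()]}
        processed.extend(task.run(single)["files"])
    return {"files": processed}


def run_pipeline(categories: List[List[Task]], obj: DataObject) -> DataObject:
    """Run each category in order; each output feeds the next category."""
    for tasks in categories:
        for task in tasks:
            obj = run_task(task, obj)
    return obj


upper = Task("upper", max_files=1, handle_file=str.upper)
tag = Task("tag", max_files=10, handle_file=lambda f: f + "!")
result = run_pipeline([[upper], [tag]], {"files": ["a", "b", "c"]})
```

Here the first task cannot take three files at once, so they are staged and fed to it one by one; the second task receives the reassembled object directly.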
A computer-implemented system to detect vulnerabilities in artificial intelligence (AI) models, the system comprising a first AI model for calculating a first score for a first transaction based on one or more features extracted from the first transaction and transaction history associated with the first transaction, the first transaction being tagged as potentially adversarial in response to determining that the first score is in an improbable range based on comparing first attributes associated with the first transaction with second attributes associated with at least a second transaction, the comparison indicating the first transaction has a low likelihood of occurrence; and a second AI model for identifying adversarial transactions, in response to determining that a number of example transactions scored by the first AI model is sufficient to train the second AI model.
A computer-implemented method includes maintaining a historical data sample comprising a plurality of observation records, each observation record comprising a set of predictive features, a set of dependent variables, and a baseline weight variable; generating a target distribution for the plurality of observation records, wherein the target distribution comprises a first plurality of target percentages for a subset of the predictive features and a second plurality of target percentages for a subset of the dependent variables; generating a reweight variable for each observation record based at least in part on the target distribution and the baseline weight variable, wherein the reweight variable comprises optimized weights for each of the predictive features and each of the dependent variables; and generating a reweighted data sample by replacing the baseline weight variable in each observation record of the historical data sample with a corresponding reweight variable.
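As a rough illustration of replacing a baseline weight with a reweight variable, the sketch below rescales weights so that one categorical variable matches its target percentages. This is plain single-variable post-stratification, swapped in for simplicity; the abstract describes optimized weights across several predictive features and dependent variables, which this does not attempt. The function name `reweight` and the record layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reweight(records: List[dict], key: str, targets: Dict[str, float]) -> List[dict]:
    """Return copies of `records` whose 'weight' matches `targets` on `key`."""
    total = sum(r["weight"] for r in records)
    current: Dict[str, float] = defaultdict(float)
    for r in records:
        current[r[key]] += r["weight"] / total  # current weighted share per value

    reweighted = []
    for r in records:
        factor = targets[r[key]] / current[r[key]]
        # Replace the baseline weight with the reweight variable.
        reweighted.append({**r, "weight": r["weight"] * factor})
    return reweighted


# Hypothetical historical sample: 75% north, 25% south under baseline weights.
sample = [
    {"region": "north", "default": 0, "weight": 1.0},
    {"region": "north", "default": 1, "weight": 1.0},
    {"region": "north", "default": 0, "weight": 1.0},
    {"region": "south", "default": 0, "weight": 1.0},
]
new_sample = reweight(sample, "region", {"north": 0.5, "south": 0.5})
```

After reweighting, the two regions carry equal total weight while the overall weight sum is unchanged. Matching several variables at once would need an iterative or optimization-based scheme.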
Systems and methods are provided for accessing a database of records to identify a set of records represented by one or more nodes in a graph model. A connection between a first node and a second node in the one or more nodes is monitored to determine an association between a first record, represented by the first node, and a second record, represented by the second node. The set of records may be partitioned into a plurality of groups. For at least a first group, including a first set of records, it may be determined whether two or more records in the first group are related. In response to determining that the two or more records in the first group are related, a first group identifier may be assigned to the two or more records.
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/901 - Indexing; Data structures therefor; Storage structures
5.
CONFIDENCE METRIC DRIVEN MODEL OUTPUTS MANAGEMENT BY ACCOUNTING FOR UNCERTAINTY IN INPUT ENTITIES
A computer-implemented method comprises receiving input feature data from a plurality of entities, wherein the plurality of entities comprises deterministic entities and uncertain entities, wherein the uncertain entities are subject to an indeterministic state; processing the input feature data derived from the deterministic entities using a first model to generate a first output; processing the input feature data derived from the deterministic entities and the uncertain entities using a second model to generate a second output; and selecting between the first output and the second output as a final output, wherein the selecting is based at least in part on a confidence level of the second output and a predetermined confidence threshold.
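The selection step can be sketched as below. The function `select_output`, the toy models, and the rule "use the second output when its confidence reaches the threshold" are assumptions for illustration; the abstract only says that the selection depends on the confidence level and a predetermined threshold.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


def select_output(
    deterministic: Features,
    uncertain: Features,
    first_model: Callable[[Features], float],
    second_model: Callable[[Features], Tuple[float, float]],
    threshold: float,
) -> float:
    """Pick the second model's output only if it is confident enough."""
    # First model sees features from deterministic entities only.
    first_output = first_model(deterministic)
    # Second model sees deterministic and uncertain features together and
    # returns (output, confidence). The confidence source is hypothetical.
    second_output, confidence = second_model({**deterministic, **uncertain})
    return second_output if confidence >= threshold else first_output


def toy_first(f: Features) -> float:
    return f["balance"] + 1.0


def toy_second(f: Features) -> Tuple[float, float]:
    return f["balance"] + f["pending"], 0.6


cautious = select_output({"balance": 10.0}, {"pending": 5.0}, toy_first, toy_second, 0.8)
trusting = select_output({"balance": 10.0}, {"pending": 5.0}, toy_first, toy_second, 0.5)
```

With a threshold of 0.8 the fallback output from the first model is used; lowering the threshold to 0.5 lets the second model's output through.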
A method, a system, and a computer program product for generating a refined synthetic data from one or more sources of data. One or more source data are received from one or more data sources. One or more encoded source data are generated from the one or more source data. A synthetic data is generated by decoding one or more encoded source data. One or more variables in the synthetic data are selected and one or more predetermined identifiability values and one or more predetermined anonymity values are associated with them. The generated synthetic data including the selected variables is decoded using associated one or more predetermined identifiability values and one or more predetermined anonymity values. The decoded synthetic data is outputted.
A computer-implemented method for generating a classifier, comprising: assigning a plurality of hierarchies of tags to a collection of training examples, wherein a higher level tag of the plurality of hierarchies of tags comprises a set of lower level tags; associating, in the classifier, a plurality of latent features with each of the plurality of hierarchies of tags, respectively; constructing a plurality of loss functions, wherein each loss function is associated with each level of the plurality of hierarchies of tags and associated latent features of the classifier, wherein the loss function aggregates a plurality of binary cross entropy for each member of a level of tags and associated latent features; and training the classifier by minimizing the loss functions for each level of the plurality of hierarchies of tags and associated latent features of the classifier.
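A per-level loss of the kind described above might look like the following sketch. It assumes that "aggregates" means a plain sum of binary cross entropies over the tags of a level, and that the overall objective is the sum over levels; neither choice is stated in the abstract. The function names and the two-level tag hierarchy are hypothetical, and the training loop is omitted.

```python
import math
from typing import Dict, List


def binary_cross_entropy(p: float, y: int) -> float:
    """Binary cross entropy for one predicted probability and one 0/1 label."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def level_loss(probs: Dict[str, float], labels: Dict[str, int]) -> float:
    """Sum binary cross entropy over every tag at one hierarchy level."""
    return sum(binary_cross_entropy(probs[tag], labels[tag]) for tag in labels)


def total_loss(pred: List[Dict[str, float]], true: List[Dict[str, int]]) -> float:
    """Add up the per-level losses (assumed equal weighting of levels)."""
    return sum(level_loss(p, t) for p, t in zip(pred, true))


# Level 0 holds higher-level tags, level 1 holds lower-level tags.
predicted = [{"animal": 0.5, "vehicle": 0.5}, {"cat": 0.5, "dog": 0.5}]
actual = [{"animal": 1, "vehicle": 0}, {"cat": 1, "dog": 0}]
loss = total_loss(predicted, actual)
```

With every probability at 0.5, each of the four tag terms contributes ln 2, so the total is 4 ln 2.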
Computer-implemented machines, systems and methods for providing insights about misalignment in a latent space of a machine learning model. A method includes initializing a second weight matrix of a second artificial neural network based on a first weight matrix from a first artificial neural network. The method further includes applying transfer learning between the first artificial neural network and the second artificial neural network. The method further includes comparing the first latent space with the second latent space. The method further includes determining, responsive to the comparing, a first score indicating alignment of the first latent space and the second latent space. The method further includes determining, responsive to the first score satisfying a threshold, an appropriateness of the machine learning model.
A computer-implemented method for managing deployment units in a decision management platform, the method comprising: creating a revision of a deployment unit selected from a list of deployment units available for a solution, wherein the solution is an application that results in a change to the decision management platform; receiving a promotion request to promote the revision to a lifecycle environment, wherein the revision comprises a snapshot of the selected deployment unit at a time of the promotion request to promote the revision; and deploying the revision of the deployment unit to the lifecycle environment upon approval of the promotion request, wherein the deployment units are managed independently from one another.
A method, a system, and a computer program product for generating an interpretable set of features. One or more search parameters and one or more constraints on one or more search parameters for searching data received from one or more data sources are defined. The data received from one or more data sources is searched using the defined search parameters and constraints. One or more first features are extracted from the searched data. The first features are associated with one or more predictive score values. The searching is repeated in response to receiving a feedback data responsive to the extracted first features. One or more second features resulting from the repeated searching are generated.
Provided herein is a computer-implemented method for generating a configuration package, comprising: maintaining a list of capabilities at a computational platform, wherein the list of capabilities is associated with a list of configurations stored in a configuration repository; detecting, by one or more processors, an additional capability at the computational platform; automatically adding, by the computational platform, a configuration associated with the additional capability to the configuration repository; receiving, by a packaging service, a user-generated input from a user, wherein the user-generated input configures a solution service to generate solution data by one or more capabilities, the user-generated input selecting the one or more capabilities at the computational platform; and generating, by the packaging service, a configuration package comprising one or more configurations associated with the selected one or more capabilities.
A computer-implemented method for generating a classifier, comprising: processing a training data set, wherein the training data set comprises a plurality of training examples, and wherein each of the plurality of training examples is associated with multiple class labels; training a set of candidate classifiers, wherein random weights are assigned to multiple training objectives for training the set of candidate classifiers, and wherein the multiple training objectives correspond to the multiple class labels; assessing each classifier of the set of candidate classifiers to generate multiple performance measurements for each of the set of candidate classifiers, wherein each of the multiple performance measurements is associated with each of the multiple training objectives, respectively; and generating a tradeoff table presenting performances of each classifier of the set of candidate classifiers, based at least in part on the multiple performance measurements.
A computer-implemented method for generating a classifier, comprising: processing a training data set, wherein the training data set comprises a plurality of training examples, and wherein each of the plurality of training examples is associated with multiple class labels and a group membership label; training a set of candidate classifiers, wherein random weights are assigned to multiple performance objectives for training the set of candidate classifiers, and wherein the multiple performance objectives correspond to the multiple class labels; assessing each classifier of the set of candidate classifiers to generate multiple performance measurements for each of the set of candidate classifiers, wherein each of the multiple performance measurements is associated with each of the multiple performance objectives, respectively; and generating a tradeoff table presenting performances of each classifier of the set of candidate classifiers, based at least in part on the multiple performance measurements.
A computer-implemented method for generating a classifier, comprising: receiving aggregated statistics objects, wherein the aggregated statistics objects comprise bin frequencies F0 and F1, wherein the bin frequencies F0 and F1 are calculated for each of a plurality of predictive features, conditioned on a target value being 0 or 1, respectively; bin-level covariances C0 and C1, wherein the bin-level covariances C0 and C1 are calculated for each pair of bins, conditioned on the target value being 0 or 1, respectively; feeding the aggregate statistics objects F0, F1, C0, and C1 into the classifier, wherein the classifier generates a score calculated as a sum of a plurality of flexible nonlinear shape functions applied to the plurality of predictive features, respectively; and training the classifier by fitting the plurality of shape functions to maximize a divergence for score separation between target value 0 and target value 1.
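For one predictive feature, the bin frequencies F0 and F1 could be computed along these lines. The sketch assumes the feature has already been binned into integer bin indices and that the frequencies are relative frequencies within each target class; the abstract does not say whether counts or proportions are used. The covariances C0 and C1 and the shape-function fitting are not shown, and `bin_frequencies` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Sequence


def bin_frequencies(
    bins: Sequence[int], targets: Sequence[int], n_bins: int
) -> Dict[int, List[float]]:
    """Return {0: F0, 1: F1}: per-bin frequencies conditioned on the target."""
    freqs: Dict[int, List[float]] = {}
    for t in (0, 1):
        # Count bin occurrences among records whose target equals t.
        counts = Counter(b for b, y in zip(bins, targets) if y == t)
        total = sum(counts.values())
        freqs[t] = [counts[b] / total if total else 0.0 for b in range(n_bins)]
    return freqs


# Six records of one feature: bin index per record and the binary target.
feature_bins = [0, 0, 1, 2, 2, 2]
target = [0, 0, 0, 1, 1, 0]
stats = bin_frequencies(feature_bins, target, n_bins=3)
f0, f1 = stats[0], stats[1]
```

Only these aggregates, not the raw records, would then be handed to the training step.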
A system and method for analyzing coverage, bias and model explanations in large dimensional modeling data includes discretizing three or more variables of a dataset to generate a discretized phase space represented as a grid of a plurality of cells, the dataset comprising a plurality of records, each record of the plurality of records having a value and a unique identifier (ID). A grid transformation is applied to each record in the dataset to assign each record to a cell of the plurality of cells of the grid according to the grid transformation. A grid index is generated to reference each cell using a discretized feature vector. A grid storage for storing the records assigned to each cell of the grid is then created. The grid storage uses the ID of each record as a reference to each record and the discretized feature vector as a key to each cell.
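A small sketch of such a grid follows, assuming fixed bin edges per variable as the grid transformation. The helper names `discretize` and `build_grid` and the edge values are hypothetical; the storage is a plain dictionary keyed by the discretized feature vector and holding record IDs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]  # discretized feature vector, used as the cell key


def discretize(value: float, edges: Sequence[float]) -> int:
    """Return the bin index of `value`: the number of edges at or below it."""
    return sum(1 for e in edges if value >= e)


def build_grid(
    records: Dict[str, Sequence[float]], edges: List[Sequence[float]]
) -> Dict[Cell, List[str]]:
    """Assign each record ID to the grid cell given by its binned variables."""
    storage: Dict[Cell, List[str]] = defaultdict(list)
    for record_id, values in records.items():
        cell = tuple(discretize(v, e) for v, e in zip(values, edges))
        storage[cell].append(record_id)  # store the ID as the record reference
    return dict(storage)


# Three variables per record; one list of bin edges per variable.
records = {
    "r1": (0.2, 5.0, 100.0),
    "r2": (0.3, 6.0, 120.0),
    "r3": (0.9, 1.0, 900.0),
}
edges = [(0.5,), (2.0, 8.0), (500.0,)]
grid = build_grid(records, edges)
```

Cells that never appear as keys are unpopulated regions of the phase space, which is one simple way to inspect coverage.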
A method is provided for implementing a pairwise interaction detection tool. The method includes binning input samples in a first dimension associated with a first predictor of an outcome based at least on a sample minimum, binning the input samples in a second dimension associated with a second predictor of the outcome, determining a two-dimensional risk pattern based at least on a first one-dimensional risk pattern associated with the first predictor along the first dimension and a second one-dimensional risk pattern associated with the second predictor along the second dimension, comparing a first divergence of a first machine learning model to a second divergence of a second machine learning model, and predicting a strength of an interaction effect between the first predictor and the second predictor based on the comparison. Related methods and articles of manufacture are also disclosed.
A method is provided for multivariate counterfactual diffusion in desensitizing behavior latent features learned on limited data. The method includes generating a plurality of synthetic vectors for each input vector of a plurality of input vectors used to train a first machine learning model, where the plurality of synthetic vectors represent potential counterfactuals associated with the corresponding input vector. The method also includes filtering the plurality of synthetic vectors to identify counterfactual synthetic vectors. The method further includes predicting, by a second machine learning model trained based on the plurality of input vectors and the filtered plurality of counterfactual synthetic vectors, a classification of at least one input vector of the plurality of input vectors. Related methods and articles of manufacture are also disclosed.
A method includes determining, by a trained machine learning model, a score based at least on one or more latent features. The method also includes monitoring the determining of the score by the trained machine learning model. The monitoring includes determining one or more production statistics associated with the one or more latent features, derived variables and input data elements, and accessing one or more reference assets persisted on a model governance blockchain. The one or more reference assets includes one or more reference statistics and a threshold indicating a deviation between the one or more production statistics and the one or more reference statistics. The method also includes generating an alert based on the one or more production statistics associated with the one or more latent features meeting the threshold. Related methods and articles of manufacture are also disclosed.
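The monitoring check can be illustrated roughly as follows. The reference assets are represented by a plain dictionary standing in for whatever is persisted on the model governance blockchain, the production statistic is assumed to be a simple mean, and `check_drift` is a hypothetical name; none of these details come from the abstract.

```python
from statistics import mean
from typing import Dict, List, Sequence


def check_drift(
    production: Dict[str, Sequence[float]],
    reference: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Return names of latent features whose mean deviates by >= threshold."""
    alerts: List[str] = []
    for name, values in production.items():
        # Production statistic (assumed: mean) compared with the reference one.
        deviation = abs(mean(values) - reference[name])
        if deviation >= threshold:
            alerts.append(name)
    return alerts


# Hypothetical reference statistics, as if read from the governance ledger.
reference_stats = {"latent_1": 0.5, "latent_2": 1.0}
production_values = {"latent_1": [0.5, 0.5, 0.5], "latent_2": [2.0, 4.0]}
alerts = check_drift(production_values, reference_stats, threshold=1.0)
```

Only `latent_2` drifts far enough from its reference value to raise an alert here.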
Explanatory dropout systems and methods for improving a computer implemented machine learning model are provided using on-manifold/on-distribution evaluation of dropout of key features to explain model outputs. The machine learning model is trained using a plurality of input examples, including input records with explicit dropout operators applied effectuating the removal of influence of features associated with an explanation reason class. One or more dropout operators may be stochastically applied to one or more input examples. The procedure includes on-manifold/on-distribution evaluation of the machine learning model under conditions of absence or presence of the one or more dropout operators for reliable calculation of numerical statistics associated with reason classes to yield model explanations. The training and evaluation procedures present advantages over traditional off-manifold or off-distribution perturbative explanation procedures.
A method includes generating a plurality of binary feature maps containing a set of feature map values including a first binary value and/or a second binary value, by at least converting each input value of a set of input values of a plurality of input feature vectors to the first binary value when the corresponding input value is the zero value or the second binary value when the corresponding input value is the non-zero value. The method includes segmenting the plurality of binary feature maps into a plurality of segments representing behavior profiles. Each segment includes at least one subsegment in which the set of feature map values is the same for all binary feature maps in the at least one subsegment. The method includes predicting, based on a segment of the plurality of segments, a specific outcome. Related methods and articles of manufacture are also disclosed.
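The conversion and grouping steps might be sketched like this. The sketch assumes 0 and 1 as the two binary values and groups records whose binary feature maps are identical, which corresponds to the subsegment notion above; how subsegments are merged into larger segments and how outcomes are predicted are not shown. The names `to_binary_map` and `segment` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BinaryMap = Tuple[int, ...]


def to_binary_map(vector: Sequence[float]) -> BinaryMap:
    """Map each input value to 0 if it is zero, otherwise to 1."""
    return tuple(0 if v == 0 else 1 for v in vector)


def segment(vectors: Dict[str, Sequence[float]]) -> Dict[BinaryMap, List[str]]:
    """Group record keys whose binary feature maps are identical."""
    groups: Dict[BinaryMap, List[str]] = defaultdict(list)
    for key, vector in vectors.items():
        groups[to_binary_map(vector)].append(key)
    return dict(groups)


# Hypothetical input feature vectors keyed by account.
inputs = {
    "a": [0, 3.5, 0],
    "b": [0, 1.0, 0],
    "c": [2, 0, 0],
}
profiles = segment(inputs)
```

Accounts `a` and `b` share the same activity pattern and fall into one group, while `c` forms its own.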
G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
G06V 10/771 - Feature selection, e.g. selecting representative features from a multi-dimensional feature space
G06V 40/20 - Movements or behaviour, e.g. gesture recognition
Computer-implemented methods, systems and products for analytics and discovery of patterns or signals. The method includes a set of operations or steps, including collecting data from a plurality of data sources, the data having a plurality of associated data types, and filtering the collected data based on identifying viable data sources from which the data is collected. The method further includes prioritizing discovery objectives based on analyzing the filtering results, and enriching the filtered collected data from viable data sources according to the prioritized discovery objectives. The method further includes extracting one or more signals from the enriched data using one or more machine learning mechanisms in combination with qualified subject matter expertise input, and graphically displaying the extracted signals in a meaningful way to a human operator such that the human operator is enabled to understand importance of extracted signals.
A method is provided for a first to saturate single modal latent feature activation network. The method includes training, based on a plurality of training examples including a plurality of input features, a first machine learning model including a hidden node. The method includes determining a plurality of subsets of the plurality of input features including a minimum combination of the plurality of input features first to cause saturation of the hidden node. The method includes determining a hidden node ordered saturation list including a subset of the plurality of subsets. The method includes generating a sparsely trained machine learning model to determine an output for a training example of the plurality of training examples based on at least one input feature of the subset included in the hidden node ordered saturation list corresponding to the hidden node. Related methods and articles of manufacture are also disclosed.
A method may include generating synthetic data based on input data and training a machine learning model based on the synthetic data. The synthetic data may be generated by determining a plurality of data points representing an archetype probability distribution of a plurality of archetypes, clustering the plurality of data points into one or more clusters associated with transactional behavior patterns, generating a threshold metric representing a peak distribution density of the plurality of data points associated with a corresponding cluster, removing, from the plurality of data points, one or more non-representative data points to define a reduced set of the plurality of data points, generating an updated archetype probability distribution based at least on the reduced set of the plurality of data points, and generating representative transaction data based on the updated archetype probability distribution and threshold metric. Related methods and articles of manufacture are also disclosed.
A diagnostic system for model governance is presented. The diagnostic system includes an auto-encoder to monitor model suitability for both supervised and unsupervised models. When applied to unsupervised models, the diagnostic system can provide a reliable indication on model degradation and recommendation on model rebuild. When applied to supervised models, the diagnostic system can determine the most appropriate model for the client based on a reconstruction error of a trained auto-encoder for each associated model. An auto-encoder can determine outliers among subpopulations of consumers, as well as support model go-live inspections.
i, and constructing at least one data-driven estimator based on an explanatory statistic, the estimator being represented in a computationally efficient form and packaged with the machine learning model and utilized to provide a definition of explainability for a score generated by the machine learning model.
Systems and methods for generating concise explanations of scored observations that strike good, and computationally efficient, trade-offs between rank-ordering performance and explainability of scored observations are disclosed. The systems and methods described herein for explaining scored observations are based on a framework of partial dependence functions (PDFs), multi-layered neural networks (MNNs), and Latent Explanations Neural Network Scoring (LENNS).
09 - Scientific and electric apparatus and instruments
36 - Financial services, insurance and real estate affairs
41 - Education, entertainment, sporting and cultural activities
Goods and services
Downloadable documents in the field of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit provided via a website
Financial counseling services in the nature of financial consulting; Credit counseling services; Financial credit scoring services; Providing information about financial credit and credit scores; Providing a website featuring information in the fields of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit; Providing educational financial information in the fields of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit
Education services, namely, providing in person and online classes, seminars, and workshops in the fields of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit; Providing on-line non-downloadable educational articles in the field of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit; Providing online non-downloadable electronic publications in the nature of educational books and brochures in the field of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit; Educational services, namely, providing web-based online educational computer games in the fields of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit; Providing education curriculum in the nature of educational services, namely, developing curriculum for others in the field of financial literacy, financial goals, credit scores, credit decisions, credit management, and building credit
32.
Overly optimistic data patterns and learned adversarial latent features
Systems for improving security of a computer-implemented artificial intelligence by monitoring one or more transactions received by the machine learning decision model; receiving a first score generated by the machine learning decision model in association with a first transaction; identifying the first transaction as belonging to a first class, in response to the first score being lower than a certain score threshold and the first transaction having a low occurrence likelihood; receiving a second score in association with the first transaction based on one or more adversarial latent features associated with the first transaction as detectable by an adversary detection model; and determining at least one adversarial latent transaction feature being exploited by the first transaction, in response to determining that the second score falls above the certain score threshold.
Computer-implemented systems, methods and products for modeling sensitivities to potential disruptions by observing performances of entities in a first sub-population and a second sub-population using a machine learning model comprising a set of predictors and a binary indicator variable associated with a first entity subjected to a first event associated with the first sub-population, the machine learning model trained to predict an expected performance for the first entity based on at least one of a known attribute associated with the first entity in relation to the first event and a value of the binary indicator variable associated with the first event.
G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
G06N 20/20 - Ensemble techniques in machine learning
G06Q 10/0635 - Risk analysis of enterprise or organisation activities
G06Q 10/0637 - Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
G06Q 30/02 - Marketing; Price estimation or determination; Fundraising
G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
G16H 50/30 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
34.
METHOD AND SYSTEM FOR PREDICTING ADHERENCE TO A TREATMENT
Data characterizing an individual is received. Thereafter, one or more variables are extracted from the data so that, using a predictive model populated with the extracted variables, a likelihood of the individual adhering to a treatment regimen can be determined. The predictive model is trained on historical treatment regimen adherence data empirically derived from a plurality of subjects. Subsequently, data characterizing the determined likelihood of adherence can be promoted.
G16H 50/50 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
G16H 20/10 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
G16H 40/63 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
G16H 40/67 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
G16H 70/20 - ICT specially adapted for the handling or processing of medical references relating to practices or guidelines
G16H 10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
G06Q 50/22 - Social work or social welfare, e.g. community support activities or counselling services
Software validation systems, products, and methods for determining a plurality of test scenarios for a software code under test. The test scenarios may be defined based on at least one of values assigned to one or more variables declared in the software code, relationships defined between the one or more variables, and execution paths leading to one or more outcomes based on the values and the relationships, in response to the software code being executed. At least two or more test scenarios, from among the plurality of test scenarios, are consolidated into a first test scenario based on values defined in a modifiable script.
A method, a system, and a computer program product for calibrating synthetic data. A synthetic data is generated based on one or more source data using one or more generative models. The generative models are used to generate a latent space based on one or more source data. One or more latent space vectors associated with the generated latent space are determined in accordance with one or more data profiles associated with the one or more source data. The latent space vectors associated with the generated latent space are sampled. Based on the sampling, an optimized synthetic data is generated by comparing the sampled latent space vectors with one or more baseline data associated with one or more data profiles.
A method, a system, and a computer program product for generating an interpretable set of features. One or more search parameters and one or more constraints on one or more search parameters for searching data received from one or more data sources are defined. The data received from one or more data sources is searched using the defined search parameters and constraints. One or more first features are extracted from the searched data. The first features are associated with one or more predictive score values. The searching is repeated in response to receiving a feedback data responsive to the extracted first features. One or more second features resulting from the repeated searching are generated.
A method, a system, and a computer program product for generating a refined synthetic data from one or more sources of data. One or more source data are received from one or more data sources. One or more encoded source data are generated from the one or more source data. A synthetic data is generated by decoding one or more encoded source data. One or more variables in the synthetic data are selected and one or more predetermined identifiability values and one or more predetermined anonymity values are associated with them. The generated synthetic data including the selected variables is decoded using associated one or more predetermined identifiability values and one or more predetermined anonymity values. The decoded synthetic data is outputted.
A method, a system, and a computer program product for detecting a diverse set of rare behavior. A time-series data representing one or more actions executed by an entity is received from a plurality of time-series data sources and is processed. A data structure corresponding to the entity, identifying the entity, and including one or more representations of processed time-series data identifying the actions is generated. A current action executed by the entity is detected. Current time-series data corresponding to the current action is received and associated with the data structure. First features are extracted from the generated data structure based on current time-series data and compared to second features extracted for at least another entity to determine difference parameters between first and second features. One or more models are trained using difference parameters, and a score for each action executed by the entity is determined. An action is identified based on the determined scores and the training of the models is updated in response to receiving a feedback data to the identified action, and at least another action is identified. A consistency score is generated for the feedback data.
Systems, machines, methods and products for generating a configured software solution using one or more configuration packages. A decision service may be configured to generate decision data based on a configuration package comprising user-generated input, a collection of configurations, and a decision flow template. The user-generated input may be used for selecting an artifact from an artifact library in a configuration database. The collection of configurations may be infused, dynamically, into the decision flow template. The decision flow template may be exposed for user modification. The decision flow template may be integrated into the configuration package in association with at least one configurable decision element and a user configuration selected from the collection of configurations for specifying one or more parameters in the artifact. The artifact and the user configuration may be combined with the decision flow template to generate the configured software solution.
Systems, methods and products for quantitative translation of design requirements into a machine learning framework for training a classification model. A plurality of auxiliary tasks associated with a plurality of auxiliary task models are specified. The plurality of auxiliary task models are concurrently trained on the auxiliary tasks to generate one or more latent features learned by the plurality of auxiliary task models. The one or more latent features may be transferred from the plurality of auxiliary task models to augment a latent feature space of a target task for the classification model. Contribution levels of the transferred one or more latent features are adjusted based on design requirements for the target task for the classification model. First and second contribution levels are specified for respective first and second sets of auxiliary task latent features being quantified and enforced.
Computer-implemented methods and systems to improve training and performance of artificial intelligence (AI) systems having one or more machine learning models stored in one or more data storage mediums connected in at least one computing network are provided. The method comprises receiving student machine scores, generated by a student machine learning model stored in a data storage medium, the student machine learning model having a primary loss function; receiving teacher scores provided by one or more analytic resources, the teacher scores being provided based on known results and behavior of pre-existing machine learning models used for accomplishing a first series of classification objectives; and transforming the teacher scores into transformed teacher scores.
Computer-implemented machines, systems and methods for managing missing values in a dataset for a machine learning model. The method may comprise importing a dataset with missing values; computing data statistics and identifying the missing values; verifying the missing values; updating the missing values; imputing missing values; encoding reasons for why values are missing; combining imputed missing values and the encoded reasons; and recommending models and hyperparameters to handle special or missing values.
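The imputing, reason-encoding, and combining steps named in this abstract can be illustrated with a minimal sketch. It assumes mean imputation and a caller-supplied mapping from special raw values to reason codes; the function name `impute_with_reasons`, the column names, and the reason labels are hypothetical and do not come from the patent.

```python
from statistics import mean


def impute_with_reasons(rows, column, sentinel_reasons):
    """Impute missing values in `column` and record why each value was missing.

    rows: list of dicts, one per record.
    sentinel_reasons: maps special raw values (e.g. None, -999) to a reason code.
    Mean imputation is an assumption made for this illustration only.
    """
    # Compute the fill value from the values that are actually observed.
    observed = [r[column] for r in rows if r[column] not in sentinel_reasons]
    fill = mean(observed) if observed else 0.0

    out = []
    for r in rows:
        raw = r[column]
        missing = raw in sentinel_reasons
        new = dict(r)  # copy so the input rows are left untouched
        new[column] = fill if missing else raw
        # Combine the imputed value with an encoded reason for missingness.
        new[column + "_missing_reason"] = sentinel_reasons[raw] if missing else "observed"
        out.append(new)
    return out


rows = [{"income": 10.0}, {"income": None}, {"income": -999}, {"income": 30.0}]
reasons = {None: "not_reported", -999: "not_applicable"}
cleaned = impute_with_reasons(rows, "income", reasons)
```

Keeping the reason as a separate field lets a downstream model distinguish a value that was never reported from one that does not apply, even though both receive the same imputed number.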
G06F 17/18 - Complex mathematical operations for evaluating statistical data
G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
Systems, methods, and products for detection of selective omissions in an open data sharing computing platform comprise monitoring a plurality of events associated with a first digital record stored in a database of digital records, the first digital record uniquely identifying a first entity; associating a first detected event with a first set of words at least partially descriptive of the first detected event; and associating a second detected event with a second set of words at least partially descriptive of the second detected event, the first event and the second event being detected in response to digital records associated with the first event and the second event being shared over an open data sharing computing platform with express authorization provided by the first entity.
In one aspect, a computer implemented method for translating and executing rules using a directed acyclic graph is provided. The method includes transforming a ruleset into a directed acyclic graph. The directed acyclic graph includes a plurality of nodes and a plurality of branches. The method further includes identifying similarities across the plurality of branches. The method further includes grouping branches of the directed acyclic graph based on the identified similarities. The method further includes creating a modified directed acyclic graph based on the grouping. The method further includes selecting and using a method of processing a group of the modified directed acyclic graph based on an aspect of the group.
G06N 7/08 - Computing arrangements based on specific mathematical models using chaos models or non-linear system models
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
Computer-implemented machines, systems and methods for providing insights about uncertainty of a machine learning model. A method includes determining an uncertainty value associated with a first machine learning model output of a first machine learning model. The method further includes generating a confidence interval for the first machine learning model output associated with an input. The method further includes switching, responsive to the uncertainty value satisfying a threshold, from the first machine learning model to a second machine learning model, the second machine learning model generating a second machine learning model output. The method further includes generating the second machine learning model. The method further includes providing, responsive to the switching, the machine learning output, the uncertainty value, the confidence interval, and the second machine learning output to a user interface.
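The threshold-driven switch between models described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: "satisfying a threshold" is read as the uncertainty being greater than or equal to the threshold, and `predict_with_fallback`, `toy_primary`, and `toy_fallback` are hypothetical names.

```python
def predict_with_fallback(x, primary, fallback, threshold):
    """Score x with the primary model and consult the fallback when uncertain.

    primary(x) must return (output, uncertainty, (low, high)).
    fallback(x) must return an output.
    Assumption: the threshold is satisfied when uncertainty >= threshold.
    """
    output, uncertainty, interval = primary(x)
    result = {
        "output": output,
        "uncertainty": uncertainty,
        "interval": interval,
        "fallback_output": None,
    }
    if uncertainty >= threshold:
        # Switch to the second model and report both outputs side by side.
        result["fallback_output"] = fallback(x)
    return result


def toy_primary(x):
    """Hypothetical first model whose uncertainty grows with |x|."""
    output = 2.0 * x
    uncertainty = abs(x) / 10.0
    return output, uncertainty, (output - uncertainty, output + uncertainty)


def toy_fallback(x):
    """Hypothetical second model used when the first is too uncertain."""
    return float(x)


confident = predict_with_fallback(1.0, toy_primary, toy_fallback, threshold=0.5)
uncertain = predict_with_fallback(8.0, toy_primary, toy_fallback, threshold=0.5)
```

Returning every quantity in one dictionary mirrors the abstract's final step, in which the outputs, the uncertainty value, and the confidence interval are all surfaced to a user interface.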
To eliminate bias from artificial intelligence (AI) systems, a list of class identifiers and features derived from class identifiers represented in training data fed to an AI system is identified for the purpose of training a predictive model. Correlation analysis of input features is conducted from a list of raw variables, r, in a dataset and a plurality of derived features, x, with one or more class identifiers in the list of class identifiers and features derived from these class identifiers. A first list of input features is identified, one or more input features in the first list belonging to and correlated with the one or more class identifiers or features derived from class identifiers. A second list of sets of input features is created to identify a set of combinations of input features that are not allowed to interact based on identifying biased latent features.
In one aspect there is provided a method. The method may include collecting one or more functions that implement the decision logic of a solution. A snapshot of the one or more functions can be generated. The snapshot can include executable code associated with the one or more functions. The solution can be deployed by at least storing the snapshot of the one or more functions to a repository. Systems and articles of manufacture, including computer program products, are also provided.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/901 - Indexing; Data structures therefor; Storage structures
G06F 40/16 - Automatic learning of transformation rules, e.g. from examples
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
G06N 5/02 - Knowledge representation; Symbolic representation
G06N 5/04 - Inference or reasoning models
49.
TEMPORAL EXPLANATIONS OF MACHINE LEARNING MODEL OUTCOMES
In transactional systems where past transactions can have an impact on the current score of a machine learning based decision model, the transactions that are most responsible for the score and the associated reasons are determined by the transactional system. A system and method identify the past transactions that maximally impact the current score, allowing a more effective understanding of the scores generated by a model in a transactional system and an explanation of specific transactions for automated decisioning in terms of past transactions. Further, an existing instance-based explanation system is used to identify the reasons for the score and how the identified transactions influence these reasons. A combination of impact on score and impact on reasons determines the most impactful past transaction with respect to the most recent score being explained.
42 - Scientific, technological and industrial services, research and design
Goods and services
Providing temporary use of online non-downloadable software and applications to enable and assist with compliance with regulations against financial crimes; Providing temporary use of online non-downloadable software and applications using artificial intelligence and machine learning for predictive analytics, decision modeling and optimization of business decisions and processes, customer management, and governance and compliance, including regulatory oversight; Providing temporary use of online non-downloadable software and applications that enable complex, large-scale optimizations involving dozens of networked action-effect models, and enables exploration and simulation of many optimized scenarios; Providing temporary use of online non-downloadable software and applications incorporating proprietary mathematical modeling and programming language, an easy-to-use development environment, rapid application development, and a state-of-the-art set of optimization algorithms for use in creating and implementing business decision processes and management; Technical support services, namely, remote and on-site infrastructure management services for monitoring, administration and management of cloud computing IT and application systems; Providing temporary use of on-line non-downloadable cloud computing software for use in business decision processes and management; Consulting in the field of implementation and configuration management for computer software; Design, development and implementation of software; Providing temporary use of online non-downloadable software and applications for user authentication and identity verification; Providing temporary use of online non-downloadable software and applications for identity resolution and social network analysis, namely, software that enables users to engage in real-time searching across their enterprise data to find, match, and link similar entities and uncover hidden relationships between people, places, and things; 
Providing temporary use of online non-downloadable software and applications to identify medical patients at risk for non-compliance with recommended medical prescriptions and instructions; Providing temporary use of online non-downloadable software and applications for measuring driver risk and safety based on driving behaviors
51.
Method and apparatus for analyzing coverage, bias, and model explanations in large dimensional modeling data
A system and method for analyzing coverage, bias and model explanations in large dimensional modeling data includes discretizing three or more variables of a dataset to generate a discretized phase space represented as a grid of a plurality of cells, the dataset comprising a plurality of records, each record of the plurality of records having a value and a unique identifier (ID). A grid transformation is applied to each record in the dataset to assign each record to a cell of the plurality of cells of the grid according to the grid transformation. A grid index is generated to reference each cell using a discretized feature vector. A grid storage for storing the records assigned to each cell of the grid is then created. The grid storage uses the ID of each record as a reference to each record and the discretized feature vector as a key to each cell.
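The grid transformation, grid index, and grid storage described in this abstract can be sketched with a dictionary keyed by the discretized feature vector. The sketch assumes equal-width bins per variable; `cell_of`, `build_grid`, and the sample records are hypothetical and serve only to illustrate the data structure.

```python
from collections import defaultdict


def cell_of(values, lows, widths):
    """Map a record's raw values to its discretized feature vector.

    Assumption: each variable is cut into equal-width bins starting at `lows`.
    """
    return tuple(int((v - lo) // w) for v, lo, w in zip(values, lows, widths))


def build_grid(records, lows, widths):
    """Build the grid storage: cell key -> list of record IDs.

    records: iterable of (record_id, (v1, v2, v3, ...)) pairs.
    Only occupied cells are stored, so the structure stays sparse.
    """
    grid = defaultdict(list)
    for rec_id, values in records:
        grid[cell_of(values, lows, widths)].append(rec_id)
    return dict(grid)


records = [
    ("a", (0.5, 1.5, 2.5)),
    ("b", (0.7, 1.2, 2.9)),
    ("c", (3.1, 0.0, 0.2)),
]
grid = build_grid(records, lows=(0.0, 0.0, 0.0), widths=(1.0, 1.0, 1.0))
# grid == {(0, 1, 2): ["a", "b"], (3, 0, 0): ["c"]}
```

Storing only record IDs per cell keeps the index small, and the number of occupied keys gives a direct count of how much of the phase space the data covers.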
An automated system for detecting risky entity behavior using an efficient frequent behavior-sorted list is disclosed. From these lists, fingerprints and distance measures can be constructed to enable comparison to known risky entities. The lists also facilitate efficient linking of entities to each other, such that risk information propagates through entity associations. These behavior sorted lists, in combination with other profiling techniques, which efficiently summarize information about the entity within a data store, can be used to create threat scores. These threat scores may be applied within the context of anti-money laundering (AML) and retail banking fraud detection systems. A particular instantiation of these scores elaborated here is the AML Threat Score, which is trained to identify behavior for a banking customer that is suspicious and indicates high likelihood of money laundering activity.
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
G06Q 50/26 - Government or public services
A sensitivity index model for predicting the sensitivity of an entity to a potential future disruption can be trained using a process that includes dividing a population of entities for which data attributes are available into matched pairs in a first sub-population and a second sub-population based on matching propensity scores for the entities using supervised machine learning, modeling outcomes for the two sub-populations, using the resultant models to calculate expected performances of the entities under differing conditions, and generating the sensitivity index model using supervised learning techniques based on quantification of differences between the calculated expected performances for the entities.
G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
G06Q 30/02 - Marketing; Price estimation or determination; Fundraising
G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
G06N 20/20 - Ensemble learning
G16H 50/30 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for individual health risk assessment
G06Q 10/0637 - Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
G06Q 10/0635 - Risk analysis of enterprise or organisation activities
Computer-implemented methods, systems and products for analytics and discovery of patterns or signals. The method includes a set of operations or steps, including collecting data from a plurality of data sources, the data having a plurality of associated data types, and filtering the collected data based on identifying viable data sources from which the data is collected. The method further includes prioritizing discovery objectives based on analyzing the filtering results, and enriching the filtered collected data from viable data sources according to the prioritized discovery objectives. The method further includes extracting one or more signals from the enriched data using one or more machine learning mechanisms in combination with qualified subject matter expertise input, and graphically displaying the extracted signals in a meaningful way to a human operator such that the human operator is enabled to understand importance of extracted signals.
Generating optimal strategies for providing offers to a plurality of customers is described. A plurality of categorical attributes (for example, gender and residential status) and ordinal attributes (for example, risk score and credit line utilization) can be determined. Values of one or more categorical attributes can be changed as per a transition probability table. Some probabilities can be varied to determine a first tradeoff, based on which a first updated strategy can be generated. Further, noise can be added to one or more ordinal attributes. Standard deviation of a noise distribution associated with the noise can be varied so as to determine a second tradeoff, based on which a second updated strategy can be generated. The second updated strategy can be an update of the first updated strategy. Offers can be provided to the plurality of customers in accordance with the second updated strategy.
Computer-implemented machines, systems and methods for providing insights about misalignment in a latent space of a machine learning model. A method includes initializing a second weight matrix of a second artificial neural network based on a first weight matrix from a first artificial neural network. The method further includes applying transfer learning between the first artificial neural network and the second artificial neural network. The method further includes comparing the first latent space with the second latent space. The method further includes determining, responsive to the comparing, a first score indicating alignment of the first latent space and the second latent space. The method further includes determining, responsive to the first score satisfying a threshold, an appropriateness of the machine learning model.
Systems, methods and computer program products for improving security of artificial intelligence systems. The system comprises processors for monitoring one or more transactions received by a machine learning decision model to determine a first score associated with a first transaction. The first transaction may be identified as likely adversarial, in response to the first score being lower than a certain score threshold and the first transaction having a low occurrence likelihood. A second score may be generated in association with the first transaction based on one or more adversarial latent features associated with the first transaction. At least one adversarial latent feature may be detected as being exploited by the first transaction, in response to determining that the second score falls above the certain score threshold. Accordingly, an abnormal volume of activations of adversarial latent features spanning across a plurality of transactions scored may be detected and blocked.
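The two perturbation steps in the offer-strategy abstract (resampling categorical attributes from a transition probability table, then adding noise to ordinal attributes) can be sketched as below. Zero-mean Gaussian noise is an assumption, since the abstract only mentions a noise distribution with a standard deviation; `perturb_customer` and the attribute names are hypothetical.

```python
import random


def perturb_customer(customer, transitions, noise_sd, rng):
    """Return a perturbed copy of one customer record.

    transitions: {attr: {current_value: {new_value: probability}}}
    noise_sd:    {attr: standard deviation of the added noise}
    rng:         a random.Random instance, passed in for reproducibility.
    Assumption: the noise is zero-mean Gaussian.
    """
    out = dict(customer)
    # Step 1: resample each categorical attribute from its transition row.
    for attr, table in transitions.items():
        row = table[customer[attr]]
        values = list(row)
        out[attr] = rng.choices(values, weights=[row[v] for v in values])[0]
    # Step 2: add noise to each ordinal attribute.
    for attr, sd in noise_sd.items():
        out[attr] = customer[attr] + rng.gauss(0.0, sd)
    return out


rng = random.Random(0)
customer = {"residential_status": "renter", "risk_score": 640.0}
transitions = {
    "residential_status": {
        "renter": {"renter": 0.9, "owner": 0.1},
        "owner": {"renter": 0.1, "owner": 0.9},
    }
}
perturbed = perturb_customer(customer, transitions, {"risk_score": 5.0}, rng)
```

Varying the off-diagonal transition probabilities or the standard deviation and re-evaluating the strategy on the perturbed records is one way to trace out the tradeoffs the abstract refers to.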
42 - Scientific, technological and industrial services, research and design
Goods and services
Providing temporary use of online non-downloadable software and applications using artificial intelligence and machine learning for predictive analytics, decision modeling and optimization of business decisions and processes, customer management, and governance and compliance, including regulatory oversight; Providing temporary use of online non-downloadable software and applications that enable complex, large-scale optimizations involving dozens of networked action-effect models, and enables exploration and simulation of many optimized scenarios; Providing temporary use of online non-downloadable software and applications incorporating proprietary mathematical modeling and programming language, an easy-to-use development environment, rapid application development, and a state-of-the-art set of optimization algorithms for use in creating and implementing business decision processes and management; Technical support services, namely, remote and on-site infrastructure management services for monitoring, administration and management of cloud computing IT and application systems; Providing temporary use of on-line non-downloadable cloud computing software for use in business decision processes and management; Consulting in the field of implementation and configuration management for computer software; Design, development and implementation of software; Providing temporary use of online non-downloadable software and applications for user authentication and identity verification; Providing temporary use of online non-downloadable software and applications for identity resolution and social network analysis, namely, software that enables users to engage in real-time searching across their enterprise data to find, match, and link similar entities and uncover hidden relationships between people, places, and things; Providing temporary use of online non-downloadable software and applications to identify medical patients at risk for non-compliance with recommended medical prescriptions and instructions; Providing temporary use of online non-downloadable software and applications for measuring driver risk and safety based on driving behaviors; providing temporary use of online non-downloadable software and applications to enable and assist with compliance with regulations against financial crimes.
42 - Scientific, technological and industrial services, research and design
Goods and services
(1) Providing temporary use of online non-downloadable software and applications using artificial intelligence and machine learning for predictive analytics, decision modeling and optimization of business decisions and processes, customer management, and governance and compliance, including regulatory oversight; Providing temporary use of online non-downloadable software and applications that enable complex, large-scale optimizations of business decisions and processes involving dozens of networked action-effect models, and enables exploration and simulation of many optimized scenarios for business decisions and processes; Providing temporary use of online non-downloadable software and applications incorporating proprietary mathematical modeling and programming language, an easy-to-use development environment, rapid application development, and a state-of-the-art set of optimization algorithms for use in creating and implementing business decision processes and management; Technical support services, namely, remote and on-site infrastructure management services for monitoring, administration and management of cloud computing IT and application systems; Providing temporary use of on-line non-downloadable cloud computing software for use in business decision processes and management; Consulting in the field of implementation and configuration management for computer software; Design, development and implementation of software; Providing temporary use of online non-downloadable software and applications for user authentication and identity verification; Providing temporary use of online non-downloadable software and applications for identity resolution and social network analysis, namely, software that enables users to engage in real-time searching across their enterprise data to find, match, and link similar entities and uncover hidden relationships between people, places, and things; Providing temporary use of online non-downloadable software and applications to identify medical patients at risk for non-compliance with recommended medical prescriptions and instructions; Providing temporary use of online non-downloadable software and applications for measuring driver risk and safety based on driving behaviors; providing temporary use of online non-downloadable software and applications to enable and assist with compliance with regulations against financial crimes
Systems, machines, methods and products for generating a configured software solution using one or more configuration packages. A decision service may be configured to generate decision data based on a configuration package comprising user-generated input, a collection of configurations, and a decision flow template. The user-generated input may be used for selecting an artifact from an artifact library in a configuration database. The collection of configurations may be infused, dynamically, into the decision flow template. The decision flow template may be exposed for user modification. The decision flow template may be integrated into the configuration package in association with at least one configurable decision element and a user configuration selected from the collection of configurations for specifying one or more parameters in the artifact. The artifact and the user configuration may be combined with the decision flow template to generate the configured software solution. Input may be received for the at least one configurable decision element.
N possible equisized hypercells to estimate a fundamental dimensionality for the dataset; generating one or more samples by assigning a record in the dataset with numbers j through k as set id; generating a merged sample Si, for one or more values of the set id i, where i goes from j to k; and computing a fractal dimension of the equisized hypercube phase space based on count of cells with data coverage of at least one data point.
A predictive analytics system and method in the setting of multi-class classification are disclosed, for identifying systematic changes in an evaluation dataset processed by a fraud-detection model by examining the time series histories of an ensemble of entities such as accounts. The ensemble of entities is examined and processed both individually and in aggregate, via a set of features determined previously using a distinct training dataset. The specific set of features in question may be calculated from the entity's time series history, and may or may not be used by the model to perform the classification. Certain properties of the detected changes are measured and used to improve the efficacy of the predictive model.
In one aspect, there is provided a system. The system may store instructions that result in operations when executed by the at least one data processor. The operations may include receiving raw transactional data, collating, and reading the raw transactional data from the plurality of data sources. The operations may further include randomly sampling the raw transactional data. The operations may further include transforming the raw transactional data into at least one resilient distributed dataset. The operations may further include mapping the at least one resilient distributed dataset with a corresponding unique key. The operations may further include aggregating the at least one resilient distributed dataset on a key field. The operations may further include iterating over a lookup table. The operations may further include aggregating the data lines corresponding to the unique key associated with the at least one resilient distributed dataset. The operations may further include appending in-memory data lines serially to form a consumer level data string.
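The keying and aggregation steps in the abstract above can be illustrated with a minimal sketch. It uses plain Python dictionaries as a stand-in for resilient distributed datasets, and the function name, field layout, and separators are assumptions made for illustration; none of them come from the source.

```python
from collections import defaultdict


def build_consumer_strings(data_lines, key_index=0, delimiter=",", joiner="|"):
    """Group raw data lines by a key field and append them serially per key.

    Hypothetical helper: the key position, delimiter, and joiner are
    illustrative assumptions, not details taken from the abstract.
    """
    grouped = defaultdict(list)
    for line in data_lines:
        key = line.split(delimiter)[key_index]  # map each line to its unique key
        grouped[key].append(line)               # aggregate lines on the key field
    # Append the in-memory lines serially to form one string per consumer.
    return {key: joiner.join(lines) for key, lines in grouped.items()}


raw = ["c1,2024-01-03,42.10", "c2,2024-01-03,9.99", "c1,2024-01-05,17.00"]
consumer_strings = build_consumer_strings(raw)
```

In a distributed setting the same grouping would typically be done by the cluster framework; the dictionary here only shows the shape of the result.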
G06F 16/25 - Integrating or interfacing systems involving database management systems
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06Q 30/02 - Marketing; Price estimation or determination; Fundraising
64.
Density based confidence measures of neural networks for reliable predictions
Computer-implemented systems and methods for selecting a first neural network model from a set of neural network models for a first dataset, the first neural network model having a set of predictor variables and a second dataset comprising a plurality of datapoints mapped into a multi-dimensional grid that defines one or more neighborhood data regions; applying the first neural network model on the first dataset to generate a model score for one or more datapoints in the second dataset, the model score representing an optimal fit of input predictor variables to a target variable for the set of variables of the first neural network model.
Systems and methods for training a machine learning model implemented over a network configured to represent the machine learning model are provided. One or more directed edges connect the one or more nodes of the network, an edge representing a connection between a first node and a second node, the second node computing an activation depending on the values of activations on first nodes and values associated with the connections, the connection being either conforming or non-conforming. The machine learning model may be trained by iteratively adjusting parameters w and b, respectively associated with weights and biases associated with edges connecting computational nodes. Connections between nodes may be sparsified by adjusting the parameter w to a first value for non-conforming connections during the training phase to reduce complexity of the connections among the plurality of nodes, or to ensure the input-output function of the network adheres to additional constraints.
A configuration package receives user-generated input that configures a decision service to generate decision data. The configuration package includes artifacts and the user-generated input selects the artifacts from an artifact library in a configuration database. A configured decision service is generated, where the generating includes receiving, by a decision service factory, the configuration package. Also, the decision service factory receives a decision template including configurable decision elements and non-configurable decision elements. Further, the decision service factory receives a user configuration specifying a parameter in the corresponding artifact. The artifact from the configuration package, the user configuration and the decision template are combined to generate the configured decision service. The configured decision service receives, from a client computer, input for each of the configurable decision elements. Based on the received input, the decision data is generated by the configured decision service. The generated decision data is transmitted to the client computer.
Computer-implemented systems and methods for efficiently searching large data volumes for one or more items with a definable degree of similarity. The systems and methods may include functionality directed to selecting at least one token from the one or more tokens in a target item, the token including an identifiable character string defining, fully or partially, at least one of a name, an address, an entity or other identifier associated with the target item; extracting a character from the identifiable character string after the character string is standardized to a known common version of the character string; responsive to a character distribution lookup, determining that the extracted character corresponds to a first shard from among a plurality of discrete shards; and grouping the item into the first shard, the character distribution lookup being adjustable over time to provide for a balanced distribution of items across the plurality of discrete shards.
G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
Systems and methods for generating concise explanations of scored observations that strike good, and computationally efficient, trade-offs between rank-ordering performance and explainability of scored observations are disclosed. The systems and methods described herein for explaining scored observations are based on a framework of partial dependence functions (PDFs), multi-layered neural networks (MNNs), and Latent Explanations Neural Network Scoring (LENNS).
A data object from a data source is received by a distributed process in a data stream. The distributed process has a sequence of categories, each category containing one or more tasks that operate on the data object. The data object includes files that can be processed by the tasks. If the task is able to operate on the data object, then the data object is passed to the task. If the task is unable to operate on the data object, then the files in the data object are passed to a file staging area of the distributed process and stored in memory. The files in the file staging area are passed, in sequence, from the file staging area to the task that was unable to operate on the data object. The data object is outputted to a next category or data sink after being operated on by the task.
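The routing rule in the abstract above can be sketched as follows. The dictionary layout for tasks and data objects, and the `accepts_object` flag, are hypothetical choices made for illustration; the source does not specify how a task signals that it cannot operate on a whole data object.

```python
from collections import deque


def run_category(tasks, data_object):
    """Run every task of one category, staging files for tasks that need them.

    Assumed shapes (not from the source): each task is a dict with an
    'accepts_object' flag and a 'run' callable; a data object is a dict
    with a 'files' list.
    """
    for task in tasks:
        if task["accepts_object"]:
            # The task can operate on the data object directly.
            task["run"](data_object)
        else:
            # Otherwise hold the files in an in-memory staging area and
            # pass them to the task one at a time, in sequence.
            staging_area = deque(data_object["files"])
            while staging_area:
                task["run"](staging_area.popleft())
    # The data object then moves on to the next category or the data sink.
    return data_object
```

A queue is used for the staging area so that files reach the task in their original order.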
Systems and methods for utilizing an image capture device to scan facial features of a user, responsive to recognition of a plurality of beam projection points on the face of the user. The first data captured from scanning the facial features may be authenticated against a facial depth map stored as a data structure in a data storage medium. In response to successful authentication, the facial features of the user may be continually scanned to detect facial movements indicative of the user's liveness. Access may be granted to the user, in response to verifying the user's liveness.
G06V 40/40 - Spoof detection, e.g. liveness detection
G06V 40/60 - Static or dynamic means for assisting the user to position a body part for biometric acquisition
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
i, and constructing at least one data-driven estimator based on an explanatory statistic, the estimator being represented in a computationally efficient form and packaged with the machine learning model and utilized to provide a definition of explainability for a score generated by the machine learning model.
This document presents multi-layered, self-calibrating analytics for detecting fraud in transaction data without substantial historical data. One or more variables from a set of variables are provided to each of a plurality of self-calibrating models that are implemented by one or more data processors, each of the one or more variables being generated from real-time production data related to the transaction data. The one or more variables are processed according to each of the plurality of self-calibrating models implemented by the one or more data processors to produce a self-calibrating model output for each of the plurality of self-calibrating models. The self-calibrating model output from each of the plurality of self-calibrating models is combined in an output model implemented by one or more data processors. Finally, a fraud score output for the real-time production data is generated from the self-calibrating model output.
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
G06Q 30/018 - Certifying business or products
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
73.
Method and system for predicting adherence to a treatment
Data characterizing an individual is received. Thereafter, one or more variables are extracted from the data so that, using a predictive model populated with the extracted variables, a likelihood of the individual adhering to a treatment regimen can be determined. The predictive model is trained on historical treatment regimen adherence data empirically derived from a plurality of subjects. Subsequently, data characterizing the determined likelihood of adherence can be promoted.
G16H 50/50 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
G16H 20/10 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
G16H 40/63 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
G16H 40/67 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
G16H 70/20 - ICT specially adapted for the handling or processing of medical references relating to practices or guidelines
G16H 10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
G06Q 50/22 - Social work or social welfare, e.g. community support activities or counselling services
Computer-implemented machines, systems and methods for providing insights about a machine learning model, the machine learning model trained, during a training phase, to learn patterns to correctly classify input data associated with risk analysis. Analyzing one or more features of the machine learning model, the one or more features being defined based on one or more constraints associated with one or more values and relationships and whether said one or more values and relationships satisfy at least one of the one or more constraints. Displaying one or more visual indicators based on an analysis of the one or more features and training data used to train the machine learning model, the one or more visual indicators providing a summary of the machine learning model's performance or efficacy.
Computer-implemented machines, systems and methods for managing missing values in a dataset for a machine learning model. The method may comprise importing a dataset with missing values; computing data statistics and identifying the missing values; verifying the missing values; updating the missing values; imputing missing values; encoding reasons for why values are missing; combining imputed missing values and the encoded reasons; and recommending models and hyperparameters to handle special or missing values.
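The imputing, reason-encoding, and combining steps listed above can be illustrated with a small sketch. The special codes, the median imputation, and the function name are assumptions chosen for the example; the abstract does not name a specific imputation method or encoding.

```python
from statistics import median

# Hypothetical reason codes: 0 means "observed"; the rest explain why a value is missing.
REASON_CODES = {"REFUSED": 1, "NOT_APPLICABLE": 2}
UNKNOWN_REASON = 3


def impute_with_reasons(values):
    """Return (imputed_value, reason_code) pairs for one numeric column.

    Missing entries are represented by None or a special string. Median
    imputation is an assumption for this sketch, not the source's method.
    The column must contain at least one observed number.
    """
    observed = [v for v in values if isinstance(v, (int, float))]
    fill = median(observed)
    combined = []
    for v in values:
        if isinstance(v, (int, float)):
            combined.append((v, 0))
        else:
            # Keep the reason alongside the imputed value so a model can use both.
            combined.append((fill, REASON_CODES.get(v, UNKNOWN_REASON)))
    return combined
```

Keeping the reason code next to the imputed value lets a downstream model distinguish, for example, a refused answer from a field that does not apply.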
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06F 17/18 - Complex mathematical operations for evaluating statistical data
G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
A request to generate a responsibility score is received that characterizes a likelihood of a change in a level of creditworthiness of an individual in response to at least one unknown financial event. Such responsibility score can provide useful insight into a consumer that is complementary to a credit score. Thereafter, a responsibility score is generated based on historical creditworthiness data for the individual using at least one predictive model. The at least one predictive model was trained using historical creditworthiness data of a plurality of consumers subjected to a plurality of financial events. In addition, the at least one predictive model associates the historical creditworthiness data of the individual with matching states for each of a plurality of pre-defined performance behaviors—with each pre-defined performance behavior having at least two corresponding states. The responsibility score can be later provided to a user (e.g., persisted, transmitted, displayed, etc.). Related apparatus, systems, techniques, and articles are also described.
Computer-implemented methods, systems and products for character string frequency analysis. The method includes a set of operations or steps, including parsing a plurality of character strings into one or more tokens, categorizing the one or more tokens into one or more token frequency categories, and generating a first similarity score between one or more pairs of character strings of the plurality of character strings. The method further includes calculating one or more degrees of commonality or rarity of the plurality of character strings based on the categorizing, generating one or more penalties for token pairs of the one or more pairs of character strings associated with the first similarity score based on the one or more degrees of commonality or rarity and the categorizing, and generating a second similarity score based on the first similarity score and the one or more penalties.
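A rough sketch of the two-score idea described above follows. Jaccard overlap as the first similarity score, the two frequency categories, and the fixed per-token penalty are all assumptions made for illustration; the abstract does not specify these choices.

```python
from collections import Counter


def tokenize(text):
    """Parse a character string into lowercase whitespace-separated tokens."""
    return text.lower().split()


def categorize(strings, common_share=0.6):
    """Label a token 'common' if it occurs in at least common_share of the strings."""
    doc_freq = Counter(tok for s in strings for tok in set(tokenize(s)))
    total = len(strings)
    return {tok: ("common" if n / total >= common_share else "rare")
            for tok, n in doc_freq.items()}


def first_score(a, b):
    """First similarity score: Jaccard overlap of the two token sets (an assumption)."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def second_score(a, b, categories, penalty_per_common=0.2):
    """Second score: the first score minus a penalty for each shared common token."""
    shared = set(tokenize(a)) & set(tokenize(b))
    penalty = penalty_per_common * sum(1 for t in shared if categories.get(t) == "common")
    return max(0.0, first_score(a, b) - penalty)
```

With names such as "acme bank" and "zenith bank", the shared token "bank" is common across the corpus and so contributes less evidence of a match than a shared rare token such as "acme".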
In transactional systems where past transactions can have impact on the current score of a machine learning based decision model, the transactions that are most responsible for the score and the associated reasons are determined by the transactional system. A system and method identify such past transactions that maximally impact the current score and allow for a more effective understanding of the scores generated by a model in a transactional system and explanation of specific transactions for automated decisioning, to explain the scores in terms of past transactions. Further, an existing instance-based explanation system is used to identify the reasons for the score, and how the identified transactions influence these reasons. A combination of impact on score and impact on reasons determines the most impactful past transaction with respect to the most recent score being explained.
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
Systems and methods are provided for allowing a merchant to provide a consumer with a real-time, personalized offer to execute a consumer transaction in response to evaluating that consumer's credential information. The consumer provides the credential information while, or just before, the consumer selects items to purchase from the website. The credential information provided by the consumer can be a compilation of different information associated with the consumer and may take the form of a score. According to one embodiment of the present invention, a merchant receives credential information relating to a consumer, while the consumer is at that merchant's website. The merchant evaluates the credential information while the consumer remains at the website and makes a real-time personalized offer of goods, services or pricing based at least in part on that evaluation.
Systems and methods are provided for accessing a database of records to identify a set of records represented by one or more nodes in a graph model. A connection between a first node and a second node in the one or more nodes is monitored to determine an association between a first record, represented by the first node, and a second record, represented by the second node. The set of records may be partitioned into a plurality of groups. For at least a first group, including a first set of records, it may be determined whether two or more records in the first group are related. In response to determining that the two or more records in the first group are related, a first group identifier may be assigned to the two or more records.
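One way to picture the grouping step above is with connected components over the record graph. This sketch swaps in a standard union-find structure, which the abstract does not name; the function name and the rule that only groups of two or more records receive an identifier are assumptions for illustration.

```python
def assign_group_ids(record_ids, connections):
    """Assign a shared group identifier to records linked in the graph.

    record_ids: iterable of hashable record identifiers (graph nodes).
    connections: iterable of (record_a, record_b) pairs (graph edges).
    Records with no related record receive no identifier in this sketch.
    """
    record_ids = list(record_ids)
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in connections:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)

    group_ids = {}
    next_id = 1
    for members in groups.values():
        if len(members) >= 2:  # two or more related records share an identifier
            for r in members:
                group_ids[r] = next_id
            next_id += 1
    return group_ids
```

Union-find keeps the grouping close to linear in the number of connections, which matters when the record set is large.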
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/901 - Indexing; Data structures therefor; Storage structures
81.
Soft segmentation based rules optimization for zero detection loss false positive reduction
A system and method include soft-segment based rules optimization that can mitigate overall false positives while maintaining 100% true positive detection. The soft clustering allows real-time re-assignment of an account to a dominant archetype behavior, and rule optimization, based on a logical order with more relaxation on thresholds for the most inefficient rules, is performed within each archetype. The rule optimization provides false positive reduction compared to a baseline rule system. The method can be used to reduce false positives for any rule-based detection system in which the same true positive detection is required.
Systems and methods for mitigating false positives while maintaining true positive detection are provided. A soft clustering scheme allows real-time re-assignment of an account to a dominant archetype behavior, and rule optimization, based on a logical order with more relaxation on thresholds for the most inefficient rules, is performed within each archetype. The rule optimization provides false positive reduction compared to a baseline rule system. The method can be used to reduce false positives for any rule-based detection system in which the same true positive detection is required.
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
In one aspect, a method for similarity sharding of datatype items is provided. The method includes a set of operations or steps, including parsing a datatype item into one or more tokens, extracting at least one selected token from the parsed datatype item, the at least one selected token comprising a character string including one or more characters. The method further includes standardizing the character string of the at least one selected token, extracting a first character from the one or more characters included in the at least one standardized selected token, and assigning the datatype item to a select shard of a plurality of shards via character distribution lookup based on the extracted first character.
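The sharding steps above (parse, select a token, standardize, extract the first character, look up the shard) can be sketched as below. The standardization table, the character-to-shard mapping, and the default shard are hypothetical values for illustration; in practice the lookup would be tuned to the observed character distribution.

```python
import re

# Hypothetical standardization table mapping variants to a common form.
STANDARD_FORMS = {"bill": "william", "bob": "robert"}

# Hypothetical character distribution lookup: first character -> shard number.
CHAR_TO_SHARD = {}
CHAR_TO_SHARD.update({c: 0 for c in "abcdefg"})
CHAR_TO_SHARD.update({c: 1 for c in "hijklmn"})
CHAR_TO_SHARD.update({c: 2 for c in "opqrstuvwxyz"})
DEFAULT_SHARD = 3  # digits, empty items, and any unmapped character


def assign_shard(item, token_index=0):
    """Assign an item to a shard from the first character of a standardized token."""
    tokens = re.findall(r"[a-z0-9]+", item.lower())    # parse the item into tokens
    if not tokens:
        return DEFAULT_SHARD
    token = tokens[min(token_index, len(tokens) - 1)]  # select one token
    token = STANDARD_FORMS.get(token, token)           # standardize the string
    return CHAR_TO_SHARD.get(token[0], DEFAULT_SHARD)  # look up the first character
```

Because standardization happens before the character is extracted, variants such as "Bill Smith" and "William Smith" land in the same shard and can be compared there.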
G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
84.
Computer-implemented decision management systems and methods
Computer-implemented decision management systems and methods are provided. The method comprises obtaining information associated with factors usable for making a decision from among a plurality of inter-related decisions represented by a plurality of corresponding nodes. The computing environment provides access to resources that store information about relationships among the plurality of nodes. A relationship may be presentable as an edge connecting at least two nodes from among the plurality of nodes. The strength of the relationship between the at least two nodes is measurable and definable based on associations between the inter-related decisions. A value may be determined that provides a measure for the strength of the relationship between the at least two nodes based on the information associated with the factors and the information about the relationships among the plurality of nodes.
Systems and methods for efficient association of related entities. The method may comprise accessing a database of records, using a processor, to identify a set of unoptimized entities represented by one or more nodes in a graph model, a connection between a first node and a second node in the one or more nodes representing an association between a first entity represented by the first node and a second entity represented by the second node; determining the first entity is unoptimized; and determining a set of related entities for the unoptimized first entity in the graph model, the graph model having at least one common entity with a corresponding label model.
A system and method for learning and associating reliability and confidence corresponding to a model's predictions by examining the support associated with datapoints in the variable phase space in terms of data coverage, and their impact on the weights distribution. The approach disclosed herein examines the impact of minor perturbations on a small fraction of the training exemplars in the variable phase space on the weights to understand whether the weights remain unperturbed or change significantly.
Computer-implemented machines, systems and methods for providing insights about a machine learning model, the machine learning model trained, during a training phase, to learn patterns to correctly classify input data associated with risk analysis. Analyzing one or more features of the machine learning model, the one or more features being defined based on one or more constraints associated with one or more values and relationships and whether said one or more values and relationships satisfy at least one of the one or more constraints. Displaying one or more visual indicators based on an analysis of the one or more features and training data used to train the machine learning model, the one or more visual indicators providing a summary of the machine learning model's performance or efficacy.
G06N 5/04 - Inference or reasoning models
G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
A computer-implemented method and system for generating a confidence indicia. The method may comprise obtaining information associated with one or more items available for selection via user interaction with a user interface of a computing device operating in a computing environment. The computing environment may provide access to resources that store information about at least one characteristic of the one or more items, wherein the at least one characteristic is measurable. A trust level may be determined for a characteristic of a first item based on one or more computerized processes that evaluate values associated with the at least one characteristic across a set of values provided by users who considered at least one of the first item or a second item. The first item and the second item are associable according to a common category.
A computer-implemented fraud detection method and system for periodically identifying network associations in a consumer population at a national credit reporting agency and computing associated network level variables related to credit use and potential first party fraud for the consumer population. In response to receiving a request for a target account from among the consumer population, the computer-implemented system retrieves a credit report for the target account and computes tradeline or account level variables related to credit use and potential fraudulent behavior. A fraud score is calculated based on a combined evaluation of the network level variables and the tradeline or account level variables.
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
G06F 16/901 - Indexing; Data structures therefor; Storage structures
90.
Building resilient models to address dynamic customer data use rights
A system and method for constructing an improved computing model that preserves use rights for data utilized by the model. A first dataset is accessed to build a computing model. The first dataset is subject to terminable usage rights provisions. A portion of the first dataset is sampled to generate a second dataset. Vectors present in the first dataset and the second dataset are discretized. In response to determining that the usage rights associated with the first dataset have been terminated, a coverage depletion for the second dataset is computed based on the usage rights termination associated with the first dataset. An estimated mean time to coverage failure for the computing model is determined for the second dataset based on the coverage depletion. One or more data points are removed from the first dataset due to the termination of usage rights.
G06F 21/10 - Protecting distributed programs or content, e.g. vending or licensing of copyrighted material
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 30/0201 - Market modelling; Market analysis; Collecting market data
91.
System and method for executing consumer transactions based on credential information relating to the consumer
Systems and methods are provided for allowing a merchant to provide a consumer with a real-time, personalized offer to execute a consumer transaction in response to evaluating that consumer's credential information. The consumer provides the credential information while, or just before, the consumer selects items to purchase from the website. The credential information provided by the consumer can be a compilation of different information associated with the consumer and may take the form of a score. According to one embodiment of the present invention, a merchant receives credential information relating to a consumer, while the consumer is at that merchant's website. The merchant evaluates the credential information while the consumer remains at the website and makes a real-time personalized offer of goods, services or pricing based at least in part on that evaluation.
A computer-implemented method for risk assessment and providing refinements to credit risk analysis based on a variety of information, including information voluntarily contributed by an applicant. The method may comprise performing a risk analysis based on a first set of information available in at least a first credit information data source and receiving a second set of information, in response to determining that the analysis provides a first result that is unfavorable to the applicant. The second set of information may be unavailable in the at least first credit information data source, and the second set of information may be retrievable from at least a secondary data source after the applicant's informed interaction with a computer-implemented interface configured to verify an authenticated approval by the applicant to provide access to information associated with at least one of the applicant's financial accounts.
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
A computer-implemented method for technologically improving a computer-implemented machine-learning model, the method comprising receiving, by a model, at least a first data record; generating a first score representing a first likelihood that the first data record is associated with a first classification, in response to feedback received from one or more data sources communicating with at least one computing system on which the model is implemented; generating a second score to represent a second likelihood that the first data record is associated with the first classification, in response to the first score being higher than a threshold value.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
95.
False positive reduction in abnormality detection system models
The subject matter disclosed herein provides methods, apparatus, systems, techniques, and articles for false positive reduction in abnormality detection models. A date and time of a first transaction by a transaction entity and associated with a transaction characteristic can be stored. Data representing subsequent transactions associated with the transaction characteristic can be stored. A history marker profile specific to the transaction characteristic and transaction entity can be generated and can include the transaction characteristic, the date and time of the first transaction, and maximum and mean abnormality scores. A date and time of a current transaction can be determined. A current abnormality score for the current transaction can be received. A tenure of the observed transaction characteristic can be computed. The current abnormality score can be recalibrated from the transaction entity abnormality detection system according to the maximum, mean, and current abnormality scores and a length of the tenure.
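The abstract does not give the recalibration formula, so the following is only a sketch under stated assumptions. It assumes that a longer tenure earns more trust, that a score above the historical maximum is left untouched, and that a score at or below the maximum is pulled toward the historical mean in proportion to that trust. The class `HistoryMarkerProfile`, the function `recalibrate`, and the parameter `full_trust_days` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoryMarkerProfile:
    characteristic: str      # e.g. a merchant or device identifier
    first_seen: datetime     # date and time of the first transaction
    max_score: float         # maximum abnormality score observed so far
    mean_score: float        # mean abnormality score observed so far


def recalibrate(
    profile: HistoryMarkerProfile,
    current_score: float,
    now: datetime,
    full_trust_days: float = 180.0,  # assumed tuning constant
) -> float:
    """Dampen a current abnormality score for a long-observed characteristic."""
    # Tenure: how long this characteristic has been seen for this entity.
    tenure_days = max(0.0, (now - profile.first_seen).total_seconds() / 86400.0)
    trust = min(1.0, tenure_days / full_trust_days)

    # Assumption: a score above anything seen before is not dampened.
    if current_score > profile.max_score:
        return current_score

    # Assumption: pull the score toward the historical mean, scaled by trust.
    excess = max(0.0, current_score - profile.mean_score)
    return current_score - trust * excess


profile = HistoryMarkerProfile("merchant-123", datetime(2023, 1, 1), 0.8, 0.3)
dampened = recalibrate(profile, 0.7, datetime(2023, 4, 1))   # 90 days of tenure
untouched = recalibrate(profile, 0.9, datetime(2023, 4, 1))  # above historical max
```

With 90 days of tenure the trust factor is 0.5, so a score of 0.7 moves halfway toward the mean of 0.3; the score of 0.9 exceeds the historical maximum and is returned unchanged.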
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check of credit lines or negative lists
G06Q 20/38 - Payment protocols; Details thereof
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
96.
Devices and methods for efficient execution of rules using pre-compiled directed acyclic graphs
In one aspect, a computer implemented method for translating and executing rules using a directed acyclic graph is provided. The method includes transforming a ruleset into a directed acyclic graph. The directed acyclic graph includes a plurality of nodes and a plurality of branches. The method further includes identifying similarities across the plurality of branches. The method further includes grouping branches of the directed acyclic graph based on the identified similarities. The method further includes creating a modified directed acyclic graph based on the grouping. The method further includes selecting and using a method of processing a group of the modified directed acyclic graph based on an aspect of the group.
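One way to picture the grouping step is the heavily simplified sketch below. It is not the patented method. The graph is reduced to a root with one branch per rule, the only similarity considered is "the branch starts with an equality test on the same variable", and the two processing methods are a hash lookup for such groups and sequential evaluation for the rest. All names (`Rule`, `group_branches`, `evaluate`) are hypothetical.

```python
import operator
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Condition = Tuple[str, str, Any]     # (variable, operator, constant)
Rule = Tuple[List[Condition], str]   # (conditions, action)

OPS = {"==": operator.eq, "<": operator.lt, ">": operator.gt}


def holds(cond: Condition, facts: Dict[str, Any]) -> bool:
    var, op, const = cond
    return var in facts and OPS[op](facts[var], const)


def group_branches(rules: List[Rule]):
    """Group branches by a shared property of their first condition.

    Branches starting with an equality test on the same variable go into a
    hash table keyed by the constant (constants must be hashable); all other
    branches stay in a plain list.
    """
    hashed: Dict[str, Dict[Any, List[Rule]]] = defaultdict(lambda: defaultdict(list))
    sequential: List[Rule] = []
    for conds, action in rules:
        if conds and conds[0][1] == "==":
            var, _, const = conds[0]
            hashed[var][const].append((conds[1:], action))
        else:
            sequential.append((conds, action))
    return hashed, sequential


def evaluate(hashed, sequential, facts: Dict[str, Any]) -> List[str]:
    fired: List[str] = []
    # Equality groups: one dictionary probe per variable instead of one
    # comparison per branch.
    for var, table in hashed.items():
        if var in facts:
            for rest, action in table.get(facts[var], []):
                if all(holds(c, facts) for c in rest):
                    fired.append(action)
    # Remaining branches: evaluate each condition list in turn.
    for conds, action in sequential:
        if all(holds(c, facts) for c in conds):
            fired.append(action)
    return fired


rules: List[Rule] = [
    ([("country", "==", "US"), ("amount", ">", 100)], "review_us"),
    ([("country", "==", "FR")], "review_fr"),
    ([("amount", ">", 1000)], "large"),
]
hashed, sequential = group_branches(rules)
fired_us = evaluate(hashed, sequential, {"country": "US", "amount": 500})
fired_fr = evaluate(hashed, sequential, {"country": "FR", "amount": 2000})
```

The point of the grouping is that the per-group strategy is chosen once, when the ruleset is compiled, rather than on every evaluation.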
G06N 7/08 - Computing arrangements based on specific mathematical models using chaos models or non-linear system models
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
In one aspect, a computer implemented method for efficient value lookup in a set of scalar intervals is provided. The method includes determining, in response to a query for a scalar value, that the scalar value is located in a set of scalar intervals, wherein each of the scalar intervals comprises a left bound and a right bound. The method further includes sorting the scalar intervals based on left bounds. The method further includes comparing, in response to the sorting, a pair of scalar intervals to determine if the pair of scalar intervals overlaps. The method further includes identifying, based on the comparing indicating that the pair overlaps, a method of processing the scalar intervals.
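The sort, overlap check, and choice of processing method can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: intervals are treated as closed on both sides, and the two lookup methods (binary search when no intervals overlap, linear scan otherwise) are choices made for this example. The class name `IntervalSet` is hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (left bound, right bound), both inclusive


class IntervalSet:
    def __init__(self, intervals: Iterable[Interval]) -> None:
        # Sort by left bound (ties broken by right bound).
        self.intervals: List[Interval] = sorted(intervals)
        # After sorting, it is enough to compare each adjacent pair: if any
        # two intervals overlap, some adjacent pair overlaps as well.
        self.overlapping = any(
            nxt[0] <= cur[1]
            for cur, nxt in zip(self.intervals, self.intervals[1:])
        )
        self.lefts = [left for left, _ in self.intervals]

    def contains(self, value: float) -> bool:
        if self.overlapping:
            # Fallback: linear scan over all intervals.
            return any(left <= value <= right for left, right in self.intervals)
        # Disjoint case: binary search for the last interval whose left bound
        # is <= value, then check its right bound.
        i = bisect_right(self.lefts, value) - 1
        return i >= 0 and value <= self.intervals[i][1]


disjoint = IntervalSet([(5, 7), (1, 2), (10, 12)])
mixed = IntervalSet([(1, 5), (4, 8)])
```

For the disjoint set a lookup costs O(log n) after the one-time O(n log n) sort. An interval tree would be a natural replacement for the linear-scan fallback when overlaps are common.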
G06N 5/00 - Computing arrangements using knowledge-based models
G06N 7/08 - Computing arrangements based on specific mathematical models using chaos models or non-linear system models
G06N 5/02 - Knowledge representation; Symbolic representation
G06F 16/901 - Indexing; Data structures therefor; Storage structures
Computer-implemented methods, systems and products for analytics and discovery of patterns or signals. The method includes a set of operations or steps, including collecting data from a plurality of data sources, the data having a plurality of associated data types, and filtering the collected data based on identifying viable data sources from which the data is collected. The method further includes prioritizing discovery objectives based on analyzing the filtering results, and enriching the filtered collected data from viable data sources according to the prioritized discovery objectives. The method further includes extracting one or more signals from the enriched data using one or more machine learning mechanisms in combination with qualified subject matter expertise input, and graphically displaying the extracted signals to a human operator so that the operator can understand the importance of the extracted signals.
G06F 7/00 - Methods or arrangements for processing data by operating upon the order or content of the data handled
G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
G06F 16/9038 - Presentation of query results
G06F 16/9035 - Filtering based on additional data, e.g. user or group profiles
One or more datasets are received by a data wrangling module and wrangled into a form that is computationally actionable by a user. At least some data from the one or more datasets are enriched by one or more data enrichment modules to generate an enriched form of at least some data corresponding to the one or more datasets that is computationally actionable by the user. The one or more datasets and the enriched form of the at least some data are processed by a signal detection module to identify relationships, anomalies, and/or patterns within the one or more datasets.
G06F 7/00 - Methods or arrangements for processing data by operating upon the order or content of the data handled
G06F 16/25 - Integrating or interfacing systems involving database management systems
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
100.
System and method for round trip engineering of decision metaphors
A testing framework associated with a decision metaphor model tool reads table profile files to generate requests for a test of a decision metaphor. The testing framework sends the requests for the test to a decision engine and receives responses for the requests for comparison against expected values and possible errors. The testing framework also outputs an output file that includes a result of the test, where the output file is formatted in a computer-displayable and user-readable graphical format.