In a database system, wherein data is stored as objects within an object storage system, a system and method for estimating object cardinality, determining query execution plan costs, and selecting a query plan for execution by the database system. Multiple object cardinality estimation approaches are presented for estimating the number of objects to be accessed for a given query condition on a column of a relation composed of a set of objects, where each object maintains the minimum value and the maximum value of individual columns. A set of global statistics is also maintained, consisting of the total number of objects and the minimum and maximum values of individual columns. The object cardinality estimation is determined based on the global statistics without retrieving individual object-level statistics.
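The abstract does not spell out the estimation formulas, so the following is only a minimal sketch of one possible approach. It assumes (hypothetically) that the relation is clustered on the column, so each object covers a roughly equal slice of the global value range; the function name and parameters are illustrative and not taken from the source.

```python
import math

def estimate_objects_clustered(num_objects, gmin, gmax, lo, hi):
    """Estimate how many objects a range condition lo <= col <= hi touches.

    Uses only global statistics (object count, global min and max).
    Assumption: objects cover equal, non-overlapping slices of [gmin, gmax].
    """
    # Clip the query range to the global value range.
    lo, hi = max(lo, gmin), min(hi, gmax)
    if num_objects == 0 or lo > hi:
        return 0  # the condition cannot match any stored value
    if gmax == gmin:
        return num_objects  # every object holds the single global value
    # Fraction of the global range covered by the condition, scaled to objects.
    estimate = math.ceil(num_objects * (hi - lo) / (gmax - gmin))
    # A matching range touches at least one object and at most all of them.
    return min(num_objects, max(1, estimate))
```

For an unclustered relation, where every object may span the whole global range, a conservative estimate under the same statistics would simply be the total number of objects.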
In some examples, a database system includes processing modules with access to a remote object store and a local database storage associated with the database system. The processing modules perform a database operation that involves use of a plurality of instances of spool data. The processing modules store a first instance of spool data in the remote object store based on a first characteristic of the first instance of spool data, and the processing modules store a second instance of spool data in the local database storage based on a second characteristic of the second instance of spool data.
Various techniques may be employed in a system, method, and computer-readable medium to allow declarative database syntax language to accommodate matrix multiplication.
A system includes a plurality of processing nodes. At least one processing node of the plurality of processing nodes receives a user-defined function (UDF). The at least one processing node scans source code of the UDF. The at least one processing node, in response to identification of at least one of a plurality of predetermined conditions in the UDF during the scan, requires that the UDF is executed at a secure server outside of the plurality of processing nodes.
A system may include a storage device. The system may further include a plurality of processing nodes in communication with the storage device. At least one processing node of the plurality of processing nodes may receive a data set from a data source. The at least one processing node may execute a model on the received data set to generate a vector embeddings array representative of the received data. The at least one processing node may identify temporal data associated with the vector embeddings array. The at least one processing node may store the vector embeddings array with the associated temporal data in the storage device. A method and computer-readable medium are also disclosed.
In a cloud database system employing multiple types of storage, such as external object store, managed object store, block storage, and compute node memory, each type of storage having different kinds of file organization, different types of data organization, different forms of storage access, and different latency and throughput costs, a system and method for caching different data transformations created during query executions involving different data stores. Transformed versions of data read from external object storage are saved to a multi-layered warehouse cache for use in subsequent query executions.
In some examples, a database system receives a database query. The database system computes a threshold based on sizes of objects, and invokes a distribution process that accounts for data skew to distribute the objects of the object store to processing engines. The distribution process includes determining whether an assignment of a first object to a given processing engine causes a load of the given processing engine to exceed the threshold. In response to a determination that the load of the given processing engine exceeds the threshold, the distribution process divides the first object into object parts and distributes the object parts among one or more processing engines. In response to a determination that the load of the given processing engine does not exceed the threshold, the distribution process assigns the first object to the given processing engine.
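The distribution process described above can be sketched as follows. This is an illustrative sketch, not the patented method: the threshold (average load per engine), the largest-first ordering, and the equal-sized split across all engines are assumptions, and `distribute` is a hypothetical name.

```python
def distribute(object_sizes, num_engines):
    """Assign objects to engines, splitting any object that would overload one.

    object_sizes: dict mapping object name -> size.
    Returns (plan, loads), where plan[e] lists (name, size) pieces for engine e.
    """
    # Assumption: the threshold is the average load each engine should carry.
    threshold = sum(object_sizes.values()) / num_engines
    loads = [0.0] * num_engines
    plan = [[] for _ in range(num_engines)]
    # Assumption: handle the largest objects first.
    ordered = sorted(object_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        # Candidate engine: the one with the smallest current load.
        engine = min(range(num_engines), key=lambda e: loads[e])
        if loads[engine] + size > threshold:
            # Assignment would exceed the threshold: divide into equal parts
            # and give one part to every engine.
            part = size / num_engines
            for e in range(num_engines):
                plan[e].append((name, part))
                loads[e] += part
        else:
            # Assignment stays within the threshold: keep the object whole.
            plan[engine].append((name, size))
            loads[engine] += size
    return plan, loads
```

With sizes `{"big": 90, "a": 5, "b": 5}` and two engines, the skewed object `big` is split into two parts while `a` and `b` are assigned whole.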
A query is received. It is determined that the query does not fit a profile for a run-the-business set of queries, where the profile for the run-the-business set of queries excludes queries that are not routine parts of running a business and that do not require priority processing. The query is executed with a dynamically-created compute capacity that is not part of a compute capacity used to run the run-the-business set of queries.
A SQL query performs a function. The SQL query includes a SQL operator that has two input relations. The first input relation is a script relation having a plurality of script records. Each script record includes a transformation field, the contents of which specify a transformation to be performed by the SQL operator. The second input relation is a parameter relation having a plurality of parameter records. Each parameter record includes a data-to-process field that identifies data to be processed by the transformation specified in the transformation field of a selected script record. The selected script record is determined by a mapping. The SQL operator has one output relation having a plurality of output records. Each output record contains the result of the transformation specified in a respective selected script record using the data to be processed identified in the data-to-process field in a respective selected parameter record.
A computer system executes a database management system (DBMS). The DBMS manages a database comprised of DBMS resources. The DBMS receives a request to be executed. The request is a DBMS action to be executed using the DBMS resources. The request includes a predicate specifying a maximum cost for executing the request, and a deadline, specifying a deadline by which the request is to be completed in its execution. The DBMS determines a plurality of workloads under which the request is qualified to execute. Each workload of the plurality of workloads includes a respective set of requests that have common characteristics. Each workload of the plurality of workloads includes a respective cost criterion and a respective elapsed time criterion. The DBMS selects a selected workload from among the plurality of workloads. The selected workload has a selected cost criterion and a selected elapsed time criterion. The DBMS begins execution of the request using the selected workload.
09 - Scientific and electric apparatus and instruments
37 - Construction and mining; installation and repair services
41 - Education, entertainment, sporting and cultural services
42 - Scientific, technological and industrial services, research and design
Goods & Services
Business consulting in the field of data warehousing and data analytics Recorded and downloadable database management system software; recorded and downloadable computer software and computer hardware for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced analytics, and data and marketing analytics; recorded and downloadable computer software and computer hardware for data security and recovery; recorded and downloadable computer software and computer hardware for capturing, storing, analyzing and managing data across multiple data platforms; recorded and downloadable computer software for data analysis; recorded and downloadable computer software for cloud computing; data processing equipment; providing on-line, downloadable electronic publications in the nature of articles, newsletters, case studies, and white papers in the field of data warehousing and data analytics Installation, maintenance and repair of equipment and computer systems; information, advice and consultancy in relation to the aforesaid services Providing courses, webinars, and training in the fields of data warehousing and data analytics; providing online, non-downloadable electronic publications in the nature of articles, newsletters, case studies, and white papers in the field of data warehousing and data analytics; online journals, namely, blogs in the field of data warehousing and data analytics; providing online, non-downloadable videos in the field of data warehousing and data analytics Technical consulting services in the fields of data warehousing, data analytics, public and private cloud computing solutions, and evaluation and implementation of data architecture and technology; technology consultation in the field of artificial intelligence; computer software consultancy; consulting services in the field of cloud computing; providing temporary use of non-downloadable software for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced analytics, and data and marketing analytics; software as a service (SAAS), platforms as a service (PAAS), and infrastructure as a service (IaaS), all featuring database management system software; software as a service (SAAS), platforms as a service (PAAS), and infrastructure as a service (IaaS), all featuring software for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced data analytics, and data and marketing analytics; data warehousing; data mining; technical support services, namely, troubleshooting in the nature of diagnosing computer hardware and software problems; maintenance and repair of computer software; information, advice and consultancy in relation to the aforesaid services
Discovering candidate referential integrities in a database
A database system enumerates one-column candidate referential integrities (1CRIs) from a plurality of input columns in one or more relations. The database system applies one or more disqualification tests to the 1CRIs to eliminate illegitimate 1CRIs, resulting in a list of non-disqualified 1CRIs, wherein the disqualification tests are applied to a 1CRI being tested (hereinafter (A*,B*), A* representing a set of values of a referenced column or columns and B* representing a set of values of a referencing column or columns) until (A*,B*) is disqualified or until all of the disqualification tests have been executed and (A*,B*) has not been disqualified, in which case (A*,B*) is added to the list of non-disqualified 1CRIs, wherein each of the disqualification tests reduces the likelihood of incorrectly adding (A*,B*) to the list of non-disqualified 1CRIs.
A database system receives a query that includes a reference to a foreign table. The foreign table is used to access an Object Store (OS) outside the database system. The OS stores objects. The objects have path names, which are pointers to the objects. When the foreign table was created, one or more wildcards were used to specify the path names for the objects in the OS to be accessed by the query. The database system directs the OS to provide a list containing the path names of the objects in the OS. The database system receives the list and applies the one or more wildcards to identify the path names of the objects to be accessed by the query. The database system produces a result by executing the query, accessing the objects in the OS identified by the path names of the objects to be accessed by the query.
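The wildcard step can be illustrated with a short sketch. It assumes shell-style wildcards as implemented by Python's standard `fnmatch` module (where `*` also matches `/`); the abstract does not state which wildcard syntax is used, and the path names and the function `select_paths` are hypothetical.

```python
from fnmatch import fnmatchcase

def select_paths(listed_paths, patterns):
    """Keep the listed path names that match at least one wildcard pattern.

    listed_paths: path names returned by the object store's listing call.
    patterns: wildcard patterns recorded when the foreign table was created.
    """
    return [
        path
        for path in listed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]

# Hypothetical listing returned by the object store.
listing = ["sales/2023/01.csv", "sales/2024/01.csv", "logs/app.txt"]
matched = select_paths(listing, ["sales/*/01.csv"])
```

Only the matched path names would then be read when the query executes.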
A data store system may include a storage device configured to store a plurality of data store tables and may include a processor in communication with the storage device. The processor may receive a plurality of requests. For each request, the processor may: (1) determine an associated workload type for the request; (2) determine a first respective rate at which the request is to be released for scheduling of execution; and (3) release the request for scheduling of execution based on the first respective rate. For each released request, the processor may: (1) determine a second respective rate based on the associated workload type at which each released request is scheduled to be executed; and (2) in response to execution being scheduled for a released request, execute the released request. A method and computer-readable medium are also disclosed.
A system and method for caching data objects retrieved from a network object store or cloud storage remotely accessible by a database management node. Retrieved data objects are stored within the database management node in a cache memory having multiple cache zones providing different input/output (I/O) latencies with respect to cache data access. Retrieved data objects are placed within the cache zones in accordance with access and storage costs associated with the retrieved data objects, wherein data objects having higher associated costs are placed in cache zones having lower I/O latencies. The costs associated with a data object may be determined from object store vendor costs, object store storage tier levels, locations of the data management node and the object store, method of connection to the object store, or read from a pricing matrix containing predetermined object costs associated with stored data objects.
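A minimal sketch of cost-based zone placement follows. The zone names, the per-zone capacities expressed as object counts, and the function `place_in_zones` are assumptions made for illustration; the abstract only states that costlier objects go to lower-latency zones.

```python
def place_in_zones(object_costs, zones):
    """Place the costliest objects in the lowest-latency cache zones.

    object_costs: dict mapping object name -> cost of re-fetching it.
    zones: list of (zone_name, capacity) ordered from lowest to highest latency.
    Returns a dict mapping object name -> zone name (objects that do not fit
    in any zone are left out).
    """
    # Rank objects from most to least expensive.
    ranked = sorted(object_costs, key=object_costs.get, reverse=True)
    placement = {}
    start = 0
    for zone_name, capacity in zones:
        # Fill each zone in latency order with the next-costliest objects.
        for name in ranked[start:start + capacity]:
            placement[name] = zone_name
        start += capacity
    return placement
```

For example, with costs `{"a": 0.5, "b": 3.0, "c": 1.2}` and zones `[("memory", 1), ("ssd", 1), ("disk", 5)]`, object `b` lands in `memory`, `c` in `ssd`, and `a` in `disk`.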
A system may include a storage device. The system may include a plurality of processing nodes. The plurality of processing nodes communicates with the storage device. At least one processing node schedules a group of compute nodes to be active during a selected time window. The at least one processing node receives a query and determines that the query is to be executed by one of the plurality of processing nodes and the group of compute nodes. The at least one processing node schedules the query to be executed by the determined one of the plurality of processing nodes or the group of compute nodes. A method and computer-readable medium are also disclosed.
A system may include a storage device. The storage device may store a plurality of user-defined functions (“UDFs”). Each of the plurality of UDFs may be containerized to allow each UDF to be executed using content unshared with other UDFs. The storage device may also include a plurality of data objects. The system may further include a plurality of processing nodes. At least one processing node may receive a call to execute one of the plurality of UDFs on at least one of the plurality of data objects. The at least one processing node may execute the called UDF on the at least one of the plurality of data objects. A method and computer-readable medium are also disclosed.
In some examples, a system receives delimiter separated value (DSV) data, and categorizes a character in the DSV data into a selected layer of a plurality of layers, where characters in a first layer of the plurality of layers comprise data characters, characters in a second layer of the plurality of layers comprise delimiters, and characters in a third layer of the plurality of layers comprise grouping symbols to group a string of characters into a semantic unit. The system parses the DSV data according to the categorizing.
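The layered categorization can be sketched as below. The choice of `,` as delimiter and `"` as grouping symbol, the layer constants, and the function names are assumptions for illustration; escape handling and nested grouping are left out.

```python
DATA, DELIMITER, GROUPING = 1, 2, 3  # the three layers named in the abstract

def categorize(ch, delimiters=",", grouping='"'):
    """Return the layer a single character belongs to."""
    if ch in grouping:
        return GROUPING
    if ch in delimiters:
        return DELIMITER
    return DATA

def parse_record(line, delimiters=",", grouping='"'):
    """Split one record into fields according to the character layers."""
    fields, current, in_group = [], [], False
    for ch in line:
        layer = categorize(ch, delimiters, grouping)
        if layer == GROUPING:
            # Grouping symbols open or close a semantic unit.
            in_group = not in_group
        elif layer == DELIMITER and not in_group:
            # A delimiter outside a group ends the current field.
            fields.append("".join(current))
            current = []
        else:
            # Data characters, and delimiters inside a group, are kept.
            current.append(ch)
    fields.append("".join(current))
    return fields
```

Under these assumptions, `parse_record('a,"b,c",d')` yields three fields, with the comma inside the quoted group treated as data.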
In a cloud database system, a system and method for analyzing query workloads on installed customer systems and generating tiered offers promoting higher query execution speeds in the form of better response times for a selected portion of queries in exchange for a higher price. Upon selecting an offer, the cloud database system is automatically configured to include additional compute resources as required to execute future instances of the selected queries to take advantage of the performance improvements provided with the selected offer.
An apparatus, method and computer program product for query optimization in a Relational Database Management System (RDBMS), wherein an optimizer accesses a query expression repository (QER) storing planning and execution information for query expressions (QEs) from previous queries, wherein the QEs comprise table relations, intermediate results and/or final results of operations in the previous queries. Additionally, dynamic join indexes (DJIs) representing QE results are created for high-value QEs selected from the QER and maintained within a DJI repository. During query plan creation for a current or subsequent query, the optimizer searches the QER and DJI repository for DJIs created for high-value QEs corresponding to QEs contained in the current or subsequent query. DJIs corresponding to the matching QEs are used in the query planning phase to rewrite the current or subsequent user query so that stored QE results are used to answer QEs contained in the current or subsequent query.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable computer software for data access, data processing, data management, data monitoring, and data movement across multiple data sources Providing temporary use of non-downloadable software for data access, data processing, data management, data monitoring, and data movement across multiple data sources; software as a service (SAAS) featuring computer software for data access, data processing, data management, data monitoring, and data movement across multiple data sources
Artificial intelligence-based resource management of computing systems and complex database systems
Artificial Intelligence-based (AI-based) modeling can be used to predict “Critical Times” when “bottlenecks” in the processing of data would occur. Moreover, for each of the predicted Critical Times, it can be determined which one of multiple Computing Resources would cause the bottleneck, so that more precise measures can be taken before a Critical Time in an effort to prevent bottlenecks from happening in computing systems, especially more complex database systems with more demanding service needs and requirements.
In some examples, a system performs a delimiter identification process that includes identifying candidate record delimiters and candidate field delimiters in the input data, and providing different pairs of candidate record delimiters and candidate field delimiters. For each respective pair of the different pairs, the system identifies records using the corresponding candidate record delimiter of the respective pair, and computes a collection of measures including a measure indicating a quantity of unique fields observed in the records identified using the corresponding field delimiter of the respective pair. The system selects, based on values of the collection of measures computed for corresponding pairs of the different pairs, a record delimiter and a field delimiter in a pair of the different pairs.
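One way to picture the pairwise evaluation is the sketch below. The candidate lists, the specific measures (whether records have more than one field, how many distinct field counts occur, how many records are found), and their ordering are assumptions chosen for illustration; the abstract does not define the scoring, and `pick_delimiters` is a hypothetical name.

```python
from itertools import product

def pick_delimiters(text, record_candidates=("\n", ";"),
                    field_candidates=(",", "|", "\t")):
    """Return the (record_delimiter, field_delimiter) pair that scores best."""
    best_pair, best_key = None, None
    for rec, fld in product(record_candidates, field_candidates):
        if rec == fld:
            continue
        # Identify records with the candidate record delimiter.
        records = [r for r in text.split(rec) if r]
        if not records:
            continue
        # Count fields per record with the candidate field delimiter.
        counts = [len(r.split(fld)) for r in records]
        # Measures (smaller key is better): records should have several
        # fields, field counts should be consistent, and more records win.
        key = (-(min(counts) > 1), len(set(counts)), -len(records), -min(counts))
        if best_key is None or key < best_key:
            best_pair, best_key = (rec, fld), key
    return best_pair
```

For input such as three newline-separated records of three pipe-separated fields each, this scoring selects the newline and pipe pair.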
A database system receives a query. The database system retrieves an old query execution plan (QEP), OldPlan, for the query. The database system submits the query to an optimizer. The optimizer returns a new QEP, NewPlan, for the query. The database system submits the OldPlan and the NewPlan to a machine learning classifier (ML classifier). The ML classifier predicts that executing the NewPlan will result in a performance regression as compared to executing the OldPlan. The database system executes the OldPlan instead of the NewPlan.
In some examples, a database system accesses a plurality of objects in a remote object store. In response to a query to change data in a first object of the plurality of objects, the database system specifies the first object prior to the change as a first version of the first object, and creates a second version of the first object after the change. The database system maintains metadata identifying unmodified objects of the plurality of objects, and during a garbage collection process when deciding whether to remove a given object of the plurality of objects, accesses the metadata to determine whether the given object has been modified, and prevents removal of the given object in response to determining that the given object is unmodified.
In some examples, a system receives an input graph representation of one or more query plans for one or more database queries, and generates, by an embedding machine learning model based on the input graph representation, a feature vector that provides a distributed representation of the one or more query plans. The system determines, using the feature vector, one or more user behaviors and/or workload characteristics of one or more workloads in one or more database systems.
A method, apparatus and computer program product for estimating resource consumption for steps in a query execution plan for a query performed by a relational database management system (RDBMS) in a computer system. Past execution data for the steps are used to train a machine learning (ML) model and its model parameters to predict execution times for the steps. A prediction module comprised of the ML model configured by the model parameters predicts an execution time for a current step of the query execution plan for the query, based on current step information and current system load. A boosting module boosts the current step either up or down for processing by the RDBMS to meet a service level goal (SLG) for the query, based on the predicted execution time for the current step, as well as an elapsed query time, a query SLG time, and/or a query CPU time.
A multi-parameter data type framework can, among other things, provide a more comprehensive, systematic, and/or formal mechanism for determining an appropriate data type for a data set. For example, the multi-parameter data type framework can be used to allow analytic tools to determine, virtually automatically, an appropriate data type for a set of data values.
A system and method for extending compression-aware aggregation logic to column partitioned database sources when an SQL query involves simple or complex aggregate expressions. The logic can be applied when there are multiple fields specified in a Group By clause, when a Group By clause includes an expression involving multiple columns from a column partitioned table, or when there is no Group By clause in the query. This logic extends the benefits of push-down aggregation to complex aggregate queries to build partially aggregated rows that can be directly added into an intermediate cache. For cases where the fields within aggregate expressions are themselves compressed, the aggregation techniques leverage the compression information of the aggregate fields. This aggregation mechanism can be applicable to compression techniques including run-length encoding (RLE), value list compression (VLC) and Presence, Delta on Mean (PDM) on columnar source tables such as Column Partitioned (CP) or Parquet tables.
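To illustrate why aggregation can work directly on compressed data, here is a small sketch for run-length encoded (RLE) columns. It assumes each column is given as a list of `(value, run_length)` pairs with positive run lengths and that the two columns describe the same rows; the function names are hypothetical and this is not the patented logic.

```python
def sum_rle(value_runs):
    """SUM over an RLE column without expanding it: value times run length."""
    return sum(value * length for value, length in value_runs)

def grouped_sum_rle(group_runs, value_runs):
    """SUM(value) GROUP BY group, walking both RLE columns run by run."""
    totals = {}
    groups = [list(run) for run in group_runs]   # mutable copies of the runs
    values = [list(run) for run in value_runs]
    gi = vi = 0
    while gi < len(groups) and vi < len(values):
        # Rows shared by the current group run and the current value run.
        overlap = min(groups[gi][1], values[vi][1])
        key = groups[gi][0]
        totals[key] = totals.get(key, 0) + values[vi][0] * overlap
        groups[gi][1] -= overlap
        values[vi][1] -= overlap
        # Advance whichever run has been fully consumed.
        if groups[gi][1] == 0:
            gi += 1
        if values[vi][1] == 0:
            vi += 1
    return totals
```

Each step adds one partially aggregated contribution per overlapping run, so the work is proportional to the number of runs rather than the number of rows.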
A method and apparatus for optimizing a query in a relational database management system (RDBMS) when a predicate on a data column in the query has a correlation to a partitioning attribute of a partitioning column in data retrieved from a cloud-based store, wherein the optimizing uses the correlation between the data column in the query to the partitioning column in the data retrieved from the cloud-based store for data elimination when processing the query. The correlation is defined in a formula or lookup data structure that maps or range-maps from the data column to the partitioning column.
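As a sketch of how such a correlation could drive data elimination, assume (hypothetically) a data column holding an order date and objects partitioned by a `YYYY-MM` key derived from it. The formula, the partition key format, and the function names below are illustrative assumptions only.

```python
from datetime import date

def partitions_for_range(start, end):
    """Range-map a date predicate [start, end] to the YYYY-MM partitions it touches."""
    partitions = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        partitions.append(f"{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return partitions

def prune(objects_by_partition, start, end):
    """Keep only the objects whose partition can contain qualifying rows."""
    keep = set(partitions_for_range(start, end))
    return {p: objs for p, objs in objects_by_partition.items() if p in keep}

# Hypothetical layout of objects in the cloud-based store.
layout = {"2023-10": ["o1"], "2023-11": ["o2"], "2023-12": ["o3"], "2024-01": ["o4"]}
needed = prune(layout, date(2023, 11, 15), date(2024, 1, 3))
```

Objects in partitions outside the mapped range never need to be retrieved for the query.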
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable and recorded computer software for enterprise wide business intelligence Platform as a service (PaaS) for enterprise wide business intelligence; providing online non-downloadable software for capturing, storing, analyzing and managing business intelligence
A database system includes a storage medium to store a semi-materialized view (MV) defined on an MV condition, the semi-MV including metadata containing references to objects containing data of one or more tables that satisfy the MV condition, the objects stored in a remote data store that is coupled to the database system over a network. The database system includes at least one processor to receive a query including a query condition, determine that the semi-MV can be used to satisfy the query based on the MV condition and the query condition, and use the metadata in the semi-MV to retrieve data of the objects in the remote data store for the query.
A system may include a storage device configured to persistently store a plurality of data elements. The system may further include a processor in communication with the storage device. The processor may receive a data element. The processor may further identify contents of the data element. The processor may further create a data structure indicative of the contents of the data element. The processor may further store the data structure in the storage device. A method and computer-readable medium are also disclosed.
A method, apparatus, and computer program product for executing a relational database management system (RDBMS) in a computer system, wherein the RDBMS manages a relational database comprised of one or more tables storing data. The RDBMS executes a query with a semi-join operation comprising an inclusion join and/or an exclusion join performed against at least an outer table and an inner table, wherein the inclusion join returns a row from the outer table when there is a match with a row in the inner table, and the exclusion join returns a row from the outer table when there is no match with a row in the inner table. The RDBMS performs a rewrite of the query to avoid spooling and/or sorting of the inner table, when the inner table is larger than the outer table and a cost after the rewrite is lower than before the rewrite.
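The semantics of the two joins can be shown with a short sketch over in-memory rows. It illustrates only what the inclusion and exclusion joins return, not the rewrite itself; rows are assumed to be dictionaries, NULL handling is ignored, and the function names are hypothetical.

```python
def inclusion_join(outer, inner, key):
    """Return outer rows that have a match on `key` in the inner table."""
    inner_keys = {row[key] for row in inner}
    return [row for row in outer if row[key] in inner_keys]

def exclusion_join(outer, inner, key):
    """Return outer rows that have no match on `key` in the inner table."""
    inner_keys = {row[key] for row in inner}
    return [row for row in outer if row[key] not in inner_keys]

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 1, "total": 40}, {"id": 1, "total": 15}]
with_orders = inclusion_join(customers, orders, "id")
without_orders = exclusion_join(customers, orders, "id")
```

Each outer row appears at most once in the result, even when the inner table holds several matching rows.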
Improved techniques for management of access in computing environments and systems are disclosed. An object-level data access mechanism can be provided to effectively provide an object-level locking mechanism for locking data objects of database tables, individually, as individual data objects. Furthermore, the object-level data access mechanism can be provided as a safe and efficient filtering mechanism (e.g., cuckoo filter) that effectively provides an object-level locking mechanism for locking data objects of a database table, individually (i.e., as individual locks placed on individual data objects). For example, a set of filters (e.g., write cuckoo and read cuckoo) can be provided for a database table to facilitate concurrent database operations in a safe but efficient manner.
A data store system may include at least one storage device to store a plurality of data and at least one processor with access to the storage device. The at least one processor may receive a plurality of features associated with an environment. The at least one processor may further generate a state representation of the environment based on the plurality of features. The at least one processor may further generate a plurality of predicted future states of the environment based on the state representation. The at least one processor may further generate at least one action to be performed by the environment based on the plurality of predicted future states. The at least one processor may provide the at least one action to the environment to be performed. A method and computer-readable medium are also disclosed.
A data store system may include a storage device configured to store a plurality of data store tables. The data store may further include a plurality of processing units. At least one processing unit from the plurality of processing units may receive an analytic function call. The at least one processing unit may further identify, in the analytic function call, at least one column of a data store table on which to execute an analytic function in the analytic function call and may further identify, in the analytic function call, an identifier column of the data store table. Each row of the at least one column may be associated with a common row value of the identifier column. The at least one processing unit may further identify, in the analytic function call, at least one index column of the data store table. Each value in each of the at least one index column may identify an index value on which to index each value of the at least one column with respect to each value of the identifier column. The at least one processing unit may further order values of the at least one column in accordance with the identifier column and the at least one index column, execute the analytic function on the ordered values to generate a result set, and order the result set in accordance with the identifier column and the at least one index column. A computer-readable medium and method are also disclosed.
A cardinality of a query is estimated by creating a join plan for the query. The join plan is converted to a graph representation. A subtree graph kernel matrix is generated for the graph representation of the join plan. The subtree graph kernel matrix is submitted to a trained model for cardinality prediction which produces a predicted cardinality of the query.
An imaging system may include a housing having shape and size sufficient to receive an industrial tool inserted into the housing. The imaging system may further include a plurality of cameras and a plurality of light sources positioned within the housing in a manner to surround the industrial tool upon insertion of the industrial tool into the housing. The imaging system may include a processing unit to control operation of the cameras and light sources and adjust relative positions of the cameras and light sources in relation to the industrial tool to capture a plurality of images of relevant portions of the industrial tool. The plurality of images collectively reveals substantially all of the relevant portions of the industrial tool. A method and computer-readable medium are also disclosed.
A database system receives a query and determines that the query includes an inner join between a parent table and a child table. The database system determines that the following relationships exist between the parent table and the child table: referential integrity (“RI”) between a primary key attribute (pk) in the parent table and a foreign key attribute (fk) in the child table and a temporal relationship constraint (“TRC”) between a period attribute in the parent table and a TRC-attribute in the child table. The database system determines that the query satisfies non-temporal join elimination conditions and temporal join elimination conditions and that the query contains no other qualification conditions on the parent table's period attribute and eliminates the inner join when planning execution of the query.
A security mechanism (e.g., a computing system or security server) can effectively serve as a centralized security mechanism for an ecosystem that can include diverse clients and servers. The security mechanism can obtain redirected requests for services, authenticate credentials of a client and generate a (client-side) token that can be provided by the client to the server for verification of the identity of the client. The security mechanism can also obtain a token from a server that can be similar to a (client-side) token provided to a client and then generate a (server-side) token that can be provided to a server. The server-side token can include authorization information that allows access to one or more services of one or more other servers.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Downloadable computer software for data warehousing, data management, data mining, data monitoring, data retention, advanced analytics, and data and marketing analytics; downloadable computer software for capturing, storing, analyzing and managing data across multiple data platforms; downloadable computer software for big data analysis; downloadable electronic publications in the nature of e-books, best practice guides, data sheets, reports, articles, and white papers in the field of big data analytics; Cloud computing featuring data platform software for data warehousing, data management, data mining, data monitoring, data retention, advanced data analytics, and data and marketing analytics; platform-as-a-service featuring computer software platforms for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced data analytics, and data and marketing analytics
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Computer hardware and downloadable and recorded computer software for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced analytics, and data and marketing analytics; computer hardware and downloadable and recorded computer software for capturing, storing, analyzing and managing data across multiple data platforms; downloadable and recorded computer software for big data analysis; Cloud computing featuring data platform software for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced data analytics, and data and marketing analytics; platform-as-a-service featuring computer software platforms for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced data analytics, and data and marketing analytics
42 - Scientific, technological and industrial services, research and design
Goods & Services
Cloud computing featuring data platform software for data warehousing, data management, data mining, data monitoring, data retention, advanced data analytics, and data and marketing analytics; platform-as-a-service featuring computer software platforms for data warehousing, data management, data mining, data monitoring, data optimization, data retention, advanced data analytics, and data and marketing analytics
In a database system, at least one metric associated with resources in a database system used by multiple classes of requests is monitored, where a first of the multiple classes is associated with a lower priority than a second of the multiple classes. A throttle limit is calculated for requests of the first class, based on the monitored metric. The calculated throttle limit is used to determine scheduling of the request of the first class for execution.
A data store system may include a storage device configured to store a plurality of data store tables. The data store system may further include a processor in communication with the storage device. The processor may receive a request to encode a column of a data store table from the plurality of data store tables. The processor may further generate a bit value representation of each value in the column of the data store table. The processor may further generate an index. The index may include an index value representative of each bit position of the bit value representations. The processor may further reorder bits of each bit value representation according to a predetermined pattern. The processor may further encode each reordered bit value representation according to an encoding technique. The processor may further store each encoded reordered bit value representation and the index. A method and computer-readable medium are also disclosed.
G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
H03M 7/46 - Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 16/21 - Design, administration or maintenance of databases
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
49.
Run time memory management for computing systems, including massively parallel database processing systems
Improved techniques for management of memory (or memory management) for computing systems and environments are disclosed. The improved techniques are especially well suited for computing systems that operate in highly complex and/or demanding computing environments (e.g., massively parallel database systems that may be required to process many complex database queries in parallel). Memory can be managed dynamically at run time to determine and designate one of multiple memories that are available for execution of executable components (e.g., database queries, Opcodes of a Virtual Machine). In addition, memory can be managed dynamically at run time to effectively reuse memory locations of a memory (e.g., stack memory) being used for execution of one or more executable components (e.g., Opcodes of a Virtual Machine) at run time when the memory is being actively used to execute the one or more executable components.
A database system receives a query to be processed. The database system has resources. A user assigns the query to a tier of resource allocation priorities in a hierarchy of tiers. The tier has been designated as being automatically managed by the database system. The tier has a plurality of levels of priority for resource allocation (LPRAs). The database system decomposes the query into a first step and a set of subsequent steps. The first step has a beginning and each of the set of subsequent steps has a respective beginning. The database system assigns the first step to a first LPRA, wherein executing the query at the first LPRA is projected by the database system to satisfy a service level goal (SLG) within an on_schedule_range of the SLG. The database system determines during execution of the set of subsequent steps that the query is no longer projected to satisfy the SLG within the on_schedule_range of the SLG and, as a result, assigns one of the set of subsequent steps to a second LPRA different from the first LPRA, wherein executing the query at the second LPRA is projected by the database system to return execution of the query to within the on_schedule_range of the SLG.
In some examples, a database system receives data relating to plural micro-models that apply respective analytics, and distributes a plurality of data segments of the received data across a plurality of processing engines based on values of a segmentation key included in the received data. The plurality of processing engines perform, in parallel, operations associated with the plural micro-models using respective data segments of the plurality of data segments, where different processing engines of the plurality of processing engines perform operations associated with respective micro-models of the plural micro-models.
A method and apparatus for optimizing a query in a relational database management system (RDBMS) when a predicate on a data column in the query has a correlation to a partitioning attribute of a partitioning column in data retrieved from a cloud-based store, wherein the optimizing uses the correlation between the data column in the query to the partitioning column in the data retrieved from the cloud-based store for data elimination when processing the query. The correlation is defined in a formula or lookup data structure that maps or range-maps from the data column to the partitioning column.
In some examples, a database system includes a plurality of processing engines to process data for database operations, and instructions executable on at least one processor to insert first data into first objects stored in a remote data store coupled to the database system over a network, and select, based on a size of the first data, a first partition level from a plurality of different partition levels to associate with the first objects. Different partition levels define different quantities of hash buckets that correspond to different distributions of objects across the plurality of processing engines. The first partition level is associated with the first objects.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
54.
Optimizing performance using a metadata index subtable for columnar storage
A method, apparatus, and computer program product for executing a relational database management system (RDBMS) in a computer system, wherein the RDBMS manages a relational database comprised of at least one column-partitioned base table storing data. Column values from at least one column of the column-partitioned base table are stored in one or more containers spread across one or more data blocks. Metadata comprising summarized information about the column values in the containers is stored in a metadata index subtable. A query with a filtering condition on the column is applied to the metadata index subtable before the column-partitioned base table is accessed, so that only qualified containers and data blocks are accessed, and unqualified containers and data blocks are eliminated, when responding to the query.
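The container-elimination idea in the abstract above can be illustrated with a minimal Python sketch. It assumes the summarized metadata is a per-container minimum and maximum column value, which the abstract does not specify; `ContainerMeta` and `qualifying_containers` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContainerMeta:
    """One row of a hypothetical metadata index: a container and its value range."""
    container_id: int
    min_value: int
    max_value: int


def qualifying_containers(metadata, low, high):
    """Return ids of containers whose [min, max] range overlaps [low, high].

    Containers that cannot hold a qualifying value are skipped without
    touching the base table.
    """
    return [m.container_id for m in metadata
            if m.max_value >= low and m.min_value <= high]


metadata = [
    ContainerMeta(0, 1, 100),
    ContainerMeta(1, 101, 200),
    ContainerMeta(2, 201, 300),
]
# Filter "column BETWEEN 150 AND 220": container 0 is eliminated.
hits = qualifying_containers(metadata, 150, 220)
```

Only the containers listed in `hits` would then be read from the column-partitioned base table.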
In some examples, a system receives function descriptors for different types of functions to be used when processing database queries, each function descriptor of the function descriptors comprising information relating to a respective function of the different types of functions. The system computes, based on a first function descriptor for a first function of the different types of functions, an estimate of a runtime metric associated with execution of the first function for processing a database query.
In some examples, in response to a join query to join a plurality of tables, a first processing engine retrieves tuples of a first table from a subset of objects of a data store, and adds content of the retrieved tuples to an in-memory table, where the objects are range partitioned across a plurality of processing engines based on respective ranges of values of at least one join attribute in the join query. The first processing engine retrieves, from the data store, tuples of a second table of the plurality of tables based on a range of values of the at least one join attribute in the retrieved tuples of the first table. The first processing engine performs an in-memory join of the plurality of tables based on the retrieved tuples of the second table and the in-memory table.
In some examples, the database system maintains metadata for a plurality of data objects, the metadata containing ranges of values of an attribute for the plurality of data objects, where the ranges of values of the attribute comprise a respective range of values of the attribute for each corresponding data object of the plurality of data objects. The database system generates a data structure tracking quantities of ranges of values of the attribute that have a specified relationship with respect to corresponding different values of the attribute. The database system receives a database query comprising a predicate specifying a condition on a given value of the attribute, and computes, for the database query, a selectivity of filtering based on the metadata, the selectivity computed based on the data structure.
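As a rough illustration of the abstract above, the following Python sketch estimates the fraction of objects whose value range could contain a given value. It assumes the tracking data structure is a pair of sorted lists (range minimums and range maximums), which is only one possible reading of the abstract; `RangeCounter` and its methods are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class RangeCounter:
    """Counts how many per-object [min, max] ranges contain a given value."""

    def __init__(self, ranges):
        self.total = len(ranges)
        self.mins = sorted(lo for lo, _ in ranges)
        self.maxs = sorted(hi for _, hi in ranges)

    def objects_containing(self, value):
        started = bisect_right(self.mins, value)  # ranges with min <= value
        ended = bisect_left(self.maxs, value)     # ranges with max < value
        return started - ended

    def selectivity_eq(self, value):
        """Fraction of objects that may hold rows matching attribute = value."""
        if self.total == 0:
            return 0.0
        return self.objects_containing(value) / self.total


ranges = [(1, 10), (5, 20), (15, 30), (25, 40)]
counter = RangeCounter(ranges)
sel = counter.selectivity_eq(7)  # ranges (1, 10) and (5, 20) contain 7
```

Each lookup is two binary searches over precomputed lists, so the estimate does not need to scan per-object metadata at query time.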
In some examples, a database system identifies a plurality of query portions in a database query that contain references to a first external table, the first external table being based on data from a remote data store coupled to the database system over a network. The database system creates a common spool portion that includes projections and selections of the plurality of query portions, and rewrites the plurality of query portions into rewritten query portions that refer to a spool containing an output of the common spool portion. For execution of the database query, the database system determines, as part of optimizer planning, whether to use the plurality of query portions or the common spool portion and the rewritten query portions.
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
A data store system includes a storage device and a processor in communication with the storage device. The processor may receive data from a source and generate a plurality of rows from the data. The processor may further apply row reduction criteria to the buffered plurality of rows. The processor may further, in response to application of the row reduction criteria, determine at least one resultant row. A number of the at least one resultant row is less than a number of the plurality of rows. The processor may further store the at least one resultant row in the storage device. A method and computer-readable medium are also disclosed.
In some examples, a database system includes a memory to store a predicate-object name cache, where the predicate-object name cache contains predicates mapped to respective object names. The database system further includes at least one processor to receive a query containing a given predicate, identify, based on accessing the predicate-object name cache, one or more object names indicated by the predicate-object name cache as being relevant for the given predicate, retrieve one or more objects identified by the one or more object names from a remote data store, and process the query with respect to data records of the one or more objects retrieved from the remote data store.
An apparatus, method and computer program product for estimating as-a-Service (aaS) query prices in a relational database management system (RDBMS). An optimizer of the RDBMS inserts an EXPLAIN modifier into a query, wherein the EXPLAIN modifier results in the optimizer generating a summary of a query execution plan for the query that includes one or more cost estimates for the RDBMS to perform the query. A price estimate for the query is then generated based on the cost estimates, wherein the price estimate is generated using one or more configurable pricing formulae. The price estimate is merged into the summary of the query execution plan for the query. Moreover, a price guarantee may be generated for the price estimate, wherein the price guarantee is honored when the query is subsequently invoked for execution by the RDBMS.
Hyperparameter tuning for a machine learning model is performed in a massively parallel database system. A computer system comprised of a plurality of compute units executes a relational database management system (RDBMS), wherein the RDBMS manages a relational database comprised of one or more tables storing data. One or more of the compute units perform the hyperparameter tuning for the machine learning model, wherein the hyperparameters are control parameters used in construction of the model, and the tuning of the hyperparameters is implemented as an operation in the RDBMS that accepts training and scoring data for the model, constructs the model using the hyperparameters and the training data, and generates goodness metrics for the model using the scoring data.
42 - Scientific, technological and industrial services, research and design
Goods & Services
Software as a service (SaaS) services, namely, software for use in managing, collecting, searching, accessing, navigating, storing, organizing and governing data; building data catalogs for others, namely, designing and developing computer systems for use in managing, collecting, searching, accessing, navigating, storing, organizing and governing data for others.
64.
Optimizing limit queries over analytical functions
A relational database management system (RDBMS) optimizes limit queries over analytical functions, wherein the limit queries include an output clause comprising a LIMIT, TOP and SAMPLE clause with an expression specifying a limit that is a number K or a percentage α %. The optimizations of the limit queries include: (1) static compile-time optimizations, and (2) dynamic run-time optimizations, based on semantic properties of “granularity” and “input-to-output cardinality” for the analytical functions.
Improved techniques for performing Matrix-Related operations (e.g., Matrix Multiplication, Matrix Transpose) in Relational Database systems are disclosed. Techniques provide Matrix Data Sets for performing Matrix-Related operations in Relational Databases more efficiently than conventional techniques. By way of example, Matrix Data can be partitioned such that data of each partition can be processed directly in a cache memory of a processor, thereby reducing the need for copying data as is conventionally done in Relational Databases. In addition, database queries involving Matrix-Related operations can be optimized for a Relational Database by providing Matrix Operations that can be directly used as declarative statements in a Database Query language (e.g., SQL). Furthermore, database query optimizers of a Relational Database can be further enhanced by allowing them to consider Matrix Algebra, as well as other opportunities in processing Matrix-Related operations, possibly in connection with one or more of the facets of the improved techniques.
A system may include at least one processor. The at least one processor may receive data from a plurality of independent data sources. The data from each respective data source is received at a rate determined by the respective data source. The at least one processor may further write the received data to at least one data store at a rate independent of the respective rates at which data from the plurality of independent data sources is received. A method and computer-readable medium are also disclosed.
A system may include a server and a data store system. The server may include at least one storage device and at least one processor. The server may execute an application and may store an encrypted password. The data store system may include at least one persistent storage device configured to store a data store. The data store system may further include a plurality of processing nodes configured to operate on the data store. The data store system may receive the encrypted password from the application with one of the plurality of processing nodes and may decrypt the encrypted password with the one of the plurality of processing nodes. The data store system may authenticate the decrypted password with the one of the processing nodes and provide the decrypted password to other processing nodes. Each processing node that has the decrypted password may be accessible to the application to operate on the data store. A method and computer-readable medium may also be implemented.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
H04L 29/06 - Communication control; Communication processing characterised by a protocol
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
A data store system includes an array of persistent storage devices configured to store a plurality of data store tables. The data store system includes a processor in communication with the storage devices. The processor may receive a query comprising an aggregate function and identify the structure of an argument of the aggregate function. A subset of the data store tables may be associated with the argument. The processor may partially execute the aggregate function on each data store table in the subset involved in the argument of the aggregate function to create partially-executed results for each data store table of the subset of data store tables. The processor may join the partially-executed results based on join conditions contained in the aggregate function. The processor may complete execution of the aggregate function on the partially-executed results to generate a final result of the aggregate function. A method and computer-readable medium are also disclosed.
A method may include receiving a stored procedure associated with data stored in a plurality of data stores. The stored procedure may include a plurality of executable statements. The method may further include identifying a first executable statement of the plurality of executable statements to be executed by the processor and a second executable statement of the plurality of executable statements that is executable by at least one of a plurality of other processors. The other processors each may have access to only a respective one of the plurality of copies of the data. The method may further include executing the first executable statement. A system and computer-readable medium may also be implemented.
42 - Scientific, technological and industrial services, research and design
Goods & Services
Software as a service (SaaS) services featuring software for use in managing, collecting, searching, accessing, navigating, storing, organizing and governing data not for use as an aid in memorization of flash cards; Software as a service (SaaS) services featuring software for building data catalogs for others not for use as an aid in memorization of flash cards
71.
Data reduction in multi-dimensional computing systems including information systems
Improved techniques for processing large-scale data and various large-scale data applications (e.g., large-scale Data Mining (DM), large-scale data analysis (LSDA)) in computing systems (e.g., Data Information Systems, Database Systems) are disclosed. Redundancy-reduced data (RRDS) can be provided as data that can be used more efficiently by various applications, especially, large-scale data applications. In doing so, at least one assumption about the distribution of a multi-dimensional data set (MDDS) and its corresponding set of responses (Y) can be made in order to reduce the multi-dimensional data set (MDDS). For example, a normal distribution (e.g., bell-shape, symmetric) can be assumed and Mutual information of the combination of a multi-dimensional set (X) and its corresponding responses (Y) can be optimized, for example, by using linear transformations, iterative numerical procedures, one or more constraints associated with the at least one assumption, and using one or more Lagrange multipliers to provide a constraint optimization function.
A relational database management system (RDBMS) accepts a workload comprised of one or more queries against a relational database. The RDBMS evolves a default cost profile into a plurality of cost profiles using fixed or dynamic evolution, wherein each of the cost profiles captures one or more cost parameters for the workload. The cost profiles are represented by a multi-dimensional matrix that has one or more dimensions, and each of the dimensions represents one of the cost parameters. The RDBMS dynamically determines which of the cost profiles is an optimal cost profile for the workload by mapping the cost profiles to the workload using a random walk scoring algorithm or a biased walk scoring algorithm that searches the multi-dimensional matrix to identify the optimal cost profile. The RDBMS selects and performs one or more query execution plans for the workload based on the optimal cost profile for the workload.
In some examples, a database management node updates object metadata with indicators of access frequencies of a plurality of objects in a data store that is remotely accessible by the database management node over a network. The database management node selects a subset of the plurality of objects based on the indicators, and caches the subset in the local storage.
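The selection step described above might look like the following Python sketch. It assumes the access-frequency indicator is a simple per-object access count and that the cache holds a fixed number of objects; both are assumptions, and `record_access` and `select_for_cache` are hypothetical helpers rather than part of any described system.

```python
import heapq


def record_access(metadata, name):
    """Update the object metadata with one more access to `name`."""
    metadata[name] = metadata.get(name, 0) + 1


def select_for_cache(metadata, capacity):
    """Pick the `capacity` most frequently accessed object names."""
    return heapq.nlargest(capacity, metadata, key=metadata.get)


metadata = {}
for name in ["a", "b", "a", "c", "a", "c"]:
    record_access(metadata, name)

# Objects "a" (3 accesses) and "c" (2 accesses) are chosen for local caching.
to_cache = select_for_cache(metadata, 2)
```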
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
A data store system may include an array of persistent storage devices configured to store a plurality of data store tables. The data store system may further include a processor in communication with the storage devices. The processor may receive a query containing a non-equality join condition on a first column from a first data store table and a second column from a second data store table. The processor may generate a bitmap based on the join condition. The bitmap may indicate respective matches between the first column and second column in accordance with the non-equality join condition. The bitmap may also be used each time the non-equality join condition is present in another received query. A method and computer-readable medium may also be implemented.
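A minimal Python sketch of the bitmap idea follows. It assumes the bitmap is a two-dimensional grid with one bit per pair of column values, which the abstract does not state; `build_join_bitmap` and `join_with_bitmap` are hypothetical names.

```python
import operator


def build_join_bitmap(left, right, op):
    """bitmap[i][j] is True when op(left[i], right[j]) holds."""
    return [[op(a, b) for b in right] for a in left]


def join_with_bitmap(left, right, bitmap):
    """Produce joined pairs by reading the bitmap instead of re-evaluating op."""
    return [(a, b)
            for i, a in enumerate(left)
            for j, b in enumerate(right)
            if bitmap[i][j]]


left = [1, 5, 9]
right = [4, 8]
# Non-equality join condition: left value < right value.
bitmap = build_join_bitmap(left, right, operator.lt)
pairs = join_with_bitmap(left, right, bitmap)
```

A later query with the same condition on the same columns could call `join_with_bitmap` again with the stored bitmap, skipping the comparison step.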
A method of operating a data store system may include identifying a non-responsive processing node from a plurality of processing nodes. The method may further include generating a new registration key in response to identifying the non-responsive processing node. The method may further include providing the new registration key to the other processing nodes of the plurality of processing nodes excluding the identified non-responsive node. Each processing node provided the new registration key may be authorized to access a plurality of storage devices of a storage array in communication with the plurality of processing nodes. A system and computer-readable medium may also be implemented.
In some examples, a database system includes a storage medium to store a materialized view (MV) that includes data satisfying an MV condition. At least one processor is to receive a query including a query condition, determine that the query condition partially matches the MV condition, and access a part of the data in the MV to partially satisfy the query.
A computer running a database system receives one or more queries, each query comprised of parallel threads of execution working towards the common goal of completing a user request. These threads are grouped into a schedulable object called a task group. The task groups are placed within a specific multiple tier hierarchy, and database system resources and service level goals (SLGs) are allocated to the task groups according to their placement within the hierarchy. The execution of requests/tasks is monitored, and resource allocations are temporarily increased for critical requests that are unlikely to meet execution goals (SLGs).
A database query to be run against a database is received by a processor. The query includes a query predicate. The query predicate includes a condition. The condition applies to a single database table. The condition is parsed to create an input vector. The input vector is submitted to a neural network. The neural network is trained to calculate the selectivity, a number of unique values (NUV) of results of applying predicates to the single database table, and a high mode frequency (HMF) of results of applying predicates to the single database table. The neural network determines the selectivity of the query predicate, an NUV for each column in the result of applying the query predicate to the single database table, and an HMF for each column in the result of applying the query predicate to the single database table.
An apparatus, method and computer program product for query optimization in a Relational Database Management System (RDBMS), wherein an optimizer accesses a query expression repository (QER), so that the optimizer learns from previous versions of the queries to improve current and subsequent versions of the queries. The QER stores planning and execution information for query expressions (QEs) from the previous versions of the queries, wherein the QEs comprise table relations, intermediate results and/or final results of operations in the previous versions of the queries. The optimizer searches the QER for QEs from the query execution plans, and uses information from the QEs stored in the QER when optimizing the current and subsequent versions of the queries. The optimizer may also reuse results from the QEs stored in the QER.
In some examples, a system stores data in a logically disconnected data store. In response to a query for data in the data store, the system accesses metadata of objects stored in the data store, the metadata including information of a respective range of values of at least one clustering attribute in data contained in each respective object of the objects. The system partitions the objects across the plurality of processing engines based on the information of the respective ranges of values of the at least one clustering attribute in the data contained in the objects. The system assigns, based on the partitioning, the objects to respective processing engines of the plurality of processing engines.
In some examples, a system learns properties of an analytical function based on information of queries invoking the analytical function that have been previously executed, creates a function descriptor for the analytical function based on the learning, and provides the function descriptor for use by an optimizer in generating an execution plan for a received database query that includes the analytical function.
An apparatus, method and computer program product for physical database design and tuning in relational database management systems. A relational database management system executes in a computer system, wherein the relational database management system manages a relational database comprised of one or more tables storing data. A Deep Reinforcement Learning based feedback loop process also executes in the computer system for recommending one or more tuning actions for the physical database design and tuning of the relational database management system, wherein the Deep Reinforcement Learning based feedback loop process uses a neural network framework to select the tuning actions based on one or more query workloads performed by the relational database management system.
A data request that references an external data environment object (foreign object) is identified. A Data Manipulation Language (DML) statement for accessing the object is traversed in a defined order to identify foreign servers having the foreign object. Connections are attempted to foreign servers in the defined order and a selection of one of the foreign servers is made based on server and/or data conditions. The selected server is used for the request to process the portion of the request that includes the foreign object. In an embodiment, during execution of the data request, the server and/or the data conditions can be dynamically overridden to change selection criteria for the selected server.
In some examples, a system sends a transaction to a database server to cause storing of data of the transaction in a cache of the database server, where the data in the cache is for inclusion in a backup of data from the database server to a remote data store (e.g., the backup may be in a cloud and may be a snapshot). The system detects a failure associated with the database server, and in response to detecting the failure, requests, from the database server or a replacement database server, transaction information of at least one transaction that was successfully applied to the remote data store, the transaction information based on the backup of data. The system causes replay of one or more transactions to recover data at the database server or the replacement database server, to perform recovery of the database server or the replacement database server to a current state.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
Improved techniques for performing Spatial Joins in multi-processing computing systems and environments are disclosed. One or more intersections of bounds (or limits) of data sets are determined as a join bounding space. The join bounding space is in a space (Global space or Global universe) where a spatial join between (or for) the data can be performed. The determined join bounding space can be partitioned into sub-partitions of the join bounding space. The sub-partitions of the join bounding space can be assigned respectively to multiple processing units for processing in parallel. In addition, distribution cost information associated with the cost of distribution of the datasets (and/or their components) to the processing units of a multi-processing system can be provided and/or used to effectively distribute and/or redistribute processing of the Spatial Join between the processing units of a multi-processing system.
Database values and their associated indicators can be arranged in multiple “buckets.” Adjacent buckets can be combined into a single bucket successively based on one or more criteria associated with the indicators to effectively reduce the number of buckets until a desired number is reached.
A determination is made that a database system is resource bound resulting in a resource bound condition. Signals for the resources being bound in the database system are identified. Events associated with the signals are extracted. Events are correlated temporally to identify a time interval for which an arrival rate meter (ARM) is helpful. Database system segments are selected that affect key performance indicators associated with the identified time interval. Parameters for the selected database system segments to be deferred by the database system are estimated. The estimated parameters are incorporated into an arrival rate meter (ARM). The ARM is put into effect.
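The bounding-space idea above can be illustrated with a minimal sketch. It assumes two-dimensional data, axis-aligned bounding boxes written as (xmin, ymin, xmax, ymax), and equal-width vertical strips as the sub-partitions. The function names are hypothetical and are not taken from the disclosure, which does not specify a partitioning scheme.

```python
def join_bounding_space(a, b):
    """Intersect two bounding boxes (xmin, ymin, xmax, ymax).

    Returns the overlap, or None when the boxes are disjoint
    (in which case the spatial join has no candidate pairs).
    """
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def partition_space(box, units):
    """Split a box into `units` equal-width vertical strips (assumed scheme),
    one strip per processing unit."""
    xmin, ymin, xmax, ymax = box
    width = (xmax - xmin) / units
    return [
        (xmin + i * width, ymin, xmin + (i + 1) * width, ymax)
        for i in range(units)
    ]


space = join_bounding_space((0, 0, 10, 10), (4, 2, 20, 8))
strips = partition_space(space, 3)
```

Each strip could then be handed to a separate processing unit. A cost-aware variant would size the strips by data density rather than by width.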
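As a rough illustration of successive bucket merging, the sketch below assumes each bucket is a (low, high, count) tuple, with the count standing in for the indicator. It also assumes the merge criterion is "the adjacent pair with the smallest combined count". The abstract names neither the indicator nor the criterion, so both are illustrative choices, as is the function name.

```python
def merge_buckets(buckets, target):
    """Merge adjacent buckets until only `target` buckets remain.

    buckets: list of (low, high, count) tuples sorted by `low`.
    Criterion (an assumption): merge the adjacent pair whose
    combined count is smallest.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    merged = list(buckets)
    while len(merged) > target:
        # Index of the left bucket in the cheapest adjacent pair.
        i = min(
            range(len(merged) - 1),
            key=lambda k: merged[k][2] + merged[k + 1][2],
        )
        low, _, left_count = merged[i]
        _, high, right_count = merged[i + 1]
        merged[i:i + 2] = [(low, high, left_count + right_count)]
    return merged


result = merge_buckets([(0, 9, 5), (10, 19, 1), (20, 29, 2), (30, 39, 8)], 2)
```

With this input, the two lightest neighbours are combined first, and the result is two buckets of equal count.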
A system may include a storage device configured to store a plurality of database tables. The system may further include a processor in communication with the storage device. The processor may receive a request to transmit a database table from the plurality of database tables. The database table may have a plurality of rows. The processor may determine if contents of each column row of each row of the database table are eligible to be compressed. For each column row that contains eligible contents, the processor may generate compressed data representative of the contents of a respective column row. The processor may remove the contents of the respective column row from the associated row. The processor may transmit the compressed data and the database table without content of the column rows represented by the compressed data. A method and computer-readable medium may also be implemented.
As an abstract representation, a set of equivalent logical structures representative of multiple execution plans for execution of a database query can be used to optimize a database query. A logical structure can include one or more logical operators each representing multiple physical operators for executing the database query. Group and Operator Rules can be applied as rules to the set of equivalent logical structures to obtain additional equivalent logical structures and logical operators until no additional logical operators can be obtained. A set of possible implementation plans for the total number of the obtained logical operators can be obtained, for example, based on physical and/or implementation context. An optimization request can be effectively propagated through an implementation plan in a top-down manner, for example, recursively for each child of physical operators, where only new contexts are optimized, in order to generate an optimized structure, for example, in consideration of implementation details, costs, physical properties, etc. One of the optimized structures can be selected as an optimal plan.
A first query execution plan generated for a query on a second time the query was processed by a database is compared against a dynamically generated second query plan generated based on statistics only dynamic feedback for the second time the query is processed by the database. A determination is made on the second time as to whether to cache the first query execution plan, the second query execution plan, or no plan for third or more times the query is processed by the database. The query can be non-parameterized or parameterized.
A machine-learning driven Database Management System (DBMS) is provided. One or more machine-learning algorithms are trained on the database constructs and execution plans produced by a database optimizer for queries. The trained machine-learning algorithms provide predictors when supplied the constructs and plans for a given query. The predictors are processed by the DBMS to make resource, scheduling, and Service Level Agreement (SLA) compliance decisions with respect to the given query.
A data engine request is received on a local data system. The data engine request includes a portion of the request that is to be processed on an external data engine system. The portion is forwarded to the external data engine system and statistics for accessing external objects of the external data engine system are acquired. The statistics are evaluated for compliance with a Service Level Goal (SLG) associated with the request. Rules-based processing permits optimization and planning of the request on the local data engine system to be modified in view of the statistics received from the external data engine system to comply with the SLG. In an embodiment, actual resource utilization metrics noted during execution of the portion on the external data engine system are provided as feedback to the local data engine system for re-planning and re-optimizing the request with a modified execution plan.
A multi-staged sample and seed machine-learning training technique is presented. A sample proportion of a training data set is fed to a machine-learning algorithm (MLA) for purposes of configuring functions of the MLA to predict an output with a desired degree of accuracy. When iterating the sample proportion, if a deviation in an incrementally produced current accuracy of the MLA does not exceed a threshold, the sampled proportion is increased. This continues until the current degree of accuracy meets or exceeds the desired degree of accuracy, which is an indication that the functions of the MLA are configured as a desired model for producing the predicted output when the MLA is presented with input that may or may not have been associated with the training data set.
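The staged sampling loop can be sketched as follows, under several assumptions. `train_on` is a hypothetical callback that trains on the given proportion of the training data and returns the resulting accuracy. The starting proportion, step size, and round limit are illustrative parameters that do not appear in the abstract.

```python
def staged_training(train_on, desired, threshold,
                    start=0.1, step=0.1, max_rounds=100):
    """Grow the sampled proportion until the desired accuracy is reached.

    train_on(proportion) -> accuracy is a hypothetical callback.
    The proportion is increased only when accuracy has stopped moving
    by more than `threshold` between consecutive rounds.
    """
    proportion = start
    previous = None
    accuracy = None
    for _ in range(max_rounds):
        accuracy = train_on(proportion)
        if accuracy >= desired:
            break  # the model is accurate enough at this sample size
        if previous is not None and abs(accuracy - previous) <= threshold:
            # Accuracy has plateaued: feed the algorithm more data.
            proportion = min(1.0, proportion + step)
        previous = accuracy
    return proportion, accuracy


# Toy stand-in where accuracy simply equals the sampled proportion.
final_proportion, final_accuracy = staged_training(
    lambda p: p, desired=0.45, threshold=0.05
)
```

The toy callback only exercises the control flow. A real callback would wrap an actual training run and a held-out evaluation.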
Techniques for transitioning between code-based and data-based execution forms (or models) are disclosed. The techniques can be used to improve the performance of computing systems by allowing the execution to transition from one of the execution models to another one of the execution models that may be more suitable for carrying out the execution or effective processing of information in a computing system or environment. The techniques also allow switching back to the previous execution model when that previous model is more suitable than the execution model currently being used. In other words, the techniques allow transitioning (or switching) back and forth between data-based and code-based execution (or information processing) models.
A soft correlation of a database can be adjusted (e.g., modified, replaced, overwritten) for use with respect to one or more records of the database associated with the soft correlation, by considering at least one or more violations of the soft correlation in the one or more database records associated with the soft correlation. In addition, an adjusted soft correlation can be stored and used for optimizations of database queries pertaining to one or more records associated with the adjusted soft correlation. Typically, the adjusted soft correlation is adjusted by at least considering the violations of an original soft correlation in the one or more records relating to the database queries.
A query is preprocessed for features identified by a Data Manipulation Language (DML) in the text of the query. A machine-learning algorithm uses the features as input and provides as output a predicted query parsing execution time needed by a query parser to parse the query. The predicted query parsing time is provided as input to a query optimizer. The query optimizer uses the predicted query parsing time as a factor in optimizing a query execution plan for the query. Subsequently, the query execution plan is executed against a database as the query.
Execution of a query invoking an analytical function (AF) is optimized. The query includes a join operation between an AF table and an AuxiliaryTable. A determination is made that the AF includes a plurality of AF properties. Query-level properties about the query are inferred. A determination is made to change an order of the join operation from the plurality of AF properties and query-level properties.
A database system receives a request from a user. The request invokes a data set function (DSF) and uses a property to be provided by the DSF. The database system determines that a function descriptor is available for the DSF. The function descriptor is expressed as markup language instructions. The function descriptor defines the property of the DSF. The database system uses the function descriptor to define a property for the DSF.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/21 - Design, administration or maintenance of databases
99.
METHODS AND TECHNIQUES FOR DEEP LEARNING AT SCALE OVER VERY LARGE DISTRIBUTED DATASETS
An apparatus, method and computer program product for neural network training over very large distributed datasets, wherein a relational database management system (RDBMS) is executed in a computer system comprised of a plurality of compute units, and the RDBMS manages a relational database comprised of one or more tables storing data. One or more local neural network models are trained in the compute units using the data stored locally on the compute units. At least one global neural network model is generated in the compute units by aggregating the local neural network models after the local neural network models are trained.
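One simple way to picture the aggregation step is a weighted average of the local model parameters, sketched below. The abstract does not state how the local models are combined. Averaging weighted by local row count is an assumption here, and the function name and flat-list model representation are hypothetical.

```python
def aggregate_models(local_models):
    """Combine local models into one global model by weighted averaging.

    local_models: list of (weights, num_rows) pairs, where `weights`
    is a flat list of floats of the same length for every compute unit.
    Weighting by num_rows is an assumed aggregation rule.
    """
    total_rows = sum(rows for _, rows in local_models)
    size = len(local_models[0][0])
    return [
        sum(weights[i] * rows for weights, rows in local_models) / total_rows
        for i in range(size)
    ]


# Two compute units: one trained on 1 row, the other on 3 rows.
global_weights = aggregate_models([([1.0, 2.0], 1), ([3.0, 6.0], 3)])
```

Only the trained parameters and row counts move between compute units in this sketch, while the training data itself stays local.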
Selecting a join plan for a query containing a join and a union block includes determining whether to propose a join plan with the join pushed across the union block. A selection is made between a join plan in which the join is not pushed across the union block and any proposed join plan in which the join is pushed across the union block.