A method of managing network-attachable computing entities, comprising: training a machine-learning model to detect a bottleneck process segment in a process flow performed by a network-attachable computing entity; deploying the trained model to monitor a network-attachable computing entity in operation; responsive to detecting an instance of the bottleneck process segment, analyzing the instance to determine a cause of the bottleneck; responsive to determining the cause of the bottleneck, generating an augmented functional unit to address the cause of the bottleneck; and deploying the augmented functional unit to at least one of the network-attachable computing entities that has an instance of a process comprising the bottleneck process segment.
Apparatuses, methods, and non-transitory computer-readable media are disclosed. One example concerns a reduce interpolation channel-wise instruction to trigger a reduce interpolation channel-wise operation. The reduce interpolation channel-wise operation comprises: selecting a pair of vectors from a range of source vectors in dependence on a first portion of each element of an interpolation vector; a weighted addition of that element from the pair of vectors, wherein a weighting of the weighted addition is dependent on a second portion of that element of the interpolation vector; and storing a result of the weighted addition in that element of a destination vector. Another example concerns a 2D selection instruction to trigger a 2D selection operation comprising, for each vector element: selecting a selected vector from a range of source vectors, wherein the selected vector is selected in dependence on an element value of that element of an index vector; and copying that element from the selected vector of the range of source vectors to that element of a destination vector.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
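The two operations described in the abstract above can be illustrated with a minimal Python sketch. It assumes that the upper bits of each interpolation element (the first portion) pick the lower vector of an adjacent pair, and that the lower `FRAC_BITS` bits (the second portion) give the blend weight. The bit split, the adjacent-pair choice, and all function names are illustrative assumptions; the abstract does not specify them.

```python
FRAC_BITS = 4  # assumed width of the "second portion" (the weight bits)


def reduce_interp_channelwise(sources, interp):
    """Blend, per element position, that element from a pair of source vectors."""
    dest = []
    for lane, code in enumerate(interp):
        index = code >> FRAC_BITS               # first portion: selects the pair
        frac = code & ((1 << FRAC_BITS) - 1)    # second portion: weighting
        lo, hi = sources[index], sources[index + 1]  # assumed: adjacent vectors
        weight = frac / (1 << FRAC_BITS)
        dest.append((1 - weight) * lo[lane] + weight * hi[lane])
    return dest


def select_2d(sources, index_vec):
    """Copy, per element position, that element from the vector named by index_vec."""
    return [sources[idx][lane] for lane, idx in enumerate(index_vec)]


sources = [[0, 0, 0], [16, 32, 48], [32, 64, 96]]
blended = reduce_interp_channelwise(sources, [0x08, 0x10, 0x14])
picked = select_2d(sources, [2, 0, 1])
```

Each lane makes its own selection, which is what distinguishes these operations from a whole-vector blend or copy.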
A tile-based graphics processor performs first and second processing passes to generate a render output. The first processing pass generates and writes out information representative of a set of bounding boxes, and the second processing pass uses the bounding box information to determine which primitives to process for which rendering tiles.
A graphics processor and method of operating a graphics processor to perform rendering followed by post-processing, in which post-processing tasks are permitted to be issued to processing cores of the graphics processor without waiting for all of the rendering tasks to have completed their processing, such that post-processing tasks are processed concurrently with rendering tasks.
The present disclosure relates to a method of operating a graphics processor to process a frame formed of a plurality of regions, the graphics processor comprising at least one execution unit with an associated storage element, the at least one execution unit being operable to process the plurality of regions according to a shading rate for each region to generate a processing output to the associated storage element, the method comprising: obtaining the shading rate for one or more of the plurality of regions; determining a respective processing output size for each of the one or more regions based on the shading rate for the one or more regions; forming the one or more regions into at least one variable processing unit based on the respective processing output size for each of the one or more regions and a capacity of the associated storage element; and assigning the at least one variable processing unit as a processing task to the at least one execution unit.
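As a rough illustration of the grouping step, the sketch below packs consecutive regions into variable processing units so that the combined output of each unit fits the storage capacity. The size model (one shaded sample per `w * h` pixels, four bytes per sample), the greedy in-order packing, and every name are assumptions made for the example, not details taken from the disclosure.

```python
def output_size(region_pixels, shading_rate, bytes_per_sample=4):
    """Assumed size model: one shaded sample covers w * h pixels."""
    w, h = shading_rate
    return (region_pixels // (w * h)) * bytes_per_sample


def form_units(rates, region_pixels, capacity):
    """Greedily group consecutive regions while their outputs fit in `capacity`."""
    units, current, used = [], [], 0
    for region_id, rate in enumerate(rates):
        size = output_size(region_pixels, rate)
        if current and used + size > capacity:
            units.append(current)          # close the unit that is now full
            current, used = [], 0
        current.append(region_id)
        used += size
    if current:
        units.append(current)
    return units


# 16x16-pixel regions, 1 KiB of storage per execution unit (both assumed).
rates = [(1, 1), (2, 2), (2, 2), (4, 4), (4, 4), (1, 1)]
units = form_units(rates, region_pixels=256, capacity=1024)
```

Coarsely shaded regions produce less output, so more of them fit into a single unit, while a full-rate region occupies a unit on its own.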
Resource selection circuitry selects one or more selected resources from among a set of resources, based on availability information indicating whether each resource is available or unavailable for selection. The resource selection circuitry comprises unavailable resource counting circuitry to generate count values indicative of a number of unavailable resources indicated by respective portions of the availability information; shift circuitry to perform, depending on the count values, a plurality of shift stages on a resource identifier vector comprising a plurality of resource identifier elements each for representing a resource identifier of a corresponding one of the resources, to compact the resource identifier elements corresponding to available resources into a contiguous portion of the resource identifier vector; and selection circuitry to select, as the one or more selected resources, one or more resources corresponding to resource identifier elements indicated in said contiguous portion of the resource identifier vector.
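The count-then-shift compaction can be modelled in software as follows. This is a sketch under assumptions: the count for each element is taken to be the number of unavailable resources at lower positions, and stage `k` moves an element by `2**k` positions when bit `k` of its count is set. The function names are hypothetical, and the actual circuit may partition the counts differently.

```python
def compact_available(available):
    """Return the identifiers of available resources, compacted to the front."""
    n = len(available)
    # Count, for each position, the unavailable resources below it.
    shift, unavailable_so_far = [], 0
    for flag in available:
        shift.append(unavailable_so_far)
        if not flag:
            unavailable_so_far += 1
    # Resource identifier vector: (identifier, remaining shift) or None.
    vec = [(i, shift[i]) if available[i] else None for i in range(n)]
    stage = 0
    while (1 << stage) < n:
        step = 1 << stage
        nxt = [None] * n
        for pos, entry in enumerate(vec):
            if entry is None:
                continue
            rid, remaining = entry
            target = pos - step if remaining & step else pos
            nxt[target] = (rid, remaining & ~step)
        vec = nxt
        stage += 1
    return [entry[0] for entry in vec[: n - unavailable_so_far]]


def select_resources(available, how_many):
    """Pick the first `how_many` identifiers from the contiguous portion."""
    return compact_available(available)[:how_many]


selected = select_resources([1, 0, 1, 1, 0, 0, 1, 0], 3)
```

Processing the least significant bit of the count first keeps available elements from colliding, because the gap between two available elements is always at least one more than the difference of their remaining shifts.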
In response to instruction decoding circuitry decoding a conditional write instruction, processing circuitry determines whether a predetermined condition is satisfied for a target cache line corresponding to a target address specified by the conditional write instruction. If the predetermined condition is satisfied for the target cache line, a write request is issued to update the target cache line. If the predetermined condition is not satisfied for the target cache line, a failure indication is returned. The processing circuitry selects, depending on whether the sequence of instructions specifies cache-line-retention hint information applicable to the conditional write instruction, whether to prevent a unique coherency state of the target cache line being relinquished by a local cache associated with the processing circuitry for a retention period following processing of the conditional write instruction. The unique coherency state comprises a coherency state in which the processing circuitry has exclusive right to update the target cache line.
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
9.
APPARATUS, SYSTEM, CHIP-CONTAINING PRODUCT AND NON-TRANSITORY COMPUTER-READABLE MEDIUM
Execution circuitry executes, in response to an instruction referencing a given source register of a register file, a data processing operation on pre-processed operand data obtained after a pre-processing action has been performed using stored operand data from the given source register of the register file. A buffer separate from the register file stores pre-processed operand data corresponding to a subset of registers. Register reuse detection circuitry detects a register reuse opportunity for a subsequent instruction referencing a reused source register also referenced by a previous instruction for which pre-processed operand data corresponding to the reused source register was written to the buffer, when it is guaranteed that no intervening instruction will cause a write to the reused source register. In response to detecting the register reuse opportunity, the data processing operation for the subsequent instruction can be performed using the pre-processed operand data stored in the buffer corresponding to the reused source register, and the pre-processing action is suppressed in relation to stored operand data from the reused source register.
There is provided an apparatus that includes storage circuitry for storing one or more addresses and one or more counters each associated with one of the one or more addresses. Receive circuitry receives a request comprising an address. In response to receiving the request, one of the one or more counters in the storage circuitry that is associated with the address that is in the request is incremented probabilistically.
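A small software analogue of the probabilistic increment is sketched below. The class name, the fixed increment probability, and the injectable random number generator are assumptions for illustration; the abstract does not say how the probability is chosen.

```python
import random


class ProbabilisticCounters:
    """Per-address counters that are incremented only with a given probability."""

    def __init__(self, probability, rng=None):
        self.probability = probability
        self.rng = rng or random.Random()
        self.counters = {}  # address -> counter value

    def track(self, address):
        """Start tracking an address with a zeroed counter."""
        self.counters.setdefault(address, 0)

    def on_request(self, address):
        """Handle a request: maybe increment the counter for its address."""
        if address in self.counters and self.rng.random() < self.probability:
            self.counters[address] += 1


always = ProbabilisticCounters(probability=1.0)
always.track(0x1000)
for _ in range(5):
    always.on_request(0x1000)
```

With a probability below one, a counter advances more slowly than the request stream, so a narrow counter can summarize many requests at the cost of exactness.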
There is provided an apparatus, a system, a chip-containing product, a method, and a medium. The apparatus comprises: a plurality of registers comprising at least one array register having a plurality of array regions. The apparatus comprises processing circuitry to receive issued instructions and to process those instructions. The apparatus is also provided with control circuitry responsive to receipt of an instruction requiring access to two or more array regions: to decompose the instruction into two or more execution parts, each corresponding to one of the two or more array regions; and for each execution part, to delay issuing the execution part until it is predicted that the execution part can be processed hazard free. The control circuitry is capable of issuing the two or more execution parts in different cycles based on when it is predicted that each execution part can be processed hazard free.
Issue group allocation circuitry controls allocation of each micro-operation to one of a plurality of issue groups, depending on detection of register conflicts between micro-operations, the register conflicts concerning access to registers of a first register set. A given micro-operation is allocated to a selected issue group for which no micro-operation already allocated to the selected issue group has a register conflict with the given micro-operation and the selected issue group is a younger issue group than any issue group already allocated an older micro-operation than the given micro-operation for which a register conflict is detected between the given micro-operation and the older micro-operation. Issue circuitry controls issue of the micro-operations based on the issue groups, to prevent any instruction in a given issue group being issued until all micro-operations in any older issue group than the given issue group have been issued.
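The allocation rule can be sketched as below, with micro-operations given oldest first and group 0 being the oldest issue group. Treating any read-after-write, write-after-read, or write-after-write overlap as a register conflict, and always choosing the oldest group that satisfies the rule, are assumptions of this sketch.

```python
def conflicts(older, younger):
    """Assumed conflict test: RAW, WAW, or WAR overlap on any register."""
    return bool(
        older["writes"] & (younger["reads"] | younger["writes"])
        or younger["writes"] & older["reads"]
    )


def allocate_issue_groups(uops):
    """Return the issue group index for each micro-operation (oldest first)."""
    groups = []
    for i, uop in enumerate(uops):
        newest_conflicting = -1
        for j in range(i):
            if conflicts(uops[j], uop):
                newest_conflicting = max(newest_conflicting, groups[j])
        # Must be younger than every group holding an older conflicting uop.
        groups.append(newest_conflicting + 1)
    return groups


uops = [
    {"reads": set(), "writes": {"r1"}},
    {"reads": {"r1"}, "writes": {"r2"}},
    {"reads": set(), "writes": {"r3"}},
    {"reads": {"r2", "r3"}, "writes": {"r4"}},
]
groups = allocate_issue_groups(uops)
```

Because every micro-operation already allocated is older than the one being placed, choosing a group younger than all conflicting groups also guarantees that the chosen group contains no conflicting micro-operation.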
An apparatus comprises a predictor to make a prediction based on a plurality of prediction tables. The plurality of prediction tables are looked up using table lookup information generated based on different lengths of input history information representing a path through program execution. The apparatus comprises circuitry to prevent a given portion of the input history information from differing with respect to a corresponding portion of the input history information used to make a preceding prediction, where the given portion is a portion which is not used to generate table lookup information for an active subset of the prediction tables.
According to one implementation, a circuit includes a first digital gate (108A) and a timing offset circuit portion (238) coupled to the first digital gate (108A) that includes one or more tri-state inverters (202A-202N), where a capacitance at an output of the first digital gate (108A) is based on a quantity of enabled tri-state inverters of the one or more tri-state inverters (202A-202N).
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
H03K 19/094 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using semiconductor devices using field-effect transistors
H03K 19/0948 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using semiconductor devices using field-effect transistors using MOSFET using CMOS
16.
Systems, Methods, and Devices of Droop Detector Circuitry
According to one implementation, a circuit includes a droop detection element (112) including voltage sensitive gates (114A-N) and at least two delay elements (118A, 118B) where each delay element of the at least two delay elements is a non-inverting gate or a non-inverting gate combination. The at least two delay elements (118A, 118B) are configured to provide a delay to the droop detection element (112), where at least a first delay element (118A) is a first voltage-threshold (VT)-type, at least a second delay element (118B) is a second voltage-threshold (VT)-type, and the first and the second VT-types are different.
H03K 5/133 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals using a chain of active-delay devices
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
An image processing method processes color image data. The method applies an operator to a representative pixel value to generate a transformed pixel value. A controlled gain is determined so that: if applying a gain based on the transformed pixel value to a maximum color channel value of the pixel will generate a color channel value that is below a threshold value, the controlled gain is determined based on the transformed pixel value and the representative pixel value, and if applying a gain based on the transformed pixel value to the maximum color channel value will generate a color channel value that is above the threshold value, the controlled gain is determined such that the maximum color channel value is mapped to a value representable by a predetermined number of bits. The controlled gain is applied to each of the color channel values.
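The two-case gain rule can be written out as a short sketch. Several details here are assumptions rather than statements from the abstract: the representative value is taken as the channel mean, the uncontrolled gain is the ratio of transformed to representative value, the threshold defaults to the largest code representable in `bits` bits, and the clamped case maps the maximum channel exactly to that code.

```python
def controlled_gain(pixel, operator, bits=8, threshold=None):
    """Return a gain that never pushes the largest channel past the code range."""
    max_code = (1 << bits) - 1
    if threshold is None:
        threshold = max_code                      # assumed default threshold
    representative = max(sum(pixel) / len(pixel), 1e-6)  # assumed: channel mean
    transformed = operator(representative)
    gain = transformed / representative           # gain implied by the operator
    peak = max(pixel)
    if gain * peak > threshold:
        gain = max_code / peak                    # map the peak to max_code
    return gain


def apply_gain(pixel, gain):
    """Apply the same controlled gain to every color channel."""
    return [channel * gain for channel in pixel]


def double(value):
    return 2 * value                              # stand-in tone operator


dim_gain = controlled_gain([100, 50, 0], double)
bright_gain = controlled_gain([200, 40, 0], double)
```

Scaling all channels by one gain preserves the channel ratios, so limiting the gain avoids the hue shift that clipping a single channel would cause.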
A method to debug instruction execution errors in a simulated computer system is provided. The method includes generating two separate simulations of the same system and causing code including a set of instructions to execute on the two separate simulations. The method further includes performing an efficient trace operation from a start instruction to an end instruction of the set of instructions on the two separate simulations. When the trace operation is performed, an instruction execution deviation is identified between the code executed in the two separate simulations by comparing checksum values at a reporting frequency, determining that the comparison of the checksum values indicates a mismatch, and using an instruction count and the reporting frequency to capture at least one instruction leading up to the instruction execution deviation.
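A toy model of the checksum comparison is sketched below. Each simulation is represented as a list of per-instruction trace records, a running CRC-32 is reported every `freq` instructions, and the first mismatching report bounds the window that contains the deviation. The record format, the use of CRC-32, and the function names are assumptions for illustration.

```python
import zlib


def checksum_reports(trace, freq):
    """Return (instruction count, running CRC-32) every `freq` instructions."""
    reports, crc = [], 0
    for count, record in enumerate(trace, start=1):
        crc = zlib.crc32(record.encode(), crc)
        if count % freq == 0:
            reports.append((count, crc))
    return reports


def locate_deviation(trace_a, trace_b, freq):
    """Find the first reporting window whose checksums differ, if any."""
    pairs = zip(checksum_reports(trace_a, freq), checksum_reports(trace_b, freq))
    for (count, crc_a), (_, crc_b) in pairs:
        if crc_a != crc_b:
            start = count - freq   # instruction count at the last matching report
            return start, count, trace_a[start:count]
    return None


sim_a = ["add r1", "mov r2", "ld r3", "st r4", "sub r5", "mul r6", "b loop", "nop"]
sim_b = list(sim_a)
sim_b[5] = "mul r7"                # injected deviation at the sixth instruction
window = locate_deviation(sim_a, sim_b, freq=4)
```

Only one checksum per window has to be compared, and the instructions inside the mismatching window can then be re-run with full tracing.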
A method and a system for dynamically deriving and verifying a measure of a computing environment are presented. The proposed method and system are used to reliably verify measurements of the computing environment. The method includes receiving a dataset recorded by an untrusted source describing elements used to create a computing system operating in a computing environment, receiving attestation evidence generated by a trusted source including an initial measurement value describing the elements of the computing system, deriving a measurement value based on the received dataset, and performing a verification process on a measurement of the computing environment. The verification process is performed by comparing the derived measurement value with the measurement value of the attestation evidence. In response to the derived measurement value being equal to the measurement value of the attestation evidence, the computing environment is determined to be trustworthy.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
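The derive-then-compare step of the measurement verification described above can be sketched with the Python standard library. The hash-extend scheme (each element is hashed and folded into a running SHA-256 value) is an assumption chosen for the example; the abstract does not state how the measurement is derived, and checking the signature on the attestation evidence is omitted here.

```python
import hashlib
import hmac


def derive_measurement(elements):
    """Fold each recorded element into a running SHA-256 measurement (assumed scheme)."""
    measurement = bytes(32)
    for element in elements:
        element_digest = hashlib.sha256(element).digest()
        measurement = hashlib.sha256(measurement + element_digest).digest()
    return measurement


def verify(dataset, attested_measurement):
    """Trust the environment only if the derived and attested values match."""
    derived = derive_measurement(dataset)
    return hmac.compare_digest(derived, attested_measurement)


recorded = [b"bootloader", b"kernel", b"initrd"]
attested = derive_measurement(recorded)   # stands in for the trusted source's value
trusted = verify(recorded, attested)
tampered = verify([b"bootloader", b"kernel-modified", b"initrd"], attested)
```

The dataset from the untrusted source is never trusted directly. It only explains the attested value, and any altered element changes the derived measurement.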
According to one implementation, a pulse generator circuit includes a flip-flop receiving an input clock signal, and one or more delay elements where the circuit is configured to adjust a pulse width of an output clock signal independent of a clock period of the input clock signal. According to another implementation, a pulse generator circuit includes a flip-flop receiving an input clock signal, and one or more delay elements where the circuit is configured to adjust a pulse width of an output clock signal. The flip-flop is configured to transmit a flip-flop output signal to the one or more delay elements where the flip-flop output signal includes a state of the flip-flop. The one or more delay elements are configured to delay the flip-flop output signal by a delay period and transmit the delayed flip-flop output to a reset input.
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
H03K 3/017 - Adjustment of width or duty cycle of pulses
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
According to one implementation, a computer system includes processing unit circuitry of one or more computer devices that include a plurality of transistors configured to a first voltage threshold, and digital voltage sensor circuitry (160) of the one or more computer devices that include at least a delay line circuit (126) including one or more digital gates (120A-N). Each of the one or more digital gates (120A-N) includes driving transistors configured to a second voltage threshold, where the digital voltage sensor circuitry (160) is configured to predict voltage droop of the processing unit circuitry.
H03K 5/134 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals using a chain of active-delay devices with field-effect transistors
G01R 19/165 - Indicating that current or voltage is either above or below a predetermined value or within or outside a predetermined range of values
G01R 19/25 - Arrangements for measuring currents or voltages or for indicating presence or sign thereof using digital measurement techniques
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
22.
TECHNIQUE FOR HANDLING ORDERING CONSTRAINED ACCESS OPERATIONS
Processing circuitry is provided to perform operations, along with instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions. A set of registers is used to hold data values for access by the processing circuitry. The instruction decoder circuitry is responsive to an ordering constrained access instruction used to access multiple data values, and providing register indication information and memory address information, to control the processing circuitry to perform a sequence of access operations, where each access operation causes a data value from amongst the multiple data values to be moved between an associated register determined from the register indication information and an associated memory address determined from the memory address information. Further, an ordering indication is derived from the ordering constrained access instruction and used to determine an order in which the multiple data values are to be accessed when performing the sequence of access operations, to thereby ensure that observability conditions required when implementing the ordering constrained access instruction are met.
A graphics processing system comprises a programmable processing unit operable to execute processing programs for execution threads corresponding to work items to be processed, and storage (74) in which a respective storage region can be allocated for temporary use by a respective group of execution threads corresponding to a group of work items being executed by the programmable processing unit while the group of execution threads are being executed. Respective indicators (e.g. clear bits) are provided to indicate when respective regions of the storage are to be cleared.
Messages and data are dynamically selected for packing into an information packet for transmission across a communication link of the data processing network. The number of messages (zero or more) of each message kind to be packed is determined based, at least in part, on the number of slots available to be packed, on the number of pending messages of each kind and on a dynamically determined priority setting. The priority may be user controlled, dependent on input backpressure and/or dependent on target loading, for example. The number of messages of each message kind to be packed may be determined using a hardware lookup table.
A technique is provided for performing a computation equivalent to applying a shift to an input value to generate an output value. Mask generation circuitry is used to generate an N-bit mask in dependence on a provided shift amount indication. N is a number of possible bit positions that a given bit of the input value may be located within the output value after the shift is performed. The mask generation circuitry performs N independent logical operations on bits forming the shift amount indication, each logical operation producing a mask bit value for a corresponding bit position of the N-bit mask, and the N logical operations being arranged such that, for any given shift amount indication, only one bit position in the generated N-bit mask will have its mask bit value indicating a set state. Output value generation circuitry is used to apply the N-bit mask to the given bit of the input value in order to determine a corresponding location of the given bit within the output value, and to determine a location within the output value of each other bit of the input value in dependence on the corresponding location of the given bit.
G06F 7/76 - Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
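The mask-based shift technique described above can be modelled in a few lines. In this sketch each mask bit is an independent AND over the shift-amount bits or their complements (a one-hot decode), the "given bit" is taken to be bit 0 of the input, and a left shift is modelled. Those choices, the function names, and the requirement that `0 <= shift_amount < n` are assumptions of the example.

```python
def one_hot_mask(shift_amount, n):
    """Compute n mask bits independently; exactly one is set for a valid shift."""
    width = max(1, (n - 1).bit_length())
    mask = []
    for position in range(n):
        bit = 1
        for k in range(width):
            s = (shift_amount >> k) & 1
            p = (position >> k) & 1
            bit &= s if p else (1 - s)    # AND of shift bits or their inverses
        mask.append(bit)
    return mask


def shift_left_via_mask(value, shift_amount, n, width):
    """Place bit 0 using the mask, then place the other bits relative to it."""
    base = one_hot_mask(shift_amount, n).index(1)   # output location of bit 0
    result = 0
    for j in range(width):
        if (value >> j) & 1 and base + j < width:
            result |= 1 << (base + j)
    return result


mask = one_hot_mask(2, 4)
shifted = shift_left_via_mask(0b1011, 2, n=4, width=8)
```

Because no mask bit depends on any other, all n bits can be evaluated in parallel, which a multi-stage shifter cannot offer.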
A graphics processing system comprises a programmable processing unit operable to execute processing programs for execution threads corresponding to work items to be processed, and storage (74) in which a respective storage region can be allocated for temporary use by a respective group of execution threads corresponding to a group of work items being executed by the programmable processing unit while the group of execution threads are being executed. The graphics processing system further comprises one or more memory region "access permission" checking circuits operable to prevent access to the storage for requests made by any requester other than an execution thread of the work group for which the respective region of the storage has been allocated.
G06F 21/79 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data in semiconductor storage media, e.g. directly-addressable memories
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 12/14 - Protection against unauthorised use of memory
A graphics processing system comprises a programmable processing unit operable to execute processing programs for execution threads corresponding to work items to be processed, and storage (74) in which a respective storage region can be allocated for temporary use by a respective group of execution threads corresponding to a group of work items being executed by the programmable processing unit while the group of execution threads are being executed. The graphics processing system further comprises a clear operation circuit (33) controllable to write a clear value to all entries of a region of the storage.
An apparatus for address translation is provided in order to translate virtual addresses used by devices in a data processing system into physical addresses for accessing memory. In accordance with the techniques disclosed herein, state tracking circuitry is provided to maintain the state of a page table entry that specifies such address translations. The state can be used to assess whether or not the address translation entry is worth storing in an address translation cache provided for faster access of previously used address translations. Accordingly, the techniques disclosed herein allow for more efficient use of the limited capacity available in the address translation cache as well as additional uses of a page table entry's state.
G06F 12/1045 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
There is provided an apparatus in which processing circuitry performs processing in one of a fixed number of at least two domains, one of the domains being subdivided into a variable number of execution environments. Memory translation circuitry, in response to a memory access request to a given memory address, determines a given encryption environment identifier associated with the one of the execution environments and forwards the memory access request together with the given encryption environment identifier. Storage circuitry stores a plurality of entries, each associated with an associated encryption environment identifier and an associated memory address. The storage circuitry includes determination circuitry that determines, in at least one enabled mode of operation, whether the given encryption environment identifier differs from the associated encryption environment identifier associated with one of the entries associated with the given memory address.
An apparatus comprises request receiving circuitry to receive a given memory system request specifying a target address in a given physical address space and a target memory encryption context identifier (MECID) indicative of a selected memory encryption context associated with the memory system request. Snoop filtering circuitry determines whether a snoop request is to be transmitted to a given caching agent in response to the given memory system request. The snoop filtering circuitry determines, based on the target MECID of the given memory system request and on snoop filtering information associated with the given caching agent, whether the target MECID is a snoop-not-required MECID for the given caching agent. In response to determining that the target MECID is a snoop-not-required MECID for the given caching agent, the snoop filtering circuitry suppresses transmission of a snoop request to the given caching agent in response to the given memory system request.
Processing circuitry (4) performs data processing in response to instructions. Memory management circuitry (28) controls access to memory based on page table information capable of associating a given page of memory address space with a read-as-X property indicative that reads to an address in the given page of memory address space should be treated as returning a specified value X. In response to determining, for a read request issued to read a read target value for a read target block of memory address space, that at least part of the read target block corresponds to a page associated with the read-as-X property, the memory management circuitry (28) controls the specified value X to be returned to the processing circuitry (4) as at least part of the read target value. This enables large regions of memory address space to be treated as storing a specified value without needing to commit physical memory for those regions.
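A toy software model of the read-as-X behaviour is sketched below. The class, its dictionaries, the 4 KiB page size, and the byte-granular value X are hypothetical choices made for the example; the abstract describes hardware page table information, not this data structure.

```python
PAGE_SIZE = 4096  # assumed page size


class ToyMmu:
    """Pages are either backed by memory or marked read-as-X with a byte X."""

    def __init__(self):
        self.read_as_x = {}   # page number -> value X returned for reads
        self.backing = {}     # page number -> bytearray of committed memory

    def map_backed(self, page):
        self.backing[page] = bytearray(PAGE_SIZE)

    def map_read_as_x(self, page, x):
        self.read_as_x[page] = x          # no physical memory is committed

    def read(self, address, length):
        out = bytearray()
        for addr in range(address, address + length):
            page, offset = divmod(addr, PAGE_SIZE)
            if page in self.read_as_x:
                out.append(self.read_as_x[page])
            else:
                out.append(self.backing[page][offset])
        return bytes(out)


mmu = ToyMmu()
mmu.map_backed(0)
mmu.backing[0][4094:4096] = b"\xaa\xbb"
mmu.map_read_as_x(1, 0xFF)
data = mmu.read(4094, 4)   # the read straddles a backed page and a read-as-X page
```

The read that crosses the page boundary shows the "at least part of the read target block" case: only the bytes that fall in the read-as-X page are replaced by X.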
When sampling a 3D texture using anisotropic filtering, an anisotropy direction along which to take samples in the texture is determined by determining reduced precision representations of the texture coordinate derivative vectors and using the reduced precision texture coordinate derivative vectors to determine a pair of vectors representing the directions of x and y axes for a 2D coordinate system on the plane in the 3D texture defined by the texture coordinate derivative vectors. The x and y axis vectors are used together with the texture coordinate derivative vectors to determine both an X-axis component and a Y-axis component for projected representations of the texture coordinate derivative vectors in the 2D coordinate system on the plane in the 3D texture defined by the texture coordinate derivative vectors. The projected representations of the texture coordinate derivative vectors are then used to determine the anisotropy direction.
An apparatus has cache circuitry providing a cache storage to store data for access by processing circuitry, and request handling circuitry arranged to process requests, each request providing an address indication for associated data. The request handling circuitry determines with reference to the address indication whether the associated data is available in the cache circuitry. The cache circuitry forms a given level of a multi-level memory hierarchy, and the request handling circuitry is responsive to determining that the associated data is unavailable in the cache circuitry to issue an onward request to cause the associated data to be retrieved into the cache circuitry from a lower level of the multi-level memory hierarchy than the given level. Prefetch circuitry issues, as one type of request to be handled by the request handling circuitry, prefetch requests, and the request handling circuitry is arranged in response to a given prefetch request to retrieve into the cache circuitry the associated data in anticipation of that associated data being requested by the processing circuitry. In addition, trigger circuitry, responsive to a specified condition being detected in respect of the given prefetch request, issues a prefetch trigger signal for receipt by control circuitry associated with further cache circuitry at a higher level of the multi-level memory hierarchy, to cause a higher level prefetch procedure to be triggered by the control circuitry to retrieve the associated data from the cache circuitry into the further cache circuitry.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
Disclosed is a method of operating a graphics processor when performing a processing pass that includes an initial “pilot” processing job that executes a respective initial “pilot” shader program that is to be executed in advance of a corresponding “main” shader program that will be executed for a separate “main” processing job within the same processing pass. A “main” processing job is permitted to be issued for processing concurrently with an initial “pilot” processing job on which it depends. To enforce dependencies between “main” and “pilot” shader execution in this case it is tracked whether any initial “pilot” processing jobs are currently being processed by the set of one or more processing cores and processing of tasks for “main” processing jobs is controlled accordingly.
A data consumer device comprises circuitry to identify a plurality of data producer devices from which to receive remotely generated data over a network, and processing circuitry to provide a data processing environment in which to process the remotely generated data. The data consumer device also has measurement circuitry to take a measurement of the data processing environment, and attestation circuitry to provide to the plurality of data producer devices an attestation report based on the measurement of the data processing environment. The attestation report is to provide the data producer devices with a guarantee that the data consumer device will process the remotely generated data in a predetermined manner.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
Circuitry for profiling operations with a data processing apparatus includes sampling circuitry to select a subset of operations within a data processing apparatus as sampled operations to be profiled. The circuitry also includes profiling circuitry to collect a sample record for each operation selected as a sampled operation by the sampling circuitry. There is provided base interval storage circuitry to store an indication of a base interval with which operations are to be sampled. The profiling circuitry supports collection of sample records for multiple monitoring contexts where each monitoring context specifies a context sampling interval as a multiple of the base interval. The profiling circuitry filters, for each monitoring context, the collected sample records based on the respective context sampling interval to produce a series of filtered sample records for the respective monitoring context.
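The per-context filtering described above can be sketched in software as follows. This is a minimal illustration only, assuming that "every Nth operation" is the sampling rule and that filtering keeps every multiple-th collected record; the function names `collect_samples` and `filter_for_context` and the context names are hypothetical and do not come from the abstract.

```python
def collect_samples(num_operations, base_interval):
    """Select every base_interval-th operation index as a sampled operation."""
    return list(range(0, num_operations, base_interval))


def filter_for_context(sample_records, multiple):
    """Keep every multiple-th record, so the context's effective
    sampling interval is multiple * base_interval."""
    return sample_records[::multiple]


# One shared stream of sample records, collected at the base interval.
records = collect_samples(num_operations=32, base_interval=4)

# Hypothetical monitoring contexts, each expressed as a multiple of the base interval.
contexts = {"fine": 1, "coarse": 4}
filtered = {name: filter_for_context(records, m) for name, m in contexts.items()}
```

The point of the sketch is that a single collection pass at the base interval can serve several contexts, each seeing only the records that fall on its own interval.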
An apparatus comprises instruction decoding circuitry to decode a cryptographic hash instruction specifying at least one working operand and an input operand; and processing circuitry to perform, in response to decoding of the cryptographic hash instruction, two or more iterations of a cryptographic hash function. Each iteration of the cryptographic hash function comprises determining an updated value for the at least one working operand based on a previous value for the at least one working operand and a respective portion of the input operand selected to be processed in that iteration. The updated value for the at least one working operand in one iteration becomes the previous value for the at least one working operand in a next iteration. In response to decoding of the cryptographic hash instruction, the processing circuitry performs at least two iterations of the cryptographic hash function per processing cycle.
There is provided an apparatus, a method, and a storage medium. The apparatus comprises one or more requestor devices to issue transaction requests, and one or more target devices to service those requests. The requestor devices and the target devices are configured to fulfil the requests according to a request ordering protocol specifying an ordered write observation behaviour in which, for each write transaction in a group of ordered write transactions, at least a deferred portion of the write transaction is deferred until all data specified in the group of ordered write transactions preceding the write transaction are observable. When implementing the request ordering protocol, the target devices are responsive to control information taking a first value to dynamically enable the ordered write observation behaviour, and the one or more target devices are responsive to the control information taking a second value to dynamically disable the ordered write observation behaviour.
A data processing system, the data processing system comprising a command processing unit and a processor that is configured to perform processing, the processor comprising: multiple execution units configured to perform processing operations for a type of work; and a control circuit configured to distribute processing tasks to the multiple execution units to cause the multiple execution units to perform processing operations for the type of work in response to asynchronous commands provided to the control circuit by the command processing unit; wherein dependency tracking is compared against an array of counters to indicate dependencies within the array of counters; wherein the indicated dependencies are provided to the control circuit by the command processing unit in the asynchronous commands to indicate for the type of work that dependencies have been resolved or that dependencies exist.
A data processing system, the data processing system comprising a processor that is configured to perform neural network processing, the processor comprising: at least one execution unit configured to perform processing operations for neural network processing; and a control circuit configured to distribute processing tasks to the at least one execution unit to cause the at least one execution unit to perform processing operations for neural network processing in response to a set of indications of neural network processing to be performed provided to the control circuit; wherein the processing tasks are asynchronous and comprise a dependency on at least one other processing task, the set of indications of neural network processing to be performed comprising an indication flag to indicate whether the execution unit can be caused to operate with a dependency on at least one other asynchronous processing task being unresolved.
An apparatus is provided with asynchronous boundary transfer circuitry to transfer data across a clock domain boundary. The asynchronous boundary transfer circuitry has buffer circuitry with buffer storage elements, and source and sink synchronisation circuitry to control the transfer of the data. To initiate a transfer of data items, the source synchronisation circuitry sends a transfer request to the sink synchronisation circuitry indicating that the data items have been stored in one or more buffer storage elements and encoding an indication of one or more elements of destination circuitry targeted by the data items. The sink synchronisation circuitry is responsive to a transfer request to decode an indication of the elements of destination circuitry targeted by the data items, provide incoming data item notifications to elements of destination circuitry, and allow the data items to be read from buffer storage elements indicated by the given transfer request.
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
G06F 1/12 - Synchronisation of different clock signals
Data processing apparatus comprises vector processing circuitry to access an array register having at least n×n storage locations, where n is an integer greater than one, the vector processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry. The instruction decoder circuitry is responsive to an array access instruction, to control the instruction processing circuitry to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register. The array location accessed for a given vector element of the vector is defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
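As a rough software analogy of the array access described above, the sketch below reads n storage locations from an n×n array, with the location for each vector element derived from instruction parameters. The parameter set (a start coordinate plus a per-element row and column step, with wrap-around) is an assumption made for illustration; the abstract does not specify how coordinates are encoded.

```python
def array_access(array, row0, col0, row_step, col_step):
    """Return n elements of an n x n array, one per vector element.

    Element i is read from (row0 + i*row_step, col0 + i*col_step),
    wrapping modulo n. All four parameters are hypothetical.
    """
    n = len(array)
    return [
        array[(row0 + i * row_step) % n][(col0 + i * col_step) % n]
        for i in range(n)
    ]


n = 4
array = [[r * n + c for c in range(n)] for r in range(n)]

row_1 = array_access(array, 1, 0, 0, 1)      # a horizontal vector
column_2 = array_access(array, 0, 2, 1, 0)   # a vertical vector
diagonal = array_access(array, 0, 0, 1, 1)   # one location per row and column
```

With this parameterisation the same access routine yields rows, columns, or diagonals, which is one way to picture coordinates being "associated with the given vector element" by the instruction.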
A method for detecting defective pixel values determines an isotropic dispersion difference value by determining a ratio or difference between a pixel error and an isotropic dispersion, where the isotropic dispersion is a measure of how much pixel values in a set of neighbouring pixel values uniformly distributed around the pixel under consideration vary. The method compares the isotropic dispersion difference value to an isotropic threshold. A directional dispersion difference value is found by determining a ratio or difference between a pixel error and a directional dispersion, wherein the directional dispersion is a weighted measure of how much pixel values in a set of neighbouring pixel values around the pixel under consideration in a given direction vary. The directional dispersion difference value is compared to a directional threshold and it is determined that the pixel under consideration is defective based on at least one of the comparison results.
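The two comparisons described above might be sketched as below for a 3×3 neighbourhood. Several choices here are assumptions rather than details from the abstract: the pixel error is taken as the distance from the neighbour median, the "difference" form is used instead of the ratio, the directional set is just the two horizontal neighbours with equal weight, and the pixel is flagged only when both comparisons exceed their thresholds.

```python
import statistics


def is_defective(patch, iso_threshold, dir_threshold):
    """Classify the centre pixel of a 3x3 patch (simplified sketch)."""
    centre = patch[1][1]
    neighbours = [patch[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]

    # Pixel error: how far the centre is from an estimate of its true value.
    estimate = statistics.median(neighbours)
    error = abs(centre - estimate)

    # Isotropic dispersion: mean absolute deviation of all eight neighbours.
    iso_dispersion = sum(abs(v - estimate) for v in neighbours) / len(neighbours)
    iso_exceeds = (error - iso_dispersion) > iso_threshold

    # Directional dispersion: variation along the horizontal direction only.
    left, right = patch[1][0], patch[1][2]
    dir_dispersion = abs(left - right)
    dir_exceeds = (error - dir_dispersion) > dir_threshold

    return iso_exceeds and dir_exceeds


hot = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]        # isolated bright pixel
edge = [[10, 10, 200], [10, 200, 200], [10, 200, 200]]   # bright pixel on an edge
flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```

Comparing the error against the local dispersion, rather than against a fixed value, is what lets the edge patch pass: its neighbours already vary as much as the centre deviates.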
There is provided an apparatus in which trace circuitry generates a trace indicating a series of memory addresses of memory accesses made to a memory. Buffer circuitry performs binning on the memory addresses to produce buffer-circuitry-maintained-frequency-bins that indicate access frequencies of the memory addresses. Processing circuitry, separate from the buffer circuitry, executes a stream of instructions to update processing-circuitry-maintained-frequency-bins to indicate access frequencies of the memory addresses based on the buffer-circuitry-maintained-frequency-bins.
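The two-stage binning above can be pictured with the following sketch, in which a buffer stage bins one window of traced addresses and a separate software stage folds those bins into its own longer-lived bins. The bin granularity (`bin_shift`, here 4 KiB regions) and the function names are assumptions for illustration.

```python
from collections import Counter


def bin_addresses(trace_window, bin_shift):
    """Buffer stage: count accesses per address region for one trace window."""
    return Counter(addr >> bin_shift for addr in trace_window)


def merge_bins(software_bins, buffer_bins):
    """Processing stage: fold the buffer-maintained bins into software-maintained bins."""
    software_bins.update(buffer_bins)  # Counter.update adds counts
    return software_bins


software_bins = Counter()
window_1 = [0x1000, 0x1004, 0x2000, 0x1008]
window_2 = [0x2010, 0x3000]

for window in (window_1, window_2):
    merge_bins(software_bins, bin_addresses(window, bin_shift=12))
```

Pre-binning in the buffer means the software stage handles one update per region per window instead of one per traced address.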
An apparatus is provided that has decoder circuitry to decode instructions of a first instruction set, wherein the decoder circuitry is responsive to instructions of the first instruction set to generate control signals, and processing circuitry responsive to the control signals to cause operations defined by the instructions to be performed. The decoder circuitry is arranged to be responsive to a program-specifying instruction of the first instruction set that specifies a memory location operand and a program operand to issue control signals to the processing circuitry to cause a second instruction set processing unit to be triggered to execute a program identified by the program operand in order to perform a sequence of operations defined by the program on data accessed at a location in memory identified by the memory location operand. The program comprises one or more instructions of the second instruction set defining operations supported by the second instruction set processing unit.
A method for trace generation comprises: obtaining input trace data indicative of a sequence of events occurring during execution of a target program on a processor; providing a query input to a trained generative machine learning model, where the query input is based on the input trace data; and processing the query input using the trained generative machine learning model to generate predicted trace data providing a more detailed representation of the sequence of events than is indicated by the input trace data.
Data processing systems, methods, computer program products, devices, and graphics processors are provided that substantially remove, or reduce, latencies introduced or incurred by a (host) processor e.g. a central processing unit (CPU), during virtualisation, in which virtual machines that are operable to execute on the (host) processor are scheduled or assigned to the graphics processor in a time-slice manner.
A method for compressing data representative of a mapping for use in image processing. The method comprises determining, based on a plurality of mappings representing a look-up table, parameters of a function for transforming a given set of input pixel attribute values into a set of estimated output pixel attribute values. The method comprises, for a plurality of the sets of input pixel attribute values, determining, based on the function and the set of input pixel attribute values, a set of approximate values of the associated set of output pixel attribute values, and determining, based on the associated set of output pixel attribute values and the set of approximate values, a set of residual output pixel attribute values. The method comprises storing data representative of the parameters of the function and data representative of the sets of residual output pixel attribute values.
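The function-plus-residuals scheme above can be illustrated with a one-dimensional look-up table. The sketch assumes a least-squares straight line as the approximating function and integer residuals; the abstract does not say which function family is used, and the names `fit_line`, `compress`, and `decompress` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def compress(lut):
    """Store the function parameters plus one small residual per table entry."""
    xs = sorted(lut)
    ys = [lut[x] for x in xs]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - round(slope * x + intercept) for x, y in zip(xs, ys)]
    return (slope, intercept), xs, residuals


def decompress(params, xs, residuals):
    """Rebuild each output as approximation + residual."""
    slope, intercept = params
    return {x: round(slope * x + intercept) + r for x, r in zip(xs, residuals)}


lut = {0: 0, 64: 70, 128: 130, 192: 190, 255: 255}
params, inputs, residuals = compress(lut)
restored = decompress(params, inputs, residuals)
```

Because the residuals are much smaller in magnitude than the original outputs, they can typically be stored in fewer bits, while the reconstruction remains exact.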
A method for processing an image comprising image data. The image data comprises pixel intensity values, said pixel intensity values being associated with respective pixel locations. For a plurality of zones of the image, based on a plurality of pixel intensity values in the respective zone of the image, a value of a characteristic pertaining to the plurality of pixel intensity values is determined. At least a spatial filtering process is performed on data representative of the values of the characteristic for the plurality of zones, to obtain filtered values of an image characteristic at respective locations. The filtered values are interpolated to determine a local value of the image characteristic at a said pixel location. The determining and/or the interpolating is performed using fixed function circuitry. The spatial filtering process is performed using a programmable processor.
According to one implementation of the present disclosure, an integrated circuit comprises: a memory macro unit including: one or more bitcells of one or more bitcell arrays, where a wordline or a bitline is at least partially disposed within a backside metal layer of the memory macro unit. In one implementation, a method comprises: transmitting, by a first wire of wiring, one or more control signals, where the first wire is disposed at least partially within a back-side metal layer. In one implementation, an integrated circuit comprises: a wire configured to transmit one or more control signals, where the wire is disposed at least partially on a back-side metal layer.
A data processing apparatus is provided. Decode circuitry decodes an instruction in a stream of instructions as a conditional branch instruction. Prediction circuitry performs a prediction of the conditional branch instruction in respect of a flow of the stream of instructions. Training circuitry receives and stores data associated with one or more executions of the conditional branch instruction. Generation circuitry generates the prediction based on the data and filter circuitry performs filtering to disregard a subset of the data, in dependence on whether the prediction is that the conditional branch instruction is of a specific type.
A persistent history buffer may be maintained in training a recurrent neural network such that information from at least one prior group of sequential training parameters within a training sequence is maintained for a subsequent group of training parameters. The persistent history buffer may be provided as an input to the recurrent neural network, and may store a history of a state of the recurrent neural network such as an input, an output and/or the state of a hidden layer. The persistent history buffer may be reset at the end of a sequence of input training parameters, which in a further example may span training input windows and/or batches.
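A toy illustration of the persistent history buffer follows. It assumes a scalar recurrent update in place of a real recurrent network and shows only the buffering behaviour: state is carried from one training window to the next within a sequence and reset when the sequence ends. `HistoryBuffer`, `rnn_step`, and the weights are hypothetical.

```python
class HistoryBuffer:
    """Holds the recurrent state between windows of the same sequence."""

    def __init__(self):
        self.state = 0.0

    def reset(self):
        self.state = 0.0


def rnn_step(x, h, w_in=0.5, w_rec=0.5):
    """Stand-in for one recurrent step: new hidden state from input and old state."""
    return w_in * x + w_rec * h


def process_sequence(sequence, window, buffer):
    """Process a sequence window by window, carrying state through the buffer."""
    outputs = []
    for start in range(0, len(sequence), window):
        h = buffer.state                      # resume from the previous window
        for x in sequence[start:start + window]:
            h = rnn_step(x, h)
            outputs.append(h)
        buffer.state = h                      # persist for the next window
    buffer.reset()                            # end of sequence: clear the history
    return outputs


buffer = HistoryBuffer()
windowed = process_sequence([1.0, 1.0, 1.0, 1.0], window=2, buffer=buffer)
unbroken = process_sequence([1.0, 1.0, 1.0, 1.0], window=4, buffer=buffer)
```

With the buffer in place, splitting the sequence into windows gives the same outputs as processing it in one piece, which is the property the persistent history is meant to preserve.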
A method of processing streamed image data is provided. A stream of image data is obtained along with information about the location of one or more regions of interest in the image data. The method performs a first spatial processing to change a spatial resolution of at least a portion of the streamed image data in dependence upon the location of the region of interest to generate a first processed stream. Image signal processing is performed on the first processed stream to generate a stream of processed image data. A second spatial processing is then performed on the stream of processed image data to generate a second processed stream of image data.
G06V 10/56 - Extraction of image or video features relating to colour
H04N 25/46 - Extracting pixel data from image sensors by controlling scanning circuits, e.g. by modifying the number of pixels sampled or to be sampled by combining or binning pixels
An apparatus is provided. Delegable memory accesses are offloaded to be performed by an external processing apparatus, whereas non-delegable memory accesses are performed locally. Nonetheless, an ordering requirement may still be enforced between them. The apparatus comprises tracking circuitry to maintain tracking information related to delegable memory accesses separately from tracking information related to non-delegable memory accesses. Order enforcement circuitry may enforce an ordering requirement between a non-delegable memory access and a delegable memory access based on a lookup of the tracking information.
G06F 12/0804 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
G06F 12/0873 - Mapping of cache memory to specific storage devices or parts thereof
55.
TECHNIQUE FOR HANDLING DATA ELEMENTS STORED IN AN ARRAY STORAGE
An apparatus is provided comprising processing circuitry to perform operations, instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions, and array storage comprising storage elements to store data elements. The array storage is arranged to store at least one two dimensional array of data elements accessible to the processing circuitry when performing the operations, each two dimensional array of data elements comprising a plurality of vectors of data elements, where each vector is one dimensional. The instruction decoder circuitry is arranged, in response to a move and zero instruction that identifies one or more vectors of data elements of a given two dimensional array of data elements within the array storage, to control the processing circuitry to move the data elements of the one or more identified vectors from the array storage to a destination storage and to set to a logic zero value the storage elements of the array storage that were used to store
An apparatus is provided comprising processing circuitry to perform operations, instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions, and array storage comprising storage elements to store data elements. The array storage is arranged to store at least one two dimensional array of data elements accessible to the processing circuitry when performing the operations, each two dimensional array of data elements comprising a plurality of vectors of data elements, where each vector is one dimensional. The instruction decoder circuitry is arranged, in response to decoding a zero vectors instruction that identifies multiple vectors of data elements of a given two dimensional array of data elements within the array storage, to also decode a subsequent accumulate instruction arranged to operate on the identified multiple vectors of data elements, and to control the processing circuitry to perform a non-accumulating variant of an accumulate operation specified by the accumulate instruction to produce result data elements for storing in the identified multiple vectors within the array storage.
A data processor comprising an execution engine 51 for executing programs for execution threads and one or more caches 48, 49 operable to store data values for use when executing program instructions to perform processing operations for execution threads. The data processor further comprises a thread throttling control unit 54 configured to monitor the operation of the caches 48, 49 during execution of programs for execution threads, and to control the issuing of instructions for execution threads to the execution engine for executing a program based on the monitoring of the operation of the caches during execution of the program.
A method of operating a personal intelligent agent in an ambient computing environment, comprising receiving input; analyzing input to derive a user personal preference; associating the personal preference with a first context indicator; determining whether the personal preference is exposable; responsive to determining that the personal preference is exposable, storing the preference with the associated context indicator; detecting when the agent enters a detectable context and responsively creating a second context indicator; determining if there is a match between the second and the first context indicator; retrieving the exposable personal preference associated with the context indicator; creating an anonymous preference indicator comprising the exposable personal preference with the matched context; emitting the preference indicator over the ambient computing environment; and monitoring the ambient computing environment to detect any broadcast message indicating ability to satisfy the preference shown in the preference indicator.
The present disclosure relates to a data processor for processing data, comprising: a plurality of execution units to execute one or more operations; and a plurality of storage elements to store data for the one or more operations, the data processor being configured to process at least one task, each task to be executed in the form of a directed acyclic graph of operations, wherein each of the operations maps to a corresponding execution unit and each connection between operations in the acyclic graph maps to a corresponding storage element, the data processor further comprising: a plurality of counters; and a control module to control the plurality of counters to: in a first mode, count an operation cycle number associated with each operation of the at least one task, the operation cycle number of an operation being a number of cycles required to complete the operation; and in a second mode, count a unit cycle number associated with one or more execution units, the unit cycle number of an execution unit being an accumulative number of cycles when the execution unit is occupied in use during execution of the at least one task.
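The two counting modes above can be sketched as two ways of keying the same cycle counts. The schedule format, a list of (operation, execution unit, cycles) tuples, and the mode names are assumptions for illustration only.

```python
from collections import defaultdict


def count_cycles(schedule, mode):
    """Accumulate cycle counts per operation or per execution unit.

    schedule: iterable of (operation, unit, cycles) tuples (hypothetical format).
    mode: "operation" counts cycles needed by each operation;
          "unit" accumulates the cycles each execution unit was occupied.
    """
    if mode not in ("operation", "unit"):
        raise ValueError("mode must be 'operation' or 'unit'")
    counters = defaultdict(int)
    for operation, unit, cycles in schedule:
        key = operation if mode == "operation" else unit
        counters[key] += cycles
    return dict(counters)


# Three operations of one task; two of them share the same execution unit.
schedule = [("conv", "mac", 10), ("relu", "alu", 2), ("add", "alu", 3)]

per_operation = count_cycles(schedule, "operation")
per_unit = count_cycles(schedule, "unit")
```

The first view shows which operation is slow, while the second shows which execution unit is busiest across the task.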
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; semiconductors; system-on-chip devices; microprocessors; processors [central processing units]; microprocessors in the field of artificial intelligence; neural network processors; electronic chips; application-specific integrated circuits; graphics processing units; semiconductor intellectual property cores; computer interfaces, namely instruction set architectures; printed circuit boards; computer software for integrated circuits; cloud computing software for semiconductors; downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards. Design of semiconductors, microprocessors, system-on-chip devices, processors [central processing units], chips [integrated circuits], application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; semiconductors; system-on-chip devices; microprocessors; processors [central processing units]; microprocessors in the field of artificial intelligence; neural network processors; electronic chips; application-specific integrated circuits; graphics processing units; semiconductor intellectual property cores; computer interfaces, namely instruction set architectures; printed circuit boards; semiconductor devices featuring technology for automotive applications; computer software for integrated circuits; downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards. Design of semiconductors, microprocessors, system-on-chip devices, processors [central processing units], chips [integrated circuits], application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; semiconductor design for automotive technology; research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; Semiconductors; System-on-chip devices; Microprocessors; Processors central processing units; Microprocessors in the field of artificial intelligence; Neural network processors; Electronic chips; Application-specific integrated circuits; Graphics processing units; Semiconductor intellectual property cores; Computer interfaces, namely instruction set architectures; printed circuit boards; Semiconductors, microprocessors, and microprocessors for Internet of Things (IOT) devices; Computer software for integrated circuits; Downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; Electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards Design of semiconductors, microprocessors, system-on-chip devices, processors central processing units, chips, integrated circuits, application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; Research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; Research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; Semiconductors; System-on-chip devices; Microprocessors; Processors central processing units; Microprocessors in the field of artificial intelligence; Neural network processors; Electronic chips; Application-specific integrated circuits; Graphics processing units; Semiconductor intellectual property cores; Computer interfaces, namely instruction set architectures; Printed circuit boards; Computer software for integrated circuits; Semiconductors for use in handheld and mobile devices; Downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; Electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards Design of semiconductors, microprocessors, system-on-chip devices, processors central processing units, chips integrated circuits, application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; Research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; Research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; Semiconductors; System-on-chip devices; Microprocessors; Processors central processing units; Microprocessors in the field of artificial intelligence; Neural network processors; Electronic chips; Application-specific integrated circuits; Graphics processing units; Semiconductor intellectual property cores; Computer interfaces, namely instruction set architectures; Printed circuit boards; Semiconductors for use in automotive applications; Computer software for integrated circuits; Downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; Electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards Design of semiconductors, microprocessors, system-on-chip devices, processors central processing units, chips integrated circuits, application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; Design of semiconductor chips and components for automotive applications; Research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; Semiconductors; System-on-chip devices; Microprocessors; Processors central processing units; Microprocessors in the field of artificial intelligence; Neural network processors; Electronic chips; Application-specific integrated circuits; Graphics processing units; Semiconductor intellectual property cores; Computer interfaces, namely instruction set architectures; Printed circuit boards; Semiconductors for use in computers and laptop devices; Computer software for integrated circuits; Downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; Electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards. Design of semiconductors, microprocessors, system-on-chip devices, processors central processing units, chips, integrated circuits, application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; Research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; Research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors.
66.
DEVICE PERMISSIONS TABLE DEFINING PERMISSIONS INFORMATION FOR A TRANSLATED ACCESS REQUEST
Apparatus, method and code for fabrication of an apparatus. The apparatus comprises address translation circuitry (116) to translate virtual addresses to physical addresses in response to advance address translation requests issued by devices (105) on behalf of software contexts (125). The apparatus also comprises translated access control circuitry (117) to control access to memory (110) in response to translated access requests issued by the devices (105) on behalf of the software contexts (125), based on permissions information defined in a device permission table (220), wherein the corresponding access permissions provide information for checking whether translated access requests from a plurality of software contexts are prohibited.
A spiking neural network is described that comprises a plurality of neurons in a first layer connected to at least one neuron in a second layer, each neuron in the first layer being connected to the at least one neuron in the second layer via a respective variable delay path. The at least one neuron in the second layer comprises one or more logic components configured to generate an output signal in dependence upon signals received along the variable delay paths from the plurality of neurons in the first layer. A timing component is configured to determine a timing value in response to receiving the output signal from the one or more logic components, and an accumulate component is configured to accumulate a value based on timing values from the timing component. A neuron fires in a case that a value accumulated at the accumulate component reaches a threshold value.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; semiconductors; system-on-chip devices; microprocessors; processors [central processing units]; microprocessors in the field of artificial intelligence; neural network processors; electronic chips; application-specific integrated circuits; graphics processing units; semiconductor intellectual property cores; computer interfaces, namely instruction set architectures; printed circuit boards; computer software for integrated circuits; semiconductors for handheld and mobile devices; downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards. Design of semiconductors, microprocessors, system-on-chip devices, processors [central processing units], chips [integrated circuits], application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; semiconductors; system-on-chip devices; microprocessors; processors [central processing units]; microprocessors in the field of artificial intelligence; neural network processors; electronic chips; application-specific integrated circuits; graphics processing units; semiconductor intellectual property cores; computer interfaces, namely instruction set architectures; printed circuit boards; semiconductors, microprocessors for Internet of Things (IOT) devices; computer software for integrated circuits; downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards. Design of semiconductors, microprocessors, system-on-chip devices, processors [central processing units], chips [integrated circuits], application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; semiconductors; system-on-chip devices; microprocessors; processors [central processing units]; microprocessors in the field of artificial intelligence; neural network processors; electronic chips; application-specific integrated circuits; graphics processing units; semiconductor intellectual property cores; computer interfaces, namely instruction set architectures; printed circuit boards; semiconductors for computers and laptop devices; computer software for integrated circuits; downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards. Design of semiconductors, microprocessors, system-on-chip devices, processors [central processing units], chips [integrated circuits], application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Integrated circuits; Semiconductors; System-on-chip devices; Microprocessors; Processors central processing units; Microprocessors in the field of artificial intelligence; Neural network processors; Electronic chips; Application-specific integrated circuits; Graphics processing units; Semiconductor intellectual property cores; Computer interfaces, namely instruction set architectures; Printed circuit boards; Downloadable computer software for integrated circuits; Downloadable cloud computing software for semiconductors; Downloadable computer operating software and libraries for machine learning, deep learning, and artificial intelligence hardware platforms in the field of semiconductors; Electronic downloadable materials, namely, electronic downloadable instruction and development manuals, datasheets and brochures, all in the area of design and development of integrated circuits, microprocessors, microprocessor cores, macro cells, microcontrollers, bus interfaces, and printed circuit boards. Design of semiconductors, microprocessors, system-on-chip devices, processors central processing units, chips, integrated circuits, application-specific integrated circuits, graphics processing units, machine learning processors and semiconductor cores; Research, development, and design relating to computer hardware for semiconductor intellectual property, instruction set architectures, microprocessors; Research, development and design, all relating to computer software used in, and for use in the design, verification and construction of microprocessors, processors, microcontrollers, microprocessor design files, semiconductor intellectual property cores, computer hardware accelerators, neural network processors and machine learning processors.
A graphics processing system that is operable to perform ray tracing using micromaps is disclosed. A tree representation of a micromap is generated, and when it is desired to determine whether and/or how a ray interacts with a sub-region of a primitive, the tree representation of the micromap is traversed to determine a property value for the sub-region of the primitive.
An apparatus is described having processing circuitry to perform vector processing operations, a set of vector registers, and an instruction decoder to decode vector instructions to control the processing circuitry to perform the required operations. The instruction decoder is responsive to a given vector memory access instruction specifying a plurality of memory access operations, where each memory access operation is to be performed to access an associated data element, to determine, from a data vector indication field of the given vector memory access instruction, at least one vector register in the set of vector registers associated with a plurality of data elements, and to determine, from at least one capability vector indication field of the given vector memory access instruction, a plurality of vector registers in the set of vector registers containing a plurality of capabilities. Each capability is associated with one of the data elements in the plurality of data elements and provides an address indication and constraining information constraining use of that address indication when accessing memory. The number of vector registers determined from the at least one capability vector indication field is greater than the number of vector registers determined from the data vector indication field. The instruction decoder controls the processing circuitry: to determine, for each given data element in the plurality of data elements, a memory address based on the address indication provided by the associated capability, and to determine whether the memory access operation to be used to access the given data element is allowed in respect of that determined memory address having regard to the constraining information of the associated capability; and to enable performance of the memory access operation for each data element for which the memory access operation is allowed.
An apparatus has processing circuitry (16) to perform data processing, and instruction decoding circuitry (10) to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same instruction sequence to be executed on apparatuses with hardware supporting different maximum vector lengths. The instruction decoding circuitry and the processing circuitry support a sub-vector-supporting instruction which treats a given vector as comprising a plurality of sub-vectors with each sub-vector comprising a plurality of vector elements. In response to the sub-vector-supporting instruction, the instruction decoding circuitry controls the processing circuitry to perform an operation for the given vector at sub-vector granularity. Each sub-vector has an equal sub-vector length.
The present techniques relate to voltage droop detection and there is disclosed circuitry for detecting a voltage droop event, the circuitry configured to: receive a clock signal from a clock distribution network; obtain, from a storage, a first predetermined value, a second predetermined value and a predetermined threshold count; obtain one or more measurement values associated with a system voltage; when a first measurement value of the one or more measurement values reaches the first predetermined value, initiate a count of clock cycles until a subsequent measurement value of the one or more measurement values reaches the second predetermined value, the second predetermined value being different from the first predetermined value; and when the count of clock cycles is lower than the predetermined threshold count, cause a control entity to take mitigation action.
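The cycle-counting check in the abstract above can be illustrated with a minimal software sketch. It is not the disclosed circuitry: the function name `is_fast_droop`, the one-sample-per-clock-cycle model, and the reading of "reaches" as "falls to or below" (with the second value lower than the first) are all assumptions made for illustration.

```python
from typing import Iterable


def is_fast_droop(
    samples: Iterable[int],
    first_value: int,
    second_value: int,
    threshold_count: int,
) -> bool:
    """Return True when mitigation would be requested for this sample trace.

    Assumptions (not from the source): one measurement per clock cycle, and a
    measurement "reaches" a level when it falls to or below that level.
    """
    start = None  # cycle at which the first predetermined value was reached
    for cycle, value in enumerate(samples):
        if start is None:
            if value <= first_value:
                start = cycle  # begin counting clock cycles from here
        elif value <= second_value:
            # Cycles elapsed between reaching the first and second values.
            count = cycle - start
            # A short count means a steep droop, so mitigation is requested.
            return count < threshold_count
    return False  # the second value was never reached


# A steep drop (2 cycles between levels) versus a gradual one (6 cycles).
steep = is_fast_droop([100, 98, 90, 80, 70], 90, 70, 4)
gradual = is_fast_droop([100, 90, 88, 86, 84, 82, 80, 70], 90, 70, 4)
```

Counting cycles between two levels approximates the slope of the voltage change, which is why a count below the threshold is treated as the event that needs mitigation.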
G01R 19/165 - Indicating that current or voltage is either above or below a predetermined value or within or outside a predetermined range of values
G06F 1/30 - Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
The present techniques relate to mitigating droop conditions over state transitions in systems having dynamic voltage and frequency scaling and there is disclosed a method of controlling a dynamic voltage and frequency scaling circuit, comprising: initiating a transition from a first voltage and frequency state to a second voltage and frequency state; switching activity from a first nominal source to a first fallback source; retuning the first nominal source to become a second fallback source at the second voltage and frequency state; switching activity from the first fallback source to the second fallback source; retuning the first fallback source to become a second nominal source at the second voltage and frequency state; and switching activity from the second fallback source to the second nominal source.
The present techniques relate to mitigating droop conditions in systems having dynamic voltage and frequency scaling and there is disclosed a method of controlling a dynamic voltage and frequency scaling circuit, comprising: detecting a voltage droop relative to a current nominal voltage and frequency state; responsive to said current nominal voltage and frequency state having a corresponding fallback state in a safe operating zone of voltage and frequency, switching activity from a nominal source to a fallback source; and when a fallback to a safe operating zone is unavailable for said current nominal voltage and frequency state, pausing activity of the dynamic voltage and frequency scaling circuit.
The present techniques relate to monitoring of operating parameters at a circuit and disclose a method comprising: receiving, at a delay monitor from a power delivery network, a voltage signal representative of a voltage level; receiving, at the delay monitor from a clock distribution network, a clock signal representative of an output clock of the clock distribution network; periodically generating, at the delay monitor, a measurement value responsive to the voltage signal and the clock signal; adjusting, at the delay monitor, a threshold level for the measurement value from a first threshold to a second threshold, where the second threshold corresponds to a target voltage level; and providing, from the delay monitor to the clock distribution network, a non-violation signal responsive to the measurement value reaching the second threshold.
The present techniques relate to clock control schemes and disclose circuitry for providing a clock signal to a sub-system of a processor, the circuitry comprising: a first clock selection stage to receive clock signals from a plurality of clock sources and, responsive to one or more first control signals, provide first and second clock signals to a second selection stage; and a second selection component at the second selection stage to, responsive to one or more second control signals, select one of the first and second clock signals and output the selected clock signal as a mitigated clock signal.
The present techniques relate to a clock control scheme and related methods and circuitry in a system comprising one or more processor cores, and there is disclosed a control state machine in a clock controller circuit, comprising: a sender operable to signal a request to a subordinate state machine and to store a request sent indicator in a store; a first receiver operable to receive an acknowledgement indicator signalled by the subordinate state machine and to clear the request sent indicator in the store; a delay component operable to hold the control state machine in a wait state; and a second receiver operable to receive a request complete indicator signalled by the subordinate state machine; the delay component being responsive to receipt of the request complete indicator to release the control state machine from the wait state.
The present techniques relate to monitoring of a clock signal at a circuit and disclose a method comprising: receiving, at a delay monitor, a gateable clock signal; analysing, by the delay monitor, the clock signal to generate a measurement value, wherein the measurement value is responsive to the clock signal and/or a voltage; and comparing, by the delay monitor, the measurement value with a threshold; and storing the measurement value for further analysis when the comparison does not meet the threshold; or discarding the measurement value when the measurement value meets the threshold.
One or more lighting components are projected onto pixel locations of a rendered image with sampling locations set off from pixel location centers according to associated jitter vectors. The sampled image is denoised in a way that preserves the associated jitter vectors, and the denoising may be performed separately for different lighting components. The denoised image is processed using upsampling and/or temporal antialiasing, using the associated jitter vectors, to an image format having a spatial resolution at least as high as the denoised image.
Various implementations described herein are directed to a device having a write circuit that provides data for storage. The device may include a memory circuit that stores the data in leaky bitcells with capacitive elements that gradually discharge over a pre-determined period of time. The device may include a read circuit that enables the leaky bitcells to operate as one or more memory storage elements. The device may include a query circuit that identifies matches between a query data and output data provided by the read circuit.
In a data processing system, a command stream provided to a processing resource to cause the processing resource to perform a processing task for an application executing on a host processor comprises a sequence of commands for execution by the processing resource to cause the processing resource to perform the processing operations for the processing task and one or more data save indicators that indicate data that is to be saved. In response to the processing resource receiving a request to suspend processing of the processing task, data indicated by one of the one or more data save indicators in the command stream is stored in memory.
Applications of improved detection of behavior from multi-variable data are provided. The applications are each described by a method that enables an evaluation of the degree of compatibility between more than one variable. The methods begin with inputs in a form of data. The inputs flow into an encoding step where encoders are configured to look for patterns in the inputs. A predicting step utilizes predictors to predict the next set of inputs. An energy function performs a comparing step that compares the predicted next set of inputs with data to determine if the predicted next set of inputs are compatible with one another or not. The comparison is used to detect that a certain behavior has occurred.
The present techniques relate to methods and circuits for implementing a voltage droop response and disclose a method of responding to a voltage droop in an electronic circuit, the method comprising: switching activity, in response to a voltage droop event, from a nominal clock source to a fallback clock source after a predetermined delay in time according to a programmable delay value.
Various implementations described herein are directed to a device having an array of bitcells with a first bitcell disposed adjacent to a second bitcell. The device may have a first wordline coupled to first transistors in the first bitcell, and the device may have a second wordline coupled to second transistors in the second bitcell. Also, the device may have a buried ground line coupled to the first transistors and the second transistors.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
An apparatus has bridge circuitry to communicate transport packets between a transport network and data processing circuitry. Some or all of the data processing circuitry operates in a given reset domain other than the reset domain of the transport network. The apparatus also has packet tracking circuitry to monitor the transport packets received at the bridge circuitry and to track whether any required responses have been provided. The reset handling circuitry is responsive to a reset request for the given reset domain to: cause the bridge circuitry to reject new transport packets received for the domain; and when the packet tracking circuitry indicates that all required responses have been provided, accept the reset request for the given reset domain and allow the data processing circuitry to carry out a reset for the data processing circuitry operating in the given reset domain.
There is described a method of monitoring an electronic circuit voltage droop response; the method comprising: switching activity, in response to a voltage droop event, from a nominal clock source to a fallback clock source; and, optionally, switching activity, in response to a voltage recovery event, from a fallback clock source to a nominal clock source; wherein a voltage recovery event comprises a predetermined duration according to a configurable delay value without a voltage droop. The method further comprises at least one of: measuring a number of instances of switching from a nominal clock source to a fallback clock source; measuring an actual duration during which activity proceeds according to the fallback clock source without a voltage droop; measuring a fallback duration during which activity proceeds according to the fallback clock source; and measuring a number of instances of switching from a fallback clock source to a nominal clock source occurring during a voltage droop. Finally, the method may comprise modifying the switching to optimise activity efficiency based on the measuring. There is also described an electronic circuit configured to monitor voltage droop response of another electronic circuit according to the method.
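The measurements listed in the abstract above can be pictured with a small cycle-by-cycle simulation. This is only a sketch under assumptions: the function name `monitor_droop_response`, the per-cycle boolean droop flag, and the exact counting conventions are hypothetical and are not taken from the source.

```python
from typing import Dict, Iterable


def monitor_droop_response(
    droop_flags: Iterable[bool], recovery_delay: int
) -> Dict[str, int]:
    """Simulate nominal/fallback clock switching and collect statistics.

    Assumptions (not from the source): one droop flag per clock cycle, and a
    recovery event is recovery_delay consecutive cycles without a droop.
    """
    stats = {
        "to_fallback": 0,              # nominal -> fallback switches
        "to_nominal": 0,               # fallback -> nominal switches
        "fallback_cycles": 0,          # cycles spent on the fallback source
        "fallback_cycles_no_droop": 0, # of those, cycles with no droop present
    }
    on_fallback = False
    quiet = 0  # consecutive droop-free cycles while on the fallback source
    for droop in droop_flags:
        if droop:
            if not on_fallback:
                on_fallback = True
                stats["to_fallback"] += 1
            quiet = 0  # any droop restarts the recovery timer
        elif on_fallback:
            quiet += 1
            if quiet >= recovery_delay:
                on_fallback = False  # voltage recovery event
                stats["to_nominal"] += 1
        if on_fallback:
            stats["fallback_cycles"] += 1
            if not droop:
                stats["fallback_cycles_no_droop"] += 1
    return stats


flags = [False, True, True, False, False, False, False, True, False, False, False]
stats = monitor_droop_response(flags, recovery_delay=3)
```

A large share of droop-free fallback cycles would suggest that the configurable delay could be shortened, which is the kind of adjustment the final step of the abstract refers to.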
The present techniques relate to a method and circuitry for determining system characteristics of an electronic circuit and there is disclosed a delay monitor circuit to characterise an electronic circuit comprising: a delay line that quantifies the delay within a clock cycle; the delay line comprising a plurality of sampling points therealong; wherein, in a first mode, the delay monitor is configured to capture delay statistics over a given measurement period; and wherein, in a second mode, the delay monitor is configured to capture a measurement value from the plurality of sampling points, wherein the measurement value is indicative of one or more characteristics of the electronic circuit.
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
91.
Temperature and Voltage Profiling Computer Systems and Methods
According to one implementation of the present disclosure, a method of profiling the temperature and voltage across different locations within a processor is disclosed. The method includes: in a first stage, determining respective first and second voltage sensitivity coefficients and respective first and second temperature sensitivity coefficients corresponding to a pair of ring oscillators; and in a second stage, determining a voltage deviation and a temperature deviation from a predetermined reference voltage and a predetermined reference temperature respectively, based on the determined respective first and second voltage sensitivity coefficients and the determined respective first and second temperature sensitivity coefficients.
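The second stage described above can be sketched as solving two equations in two unknowns. The abstract does not state the model, so the linear relation below is an assumption: each ring oscillator's frequency deviation is taken to be `kv * dV + kt * dT`, where `kv` and `kt` are the voltage and temperature sensitivity coefficients from the first stage. The function name `solve_deviations` is hypothetical.

```python
from typing import Tuple


def solve_deviations(
    df1: float, df2: float,
    kv1: float, kt1: float,
    kv2: float, kt2: float,
) -> Tuple[float, float]:
    """Return (voltage deviation, temperature deviation) from reference.

    Assumed linear model (not from the source), one equation per oscillator:
        df1 = kv1 * dV + kt1 * dT
        df2 = kv2 * dV + kt2 * dT
    """
    det = kv1 * kt2 - kv2 * kt1
    if det == 0:
        # The two oscillators respond identically, so V and T cannot be separated.
        raise ValueError("sensitivity coefficients are linearly dependent")
    # Cramer's rule for the 2x2 system.
    d_v = (df1 * kt2 - df2 * kt1) / det
    d_t = (kv1 * df2 - kv2 * df1) / det
    return d_v, d_t


# Frequency deviations generated from dV = 0.05 and dT = 10 with the
# illustrative coefficients below; solving recovers those deviations.
d_v, d_t = solve_deviations(df1=-4.9, df2=-9.95, kv1=2.0, kt1=-0.5, kv2=1.0, kt2=-1.0)
```

The pair of oscillators only helps if their sensitivities differ, which is why the sketch rejects a zero determinant.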
Various implementations described herein are directed to a method that acquires operating frequencies for a first set of ring oscillators disposed in a first integrated circuit, determines one or more first coefficients and a first constant for each ring oscillator in the first set, and determines a correlation between each of the first coefficients and the first constant. Also, the method may acquire a single operating frequency for each of a second set of ring oscillators in a second integrated circuit at a single pre-determined temperature so as to determine a second constant, predict one or more second coefficients for each ring oscillator in the second set based on the second constant and the correlation, and derive a temperature dependence based on the single operating frequency using the one or more second coefficients and the second constant for each of the second set of ring oscillators.
G01K 7/20 - Measuring temperature based on the use of electric or magnetic elements directly sensitive to heat using resistive elements the element being a linear resistance, e.g. platinum resistance thermometer in a specially-adapted circuit, e.g. bridge circuit
G01K 7/24 - Measuring temperature based on the use of electric or magnetic elements directly sensitive to heat using resistive elements the element being a non-linear resistance, e.g. thermistor in a specially-adapted circuit, e.g. bridge circuit
93.
EVALUATING PERFORMANCE OF A DROOP MITIGATION SCHEME
The present techniques relate to a droop mitigation scheme and there is disclosed a method of evaluating the performance of a droop mitigation scheme, wherein the method is carried out at a circuit, the method comprising: receiving a clock output signal, wherein the droop mitigation scheme has been used to generate the clock output signal; and analysing the clock output signal to generate an output, wherein the output provides an indication of the performance of the droop mitigation scheme.
A computer implemented method for processing instructions in a multiprocessing apparatus comprises obtaining a first instruction of a first process; decoding the first instruction to detect a continuation indicator associated with the first instruction; determining whether or not to enforce the continuation indicator; and when it is determined to enforce the continuation indicator: continuing to execute the first process until completion of the first instruction and at least a next sequential second instruction of the first process. The continuation may temporarily suppress a normal eviction process based on a fairness algorithm, for example.
The present techniques relate to droop detection and there is disclosed circuitry for detecting a voltage droop event, the circuitry configured to: receive a clock signal from a clock distribution network; monitor a measurement value associated with a voltage provided to the circuitry by a power delivery network; provide a first droop detection signal to the clock distribution network in response to the measurement value reaching a first threshold value; cause a control entity to take a first mitigation action in response to the first droop detection signal; provide a second droop detection signal to the clock distribution network in response to the measurement value reaching a second threshold value different to the first threshold value; and cause the control entity to take a second mitigation action in response to the second droop detection signal.
A tile-based graphics processor performs first and second processing passes to generate a render output. The first processing pass generates and writes out information representative of a set of bounding boxes, and the second processing pass uses the bounding box information to determine which primitives to process for which rendering tiles.
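The two-pass idea above can be illustrated in software. This is a simplified sketch, not the disclosed hardware: it assumes one axis-aligned bounding box per primitive, integer screen coordinates, and square tiles, and the names `first_pass` and `second_pass` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
BBox = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)


def first_pass(primitives: List[List[Point]]) -> List[BBox]:
    """Write out one bounding box per primitive (assumed granularity)."""
    boxes = []
    for vertices in primitives:
        xs = [x for x, _ in vertices]
        ys = [y for _, y in vertices]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def second_pass(
    boxes: List[BBox], width: int, height: int, tile_size: int
) -> Dict[Tuple[int, int], List[int]]:
    """Map each tile (tx, ty) to the primitive indices whose box overlaps it."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    per_tile: Dict[Tuple[int, int], List[int]] = {}
    for prim_id, (x0, y0, x1, y1) in enumerate(boxes):
        # Clamp the covered tile range to the render target.
        tx0, tx1 = max(x0 // tile_size, 0), min(x1 // tile_size, tiles_x - 1)
        ty0, ty1 = max(y0 // tile_size, 0), min(y1 // tile_size, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                per_tile.setdefault((tx, ty), []).append(prim_id)
    return per_tile


triangles = [[(1, 1), (5, 1), (1, 5)], [(10, 10), (20, 10), (10, 20)]]
binning = second_pass(first_pass(triangles), width=32, height=32, tile_size=16)
```

Bounding boxes are conservative: a tile may list a primitive that does not actually touch it, which costs some work in the second pass but never drops a visible primitive.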
A tile-based graphics processor performs first and second processing passes to generate a render output. The first processing pass generates data that is used in the second processing pass to determine which primitives to process for which rendering tiles. The first processing pass is performed by a geometry processing control unit assembling primitives, and one or more programmable processing units transforming geometry data defining the primitives, and processing the transformed geometry data to generate the data.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may implement, in whole or in part, techniques to process pixel values sampled from a multi-color-channel imaging device. In particular, methods and/or techniques are disclosed to process pixel samples for interpolating pixel values for one or more color channels.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06T 7/90 - Determination of colour characteristics
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/50 - Extraction of image or video features by performing operations within image blocks; Extraction of image or video features by using histograms, e.g. histogram of oriented gradients [HoG]; Extraction of image or video features by summing image-intensity values; Projection analysis
An apparatus for improving the tracking of streams of memory accesses for training a stride prefetcher is provided, comprising a training data structure storing entries for training a stride prefetcher, a given entry specifying: a stride offset, a target address, a program counter address, and a bypass indicator indicating whether a program counter match condition is to be bypassed for the given entry; and training control circuitry to determine whether to update the stride offset for the given entry of the training data structure to specify a current stride between a target address of a current memory access and the target address for the last memory access of the tracked stream, in which the determination by the training control circuitry is controlled to be either dependent on a determination of whether the program counter match condition is satisfied or independent of whether the program counter match condition is satisfied, based on the bypass indicator.
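The role of the bypass indicator in the update decision can be sketched in Python. This is a simplified model, not the disclosed circuitry: the `TrainingEntry` field names and the `train` function are hypothetical, and real training logic would typically also track confidence before issuing prefetches.

```python
from dataclasses import dataclass


@dataclass
class TrainingEntry:
    stride_offset: int
    target_address: int   # address of the last access in the tracked stream
    pc_address: int       # program counter associated with the entry
    bypass_pc_match: bool  # True: ignore the program counter match condition


def train(entry, access_pc, access_address):
    """Update the entry for the current access; return True if it was updated.

    With the bypass indicator clear, the update depends on the program
    counter match condition. With it set, the update happens regardless.
    """
    pc_match = access_pc == entry.pc_address
    if not (entry.bypass_pc_match or pc_match):
        return False
    entry.stride_offset = access_address - entry.target_address
    entry.target_address = access_address
    return True
```

Bypassing the match condition lets a single entry follow a stream whose accesses come from several different instructions.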
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
Apparatuses, methods, systems, chip-containing products, and computer-readable media are disclosed. An apparatus comprises dispatch circuitry to receive instructions, and to identify linear chains of instructions each comprising a first instruction and one or more further instructions, which are temporarily ineligible for execution due to a dependence on an immediately preceding instruction. The apparatus further comprises offline storage circuitry. The dispatch circuitry is configured, for each of the linear chains: to dispatch the sequentially first instruction to issue circuitry and to retain the one or more further instructions in the offline storage circuitry until a chain trigger signal is received, the chain trigger signal indicating that a previously dispatched instruction, on which a sequentially next instruction depends, has satisfied a predefined issuing condition. In response to receipt of the chain trigger signal, the dispatch circuitry is configured to dispatch the sequentially next instruction to the issue circuitry.
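The chain-by-chain dispatch described here can be modelled with a short Python sketch. It is illustrative only: the `Dispatcher` class, its method names, and the use of a chain identifier to route the trigger signal are assumptions, and the issue circuitry is reduced to a plain list.

```python
from collections import deque


class Dispatcher:
    """Toy model of dispatch circuitry with offline storage for linear chains."""

    def __init__(self):
        self.issue_queue = []  # stands in for the issue circuitry
        self.offline = {}      # chain id -> instructions held back

    def accept_chain(self, chain_id, instructions):
        """Dispatch the first instruction and retain the rest offline."""
        first, *rest = instructions
        self.issue_queue.append(first)
        self.offline[chain_id] = deque(rest)

    def chain_trigger(self, chain_id):
        """Handle a chain trigger: dispatch the sequentially next instruction."""
        pending = self.offline.get(chain_id)
        if pending:
            self.issue_queue.append(pending.popleft())
```

Only one instruction per chain occupies the issue queue until its predecessor has satisfied the issuing condition, which is the point of holding the dependent instructions offline.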