According to one implementation, a circuit includes a first digital gate (108A) and a timing offset circuit portion (238) coupled to the first digital gate (108A) that includes one or more tri-state inverters (202A . . . 202N) where a capacitance at an output of the first digital gate (108A) is based on a quantity of enabled tri-state inverters of the one or more tri-state inverters (202A-202N).
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
H03K 19/094 - Logic circuits, i.e. having at least two inputs acting on one output; inverting circuits using specified components using semiconductor devices using field-effect transistors
H03K 19/0948 - Logic circuits, i.e. having at least two inputs acting on one output; inverting circuits using specified components using semiconductor devices using field-effect transistors using MOSFET using CMOS
2.
Systems, Methods, and Devices of Droop Detector Circuitry
According to one implementation, a circuit includes a droop detection element (112) including voltage sensitive gates (114A-N) and at least two delay elements (118A, 118B) where each delay element of the at least two delay elements is a non-inverting gate or a non-inverting gate combination. The at least two delay elements (118A, 118B) are configured to provide a delay to the droop detection element (112), where at least a first delay element (118A) is a first voltage-threshold (VT)-type, at least a second delay element (118B) is a second voltage-threshold (VT)-type, and the first and the second VT-types are different.
H03K 5/133 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals using a chain of active-delay devices
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
3.
REDUCED POWER CONSUMPTION PREDICTION USING PREDICTION TABLES
An apparatus comprises a predictor to make a prediction based on a plurality of prediction tables. The plurality of prediction tables are looked up using table lookup information generated based on different lengths of input history information representing a path through program execution. The apparatus comprises circuitry to prevent a given portion of the input history information from differing with respect to a corresponding portion of the input history information used to make a preceding prediction, where the given portion is a portion which is not used to generate table lookup information for an active subset of the prediction tables.
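As an illustrative sketch only (the history lengths, table size, and folding hash below are assumptions, not the claimed design), the lookup scheme can be modelled as indexing each active table with a hash of a different history length, so history bits beyond the longest active length never influence an index and need not be updated:

```python
# Hypothetical predictor geometry; the abstract does not specify it.
HISTORY_LENGTHS = [4, 8, 16]   # history bits consumed by tables 0..2
TABLE_SIZE = 64

def fold(history_bits, length, size=TABLE_SIZE):
    """Fold the `length` most recent history bits into a table index."""
    idx = 0
    for b in history_bits[-length:]:
        idx = ((idx << 1) | b) % size
    return idx

def lookup_indices(history_bits, active_tables):
    """Index only the active subset of tables; history bits beyond the
    longest active length are never consumed, so the circuitry may hold
    that portion of the history unchanged to save power."""
    return {t: fold(history_bits, HISTORY_LENGTHS[t]) for t in active_tables}
```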
An image processing method processes color image data. The method applies an operator to a representative pixel value to generate a transformed pixel value. A controlled gain is determined so that: if applying a gain based on the transformed pixel value to a maximum color channel value of the pixel will generate a color channel value that is below a threshold value, the controlled gain is determined based on the transformed pixel value and the representative pixel value, and if applying a gain based on the transformed pixel value to the maximum color channel value will generate a color channel value that is above the threshold value, the controlled gain is determined such that the maximum color channel value is mapped to a value representable by a predetermined number of bits. The controlled gain is applied to each of the color channel values.
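A minimal numeric sketch of the gain control described above; the function names, the 8-bit range, and the linear gain model are assumptions, not the claimed implementation:

```python
def controlled_gain(representative, transformed, max_channel, bits=8):
    """Nominal gain maps the representative value to its transformed
    value; if that would push the maximum colour channel past the
    largest `bits`-bit value, clamp so the maximum maps to that limit."""
    limit = (1 << bits) - 1
    gain = transformed / representative
    if max_channel * gain <= limit:      # stays below the threshold
        return gain
    return limit / max_channel           # map max channel to the limit

def apply_gain(channels, representative, transformed, bits=8):
    """Apply the controlled gain to every colour channel of the pixel."""
    g = controlled_gain(representative, transformed, max(channels), bits)
    return [c * g for c in channels]
```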
According to one implementation, a computer system includes processing unit circuitry of one or more computer devices that include a plurality of transistors configured to a first voltage threshold, and digital voltage sensor circuitry (160) of the one or more computer devices that include at least a delay line circuit (126) including one or more digital gates (120A-N). Each of the one or more digital gates (120A-N) includes driving transistors configured to a second voltage threshold, where the digital voltage sensor circuit (160) is configured to predict voltage droop of the processing unit circuitry.
H03K 5/134 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals using a chain of active-delay devices with field-effect transistors
G01R 19/165 - Indicating that current or voltage is either above or below a predetermined value or within or outside a predetermined range of values
G01R 19/25 - Arrangements for measuring currents or voltages or for indicating presence or sign thereof using digital measurement techniques
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
According to one implementation, a pulse generator circuit includes a flip-flop receiving an input clock signal, and one or more delay elements where the circuit is configured to adjust a pulse width of an output clock signal independent of a clock period of the input clock signal. According to another implementation, a pulse generator circuit includes a flip-flop receiving an input clock signal, and one or more delay elements where the circuit is configured to adjust a pulse width of an output clock signal. The flip-flop is configured to transmit a flip-flop output signal to the one or more delay elements where the flip-flop output signal includes a state of the flip-flop. The one or more delay elements are configured to delay the flip-flop output signal by a delay period and transmit the delayed flip-flop output to a reset input.
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
H03K 3/017 - Adjustment of width or duty cycle of pulses
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
7.
DEBUGGING INSTRUCTION EXECUTION ERRORS IN A SIMULATED COMPUTER SYSTEM
A method to debug instruction execution errors in a simulated computer system is provided. The method includes generating two separate simulations of the same system and causing code including a set of instructions to execute on the two separate simulations. The computer-implemented method further includes performing an efficient trace operation from a start instruction to an end instruction of the set of instructions on the two separate simulations. When the trace operation is performed, an instruction execution deviation is identified between the code executed in the two separate simulations by comparing checksum values at a reporting frequency, determining that the comparison of the checksum values indicates a mismatch, and using an instruction count and the reporting frequency to capture at least one instruction leading up to the instruction execution deviation.
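The checksum-based narrowing can be sketched as follows; the use of CRC32 over a repr of the simulated state is an assumed stand-in for whatever checksum the method actually computes:

```python
import zlib

def checksums(trace, report_every):
    """Running CRC32 of the simulated state, sampled every
    `report_every` instructions (the reporting frequency)."""
    out, acc = [], 0
    for i, state in enumerate(trace, 1):
        acc = zlib.crc32(repr(state).encode(), acc)
        if i % report_every == 0:
            out.append(acc)
    return out

def first_divergent_window(trace_a, trace_b, report_every):
    """Compare checksums instead of every instruction; on the first
    mismatch, the instruction count and reporting frequency bound the
    window of instructions containing the deviation."""
    pairs = zip(checksums(trace_a, report_every),
                checksums(trace_b, report_every))
    for k, (ca, cb) in enumerate(pairs):
        if ca != cb:
            return (k * report_every, (k + 1) * report_every)
    return None
```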
A method and a system for dynamically deriving and verifying a measure of a computing environment are presented. The proposed method and system are used to reliably verify measurements of the computing environment. The method includes receiving a dataset recorded by an untrusted source describing elements used to create a computing system operating in a computing environment, receiving attestation evidence generated by a trusted source including an initial measurement value describing the elements of the computing system, deriving a measurement value based on the received dataset, and performing a verification process on a measurement of the computing environment. The verification process is performed by comparing the derived measurement value with the measurement value of the attestation evidence. In response to the derived measurement value and the measurement value of the attestation evidence being equal, the computing environment is determined to be trustworthy.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
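The derive-and-compare verification described in the abstract above can be sketched in a few lines; SHA-256 over the ordered elements is an assumed stand-in for the real measurement algorithm, which is not given:

```python
import hashlib

def derive_measurement(elements):
    """Derive a measurement from the untrusted dataset by hashing the
    recorded elements in order (stand-in for the real algorithm)."""
    h = hashlib.sha256()
    for name, data in elements:
        h.update(name.encode())
        h.update(data)
    return h.hexdigest()

def verify(untrusted_dataset, attested_measurement):
    """The environment is trustworthy only if the value derived from the
    untrusted dataset equals the trusted attestation measurement."""
    return derive_measurement(untrusted_dataset) == attested_measurement
```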
9.
Method and Apparatus for Efficient Packing of Flow Control Units
Messages and data are dynamically selected for packing into an information packet for transmission across a communication link of the data processing network. The number of messages (zero or more) of each message kind to be packed is determined based, at least in part, on the number of slots available to be packed, on the number of pending messages of each kind and on a dynamically determined priority setting. The priority may be user controlled, dependent on input backpressure and/or dependent on target loading, for example. The number of messages of each message kind to be packed may be determined using a hardware lookup table.
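One way to picture the selection step, with a priority-ordered scan standing in for both the dynamically determined priority setting and the claimed hardware lookup table:

```python
def pack_flit(slots, pending, priority):
    """Choose how many messages of each kind to pack into the available
    slots, taking kinds in priority order until the slots run out."""
    packed = {}
    for kind in priority:
        n = min(pending.get(kind, 0), slots)   # zero or more of this kind
        if n:
            packed[kind] = n
            slots -= n
        if slots == 0:
            break
    return packed
```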
A technique is provided for performing a computation equivalent to applying a shift to an input value to generate an output value. Mask generation circuitry is used to generate an N-bit mask in dependence on a provided shift amount indication. N is a number of possible bit positions that a given bit of the input value may be located within the output value after the shift is performed. The mask generation circuitry performs N independent logical operations on bits forming the shift amount indication, each logical operation producing a mask bit value for a corresponding bit position of the N-bit mask, and the N logical operations being arranged such that, for any given shift amount indication, only one bit position in the generated N-bit mask will have its mask bit value indicating a set state. Output value generation circuitry is used to apply the N-bit mask to the given bit of the input value in order to determine a corresponding location of the given bit within the output value, and to determine a location within the output value of each other bit of the input value in dependence on the corresponding location of the given bit.
G06F 7/76 - Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
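The mask generation of the shift technique above can be sketched as a one-hot decoder: each mask bit is an independent AND over the shift-amount bits or their complements, so exactly one bit is set for any shift amount. The wrap-free left shift assumed here is illustrative:

```python
def one_hot_mask(shift, n_bits):
    """N independent logical operations on the shift-amount bits: mask
    bit p ANDs each shift bit (or its complement), so it is 1 exactly
    when the shift amount equals p."""
    width = max(1, (n_bits - 1).bit_length())
    shift_bits = [(shift >> i) & 1 for i in range(width)]
    mask = []
    for p in range(n_bits):
        bit = 1
        for i, b in enumerate(shift_bits):
            bit &= b if (p >> i) & 1 else 1 - b
        mask.append(bit)
    return mask

def output_position(mask, input_bit_pos):
    """Apply the one-hot mask: the set bit's index gives the shift
    distance, locating the given input bit within the output value."""
    return input_bit_pos + mask.index(1)
```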
11.
TECHNIQUE FOR HANDLING ORDERING CONSTRAINED ACCESS OPERATIONS
Processing circuitry is provided to perform operations, along with instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions. A set of registers is used to hold data values for access by the processing circuitry. The instruction decoder circuitry is responsive to an ordering constrained access instruction used to access multiple data values, and providing register indication information and memory address information, to control the processing circuitry to perform a sequence of access operations, where each access operation causes a data value from amongst the multiple data values to be moved between an associated register determined from the register indication information and an associated memory address determined from the memory address information. Further, an ordering indication is derived from the ordering constrained access instruction and used to determine an order in which the multiple data values are to be accessed when performing the sequence of access operations, to thereby ensure that observability conditions required when implementing the ordering constrained access instruction are met.
A graphics processing system comprises a programmable processing unit operable to execute processing programs for execution threads corresponding to work items to be processed, and storage 74 in which a respective storage region can be allocated for temporary use by a respective group of execution threads corresponding to a group of work items being executed by the programmable processing unit while the group of execution threads are being executed. Respective indicators (e.g. clear bits) are provided to indicate when respective regions of the storage are to be cleared.
An apparatus for address translation is provided in order to translate virtual addresses used by devices in a data processing system into physical addresses for accessing memory. In accordance with the techniques disclosed herein, state tracking circuitry is provided to maintain the state of a page table entry that specifies such address translations. The state can be used to assess whether or not the address translation entry is worth storing in an address translation cache provided for faster access of previously used address translations. Accordingly, the techniques disclosed herein allow for more efficient use of the limited capacity available in the address translation cache as well as additional uses of a page table entry's state.
G06F 12/1045 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
There is provided an apparatus in which processing circuitry performs processing in one of a fixed number of at least two domains, one of the domains being subdivided into a variable number of execution environments. Memory translation circuitry, in response to a memory access request to a given memory address, determines a given encryption environment identifier associated with the one of the execution environments and forwards the memory access request together with the given encryption environment identifier. Storage circuitry stores a plurality of entries, each associated with an associated encryption environment identifier and an associated memory address. The storage circuitry includes determination circuitry that determines, in at least one enabled mode of operation, whether the given encryption environment identifier differs from the associated encryption environment identifier associated with one of the entries associated with the given memory address.
An apparatus comprises request receiving circuitry to receive a given memory system request specifying a target address in a given physical address space and a target memory encryption context identifier (MECID) indicative of a selected memory encryption context associated with the memory system request. Snoop filtering circuitry determines whether a snoop request is to be transmitted to a given caching agent in response to the given memory system request. The snoop filtering circuitry determines, based on the target MECID of the given memory system request and on snoop filtering information associated with the given caching agent, whether the target MECID is a snoop-not-required MECID for the given caching agent. In response to determining that the target MECID is a snoop-not-required MECID for the given caching agent, the snoop filtering circuitry suppresses transmission of a snoop request to the given caching agent in response to the given memory system request.
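A toy model of the snoop-suppression decision; the per-agent set is an assumption, since the claim does not specify how the snoop filtering information is stored:

```python
class SnoopFilter:
    """Per caching agent, record MECIDs for which no cached line can
    exist, so snoops for those MECIDs can be suppressed."""
    def __init__(self):
        self.snoop_not_required = {}   # agent -> set of MECIDs

    def mark_not_required(self, agent, mecid):
        self.snoop_not_required.setdefault(agent, set()).add(mecid)

    def needs_snoop(self, agent, mecid):
        """Transmit a snoop unless the target MECID is recorded as
        snoop-not-required for this agent."""
        return mecid not in self.snoop_not_required.get(agent, set())
```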
An apparatus has cache circuitry providing a cache storage to store data for access by processing circuitry, and request handling circuitry arranged to process requests, each request providing an address indication for associated data. The request handling circuitry determines with reference to the address indication whether the associated data is available in the cache circuitry. The cache circuitry forms a given level of a multi-level memory hierarchy, and the request handling circuitry is responsive to determining that the associated data is unavailable in the cache circuitry to issue an onward request to cause the associated data to be retrieved into the cache circuitry from a lower level of the multi-level memory hierarchy than the given level. Prefetch circuitry issues, as one type of request to be handled by the request handling circuitry, prefetch requests, and the request handling circuitry is arranged in response to a given prefetch request to retrieve into the cache circuitry the associated data in anticipation of that associated data being requested by the processing circuitry. In addition, trigger circuitry, responsive to a specified condition being detected in respect of the given prefetch request, issues a prefetch trigger signal for receipt by control circuitry associated with further cache circuitry at a higher level of the multi-level memory hierarchy, to cause a higher level prefetch procedure to be triggered by the control circuitry to retrieve the associated data from the cache circuitry into the further cache circuitry.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
Disclosed is a method of operating a graphics processor when performing a processing pass that includes an initial “pilot” processing job that executes a respective initial “pilot” shader program that is to be executed in advance of a corresponding “main” shader program that will be executed for a separate “main” processing job within the same processing pass. A “main” processing job is permitted to be issued for processing concurrently with an initial “pilot” processing job on which it depends. To enforce dependencies between “main” and “pilot” shader execution in this case it is tracked whether any initial “pilot” processing jobs are currently being processed by the set of one or more processing cores and processing of tasks for “main” processing jobs is controlled accordingly.
Processing circuitry (4) performs data processing in response to instructions. Memory management circuitry (28) controls access to memory based on page table information capable of associating a given page of memory address space with a read-as-X property indicative that reads to an address in the given page of memory address space should be treated as returning a specified value X. In response to determining, for a read request issued to read a read target value for a read target block of memory address space, that at least part of the read target block corresponds to a page associated with the read-as-X property, the memory management circuitry (28) controls the specified value X to be returned to the processing circuitry (4) as at least part of the read target value. This enables large regions of memory address space to be treated as storing a specified value without needing to commit physical memory for those regions.
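The read-as-X behaviour can be modelled in a few lines; the page size and the dictionary-backed memory are illustrative assumptions:

```python
class ReadAsXMemory:
    """Pages marked read-as-X return the specified value X on any read
    without consulting (or committing) physical memory."""
    def __init__(self, page_size=4096, x=0):
        self.page_size = page_size
        self.x = x
        self.read_as_x_pages = set()
        self.memory = {}     # only pages with committed data appear here

    def mark_read_as_x(self, page):
        self.read_as_x_pages.add(page)

    def read(self, addr):
        if addr // self.page_size in self.read_as_x_pages:
            return self.x    # no physical backing needed
        return self.memory.get(addr, 0)
```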
When sampling a 3D texture using anisotropic filtering, an anisotropy direction along which to take samples in the texture is determined by determining reduced precision representations of the texture coordinate derivative vectors and using the reduced precision texture coordinate derivative vectors to determine a pair of vectors representing the directions of x and y axes for a 2D coordinate system on the plane in the 3D texture defined by the texture coordinate derivative vectors. The x and y axis vectors are used together with the texture coordinate derivative vectors to determine both a X-axis component and a Y-axis component for projected representations of the texture coordinate derivative vectors in the 2D coordinate system on the plane in the 3D texture defined by the texture coordinate derivative vectors. The projected representations of the texture coordinate derivative vectors are then used to determine the anisotropy direction.
An apparatus comprises instruction decoding circuitry to decode a cryptographic hash instruction specifying at least one working operand and an input operand; and processing circuitry to perform, in response to decoding of the cryptographic hash instruction, two or more iterations of a cryptographic hash function. Each iteration of the cryptographic hash function comprises determining an updated value for the at least one working operand based on a previous value for the at least one working operand and a respective portion of the input operand selected to be processed in that iteration. The updated value for the at least one working operand in one iteration becomes the previous value for the at least one working operand in a next iteration. In response to decoding of the cryptographic hash instruction, the processing circuitry performs at least two iterations of the cryptographic hash function per processing cycle.
There is provided an apparatus, a method, and a storage medium. The apparatus comprises one or more requestor devices to issue transaction requests, and one or more target devices to service those requests. The requestor devices and the target devices are configured to fulfil the requests according to a request ordering protocol specifying an ordered write observation behaviour in which, for each write transaction in a group of ordered write transactions, at least a deferred portion of the write transaction is deferred until all data specified in the group of ordered write transactions preceding the write transaction are observable. When implementing the request ordering protocol, the target devices are responsive to control information taking a first value to dynamically enable the ordered write observation behaviour, and the one or more target devices are responsive to the control information taking a second value to dynamically disable the ordered write observation behaviour.
A data processing system, the data processing system comprising a command processing unit and a processor that is configured to perform processing, the processor comprising: multiple execution units configured to perform processing operations for a type of work; and a control circuit configured to distribute processing tasks to the multiple execution units to cause the multiple execution units to perform processing operations for the type of work in response to asynchronous commands provided to the control circuit by the command processing unit; wherein dependency tracking information is compared against an array of counters to indicate dependencies within the array of counters; wherein the indicated dependencies are provided to the control circuit by the command processing unit in the asynchronous commands to indicate for the type of work that dependencies have been resolved or that dependencies exist.
A data processing system, the data processing system comprising a processor that is configured to perform neural network processing, the processor comprising: at least one execution unit configured to perform processing operations for neural network processing; and a control circuit configured to distribute processing tasks to the at least one execution unit to cause the at least one execution unit to perform processing operations for neural network processing in response to a set of indications of neural network processing to be performed provided to the control circuit; wherein the processing tasks are asynchronous and comprise a dependency on at least one other processing task, the set of indications of neural network processing to be performed comprising an indication flag to indicate whether the execution unit can be caused to operate with a dependency on at least one other asynchronous processing task being unresolved.
An apparatus is provided with asynchronous boundary transfer circuitry to transfer data across a clock domain boundary. The asynchronous boundary transfer circuitry has buffer circuitry with buffer storage elements, and source and sink synchronisation circuitry to control the transfer of the data. To initiate a transfer of data items, the source synchronisation circuitry sends a transfer request to the sink synchronisation circuitry indicating that the data items have been stored in one or more buffer storage elements and encoding an indication of one or more elements of destination circuitry targeted by the data items. The sink synchronisation circuitry is responsive to a transfer request to decode an indication of the elements of destination circuitry targeted by the data items, provide incoming data item notifications to elements of destination circuitry, and allow the data items to be read from buffer storage elements indicated by the given transfer request.
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
G06F 1/12 - Synchronisation of different clock signals
Data processing apparatus comprises vector processing circuitry to access an array register having at least n×n storage locations, where n is an integer greater than one, the vector processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry. The instruction decoder circuitry is responsive to an array access instruction, to control the instruction processing circuitry to access, for a vector of n vector elements, a set of n storage locations each having a respective array location in the array register. The array location accessed for a given vector element of the vector is defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction.
A method for detecting defective pixel values determines an isotropic dispersion difference value by determining a ratio or difference between a pixel error and an isotropic dispersion, where the isotropic dispersion is a measure of how much pixel values in a set of neighbouring pixel values uniformly distributed around the pixel under consideration vary. The method compares the isotropic dispersion difference value to an isotropic threshold. A directional dispersion difference value is found by determining a ratio or difference between a pixel error and a directional dispersion, wherein the directional dispersion is a weighted measure of how much pixel values in a set of neighbouring pixel values around the pixel under consideration in a given direction vary. The directional dispersion difference value is compared to a directional threshold and it is determined that the pixel under consideration is defective based on at least one of the comparison results.
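A hedged sketch using the ratio form of the comparisons; mean absolute deviation stands in for the unspecified dispersion measure, and the final decision is taken here as an OR of the two comparison results:

```python
def dispersion(values):
    """Mean absolute deviation, an assumed dispersion measure."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def is_defective(pixel, iso_neighbours, dir_neighbours,
                 iso_threshold, dir_threshold):
    """Flag the pixel when its error dominates either the isotropic
    dispersion (uniformly distributed neighbours) or the directional
    dispersion (neighbours along one direction)."""
    eps = 1e-9   # guard against perfectly flat neighbourhoods
    iso_err = abs(pixel - sum(iso_neighbours) / len(iso_neighbours))
    dir_err = abs(pixel - sum(dir_neighbours) / len(dir_neighbours))
    return (iso_err / (dispersion(iso_neighbours) + eps) > iso_threshold
            or dir_err / (dispersion(dir_neighbours) + eps) > dir_threshold)
```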
According to one implementation of the present disclosure, an integrated circuit comprises: a memory macro unit including: one or more bitcells of one or more bitcell arrays, where a wordline or a bitline is at least partially disposed within a backside metal layer of the memory macro unit. In one implementation, a method comprises: transmitting, by a first wire of wiring, one or more control signals, where the first wire is disposed at least partially within a backside metal layer. In one implementation, an integrated circuit comprises: a wire configured to transmit one or more control signals, where the wire is disposed at least partially on a backside metal layer.
A data processing apparatus is provided. Decode circuitry decodes an instruction in a stream of instructions as a conditional branch instruction. Prediction circuitry performs a prediction of the conditional branch instruction in respect of a flow of the stream of instructions. Training circuitry receives and stores data associated with one or more executions of the conditional branch instruction. Generation circuitry generates the prediction based on the data and filter circuitry performs filtering to disregard a subset of the data, in dependence on whether the prediction is that the conditional branch instruction is of a specific type.
A persistent history buffer may be maintained in training a recurrent neural network such that information from at least one prior group of sequential training parameters within a training sequence is maintained for a subsequent group of training parameters. The persistent history buffer may be provided as an input to the recurrent neural network, and may store a history of a state of the recurrent neural network such as an input, an output and/or the state of a hidden layer. The persistent history buffer may be reset at the end of a sequence of input training parameters, which in a further example may span training input windows and/or batches.
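A minimal sketch of the buffer's lifecycle; the state layout and the reset-at-sequence-end policy follow the description, while the `rnn_step` callback is a placeholder for the actual network:

```python
class PersistentHistoryBuffer:
    """Holds recurrent state across training windows within one
    sequence; reset only at a sequence boundary."""
    def __init__(self, size):
        self.size = size
        self.state = [0.0] * size

    def reset(self):
        self.state = [0.0] * self.size

def train_sequence(rnn_step, buffer, windows):
    """Feed successive windows of one training sequence; state carried
    in the buffer persists across window boundaries, and the buffer is
    reset once the whole sequence has been consumed."""
    for window in windows:
        for x in window:
            buffer.state = rnn_step(buffer.state, x)
    final = list(buffer.state)
    buffer.reset()
    return final
```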
A method of processing streamed image data is provided. A stream of image data is obtained along with information about the location of one or more regions of interest in the image data. The method performs a first spatial processing to change a spatial resolution of at least a portion of the streamed image data in dependence upon the location of the region of interest to generate a first processed stream. Image signal processing is performed on the first processed stream to generate a stream of processed image data. A second spatial processing is then performed on the stream of processed image data to generate a second processed stream of image data.
G06V 10/56 - Extraction of image or video features relating to colour
H04N 25/46 - Extracting pixel data from image sensors by controlling scanning circuits, e.g. by modifying the number of pixels sampled or to be sampled by combining or binning pixels
A method for trace generation comprises: obtaining input trace data indicative of a sequence of events occurring during execution of a target program on a processor; providing a query input to a trained generative machine learning model, where the query input is based on the input trace data; and processing the query input using the trained generative machine learning model to generate predicted trace data providing a more detailed representation of the sequence of events than is indicated by the input trace data.
Data processing systems, methods, computer program products, devices, and graphics processors are provided that substantially remove, or reduce, latencies introduced or incurred by a (host) processor, e.g. a central processing unit (CPU), during virtualisation, in which virtual machines that are operable to execute on the (host) processor are scheduled or assigned to the graphics processor in a time-slice manner.
A method for compressing data representative of a mapping for use in image processing. The method comprises determining, based on a plurality of mappings representing a look-up table, parameters of a function for transforming a given set of input pixel attribute values into a set of estimated output pixel attribute values. The method comprises, for a plurality of the sets of input pixel attribute values, determining, based on the function and the set of input pixel attribute values, a set of approximate values of the associated set of output pixel attribute values, and determining, based on the associated set of output pixel attribute values and the set of approximate values, a set of residual output pixel attribute values. The method comprises storing data representative of the parameters of the function and data representative of the sets of residual output pixel attribute values.
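For a one-dimensional table the scheme reduces to fitting a function and storing residuals; the two-point linear fit below is an illustrative stand-in for however the function's parameters are actually determined:

```python
def compress_lut(lut):
    """Fit a line through the table's endpoints and keep only the line's
    parameters plus a per-entry residual (lossless given exact residuals)."""
    xs = sorted(lut)
    x0, x1 = xs[0], xs[-1]
    slope = (lut[x1] - lut[x0]) / (x1 - x0)
    bias = lut[x0] - slope * x0
    residuals = {x: lut[x] - (slope * x + bias) for x in xs}
    return (slope, bias), residuals

def decompress(params, residuals, x):
    """Reconstruct an output value from the function plus its residual."""
    slope, bias = params
    return slope * x + bias + residuals[x]
```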
A method for processing an image comprising image data. The image data comprises pixel intensity values, said pixel intensity values being associated with respective pixel locations. For a plurality of zones of the image, based on a plurality of pixel intensity values in the respective zone of the image, a value of a characteristic pertaining to the plurality of pixel intensity values is determined. At least a spatial filtering process is performed on data representative of the values of the characteristic for the plurality of zones, to obtain filtered values of an image characteristic at respective locations. The filtered values are interpolated to determine a local value of the image characteristic at a said pixel location. The determining and/or the interpolating is performed using fixed function circuitry. The spatial filtering process is performed using a programmable processor.
An apparatus is provided. Delegable memory accesses are offloaded to be performed by an external processing apparatus, whereas non-delegable memory accesses are performed locally. Nonetheless, an ordering requirement may still be enforced between them. The apparatus comprises tracking circuitry to maintain tracking information related to delegable memory accesses separately from tracking information related to non-delegable memory accesses. Order enforcement circuitry may enforce an ordering requirement between a non-delegable memory access and a delegable memory access based on a lookup of the tracking information.
G06F 12/0804 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
G06F 12/0873 - Mapping of cache memory to specific storage devices or parts thereof
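The separate-tracking scheme of the entry above can be sketched as follows. The two-pool structure and the fence-style ordering rule are illustrative assumptions; the abstract does not disclose a concrete data structure.

```python
# Sketch: track delegable (offloaded) and non-delegable (local) memory
# accesses in separate structures, and enforce an ordering requirement
# between the two classes by a lookup of the opposite pool.

class AccessTracker:
    def __init__(self):
        self.pending_delegable = set()      # ids offloaded to the external apparatus
        self.pending_non_delegable = set()  # ids performed locally

    def issue(self, access_id, delegable):
        pool = self.pending_delegable if delegable else self.pending_non_delegable
        pool.add(access_id)

    def complete(self, access_id):
        self.pending_delegable.discard(access_id)
        self.pending_non_delegable.discard(access_id)

    def may_proceed(self, delegable):
        # Illustrative ordering rule: an access may not start while the
        # *other* class still has pending accesses. The two pools never
        # need merging; a lookup of the opposite pool suffices.
        other = self.pending_non_delegable if delegable else self.pending_delegable
        return not other

t = AccessTracker()
t.issue(1, delegable=True)
assert not t.may_proceed(delegable=False)  # blocked behind delegable access 1
t.complete(1)
assert t.may_proceed(delegable=False)
```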
36.
TECHNIQUE FOR HANDLING DATA ELEMENTS STORED IN AN ARRAY STORAGE
An apparatus is provided comprising processing circuitry to perform operations, instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions, and array storage comprising storage elements to store data elements. The array storage is arranged to store at least one two dimensional array of data elements accessible to the processing circuitry when performing the operations, each two dimensional array of data elements comprising a plurality of vectors of data elements, where each vector is one dimensional. The instruction decoder circuitry is arranged, in response to a move and zero instruction that identifies one or more vectors of data elements of a given two dimensional array of data elements within the array storage, to control the processing circuitry to move the data elements of the one or more identified vectors from the array storage to a destination storage and to set to a logic zero value the storage elements of the array storage that were used to store those data elements.
An apparatus is provided comprising processing circuitry to perform operations, instruction decoder circuitry to decode instructions to control the processing circuitry to perform the operations specified by the instructions, and array storage comprising storage elements to store data elements. The array storage is arranged to store at least one two dimensional array of data elements accessible to the processing circuitry when performing the operations, each two dimensional array of data elements comprising a plurality of vectors of data elements, where each vector is one dimensional. The instruction decoder circuitry is arranged, in response to decoding a zero vectors instruction that identifies multiple vectors of data elements of a given two dimensional array of data elements within the array storage, to also decode a subsequent accumulate instruction arranged to operate on the identified multiple vectors of data elements, and to control the processing circuitry to perform a non-accumulating variant of an accumulate operation specified by the accumulate instruction to produce result data elements for storing in the identified multiple vectors within the array storage.
A data processor comprising an execution engine 51 for executing programs for execution threads and one or more caches 48, 49 operable to store data values for use when executing program instructions to perform processing operations for execution threads. The data processor further comprises a thread throttling control unit 54 configured to monitor the operation of the caches 48, 49 during execution of programs for execution threads, and to control the issuing of instructions for execution threads to the execution engine for executing a program based on the monitoring of the operation of the caches during execution of the program.
A method of operating a personal intelligent agent in an ambient computing environment, comprising receiving input; analyzing input to derive a user personal preference; associating the personal preference with a first context indicator; determining whether the personal preference is exposable; responsive to determining that the personal preference is exposable, storing the preference with the associated context indicator; detecting when the agent enters a detectable context and responsively creating a second context indicator; determining if there is a match between the second and the first context indicator; retrieving the exposable personal preference associated with the context indicator; creating an anonymous preference indicator comprising the exposable personal preference with the matched context; emitting the preference indicator over the ambient computing environment; and monitoring the ambient computing environment to detect any broadcast message indicating ability to satisfy the preference shown in the preference indicator.
The present disclosure relates to a data processor for processing data, comprising: a plurality of execution units to execute one or more operations; and a plurality of storage elements to store data for the one or more operations, the data processor being configured to process at least one task, each task to be executed in the form of a directed acyclic graph of operations, wherein each of the operations maps to a corresponding execution unit and each connection between operations in the acyclic graph maps to a corresponding storage element, the data processor further comprising: a plurality of counters; and a control module to control the plurality of counters to: in a first mode, count an operation cycle number associated with each operation of the at least one task, the operation cycle number of an operation being a number of cycles required to complete the operation; and in a second mode, count a unit cycle number associated with one or more execution units, the unit cycle number of an execution unit being an accumulative number of cycles when the execution unit is occupied in use during execution of the at least one task.
Apparatus, method and code for fabrication of an apparatus. The apparatus comprises address translation circuitry (116) to translate virtual addresses to physical addresses in response to advance address translation requests issued by devices (105) on behalf of software contexts (125). The apparatus also comprises translated access control circuitry (117) to control access to memory (110) in response to translated access requests issued by the devices (105) on behalf of the software contexts (125), based on permissions information defined in a device permission table (220), wherein the corresponding access permissions provide information for checking whether translated access requests from a plurality of software contexts are prohibited.
A spiking neural network is described that comprises a plurality of neurons in a first layer connected to at least one neuron in a second layer, each neuron in the first layer being connected to the at least one neuron in the second layer via a respective variable delay path. The at least one neuron in the second layer comprises one or more logic components configured to generate an output signal in dependence upon signals received along the variable delay paths from the plurality of neurons in the first layer. A timing component is configured to determine a timing value in response to receiving the output signal from the one or more logic components, and an accumulate component is configured to accumulate a value based on timing values from the timing component. A neuron fires in a case that the value accumulated at the accumulate component reaches a threshold value.
A graphics processing system that is operable to perform ray tracing using micromaps is disclosed. A tree representation of a micromap is generated, and when it is desired to determine whether and/or how a ray interacts with a sub-region of a primitive, the tree representation of the micromap is traversed to determine a property value for the sub-region of the primitive.
An apparatus is described having processing circuitry to perform vector processing operations, a set of vector registers, and an instruction decoder to decode vector instructions to control the processing circuitry to perform the required operations. The instruction decoder is responsive to a given vector memory access instruction specifying a plurality of memory access operations, where each memory access operation is to be performed to access an associated data element, to determine, from a data vector indication field of the given vector memory access instruction, at least one vector register in the set of vector registers associated with a plurality of data elements, and to determine, from at least one capability vector indication field of the given vector memory access instruction, a plurality of vector registers in the set of vector registers containing a plurality of capabilities. Each capability is associated with one of the data elements in the plurality of data elements and provides an address indication and constraining information constraining use of that address indication when accessing memory. The number of vector registers determined from the at least one capability vector indication field is greater than the number of vector registers determined from the data vector indication field. The instruction decoder controls the processing circuitry: to determine, for each given data element in the plurality of data elements, a memory address based on the address indication provided by the associated capability, and to determine whether the memory access operation to be used to access the given data element is allowed in respect of that determined memory address having regard to the constraining information of the associated capability; and to enable performance of the memory access operation for each data element for which the memory access operation is allowed.
An apparatus has processing circuitry (16) to perform data processing, and instruction decoding circuitry (10) to control the processing circuitry to perform the data processing in response to decoding of program instructions defined according to a scalable vector instruction set architecture supporting vector instructions operating on vectors of scalable vector length to enable the same instruction sequence to be executed on apparatuses with hardware supporting different maximum vector lengths. The instruction decoding circuitry and the processing circuitry support a sub-vector-supporting instruction which treats a given vector as comprising a plurality of sub-vectors with each sub-vector comprising a plurality of vector elements. In response to the sub-vector-supporting instruction, the instruction decoding circuitry controls the processing circuitry to perform an operation for the given vector at sub-vector granularity. Each sub-vector has an equal sub-vector length.
One or more lighting components are projected onto pixel locations of a rendered image with sampling locations set off from pixel location centers according to associated jitter vectors. The sampled image is denoised in a way that preserves the associated jitter vectors; the denoising may be performed separately for different lighting components. The denoised image is processed using upsampling and/or temporal antialiasing, using the associated jitter vectors, to an image format having a spatial resolution at least as high as that of the denoised image.
Various implementations described herein are directed to a device having a write circuit that provides data for storage. The device may include a memory circuit that stores the data in leaky bitcells with capacitive elements that gradually discharge over a pre-determined period of time. The device may include a read circuit that enables the leaky bitcells to operate as one or more memory storage elements. The device may include a query circuit that identifies matches between a query data and output data provided by the read circuit.
In a data processing system, a command stream provided to a processing resource to cause the processing resource to perform a processing task for an application executing on a host processor comprises a sequence of commands for execution by the processing resource to cause the processing resource to perform the processing operations for the processing task and one or more data save indicators that indicate data that is to be saved. In response to the processing resource receiving a request to suspend processing of the processing task, data indicated by one of the one or more data save indicators in the command stream is stored in memory.
The present techniques relate to voltage droop detection and there is disclosed circuitry for detecting a voltage droop event, the circuitry configured to: receive a clock signal from a clock distribution network; obtain, from a storage, a first predetermined value, a second predetermined value and a predetermined threshold count; obtain one or more measurement values associated with a system voltage; when a first measurement value of the one or more measurement values reaches the first predetermined value, initiate a count of clock cycles until a subsequent measurement value of the one or more measurement values reaches the second predetermined value, the second predetermined value being different from the first predetermined value; and when the count of clock cycles is lower than the predetermined threshold count, cause a control entity to take mitigation action.
G01R 19/165 - Indicating that current or voltage is either above or below a predetermined value or within or outside a predetermined range of values
G06F 1/30 - Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
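The thresholded cycle-counting of the droop-detection entry above can be sketched as follows. The threshold values, the sample traces, and the function name are illustrative assumptions; a real implementation would operate on hardware measurement values, not a Python list.

```python
# Sketch: flag a droop as fast (needing mitigation) when the voltage falls
# from a first predetermined value to a second, lower predetermined value
# in fewer clock cycles than a predetermined threshold count.

def droop_detect(samples, first_value, second_value, threshold_count):
    """samples: one voltage measurement per clock cycle.
    Returns True when mitigation action should be requested."""
    counting = False
    count = 0
    for v in samples:
        if not counting:
            if v <= first_value:        # first threshold crossed: start counting
                counting = True
                count = 0
        else:
            count += 1
            if v <= second_value:       # second threshold crossed: judge the slope
                return count < threshold_count
    return False

fast_droop = [1.00, 0.97, 0.93, 0.89, 0.85]                    # steep fall
slow_drift = [1.00, 0.95, 0.94, 0.93, 0.92, 0.91, 0.90, 0.89]  # gentle fall
assert droop_detect(fast_droop, 0.95, 0.90, 4) is True
assert droop_detect(slow_drift, 0.95, 0.90, 4) is False
```

The count of cycles between the two crossings is effectively a slope measurement: only a fast-falling supply reaches the second value inside the cycle budget.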
The present techniques relate to mitigating droop conditions over state transitions in systems having dynamic voltage and frequency scaling and there is disclosed a method of controlling a dynamic voltage and frequency scaling circuit, comprising: initiating a transition from a first voltage and frequency state to a second voltage and frequency state; switching activity from a first nominal source to a first fallback source; retuning the first nominal source to become a second fallback source at the second voltage and frequency state; switching activity from the first fallback source to the second fallback source; retuning the first fallback source to become a second nominal source at the second voltage and frequency state; and switching activity from the second fallback source to the second nominal source.
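The ping-pong handover order in the entry above can be sketched with a two-source mux. The class, the state labels, and the checked invariant are illustrative assumptions; the point of the sequence is that a clock source is never retuned while it is driving activity.

```python
# Sketch of the clock-source handover during a DVFS transition from
# state S1 to state S2: switch, retune the idle source, switch, retune
# the other idle source, switch.

class ClockMux:
    def __init__(self):
        self.sources = {"A": "nominal@S1", "B": "fallback@S1"}
        self.active = "A"

    def switch_to(self, name):
        self.active = name

    def retune(self, name, new_role):
        # Invariant the sequence is built around: only idle sources retune.
        assert name != self.active, "never retune the active source"
        self.sources[name] = new_role

mux = ClockMux()
mux.switch_to("B")                 # activity: first nominal -> first fallback
mux.retune("A", "fallback@S2")     # first nominal becomes second fallback
mux.switch_to("A")                 # activity: first fallback -> second fallback
mux.retune("B", "nominal@S2")      # first fallback becomes second nominal
mux.switch_to("B")                 # activity: second fallback -> second nominal
assert mux.sources[mux.active] == "nominal@S2"
```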
The present techniques relate to mitigating droop conditions in systems having dynamic voltage and frequency scaling and there is disclosed a method of controlling a dynamic voltage and frequency scaling circuit, comprising: detecting a voltage droop relative to a current nominal voltage and frequency state; responsive to said current nominal voltage and frequency state having a corresponding fallback state in a safe operating zone of voltage and frequency, switching activity from a nominal source to a fallback source; and when a fallback to a safe operating zone is unavailable for said current nominal voltage and frequency state, pausing activity of the dynamic voltage and frequency scaling circuit.
The present techniques relate to monitoring of operating parameters at a circuit and disclose a method comprising: receiving, at a delay monitor from a power delivery network, a voltage signal representative of the voltage level of the power delivery network; receiving, at the delay monitor from a clock distribution network, a clock signal representative of an output clock of the clock distribution network; periodically generating, at the delay monitor, a measurement value responsive to the voltage signal and the clock signal; adjusting, at the delay monitor, a threshold level for the measurement value from a first threshold to a second threshold, where the second threshold level corresponds to a target voltage level; and providing, from the delay monitor to the clock distribution network, a non-violation signal responsive to the measurement value reaching the second threshold.
The present techniques relate to clock control schemes and disclose circuitry for providing a clock signal to a sub-system of a processor, the circuitry comprising: a first clock selection stage to receive clock signals from a plurality of clock sources and, responsive to one or more first control signals, provide first and second clock signals to a second selection stage; and a second selection component at the second selection stage to, responsive to one or more second control signals, select one of the first and second clock signals and output the selected clock signal as a mitigated clock signal.
The present techniques relate to a clock control scheme and related methods and circuitry in a system comprising one or more processor cores, and there is disclosed a control state machine in a clock controller circuit comprising: a sender operable to signal a request to a subordinate state machine and to store a request sent indicator in a store; a first receiver operable to receive an acknowledgement indicator signalled by the subordinate state machine and to clear the request sent indicator in the store; a delay component operable to hold the control state machine in a wait state; and a second receiver operable to receive a request complete indicator signalled by the subordinate state machine; the delay component being responsive to receipt of the request complete indicator to release the control state machine from the wait state.
The present techniques relate to monitoring of a clock signal at a circuit and disclose a method comprising: receiving, at a delay monitor, a gateable clock signal; analysing, by the delay monitor, the clock signal to generate a measurement value, wherein the measurement value is responsive to the clock signal and/or a voltage; and comparing, by the delay monitor, the measurement value with a threshold; and storing the measurement value for further analysis when the comparison does not meet the threshold; or discarding the measurement value when the measurement value meets the threshold.
Various implementations described herein are directed to a device having an array of bitcells with a first bitcell disposed adjacent to a second bitcell. The device may have a first wordline coupled to first transistors in the first bitcell, and the device may have a second wordline coupled to second transistors in the second bitcell. Also, the device may have a buried ground line coupled to the first transistors and the second transistors.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elementsStorage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
An apparatus has bridge circuitry to communicate transport packets between a transport network and data processing circuitry. Some or all of the data processing circuitry operates in a given reset domain other than the reset domain of the transport network. The apparatus also has packet tracking circuitry to monitor the transport packets received at the bridge circuitry and to track whether any required responses have been provided. Reset handling circuitry is responsive to a reset request for the given reset domain to: cause the bridge circuitry to reject new transport packets received for the domain; and when the packet tracking circuitry indicates that all required responses have been provided, accept the reset request for the given reset domain and allow the data processing circuitry to carry out a reset for the data processing circuitry operating in the given reset domain.
There is described a method of monitoring an electronic circuit voltage droop response; the method comprising: switching activity, in response to a voltage droop event, from a nominal clock source to a fallback clock source; and, optionally, switching activity, in response to a voltage recovery event, from a fallback clock source to a nominal clock source; wherein a voltage recovery event comprises a predetermined duration according to a configurable delay value without a voltage droop. The method further comprises at least one of: measuring a number of instances of switching from a nominal clock source to a fallback clock source; measuring an actual duration during which activity proceeds according to the fallback clock source without a voltage droop; measuring a fallback duration during which activity proceeds according to the fallback clock source; and measuring a number of instances of switching from a fallback clock source to a nominal clock source occurring during a voltage droop. Finally, the method may comprise modifying the switching to optimise activity efficiency based on the measuring. There is also described an electronic circuit configured to monitor voltage droop response of another electronic circuit according to the method.
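The counting and duration measurements in the monitoring entry above can be sketched from an event trace. The trace format, the event names, and the returned dictionary are illustrative assumptions.

```python
# Sketch: compute droop-response metrics from a trace of
# (cycle, event) records, where events mark clock-source switches.

def droop_metrics(trace):
    """trace: list of (cycle, event) with events 'to_fallback',
    'to_nominal', 'droop_start', 'droop_end'."""
    to_fallback = sum(1 for _, e in trace if e == "to_fallback")
    to_nominal = sum(1 for _, e in trace if e == "to_nominal")
    # Accumulate total cycles spent running from the fallback source.
    fallback_cycles = 0
    entered = None
    for cycle, e in trace:
        if e == "to_fallback":
            entered = cycle
        elif e == "to_nominal" and entered is not None:
            fallback_cycles += cycle - entered
            entered = None
    return {"to_fallback": to_fallback,
            "to_nominal": to_nominal,
            "fallback_cycles": fallback_cycles}

trace = [(10, "droop_start"), (11, "to_fallback"), (30, "droop_end"),
         (50, "to_nominal"), (80, "droop_start"), (81, "to_fallback"),
         (120, "to_nominal")]
m = droop_metrics(trace)
assert m["to_fallback"] == 2 and m["to_nominal"] == 2
assert m["fallback_cycles"] == (50 - 11) + (120 - 81)
```

Metrics such as these feed the final step of the method: tuning the switching (e.g. the configurable recovery delay) so that time spent on the slower fallback source is minimised.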
The present techniques relate to a method and circuitry for determining system characteristics of an electronic circuit and there is disclosed a delay monitor circuit to characterise an electronic circuit comprising: a delay line that quantifies the delay within a clock cycle; the delay line comprising a plurality of sampling points therealong; wherein, in a first mode, the delay monitor is configured to capture delay statistics over a given measurement period; and wherein, in a second mode, the delay monitor is configured to capture a measurement value from the plurality of sampling points, wherein the measurement value is indicative of one or more characteristics of the electronic circuit.
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
60.
Temperature and Voltage Profiling Computer Systems and Methods
According to one implementation of the present disclosure, a method of profiling the temperature and voltage across different locations within a processor is disclosed. The method includes: in a first stage, determining respective first and second voltage sensitivity coefficients and respective first and second temperature sensitivity coefficients corresponding to a pair of ring oscillators; and in a second stage, determining a voltage deviation and a temperature deviation from a predetermined reference voltage and a predetermined reference temperature respectively, based on the determined respective first and second voltage sensitivity coefficients and the determined respective first and second temperature sensitivity coefficients.
Various implementations described herein are directed to a method that acquires operating frequencies for a first set of ring oscillators disposed in a first integrated circuit, determines one or more first coefficients and a first constant for each ring oscillator in the first set, and determines a correlation between each of the first coefficients and the first constant. Also, the method may acquire a single operating frequency for each of a second set of ring oscillators in a second integrated circuit at a single pre-determined temperature so as to determine a second constant, predict one or more second coefficients for each ring oscillator in the second set based on the second constant and the correlation, and derive a temperature dependence based on the single operating frequency using the one or more second coefficients and the second constant for each of the second set of ring oscillators.
G01K 7/20 - Measuring temperature based on the use of electric or magnetic elements directly sensitive to heat using resistive elements the element being a linear resistance, e.g. platinum resistance thermometer in a specially-adapted circuit, e.g. bridge circuit
G01K 7/24 - Measuring temperature based on the use of electric or magnetic elements directly sensitive to heat using resistive elements the element being a non-linear resistance, e.g. thermistor in a specially-adapted circuit, e.g. bridge circuit
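The second stage of the two ring-oscillator entries above amounts to solving two linear equations for the two unknown deviations. The linear frequency model, the coefficient values, and the function name below are illustrative assumptions.

```python
# Sketch: given two ring oscillators with different, pre-characterised
# voltage and temperature sensitivity coefficients, recover the voltage
# deviation dV and temperature deviation dT from their frequency shifts:
#   df1 = a1*dV + b1*dT
#   df2 = a2*dV + b2*dT

def solve_deviation(a1, b1, a2, b2, df1, df2):
    det = a1 * b2 - a2 * b1           # nonzero when the sensitivities differ
    dV = (df1 * b2 - df2 * b1) / det
    dT = (a1 * df2 - a2 * df1) / det
    return dV, dT

# First-stage calibration would yield coefficients like these (made up):
a1, b1 = 120.0, -0.8   # MHz/V and MHz/K for oscillator 1
a2, b2 = 95.0, -2.1    # oscillator 2, deliberately different sensitivities

# Frequency shifts observed at some die location:
dV_true, dT_true = 0.03, 12.0
df1 = a1 * dV_true + b1 * dT_true
df2 = a2 * dV_true + b2 * dT_true
dV, dT = solve_deviation(a1, b1, a2, b2, df1, df2)
assert abs(dV - dV_true) < 1e-9 and abs(dT - dT_true) < 1e-9
```

The scheme only works because the paired oscillators have distinct sensitivity ratios; if `a1/b1 == a2/b2` the determinant vanishes and the two deviations cannot be separated.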
62.
EVALUATING PERFORMANCE OF A DROOP MITIGATION SCHEME
The present techniques relate to a droop mitigation scheme, and there is disclosed a method of evaluating the performance of a droop mitigation scheme, wherein the method is carried out at a circuit, the method comprising: receiving a clock output signal, wherein the droop mitigation scheme has been used to generate the clock output signal; and analysing the clock output signal to generate an output, wherein the output provides an indication of the performance of the droop mitigation scheme.
A computer implemented method for processing instructions in a multiprocessing apparatus comprises obtaining a first instruction of a first process; decoding the first instruction to detect a continuation indicator associated with the first instruction; determining whether or not to enforce the continuation indicator; and when it is determined to enforce the continuation indicator: continuing to execute the first process until completion of the first instruction and at least a next sequential second instruction of the first process. The continuation may temporarily suppress a normal eviction process based on a fairness algorithm, for example.
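The continuation behaviour in the entry above can be sketched with a toy round-robin scheduler. The instruction encoding, the scheduler, and all names are illustrative assumptions; only the idea of a per-instruction flag temporarily suppressing fairness-driven eviction comes from the abstract.

```python
# Sketch: a continuation indicator on an instruction keeps its process
# scheduled for at least the next sequential instruction, overriding the
# round-robin rotation for one step.

from collections import deque

def run(processes, honour_continuation=True):
    """processes: {name: [(opcode, continue_flag), ...]}.
    Returns the order in which (process, opcode) pairs execute."""
    ready = deque(processes)
    order = []
    while ready:
        name = ready.popleft()
        stream = processes[name]
        while stream:
            op, cont = stream.pop(0)
            order.append((name, op))
            if cont and honour_continuation and stream:
                continue        # suppress eviction: run the next instruction
            break               # fairness: rotate to the next process
        if stream:
            ready.append(name)
    return order

progs = {"P1": [("ld", True), ("use", False), ("end", False)],
         "P2": [("add", False)]}
order = run({k: list(v) for k, v in progs.items()})
# The continuation on "ld" keeps P1 running into "use" before P2's turn.
assert order[:3] == [("P1", "ld"), ("P1", "use"), ("P2", "add")]
```

The `honour_continuation` switch mirrors the "determining whether or not to enforce" step: the indicator is a hint the scheduler may decline.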
A tile-based graphics processor performs first and second processing passes to generate a render output. The first processing pass generates and writes out information representative of a set of bounding boxes, and the second processing pass uses the bounding box information to determine which primitives to process for which rendering tiles.
A tile-based graphics processor performs first and second processing passes to generate a render output. The first processing pass generates data that is used in the second processing pass to determine which primitives to process for which rendering tiles. The first processing pass is performed by a geometry processing control unit assembling primitives, and one or more programmable processing units transforming geometry data defining the primitives, and processing the transformed geometry data to generate the data.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may implement, in whole or in part, techniques to process pixel values sampled from a multi color channel imaging device. In particular, methods and/or techniques are disclosed to process pixel samples for interpolating pixel values for one or more color channels.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06T 7/90 - Determination of colour characteristics
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/50 - Extraction of image or video features by performing operations within image blocksExtraction of image or video features by using histograms, e.g. histogram of oriented gradients [HoG]Extraction of image or video features by summing image-intensity valuesProjection analysis
An apparatus for improving the tracking of streams of memory accesses for training a stride prefetcher is provided, comprising a training data structure storing entries for training a stride prefetcher, a given entry specifying: a stride offset, a target address, a program counter address, and a bypass indicator indicating whether a program counter match condition is to be bypassed for the given entry; and training control circuitry to determine whether to update the stride offset for the given entry of the training data structure to specify a current stride between a target address of a current memory access and the target address for the last memory access of the tracked stream, in which the determination by the training control circuitry is controlled to be either dependent on a determination of whether the program counter match condition is satisfied or independent of whether the program counter match condition is satisfied, based on the bypass indicator.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
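The bypass-controlled training rule of the stride-prefetcher entry above can be sketched directly from the fields the abstract names. The update logic beyond those fields is an illustrative assumption.

```python
# Sketch: a training entry whose stride update is gated on a program-counter
# match unless the entry's bypass indicator says to ignore the PC entirely.

from dataclasses import dataclass

@dataclass
class TrainingEntry:
    stride: int            # stride offset
    last_target: int       # target address of the last tracked access
    pc: int                # program counter address that trained this entry
    bypass_pc_match: bool  # bypass indicator

def train(entry, access_pc, access_target):
    if entry.bypass_pc_match or access_pc == entry.pc:
        entry.stride = access_target - entry.last_target
        entry.last_target = access_target
        return True   # entry updated
    return False      # PC mismatch and no bypass: access ignored

e = TrainingEntry(stride=0, last_target=0x1000, pc=0x40, bypass_pc_match=False)
assert not train(e, access_pc=0x80, access_target=0x1040)  # wrong PC, skipped
e.bypass_pc_match = True
assert train(e, access_pc=0x80, access_target=0x1040)      # bypass: trained
assert e.stride == 0x40
```

The bypass indicator lets one entry track a stream generated by several instructions (e.g. an unrolled loop), where insisting on a single PC would fragment the stream.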
Apparatuses, methods, systems, chip containing products, and computer readable media are disclosed. An apparatus comprises dispatch circuitry to receive instructions, and to identify linear chains of instructions each comprising a first instruction and one or more further instructions, which are temporarily ineligible for execution due to a dependence on an immediately preceding instruction. The apparatus further comprises offline storage circuitry. The dispatch circuitry is configured, for each of the linear chains: to dispatch the sequentially first instruction to the issue circuitry and to retain the one or more further instructions in the offline storage circuitry until a chain trigger signal is received, the chain trigger signal indicating that a previously dispatched instruction, on which a sequentially next instruction depends, has satisfied a predefined issuing condition. In response to receipt of the chain trigger signal, the dispatch circuitry is configured to dispatch the sequentially next instruction to the issue circuitry.
Address translation circuitry 16 translates a virtual address specified by a memory access request issued by requester circuitry into a target physical address (PA). Requester-side filtering circuitry 20 performs a granule protection lookup based on the target PA and a selected physical address space (PAS) associated with the memory access request, to determine whether to allow the memory access request to be passed to a cache or interconnect. In the granule protection lookup, the requester-side filtering circuitry obtains granule protection information corresponding to a target granule of physical addresses including the target PA, which indicates at least one allowed PAS associated with the target granule, and blocks the memory access request when the granule protection information indicates that the selected PAS is not an allowed PAS.
G06F 12/14 - Protection against unauthorised use of memory
G06F 12/0808 - Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
G06F 12/1045 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
Cache invalidation circuitry responds to a cache invalidation command specifying invalidation scope information indicative of at least one invalidation condition, to control a cache to perform an invalidation process to invalidate cache entries satisfying the invalidation condition(s). Cache lookup circuitry issues to the cache a cache lookup request specifying address information, to request that the cache returns a cache lookup response. Cache lookup response filtering circuitry is responsive to a given hit-indicating cache lookup response which provides cached information and invalidation qualifying information returned from a corresponding valid cache entry, to determine whether the given hit-indicating cache lookup response conflicts with an in-progress cache invalidation command, based on the invalidation scope information specified by the in-progress cache invalidation command and the invalidation qualifying information, and when conflict is detected, causes the given hit-indicating cache lookup response to be treated as a miss-indicating cache lookup response.
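The filtering decision in the entry above can be sketched as a predicate over the invalidation-qualifying information. Modelling the scope as a set of tags is an illustrative assumption; real qualifying information would be richer.

```python
# Sketch: demote a hit-indicating cache lookup response to a miss when the
# entry it came from falls within the scope of an in-progress invalidation.

def filter_response(hit, qualifying_tag, in_progress_scopes):
    """hit: True when the lookup found a valid matching entry.
    qualifying_tag: invalidation-qualifying info returned with the entry.
    in_progress_scopes: tags covered by outstanding invalidation commands."""
    if hit and qualifying_tag in in_progress_scopes:
        return "miss"   # entry may be invalidated concurrently: play safe
    return "hit" if hit else "miss"

scopes = {"ctx7"}                                        # one in-progress command
assert filter_response(True, "ctx3", scopes) == "hit"    # outside the scope
assert filter_response(True, "ctx7", scopes) == "miss"   # conflict: demoted
assert filter_response(False, "ctx7", scopes) == "miss"
```

Treating the conflicting hit as a miss lets the invalidation proceed lazily in the background without the lookup path ever observing stale data.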
A method and system for processing image data having a first bit depth using at least one trained neural network configured to operate on data having a second bit depth, where the second bit depth is smaller than the first bit depth. A plurality of image data portions is generated by splitting the image data, and each of the plurality of image data portions is encoded to produce a plurality of encoded image data portions having the second bit depth. The plurality of encoded image data portions are then processed by the at least one trained neural network, before being decoded and combined to produce composite image data. The composite image data is then output.
When performing texture processing operations in a graphics processing system, for a texture processing operation that requires M input texture data elements from an array of texture data elements, each of the M texture data elements is selected from a different set of texture data elements having a different set of positions within the texture data array. The texture processing operation is then performed using the M texture data elements.
When generating a sequence of render outputs using a graphics processor, the completion status of rendering tasks from different render outputs is tracked so that processing tasks for later render outputs in the sequence of outputs can be processed concurrently with processing tasks for earlier render outputs in the sequence of outputs whilst ensuring that any dependencies between the rendering tasks for the different render outputs are enforced. In particular, there is disclosed a mechanism for suspending the sequence of rendering jobs (so that it may subsequently be resumed).
When performing a sequence of rendering jobs, rendering tasks for separate rendering jobs are permitted to overlap within the graphics processor's processing (shader) cores. A record is maintained of which rendering tasks are currently being processed by the graphics processor's processing (shader) cores, and this record can then be used to enforce any data (processing) dependencies between different rendering jobs.
A method of managing write-after-read (WAR) hazards in a graphics processor. A host processor, when preparing a graphics processor command stream, can identify possible WAR hazards between rendering jobs, for example by detecting layout transitions, and insert a suitable barrier into the graphics processor command stream. The graphics processor, when encountering such a barrier, can then determine whether it is possible to ignore the barrier and allow rendering jobs to be processed concurrently.
A method of operating a graphics processor when performing a certain sequence of rendering jobs that produces a series of progressively lower resolution versions of the same render output comprising issuing rendering tasks for different rendering jobs concurrently and controlling processing for a later rendering job using a respective ‘task completion status’ data structure associated with the earlier rendering job on which it depends, wherein the looking up of respective entries in the ‘task completion status’ data structure takes into account the change in resolution between the first, earlier rendering job and the second, later rendering job.
When performing a texture sampling operation that uses the results of plural texture filtering operations to provide an overall output sampled texture value in a graphics processing system, it is determined whether a texture filtering operation in the set of plural texture filtering operations that are to be performed to provide the overall output sampled texture value can be at least partially merged with another texture filtering operation in the set of texture filtering operations. If so, a merged texture filtering operation is performed for the two texture filtering operations, with the result of the merged texture filtering operation being used when providing the overall output sampled texture value.
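When two filtering operations are expressed as weighted sums over texel positions, merging is possible wherever they share texels. The sketch below is an assumption-laden illustration (the abstract does not specify this representation): each operation is a hypothetical `{texel_position: weight}` map, and merging simply sums the per-texel weights so each shared texel is fetched and multiplied once.

```python
# Hypothetical representation: a filtering operation as a map from
# texel position to filter weight.

def merge_filters(weights_a, weights_b):
    """Merge two filtering operations into one by summing per-texel weights."""
    merged = dict(weights_a)
    for pos, w in weights_b.items():
        merged[pos] = merged.get(pos, 0.0) + w
    return merged

def apply_filter(weights, texels):
    """Evaluate a filtering operation over a {position: value} texel array."""
    return sum(w * texels[pos] for pos, w in weights.items())
```

The merged operation produces the same contribution to the overall output as performing the two operations separately and summing their results.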
When generating a sequence of render outputs using a graphics processor, the completion status of rendering tasks for different render outputs is tracked so that processing tasks for later render outputs in the sequence of outputs can be processed concurrently with processing tasks for earlier render outputs in the sequence of outputs whilst ensuring that any dependencies between the rendering tasks are enforced.
There is provided an apparatus, system, chip-containing product, method, and storage medium. The apparatus comprises memory access circuitry responsive to one or more types of memory access request, to retrieve specified data items from memory. The apparatus is also provided with local storage circuitry configured to store at least some of the retrieved data items. The local storage circuitry is N-way associative, and N is greater than 1. The apparatus is also provided with control circuitry responsive to an indication that an access request signalled to the local storage circuitry relating to an accessed data item corresponds to a predefined type of memory access request, to implement a restrictive access policy in relation to the accessed data item in the local storage circuitry. The restrictive access policy excludes at least one step of accessing an excluded subset of ways of the local storage circuitry.
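A minimal sketch of such a restrictive access policy follows. The way count, the excluded set, and the idea that the predefined access type is something like a streaming load are all assumptions for illustration; the abstract only states that a subset of ways is excluded from access for the predefined request type.

```python
# Hypothetical 4-way structure; ways 0 and 1 are excluded for the
# restricted (predefined) type of memory access request.

N_WAYS = 4
EXCLUDED_WAYS = {0, 1}  # assumption: ways protected from restricted accesses

def candidate_ways(is_restricted_access):
    """Ways of the local storage eligible for this access."""
    ways = set(range(N_WAYS))
    return ways - EXCLUDED_WAYS if is_restricted_access else ways
```

Restricted accesses can therefore neither look up nor allocate into the excluded ways, leaving entries in those ways undisturbed.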
When generating a sequence of render outputs using a graphics processor, the completion status of rendering tasks from different render outputs is tracked so that processing tasks for later render outputs in the sequence of outputs can be processed concurrently with processing tasks for earlier render outputs in the sequence of outputs whilst ensuring that any dependencies between the rendering tasks for the different render outputs are enforced.
A method of preparing a command stream for a parallel processor, comprising: analysing the command stream to detect at least a first dependency; generating at least one timeline dependency point responsive to detecting the first dependency; determining a latest action for the first dependency to derive a completion stream timeline point for the first dependency; comparing the completion stream timeline point for the first dependency with a completion stream timeline point for a second dependency to determine a latest stream timeline point; generating at least one command stream synchronization control instruction according to the latest stream timeline point; and providing the command stream and the at least one command stream synchronization control instruction to an execution unit of the parallel processor.
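The comparison step above reduces to taking the latest completion stream timeline point among the detected dependencies and emitting one synchronization control instruction for it. The sketch below uses hypothetical names (`WAIT_TIMELINE` is an invented mnemonic, not an instruction from the patent):

```python
# Sketch: derive a single synchronization instruction covering all
# dependencies by waiting on the latest completion timeline point.

def latest_stream_timeline_point(dependency_points):
    """dependency_points: completion stream timeline point per dependency."""
    return max(dependency_points)

def emit_sync_instruction(dependency_points):
    # One wait at the latest point satisfies every earlier dependency,
    # since timeline points complete in order.
    return ("WAIT_TIMELINE", latest_stream_timeline_point(dependency_points))
```

Waiting on the maximum point is sufficient because a stream timeline advances monotonically, so all earlier points are guaranteed complete.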
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to enhance a rendered image. In an implementation, a process to enhance a portion of a rendered image may be affected based, at least in part, on a shading rate applied in rendering the portion of the rendered image.
When preparing and storing primitive lists in a tile-based graphics processing system, one or more primitive list pointer arrays store pointers, each pointer indicating a location in storage of one or more of the primitive lists. A further pointer array stores further pointers, each further pointer indicating a location in storage of one or more of the primitive list pointer arrays.
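The two levels of indirection can be sketched as follows. The flat address-keyed `storage` dictionary and the specific addresses are purely illustrative assumptions standing in for memory:

```python
# Hypothetical memory layout: a further pointer array locates the
# primitive list pointer arrays, which in turn locate the primitive lists.

storage = {
    0x100: ["listA"],          # primitive list
    0x104: ["listB"],          # primitive list
    0x200: [0x100, 0x104],     # primitive list pointer array
    0x300: [0x200],            # further pointer array
}

def primitive_lists(further_ptr_array_addr, storage):
    """Resolve both levels of pointers to reach the primitive lists."""
    lists = []
    for ptr_array_addr in storage[further_ptr_array_addr]:   # further pointers
        for list_addr in storage[ptr_array_addr]:            # list pointers
            lists.append(storage[list_addr])
    return lists
```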
An apparatus is provided for varying paths from power sources to components in order to inhibit side channel attacks. The power source provides power. The circuit component consumes the power to perform a function and a power grid provides a plurality of redundant paths by which the power can flow from between the circuit component and one of a power source and ground, to perform the function. The power grid is dynamically selects at least one active path of the redundant paths through which the power flows to perform the function.
H02J 3/00 - Circuit arrangements for ac mains or ac distribution networks
G06F 21/72 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information in cryptographic circuits
85.
PROCESSING ELEMENT CONFIGURED TO APPROXIMATE A TRANSCENDENTAL FUNCTION
A processing element is configured to approximate a transcendental function. The processing element comprises an input storage and a look-up storage. The processing element obtains floating-point input data from the input storage having an input exponent value and an input mantissa value. The processing element looks up approximation parameters and an output exponent value from the look-up storage, wherein each group of approximation parameters and its associated output exponent value is stored in the look-up storage in association with a respective range of a plurality of ranges that are defined by the input exponent value and the input mantissa value. The ranges cover values of the input exponent value and input mantissa value such that the output exponent value associated with each range does not change by more than a predetermined number. An approximation function that approximates the transcendental function is evaluated based on the looked-up approximation parameters and output exponent value.
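The range-indexed look-up scheme can be sketched in software. As an assumption for illustration, the transcendental function is the reciprocal, the mantissa interval [1, 2) is cut into eight equal ranges, and each range stores a fitted linear approximation; the exponent is handled separately, mirroring the stored output exponent value:

```python
import math

# Hypothetical look-up storage: per-range (range_start, slope, intercept)
# for a piecewise-linear fit of 1/m on the mantissa interval [1, 2).
STEPS = 8
LUT = []
for i in range(STEPS):
    x0 = 1.0 + i / STEPS
    x1 = 1.0 + (i + 1) / STEPS
    slope = (1.0 / x1 - 1.0 / x0) / (x1 - x0)
    LUT.append((x0, slope, 1.0 / x0 - slope * x0))

def approx_reciprocal(x):
    """Approximate 1/x for x > 0 by range reduction to [1, 2)."""
    mantissa, exponent = math.frexp(x)       # x = mantissa * 2**exponent
    m = mantissa * 2.0                       # mantissa in [1, 2)
    idx = min(int((m - 1.0) * STEPS), STEPS - 1)
    x0, slope, intercept = LUT[idx]          # look up approximation params
    r = slope * m + intercept                # linear approximation of 1/m
    return math.ldexp(r, -(exponent - 1))    # reapply the output exponent
```

This keeps the output exponent fixed within each range, so only the small per-range polynomial must be evaluated.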
A first image frame in a first resolution format, comprising image signal intensity values mapped to first pixel locations, has sampling points offset from the centers of the first pixel locations according to associated jitter vectors. The image signal intensity values are mapped to second pixel locations in a second image frame in a second resolution format based at least in part on the jitter vectors, the second resolution format being higher resolution than the first resolution format. The mapped image signal intensity values are combined with image signal intensity values of an accumulated history image buffer according to coefficients predicted by a neural network, such as based on magnitudes of the jitter vectors. Interpolated pixel image intensities are added to the accumulated history buffer for empty or null pixel locations in the second image frame in the second resolution format. Upsampling artifacts such as checkerboarding and aliasing are reduced.
A memory instance comprises a plurality of banks of storage cells to store data values, and input/output circuitry shared between the plurality of banks for receiving write data or outputting read data. Each bank of storage cells supports a power saving mode and an operational mode. A control interface receives power control signals for controlling use of the power saving mode. Bank power control circuitry individually controls, for each of a plurality of subsets of banks of storage cells within the same memory instance, whether that subset of banks is in the power saving mode based on the power control signals. For at least one setting for the power control signals, one subset of banks is in the power saving mode while another subset of banks in the same memory instance is in the operational mode. Also disclosed is power control circuitry which selects the power mode to use for each subset of banks and generates the power control signals.
According to the present techniques there is provided a method of operating a data processor unit to generate processing tasks. The data processor unit comprises a control circuit configured to receive, from a host processor unit, a request for the data processor unit to perform processing jobs and to generate a workload for each job. Each workload comprises one or more tasks. The data processor unit further comprises first and second execution units to process the workloads. The method comprises: receiving, at the control circuit, a request to perform first and second processing jobs; generating, at the control circuit in response to the request, a primary workload for the first processing job, and a secondary workload for the second processing job; generating, at the control circuit, one or more operation instructions to control processing of the primary and/or secondary workloads at the first and/or second execution units; processing, at the first execution unit, the primary workload in accordance with the operation instructions; and processing, at the second execution unit, the secondary workload in parallel with the primary workload in accordance with the operation instructions.
Various implementations described herein are directed to a device having a bank of bitcells split into a plurality of portions including a first row slice of the bitcells and a second row slice of the bitcells. Also, the device may have control circuitry configured to access and repair a first bitcell in the first row slice with a first row address and a second bitcell in the second row slice with a second row address that is different than the first row address.
System emulation of a floating-point dot product operation can be performed without directly performing the arithmetic by decomposing the Addend into a constituent sign, an exponent, and a fractional part; performing inverse scaling of the Addend by subtracting a scaling exponent (LSCALE) of a scaling by a negative power of two from the exponent to calculate an inverse-scaled addend; comparing a corresponding fractional part of the inverse-scaled addend with notional exponents of the most significant bit (MSB) and the least significant bit (LSB) of a fixed-point accumulator to determine which of three cases has been encountered; and adding particular values representing the Addend to the calculation result according to which of the three cases has been encountered. The three cases are: the inverse-scaled addend can be exactly accumulated into the fixed-point accumulator; the inverse-scaled addend is too large to be exactly accumulated; and the inverse-scaled addend is too small to be exactly accumulated.
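The three-way classification can be sketched as a comparison of the inverse-scaled addend's bit positions against the accumulator's notional MSB and LSB exponents. The bit positions chosen below are illustrative assumptions, not values from the patent:

```python
# Hypothetical fixed-point accumulator: notional exponents of its
# most and least significant bits.
ACC_MSB_EXP = 31
ACC_LSB_EXP = -32

def classify_addend(addend_exp, fraction_bits):
    """Classify an inverse-scaled addend whose MSB sits at addend_exp and
    whose lowest significant bit sits at addend_exp - fraction_bits."""
    if addend_exp > ACC_MSB_EXP:
        return "too_large"   # cannot be accumulated without overflow
    if addend_exp - fraction_bits < ACC_LSB_EXP:
        return "too_small"   # low-order bits fall below the LSB
    return "exact"           # can be exactly accumulated
```

Each case then determines which particular values representing the Addend are added to the calculation result.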
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state deviceMethods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
Various implementations described herein are directed to a device having a power-gate structure with multiple transistors including a first transistor and a second transistor. The first transistor may be coupled between a first voltage node and a second voltage node, and the second transistor may be coupled between the second voltage node and a third voltage node that is coupled to the second voltage node.
A data processing apparatus includes pointer storage configured to store pointer values for pointers. Increment circuitry, responsive to one or more increment events, increments each of the pointer values in dependence on a corresponding live pointer value update condition, which is different for each of the pointers. History storage circuitry stores resolved behaviours of instances of a control flow instruction, each of the resolved behaviours being associated with one of the pointers. At least one of the live pointer value update conditions is changeable at runtime. Consequently, storage can be reduced as compared to a situation where all pointer value update conditions are active.
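The per-pointer update conditions can be sketched as predicates evaluated on each increment event. The event encoding and the two example conditions below are assumptions for illustration only:

```python
# Sketch: each pointer advances only when its own live update condition
# accepts the increment event; conditions are swappable at runtime.

def on_increment_event(pointers, conditions, event):
    """Increment each pointer whose live condition matches the event."""
    return [p + 1 if cond(event) else p
            for p, cond in zip(pointers, conditions)]

# Hypothetical conditions: pointer 0 advances on every event,
# pointer 1 only on taken branches.
conditions = [lambda e: True, lambda e: e == "taken"]
```

Since only the conditions that are live need supporting state, storage is reduced compared with keeping every update condition active.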
An apparatus comprises a plurality of interfaces, each couplable to a respective one of a plurality of processing circuitries either in a higher criticality compliance state or a lower criticality compliance state. Each interface can receive from its respective processing circuitry interrupt signals destined to a target processing circuitry of the plurality of processing circuitries and transmit to its respective processing circuitry interrupt signals issued by a source processing circuitry of the plurality of processing circuitries. Control circuitry monitors the flow of the interrupt signals and determines whether the flow of interrupt signals exhibits a discrepancy with respect to an expected flow of interrupt signals, and performs a mitigation action in respect of said discrepancy to avoid violation of the higher criticality compliance state.
An apparatus is provided for improving the use of multiple-issue operations in a data processor. A variable-issue operation can be recognised as being either a single-issue operation or a multiple-issue operation in dependence on the state of the program at runtime. If a variable-issue operation can be scheduled as a multiple-issue operation, then other operations can be scheduled for performance in the same cycle, where they would otherwise have had to be scheduled for a later cycle. As such, more operations can be performed in fewer cycles, improving code density and data processing performance.
Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to adapt a neural network structure to a target platform. One or more performance metrics may be observed for an execution of the neural network structure implemented by one or more target hardware elements. A module from a library of modules may be selected to replace one or more elements of the neural network structure based, at least in part, on the observed one or more performance metrics.
Disclosed are devices and/or processes to process image frames expressed in part by different lighting components, such as lighting components generated using ray tracing. In an embodiment, different lighting components of a previous image frame may be separately warped and combined with like lighting components in a current image frame.
A data processing apparatus is provided. It includes history storage circuitry that stores historic data of instructions and prediction circuitry that predicts a historic datum of a specific instruction based on subsets of the historic data of the instructions. The history storage circuitry overwrites the historic data of one of the instructions to form a corrupted historic datum, and at least one of the subsets of the historic data of the instructions includes the corrupted historic datum.
Barcelona Supercomputing Center - Centro Nacional de Supercomputación (Spain)
Inventor
Siracusa, Marco
Randall, Joshua
Joseph, Douglas James
Moretó Planas, Miquel
Armejach Sanosa, Adrià
Abstract
A data structure marshalling unit for a processor comprises data structure traversal circuitry to perform data structure traversal processing according to a dataflow architecture. The data structure traversal circuitry comprises two or more layers of traversal circuit units, each layer comprising two or more parallel lanes of traversal circuit units. Each traversal circuit unit triggers loading, according to a programmable iteration range, of at least one stream of elements of at least one data structure from data storage circuitry. For at least one programmable setting for the data structure traversal circuitry, the programmable iteration range for a given traversal circuit unit in a downstream layer is dependent on one or more elements of the at least one stream of elements loaded by at least one traversal circuit unit in an upstream layer. Output interface circuitry outputs to the data storage circuitry at least one vector of elements loaded by respective traversal circuit units in a given active layer of the data structure traversal circuitry.
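The dependence of a downstream layer's iteration range on elements streamed by an upstream layer is the pattern familiar from sparse-matrix traversal. As an illustrative assumption (the abstract does not name a specific data structure), the sketch below uses a CSR-style layout, where the upstream layer streams `row_ptr` pairs that bound the downstream column loop:

```python
# Sketch: two-layer traversal in the dataflow style, where the
# downstream iteration range is set by elements loaded upstream.

def traverse(row_ptr, col_idx, values):
    """Stream (row, column, value) triples from a CSR-like structure."""
    out = []
    for row in range(len(row_ptr) - 1):                  # upstream layer
        # Downstream range is defined by the upstream stream's elements.
        for k in range(row_ptr[row], row_ptr[row + 1]):  # downstream layer
            out.append((row, col_idx[k], values[k]))
    return out
```

In the hardware, parallel lanes of traversal circuit units would process several such ranges concurrently, and the output interface would write back vectors of the loaded elements.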
Various implementations described herein are directed to a device having first transistors arranged as cross-coupled inverters coupled between a disconnect node and ground. The device may have second transistors arranged as passgates coupled between the cross-coupled inverters and bitlines. The device may have third transistors coupled between a voltage supply and the disconnect node.
G11C 11/412 - Digital stores characterised by the use of particular electric or magnetic storage elementsStorage elements therefor using electric elements using semiconductor devices using transistors forming cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger using field-effect transistors only
A data processing apparatus is provided. It includes first history storage circuitry that stores control flow information of control flow instructions. Second history storage circuitry stores a subset of the control flow information by considering a subset of the control flow instructions. Prediction circuitry produces a prediction for a specific one of the control flow instructions based on the subset of the control flow information and power control circuitry performs a determination of an extent to which the subset of the control flow information matches the control flow information and disables the prediction circuitry in dependence on a result of the determination.