Blocks of a video frame to be excluded from a motion-compensated operation are identified by processing pixel values of a first frame to characterize blocks of the pixels as representing a portion of an object. Difference values between blocks of the first frame and blocks of a second frame are determined and processed, the processed difference values characterizing blocks of the first frame as representing an image component that is static between the first and second frames. A score is generated for each block of the first frame indicating a confidence level that the block represents a static image component. Blocks of the first frame are identified as protected blocks which (i) represent a portion of an object; and (ii) represent an image component that is static between the first and second frames. A dilating kernel is applied to blocks with a score indicating a low confidence level, characterizing each block within the kernel as not representing a static image component.
G06V 10/98 - Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
G06V 20/40 - Scenes; Scene-specific elements in video content
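A minimal sketch of the block-protection idea in the abstract above, assuming a mean-absolute-difference block comparison, an arbitrary confidence threshold and a 3×3 dilation around low-confidence blocks; the function name, block size and thresholds are illustrative choices, not taken from the patent.

```python
import numpy as np

def protect_static_blocks(frame_a, frame_b, object_mask, block=8, thresh=4.0):
    """Flag blocks that look like static object content and should be
    excluded from motion compensation. All thresholds are illustrative."""
    h, w = frame_a.shape
    bh, bw = h // block, w // block
    score = np.zeros((bh, bw))            # confidence that a block is static
    is_object = np.zeros((bh, bw), bool)  # block contains part of an object
    for by in range(bh):
        for bx in range(bw):
            ys, xs = by * block, bx * block
            a = frame_a[ys:ys + block, xs:xs + block].astype(np.float32)
            b = frame_b[ys:ys + block, xs:xs + block].astype(np.float32)
            sad = np.abs(a - b).mean()    # difference value for the block
            score[by, bx] = max(0.0, 1.0 - sad / thresh)
            is_object[by, bx] = object_mask[ys:ys + block, xs:xs + block].any()
    static = score > 0.5
    # Dilate around low-confidence blocks: anything touched by the kernel is
    # treated as non-static, so only confidently static object blocks survive.
    low_conf = (score > 0.0) & (score <= 0.5)
    for by, bx in zip(*np.nonzero(low_conf)):
        static[max(0, by - 1):by + 2, max(0, bx - 1):bx + 2] = False
    return is_object & static             # "protected" blocks
```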
2.
Processing Fragments Which Have A Shader-Dependent Property In A Graphics Processing System
A graphics processing system includes hidden surface removal logic and processing logic for processing fragments. An early depth test is performed on a first fragment with the hidden surface removal logic using a depth buffer, the first fragment having a shader-dependent property. In response to the first fragment passing the early depth test, the processing logic determines the property of the first fragment. After the determination of the property of the first fragment, a late depth test is performed on the first fragment with the hidden surface removal logic using the depth buffer. After performing the early depth test on the first fragment but before the late depth test is performed on the first fragment, an early depth test is performed on a second fragment with the hidden surface removal logic, wherein the second fragment does not have a shader-dependent property.
A decoder for decoding a texel according to the Adaptive Scalable Texture Compression (ASTC) format is configured to select a colour endpoint mode (CEM) of a plurality of different CEMs and generate a plurality of input values for inputting to multiple inputs of a logic circuit. The input values are generated such that the logic circuit will generate at least one intermediate output value for calculating colour endpoints in accordance with the selected CEM. For different CEMs, the decoder generates a different plurality of input values for inputting to the same inputs of the multiple inputs of the logic circuit. The logic circuit is configured to operate on the plurality of input values so as to generate the at least one intermediate output value; determine a colour endpoint pair in accordance with the selected CEM in dependence on the at least one intermediate output value; and decode the texel in dependence on the colour endpoint pair.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a colour or a chrominance component
A method of preventing unauthorized access to uninitialized memory. Registers are grouped into blocks, each of which has a corresponding validity bit. When data is written to a block of memory, the validity bit is set to valid. A read function reads both the register data and the validity bit, but if the validity bit is set to invalid, dummy values are output. Once a program is complete, or before a fresh program runs, the validity bits are reset to invalid.
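The validity-bit mechanism above can be illustrated with a small sketch; the class name, block size and dummy value below are illustrative assumptions.

```python
class ValidatedRegisterFile:
    """Registers are grouped into blocks, each block carries one validity
    bit, and reads of blocks that were never written return a dummy value."""
    def __init__(self, num_registers, block_size=4, dummy=0xDEADBEEF):
        self.block_size = block_size
        self.dummy = dummy
        self.regs = [0] * num_registers
        self.valid = [False] * ((num_registers + block_size - 1) // block_size)

    def write(self, index, value):
        self.regs[index] = value
        self.valid[index // self.block_size] = True   # mark the block valid

    def read(self, index):
        if not self.valid[index // self.block_size]:
            return self.dummy                          # hide stale contents
        return self.regs[index]

    def reset(self):
        """Called when a program completes or before a fresh program starts."""
        self.valid = [False] * len(self.valid)
```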
A graphics processing unit (GPU) comprises a plurality of geometry pipelines and a tiling back-end module. The geometry pipelines receive batches of primitives of a sequence of primitives. Each pipeline has geometry processing modules configured to perform geometry processing functions on the primitives of a batch. A tiling front-end module determines, for each tile of a set of tiles, tile-primitive indications indicating which of the primitives of the batch of primitives received at the geometry pipeline are present within that tile. The tiling back-end module is configured to: receive the tile-primitive indications determined by the plurality of geometry pipelines; and for each of the tiles for which a tile-primitive indication is received, include indications of the primitives that are present within that tile in a control stream for that tile in an order in accordance with an order of the primitives within the sequence of primitives.
A method for compressing work item coordinate data for work items in a work group and sending the data across an interface between a computation requesting unit and a computation sequencing unit. A work item valid mask is created in dependence on the number and positions of work items in the work group, the work item valid mask indicating valid work items in the work group. A first swizzle mask indicates which bits of a swizzle index for each work item in the work group correspond to the value of a first coordinate for that work item. A second swizzle mask indicates which bits of the swizzle index for each work item in the work group correspond to the value of a second coordinate for that work item.
Compressed work item coordinate data for work items in a work group across an interface between a computation requesting unit and a computation sequencing unit is decompressed. A work item valid mask indicates valid work items in the work group. A swizzle index is computed for each valid work item in the work group. A first swizzle mask indicates which bits of the swizzle index for each work item correspond to the value of a first coordinate for that work item. A second swizzle mask indicates which bits of the swizzle index for each work item correspond to the value of a second coordinate for that work item. First coordinates are computed for each valid work item in dependence on the first swizzle mask and the swizzle index computed for that item, and second coordinates are computed for each valid work item in dependence on the second swizzle mask and the swizzle index computed for that item.
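The decompression described above amounts to gathering, for each valid work item, the bits of its swizzle index selected by each mask. A hedged sketch follows; the function names and the Morton-order example masks are illustrative.

```python
def extract_bits(value, mask):
    """Gather the bits of `value` selected by `mask` into a compact integer
    (a software parallel-bit-extract)."""
    out, out_bit, bit = 0, 0, 0
    while mask >> bit:
        if (mask >> bit) & 1:
            out |= ((value >> bit) & 1) << out_bit
            out_bit += 1
        bit += 1
    return out

def decompress_coords(valid_mask, swizzle_indices, x_mask, y_mask):
    """For every valid work item, the bits of its swizzle index flagged by
    x_mask give the first coordinate and the bits flagged by y_mask give
    the second coordinate."""
    coords = {}
    for item, index in enumerate(swizzle_indices):
        if (valid_mask >> item) & 1:
            coords[item] = (extract_bits(index, x_mask),
                            extract_bits(index, y_mask))
    return coords

# e.g. x_mask=0b0101, y_mask=0b1010 interleave x and y bits (Morton order)
print(decompress_coords(0b11, [0b1110, 0b0111], 0b0101, 0b1010))
```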
A filtering unit of a processing unit applies filtering to sequences of input values to determine output values. A control block allocates each of the sequences to a sequencer defining a sequence of operations of a filtering process to be performed on the sequence of input values allocated to that sequencer. A datapath block processes values for the operations to generate results of the operations as part of the filtering process. An arbiter controls access to the datapath block according to prioritization rules, where each operation has a priority in accordance with those rules. Operations of a first set of operations have a high priority, operations of a second set of intermediate operations which do not involve input values and which determine intermediate result values rather than determining output values have a medium priority, and operations of a third set of operations have a low priority, wherein the third set of operations comprises output operations which determine output values.
A filtering unit applies filtering methods to input values to determine output values. A plurality of inputs receive input values, signal values, and filter coefficients. The signal values define a filtering mode and the filter coefficients correspond to a filtering method. A computation pipeline receives two input values and a corresponding filter coefficient, and performs an interpolation using the input values and the filter coefficient. Registers store intermediate output values generated by the computation pipeline. Signal values, a volumetric filter coefficient, and an input value are received, and it is determined i) that a filtering mode defined by the received signal values comprises volumetric filtering and at least one other filtering method and ii) that the volumetric filter coefficient is equal to zero. In response to determining i) and ii) the filtering unit stores the received input value in a register of the plurality of registers.
A hardware module for performing dot product operations includes receiver circuitry receiving a first vector and a second vector, each comprising at least two elements of a binary encoded integer. Logic generates an array of partial products of N rows of bits for a dot product operation between the first vector and the second vector. Grouping circuitry groups bits of the elements of the second vector into a binary number, wherein each binary number is associated with a respective row of the N rows of bits, and selector circuitry selects a partial product value for each of the N rows of bits based on the binary number that is associated with the respective row, such that one partial product is generated per binary number. The hardware module also comprises adder circuitry configured to perform adding the N rows of bits together to compute an output associated with the dot product operation between the first and second vectors.
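One way to read the partial-product scheme above: for each bit position of the second vector, the corresponding bits of its elements form a selector that picks one precomputed sum of first-vector elements, and the selected row is shifted and accumulated. A sketch under that reading, assuming unsigned 8-bit elements.

```python
def dot_product_by_rows(a, b, bits=8):
    """Dot product a.b computed one row of bits at a time: bit j of each
    element of b forms a binary selector, the selector picks a precomputed
    partial product (a sum of a subset of a's elements), and that partial
    product is shifted by j and accumulated. Unsigned elements are an
    illustrative assumption."""
    n = len(a)
    # Precompute every possible partial product: one per selector value.
    subset_sums = [sum(a[i] for i in range(n) if (sel >> i) & 1)
                   for sel in range(1 << n)]
    total = 0
    for j in range(bits):                       # one row of bits per j
        selector = 0
        for i in range(n):
            selector |= ((b[i] >> j) & 1) << i  # group bit j of each b element
        total += subset_sums[selector] << j     # one partial product per row
    return total

assert dot_product_by_rows([3, 5], [7, 2]) == 3 * 7 + 5 * 2
```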
A contained region facilitating determining whether a ray intersects an object of a scene in a ray tracing system is generated, wherein the object is contained within finite bounding regions forming an object partitioning hierarchy. The volume inside the finite bounding regions is partitioned into voxels, which are categorised by identifying a subset of boundary voxels that lie within extents of a geometry defined by the object and which intersect with the object's contiguous surface. An occlusion utility metric comprises a component quantifying a maximum number of boundary voxels lying in a contiguous chain that intersect the contiguous surface of the object. In dependence on the occlusion utility metric, a boundary voxel is selected to be a candidate voxel for transformation into a contained region. The volume of the candidate voxel is expanded through at least one dimension to obtain an expanded voxel contained within, and smaller than, the extents of the geometry defined by the object. The expanded voxel is allocated as a contained region.
A contained region for use in a ray tracing system is generated, the contained region facilitating determining whether a ray intersects an object of a scene, the object being contained within finite bounding regions which form part of an object partitioning hierarchy. The volume inside the finite bounding regions is partitioned into voxels, which are categorised by identifying a subset of internal voxels that are contained within extents of a geometry defined by the object, and determining an occlusion utility metric for each of the internal voxels which quantifies an estimate of a potential surface area of an expanded version of each internal voxel. In dependence on the occlusion utility metric, an internal voxel is selected from the subset of internal voxels to be a candidate voxel for transformation into a contained region, and a volume of the candidate voxel is expanded through at least one dimension to obtain an expanded voxel and is allocated as a contained region.
A computer-implemented method of compiling a program includes analysing the program to identify at least one group of instructions within the program that can be executed atomically. In response to identifying a group of instructions that can be executed atomically, the group of instructions is extracted from the program to form a burst; a modified program is created by inserting an instruction into the program in place of the extracted group of instructions. The instruction is configured to trigger execution of the burst, and the burst and the modified program are saved separately.
An image of a 3-D scene is rendered by rendering a noisy image at a first resolution; obtaining initial guide channels at the first resolution, and obtaining corresponding initial guide channels at a second resolution. When the two resolutions are the same, the initial guide channels at the first resolution and the corresponding initial guide channels at the second resolution may be provided by a single set of initial guide channels. Enhanced guide channels are derived from the initial guide channels using machine learning models. For each of a plurality of local neighbourhoods, the parameters of a denoising model that approximates the noisy image (in the local neighbourhood) are calculated as a function of the enhanced guide channels (at the first resolution), and the calculated parameters are applied to the one or more enhanced guide channels (at the second resolution), to produce a denoised image at the second resolution.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
G06T 5/20 - Image enhancement or restoration using local operators
G06T 5/60 - Image enhancement or restoration using machine learning, e.g. neural networks
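The per-neighbourhood denoising model described above can be illustrated with a guided-filter-style local linear regression: parameters are fitted at the first resolution and applied to the guide at the second resolution. The sketch below is an assumption-laden illustration (single guide channel, least-squares fit with a fixed regulariser, nearest-neighbour upsampling of the parameters), not the patented method.

```python
import numpy as np

def local_linear_denoise(noisy_lo, guide_lo, guide_hi, radius=2, eps=1e-3):
    """Fit noisy ~ a*guide + b per local neighbourhood at low resolution,
    then apply (a, b) to the high-resolution guide."""
    h, w = noisy_lo.shape
    a = np.zeros((h, w))
    b = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            g = guide_lo[y0:y1, x0:x1]
            n = noisy_lo[y0:y1, x0:x1]
            cov = ((g - g.mean()) * (n - n.mean())).mean()
            a[y, x] = cov / (g.var() + eps)          # regularised least squares
            b[y, x] = n.mean() - a[y, x] * g.mean()
    # Upsample the parameters to the output resolution and apply to the guide.
    H, W = guide_hi.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return a[rows][:, cols] * guide_hi + b[rows][:, cols]
```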
Shuffle accelerators for shuffling data on a shader core of a graphics processing unit include routing logic, slave logic and master logic. The routing logic selectively connects data input ports to a plurality of data output ports. The slave logic selectively provides data from a first set of instances to the plurality of data input ports and receives data from the plurality of data output ports for a second set of instances. The master logic is configured to, in response to receiving a shuffle instruction that identifies a shuffle of data between the plurality of instances, cause the routing logic and the slave logic to perform the identified shuffle of data in a plurality of phases, wherein in each phase of the plurality of phases a subset of the instances of the plurality of instances receive data from a subset of the instances of the plurality of instances.
Contained regions are selected for a ray tracing system, the contained regions facilitating determining whether a ray intersects an object of a scene contained within finite bounding regions of an object partitioning hierarchy. A target contained region is selected from candidate contained regions within extents of a geometry defined by the object. Occluded contained regions of the contained regions are identified, which the target contained region at least partially occludes. It is determined whether a surface area metric of the target contained region meets surface area utility criteria defined based on i) a surface area defined by the occluded contained regions; and ii) a surface area defined by the object. When the surface area metric does not meet the surface area utility criteria, the geometry data defining the target contained region is discarded to obtain a refined set of contained regions facilitating determining whether a ray intersects the object in dependence on determining that the ray intersects at least one contained region of the refined set of contained regions.
Processing logic of a processing system processes protected tasks first and second times to generate first and second processed outputs. A first fault detection unit compares the first and second processed outputs for a respective protected task and generates a first signal indicative of whether they match. A second fault detection unit compares the first and second processed outputs for the respective protected task and generates a second signal indicative of whether they match. The processing system is operable in a first protected mode in which the first fault detection unit and the second fault detection unit operate, concurrently, in respective mission modes, and the first signal and the second signal for the respective protected task are provided to a fault assessment unit for comparison in order to assess whether a fault existed at the first fault detection unit and/or the second fault detection unit when generating the first and second signals.
G06F 11/18 - Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits, e.g. by quadding or by majority decision circuits
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
A hierarchy is a multi-level linked structure of nodes, wherein the hierarchy represents data relating to a set of one or more items to be processed. Where there are multiple input hierarchies, it may improve the efficiency of the processing of the items to merge the input hierarchies to form a merged hierarchy. The hierarchies are merged by identifying two or more sub-hierarchies within the input hierarchies which are to be merged, and determining one or more nodes of the merged hierarchy which reference nodes of the identified sub-hierarchies. The determined nodes of the merged hierarchy are stored and indications of the references between the determined nodes of the merged hierarchy and the referenced nodes of the identified sub-hierarchies are also stored. In this way, the merged hierarchy is formed for use in processing the items.
G06F 7/14 - Merging, i.e. combining at least two sets of record carriers each arranged in the same ordered sequence to produce a single set having the same ordered sequence
A method of performing safety-critical rendering at a graphics processing unit within a graphics processing system, the method comprising: receiving, at the graphics processing system, graphical data for safety-critical rendering at the graphics processing unit; scheduling at a safety controller, in accordance with a reset frequency, a plurality of resets of the graphics processing unit; rendering the graphical data at the graphics processing unit; and the safety controller causing the plurality of resets of the graphics processing unit to be performed commensurate with the reset frequency.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
G06T 15/00 - 3D [Three Dimensional] image rendering
20.
Texture Address Generation Using Fragment Pair Differences
Methods and hardware for texture address generation receive fragment coordinates for an input block of fragments and texture instructions for the fragments, and calculate gradients for at least one pair of fragments. Based on the gradients, the method determines whether a first mode or a second mode of texture address generation is to be used and then uses the determined mode and the gradients to perform texture address generation. The first mode of texture address generation performs calculations at a first precision for a subset of the fragments and calculations for remaining fragments at a second, lower, precision. The second mode of texture address generation performs calculations for all fragments at the first precision, and if the second mode is used and more than half of the fragments in the input block are valid, the texture address generation is performed over two clock cycles.
Object intersection testing in a ray tracing system determines whether a ray intersects an object of a scene, wherein the object is contained within a bounding region that is finite and forms part of an object partitioning hierarchy. Upon determining that the ray intersects the bounding region, at least one contained region is obtained, wherein the contained region is contained within, and smaller than, extents of a geometry defined by the object. Upon determining that the ray intersects a contained region of the at least one contained region, it is determined whether the ray intersects the object in dependence on at least determining that the ray intersects the contained region.
A computer-implemented method for compressing an input group of m data values compresses the two least significant bits of each of the data values by mapping the two least significant bits of each data value in the input group collectively onto an m-bit encoding and storing the m-bit encoding. The m-bit encoding is selected from 2^m m-bit encodings, which comprise a first group of (2^m − 4) encodings and a second group of four encodings. If the selected encoding is from the first group, it represents the two least significant bits for a representative group of m data values in which the second least significant bit of each data value is the same as a respective bit of the m-bit encoding. If the selected encoding is from the second group, it represents the two least significant bits for a representative group of m data values in which the two least significant bits of each data value are equal to the two least significant bits of the other data values in the representative group.
A method of generating identifiers (IDs) for primitives and optionally vertices during tessellation. The IDs include a binary sequence of bits that represents the sub-division steps taken during the tessellation process and so encodes the way in which tessellation has been performed. Such an ID may subsequently be used to generate a random primitive or vertex and hence recalculate vertex data for that primitive or vertex.
A method of performing anisotropic texture filtering involves performing isotropic filtering at each sampling point of a set of sampling points in an ellipse to produce isotropic filter results. Weights of an anisotropic filter are selected that minimize a cost function that penalises high frequencies in the filter response of the anisotropic filter under a constraint that the variance of the anisotropic filter is related to an anisotropic ratio squared, the anisotropic ratio being the ratio of a major radius of the ellipse to be sampled and a minor radius of the ellipse to be sampled. The plurality of isotropic filter results are combined using the selected weights of the anisotropic filter to generate at least a portion of a filter result.
A block of sub-primitive presence indications for use in a rendering system is compressed into a block of compressed data. The block of sub-primitive presence indications is subdivided into a plurality of parent regions, each of the parent regions being subdivided into a plurality of child regions. A hierarchical representation of the block of sub-primitive presence indications is determined, wherein for each of one or more parent regions whose child regions all have the same presence state according to the sub-primitive presence indications in the block of sub-primitive presence indications, parent-level data is included in the hierarchical representation to represent the presence state of the parent region without child-level data for the child regions within the parent region being included in the hierarchical representation. The determined hierarchical representation of the block of sub-primitive presence indications is then stored in the block of compressed data.
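A sketch of the two-level idea above for a square block of 0/1 presence indications; the region sizes and the output token format are illustrative choices.

```python
def compress_presence(block, parent=4, child=2):
    """Each parent region is split into child regions; if every child has
    the same presence state, only parent-level data is stored. A square
    block whose side is a multiple of the parent size is assumed."""
    out = []
    size = len(block)
    for py in range(0, size, parent):
        for px in range(0, size, parent):
            children = []
            for cy in range(py, py + parent, child):
                for cx in range(px, px + parent, child):
                    children.append([block[y][x]
                                     for y in range(cy, cy + child)
                                     for x in range(cx, cx + child)])
            flat = [b for c in children for b in c]
            if all(b == flat[0] for b in flat):
                out.append(("parent", flat[0]))   # parent-level data only
            else:
                out.append(("mixed", children))   # child-level data kept
    return out
```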
A tessellation method uses both vertex tessellation factors and displacement factors defined for each vertex of a patch, which may be a quad, a triangle or an isoline. The method is implemented in a computer graphics system and involves calculating a vertex tessellation factor for each corner vertex in one or more input patches. Tessellation is then performed on the plurality of input patches using the vertex tessellation factors. The tessellation operation involves adding one or more new vertices and calculating a displacement factor for each newly added vertex. A world space parameter for each vertex is subsequently determined by calculating a target world space parameter for each vertex and then modifying the target world space parameter for a vertex using the displacement factor for that vertex.
A wireless media distribution system is provided comprising an access point (6) for broadcasting media and a plurality of stations (2) for reception and playback of media. Each station is configured for receiving and decoding a timestamp in a beacon frame transmitted repeatedly from the access point. This is used to control the output signal of a station physical layer clock (12) which is then used as a clock source for an application layer time synchronisation protocol. This application layer time synchronisation protocol can then be used in the station to control an operating system clock (8) for regulating playback of media.
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 7/575 - Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
Methods and compression units for compressing a block of image data, the block of image data comprising a plurality of image element values, the image element values being divisible into at least a first value and a second value such that the block of image data comprises a two-dimensional block of first values, the method comprising: compressing a first data set comprising all or a portion of the two-dimensional block of first values in accordance with a first fixed-length compression algorithm to generate a first compressed block by: identifying common base information for the first data set; and identifying a fixed-length parameter for each first value in the first data set, the fixed-length parameter being zero, one or more than one bits in length; and forming a compressed block for the block of image data based on the first compressed block.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations using residue arithmetic
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
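The fixed-length compression in the abstract above can be illustrated as "common base plus per-value offsets of a fixed bit width"; choosing the minimum as the base and the smallest width that covers the range are illustrative assumptions.

```python
def compress_fixed_length(values):
    """Store a common base for the data set plus one fixed-length offset
    per value; the width can be zero when every value equals the base."""
    base = min(values)
    width = (max(values) - base).bit_length()
    offsets = [v - base for v in values]
    return base, width, offsets

def decompress_fixed_length(base, width, offsets):
    # Each offset occupies exactly `width` bits in the compressed block.
    return [base + o for o in offsets]

vals = [130, 128, 135, 129]
base, width, offsets = compress_fixed_length(vals)
assert decompress_fixed_length(base, width, offsets) == vals
```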
In an aspect, an update unit can evaluate condition(s) in an update request and update one or more memory locations based on the condition evaluation. The update unit can operate atomically to determine whether to effect the update and to make the update. Updates can include one or more of incrementing and swapping values. An update request may specify one of a pre-determined set of update types. Some update types may be conditional and others unconditional. The update unit can be coupled to receive update requests from a plurality of computation units. The computation units may not have privileges to directly generate write requests to be effected on at least some of the locations in memory. The computation units can be fixed function circuitry operating on inputs received from programmable computation elements. The update unit may include a buffer to hold received update requests.
G06T 15/00 - 3D [Three Dimensional] image rendering
G06F 12/0804 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with main memory updating
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
G06F 12/126 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
A method and system for performing a render using a graphics processing unit that implements a tile-based graphics pipeline where a rendering space is sub-divided into tiles. For a selected tile of a plurality of tiles, a representation of per-tile vertex shader data identifying vertex shader programs used to generate processed primitives located within the selected tile is stored, and it is determined whether the output of a previous render for the selected tile can be used as an output for the render, by comparing the per-tile vertex shader data of the selected tile of the render with that of the previous render.
A method and an intersection testing module for performing intersection testing in a ray tracing system determine if a difference between an intersection distance at which a ray intersects a first primitive and an intersection distance at which the ray intersects a second primitive satisfies a comparison condition with respect to a threshold, and if the orientations of the first and second primitives are different. If so, the intersection of the ray with the one of the first and second primitives which has a particular orientation is selected.
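A sketch of the tie-break above, assuming each hit is a (distance, orientation) pair and that the preferred orientation is clockwise; both assumptions are illustrative.

```python
def resolve_close_hits(hit_a, hit_b, threshold=1e-6):
    """Each hit is (distance, clockwise). If the two intersection distances
    are within the threshold and the orientations differ, the hit with the
    preferred orientation wins; otherwise the nearer hit wins."""
    dist_a, cw_a = hit_a
    dist_b, cw_b = hit_b
    if abs(dist_a - dist_b) <= threshold and cw_a != cw_b:
        return hit_a if cw_a else hit_b      # preferred orientation assumed clockwise
    return hit_a if dist_a <= dist_b else hit_b
```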
Methods and tiling engines for tiling primitives in a tile based graphics processing system. A multi-level hierarchy of tile groups is generated, each level comprising one or more tile groups comprising one or more of the plurality of tiles. A plurality of primitive blocks is received, each comprising geometry data for one or more primitives. Each of the plurality of primitive blocks is associated with one or more of the tile groups up to a maximum number of tile groups such that if at least one primitive of a primitive block falls, at least partially, within the bounds of a tile, the primitive block is associated with at least one tile group that includes that tile. A control stream is generated for each tile group based on the associations, wherein each control stream comprises a primitive block entry for each primitive block associated with the corresponding tile group.
Implementations of post-tessellation blender hardware perform both domain shading and blending, and while some vertices may not require blending, all vertices require domain shading. The blender hardware includes a cache and/or a content addressable memory, and these data structures are used to reduce duplicate domain shading operations. In the event of a cache miss for a UV coordinate of a domain space vertex, the cache outputs the UV coordinate to a domain shader, where the domain space vertex comprises UV coordinates of neighbor vertices that are not inherent from the UV coordinates of the vertex itself.
A binary logic circuit for performing an interpolation calculation between two endpoint values E0 and E1 using a weighting index i for generating an interpolated result P, the values E0 and E1 being formed from Adaptive Scalable Texture Compression (ASTC) low-dynamic range (LDR) colour endpoint values C0 and C1 respectively, the circuit comprising: an interpolation unit configured to perform an interpolation between the colour endpoint values C0 and C1 using the weighting index i to generate a first intermediate interpolated result C2; and combinational logic circuitry configured to receive the interpolated result C2 and to perform one or more logical processing operations to calculate the interpolated result P according to the equation P=⌊((C2<<8)+C2+32)/64⌋ when the interpolated result is not to be compatible with an sRGB colour space, and according to the equation P=⌊((C2<<8)+128·64+32)/64⌋ when the interpolated result is to be compatible with an sRGB colour space.
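The two equations can be checked with a short sketch. The blend that produces C2 is assumed here to be the raw 6-bit ASTC weighting of the 8-bit endpoints (before any division); only the final expansion step is taken directly from the abstract.

```python
def astc_ldr_interpolate(c0, c1, i, srgb=False):
    """Worked version of the two equations above. C2 is assumed to be the
    undivided 6-bit-weight blend of the 8-bit endpoints; the final step
    follows the equations given in the abstract."""
    c2 = c0 * (64 - i) + c1 * i               # assumed intermediate blend
    if srgb:
        return ((c2 << 8) + 128 * 64 + 32) // 64
    return ((c2 << 8) + c2 + 32) // 64

# c0 = c1 = 255 reproduces the 16-bit maximum for any weight index i
assert astc_ldr_interpolate(255, 255, 17) == 0xFFFF
```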
A method of rendering, in a rendering space, a scene formed by primitives in a graphics processing system. A geometry processing phase includes the step of storing fragment shading rate data representing a first fragment shading rate value and associating data identifying a primitive with the fragment shading rate data. A rendering phase includes the steps of retrieving the stored fragment shading rate data and associated data identifying the primitive, obtaining an attachment specifying one or more attachment fragment shading rate values for the rendering space; processing the primitive to derive primitive fragments to be shaded; and for each primitive fragment, combining the first fragment shading rate value for the primitive from which the primitive fragment is derived with an attachment fragment shading rate value from the attachment to produce a resolved combined fragment shading rate value for the respective fragment.
A computing system and method for processing data in which a forward transformation indication is received defining a transformation from a first space to a second space. A transformation is performed on input data from the second space to the first space to determine transformed data by performing a reverse translation operation on the input data, wherein the reverse translation operation is the reverse of a translation defined by the forward transformation indication. An inverse linear mapping operation is performed on the result of the reverse translation operation, wherein the inverse linear mapping operation is the inverse of a linear mapping defined by the forward transformation indication. The transformed data is processed in the computing system to render an image of a scene.
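A sketch of the reverse mapping above, assuming the forward transformation is x' = A·x + t: the reverse translation subtracts t, and the inverse linear mapping then applies the inverse of A.

```python
import numpy as np

def inverse_transform(points, forward_linear, forward_translation):
    """Map points from the second space back to the first space by first
    reversing the translation and then applying the inverse of the linear
    mapping: x = A^-1 @ (x' - t). Points are rows of a 2-D array."""
    a_inv = np.linalg.inv(forward_linear)
    return (points - forward_translation) @ a_inv.T

# Round trip: forward then inverse recovers the original points.
A = np.array([[2.0, 0.0], [0.0, 0.5]])
t = np.array([1.0, -3.0])
x = np.array([[0.0, 0.0], [1.0, 2.0]])
x_fwd = x @ A.T + t
assert np.allclose(inverse_transform(x_fwd, A, t), x)
```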
An on-chip cache is described which receives memory requests and in the event of a cache miss, the cache generates memory requests to a lower level in the memory hierarchy (e.g. to a lower level cache or an external memory). Data returned to the on-chip cache in response to the generated memory requests may be received out-of-order. An instruction scheduler in the on-chip cache stores pending received memory requests and effects the re-ordering by selecting a sequence of pending memory requests for execution such that pending requests relating to an identical cache line are executed in age order and pending requests relating to different cache lines are executed in an order dependent upon when data relating to the different cache lines is returned. The memory requests which are received may be received from another, lower level on-chip cache or from registers.
An image of a 3-D scene is rendered by rendering a noisy image at a first resolution; obtaining initial guide channels at the first resolution, and obtaining corresponding initial guide channels at a second resolution. When the two resolutions are the same, the initial guide channels at the first resolution and the corresponding initial guide channels at the second resolution may be provided by a single set of initial guide channels. Enhanced guide channels are derived from the initial guide channels and the noisy image, using machine learning models. For each of a plurality of local neighbourhoods, the parameters of a denoising model that approximates the noisy image as a function of the one or more enhanced guide channels (at the first resolution) are calculated, and the calculated parameters are applied to the one or more enhanced guide channels (at the second resolution), to produce a denoised image at the second resolution.
A graphics processing unit having multiple groups of processor cores for rendering graphics data for allocated tiles and outputting the processed data to regions of a memory resource. Scheduling logic allocates sets of tiles to the groups to perform a first render, and at a time when at least one of the groups has not completed processing its allocated sets as part of the first render, allocates at least one set of tiles for a second render to one of the other groups for processing. Progress indication logic indicates progress of the first render, indicating regions of the memory resource for which processing for the first render has been completed. Progress check logic checks the progress indication in response to a request for access to a region of the memory resource as part of the second render and enables access to that region of the resource in response to an indication that processing for the first render has been completed for that region.
G06T 15/00 - 3D [Three Dimensional] image rendering
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
Lossy methods and hardware for compressing data and the corresponding decompression methods and hardware are described. The lossy compression method comprises dividing a block of pixels into a number of sub-blocks and then, for each sub-block, analysing the sub-block and selecting one of a candidate set of lossy compression modes. The analysis may, for example, be based on the alpha values for the pixels in the sub-block. In various examples, the candidate set of lossy compression modes comprises at least one mode that uses a fixed alpha channel value for all pixels in the sub-block and one or more modes that encode a variable alpha channel value.
H03M 7/30 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
H04N 19/103 - Selection of coding mode or of prediction mode
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a colour or a chrominance component
H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
Within a graphics processing system, a plurality of different shading programs may be executed by a single processor over multiple threads. For each shading program, a plurality of registers are used to store data for the respective shading program. Thus, for multiple shading programs executed over multiple threads, a plurality of registers are allocated to each program, or thread, being executed. However, there are a limited number of registers available and therefore efficient allocation of the registers optimises performance. Often more registers than necessary are allocated to each shading program, but the present invention provides a method of allocating the correct number of registers based on the size of the fragments being shaded.
An attention layer of an attention-based neural network is arranged to implement an attention function in dependence on a Key matrix, a Query matrix and a Value matrix. The attention layer uses a Key weight matrix to determine the Key matrix, a Query weight matrix to determine the Query matrix, and a Value weight matrix to determine the Value matrix. A compressed attention-based neural network is outputted which comprises a compressed attention layer arranged to implement the attention function by performing a compressed operation in dependence on: (i) a set of one or more Key weight sub-matrices, (ii) a set of one or more Query weight sub-matrices, and (iii) a set of one or more Value weight sub-matrices.
A graphics processing unit has a shader core including a main processing portion and a sub-processor. The main processing portion comprises a scheduler, an instruction cache, a plurality of registers and a plurality of arithmetic logic units (ALUs). The sub-processor operates independently of the main processing portion and comprises a burst scheduler, a plurality of registers and a plurality of ALUs. The sub-processor is arranged to execute bursts, wherein a burst comprises at least one group of instructions that can be executed atomically and which are extracted from a program. The main processing portion executes a modified version of the program, wherein the modified program is created from the program by replacing the instructions in a burst with an instruction that triggers the execution of the burst. The registers in the sub-processor are used to store one or more sources and/or results for bursts that are being executed by the sub-processor.
A method of performing anisotropic texture filtering includes generating one or more parameters describing an elliptical footprint in texture space; performing isotropic filtering at each of a plurality of sampling points along a major axis of the elliptical footprint, wherein a spacing between adjacent sampling points of the plurality of sampling points is proportional to √(1−η⁻²) units, wherein η is a ratio of a major radius of an ellipse to be sampled and a minor radius of the ellipse to be sampled, wherein the ellipse to be sampled is based on the elliptical footprint; and combining results of the isotropic filtering at the plurality of sampling points with a Gaussian filter to generate at least a portion of a filter result.
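A sketch of the sampling-point placement above; scaling the spacing by the minor radius, taking ⌈η⌉ samples and centring them on the footprint are illustrative assumptions, and the Gaussian combination of the isotropic results is not shown.

```python
import math

def anisotropic_sample_points(centre, major_dir, major_r, minor_r):
    """Place isotropic sampling points along the major axis with a spacing
    proportional to sqrt(1 - eta^-2), where eta is the major-to-minor
    radius ratio. Sample count and spacing scale are illustrative."""
    eta = major_r / minor_r
    spacing = minor_r * math.sqrt(1.0 - eta ** -2.0)  # 0 for a circular footprint
    n = max(1, math.ceil(eta))
    cx, cy = centre
    dx, dy = major_dir                                # unit vector along the major axis
    offsets = [(k - (n - 1) / 2.0) * spacing for k in range(n)]
    return [(cx + o * dx, cy + o * dy) for o in offsets]

print(anisotropic_sample_points((0.0, 0.0), (1.0, 0.0), 4.0, 2.0))
```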
A method of converting 10-bit pixel data (e.g. 10:10:10:2 data) into 8-bit pixel data involves converting the 10-bit values to 7-bits or 8-bits and generating error values for each of the converted values. Two of the 8-bit output channels comprise a combination of a converted 7-bit value and one of the bits from the fourth input channel. A third 8-bit output channel comprises the converted 8-bit value and the fourth 8-bit output channel comprises the error values. In various examples, the bits of the error values may be interleaved when they are packed into the fourth output channel.
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
H04N 19/89 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
A graph attention network including a graph attention network layer arranged to perform an operation in dependence on an adjacency matrix mask having a plurality of elements representative of connected graph nodes is compressed by rearranging the rows and/or columns of the adjacency matrix mask so as to gather the plurality of elements representative of connected graph nodes into one or more adjacency sub-matrix masks, the one or more adjacency sub-matrix masks having a greater number of elements representative of connected graph nodes per total number of elements of the one or more adjacency sub-matrix masks than the number of elements representative of connected graph nodes per total number of elements of the adjacency matrix mask. A compressed graph attention network comprising a compressed graph attention network layer arranged to perform a compressed operation in dependence on the one or more adjacency sub-matrix masks is outputted.
An indication of one or more weighting parameters is determined for use in applying upsampling to input pixel values representing an image region to determine a block of one or more upsampled pixel values. A horizontal edge filter determines a first filtered value. A vertical edge filter determines a second filtered value. A horizontal line filter determines a third filtered value. A vertical line filter determines a fourth filtered value. The first, second, third and fourth filtered values are used to determine the indication of one or more weighting parameters, wherein the one or more weighting parameters are indicative of relative horizontal and vertical variation of the input pixel values within the image region. The determined indication of the one or more weighting parameters is output for use in applying upsampling to the input pixel values representing the image region to determine a block of one or more upsampled pixel values.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
Pixel values are determined at respective upsampled pixel locations for a current frame of a sequence of frames. Depth values are obtained for locations of pixels of a reference frame of the sequence of frames. For each of the upsampled pixel locations: (a) a depth value of the current frame is obtained; (b) a motion vector is obtained to indicate motion between the reference frame and the current frame; (c) the motion vector is used to identify one or more of the pixels of the reference frame; (d) a weight is determined for each of the identified pixels of the reference frame in dependence on: (i) the depth value of the current frame for the upsampled pixel location, and (ii) the depth value for the location of the identified pixel of the reference frame; and (e) the pixel value for the upsampled pixel location is determined using the determined weight for each of the identified pixels.
G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
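Steps (d) and (e) in the abstract above can be sketched for a single upsampled location as a depth-weighted average of the reference pixels identified by the motion vector; the exponential falloff and the depth_sigma parameter are illustrative choices.

```python
import math

def upsampled_pixel(cur_depth, ref_pixels, ref_depths, depth_sigma=0.1):
    """Weight each reference pixel by how closely its depth matches the
    current-frame depth at the upsampled location, then take the weighted
    average."""
    weights = [math.exp(-abs(cur_depth - d) / depth_sigma) for d in ref_depths]
    total = sum(weights)
    if total == 0.0:
        return sum(ref_pixels) / len(ref_pixels)   # fall back to a plain average
    return sum(w * p for w, p in zip(weights, ref_pixels)) / total

# A reference pixel at a very different depth contributes little.
print(upsampled_pixel(1.0, [0.2, 0.9], [5.0, 1.02]))
```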
A neural network block includes a plurality of layers arranged sequentially. Each layer includes an expansion layer having a first number of input channels and a second number of output channels, where the second number is larger than the first number; a compression layer having a third number of input channels and a fourth number of output channels, wherein the fourth number is smaller than the third number; and a grouped convolution layer.
A computer-implemented method for performing a vector bitwise rotation, wherein a processing system comprises a byte-wise anything-to-anything mux and one or more bitwise right shifters, wherein the byte-wise anything-to-anything mux includes a plurality of byte-sized inputs and a plurality of byte-sized outputs, each input being associated with a respective input position and each output being associated with a respective output position. A combination of a byte-wise anything-to-anything mux and one or more bitwise shifts is used to perform vector bitwise rotations, with even and odd elements of the vector operated on separately.
A hardware design for a main component is verified, the main component being representable as a hierarchical set of components comprising parent components which each comprise leaf components in the hierarchical set. For each of the parent components it is verified that an instantiation of an abstracted hardware design for the parent component generates an expected output transaction in response to each of a plurality of test input transactions. The abstracted hardware design comprises, for each leaf component of the parent component, a corresponding abstracted component that is configured to, for a specific input transaction to the leaf component, produce a specific output transaction with a causal deterministic relationship to the specific input transaction, wherein a formal verification tool is configured to select the specific input transaction and the specific output transaction pair to be each possible valid input transaction and valid output transaction pair for the leaf component.
Methods of arbitrating between requestors and a shared resource wherein for each processing cycle a plurality of select signals are generated and then used by decision nodes in a binary decision tree to select a requestor. The select signals are generated using valid bits and priority bits. Each valid bit corresponds to one of the requestors and indicates whether, in the processing cycle, the requestor is requesting access to the shared resource. Each priority bit corresponds to one of the requestors and indicates whether, in the processing cycle, the requestor has priority. Corresponding valid and priority bits are combined in an AND logic element to generate a valid_and_priority bit for each requestor. Pair-wise OR-reduction is then performed on both the valid bits and the valid_and_priority bits to generate additional valid bits and valid_and_priority bits for sets of requestors, and these are then used to generate the select signals.
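A sketch of the arbitration scheme above: valid and priority bits are ANDed into valid_and_priority bits, both vectors are OR-reduced pair-wise up a binary tree, and each decision node prefers the half containing a valid-and-priority requestor, falling back to a merely valid half. A power-of-two requestor count is an illustrative assumption.

```python
def arbitrate(valid, priority):
    """Return the index of the granted requestor, or None."""
    vp = [v & p for v, p in zip(valid, priority)]      # valid_and_priority bits

    def select(lo, hi):
        if hi - lo == 1:
            return lo if valid[lo] else None
        mid = (lo + hi) // 2
        left_vp, right_vp = any(vp[lo:mid]), any(vp[mid:hi])   # OR-reduced bits
        left_v, right_v = any(valid[lo:mid]), any(valid[mid:hi])
        # Select signal: prefer a side holding priority, else any valid side.
        if left_vp or (not right_vp and left_v):
            return select(lo, mid)
        if right_v:
            return select(mid, hi)
        return None

    return select(0, len(valid))

assert arbitrate([1, 1, 0, 1], [0, 0, 0, 1]) == 3   # priority requestor wins
assert arbitrate([1, 1, 0, 0], [0, 0, 1, 0]) == 0   # no valid priority request
```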
A method of operating a GPU uses input attributes in executing a first part of a geometry task fetched by a shader core. The first part of the task executes a first part of a shader to calculate position data for each instance of the task. The first part of the task is executed to output the position data for each instance of the task. The task is then descheduled until cull results are received for each instance. In response to receiving cull results indicating at least one remaining instance in the task, input attributes used in executing a second part of a task are fetched. The second part of the task executes a second part of a shader to calculate varyings for each remaining instance. The second part of the task is executed and the varyings for each remaining instance are output.
A method of managing resources in a GPU comprises allocating a region of off-chip storage to a geometry task on creation of the geometry task and receiving, at an on-chip store in the GPU, a memory allocation request for the geometry task from a shader core in the GPU, wherein the memory allocation request is received after generation of geometry data for the geometry task. In response to receiving the memory allocation request, the method comprises determining, by the on-chip store, whether to allocate a region of the on-chip store to the geometry task. In response to allocating the region of the on-chip store, geometry data for the geometry task is written to the on-chip store and in response to determining not to allocate the region of the on-chip store, the geometry data is written to the allocated region of off-chip storage.
An elementwise operations hardware accelerator for use in a neural network accelerator. The elementwise operations hardware accelerator comprises one or more processing pipelines and a control module. Each processing pipeline includes: an arithmetic logic unit module comprising a plurality of different arithmetic logic unit blocks, each arithmetic logic unit block of the plurality of arithmetic logic unit blocks configured to receive one or more inputs, selectively perform one or more elementwise operations on the one or more inputs, and output a result of the one or more elementwise operations; and an interconnection module configured to receive elements of one or more input tensors and selectively provide the elements of at least one of the one or more input tensors to an arithmetic logic unit block of the plurality of arithmetic logic unit blocks as an input. The control module is configured to receive a set of commands identifying an arithmetic logic unit block of the plurality of arithmetic logic unit blocks and one or more elementwise operations to be performed by the identified arithmetic logic unit block, and control the operation of the one or more processing pipelines to cause the identified arithmetic logic unit block to perform the identified one or more elementwise operations.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
Shader programs may include conditional portions, executed only in response to a specific condition being met. The use of conditional portions can require different numbers of registers. Thus, the use of conditional portions potentially results in the over-allocation of registers. Accordingly, there is provided a method of rendering in a graphics processing system using a shader program having a conditional section applied only in response to fulfilment of a condition, the method comprising: compiling the program, by a compiler, the compiling comprising identifying a conditional section; reading, by a resource allocator, a constant which determines the result of the condition; determining, by the resource allocator, whether the condition is met or not met; and allocating, by the resource allocator, a number of registers.
A compressed attention-based neural network comprises a compressed attention layer implementing an attention function. The compressed attention layer rearranges and partitions an embedded tensor to form embedded sub-matrices. The compressed attention layer applies Key weight sub-matrices to the respective embedded sub-matrices, and concatenates the results of applying the Key weight sub-matrices, to determine a Key matrix. The compressed attention layer applies Query weight sub-matrices to the respective embedded sub-matrices, and concatenates the results of applying the Query weight sub-matrices, to determine a Query matrix. The compressed attention layer applies a set of one or more Value weight sub-matrices to the respective one or more embedded sub-matrices, and concatenates the results of applying the one or more Value weight sub-matrices to the respective one or more embedded sub-matrices, to determine a Value matrix. The compressed attention layer implements the attention function using the determined Key matrix, the determined Query matrix and the determined Value matrix.
A method and a processing module are provided for applying upsampling to input pixels representing an image region to determine a block of upsampled pixels. The input pixels have locations corresponding to a repeating quincunx arrangement of upsampled pixel locations. The input pixels are analysed to determine one or more weighting parameters, the one or more weighting parameters being indicative of a directionality of filtering to be applied when upsampling is applied to the input pixels within the image region. One or more of the upsampled pixels of the block of upsampled pixels are determined in accordance with the directionality of filtering indicated by the determined one or more weighting parameters.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
G06T 3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
G06T 5/10 - Image enhancement or restoration using non-spatial domain filtering
G06T 5/60 - Image enhancement or restoration using machine learning, e.g. neural networks
Intersection testing in a ray tracing system is performed for a ray with respect to a primitive. An intersection attribute value is determined for a primary sample of the ray relating to an intersection between the ray and the primitive in a ray coordinate system. The ray coordinate system has two non-parallel axes that are both transverse to the direction of the ray, and an origin of the ray coordinate system is on the ray. For one or both of the two non-parallel axes of the ray coordinate system, data is determined indicating a change to the intersection attributes in a direction parallel to that axis. The intersection between the ray and the primitive is processed using the determined value of the intersection attributes for the primary sample of the ray and the determined data indicating a change to the intersection attributes in the directions parallel to the two non-parallel axes.
Segment load operations are performed by processing data through an anything-to-anything mux, with sections writing elements to respective storage locations based on corresponding indices of the elements and the storage locations. Once all of the elements are loaded into the correct storage locations, each location is read again, with the elements of that storage location being sent through the mux, arranged into the correct order, and written back to the same register.
A method of managing resources in a graphics processing pipeline includes, in response to selecting a task for execution within a texture/shading unit, allocating to the task both a static allocation of temporary registers for the entire task and a dynamic allocation of temporary registers. The dynamic allocation comprises temporary registers used by a first phase of the task only and the static allocation of temporary registers comprises any temporary registers that are used by the program and are live at a boundary between two phases. When the task subsequently reaches a boundary between two phases, the dynamic allocation of temporary registers is freed and a new dynamic allocation of temporary registers for a next phase of the task is allocated to the task.
Methods and image processing systems are provided for determining a dominant gradient orientation for a target region within an image. A plurality of gradient samples are determined for the target region, wherein each of the gradient samples represents a variation in pixel values within the target region. The gradient samples are converted into double-angle gradient vectors, and the double-angle gradient vectors are combined so as to determine a dominant gradient orientation for the target region.
G06T 7/44 - Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
G06F 18/22 - Matching criteria, e.g. proximity measures
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]Salient regional features
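The double-angle combination in the dominant-gradient-orientation method above can be sketched in a few lines of numpy. The central-difference gradient estimate, the magnitude weighting and the function name are assumptions for illustration; only the conversion to double-angle vectors and the halving of the combined angle come from the abstract.

```python
# A minimal sketch of the double-angle approach described above, using numpy.
import numpy as np

def dominant_gradient_orientation(region):
    """region: 2-D array of pixel values for the target region.
    Returns the dominant gradient orientation in radians, in [0, pi)."""
    # Per-pixel gradient samples (simple central differences here).
    gy, gx = np.gradient(region.astype(float))

    # Convert each gradient sample to a double-angle vector so that opposite
    # gradient directions (theta and theta + pi) reinforce rather than cancel.
    theta = np.arctan2(gy, gx)
    magnitude = np.hypot(gx, gy)
    vx = (magnitude * np.cos(2.0 * theta)).sum()
    vy = (magnitude * np.sin(2.0 * theta)).sum()

    # Combine the double-angle vectors and halve the angle of the result.
    return (0.5 * np.arctan2(vy, vx)) % np.pi
```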
A graphics processing system includes a plurality of processing units, wherein the graphics processing system is configured to process a task a first time and a second time at the plurality of processing units. Data identifying which processing unit of the plurality of processing units the task has been allocated to is consulted on allocating the task to a processing unit for processing for the second time, and, in response, the task is allocated for processing for the second time to any processing unit of the plurality of processing units other than the processing unit to which the task was allocated for processing for the first time.
A method of analyzing one or more objects in a set of frames. A first frame is segmented to produce a plurality of first masks each identifying pixels belonging to a potential object-instance detected in the first frame. A first feature vector is extracted from the first frame for each potential object-instance detected therein, characterizing the potential object-instance. A second frame is segmented to produce a plurality of second masks each identifying pixels belonging to a potential object-instance detected in the second frame. A second feature vector is extracted for each potential object-instance detected in the second frame, characterizing the potential object-instance. A potential object-instance in the first frame is matched with one of the potential object-instances in the second frame.
A graph convolutional network (GCN) having a GCN layer is configured. The GCN layer performs an operation in dependence on an adjacency matrix, a feature embedding matrix and a weight matrix. In response to determining that the weight matrix comprises more rows than columns, the GCN layer is configured to determine a first intermediate result of multiplying the feature embedding matrix and the weight matrix, and subsequently use the determined first intermediate result to determine a full result representing a result of multiplying the adjacency matrix, the feature embedding matrix and the weight matrix. In response to determining that the weight matrix comprises more columns than rows, the GCN layer is configured to determine a second intermediate result of multiplying the adjacency matrix and the feature embedding matrix, and subsequently use the determined second intermediate result to determine the full result representing the result of multiplying the adjacency, feature embedding and weight matrices.
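The shape-dependent ordering described above amounts to choosing the cheaper association of the product of the adjacency, feature embedding and weight matrices. A minimal sketch, assuming numpy matrices and the names A, X and W for those three matrices:

```python
# Illustrative sketch of the multiplication-order choice described above, in numpy.
import numpy as np

def gcn_layer_matmul(A, X, W):
    """Compute A @ X @ W, choosing the cheaper association based on W's shape."""
    rows, cols = W.shape
    if rows > cols:
        # W reduces the feature dimension: compute (X @ W) first, then apply A.
        intermediate = X @ W
        return A @ intermediate
    else:
        # W does not reduce the feature dimension: compute (A @ X) first, then apply W.
        intermediate = A @ X
        return intermediate @ W
```

Performing the multiplication that produces the smaller intermediate result first reduces the cost of the remaining multiplication, which is the motivation for the two cases in the abstract.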
A graphics processing system includes a tiling unit configured to tile a scene into a plurality of tiles. A processing unit identifies tiles of the plurality of tiles that are each associated with at least a predetermined number of primitives. A memory management unit allocates a portion of memory to each of the identified tiles and does not allocate a portion of memory for each of the plurality of tiles that are not identified by the processing unit. A rendering unit renders each of the identified tiles and does not render tiles that are not identified by the processing unit.
Data in a processing system is compressed, the data comprising a plurality of values having the same multiple-byte format. Bytes with corresponding byte significance are grouped together to form a plurality of byte blocks, and a byte block of the plurality of byte blocks is compressed using a compression algorithm comprising storing at least one byte from the byte block as a byte origin, and storing at least one remaining byte in the byte block as a difference value from the byte origin.
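A rough software sketch of the grouping-and-difference idea, assuming 32-bit values, one byte block per byte position and the first byte of each block as the origin; the real block sizes, origin selection and difference encoding are not specified by the abstract.

```python
# A rough sketch of the byte-block scheme described above, assuming 32-bit values.
import numpy as np

def compress_byte_blocks(values):
    """values: 1-D array of uint32. Returns a list of (origin, deltas) per byte block."""
    raw = np.asarray(values, dtype=np.uint32)
    # Group bytes of corresponding significance together: byte_blocks[k] holds the
    # k-th byte (k = 0 least significant) of every value.
    byte_blocks = [((raw >> (8 * k)) & 0xFF).astype(np.uint8) for k in range(4)]

    compressed = []
    for block in byte_blocks:
        origin = int(block[0])                               # one byte stored as the block origin
        deltas = (block.astype(np.int16) - origin) % 256     # remaining bytes as wrap-around differences
        compressed.append((origin, deltas[1:].astype(np.uint8)))
    return compressed
```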
A method of processing an input task in a processing system involves: duplicating the input task so as to form a first task and a second task; allocating memory including a first block of memory configured to store read-write data to be accessed during the processing of the first task, a second block of memory configured to store a copy of the read-write data to be accessed during the processing of the second task, and a third block of memory configured to store read-only data to be accessed during the processing of both the first task and the second task; and processing the first task and the second task at processing logic of the processing system so as to, respectively, generate first and second outputs.
Rendering systems that can use combinations of rasterization rendering processes and ray tracing rendering processes are disclosed. In some implementations, these systems perform a rasterization pass to identify visible surfaces of pixels in an image. Some implementations may begin shading processes for visible surfaces, before the geometry is entirely processed, in which rays are emitted. Rays can be culled at various points during processing, based on determining whether the surface from which the ray was emitted is still visible. Rendering systems may implement rendering effects as disclosed.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
Hardware is configured for implementing a Deep Neural Network (DNN) for performing an activation function. A programmable lookup table for storing lookup data approximating the activation function is provided at an activation module for performing the activation function. Training data is provided to an input layer of a representation of the hardware, wherein the representation of the hardware is configured to implement the DNN, to configure the DNN by using the training data, wherein configuring the DNN comprises determining lookup data for the lookup table representing the activation function. The lookup data is loaded into the lookup table of the hardware, thereby configuring the activation module of the hardware for performing the activation function during post-training operation.
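The lookup-table idea can be illustrated with a small software sketch: sample an activation function (a sigmoid here, purely as an example) into a table, then approximate the function at inference time by reading and interpolating the table. The input range, table size and function names are assumptions for the example.

```python
# A small sketch of the lookup-table idea described above.
import numpy as np

def build_activation_lut(fn, lo=-8.0, hi=8.0, entries=256):
    """Sample `fn` at `entries` evenly spaced points; this is the data that would be
    loaded into the programmable lookup table."""
    xs = np.linspace(lo, hi, entries)
    return xs, fn(xs)

def apply_activation_lut(x, xs, table):
    """Approximate the activation by linear interpolation between table entries."""
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, table)

# Example: a sigmoid approximated by the table.
xs, table = build_activation_lut(lambda v: 1.0 / (1.0 + np.exp(-v)))
approx = apply_activation_lut(np.array([-2.0, 0.0, 3.5]), xs, table)
```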
Systems and methods for implementing a geometry processing phase of tile-based rendering. The systems include a plurality of parallel geometry pipelines, a plurality of tiling pipelines and a geometry to tiling arbiter situated between the plurality of geometry pipelines and the plurality of tiling pipelines. Each geometry pipeline is configured to generate one or more geometry blocks for each geometry group of a subset of ordered geometry groups, generate a corresponding primitive position block for each geometry block, and compress each geometry block to generate a corresponding compressed geometry block. The tiling pipelines are configured to generate, from the primitive position blocks, a list for each tile indicating primitives that fall within the bounds of that tile. The geometry to tiling arbiter is configured to forward the primitive position blocks generated by the plurality of geometry pipelines to the plurality of tiling pipelines in the correct order based on the order of the geometry groups.
A processor has first and second cores and a distributed cache that caches a copy of data stored at a plurality of memory addresses of a memory. A first cache slice is connected to the first core, and a second cache slice is connected to the second core. The first cache slice caches a copy of data stored at a first set of memory addresses, and the second cache slice caches a copy of data stored at a second, different, set of memory addresses.
A transaction processing circuit in a graphics rendering system receives information identifying a particular vertex of a plurality of vertices in a strip, each of which is associated with a viewport, and selects a plurality of viewports for viewport transformation of the particular vertex by selecting relevant vertices from the vertices in the strip based on a provoking vertex, and selecting the plurality of viewports to comprise the viewport associated with each relevant vertex. Viewport transformation instructions are sent to a viewport transformation module to perform a viewport transformation on untransformed coordinate data for the particular vertex for each of the viewports, wherein the one or more viewport transformation instructions comprise a viewport transformation instruction for each of the plurality of viewports, and each viewport transformation instruction comprises information identifying the particular vertex and information identifying one of the plurality of viewports.
A method of matching features in first and second images captured from respective camera viewpoints related by an epipolar geometry. The coordinate system of the second image is transformed so as to map an epipolar line in the second image corresponding to a first feature in the first image, to be parallel to one of the coordinate axes of the coordinate system. The epipolar line defines a geometrically-constrained region in the second image in the transformed coordinate system corresponding to the first feature in the first image; measures of similarity between the first feature in the first image and features in the second image are determined; and a best match feature is identified from the measures of similarity between the first feature in the first image and the respective features in the second image.
G06T 7/593 - Depth or shape recovery from multiple images from stereo images
G06F 18/22 - Matching criteria, e.g. proximity measures
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersectionsConnectivity analysis, e.g. of connected components
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]Salient regional features
G06V 10/74 - Image or video pattern matchingProximity measures in feature spaces
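A simplified software sketch of the matching flow in the feature-matching abstract above, assuming a known fundamental matrix F, pixel coordinates, descriptor vectors and a sum-of-squared-differences similarity measure; the band width and all function and variable names are illustrative only.

```python
# Illustrative sketch of matching under the epipolar constraint described above.
import numpy as np

def match_feature(p1, desc1, pts2, descs2, F, band=2.0):
    """p1: (x, y) of the first feature; desc1: its descriptor.
    pts2: (N, 2) candidate feature coordinates in the second image.
    descs2: (N, D) candidate descriptors. F: 3x3 fundamental matrix."""
    # Epipolar line in the second image: l = F @ [x, y, 1]^T, with a*x + b*y + c = 0.
    a, b, c = F @ np.array([p1[0], p1[1], 1.0])

    # Rotate the second image's coordinate system so that the epipolar line is
    # parallel to the x axis of the transformed coordinate system.
    angle = np.arctan2(-a, b)                  # direction of the line is (b, -a)
    R = np.array([[np.cos(-angle), -np.sin(-angle)],
                  [np.sin(-angle),  np.cos(-angle)]])
    pts2_t = pts2 @ R.T

    # The transformed line has constant y; use the point on the line closest to the origin.
    p_on_line = np.array([-a * c, -b * c]) / (a * a + b * b)
    y_line = (R @ p_on_line)[1]

    # Geometrically-constrained region: candidates within `band` pixels of the line.
    in_band = np.abs(pts2_t[:, 1] - y_line) <= band
    if not in_band.any():
        return None

    # Measure of similarity: sum of squared differences between descriptors.
    ssd = ((descs2[in_band] - desc1) ** 2).sum(axis=1)
    return np.flatnonzero(in_band)[np.argmin(ssd)]   # index of the best match in pts2
```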
Hardware logic for implementing a convolutional neural network (CNN) is configured to receive input data values to be processed in a layer of the CNN. Addresses in banked memory of a buffer in which the received input data values are to be stored are determined based upon format data indicating a format parameter of the input data in the layer and indicating a format parameter of a filter which is to be used to process the input data in the layer, wherein the format parameter of the filter comprises a stride. The received input data values are then stored at the determined addresses in the buffer for retrieval for processing in the layer.
A method and system for generating and shading a computer graphics image in a tile based computer graphics system is provided. Geometry data is supplied and a plurality of primitives are derived from the geometry data. One or more modified primitives are then derived from at least one of the plurality of primitives. For each of a plurality of tiles, an object list is derived including data identifying the primitive from which each modified primitive located at least partially within that tile is derived. Alternatively, the object list may include data identifying each modified primitive located at least partially within that tile. Each tile is then shaded for display using its respective object list.
A cache system in a graphics processing system stores graphics data items for use in rendering primitives. It is determined whether graphics data items relating to primitives to be rendered are present in the cache, and if not then computation instances for generating the graphics data items are created. Computation instances are allocated to tasks using a task assembly unit which stores task entries for respective tasks. The task entries indicate which computation instances have been allocated to the respective tasks. The task entries are associated with characteristics of computation instances which can be allocated to the respective tasks. A computation instance to be executed is allocated to a task based on the characteristics of the computation instance. SIMD processing logic executes computation instances of a task outputted from the task assembly unit to thereby determine graphics data items, which can be used to render the primitives.
Graphics processing systems render items of geometry using a rendering space subdivided into a plurality of first regions. The items of geometry are stored in data blocks having a respective block ID. The items of geometry are rendered within a second region of a plurality of second regions using a first control list for the first region of which the second region is a part, and a second control list for the second region, each control list comprising entries associated with respective items of geometry, each of the entries comprising a block ID associated with a data block. The items of geometry are rendered within the second region by choosing, from the first control list and the second control list, the entry comprising the lowest block ID which has not previously been chosen, and fetching items of geometry from the data block associated with the block ID of the chosen entry.
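A minimal sketch of the merged traversal described above, assuming each control-list entry is a small record carrying a block ID, and assuming that a block ID chosen from one list is not chosen again from the other; the entry representation and example values are hypothetical.

```python
# A simple sketch of consuming two control lists in lowest-block-ID order.

def merged_entries(first_control_list, second_control_list):
    """Yield entries from both control lists, always taking the entry with the
    lowest block ID that has not previously been chosen."""
    entries = sorted(first_control_list + second_control_list,
                     key=lambda entry: entry["block_id"])
    seen = set()
    for entry in entries:
        if entry["block_id"] not in seen:
            seen.add(entry["block_id"])
            yield entry   # the renderer would now fetch geometry from this data block

# Example usage with hypothetical entries:
first = [{"block_id": 3}, {"block_id": 7}]
second = [{"block_id": 1}, {"block_id": 7}]
ordered = list(merged_entries(first, second))   # block IDs 1, 3, 7
```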
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
45 - Legal and security services; personal services for individuals.
Goods & Services
Computer software for microprocessors and electronic circuits; computer software for graphics processing; computer software for multimedia processing; computer software in relation to instruction set architectures; computer software in relation to neural networks processors; artificial intelligence and machine learning software in connection with microprocessors and electronic circuits; firmware and device drivers for microprocessors; interfaces between computer hardware and computer software; electronic databases featuring data and information relating to microprocessors, electronic circuits, graphics processing, instruction set architectures and neural networks processors; electronic publications in the field of microprocessors; microprocessors; electronic circuits; central processing units; graphics processing units; neural network processors. Providing online non-downloadable computer software, Software-as-a-Service, and Platform-as-a-Service in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessor architecture. Licensing of Intellectual property and technology; licensing of know-how, namely practical knowledge, skill and expertise in relation to the development of microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits.
45 - Legal and security services; personal services for individuals.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Licensing of Intellectual property and technology; licensing of know-how, namely practical knowledge, skill and expertise in relation to the development of microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits. Computer software for microprocessors and electronic circuits; computer software for graphics processing; computer software for multimedia processing; computer software in relation to instruction set architectures; computer software in relation to neural networks processors; artificial intelligence and machine learning software in connection with microprocessors and electronic circuits; firmware and device drivers for microprocessors; interfaces between computer hardware and computer software; electronic databases featuring data and information relating to microprocessors, electronic circuits, graphics processing, instruction set architectures and neural networks processors; electronic publications in the field of microprocessors; microprocessors; electronic circuits; central processing units; graphics processing units; neural network processors. Providing online non-downloadable computer software, Software-as-a-Service, and Platform-as-a-Service in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessor architecture.
An adder and a method for calculating 2^n+x are provided, where x is a variable input expressed in a floating point format and n is an integer. The adder comprises a first path configured to calculate 2^n+x for x<0 and 2^(n−1)≤|x|<2^(n+1); a second path configured to calculate 2^n+x for |x|<2^n; a third path configured to calculate 2^n+x for |x|≥2^n; and selection logic configured to cause the adder to output a result from one of the first, second, and third paths in dependence on the values of x and n.
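A behavioural sketch of the path selection, treating the three ranges as a priority-ordered check (the first path taking precedence where the ranges overlap, which is an assumption); ordinary Python floats stand in for the adder's floating-point datapath.

```python
# A behavioural sketch (not a hardware model) of the three-path selection described above.

def add_pow2(n, x):
    """Return 2**n + x, recording which of the three paths would apply."""
    ax = abs(x)
    if x < 0 and 2.0 ** (n - 1) <= ax < 2.0 ** (n + 1):
        path = "first"    # cancellation region: the result magnitude can shrink sharply
    elif ax < 2.0 ** n:
        path = "second"   # |x| smaller than 2**n
    else:
        path = "third"    # |x| at least 2**n
    return 2.0 ** n + x, path

# Example usage: result, path = add_pow2(4, -12.5)   # falls in the first path's range
```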
A colour processor for mapping an image from source to destination colour gamuts has an input for receiving a source image including a plurality of source colour points expressed according to the source gamut; a colour characterizer configured to, for each source colour point in the source image, determine a position of intersection of a curve with the boundary of the destination gamut; and a gamut mapper configured to, for each source colour point in the source image: if the source colour point lies inside the destination gamut, apply a first translation factor to translate the source colour point to a destination colour point within a first range of values; or if the source colour point lies outside the destination gamut, apply a second translation factor, different to the first translation factor, to translate the source colour point to a destination colour point within a second range of values.
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
G09G 5/06 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed using colour palettes, e.g. look-up tables
A multicore graphics processing unit (GPU) and a method of operating a GPU are provided. The GPU comprises at least a first core and a second core. At least one of the cores in the multicore GPU comprises a master unit configured to distribute geometry processing tasks between at least the first core and the second core.
Input/output filter units for use in a graphics processing unit include a first buffer configured to store data received from, and output to, a first component of the graphics processing unit; a second buffer configured to store data received from, and output to, a second component of the graphics processing unit; a weight buffer configured to store filter weights; a filter bank configurable to perform a plurality of types of filtering on a set of input data, the plurality of types of filtering comprising texture filtering types and pixel filtering types; and control logic configured to cause the filter bank to: (i) perform one of the plurality of types of filtering on a set of data stored in one of the first and second buffers using a set of weights stored in the weight buffer, and (ii) store the results of the filtering in one of the first and second buffers.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
G06F 13/12 - Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
Methods and apparatus for generating a data structure for storing primitive data for a number of primitives and vertex data for a plurality of vertices, wherein each primitive is defined with reference to one or more of the plurality of vertices. The vertex data comprises data for more than one view, such as a left view and a right view, with vertex parameter values for a first group of vertex parameters being stored separately for each view and vertex parameter values for a second, non-overlapping group of vertex parameters being stored only once and used when rendering either or both views.
Methods and hardware for cube mapping comprise receiving fragment coordinates for an input block of fragments and texture instructions for the fragments and then determining, based on gradients of the input block of fragments, whether a first mode of cube mapping or a second mode of cube mapping is to be used, wherein the first mode of cube mapping performs calculations at a first precision for a subset of the fragments and calculations for remaining fragments at a second, lower, precision and the second mode of cube mapping performs calculations for all fragments at the first precision. Cube mapping is then performed using the determined mode and the gradients, wherein if the second mode is used and more than half of the fragments in the input block are valid, the cube mapping is performed over two clock cycles.
A method of decompression to determine data values from compressed data comprising representations of one or more difference values for the data values being decompressed, each difference value representing a difference between the respective data value and an origin value, wherein the representations of the one or more difference values are included in the compressed data using a second number of bits. Based on the representations of the one or more difference values in the compressed data and a first number of bits for representing the one or more difference values for the one or more data values, for each of the one or more data values being decompressed, a difference value is determined in accordance with the first number of bits. Each of the one or more data values being decompressed is determined using: (i) the origin value, and (ii) the determined difference value for the data value.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radixComputing devices using combinations of denominational and non-denominational quantity representations using residue arithmetic
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
A binary logic circuit performs an interpolation calculation between two endpoint values E0 and E1 using a weighting index i for generating an interpolated result P, the values E0 and E1 being formed from Adaptive Scalable Texture Compression (ASTC) colour endpoint values C0 and C1 respectively, the colour endpoint values C0 and C1 being low-dynamic range (LDR) or high dynamic range (HDR) values. An interpolation unit performs an interpolation between the values C0 and C1 using the index i to generate a first intermediate interpolated result C2; combinational logic circuitry receives the result C2 and performs logical processing operations to calculate the interpolated result P according to the equations: (1) P=⌊((C2<<8)+C2+32)/64⌋ when the interpolated result is not to be compatible with an sRGB colour space and the colour endpoint values are LDR values; (2) P=⌊((C2<<8)+128·64+32)/64⌋ when the interpolated result is to be compatible with an sRGB colour space and the colour endpoint values are LDR values; and (3) P=(C2+2)>>2 when the colour endpoint values are HDR values.
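The three equations transcribe directly into software, which makes the behaviour of the combinational logic easy to check. The function name and argument flags below are illustrative, and the weighted interpolation that produces C2 is outside this sketch.

```python
# Direct transcription of equations (1)-(3) above; C2 is the intermediate interpolated result.

def astc_finalise(C2, srgb, hdr):
    """Apply the final scaling to the intermediate interpolated result C2."""
    if hdr:
        # (3) HDR colour endpoint values.
        return (C2 + 2) >> 2
    if srgb:
        # (2) LDR values, result compatible with the sRGB colour space (128*64 = 8192).
        return ((C2 << 8) + 128 * 64 + 32) // 64
    # (1) LDR values, result not required to be sRGB-compatible.
    return ((C2 << 8) + C2 + 32) // 64
```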
Texture filtering is applied to a texture represented with a mipmap comprising a plurality of levels, wherein each level of the mipmap comprises an image representing the texture at a respective level of detail. A texture filtering unit has minimum and maximum limits on an amount by which it can alter the level of detail when it filters texels from an image of a single level of the mipmap. The range of level of detail between the minimum and maximum limits defines an intrinsic region of the texture filtering unit. If it is determined that a received input level of detail is in an intrinsic region of the texture filtering unit, texels are read from a single mipmap level of the mipmap, and the read texels from the single mipmap level are filtered to determine a filtered texture value representing part of the texture at the input level of detail. If it is determined that the received input level of detail is in an extrinsic region of the texture filtering unit: texels are read from two mipmap levels of the mipmap, and the read texels from the two mipmap levels are processed to determine a filtered texture value representing part of the texture at the input level of detail.
Methods and tiling engines for tiling primitives in a tile based graphics processing system in which a rendering space is divided into a plurality of tiles. The method includes generating a multi-level hierarchy of tile groups, each level of the multi-level hierarchy comprising one or more tile groups comprising one or more of the plurality of tiles; receiving a plurality of primitive blocks, each primitive block comprising geometry data for one or more primitives; associating each of the plurality of primitive blocks with one or more of the tile groups up to a maximum number of tile groups such that if at least one primitive of a primitive block falls, at least partially, within the bounds of a tile, the primitive block is associated with at least one tile group that includes that tile; and generating a control stream for each tile group based on the associations, wherein each control stream comprises a primitive block entry for each primitive block associated with the corresponding tile group.
A rendering system combines point sampling and volume sampling operations to produce rendering outputs. For example, to determine color information for a surface location in a 3-D scene, one or more point sampling operations are conducted in a volume around the surface location, and one or more sampling operations of volumetric light transport data are performed farther from the surface location. A transition zone between point sampling and volume sampling can be provided, in which both point and volume sampling operations are conducted. Data obtained from point and volume sampling operations can be blended in determining color information for the surface location. For example, point samples are obtained by tracing a ray for each point sample to identify an intersection between another surface and the ray, which is to be shaded, and volume samples are obtained from nested 3-D grids of volume elements expressing light transport data at different levels of granularity.
Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are re-ordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.
Adder circuits and associated methods for processing a set of at least three floating-point numbers to be added together include identifying, from among the at least three numbers, at least two numbers that have the same sign—that is, at least two numbers that are both positive or both negative. The identified at least two numbers are added together using one or more same-sign floating-point adders. A same-sign floating-point adder comprises circuitry configured to add together floating-point numbers having the same sign and does not include circuitry configured to add together numbers having different signs.
G06F 7/24 - Sorting, i.e. extracting data from one or more carriers, re-arranging the data in numerical or other ordered sequence, and re-recording the sorted data on the original carrier or on a different carrier or set of carriers
G06F 7/501 - Half or full adders, i.e. basic adder cells for one denomination
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
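The grouping step in the floating-point addition method above can be sketched as follows for three inputs; the reduction order and the handling of any remaining opposite-signed input are assumptions for the example.

```python
# A small sketch of grouping same-signed inputs so they can be summed first.

def split_by_sign(values):
    """Partition the inputs into non-negative and negative groups."""
    non_negative = [v for v in values if v >= 0.0]
    negative = [v for v in values if v < 0.0]
    return non_negative, negative

def add_three(a, b, c):
    """Add three floats, summing the same-signed pair first (as a same-sign adder would)."""
    pos, neg = split_by_sign([a, b, c])
    same_sign_group = pos if len(pos) >= 2 else neg   # at least two inputs share a sign
    other = neg if same_sign_group is pos else pos
    partial = sum(same_sign_group)                    # same-sign addition: no cancellation
    return partial + (other[0] if other else 0.0)
```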
95.
Modifying Processing of Commands in a Command Queue Based on Accessed Data Related to a Command
Processing of commands at a graphics processor is controlled by receiving input data and generating a command for processing at the graphics processor from the input data, wherein the command will cause the graphics processor to write out at least one buffer of data to an external memory, and submitting the command to a queue for later processing at the graphics processor. Subsequent to submitting the command, but before the write to external memory has been completed, further input data is received and it is determined that the buffer of data does not need to be written to external memory. The graphics processor is then signalled to prevent at least a portion of the write to external memory from being performed for the command.
A method of filtering a target pixel in an image forms, for a kernel of pixels comprising the target pixel and its neighbouring pixels, a data model to model pixel values within the kernel; calculates a weight for each pixel of the kernel comprising: (i) a geometric term dependent on a difference in position between that pixel and the target pixel; and (ii) a data term dependent on a difference between a pixel value of that pixel and its predicted pixel value according to the data model; and uses the calculated weights to form a filtered pixel value for the target pixel, e.g. by updating the data model with a weighted regression analysis technique using the calculated weights for the pixels of the kernel; and evaluating the updated data model at the target pixel position so as to form the filtered pixel value for the target pixel.
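A rough numpy sketch of this filter, assuming a linear (plane) data model over the kernel, Gaussian geometric and data terms, and a single weighted least-squares update; these specific choices are illustrative and not mandated by the abstract.

```python
# A rough sketch of the model-based kernel filter described above, using numpy.
import numpy as np

def filter_pixel(kernel, sigma_g=1.5, sigma_d=10.0):
    """kernel: square 2-D array of pixel values centred on the target pixel."""
    r = kernel.shape[0] // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    A = np.column_stack([np.ones(kernel.size), xs.ravel(), ys.ravel()])
    z = kernel.ravel().astype(float)

    # Initial data model: unweighted plane fit z ~ b0 + b1*x + b2*y over the kernel.
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    predicted = A @ beta

    # Per-pixel weight = geometric term (distance to the target pixel) * data term
    # (difference between the pixel value and the model's prediction for it).
    geometric = np.exp(-(xs.ravel() ** 2 + ys.ravel() ** 2) / (2 * sigma_g ** 2))
    data_term = np.exp(-((z - predicted) ** 2) / (2 * sigma_d ** 2))
    w = geometric * data_term

    # Update the model with a weighted regression and evaluate it at the target
    # pixel position (x = 0, y = 0) to obtain the filtered pixel value.
    W = np.sqrt(w)[:, None]
    beta_w, *_ = np.linalg.lstsq(A * W, z * W.ravel(), rcond=None)
    return beta_w[0]
```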
A method of GPU virtualization comprises allocating, by the hypervisor, an identifier to each virtual machine (or operating system running on a VM); this identifier is then used to tag every transaction deriving from a GPU workload operating within a given VM context (i.e. every GPU transaction on the system bus which interconnects the CPU, GPU and other peripherals). Additionally, dedicated portions of a memory resource (which may be GPU registers or RAM) are provided for each VM and, whilst each VM can only see its allocated portion of the memory, a microprocessor within the GPU can see all of the memory. Access control is achieved using root memory management units which are configured by the hypervisor and which map guest physical addresses to actual memory addresses based on the identifier associated with the transaction.
Shader processing units for a graphics processing unit execute ray tracing shaders that generate ray data associated with rays. The ray data includes a plurality of ray data elements. Store logic receives, as part of a ray tracing shader, a ray store instruction that includes: (i) information identifying a store group of a plurality of store groups, each store group comprising one or more ray data elements of the plurality of ray data elements, and (ii) information identifying one or more ray data elements of the identified store group to be stored in an external unit. In response to receiving the ray store instruction, the store logic retrieves the identified ray data elements for one or more rays from storage. The store logic then sends one or more store requests to an external unit which cause the external unit to store the identified ray data elements for the one or more rays.
Neural network accelerators with one or more neural network accelerator cores. Each neural network accelerator core has hardware accelerators configured to accelerate neural network operations, an embedded processor, a command decoder, and a hardware feedback path between the embedded processor and the command decoder. The command decoder is configured to control the hardware accelerators and the embedded processor of that core in accordance with commands of a command stream, and when the command stream comprises a set of one or more branch commands that indicate a conditional branch is to be performed, cause the embedded processor to determine a next command stream, and in response to receiving information from the embedded processor identifying the next command stream via the hardware feedback path, control the one or more hardware accelerators and the embedded processor in accordance with commands of the next command stream.
Methods and intersection testing modules are provided for determining, in a ray tracing system, whether a ray intersects a 3D axis-aligned box representing a volume defined by a front-facing plane and a back-facing plane for each dimension. The front-facing plane of the box which intersects the ray furthest along the ray is identified. It is determined whether the ray intersects the identified front-facing plane at a position that is no further along the ray than positions at which the ray intersects the back-facing planes in a subset of the dimensions, and this determination is used to determine whether the ray intersects the axis-aligned box. The subset of dimensions comprises the two dimensions for which the front-facing plane was not identified, but does not comprise the dimension for which the front-facing plane was identified. It is determined whether the ray intersects the box without performing a test to determine whether the ray intersects the identified front-facing plane at a position that is no further along the ray than a position at which the ray intersects the back-facing plane in the dimension for which the front-facing plane was identified.
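A software (non-hardware) sketch of the test described above: per-axis front-facing and back-facing plane distances are computed as in a standard slab test, the front-facing plane intersected furthest along the ray is identified, and the comparison against the back-facing plane of that same dimension is skipped. Handling of rays parallel to an axis and of the ray's valid distance range is omitted, and all names are illustrative.

```python
# Illustrative sketch of the ray/axis-aligned-box test described above.

def ray_intersects_box(origin, direction, box_min, box_max):
    entry = []   # distances along the ray to the front-facing plane of each dimension
    exit_ = []   # distances along the ray to the back-facing plane of each dimension
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        t0, t1 = (lo - o) / d, (hi - o) / d
        entry.append(min(t0, t1))
        exit_.append(max(t0, t1))

    # Identify the front-facing plane intersected furthest along the ray.
    k = max(range(3), key=lambda i: entry[i])

    # The ray hits the box iff that intersection is no further along the ray than the
    # intersections with the back-facing planes of the other two dimensions; the test
    # against the back-facing plane of dimension k itself is not needed.
    others = [i for i in range(3) if i != k]
    return all(entry[k] <= exit_[i] for i in others)
```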