A flow of packets is communicated through a data center. The data center includes multiple racks, where each rack includes multiple network devices. A group of packets of the flow is received onto a first network device. The first device includes a neural network. The first network device generates a neural network feature vector (NNFV) based on the received packets. The first network device then sends the NNFV to a second network device. The second device uses the NNFV to determine a set of weight values. The weight values are then sent back to the first network device. The first device loads the weight values into the neural network. The neural network, as configured by the weight values, then analyzes each of a plurality of flows received onto the first device to determine whether the flow likely has a particular characteristic.
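By way of illustration only, the Python sketch below separates the three roles the abstract describes: the first device builds a feature vector from a group of packets, the second device maps that vector to weight values, and the configured network then classifies later traffic. The feature extraction, the weight-selection policy, and the classifier are invented stand-ins, not the patented method.

    # Sketch of the NNFV/weight exchange described above. The feature
    # extraction and the weight-selection policy are placeholders; a real
    # device would derive them from packet headers and a trained model.
    def build_nnfv(packets):
        """First device: summarize a group of packets into a feature vector."""
        sizes = [len(p) for p in packets]
        return [len(packets), sum(sizes) / len(sizes), max(sizes), min(sizes)]

    def select_weights(nnfv):
        """Second device: map a feature vector to weights (placeholder policy)."""
        scale = 1.0 / (1.0 + nnfv[1])          # normalize by mean packet size
        return [w * scale for w in (0.5, -0.25, 0.125, 1.0)]

    class NeuralNetwork:
        def __init__(self):
            self.weights = None
        def load(self, weights):
            self.weights = weights
        def classify(self, packet):
            """Score one packet; positive score => flow likely has the characteristic."""
            feats = [1.0, len(packet), len(packet) % 64, 1.0]
            return sum(w * f for w, f in zip(self.weights, feats)) > 0

    # First device sends the NNFV, receives weights, and loads them.
    packets = [b"x" * n for n in (64, 1500, 128)]
    nn = NeuralNetwork()
    nn.load(select_weights(build_nnfv(packets)))
    print([nn.classify(p) for p in packets])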
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated computer hardware and recorded computer software, namely, network interface cards, physical and virtual appliances, libraries, and recorded software and hardware components for forwarding, analyzing, and decrypting computer network traffic
3.
Network interface device that sets an ECN-CE bit in response to detecting congestion at an internal bus interface
A network device includes a Network Interface Device (NID) and multiple servers. Each server is coupled to the NID via a corresponding PCIe bus. The NID has a network port through which it receives packets. The packets are destined for one of the servers. The NID detects a PCIe congestion condition regarding the PCIe bus to the server. Rather than transferring the packet across the bus, the NID buffers the packet and places a pointer to the packet in an overflow queue. If the level of bus congestion is high, the NID sets the packet's ECN-CE bit. When PCIe bus congestion subsides, the packet passes to the server. The server responds by returning an ACK whose ECE bit is set. The originating TCP endpoint in turn reduces the rate at which it sends data to the destination server, thereby reducing congestion at the PCIe bus interface within the network device.
H04L 12/879 - Single buffer operations, e.g. buffer pointers or buffer descriptors
H04L 1/16 - Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
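The marking step in the preceding abstract reduces to setting the two ECN bits of the IP header when the overflow queue is deep. A minimal sketch, assuming an IPv4 packet held as a byte array and an invented queue-depth threshold; the ACK/ECE feedback path and the header checksum update are left out.

    # Sketch: mark ECN-CE on an IPv4 header when the PCIe-side overflow
    # queue is deeply backlogged. Thresholds are illustrative only.
    ECN_CE = 0b11          # both ECN bits set = Congestion Experienced
    HIGH_WATERMARK = 64    # overflow-queue depth that counts as "high" congestion

    overflow_queue = []    # holds the buffered packets awaiting transfer

    def enqueue_with_ecn(packet: bytearray, queue_depth: int) -> bytearray:
        """Buffer the packet; set ECN-CE in the IPv4 TOS byte if congested."""
        if queue_depth >= HIGH_WATERMARK:
            packet[1] |= ECN_CE           # byte 1 of an IPv4 header is DSCP/ECN
            # A real implementation must also fix the IPv4 header checksum here.
        overflow_queue.append(packet)
        return packet

    pkt = bytearray(20)                   # minimal (all-zero) IPv4 header
    enqueue_with_ecn(pkt, queue_depth=100)
    print(bin(pkt[1]))                    # low two bits now 0b11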
4.
Configuration mesh data bus and transactional memories in a multi-processor integrated circuit
A network flow processor integrated circuit includes a plurality of processors, a plurality of multi-threaded transactional memories (MTMs), and a configurable mesh posted transaction data bus. The configurable mesh posted transaction data bus includes a configurable command mesh and a configurable data mesh. Each of these configurable meshes includes crossbar switches and interconnecting links. A command bus transaction value issued by a processor can pass across the command mesh to an MTM. The command bus transaction value includes a reference value. The MTM uses the reference value to pull data across the configurable data mesh into the MTM. The MTM then uses the data to carry out the commanded transactional memory operation. Multiple such commands can pass across the posted transaction bus across different parts of the integrated circuit at the same time, and a single MTM can be carrying out multiple such operations at the same time.
A transactional memory (TM) of an island-based network flow processor (IB-NFP) integrated circuit receives a Stats Add-and-Update (AU) command across a command mesh of a Command/Push/Pull (CPP) data bus from a processor. A memory unit of the TM stores a plurality of first values in a corresponding set of memory locations. A hardware engine of the TM receives the AU, performs a pull across other meshes of the CPP bus thereby obtaining a set of addresses, uses the pulled addresses to read the first values out of the memory unit, adds the same second value to each of the first values thereby generating a corresponding set of updated first values, and causes the set of updated first values to be written back into the plurality of memory locations. Even though multiple count values are updated, there is only one bus transaction value sent across the CPP bus command mesh.
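The essence of the Stats Add-and-Update command above is that one bus transaction carries one addend plus a set of addresses, and the engine updates every addressed counter. A minimal sketch under invented names; the CPP bus format and the pull mechanics are not modeled.

    # Sketch of the Stats Add-and-Update idea: one command carries one
    # addend plus a list of addresses, and the engine updates every
    # addressed counter. Names are illustrative, not the CPP bus format.
    memory = {0x100: 7, 0x108: 42, 0x110: 0}    # counter locations

    def stats_add_and_update(addend, addresses):
        """Single bus command: add the same value to each addressed counter."""
        for addr in addresses:                   # addresses arrive via one pull
            memory[addr] += addend               # read-modify-write per location

    stats_add_and_update(5, [0x100, 0x108, 0x110])
    print(memory)    # {256: 12, 264: 47, 272: 5}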
09 - Scientific and electric apparatus and instruments
Goods & Services
Integrated computer hardware and/or software, namely, network interface cards, physical or virtual appliances, libraries, and/or software or hardware components for forwarding, analyzing, decrypting, and otherwise processing computer network traffic.
7.
Ordering system that employs chained ticket release bitmap having a protected portion
An ordering system includes a plurality of ticket order release bitmap blocks that together store a ticket order release bitmap, a bus, and a Global Reordering Block (GRO). Each Ticket Order Release Bitmap Block (TORBB) stores a different part of the ticket order release bitmap. A first TORBB of the plurality of TORBBs is protected. The GRO 1) receives a queue entry onto the ordering system from a thread, 2) receives a ticket release command from the thread, and in response 3) outputs return data for the ticket release command. The queue entry includes a first sequence number. The return data indicates whether a bit in the protected TORBB was set, and includes an error code if a bit is set within the protected TORBB. When a bit in the protected TORBB is set, the thread stops processing packets.
A networking device includes: 1) a first processor that includes a match table, and 2) a second processor that includes both a Flow Tracking Autolearning Match Table (FTAMT) as well as a synchronized match table. A set of multiple entries stored in the synchronized match table is synchronized with a corresponding set of multiple entries in the match table on the first processor. The FTAMT, for a first packet of a flow, generates a Flow Identifier (Flow ID) and stores the Flow ID as part of a new entry for the flow. A match of a packet to one of the synchronized entries in the synchronized match table causes an action identifier to be recorded in the new entry in the FTAMT. A subsequent packet of the flow results in a hit in the FTAMT and results in the previously recorded action being applied to the subsequent packet.
A method involves compiling a first amount of high-level programming language code (for example, P4) and a second amount of a low-level programming language code (for example, C) thereby obtaining a first amount of native code and a second amount of native code. The high-level programming language code at least in part defines how an SDN switch performs matching in a first condition. The low-level programming language code at least in part defines how the SDN switch performs matching in a second condition. The low-level code can be a type of plugin or patch for handling special packets. The amounts of native code are loaded into the SDN switch such that a first processor (for example, x86 of the host) executes the first amount of native code and such that a second processor (for example, ME of an NFP on the NIC) executes the second amount of native code.
The flow cache of a network flow processor (NFP) stores flow lookup information in cache lines. Some cache lines are stored in external bulk memory and others are cached in cache memory on the NFP. A cache line includes several lock/hash entry slots. Each slot can store a CAM entry hash value, associated exclusive lock status, and associated shared lock status. The head of the linked list of keys associated with the first slot is pointed to implicitly. For each of the other lock/hash entry slots, the cache line stores a head pointer that explicitly points to the head. Due to this architecture, multiple threads can simultaneously process packets of the same flow, obtain lookup information, and update statistics in a fast and memory-efficient manner. Flow entries can be added and deleted while the flow cache is handling packets without the recording of erroneous statistics and timestamp information.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
G06F 12/14 - Protection against unauthorised use of memory
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols; the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
11.
Multiprocessor system having posted transaction bus interface that generates posted transaction bus commands
A multiprocessor system includes several processors, a Shared Local Memory (SLMEM), and an interface circuit for interfacing the system to an external posted transaction bus. Each processor has the same address map. Each fetches instructions from SLMEM and reads and writes data in SLMEM. A processor can initiate a read transaction on the posted transaction bus by doing an AHB-S bus write to a particular address. The AHB-S write determines the type of transaction initiated and also specifies an address in a shared memory in the interface circuit. The interface circuit uses information from the AHB-S write to generate a command of the correct format. The interface circuit outputs the command onto the posted transaction bus, then receives read data back from the posted transaction bus, and then puts the read data into the shared memory at the address specified by the processor in the original AHB-S bus write.
A VIRTIO Relay Program allows packets to be transferred from a Network Interface Device (NID), across a PCIe bus to a host, and to a virtual machine executing on the host. Rather than an OvS switch subsystem of the host making packet switching decisions, switching rules are transferred to the NID and the NID makes packet switching decisions. Transfer of a packet from the NID to the host occurs across an SR-IOV compliant PCIe virtual function and into host memory. Transfer from that memory and into memory space of the virtual machine is a VIRTIO transfer. This relaying of the packet occurs in no more than two read/write transfers without the host making any packet steering decision based on any packet header. Packet counts/statistics for the switched flow are maintained by the OvS switch subsystem just as if it were the subsystem that had performed the packet switching.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
An integrated circuit includes a processor and an exact-match flow table structure. A first packet is received onto the integrated circuit. The packet is determined to be of a first type. As a result of this determination, execution by the processor of a first sequence of instructions is initiated. This execution causes bits of the first packet to be concatenated and modified in a first way, thereby generating a first Flow Id. The first Flow Id is an exact-match for the Flow Id of a first stored flow entry. A second packet is received. It is determined to be of a second type. As a result, a second sequence of instructions is executed. This causes bits of the second packet to be concatenated and modified in a second way, thereby generating a second Flow Id. The second Flow Id is an exact-match for the Flow Id of a second stored flow entry.
A flow of packets is communicated through a data center. The data center includes multiple racks, where each rack includes multiple network devices. A group of packets of the flow is received onto an integrated circuit located in a first network device. The integrated circuit includes a neural network. The neural network analyzes the group of packets and in response outputs a neural network output value. The neural network output value is used to determine how the packets of the flow are to be output from a second network device. In one example, each packet of the flow output by the first network device is output along with a tag. The tag is indicative of the neural network output value. The second device uses the tag to determine which output port located on the second device is to be used to output each of the packets.
H04B 10/00 - Transmission systems employing electromagnetic waves other than radio-waves, e.g. infrared, visible or ultraviolet light, or employing corpuscular radiation, e.g. quantum communication
A method of dynamically allocating buffers involves receiving a packet onto an ingress circuit. The ingress circuit includes a memory that stores a free buffer list, and an allocated buffer list. Packet data of the packet is stored into a buffer. The buffer is associated with a buffer identification (ID). The buffer ID is moved from the free buffer list to the allocated buffer list once the packet data is stored in the buffer. The buffer ID is used to read the packet data from the buffer and into an egress circuit and is stored in a de-allocation buffer list in the egress circuit. A send buffer IDs command is received from a processor onto the egress circuit and instructs the egress circuit to send the buffer ID to the ingress circuit such that the buffer ID is pushed onto the free buffer list.
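A minimal sketch of the buffer-ID lifecycle in that abstract: free list to allocated list on ingress, a de-allocation list on egress, and back to the free list when the "send buffer IDs" command arrives. The structure names and sizes are invented.

    # Sketch of the buffer-ID lifecycle. Names and sizes are illustrative.
    from collections import deque

    free_list = deque(range(8))      # buffer IDs 0..7 start out free
    allocated = []                    # ingress-side allocated list
    dealloc = []                      # egress-side de-allocation list
    buffers = {}                      # buffer ID -> packet data

    def ingress_receive(packet):
        buf_id = free_list.popleft()  # take a free buffer
        buffers[buf_id] = packet
        allocated.append(buf_id)      # free list -> allocated list
        return buf_id

    def egress_read(buf_id):
        allocated.remove(buf_id)
        dealloc.append(buf_id)        # parked until a processor says release
        return buffers.pop(buf_id)

    def send_buffer_ids_command():
        while dealloc:
            free_list.append(dealloc.pop(0))   # push IDs back to the free list

    bid = ingress_receive(b"payload")
    egress_read(bid)
    send_buffer_ids_command()
    print(list(free_list))    # the ID has returned to the free pool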
A system and method for handling a digital electronic flow between a first and a second entity, in which a flow policy is determined that is to be applied to the flow, and the flow is then directed along a path in accordance with the policy. An ID is supplied for each flow, and a tag is associated with each flow that indicates the policy to be applied to its associated flow. Flows are also associated with one another, with associated flows having associated policies. In particular, the flow may be processed or forwarded. The path may include a graph structure and virtual applications.
A TCP connection is established between a client and a server, such that packets communicated across the TCP connection pass through a proxy. Based at least in part on a result of monitoring packets flowing across the TCP connection, the proxy determines whether to split the TCP control loop into two TCP control loops so that packets can be inspected more thoroughly. If the TCP control loop is split, then a first TCP control loop manages flow between the client and the proxy and a second TCP control loop manages flow between the proxy and the server. Due to the two control loops, packets can be held on the proxy long enough to be analyzed. In some circumstances, a decision is then made to stop inspecting. The two TCP control loops are merged into a single TCP control loop, and thereafter the proxy passes packets of the TCP connection through unmodified.
An ordering system receives release requests to release packets, where each packet has an associated sequence number, but the system only releases packets sequentially in accordance with the sequence numbers. The system includes a Ticket Order Release Command Dispatcher And Sequence Number Translator (TORCDSNT) and a plurality of Ticket Order Release Bitmap Blocks (TORBBs). The TORBBs are stored in one or more transactional memories. In response to receiving release requests, the TORCDSNT issues atomic ticket release commands to the transactional memory or memories, and uses the multiple TORBBs in a chained manner to implement a larger overall ticket release bitmap than could otherwise be supported by any one of the TORBBs individually. Special use of one flag bit position in each TORBB facilitates this chaining. In one example, the system is implemented in a network flow processor so that the TORBBs are maintained in transactional memories spread across the chip.
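The core in-order-release rule above can be sketched in a few lines: an out-of-order release request sets its sequence number's bit in a bitmap, and packets are released only as a run of consecutive set bits starting at the next expected sequence number. The chaining of multiple bitmap blocks and the flag-bit trick are elided.

    # Sketch of ticket-order release. A single flat bitmap stands in for
    # the chained TORBBs; the atomic command semantics are simplified.
    bitmap = 0
    next_expected = 0

    def ticket_release(seq):
        """Atomic ticket release: returns the burst of now-releasable seqs."""
        global bitmap, next_expected
        bitmap |= 1 << seq
        released = []
        while bitmap & (1 << next_expected):    # pop the consecutive run
            bitmap &= ~(1 << next_expected)
            released.append(next_expected)
            next_expected += 1
        return released

    print(ticket_release(1))    # [] - seq 0 not released yet, so hold
    print(ticket_release(2))    # [] - still waiting on 0
    print(ticket_release(0))    # [0, 1, 2] - the whole run releases at once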
An array of columns and rows of host server devices is mounted in a row of racks. Each device has a host processor and an exact-match packet switching integrated circuit. Packets are switched within the system using exact-match flow tables that are provisioned by a central controller. Each device is coupled by a first cable to a device to its left, by a second cable to a device to its right, by a third cable to a device above, and by a fourth cable to a device below. In one example, substantially all cables that are one meter or less in length are non-optical cables, whereas substantially all cables that are seven meters or more in length are optical cables. Advantageously, each device of a majority of the devices has four and only four cable ports, and connects only to non-optical cables, and the connections involve no optical transceiver.
A Software-Defined Networking (SDN) switch includes external network ports for receiving external network traffic onto the SDN switch, external network ports for transmitting external network traffic out of the SDN switch, a first Network Flow Switch (NFX) integrated circuit that has multiple network ports and that maintains a first flow table, a second NFX integrated circuit that has multiple network ports and that maintains a second flow table, a Network Flow Processor (NFP) circuit that maintains a third flow table, and a controller processor circuit that maintains a fourth flow table. The controller processor circuit is coupled by a serial bus to the NFP circuit but is not directly coupled by any network port to the NFP circuit, the first NFX integrated circuit, or the second NFX integrated circuit.
A network device receives TCP segments of a flow via a first SSL session and transmits TCP segments via a second SSL session. Once a TCP segment has been transmitted, the TCP payload need no longer be stored on the network device. Substantial memory resources are conserved, because the device may have to handle many retransmit TCP segments at a given time. If the device receives a retransmit segment, then the device regenerates the retransmit segment to be transmitted. A data structure of entries is stored, with each entry including a decrypt state and an encrypt state for an associated SSL byte position. The device uses the decrypt state to initialize a decrypt engine, decrypts an SSL payload of the retransmit TCP segment received, uses the encrypt state to initialize an encrypt engine, re-encrypts the SSL payload, and then incorporates the re-encrypted SSL payload into the regenerated retransmit TCP segment.
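A toy sketch of the regeneration idea in that abstract: if the cipher engines can be re-initialized from a saved state per SSL byte position, a discarded segment can be re-created on demand. A position-seeded XOR keystream stands in for the real ciphers; actual SSL record handling is far more involved, and all names here are invented.

    # Toy sketch: saved (decrypt, encrypt) states per SSL byte position
    # let the device re-create bytes it chose not to store.
    def keystream(seed, start, n):
        return bytes((seed + start + i) % 256 for i in range(n))

    DEC_SEED, ENC_SEED = 17, 99           # session keys, stood in by seeds
    state_table = {}                      # SSL byte position -> (dec, enc) state

    def forward(segment, ssl_pos):
        """Normal path: decrypt in, re-encrypt out, remember the states."""
        state_table[ssl_pos] = (ssl_pos, ssl_pos)          # cipher states
        plain = bytes(a ^ b for a, b in zip(segment, keystream(DEC_SEED, ssl_pos, len(segment))))
        return bytes(a ^ b for a, b in zip(plain, keystream(ENC_SEED, ssl_pos, len(segment))))

    def regenerate_retransmit(segment, ssl_pos):
        """Retransmit path: init engines from the saved states and redo it."""
        dec_state, enc_state = state_table[ssl_pos]
        plain = bytes(a ^ b for a, b in zip(segment, keystream(DEC_SEED, dec_state, len(segment))))
        return bytes(a ^ b for a, b in zip(plain, keystream(ENC_SEED, enc_state, len(segment))))

    out1 = forward(b"\x10\x20\x30", ssl_pos=1000)
    out2 = regenerate_retransmit(b"\x10\x20\x30", ssl_pos=1000)
    print(out1 == out2)    # True: the regenerated segment matches the original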
A method involves a Software-Defined Networking (SDN) switch that includes multiple Network Flow Switch (NFX) integrated circuits, a Network Flow Processor (NFP) circuit, and a controller processor. The controller processor is coupled to the NFP circuit by a serial bus. A flow table is maintained on each of the NFX integrated circuits. An SDN flow table is maintained on the NFP circuit. A copy of each of the flow tables is maintained on the NFP circuit. Another SDN flow table is maintained on the controller processor. An SDN protocol stack is executed on the controller processor. An SDN protocol message is received onto the SDN switch via one of the NFX integrated circuits. The SDN protocol message is communicated across a network link to the NFP circuit, and across the serial bus from the NFP circuit to the controller processor, such that the SDN protocol message is received and processed by the SDN protocol stack executing on the controller processor.
Packet information is stored in split fashion such that a first part is stored in a first device and a second part is stored in a second device. A split packet transmission DMA engine receives an egress packet descriptor. The descriptor does not indicate where the second part is stored but contains information about the first part. Using this information, the DMA engine causes a part of the first part to be transferred from the first device to the DMA engine. Address information in the first part indicates where the second part is stored. The DMA engine uses the address information to cause the second part to be transferred from the second device to the DMA engine. When both the part of the first part and the second part are stored in the DMA engine, then the entire packet is transferred in ordered fashion to an egress device.
G06F 12/1081 - Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
G06F 13/30 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal with priority control
An island-based integrated circuit includes a configurable mesh data bus. The data bus includes four meshes. Each mesh includes, for each island, a crossbar switch and radiating half links. The half links of adjacent islands align to form links between crossbar switches. A link is implemented as two distributed credit FIFOs. In one direction, a link portion involves a first FIFO associated with an output port of a first island, a first chain of registers, and a second FIFO associated with an input port of a second island. When a transaction value passes through the first FIFO and through the crossbar switch of the second island, an arbiter in the crossbar switch returns a taken signal. The taken signal passes back through a second chain of registers to a credit count circuit in the first island. The credit count circuit maintains a credit count value for the distributed credit FIFO.
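The credit mechanism just described can be sketched in a few lines: the sender may push only while it holds credits, and each taken signal returning from the far crossbar replenishes one credit. The depth and the register-chain delay are invented; the in-flight list stands in for the distributed FIFO.

    # Sketch of credit-based flow control on one link direction.
    CREDITS = 4                       # depth of the distributed credit FIFO
    credit_count = CREDITS
    in_flight = []                    # transaction values traversing the link

    def try_send(value):
        global credit_count
        if credit_count == 0:
            return False              # link full: hold the transaction
        credit_count -= 1
        in_flight.append(value)
        return True

    def taken_signal_returns():
        """Far-side arbiter took one value; a credit flows back."""
        global credit_count
        in_flight.pop(0)
        credit_count += 1

    for v in range(6):
        print(v, try_send(v))         # values 4 and 5 are refused
    taken_signal_returns()
    print("retry", try_send(4))       # succeeds after one credit returns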
A registered synchronous FIFO has a tail register, internal registers, and a head register. The FIFO cannot be pushed if it is full and cannot be popped if it is empty, but otherwise can be pushed and/or popped. Within the FIFO, the internal signal fanout of the incoming data circuitry and the push control circuitry is minimized and remains essentially constant regardless of the number of registers of the FIFO. The output delay of the output data also is essentially constant regardless of the number of registers of the FIFO. An incoming data value can only be written into the head or tail. If a data value is in the tail and one of the internal registers is empty, and if no push or pop is to be performed in a clock cycle, then the data value in the tail is nevertheless moved into the empty internal register in that cycle.
G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
G06F 5/00 - Methods or arrangements for data conversion without changing the order or content of the data handled
G06F 5/14 - Means for monitoring the fill level; Means for resolving contention, i.e. conflicts between simultaneous enqueue and dequeue operations for overflow or underflow handling, e.g. full or empty flags
G06F 9/34 - Addressing or accessing the instruction operand or the result
H04B 1/38 - Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
28.
Loading a flow table with neural network determined information
A flow of packets is communicated through a data center. The data center includes multiple racks, where each rack includes multiple network devices. A group of packets of the flow is received onto an integrated circuit located in one of the network devices. The integrated circuit includes a neural network and a flow table. The neural network analyzes the group of packets, determines whether it is likely that the flow has a particular characteristic, and outputs a neural network output value indicating that likelihood. The neural network output value, or a value derived from it, is included in a flow entry in the flow table on the integrated circuit. Packets of the flow subsequently received onto the integrated circuit are routed or otherwise processed according to the flow entry associated with the flow.
H04B 10/00 - Transmission systems employing electromagnetic waves other than radio-waves, e.g. infrared, visible or ultraviolet light, or employing corpuscular radiation, e.g. quantum communication
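A minimal sketch of the table-loading step in the preceding abstract: the network's verdict is folded into a flow entry, and later packets of the flow are handled from the table without re-running the network. The field names, scores, and actions are invented.

    # Sketch: fold the NN output value into a flow entry, then handle
    # subsequent packets from the flow table alone.
    flow_table = {}

    def nn_analyze(group_of_packets):
        """Placeholder for the on-chip neural network's output value."""
        return 0.93 if len(group_of_packets) > 2 else 0.10

    def learn_flow(flow_id, group_of_packets):
        score = nn_analyze(group_of_packets)
        action = "mirror-to-analyzer" if score > 0.5 else "forward"
        flow_table[flow_id] = {"nn_output": score, "action": action}

    def handle_packet(flow_id, packet):
        entry = flow_table.get(flow_id)
        return entry["action"] if entry else "send-group-to-nn"

    learn_flow("10.0.0.1->10.0.0.2:443", [b"a", b"b", b"c"])
    print(handle_packet("10.0.0.1->10.0.0.2:443", b"d"))   # mirror-to-analyzer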
An individual score is generated for a first combination of a transmitter configuration value and a receiver configuration value. The transmitter configuration is used to configure a first physical layer circuit and the receiver configuration is used to configure a second physical layer circuit. The individual score is based on measured characteristics observed by the second physical layer circuit in response to the first configuration combination. A neighbor weighted score is then generated for the first configuration combination. The neighbor weighted score is based on measured characteristics observed by the second physical layer circuit in response to a second configuration combination that is within a first distance from the first configuration combination within a multidimensional array of configuration combinations. The individual score is summed with the neighbor weighted score to generate a final neighbor weighted score for the first configuration combination.
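One plausible reading of the scoring arithmetic above, sketched over a 2-D grid of (transmitter, receiver) configuration combinations: the final score is the combination's own measured score plus a weighted sum of scores of combinations within a given grid distance. The scores, the distance metric, and the weight are invented for illustration.

    # Sketch of neighbor-weighted scoring over a configuration grid.
    individual = {                      # (tx_cfg, rx_cfg) -> measured score
        (0, 0): 5, (0, 1): 7, (1, 0): 6, (1, 1): 9, (2, 1): 4,
    }

    def neighbor_weighted(combo, distance=1, weight=0.5):
        """Sum of weighted scores of combos within `distance` in the grid."""
        tx, rx = combo
        total = 0.0
        for (t, r), s in individual.items():
            d = max(abs(t - tx), abs(r - rx))       # Chebyshev grid distance
            if 0 < d <= distance:
                total += weight * s
        return total

    def final_score(combo):
        return individual[combo] + neighbor_weighted(combo)

    print(final_score((1, 1)))   # own score plus half of each neighbor's score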
An exact-match flow table structure of an integrated circuit stores flow entries. Each flow entry includes a Flow Id and an action value. Each Flow Id is a multi-bit digital value that uniquely identifies a flow. A Flow Id does not include any wildcard indicator. The flow table structure cannot and does not store an indicator that any particular part of a packet should be matched against any part of a Flow Id. In one example, a packet is received onto the integrated circuit. A Flow Id is generated from the packet. If the flow table structure determines that the Flow Id is a bit-by-bit exact-match of any Flow Id of any stored flow entry, then the packet is handled according to the action value of the flow entry. If, on the other hand, there is no exact-match, then a miss indication is output from the integrated circuit.
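The exact-match semantics reduce to a sketch like the following: a Flow Id either matches a stored entry bit-for-bit or the structure reports a miss; there are no wildcard or masked matches. The Flow Id derivation is simplified to a fixed byte string.

    # Sketch of exact-match lookup with an explicit miss indication.
    flow_table = {
        b"\x0a\x00\x00\x01\x0a\x00\x00\x02\x01\xbb": "forward-port-3",
    }

    def lookup(flow_id: bytes):
        action = flow_table.get(flow_id)        # bit-by-bit exact match only
        if action is None:
            return ("MISS", None)               # miss indication output
        return ("HIT", action)

    print(lookup(b"\x0a\x00\x00\x01\x0a\x00\x00\x02\x01\xbb"))
    print(lookup(b"\x0a\x00\x00\x01\x0a\x00\x00\x02\x01\xbc"))  # one bit off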
A hardware trie structure includes a tree of internal node circuits and leaf node circuits. Each internal node is configured by a corresponding multi-bit node control value (NCV). Each leaf node can output a corresponding result value (RV). An input value (IV) supplied onto input leads of the trie causes signals to propagate through the trie such that one of the leaf nodes outputs one of the RVs onto output leads of the trie. In a transactional memory, a memory stores a set of NCVs and RVs. In response to a lookup command, the NCVs and RVs are read out of memory and are used to configure the trie. The IV of the lookup is supplied to the input leads, and the trie looks up an RV. A non-final RV initiates another lookup in a recursive fashion, whereas a final RV is returned as the result of the lookup command.
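A small functional sketch of that lookup: each level's node control value names which input bit steers the walk, a leaf supplies the result value, and a non-final result value names another NCV/RV table and recurses. The encodings and table layout are invented, not the hardware format.

    # Sketch of the configurable trie lookup with recursive non-final RVs.
    def trie_lookup(iv, ncvs, rvs):
        """Each level's NCV selects which input-value bit steers left/right."""
        index = 0
        for ncv in ncvs:
            index = (index << 1) | ((iv >> ncv) & 1)
        return rvs[index]                     # 2**len(ncvs) leaf result values

    def lookup_command(iv, tables, table_id=0):
        """Non-final RVs name another NCV/RV table and trigger another lookup."""
        ncvs, rvs = tables[table_id]
        rv = trie_lookup(iv, ncvs, rvs)
        if isinstance(rv, tuple) and rv[0] == "goto":   # non-final RV
            return lookup_command(iv, tables, rv[1])
        return rv                                       # final RV

    tables = {
        0: ([3], [("goto", 1), "RV-high"]),   # root trie: test input bit 3
        1: ([1, 0], ["a", "b", "c", "d"]),    # second-level trie
    }
    print(lookup_command(0b0110, tables))     # bit3=0 -> recurse -> "c"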
An appliance receives packets that are part of a flow pair, each packet sharing an application protocol. The appliance determines the application protocol of the packets by performing deep packet inspection (DPI) on the packets. Packet sizes are measured and converted into packet size states. Packet size states, packet sequence numbers, and packet flow directions are used to create an application protocol estimation table (APET). The APET is used during normal operation to estimate the application protocol of a flow pair without performing time-consuming DPI. The appliance then determines inter-packet intervals between received packets. The inter-packet intervals are converted into inter-packet interval states. The inter-packet interval states and packet sequence numbers are used to create an inter-packet interval prediction table. The appliance then stores an inter-packet interval prediction table for each application protocol. The inter-packet interval prediction table is used during operation to predict the inter-packet interval between packets.
H04L 12/24 - Arrangements for maintenance or administration
H04L 12/851 - Traffic type related actions, e.g. QoS or priority
H04B 10/079 - Arrangements for monitoring or testing transmission systems; Arrangements for fault measurement of transmission systems using an in-service signal using measurements of the data signal
33.
256-bit parallel parser and checksum circuit with 1-hot state information bus
A parser and checksum circuit includes a 256-bit data bus, IPV4, IPV6, TCP, and UDP state signal buses, a checksum summer and compare circuit, four 64-bit parsing circuits, a V6 extension processor, and a parse state context circuit. Each of the 64-bit parsing circuits includes two 32-bit parsing circuits. The data bus receives a data signal that is part of a packet. The IPV4, IPV6, TCP, and UDP state signals are each configurable into 1-hot states in which at most one bit is digital logic high. Each of the 1-hot states corresponds to a segment of a packet header of one of the IPV4, IPV6, TCP, and UDP protocols. Each 32-bit parsing circuit receives a 1-bit shifted version of the state signals received by the adjacent 32-bit parsing circuit and receives a portion of the data signal. The state signals and the data signal portion are received in parallel during a single clock cycle.
A method involves compiling a first amount of high-level programming language code (for example, P4) and a second amount of a low-level programming language code (for example, C) thereby obtaining a first section of native code and a second section of native code. The high-level programming language code at least in part defines how an SDN switch performs matching in a first condition. The low-level programming language code at least in part defines how the SDN switch performs matching in a second condition. The low-level code can be a type of plugin or patch for handling special packets. The sections of native code are loaded into the SDN switch such that a first processor (for example, x86 of the host) executes the first section of native code and such that a second processor (for example, ME of an NFP on the NIC) executes the second section of native code.
A multi-processor includes a pool of processors and a common packet buffer memory. Bytes of packet data of a packet are stored in the packet buffer memory. Each of the processors has an intelligent packet data register file. One processor is tasked with processing the packet data, and its packet data register file caches a subset of the bytes of packet data. Some instructions when executed require that the packet data register file supply the execute stage of the processor with certain bytes of the packet data. If during instruction execution the intelligent packet data register file determines that it does not store some of the necessary bytes, then the register file asserts a stall signal thereby stalling the processor, and retrieves the bytes from the packet buffer memory, and then supplies the retrieved bytes to the execute stage, and de-asserts the stall signal to unstall the processor.
A novel hash range lookup command is disclosed. In an exemplary embodiment, a method includes (a) providing access to a hash table that includes hash buckets having hash entry fields; (b) receiving a novel hash lookup command; (c) using the hash lookup command to determine hash command parameters, a hashed index value, and a flow key value; (d) using the hash command parameters and the hashed index value to generate hash values (addresses) to access entry fields in a selectable number of hash buckets; (e) comparing bits of the entry value in the entry field to bits of the flow key value; (f) repeating (d) through (e) until a match is determined or until the selectable number of hash buckets and entries have been accessed; and (g) returning either an address of the entry field containing the match or a result associated with the entry field containing the match.
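Steps (d) through (g) of that command amount to a bounded probe loop, sketched below: probe a selectable number of buckets starting from the hashed index and compare stored keys against the flow key. The table layout and parameters are invented stand-ins.

    # Sketch of the hash range lookup loop, steps (d)-(g) above.
    NUM_BUCKETS = 8
    ENTRIES_PER_BUCKET = 4
    table = [[None] * ENTRIES_PER_BUCKET for _ in range(NUM_BUCKETS)]
    table[3][2] = ("flowkey-B", "result-B")     # pre-populated entry

    def hash_range_lookup(hashed_index, flow_key, bucket_range=2):
        """Probe `bucket_range` buckets; return (bucket, slot, result) on a match."""
        for i in range(bucket_range):                   # step (f): repeat
            bucket = (hashed_index + i) % NUM_BUCKETS   # step (d): next bucket
            for slot, entry in enumerate(table[bucket]):
                if entry and entry[0] == flow_key:      # step (e): compare
                    return (bucket, slot, entry[1])     # step (g): match found
        return None                                     # no match in range

    print(hash_range_lookup(3, "flowkey-B"))    # (3, 2, 'result-B')
    print(hash_range_lookup(3, "flowkey-X"))    # None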
An egress packet modifier includes a script parser and a pipeline of processing stages. Rather than performing egress modifications using a processor that fetches, decodes, and executes instructions in classic processor fashion, and rather than storing a packet in memory, reading it out, modifying it, and writing it back, the packet modifier pipeline processes the packet by passing parts of the packet through the pipeline. A processor identifies particular egress modifications to be performed by placing a script code at the beginning of the packet. The script parser then uses the code to identify a specific script of opcodes, where each opcode defines a modification. As a part passes through a stage, the stage can carry out the modification of such an opcode. As realized using a current semiconductor fabrication process, the packet modifier can modify 200M packets/second at a sustained rate of up to 100 gigabits/second.
A Network Interface Device (NID) of a web hosting server implements multiple virtual NIDs. For each virtual NID there is a block in a memory of a transactional memory on the NID. This block stores configuration information that configures the corresponding virtual NID. The NID also has a single managing processor that monitors configuration of the plurality of virtual NIDs. If there is a write into the memory space where the configuration information for the virtual NIDs is stored, then the transactional memory detects this write and in response sends an alert to the managing processor. The size and location of the memory space in the memory for which write alerts are to be generated is programmable. The content and destination of the alert is also programmable.
An exact-match flow table structure stores flow entries. Each flow entry includes a Flow Id. A flow entry is generated from an incoming packet. The flow table structure determines whether there is a stored flow entry, the Flow Id of which is an exact-match for the generated Flow Id. In one novel aspect, a programmable reduce table circuit is used to generate a Flow Id. A selected subset of bits of an incoming packet is supplied as an address to an SRAM, so that the SRAM outputs a data value. The data value is supplied to a programmable lookup circuit such that the lookup circuit performs a selected type of lookup operation, and outputs a result value of a reduced number of bits. A multiplexer circuit is used to form a Flow Id such that the result value is a part of the Flow Id.
A transactional memory (TM) receives an Atomic Look-up, Add and Lock (ALAL) command across a bus from a client. The command includes a first value. The TM pulls a second value. The TM uses the first value to read a set of memory locations, and determines if any of the locations contains the second value. If no location contains the second value, then the TM locks a vacant location, adds the second value to the vacant location, and sends a result to the client. If a location contains the second value and it is not locked, then the TM locks the location and returns a result to the client. If a location contains the second value and it is locked, then the TM returns a result to the client. Each location has an associated data structure. Setting the lock field of a location locks access to its associated data structure.
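The three ALAL outcomes in that abstract map onto a small sketch: value absent (add and lock a vacant location), value present and unlocked (lock it), value present and locked (report it). The memory-location set addressed by the first value is modeled as one list; names are invented.

    # Sketch of the three Atomic Look-up, Add and Lock outcomes.
    locations = [{"value": None, "locked": False} for _ in range(4)]

    def atomic_lookup_add_lock(second_value):
        for loc in locations:
            if loc["value"] == second_value:
                if loc["locked"]:
                    return "found-but-locked"     # caller must retry later
                loc["locked"] = True              # locks the associated data structure
                return "found-and-locked"
        for loc in locations:                     # not found: claim a vacancy
            if loc["value"] is None:
                loc["value"] = second_value
                loc["locked"] = True
                return "added-and-locked"
        return "set-full"

    print(atomic_lookup_add_lock("flow-77"))   # added-and-locked
    print(atomic_lookup_add_lock("flow-77"))   # found-but-locked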
An integrated circuit includes an exact-match flow table structure, a crossbar switch, and an egress packet modifier. Each flow entry includes an egress action value, an egress flow number, and an egress port number. A Flow Id is generated from an incoming packet. The Flow Id is used to obtain a matching flow entry. A portion of the packet is communicated across the crossbar switch to the egress packet modifier, along with the egress action value and flow number. The egress action value is used to obtain non-flow specific header information stored in a first egress memory. The egress flow number is used to obtain flow specific header information stored in a second egress memory. The egress packet modifier adds the header information onto the portion of the packet, thereby generating a complete packet. The complete packet is then output from an egress port indicated by the egress port number.
A networking device includes a match table maintained on a first processor. The match table includes an entry that in turn includes an entry packet count. Packets of multiple flows result in matches to the entry. A set of bypass packet counts is maintained on a second processor of the networking device. There is one bypass packet count for each of the multiple paths through the first processor. A request for a “system entry packet count” of an entry located in a match table on the first processor is received onto the networking device. All paths of all flows that could have resulted in matches of that entry are determined. The “system entry packet count” is then determined by summing the entry packet count and the bypass packet counts for all those paths. A response is output from the networking device, where the response includes the “system entry packet count”.
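The arithmetic in that abstract reduces to one sum, sketched below: the count held at the first processor's match-table entry plus the bypass counts of every path that could have matched the entry but skipped the first processor. The counts and path names are invented.

    # Sketch of the "system entry packet count" arithmetic.
    entry_packet_count = 12_000            # counted at the match-table entry
    bypass_packet_counts = {               # per-path counters on processor 2
        "path-A": 300,
        "path-B": 45,
        "path-C": 7,
    }

    def system_entry_packet_count(paths_that_could_match):
        bypassed = sum(bypass_packet_counts[p] for p in paths_that_could_match)
        return entry_packet_count + bypassed

    # Suppose analysis shows paths A and C could have matched this entry.
    print(system_entry_packet_count(["path-A", "path-C"]))   # 12307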
An apparatus and method for providing minipacket flow control. A device includes a traffic manager interface (TMI) circuit that receives a minipacket, communicates the minipacket to a physical layer circuit, and communicates a minipacket flow control signal to a scheduler circuit that controls if another minipacket is to be sent to the TMI circuit. In one example, upon receiving a minipacket the TMI circuit updates a current credit value, compares the current credit value with a credit limit value, and outputs a minipacket flow control signal that is a function of the comparison. In another example, upon receiving a minipacket the TMI circuit updates a current credit value, compares the current credit value with a credit limit value, reads and compares a credit pool value with a credit pool limit value, and outputs a minipacket flow control signal that is a function of both of the comparisons.
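A minimal sketch of the second example above, assuming invented limits: the TMI updates a per-port credit value and a shared credit pool on each minipacket, and the flow-control signal is a function of both comparisons.

    # Sketch of minipacket credit flow control with a shared pool.
    credit_limit = 8
    pool_limit = 32
    current_credit = 0
    credit_pool = 0

    def on_minipacket():
        """TMI receives a minipacket; returns the flow-control signal."""
        global current_credit, credit_pool
        current_credit += 1                       # update current credit value
        credit_pool += 1
        port_full = current_credit >= credit_limit
        pool_full = credit_pool >= pool_limit
        return "XOFF" if (port_full or pool_full) else "XON"

    for _ in range(8):
        signal = on_minipacket()
    print(signal)    # "XOFF": the scheduler must hold the next minipacket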
A networking device includes a Network Interface Device (NID) and a host. Packets are received onto the networking device via the NID. Some of the packets pass along paths from the NID to the host, whereas others do not pass to the host and are processed by the NID. A bypass packet count for each path that passes from the NID to the host is maintained on the NID. It is determined, using a match table, that one of the packets received on the NID is to be sent to the host. The packet, however, is instead sent along a bypass path without going through the host, even though according to the host's match tables it should have passed through the host. The path that the packet would have traversed had the packet not been sent along the bypass path is determined, and the bypass packet count associated with the determined path is incremented.
A flow of packets is communicated through a data center including an electrical switch, an optical switch, and multiple racks each including multiple network devices. The optical switch can be controlled to receive packet traffic from a network device via a first optical link and to output that packet traffic to another network device via a second optical link. One network device includes a neural network that analyzes received packets of the flow. The optical switch is controlled to switch based on a result of the analysis performed. In one instance, the optical switch is controlled such that immediately prior to the switching no packet traffic passes from the first optical link and through the optical switch and to the second optical link but such that after the switching packet traffic does pass from the first optical link and through the optical switch and to the second optical link.
An exact-match flow table structure stores flow entries. Each flow entry includes a Flow Id and an action value. A flow entry is generated from an incoming packet. The flow table structure determines whether there is a stored flow entry, the Flow Id of which is an exact-match for the generated Flow Id. In one novel aspect, a multiplexer circuit is used to generate Flow Ids. The multiplexer circuit includes a plurality of byte-wide multiplexers. Each respective one of the byte-wide multiplexers outputs a byte that is a corresponding respective byte of the Flow Id. The various inputs of the byte-wide multiplexers are coupled to receive various bytes of the incoming packet, various bytes of modified or compressed packet data, as well as bytes of metadata. By controlling the select values supplied onto the select inputs of the multiplexer circuit, Flow Ids of different forms can be generated.
A method of performing an identical packet multicast packet ready command (common packet multicast mode operation) is described herein. A packet ready command is received from a first memory system via a bus and onto a network interface circuit. The packet ready command includes a multicast value. A communication mode is determined as a function of the multicast value. The multicast value indicates a single packet is to be communicated by the network interface circuit to a first number of destinations. A free packet command is output from the network interface circuit onto the bus. The free packet command includes a Free On Last Transfer (FOLT) value that indicates that the packet will not be freed from the first memory system by the network interface circuit once the packet is transmitted. The network interface circuit and the memory system are included on an Island-Based Network Flow Processor.
A method of Software-Defined Networking (SDN) switching. A packet of a flow is received onto an SDN switch via an NFX circuit. The NFX circuit determines that the packet matches a flow entry stored in a flow table in the NFX circuit, counts the number of packets of the flow received, and determines that the number of packets of the flow received is above a threshold value. The NFX circuit then forwards the packet to an NFP circuit in the SDN switch. The NFP circuit determines that the packet matches a flow entry stored in the flow table in the NFX circuit and generates a new flow entry that applies to a relatively narrow subflow of packets, which is forwarded to and stored in the flow table in the NFX circuit. A subsequent packet of the flow is switched by the SDN switch without forwarding the packet to the NFP circuit.
A method of performing a unicast packet ready command (unicast mode operation) is described herein. A packet ready command is received from a memory system via a bus and onto a network interface circuit. The packet ready command includes a multicast value. A communication mode is determined as a function of the multicast value. The multicast value indicates a single packet is to be communicated to a single destination by the network interface circuit. A free packet command is output from the network interface circuit onto the bus. The free packet command includes a Free On Last Transfer (FOLT) value that indicates that the packet is to be freed from the memory system by the network interface circuit after the packet is communicated to the network interface circuit. The network interface circuit and the memory system are included on an Island-Based Network Flow Processor.
An integrated circuit includes a first circuit, a second circuit, and a bus that couples the circuits together. The first circuit is simulated on a first simulator at the same time that the second circuit is simulated on a second simulator. A simulator plug-in is incorporated into the simulation model of the first circuit. A simulator plug-in is incorporated into the simulation model of the second circuit. If valid data is to pass from the first to second circuit across the bus during simulation, then the plug-in of the first model causes a network stack to generate a packet. The packet carries the data. After communication to the second simulator, the data is recovered from the packet, and is injected by the plug-in of the second model into the simulation of the second circuit. By exchanging data back and forth this way, multiple circuits are simulated simultaneously on different simulators.
An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. A configurable mesh data bus includes a command mesh, a pull-id mesh, and two data meshes. The configurable mesh data bus extends through all the islands. For each mesh, each island includes a centrally located crossbar switch and eight half links. Two half links extend to ports on the top edge of the island, a half link extends to a port on a right edge of the island, two half links extend to ports on the bottom edge of the island, and a half link extends to a port on the left edge of the island. Two additional links extend to functional circuitry of the island. The configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit.
A Software-Defined Networking (SDN) switch includes external network ports for receiving external network traffic onto the SDN switch, external network ports for transmitting external network traffic out of the SDN switch, a first Network Flow Switch (NFX) integrated circuit that has multiple network ports and that maintains a flow table, a second NFX integrated circuit that has multiple network ports and that maintains a flow table, and a Network Flow Processor (NFP) circuit that maintains a flow table. The NFP circuit couples directly to a network port of the first NFX integrated circuit but does not couple directly to any network port of the second NFX integrated circuit. The NFP circuit sends a flow entry to one NFX integrated circuit along with an addressing label, and the NFX integrated circuit uses the addressing label to determine that the flow entry is to be forwarded to the second NFX integrated circuit.
A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and are taken off the rings. A put command causes the TM to put a value into a ring, provided the ring is not full. A get command causes the TM to take a value off a ring, provided the ring is not empty. A put with low priority command causes the TM to put a value into a ring, provided the ring has at least a predetermined amount of free buffer space. A get from a set of rings command causes the TM to get a value from the highest priority non-empty ring (of a specified set of rings).
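The four ring commands in that abstract can be sketched directly: put, get, put-with-low-priority (only if enough free space remains), and get-from-set (the highest-priority non-empty ring wins). The ring size, reserve, and priority ordering are invented stand-ins.

    # Sketch of the TM ring commands. Lower ring number = higher priority.
    RING_SIZE = 8
    LOW_PRIORITY_RESERVE = 2          # free slots a low-priority put must leave
    rings = {0: [], 1: [], 2: []}

    def put(ring, value):
        if len(rings[ring]) < RING_SIZE:
            rings[ring].append(value)
            return True
        return False                  # ring full

    def put_low_priority(ring, value):
        if RING_SIZE - len(rings[ring]) > LOW_PRIORITY_RESERVE:
            rings[ring].append(value)
            return True
        return False                  # not enough free space for low priority

    def get_from_set(ring_ids):
        for ring in sorted(ring_ids):            # highest priority first
            if rings[ring]:
                return rings[ring].pop(0)
        return None                              # all rings in the set empty

    put(1, "pkt-a"); put(2, "pkt-b")
    print(get_from_set({0, 1, 2}))    # "pkt-a": ring 1 beats ring 2, ring 0 empty
    print(put_low_priority(0, "bulk"))  # True: ring 0 has spare capacity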
Software update information is communicated to a network appliance either across a network or from a local memory device. The software update information includes kernel data, application data, or indicator data. The network appliance includes a first storage device, a second storage device, an operating memory, a central processing unit (CPU), and a network adapter. First and second storage devices are persistent storage devices. In a first example, both kernel data and application data are updated in the network appliance in response to receiving the software update information. In a second example, only the kernel data is updated in the network appliance in response to receiving the software update information. In a third example, only the application data is updated in the network appliance in response to receiving the software update information. Indicator data included in the software update information determines the data to be updated in the network appliance.
An integrated circuit includes an input port, a first Characterize/Classify/Table Lookup and Multiplexer Circuit (CCTC), a second CCTC, and an exact-match flow table structure. The first and second CCTCs are structurally identical. The first and second CCTCs are coupled together serially. In one example, an incoming packet is received onto the integrated circuit via the input port and packet information is supplied to a first characterizer of the first CCTC. Information flow passes through the classifier of the first CCTC, through the Table Lookup and Multiplexer Circuit (TLMC) of the first CCTC, through the characterizer of the second CCTC, through the classifier of the second CCTC, and out of the TLMC of the second CCTC in the form of a Flow Id. The Flow Id is supplied to the exact-match flow table structure to determine whether an exact-match for the Flow Id is found in the flow table structure.
An Island-Based Network Flow Processor (IB-NFP) receives packets of many flows, and classifies them as belonging to an ordering context. These packets are distributed to a set of Worker Processors (WPs), so that each packet of the context is processed by one WP, but multiple WPs operate on packets of the context at a given time. The WPs use an atomic ticket release functionality of a transactional memory to assist in determining when to release packets to another set of Output Processors (OPs). The packets are indicated to the set of OPs in the correct order, even though the WPs may complete their processing of the packets in an out-of-order fashion. For a packet that is indicated to be released, an OP generates a “transmit command” such that the packet (or a descriptor of the packet) is then put into a properly ordered stream for output from the IB-NFP.
A multi-processor includes a pool of processors and a common packet buffer memory. Bytes of packet data of a packet are stored in the packet buffer memory. Each of the processors has an intelligent packet data register file. One processor is tasked with processing the packet data, and its packet data register file caches a subset of the bytes. Some instructions when executed require that the packet data register file supply the processor execute stage with certain bytes of the packet data. The register file includes a set of slice portions, where each slice portion is responsible for different bytes of the overall packet data. Each slice portion independently handles stalling the processor and prefetching any bytes it is responsible for. The slice portions output their bytes in a shifted and masked fashion so that the overall register file output is properly presented to the execute stage.
A method of performing a unique packet multicast packet ready command (unique packet multicast mode operation) is described herein. A packet ready command is received from a memory system via a bus and onto a network interface circuit. The packet ready command includes a multicast value. A communication mode is determined as a function of the multicast value. The multicast value indicates a plurality of packets are to be communicated to a plurality of destinations by the network interface circuit, and each of the plurality of packets is unique. A free packet command is output from the network interface circuit onto the bus. The free packet command includes a Free On Last Transfer (FOLT) value that indicates that the packets are to be freed from the memory system by the network interface circuit after the packets are communicated to the network interface circuit.
An Island-Based Network Flow Processor (IB-NFP) receives packets of many flows, and classifies each packet as belonging to one of a plurality of ordering contexts. As packets of an ordering context flow through the IB-NFP they are distributed to a set of Worker Processors (WPs). Each packet is processed by one WP, but multiple WPs are typically operating on packets of the ordering context at the same time. The ordering system handles releasing packets from the WPs to a set of Output Processors (OPs) in the correct order, even though the WPs may complete their processing in an out-of-order fashion. One OP is responsible for generating “transmit commands” for packets of the ordering context. This OP generates a transmit command in the correct format as required by the particular egress destination circuit through which the packet will exit the IB-NFP. This architecture reduces code space, and facilitates good usage of processing resources.
A multi-processor includes a pool of processors and a common packet buffer memory. Bytes of packet data of a packet are stored in the packet buffer memory. Each processor has an intelligent packet data register file. One processor is tasked with processing the packet, and its packet data register file caches a subset of the bytes. If the register file detects a packet data prefetch trigger condition, and it does not store some of the bytes in a prefetch window, then it prefetches the bytes before such bytes are required in the execution of a subsequent instruction. The processor has instructions that configure the prefetching, that enable such prefetching, and that disable such prefetching in certain ways.
An integrated circuit includes ingress ethernet ports and egress ethernet ports. A second ingress ethernet port is configurable to operate in a selected one of a command mode and a data mode. The second ingress ethernet port does not power up in the command mode and can only be put into the command mode as a result of a port modeset command being received onto an ingress ethernet port operating in the command mode. A first ingress ethernet port powers up in the command mode. In the command mode, the first ingress ethernet port can receive and carry out a port modeset command. Receiving and carrying out the port modeset command causes one of the ingress ethernet ports identified by the port modeset command to operate in the command mode. A flow table structure adapted to store flow entries is used to determine which egress ethernet port outputs a packet.
A method involving a Software-Defined Networking (SDN) switch. A packet is received onto an SDN switch via an NFX circuit. The NFX circuit determines that the packet matches no flow entry stored in any flow table in the NFX circuit and forwards the packet to an NFP circuit. The NFP circuit determines that the packet matches a first flow entry, stored in a flow table in the NFP circuit, that applies to a relatively broad flow of packets; generates a new flow entry that applies to a relatively narrow subflow of packets; and forwards the new flow entry to the NFX circuit, which stores the new flow entry in a flow table in the NFX circuit. A subsequent packet is received onto the SDN switch via the NFX circuit and is switched using the new flow entry stored in the NFX circuit without forwarding the packet to the NFP circuit.
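A minimal sketch of that broad-to-narrow hand-off: the NFP holds a broad (prefix-style) entry, and on a miss in the NFX it synthesizes a narrow, fully-specified entry keyed to the exact packet and pushes it down so later packets of the subflow stay in the NFX. All fields, keys, and actions are invented.

    # Sketch: broad flow entries on the NFP, narrow exact entries on the NFX.
    nfx_table = {}                                          # exact entries only
    nfp_table = [({"dst_prefix": "10.1."}, "route-east")]   # broad entries

    def nfx_switch(pkt):
        key = (pkt["src"], pkt["dst"], pkt["port"])
        if key in nfx_table:
            return nfx_table[key]                    # fast path, stays in NFX
        return nfp_process(pkt, key)                 # miss: forward to NFP

    def nfp_process(pkt, key):
        for match, action in nfp_table:
            if pkt["dst"].startswith(match["dst_prefix"]):
                nfx_table[key] = action              # push the narrow entry down
                return action
        return "drop"

    p = {"src": "10.0.0.5", "dst": "10.1.2.3", "port": 80}
    print(nfx_switch(p))    # the NFP handles the first packet, installs an entry
    print(nfx_switch(p))    # the second packet is switched by the NFX alone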
A multi-processor includes a pool of processors and a common packet buffer memory. Bytes of packet data of a packet are stored in the packet buffer memory. Each processor has an intelligent packet data register file. One processor is tasked with processing the packet, and its packet data register file caches a subset of the bytes. Some instructions when executed require that the packet data register file supply the execute stage of the processor with certain bytes of the packet data. The register file detects a packet data prefetch trigger condition, and in response determines if it does not store some of the bytes in a prefetch window. If it does not, then it retrieves those bytes from the packet buffer memory, so that it then has all the bytes in the prefetch window. In one example, a subsequently executed instruction uses the prefetched packet data.
An island-based network flow processor (IB-NFP) integrated circuit includes islands organized in rows. A configurable mesh event bus extends through the islands and is configured to form a local event ring. The configurable mesh event bus is configured with configuration information received via a configurable mesh control bus. The local event ring involves event ring circuits and event ring segments. In one example, a packet is received onto a first island. If an amount of a processing resource (for example, memory buffer space) available to the first island is below a threshold, then an event packet is communicated from the first island to a second island via the local event ring. In response, the second island causes a third island to communicate via a command/push/pull data bus with the first island, thereby increasing the amount of the processing resource available to the first island for handling incoming packets.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
H04L 12/861 - Packet buffering or queuing arrangements; Queue scheduling
H04L 12/933 - Switch core, e.g. crossbar, shared memory or shared medium
A transactional memory (TM) receives a lookup command across a bus from a processor. The command includes a memory address. In response to the command, the TM pulls an input value (IV). The memory address is used to read a word containing multiple result values (RVs), multiple reference values, and multiple mask values from memory. A selecting circuit within the TM uses a starting bit position and a mask size to select a portion of the IV. The portion of the IV is a lookup key value (LKV). The LKV is masked by each mask value thereby generating multiple masked values. Each masked value is compared to a reference value thereby generating multiple comparison values. A lookup table generates a selector value based upon the comparison values. A result value is selected based on the selector value. The selected result value is then communicated to the processor via the bus.
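The select/mask/compare sequence can be sketched in C as follows; the four-entry word layout, the zero miss value, and first-match selection are assumptions standing in for the patent's selector lookup table:

```c
#include <stdint.h>

/* Hypothetical layout of one memory word read by the lookup engine:
 * four {mask, reference, result} triples. */
struct lookup_word {
    uint32_t mask[4];
    uint32_t ref[4];
    uint32_t result[4];
};

/* Extract the lookup key value (LKV) from the input value, mask it
 * against each entry, and return the result of the first compare
 * that matches; 0 stands in for a miss. */
uint32_t tm_lookup(const struct lookup_word *w, uint64_t iv,
                   unsigned start_bit, unsigned mask_size)
{
    uint64_t field_mask = (mask_size >= 64) ? ~0ull
                                            : ((1ull << mask_size) - 1);
    uint32_t lkv = (uint32_t)((iv >> start_bit) & field_mask);

    for (int i = 0; i < 4; i++)
        if ((lkv & w->mask[i]) == w->ref[i])
            return w->result[i];
    return 0;
}
```

In the hardware the four compares run in parallel and a lookup table maps the comparison bits to a selector value; the sequential loop above is only a behavioral model.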
A flow key is determined from an incoming packet. Two hash values A and B are then generated from the flow key. Hash value A is an index into a hash table to identify a hash bucket. Multiple simultaneous CAM lookup operations are performed on fields of the bucket to determine which ones of the fields store hash value B. For each populated field there is a corresponding entry in a key table and in other tables. The key table entry corresponding to each field that stores hash value B is checked to determine if that key table entry stores the original flow key. When the key table entry that stores the original flow key is identified, then the corresponding entries in the other tables are determined to be a “lookup output information value”. This value indicates how the packet is to be handled/forwarded by the network appliance.
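A C sketch of the two-level lookup just described; the table sizes, the eight-field bucket, and the sequential (rather than simultaneous CAM) compares are illustrative assumptions:

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define BUCKET_FIELDS 8
#define KEY_BYTES     16
#define TABLE_SIZE    1024

struct hash_bucket { uint32_t field[BUCKET_FIELDS]; }; /* hash B values */
struct key_entry   { uint8_t  key[KEY_BYTES]; };

/* Hypothetical tables: the key table parallels the bucket fields, and
 * a result table (the "lookup output information") parallels the keys. */
static struct hash_bucket hash_table[TABLE_SIZE];
static struct key_entry   key_table[TABLE_SIZE][BUCKET_FIELDS];
static uint32_t           result_table[TABLE_SIZE][BUCKET_FIELDS];

/* hash_a indexes the bucket; hash_b disambiguates fields within it;
 * the original flow key resolves any remaining hash-B collisions. */
bool flow_lookup(uint32_t hash_a, uint32_t hash_b,
                 const uint8_t flow_key[KEY_BYTES], uint32_t *out)
{
    uint32_t idx = hash_a % TABLE_SIZE;
    const struct hash_bucket *b = &hash_table[idx];

    /* In hardware the BUCKET_FIELDS compares run simultaneously (CAM);
     * here they are a sequential loop. */
    for (int i = 0; i < BUCKET_FIELDS; i++) {
        if (b->field[i] == hash_b &&
            memcmp(key_table[idx][i].key, flow_key, KEY_BYTES) == 0) {
            *out = result_table[idx][i];
            return true;
        }
    }
    return false;   /* no matching flow entry */
}
```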
A method for supporting in-flight packet processing is provided. Packet processing devices (microengines) can send a request for packet processing to a packet engine before a packet comes in. The request offers a twofold benefit. First, the microengines add themselves to a work queue to request processing. Once the packet becomes available, the header portion is automatically provided to the corresponding microengine for packet processing. Only one bus transaction is needed for the microengines to start packet processing. Second, the microengines can process packets before the entire packet is written into the memory. This is especially useful for large packets, because a packet does not have to be written into the memory completely before it is processed by the microengines.
An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. In one example, the configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit. The rectangular islands of one row are oriented in staggered relation with respect to the rectangular islands of the next row. The left and right edges of islands in a row align with the left and right edges of islands two rows down in the row structure. The data bus involves multiple meshes. In each mesh, each island has a centrally located crossbar switch, six radiating half links, and half links down to the functional circuitry of the island. The staggered orientation of the islands, and the structure of the half links, allow half links of adjacent islands to align with one another.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
A transactional memory (TM) receives a lookup command across a bus from a processor. The command includes a memory address. In response to the command, the TM pulls an input value (IV). The memory address is used to read a word containing multiple result values (RVs), multiple reference values, and multiple prefix values from memory. A selecting circuit within the TM uses a starting bit position and a mask size to select a portion of the IV. The portion of the IV is a lookup key value (LKV). Mask values are generated based on the prefix values. The LKV is masked by each mask value thereby generating multiple masked values that are compared to the reference values. Based on the comparison a lookup table generates a selector value that is used to select a result value. The selected result value is then communicated to the processor via the bus.
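The only difference from the previous lookup is that the masks are derived from stored prefix values; a small C helper shows the assumed derivation for 32-bit keys:

```c
#include <stdint.h>

/* Derive the compare mask from a stored prefix length (0..32),
 * so each entry behaves like an IP-style prefix rule. */
static uint32_t prefix_to_mask(unsigned prefix_len)
{
    return (prefix_len == 0) ? 0u : 0xFFFFFFFFu << (32u - prefix_len);
}
```

For example, prefix_to_mask(24) yields 0xFFFFFF00; the masked LKV is then compared against the reference values exactly as in the earlier sketch.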
A method for receiving a packet descriptor including a priority indicator and a queue number indicating a queue stored within a first memory unit, storing a packet associated with the packet descriptor in a second memory, determining a first amount of free memory in the first memory unit, determining if the first amount of free memory is above a threshold value, writing the packet from the second memory to a third memory when the first amount of memory is above the threshold value and the priority indicator is equal to a first value, not writing the packet from the second memory unit to the third memory unit if the first amount of memory is below the threshold value or when the priority indicator is equal to a second value. The priority indicator is equal to a first value for high priority packets and a second value for low priority packets.
A circuit receives a queue number that indicates a queue stored within a memory unit and a packet descriptor that includes a drop precedence value, and in response determines an instantaneous queue depth of the queue. The instantaneous queue depth and the drop precedence value are used to determine a drop probability. The drop probability is used to randomly determine whether the packet descriptor should be stored in the queue. When a packet descriptor is not stored in a queue, the packet associated with the packet descriptor is dropped. The queue has a first queue depth range. A first drop probability is used when the queue depth is within the first queue depth range and the drop precedence is equal to a first value. A second drop probability is used when the queue depth is within the first queue depth range and the drop precedence is equal to a second value.
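A C sketch of the randomized decision; the two-range split, the depth boundary of 512, and the probability scaling are invented for illustration:

```c
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical per-range, per-precedence drop probabilities, scaled
 * so that 256 would mean "always drop". Two queue-depth ranges, two
 * drop precedence values. */
static const uint16_t drop_prob[2][2] = {
    /* range 0 (shallow) */ {  8,  32 },
    /* range 1 (deep)    */ { 64, 192 },
};

/* Randomly decide whether to drop, from the instantaneous queue depth
 * and the packet descriptor's drop precedence value. */
bool should_drop(uint32_t queue_depth, unsigned drop_precedence)
{
    unsigned range = (queue_depth < 512) ? 0 : 1;  /* assumed boundary */
    return (rand() & 0xFF) < drop_prob[range][drop_precedence & 1];
}
```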
An apparatus and method for receiving a packet descriptor and a queue number that indicates a queue stored within a memory unit, determining a first amount of free memory in a group of packet descriptor queues, determining if the first amount of free memory is within a first range, applying a first drop probability to determine if the packet associated with the packet descriptor should be dropped when the first amount of free memory is within the first range, and applying a second drop probability to determine if the packet should be dropped when the first amount of free memory is within a second range. When it is determined that the packet is to be dropped, the packet descriptor is not stored in the queue. When it is determined that the packet is not to be dropped, the packet descriptor is stored in the queue.
A method for receiving a packet descriptor associated with a packet and a queue number indicating a queue stored within a memory unit, and determining a priority level of the packet and an amount of free memory available in the memory unit. A global drop probability is applied to generate a global drop indicator, and a queue drop probability is applied to generate a queue drop indicator. The global drop probability is a function of the amount of free memory. The queue drop probability is a function of instantaneous queue depth and drop precedence value. The packet is transmitted whenever the priority level is high. When the priority level is low, the packet is transmitted when both the global drop indicator and the queue drop indicator are a logic low value. When the priority level is low, the packet is not transmitted when either drop indicator is a logic high value.
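The transmit gate described above reduces to a small boolean function; a minimal sketch:

```c
#include <stdbool.h>

/* High-priority packets always pass; low-priority packets pass only
 * when neither the global nor the queue drop indicator fired. */
bool transmit(bool high_priority, bool global_drop, bool queue_drop)
{
    return high_priority || (!global_drop && !queue_drop);
}
```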
A transactional memory (TM) receives an Atomic Metering Command (AMC) across a bus from a processor. The command includes a memory address and a meter pair indicator value. In response to the AMC, the TM pulls an input value (IV). The TM uses the memory address to read a word including multiple credit values from a memory unit. Circuitry within the TM selects a pair of credit values, subtracts the IV from each of the pair of credit values thereby generating a pair of decremented credit values, compares the pair of decremented credit values with a threshold value, respectively, thereby generating a pair of indicator values, performs a lookup based upon the pair of indicator values and the meter pair indicator value, and outputs a selector value and a result value that represents a meter color. The selector value determines the credit values written back to the memory unit.
G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
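The metering decision in the Atomic Metering Command abstract above can be sketched as a dual-credit check in C; the color mapping and write-back policy shown are assumptions in the spirit of two-rate metering, not the patent's exact selector lookup:

```c
#include <stdint.h>

enum meter_color { GREEN, YELLOW, RED };

/* Subtract the pulled input value from both selected credit values,
 * compare the decremented credits against a threshold, and map the
 * pair of indicator bits to a color. The write-back choice (which
 * decremented credits are kept) follows from the same indicators. */
enum meter_color atomic_meter(int32_t *long_credit, int32_t *short_credit,
                              int32_t iv, int32_t threshold)
{
    int32_t lc = *long_credit  - iv;
    int32_t sc = *short_credit - iv;
    int below_long  = lc < threshold;
    int below_short = sc < threshold;

    if (!below_long && !below_short) {          /* both buckets conform */
        *long_credit = lc; *short_credit = sc;  /* write back both      */
        return GREEN;
    }
    if (!below_long) {                          /* only the long bucket */
        *long_credit = lc;
        return YELLOW;
    }
    return RED;                                 /* no credit consumed   */
}
```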
An addressless merge command includes an identifier of an item of data, and a reference value, but no address. A first part of the item is stored in a first place. A second part is stored in a second place. To move the first part so that the first and second parts are merged, the command is sent across a bus to a device. The device translates the identifier into a first address ADR1, and uses ADR1 to read the first part. Stored in or with the first part is a second address ADR2 indicating where the second part is stored. The device extracts ADR2, and uses ADR1 and ADR2 to issue bus commands. Each bus command causes a piece of the first part to be moved. When the entire first part has been moved, the device returns the reference value to indicate that the merge command has been completed.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
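A C sketch of the addressless merge abstract above, against a flat memory model; the identifier-to-address table, the placement of ADR2 in the first four bytes stored with the first part, and the final placement of the moved data are all assumptions:

```c
#include <stdint.h>
#include <string.h>

#define CHUNK 64                   /* bytes moved per bus command */

static uint8_t  mem[1 << 16];      /* flat model of the memory space */
static uint32_t id_to_addr[256];   /* identifier -> ADR1 translation */

/* Translate the identifier to ADR1, recover ADR2 from the bytes
 * stored with the first part, then move the first part piece by
 * piece so it lands immediately ahead of the second part. The
 * reference value `ref` is returned once every piece has moved. */
uint32_t addressless_merge(uint8_t id, uint32_t first_len, uint32_t ref)
{
    uint32_t adr1 = id_to_addr[id];
    uint32_t adr2;
    memcpy(&adr2, &mem[adr1], sizeof adr2);

    for (uint32_t off = 0; off < first_len; off += CHUNK) {
        uint32_t n = (first_len - off < CHUNK) ? (first_len - off) : CHUNK;
        /* each iteration models one bus command moving one piece */
        memmove(&mem[adr2 - first_len + off], &mem[adr1 + 4 + off], n);
    }
    return ref;   /* signals completion of the merge command */
}
```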
A chained Command/Push/Pull (CPP) bus command is output by a first device and is sent from a CPP bus master interface across a set of command conductors of a CPP bus to a second device. The chained CPP command includes a reference value. The second device decodes the command, in response determines a plurality of CPP commands, and outputs the plurality of CPP commands onto the CPP bus. The second device detects when the plurality of CPP commands have been completed, and in response returns the reference value back to the CPP bus master interface of the first device via a set of data conductors of the CPP bus. The reference value indicates to the first device that an overall operation of the chained CPP command has been completed.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and are taken off the rings. A put command causes the TM to put a value into a ring, provided the ring is not full. A get command causes the TM to take a value off a ring, provided the ring is not empty. A put with low priority command causes the TM to put a value into a ring, provided the ring has at least a predetermined amount of free buffer space. A get from a set of rings command causes the TM to get a value from the highest priority non-empty ring (of a specified set of rings).
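A C sketch of the four ring commands just described; the ring size, the free-space reservation parameter, and the priority ordering of the ring set are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

#define RING_SIZE 256   /* entries per ring (power of two assumed) */

struct ring {
    uint32_t head, tail;          /* maintained by the ring op stage */
    uint32_t slot[RING_SIZE];
};

static uint32_t ring_count(const struct ring *r)
{ return r->tail - r->head; }

/* put: refuses when the ring is full */
bool ring_put(struct ring *r, uint32_t v)
{
    if (ring_count(r) == RING_SIZE) return false;
    r->slot[r->tail++ % RING_SIZE] = v;
    return true;
}

/* put with low priority: refuses unless at least `reserve` slots free */
bool ring_put_low(struct ring *r, uint32_t v, uint32_t reserve)
{
    if (RING_SIZE - ring_count(r) < reserve) return false;
    return ring_put(r, v);
}

/* get: refuses when the ring is empty */
bool ring_get(struct ring *r, uint32_t *v)
{
    if (ring_count(r) == 0) return false;
    *v = r->slot[r->head++ % RING_SIZE];
    return true;
}

/* get from a set of rings: take from the highest-priority non-empty
 * ring, with rings[0] taken as highest priority. */
bool ring_get_set(struct ring **rings, int n, uint32_t *v)
{
    for (int i = 0; i < n; i++)
        if (ring_get(rings[i], v)) return true;
    return false;
}
```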
Within a networking device, packet portions from multiple PDRSDs (Packet Data Receiving and Splitting Devices) are loaded into a single memory, so that the packet portions can later be processed by a processing device. Rather than the PDRSDs managing and handling the storing of packet portions into the memory, a packet engine is provided. The PDRSDs use a PPI (Packet Portion Identifier) Addressing Mode (PAM) in communicating with the packet engine and in instructing the packet engine to store packet portions. The packet engine uses linear memory addressing to write the packet portions into the memory, and to read the packet portions from the memory.
Within a networking device, packet portions from multiple PDRSDs (Packet Data Receiving and Splitting Devices) are loaded into a single memory, so that the packet portions can later be processed by a processing device. Rather than the PDRSDs managing and handling the storing of packet portions into the memory, a packet engine is provided. The PDRSDs use a PPI (Packet Portion Identifier) Addressing Mode (PAM) in communicating with the packet engine and in instructing the packet engine to store packet portions. A PDRSD requests a PPI from the packet engine in a PPI allocation request, and is allocated a PPI by the packet engine in a PPI allocation response, and then tags the packet portion to be written with the PPI and sends the packet portion and the PPI to the packet engine.
Within a networking device, packet portions from multiple PDRSDs (Packet Data Receiving and Splitting Devices) are loaded into a single memory, so that the packet portions can later be processed by a processing device. Rather than the PDRSDs managing and handling the storing of packet portions into the memory, a packet engine is provided. A device interacting with the packet engine can use a PPI (Packet Portion Identifier) Addressing Mode (PAM) in communicating with the packet engine and in instructing the packet engine to store packet portions. Alternatively, the device can use a Linear Addressing Mode (LAM) to communicate with the packet engine. A PAM/LAM selection code field in a bus transaction value sent to the packet engine indicates whether PAM or LAM will be used.
Within a networking device, packet portions from multiple PDRSDs (Packet Data Receiving and Splitting Devices) are loaded into a single memory, so that the packet portions can later be processed by a processing device. Rather than the PDRSDs managing the storing of packet portions into the memory, a packet engine is provided. The PDRSDs use a PPI addressing mode in communicating with the packet engine and in instructing the packet engine to store packet portions. A PDRSD requests a PPI from the packet engine, and is allocated a PPI by the packet engine, and then tags the packet portion to be written with the PPI and sends the packet portion and the PPI to the packet engine. Once the packet portion has been processed, a PPI de-allocation command causes the packet engine to de-allocate the PPI so that the PPI is available for allocating in association with another packet portion.
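The PPI allocation/de-allocation lifecycle running in the packet engine can be sketched with a simple bitmap allocator in C; the PPI count and the linear-scan policy are assumptions:

```c
#include <stdbool.h>

#define NUM_PPI 512

/* Hypothetical allocation state inside the packet engine. */
static bool ppi_in_use[NUM_PPI];

/* PPI allocation request: hand out any free PPI. */
int ppi_alloc(void)
{
    for (int p = 0; p < NUM_PPI; p++)
        if (!ppi_in_use[p]) { ppi_in_use[p] = true; return p; }
    return -1;   /* none free: the request would be buffered and retried */
}

/* PPI de-allocation command: the PPI becomes available for
 * allocating in association with another packet portion. */
void ppi_dealloc(int p)
{
    ppi_in_use[p] = false;
}
```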
In response to receiving a novel “Return Available PPI Credits” command from a credit-aware device, a packet engine sends a “Credit To Be Returned” (CTBR) value it maintains for that device back to the credit-aware device, and zeroes out its stored CTBR value. The credit-aware device adds the credits returned to a “Credits Available” value it maintains. The credit-aware device uses the “Credits Available” value to determine whether it can issue a PPI allocation request. The “Return Available PPI Credits” command does not result in any PPI allocation or de-allocation. In another novel aspect, the credit-aware device is permitted to issue one PPI allocation request to the packet engine when its recorded “Credits Available” value is zero or negative. If the PPI allocation request cannot be granted, then it is buffered in the packet engine, and is resubmitted within the packet engine, until the packet engine makes the PPI allocation.
In response to receiving a “Return Available PPI Credits” command from a credit-aware (CA) device, a packet engine sends a “Credit To Be Returned” (CTBR) value it maintains for that device back to the CA device, and zeroes out its stored CTBR value. The CA device adds the credits returned to a “Credits Available” value it maintains. The CA device uses the “Credits Available” value to determine whether it can issue a PPI allocation request. The “Return Available PPI Credits” command does not result in any PPI allocation or de-allocation. In another aspect, the CA device issues one PPI allocation request to the packet engine when its recorded “Credits Available” value is zero or negative. If the PPI allocation request cannot be granted, then it is buffered in the packet engine, and is resubmitted within the packet engine, until the packet engine makes the PPI allocation.
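A C sketch of the credit exchange in the two abstracts above; the single-device state and the one-outstanding-request rule are modeled with plain globals and a boolean for illustration:

```c
#include <stdbool.h>

/* Packet engine side: credits waiting to be returned to one device. */
static int ctbr;                 /* "Credit To Be Returned" value */

/* Credit-aware device side. */
static int credits_available;    /* "Credits Available" value */

/* "Return Available PPI Credits": the engine sends its CTBR value back
 * and zeroes it; the device folds the returned credits into its total.
 * No PPI is allocated or de-allocated by this command. */
void return_available_ppi_credits(void)
{
    credits_available += ctbr;
    ctbr = 0;
}

/* The device may issue a PPI allocation request while holding zero or
 * negative credits, but only one such request may be outstanding. */
bool may_issue_alloc_request(bool request_outstanding)
{
    return credits_available > 0 || !request_outstanding;
}
```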
A processor includes a hash register and a hash generating circuit. The hash generating circuit includes a novel programmable nonlinearizing function circuit as well as a modulo-2 multiplier, a first modulo-2 summer, a modulo-2 divider, and a second modulo-2 summer. The nonlinearizing function circuit receives a hash value from the hash register and performs a programmable nonlinearizing function, thereby generating a modified version of the hash value. In one example, the nonlinearizing function circuit includes a plurality of separately enableable S-box circuits. The multiplier multiplies the input data by a programmable multiplier value, thereby generating a product value. The first summer sums a first portion of the product value with the modified hash value. The divider divides the resulting sum by a fixed divisor value, thereby generating a remainder value. The second summer sums the remainder value and the second portion of the input data, thereby generating a hash result.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols; the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
85.
Generating a hash using S-box nonlinearizing of a remainder input
A processor includes a hash register and a hash generating circuit. The hash generating circuit includes a novel programmable nonlinearizing function circuit as well as a modulo-2 multiplier, a first modulo-2 summer, a modulo-2 divider, and a second modulo-2 summer. The nonlinearizing function circuit receives a hash value from the hash register and performs a programmable nonlinearizing function, thereby generating a modified version of the hash value. In one example, the nonlinearizing function circuit includes a plurality of separately enableable S-box circuits. The multiplier multiplies the input data by a programmable multiplier value, thereby generating a product value. The first summer sums a first portion of the product value with the modified hash value. The divider divides the resulting sum by a fixed divisor value, thereby generating a remainder value. The second summer sums the remainder value and the second portion of the input data, thereby generating a hash result.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
G09C 1/00 - Apparatus or methods whereby a given sequence of signs, e.g. an intelligible text, is transformed into an unintelligible sequence of signs by transposing the signs or groups of signs or by replacing them by others according to a predetermined system
86.
Efficient search key controller with standard bus interface, external memory interface, and Interlaken look-aside interface
A device includes a Standard Bus Interface Circuit (SBIC), a memory interface circuit, a Direct Memory Access (DMA) controller, and an Interlaken Look-Aside (ILA) interface circuit. A search key data set including multiple search keys is received via the SBIC and is written to an external memory via the memory interface circuit. The DMA controller receives a descriptor via the SBIC, generates a search key data request, receives the search key data set, and selects a single search key from the set. The ILA interface circuit receives the search key from the DMA controller, generates an ILA packet including the search key, and sends the ILA packet to an external transactional memory device that generates a result data value. The DMA controller receives the result data value via the ILA interface circuit, writes the result data value to the external memory, and sends a DMA completion notification.
G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
G06F 12/1081 - Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
G06F 13/42 - Bus transfer protocol, e.g. handshake; Synchronisation
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
87.
Island-based network flow processor with efficient search key processing
An Island-Based Network Flow Processor (IB-NFP) includes a memory and a processor located on a first island, a Direct Memory Access (DMA) controller located on a second island, and an Interlaken Look-Aside (ILA) interface circuit and an interface circuit located on a third island. A search key data set including multiple search keys is stored in the memory. A descriptor is generated by the processor and is sent to the DMA controller, which generates a search key data request, receives the search key data set, and selects a single search key. The ILA interface circuit receives the search key and generates an ILA packet including the search key, which is sent to an external transactional memory device that generates a result data value. The DMA controller receives the result data value via the ILA interface circuit, writes the result data value to the memory, and sends a DMA completion notification.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
G06F 13/42 - Bus transfer protocol, e.g. handshake; Synchronisation
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
An efficient search key processing method includes writing a first and a second search key data set to a memory, where the search key data sets are written to memory on a word by word basis. Each of the first and second search key data sets includes a header indicating a common lookup operation to be performed and a string of search keys. The header is immediately followed in memory by a search key. The search keys are located contiguously in the memory. At least one word contains search keys from the first and second search key data sets. The memory is read word by word. A first plurality of lookup command messages are sent based on the search keys included in the first search key data set. A second plurality of lookup command messages are sent based on the search keys included in the second search key data set.
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
89.
Multi-processor with efficient search key processing
A multi-processor includes a shared memory that stores a search key data set including multiple search keys, a processor, a Direct Memory Access (DMA) controller, and an Interlaken Look-Aside (ILA) interface circuit. The processor generates a descriptor that is sent to the DMA controller causing the DMA controller to read the search key data set. The DMA controller selects a single search key from the set and generates a lookup command message that is communicated to the ILA interface circuit. The ILA interface circuit generates an ILA packet that includes the single search key and sends the ILA packet to an external transactional memory device that generates a result data value. The result data value is communicated back to the DMA controller via the ILA interface circuit. The DMA controller stores the result data value in the shared memory and notifies the processor that the DMA process has completed.
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
90.
Inverse PCP flow remapping for PFC pause frame generation
An overflow threshold value is stored for each of a plurality of virtual channels. A link manager maintains, for each virtual channel, a buffer count. If the buffer count for a virtual channel whose originating PCP flows were merged is detected to exceed that channel's overflow threshold value, then a PFC (Priority Flow Control) pause frame is generated in which multiple priority class enable bits are set, indicating that multiple PCP flows should be paused. For the particular virtual channel that is overloaded, an Inverse PCP Remap LUT (IPRLUT) circuit performs inverse PCP mapping, including merging and/or reordering mapping, and outputs an indication of each PCP flow that is associated with the overloaded virtual channel. Associated physical MAC port circuitry uses this information to generate the PFC pause frame so that the appropriate multiple enable bits are set in the pause frame.
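A C sketch of assembling the PFC pause frame body once the IPRLUT has produced the set of PCP flows to pause (here already collapsed into an 8-bit enable mask); the surrounding Ethernet header and the single shared pause quanta are simplifications:

```c
#include <stdint.h>
#include <stddef.h>

/* Assemble the PFC pause frame body: a 2-byte opcode (0x0101), a
 * 2-byte class-enable vector, then eight 2-byte per-class pause
 * times. `pcp_enable_mask` has one bit set per PCP flow to pause. */
size_t build_pfc_pause(uint8_t buf[20], uint8_t pcp_enable_mask,
                       uint16_t quanta)
{
    size_t off = 0;
    buf[off++] = 0x01; buf[off++] = 0x01;             /* PFC opcode    */
    buf[off++] = 0x00; buf[off++] = pcp_enable_mask;  /* enable vector */
    for (int pcp = 0; pcp < 8; pcp++) {
        uint16_t t = ((pcp_enable_mask >> pcp) & 1) ? quanta : 0;
        buf[off++] = (uint8_t)(t >> 8);               /* big-endian    */
        buf[off++] = (uint8_t)t;
    }
    return off;   /* 20 bytes of MAC control payload */
}
```

Merged PCP flows are what make multiple enable bits necessary: one overloaded virtual channel maps back, through the IPRLUT, to several PCP values, and each gets its bit set in the vector.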
A Network Flow Processor (NFP) integrated circuit receives, via each of a first plurality of physical MAC ports, one or more PCP (Priority Code Point) flows. The NFP also maintains, for each of a second plurality of virtual channels, a linked list of buffers. There is one port enqueue engine for each physical MAC port. For each PCP flow received via the physical MAC port associated with a port enqueue engine, the port enqueue engine causes frame data of the flow to be loaded into one particular linked list of buffers. Each port enqueue engine has a lookup table circuit that is configurable to cause multiple PCP flows to be merged so that the frame data for the multiple flows is all assigned to the same one virtual channel. Due to the PCP flow merging, the number of virtual channels can be smaller than the number of physical MAC ports multiplied by eight.
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
A Network Flow Processor (NFP) integrated circuit receives, via each of a plurality of physical MAC ports, PCP (Priority Code Point) flows. The NFP also maintains, for each of a plurality of virtual channels, a linked list of buffers. There is one port enqueue engine for each physical MAC port. For each PCP flow received via the physical MAC port associated with a port enqueue engine, the engine causes frame data of the flow to be loaded into one particular linked list of buffers. Each engine has a lookup table circuit that is configurable so that the relative priorities of the PCP flows are reordered as the PCP flows are assigned to virtual channels. A PCP flow with a higher PCP value can be assigned to a lower priority virtual channel, whereas a PCP flow with a lower PCP value can be assigned to a higher priority virtual channel.
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
Incoming frame data is stored in a plurality of dual linked lists of buffers in a pipelined memory. The dual linked lists of buffers are maintained by a link manager. The link manager maintains, for each dual linked list of buffers, a first head pointer, a second head pointer, a first tail pointer, a second tail pointer, a head pointer active bit, and a tail pointer active bit. The first head and tail pointers are used to maintain the first linked list of the dual linked list. The second head and tail pointers are used to maintain the second linked list of the dual linked list. Due to the pipelined nature of the memory, the dual linked list system can be popped to supply dequeued values at a sustained rate of more than one value per the read access latency time of the pipelined memory.
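A C sketch of the alternating pop that lets the dual linked list sustain more than one dequeue per memory-read latency; the buffer numbering and the -1 empty sentinel are illustrative:

```c
#include <stdbool.h>

/* Hypothetical dual-linked-list state kept by the link manager. */
struct dual_list {
    int  head[2], tail[2];    /* first/second linked-list pointers */
    bool head_active;         /* which list the next pop uses      */
    bool tail_active;         /* which list the next push uses     */
};

static int next_buf[4096];    /* "next" pointers held in the pipelined
                                 memory; -1 terminates a list        */

/* Pop alternates between the two lists, so a second pop can be
 * issued before the first pop's pipelined memory read completes. */
int dual_pop(struct dual_list *d)
{
    int side = d->head_active;
    int buf  = d->head[side];
    if (buf >= 0)
        d->head[side] = next_buf[buf];
    d->head_active = !d->head_active;   /* switch lists for next pop */
    return buf;                          /* -1 if that list was empty */
}
```

Enqueue works symmetrically on the tail pointers, toggling tail_active, which is why the head/tail active bits are tracked independently.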
A pipelined run-to-completion processor has a special tripwire bus port and executes a novel tripwire instruction. Execution of the tripwire instruction causes the processor to output a tripwire value onto the port during the clock cycle in which the tripwire instruction is being executed. A first multi-bit value of the tripwire value carries data output from registers, flags, pointers, and/or data values stored in the pipeline. A field of the tripwire instruction specifies which particular stored values will be output as the first multi-bit value. A second multi-bit value of the tripwire value is a number that identifies the particular processor that output the tripwire value. The processor has a TE enable/disable control bit. This bit is programmable by a special instruction to disable all tripwire instructions. If disabled, a tripwire instruction is fetched and decoded but does not cause the output of a tripwire value.
An integrated circuit includes a pool of processors and a Tripwire Data Merging and Collision Detection Circuit (TDMCDC). Each processor has a special tripwire bus port. Execution of a novel tripwire instruction causes the processor to output a tripwire value onto its tripwire bus port. Each respective tripwire bus port is coupled to a corresponding respective one of a plurality of tripwire bus inputs of the TDMCDC. The TDMCDC receives tripwire values from the processors and communicates them onto a consolidated tripwire bus. From the consolidated bus the values are communicated out of the integrated circuit and to a debug station. If more than one processor outputs a valid tripwire value at a given time, then the TDMCDC asserts a collision bit signal that is communicated along with the tripwire value. Receiving tripwire values onto the debug station facilitates use of the debug station in monitoring and debugging processor code.
A pipelined run-to-completion processor executes a conditional skip instruction. If a predicate condition as specified by a predicate code field of the skip instruction is true, then the skip instruction causes execution of a number of instructions following the skip instruction to be “skipped”. The number of instructions to be skipped is specified by a skip count field of the skip instruction. In some examples, the skip instruction includes a “flag don't touch” bit. If this bit is set, then neither the skip instruction nor any of the skipped instructions can change the values of the flags. Both the skip instruction and following instructions to be skipped are decoded one by one in sequence and pass through the processor pipeline, but the execution stage is prevented from carrying out the instruction operation of a following instruction if the predicate condition of the skip instruction was true.
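A C sketch of the decode-stage bookkeeping the skip-instruction abstract implies; the structure and names are hypothetical:

```c
#include <stdbool.h>

/* A taken skip arms a down-counter; while it is non-zero, following
 * instructions still flow through the pipeline but their execute-
 * stage effects are suppressed. */
struct skip_state {
    unsigned remaining;      /* instructions left to suppress      */
    bool     flags_frozen;   /* "flag don't touch" bit was set     */
};

/* Called when the skip instruction itself executes. */
void on_skip(struct skip_state *s, bool predicate_true,
             unsigned skip_count, bool flag_dont_touch)
{
    if (predicate_true) {
        s->remaining    = skip_count;
        s->flags_frozen = flag_dont_touch;
    }
}

/* Called per following instruction: returns true if its operation
 * (and, when flags are frozen, its flag update) must be squashed. */
bool suppress_execute(struct skip_state *s)
{
    if (s->remaining == 0) return false;
    s->remaining--;
    return true;
}
```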
A pipelined run-to-completion processor can decode three instructions in three consecutive clock cycles, and can also execute the instructions in three consecutive clock cycles. The first instruction causes the ALU to generate a value which is then loaded due to execution of the first instruction into a register of a register file. The second instruction accesses the register and loads the value into predicate bits in a register file read stage. The predicate bits are loaded in the very next clock cycle following the clock cycle in which the second instruction was decoded. The third instruction is a conditional instruction that uses the values of the predicate bits as a predicate code to determine a predicate function. If a predicate condition (as determined by the predicate function as applied to flags) is true then an instruction operation of the third instruction is carried out, otherwise it is not carried out.
An integrated circuit receives a DDR (Double Data Rate) data signal and an associated DDR clock signal, and communicates those signals from integrated circuit input terminals a substantial distance across the integrated circuit to a subcircuit that receives and uses the DDR data. Within the integrated circuit, a DDR retiming circuit receives the DDR data signal and the associated DDR clock signal from the terminals. The DDR retiming circuit splits the DDR data signal into two components and transmits those two components over the substantial distance toward the subcircuit. Near the subcircuit, the two components are recombined back into a single DDR data signal, and the DDR data signal and the DDR clock signal are supplied to the subcircuit in such a way that the setup and hold time requirements of the subcircuit are met.
H03K 19/00 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
G11C 8/18 - Address timing or clocking circuits; Address control signal generation or management, e.g. for row address strobe [RAS] or column address strobe [CAS] signals
G11C 8/06 - Address interface arrangements, e.g. address buffers
99.
Kick-started run-to-completion processor having no instruction counter
A pipelined run-to-completion processor includes no instruction counter and only fetches instructions either: as a result of being prompted from the outside by an input data value and/or an initial fetch information value, or as a result of execution of a fetch instruction. Initially the processor is not clocking. An incoming value kick-starts the processor to start clocking and to fetch a block of instructions from a section of code in a table. The input data value and/or the initial fetch information value determines the section and table from which the block is fetched. A LUT converts a table number in the initial fetch information value into a base address where the table is found. Fetch instructions at the ends of sections of code cause program execution to jump from section to section. A finished instruction causes an output data value to be output and stops clocking of the processor.
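A C sketch of the kick-start address formation; the field widths, table count, and section size are invented for illustration:

```c
#include <stdint.h>

/* Hypothetical LUT mapping a table number carried in the initial
 * fetch information value to the base address of that table. */
static const uint32_t table_base_lut[8] = {
    0x0000, 0x0400, 0x0800, 0x0C00, 0x1000, 0x1400, 0x1800, 0x1C00,
};

/* Form the address of the first block of instructions to fetch:
 * no program counter is involved, only the incoming value. */
uint32_t initial_fetch_addr(uint32_t initial_fetch_info)
{
    uint32_t table_num = initial_fetch_info & 0x7;        /* assumed field  */
    uint32_t section   = (initial_fetch_info >> 3) & 0x1F; /* assumed field */
    return table_base_lut[table_num] + section * 32;      /* assumed layout */
}
```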