1) Fix some typos. 2) Change mode of the punctuation from full to half, eg.’,“ . So that the punctuation can be read at console. Signed-off-by: Shan Wei<shanwei88@xxxxxxxxx> --- v2: fix the broken in patchwork. I feel uncertain when reading following contents that no variable in rps_dev_flow_table or softnet_data records the length of the current backlog. Just last_qtail variable pointers the tail of the backlog. "The counter in rps_dev_flow_table values records the length of the current CPU's backlog when a packet in this flow was last enqueued. " If missing something, please correct me. --- Documentation/networking/scaling.txt | 26 +++++++++++++------------- 1 files changed, 13 insertions(+), 13 deletions(-) diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index a177de2..1215fcc 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -26,7 +26,7 @@ queues to distribute processing among CPUs. The NIC distributes packets by applying a filter to each packet that assigns it to one of a small number of logical flows. Packets for each flow are steered to a separate receive queue, which in turn can be processed by separate CPUs. This mechanism is -generally known as “Receive-side Scaling” (RSS). The goal of RSS and +generally known as "Receive-side Scaling" (RSS). The goal of RSS and the other scaling techniques is to increase performance uniformly. Multi-queue distribution can also be used for traffic prioritization, but that is not the focus of these techniques. @@ -42,7 +42,7 @@ indirection table and reading the corresponding value. Some advanced NICs allow steering packets to queues based on programmable filters. For example, webserver bound TCP port 80 packets -can be directed to their own receive queue. Such “n-tuple” filters can +can be directed to their own receive queue. Such "n-tuple" filters can be configured from ethtool (--config-ntuple). ==== RSS Configuration @@ -104,7 +104,7 @@ RSS. Being in software, it is necessarily called later in the datapath. Whereas RSS selects the queue and hence CPU that will run the hardware interrupt handler, RPS selects the CPU to perform protocol processing above the interrupt handler. This is accomplished by placing the packet -on the desired CPU’s backlog queue and waking up the CPU for processing. +on the desired CPU's backlog queue and waking up the CPU for processing. RPS has some advantages over RSS: 1) it can be used with any NIC, 2) software filters can easily be added to hash over new protocols, 3) it does not increase hardware device interrupt rate (although it does @@ -116,20 +116,20 @@ netif_receive_skb(). These call the get_rps_cpu() function, which selects the queue that should process a packet. The first step in determining the target CPU for RPS is to calculate a -flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash +flow hash over the packet's addresses or ports (2-tuple or 4-tuple hash depending on the protocol). This serves as a consistent hash of the associated flow of the packet. The hash is either provided by hardware or will be computed in the stack. Capable hardware can pass the hash in the receive descriptor for the packet; this would usually be the same hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in skb->rx_hash and can be used elsewhere in the stack as a hash of the -packet’s flow. +packet's flow. Each receive hardware queue has an associated list of CPUs to which RPS may enqueue packets for processing. For each received packet, an index into the list is computed from the flow hash modulo the size of the list. The indexed CPU is the target for processing the packet, -and the packet is queued to the tail of that CPU’s backlog queue. At +and the packet is queued to the tail of that CPU's backlog queue. At the end of the bottom half routine, IPIs are sent to any CPUs for which packets have been queued to their backlog queue. The IPI wakes backlog processing on the remote CPU, and any queued packets are then processed @@ -208,7 +208,7 @@ The counter in rps_dev_flow_table values records the length of the current CPU's backlog when a packet in this flow was last enqueued. Each backlog queue has a head counter that is incremented on dequeue. A tail counter is computed as head counter + queue length. In other words, the counter -in rps_dev_flow_table[i] records the last element in flow i that has +in rps_dev_flow[i] records the last element in flow i that has been enqueued onto the currently designated CPU for flow i (of course, entry i is actually selected by hash and multiple flows may hash to the same entry i). @@ -218,13 +218,13 @@ CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table and the rps_dev_flow table of the queue that the packet was received on are compared. If the desired CPU for the flow (found in the rps_sock_flow table) matches the current CPU (found in the rps_dev_flow -table), the packet is enqueued onto that CPU’s backlog. If they differ, +table), the packet is enqueued onto that CPU's backlog. If they differ, the current CPU is updated to match the desired CPU if one of the following is true: - The current CPU's queue head counter >= the recorded tail counter value in rps_dev_flow[i] -- The current CPU is unset (equal to NR_CPUS) +- The current CPU is unset (equal to RPS_NO_CPU) - The current CPU is offline After this check, the packet is sent to the (possibly updated) current @@ -235,7 +235,7 @@ CPU. ==== RFS Configuration -RFS is only available if the kconfig symbol CONFIG_RFS is enabled (on +RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on by default for SMP). The functionality remains disabled until explicitly configured. The number of entries in the global flow table is set through: @@ -258,7 +258,7 @@ For a single queue device, the rps_flow_cnt value for the single queue would normally be configured to the same value as rps_sock_flow_entries. For a multi-queue device, the rps_flow_cnt for each queue might be configured as rps_sock_flow_entries / N, where N is the number of -queues. So for instance, if rps_flow_entries is set to 32768 and there +queues. So for instance, if rps_sock_flow_entries is set to 32768 and there are 16 configured receive queues, rps_flow_cnt for each queue might be configured as 2048. @@ -272,7 +272,7 @@ the application thread consuming the packets of each flow is running. Accelerated RFS should perform better than RFS since packets are sent directly to a CPU local to the thread consuming the data. The target CPU will either be the same CPU where the application runs, or at least a CPU -which is local to the application thread’s CPU in the cache hierarchy. +which is local to the application thread's CPU in the cache hierarchy. To enable accelerated RFS, the networking stack calls the ndo_rx_flow_steer driver function to communicate the desired hardware @@ -285,7 +285,7 @@ The hardware queue for a flow is derived from the CPU recorded in rps_dev_flow_table. The stack consults a CPU to hardware queue map which is maintained by the NIC driver. This is an auto-generated reverse map of the IRQ affinity table shown by /proc/interrupts. Drivers can use -functions in the cpu_rmap (“CPU affinity reverse map”) kernel library +functions in the cpu_rmap ("CPU affinity reverse map") kernel library to populate the map. For each CPU, the corresponding queue in the map is set to be one whose processing CPU is closest in cache locality. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html