The approach implemented in the previous patches is not trivial and
deserves a short description.
---
 src/util/virnetdevbandwidth.c |   68 +++++++++++++++++++++++++++++++++++++---
 1 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c
index b4ffc29..d32c7db 100644
--- a/src/util/virnetdevbandwidth.c
+++ b/src/util/virnetdevbandwidth.c
@@ -92,13 +92,69 @@ virNetDevBandwidthSet(const char *ifname,
     if (virCommandRun(cmd, NULL) < 0)
         goto cleanup;
 
+    /* If we are creating a hierarchical class, all non-guaranteed traffic
+     * goes to class 1:2, whose 'rate' is adjusted dynamically as NICs with
+     * guaranteed throughput are plugged and unplugged. Class 1:1 is there
+     * so we don't exceed the maximum limit for the network. For each NIC
+     * with guaranteed throughput a separate classid is created.
+     * NB '1:' is just shorthand for '1:0'.
+     *
+     * To get a picture of how this works:
+     *
+     * +-----+     +---------+     +-----------+      +-----------+     +-----+
+     * |     |     |  qdisc  |     | class 1:1 |      | class 1:2 |     |     |
+     * | NIC |     | def 1:2 |     |   rate    |      |   rate    |     | sfq |
+     * |     | --> |         | --> |   peak    | -+-> |   peak    | --> |     |
+     * +-----+     +---------+     +-----------+  |   +-----------+     +-----+
+     *                                            |
+     *                                            |   +-----------+     +-----+
+     *                                            |   | class 1:3 |     |     |
+     *                                            |   |   rate    |     | sfq |
+     *                                            +-> |   peak    | --> |     |
+     *                                            |   +-----------+     +-----+
+     *                                           ...
+     *                                            |   +-----------+     +-----+
+     *                                            |   | class 1:n |     |     |
+     *                                            |   |   rate    |     | sfq |
+     *                                            +-> |   peak    | --> |     |
+     *                                                +-----------+     +-----+
+     *
+     * After the routing decision, once it is clear a packet is to be sent
+     * via the NIC, it is handed to the root qdisc (queueing discipline),
+     * which here is HTB (Hierarchical Token Bucket). It has only one direct
+     * child class (1:1), which shapes the overall rate sent through the NIC.
+     * This class has at least one child (1:2), which is meant for all
+     * non-privileged (non-guaranteed) traffic from all domains. Then, for
+     * each interface with guaranteed throughput, a separate class (1:n) is
+     * created. Imagine a class is a box. Whenever a packet ends up in a
+     * class, it is stored in this box until the kernel sends it, at which
+     * point it is removed from the box. Packets are placed into boxes based
+     * on rules (filters), e.g. depending on destination IP/MAC address. If no
+     * rule applies, the root qdisc has a default class where such packets
+     * go (1:2 in this case). Packets keep coming in and the boxes fill up.
+     * Now imagine that the kernel sends packets just once a second. It
+     * traverses the tree, starting with the root qdisc and going via 1:1
+     * to 1:2, where it sends packets up to that class's 'rate'. Then it
+     * takes 1:3 and again sends packets up to its 'rate'. The whole process
+     * is repeated until 1:n has been processed. At this point each class has
+     * received its guaranteed bandwidth. If the sum of sent data does not
+     * exceed the 'rate' of class 1:1, we can go further and send more
+     * packets: the rest of the available bandwidth is distributed among the
+     * 1:2, 1:3 ... 1:n classes in proportion to their 'rate'. As soon as the
+     * root 'rate' limit is reached or there are no more packets to send, we
+     * stop sending and wait another second. Each class has an SFQ (Stochastic
+     * Fairness Queueing) qdisc attached, which hashes flows into separate
+     * queues served round-robin, so one sender cannot starve the others.
+     *
+     * Therefore, whenever we want to plug in a new guaranteed interface, we
+     * need to create a new class and adjust the 'rate' of class 1:2. When
+     * unplugging, we do the exact opposite: remove the associated class and
+     * adjust the 'rate' again.
+     *
+     * This description is rather long, but you'd better read it before you
+     * start digging into this :)
+     */
     if (hierarchical_class) {
-        /* If we are creating hierarchical class, all non guaranteed traffic
-         * goes to 1:2 class which will adjust 'rate' dynamically as NICs with
-         * guaranteed throughput are plugged and unplugged. Class 1:1 is there
-         * so we don't exceed the maximum limit for network. For each NIC with
-         * guaranteed throughput a separate classid will be created.
-         * NB '1:' is just a shorter notation of '1:0' */
         virCommandFree(cmd);
         cmd = virCommandNew(TC);
         virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent",
-- 
1.7.8.6

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list