TBF and SFQ currently can't be used together. The changes that made TBF classful kept the limit inside TBF for compatibility. This means TBF calculates how many bytes are in the leaf qdisc and drops if the limit is exceeded. This is a bad idea for two reasons:
1. The limit is checked in the leaf qdisc (default: bfifo) anyway, which is enough for backward compatibility. Turning the inner qdisc into a pfifo gives you two independent limits: N bytes from TBF and M packets from the pfifo (see the example after this list).
2. With SFQ it is impossible for TBF to know how many bytes are in the queue, because SFQ may return NET_XMIT_CN, meaning that this or some other packet has been dropped. The result is that either the limit is not enforced (underestimate) or, worse, the qdisc stops accepting packets at some point (overestimate).
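To make point 1 concrete, here is a hypothetical setup showing the two limits (device, handles, and parameters are made up for illustration; attaching a child qdisc requires the classful TBF discussed here):

  # TBF imposes its own byte limit ...
  tc qdisc add dev eth0 root handle 1: tbf rate 2mbit burst 10k limit 10000
  # ... and the inner pfifo imposes a second, independent packet limit
  tc qdisc add dev eth0 parent 1:1 handle 10: pfifo limit 100

Whichever of the two limits is hit first causes the drop, which is redundant and confusing.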
So to use TBF+SFQ you need the attached (untested) patch, which removes the limit check from TBF and relies on the inner qdisc to do the right thing.
As for your main question: yes, it will work as you expect. SFQ provides fairness between individual flows, and TBF limits the total output rate.
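For reference, a minimal sketch of such a setup (untested, assumes the patch below is applied; device name, handles, and rates are only examples):

  # TBF at the root shapes the total output to 2mbit
  tc qdisc add dev eth0 root handle 1: tbf rate 2mbit burst 10k latency 50ms
  # SFQ as the inner qdisc decides which packets to drop, fairly per flow
  tc qdisc add dev eth0 parent 1:1 handle 10: sfq perturb 10

This way TBF only decides when packets may leave, while SFQ decides which packets get dropped when the queue overflows.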
Best regards, Patrick
Jon Zeeff wrote:
How do sfq and tbf interact when packets need to be dropped?
Example:
Two flows are entering the Linux router: flow A is 1 mbit and flow B is 3 mbits (total = 4 mbits). We use tbf to limit outgoing bandwidth to 2 mbits (there is a bottleneck further downstream).
If we just used tbf, it would drop 50% of all packets, i.e. flow A would see about 500 kbits of its traffic getting through and flow B would see about 1.5 mbits getting through.
This isn't "fair": flow A should not have any of its packets dropped, and flow B should be limited to 1 mbit. So we add sfq.
What's not clear to me is whether this will accomplish what I want. If the tbf algorithm runs before sfq, tbf will drop 50% of all packets, and after that sfq will have no effect. If the sfq algorithm runs first, then all 4 mbits of the resulting traffic are sent to tbf, and tbf will drop packets from both flows.
It would seem that there needs to be some interaction between the algorithms: tbf needs to signal to sfq when more packets can be sent. But the packet dropping should occur in sfq (where fairness is understood), not in tbf.
So the question is: will it work as I want, or is a new qdisc needed?
Note: I actually have hundreds of flows, so I don't want to classify them manually. I also don't want hard allocations of bandwidth: if there is only flow B, 2 mbits of it should be allowed to pass.
Thanks.
===== net/sched/sch_tbf.c 1.11 vs edited =====
--- 1.11/net/sched/sch_tbf.c	Tue Nov 18 01:14:58 2003
+++ edited/net/sched/sch_tbf.c	Tue Nov 18 01:45:15 2003
@@ -136,7 +136,7 @@
 	struct tbf_sched_data *q = (struct tbf_sched_data *)sch->data;
 	int ret;
 
-	if (skb->len > q->max_size || sch->stats.backlog + skb->len > q->limit) {
+	if (skb->len > q->max_size) {
 		sch->stats.drops++;
 #ifdef CONFIG_NET_CLS_POLICE
 		if (sch->reshape_fail == NULL || sch->reshape_fail(skb, sch))
@@ -152,7 +152,6 @@
 	}
 
 	sch->q.qlen++;
-	sch->stats.backlog += skb->len;
 	sch->stats.bytes += skb->len;
 	sch->stats.packets++;
 	return 0;
@@ -163,10 +162,8 @@
 	struct tbf_sched_data *q = (struct tbf_sched_data *)sch->data;
 	int ret;
 
-	if ((ret = q->qdisc->ops->requeue(skb, q->qdisc)) == 0) {
+	if ((ret = q->qdisc->ops->requeue(skb, q->qdisc)) == 0)
 		sch->q.qlen++;
-		sch->stats.backlog += skb->len;
-	}
 
 	return ret;
 }
@@ -178,7 +175,6 @@
 
 	if ((len = q->qdisc->ops->drop(q->qdisc)) != 0) {
 		sch->q.qlen--;
-		sch->stats.backlog -= len;
 		sch->stats.drops++;
 	}
 	return len;
@@ -224,7 +220,6 @@
 			q->t_c = now;
 			q->tokens = toks;
 			q->ptokens = ptoks;
-			sch->stats.backlog -= len;
 			sch->q.qlen--;
 			sch->flags &= ~TCQ_F_THROTTLED;
 			return skb;
@@ -253,7 +248,6 @@
 		if (q->qdisc->ops->requeue(skb, q->qdisc) != NET_XMIT_SUCCESS) {
 			/* When requeue fails skb is dropped */
 			sch->q.qlen--;
-			sch->stats.backlog -= len;
 			sch->stats.drops++;
 		}
 
@@ -269,7 +263,6 @@
 
 	qdisc_reset(q->qdisc);
 	skb_queue_purge(&sch->q);
-	sch->stats.backlog = 0;
 	PSCHED_GET_TIME(q->t_c);
 	q->tokens = q->buffer;
 	q->ptokens = q->mtu;
@@ -456,7 +449,6 @@
 	*old = xchg(&q->qdisc, new);
 	qdisc_reset(*old);
 	sch->q.qlen = 0;
-	sch->stats.backlog = 0;
 	sch_tree_unlock(sch);
 
 	return 0;
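Once the patch is applied, the combined setup can be checked with tc's statistics output (illustrative invocation; eth0 and the exact counters shown depend on your setup and tc version):

  tc -s qdisc show dev eth0

The backlog reported for the tbf qdisc should stay at zero, since the patch moves all queue accounting into the inner qdisc.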