On 12/6/22 21:43, Jakub Kicinski wrote:
On Mon, 5 Dec 2022 21:44:09 +0100 Justin Iurman wrote:
Please revert this patch.
Many people use FQ qdisc, where packets are waiting for their Earliest
Departure Time to be released.
The IOAM queue depth is a very important value and is already used.
Can you say more about the use? What signal do you derive from it?
I do track qlen on Meta's servers but haven't found a strong use
for it yet (I did for backlog drops but not the qlen itself).
The goal of the queue depth in the specification was initially to be
able to track the entire path with a detailed view of packets or flows
(a kind of zoom on the interface to get details about its queues). With
the current definition/implementation of the queue depth, if only one
queue is congested, you're able to know it. That doesn't necessarily
mean that all queues are full, but this one is, and something might be
going on. This is something operators might want to detect precisely,
for a lot of use cases depending on the situation. Conversely, if all
queues are full, you could deduce that as well by looking at each queue
separately, as soon as a packet is assigned to it. So I think that with
"queue depth = sum(queues)" you lose the details and can't detect
congestion on a single queue, while with "queue depth = queue" you could
detect both. One might argue that the aggregate alone is fine in some
situations. I'd say that we might actually need both, which is
technically possible (even though expensive, as Eric mentioned) thanks
to the way the RFC specifies it, where some freedom was intentionally
given. I could come up with a solution for that.
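A toy Python sketch of what the two definitions would report may help here. It assumes a hypothetical NIC with 4 TX queues, backlogs counted in packets, and an arbitrary congestion threshold of 100; none of these values come from the patch or from RFC 9197, and this is not how the kernel computes the value.

```python
# Toy model: per-TX-queue backlog, in packets, for a hypothetical 4-queue NIC.
# The threshold is an arbitrary illustrative value, not from RFC 9197.
CONGESTION_THRESHOLD = 100


def per_queue_depth(backlogs: list[int], txq: int) -> int:
    """'queue depth = queue': report only the TX queue the packet is mapped to."""
    return backlogs[txq]


def aggregate_depth(backlogs: list[int]) -> int:
    """'queue depth = sum(queues)': report the whole egress interface."""
    return sum(backlogs)


# One congested queue (index 2), the others nearly idle.
one_hot = [3, 2, 400, 1]
# All queues evenly loaded.
all_busy = [120, 120, 120, 120]

for name, backlogs in (("one_hot", one_hot), ("all_busy", all_busy)):
    per_queue = [per_queue_depth(backlogs, q) for q in range(len(backlogs))]
    congested = [q for q, d in enumerate(per_queue) if d > CONGESTION_THRESHOLD]
    print(f"{name}: per-queue={per_queue} "
          f"aggregate={aggregate_depth(backlogs)} congested={congested}")
```

In the one_hot case the aggregate (406) says nothing about which queue is backed up, while the per-queue reading flags queue 2, but only for packets that are actually mapped to queue 2.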
Also, the draft says:
5.4.2.7. queue depth
The "queue depth" field is a 4-octet unsigned integer field. This
field indicates the current length of the egress interface queue of
the interface from where the packet is forwarded out. The queue
depth is expressed as the current amount of memory buffers used by
the queue (a packet could consume one or more memory buffers,
depending on its size).
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| queue depth |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
It is relatively clear that the egress interface is the aggregate egress
interface, not a subset of the interface.
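For reference, a minimal Python sketch of packing and unpacking the 4-octet "queue depth" field quoted above. It assumes network (big-endian) byte order, as is usual for IOAM data fields; the helper names are hypothetical, and the sketch does not interpret the unit ("memory buffers") at all.

```python
import struct

# Hypothetical helpers for the 4-octet IOAM "queue depth" field.
# Assumption: the field is carried in network (big-endian) byte order.
U32_MAX = 0xFFFFFFFF


def encode_queue_depth(depth: int) -> bytes:
    """Pack a queue depth into a 4-octet unsigned integer field."""
    if not 0 <= depth <= U32_MAX:
        raise ValueError(f"queue depth {depth} does not fit in 4 octets")
    return struct.pack("!I", depth)


def decode_queue_depth(field: bytes) -> int:
    """Unpack a 4-octet queue depth field back into an integer."""
    if len(field) != 4:
        raise ValueError("queue depth field must be exactly 4 octets")
    (depth,) = struct.unpack("!I", field)
    return depth


wire = encode_queue_depth(406)
print(wire.hex())                # 00000196
print(decode_queue_depth(wire))  # 406
```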
Correct, even though the definition of an interface in RFC 9197 is quite
abstract (see the end of section 4.4.2.2: "[...] could represent a
physical interface, a virtual or logical interface, or even a queue").
If you have 32 TX queues on a NIC, all of them backlogged (line rate),
sensing the queue length of only one of them would give a 97% error in
the measurement.
Why would it? Not sure I get your idea based on that example.
Because it measures the length of a single queue, not the device.
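The 97% figure follows from simple arithmetic, assuming the 32 backlogged TX queues hold roughly equal backlogs: one queue then holds about 1/32 of the device's total, so reading a single queue under-reports the device-level depth by 31/32, about 96.9%. A small Python check (the per-queue backlog value is arbitrary):

```python
NUM_TXQ = 32
PER_QUEUE_BACKLOG = 1000  # arbitrary; any equal value gives the same ratio

backlogs = [PER_QUEUE_BACKLOG] * NUM_TXQ
device_depth = sum(backlogs)  # what the aggregate definition would report
single_queue = backlogs[0]    # what a per-queue reading would report

# Error of the per-queue reading relative to the device-level value.
relative_error = (device_depth - single_queue) / device_depth
print(f"{relative_error:.1%}")  # 96.9%
```

Note that this is an "error" only relative to the aggregate definition quoted from the draft; under a per-queue definition the same reading would be exact.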
Yep, I figured that out after the off-list discussion we had with Eric.
So my plan, if you all agree, would be to correct and repost this patch
to fix the NULL qdisc issue. Then, I'd come up with a solution that
allows both (with and without aggregation of queues) and post it to
net-next. But again, if the consensus is to revert this patch (which
IMHO would bring no benefit), then so be it. Thoughts?