On Dec 7, 2021, at 4:50 PM, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> On Tue, 7 Dec 2021 12:54:04 +0100 (CET) Justin Iurman wrote:
>> >> The function kmem_cache_size is used to retrieve the size of a
>> >> slab object. Note that it returns the "object_size" field, not
>> >> the "size" field. If needed, a new function (e.g.,
>> >> kmem_cache_full_size) could be added to return the "size" field.
>> >> To match the definition from the draft, the number of bytes is
>> >> computed as follows:
>> >>
>> >>   slabinfo.active_objs * size
>> >
>> > Implementing the standard is one thing but how useful is this
>> > in practice?
>>
>> IMHO, very useful. To be honest, if I were to implement only a few
>> data fields, these two would both be included. Take the example of
>> CLT [1] where the queue length data field is used to detect
>> low-level issues from inside a L5-7 distributed tracing tool. And
>> this is just one example among many others. The queue length data
>> field is very specific to TX queues, but we could also use the
>> buffer occupancy data field to detect more global loads on a node.
>> Actually, the goal for operators running their IOAM domain is to
>> quickly detect a problem along a path and react accordingly (human
>> or automatic action). For example, if you monitor TX queues along a
>> path and detect an increasing queue on a router, you could choose
>> to, e.g., rebalance its queues. With the buffer occupancy, you
>> could detect high-loaded nodes in general and, e.g., rebalance
>> traffic to another branch. Again, this is just one example among
>> others. Apart from more accurate ECMPs, you could for instance
>> deploy a smart (micro)service selection based on different
>> metrics, etc.
>>
>> [1] https://github.com/Advanced-Observability/cross-layer-telemetry
>
> Ack, my question was more about whether the metric as implemented

Oh, sorry about that.

> provides the best signal.
> Since the slab cache scales dynamically (AFAIU) it's not really a
> big deal if it's full as long as there's memory available on the
> system.

Well, I have the same understanding as you. However, we do not
provide a value meaning "X percent used", precisely because it
wouldn't make much sense, as you pointed out. So I think it is sound
to have the current value, even if it's quite a dynamic one. Indeed,
what's important here is to know how many bytes are used, and this is
exactly what it does. If a node is under heavy load, the value would
be very high. The operator could define a per-node threshold and
detect abnormal values. We probably want the per-object metadata
included for accuracy as well (e.g., a new kmem_cache_full_size
returning the "size" field, rather than kmem_cache_size returning
"object_size").