Re: [PATCH net-next v4 4/4] net: tun: track dropped skb via kfree_skb_reason()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jakub,

On 3/2/22 11:17 AM, Jakub Kicinski wrote:
> On Wed, 2 Mar 2022 10:19:30 -0800 Dongli Zhang wrote:
>> On 3/1/22 6:50 PM, Jakub Kicinski wrote:
>>> On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote:  
>>>> +	SKB_DROP_REASON_SKB_PULL,	/* failed to pull sk_buff data */
>>>> +	SKB_DROP_REASON_SKB_TRIM,	/* failed to trim sk_buff data */  
>>>
>>> IDK if these are not too low level and therefore lacking meaning.
>>>
>>> What are your thoughts David?
>>>
>>> Would it be better to up level the names a little bit and call SKB_PULL
>>> something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe
>>> "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN?  
>>
>> This is for device driver and I think for most of cases the people understanding
>> source code will be involved. I think SKB_PULL is more meaningful to people
>> understanding source code.
>>
>> The functions to pull data to skb is commonly used with the same pattern, and
>> not only for ETH_HLEN. E.g., I randomly found below in kernel source code.
>>
>> 1071 static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb)
>> 1072 {
>> ... ...
>> 1102         pulled_sci = pskb_may_pull(skb, macsec_extra_len(true));
>> 1103         if (!pulled_sci) {
>> 1104                 if (!pskb_may_pull(skb, macsec_extra_len(false)))
>> 1105                         goto drop_direct;
>> 1106         }
>> ... ...
>> 1254 drop_direct:
>> 1255         kfree_skb(skb);
>> 1256         *pskb = NULL;
>> 1257         return RX_HANDLER_CONSUMED;
>>
>>
>> About 'L2_HDR_ERR', I am curious what the user/administrator may do as next
>> step, while the 'SKB_PULL' will be very clear to the developers which kernel
>> operation (e.g., to pull some protocol/hdr data to sk_buff data) is with the issue.
>>
>> I may use 'L2_HDR_ERR' if you prefer.
> 
> We don't have to break it out per layer if you prefer. Let's call it
> HDR_TRUNC.
> 
> I don't like SKB_PULL, people using these trace points are as likely 
> to be BPF developers as kernel developers and skb_pull will not be
> meaningful to them. Besides the code can check if header is not
> truncated in other ways than pskb_may_pull(). And calling things 
> by the name of the helper that failed is bad precedent.

I will switch to SKB_DROP_REASON_HDR_TRUNC.

> 
>>> For SKB_TRIM the error comes from allocation failures, there may be
>>> a whole bunch of skb helpers which will fail only under mem pressure,
>>> would it be better to identify them and return some ENOMEM related
>>> reason, since, most likely, those will be noise to whoever is tracking
>>> real errors?  
>>
>> The reasons I want to use SKB_TRIM:
>>
>> 1. To have SKB_PULL and SKB_TRIM (perhaps more SKB_XXX in the future in the same
>> set).
>>
>> 2. Although so that SKB_TRIM is always caused by ENOMEM, suppose if there is new
>> return values by pskb_trim(), the reason is not going to be valid any longer.
>>
>>
>> I may use SKB_DROP_REASON_NOMEM if you prefer.
>>
>> Another concern is that many functions may return -ENOMEM. It is more likely
>> that if there are two "goto drop" to return -ENOMEM, we will not be able to tell
>> from which function the sk_buff is dropped, e.g.,
>>
>> if (function_A()) {
>>     reason = -ENOMEM;
>>     goto drop;
>> }
>>
>> if (function_B()) {
>>     reason = -ENOMEM;
>>     goto drop;
>> }
> 
> Are you saying that you're intending to break out skb drop reasons 
> by what entity failed to allocate memory? I'd think "skb was dropped

Yes.

> because of OOM" is what should be reported. What we were trying to
> allocate is not very relevant (and can be gotten from the stack trace 
> if needed).

I think OOM is not enough. Although it may not be the case in this patchset,
sometimes the allocation is failed because we are allocating a large chunk of
physically continuous pages (kmalloc vs. vmalloc) while there is still plenty of
memory pages available.

As a kernel developer, it is very significant for me to identify the specific
line/function and specific data structure that cause the error. E.g, the bug
filer may be chasing which line is making trouble.

It is less likely to SKB_TRIM more than once in a driver function, compared to
ENOMEM.

I am the user of this patchset and I prefer to make my work easier in the future :)

> 
>>>>  	SKB_DROP_REASON_DEV_HDR,	/* there is something wrong with
>>>>  					 * device driver specific header
>>>>  					 */
>>>> +	SKB_DROP_REASON_DEV_READY,	/* device is not ready */  
>>>
>>> What is ready? link is not up? peer not connected? can we expand?
>>
>> In this patchset, it is for either:
>>
>> - tun->tfiles[txq] is not set, or
>>
>> - !(tun->dev->flags & IFF_UP)
>>
>> I want to make it very generic so that the sk_buff dropped due to any device
>> level data structure that is not up/ready/initialized/allocated will use this
>> reason in the future.
> 
> Let's expand the documentation so someone reading thru the enum can
> feel confident if they are using this reason correctly.
> 
> Side note - you may want to switch to inline comments to make writing
> more verbose documentation, I mean:
> 
> 	/* This is the explanation of reason one which explains what
> 	 * reason ones means, and how it should be used. We can make
> 	 * use of full line width this way.
>          */
> 	SKB_DROP_REASON_ONE,
> 	/* And this is an explanation for reason two. */
> 	SKB_DROP_REASON_TWO,
> 

I will expand the comments.

Thank you very much!

Dongli Zhang



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux