Hi Jakub, On 3/2/22 11:17 AM, Jakub Kicinski wrote: > On Wed, 2 Mar 2022 10:19:30 -0800 Dongli Zhang wrote: >> On 3/1/22 6:50 PM, Jakub Kicinski wrote: >>> On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote: >>>> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */ >>>> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */ >>> >>> IDK if these are not too low level and therefore lacking meaning. >>> >>> What are your thoughts David? >>> >>> Would it be better to up level the names a little bit and call SKB_PULL >>> something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe >>> "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN? >> >> This is for device driver and I think for most of cases the people understanding >> source code will be involved. I think SKB_PULL is more meaningful to people >> understanding source code. >> >> The functions to pull data to skb is commonly used with the same pattern, and >> not only for ETH_HLEN. E.g., I randomly found below in kernel source code. >> >> 1071 static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb) >> 1072 { >> ... ... >> 1102 pulled_sci = pskb_may_pull(skb, macsec_extra_len(true)); >> 1103 if (!pulled_sci) { >> 1104 if (!pskb_may_pull(skb, macsec_extra_len(false))) >> 1105 goto drop_direct; >> 1106 } >> ... ... >> 1254 drop_direct: >> 1255 kfree_skb(skb); >> 1256 *pskb = NULL; >> 1257 return RX_HANDLER_CONSUMED; >> >> >> About 'L2_HDR_ERR', I am curious what the user/administrator may do as next >> step, while the 'SKB_PULL' will be very clear to the developers which kernel >> operation (e.g., to pull some protocol/hdr data to sk_buff data) is with the issue. >> >> I may use 'L2_HDR_ERR' if you prefer. > > We don't have to break it out per layer if you prefer. Let's call it > HDR_TRUNC. > > I don't like SKB_PULL, people using these trace points are as likely > to be BPF developers as kernel developers and skb_pull will not be > meaningful to them. Besides the code can check if header is not > truncated in other ways than pskb_may_pull(). And calling things > by the name of the helper that failed is bad precedent. I will switch to SKB_DROP_REASON_HDR_TRUNC. > >>> For SKB_TRIM the error comes from allocation failures, there may be >>> a whole bunch of skb helpers which will fail only under mem pressure, >>> would it be better to identify them and return some ENOMEM related >>> reason, since, most likely, those will be noise to whoever is tracking >>> real errors? >> >> The reasons I want to use SKB_TRIM: >> >> 1. To have SKB_PULL and SKB_TRIM (perhaps more SKB_XXX in the future in the same >> set). >> >> 2. Although so that SKB_TRIM is always caused by ENOMEM, suppose if there is new >> return values by pskb_trim(), the reason is not going to be valid any longer. >> >> >> I may use SKB_DROP_REASON_NOMEM if you prefer. >> >> Another concern is that many functions may return -ENOMEM. It is more likely >> that if there are two "goto drop" to return -ENOMEM, we will not be able to tell >> from which function the sk_buff is dropped, e.g., >> >> if (function_A()) { >> reason = -ENOMEM; >> goto drop; >> } >> >> if (function_B()) { >> reason = -ENOMEM; >> goto drop; >> } > > Are you saying that you're intending to break out skb drop reasons > by what entity failed to allocate memory? I'd think "skb was dropped Yes. > because of OOM" is what should be reported. What we were trying to > allocate is not very relevant (and can be gotten from the stack trace > if needed). I think OOM is not enough. Although it may not be the case in this patchset, sometimes the allocation is failed because we are allocating a large chunk of physically continuous pages (kmalloc vs. vmalloc) while there is still plenty of memory pages available. As a kernel developer, it is very significant for me to identify the specific line/function and specific data structure that cause the error. E.g, the bug filer may be chasing which line is making trouble. It is less likely to SKB_TRIM more than once in a driver function, compared to ENOMEM. I am the user of this patchset and I prefer to make my work easier in the future :) > >>>> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with >>>> * device driver specific header >>>> */ >>>> + SKB_DROP_REASON_DEV_READY, /* device is not ready */ >>> >>> What is ready? link is not up? peer not connected? can we expand? >> >> In this patchset, it is for either: >> >> - tun->tfiles[txq] is not set, or >> >> - !(tun->dev->flags & IFF_UP) >> >> I want to make it very generic so that the sk_buff dropped due to any device >> level data structure that is not up/ready/initialized/allocated will use this >> reason in the future. > > Let's expand the documentation so someone reading thru the enum can > feel confident if they are using this reason correctly. > > Side note - you may want to switch to inline comments to make writing > more verbose documentation, I mean: > > /* This is the explanation of reason one which explains what > * reason ones means, and how it should be used. We can make > * use of full line width this way. > */ > SKB_DROP_REASON_ONE, > /* And this is an explanation for reason two. */ > SKB_DROP_REASON_TWO, > I will expand the comments. Thank you very much! Dongli Zhang