On 10/6/23 9:39 PM, Jamal Hadi Salim wrote:
On Fri, Oct 6, 2023 at 11:45 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 10/6/23 5:25 PM, Jamal Hadi Salim wrote:
On Fri, Oct 6, 2023 at 10:12 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
On Fri, 6 Oct 2023 15:49:18 +0200 Daniel Borkmann wrote:
Which will no longer work with the "pack multiple values into
the reason" scheme of subsys-specific values :(
Too bad, do you happen to know why it won't work?
I'm just guessing but the reason is enum skb_drop_reason
and the values of subsystem specific reasons won't be part
of that enum.
IIUC, this would gives us the readability and never require any
changes to bpftrace, whereas the major:minor encoding would require
further logic in bpftrace.
Makes sense, agree.
Given they went into the
length of extending this for subsystems, they presumably would also like to
benefit from above. :/
What I'm saying is that there is a trade-off here between providing
as much info as possible vs basic user getting intelligible data..
Makes sense. I think we can drop that aspect for the subsys specific error
codes. Fwiw, TCP has 22 drop codes in the core section alone, so this should
be fine if you think it's better. The rest of the patch shown should still
apply the same way. I can tweak it to use the core section for codes, and
then it can be successively extended if that looks good to you - unless you
are saying from above, that just one error code is better and then going via
detailed stats for specific errors is preferred.
No, no, multiple reasons are perfectly fine. The non-technical
advantage of mac80211 error codes being separate is that there
are no git conflicts when we add new ones. TC codes can just
be added to the main enum like TCP 🤷️
We still need to differentiate policy vs error - I suppose we could go
with Daniel's idea of introducing TC_ACT_ABORT/ERROR and ensure all
the callees set the drop_reason.
I've simplified the set (attached). The disambiguation could eventually be on
SKB_DROP_REASON_TC_{INGRESS,EGRESS} == intentional drop vs SKB_DROP_REASON_TC_ERROR_*
which indicates an internal error code once these are covered on all locations.
There could probably also be just a SKB_DROP_REASON_TC_ERROR which could act as
a catch-all for the time being to initially mark all error locations with something
generic. I think this should be flexible where you wouldn't need extra TC_ACT_ABORT.
I think this should work - either Victor or myself will work on a followup.
Sounds great, thanks!
Cheers,
Daniel