Re: [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/22/24 6:47 AM, Alexei Starovoitov wrote:
On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:

On 8/21/24 6:32 PM, Alexei Starovoitov wrote:
On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:

From: Martin KaFai Lau <martin.lau@xxxxxxxxxx>

The existing prologue has been able to call bpf helper but not a kfunc.
This patch allows the prologue/epilogue to call the kfunc.

The subsystem that implements the .gen_prologue and .gen_epilogue
can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
set to the btf func_id of the kfunc call. This part is the same
as the bpf prog loaded from the sys_bpf.

I don't understand the value of this feature, since it seems
pretty hard to use.
The module (qdisc-bpf or else) would need to do something
like patch 8/8:
+BTF_ID_LIST(st_ops_epilogue_kfunc_list)
+BTF_ID(func, bpf_kfunc_st_ops_inc10)
+BTF_ID(func, bpf_kfunc_st_ops_inc100)

just to be able to:
    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
                 st_ops_epilogue_kfunc_list[0]);

So a bunch of extra work on the module side and
a bunch of work in this patch to enable such a pattern,
but what is the value?

gen_epilogue() can call arbitrary kernel function.
It doesn't have to be a helper.
kfunc-s provide calling convention conversion from bpf to native,
but the same thing is achieved by BPF_CALL_N macro.
The module can use that macro without adding an actual bpf helper
to uapi bpf.h.
Then in gen_epilogue() the extra bpf insn can use:
BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
which will use
BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)

BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
because of the bpf_jit_supports_far_kfunc_call() support for the kernel module.
Using kfunc call will make supporting it the same.

I believe far calls are typically slower,
so it may be a foot gun.
If something like qdisc-bpf adding a function call to bpf_exit
it will be called every time the program is called, so
it needs to be really fast.
Allowing such callable funcs in modules may be a performance issue
that we'd need to fix.
So imo making a design requirement that such funcs for gen_epilogoue()
need to be in kernel text is a good thing.

Agreed. Make sense.


I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been
built-in only also. I think the hid_bpf is built-in only also.

I don't think hid_bpf has any need for such gen_epilogue() adjustment.
tcp-bpf-cc probably doesn't need it either.
it's cleaner to fix up on the kernel side, no?

tcp-bpf-cc can use it to fix snd_cwnd. We have seen a mistake that snd_cwnd was set to 0 (or negative, can't remember which one). >1 ops of the tcp_congestion_ops may update the snd_cwnd, so there will be multiple places it needs to do an extra check/fix in the kernel. It is usually not the fast path, so may be ok.

It is not catastrophic as skb->dev. kfunc was not introduced at that time also. Otherwise, having a kfunc to set the snd_cwnd instead could have been an option.

qdisc-bpf and ->dev stuff is probably the only upcoming user.

For skb->dev, may be having a dedicated kfuncs for skb->dev manipulation is the way to go? The example could be operations that need to touch the skb->rbnode/dev sharing pointer.

For fixing ->dev in the kernel, there are multiple places doing ->dequeue and not sure if we need to include the child->dequeue also. This fixing could be refactored to a kernel function and probably need to a static key in this fast path case.

And that's a separate discussion. I'm not sure such gen_epilogoue()
concept is really that great.
Especially considering all the complexity involved.

I am curious on the problem you pointed out at patch 1 regardless, I am going to give it a try and remove the kfunc call. I made kfunc call separated at patch 7 and 8 :)

If it still looks too complex or there is no value on gen_epilogue, I am fine to table this set.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux