On 11/5/24 1:52 PM, Alexei Starovoitov wrote:
On Tue, Nov 5, 2024 at 1:26 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
I see. I think it works, but feels complicated.
It feels it should be possible to do without extra flags. Like
check_max_stack_depth_subprog() will know whether it was called
to verify async_cb or not.
So it's just a matter of adding single 'if' to it:
if (subprog[idx].use_priv_stack && checking_async_cb)
/* reset to false due to potential recursion */
subprog[idx].use_priv_stack = false;
check_max_stack_depth() starts with i==0,
so reachable and eligible subprogs will be marked with use_priv_stack.
Then check_max_stack_depth_subprog() will be called again
to verify async. If it sees the mark it's a bad case.
what am I missing?
First I think we still want to mark some subprogs in async tree
to use private stack, right? If this is the case, then let us see
the following examle:
main_prog:
sub1: use_priv_stack = true
sub2" use_priv_stack = true
async: /* calling sub1 twice */
sub1
<=== we do
if (subprog[idx].use_priv_stack && checking_async_cb)
subprog[idx].use_priv_stack = false;
sub1
<=== here we have subprog[idx].use_priv_stack = false;
we could mark use_priv_stack = true again here
since logic didn't keep track of sub1 has been
visited before.
This case needs a sticky state to solve.
Instead of bool use_priv_stack it can be tri-state:
no_priv_stack
priv_stack_unknown <- start state
priv_stack_maybe
main_prog pass will set it to priv_stack_maybe
while async pass will clear it to no_priv_stack
and it cannot be bumped up.
The tri-state may not work. For example,
main_prog:
call sub1
call sub2
call sub1
call sub3
async:
call sub4 ==> UNKNOWN -> MAYBE
call sub3
call sub4 ==> MAYBE -> NO_PRIV_STACK?
For sub4 in async which is called twice, for the second sub4 call,
it is not clear whether UNKNOWN->MAYBE transition happens in
main_prog or async. So based on transition prototol,
second sub4 call will transition to NO_PRIV_STACK which is not
what we want.
So I think we still need a separate bit in bpf_subprog_info to
accumulate information for main_prog tree or any async tree.
To solve the above issue, we need one visited bit in bpf_subprog_info.
After finishing async tree, if for any subprog,
visited_bit && subprog[idx].use_priv_stack
is true, we can mark subprog[idx].use_priv_stack = false
So one visited bit is enough.
More complicated case is two asyncs. For example:
main_prog:
sub1
sub2
async1:
sub3
async2:
sub3
If async1/sub3 and async2/sub3 can be nested, then we will
need two visited bits as I have above.
If async1/sub3 and async2/sub3 cannot be nested, then one
visited bit should be enough, since we can traverse
async1/async2 with 'visited' marking and then compare against
main prog.
So the question would be:
1. Is it possible that two async call backs may nest with
each other? I actually do not know the answer.
I think we have to assume that they can. Doing otherwise
would subject us to implementation details.
I think above tri-state approach works for two callbacks case too:
async1 will bump sub3 to priv_stack_maybe
while async2 will clear it to sticky no_priv_stack.
Ideally we reuse the same enum for this algorithm and for earlier
patches.
2. Do we want to allow subprogs in async tree to use private
stacks?
yes. when sched-ext requests priv stack it would want it everywhere.
I think the request_priv_stack should be treated as
PRIV_STACK_ADAPTIVE. Meaning that subprogs with stack_depth < 64
don't need to use it.
In other words struct_ops prog with request_priv_stack == true
tells the verifier: add run-time recursion check at main prog entry,
otherwise treat it like fentry and pick priv stack vs normal
as the best for performance.
Then for both fentry and struct_ops w/request_priv_stack
the async callbacks will be considered for priv stack too and
will be cleared to normals stack when potential recursion via async
is detected.
I don't think it's an error for either prog type.
Overall we shouldn't treat struct_ops too special.
fentry progs with large stack are automatically candidates for priv stack.
struct_ops w/request_priv_stack are in the same category.
Okay, indeed we can treat struct_ops the same as other tracing programs.
They are adaptive too subject to various conditions where priv state may
not be used, e.g. small stack size, tailcall, and potentially nested subprogs.