Re: [RFC PATCH v1 00/14] Exceptions - Resource Cleanup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 14 Mar 2024 at 12:08, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Hi Kumar,
>

Hello Eduard,

> Sorry for delayed response. Theoretical possibility that frame merge
> might fail because of some e.g. clang version changes bugs me,
> so I thought a bit on what alternatives are there.

I spent some time this weekend addressing feedback, and prepping the
next version.
As you point out (and as I explained in one of the replies to you),
the possibility of the
same stack slot / register containing a conflicting resource type can
cause merge problems.
I have had a similar thought to what you describe below (emitting
resource hints in such cases) to address this,
but I slept over your idea and have a few thoughts. Please let me know
what you think.

> For the sake of discussion, what do you think about runtime-based
> approach:
> - for each throwing sub-program reserve a stack region where
>   references to currently acquired resources (and their types)
>   would be stored;
> - upon a call to something that acquires resource push pointer/type
>   pair to this region;
> - upon a call to something that releases resource delete pointer/type
>   pair from this region;
> - when bpf_throw() is executed, walk this region for each active frame
>   and execute corresponding destructors.
> - (alternatively, reserve one region for all frames).
>

I went over this option early on but rejected it due to the typical
downsides you observed.
The maximum stack usage due to this will be peak resource acquisition
during the runtime of a subprogram (N) x 16 bytes.
The maximum will be the sum of all subprogs in a call chain.
This seems a bit problematic to me, given this can basically greatly
exceed typical program stack usage. Compared to
extra stack slot usage by bpf_loop inlining, and may_goto instruction,
this appears to be too much.
We can mitigate it by using a more space efficient encoding to
identify resources, but the core problem still remains.
Apart from that, there's the overhead to know zero of this stack
portion on entry every time.

I just think this is all unnecessary especially when most of the time
the exception is not going to be thrown anyway,
so it's all cost incurred for a case that will practically never
happen in a correct program.

> Technically, this could be implemented as follows:
> - during main verification pass:
>   - when verifier processes a call to something that acquires resource,
>     mark call instruction and resource type in insn_aux_data data;
>   - same for processing of a call to something that releases resource;
>   - keep track of a maximal number of simultaneously acquired resources;
> - after main verification pass:
>   - bump stack size for each subprogram by amount, necessary to hold
>     the acquired resource table and assume that this table is at the
>     end of the subprogram stack;
>   - after each acquire call, insert a kfunc call that would add
>     resource reference to the table;
>   - after each release call, insert a kfunc call that would remove
>     resource reference from the table.
>
> On a surface it appears that this approach has a few advantages:
> - seems simpler than frame descriptors tracking and merging
>   (but only implementation would tell if this is so);
> - should be resilient to program compilation changes;
> - abort is possible at any program point, which might be interesting for:
>   - cancelable BPF programs (where abort might be needed in the middle
>     of the e.g. loop);

Let us discuss cancellation in another set that I plan to send in some
time after this one,
but I think aborting arbitrarily at any point of the program is not
the way to go. I have also
reviewed literature on how this happens in other language runtimes,
and the broad consensus
is to have explicit cancellation points in programs, and only allow
cancellation for them.
Synchronous and asynchronous is irrelevant to the mechanism, but
usually it is at explicit program
points.

Not only can you encounter a program executing in the middle of a
loop, you can interrupt it within a kfunc,
in such a case, you cannot really abort it immediately, as that may
not be safe. It is thus better to designate
cancellation points in the program, which are visited even in cases
where the program may possibly be stuck
for a long time (like iterator next kfunc) and only do cancellation
when executing them.

Anyway, let's continue this discussion once I send the RFC out for
cancellation. Will try to also include a PoC
for arena fault aborting.

>   - arenas, where it might be desirable to stop the program after e.g.
>     faulting arena access.
>
> The obvious disadvantage is incurred runtime cost.
> On the other hand, it might be not that big.
>
> What do you think?

Let's go back to the problem we can encounter with frame descriptors.
The case of multiple paths having distinct resource types in a common
stack slot.

What I would do is only force a spill of the specific resource type
(right after the pc acquires it)
only if their stack slots are unmergeable. Since we go over the stack
first, any conflicts in the register
types will be resolved since all states for the ref_obj_id will be
erased. For conflicts in registers, we'd
do something similar. A new 8-byte entry would be reserved for the pc
after max_stack_depth and
all these entries will be zeroed upon entry. A similar thing could be
done by the compiler itself and the
verifier wouldn't have to do all this, but for now let's just do it in
the verifier.

In this way, when we encounter an unlikely case where the same slot
has conflicting entries, we'll erase
state by storing its entry in the stack slot reserved for the pc, and
clear its state from the verifier state,
allowing all respective entries for the same resource to now merge
easily. The same can be repeated
for each resource if all of them conflict.

So far playing around with this series applying it to program
cancellation and sched-ext cases, I haven't
encountered merging issues, so it seems like an edge case that is
unlikely to often occur in practice. But when it does,
it can be resolved by the logic above. This should address all issues
and should be resilient to changes triggered by different clang
versions.

Let me know what you think.

>
> Thanks,
> Eduard




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux