On Sat, Apr 8, 2023 at 5:22 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > On Tue, Mar 28, 2023 at 2:43 PM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Mon, Mar 27, 2023 at 12:47 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > > > > > On Thu, Mar 16, 2023 at 11:55 AM Andrii Nakryiko > > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > > > On Mon, Mar 13, 2023 at 7:41 AM Kumar Kartikeya Dwivedi > > > > <memxor@xxxxxxxxx> wrote: > > > > > > > > [...] > > > > > > > For bpf_dynptr_slice_rdrw we can mark buffer[] in stack as > > > > > > > poisoned with dynptr_id == R0's PTR_TO_MEM dynptr_id. > > > > > > > Then as soon as first spillable reg touches that poisoned stack area > > > > > > > we can invalidate all PTR_TO_MEM's with that dynptr_id. > > > > > > > > > > > > Okay, this makes sense to me. are you already currently working or > > > > > > planning to work on a fix for this Kumar, or should i take a stab at > > > > > > it? > > > > > > > > > > I'm not planning to do so, so go ahead. One more thing I noticed just now is > > > > > that we probably need to update regsafe to perform a check_ids comparison for > > > > > dynptr_id for dynptr PTR_TO_MEMs? It was not a problem back when f8064ab90d66 > > > > > ("bpf: Invalidate slices on destruction of dynptrs on stack") was added but > > > > > 567da5d253cd ("bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}") > > > > > added PTR_TO_MEM in the switch statement. > > > > > > > > I can take care of this. But I really would like to avoid these > > > > special cases of extra dynptr_id, exactly for reasons like this > > > > omitted check. > > > > > > > > What do people think about generalizing current ref_obj_id to be more > > > > like "lifetime id" (to borrow Rust terminology a bit), which would be > > > > an object (which might or might not be a tracked reference) defining > > > > the scope/lifetime of the current register (whatever it represents). > > > > > > > > I haven't looked through code much, but I've been treating ref_obj_id > > > > as that already in my thinking before, and it seems to be a better > > > > approach than having a special-case of dynptr_id. > > > > > > > > Thoughts? > > > > > > Thanks for taking care of this (and apologies for the late reply). i > > > think the dynptr_id field would still be needed in this case to > > > associate a slice with a dynptr, so that when a dynptr is invalidated > > > its slices get invalidated as well. I'm not sure we could get away > > > with just having ref_obj_id symbolize that in the case where the > > > underlying object is a tracked reference, because for example, it > > > seems like a dynptr would need both a unique reference id to the > > > object (so that if for example there are two dynptrs pointing to the > > > same object, they will both be assignedthe same reference id so the > > > object can't for example be freed twice) and also its own dynptr id so > > > that its slices get invalidated if the dynptr is invalidated > > > > Can you elaborate on specific example? Because let's say dynptr is > > created from some refcounted object. Then that dynptr's id field will > > be a unique "dynptr id", dynptr's ref_obj_id will point to that > > refcounted object from which we derived dynptr itself. And then when > > we create slices from dynptrs, then each slice gets its own unique id, > > but records dynptr's id as slice's ref_obj_id. So we end up with this > > hierarchy of id + ref_obj_id forming a tree. > > > > Or am I missing something? > > > > I want to take a look at simplifying this at some point, so I'll know > > more details once I start digging into code. Right now I still fail to > > see why we need a third ID for dynptr. > > My mental model is that > * dynptr's ref_obj_id is set whenver there's a refcounted object > (right now, only ringbuf dynptrs are refcounted), to enforce that the > reference gets released by the time the program exits (dynptr > ref_obj_id is set in mark_stack_slots_dynptr()) > * dynptr's dynptr id is set for all dynptrs, so that if a dynptr gets > overwritten/invalidated, all slices for that dynptr get invalidated > (dynptr id is set in mark_dynptr_stack_regs(), called in > mark_stack_slots_dynptr()) yeah, I understand that's how it works today and what the semantics of ref_obj_id is. But I'm saying that we should look at whether we can revise ref_obj_id semantics and generalize it to be "ID of the <object> whose lifetime we are bound to". This refcount part could be optional (again, will know for sure when I get to writing the code). I'll get to this in time and will validate my own preconceptions. I don't think we should spend too much time discussing this in abstract right now. > * when there's a data slice, both the slice's dynptr id and ref_obj_id > get set to the dynptr's dynptr id and ref_obj_id, so that the slice > gets invalidated when either the dynptr is released or when the dynptr > is overwritten (two separate cases) (the slice's dynptr id and ref obj > id get set in check_helper_call()). The data slice also has its own > unique id, but this is to handle the case where the data slice may be > null. > > "And then when we create slices from dynptrs, then each slice gets its > own unique id, but records dynptr's id as slice's ref_obj_id. So we > end up with this hierarchy of id + ref_obj_id forming a tree." I don't > think I'm following the tree part. I think it records the dynptr's id > as slice's id (and dynptr's ref obj id as slice's ref obj id) in > check_helper_call(). > > "Right now I still fail to see why we need a third ID for dynptr". I > think for dynptrs, there are two IDs: > state->stack[spi].spilled_ptr.ref_obj_id and > state->stack[spi].spilled_ptr.id (where ref_obj_id is used to > invalidate slices when dynptr is released and id is used to > invalidates slices when dynptr is overwritten), and then for dynptr > slices there are 3 IDs: reg->id, reg->dynptr_id, reg->ref_obj_id > (where id is used for the data slice returning NULL case, and > ref_obj_id / dynptr_id are used when dynptrs are invalidated).