On Mon, 24 Jun 2024 17:21:36 -0700
Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:

> Anyways, under exclusive writer lock, we double-check that refcount
> didn't change and is still zero. If it is, we proceed with destruction,
> because at that point we have a guarantee that find_active_uprobe()
> can't successfully look up this uprobe instance, as it's going to be
> removed in the destructor under writer lock. If, on the other hand,
> find_active_uprobe() managed to bump refcount from zero to one in
> between put_uprobe()'s atomic_dec_and_test(&uprobe->ref) and
> write_lock(&uprobes_treelock), we'll deterministically detect this with
> an extra atomic_read(&uprobe->ref) check, and if it doesn't hold, we
> pretend like atomic_dec_and_test() never returned true. There is no
> resource freeing or any other irreversible action taken up till this
> point, so we just exit early.
>
> One tricky part in the above is actually two CPUs racing and dropping
> refcnt to zero, and then attempting to free resources. This can happen
> as follows:
>   - CPU #0 drops refcnt from 1 to 0, and proceeds to grab uprobes_treelock;
>   - before CPU #0 grabs the lock, CPU #1 updates refcnt as 0 -> 1 -> 0, at
>     which point it decides that it needs to free uprobe as well.
>
> At this point both CPU #0 and CPU #1 will believe they need to destroy
> uprobe, which is obviously wrong. To prevent this situation, we augment
> refcount with an epoch counter, which is always incremented by 1 on either
> get or put operation. This allows those two CPUs above to disambiguate
> who should actually free uprobe (it's the CPU #1, because it has the
> up-to-date epoch). See comments in the code and note the specific values
> of UPROBE_REFCNT_GET and UPROBE_REFCNT_PUT constants. Keep in mind that
> a single atomic64_t is actually two sort-of-independent 32-bit counters
> that are incremented/decremented with a single atomic64_add_return()
> operation. Note also a small and extremely rare (and thus having no
> effect on performance) need to clear the highest bit every 2 billion
> get/put operations to prevent the high 32-bit counter from "bleeding
> over" into the lower 32-bit counter.

I have a question here. Is there any chance for CPU#1 to put the uprobe
before CPU#0 gets the uprobes_treelock, and to free the uprobe before
CPU#0 validates uprobe->ref again? e.g.

CPU#0                                CPU#1
put_uprobe() {
    atomic64_add_return()
                                     __get_uprobe();
                                     put_uprobe() {
                                         kfree(uprobe)
                                     }
    write_lock(&uprobes_treelock);
    atomic64_read(&uprobe->ref);
}

I think it is a very rare case, but I could not find any code that
prevents this scenario.

Thank you,

--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
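
P.S. Just to make sure I read the packing correctly, below is a tiny
userspace model of the epoch+refcount idea as I understand it from the
description above. The field layout (refcount in the low 32 bits, epoch
in the high 32 bits), the constant values, and all of the names are only
my guesses for illustration; they are not taken from the actual patch.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* GET: +1 to the refcount (low 32 bits) and +1 to the epoch (high 32 bits). */
#define MODEL_REFCNT_GET ((1ULL << 32) | 1ULL)
/*
 * PUT: adding (1 << 32) - 1 subtracts 1 from the low 32 bits (refcount) and,
 * through the carry out of bit 31, adds 1 to the high 32 bits (epoch), so a
 * single 64-bit add updates both counters at once.
 */
#define MODEL_REFCNT_PUT ((1ULL << 32) - 1ULL)

struct uprobe_model {
	_Atomic uint64_t ref;	/* epoch (high 32 bits) : refcount (low 32 bits) */
};

static void model_get(struct uprobe_model *u)
{
	atomic_fetch_add(&u->ref, MODEL_REFCNT_GET);
}

/* Returns true if this put saw the refcount drop to zero. */
static bool model_put(struct uprobe_model *u, uint32_t *epoch_after)
{
	uint64_t v = atomic_fetch_add(&u->ref, MODEL_REFCNT_PUT) + MODEL_REFCNT_PUT;

	*epoch_after = (uint32_t)(v >> 32);
	return (uint32_t)v == 0;
}

int main(void)
{
	struct uprobe_model u = { .ref = 1ULL };	/* epoch = 0, refcount = 1 */
	uint32_t cpu0_epoch, cpu1_epoch;

	/* CPU #0: drops refcount 1 -> 0 and remembers the epoch it saw. */
	bool cpu0_zero = model_put(&u, &cpu0_epoch);

	/* CPU #1: bumps refcount 0 -> 1 -> 0 before CPU #0 takes the lock. */
	model_get(&u);
	bool cpu1_zero = model_put(&u, &cpu1_epoch);

	/*
	 * Both CPUs saw the refcount drop to zero, but the epochs differ,
	 * so only the CPU with the newest epoch (CPU #1 here) should free,
	 * as described in the quoted text above.
	 */
	printf("cpu0: zero=%d epoch=%u, cpu1: zero=%d epoch=%u\n",
	       cpu0_zero, (unsigned)cpu0_epoch,
	       cpu1_zero, (unsigned)cpu1_epoch);
	return 0;
}

If this model matches the intent, then the "specific values" of the GET/PUT
constants are what lets one atomic64 add keep both halves consistent, and my
question above is only about the window between the put that hits zero and
the re-check under uprobes_treelock.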