> > > Have you considered/measured per-CPU RW semaphores?
> >
> > No I hadn't but thanks hugely for suggesting it! In initial measurements
> > it seems to be between 20-100% faster than the RW spinlocks! Apologies for
> > all the exclamation marks but I'm very excited. I'll do some more testing
> > tomorrow but so far it's looking very good.
> >
>
> Documentation ([0]) says that locking for writing calls
> synchronize_rcu(), is that right? If that's true, attaching multiple
> uprobes (including just attaching a single BPF multi-uprobe) will take
> a really long time. We need to confirm we are not significantly
> regressing this. And if we do, we need to take measures in the BPF
> multi-uprobe attachment code path to make sure that a single
> multi-uprobe attachment is still fast.
>
> If my worries above turn out to be true, it still feels like a first
> good step should be landing this patch as is (and get it backported to
> older kernels), and then have percpu rw-semaphore as a final (and a
> bit more invasive) solution (it's RCU-based, so feels like a good
> primitive to settle on), making sure to not regress multi-uprobes
> (we'll probably will need some batched API for multiple uprobes).
>
> Thoughts?

Agreed. In the percpu_down_write() path we call rcu_sync_enter() which
is what calls into synchronize_rcu(). I haven't done the measurements
yet but I would imagine this has to regress probe attachment, at least
in the uncontended case. Of course, reads are by far the dominant mode
here but we probably shouldn't punish writes excessively. I will do some
measurements to quantify the write penalty here.

I agree that a batched interface for probe attachment is needed here.
The usual mode of operation for us is that we have a number of USDTs
(uprobes) in hand and we want to enable and disable them in one shot.
Removing the need to do multiple locking operations is definitely an
efficiency improvement that needs to be done.

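
To make the batching argument concrete, here is a rough user-space sketch
(illustration only, not part of the patch). Every toy_* name is made up
for this example, the "semaphore" does no real locking, and it only counts
how many simulated grace periods the write side would pay for. It assumes
the worst case discussed above, where each write-side acquisition waits
for a full grace period.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy stand-in for a percpu rw-semaphore (hypothetical, user space only).
 * It counts how often the expensive writer path would run; in the kernel
 * that path is percpu_down_write() -> rcu_sync_enter(), which is assumed
 * here to wait for a grace period on every call.
 */
struct toy_percpu_rwsem {
	int writer_held;
	unsigned long grace_periods;	/* simulated synchronize_rcu() calls */
};

static void toy_down_write(struct toy_percpu_rwsem *sem)
{
	assert(!sem->writer_held);
	sem->grace_periods++;		/* models the grace-period wait */
	sem->writer_held = 1;
}

static void toy_up_write(struct toy_percpu_rwsem *sem)
{
	assert(sem->writer_held);
	sem->writer_held = 0;
}

#define TOY_MAX_PROBES 64

/* Hypothetical probe table protected by the toy semaphore. */
struct toy_probe_table {
	struct toy_percpu_rwsem sem;
	unsigned long offsets[TOY_MAX_PROBES];
	size_t nr;
};

/* Insert one probe; the caller must already hold the write side. */
static int toy_insert_locked(struct toy_probe_table *t, unsigned long off)
{
	assert(t->sem.writer_held);
	if (t->nr >= TOY_MAX_PROBES)
		return -1;
	t->offsets[t->nr++] = off;
	return 0;
}

/* One probe per call: one write-side acquisition per probe. */
static int toy_register_one(struct toy_probe_table *t, unsigned long off)
{
	int ret;

	toy_down_write(&t->sem);
	ret = toy_insert_locked(t, off);
	toy_up_write(&t->sem);
	return ret;
}

/*
 * Batched: one write-side acquisition for the whole set.
 * No rollback on failure; probes inserted before the error stay in.
 */
static int toy_register_batch(struct toy_probe_table *t,
			      const unsigned long *offs, size_t n)
{
	int ret = 0;
	size_t i;

	toy_down_write(&t->sem);
	for (i = 0; i < n && !ret; i++)
		ret = toy_insert_locked(t, offs[i]);
	toy_up_write(&t->sem);
	return ret;
}
```

Registering N probes through toy_register_one() bumps the counter N times,
while a single toy_register_batch() call bumps it once, which is the whole
point of wanting a batched attach path. How expensive each acquisition
really is in the kernel still needs to be measured.
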
Tie that together with per-CPU RW semaphores and this should scale
extremely well in both the read and write cases.

Jon.