On Fri, Feb 21, 2020 at 08:21:52PM +0100, Uladzislau Rezki wrote:
> > >
> > > Overall this implementation is nice. You are basically avoiding allocating
> > > rcu_head like Ted did by using the array-of-pointers technique we used for
> > > the previous kfree_rcu() work.
> > >
> > > One thing stands out, the path where we could not allocate a page for the new
> > > block node:
> > >
> > > > @@ -3061,6 +3148,11 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  	if (krcp->initialized)
> > > >  		spin_unlock(&krcp->lock);
> > > >  	local_irq_restore(flags);
> > > > +
> > > > +	if (!skip_call_rcu) {
> > > > +		synchronize_rcu();
> > > > +		kvfree(ptr_to_free);
> > >
> > > We can't block, it has to be async otherwise everything else blocks, and I
> > > think this can also be used from interrupt handlers which would at least be
> > > an SWA violation. So perhaps it needs to allocate an rcu_head wrapper object
> > > itself for the 'emergency case' and use the regular techniques.
> > >
> > > Another thing that stands out is the code duplication, if we can make this
> > > reuse as much of the previous code as possible, that'd be great. I'd like
> > > to avoid bvcached and bvhead if possible. Maybe we can store information
> > > about the fact that this is a 'special object' in some of the lower-order
> > > bits of the pointer. Then we can detect that it is 'special' and free it
> > > using kvfree() during the reclaim.
> >
> > Basically what I did different is:
> > 1. Use the existing kfree_rcu_bulk_data::records array to store the
> >    to-be-freed array.
> > 2. In case of emergency, allocate a new wrapper and tag the pointer.
> >    Read the tag later to figure it's an array wrapper and do additional kvfree.
> >
> I see your point and agree that duplication is odd and we should avoid
> it as much as possible. Also, i like the idea of using the wrapper as
> one more chance to build a "head" for a headless object.
>
> I did not mix pointers because then you will need to understand what is what.

Well, that's why I brought up the whole tagging idea. Then you don't need
separate pointers to manage either (edit: but maybe you do, as you mentioned
vfree below..).
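To make the tagging idea a bit more concrete, below is a rough sketch of what
I have in mind (purely illustrative, all the names are invented and it ignores
the bulk-array bookkeeping). Since both kmalloc() and vmalloc() return pointers
that are at least word aligned, bit 0 is always free to mark "this record needs
kvfree() instead of kfree()":

<snip>
/* Illustrative only: tag bit 0 of the stored pointer for kvfree() objects. */
#define KVFREE_PTR_TAG		0x1UL

static inline void *kvfree_tag_ptr(void *ptr)
{
	return (void *)((unsigned long)ptr | KVFREE_PTR_TAG);
}

static inline bool kvfree_ptr_is_tagged(void *ptr)
{
	return (unsigned long)ptr & KVFREE_PTR_TAG;
}

static inline void *kvfree_untag_ptr(void *ptr)
{
	return (void *)((unsigned long)ptr & ~KVFREE_PTR_TAG);
}

/* During reclaim, each record could then be handled roughly like this: */
static void free_one_record(void *ptr)
{
	if (kvfree_ptr_is_tagged(ptr))
		kvfree(kvfree_untag_ptr(ptr));	/* vmalloc or slab, kvfree() sorts it out */
	else
		kfree(ptr);			/* or batched via kfree_bulk() as today */
}
<snip>

That way the records array keeps holding plain void pointers, and the reclaim
path only has to check the tag before deciding how to free each entry.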
> It is OK for "emergency" path, because we simply can just serialize it by kvfree()
> call, it checks inside what the ptr address belong to:
>
> <snip>
> void kvfree(const void *addr)
> {
> 	if (is_vmalloc_addr(addr))
> 		vfree(addr);
> 	else
> 		kfree(addr);
> }
> <snip>
>
> whereas normal path, i mean "bulk one" where we store pointers into array
> would be broken. We can not call kfree_bulk(array, nr_entries) if the passed
> array contains "vmalloc" pointers, because it is a different allocator. Therefore,
> i deliberately have made it as a special case.

Ok, it would be nice if you can verify that the ptr_to_free passed to
kfree_call_rcu() is in fact a vmalloc pointer.

> > Perhaps the synchronize_rcu() should be done from a workqueue handler
> > to prevent IRQ crapping out?
> >
> I think so. For example one approach would be:
>
> <snip>
> struct free_deferred {
> 	struct llist_head list;
> 	struct work_struct wq;
> };
> static DEFINE_PER_CPU(struct free_deferred, free_deferred);
>
> static void free_work(struct work_struct *w)
> {
> 	struct free_deferred *p = container_of(w, struct free_deferred, wq);
> 	struct llist_node *t, *llnode;
>
> 	synchronize_rcu();
>
> 	llist_for_each_safe(llnode, t, llist_del_all(&p->list))
> 		vfree((void *)llnode, 1);
> }
>
> static inline void free_deferred_common(void *ptr_to_free)
> {
> 	struct free_deferred *p = raw_cpu_ptr(&free_deferred);
>
> 	if (llist_add((struct llist_node *)ptr_to_free, &p->list))

Would this not corrupt the ptr_to_free pointer, which readers might still be
accessing since the grace period has not yet ended? We cannot touch the
ptr_to_free pointer until after the grace period has ended.

> 		schedule_work(&p->wq);
> }
> <snip>
>
> and it seems it should work. Because we know that KMALLOC_MIN_SIZE
> can not be less than machine word:
>
> /*
>  * Kmalloc subsystem.
>  */
> #ifndef KMALLOC_MIN_SIZE
> #define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
> #endif
>
> when it comes to vmalloc pointer it can not be less than one PAGE_SIZE :)
>
> Another thing:
>
> we are talking about "headless" variant that is special, therefore it
> implies to have some restrictions, since we need a dynamic memory to
> drive it. For example "headless" object can be freed from preemptible
> context only, because freeing can be inlined:
>
> <snip>
> +	// NOT SURE if permitted due to IRQ. Maybe we
> +	// should try doing this from WQ?
> +	synchronize_rcu();
> +	kvfree(ptr);
> <snip>
>
> Calling synchronize_rcu() from the IRQ context will screw the system up :)
> Because the current CPU will never pass the QS state if i do not miss something.

Yes, you are right, calling synchronize_rcu() from IRQ context is a strict
no-no. I believe we could tap into the GFP_ATOMIC emergency memory pool for
this emergency situation; that pool exists exactly for such cases. I think in
an emergency we can grow an rcu_head on this pool.

> Also kvfree() itself can be called from the preemptible context only, excluding IRQ,
> there is a special path for it, otherwise vfree() can sleep.

Ok, that's good to know.

> > debug_objects bits wouldn't work obviously for the !emergency kvfree case,
> > not sure what we can do there.
> >
> Agree.
>
> Thank you, Joel, for your comments!

No problem, I think we have a couple of ideas here. What I also wanted to do
(maybe after all this) was to see if we can create an API for head-less kfree
based on the same ideas, not just for arrays but for any object. Call it, say,
kfree_rcu_headless() and then use the bulk array as we have been doing. That
would save any users from having an rcu_head -- of course with all needed
warnings about memory allocation failure.

Vlad, what do you think?

Paul, any thoughts on this?

thanks,

- Joel
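P.S. In case it helps the discussion, below is roughly how I picture the
"grow an rcu_head in an emergency" side of such a head-less API. It is only a
sketch: the names are invented, it glosses over whether the kvfree() can be
done directly from the RCU callback (per your point above it may need to be
bounced to a workqueue for vmalloc pointers), and a real version would only
take this path once the bulk-array path has failed:

<snip>
/* Sketch only: wrapper carrying an rcu_head for a headless object. */
struct headless_wrapper {
	struct rcu_head rcu;
	void *ptr;	/* the headless object to free after a grace period */
};

static void headless_wrapper_rcu(struct rcu_head *rcu)
{
	struct headless_wrapper *hw = container_of(rcu, struct headless_wrapper, rcu);

	/* May need deferring to a workqueue if hw->ptr is vmalloc'ed. */
	kvfree(hw->ptr);
	kfree(hw);
}

/* Returns false if even the GFP_ATOMIC allocation failed. */
static bool queue_headless_emergency(void *ptr)
{
	struct headless_wrapper *hw = kmalloc(sizeof(*hw), GFP_ATOMIC);

	if (!hw)
		return false;

	hw->ptr = ptr;
	call_rcu(&hw->rcu, headless_wrapper_rcu);
	return true;
}
<snip>

A kfree_rcu_headless() user would never see any of this; the wrapper stays
internal, and the memory-allocation-failure warning would sit in the !hw case.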