On Fri, Sep 25, 2020 at 05:17:12PM +0100, Mel Gorman wrote:
> On Fri, Sep 25, 2020 at 05:31:29PM +0200, Uladzislau Rezki wrote:
> > > > > >
> > > > > > All good points!
> > > > > >
> > > > > > On the other hand, duplicating a portion of the allocator functionality
> > > > > > within RCU increases the amount of reserved memory, and needlessly most
> > > > > > of the time.
> > > > > >
> > > > >
> > > > > But it's very similar to what mempools are for.
> > > > >
> > > > As for dynamic caching or mempools. It requires extra logic on top of RCU
> > > > to move things forward and it might be not efficient way. As a side
> > > > effect, maintaining of the bulk arrays in the separate worker thread
> > > > will introduce other drawbacks:
> > >
> > > This is true but it is also true that it is RCU to require this special
> > > logic and we can expect that we might need to fine tune this logic
> > > depending on the RCU usage. We definitely do not want to tune the
> > > generic page allocator for a very specific usecase, do we?
> > >
> > I look at it in scope of GFP_ATOMIC/GFP_NOWAIT issues, i.e. inability
> > to provide a memory service for contexts which are not allowed to
> > sleep, RCU is part of them. Both flags used to provide such ability
> > before but not anymore.
> >
> > Do you agree with it?
> >
>
> I was led to believe that the problem was taking the zone lock while
> holding a raw spinlock that was specific to RCU.

In the RCU code we hold a raw spinlock, because kfree_rcu() has to follow
the call_rcu() rule and work in atomic contexts. So we cannot enter the
page allocator, because it takes zone->lock, which is a spinlock_t and is
therefore sleepable on RT. That is an issue with both the
CONFIG_PROVE_RAW_LOCK_NESTING option and CONFIG_PREEMPT_RT.

>
> Are you saying that GFP_ATOMIC/GFP_NOWAIT users are also holding raw
> spinlocks at the same time on RT?
>
I am not saying that. And it is not possible, because zone->lock is of
type spinlock_t.
So, with CONFIG_PREEMPT_RT you will hit a "BUG: scheduling while atomic"
if the allocator is called when a raw lock is held, IRQs are disabled, or
preempt_disable() is in effect at a higher level.

--
Vlad Rezki
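
For illustration, the rule above can be written down as a tiny user-space
model. This is only a sketch and not kernel code: struct task_ctx,
ctx_is_atomic(), model_alloc_allowed_on_rt() and model_self_check() are
made-up names, and in a real kernel the complaint comes from the scheduler
and lockdep, not from a helper like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; a user-space model, not a kernel structure. */
struct task_ctx {
	int raw_locks_held;	/* raw_spinlock_t-style locks currently held */
	int preempt_disabled;	/* nesting depth of preempt_disable() */
	bool irqs_disabled;	/* true while interrupts are off */
};

/* The context is atomic if any of the three conditions holds. */
static bool ctx_is_atomic(const struct task_ctx *c)
{
	return c->raw_locks_held > 0 || c->preempt_disabled > 0 ||
	       c->irqs_disabled;
}

/*
 * Models an allocation that has to take zone->lock. On PREEMPT_RT a
 * spinlock_t can sleep, so the model rejects atomic callers. A false
 * return stands in for "BUG: scheduling while atomic".
 */
static bool model_alloc_allowed_on_rt(const struct task_ctx *c)
{
	return !ctx_is_atomic(c);
}

/* Usage: the three atomic cases are rejected, a plain context passes. */
static void model_self_check(void)
{
	struct task_ctx plain = { 0, 0, false };
	struct task_ctx raw = { 1, 0, false };
	struct task_ctx nopreempt = { 0, 1, false };
	struct task_ctx noirq = { 0, 0, true };

	assert(model_alloc_allowed_on_rt(&plain));
	assert(!model_alloc_allowed_on_rt(&raw));
	assert(!model_alloc_allowed_on_rt(&nopreempt));
	assert(!model_alloc_allowed_on_rt(&noirq));
}
```

The kfree_rcu() path corresponds to the "raw" case: it already holds a raw
spinlock when it would like to allocate a page.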