On 23.08.22 10:33, Mel Gorman wrote: > On Tue, Aug 23, 2022 at 08:36:34AM +0200, David Hildenbrand wrote: >>> @@ -6517,6 +6538,7 @@ static void __build_all_zonelists(void *data) >>> static DEFINE_SPINLOCK(lock); >>> >>> spin_lock(&lock); >>> + write_seqcount_begin(&zonelist_update_seq); >>> >>> #ifdef CONFIG_NUMA >>> memset(node_load, 0, sizeof(node_load)); >>> @@ -6553,6 +6575,7 @@ static void __build_all_zonelists(void *data) >>> #endif >>> } >>> >>> + write_seqcount_end(&zonelist_update_seq); >>> spin_unlock(&lock); >> >> Do we want to get rid of the static lock by using a seqlock_t instead of >> a seqcount_t? >> > > I do not think so because it's a relatively heavy lock compared to the > counter and the read side. I was primarily asking because seqlock.h states: "Sequential locks (seqlock_t): Sequence counters with an embedded spinlock for writer serialization and non-preemptibility." seems to be precisely what we are doing here. > > As the read-side can be called from hardirq or softirq context, the > write-side needs to disable irqs or bottom halves as well according to the > documentation. That is relatively minor as the write side is rare but it's > tricker because the calling context can be both IRQ or softirq so the IRQ > protection would have to be used. Naive me would just have used write_seqlock()/write_sequnlock() and read_seqbegin()/read_seqretry() to result in almost the same code as with your change -- but having both mechanisms encapsulated. Yeah, there are special write_seqlock_bh()/write_sequnlock_bh(), write_sequnlock_irq() ... but IIRC we don't have to care about that at all when just using the primitives as above. But most probably I am missing something important. > > The read-side also gets more complicated. The read side becomes > either read_seqlock_excl() (or bh or IRQ as appropriate) or > read_seqbegin_or_lock. The read_seqlock_excl acts like a spinlock blocking > out other readers and writers, we definitely do not want to block out other > readers in the allocator path because .... that is crazy, it's basically a > global memory allocator lock. There is not an obvious option of limiting Yes. > the scope of that lock such as a single zone because it's the zonelists > we care about, not an individual zone. I guess it could be done on a > pgdat basis selected by the preferred zone but it's also unnecessarily > complicated and a relatively heavy lock. > > The other obvious choice is read_seqbegin_or_lock to locklessly try and > then retry if necessary. This has better semantics as a lockless version > exists with the caller tracking more state but again, the retry step is > heavy and acts as a global lock. Documentation/locking/seqlock.rst points out read path #1, that's just for "Normal Sequence readers which never block a writer but they must retry if a writer is in progress by detecting change in the sequence number." It's a simple use of read_seqbegin/read_seqretry. read_seqbegin()/read_seqretry() translate essentially to read_seqcount_begin()/read_seqcount_begin() -- except some kcsan() checks. > > In this case, the seqcounter or seqlock is protecting relatively simple > data -- the zonelist pointing to struct zones that never disappear (the > zone might be unpopulated but the struct zone still exists). The critical > data being protected in this context is either the PCP lists or the buddy > lists, each which has separate locking. The zonelist needs less protection > although RCU protection would be a potential, if somewhat complicated > option, as even if the zonelist itself is valid after an RCU update, > the zones listed are not necessarily useful any more. > > There is no real advantage to using seqcount_spinlock_t either as the > associated lock would be a global lock and there isn't any lockdep advantage > to doing that either. > > As the alternatives have side effects, I would prefer to see any proposed > conversion on top of the fix with review determining if there is any > unnecessary additional serialisation. > Agreed with more complicated conversions. Using a seqlock_t to replace the spinlock and avoid introducing the sequcount here would have been easy and straight forward, though. -- Thanks, David / dhildenb