sorry, I somehow forgot about this…

On 2019-02-13 09:56:56 [-0500], Johannes Weiner wrote:
> On Wed, Feb 13, 2019 at 10:27:54AM +0100, Sebastian Andrzej Siewior wrote:
> > On 2019-02-11 16:02:08 [-0500], Johannes Weiner wrote:
> > > > how do you define safe? I've been looking for dependencies of
> > > > __mod_lruvec_state() but found only that the lock is held during the RMW
> > > > operation with WORKINGSET_NODES idx.
> > >
> > > These stat functions are not allowed to nest, and the executing thread
> > > cannot migrate to another CPU during the operation, otherwise they
> > > corrupt the state they're modifying.
> >
> > If everyone is taking the same lock (like i_pages.xa_lock) then there
> > will not be two instances updating the same stat. The owner of the
> > (sleeping)-spinlock will not be migrated to another CPU.
>
> This might be true for this particular stat item, but they are general
> VM statistics. They're assuredly not all taking the xa_lock.

This one in particular does, and my guess is that interrupts are
disabled here because of xa_lock. So the question is: why should
interrupts be disabled? Is it because of the lock that has to be
acquired (and which, as a side effect, disables interrupts), _or_
because of the *_lruvec_slab_state() operation itself?

> > > They are called from interrupt handlers, such as when NR_WRITEBACK is
> > > decreased. Thus workingset_node_update() must exclude preemption from
> > > irq handlers on the local CPU.
> >
> > Do you have an example for a code path to check NR_WRITEBACK?
>
> end_page_writeback()
>   test_clear_page_writeback()
>     dec_lruvec_state(lruvec, NR_WRITEBACK)

So with a warning added to dec_lruvec_state() I found only call paths
from softirq context (like scsi_io_completion() / bio_endio()); a rough
sketch of the kind of warning I mean is at the end of this mail. Having
a lockdep annotation instead of "just" preempt_disable() would have
helped :)

> > > They rely on IRQ-disabling to also disable CPU migration.
> >
> > The spinlock disables CPU migration.
>
> > > > > I'm guessing it's because
> > > > > preemption is disabled and irq handlers are punted to process context.
> > > >
> > > > preemption is enabled and IRQ are processed in forced-threaded mode.
> > >
> > > That doesn't sound safe.
> >
> > Do you have test-case or something I could throw at it and verify that
> > this still works? So far nothing complains…
>
> It's not easy to get the timing right on purpose, but we've seen in
> production what happens when you don't protect these counter updates
> from interrupts. See c3cc39118c36 ("mm: memcontrol: fix NR_WRITEBACK
> leak in memcg and system stats").

Based on the code I'm looking at, this looks fine. Should I just
resubmit the patch?

Sebastian
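
For reference, a rough sketch of the kind of throw-away warning I mean
above (not the exact hunk I used, just to illustrate the idea on top of
the existing dec_lruvec_state() helper in include/linux/memcontrol.h):

	static inline void dec_lruvec_state(struct lruvec *lruvec,
					    enum node_stat_item idx)
	{
		/* debug aid only: dump the call chain once if we end
		 * up here from (soft)irq context
		 */
		WARN_ON_ONCE(in_interrupt());
		mod_lruvec_state(lruvec, idx, -1);
	}

Something lockdep-based (for example lockdep_assert_irqs_disabled() in
the __mod_lruvec_state() path) would document the context requirement
permanently instead of relying on the caller to know about it.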