On Sun, 4 Aug 2024, Yu Zhao wrote:
> On Sat, Aug 03, 2024 at 11:55:51PM -0700, Hugh Dickins wrote:
> > On Wed, 10 Jul 2024, Yu Zhao wrote:
> >
> > > Remove boilerplate by using a macro to choose the corresponding lock
> > > and handler for each folio_batch in cpu_fbatches.
> > >
> > > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> >
> > Andrew, please revert this "remove boilerplate" patch (and of course its
> > followup fix) from mm-unstable. From the title I presume it was intended
> > to make no functional change, but that's far from so.
> >
> > Under tmpfs swapping load, on different runs I see various badnesses:
> > "Bad page" in __free_one_page(), Oops in __run_timer_base(),
> > WARNING at kernel/workqueue.c:790 in set_work_data(), PageBuddy BUG
> > at page-flags.h:1009 from __del_page_from_freelist(), something (I'd
> > given up taking better notes by this time) in __queue_work(), others.
> >
> > All those were including the fix to Barry's report: without that fix,
> > the boot is drowned in warnings scrolling past too fast to be read.
> >
> > (All the above were on the HP workstation, swapping to SSD; whereas on
> > this ThinkPad, swapping to NVMe, no problem seen at all - I mention the
> > swapping medium, but have no idea whether that's a relevant difference.
> > In each case, MGLRU compiled in but not enabled. THPs and 64kTHPs active.)
> >
> > Sorry, but I've put no effort whatsoever into debugging this: "remove
> > boilerplate" didn't seem worth the effort, and my personal preference
> > is for readable boilerplate over hiding in a macro. If you prefer the
> > macro, I expect Yu can soon come up with a fix (which I could test here):
> > but for now please revert "remove boilerplate", since its issues get in
> > the way of further mm testing.
>
> Sorry for getting in your way, Hugh.
>
> Apparently I didn't expect local_lock_t to be zero length, i.e., when
> CONFIG_DEBUG_LOCK_ALLOC is not set. So that might explain why you only
> had problems with one of the two machines, where it failed to disable
> IRQ when rotating clean pages after writeback.
>
> The following should fix it, in case you want to verify the above:
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 4bc08352ad87..67a246772811 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -254,7 +254,7 @@ static void __folio_batch_add_and_move(struct folio_batch __percpu *fbatch,
>  		folio, \
>  		op, \
>  		on_lru, \
> -		offsetof(struct cpu_fbatches, op) > offsetof(struct cpu_fbatches, lock_irq) \
> +		offsetof(struct cpu_fbatches, op) >= offsetof(struct cpu_fbatches, lock_irq) \
>  	)
>
>  static void lru_move_tail(struct lruvec *lruvec, struct folio *folio)

Well caught! Yes, I confirm that fixes all the bad behaviour I was seeing
(and fits with my having DEBUG_LOCK_ALLOC and lockdep enabled on the
untroubled machine, but not on the one showing problems) - thanks.

But it does reinforce my opinion that mm/swap.c is more understandable
without that macro than with it.

Hugh
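
For reference, a minimal userspace sketch of the zero-length pitfall Yu
describes. It assumes GCC's empty-struct extension (sizeof 0, which is
what local_lock_t reduces to without lockdep); the struct names below
are illustrative stand-ins for cpu_fbatches, not the real kernel layout:

#include <stddef.h>
#include <stdio.h>

/* Stand-in for local_lock_t with CONFIG_DEBUG_LOCK_ALLOC off: under
 * GCC an empty struct has size 0, so the member placed right after it
 * shares its offset. */
struct fake_local_lock { };

/* Illustrative stand-in for cpu_fbatches (not the real layout). */
struct fake_cpu_fbatches {
	struct fake_local_lock lock;	/* protects the non-IRQ batches */
	int lru_add;
	struct fake_local_lock lock_irq;	/* protects the IRQ batches */
	int lru_move_tail;
};

int main(void)
{
	size_t irq = offsetof(struct fake_cpu_fbatches, lock_irq);
	size_t mt = offsetof(struct fake_cpu_fbatches, lru_move_tail);

	/* Both offsets print the same: lock_irq occupies no space, so
	 * lru_move_tail sits exactly at lock_irq's offset. */
	printf("lock_irq at %zu, lru_move_tail at %zu\n", irq, mt);

	/* '>' is false, wrongly classifying lru_move_tail as a batch
	 * not protected by lock_irq; '>=' is true. */
	printf("mt > irq:  %d\n", mt > irq);
	printf("mt >= irq: %d\n", mt >= irq);
	return 0;
}

With the zero-size lock_irq, the strict '>' test routes lru_move_tail
through the non-IRQ path - exactly the failure to disable IRQs when
rotating clean pages after writeback - while '>=' selects the
IRQ-disabling lock as intended.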