Hello,

On Fri, Feb 02, 2024 at 08:35:51AM -0800, Paul E. McKenney wrote:
> Good point, and if this sort of thing happens frequently, perhaps there
> should be an easy way of doing this. One crude hack that might come
> pretty close would be to redefine the barrier() macro to be smp_mb().
>
> But as noted earlier, -ENOREPRODUCE on today's -next. I will try the
> next several -next releases. But if they all get -ENOREPRODUCE, I owe
> everyone on CC an apology for having sent this report out before trying
> next-20240202. :-/

I think I saw that problem too but could reproduce it with or without
the workqueue changes, so I did the lazy thing "oh well, somebody is
gonna fix that" and just tested as-is. It's a bit worrying that ppl
don't seem to already know what the culprit is.

Hmm... I can't reproduce it anymore either. So, there is some chance
that this may really be a subtle breakage. If you ever see it happening
again, triggering sysrq-t and capturing the dmesg output (network should
still work fine, so these shouldn't be too difficult) may help. sysrq-t
has a workqueue state dump at the end which should clearly indicate if
anything is stalled in workqueue.

That said, another data point. In my test setup, I use the earlyprintk
boot option, which enables console output well before workqueue becomes
operational, so having no console output at all is highly unlikely to be
indicative of a workqueue problem.

My memory is hazy but it seems like I can no longer reproduce the
problem on the same git commit. Maybe it was a problem on the qemu side?

Thanks.

--
tejun