On Tue, Oct 31, 2023 at 11:42 PM Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Oct 31, 2023 at 03:01:36PM +0100, Jan Kara wrote:
> > On Tue 31-10-23 04:48:44, Marek Marczykowski-Górecki wrote:
> > > Then tried:
> > >  - PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
> > >  - PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
> > >  - PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freeze rather quickly
> > >
> > > I've retried the PAGE_ALLOC_COSTLY_ORDER=4,order=5 case several times
> > > and I can't reproduce the issue there. I'm confused...
> >
> > And this kind of confirms that allocations > PAGE_ALLOC_COSTLY_ORDER
> > causing hangs is most likely just a coincidence. Rather something either in
> > the block layer or in the storage driver has problems with handling bios
> > with sufficiently high order pages attached. This is going to be a bit
> > painful to debug I'm afraid. How long does it take for you trigger the
> > hang? I'm asking to get rough estimate how heavy tracing we can afford so
> > that we don't overwhelm the system...
>
> Sometimes it freezes just after logging in, but in worst case it takes
> me about 10min of more or less `tar xz` + `dd`.

blk-mq debugfs is usually helpful for hang issues in the block layer or
the underlying drivers:

(cd /sys/kernel/debug/block && find . -type f -exec grep -aH . {} \;)

BTW, if you know which disks are behind dm-crypt (`lsblk` will show
that), you can collect logs for just those disks. The logs have to be
collected after the hang is triggered.

Thanks,
Ming Lei
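
A minimal sketch of the per-disk collection mentioned above, in case it saves some typing. It assumes debugfs is mounted at /sys/kernel/debug and that it is run as root after the hang has been triggered. The function name dump_blk_debugfs and the DEBUGFS_BLOCK override are hypothetical, made up for this example; the find/grep invocation is the same one as in the command above, just limited to the named disks.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): dump every blk-mq debugfs
# attribute for the disks named on the command line.
# DEBUGFS_BLOCK is an assumed override for the debugfs block directory;
# it defaults to the path used in the command above.
dump_blk_debugfs() {
    root="${DEBUGFS_BLOCK:-/sys/kernel/debug/block}"
    for disk in "$@"; do
        if [ -d "$root/$disk" ]; then
            # Same find/grep as the full dump, restricted to one disk.
            (cd "$root" && find "./$disk" -type f -exec grep -aH . {} \;)
        else
            echo "no debugfs directory for $disk under $root" >&2
        fi
    done
}
```

Usage would be something like `dump_blk_debugfs DISK > blk-debugfs.log`, where DISK is whatever device `lsblk` shows underneath the dm-crypt volume.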