Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5

On Tue, Oct 31, 2023 at 11:42 PM Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Oct 31, 2023 at 03:01:36PM +0100, Jan Kara wrote:
> > On Tue 31-10-23 04:48:44, Marek Marczykowski-Górecki wrote:
> > > Then tried:
> > >  - PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
> > >  - PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
> > >  - PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freeze rather quickly
> > >
> > > I've retried the PAGE_ALLOC_COSTLY_ORDER=4, order=5 case several times
> > > and I can't reproduce the issue there. I'm confused...
> >
> > And this kind of confirms that allocations > PAGE_ALLOC_COSTLY_ORDER
> > causing hangs is most likely just a coincidence. Rather something either in
> > the block layer or in the storage driver has problems with handling bios
> > with sufficiently high order pages attached. This is going to be a bit
> > painful to debug I'm afraid. How long does it take for you to trigger the
> > hang? I'm asking to get a rough estimate of how much tracing we can afford so
> > that we don't overwhelm the system...
>
> Sometimes it freezes just after logging in, but in the worst case it takes
> me about 10 min of more or less `tar xz` + `dd`.

blk-mq debugfs is usually helpful for debugging hangs in the block layer or
in the underlying drivers:

(cd /sys/kernel/debug/block && find . -type f -exec grep -aH . {} \;)

BTW, you can collect logs for just the disks behind dm-crypt if you know
which ones they are (`lsblk` will show you). The logs have to be collected
after the hang has been triggered.
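For example, the two steps can be combined into a small script. This is a minimal sketch, not something posted in the thread: it assumes it runs as root, that debugfs is mounted at /sys/kernel/debug, and that the dm-crypt device is passed as the first argument (e.g. /dev/mapper/<name>, a placeholder). It asks `lsblk -s` (inverse dependencies) for the device's parents and keeps only the entries of TYPE "disk"; the helper names `disks_from_lsblk` and `dump_disk_debugfs` are hypothetical.

```shell
#!/bin/sh
# Sketch: dump blk-mq debugfs state only for the physical disks behind a
# dm-crypt mapping. Run as root, after the hang has been triggered.
# Assumption: debugfs is mounted at /sys/kernel/debug.

# Read "NAME TYPE" pairs (lsblk raw output) on stdin and print the unique
# names whose TYPE is "disk", i.e. the bottom of the device stack.
disks_from_lsblk() {
    awk '$2 == "disk" { print $1 }' | sort -u
}

# Print every debugfs attribute of one disk, prefixed with its file name
# (same find/grep invocation as the one-liner, limited to one disk).
dump_disk_debugfs() {
    dir="/sys/kernel/debug/block/$1"
    if [ ! -d "$dir" ]; then
        echo "no debugfs directory for $1" >&2
        return 1
    fi
    (cd "$dir" && find . -type f -exec grep -aH . {} \;)
}

# $1: the dm-crypt device, e.g. /dev/mapper/<name> (placeholder).
main() {
    lsblk -s -r -n -o NAME,TYPE "$1" | disks_from_lsblk | while read -r disk; do
        echo "=== $disk ==="
        dump_disk_debugfs "$disk"
    done
}

# Only run when a device is given, so the helpers can be sourced on their own.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Usage (hypothetical script name): `sh dump-blkmq.sh /dev/mapper/<name> > blkmq-debugfs.txt` once the freeze has happened. Filtering on TYPE "disk" skips the intermediate crypt and partition layers, which have no blk-mq debugfs directory of their own.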

Thanks,
Ming Lei