Re: How to debug stuck read?

On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@xxxxxxxxx> wrote:
>
> On Wed, Feb 2, 2022 at 21:13, Matthew Wilcox
> (<willy@xxxxxxxxxxxxx>) wrote:
> >
> > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > > I have a corrupted file on BTRFS which has CoW disabled thus no
> > > checksum. Trying to read this file causes the process to get stuck
> > > forever. It doesn't return EIO.
> > >
> > > How can I find out why it gets stuck?
> >
> > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > > folio_wait_bit_common (mm/filemap.c:1314)
> > > filemap_get_pages (mm/filemap.c:2622)
> > > filemap_read (mm/filemap.c:2676)
> > > new_sync_read (fs/read_write.c:401 (discriminator 1))
> >
> > folio_wait_bit_common() is where it waits for the page to be unlocked.
> > Probably the problem is that btrfs isn't unlocking the page on
> > seeing the error, so you don't get the -EIO returned?
>
>
> Yeah, but how do I find where that happens?
> Anyway, by pure luck I found a memcpy that wrote outside of allocated
> memory, and fixing that solved this issue, but I still don't know how to
> debug this properly.
>
There is no special recipe for debugging "this properly" :)

You wrote that "by pure luck" you found a memcpy() that wrote beyond the
limit of allocated memory. I suppose that you found that faulty memcpy()
somewhere in one of the functions listed in the stack trace.

That's the right approach! You read the call chain, find where something
looks wrong, and then fix it. This is why stack traces are so helpful.
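
To make "reading the call chain" a bit more concrete, here is a minimal
sketch that turns decoded trace lines into (function, file, line) entries
you can open one after another in your editor. It assumes the output format
shown in the quoted decode_stacktrace.sh output above; Frame, parse_frame()
and parse_trace() are hypothetical helper names of mine, not part of any
kernel tool.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One decoded stack frame (hypothetical helper type)."""
    function: str
    path: str
    line: int


# Matches lines such as "filemap_read (mm/filemap.c:2676)". Anything after
# the line number, e.g. " (discriminator 1))", is simply ignored.
_FRAME_RE = re.compile(
    r"^\s*(?P<func>[\w.]+)\s+\((?P<path>[^\s:()]+):(?P<line>\d+)"
)


def parse_frame(text: str) -> Optional[Frame]:
    """Return a Frame for a decoded trace line, or None if it does not match."""
    m = _FRAME_RE.match(text)
    if m is None:
        return None
    return Frame(m.group("func"), m.group("path"), int(m.group("line")))


def parse_trace(trace: str) -> List[Frame]:
    """Parse every matching line of a decoded trace, innermost frame first."""
    frames = []
    for text in trace.splitlines():
        frame = parse_frame(text)
        if frame is not None:
            frames.append(frame)
    return frames


# The frames quoted earlier in this thread.
TRACE = """\
folio_wait_bit_common (mm/filemap.c:1314)
filemap_get_pages (mm/filemap.c:2622)
filemap_read (mm/filemap.c:2676)
new_sync_read (fs/read_write.c:401 (discriminator 1))
"""

if __name__ == "__main__":
    for frame in parse_trace(TRACE):
        # "path:line" is a form many editors and grep-style tools understand.
        print(f"{frame.path}:{frame.line}  {frame.function}")
```

The list only tells you where the task is waiting, not where the bug is. It
gives you starting points in the source, and from there you follow callers
and callees by hand.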

It was not "pure luck". I think that you did what developers usually do after
decoding a stack trace. If not, how did you find that faulty memcpy() buried
somewhere in 40 million lines of code?

It seems that you've found the right way to track down problems in code
that (probably) you had never worked on or read before you hit that bug.

Have you sent a patch to the LKML? If not, please do so.

Regards,

Fabio M. De Francesco



