Re: Nilfs2 crash debugging

Vyacheslav Dubeyko wrote on 2013-08-19 21:55:
On Aug 15, 2013, at 2:40 PM, Anton Eliasson wrote:

[snip]
Hi again. I was able to reproduce the crash on a fully updated system by starting the two virtual machines simultaneously as described in my e-mail from May 25. I made a new attempt to rebuild the kernel with your patches. I selected these options in make menuconfig [1], which resulted in this generated config.x86_64 [2] which has the following diff compared to the stock config.x86_64:

As I remember, you initially reported that the file system was remounted
in RO mode with many "broken bnode" error messages. Unfortunately,
as far as I can see, you could not reproduce this issue. I had really hoped
that you could reproduce this important issue.

As I see, the logs with the crash that you shared contain details about the
issue that was also reported by Jérôme Poulin <jeromepoulin@xxxxxxxxx>.
I mean this error message:

[  304.494448] BUG: unable to handle kernel paging request at 00000000000013f6
[  304.494456] IP: [<ffffffffa1327232>] nilfs_end_page_io+0x12/0xc0 [nilfs2]

I can reproduce this issue on my side, and it is still under investigation.

But anyway... Could you try to reproduce the issue with the file system
being remounted in RO mode? It is a really important and annoying issue.

Yes, that one is much easier to reproduce. I simply try to read one of the corrupted files in /home. See below. I have no idea how the actual corruption happened, however.

[...]
As I remember, I asked you to enable more configuration options.
I mean these options:
CONFIG_NILFS2_DEBUG_BASE_OPERATIONS,
CONFIG_NILFS2_DEBUG_MDT_FILES,
CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM,
CONFIG_NILFS2_DEBUG_BLOCK_MAPPING.

I suppose that you didn't enable these options because they depend
on the "Enable output from subsystem" option. But, anyway, I am afraid
that you won't reproduce the issue with these options enabled.
But maybe you will be luckier this time. :)

I think I got it right this time. The missing options appeared after I enabled CONFIG_NILFS2_DEBUG_SUBSYSTEMS. The config I used is here [1], which has the following diff compared to the upstream config:

    --- config.x86_64    2013-08-25 06:53:05.000000000 +0200
    +++ config.x86_64.last    2013-08-25 15:24:51.118711529 +0200
    @@ -1,6 +1,6 @@
     #
     # Automatically generated file; DO NOT EDIT.
    -# Linux/x86 3.10.5-1 Kernel Configuration
    +# Linux/x86 3.10.9-1 Kernel Configuration
     #
     CONFIG_64BIT=y
     CONFIG_X86_64=y
    @@ -5452,6 +5452,20 @@
     # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
     # CONFIG_BTRFS_DEBUG is not set
     CONFIG_NILFS2_FS=m
    +CONFIG_NILFS2_DEBUG=y
    +# CONFIG_NILFS2_USE_PR_DEBUG is not set
    +CONFIG_NILFS2_DEBUG_SHOW_ERRORS=y
    +CONFIG_NILFS2_DEBUG_DUMP_STACK=y
    +CONFIG_NILFS2_DEBUG_SUBSYSTEMS=y
    +CONFIG_NILFS2_DEBUG_BASE_OPERATIONS=y
    +CONFIG_NILFS2_DEBUG_MDT_FILES=y
    +CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM=y
    +# CONFIG_NILFS2_DEBUG_GC_SUBSYSTEM is not set
    +# CONFIG_NILFS2_DEBUG_RECOVERY_SUBSYSTEM is not set
    +CONFIG_NILFS2_DEBUG_BLOCK_MAPPING=y
    +# CONFIG_NILFS2_DEBUG_BUFFER_MANAGEMENT is not set
    +# CONFIG_NILFS2_DEBUG_SHOW_SPAM is not set
    +# CONFIG_NILFS2_DEBUG_HEXDUMP is not set
     CONFIG_FS_POSIX_ACL=y
     CONFIG_EXPORTFS=y
     CONFIG_FILE_LOCKING=y
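
For anyone repeating this, a quick way to double-check that the requested options really ended up in the generated config is something like the sketch below. The helper name `check_nilfs2_debug` is made up for illustration; the option names are the ones from this thread (they come from the debugging patches, not from the stock kernel), and the only tool assumed is plain `grep`.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NILFS2 tooling): report which of the
# debug options requested in this thread are set to "y" in a kernel config.
# Returns 0 if all of them are enabled, 1 otherwise.
check_nilfs2_debug() {
    config="$1"
    missing=0
    for opt in \
        CONFIG_NILFS2_DEBUG_SUBSYSTEMS \
        CONFIG_NILFS2_DEBUG_BASE_OPERATIONS \
        CONFIG_NILFS2_DEBUG_MDT_FILES \
        CONFIG_NILFS2_DEBUG_SEGMENTS_SUBSYSTEM \
        CONFIG_NILFS2_DEBUG_BLOCK_MAPPING
    do
        # Match only an uncommented "OPTION=y" line.
        if grep -q "^${opt}=y\$" "$config"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example use (file name taken from the diff above):
#   check_nilfs2_debug config.x86_64.last
```

CONFIG_NILFS2_DEBUG_SUBSYSTEMS is included in the list because the other four only became selectable after it was enabled.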

Anyway, thank you for your efforts. It would be really great if you were lucky
enough to reproduce the issue with the file system remounted in RO mode
and many "broken bnode" error messages. Could you try again?

Thanks,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yes. Here's another huge kernel.log for you [2]. It's 19 MB compressed and 282 MB uncompressed. I blanked the log while running the stock kernel and then rebooted to the custom debugging kernel. X wouldn't start so I just logged in to a virtual terminal, changed directory to "~/Bilder/20130321-28 Jakobs bilder från Nederländerna" and then executed `cat 179.JPG >/dev/null`.

This caused a read-only remount and a bunch of "broken bmap" messages to show, followed by an "Input/Output error". I saved a copy of /var/log/kernel.log as soon as I could after that, before reinstalling the stock kernel and rebooting.
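
In case it helps others reproduce this, the steps above can be roughly sketched as a script. This is only a sketch under assumptions: the helper `is_mounted_ro` is a made-up name, it assumes the affected file system is mounted on /home, and /tmp/kernel.log.copy is just an example destination. It reads the standard /proc/mounts format (device, mount point, type, options).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given mount point is currently
# mounted read-only. The second argument defaults to /proc/mounts and exists
# mainly so that the function can be tried on a sample file.
is_mounted_ro() {
    mountpoint="$1"
    mounts="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        # Field 2 is the mount point, field 4 the comma-separated options.
        $2 == mp {
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro")
                    ro = 1
        }
        END { exit (ro ? 0 : 1) }
    ' "$mounts"
}

# Example use, run from the directory with the corrupted file
# (mount point and destination path are assumptions):
#   cat 179.JPG >/dev/null
#   if is_mounted_ro /home; then
#       cp /var/log/kernel.log /tmp/kernel.log.copy
#   fi
```

The copy goes outside /home on purpose, since nothing can be written there once the remount has happened.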

[1]: http://antoneliasson.se/publicdump/config.x86_64.last.20130825
[2]: http://antoneliasson.se/publicdump/kernel.log.20130825.gz

--
Best Regards,
Anton Eliasson
