Re: kernel panic (2.6.36) after file system corruption (?)

Jan Misiak <fijam7@xxxxxxxxx> · Tue, 21 Dec 2010 00:40:38 +0100

On 19 December 2010 18:04, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
> Two more questions here.
>
>  1) Did the panic arise during mount?

Yes, the panic occurs just after issuing the 'mount' command.

>  2) Did you see the following message just before this oops?
>    "segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds"

Yes, netconsole managed to capture it:

sd 2:0:0:0: [sdb] Attached SCSI removable disk
segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
BUG: unable to handle kernel paging request at 00001000
IP: [<c10d572b>] page_address+0xb/0xd0
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
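
For anyone comparing captures like the one above, the key lines can be
pulled out of a saved netconsole log with a small filter. This is only a
rough sketch: the function name oops_summary and the log path in the
usage comment are made up for illustration, and it assumes the capture
was saved as plain text with one kernel message per line.

```shell
# Hypothetical helper (not part of nilfs-utils or the kernel tools):
# print only the fault address, faulting instruction and oops header
# from a saved netconsole capture.
oops_summary() {
    # $1 is the path to the saved log; the path is the caller's choice.
    grep -E '^(BUG:|IP:|Oops:)' "$1"
}

# Example call, assuming the capture was saved as ./netconsole.log:
#   oops_summary ./netconsole.log
```

Run against the capture above, this should leave just the "BUG:", "IP:"
and "Oops:" lines, which are usually the first things worth quoting in a
report.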

> After that, try just a read-only mount without the norecovery option.
>
>  # mount -t nilfs2 -o ro /dev/sdb1 /your-mount-dir

It does not trigger the oops. Sorry for not mentioning it earlier. The
kernel only panics if the file system is mounted 'rw'.
Thank you for looking into this.

Regards,
Jan

>
> On Sun, 19 Dec 2010 13:04:23 +0100, Jan Misiak wrote:
>> On 19 December 2010 06:13, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
>> > Hi,
>> > On Sat, 18 Dec 2010 15:08:45 +0100, Jan Misiak wrote:
>> >> Hello,
>> >>
>> >> I am just a simple end-user but as nobody in my distribution has had
>> >> the same problem I was forced to turn to the upstream. Please bear
>> >> with me.
>> >>
>> >> I have been using nilfs2 on a 16GB usb-stick on an x86 thin client
>> >> running Arch Linux. The box had been running 24/7 and had an uptime of
>> >> about two weeks with kernel 2.6.36/nilfs-utils 2.0.20 when it
>> >> panicked. Unfortunately nothing was to be seen in the logs (system
>> >> partition was ext3). Now it panics every time I attempt to mount the
>> >> volume.
>> >>
>> >> I tried to use netconsole to capture the panic message but it gets
>> >> truncated so I had to resort to taking pictures.
>> >>
>> >> box #1 kernel 2.6.36.2/nilfs-utils 2.0.20
>> >>     http://fijam.eu.org/other/netconsole.log
>> >>     http://fijam.eu.org/other/0000.jpg
>> >>
>> >> I tried to mount the usb-stick on a laptop with the same kernel
>> >> (2.6.36.2) to capture more of the panic messages:
>> >>
>> >> box #2 kernel 2.6.36.2/nilfs-utils 2.0.20
>> >>     http://fijam.eu.org/other/0001.jpeg
>> >>     http://fijam.eu.org/other/0002.jpeg
>> >>
>> >> It crashes when I try to mount with kernel 2.6.32.27 as well:
>> >>
>> >> box #2 kernel 2.6.32.27/nilfs-utils 2.0.20
>> >>     http://fijam.eu.org/other/0003.jpeg
>> >>     http://fijam.eu.org/other/0004.jpeg
>> >>
>> >> I would be grateful for advice on how I can help with getting to the
>> >> bottom of this.
>> >>
>> >> Regards,
>> >> Jan
>> >
>> > It looks like these oopses were hit in the common block layer code
>> > called from the usb mass storage driver.
>> >
>> > Could you do some tests to narrow down the issue?
>> >
>> >  1) Use "nogc" mount option to see whether the oops depends on the
>> >     context of garbage collection or not:
>> >
>> >   # mount -t nilfs2 -o nogc /dev/sdb1 /your-mount-dir
>> >
>> >  2) Mount the partition read-only with "norecovery" option and make
>> >     read accesses to the filesystem as below:
>> >
>> >   # mount -t nilfs2 -o ro,norecovery /dev/sdb1 /your-mount-dir
>> >   # find /your-mount-dir -type f -exec cat {} > /dev/null \;
>> >
>> >  3) Try to read the block device directly with "dd":
>> >
>> >   # dd if=/dev/sdb1 bs=4k > /dev/null
>> >
>> >  4) Try lssu and lscp commands in the read-only mount to do quick
>> >     sanity checks of meta data files.
>> >
>> >   # mount -t nilfs2 -o ro,norecovery /dev/sdb1 /your-mount-dir
>> >   # lssu -a
>> >   # lscp
>> >
>> >
>> > Regards,
>> > Ryusuke Konishi
>> >
>>
>> Thank you for your reply and suggestions. I have tried the following:
>>
>> # mount -t nilfs2 -o nogc /dev/sdb1 /your-mount-dir
>>     Results in exactly the same kernel panic.
>>
>> # mount -t nilfs2 -o ro,norecovery /dev/sdb1 /your-mount-dir
>> # find /your-mount-dir -type f -exec cat {} > /dev/null \;
>>     Doesn't trigger the oops. I was able to retrieve my data but
>> haven't checked it for correctness yet.
>>
>> # lssu -a
>>     http://fijam.eu.org/other/lssu
>> # lscp
>>     http://fijam.eu.org/other/lscp
>>
>> # dd if=/dev/sdb1 bs=4k > /dev/null
>>     Likewise, it doesn't trigger the oops.
>>
>> Is there anything else I could do to help?
>>
>> Regards,
>> Jan
>