Re: kernel panic (2.6.36) after file system corruption (?)

Ryusuke Konishi <konishi.ryusuke@xxxxxxxxxxxxx> · Tue, 21 Dec 2010 11:55:42 +0900 (JST)

Hi,
On Tue, 21 Dec 2010 00:40:38 +0100, Jan Misiak wrote:
> On 19 December 2010 18:04, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
> > Two more questions here.
> >
> >  1) Did the panic arise during mount?
> 
> Yes, the panic occurs just after issuing the 'mount' command.
> 
> >  2) Did you see the following message just before this oops?
> >   "segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds"
> 
> Yes, netconsole managed to capture it:
> 
> sd 2:0:0:0: [sdb] Attached SCSI removable disk
> segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> BUG: unable to handle kernel paging request at 00001000
> IP: [<c10d572b>] page_address+0xb/0xd0
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP
> 
> > After that, try just a read-only mount without the norecovery option.
> >
> >  # mount -t nilfs2 -o ro /dev/sdb1 /your-mount-dir
> 
> It does not trigger the oops. Sorry for not mentioning it earlier. The
> kernel only panics if the file system is mounted 'rw'.
> Thank you for looking into this.
> 
> Regards,
> Jan

Thank you for your cooperation.

Given these symptoms (the oops happens only on a read-write mount, right
after segctord starts), I suspect it is triggered by the writeback of the
two super blocks.
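
For reference, here is a small shell sketch of where those two super blocks would live. It assumes the layout macros in the kernel's nilfs2 on-disk header (NILFS_SB_OFFSET_BYTES = 1024 for the primary copy, and NILFS_SB2_OFFSET_BYTES(size) = ((size >> 12) - 1) << 12 for the secondary copy in the last whole 4 KiB block). Treat those values as assumptions to double-check against the header. The function names are hypothetical. The point is that the secondary super block's position is derived from the device size, so a wrong size would send that write to the wrong place.

```shell
#!/bin/sh
# Sketch only: nilfs2 super block locations, assuming these on-disk
# layout macros from the kernel's nilfs2 header:
#   primary:   NILFS_SB_OFFSET_BYTES        = 1024
#   secondary: NILFS_SB2_OFFSET_BYTES(size) = ((size >> 12) - 1) << 12
# Function names are hypothetical helpers for illustration.

# sectors_to_bytes SECTORS: sysfs "size" files count 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# sb2_offset DEVSIZE: byte offset of the secondary super block, i.e. the
# start of the last whole 4 KiB block of a device DEVSIZE bytes long.
sb2_offset() {
    echo $(( (($1 >> 12) - 1) << 12 ))
}
```

For example, for the partition discussed here:

 # sb2_offset "$(sectors_to_bytes "$(cat /sys/block/sdb/sdb1/size)")"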

Could you collect the device size information in a few different ways?

(In the following examples, I assume the target device is /dev/sdb1.)

1) Sizes reported by sysfs (in 512-byte sectors)
 # cat /sys/block/sdb/size
 # cat /sys/block/sdb/sdb1/size

2) Sizes recorded in the partition table
 # fdisk -lu /dev/sdb

3) A dump of the first super block, which holds the layout information

 # dd if=/dev/sdb1 bs=1k count=1 skip=1 2>/dev/null | hd
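
As a rough cross-check of items 1) and 3), the following shell sketch reads two fields from the primary super block and compares the recorded device size with the size sysfs reports. It assumes the struct nilfs_super_block layout from the kernel header: s_magic is a little-endian 16-bit field at offset 0x06 with value 0x3434, and s_dev_size is a little-endian 64-bit byte count at offset 0x20. Both offsets should be verified against the header before relying on the result. The helper names are hypothetical. The script only reads from the device.

```shell
#!/bin/sh
# Sketch only: compare the nilfs2 super block's recorded device size with
# the partition size from sysfs. Assumed (not verified here) field layout
# of struct nilfs_super_block, relative to the super block start:
#   s_magic    @ 0x06, __le16, expected 0x3434
#   s_dev_size @ 0x20, __le64, device size in bytes
SB_OFFSET=1024   # primary super block: the block "dd ... skip=1" dumps

# le_uint FILE OFFSET NBYTES: little-endian unsigned integer at OFFSET.
# Bytes are read one at a time, so the result does not depend on host
# byte order. (Values >= 2^63 would overflow shell arithmetic.)
le_uint() {
    val=0
    bits=0
    for byte in $(od -An -v -t u1 -j "$2" -N "$3" "$1"); do
        val=$(( $val + ($byte << $bits) ))
        bits=$(( $bits + 8 ))
    done
    echo "$val"
}

# sb_check DEVICE SYSFS_SIZE_FILE
# Returns 1 if the magic is missing, 2 if s_dev_size exceeds the partition.
sb_check() {
    magic=$(le_uint "$1" $(( SB_OFFSET + 6 )) 2)
    sb_size=$(le_uint "$1" $(( SB_OFFSET + 32 )) 8)
    part_size=$(( $(cat "$2") * 512 ))   # sysfs counts 512-byte sectors
    printf 'magic=0x%04x s_dev_size=%s partition=%s\n' \
        "$magic" "$sb_size" "$part_size"
    if [ "$magic" -ne $((0x3434)) ]; then
        echo "no nilfs2 magic at the primary super block"
        return 1
    fi
    if [ "$sb_size" -gt "$part_size" ]; then
        echo "s_dev_size is larger than the partition"
        return 2
    fi
    echo "s_dev_size fits within the partition"
}
```

It would be run as root, for example:

 # sb_check /dev/sdb1 /sys/block/sdb/sdb1/size

A "larger than the partition" result would fit the super block writeback theory, but it is only a hint, not a diagnosis.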

Thanks,
Ryusuke Konishi

> >
> > On Sun, 19 Dec 2010 13:04:23 +0100, Jan Misiak wrote:
> >> On 19 December 2010 06:13, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
> >> > Hi,
> >> > On Sat, 18 Dec 2010 15:08:45 +0100, Jan Misiak wrote:
> >> >> Hello,
> >> >>
> >> >> I am just a simple end user, but as nobody in my distribution has had
> >> >> the same problem, I was forced to turn to upstream. Please bear
> >> >> with me.
> >> >>
> >> >> I have been using nilfs2 on a 16GB usb-stick on an x86 thin client
> >> >> running Arch Linux. The box had been running 24/7 and had an uptime of
> >> >> about two weeks with kernel 2.6.36/nilfs-utils 2.0.20 when it
> >> >> panicked. Unfortunately nothing was to be seen in the logs (system
> >> >> partition was ext3). Now it panics every time I attempt to mount the
> >> >> volume.
> >> >>
> >> >> I tried to use netconsole to capture the panic message, but it got
> >> >> truncated, so I had to resort to taking pictures.
> >> >>
> >> >> box #1 kernel 2.6.36.2/nilfs-utils 2.0.20
> >> >>     http://fijam.eu.org/other/netconsole.log
> >> >>     http://fijam.eu.org/other/0000.jpg
> >> >>
> >> >> I tried to mount the usb-stick on a laptop with the same kernel
> >> >> (2.6.36.2) to capture more of the panic messages:
> >> >>
> >> >> box #2 kernel 2.6.36.2/nilfs-utils 2.0.20
> >> >>     http://fijam.eu.org/other/0001.jpeg
> >> >>     http://fijam.eu.org/other/0002.jpeg
> >> >>
> >> >> It crashes when I try to mount with kernel 2.6.32.27 as well:
> >> >>
> >> >> box #2 kernel 2.6.32.27/nilfs-utils 2.0.20
> >> >>     http://fijam.eu.org/other/0003.jpeg
> >> >>     http://fijam.eu.org/other/0004.jpeg
> >> >>
> >> >> I would be grateful for advice on how I can help with getting to the
> >> >> bottom of this.
> >> >>
> >> >> Regards,
> >> >> Jan
> >> >
> >> > It looks like these oopses were hit in the common block layer code
> >> > called from the usb mass storage driver.
> >> >
> >> > Could you do some tests to narrow down the issue?
> >> >
> >> >  1) Use "nogc" mount option to see whether the oops depends on the
> >> >    context of garbage collection or not:
> >> >
> >> >   # mount -t nilfs2 -o nogc /dev/sdb1 /your-mount-dir
> >> >
> >> >  2) Mount the partition read-only with "norecovery" option and make
> >> >    read accesses to the filesystem as below:
> >> >
> >> >   # mount -t nilfs2 -o ro,norecovery /dev/sdb1 /your-mount-dir
> >> >   # find /your-mount-dir -type f -exec cat {} > /dev/null \;
> >> >
> >> >  3) Try to read the block device directly with "dd":
> >> >
> >> >   # dd if=/dev/sdb1 bs=4k > /dev/null
> >> >
> >> >  4) Try the lssu and lscp commands on the read-only mount to do quick
> >> >    sanity checks of the metadata files.
> >> >
> >> >   # mount -t nilfs2 -o ro,norecovery /dev/sdb1 /your-mount-dir
> >> >   # lssu -a
> >> >   # lscp
> >> >
> >> >
> >> > Regards,
> >> > Ryusuke Konishi
> >> >
> >>
> >> Thank you for your reply and suggestions. I have tried the following:
> >>
> >> # mount -t nilfs2 -o nogc /dev/sdb1 /your-mount-dir
> >>     Results in exactly the same kernel panic.
> >>
> >> # mount -t nilfs2 -o ro,norecovery /dev/sdb1 /your-mount-dir
> >> # find /your-mount-dir -type f -exec cat {} > /dev/null \;
> >>     Doesn't trigger the oops. I was able to retrieve my data but
> >> haven't checked them for correctness yet.
> >>
> >> # lssu -a
> >>     http://fijam.eu.org/other/lssu
> >> # lscp
> >>     http://fijam.eu.org/other/lscp
> >>
> >> # dd if=/dev/sdb1 bs=4k > /dev/null
> >>     Likewise, it doesn't trigger the oops.
> >>
> >> Is there anything else I could do to help?
> >>
> >> Regards,
> >> Jan
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html