RE: help about ext3 read-only issue on ext3(2.6.16.30)

"Peng, Tao" <tao.peng@xxxxxxx> · Fri, 14 Dec 2012 03:32:32 +0000



> -----Original Message-----
> From: linux-fsdevel-owner@xxxxxxxxxxxxxxx [mailto:linux-fsdevel-owner@xxxxxxxxxxxxxxx] On Behalf Of Li
> Zefan
> Sent: Wednesday, December 12, 2012 7:31 PM
> To: Jan Kara
> Cc: qixuan wu; Tao Ma; Theodore Ts'o; Eric Sandeen; Yafang Shao; linux-fsdevel@xxxxxxxxxxxxxxx; linux-
> ext4@xxxxxxxxxxxxxxx; wuqixuan@xxxxxxxxxx; xieshuangyi@xxxxxxxxxx
> Subject: Re: help about ext3 read-only issue on ext3(2.6.16.30)
> 
> On 2012/12/12 18:04, Jan Kara wrote:
> > On Tue 11-12-12 16:01:51, Li Zefan wrote:
> >>>>> We have already dump of the data by debugfs. The data is very good
> >>>>> without error. But we just did it before fsck, even the fsck is not
> >>>>> giving any error. I want to know whether fsck will modify disk data
> >>>>> without reporting any error or not ?
> >>>>   Ah, OK. So it seems that directory block is OK, just  f_pos gets corrupted
> >>>> somehow. There are guards in ext3_readdir() to rescan dir block when
> >>>> directory is modified but maybe that's not working correctly. I don't want
> >>>> to burn too much time on this since this is so ancient kernel but I'd be
> >>>> looking in that direction...
> >>>>
> >>>
> >>> I've added some debug code into ext3, which does these things:
> >>> - dump the dir block
> >>> - print the current and last f_pos and offset
> >>> - dump_stack() to see which process triggers the bug
> >>>
> >>> Hope we can trigger the bug in our labs (we did see this happen twice this week
> >>> in a lab), though we can't patch the kernel in the products.
> >>>
> >>> I compared ext3_readdir() with latest ext3, and saw no difference except some
> >>> API changes. I'll dig deeper. Thanks for the suggestion!
> >>>
> >>
> >> We've managed to trigger the bug once, and collected some debug information. We
> >> found the buffer head wasn't corrupted, but f_pos was set to 4024 and then ext3
> >> reported error.
> >>
> >> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #12747345: rec_len is smaller than minimal - offset=4024, inode=0, rec_len=0, name_len=0
> >> Aborting journal on device sda7.
> >> ext3_abort called.
> >> EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
> >> Remounting filesystem read-only
> >>
> >> 00000000: 51 82 c2 00 0c 00 01 02 2e 00 00 00 04 80 c2 00  Q...............
> >> 00000010: 0c 00 02 02 2e 2e 00 00 d6 80 c2 00 10 00 06 02  ................
> >> 00000020: 62 61 63 6b 75 70 00 00 bb 82 c2 00 1c 00 11 01  backup..........
> >> 00000030: 4d 6f 6e 69 74 6f 72 53 65 72 76 69 63 65 2e 6f  MonitorService.o
> >> 00000040: 70 00 00 00 be 82 c2 00 1c 00 13 01 43 6f 6d 70  p...........Comp
> >> 00000050: 6c 61 69 6e 74 50 72 6f 63 65 73 73 2e 6f 70 00  laintProcess.op.
> >> 00000060: c2 82 c2 00 20 00 15 01 4c 6f 63 61 74 69 6f 6e  .... ...Location
> >> 00000070: 50 72 65 50 72 6f 63 65 73 73 2e 6f 70 00 00 00  PreProcess.op...
> >> 00000080: c9 82 c2 00 18 00 0f 01 4e 6f 72 74 68 50 72 6f  ........NorthPro
> >> 00000090: 63 65 73 73 2e 6f 70 00 d4 82 c2 00 18 00 0d 01  cess.op.........
> >> 000000a0: 53 79 73 4d 6f 6e 69 74 6f 72 2e 6f 70 00 00 00  SysMonitor.op...
> >> 000000b0: db 82 c2 00 1c 00 13 01 56 56 49 50 4e 6f 72 74  ........VVIPNort
> >> 000000c0: 68 50 72 6f 63 65 73 73 2e 6f 70 00 e1 82 c2 00  hProcess.op.....
> >> 000000d0: 34 0f 09 01 72 61 6e 73 61 75 2e 6f 70 00 00 00  4...ransau.op...
> >> 000000e0: 4f 83 c2 00 20 0f 1e 01 72 61 6e 73 61 75 2e 6f  O... ...ransau.o
> >> 000000f0: 70 2e 32 30 31 32 31 32 31 30 30 32 30 39 32 34  p.20121210020924
> >> 00000100: 34 35 31 33 39 34 00 00 79 83 c2 00 f8 0e 18 01  451394..y.......
> >> 00000110: 72 61 6e 73 61 75 2e 6f 70 2e 32 30 31 32 31 32  ransau.op.201212
> >> 00000120: 31 30 30 32 30 39 32 34 00 00 00 00 00 00 00 00  10020924........
> >> ...
> >> 00000ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>
> >> last_offset=-1, last_fpos=-1, f_pos=4024
> >>
> >> -1 means we hit the bug in the first iteration of the inner while loop in
> >> ext3_readdir().
> >>
> >> I've checked how ext3_readdir() works and how f_pos, f_version and i_version
> >> get initialized and modified. Now I'm lost. I really can't see how f_pos got
> >> corrupted. :(
> >   Hum, it looks really curious. So f_pos has been 4024 when we entered
> > ext3_readdir()?
> 
> dunno. but what else can be
> 
> > Do you know what it was when we last left ext3_readdir()
> > for that filp? You can store that value in some debug entry added to struct
> > file... Also any chance we ever hit:
> >                                 if (version != filp->f_version)
> >                                         goto revalidate;
> > I don't think it can ever happen since we hold i_mutex and
> > generic_file_llseek() takes i_mutex as well. But better be sure.
> >
> 
> Yesterday I added more debug aids, which cover all the information mentioned
> above. Actually the code tracks all the places that change f_pos, and
> I think only lseek() and readdir() can change it.
> 
> Now I'm waiting for the bug to happen again, can be several days...
Not sure if this is related, but I've seen badly behaved apps calling seekdir() at random offsets and causing Lustre client oopses. We fixed it by also checking whether f_version is 0, because a zero f_version can pass the (filp->f_version != inode->i_version) check in the Lustre client when a dir is lseek()ed before i_version changes. I'm not sure whether ext3 has the same problem, though. It may help to print f_version and i_version when you reproduce it.

Thanks,
Tao