Re: very large mount time after unexpected power down

Vyacheslav Dubeyko <slava@xxxxxxxxxxx> · Fri, 16 Nov 2012 11:37:16 +0400

On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
> dmesg:
> [53994.254432] NILFS warning: mounting unchecked fs
> [56686.968229] NILFS: recovery complete.
> [56686.969316] segctord starting. Construction interval = 5 seconds,
> CP frequency < 30 seconds
> 
> messages:
> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
> unchecked fs
> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
> Construction interval = 5 seconds, CP frequency < 30 seconds
> 
> Maybe there is some kernel config option to get more debug output?
> 

I am afraid that is all we can currently get from the NILFS2 driver.
So, as I understand it, there are no messages about detected corruption.
The situation needs further analysis.
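
Even without more debug output, the two kernel timestamps quoted above bound how long recovery ran. The following is a minimal Python sketch of that arithmetic, not anything taken from the NILFS2 tools. The function name `recovery_seconds` and the constant `READ_MB_PER_S` are illustrative assumptions; the log lines and the "about 80MB/s" read rate are copied from this thread.

```python
import re

# dmesg lines copied from the report in this thread.
DMESG = """\
[53994.254432] NILFS warning: mounting unchecked fs
[56686.968229] NILFS: recovery complete.
[56686.969316] segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
"""

# Matches "[seconds.micros] message" as printed by dmesg.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Read rate reported from iotop in this thread ("about 80MB/s"); approximate.
READ_MB_PER_S = 80


def recovery_seconds(log):
    """Return seconds between the 'unchecked fs' warning and 'recovery complete'."""
    start = end = None
    for line in log.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "mounting unchecked fs" in msg:
            start = ts
        elif "recovery complete" in msg and start is not None:
            end = ts
    if start is None or end is None:
        return None
    return end - start


elapsed = recovery_seconds(DMESG)
print(f"recovery took {elapsed:.1f} s (~{elapsed / 60:.1f} min)")
# Rough volume of data read, assuming the iotop rate held for the whole time.
print(f"approx. data read: {elapsed * READ_MB_PER_S / 1000:.0f} GB")
```

For the quoted log this gives roughly 2692.7 seconds, i.e. about 45 minutes, which agrees with the syslog wall-clock times (10:57:06 to 11:42:02). The data-volume figure is only an estimate, since the read rate was reported as approximate.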

But it may make sense to enable some options from the "Kernel hacking"
section of the kernel configuration (perhaps the synchronization-related
ones).
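
As a rough illustration of that suggestion, the Python sketch below checks a kernel config text for a few lock-debugging options. The option list is an assumption: this mail does not name specific options, and the ones shown are simply common "Kernel hacking" entries related to locking and hung tasks. The helper `missing_options` and the `SAMPLE` config are hypothetical.

```python
# Assumed option list; the mail above does not name specific options.
LOCK_DEBUG_OPTIONS = (
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
    "CONFIG_DETECT_HUNG_TASK",
)


def missing_options(config_text, wanted=LOCK_DEBUG_OPTIONS):
    """Return the wanted options that are not set to 'y' or 'm' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in wanted if opt not in enabled]


# Hypothetical config fragment, for demonstration only.
SAMPLE = """\
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_DEBUG_SPINLOCK=y
"""

print(missing_options(SAMPLE))
```

On a real system the config text could come from the `.config` file of the kernel build tree, or from `/proc/config.gz` if the kernel was built with that feature enabled.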

> As for fsck, I have not found it in git public repo, so where can I
> get the latest version?

Unfortunately, it is available only as a patch set. I sent the
latest version (v4) to the mailing list on November 12.

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/11/16 Vyacheslav Dubeyko <slava@xxxxxxxxxxx>:
> > On Fri, 2012-11-16 at 09:40 +0300, Сергей Александров wrote:
> >> Sorry, but I didn't save top output this time..
> >> But for sure, it was "mount /dev/md0 /nfs/raid -o ...." process. The
> >> CPU load was fully in kernel space.
> >> So during the mount call, the kernel was doing something both very IO
> >> and CPU intensive for almost 50 minutes.
> >> As I have already written the load was about 80MB/s read IO according
> >> to iotop, and about 60% of the first CPU core according to top.
> >>
> >
> > Ok. I see.
> >
> > I currently suspect that your volume state has some particular
> > corruption that results in such a long running time of the recovery
> > code. But if so, there may be warning messages from the recovery
> > subsystem in the system log (or maybe not, of course). As far as I know,
> > Gentoo has a special log that keeps error and warning messages from the
> > kernel. Could you check whether the dmesg output you shared contains
> > error messages from the kernel?
> >
> > Moreover, the current functionality of fsck.nilfs2 is not very useful
> > yet. But it can check the validity of superblocks and segment summary
> > headers. Maybe it makes sense to check your volume with fsck.nilfs2.
> > Could you try to check your volume?
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> >
> >> If this info is not sufficient I'll try to reproduce the case as soon
> >> as possible.
> >> --------------------------------------------------
> >> Александров Сергей Васильевич
> >>
> >>
> >> 2012/11/16 Vyacheslav Dubeyko <slava@xxxxxxxxxxx>:
> >> > On Thu, 2012-11-15 at 16:08 +0300, Сергей Александров wrote:
> >> >> lssu, lscp after mount. Actually I missed the moment and
> >> >> nilfs_cleanerd has cleaned some data.
> >> >> Mount took about 50 minutes.
> >> >>
> >> >
> >> > Thank you for info.
> >> >
> >> > I have some additional questions after thinking about the issue. As I
> >> > remember, you wrote that you tried to find out which process eats CPU
> >> > time during the issue. But you didn't share details about it. Could you
> >> > share the "top" and "ps ax" output from when the issue is reproduced?
> >> >
> >> > With the best regards,
> >> > Vyacheslav Dubeyko.
> >> >
> >> >> --------------------------------------------------
> >> >> Александров Сергей Васильевич
> >> >>
> >> >>
> >
> >
