On Fri, 2012-11-16 at 10:11 +0300, Сергей Александров wrote:
> dmesg:
> [53994.254432] NILFS warning: mounting unchecked fs
> [56686.968229] NILFS: recovery complete.
> [56686.969316] segctord starting. Construction interval = 5 seconds,
> CP frequency < 30 seconds
>
> messages:
> Nov 15 10:57:06 router kernel: [53994.254432] NILFS warning: mounting
> unchecked fs
> Nov 15 11:42:02 router kernel: [56686.968229] NILFS: recovery complete.
> Nov 15 11:42:02 router kernel: [56686.969316] segctord starting.
> Construction interval = 5 seconds, CP frequency < 30 seconds
>
> Maybe there is some kernel config option to get more debug output?
>

I am afraid that is all we can get from the NILFS2 driver currently. So,
as I understand it, there are no messages about detected corruption. The
situation needs further analysis. It may also make sense to enable some
options from the "Kernel hacking" part of the kernel configuration
(perhaps the synchronization-related ones).

> As for fsck, I have not found it in git public repo, so where can I
> get the latest version?

Unfortunately, it is available only as a patch set. I sent the latest
version (v4) to the mailing list on November 12.

With the best regards,
Vyacheslav Dubeyko.

> --------------------------------------------------
> Александров Сергей Васильевич
>
>
> 2012/11/16 Vyacheslav Dubeyko <slava@xxxxxxxxxxx>:
> > On Fri, 2012-11-16 at 09:40 +0300, Сергей Александров wrote:
> >> Sorry, but I didn't save top output this time..
> >> But for sure, it was the "mount /dev/md0 /nfs/raid -o ...." process.
> >> The CPU load was fully in kernel space.
> >> So during the mount call, the kernel was doing something very IO-
> >> and CPU-intensive for almost 50 minutes.
> >> As I have already written, the load was about 80MB/s read IO
> >> according to iotop, and about 60% of the first CPU core according
> >> to top.
> >>
> >
> > Ok. I see.
> >
> > I currently suspect that you have some particular corruption of the
> > volume state that results in such a long running time of the
> > recovery code. If so, the recovery subsystem may have left warning
> > messages in the system log (though maybe not, of course). As far as
> > I know, Gentoo has a special log that keeps error and warning
> > messages from the kernel. Could you check whether the dmesg output
> > you shared contains error messages from the kernel?
> >
> > Moreover, the current functionality of fsck.nilfs2 is not very
> > useful yet, but it can check the validity of superblocks and segment
> > summary headers. Maybe it makes sense to check your volume with
> > fsck.nilfs2. Could you try to check your volume?
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> >
> >> If this info is not sufficient I'll try to reproduce the case as
> >> soon as possible.
> >> --------------------------------------------------
> >> Александров Сергей Васильевич
> >>
> >>
> >> 2012/11/16 Vyacheslav Dubeyko <slava@xxxxxxxxxxx>:
> >> > On Thu, 2012-11-15 at 16:08 +0300, Сергей Александров wrote:
> >> >> lssu, lscp after mount. Actually I missed the moment and
> >> >> nilfs_cleanerd has cleaned some data.
> >> >> Mount took about 50 minutes.
> >> >>
> >> >
> >> > Thank you for the info.
> >> >
> >> > I have some additional questions after thinking about the issue.
> >> > As I remember, you wrote that you tried to understand which
> >> > process eats CPU time during the issue, but you didn't share
> >> > details about it. Could you share the "top" and "ps ax" outputs
> >> > captured while reproducing the issue?
> >> >
> >> > With the best regards,
> >> > Vyacheslav Dubeyko.
> >> >
> >> >> --------------------------------------------------
> >> >> Александров Сергей Васильевич
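
As a quick cross-check on the mount duration, the kernel timestamps in
the dmesg output quoted in this thread can simply be subtracted. The
following is a minimal Python sketch, not part of any NILFS tooling: the
helper names `parse` and `elapsed` are made up for illustration, and the
log text is copied from the quoted dmesg lines.

```python
import re

# Kernel log lines copied from the dmesg output quoted in this thread.
LOG = """\
[53994.254432] NILFS warning: mounting unchecked fs
[56686.968229] NILFS: recovery complete.
[56686.969316] segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
"""

# A dmesg line starts with "[seconds.microseconds]" since boot.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse(log):
    """Return (seconds_since_boot, message) pairs for each stamped line."""
    entries = []
    for line in log.splitlines():
        match = STAMP.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def elapsed(entries, start_text, end_text):
    """Seconds between the first message containing start_text and the
    first message containing end_text (hypothetical helper)."""
    start = next(t for t, msg in entries if start_text in msg)
    end = next(t for t, msg in entries if end_text in msg)
    return end - start


entries = parse(LOG)
seconds = elapsed(entries, "mounting unchecked fs", "recovery complete")
print(f"recovery took {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

For the lines above the difference works out to roughly 45 minutes,
which is in line with the "about 50 minutes" reported for the mount.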