Re: very large mount time after unexpected power down

Vyacheslav Dubeyko <slava@xxxxxxxxxxx> · Tue, 30 Oct 2012 20:35:41 +0300

On Oct 30, 2012, at 6:02 PM, Сергей Александров wrote:

> --------------------------------------------------
> Александров Сергей Васильевич
> 
> 
> 2012/10/30 Vyacheslav Dubeyko <slava@xxxxxxxxxxx>:
>> On Tue, 2012-10-30 at 17:30 +0300, Сергей Александров wrote:
>>> 
>>> 2012/10/30 Vyacheslav Dubeyko <slava@xxxxxxxxxxx>:
>>>> Hi,
>>>> 
>>>> On Tue, 2012-10-30 at 16:20 +0300, Сергей Александров wrote:
>>>>> Good day!
>>>>> 
>>>>> I've got a nilfs2 partition on a 1TB md RAID1 array composed of two
>>>>> HDDs. Kernel 3.5.3, userspace utils v2.1.1, Gentoo Linux
>>>>> distribution.
>>>>> I have just updated the utils to 2.1.4, but there has been no failure
>>>>> since.
>>>>> 
>>>>> After a power loss, mount takes several hours.
>>>>> 
>>>> 
>>>> What about RAID1 consistency? Could you describe your RAID
>>>> configuration in more detail?
>>> 
>>> # cat /proc/mdstat
>>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>>> md0 : active raid1 sdb1[0] sdc1[2]
>>>      976760400 blocks super 1.2 [2/2] [UU]
>>> 
>>> So the RAID is consistent. Read speed from the md device is about
>>> 60 MB/s according to iostat.
>>> 
>>>>> The first time, I thought it would not mount at all and tried an
>>>>> fsck tool that I found somewhere on the internet (I don't really
>>>>> remember where). It reported that the superblock is OK.
>>>> 
>>>> I am implementing the fsck tool for NILFS2. I guess you took the
>>>> sources from the NILFS2 mailing list.
>>>> 
>>>>> Then I commented out the check in the source file, and the default
>>>>> number of blocks to check turned out to be too small. It failed to
>>>>> find the next superblock. I increased the number, but even
>>>>> multiplying it by 100 didn't help.
>>>> 
>>>> Sorry, I don't understand which sources you are talking about. Could
>>>> you describe in more detail what you commented out and where?
>>>> 
>>> I forced test_latest_log to return a negative result and changed
>>> MAX_SCAN_SEGMENTS to 100000.
>>> That was not enough. It finished without finding the SB.
>>> 
>>> 
>>> The load from fsck was the same as from mount.
>>> About 60MB/s read from md device and about 30% load on one core.
>>> 
>>>>> So the reserved SB is probably too far away, and it takes too
>>>>> long to find it.
>>>>> 
>>>> 
>>>> If you are trying to find the second superblock, it is located at the
>>>> beginning of the last 4 KB of the volume. Your device size is
>>>> 1000202649600 bytes.
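
To make the location concrete, here is a minimal sketch of the offset arithmetic. It assumes the rule matches the NILFS_SB2_OFFSET_BYTES() macro in the nilfs2 headers as I understand it (round the device size down to a 4 KiB boundary, then step back one 4 KiB block). The function name is only illustrative.

```python
# Sketch: byte offset of the NILFS2 second (backup) superblock.
# Assumption: same arithmetic as NILFS_SB2_OFFSET_BYTES() in the nilfs2
# headers: ((devsize >> 12) - 1) << 12.

def sb2_offset_bytes(dev_size: int) -> int:
    """Return the start of the last full 4 KiB block of the device."""
    return ((dev_size >> 12) - 1) << 12


DEV_SIZE = 1000202649600  # device size quoted in this thread, in bytes

if __name__ == "__main__":
    offset = sb2_offset_bytes(DEV_SIZE)
    print(offset)             # 1000202645504
    print(DEV_SIZE - offset)  # 4096
```

So on this device the second superblock would sit at a fixed, directly computable offset, and no scanning should be needed to reach it.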
>>>> 
>>>>> Does anybody know how this can be sped up? I know a UPS is a
>>>>> solution, but I consider this to be a bug.
>>>>> 
>>>> 
>>>> Could you share more details about the situation during the mount
>>>> operation? I mean: (1) NILFS2-related messages in the system log;
>>>> (2) "ps ax" output; (3) maybe "top" output can be useful too;
>>>> (4) "mount" output before trying to mount the NILFS2 volume.
>>> The latest occurrence:
>>> 
>>> messages log:
>>> Oct 30 12:18:52 router kernel: [  159.674579] NILFS warning: mounting unchecked fs
>>> .....
>>> .....
>>> Oct 30 13:03:06 router kernel: [ 2810.304245] NILFS: recovery complete.
>>> Oct 30 13:03:06 router kernel: [ 2810.325240] segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>>> Oct 30 13:03:07 router nilfs_cleanerd[15453]: start
>>> Oct 30 13:03:07 router nilfs_cleanerd[15453]: pause (clean check)
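
As a cross-check of the recovery time, here is a small sketch that computes the interval between the "mounting unchecked fs" and "recovery complete" lines from the log above. The regular expression and the function names are my own assumptions about the syslog layout, and the year 2012 is taken from the date of this thread.

```python
import re
from datetime import datetime

# The two kernel lines quoted in the log above (unwrapped).
LOG = """\
Oct 30 12:18:52 router kernel: [  159.674579] NILFS warning: mounting unchecked fs
Oct 30 13:03:06 router kernel: [ 2810.304245] NILFS: recovery complete.
"""

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> kernel: [<uptime>] <msg>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: \[\s*(\d+\.\d+)\]")


def parse_line(line):
    """Return (wall-clock datetime, kernel uptime in seconds) or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    wall = datetime.strptime("2012 " + m.group(1), "%Y %b %d %H:%M:%S")
    return wall, float(m.group(2))


def recovery_seconds(log):
    """Elapsed time between the first and last parsable kernel lines."""
    stamps = [p for p in map(parse_line, log.splitlines()) if p is not None]
    (wall0, up0), (wall1, up1) = stamps[0], stamps[-1]
    return (wall1 - wall0).total_seconds(), up1 - up0


if __name__ == "__main__":
    wall, uptime = recovery_seconds(LOG)
    print(f"wall clock: {wall / 60:.1f} min, kernel uptime: {uptime / 60:.1f} min")
```

Both clocks give roughly 44 minutes, which agrees with the "about 45 minutes" reported for this mount.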
>>> 
>> 
>> Could you share the content of your /etc/nilfs_cleanerd.conf file?
> 
> 
> protection_period       3600
> min_clean_segments      10%
> max_clean_segments      20%
> clean_check_interval    10
> selection_policy        timestamp
> nsegments_per_clean     4
> mc_nsegments_per_clean  8
> cleaning_interval       5
> mc_cleaning_interval    2
> retry_interval          30
> use_mmap
> log_priority            info
> 
>> Could you try to reproduce the issue with log_priority raised to debug
>> level (I mean the option in nilfs_cleanerd.conf) and share the messages
>> log again?
> 
> I can, but a bit later, if you don't mind (in 24 hours).
> 

Please try running mount under "strace". The strace output could be useful, so please share it afterwards.
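
For example, a minimal sketch of one way to capture the trace. The device /dev/md0 comes from the mdstat output quoted in this thread; the mount point /mnt/nilfs, the output file name and the helper names are only assumptions for illustration. The strace options used are -f (follow child processes), -tt (timestamps), -T (time spent in each syscall) and -o (write to a file).

```python
import shlex
import subprocess


def strace_mount_argv(device, mountpoint, out_file="mount.strace"):
    """Build an strace command line that wraps a nilfs2 mount."""
    return [
        "strace", "-f", "-tt", "-T", "-o", out_file,
        "mount", "-t", "nilfs2", device, mountpoint,
    ]


def run_traced_mount(device, mountpoint, out_file="mount.strace"):
    """Run the traced mount (needs root); raises if the command fails."""
    subprocess.run(strace_mount_argv(device, mountpoint, out_file), check=True)


if __name__ == "__main__":
    # /mnt/nilfs is a hypothetical mount point; adjust it to your setup.
    print(shlex.join(strace_mount_argv("/dev/md0", "/mnt/nilfs")))
```

The script only prints the command by default; call run_traced_mount() as root to actually execute it.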

Thanks,
Vyacheslav Dubeyko.

>>> It took about 45 minutes.
>>> The previous time it took more than 4 hours.
>> 
>> You mean that the console returns to the prompt 45 minutes after you
>> run mount. Am I correct?
> 
> Yes, you are correct.
> 
> --------------------------------------------------
> Aleksandrov Sergey Vasil'evich
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
