Hi,

On 2019/6/28 18:57, Mathias G wrote:
> Hi Song
>
> On 21.06.19 21:51, Mathias G wrote:
>>> Question: are you running the two drives with write cache on?
>>> If yes, and if your application is not heavy on writes, could you try
>>> turn off HDD write cache and see if the issue repros?
>> Thanks for this. I just disabled the write cache with hdparm in rc.local
>> for both RAID members and will let you know if the problem occurs again.
>
> Today the problem occurred again:
>
> kern.log
>> Jun 28 12:39:11 $hostname kernel: [    2.098096] md/raid1:md0: not clean -- starting background reconstruction
>> Jun 28 12:39:11 $hostname kernel: [    2.098099] md/raid1:md0: active with 2 out of 2 mirrors

"not clean" means the resync has not been completed yet. Was the array
also "not clean" in the previous boot/reboot, or not?

If it is "not clean" only in the boot where the problem reproduces, that
may mean the final update of the MD superblock was lost, which would
make it possible for the events counter in the bitmap superblock to lag
behind.

If the array was also "not clean" in the previous boot/reboot, could you
please check when the status of the array changes from "clean" to "not
clean"? (Some commands for checking this are sketched in the P.S. below.)

Is the RAID array (md0) used as the rootfs or for some other filesystem?
And how do you reproduce the problem? Just by rebooting continuously
until it reoccurs, rather than by a sudden power cut?

Regards,
Tao

>> Jun 28 12:39:11 $hostname kernel: [    2.098201] md0: bitmap file is out of date (236662 < 236663) -- forcing full recovery
>> Jun 28 12:39:11 $hostname kernel: [    2.098252] md0: bitmap file is out of date, doing full recovery
>
> And the write cache is disabled for both RAID members:
>> # hdparm -i /dev/sdb |grep WriteCache
>>  AdvancedPM=yes: disabled (255) WriteCache=disabled
>
>> # hdparm -i /dev/sdc |grep WriteCache
>>  AdvancedPM=no WriteCache=disabled
>
> I'm a little at a loss..
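
P.S. In case it helps, a rough sketch of how the array state and the
events counters could be checked (assuming an internal bitmap, and the
device names md0/sdb from your logs; adjust as needed):

    # current array state ("clean", "active", ...)
    cat /sys/block/md0/md/array_state

    # log the state with timestamps, to see when it leaves "clean"
    while true; do
        echo "$(date '+%F %T') $(cat /sys/block/md0/md/array_state)"
        sleep 1
    done

    # events counter in the MD superblock of a member device
    mdadm --examine /dev/sdb | grep Events

    # events counter in the bitmap superblock of the same member
    mdadm --examine-bitmap /dev/sdb | grep Events

If the bitmap "Events" falls behind the superblock "Events" just before
a reboot, the "bitmap file is out of date" message on the next boot
would be the expected result.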