Re: RAID creation resync behaviors

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Thu, 4 May 2017 16:50:38 +0100

On 04/05/17 02:54, Shaohua Li wrote:
> On Wed, May 03, 2017 at 11:06:01PM +0200, David Brown wrote:
>> On 03/05/17 22:27, Shaohua Li wrote:
>>> Hi,
>>>
>>> Currently we have different resync behaviors in array creation.
>>>
>>> - raid1: copy data from disk 0 to disk 1 (overwrite)
>>> - raid10: read both disks, compare and write if there is difference (compare-write)
>>> - raid4/5: read first n-1 disks, calculate parity and then write parity to the last disk (overwrite)
>>> - raid6: read all disks, calculate parity and compare, and write if there is difference (compare-write)
>>>
>>> Writing the whole disk is very unfriendly to SSDs, because it reduces their
>>> lifetime. And if the user has already done a trim before creation, the
>>> unnecessary writes could make the SSD slower in the future. Could we prefer
>>> compare-write to overwrite if mdadm detects that the disks are SSDs? Surely
>>> compare-write is sometimes slower than overwrite, so maybe add a new option
>>> to mdadm. An option to let mdadm trim the SSDs before creation sounds
>>> reasonable too.
>>>
>>
>> When doing the first sync, md tracks how far its sync has got, keeping a
>> record in the metadata in case it has to be restarted (such as due to a
>> reboot while syncing).  Why not simply /not/ sync stripes until you first
>> write to them?  It may be that a counter of synced stripes is not enough,
>> and you need a bitmap (like the write intent bitmap), but it would reduce
>> the creation sync time to 0 and avoid any writes at all.
> 
> For raid 4/5/6, this means we must always do a full stripe write for any normal
> write that hits a range not yet synced. This would harm the performance of
> normal writes. For raid1/10, this sounds more appealing. But each bit in the
> bitmap stands for a range. If only part of the range is written by normal IO,
> we have two choices. Sync the range immediately and clear the bit: this sync
> will impact normal IO. Or don't sync immediately: but since the bit is still
> set (which means the range isn't synced), read IO can only access the first
> disk, which is harmful too.
> 
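
For reference, here is a toy sketch of the two creation-resync strategies
quoted above, applied to a two-disk mirror. It is only a model, not md's
code: the disks are plain Python lists, the function names are made up for
illustration, and it assumes both trimmed SSDs read back identical data
(zeroes), which not every device guarantees.

```python
# Toy model of "overwrite" vs "compare-write" resync on a two-disk mirror.
# Disks are lists of chunk values; none of this is md's actual code.

def resync_overwrite(src, dst):
    """Copy every chunk from src to dst; return the number of chunk writes."""
    writes = 0
    for i, chunk in enumerate(src):
        dst[i] = chunk
        writes += 1
    return writes

def resync_compare_write(src, dst):
    """Read both disks and write only the chunks that differ."""
    writes = 0
    for i, chunk in enumerate(src):
        if dst[i] != chunk:
            dst[i] = chunk
            writes += 1
    return writes

# Assumption: two freshly trimmed SSDs that both read back zeroes.
trimmed_a, trimmed_b = [0] * 8, [0] * 8
overwrite_writes = resync_overwrite(list(trimmed_a), list(trimmed_b))
compare_writes = resync_compare_write(list(trimmed_a), list(trimmed_b))
print(overwrite_writes, compare_writes)  # 8 0
```

In this model compare-write issues no writes when the disks already match,
at the price of reading both disks; overwrite always writes every chunk.
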
We're creating the array, right? So the user is sitting in front of
mdadm looking at its output, right?

So we just print a message saying "the disks aren't sync'd. If you don't
want a performance hit in normal use, fire up a sync now and take the
hit up front".

The question isn't "how do we avoid a performance hit?", it's "we're
going to take a hit, do we take it up-front on creation or defer it
until we're using the array?".
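
To make the up-front versus deferred trade-off concrete, here is a minimal
sketch of a mirror that skips its initial sync and keeps one "unsynced" bit
per range of chunks. The class, its methods and the sizes are all
hypothetical, and it only models the "sync the range immediately on first
write" choice described above; it is not how md works today.

```python
class LazyMirror:
    """Hypothetical two-disk mirror that defers its initial sync."""

    def __init__(self, nchunks, chunks_per_bit):
        # The two disks start out with different contents (unsynced).
        self.disks = [[0] * nchunks, [1] * nchunks]
        self.chunks_per_bit = chunks_per_bit
        nbits = -(-nchunks // chunks_per_bit)  # ceiling division
        self.unsynced = [True] * nbits         # one bit per range
        self.sync_copies = 0                   # chunks copied by sync

    def _sync_range(self, bit):
        """Copy one range from disk 0 to disk 1 and clear its bit."""
        start = bit * self.chunks_per_bit
        end = min(start + self.chunks_per_bit, len(self.disks[0]))
        for i in range(start, end):
            self.disks[1][i] = self.disks[0][i]
            self.sync_copies += 1
        self.unsynced[bit] = False

    def sync_all(self):
        """Take the hit up front: sync every range still marked unsynced."""
        for bit, dirty in enumerate(self.unsynced):
            if dirty:
                self._sync_range(bit)

    def write(self, chunk, value):
        """Normal write; the first write into a range pays for its sync."""
        bit = chunk // self.chunks_per_bit
        if self.unsynced[bit]:
            self._sync_range(bit)  # deferred hit, lands on this write
        for disk in self.disks:
            disk[chunk] = value

upfront = LazyMirror(16, 4)
upfront.sync_all()        # all the sync cost is paid at creation
upfront.write(5, 42)

deferred = LazyMirror(16, 4)
deferred.write(5, 42)     # the sync cost for this range is paid here
print(upfront.sync_copies, deferred.sync_copies)  # 16 4
```

In this toy model the deferred array copies fewer chunks overall, but every
copy it does make happens in the middle of a normal write, which is exactly
the hit the user would be choosing to take later instead of at creation.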

Cheers,
Wol
