Re: raid0 vs. mkfs

On 11/27/2016 07:09 PM, Coly Li wrote:
On 2016/11/27 11:24 PM, Avi Kivity wrote:
mkfs /dev/md0 can take a very long time if /dev/md0 is a very large
disk that supports TRIM/DISCARD (erase whichever is inappropriate).
That is because mkfs issues a TRIM/DISCARD (erase whichever is
inappropriate) for the entire partition. As far as I can tell, md
converts the large TRIM/DISCARD (erase whichever is inappropriate) into
a large number of TRIM/DISCARD (erase whichever is inappropriate)
requests, one per chunk-size worth of disk, and issues them to the RAID
components individually.


It seems to me that md can convert the large TRIM/DISCARD (erase
whichever is inappropriate) request it gets into one TRIM/DISCARD (erase
whichever is inappropriate) per RAID component, converting an O(disk
size / chunk size) operation into an O(number of RAID components)
operation, which is much faster.
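
To make the difference between the two splitting strategies concrete, here is a minimal userspace sketch in Python. It is not md code: it assumes a plain RAID0 layout with equally sized members (a single zone), sizes expressed in sectors, and the function names are made up for illustration. `per_chunk_requests` models the one-request-per-chunk behaviour described above, and `per_device_extents` computes one contiguous extent per member directly.

```python
def per_chunk_requests(start, length, chunk, ndev):
    """Model of the slow path: one (device, device_sector, sectors) per chunk touched."""
    reqs, pos, end = [], start, start + length
    while pos < end:
        k, off = divmod(pos, chunk)          # chunk index and offset inside it
        n = min(chunk - off, end - pos)      # stay inside this chunk and the range
        reqs.append((k % ndev, (k // ndev) * chunk + off, n))
        pos += n
    return reqs


def per_device_extents(start, length, chunk, ndev):
    """Model of the proposal: one (device_start, device_length) per member device."""
    if length <= 0:
        return {}
    end = start + length
    k0 = start // chunk                      # first chunk touched
    k1 = (end - 1) // chunk                  # last chunk touched
    extents = {}
    for dev in range(ndev):
        kf = k0 + (dev - k0) % ndev          # first touched chunk living on dev
        kl = k1 - (k1 - dev) % ndev          # last touched chunk living on dev
        if kf > kl:
            continue                         # range never reaches this device
        # Only the very first and very last chunk of the range can be partial.
        dev_start = (kf // ndev) * chunk + (start % chunk if kf == k0 else 0)
        dev_end = (kl // ndev) * chunk + ((end - 1) % chunk + 1 if kl == k1 else chunk)
        extents[dev] = (dev_start, dev_end - dev_start)
    return extents


if __name__ == "__main__":
    chunk = 1024                             # 512 KiB chunks in 512-byte sectors
    ndev = 4
    length = 1 << 24                         # a modest 8 GiB range, in sectors
    print(len(per_chunk_requests(0, length, chunk, ndev)), "per-chunk requests")
    print(len(per_device_extents(0, length, chunk, ndev)), "per-device requests")
```

The per-device extents are contiguous because consecutive chunks on one member sit in consecutive stripes, so only the first and last chunk of the range need special handling. A real implementation would also have to cope with multiple zones when members differ in size, which this sketch ignores.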


I observed this with mkfs.xfs on a RAID0 of four 3TB NVMe devices, with
the operation taking about a quarter of an hour, continuously pushing
half-megabyte TRIM/DISCARD (erase whichever is inappropriate) requests
to the disk. Linux 4.1.12.
It might be possible to improve DISCARD performance a bit with your
suggestion. The implementation might be tricky, but it is worth trying.

Indeed, this is not only for DISCARD; it might help read and write
performance as well. We can check the bio size: if
	bio_sectors(bio)/conf->nr_strip_zones >= SOMETHRESHOLD
it means that on each underlying device we have more than SOMETHRESHOLD
contiguous chunks to issue, and they can be merged into a larger bio.

It's true that this does not strictly apply to TRIM/DISCARD (erase whichever is inappropriate), but to see any gain for READ/WRITE, you need a request that is larger than (chunk size) * (raid elements), which is unlikely for reasonable values of those parameters. But a common implementation can of course work for multiple request types.
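
A quick arithmetic check of that condition, as a sketch only: the 512 KiB chunk size and four members match the setup reported earlier in the thread, while the example request sizes, the decimal reading of "3TB", and the helper name are assumptions for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def exceeds_full_stripe(request_bytes, chunk_bytes, ndev):
    """True if the request is larger than (chunk size) * (raid elements)."""
    return request_bytes > chunk_bytes * ndev


if __name__ == "__main__":
    chunk_bytes = 512 * KIB                  # half-megabyte chunks
    ndev = 4                                 # four RAID0 members
    examples = [
        ("1 MiB write", 1 * MIB),            # assumed typical READ/WRITE size
        ("4 MiB write", 4 * MIB),
        ("whole-array discard", ndev * 3 * 10**12),  # assumes 3 TB = 3 * 10**12 bytes
    ]
    for name, size in examples:
        print(name, exceeds_full_stripe(size, chunk_bytes, ndev))
```

With these parameters the full stripe is 2 MiB, so an ordinary 1 MiB request gains nothing from merging, while a whole-array discard exceeds the stripe by many orders of magnitude.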

IMHO it's interesting, good suggestion!

Looking forward to seeing an implementation!


Coly

