On Mon, May 11 2020 at 2:31am -0400,
Hannes Reinecke <hare@xxxxxxx> wrote:

> On 5/11/20 4:46 AM, Damien Le Moal wrote:
> > On 2020/05/08 18:03, Hannes Reinecke wrote:
> >> Hi all,
> >>
> >> this patchset adds a new metadata version 2 for dm-zoned, which brings
> >> the following improvements:
> >>
> >> - UUIDs and labels: Add three more fields to the metadata, containing
> >>   the dm-zoned device UUID and label, and the device UUID. This allows
> >>   for a unique identification of the devices, so that several dm-zoned
> >>   sets can coexist and have a persistent identification.
> >> - Extend random zones with an additional regular disk device: A regular
> >>   block device can be added together with the zoned block device,
> >>   providing additional (emulated) random write zones. With this it is
> >>   possible to handle devices with only sequential zones; there is also
> >>   a speed-up if the regular block device resides on a fast medium. The
> >>   regular block device is placed logically in front of the zoned block
> >>   device, so that metadata and mapping tables reside on the regular
> >>   block device, not the zoned device.
> >> - Tertiary superblock support: In addition to the two existing sets of
> >>   metadata, another, tertiary, superblock is written to the first block
> >>   of the zoned block device. This superblock is for identification
> >>   only; the generation number is set to '0' and the block itself is
> >>   never updated. The additional metadata such as the bitmap tables is
> >>   not copied.
> >>
> >> To handle this, some changes to the original handling are introduced:
> >> - Zones are now equidistant. Originally, runt zones were ignored and
> >>   not counted when sizing the mapping tables. With the dual-device
> >>   setup, runt zones might occur at the end of the regular block device,
> >>   making a direct translation between zone number and sector/block
> >>   number complex. For metadata version 2 all zones are considered to be
> >>   of the same size, and runt zones are simply marked as 'offline' so
> >>   they are ignored when allocating a new zone.
> >> - The block number in the superblock is now the global number, and
> >>   refers to the location of the superblock relative to the resulting
> >>   device-mapper device. This means that the tertiary superblock
> >>   contains absolute block addresses, which need to be translated to
> >>   relative device addresses to find the referenced block.
> >>
> >> There is an accompanying patchset for dm-zoned-tools for writing and
> >> checking this new metadata.
> >>
> >> As usual, comments and reviews are welcome.
> >
> > I gave this series a good round of testing. See the attached picture
> > for the results. The test is this:
> > 1) Set up dm-zoned
> > 2) Format and mount with mkfs.ext4 -E packed_meta_blocks=1 /dev/mapper/xxx
> > 3) Create files random in size between 1 and 4MB and measure the
> >    user-seen throughput over 100 files.
> > 4) Run that for 2 hours
> >
> > I ran this on a single-drive setup with a 15TB SMR drive, and on the
> > same drive with a 500GB M.2 SSD added.
> >
> > For the single-drive case, the usual 3 phases can be seen: writing
> > starts at about 110MB/s, with everything going to conventional zones
> > (note that the conventional zones are in the middle of the disk, hence
> > the low-ish throughput). Then, after about 400s, reclaim kicks in and
> > the throughput drops to 60-70 MB/s. As reclaim cannot keep up under
> > this heavy write workload, performance drops to 20-30MB/s after 800s.
> > All good; without any idle time for reclaim to do its job, this is all
> > expected.
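[Editor's note: a rough sketch of the write workload described in the test
steps above, assuming the dm-zoned backed ext4 filesystem is mounted at
/mnt/dmz; the mount point, file names and use of dd are illustrative and
not taken from the original report.]

  #!/bin/bash
  # Write files of random size (1-4 MB) and report the user-seen
  # throughput averaged over every 100 files.
  MNT=/mnt/dmz                      # assumed mount point (illustrative)
  bytes=0
  t0=$SECONDS
  i=0
  while true; do
      i=$((i + 1))
      sz=$((RANDOM % 4 + 1))        # file size in MB, 1..4
      dd if=/dev/zero of="$MNT/file.$i" bs=1M count="$sz" \
         conv=fsync status=none
      bytes=$((bytes + sz * 1024 * 1024))
      if ((i % 100 == 0)); then
          dt=$((SECONDS - t0)); ((dt == 0)) && dt=1
          echo "files $((i - 99))..$i: $((bytes / dt / 1048576)) MB/s"
          bytes=0
          t0=$SECONDS
      fi
  done

The conv=fsync keeps the reported numbers tied to data actually reaching
the device rather than sitting in the page cache.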
> >
> > For the dual-drive case, things are more interesting:
> > 1) The first phase is longer since, overall, there is more conventional
> >    space (500G SSD + 400G on the SMR drive). So we see the SSD speed
> >    first (~425MB/s), then the drive speed (100 MB/s), slightly lower
> >    than the single-drive case toward the end as reclaim triggers.
> > 2) Some recovery back to SSD speed, then a long phase at half the speed
> >    of the SSD as writes go to the SSD while reclaim is running, moving
> >    data out of the SSD onto the disk.
> > 3) Then a long phase at 25MB/s due to SMR disk reclaim.
> > 4) Back up to half the SSD speed.
> >
> > No crashes, no data corruption, all good. But it does look like we can
> > improve performance further by not using the drive's conventional zones
> > as "buffer" zones. If we let those be the final resting place of data,
> > the SMR-disk-only reclaim would not kick in and hurt performance as
> > seen here. That, I think, can all be done on top of this series though.
> > Let's get this in first.
>
> Thanks for the data! That indeed is very interesting; guess I'll do
> some tests here on my setup, too.
> (And hope it doesn't burn my NVDIMM ...)
>
> But, guess what, I had the same thoughts; we should be treating the
> random zones more like sequential zones in a two-disk setup.
> So I guess I'll be resurrecting the idea from my very first patch and
> implement 'cache' zones in addition to the existing 'random' and
> 'sequential' zones.
> But, as you said, that'll be a next series of patches.

FYI, I staged the series in linux-next (for 5.8) yesterday, see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.8

So please base any follow-on fixes or advances on this baseline.

Thanks!
Mike
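[Editor's note: for anyone basing follow-on work on that baseline, a
minimal way to fetch the staged branch; the remote name "dm" and the
local branch name are just examples.]

  # inside an existing kernel tree
  git remote add dm https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
  git fetch dm
  git checkout -b dmz-followup dm/dm-5.8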