On 2/27/25 03:56, Christoph Hellwig wrote:
> Hi all,
>
> this series adds support for zoned devices:
>
>   https://zonedstorage.io/docs/introduction/zoned-storage
>
> to XFS. It has been developed for and tested on both SMR hard drives,
> which are the oldest and most common class of zoned devices:
>
>   https://zonedstorage.io/docs/introduction/smr
>
> and ZNS SSDs:
>
>   https://zonedstorage.io/docs/introduction/zns
>
> It has not been tested with zoned UFS devices, as their current capacity
> points and performance characteristics aren't too interesting for XFS
> use cases (but never say never).
>
> Sequential write only zones are only supported for data, using a new
> allocator for the RT device which maps each zone to an rtgroup that
> is written sequentially. All metadata and (for now) the log require
> randomly writable space. This means a realtime device is required
> to support zoned storage, but for the common case of SMR hard drives
> that contain random writable zones and sequential write required zones
> on the same block device, the concept of an internal RT device is added,
> which means using XFS on an SMR HDD is as simple as:
>
>   $ mkfs.xfs /dev/sda
>   $ mount /dev/sda /mnt
>
> When using NVMe ZNS SSDs that do not support conventional zones, the
> traditional multi-device RT configuration is required. E.g. for an
> SSD with a conventional namespace 1 and a zoned namespace 2:
>
>   $ mkfs.xfs -r rtdev=/dev/nvme0n2 /dev/nvme0n1
>   $ mount -o rtdev=/dev/nvme0n2 /dev/nvme0n1 /mnt
>
> The zoned allocator can also be used on conventional block devices, or
> on conventional zones (e.g. when using an SMR HDD as the external RT
> device). For example, using zoned XFS on normal SSDs shows very nice
> performance advantages and write amplification reduction for intelligent
> workloads like RocksDB.
>
> Some work is still in progress or planned, but should not affect the
> integration with the rest of XFS or the on-disk format:
>
>  - support for quotas
>  - support for reflinks - the I/O path already supports them, but
>    garbage collection currently isn't refcount aware and would unshare
>    them, rendering the feature useless
>  - more scalable garbage collection victim selection
>  - various improvements to hint based data placement
>
> To make testing easier, a git tree is provided that has the required
> iomap changes that we merged through the VFS tree, this code and a
> few misc patches that make VM testing easier:
>
>   git://git.infradead.org/users/hch/xfs.git xfs-zoned
>
> The matching xfsprogs is available here:
>
>   git://git.infradead.org/users/hch/xfsprogs.git xfs-zoned

I ran this several times, doing short runs and a long run over the weekend
on a 30TB SMR HDD, with random sized buffered writes, read and readwrite
IOs to randomly sized files. Working great for me. So:

Tested-by: Damien Le Moal <dlemoal@xxxxxxxxxx>

--
Damien Le Moal
Western Digital Research
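
For anyone who wants to approximate the workload described above, something
along these lines with fio should be close; the job names, file counts, sizes
and runtime are illustrative guesses, not the exact configuration used for
these runs:

  $ fio --directory=/mnt --size=64g --nrfiles=64 --filesize=4k-1g \
        --bsrange=4k-1m --ioengine=psync --direct=0 \
        --time_based --runtime=1h --group_reporting \
        --name=buffered-write --rw=randwrite \
        --name=buffered-read --rw=randread \
        --name=buffered-readwrite --rw=randrw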
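
As an aside, how the random writable (conventional) and sequential write
required zones are laid out on a given drive can be inspected with the usual
tools, e.g.:

  $ cat /sys/block/sda/queue/zoned
  $ blkzone report /dev/sda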
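
Fetching the development trees referenced in the cover letter should be as
simple as (branch names taken from the URLs above):

  $ git clone -b xfs-zoned git://git.infradead.org/users/hch/xfs.git
  $ git clone -b xfs-zoned git://git.infradead.org/users/hch/xfsprogs.git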