On Mon, Jul 22, 2019 at 04:01:27PM +0300, Lennert Buytenhek wrote: > I tried something like this, and indeed, as you say, xfs seems to trim > more than you ask it to. From looking at the code, it seems benign -- > it just seems to trim allocation groups (AGs) that overlap with your > range, and so it can end up trimming a larger range than intended. > Unfortunately, my allocation groups are sized such that trimming the > minimum range at a time still produces long stalls. :-/ Crazy Idea #1: fallocate a file using most of free space, filefrag / fiemap its physical address block ranges, translate those to physical address ranges on the member devices, then blkdiscard -o offset -l length the member devices directly, bypassing the RAID layer entirely. That would be blazing fast (depending on free space fragmentation). However. To pull this off you need to be super confident in your understanding of mdadm's RAID layout. And perhaps even then actually write some boundary markers to the file and verify it ends up where you expect. Otherwise it's a fast lane ticket to the data loss department... Crazy Idea #2: While on the topic of fallocate, I notice with XFS it blocks TRIM, even though that may be a bug as well. Not much reason to not trim unwritten extents. Guess they're used too rarely to matter much... So you can perhaps use this indirectly, do a fallocate-fstrim dance for more finegrained control: # df -B 1G . Filesystem 1G-blocks Used Available Use% Mounted on /dev/loop0 1024 2 1023 1% /mnt/tmp # fallocate --length 1010G trim.todo # df -B 1G . Filesystem 1G-blocks Used Available Use% Mounted on /dev/loop0 1024 1012 13 99% /mnt/tmp # fstrim -v . .: 13.5 GiB (14495330304 bytes) trimmed # fallocate --length 10G trim.done # truncate -s 1000G trim.todo # fstrim -v . .: 13.5 GiB (14495330304 bytes) trimmed # fallocate --length 20G trim.done # truncate -s 990G trim.todo # fstrim -v . .: 13.5 GiB (14495330304 bytes) trimmed ... well, something like that, maybe ... You grow trim.done in every step and shrink trim.todo in every step, so the window that will be trimmed moves along until you covered all the free space. At the end you have a very large trim.todo file of which you're sure the space inside has already been trimmed. And if you keep this file around (monitor free space, shrink or grow as needed), you can effectively implement your very own "don't re-trim previously trimmed space" logic in XFS. And unlike ext4's it would survive reboots. So much for crazy ideas, still hope it can be improved in kernel somehow. Regards Andreas Klauer