On 27.05.20 at 12:32, Lukas Czerner wrote:
> On Wed, May 27, 2020 at 12:11:52PM +0200, Reindl Harald wrote:
>>
>> On 27.05.20 at 11:57, Lukas Czerner wrote:
>>> On Wed, May 27, 2020 at 11:32:02AM +0200, Reindl Harald wrote:
>>>>
>>>>
>>>> On 27.05.20 at 11:19, Lukas Czerner wrote:
>>>>> On Wed, May 27, 2020 at 04:38:50PM +0900, Wang Shilong wrote:
>>>>>> From: Wang Shilong <wshilong@xxxxxxx>
>>>>>>
>>>>>> Currently the WAS_TRIMMED flag is not persistent: whenever the
>>>>>> filesystem is remounted, fstrim needs to walk all block groups again.
>>>>>> The problem with this is that FSTRIM can be slow on a very large LUN
>>>>>> SSD based filesystem.
>>>>>>
>>>>>> To avoid this kind of problem, we introduce a block group flag
>>>>>> EXT4_BG_WAS_TRIMMED. The side effect of this is that we need to
>>>>>> introduce one extra block group dirty write after trimming a block
>>>>>> group.
>>>>
>>>> would that also fix the issue that *way too much* is trimmed all the
>>>> time, no matter if it's a thin provisioned vmware disk or a physical
>>>> RAID10 with SSD
>>>
>>> no, the mechanism remains the same, but the proposal is to make it
>>> persistent across re-mounts.
>>>
>>>>
>>>> no way of 315 MB deletes within 2 hours or so on a system with just 485M
>>>> used
>>>
>>> The reason is that we're working on block group granularity. So if you
>>> have an almost free block group, and you free some blocks from it, the
>>> flag gets cleared and next time you run fstrim it'll trim all the free
>>> space in the group. Then again if you free some blocks from the group,
>>> the flag gets cleared again ...
>>>
>>> But I don't think this is a problem at all. Certainly not worth tracking
>>> free/trimmed extents to solve it.
>>
>> it is a problem
>>
>> on a daily "fstrim -av" you trim gigabytes of already trimmed blocks
>> which for example on a vmware thin provisioned vdisk makes it down to
>> CBT (changed-block-tracking)
>>
>> so instead of completely ignoring that untouched space thanks to CBT
>> it's considered as changed and verified in the follow up backup run
>> which takes orders of magnitude longer than needed
>
> Looks like you identified the problem then ;)

well, in a perfect world.....

> But seriously, trim/discard was always considered advisory and the
> storage is completely free to do whatever it wants to do with the
> information. It might even be the case that the discard requests are
> ignored and we might not even need optimization like this. But
> regardless it does take time to go through the block groups and as a
> result this optimization is useful in the fs itself.

luckily at least fstrim is non-blocking in a vmware environment, on my
physical box it takes ages

this machine *does nothing* other than wait to be cloned, 235 MB of
pretended deleted data within 50 minutes is absurd on a completely idle
guest

so even when i am all in for optimizations that's way over the top

[root@master:~]$ fstrim -av
/boot: 0 B (0 bytes) trimmed on /dev/sda1
/: 235.8 MiB (247201792 bytes) trimmed on /dev/sdb1

[root@master:~]$ df
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb1      ext4  5.8G  502M  5.3G   9% /
/dev/sda1      ext4  485M   39M  443M   9% /boot

> However it seems to me that the situation you're describing calls for
> optimization on a storage side (TP vdisk in your case), not file system
> side.
>
> And again, for fine grained discard you can use -o discard

with a terrible performance impact at runtime
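
To make the block group granularity argument concrete, here is a rough toy model in Python of the behaviour described above. It is only a sketch, not the ext4 code: the names BlockGroup, fstrim and remount are made up for illustration, the 4 KiB block size and 32768 blocks per group are assumed defaults, and details of the real implementation (minimum extent length, actual free extent walking) are left out.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096          # bytes; assumed 4 KiB blocks
BLOCKS_PER_GROUP = 32768   # assumed default for 4 KiB blocks (128 MiB per group)


@dataclass
class BlockGroup:
    """Hypothetical stand-in for a block group with a WAS_TRIMMED-style flag."""
    free_blocks: int
    was_trimmed: bool = False

    def free(self, nblocks: int) -> None:
        # Freeing any blocks clears the per-group flag, however few they are.
        self.free_blocks = min(self.free_blocks + nblocks, BLOCKS_PER_GROUP)
        self.was_trimmed = False


def fstrim(groups: list[BlockGroup]) -> int:
    """Trim every group whose flag is clear; return the bytes reported as trimmed."""
    trimmed = 0
    for group in groups:
        if group.was_trimmed:
            continue  # nothing was freed since the last pass, skip the group
        # Group granularity: all free space is discarded, not just the new part.
        trimmed += group.free_blocks * BLOCK_SIZE
        group.was_trimmed = True
    return trimmed


def remount(groups: list[BlockGroup], persistent: bool) -> None:
    """Without a persistent flag, a remount forgets which groups were trimmed."""
    if not persistent:
        for group in groups:
            group.was_trimmed = False


if __name__ == "__main__":
    groups = [BlockGroup(free_blocks=30000) for _ in range(4)]
    print("first pass:       ", fstrim(groups))   # all free space in all groups
    print("idle second pass: ", fstrim(groups))   # every flag is set, nothing to do
    groups[0].free(10)                             # e.g. one small file deleted
    print("after freeing 40K:", fstrim(groups))   # whole free space of group 0
    remount(groups, persistent=False)
    print("after remount:    ", fstrim(groups))   # everything is walked again
```

In this model, freeing 10 blocks (40 KiB) makes the next pass report roughly 117 MiB as trimmed for that one group, which is the same pattern as the numbers above on an almost idle guest. With persistent=True the remount no longer resets the flags, which is what the patch is about, but the over-trimming inside a touched group stays the same.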