Re: [PATCH] ext4: Avoid trim error on fs with small groups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Martin, Lukas,

On Tue, 2021-11-16 at 23:35 -0500, Martin K. Petersen wrote:
> 
> Lukas,
> 
> > My understanding always was that the request needs to be both
> > properly
> > aligned and of a certain minimum size (discard_granularity) to take
> > effect.
> > 
> > If what you're saying is true then I feel like the documentation of
> > discard_granularity
> > 
> > Documentation/ABI/testing/sysfs-block
> > Documentation/block/queue-sysfs.rst
> > 
> > should change because that's not how I understand the notion of
> > internal allocation unit. However the sentence you quoted is not
> > entirely clear to me either so I'll leave that decision to someone
> > who
> > understands it better than me.
> > 
> > Martin could you please clarify it for us ?
> 
> The rationale behind exposing the discard granularity to userland was
> to
> facilitate mkfs.* picking allocation units that were friendly wrt. the
> device's internal granularity. The nature of that granularity depends
> on
> the type of device (thin provisioned, resource provisioned, SSD, etc.).
> 
> Just like the other queue limits the discard granularity was meant as a
> hint (some of that hintiness got lost as a result of conflating zeroing
> and deallocating but that has since been resolved).
> 
> It is true that some devices get confused if you submit discards that
> are smaller than their internal granularity. However, those devices are
> typically the ones that don't actually support reporting that
> granularity in the first place! Whereas SCSI devices generally don't
> care. They'll happily ignore any parts of the request that are smaller
> than whatever size they use internally.
> 
> One of the problems with "optimizing" away pieces that are smaller than
> the reported granularity is when you combine devices. Say you have a
> RAID1 of an SSD that reports 4096 bytes and one that reports 256MB. The
> stacked device will then have a discard granularity of 256MB and thus
> the SSD with the smaller granularity won't receive any of the discards
> that might otherwise help it manage its media.

I've checked btrfs and xfs, and they treat the minlen like Jan's patch
would - the minlen is set to the device's granularity, and discarding
smaller ranges is silently not even attempted.

So this seems to be common practice among file system implementations,
and thus Jan's patch would make ext4 behave like the others. But I'm
still uncertain if this behavior is ideal, and your remarks seem to
confirm that.

The fstrim man page text is highly misleading. "The default value is
zero, discarding every free block" is obviously wrong, given the
current actual behavior of the major filesystems.

The way this is currently implemented, the user has no way to actually
"discard every free block" to the extent supported by the device. I
think that being able to do this would be desirable, at least in
certain situations.

If we changed this, with the default value of 0 used by fstrim, file
systems would attempt to free every block, which is slow and would
likely either fail or have no effect on may devices. Maybe we should
treat the value "0" like "automatic", i.e. adjust it the minlen to the
device's granularity like we do now, but leave the value unchanged if
the user explicitly sets a small non-zero value? That way "fstrim -m
512" could be used to force discarding every block that can be
discarded.

Regards
Martin




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux