Re: [PATCH] ext4: Avoid trim error on fs with small groups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 15, 2021 at 04:10:12PM +0100, Martin Wilck wrote:
> On Mon, 2021-11-15 at 15:53 +0100, Lukas Czerner wrote:
> > On Mon, Nov 15, 2021 at 03:22:02PM +0100, Martin Wilck wrote:
> > > On Mon, 2021-11-15 at 13:51 +0100, Jan Kara wrote:
> > > > [Added Martin to CC who originally found the problem]
> > > > 
> > > > On Mon 15-11-21 12:48:21, Lukas Czerner wrote:
> > > > > On Fri, Nov 12, 2021 at 04:22:02PM +0100, Jan Kara wrote:
> > > > > > A user reported FITRIM ioctl failing for him on ext4 on some
> > > > > > devices
> > > > > > without apparent reason.  After some debugging we've found
> > > > > > out
> > > > > > that
> > > > > > these devices (being LVM volumes) report rather large discard
> > > > > > granularity of 42MB and the filesystem had 1k blocksize and
> > > > > > thus
> > > > > > group
> > > > > > size of 8MB. Because ext4 FITRIM implementation puts discard
> > > > > > granularity into minlen, ext4_trim_fs() declared the trim
> > > > > > request
> > > > > > as
> > > > > > invalid. However just silently doing nothing seems to be a
> > > > > > more
> > > > > > appropriate reaction to such combination of parameters since
> > > > > > user
> > > > > > did
> > > > > > not specify anything wrong.
> > > > > 
> > > > > Hi Jan,
> > > > > 
> > > > > I agree that it's better to silently do nothing rather than
> > > > > returning
> > > > > -ENOTSUPP in this case and the patch looks mostly fine.
> > > > > 
> > > > > However currently we return the adjusted minlen back to the
> > > > > user
> > > > > and it
> > > > > is also stated in the fstrim man page. I think it's worth
> > > > > keeping
> > > > > that
> > > > > behavior.
> > > > 
> > > > OK.
> > > > 
> > > > > When I think about it, it would probably be worth updating
> > > > > fstrim
> > > > > to
> > > > > notify the user that the minlen changed, I can send a patch for
> > > > > that.
> > > > 
> > > > I've added Martin to this conversation because he is of the
> > > > opinion
> > > > that
> > > > the filesystem actually should not increase the minlen based on
> > > > discard
> > > > granularity at all. It should leave this to the lower layers or
> > > > userspace.
> > > > Honestly I don't have strong opinion either way so I wanted to
> > > > throw
> > > > Martin's opinion into the mix as a possibility. Also maybe you
> > > > remember
> > > > whether the motivation of the original commit 5c2ed62fd447 was
> > > > motivated by
> > > > some real world experience or just theoretical concerns?
> > > > 
> > > >                                                                 H
> > > > onza
> > > 
> > > Thanks for notifying me, Jan. FTR, below's my argument, quoted from
> > > bugzilla:
> > > 
> > > Whether or not the discard *can* be carried out, and whether it
> > > makes
> > > sense to do so on any given device, is not the file system's
> > > business.
> > > It's up to the block layer and the device itself. And whether or
> > > not
> > > blocks *should* be discarded isn't the filesystem's business,
> > > either;
> > > it's the decision of user space (*).
> > > 
> > > I find it strange that file system code starts reasoning whether
> > > block
> > > IO commands are "likely" to succeed or fail. IMO the only valid
> > > reason
> > > for the filesystem to intervene would be if the result of the
> > > FITRIM
> > > would be a suboptimal block allocation, causing performance
> > > deterioration for future filesystem I/O. I don't see that that's
> > > the
> > > case here. 
> > 
> > It is not a matter of probability, or chance as you seem to imply.
> > The
> > discard granularity indicates the internal allocation unit and if the
> > discard request is smaller than that it will do nothing. So from my
> > POV
> > it's just an optimization to avoid sending such pointless requests.
> > 
> > Am I wrong in thinking that such discard requests are useless ?
> 
> Please have a look at __blkdev_issue_discard():
> https://elixir.bootlin.com/linux/latest/source/block/blk-lib.c#L26
> 
> Unless I'm misreading the code, it does very well pass REQ_OP_DISCARD
> bios to the device which don't meet the granularity or alignment
> requirements. It tries to align as well as possible, but the first and
> last bio submitted don't necessarily match the granularity.
> 
> For SCSI at least, unmap granularity is mainly a means for
> optimization: "An unmap request with a number of logical blocks that is
> not a multiple of this value may result in unmap operations on fewer
> LBAs than requested" (SBC5, §6.6.4). So, devices _may_ ignore such
> requests, but that's not necessarily the case.

My understanding always was that the request needs to be both properly
aligned and of a certain minimum size (discard_granularity) to take
effect.

If what you're saying is true then I feel like the documentation of
discard_granularity

Documentation/ABI/testing/sysfs-block
Documentation/block/queue-sysfs.rst

should change because that's not how I understand the notion of internal
allocation unit. However the sentence you quoted is not entirely clear
to me either so I'll leave that decision to someone who understands it
better than me.

Martin could you please clarify it for us ?

Adding Martin Petersen to cc and somehow we're off the list so add
fsdevel as well.

Thanks!
-Lukas

> 
> Another point is that only the block layer has full information about
> the alignment, which depends on partition start sectors. I believe the
> block layer should be the authority to decide whether or not the
> request is valid for the device.
> 
> Regards,
> Martin
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux