Re: Questions about filesystems from SQLite author presentation

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Tue, 7 Jan 2020 09:18:46 -0800

On Tue, Jan 07, 2020 at 09:55:06AM +0100, Jan Kara wrote:
> On Tue 07-01-20 08:40:00, Sitsofe Wheeler wrote:
> > On Mon, 6 Jan 2020 at 10:16, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > today) and even if you wanted to use something like TRIM it wouldn't
> > > > be worth it unless you were trimming a large (gigabytes) amount of
> > > > data (https://youtu.be/-oP2BOsMpdo?t=6330 ).
> > >
> > > Punch the space out, then run a periodic background fstrim so the
> > > filesystem can issue efficient TRIM commands over free space...
> > 
> > Jan mentions this over on https://youtu.be/-oP2BOsMpdo?t=6268 .
> > Basically he advises against hole punching if you're going to write to
> > that area again because it fragments the file, hurts future
> > performance etc. But I guess if you were using FALLOC_FL_ZERO_RANGE no
> > hole is punched (so no fragmentation) and you likely get faster reads
> > of that area until the data is rewritten too.
> 
> Yes, no fragmentation in this case (well, the extent tree still needs to
> record that a particular range is unwritten, so the tree does get
> fragmented, but it is merged again as soon as the range is written).
> 
> > Are areas that have had
> > FALLOC_FL_ZERO_RANGE run on them eligible for trimming if someone goes
> > on to do a background trim (Jan - doesn't this sound like the best of
> > both worlds)?
> 
> No, these areas are still allocated for the file and thus background trim
> will not touch them. Conceivably, we could use trim for such areas, but
> technically it is going to be too expensive to discover them (you'd need
> to read all the inodes and their extent trees to find them), at least
> for ext4 and I believe for xfs as well.
> 
> > My question is what happens if you call FALLOC_FL_ZERO_RANGE and your
> > filesystem is too dumb to mark extents unwritten: will it literally
> > go away and write a bunch of zeros over that region (painful if your disk
> > is a slow HDD), or will the call just fail? It's almost like you need
> > something that can tell you if FALLOC_FL_ZERO_RANGE is efficient...
> 
> It is up to the filesystem how it implements the operation, but so far we
> have managed to maintain the situation where FALLOC_FL_ZERO_RANGE returns an
> error if it is not efficient.
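
Jan's answer suggests a practical way to handle the "is it efficient?" question from userspace: treat EOPNOTSUPP from FALLOC_FL_ZERO_RANGE as "not efficient here" instead of writing zeros by hand. The following is a minimal C sketch, assuming Linux and glibc. zero_range_or_punch() and range_reads_zero() are hypothetical helpers. Falling back to FALLOC_FL_PUNCH_HOLE is only one possible policy: it frees the blocks (so a periodic fstrim can later discard them), at the cost of the fragmentation Jan describes if the range is rewritten.

```c
#define _GNU_SOURCE             /* for fallocate() and the FALLOC_FL_* flags */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_ZERO_RANGE, FALLOC_FL_PUNCH_HOLE */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Which mechanism (if any) zeroed the range. */
enum zero_method {
    ZERO_FAILED = -1,        /* neither mode was accepted; errno is set */
    ZERO_VIA_ZERO_RANGE = 0, /* blocks stay allocated (typically as unwritten extents) */
    ZERO_VIA_PUNCH_HOLE = 1  /* blocks were deallocated; fstrim may later discard them */
};

/* Hypothetical helper: try FALLOC_FL_ZERO_RANGE first and fall back to
 * punching a hole only if the filesystem rejects the mode. It never falls
 * back to writing zeros itself. KEEP_SIZE leaves the file size unchanged. */
static enum zero_method zero_range_or_punch(int fd, off_t off, off_t len)
{
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE, off, len) == 0)
        return ZERO_VIA_ZERO_RANGE;
    if (errno != EOPNOTSUPP)
        return ZERO_FAILED;  /* a real error (EBADF, ENOSPC, ...), not "unsupported" */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len) == 0)
        return ZERO_VIA_PUNCH_HOLE;
    return ZERO_FAILED;
}

/* Hypothetical helper: returns 1 if every byte in [off, off + len) reads as zero. */
static int range_reads_zero(int fd, off_t off, size_t len)
{
    static const char zeros[4096];
    char buf[4096];

    while (len > 0) {
        size_t chunk = len < sizeof buf ? len : sizeof buf;
        ssize_t n = pread(fd, buf, chunk, off);
        if (n <= 0)
            return 0;  /* read error or unexpected EOF */
        if (memcmp(buf, zeros, (size_t)n) != 0)
            return 0;
        off += n;
        len -= (size_t)n;
    }
    return 1;
}
```

A caller that must avoid slow data I/O at all costs could instead stop at the first EOPNOTSUPP and report back, rather than punching a hole.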

The manpage says "...the specified range will not be physically zeroed
out on the device (except for partial blocks at the either end of the
range), and I/O is (otherwise) required only to update metadata."
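
The manpage's distinction can be made concrete with some block arithmetic. Per that description, only the whole filesystem blocks inside the range can be converted to unwritten extents, and any partial block at the head or tail is physically zeroed. The following C sketch uses split_zero_range(), a hypothetical helper. The block size is an input you must supply (st_blksize from fstat() is a common but not guaranteed proxy for the filesystem block size).

```c
#include <assert.h>
#include <stdint.h>

/* How a byte range decomposes relative to the filesystem block size. */
struct zr_split {
    uint64_t head_len;  /* bytes in a leading partial block: physically zeroed */
    uint64_t whole_len; /* block-aligned middle: can become unwritten extents */
    uint64_t tail_len;  /* bytes in a trailing partial block: physically zeroed */
};

/* Hypothetical helper: split [off, off + len) for block size bs (bs > 0). */
static struct zr_split split_zero_range(uint64_t off, uint64_t len, uint64_t bs)
{
    struct zr_split s = {0, 0, 0};
    uint64_t end = off + len;
    uint64_t first_full = (off + bs - 1) / bs * bs; /* off rounded up to a block */
    uint64_t last_full = end / bs * bs;             /* end rounded down to a block */

    if (first_full > last_full) {
        /* No block boundary inside the range: it sits within one block. */
        s.head_len = len;
        return s;
    }
    s.head_len = first_full - off;
    s.whole_len = last_full - first_full;
    s.tail_len = end - last_full;
    return s;
}
```

For example, zeroing 10000 bytes at offset 1000 with 4096-byte blocks gives 3096 head bytes and 2808 tail bytes that need real I/O, plus one whole 4096-byte block that only needs a metadata update.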

I think that should be sufficient to hold the fs authors to
"FALLOC_FL_ZERO_RANGE must be efficient".

Though I've also wondered: does that mean the fs is free to call
blkdev_issue_zeroout() with BLKDEV_ZERO_NOFALLBACK in lieu of using
unwritten extents?

--D

> 
> 								Honza
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR