Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 6 Jan 2025 11:46:39 -0800

On Tue, Dec 24, 2024 at 11:17:08AM +0530, Sai Chaitanya Mitta wrote:
> Hi Darrick,
> Thanks for the quick response. We are exposing an XFS file (created
> through fallocate -l <size> <path>) as a block device through an
> SPDK bdev (https://github.com/spdk/spdk) over NVMe-oF. The initiator
> connects to the target and provides a block device to database applications.
> What I have observed is that the database applications issue a flush IO
> after every write or every couple of writes. At the backend, this flush
> translates to an fsync (through aio/io_uring) on the FD, which is a
> time-consuming operation. If we treat the flush IO as a no-op, performance
> is 5x better than when we service the flush. However, with flush as a
> no-op, we observe data loss if the system shuts down abruptly (since the
> metadata for new extents is not yet persistent). To overcome this data loss
> while keeping the better performance, we used the steps below:
> 1. Created file through fallocate using FALLOC_FL_ZERO_RANGE option
> 2. Explicitly zeroed file as mentioned in code (this marks all extents as
>    written and there are no metadata changes related to data [what I observed],
>    but there are atime and mtime updates of file).
> 3. Expose zeroed file to user as block device (as mentioned above).
> 
> With the above approach, I am no longer able to reproduce the data loss
> after an abrupt system shutdown. So we are planning to use this method to
> get both data integrity and better performance.

That sounds brittle -- even if someday a FALLOC_FL_WRITE_ZEROES gets
merged into the kernel, if anything perturbs the file mapping (e.g.
background backup process reflinks the file) then you immediately become
vulnerable to these crash integrity problems without notice.

(Unless you're actually getting leases on the file ranges and reacting
appropriately when the leases break...)
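
For reference, step 2 of the quoted procedure (explicitly zeroing the file
so that its extents end up written) might look roughly like the sketch
below. It is only an illustration, not the poster's actual code: the
function name zero_file and the chunk size parameter are made up here, and
it uses only the Rust standard library, so the fallocate call from step 1
is not shown. It also does nothing about the mapping later being perturbed
(e.g. by a reflink).

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Overwrite the first `size` bytes of `file` with zeroes, `chunk` bytes
/// at a time, then fsync so data and metadata reach stable storage.
/// (Hypothetical helper; name and signature are assumptions.)
fn zero_file(file: &File, size: u64, chunk: usize) -> io::Result<()> {
    if chunk == 0 {
        // A zero-sized chunk would never advance the offset.
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "chunk size must be non-zero",
        ));
    }
    let buf = vec![0u8; chunk];
    let mut offset = 0u64;
    while offset < size {
        // Clamp the final write so it never extends the file past `size`.
        let len = std::cmp::min(chunk as u64, size - offset) as usize;
        // write_all_at retries short writes until the whole slice is written.
        file.write_all_at(&buf[..len], offset)?;
        offset += len as u64;
    }
    // One fsync at the end, after all the zeroes have been written.
    file.sync_all()
}
```

A caller would open the preallocated file for writing and invoke something
like zero_file(&file, size, 1 << 20) once, before exposing the file as a
block device.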

--D

> On Tue, Dec 24, 2024 at 3:23 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >
> > On Mon, Dec 23, 2024 at 10:12:32PM +0530, Sai Chaitanya Mitta wrote:
> > > Hi Team,
> > > Is there any method/tool available to explicitly mark XFS file extents
> > > as written? One approach I am aware of is explicitly zeroing the entire
> > > file (which may be hundreds of GB in size) through a synchronous or
> > > asynchronous (aio/io_uring) mechanism, but that is time-consuming for
> > > large files. Is there any optimization or alternative approach for
> > > explicitly zeroing a file or marking its extents as written?
> >
> > Why do you need to mark them written?
> >
> > --D
> >
> > >
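> > > Synchronous Approach:
> > >     // Write zeroes from `buf` until the whole file is covered.
> > >     while offset < size {
> > >         // Clamp the last write so it does not run past `size`.
> > >         let len = std::cmp::min(buf.len() as u64, size - offset) as usize;
> > >         let bytes_written = img_file
> > >             .write_at(&buf[..len], offset)
> > >             .map_err(|e| {
> > >                 error!("Failed to zero out file: {} error: {:?}", vol_name, e);
> > >             })?;
> > >         offset += bytes_written as u64;
> > >     }
> > >     // Flush data and metadata; do not silently drop a sync failure.
> > >     img_file.sync_all().map_err(|e| {
> > >         error!("Failed to sync file: {} error: {:?}", vol_name, e);
> > >     })?;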
> > >
> > > Asynchronous approach:
> > >     Currently using fio with libaio as the ioengine, but the results
> > >     are almost the same.
> > >
> > > --
> > > Thanks & Regards,
> > > M.Sai Chaithanya.
> > >
> 
> 
> 
> -- 
> Thanks & Regards,
> M.Sai Chaithanya.
>