On Sun, Oct 27, 2024 at 7:28 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 10/26/24 2:47 AM, Joanne Koong wrote:
> > On Fri, Oct 25, 2024 at 10:36 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >>
> >> On Thu, Oct 24, 2024 at 6:38 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>
> >>>
> >>> On 10/25/24 12:54 AM, Joanne Koong wrote:
> >>>> On Mon, Oct 21, 2024 at 2:05 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >>>>>
> >>>>> On Mon, Oct 21, 2024 at 3:15 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On Fri, 18 Oct 2024 at 07:31, Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>>> I feel like this is too much restrictive and I am still not sure why
> >>>>>>> blocking on fuse folios served by non-privileges fuse server is worse
> >>>>>>> than blocking on folios served from the network.
> >>>>>>
> >>>>>> Might be. But historically fuse had this behavior and I'd be very
> >>>>>> reluctant to change that unconditionally.
> >>>>>>
> >>>>>> With a systemwide maximal timeout for fuse requests it might make
> >>>>>> sense to allow sync(2), etc. to wait for fuse writeback.
> >>>>>>
> >>>>>> Without a timeout allowing fuse servers to block sync(2) indefinitely
> >>>>>> seems rather risky.
> >>>>>
> >>>>> Could we skip waiting on writeback in sync(2) if it's a fuse folio?
> >>>>> That seems in line with the sync(2) documentation Jingbo referenced
> >>>>> earlier where it states "The writing, although scheduled, is not
> >>>>> necessarily complete upon return from sync()."
> >>>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sync.html
> >>>>>
> >>>>
> >>>> So I think the answer to this is "no" for Linux. What the Linux man
> >>>> page for sync(2) says:
> >>>>
> >>>> "According to the standard specification (e.g., POSIX.1-2001), sync()
> >>>> schedules the writes, but may return before the actual writing is
> >>>> done. However Linux waits for I/O completions, and thus sync() or
> >>>> syncfs() provide the same guarantees as fsync() called on every file
> >>>> in the system or filesystem respectively." [1]
> >>>
> >>> Actually as for FUSE, IIUC the writeback is not guaranteed to be
> >>> completed when sync(2) returns since the temp page mechanism. When
> >>> sync(2) returns, PG_writeback is indeed cleared for all original pages
> >>> (in the address_space), while the real writeback work (initiated from
> >>> temp page) may be still in progress.
> >>>
> >>
> >> That's a great point. It seems like we can just skip waiting on
> >> writeback to finish for fuse folios in sync(2) altogether then. I'll
> >> look into what's the best way to do this.
> >
> > I think the most straightforward way to do this for sync(2) is to add
> > the mapping check inside sync_bdevs(). With something like:
> >
> > diff --git a/block/bdev.c b/block/bdev.c
> > index 738e3c8457e7..bcb2b6d3db94 100644
> > --- a/block/bdev.c
> > +++ b/block/bdev.c
> > @@ -1247,7 +1247,7 @@ void sync_bdevs(bool wait)
> >                 mutex_lock(&bdev->bd_disk->open_mutex);
> >                 if (!atomic_read(&bdev->bd_openers)) {
> >                         ; /* skip */
> > -               } else if (wait) {
> > +               } else if (wait &&
> > !mapping_no_writeback_wait(inode->i_mapping)) {
> >                         /*
> >                          * We keep the error status of individual mapping so
> >                          * that applications can catch the writeback error using
> >
> >
>
> I'm afraid we are waiting in wait_sb_inodes (ksys_sync -> sync_inodes_sb
> -> wait_sb_inodes) rather than sync_bdevs. sync_bdevs() is used to
> writeback and sync the metadata residing on the block device directly
> such as the superblock. It is sync_inodes_one_sb() that actually
> writeback inodes.
>

Great point, thanks for the info!

>
> --
> Thanks,
> Jingbo