Re: [PATCH v2 2/2] fuse: remove tmp folio for writebacks and internal rb tree

Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> · Mon, 28 Oct 2024 10:28:07 +0800



On 10/26/24 2:47 AM, Joanne Koong wrote:
> On Fri, Oct 25, 2024 at 10:36 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>
>> On Thu, Oct 24, 2024 at 6:38 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 10/25/24 12:54 AM, Joanne Koong wrote:
>>>> On Mon, Oct 21, 2024 at 2:05 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>>>>
>>>>> On Mon, Oct 21, 2024 at 3:15 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Fri, 18 Oct 2024 at 07:31, Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> I feel like this is too restrictive, and I am still not sure why
>>>>>>> blocking on fuse folios served by non-privileged fuse servers is worse
>>>>>>> than blocking on folios served from the network.
>>>>>>
>>>>>> Might be.  But historically fuse had this behavior and I'd be very
>>>>>> reluctant to change that unconditionally.
>>>>>>
>>>>>> With a systemwide maximal timeout for fuse requests it might make
>>>>>> sense to allow sync(2), etc. to wait for fuse writeback.
>>>>>>
>>>>>> Without a timeout allowing fuse servers to block sync(2) indefinitely
>>>>>> seems rather risky.
>>>>>
>>>>> Could we skip waiting on writeback in sync(2) if it's a fuse folio?
>>>>> That seems in line with the sync(2) documentation Jingbo referenced
>>>>> earlier where it states "The writing, although scheduled, is not
>>>>> necessarily complete upon return from sync()."
>>>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sync.html
>>>>>
>>>>
>>>> So I think the answer to this is "no" for Linux. The Linux man page
>>>> for sync(2) says:
>>>>
>>>> "According to the standard specification (e.g., POSIX.1-2001), sync()
>>>> schedules the writes, but may return before the actual writing is
>>>> done.  However Linux waits for I/O completions, and thus sync() or
>>>> syncfs() provide the same guarantees as fsync() called on every file
>>>> in the system or filesystem respectively." [1]
>>>
>>> Actually, as for FUSE, IIUC the writeback is not guaranteed to be
>>> complete when sync(2) returns, because of the temp page mechanism. When
>>> sync(2) returns, PG_writeback is indeed cleared for all original pages
>>> (in the address_space), while the real writeback (initiated from the
>>> temp page) may still be in progress.
>>>
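A minimal userspace model of the temp-page behaviour described above, assuming a folio reduced to a data buffer plus a writeback flag. All `model_*` names are hypothetical and not kernel APIs. It shows why waiting on PG_writeback of the original folio does not imply the data has reached the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a page-cache folio: data plus a writeback flag. */
struct model_folio {
	char data[16];
	bool writeback;   /* models PG_writeback */
};

/* Hypothetical in-flight FUSE write request backed by a temp folio. */
struct model_wb_req {
	struct model_folio tmp;
	bool in_flight;   /* true until the (possibly slow) server replies */
};

/*
 * Models the temp-page scheme: copy the original folio into a temp folio,
 * send the request, and end writeback on the ORIGINAL folio right away
 * instead of waiting for the server's reply.
 */
static void model_writepage(struct model_folio *orig, struct model_wb_req *req)
{
	orig->writeback = true;
	memcpy(req->tmp.data, orig->data, sizeof(orig->data));
	req->tmp.writeback = true;
	req->in_flight = true;
	orig->writeback = false;   /* original folio already looks "done" */
}

/* Models the server replying: only now is the data really written. */
static void model_server_reply(struct model_wb_req *req)
{
	req->tmp.writeback = false;
	req->in_flight = false;
}

/* Models folio_wait_writeback(): "sync" only looks at the original folio. */
static bool model_sync_would_block(const struct model_folio *orig)
{
	return orig->writeback;
}
```

After model_writepage(), model_sync_would_block() is already false while the request is still in flight, which is exactly the gap between "PG_writeback cleared" and "data written" described above.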
>>
>> That's a great point. It seems like we can just skip waiting on
>> writeback to finish for fuse folios in sync(2) altogether then. I'll
>> look into what's the best way to do this.
> 
> I think the most straightforward way to do this for sync(2) is to add
> the mapping check inside sync_bdevs(). With something like:
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 738e3c8457e7..bcb2b6d3db94 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -1247,7 +1247,7 @@ void sync_bdevs(bool wait)
>                 mutex_lock(&bdev->bd_disk->open_mutex);
>                 if (!atomic_read(&bdev->bd_openers)) {
>                         ; /* skip */
> -               } else if (wait) {
> +               } else if (wait && !mapping_no_writeback_wait(inode->i_mapping)) {
>                         /*
>                          * We keep the error status of individual mapping so
>                          * that applications can catch the writeback error using
> 
> 

I'm afraid we are waiting in wait_sb_inodes() (ksys_sync -> sync_inodes_sb
-> wait_sb_inodes) rather than in sync_bdevs().  sync_bdevs() is used to
write back and sync the metadata residing directly on the block device,
such as the superblock.  It is sync_inodes_one_sb() that actually writes
back the inodes.
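A rough userspace model of that call chain, assuming heavily simplified inode and superblock structs. All `model_*` names are hypothetical, and the `no_writeback_wait` flag only mirrors the proposed (not upstream) mapping_no_writeback_wait() helper from the diff above. It illustrates that a per-mapping skip would have to sit in the wait_sb_inodes() loop, not in sync_bdevs(), which only walks block-device inodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified inode. */
struct model_inode {
	bool writeback_pending;   /* models mapping_tagged(..., PAGECACHE_TAG_WRITEBACK) */
	bool no_writeback_wait;   /* models the proposed per-mapping opt-out */
};

struct model_sb {
	struct model_inode *inodes;
	size_t nr_inodes;
};

/*
 * Models wait_sb_inodes(): walk the superblock's inodes and count the
 * ones sync(2) would block on.  The opt-out check sits here, per inode.
 */
static size_t model_wait_sb_inodes(const struct model_sb *sb)
{
	size_t waited = 0;

	for (size_t i = 0; i < sb->nr_inodes; i++) {
		const struct model_inode *inode = &sb->inodes[i];

		if (!inode->writeback_pending)
			continue;
		if (inode->no_writeback_wait)
			continue;   /* e.g. a FUSE mapping: don't block sync(2) */
		waited++;           /* would call filemap_fdatawait_keep_errors() */
	}
	return waited;
}

/* Models ksys_sync() -> iterate_supers(sync_inodes_one_sb) -> sync_inodes_sb(). */
static size_t model_ksys_sync(const struct model_sb *sbs, size_t nr_sbs)
{
	size_t total = 0;

	for (size_t i = 0; i < nr_sbs; i++)
		total += model_wait_sb_inodes(&sbs[i]);
	return total;
}
```

With one superblock whose inodes all opt out and another with a single inode under writeback, model_ksys_sync() reports one inode waited on.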


-- 
Thanks,
Jingbo