Re: [PATCH 0/2] fuse: Rename DIRECT_IO_{RELAX -> ALLOW_MMAP}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 7, 2023 at 1:28 AM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
>
>
> On 12/6/23 09:25, Amir Goldstein wrote:
> >>>> Is it actually important for FUSE_DIRECT_IO_ALLOW_MMAP fs
> >>>> (e.g. virtiofsd) to support FOPEN_PARALLEL_DIRECT_WRITES?
> >>>> I guess not otherwise, the combination would have been tested.
> >>>
> >>> I'm not sure how many people are aware of these different flags/features.
> >>> I had just finalized the backport of the related patches to RHEL8 on
> >>> Friday, as we (or our customers) need both for different jobs.
> >>>
> >>>>
> >>>> FOPEN_PARALLEL_DIRECT_WRITES is typically important for
> >>>> network fs and FUSE_DIRECT_IO_ALLOW_MMAP is typically not
> >>>> for network fs. Right?
> >>>
> >>> We kind of have these use cases for our network file systems
> >>>
> >>> FOPEN_PARALLEL_DIRECT_WRITES:
> >>>      - Traditional HPC, large files, parallel IO
> >>>      - Large file used on local node as container for many small files
> >>>
> >>> FUSE_DIRECT_IO_ALLOW_MMAP:
> >>>      - compilation through gcc (not so important, just not nice when it
> >>> does not work)
> >>>      - rather recent: python libraries using mmap _reads_. As it is read
> >>> only no issue of consistency.
> >>>
> >>>
> >>> These jobs do not intermix - no issue as in generic/095. If such
> >>> applications really exist, I have no issue with a serialization penalty.
> >>> Just disabling FOPEN_PARALLEL_DIRECT_WRITES because other
> >>> nodes/applications need FUSE_DIRECT_IO_ALLOW_MMAP is not so nice.
> >>>
> >>> Final goal is also to have FOPEN_PARALLEL_DIRECT_WRITES to work on plain
> >>> O_DIRECT and not only for FUSE_DIRECT_IO - I need to update this branch
> >>> and post the next version
> >>> https://github.com/bsbernd/linux/commits/fuse-dio-v4
> >>>
> >>>
> >>> In the mean time I have another idea how to solve
> >>> FOPEN_PARALLEL_DIRECT_WRITES + FUSE_DIRECT_IO_ALLOW_MMAP
> >>
> >> Please find attached what I had in my mind. With that generic/095 is not
> >> crashing for me anymore. I just finished the initial coding - it still
> >> needs a bit cleanup and maybe a few comments.
> >>
> >
> > Nice. I like the FUSE_I_CACHE_WRITES state.
> > For FUSE_PASSTHROUGH I will need to track if inode is open/mapped
> > in caching mode, so FUSE_I_CACHE_WRITES can be cleared on release
> > of the last open file of the inode.
> >
> > I did not understand some of the complexity here:
> >
> >>         /* The inode ever got page writes and we do not know for sure
> >>          * in the DIO path if these are pending - shared lock not possible */
> >>         spin_lock(&fi->lock);
> >>         if (!test_bit(FUSE_I_CACHE_WRITES, &fi->state)) {
> >>                 if (!(*cnt_increased)) {
> >
> > How can *cnt_increased be true here?
>
> I think you missed the 2nd entry into this function, when the shared
> lock was already taken?

Yeh, I did.

> I have changed the code now to have all
> complexity in this function (test, lock, retest with lock, release,
> wakeup). I hope that will make it easier to see the intention of the
> code. Will post the new patches in the morning.
>

Sounds good. Current version was a bit hard to follow.

>
> >
> >>                         fi->shared_lock_direct_io_ctr++;
> >>                         *cnt_increased = true;
> >>                 }
> >>                 excl_lock = false;
> >
> > Seems like in every outcome of this function
> > *cnt_increased = !excl_lock
> > so there is not need for out arg cnt_increased
>
> If excl_lock would be used as input - yeah, would have worked as well.
> Or a parameter like "retest-under-lock". Code is changed now to avoid
> going in and out.
>
> >
> >>         }
> >>         spin_unlock(&fi->lock);
> >>
> >> out:
> >>         if (excl_lock && *cnt_increased) {
> >>                 bool wake = false;
> >>                 spin_lock(&fi->lock);
> >>                 if (--fi->shared_lock_direct_io_ctr == 0)
> >>                         wake = true;
> >>                 spin_unlock(&fi->lock);
> >>                 if (wake)
> >>                         wake_up(&fi->direct_io_waitq);
> >>         }
> >
> > I don't see how this wake_up code is reachable.
> >
> > TBH, I don't fully understand the expected result.
> > Surely, the behavior of dio mixed with mmap is undefined. Right?
> > IIUC, your patch does not prevent dirtying page cache while dio is in
> > flight. It only prevents writeback while dio is in flight, which is the same
> > behavior as with exclusive inode lock. Right?
>
> Yeah, thanks. I will add it in the patch description.
>
> And there was actually an issue with the patch, as cache flushing needs
> to be initiated before doing the lock decision, fixed now.
>

I thought there was, because of the wait in fuse_send_writepage()
but wasn't sure if I was following the flow correctly.

> >
> > Maybe this interaction is spelled out somewhere else, but if not
> > better spell it out for people like me that are new to this code.
>
> Sure, thanks a lot for your helpful comments!
>

Just to be clear, this patch looks like a good improvement and
is mostly independent of the "inode caching mode" and
FOPEN_CACHE_MMAP idea that I suggested.

The only thing that my idea changes is replacing the
FUSE_I_CACHE_WRITES state with a FUSE_I_CACHE_IO_MODE
state, which is set earlier than FUSE_I_CACHE_WRITES
on caching file open or first direct_io mmap and unlike
FUSE_I_CACHE_WRITES, it is cleared on the last file close.

FUSE_I_CACHE_WRITES means that caching writes happened.
FUSE_I_CACHE_IO_MODE means the caching writes and reads
may happen.

FOPEN_PARALLEL_DIRECT_WRITES obviously shouldn't care
about "caching reads may happen", but IMO that is a small trade off
to make for maintaining the same state for
"do not allow parallel dio" and "do not allow passthrough open".

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux