Re: [PATCH 0/2] fuse: Rename DIRECT_IO_{RELAX -> ALLOW_MMAP}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 12/6/23 09:25, Amir Goldstein wrote:
Is it actually important for FUSE_DIRECT_IO_ALLOW_MMAP fs
(e.g. virtiofsd) to support FOPEN_PARALLEL_DIRECT_WRITES?
I guess not otherwise, the combination would have been tested.

I'm not sure how many people are aware of these different flags/features.
I had just finalized the backport of the related patches to RHEL8 on
Friday, as we (or our customers) need both for different jobs.


FOPEN_PARALLEL_DIRECT_WRITES is typically important for
network fs and FUSE_DIRECT_IO_ALLOW_MMAP is typically not
for network fs. Right?

We kind of have these use cases for our network file systems

FOPEN_PARALLEL_DIRECT_WRITES:
     - Traditional HPC, large files, parallel IO
     - Large file used on local node as container for many small files

FUSE_DIRECT_IO_ALLOW_MMAP:
     - compilation through gcc (not so important, just not nice when it
does not work)
     - rather recent: python libraries using mmap _reads_. As it is read
only no issue of consistency.


These jobs do not intermix - no issue as in generic/095. If such
applications really exist, I have no issue with a serialization penalty.
Just disabling FOPEN_PARALLEL_DIRECT_WRITES because other
nodes/applications need FUSE_DIRECT_IO_ALLOW_MMAP is not so nice.

Final goal is also to have FOPEN_PARALLEL_DIRECT_WRITES to work on plain
O_DIRECT and not only for FUSE_DIRECT_IO - I need to update this branch
and post the next version
https://github.com/bsbernd/linux/commits/fuse-dio-v4


In the mean time I have another idea how to solve
FOPEN_PARALLEL_DIRECT_WRITES + FUSE_DIRECT_IO_ALLOW_MMAP

Please find attached what I had in my mind. With that generic/095 is not
crashing for me anymore. I just finished the initial coding - it still
needs a bit cleanup and maybe a few comments.


Nice. I like the FUSE_I_CACHE_WRITES state.
For FUSE_PASSTHROUGH I will need to track if inode is open/mapped
in caching mode, so FUSE_I_CACHE_WRITES can be cleared on release
of the last open file of the inode.

I did not understand some of the complexity here:

        /* The inode ever got page writes and we do not know for sure
         * in the DIO path if these are pending - shared lock not possible */
        spin_lock(&fi->lock);
        if (!test_bit(FUSE_I_CACHE_WRITES, &fi->state)) {
                if (!(*cnt_increased)) {

How can *cnt_increased be true here?

I think you missed the 2nd entry into this function, when the shared lock was already taken? I have changed the code now to have all complexity in this function (test, lock, retest with lock, release, wakeup). I hope that will make it easier to see the intention of the code. Will post the new patches in the morning.



                        fi->shared_lock_direct_io_ctr++;
                        *cnt_increased = true;
                }
                excl_lock = false;

Seems like in every outcome of this function
*cnt_increased = !excl_lock
so there is not need for out arg cnt_increased

If excl_lock would be used as input - yeah, would have worked as well. Or a parameter like "retest-under-lock". Code is changed now to avoid going in and out.


        }
        spin_unlock(&fi->lock);

out:
        if (excl_lock && *cnt_increased) {
                bool wake = false;
                spin_lock(&fi->lock);
                if (--fi->shared_lock_direct_io_ctr == 0)
                        wake = true;
                spin_unlock(&fi->lock);
                if (wake)
                        wake_up(&fi->direct_io_waitq);
        }

I don't see how this wake_up code is reachable.

TBH, I don't fully understand the expected result.
Surely, the behavior of dio mixed with mmap is undefined. Right?
IIUC, your patch does not prevent dirtying page cache while dio is in
flight. It only prevents writeback while dio is in flight, which is the same
behavior as with exclusive inode lock. Right?

Yeah, thanks. I will add it in the patch description.

And there was actually an issue with the patch, as cache flushing needs to be initiated before doing the lock decision, fixed now.


Maybe this interaction is spelled out somewhere else, but if not
better spell it out for people like me that are new to this code.

Sure, thanks a lot for your helpful comments!



Thanks,
Bernd




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux