Re: [fuse-devel] 512 byte aligned write + O_DIRECT for xfstests

[+CC fsdevel folks]

On Mon, Jun 22, 2020 at 8:33 AM Nikolaus Rath <Nikolaus@xxxxxxxx> wrote:
>
> On Jun 21 2020, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >> I am not sure that is correct. At step 6, the write() request from
> >> userspace is still being processed. I don't think that it is reasonable
> >> to expect that the write() request is atomic, i.e. you can't expect to
> >> see none or all of the data that is *currently being written*.
> >
> > Apparently the standard is quite clear on this:
> >
> >   "All of the following functions shall be atomic with respect to each
> > other in the effects specified in POSIX.1-2017 when they operate on
> > regular files or symbolic links:
> >
> > [...]
> > pread()
> > read()
> > readv()
> > pwrite()
> > write()
> > writev()
> > [...]
> >
> > If two threads each call one of these functions, each call shall
> > either see all of the specified effects of the other call, or none of
> > them."[1]
> >
> > Thanks,
> > Miklos
> >
> > [1]
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07
>
> Thanks for digging this up, I did not know about this.
>
> That leaves FUSE in a rather uncomfortable place though, doesn't it?
> What does the kernel do when userspace issues a write request that's
> bigger than FUSE userspace pipe? It sounds like either the request must
> be splitted (so it becomes non-atomic), or you'd have to return a short
> write (which IIRC is not supposed to happen for local filesystems).
>
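To make the splitting concern concrete, here is a toy model (not FUSE code). It assumes a hypothetical per-request limit `CHUNK` standing in for the FUSE buffer size, applies one write as several requests, and lets a reader sample the file between the first and second request. The reader sees a mix of old and new bytes, i.e. neither "all" nor "none" of the write's effects, which is exactly what the POSIX text quoted above forbids.

```python
FILE_SIZE = 8
CHUNK = 4  # hypothetical per-request limit standing in for the FUSE buffer size


def split_write_with_interleaved_read(old: bytes, new: bytes, chunk: int) -> bytes:
    """Apply `new` over `old` in `chunk`-sized requests and return what a
    reader would see if its read were served right after the first request."""
    data = bytearray(old)
    seen = bytes(data)
    for i, off in enumerate(range(0, len(new), chunk)):
        data[off:off + chunk] = new[off:off + chunk]
        if i == 0:
            # A concurrent read sneaks in between request 1 and request 2.
            seen = bytes(data)
    return seen


seen = split_write_with_interleaved_read(b"A" * FILE_SIZE, b"B" * FILE_SIZE, CHUNK)
print(seen)  # b'BBBBAAAA': part of the write is visible, part is not
```

With `chunk` equal to the full write size the reader sees the whole new contents, so the tear comes only from splitting.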

What makes you say that short writes are not supposed to happen?
And what is the definition of "local filesystem" in that claim?
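For reference, POSIX lets write() return fewer bytes than requested, so a careful caller loops on the remainder. A minimal sketch follows; `clamped_write` and `MAX_REQUEST` are hypothetical and only mimic a filesystem that returns a short write for oversized direct IO requests (the third option listed below), they are not a FUSE API.

```python
import os
import tempfile
from typing import Callable

MAX_REQUEST = 128 * 1024  # hypothetical per-request limit, not a real FUSE constant


def clamped_write(fd: int, data: memoryview) -> int:
    """Hypothetical filesystem behaviour: accept at most MAX_REQUEST bytes."""
    return os.write(fd, data[:MAX_REQUEST])


def write_all(fd: int, buf: bytes,
              write: Callable[[int, memoryview], int] = os.write) -> int:
    """Call `write` until every byte of `buf` is written.

    os.write() already retries on EINTR (Python 3.5+), so only short
    counts need handling here.
    """
    view = memoryview(buf)
    total = 0
    while total < len(buf):
        n = write(fd, view[total:])
        if n == 0:
            raise OSError("write() made no progress")
        total += n
    return total


with tempfile.TemporaryFile() as f:
    written = write_all(f.fileno(), b"x" * (1 << 20), write=clamped_write)
    size = os.fstat(f.fileno()).st_size
print(written, size)  # 1048576 1048576
```

The loop recovers the full transfer, but each short return is a point where a concurrent reader can observe a partially applied write, so short writes trade the atomicity problem for a caller-visible one rather than removing it.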

FYI, a similar discussion is also happening about XFS "atomic rw" behavior [1].

It seems like the options for FUSE are:
- Take a shared i_rwsem lock on read, like XFS does, and regress
  performance of mixed read/write workloads
- Do the above only for non-direct IO and for writeback_cache mode, to
  limit the potential damage
- Return a short read/write for direct IO if the request is bigger than
  the FUSE buffer size
- Add a FUSE mode that implements direct IO internally as something like
  RWF_UNCACHED [2]. This is a relaxed version of "no caching" in the
  client, or a stricter version of "cache write-through", in the sense
  that during an ongoing large write operation, reads of those freshly
  written bytes are served only from the client's cached copy and not
  from the server.
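A userspace sketch of why the first option restores atomicity, assuming a simple reader-writer lock as a stand-in for i_rwsem (this is not kernel code, and `RWLock`, `CHUNK` and the helpers are hypothetical). The writer still splits each write into `CHUNK`-sized pieces, but holds the exclusive lock across all of them, while reads take the shared lock, so no read can land between two pieces.

```python
import threading


class RWLock:
    """Minimal reader-writer lock: readers share, a writer excludes everyone."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self) -> None:
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


SIZE = 64
CHUNK = 16  # hypothetical per-request limit: each write becomes 4 requests
data = bytearray(b"A" * SIZE)
lock = RWLock()


def locked_split_write(fill: bytes) -> None:
    lock.acquire_exclusive()  # held across every piece of the split write
    try:
        for off in range(0, SIZE, CHUNK):
            data[off:off + CHUNK] = fill * CHUNK
    finally:
        lock.release_exclusive()


def locked_read() -> bytes:
    lock.acquire_shared()
    try:
        return bytes(data)
    finally:
        lock.release_shared()


def run(iterations: int = 2000) -> int:
    """Race one writer against one reader; return the number of torn reads."""
    def writer() -> None:
        for i in range(iterations):
            locked_split_write(b"B" if i % 2 else b"A")

    t = threading.Thread(target=writer)
    t.start()
    torn = sum(1 for _ in range(iterations) if len(set(locked_read())) != 1)
    t.join()
    return torn


print(run())  # 0: every read sees a whole write or none of it
```

The cost is the one noted in the first option: while a large write holds the lock exclusively, every read of the file waits for it, which is where the mixed read/write regression comes from.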

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20200622010234.GD2040@xxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-fsdevel/20191217143948.26380-1-axboe@xxxxxxxxx/