RE: Provision for filesystem specific open flags

> > > No.  If you want new flags bits, make a public proposal.  Maybe some 
> > > other filesystem would also benefit from them.
> > 
> > Ah, I see what you mean now, thanks.
> > 
> > I would like to propose O_CONCURRENT_WRITE as a new open flag.  It is 
> > currently used in the Panasas filesystem (panfs) and defined with value:
> > 
> > #define O_CONCURRENT_WRITE 020000000000
> > 
> > This flag has been provided by panfs to HPC users via the mpich 
> > package for well over a decade.  See:
> > 
> > https://github.com/pmodels/mpich/blob/master/src/mpi/romio/adio/ad_panfs/ad_panfs_open6.c#L344
> > 
> > O_CONCURRENT_WRITE indicates to the filesystem that the application 
> > doing the open is participating in a coordinated distributed manner 
> > with other such applications, possibly running on different hosts.  
> > This allows the panfs filesystem to delegate some of the cache 
> > coherency responsibilities to the application, improving performance.

> O_DIRECT already delegates responsibility for cache coherency to userspace
> applications and it allows for concurrent writes to a single file. Why do we
> need a new flag for this?

> > The reason this flag is used on open as opposed to having a post-open 
> > ioctl or fcntl SETFL is to allow panfs to catch and reject opens by 
> > applications that attempt to access files that have already been 
> > opened by applications that have set O_CONCURRENT_WRITE.

> Sounds kinda like how we already use O_EXCL on block devices.
> Perhaps something like:

> #define O_CONCURRENT_WRITE  (O_DIRECT | O_EXCL)

> To tell open to reject mixed mode access to the file on open?

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Thanks for the suggestion, but O_DIRECT means something significantly different
from O_CONCURRENT_WRITE.  O_DIRECT forces the filesystem not to cache read or
write data, while O_CONCURRENT_WRITE allows both caching and concurrent
distributed access.  My initial description of O_CONCURRENT_WRITE was not
clear, so let me add more detail here.

When O_CONCURRENT_WRITE is used, portions of read and write data are still
cacheable in the filesystem, and the filesystem remains responsible for
maintaining distributed coherency.  The user application is expected to use
an access pattern that allows the filesystem to cache data, thereby
improving performance.  If the application misbehaves, the filesystem still
guarantees coherency, but at a performance cost, because portions of the file
have to be treated as non-cacheable.

In panfs, a well-behaved O_CONCURRENT_WRITE application takes the file's
layout on storage into account.  Accesses from different machines should not
overlap within the same RAID stripe, so that they do not cause distributed
stripe lock contention.  Page-aligned writes can be cached, and the filesystem
can aggregate multiple such writes before writing them out to storage.
Conversely, an O_CONCURRENT_WRITE application whose writers collide on the
same stripe will see worse performance.  Writes that are not page aligned are
treated by panfs as write-through and non-cacheable, because the filesystem
has to assume that the part of the page untouched by this machine might in
fact be written on another machine.  Caching such a page and writing it out
later could lead to data corruption.
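
To make that access pattern concrete, here is a rough C sketch of how an
application might lay out per-writer regions.  This is only an illustration,
not panfs or ROMIO code: the helper names (round_up, region_start,
is_page_aligned) are made up for this example, and the stripe and page sizes
are parameters the application would have to obtain itself.  It also assumes
the stripe size is a multiple of the page size, so that every region start is
page aligned as well.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of align. */
static uint64_t round_up(uint64_t n, uint64_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/*
 * Hypothetical helper: starting offset of the region owned by writer
 * `rank`.  Each region is padded to a whole number of stripes, so no
 * two writers ever touch the same stripe.
 */
static uint64_t region_start(uint64_t rank, uint64_t bytes_per_writer,
                             uint64_t stripe_size)
{
    return rank * round_up(bytes_per_writer, stripe_size);
}

/*
 * Nonzero if a write covers only whole pages, i.e. the kind of write
 * described above as cacheable and open to aggregation.
 */
static int is_page_aligned(uint64_t offset, uint64_t len,
                           uint64_t page_size)
{
    assert(page_size > 0);
    return offset % page_size == 0 && len % page_size == 0;
}
```

For example, with an assumed 64 KiB stripe and 100000 bytes per writer, each
writer's region is padded to two stripes, so writer 1 would start at offset
131072.  A write that fails is_page_aligned() would still be coherent, but per
the description above it would be handled as write-through.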

The benefit of O_CONCURRENT_WRITE is that, unlike with O_DIRECT, the
application does not have to implement its own caching to see good
performance.  The intricacies of maintaining distributed coherency are left to
the filesystem rather than to the application developer.  Caching at the
filesystem layer also lets multiple O_CONCURRENT_WRITE processes on the same
machine enjoy the performance benefits of the page cache.

Think of this as a hybrid between exclusive access to a file, where the
filesystem can cache everything, and a simplistic shared mode, where the
filesystem caches nothing.

So we really do need a separate flag defined.  Thanks!




