On 2014/02/11, 2:13 AM, "Christoph Hellwig" <hch@xxxxxxxxxxxxx> wrote:
>On Mon, Feb 10, 2014 at 09:29:29PM +0000, Al Viro wrote:
>> I can live with that; it's a kludge, but it's less broken than that
>> explicit constant - that one is a non-starter, since O_... flag
>> values are arch-dependent.
>
>Grabbing their own O_FLAG is of course not acceptable at all.
>Personally I don't think this version is acceptable for real mainline
>either.  What exactly are the semantics of the flag?  Why don't you do
>object allocation on demand like all delalloc filesystems by default?

This was described in the original patch and follow-on email, but I'll
repeat it here, and expand the detail a bit further:

In kernel 3.11 O_TMPFILE was introduced, but the open flag value
conflicts with the O_LOV_DELAY_CREATE flag 020000000 previously used by
Lustre-aware applications.  O_LOV_DELAY_CREATE allows applications to
defer file layout and object creation from open time (the default) until
it can instead be specified by the application using an ioctl.

The main goal of the O_LOV_DELAY_CREATE flag is to allow the file to be
opened in a "preliminary" manner to allow the application to specify the
layout of the file across the Lustre storage targets (e.g. whether the
app has millions of separate files each one written to a single server,
or there is a single huge file spread across all of the servers, or some
combination of the two, if it is RAID-0 or RAID-1, or whatever).

FYI, an "object" in Lustre is not a fixed-size chunk of space like Ceph
or HDFS that needs to be continuously allocated as a file grows, but
rather a variable-sized inode-without-a-name that is written at
arbitrary byte offsets and can be sparse, so there is no need for the
client and metadata server to communicate after the initial file layout
has been decided.

The Lustre object(s) are normally allocated by the metadata server at
open time to avoid RPC round-trips and lock contention for files opened
by large numbers of nodes at once.  The layout is normally specified by
the filesystem default, or on the parent directory, but some
applications need fine-grained control over the layout to optimize for a
particular filesystem configuration.

Instead of trying to find a non-conflicting O_LOV_DELAY_CREATE flag or
define a Lustre-specific flag that isn't of use to most/any other
filesystems, use (O_NOCTTY|FASYNC) as the new value.  These flags are
not meaningful for newly-created regular files and should be OK since
O_LOV_DELAY_CREATE is only meaningful for new files.

I looked into using O_ACCMODE/FMODE_WRITE_IOCTL, which allows calling
ioctl() on the minimally-opened fd and is close to what is needed, but
that doesn't allow specifying the actual read or write mode for the
file, and fcntl(F_SETFL) doesn't allow O_RDONLY/O_WRONLY/O_RDWR to be
set after the file is opened.  We want to avoid the need to have lots of
syscalls to do this, since they translate into extra RPCs that we want
to avoid when creating potentially millions of files over the network.

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division