On 2014/02/03, 5:09 PM, "Dilger, Andreas" <andreas.dilger@xxxxxxxxx> wrote: >On 2014/02/03, 4:07 PM, "Andreas Dilger" <andreas.dilger@xxxxxxxxx> wrote: >>In kernel 3.11 O_TMPFILE was introduced, but the open flag value >>conflicts with the O_LOV_DELAY_CREATE flag 020000000 previously used >>by Lustre-aware applications. O_LOV_DELAY_CREATE allows applications >>to defer file layout and object creation from open time (the default) >>until it can instead be specified by the application using an ioctl. >> >>Instead of trying to find a non-conflicting O_LOV_DELAY_CREATE flag >>or define a Lustre-specific flag that isn't of use to most/any other >>filesystems, use (O_NOCTTY|FASYNC) as the new value. These flag >>are not meaningful for newly-created regular files and should be >>OK since O_LOV_DELAY_CREATE is only meaningful for new files. >> >>I looked into using O_ACCMODE/FMODE_WRITE_IOCTL, which allows calling >>ioctl() on the minimally-opened fd and is close to what is needed, >>but that doesn't allow specifying the actual read or write mode for >>the file, and fcntl(F_SETFL) doesn't allow O_RDONLY/O_WRONLY/O_RDWR >>to be set after the file is opened. Al, Christoph, any comments on this approach? >A few extra comments here that I wasn't sure should be in the commit >comment. > >The main goal of the O_LOV_DELAY_CREATE flag is to allow the file to be >opened in a "preliminary" manner to allow the application to specify the >layout of the file across the Lustre storage targets (e.g. whether the >app has millions of separate files, each one written to a single server, >or there is a single huge file spread across all of the servers, or some >combination of the two, if it is RAID-0 or RAID-1, or whatever). > >I'm open to a separate flag value for this, but I don't know if that is >of much interest to other filesystems since they don't have similar needs. >We want to avoid the need to have lots of syscalls to do this, since they >translate into multiple RPCs that we want to avoid when creating >potentially >millions of files over the network. > >Cheers, Andreas > >>Lustre-change: http://review.whamcloud.com/8312 >>Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4209 >>Signed-off-by: Andreas Dilger <andreas.dilger@xxxxxxxxx> >>Signed-off-by: Oleg Drokin <oleg.drokin@xxxxxxxxx> >>Signed-off-by: Peng Tao <bergwolf@xxxxxxxxx> >>--- >> .../lustre/lustre/include/lustre/lustre_user.h | 12 ++++------ >> drivers/staging/lustre/lustre/include/lustre_mdc.h | 11 ++++++++++ >> drivers/staging/lustre/lustre/llite/file.c | 21 >>++++++++++--------- >> drivers/staging/lustre/lustre/mdc/mdc_lib.c | 2 +- >> 4 files changed, 28 insertions(+), 18 deletions(-) >> >>diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h >>b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h >>index 6b6c0240..91f22a8 100644 >>--- a/drivers/staging/lustre/lustre/include/lustre/lustre_user.h >>+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_user.h >>@@ -265,13 +265,11 @@ struct ost_id { >> >> #define MAX_OBD_NAME 128 /* If this changes, a NEW ioctl must be added >>*/ >> >>-/* Hopefully O_LOV_DELAY_CREATE does not conflict with standard O_xxx >>flags. >>- * Previously it was defined as 0100000000 and conflicts with >>FMODE_NONOTIFY >>- * which was added since kernel 2.6.36, so we redefine it as 020000000. >>- * To be compatible with old version's statically linked binary, finally >>we >>- * define it as (020000000 | 0100000000). >>- * */ >>-#define O_LOV_DELAY_CREATE 0120000000 >>+/* Define O_LOV_DELAY_CREATE to be a mask that is not useful for regular >>+ * files, but are unlikely to be used in practice and are not harmful if >>+ * used incorrectly. O_NOCTTY and FASYNC are only meaningful for >>character >>+ * devices and are safe for use on new files (See LU-812, LU-4209). */ >>+#define O_LOV_DELAY_CREATE (O_NOCTTY | FASYNC) >> >> #define LL_FILE_IGNORE_LOCK 0x00000001 >> #define LL_FILE_GROUP_LOCKED 0x00000002 >>diff --git a/drivers/staging/lustre/lustre/include/lustre_mdc.h >>b/drivers/staging/lustre/lustre/include/lustre_mdc.h >>index c1e0270..468f363 100644 >>--- a/drivers/staging/lustre/lustre/include/lustre_mdc.h >>+++ b/drivers/staging/lustre/lustre/include/lustre_mdc.h >>@@ -166,6 +166,17 @@ void it_clear_disposition(struct lookup_intent *it, >>int flag); >> void it_set_disposition(struct lookup_intent *it, int flag); >> int it_open_error(int phase, struct lookup_intent *it); >> >>+static inline bool cl_is_lov_delay_create(unsigned int flags) >>+{ >>+ return (flags & O_LOV_DELAY_CREATE) == O_LOV_DELAY_CREATE; >>+} >>+ >>+static inline void cl_lov_delay_create_clear(unsigned int *flags) >>+{ >>+ if ((*flags & O_LOV_DELAY_CREATE) == O_LOV_DELAY_CREATE) >>+ *flags &= ~O_LOV_DELAY_CREATE; >>+} >>+ >> /** @} mdc */ >> >> #endif >>diff --git a/drivers/staging/lustre/lustre/llite/file.c >>b/drivers/staging/lustre/lustre/llite/file.c >>index c12821a..dc9da77 100644 >>--- a/drivers/staging/lustre/lustre/llite/file.c >>+++ b/drivers/staging/lustre/lustre/llite/file.c >>@@ -671,14 +671,13 @@ restart: >> >> ll_capa_open(inode); >> >>- if (!lli->lli_has_smd) { >>- if (file->f_flags & O_LOV_DELAY_CREATE || >>- !(file->f_mode & FMODE_WRITE)) { >>- CDEBUG(D_INODE, "object creation was delayed\n"); >>- GOTO(out_och_free, rc); >>- } >>+ if (!lli->lli_has_smd && >>+ (cl_is_lov_delay_create(file->f_flags) || >>+ (file->f_mode & FMODE_WRITE) == 0)) { >>+ CDEBUG(D_INODE, "object creation was delayed\n"); >>+ GOTO(out_och_free, rc); >> } >>- file->f_flags &= ~O_LOV_DELAY_CREATE; >>+ cl_lov_delay_create_clear(&file->f_flags); >> GOTO(out_och_free, rc); >> >> out_och_free: >>@@ -1381,23 +1380,25 @@ int ll_lov_setstripe_ea_info(struct inode *inode, >>struct file *file, >> ccc_inode_lsm_put(inode, lsm); >> CDEBUG(D_IOCTL, "stripe already exists for ino %lu\n", >> inode->i_ino); >>- return -EEXIST; >>+ GOTO(out, rc = -EEXIST); >> } >> >> ll_inode_size_lock(inode); >> rc = ll_intent_file_open(file, lum, lum_size, &oit); >> if (rc) >>- GOTO(out, rc); >>+ GOTO(out_unlock, rc); >> rc = oit.d.lustre.it_status; >> if (rc < 0) >> GOTO(out_req_free, rc); >> >> ll_release_openhandle(file->f_dentry, &oit); >> >>- out: >>+out_unlock: >> ll_inode_size_unlock(inode); >> ll_intent_release(&oit); >> ccc_inode_lsm_put(inode, lsm); >>+out: >>+ cl_lov_delay_create_clear(&file->f_flags); >> return rc; >> out_req_free: >> ptlrpc_req_finished((struct ptlrpc_request *) oit.d.lustre.it_data); >>diff --git a/drivers/staging/lustre/lustre/mdc/mdc_lib.c >>b/drivers/staging/lustre/lustre/mdc/mdc_lib.c >>index 91f6876..5b9f371 100644 >>--- a/drivers/staging/lustre/lustre/mdc/mdc_lib.c >>+++ b/drivers/staging/lustre/lustre/mdc/mdc_lib.c >>@@ -197,7 +197,7 @@ static __u64 mds_pack_open_flags(__u64 flags, __u32 >>mode) >> if (flags & FMODE_EXEC) >> cr_flags |= MDS_FMODE_EXEC; >> #endif >>- if (flags & O_LOV_DELAY_CREATE) >>+ if (cl_is_lov_delay_create(flags)) >> cr_flags |= MDS_OPEN_DELAY_CREATE; >> >> if (flags & O_NONBLOCK) >>-- >>1.7.3.4 >> >> > > >Cheers, Andreas >-- >Andreas Dilger > >Lustre Software Architect >Intel High Performance Data Division > > > Cheers, Andreas -- Andreas Dilger Lustre Software Architect Intel High Performance Data Division -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html