> On May 30, 2016, at 2:42 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@xxxxxxx> wrote: > > > > You have explicit order in set_pnfs_layoutdriver function. > Your patch looks like ot to me. I will try it with our > server and will let you know. > > Tigran. > > ----- Original Message ----- >> From: "Jeff Layton" <jlayton@xxxxxxxxxxxxxxx> >> To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@xxxxxxx> >> Cc: linux-nfs@xxxxxxxxxxxxxxx, "Tom Haynes" <thomas.haynes@xxxxxxxxxxxxxxx>, "Christoph Hellwig" <hch@xxxxxx> >> Sent: Monday, May 30, 2016 8:05:20 PM >> Subject: Re: [RFC PATCH] nfsd: allow nfsd to advertise multiple layout types > >> On Mon, 2016-05-30 at 19:04 +0200, Mkrtchyan, Tigran wrote: >>> >>> ----- Original Message ----- >>>> >>>> From: "Jeff Layton" <jlayton@xxxxxxxxxxxxxxx> >>>> To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@xxxxxxx> >>>> Cc: linux-nfs@xxxxxxxxxxxxxxx, "Tom Haynes" , "Christoph Hellwig" <hch@xxxxxx> >>>> Sent: Monday, May 30, 2016 6:51:38 PM >>>> Subject: Re: [RFC PATCH] nfsd: allow nfsd to advertise multiple layout types >>>> >>>> On Mon, 2016-05-30 at 17:47 +0200, Mkrtchyan, Tigran wrote: >>>>> >>>>> >>>>> ----- Original Message ----- >>>>>> >>>>>> From: "Jeff Layton" <jlayton@xxxxxxxxxxxxxxx> >>>>>> To: linux-nfs@xxxxxxxxxxxxxxx >>>>>> Cc: "Tom Haynes" <thomas.haynes@xxxxxxxxxxxxxxx>, "Christoph Hellwig" >>>>>> <hch@xxxxxx> >>>>>> Sent: Sunday, May 29, 2016 11:06:14 PM >>>>>> Subject: Re: [RFC PATCH] nfsd: allow nfsd to advertise multiple layout types >>>>>> >>>>>> On Sun, 2016-05-29 at 16:59 -0400, Jeff Layton wrote: >>>>>>> >>>>>>> If the underlying filesystem supports multiple layout types, then there >>>>>>> is little reason not to advertise that fact to clients and let them >>>>>>> choose what type to use. >>>>>>> >>>>>>> Turn the ex_layout_type field into a bitfield. For each supported >>>>>>> layout type, we set a bit in that field. When the client requests a >>>>>>> layout, ensure that the bit for that layout type is set. When the >>>>>>> client requests attributes, send back a list of supported types. >>>>>>> >>>>>>> Signed-off-by: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx> >>>>>>> --- >>>>>>> fs/nfsd/export.c | 4 ++-- >>>>>>> fs/nfsd/export.h | 2 +- >>>>>>> fs/nfsd/nfs4layouts.c | 6 +++--- >>>>>>> fs/nfsd/nfs4proc.c | 4 ++-- >>>>>>> fs/nfsd/nfs4xdr.c | 30 ++++++++++++++---------------- >>>>>>> 5 files changed, 22 insertions(+), 24 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c >>>>>>> index b4d84b579f20..f97ba49d5e66 100644 >>>>>>> --- a/fs/nfsd/export.c >>>>>>> +++ b/fs/nfsd/export.c >>>>>>> @@ -706,7 +706,7 @@ static void svc_export_init(struct cache_head *cnew, struct >>>>>>> cache_head *citem) >>>>>>> new->ex_fslocs.locations = NULL; >>>>>>> new->ex_fslocs.locations_count = 0; >>>>>>> new->ex_fslocs.migrated = 0; >>>>>>> - new->ex_layout_type = 0; >>>>>>> + new->ex_layout_types = 0; >>>>>>> new->ex_uuid = NULL; >>>>>>> new->cd = item->cd; >>>>>>> } >>>>>>> @@ -731,7 +731,7 @@ static void export_update(struct cache_head *cnew, struct >>>>>>> cache_head *citem) >>>>>>> item->ex_fslocs.locations_count = 0; >>>>>>> new->ex_fslocs.migrated = item->ex_fslocs.migrated; >>>>>>> item->ex_fslocs.migrated = 0; >>>>>>> - new->ex_layout_type = item->ex_layout_type; >>>>>>> + new->ex_layout_types = item->ex_layout_types; >>>>>>> new->ex_nflavors = item->ex_nflavors; >>>>>>> for (i = 0; i < MAX_SECINFO_LIST; i++) { >>>>>>> new->ex_flavors[i] = item->ex_flavors[i]; >>>>>>> diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h >>>>>>> index 2e315072bf3f..730f15eeb7ed 100644 >>>>>>> --- a/fs/nfsd/export.h >>>>>>> +++ b/fs/nfsd/export.h >>>>>>> @@ -57,7 +57,7 @@ struct svc_export { >>>>>>> struct nfsd4_fs_locations ex_fslocs; >>>>>>> uint32_t ex_nflavors; >>>>>>> struct exp_flavor_info ex_flavors[MAX_SECINFO_LIST]; >>>>>>> - enum pnfs_layouttype ex_layout_type; >>>>>>> + u32 ex_layout_types; >>>>>>> struct nfsd4_deviceid_map *ex_devid_map; >>>>>>> struct cache_detail *cd; >>>>>>> }; >>>>>>> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c >>>>>>> index 6d98d16b3354..2be9602b0221 100644 >>>>>>> --- a/fs/nfsd/nfs4layouts.c >>>>>>> +++ b/fs/nfsd/nfs4layouts.c >>>>>>> @@ -139,21 +139,21 @@ void nfsd4_setup_layout_type(struct svc_export *exp) >>>>>>> * otherwise advertise the block layout. >>>>>>> */ >>>>>>> #ifdef CONFIG_NFSD_FLEXFILELAYOUT >>>>>>> - exp->ex_layout_type = LAYOUT_FLEX_FILES; >>>>>>> + exp->ex_layout_types |= 1 << LAYOUT_FLEX_FILES; >>>>>>> #endif >>>>>>> #ifdef CONFIG_NFSD_BLOCKLAYOUT >>>>>>> /* overwrite flex file layout selection if needed */ >>>>>>> if (sb->s_export_op->get_uuid && >>>>>>> sb->s_export_op->map_blocks && >>>>>>> sb->s_export_op->commit_blocks) >>>>>>> - exp->ex_layout_type = LAYOUT_BLOCK_VOLUME; >>>>>>> + exp->ex_layout_types |= 1 << LAYOUT_BLOCK_VOLUME; >>>>>>> #endif >>>>>>> #ifdef CONFIG_NFSD_SCSILAYOUT >>>>>>> /* overwrite block layout selection if needed */ >>>>>>> if (sb->s_export_op->map_blocks && >>>>>>> sb->s_export_op->commit_blocks && >>>>>>> sb->s_bdev && sb->s_bdev->bd_disk->fops->pr_ops) >>>>>>> - exp->ex_layout_type = LAYOUT_SCSI; >>>>>>> + exp->ex_layout_types |= 1 << LAYOUT_SCSI; >>>>>>> #endif >>>>>>> } >>>>>>> >>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c >>>>>>> index 2ee2dc170327..719c620753e2 100644 >>>>>>> --- a/fs/nfsd/nfs4proc.c >>>>>>> +++ b/fs/nfsd/nfs4proc.c >>>>>>> @@ -1219,12 +1219,12 @@ nfsd4_verify(struct svc_rqst *rqstp, struct >>>>>>> nfsd4_compound_state *cstate, >>>>>>> static const struct nfsd4_layout_ops * >>>>>>> nfsd4_layout_verify(struct svc_export *exp, unsigned int layout_type) >>>>>>> { >>>>>>> - if (!exp->ex_layout_type) { >>>>>>> + if (!exp->ex_layout_types) { >>>>>>> dprintk("%s: export does not support pNFS\n", __func__); >>>>>>> return NULL; >>>>>>> } >>>>>>> >>>>>>> - if (exp->ex_layout_type != layout_type) { >>>>>>> + if (!(exp->ex_layout_types & (1 << layout_type))) { >>>>>>> dprintk("%s: layout type %d not supported\n", >>>>>>> __func__, layout_type); >>>>>>> return NULL; >>>>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c >>>>>>> index 9df898ba648f..4d3ae75ad4c8 100644 >>>>>>> --- a/fs/nfsd/nfs4xdr.c >>>>>>> +++ b/fs/nfsd/nfs4xdr.c >>>>>>> @@ -2164,22 +2164,20 @@ nfsd4_encode_aclname(struct xdr_stream *xdr, struct >>>>>>> svc_rqst *rqstp, >>>>>>> } >>>>>>> >>>>>>> static inline __be32 >>>>>>> -nfsd4_encode_layout_type(struct xdr_stream *xdr, enum pnfs_layouttype >>>>>>> layout_type) >>>>>>> +nfsd4_encode_layout_types(struct xdr_stream *xdr, u32 layout_types) >>>>>>> { >>>>>>> - __be32 *p; >>>>>>> + __be32 *p; >>>>>>> + unsigned long i = hweight_long(layout_types); >>>>>>> >>>>>>> - if (layout_type) { >>>>>>> - p = xdr_reserve_space(xdr, 8); >>>>>>> - if (!p) >>>>>>> - return nfserr_resource; >>>>>>> - *p++ = cpu_to_be32(1); >>>>>>> - *p++ = cpu_to_be32(layout_type); >>>>>>> - } else { >>>>>>> - p = xdr_reserve_space(xdr, 4); >>>>>>> - if (!p) >>>>>>> - return nfserr_resource; >>>>>>> - *p++ = cpu_to_be32(0); >>>>>>> - } >>>>>>> + p = xdr_reserve_space(xdr, 4 + 4 * i); >>>>>>> + if (!p) >>>>>>> + return nfserr_resource; >>>>>>> + >>>>>>> + *p++ = cpu_to_be32(i); >>>>>>> + >>>>>>> + for (i = LAYOUT_NFSV4_1_FILES; i < LAYOUT_TYPE_MAX; ++i) >>>>>>> + if (layout_types & (1 << i)) >>>>>>> + *p++ = cpu_to_be32(i); >>>>>>> >>>>>>> return 0; >>>>>>> } >>>>>>> @@ -2754,13 +2752,13 @@ out_acl: >>>>>>> } >>>>>>> #ifdef CONFIG_NFSD_PNFS >>>>>>> if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) { >>>>>>> - status = nfsd4_encode_layout_type(xdr, exp->ex_layout_type); >>>>>>> + status = nfsd4_encode_layout_types(xdr, exp->ex_layout_types); >>>>>>> if (status) >>>>>>> goto out; >>>>>>> } >>>>>>> >>>>>>> if (bmval2 & FATTR4_WORD2_LAYOUT_TYPES) { >>>>>>> - status = nfsd4_encode_layout_type(xdr, exp->ex_layout_type); >>>>>>> + status = nfsd4_encode_layout_types(xdr, exp->ex_layout_types); >>>>>>> if (status) >>>>>>> goto out; >>>>>>> } >>>>>> >>>>>> Just a (lightly-tested) RFC patch for now. >>>>>> >>>>>> This seems to do the right thing WRT to GETATTR response that requests >>>>>> a layout types list. The current client code spews this into the ring >>>>>> buffer though: >>>>>> >>>>>> NFS: decode_first_pnfs_layout_type: Warning: Multiple pNFS layout drivers per >>>>>> filesystem not supported >>>>>> >>>>>> ...so that would need a little work. Wireshark also seems to not parse >>>>>> the layout types list correctly. >>>>> Hi Jeff, >>>>> >>>>> I had an attempt to add multi-layout support into the client: >>>>> >>>>> https://github.com/0day-ci/linux/commit/e2d792dc32b82ffc511b009c0a8db5313126888e >>>>> >>>>> I can't find it in the list archive. This explains why Trond never >>>>> respond to it. >>>>> >>>>> Tigran. >>>>> >>>> Thanks. I had already rolled one up that does a different approach of a >>>> hardcoded selection order in the client code and sent it as an RFC. >>>> >>>> So...yours assumes that the server will send the "default" one first in >>>> the list, but AFAICT, the spec doesn't say anything about the >>>> fs_layout_type list being ordered in any way. I don't think we can make >>>> that assumption. Am I wrong there? >>> Imagine a muti-layout server in a deployment, where some client are old (RHEL6) >>> and some clients are new. RHEL6 will always take only the first one and fail, if >>> it wasn't nfs4_file_layout. But, if we always return 'well-known' as a first >>> one, >>> then, then RHEL6 clients will be able to coexists with newer clients. >>> >>> Our server can provide flexfile layouts, but 99% of our clients are >>> RHEL6-clones. >>> This stops us from enable flexfile support, unless we add per-client based >>> control >>> of layout type. >>> >>> Tigran. >>> >> >> Ahh ok, I see what you mean... >> >> So yeah, I don't think the spec places any significance on the order of >> the list, but we some de-facto precedence due to legacy clients. Not >> much we can do about that. I guess all you can do for those clients is >> just ensure that the first entry is the one most likely to be usable by >> those clients. >> >> This patch just orders the list by numerical value of the layout type >> constants. If someone can offer up a sane argument for ordering the >> list in a different fashion, then I'm OK with doing that. >> >> -- >> Jeff Layton <jlayton@xxxxxxxxxxxxxxx> > Jeff, this patch looks good to me. As for backward compatibility, we could do some complicated client support detection (implementation ID based or fallback if layout is rejected), but that’s probably going overboard for now. Maybe we should just hardcode this array instead of building it in the order of the numerical value of each layout type. We could add them in the order that the client supported them. I think that would fix Tigran’s problem. Maybe it’s already in this order as client support quickly followed layout type number allocation? Reviewed-by: Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> -dros -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html