On 2010-09-20 18:21, Fred Isaman wrote:
> On Mon, Sep 20, 2010 at 10:24 AM, Benny Halevy <bhalevy@xxxxxxxxxxx> wrote:
>> On 2010-09-18 05:17, Fred Isaman wrote:
>>> From: The pNFS Team <linux-nfs@xxxxxxxxxxxxxxx>
>>>
>>> Allow a module implementing a layout type to register, and
>>> have its mount/umount routines called for filesystems that
>>> the server declares support it.
>>>
>>> Signed-off-by: TBD - melding/reorganization of several patches
>>> ---
>>>  Documentation/filesystems/nfs/00-INDEX |    2 +
>>>  Documentation/filesystems/nfs/pnfs.txt |   48 ++++++++++++++++++
>>>  fs/nfs/Kconfig                         |    2 +-
>>>  fs/nfs/pnfs.c                          |   82 +++++++++++++++++++++++++++++++-
>>>  fs/nfs/pnfs.h                          |    9 ++++
>>>  5 files changed, 140 insertions(+), 3 deletions(-)
>>>  create mode 100644 Documentation/filesystems/nfs/pnfs.txt
>>>
>>> diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX
>>> index 2f68cd6..8d930b9 100644
>>> --- a/Documentation/filesystems/nfs/00-INDEX
>>> +++ b/Documentation/filesystems/nfs/00-INDEX
>>> @@ -12,5 +12,7 @@ nfs-rdma.txt
>>>  	- how to install and setup the Linux NFS/RDMA client and server software
>>>  nfsroot.txt
>>>  	- short guide on setting up a diskless box with NFS root filesystem.
>>> +pnfs.txt
>>> +	- short explanation of some of the internals of the pnfs code
>>
>> that is, pnfs _client_ code...
>>
>
> OK
>
>
>>>  rpc-cache.txt
>>>  	- introduction to the caching mechanisms in the sunrpc layer.
>>> diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
>>> new file mode 100644
>>> index 0000000..bc0b9cf
>>> --- /dev/null
>>> +++ b/Documentation/filesystems/nfs/pnfs.txt
>>> @@ -0,0 +1,48 @@
>>> +Reference counting in pnfs:
>>> +==========================
>>> +
>>> +There are several inter-related caches.  We have layouts which can
>>> +reference multiple devices, each of which can reference multiple data servers.
>>> +Each data server can be referenced by multiple devices.  Each device
>>> +can be referenced by multiple layouts.  To keep all of this straight,
>>> +we need to reference count.
>>> +
>>> +
>>> +struct pnfs_layout_hdr
>>> +----------------------
>>> +The on-the-wire command LAYOUTGET corresponds to struct
>>> +pnfs_layout_segment, usually referred to by the variable name lseg.
>>> +Each nfs_inode may hold a pointer to a cache of these layout
>>> +segments in nfsi->layout, of type struct pnfs_layout_hdr.
>>> +
>>> +We reference the header for the inode pointing to it, across each
>>> +outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN,
>>> +LAYOUTCOMMIT), and for each lseg held within.
>>> +
>>> +Each header is also (when non-empty) put on a list associated with
>>> +struct nfs_client (cl_layouts).  Being put on this list does not bump
>>> +the reference count, as the layout is kept around by the lseg that
>>> +keeps it in the list.
>>> +
>>> +deviceid_cache
>>> +--------------
>>> +lsegs reference device ids, which are resolved per nfs_client and
>>> +layout driver type.  The device ids are held in an RCU cache (struct
>>> +nfs4_deviceid_cache).  The cache itself is referenced across each
>>> +mount.  The entries (struct nfs4_deviceid) themselves are held across
>>> +the lifetime of each lseg referencing them.
>>> +
>>> +RCU is used because the deviceid is basically a write once, read many
>>> +data structure.  The hlist size of 32 buckets needs better
>>> +justification, but seems reasonable given that we can have multiple
>>> +deviceids per filesystem, and multiple filesystems per nfs_client.
>>> +
>>> +The hash code is copied from the nfsd code base.  A discussion of
>>> +hashing and variations of this algorithm can be found at:
>>> +http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809
>>> +
>>> +data server cache
>>> +-----------------
>>> +file driver devices refer to data servers, which are kept in a module
>>> +level cache.  Its reference is held over the lifetime of the deviceid
>>> +pointing to it.
>>> diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
>>> index 6c2aad4..5f1b936 100644
>>> --- a/fs/nfs/Kconfig
>>> +++ b/fs/nfs/Kconfig
>>> @@ -78,7 +78,7 @@ config NFS_V4_1
>>>  	depends on NFS_V4 && EXPERIMENTAL
>>>  	help
>>>  	  This option enables support for minor version 1 of the NFSv4 protocol
>>> -	  (draft-ietf-nfsv4-minorversion1) in the kernel's NFS client.
>>> +	  (RFC 5661) in the kernel's NFS client.
>>>
>>>  	  If unsure, say N.
>>>
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 2e5dba1..5a8a676 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -32,16 +32,53 @@
>>>
>>>  #define NFSDBG_FACILITY NFSDBG_PNFS
>>>
>>> -/* STUB that returns the equivalent of "no module found" */
>>> +/* Locking:
>>> + *
>>> + * pnfs_spinlock:
>>> + *      protects pnfs_modules_tbl.
>>> + */
>>> +static DEFINE_SPINLOCK(pnfs_spinlock);
>>> +
>>> +/*
>>> + * pnfs_modules_tbl holds all pnfs modules
>>> + */
>>> +static LIST_HEAD(pnfs_modules_tbl);
>>> +
>>> +/* Return the registered pnfs layout driver module matching given id */
>>> +static struct pnfs_layoutdriver_type *
>>> +find_pnfs_driver_locked(u32 id) {
>>
>> nit: the curly brace should be moved down a line
>
> OK
>
>
>>
>>> +	struct pnfs_layoutdriver_type *local;
>>> +
>>> +	dprintk("PNFS: %s: Searching for %u\n", __func__, id);
>>
>> I'd move this printk down, before returning, and print
>> the result as well.
>>
>
> OK
>
>
>>> +	list_for_each_entry(local, &pnfs_modules_tbl, pnfs_tblid)
>>> +		if (local->id == id) {
>>> +			if (!try_module_get(local->owner))
>>> +				local = NULL;
>>
>> If this happens (e.g. in case the module exited without
>> calling pnfs_unregister_layoutdriver), another layout driver (or a
>> different instance of the same one) might have registered on the same
>> id, so we need to keep looking.
>
> There can only be a single driver registered with a particular id.  So
> continuing the search is pointless.
>

After this condition happens, find_pnfs_driver_locked will not find the
current entry, so the next time pnfs_register_layoutdriver is called it
may register a new layout driver with the same id, and then there will
be two entries with the same id: one for which try_module_get returns
zero, and another which you will never find if you don't keep looking
for it.

Benny

>>
>>> +			goto out;
>>> +		}
>>> +	local = NULL;
>>> +out:
>>> +	return local;
>>> +}
>>
>> how about the following?
>>
>> static struct pnfs_layoutdriver_type *
>> find_pnfs_driver_locked(u32 id)
>> {
>> 	struct pnfs_layoutdriver_type *local, *found = NULL;
>>
>> 	list_for_each_entry(local, &pnfs_modules_tbl, pnfs_tblid)
>> 		if (local->id == id) {
>> 			if (!try_module_get(local->owner))
>> 				continue;
>> 			found = local;
>> 			break;
>> 		}
>> 	dprintk("%s: layout_type %u found %p\n", __func__, id, found);
>> 	return found;
>> }
>>
>>> +
>>>  static struct pnfs_layoutdriver_type *
>>>  find_pnfs_driver(u32 id) {
>>> -	return NULL;
>>> +	struct pnfs_layoutdriver_type *local;
>>> +
>>> +	spin_lock(&pnfs_spinlock);
>>> +	local = find_pnfs_driver_locked(id);
>>> +	spin_unlock(&pnfs_spinlock);
>>> +	return local;
>>>  }
>>>
>>>  /* Uninitialize a mountpoint in a layout driver */
>>>  void
>>>  unset_pnfs_layoutdriver(struct nfs_server *nfss)
>>>  {
>>> +	if (nfss->pnfs_curr_ld) {
>>> +		nfss->pnfs_curr_ld->uninitialize_mountpoint(nfss);
>>> +		module_put(nfss->pnfs_curr_ld->owner);
>>> +	}
>>>  	nfss->pnfs_curr_ld = NULL;
>>>  }
>>>
>>> @@ -68,6 +105,13 @@ set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
>>>  			goto out_no_driver;
>>>  		}
>>>  	}
>>> +	if (ld_type->initialize_mountpoint(server)) {
>>> +		printk(KERN_ERR
>>> +		       "%s: Error initializing mount point for layout driver %u.\n",
>>> +		       __func__, id);
>>> +		module_put(ld_type->owner);
>>> +		goto out_no_driver;
>>> +	}
>>>  	server->pnfs_curr_ld = ld_type;
>>>  	dprintk("%s: pNFS module for %u set\n", __func__, id);
>>>  	return;
>>> @@ -76,3 +120,37 @@ out_no_driver:
>>>  	dprintk("%s: Using NFSv4 I/O\n", __func__);
>>>  	server->pnfs_curr_ld = NULL;
>>>  }
>>> +
>>> +int
>>> +pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
>>> +{
>>> +	int status = -EINVAL;
>>> +	struct pnfs_layoutdriver_type *tmp;
>>> +
>>
>> Since we're relying on the fact that ld_type->id != 0,
>> let's add
>> 	BUG_ON(ld_type->id == 0);
>>
>> Benny
>
> OK.
>
> Fred
>
>>
>>> +	spin_lock(&pnfs_spinlock);
>>> +	tmp = find_pnfs_driver_locked(ld_type->id);
>>> +	if (!tmp) {
>>> +		list_add(&ld_type->pnfs_tblid, &pnfs_modules_tbl);
>>> +		status = 0;
>>> +		dprintk("%s Registering id:%u name:%s\n", __func__, ld_type->id,
>>> +			ld_type->name);
>>> +	} else {
>>> +		module_put(tmp->owner);
>>> +		printk(KERN_ERR "%s Module with id %d already loaded!\n",
>>> +		       __func__, ld_type->id);
>>> +	}
>>> +	spin_unlock(&pnfs_spinlock);
>>> +
>>> +	return status;
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnfs_register_layoutdriver);
>>> +
>>> +void
>>> +pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
>>> +{
>>> +	dprintk("%s Deregistering id:%u\n", __func__, ld_type->id);
>>> +	spin_lock(&pnfs_spinlock);
>>> +	list_del(&ld_type->pnfs_tblid);
>>> +	spin_unlock(&pnfs_spinlock);
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnfs_unregister_layoutdriver);
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index c628ef1..61531f3 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -36,8 +36,17 @@
>>>
>>>  /* Per-layout driver specific registration structure */
>>>  struct pnfs_layoutdriver_type {
>>> +	struct list_head pnfs_tblid;
>>> +	const u32 id;
>>> +	const char *name;
>>> +	struct module *owner;
>>> +	int (*initialize_mountpoint) (struct nfs_server *);
>>> +	int (*uninitialize_mountpoint) (struct nfs_server *);
>>>  };
>>>
>>> +extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>>> +extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>>>
>>>  void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
>>>  void unset_pnfs_layoutdriver(struct nfs_server *);
>>>
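
For reference, a minimal sketch of how a layout driver module might use
the registration API introduced by this patch.  It relies only on the
struct pnfs_layoutdriver_type fields shown above; the driver name, the
stub mount hooks, and the use of id 1 (the RFC 5661 files layout type)
are illustrative assumptions, not part of the patch.

	/*
	 * Hypothetical layout driver registering itself with the pNFS
	 * core on module load.  The mount hooks are stubs; a real
	 * driver would set up and tear down its per-mount state here
	 * (e.g. a deviceid cache).
	 */
	#include <linux/module.h>
	#include "pnfs.h"

	static int example_initialize_mountpoint(struct nfs_server *server)
	{
		/* per-mount setup goes here; nonzero aborts pNFS for this mount */
		return 0;
	}

	static int example_uninitialize_mountpoint(struct nfs_server *server)
	{
		/* undo whatever initialize_mountpoint set up */
		return 0;
	}

	static struct pnfs_layoutdriver_type example_layoutdriver = {
		.id    = 1,	/* assumed: the RFC 5661 files layout type */
		.name  = "example-layout-driver",
		.owner = THIS_MODULE,
		.initialize_mountpoint   = example_initialize_mountpoint,
		.uninitialize_mountpoint = example_uninitialize_mountpoint,
	};

	static int __init example_init(void)
	{
		/* returns -EINVAL if another driver already claimed this id */
		return pnfs_register_layoutdriver(&example_layoutdriver);
	}

	static void __exit example_exit(void)
	{
		pnfs_unregister_layoutdriver(&example_layoutdriver);
	}

	module_init(example_init);
	module_exit(example_exit);
	MODULE_LICENSE("GPL");

Because the table entry holds no module reference of its own (only
find_pnfs_driver_locked takes one, per lookup), the module must call
pnfs_unregister_layoutdriver in its exit path, which is exactly the
stale-entry scenario discussed above.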