On Feb 9, 2012, at 2:27 PM, Myklebust, Trond wrote: > On Thu, 2012-02-09 at 19:13 +0000, Adamson, Dros wrote: >> On Feb 9, 2012, at 1:16 PM, Myklebust, Trond wrote: >> >>> On Thu, 2012-02-09 at 11:41 -0500, Weston Andros Adamson wrote: >>>> Include RPC statistics from all data servers in /proc/self/mountstats for pNFS >>>> filelayout mounts. >>>> >>>> The additional info printed on the "per-op statistics" line does not break >>>> current mountstats(8) from nfs-utils. >>>> >>>> Signed-off-by: Weston Andros Adamson <dros@xxxxxxxxxx> >>>> --- >>>> >>>> This is clearly a better approach than I was taking trying to keep DS rpc stats >>>> separate in /proc/self/mountstats. >>>> >>>> FYI, there are no other callers of rpc_print_iostats(). >>>> >>>> The new output of /proc/self/mountstats works fine with current mountstats >>>> userland tool. The additional data on the per-op statistics line doesn't make >>>> it into the output, but I left it in /proc/self/mountstats because it doesn't >>>> break anything and can be included in newer versions of mountstats. >>>> >>>> I plan on adding the per-DS stats (rpc and others) somewhere in /proc/fs/nfsfs/ >>>> soon. >>>> >>>> fs/nfs/nfs4filelayout.c | 30 ++++++++++++++++++++++++++ >>>> fs/nfs/pnfs.h | 23 ++++++++++++++++++++ >>>> fs/nfs/pnfs_dev.c | 45 ++++++++++++++++++++++++++++++++++++++++ >>>> fs/nfs/super.c | 3 +- >>>> include/linux/sunrpc/metrics.h | 13 +++++++++- >>>> net/sunrpc/stats.c | 38 ++++++++++++++++++++++++++++++++- >>>> 6 files changed, 147 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c >>>> index 79be7ac..51c71e5 100644 >>>> --- a/fs/nfs/nfs4filelayout.c >>>> +++ b/fs/nfs/nfs4filelayout.c >>>> @@ -33,6 +33,8 @@ >>>> #include <linux/nfs_page.h> >>>> #include <linux/module.h> >>>> >>>> +#include <linux/sunrpc/metrics.h> >>>> + >>>> #include "internal.h" >>>> #include "nfs4filelayout.h" >>>> >>>> @@ -918,6 +920,33 @@ filelayout_free_deveiceid_node(struct nfs4_deviceid_node *d) >>>> nfs4_fl_free_deviceid(container_of(d, struct nfs4_file_layout_dsaddr, id_node)); >>>> } >>>> >>>> +static unsigned int >>>> +filelayout_rpc_add_iostats(struct rpc_iostats *total, >>>> + unsigned int maxproc, >>>> + struct nfs4_deviceid_node *d) >>>> +{ >>>> + struct nfs4_file_layout_dsaddr *dsaddr; >>>> + struct nfs4_pnfs_ds *ds; >>>> + unsigned int num_ds = 0; >>>> + u32 i; >>>> + >>>> + dsaddr = container_of(d, struct nfs4_file_layout_dsaddr, id_node); >>>> + >>>> + for (i = 0; i < dsaddr->ds_num; i++) { >>>> + ds = dsaddr->ds_list[i]; >>>> + >>>> + /* only add iostats if nfs_client is different from MDS */ >>>> + if (ds && ds->ds_clp && d->nfs_client != ds->ds_clp) { >>>> + struct rpc_clnt *clnt = ds->ds_clp->cl_rpcclient; >>> >>> Hmmm.... Shouldn't this rather be the NFS_CLIENT(inode)? Also, why not >>> do this in the read/write completion code? Why walk the deviceid list at >>> all? >> >> I'm not really following you. Note that these are rpc_iostats not nfs_iostats. >> nfs_iostats have been properly counting operations to DSs for some time now. >> rpc_iostats count operations, rtt, exectime, etc per rpc procedure (or in the case of NFSv4.X there is a hack to use operations within compounds). >> >> We have to walk the device list to get access to the underlying DS's nfs_client->rpc_client -- I can't just put this in the read/write completion because, unlike nfs_iostats, rpc_iostats count all operations and not just read/write events/bytes (not to mention that the accumulation is done at the rpc layer). > > In doing this, you are assuming that DSes are not shared among multiple > filesystems or MDSes (i.e. that each DS is used by a single struct > nfs_server). For something like a NetApp filer in c-mode, that is > clearly not a valid assumption. True - I had thought of this, but I thought it was fine since I was just carrying this assumption forward... The current mountstats code already prints nfs_server->nfs_client->rpc_client->rpc_iostats -- which clearly doesn't differentiate between multiple nfs_servers that could be sharing the nfs_client. > >> I do have an implementation that doesn't need to walk the deviceid list by allowing a shared rpc_iostats struct between multiple rpc_clients (in addition to the current rpc_iostats structure), but that required adding locking and reference counting -- all for printing stats (obviously not what we want). > > This might be more in line with what we want. Note that it should be > easy to clone an rpc_client and then replace its rpc_iostats struct. I > don't think that needs any extra locking. We're already ignoring locking > here between different rpc_tasks, so throwing in different rpc_clients > to the mix will make no difference. Yeah, that's easy enough and i guess we could ignore locking, but we are still left with the same problem: how is this supposed to account for different mount points using the same nfs_client? nfs_client only has one rpc_client member. I doubt we want to make a hash lookup on nfs_server to get the right rpc_client (which could all use the same underlying xprt). Maybe it's time to move these stats into fs/nfs/ where they really belong? We could get rid of the hack that overloads procnum with opnum from inside the compound for v4+ and finally only show a specific mount's RPC stats. Thoughts? -dros
Attachment:
smime.p7s
Description: S/MIME cryptographic signature