MDS has inconsistent performance

I can't get consistent performance with 1 MDS. I have 2 clients each
create 100,000 files (in separate directories) in a CephFS mount. I ran
the experiment 5 times, deleting the pools/fs and restarting the MDS
between runs. I graphed the metadata throughput (requests per second):
https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png
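
For reference, each client's workload is essentially the loop below.
This is an illustrative sketch, not my exact driver script; the mount
point, directory, and file names are placeholders:

# Illustrative per-client workload: create 100,000 empty files in a
# per-client directory under the ceph-fuse mount. The mount point and
# names are placeholders, not my exact paths.
import os

MOUNT = "/mnt/cephfs"   # assumed ceph-fuse mount point
CLIENT = 0              # 0 on one client, 1 on the other
NFILES = 100000

dirpath = os.path.join(MOUNT, "client%d" % CLIENT)
os.makedirs(dirpath, exist_ok=True)   # each client works in its own directory

for i in range(NFILES):
    # Each open(O_CREAT|O_EXCL) is one create request to the MDS; the
    # question is how many lookups the client issues alongside it.
    fd = os.open(os.path.join(dirpath, "file%d" % i),
                 os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)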

- Sometimes (run0, run3) both clients issue 2 lookups per create to the
  MDS. This makes throughput high but the runtime long, since the MDS
  processes many more requests.
- Sometimes (run2, run4) one client does 2 lookups per create and the
  other doesn't do any lookups.
- Sometimes (run1) neither client does any lookups; this has the
  fastest runtime.
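
I get those lookup/create counts by grepping the client debug logs,
roughly like the sketch below. The "send_request"/"lookup"/"create"
substrings are assumptions about what my logs contain with debug client
turned up, so treat the patterns as illustrative:

# Rough lookup-vs-create count from ceph-fuse client debug logs.
import sys

def count_ops(logfile):
    lookups = creates = 0
    with open(logfile) as f:
        for line in f:
            if "send_request" not in line:   # only count requests sent to the MDS
                continue
            if "lookup" in line:
                lookups += 1
            elif "create" in line:
                creates += 1
    return lookups, creates

for path in sys.argv[1:]:
    lookups, creates = count_ops(path)
    ratio = float(lookups) / creates if creates else 0.0
    print("%s: %d lookups, %d creates (%.2f lookups/create)"
          % (path, lookups, creates, ratio))

I run this against each client's log, e.g. one log file per client as
command-line arguments.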

Does anyone know why the clients behave differently for the exact same
experiment? Reading the client logs, it looks like sometimes the
client enters add_update_cap() and clears the inode->flags in
check_cap_issue(); then, when a lookup occurs (in _lookup()), the
client can't return ENOENT locally -- forcing it to ask the MDS to do
the lookup. But this only happens sometimes (e.g., run0 and run3).
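
For what it's worth, my mental model of that _lookup() decision is
something like the sketch below. This is just my paraphrase of what I
think is happening, not the actual Client.cc code, and all the names
are made up:

# Paraphrase of the lookup decision as I understand it; the names and
# structures are illustrative, not the real client code.
from collections import namedtuple

# 'complete' stands in for the inode flag that check_cap_issue() clears
# when add_update_cap() runs.
DirInode = namedtuple("DirInode", ["dentries", "complete"])

def lookup(dir_inode, name):
    if name in dir_inode.dentries:
        return "cache hit: %s" % dir_inode.dentries[name]
    if dir_inode.complete:
        # The whole directory is cached, so a miss is a definite ENOENT
        # and no MDS round trip is needed.
        return "ENOENT (answered locally)"
    # The flag was cleared, so the client can't rule the name out locally
    # and has to send a lookup to the MDS -- the extra requests I see.
    return "send lookup to MDS"

print(lookup(DirInode({}, True), "file1"))    # -> ENOENT (answered locally)
print(lookup(DirInode({}, False), "file1"))   # -> send lookup to MDS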

Details of the experiment:
- Workload: 2 clients, 100,000 creates each, in separate directories,
  using the FUSE client
- Config: client_cache_size = 100000000, mds_cache_size = 16384000
  (conf snippet below)
- Cluster: 18 OSDs, 1 MDS, 1 MON; data/metadata pools have 4096 PGs
- Ceph version: 0.90-877-gc219c43 (c219c43cc2943c794378214d77566e3f0d3f394a)
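
The config above comes from ceph.conf lines like these (I'm assuming
they sit under [client] and [mds] respectively; [global] would also
work):

[client]
    client cache size = 100000000

[mds]
    mds cache size = 16384000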

Thanks!

Michael