On Tue, Jan 13, 2015 at 11:13 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Mon, Jan 12, 2015 at 10:17 PM, Michael Sevilla
> <mikesevilla3@xxxxxxxxx> wrote:
>> I can't get consistent performance with 1 MDS. I have 2 clients create
>> 100,000 files (in separate directories) in a CephFS mount. I ran the
>> experiment 5 times (deleting the pools/fs and restarting the MDS in
>> between each run). I graphed the metadata throughput (requests per
>> second):
>> https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png
>
> So that top line is ~20,000 processed requests/second, as measured at
> the MDS? (Looking at perfcounters?) And the fast run is doing 10k
> create requests/second? (This number is much higher than I expected!)

Yes - the top line was 20K req/s from the perf counter dump, and the fast
run does about 13K creates/s. We were surprised, too... In fact, with 1
client per MDS we get performance similar to IndexFS - a system from a
paper at Supercomputing this year. Here is a throughput graph, normalized
to the number of clients, that shows how powerful one MDS can actually be:
https://github.com/michaelsevilla/mds/blob/master/graphs/thruput-norm.png

Keep in mind that the runs with more than 1 client aren't measuring
creates/s, but ops/s. ;)

>
>> Sometimes (run0, run3), both clients issue 2 lookups per create to the
>> MDS - this makes throughput high but the runtime long, since the MDS
>> processes many more requests.
>> Sometimes (run2, run4), 1 client does 2 lookups per create and the
>> other doesn't do any lookups.
>> Sometimes (run1), neither client does any lookups - this has the
>> fastest runtime.
>>
>> Does anyone know why the client behaves differently for the same exact
>> experiment? Reading the client logs, it looks like sometimes the
>> client enters add_update_cap() and clears the inode->flags in
>> check_cap_issue(); then, when a lookup occurs (in _lookup()), the
>> client can't return ENOENT locally -- forcing it to ask the MDS to do
>> the lookup. But this only happens sometimes (e.g., run0 and run3).
>
> If you provide the logs I can check more carefully, but my guess is
> that you've got another client mounting it, or are looking at both
> directories from one of the clients, and this is inadvertently causing
> them to go into shared rather than exclusive mode.

I think you are right! Here is a subset of the client log:
https://github.com/michaelsevilla/mds/blob/master/scratch/client0.log

These snippets are zoomed in on the point where the client stops sending
"create, create, create, create..." and starts sending "lookup, lookup,
create, lookup, lookup, create...":

$ cat client0.log | grep "send_request client"
create ...file.2098
create ...file.2099
create ...file.2100
create ...file.2101
lookup ...file.2102
lookup ...file.2102
create ...file.2102
lookup ...file.2103
lookup ...file.2103
create ...file.2103
lookup ...file.2104
lookup ...file.2104
create ...file.2104

I think what you are looking for is on line 687:

... clearing (I_COMPLETE|I_DIR_ORDERED) ...
add_update_cap issued pAsLsXs -> pAsLsXsFsx

It looks like we lose the exclusive mode on the file... but I don't
understand why the MDS revokes it for one client but not the other. The
MDS log is here:
https://raw.githubusercontent.com/michaelsevilla/mds/master/scratch/mds.log

> How are you trying to keep the directories private during the
> workload? Some of the more naive solutions won't stand up to
> repetitive testing given how various components of the system
> currently behave.
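For context, each client's half of the workload is essentially just the
loop below. (This is a simplified sketch rather than the actual harness --
it assumes ceph-fuse is already mounted at /mnt/cephfs on both clients and
that each client passes its own directory name, e.g. dir0 or dir1.)

#!/usr/bin/env python
# Simplified sketch of the per-client create workload: make a private
# directory under the ceph-fuse mount and create N empty files in it.
# (Illustrative only -- the real runs use a separate harness that also
# records timing; the paths and counts match the experiment description.)
import os
import sys

MOUNT = "/mnt/cephfs"
NUM_FILES = 100000

def run(dirname):
    path = os.path.join(MOUNT, dirname)   # dir0 on one client, dir1 on the other
    os.mkdir(path)                        # each client makes only its own directory
    for i in range(NUM_FILES):
        fd = os.open(os.path.join(path, "file.%d" % i),
                     os.O_CREAT | os.O_WRONLY, 0o644)
        os.close(fd)

if __name__ == "__main__":
    run(sys.argv[1])                      # client0: "dir0", client1: "dir1"

Client0 runs it with dir0 and client1 with dir1, so neither client should
ever touch the other client's directory.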
Is there a way to keep the directories private (i.e., keep them always in
exclusive mode)? That'd be perfect... In my runs, one client does
mkdir /mnt/cephfs/dir0 and the other does mkdir /mnt/cephfs/dir1...

>
>>
>> Details of the experiment:
>> Workload: 2 clients, 100,000 creates in separate directories, using
>> the FUSE client
>> MDS config: client_cache_size = 100000000, mds_cache_size = 16384000
>
> That client_cache_size only has any effect if it's applied to the
> client-side config. ;)

Yes - I copy the ceph.conf to the client, too. I think it works because
the 1 client, 1 MDS test caches all the inodes, according to the perf
counters.

Thanks so much, Greg!
Mike

> -Greg
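P.S. If it helps with reproducing the graphs: the req/s numbers come from
sampling the MDS perf counters in a loop, roughly like the sketch below.
(Assumptions in the sketch, not gospel: it runs on the MDS host, the daemon
is named mds.a, and the counter being graphed is "request" under the "mds"
section -- substitute whichever admin socket and counter you actually use.)

#!/usr/bin/env python
# Rough sketch of the throughput sampling: dump the MDS perf counters
# once a second and print the per-second delta of the request counter.
# Assumes the daemon is mds.a and that "request" lives under the "mds"
# section of `ceph daemon mds.a perf dump` -- adjust both if your
# deployment reports the counters differently.
import json
import subprocess
import time

def mds_requests():
    out = subprocess.check_output(["ceph", "daemon", "mds.a", "perf", "dump"])
    return json.loads(out)["mds"]["request"]   # cumulative request count

prev = mds_requests()
while True:
    time.sleep(1)
    cur = mds_requests()
    print("%d req/s" % (cur - prev))
    prev = cur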