Re: MDS has inconsistent performance

Can you post the full logs somewhere to look at? These bits aren't
very helpful on their own (except to say, yes, the client cleared its
I_COMPLETE for some reason).

On Tue, Jan 13, 2015 at 3:45 PM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
> On Tue, Jan 13, 2015 at 11:13 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Mon, Jan 12, 2015 at 10:17 PM, Michael Sevilla
>> <mikesevilla3@xxxxxxxxx> wrote:
>>> I can't get consistent performance with 1 MDS. I have 2 clients create
>>> 100,000 files (separate directories) in a CephFS mount. I ran the
>>> experiment 5 times (deleting the pools/fs and restarting the MDS in
>>> between each run). I graphed the metadata throughput (requests per
>>> second): https://github.com/michaelsevilla/mds/blob/master/graphs/thruput.png
>>
>> So that top line is ~20,000 processed requests/second, as measured at
>> the MDS? (Looking at perfcounters?) And the fast run is doing 10k
>> create requests/second? (This number is much higher than I expected!)
>
> Yes - top line was 20K req/s from perf counter dump and the fast run
> does about 13K creates/s. We were surprised, too... In fact, with 1 client
> per MDS we get performance comparable to IndexFS - a system presented in a
> paper at Supercomputing this year. Here is a throughput graph, normalized
> to the # of clients, that shows how powerful one MDS can actually be:
> https://github.com/michaelsevilla/mds/blob/master/graphs/thruput-norm.png
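>
> (For reference, those numbers come from the MDS admin socket, i.e. something
> along the lines of
>
> $ ceph daemon mds.<id> perf dump
>
> sampled before and after each run; the exact counter names vary by version,
> so treat that as a sketch of the measurement rather than the exact command.)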
>
> Keep in mind that runs with more than 1 client aren't creates/s, but ops/sec. ;)
>
>>
>>> Sometimes (run0, run3), both clients issue 2 lookups per create to the
>>> MDS - this makes throughput high but the runtime long since the MDS
>>> processes many more requests.
>>> Sometimes (run2, run4), 1 client does 2 lookups per create and the
>>> other doesn't do any lookups.
>>> Sometimes (run1), neither client does any lookups - this has the
>>> fastest runtime.
>>>
>>> Does anyone know why the client behaves differently for the same exact
>>> experiment? Reading the client logs, it looks like sometimes the
>>> client enters add_update_cap() and clears the inode->flags in
>>> check_cap_issue(), then when a lookup occurs (in _lookup()), the
>>> client can't return ENOENT locally -- forcing it to ask the MDS to do the
>>> lookup. But this only happens sometimes (e.g., run0 and run3).
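>>>
>>> To make that concrete, here is a rough Python-style sketch of the idea only
>>> (the real client is C++ -- the decision lives in _lookup() and the flag is
>>> cleared in check_cap_issue(); the names below are invented for illustration):
>>>
>>> I_COMPLETE = 1  # stand-in for the real flag bit on the client's dir inode
>>>
>>> class DirInode:
>>>     def __init__(self):
>>>         self.flags = I_COMPLETE  # set while the client knows the full listing
>>>         self.dentries = {}       # name -> inode, the client-side dentry cache
>>>
>>> def lookup(dir_inode, name, send_lookup_to_mds):
>>>     # Cached dentry: answer locally.
>>>     if name in dir_inode.dentries:
>>>         return dir_inode.dentries[name]
>>>     # Complete directory view: a cache miss means the name can't exist,
>>>     # so ENOENT can be returned locally without contacting the MDS.
>>>     if dir_inode.flags & I_COMPLETE:
>>>         return None
>>>     # Once I_COMPLETE has been cleared, this branch is taken and every
>>>     # create pays for extra round trips to the MDS.
>>>     return send_lookup_to_mds(dir_inode, name)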
>>
>> If you provide the logs I can check more carefully, but my guess is
>> that you've got another client mounting it, or are looking at both
>> directories from one of the clients, and this is inadvertently causing
>> them to go into shared rather than exclusive mode.
>
> I think you are right! Here is a subset of the client log:
> https://github.com/michaelsevilla/mds/blob/master/scratch/client0.log
>
> These snippets zoom in on the point where the client stops sending "create,
> create, create, create..." and starts sending "lookup, lookup, create,
> lookup, lookup, create...":
>
> $ cat client0.log | grep "send_request client"
> create ...file.2098
> create ...file.2099
> create ...file.2100
> create ...file.2101
> lookup ...file.2102
> lookup ...file.2102
> create ...file.2102
> lookup ...file.2103
> lookup ...file.2103
> create ...file.2103
> lookup ...file.2104
> lookup ...file.2104
> create ...file.2104
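>
> For what it's worth, here is a quick sketch for tallying lookups vs. creates
> from a client log (assuming the "send_request client" lines carry the op
> names in roughly the form shown above):
>
> import collections, re, sys
>
> counts = collections.Counter()
> with open(sys.argv[1]) as f:
>     for line in f:
>         if "send_request client" in line:
>             m = re.search(r"\b(lookup|create)\b", line)
>             if m:
>                 counts[m.group(1)] += 1
>
> print("%s  lookups per create: %.2f"
>       % (dict(counts), counts["lookup"] / float(max(counts["create"], 1))))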
>
> I think what you are looking for is on line 687:
> ... clearing (I_COMPLETE|I_DIR_ORDERED)
> ... add_update_cap issued pAsLsXs -> pAsLsXsFsx
>
> It looks like we lose the exclusive mode on the file... but I don't
> understand why the MDS revokes it for 1 client but not the other. The
> MDS log is here:
> https://raw.githubusercontent.com/michaelsevilla/mds/master/scratch/mds.log
>
>
>>
>> How are you trying to keep the directories private during the
>> workload? Some of the more naive solutions won't stand up to
>> repetitive testing given how various components of the system
>> currently behave.
> Is there a way to keep the directories private (i.e., keep them always
> in exclusive mode)? That'd be perfect... In my runs, one client does
> mkdir /mnt/cephfs/dir0 and the other does mkdir /mnt/cephfs/dir1...
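>
> For reference, each client's side of the workload is essentially the
> following (a minimal sketch -- the paths and file count match the runs
> described above, the rest is boilerplate):
>
> import os, sys
>
> # Each client works only inside its own directory under the shared mount,
> # e.g. client 0 -> /mnt/cephfs/dir0, client 1 -> /mnt/cephfs/dir1.
> client_id = int(sys.argv[1])
> d = "/mnt/cephfs/dir%d" % client_id
> os.mkdir(d)
>
> # 100,000 empty-file creates, all inside this client's private directory.
> for i in range(100000):
>     fd = os.open(os.path.join(d, "file.%d" % i),
>                  os.O_CREAT | os.O_WRONLY, 0o644)
>     os.close(fd)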
>
>>
>>>
>>> Details of the experiment:
>>> Workload: 2 clients, 100,000 creates in separate directories, using
>>> the FUSE client
>>> MDS config: client_cache_size = 100000000, mds_cache_size = 16384000
>>
>> That client_cache_size only has any effect if it's applied to the
>> client-side config. ;)
> Yes - I copy the ceph.conf to the client, too. I think it works
> because the 1 client, 1 MDS test caches all the inodes, according to
> the perf counters.
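>
> Roughly, the relevant ceph.conf bits on both the MDS and client hosts look
> like this (just a sketch -- the section placement is the part that matters):
>
> [mds]
>     mds cache size = 16384000
>
> [client]
>     client cache size = 100000000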
>
> Thanks so much, Greg!
>
> Mike
>
>> -Greg