Re: MDS has inconsistent performance

Gregory Farnum <greg@xxxxxxxxxxx> · Fri, 16 Jan 2015 10:43:17 -0800

On Fri, Jan 16, 2015 at 10:34 AM, Michael Sevilla
<mikesevilla3@xxxxxxxxx> wrote:
> On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
>>> Let me know if this works and/or you need anything else:
>>>
>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>
>>> Beware - the clients were on debug=10. Also, I tried this with the
>>> kernel client and it is more consistent; it does the 2 lookups per
>>> create on 1 client every single time.
>>
>> Mmmm, there are no mds logs of note here. :(
>>
>
> Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
> show anything interesting...

It's not interesting. Cap handling is only logged at high debug levels,
so I think we'd actually want debug 20 on the mds, the messenger, and
the client subsystems.
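
For reference, a minimal ceph.conf sketch of those settings. The section
placement is an assumption about your setup, and level 20 is very
verbose, so it is worth reverting once the run has been captured:

```
# Assumed layout: adjust sections to match your own ceph.conf.
[mds]
    debug mds = 20     # MDS subsystem, including cap issue/revoke
    debug ms = 20      # messenger

[client]
    debug client = 20  # userspace client
    debug ms = 20      # messenger on the client side
```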

>
>> I did look enough to see that:
>> 1) For some reason the MDS is now revoking caps on the file create,
>> which it was not doing before, and that prompts the switch to
>> double lookups. The client doesn't really have any visibility into
>> why that would be the case; the best guess I can come up with is that
>> maybe the MDS split up the directory into multiple frags at this
>> point. Do you have that enabled?
>
> Nope, unless any of these make a difference:
> $ ceph --admin-daemon... config show | grep frag
>   "mds_bal_frag": "false",
>   "mds_bal_fragment_interval": "5",
>   "mds_thrash_fragments": "0",
>   "mds_debug_frag": "false",
>
>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>> directory, or when we do a complete listdir on one. That makes it
>> pretty difficult to get the flag back (and so to take the optimal
>> create path) once you lose it. :( I'd love a better way to do so, but
>> we'll have to look at what's involved in a bit of depth.
>
> No need - with that reasoning it looks more like this is part of the
> design rather than a bug. I'll just have to accept the fact that the
> system is very complicated and clients touching stuff at certain times
> can make things less predictable... I just wanted to make sure I
> wasn't doing anything wrong. :)  I'll stick with the kernel client
> (it's almost twice as fast, anyways!)

Well, sort of: we definitely want an isolated client working in its own
directory to hold exclusive caps, but our heuristics aren't
sophisticated enough yet.
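
To make the I_COMPLETE point quoted above concrete, here is a toy Python
model. It is only a sketch, not Ceph code: the DirCache class and its
methods are hypothetical, the MDS is reduced to a plain set of names,
and a single counted lookup stands in for whatever round trips the real
client makes. It also assumes that losing caps is what clears the flag.

```python
class DirCache:
    """Toy model of one client's cached view of a directory (hypothetical)."""

    def __init__(self, created_empty=False):
        self.names = set()
        # Stands in for I_COMPLETE: set when the directory was created empty.
        self.complete = created_empty
        self.lookups_sent = 0

    def readdir_all(self, mds_names):
        # A complete listing is the other way to obtain the flag.
        self.names = set(mds_names)
        self.complete = True

    def lose_complete(self):
        # Assumption: a cap revocation drops the flag.
        self.complete = False

    def exists(self, name, mds_names):
        if name in self.names:
            return True
        if self.complete:
            # Cache is known to be complete, so a miss is authoritative.
            return False
        # Otherwise the client has to ask the MDS.
        self.lookups_sent += 1
        found = name in mds_names
        if found:
            self.names.add(name)
        return found

    def create(self, name, mds_names):
        if self.exists(name, mds_names):
            raise FileExistsError(name)
        mds_names.add(name)
        self.names.add(name)


if __name__ == "__main__":
    mds = set()
    d = DirCache(created_empty=True)
    d.create("a", mds)      # no lookup needed while the flag is set
    d.lose_complete()
    d.create("b", mds)      # now every create costs a lookup
    d.readdir_all(mds)      # full listing restores the flag
    d.create("c", mds)
    print(d.lookups_sent)
```

In this model the only way back to the lookup-free create path after
lose_complete() is a full readdir_all(), which mirrors why the flag is
hard to regain once lost.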

>
>> I'm not sure why the kernel client is so much more cautious, but I
>> think there were a number of problems with directory listing order
>> that were harder to solve there; I don't remember whether we
>> introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>> more about that. What kernel client version are you using?
>>
>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>
> Really, really old hardware from 2006: 2 dual-core CPUs, 8GB RAM,
> connected with 1Gbit. Kernel 3.4. We actually just installed beefier
> nodes so I'll keep you posted if we get other cool results.

Awesome! That's much faster than previously, although Zheng did some
work recently to split the journaling code into a separate thread
which I guess must have made a big difference.
-Greg