Re: MDS has inconsistent performance

It's been a while and I don't imagine you care much right now, but I
finally made the time to look at these log details. The reason we got
slow turned out to be much stupider than I anticipated (we were losing
I_COMPLETE for bad reasons); I wrote up what I found at
http://tracker.ceph.com/issues/11226 and have an RFC change in
wip-11226-dir-fx (PR at https://github.com/ceph/ceph/pull/4168).

Thanks for the complaint and the logs! :)
-Greg

On Fri, Jan 16, 2015 at 1:13 PM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
> If you feel like perusing... log=20 on client, mds messenger, and mds:
>
> https://www.dropbox.com/s/uvmexh9impd3f3c/forgreg.tar.gz?dl=0
>
> In this run, only client 1 starts doing the extra lookups.
>
> On Fri, Jan 16, 2015 at 10:43 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Fri, Jan 16, 2015 at 10:34 AM, Michael Sevilla
>> <mikesevilla3@xxxxxxxxx> wrote:
>>> On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
>>>>> Let me know if this works and/or you need anything else:
>>>>>
>>>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>>>
>>>>> Beware - the clients were on debug=10. Also, I tried this with the
>>>>> kernel client and it is more consistent; it does the 2 lookups per
>>>>> create on 1 client every single time.
>>>>
>>>> Mmmm, there are no mds logs of note here. :(
>>>>
>>>
>>> Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
>>> show anything interesting...
>>
>> It's not interesting. Caps are not logged at a very high level so I
>> think we'd actually want debug 20 on the mds, the messenger, and the
>> client subsystems.
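
For reference, the debug levels Greg is asking for could be expressed as a
ceph.conf fragment along these lines (standard Ceph option names; a sketch
only, adjust section names and restart/injectargs as appropriate for your
deployment):

```
[mds]
    debug mds = 20
    debug ms = 20

[client]
    debug client = 20
    debug ms = 20
```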
>>
>>>
>>>> I did look enough to see that:
>>>> 1) The MDS is for some reason revoking caps on the file create,
>>>> prompting the switch to double-lookups, which it was not doing before. The
>>>> client doesn't really have any visibility into why that would be the
>>>> case; the best guess I can come up with is that maybe the MDS split up
>>>> the directory into multiple frags at this point — do you have that
>>>> enabled?
>>>
>>> Nope, unless any of these make a difference:
>>> $ ceph --admin-daemon... config show | grep frag
>>>   "mds_bal_frag": "false",
>>>   "mds_bal_fragment_interval": "5",
>>>   "mds_thrash_fragments": "0",
>>>   "mds_debug_frag": "false",
>>>
>>>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>>>> directory, or when we do a complete listdir on one. That makes it
>>>> pretty difficult to get the flag back (and so do the optimal create
>>>> path) once you lose it. :( I'd love a better way to do so, but we'll
>>>> have to look at what's involved in a bit of depth.
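
The rules Greg describes can be sketched as a toy state machine. This is not
Ceph source; all names here are hypothetical, and it only models the behavior
described above: the flag is gained on empty-directory create or a full
listing, lost on cap revocation, and its absence forces the extra lookup
before each create.

```python
class CachedDir:
    """Toy model of a client-side directory cache with a completeness flag
    (analogous in spirit to I_COMPLETE, per the rules described above)."""

    def __init__(self):
        self.complete = False  # do we know every entry in this directory?
        self.entries = {}

    def on_mkdir_empty(self):
        # A freshly created directory is empty, so the cache is trivially
        # complete.
        self.complete = True

    def on_full_readdir(self, entries):
        # A full listing lets the client assert it has seen every entry.
        self.entries = dict(entries)
        self.complete = True

    def on_cap_revoke(self):
        # Losing the relevant caps means other clients may be changing the
        # directory, so the completeness guarantee is dropped -- and it can
        # only be regained by one of the two events above.
        self.complete = False

    def create_needs_lookup(self, name):
        # With a complete cache, a negative lookup ("does `name` exist?") can
        # be answered locally; otherwise the client must ask the MDS first,
        # which is the extra per-create lookup seen in the logs.
        return not self.complete
```

This makes the asymmetry in the thread concrete: a single revocation flips
`complete` off, after which every create pays the lookup until a full readdir
happens to run.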
>>>
>>> No need - with that reasoning it looks more like this is part of the
>>> design rather than a bug. I'll just have to accept the fact that the
>>> system is very complicated and clients touching stuff at certain times
>>> can make things less predictable... I just wanted to make sure I
>>> wasn't doing anything wrong. :)  I'll stick with the kernel client
>>> (it's almost twice as fast, anyways!)
>>
>> Well, sort of — an isolated client with their own directory is
>> something we definitely want to have exclusive caps, but our
>> heuristics aren't sophisticated enough yet.
>>
>>>
>>>> I'm not sure why the kernel client is so much more cautious, but I
>>>> think there were a number of troubles with the directory listing
>>>> orders and things which were harder to solve there – I don't remember
>>>> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>>>> more about that. What kernel client version are you using?
>>>>
>>>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>>>
>>> Really, really old hardware from 2006: 2 dual-core CPUs, 8GB RAM,
>>> connected with 1Gbit. Kernel 3.4. We actually just installed beefier
>>> nodes so I'll keep you posted if we get other cool results.
>>
>> Awesome! That's much faster than previously, although Zheng did some
>> work recently to split the journaling code into a separate thread
>> which I guess must have made a big difference.
>> -Greg