It's been a while and I don't imagine you care much right now, but I
finally made the time to look at these log details. The reason we got
slow turned out to be much stupider than I anticipated (we were losing
I_COMPLETE for bad reasons); I wrote up what I found at
http://tracker.ceph.com/issues/11226 and have an RFC change in
wip-11226-dir-fx (PR at https://github.com/ceph/ceph/pull/4168).

Thanks for the complaint and the logs! :)
-Greg

On Fri, Jan 16, 2015 at 1:13 PM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
> If you feel like perusing... log=20 on client, mds messenger, and mds:
>
> https://www.dropbox.com/s/uvmexh9impd3f3c/forgreg.tar.gz?dl=0
>
> In this run, only client 1 starts doing the extra lookups.
>
> On Fri, Jan 16, 2015 at 10:43 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Fri, Jan 16, 2015 at 10:34 AM, Michael Sevilla
>> <mikesevilla3@xxxxxxxxx> wrote:
>>> On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
>>>>> Let me know if this works and/or you need anything else:
>>>>>
>>>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>>>
>>>>> Beware - the clients were on debug=10. Also, I tried this with the
>>>>> kernel client and it is more consistent; it does the 2 lookups per
>>>>> create on 1 client every single time.
>>>>
>>>> Mmmm, there are no mds logs of note here. :(
>>>>
>>>
>>> Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
>>> show anything interesting...
>>
>> It's not interesting. Caps are not logged at a very high level so I
>> think we'd actually want debug 20 on the mds, the messenger, and the
>> client subsystems.
>>
>>>
>>>> I did look enough to see that:
>>>> 1) The MDS is for some reason revoking caps on the file create
>>>> prompting the switch to double-lookups, which it was not before.
>>>> The client doesn't really have any visibility into why that would be the
>>>> case; the best guess I can come up with is that maybe the MDS split up
>>>> the directory into multiple frags at this point - do you have that
>>>> enabled?
>>>
>>> Nope, unless any of these make a difference:
>>> $ ceph --admin-daemon... config show | grep frag
>>> "mds_bal_frag": "false",
>>> "mds_bal_fragment_interval": "5",
>>> "mds_thrash_fragments": "0",
>>> "mds_debug_frag": "false",
>>>
>>>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>>>> directory, or when we do a complete listdir on one. That makes it
>>>> pretty difficult to get the flag back (and so do the optimal create
>>>> path) once you lose it. :( I'd love a better way to do so, but we'll
>>>> have to look at what's involved in a bit of depth.
>>>
>>> No need - with that reasoning it looks more like this is part of the
>>> design rather than a bug. I'll just have to accept the fact that the
>>> system is very complicated and clients touching stuff at certain times
>>> can make things less predictable... I just wanted to make sure I
>>> wasn't doing anything wrong. :) I'll stick with the kernel client
>>> (it's almost twice as fast, anyways!)
>>
>> Well, sort of - an isolated client with their own directory is
>> something we definitely want to have exclusive caps, but our
>> heuristics aren't sophisticated enough yet.
>>
>>>
>>>> I'm not sure why the kernel client is so much more cautious, but I
>>>> think there were a number of troubles with the directory listing
>>>> orders and things which were harder to solve there - I don't remember
>>>> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>>>> more about that. What kernel client version are you using?
>>>>
>>>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>>>
>>> Really, really old hardware from 2006: 2 dual-core CPUs, 8GB RAM,
>>> connected with 1Gbit. Kernel 3.4.
>>> We actually just installed beefier
>>> nodes so I'll keep you posted if we get other cool results.
>>
>> Awesome! That's much faster than previously, although Zheng did some
>> work recently to split the journaling code into a separate thread
>> which I guess must have made a big difference.
>> -Greg
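
For anyone reading this thread later, here is a rough toy model of the I_COMPLETE behaviour described in point 2) of the quoted discussion. It is not Ceph client code. The class `DirCache`, its methods, and the lookup counter are all made up for illustration, and it assumes that a create in a directory without the flag costs one extra lookup to the MDS (the thread reports two per create in practice). It only shows why the flag is easy to lose and hard to get back.

```python
class DirCache:
    """Toy client-side view of one directory (hypothetical, not Ceph code)."""

    def __init__(self, created_empty=False):
        self.names = set()
        # First way to get the flag: the client created the directory empty.
        self.complete = created_empty
        self.lookups_sent = 0

    def readdir_all(self, names_on_mds):
        # Second way to get the flag: a complete listing of the directory.
        self.names = set(names_on_mds)
        self.complete = True

    def on_caps_revoked(self):
        # Assumption: once caps are revoked the cached listing can no
        # longer be trusted to be complete, so the flag is dropped.
        self.complete = False

    def create(self, name):
        if not self.complete:
            # Cannot prove locally that the name is absent; ask the MDS.
            self.lookups_sent += 1
        self.names.add(name)


d = DirCache(created_empty=True)
for i in range(3):
    d.create(f"file{i}")          # fast path, no lookups
fast_path_lookups = d.lookups_sent

d.on_caps_revoked()               # flag lost
for i in range(3, 6):
    d.create(f"file{i}")          # every create now pays a lookup
slow_path_lookups = d.lookups_sent

d.readdir_all(d.names)            # full listing restores the flag
d.create("file6")                 # fast path again
```

In this model nothing short of a full `readdir_all` restores the fast path after a revoke, which matches the complaint that the flag is "pretty difficult to get back" once lost.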