If you feel like perusing... log=20 on client, mds messenger, and mds:
https://www.dropbox.com/s/uvmexh9impd3f3c/forgreg.tar.gz?dl=0

In this run, only client 1 starts doing the extra lookups.

On Fri, Jan 16, 2015 at 10:43 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Fri, Jan 16, 2015 at 10:34 AM, Michael Sevilla
> <mikesevilla3@xxxxxxxxx> wrote:
>> On Thu, Jan 15, 2015 at 10:37 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> On Thu, Jan 15, 2015 at 2:44 PM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
>>>> Let me know if this works and/or you need anything else:
>>>>
>>>> https://www.dropbox.com/s/fq47w6jebnyluu0/lookup-logs.tar.gz?dl=0
>>>>
>>>> Beware - the clients were on debug=10. Also, I tried this with the
>>>> kernel client and it is more consistent; it does the 2 lookups per
>>>> create on 1 client every single time.
>>>
>>> Mmmm, there are no mds logs of note here. :(
>>>
>>
>> Meaning you couldn't find mds.issdm-15.log? Or that that log didn't
>> show anything interesting...
>
> It's not interesting. Caps are not logged at a very high level, so I
> think we'd actually want debug 20 on the mds, the messenger, and the
> client subsystems.
>
>>
>>> I did look enough to see that:
>>> 1) The MDS is for some reason revoking caps on the file create,
>>> prompting the switch to double lookups, which it was not doing
>>> before. The client doesn't really have any visibility into why that
>>> would be the case; the best guess I can come up with is that maybe
>>> the MDS split the directory into multiple frags at this point — do
>>> you have that enabled?
>>
>> Nope, unless any of these make a difference:
>> $ ceph --admin-daemon... config show | grep frag
>> "mds_bal_frag": "false",
>> "mds_bal_fragment_interval": "5",
>> "mds_thrash_fragments": "0",
>> "mds_debug_frag": "false",
>>
>>> 2) The only way we set the I_COMPLETE flag is when we create an empty
>>> directory, or when we do a complete listdir on one.
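[For reference, the debug levels suggested above can be expressed in ceph.conf along these lines — a sketch only; exact section placement and whether you set them live via the admin socket will vary by setup:]

```ini
[mds]
    ; mds, messenger, and client subsystems at debug 20,
    ; as suggested in the thread
    debug mds = 20
    debug ms = 20

[client]
    debug client = 20
    debug ms = 20
```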
>>> That makes it pretty difficult to get the flag back (and so take the
>>> optimal create path) once you lose it. :( I'd love a better way to do
>>> so, but we'll have to look at what's involved in a bit of depth.
>>
>> No need - with that reasoning it looks more like this is part of the
>> design rather than a bug. I'll just have to accept the fact that the
>> system is very complicated and that clients touching things at certain
>> times can make it less predictable... I just wanted to make sure I
>> wasn't doing anything wrong. :) I'll stick with the kernel client
>> (it's almost twice as fast, anyway!)
>
> Well, sort of — an isolated client with its own directory is something
> we definitely want to have exclusive caps, but our heuristics aren't
> sophisticated enough yet.
>
>>
>>> I'm not sure why the kernel client is so much more cautious, but I
>>> think there were a number of troubles with the directory listing
>>> orders and things which were harder to solve there – I don't remember
>>> if we introduced the I_DIR_ORDERED flag in it or not. Zheng can talk
>>> more about that. What kernel client version are you using?
>>>
>>> And for a vanity data point, what kind of hardware is your MDS running on? :)
>>
>> Really, really old hardware from 2006: 2 dual-core CPUs, 8 GB RAM,
>> connected with 1 Gbit. Kernel 3.4. We actually just installed beefier
>> nodes, so I'll keep you posted if we get other cool results.
>
> Awesome! That's much faster than previously, although Zheng did some
> work recently to split the journaling code into a separate thread,
> which I guess must have made a big difference.
> -Greg
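[The caps behavior the thread describes can be summarized with a small model — illustrative Python only, not actual Ceph client code, and every name below is invented for the sketch: while a directory carries I_COMPLETE, the client's cached dentry set is authoritative and a create needs no pre-lookup; once caps are revoked, each create pays an extra MDS lookup until an empty mkdir or a full readdir restores the flag.]

```python
class Dir:
    """Toy model of a client's view of one directory."""
    def __init__(self):
        self.complete = True   # I_COMPLETE: set on a newly created (empty) dir
        self.entries = set()   # locally cached dentries


def create(d, name, lookups):
    """Create `name` in `d`, recording each MDS lookup the client must issue."""
    if d.complete:
        # Cache is authoritative: a local miss proves the name is free.
        if name in d.entries:
            raise FileExistsError(name)
    else:
        # Must ask the MDS whether the name already exists.
        lookups.append(name)
    d.entries.add(name)


def revoke_caps(d):
    """E.g. another client touches the directory: I_COMPLETE is lost."""
    d.complete = False


def full_readdir(d, listing):
    """A complete listdir is one of the two ways to regain the flag."""
    d.entries = set(listing)
    d.complete = True
```

In this model, creates in a freshly made directory cost no lookups; after a revocation, every create hits the MDS until a full readdir, matching the "only on empty mkdir or complete listdir" rule described above.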