On Tue, 26 Jul 2011, Jojy Varghese wrote:
> Sage
> Attached are the dmesg outputs after an scp of a three-level
> directory. Here is what the directory tree looks like:
>
> testscpdirB/
> `-- level1
>     |-- l1f1
>     |-- l1f2
>     |-- l1f3
>     `-- level2
>         |-- l2f1
>         |-- l2f2
>         |-- l2f3
>         |-- l2f4
>         `-- level3
>             `-- l3f1
>
> "scp_c_kernel.log" has one more level of detail in the log
> (ceph_fill_trace) and I changed the top level directory name. You will
> observe that the top level directory has "have_lease" false.
>
> As you will see in the log, the top level dentry is looked up multiple times.

Hmm, yep, that is strange! Looks like a bug, although it may be one we
already fixed in the last kernel. Can you confirm which version of the
kernel client and server side you are running?

Also, can you do it one more time with everything in inode.c and caps.c
enabled (or just the scripts/kcon_most.sh from ceph.git), and also the mds
log (debug mds = 20, debug ms = 1)? That will have everything I need.

Opened ticket #1350 to track this.

Thanks!
sage

>
> thx,
> Jojy
>
> On Tue, Jul 26, 2011 at 10:56 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Tue, 26 Jul 2011, Jojy Varghese wrote:
> >> Sage
> >> I tried the simple use case of mkdir on the ceph mounted dir but
> >> still see the issue. So I am wondering if our setup has anything to do
> >> with it (although ideally it should not). Anything I should be looking
> >> at given this behavior?
> >
> > Can you capture the mds and kernel logs for the simple case?
> >
> > debug mds = 20
> > debug ms = 1
> >
> > and for the kernel side run ceph.git's src/scripts/kcon_most.sh (or
> > similar)
> >
> > Thanks!
> > sage
> >
> >>
> >> thx
> >> Jojy
> >>
> >> On Mon, Jul 25, 2011 at 9:12 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> > On Mon, 25 Jul 2011, Jojy Varghese wrote:
> >> >> What I observe is that after a mkdir, the inode CAPS loses the
> >> >> lease (FILE_SHARED).
> >> >> I would have thought that the owning client should
> >> >> have a FILE_EXCL on the files/dirs it creates.
> >> >>
> >> >> Since it doesn't have a lease, the dentry (after splicing) is not cached.
> >> >
> >> > Can you describe the specific sequence of operations you're doing? I'm
> >> > not seeing this behavior. I see
> >> >
> >> > $ mkdir foo
> >> > client->mds lookup #1/foo
> >> > client->mds mkdir #1/foo
> >> > $ mkdir foo/a
> >> > client->mds lookup #100000000/a
> >> > client->mds mkdir #100000000/a
> >> >
> >> > with no repeated lookup on foo.
> >> >
> >> > sage
> >> >
> >> >>
> >> >> thanks
> >> >> Jojy
> >> >>
> >> >> On Sat, Jul 23, 2011 at 2:56 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> > On Fri, 22 Jul 2011, Jojy Varghese wrote:
> >> >> >> Not sure how it is designed to work but I assume that some kind of
> >> >> >> async RPC mechanism exists from the MDS to the clients to update the
> >> >> >> CAP for a file from "exclusive" to "shared". This will allow the
> >> >> >> cached dentries to be pruned/dropped when another client updates the
> >> >> >> file.
> >> >> >
> >> >> > Right. If the MDS needs to modify a dentry, it revokes any issued client
> >> >> > leases before granting the write/exclusive lock to process the request.
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >>
> >> >> >> -Jojy
> >> >> >>
> >> >> >> On Fri, Jul 22, 2011 at 8:51 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> >> > On Fri, 22 Jul 2011, Jojy Varghese wrote:
> >> >> >> >> Sage, would the latest patches fix the lookup issue?
> >> >> >> >
> >> >> >> > No, the blocker there is the '[PATCH] vfs: add d_prune dentry operation'
> >> >> >> > email on Jul 8 to linux-fsdevel and lkml. Once this set goes in (and
> >> >> >> > cleans up a bunch of stuff Al found in a code audit last weekend) I'll be
> >> >> >> > bugging him about it again.
> >> >> >> >
> >> >> >> > sage
> >> >> >> >
> >> >> >> >>
> >> >> >> >> On Thu, Jul 21, 2011 at 10:55 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> >> >> > On Thu, 21 Jul 2011, Jojy Varghese wrote:
> >> >> >> >> >> Thanks for the response Sage. We are using the 2.6.39 kernel and in the
> >> >> >> >> >> "ceph_lookup" method, I see that there is a shortcut for deciding
> >> >> >> >> >> ENOENT but after the MDS lookup, I don't see a d_add. I am sure I am
> >> >> >> >> >> missing something here.
> >> >> >> >> >
> >> >> >> >> > dout(" dir %p complete, -ENOENT\n", dir);
> >> >> >> >> > d_add(dentry, NULL);
> >> >> >> >> >
> >> >> >> >> > ...but that is only for the negative lookup in a directory with the
> >> >> >> >> > 'complete' flag set. And it's never set currently because we don't have
> >> >> >> >> > d_prune yet (and the old use of d_release was racy). So ignore this part
> >> >> >> >> > for now!
> >> >> >> >> >
> >> >> >> >> > You have an existing, unchanging directory that you're seeing repeated
> >> >> >> >> > lookups on, right? Like the top-level directory in the hierarchy you're
> >> >> >> >> > copying? And the client is doing repeated lookups on the same name?
> >> >> >> >> >
> >> >> >> >> > The way to debug this is probably to start with the messages passing to
> >> >> >> >> > the MDS and verifying that lookups are duplicated. Then enable the
> >> >> >> >> > logging on the kernel client and see why the client isn't using leases or
> >> >> >> >> > the FILE_SHARED cap to avoid them. We can help you through that on #ceph
> >> >> >> >> > if you like.
> >> >> >> >> >
> >> >> >> >> > sage
> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> thanks again
> >> >> >> >> >> Jojy
> >> >> >> >> >>
> >> >> >> >> >> On Thu, Jul 21, 2011 at 9:49 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> >> >> >> > On Thu, 21 Jul 2011, Jojy Varghese wrote:
> >> >> >> >> >> >> Hi
> >> >> >> >> >> >> I just started looking at the ceph code in the kernel and had a question
> >> >> >> >> >> >> about performance considerations for lookup operations. I noticed that
> >> >> >> >> >> >> for every operation (say copying a directory), the root dentry is
> >> >> >> >> >> >> "looked" up multiple times and since they all go to the MDS for the actual
> >> >> >> >> >> >> lookup operation, it affects performance. I am sure consistency is
> >> >> >> >> >> >> the winner here. Is there any plan to improve this, maybe by having
> >> >> >> >> >> >> the MDS push the capability down to the clients when the dentry is
> >> >> >> >> >> >> updated? So say from CAP_EXCL to CAP_SHARED when the dentry is
> >> >> >> >> >> >> modified. This way the client node can cache the lookup operation and
> >> >> >> >> >> >> does not have to make a round trip to the MDS.
> >> >> >> >> >> >
> >> >> >> >> >> > In general, the MDS has two ways of keeping a client's cached dentry
> >> >> >> >> >> > consistent:
> >> >> >> >> >> >
> >> >> >> >> >> > - it can issue the FILE_SHARED capability bit on the parent directory,
> >> >> >> >> >> > which means the entire directory is static and the client can cache
> >> >> >> >> >> > the dentry.
> >> >> >> >> >> > - if it can't do that, it will issue a per-dentry lease
> >> >> >> >> >> >
> >> >> >> >> >> > There is an additional 'complete' bit that is used to indicate on the
> >> >> >> >> >> > client that it has the _entire_ directory in cache. If set, it can do
> >> >> >> >> >> > negative lookups and readdir without hitting the MDS.
> >> >> >> >> >> > That's currently
> >> >> >> >> >> > broken, pending the addition of a d_prune dentry_operation (see
> >> >> >> >> >> > linux-fsdevel email from July 8).
> >> >> >> >> >> >
> >> >> >> >> >> > Anyway, long story short, if you're seeing repeated lookups on a dentry
> >> >> >> >> >> > that isn't changing, something is broken. Can you describe the workload
> >> >> >> >> >> > in more detail? Which versions of the client and mds are you running?
> >> >> >> >> >> >
> >> >> >> >> >> > sage
> >> >> >> >> >> >
> >> >> >> >> >> --
> >> >> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> >> >> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> >> >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
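
The two consistency mechanisms described in the thread (FILE_SHARED on the
parent directory, else a per-dentry lease, plus the 'complete' bit for
negative lookups) can be sketched as a small decision function. This is only
an illustrative model, not the kernel client's code: the struct layouts, the
CAP_FILE_SHARED bit value, and the names decide_lookup, dir_state and
dentry_state are all hypothetical. It also assumes that 'complete' is only
trusted while FILE_SHARED is held, which the thread does not state.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; NOT the kernel client's real structures. */
#define CAP_FILE_SHARED 0x1u   /* illustrative bit value only */

struct dir_state {
    unsigned caps_issued;  /* capability bits the MDS has issued on the directory */
    bool complete;         /* client believes it holds the entire directory */
};

struct dentry_state {
    bool cached;           /* a dentry for this name is in the local cache */
    long lease_expires;    /* 0 = no lease, otherwise the lease expiry time */
};

enum lookup_action { USE_CACHE, NEGATIVE_LOCAL, ASK_MDS };

/* Decide whether a lookup can be answered locally or needs an MDS round trip. */
enum lookup_action decide_lookup(const struct dir_state *dir,
                                 const struct dentry_state *de, long now)
{
    bool dir_shared = (dir->caps_issued & CAP_FILE_SHARED) != 0;

    if (de->cached) {
        /* Way 1: FILE_SHARED on the parent means the directory is static. */
        if (dir_shared)
            return USE_CACHE;
        /* Way 2: otherwise an unexpired per-dentry lease covers this name. */
        if (de->lease_expires > now)
            return USE_CACHE;
        return ASK_MDS;
    }

    /* No cached dentry: answer ENOENT locally only if the whole directory
     * is cached (assumption: and FILE_SHARED is still held). */
    if (dir_shared && dir->complete)
        return NEGATIVE_LOCAL;
    return ASK_MDS;
}
```

In this model, the repeated lookups reported in the thread correspond to
decide_lookup returning ASK_MDS for a cached dentry, i.e. the client holding
neither FILE_SHARED on the parent nor a valid lease ("have_lease" false in
the kernel log).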