Sage,
   Will get back to you with the logs. I had another question about the
implementation. Here is the piece of code that I am a bit confused
about:

In "fs/ceph/inode.c" (function "ceph_fill_trace"):

        /* do we have a lease on the whole dir? */
        have_dir_cap =
                (le32_to_cpu(rinfo->diri.in->cap.caps) &
                 CEPH_CAP_FILE_SHARED);

        /* do we have a dn lease? */
        have_lease = have_dir_cap ||
                (le16_to_cpu(rinfo->dlease->mask) &
                 CEPH_LOCK_DN);

So we check the capability "CEPH_CAP_FILE_SHARED" to determine whether
the client holds a lease on the entire directory. "have_lease" is then
used to determine if the dentry can be cached. I would have thought that
"CEPH_CAP_FILE_EXCL" should be used to determine if a dentry can be
cached, since it would mean that the client created the file and has
"update" capabilities.

thanks
-Jojy

On Tue, Jul 26, 2011 at 10:56 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 26 Jul 2011, Jojy Varghese wrote:
>> Sage
>>   I tried the simple use case of mkdir on the ceph mounted dir but
>> still see the issue. So I am wondering if our setup has anything to do
>> with it (although ideally it should not). Anything I should be looking
>> at given this behavior?
>
> Can you capture the mds and kernel logs for the simple case?
>
>        debug mds = 20
>        debug ms = 1
>
> and for the kernel side run ceph.git's src/scripts/kcon_most.sh (or
> similar)
>
> Thanks!
> sage
>
>>
>> thx
>> Jojy
>>
>> On Mon, Jul 25, 2011 at 9:12 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > On Mon, 25 Jul 2011, Jojy Varghese wrote:
>> >> What I observe is that after a mkdir, the inode CAPS loses the
>> >> lease (FILE_SHARED). I would have thought that the owning client
>> >> should have a FILE_EXCL on the files/dirs it creates.
>> >>
>> >> Since it doesn't have a lease, the dentry (after splicing) is not
>> >> cached.
>> >
>> > Can you describe the specific sequence of operations you're doing?  I'm
>> > not seeing this behavior.
>> > I see
>> >
>> > $ mkdir foo
>> > client->mds lookup #1/foo
>> > client->mds mkdir #1/foo
>> > $ mkdir foo/a
>> > client->mds lookup #100000000/a
>> > client->mds mkdir #100000000/a
>> >
>> > with no repeated lookup on foo.
>> >
>> > sage
>> >
>> >>
>> >> thanks
>> >> Jojy
>> >>
>> >> On Sat, Jul 23, 2011 at 2:56 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> > On Fri, 22 Jul 2011, Jojy Varghese wrote:
>> >> >> Not sure how it is designed to work but I assume that some kind of
>> >> >> async RPC mechanism exists from the MDSs to the clients to update the
>> >> >> CAP for a file from "exclusive" to "shared". This will allow the
>> >> >> cached dentries to be pruned/dropped when another client updates the
>> >> >> file.
>> >> >
>> >> > Right.  If the MDS needs to modify a dentry, it revokes any issued client
>> >> > leases before granting the write/exclusive lock to process the request.
>> >> >
>> >> > sage
>> >> >
>> >> >>
>> >> >> -Jojy
>> >> >>
>> >> >> On Fri, Jul 22, 2011 at 8:51 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> >> > On Fri, 22 Jul 2011, Jojy Varghese wrote:
>> >> >> >> Sage, would the latest patches fix the lookup issue?
>> >> >> >
>> >> >> > No, the blocker there is the '[PATCH] vfs: add d_prune dentry operation'
>> >> >> > email on Jul 8 to linux-fsdevel and lkml.  Once this set goes in (and
>> >> >> > cleans up a bunch of stuff Al found in a code audit last weekend) I'll be
>> >> >> > bugging him about it again.
>> >> >> >
>> >> >> > sage
>> >> >> >
>> >> >> >>
>> >> >> >> On Thu, Jul 21, 2011 at 10:55 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> >> >> > On Thu, 21 Jul 2011, Jojy Varghese wrote:
>> >> >> >> >> Thanks for the response Sage. We are using 2.6.39 kernel and in the
>> >> >> >> >> "ceph_lookup" method, I see that there is a shortcut for deciding
>> >> >> >> >> ENOENT but after the MDS lookup, I don't see a d_add. I am sure I am
>> >> >> >> >> missing something here.
>> >> >> >> >
>> >> >> >> >                dout(" dir %p complete, -ENOENT\n", dir);
>> >> >> >> >                d_add(dentry, NULL);
>> >> >> >> >
>> >> >> >> > ...but that is only for the negative lookup in a directory with the
>> >> >> >> > 'complete' flag set.  And it's never set currently because we don't have
>> >> >> >> > d_prune yet (and the old use of d_release was racy).  So ignore this part
>> >> >> >> > for now!
>> >> >> >> >
>> >> >> >> > You have an existing, unchanging directory that you're seeing repeated
>> >> >> >> > lookups on, right?  Like the top-level directory in the hierarchy you're
>> >> >> >> > copying?  And the client is doing repeated lookups on the same name?
>> >> >> >> >
>> >> >> >> > The way to debug this is probably to start with the messages passing to
>> >> >> >> > the MDS and verifying that lookups are duplicated.  Then enable the
>> >> >> >> > logging on the kernel client and see why the client isn't using leases or
>> >> >> >> > the FILE_SHARED cap to avoid them.  We can help you through that on #ceph
>> >> >> >> > if you like.
>> >> >> >> >
>> >> >> >> > sage
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> thanks again
>> >> >> >> >> Jojy
>> >> >> >> >>
>> >> >> >> >> On Thu, Jul 21, 2011 at 9:49 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> >> >> >> > On Thu, 21 Jul 2011, Jojy Varghese wrote:
>> >> >> >> >> >> Hi
>> >> >> >> >> >>   I just started looking at the ceph code in kernel and had a question
>> >> >> >> >> >> about performance considerations for lookup operations. I noticed that
>> >> >> >> >> >> for every operation (say copying a directory), the root dentry is
>> >> >> >> >> >> "looked" up multiple times and since they all go to MDS for the actual
>> >> >> >> >> >> lookup operation, it affects the performance. I am sure consistency is
>> >> >> >> >> >> the winner here. Is there any plan to improve this, maybe by having
>> >> >> >> >> >> MDS push the capability down to the clients when the dentry is
>> >> >> >> >> >> updated.
>> >> >> >> >> >> So say from CAP_EXCL to CAP_SHARED when the dentry is
>> >> >> >> >> >> modified. This way the client node can cache the lookup operation and
>> >> >> >> >> >> does not have to make a round trip to the MDS.
>> >> >> >> >> >
>> >> >> >> >> > In general, the MDS has two ways of keeping a client's cached dentry
>> >> >> >> >> > consistent:
>> >> >> >> >> >
>> >> >> >> >> >  - it can issue the FILE_SHARED capability bit on the parent directory,
>> >> >> >> >> >    which means the entire directory is static and the client can cache
>> >> >> >> >> >    dentries.
>> >> >> >> >> >  - if it can't do that, it will issue a per-dentry lease
>> >> >> >> >> >
>> >> >> >> >> > There is an additional 'complete' bit that is used to indicate on the
>> >> >> >> >> > client that it has the _entire_ directory in cache.  If set, it can do
>> >> >> >> >> > negative lookups and readdir without hitting the MDS.  That's currently
>> >> >> >> >> > broken, pending the addition of a d_prune dentry_operation (see
>> >> >> >> >> > linux-fsdevel email from July 8).
>> >> >> >> >> >
>> >> >> >> >> > Anyway, long story short, if you're seeing repeated lookups on a dentry
>> >> >> >> >> > that isn't changing, something is broken.  Can you describe the workload
>> >> >> >> >> > in more detail?  Which versions of the client and mds are you running?
>> >> >> >> >> >
>> >> >> >> >> > sage
>> >> >> >> >> >
>> >> >> >> >> --
>> >> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> >> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
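
For reference, the caching rule discussed in this thread (a dentry may be
cached if the parent directory carries FILE_SHARED, or failing that, if a
per-dentry lease was issued) can be sketched as a small user-space C
fragment. This is only an illustration, not the kernel code: the struct,
the function names and the bit values below are hypothetical placeholders
and do not match the real definitions of CEPH_CAP_FILE_SHARED or
CEPH_LOCK_DN.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit values for illustration only; they are NOT the
 * kernel's definitions of CEPH_CAP_FILE_SHARED or CEPH_LOCK_DN.
 */
#define SKETCH_CAP_FILE_SHARED  (1u << 0)
#define SKETCH_LOCK_DN          (1u << 1)

/* Hypothetical summary of what an MDS reply tells the client. */
struct sketch_reply {
	uint32_t dir_caps;    /* caps issued on the parent directory */
	uint16_t lease_mask;  /* per-dentry lease mask, 0 if none */
};

/* True when the parent directory carries the shared cap. */
static bool sketch_have_dir_cap(const struct sketch_reply *r)
{
	return (r->dir_caps & SKETCH_CAP_FILE_SHARED) != 0;
}

/* True when the dentry may be cached: dir-wide cap OR a dentry lease. */
static bool sketch_have_lease(const struct sketch_reply *r)
{
	return sketch_have_dir_cap(r) ||
	       (r->lease_mask & SKETCH_LOCK_DN) != 0;
}
```

Calling sketch_have_lease() on a reply with neither bit set returns false,
which corresponds to the case described above where the dentry is not
cached and the next lookup goes back to the MDS. The sketch mirrors the
two mechanisms Sage lists (FILE_SHARED on the parent directory, otherwise
a per-dentry lease) and nothing more.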