Re: Consistency vs efficiency


 



It looks like you're running a pretty old version of the client AND server.  
I'm 90% sure this was fixed by 5dc09dd6b81c622960f628acdabda9eac8af1ceb on 
the server and/or 2f90b852e3ae73889d7f6de6ecf429b9b6a6b103 on the kernel 
side.  Running a new version of either one should fix this, 
but I'd recommend running a current version of both!

sage


On Tue, 2 Aug 2011, Jojy Varghese wrote:

> Updated the bug.
> 
> On Mon, Aug 1, 2011 at 8:31 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Tue, 26 Jul 2011, Jojy Varghese wrote:
> >> Sage
> >>    Attached are the dmesg outputs after an scp of a three-level
> >> directory. Here is what the directory tree looks like:
> >>
> >>
> >> testscpdirB/
> >> `-- level1
> >>     |-- l1f1
> >>     |-- l1f2
> >>     |-- l1f3
> >>     `-- level2
> >>         |-- l2f1
> >>         |-- l2f2
> >>         |-- l2f3
> >>         |-- l2f4
> >>         `-- level3
> >>             `-- l3f1
> >>
> >> "scp_c_kernel.log" has one more level of detail in the log
> >> (ceph_fill_trace) and I changed the top level directory name. You will
> >> observe that the top level directory has "have_lease" false.
> >>
> >> As you will see in the log, the top level dentry is looked up multiple times.
> >
> > Hmm, yep, that is strange!   Looks like a bug, although it may be one we
> > already fixed in the last kernel.  Can you confirm which version of the
> > kernel client and server side you are running?
> >
> > Also, can you do it one more time with everything in inode.c and caps.c
> > enabled (or just the scripts/kcon_most.sh from ceph.git), and also the mds
> > log (debug mds = 20, debug ms = 1)?  That will have everything I need.
> >
> > Opened ticket #1350 to track this.
> >
> > Thanks!
> > sage
> >
> >
> >>
> >>
> >> thx,
> >> Jojy
> >>
> >>
> >>
> >> On Tue, Jul 26, 2011 at 10:56 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> > On Tue, 26 Jul 2011, Jojy Varghese wrote:
> >> >> Sage
> >> >>  I tried the simple use case of mkdir on the ceph mounted dir but
> >> >> still see the issue. So I am wondering if our setup has anything to do
> >> >> with it (although ideally it should not). Anything I should be looking
> >> >> at given this behavior?
> >> >
> >> > Can you capture the mds and kernel logs for the simple case?
> >> >
> >> > debug mds = 20
> >> > debug ms = 1
> >> >
> >> > and for the kernel side run ceph.git's src/scripts/kcon_most.sh (or
> >> > similar)
> >> >
> >> > Thanks!
> >> > sage
> >> >
> >> >>
> >> >>
> >> >>
> >> >> thx
> >> >> Jojy
> >> >>
> >> >> On Mon, Jul 25, 2011 at 9:12 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> > On Mon, 25 Jul 2011, Jojy Varghese wrote:
> >> >> >> What I observe is that after a mkdir, the inode CAPS loses the
> >> >> >> lease (FILE_SHARED). I would have thought that the owning client should
> >> >> >> have a FILE_EXCL on the files/dirs it creates.
> >> >> >>
> >> >> >> Since it doesn't have a lease, the dentry (after splicing) is not cached.
> >> >> >
> >> >> > Can you describe the specific sequence of operations you're doing?  I'm
> >> >> > not seeing this behavior.  I see
> >> >> >
> >> >> > $ mkdir foo
> >> >> >        client->mds lookup #1/foo
> >> >> >        client->mds mkdir #1/foo
> >> >> > $ mkdir foo/a
> >> >> >        client->mds lookup #100000000/a
> >> >> >        client->mds mkdir #100000000/a
> >> >> >
> >> >> > with no repeated lookup on foo.
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> thanks
> >> >> >> Jojy
> >> >> >>
> >> >> >> On Sat, Jul 23, 2011 at 2:56 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> >> > On Fri, 22 Jul 2011, Jojy Varghese wrote:
> >> >> >> >> Not sure how it is designed to work but I assume that some kind of
> >> >> >> >> async RPC mechanism exists from the MDSs to the clients to update the
> >> >> >> >> CAP for a file from "exclusive" to "shared". This will allow the
> >> >> >> >> cached dentries to be pruned/dropped when another client updates the
> >> >> >> >> file.
> >> >> >> >
> >> >> >> > Right.  If the MDS needs to modify a dentry, it revokes any issued client
> >> >> >> > leases before granting the write/exclusive lock to process the request.
> >> >> >> >
> >> >> >> > sage
> >> >> >> >
> >> >> >> >>
> >> >> >> >> -Jojy
> >> >> >> >>
> >> >> >> >> On Fri, Jul 22, 2011 at 8:51 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> >> >> > On Fri, 22 Jul 2011, Jojy Varghese wrote:
> >> >> >> >> >> Sage would the latest patches fix the lookup issue?
> >> >> >> >> >
> >> >> >> >> > No, the blocker there is the '[PATCH] vfs: add d_prune dentry operation'
> >> >> >> >> > email on Jul 8 to linux-fsdevel and lkml.  Once this set goes in (and
> >> >> >> >> > cleans up a bunch of stuff Al found in a code audit last weekend) I'll be
> >> >> >> >> > bugging him about it again.
> >> >> >> >> >
> >> >> >> >> > sage
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> On Thu, Jul 21, 2011 at 10:55 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> >> >> >> > On Thu, 21 Jul 2011, Jojy Varghese wrote:
> >> >> >> >> >> >> Thanks for the response Sage. We are using the 2.6.39 kernel and in the
> >> >> >> >> >> >> "ceph_lookup" method, I see that there is a shortcut for deciding
> >> >> >> >> >> >> ENOENT but after the MDS lookup, I don't see a d_add. I am sure I am
> >> >> >> >> >> >> missing something here.
> >> >> >> >> >> >
> >> >> >> >> >> >                        dout(" dir %p complete, -ENOENT\n", dir);
> >> >> >> >> >> >                        d_add(dentry, NULL);
> >> >> >> >> >> >
> >> >> >> >> >> > ...but that is only for the negative lookup in a directory with the
> >> >> >> >> >> > 'complete' flag set.  And it's never set currently because we don't have
> >> >> >> >> >> > d_prune yet (and the old use of d_release was racy).  So ignore this part
> >> >> >> >> >> > for now!
> >> >> >> >> >> >
> >> >> >> >> >> > You have an existing, unchanging, directory that you're seeing repeated
> >> >> >> >> >> > lookups on, right?  Like the top-level directory in the hierarchy you're
> >> >> >> >> >> > copying?  And the client is doing repeated lookups on the same name?
> >> >> >> >> >> >
> >> >> >> >> >> > The way to debug this is probably to start with the messages passing to
> >> >> >> >> >> > the MDS and verifying that lookups are duplicated.  Then enable the
> >> >> >> >> >> > logging on the kernel client and see why the client isn't using leases or
> >> >> >> >> >> > the FILE_SHARED cap to avoid them.  We can help you through that on #ceph
> >> >> >> >> >> > if you like.
> >> >> >> >> >> >
> >> >> >> >> >> > sage
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >>
> >> >> >> >> >> >> thanks again
> >> >> >> >> >> >> Jojy
> >> >> >> >> >> >>
> >> >> >> >> >> >> On Thu, Jul 21, 2011 at 9:49 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >> >> >> >> >> > On Thu, 21 Jul 2011, Jojy Varghese wrote:
> >> >> >> >> >> >> >> Hi
> >> >> >> >> >> >> >>   I just started looking at the ceph code in the kernel and had a question
> >> >> >> >> >> >> >> about performance considerations for lookup operations. I noticed that
> >> >> >> >> >> >> >> for every operation (say copying a directory), the root dentry is
> >> >> >> >> >> >> >> "looked" up multiple times and since they all go to MDS for the actual
> >> >> >> >> >> >> >> lookup operation, it affects the performance. I am sure consistency is
> >> >> >> >> >> >> >> the winner here. Is there any plan to improve this, maybe by having
> >> >> >> >> >> >> >> MDS push the capability down to the clients when the dentry is
> >> >> >> >> >> >> >> updated? So say from CAP_EXCL to CAP_SHARED when the dentry is
> >> >> >> >> >> >> >> modified. This way the client node can cache the lookup operation and
> >> >> >> >> >> >> >> does not have to make a round trip to the MDS.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > In general, the MDS has two ways of keeping a client's cached dentry
> >> >> >> >> >> >> > consistent:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >  - it can issue the FILE_SHARED capability bit on the parent directory,
> >> >> >> >> >> >> > which means the entire directory is static and the client can cache
> >> >> >> >> >> >> > its dentries.
> >> >> >> >> >> >> >  - if it can't do that, it will issue a per-dentry lease.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > There is an additional 'complete' bit that is used to indicate on the
> >> >> >> >> >> >> > client that it has the _entire_ directory in cache.  If set, it can do
> >> >> >> >> >> >> > negative lookups and readdir without hitting the MDS.  That's currently
> >> >> >> >> >> >> > broken, pending the addition of a d_prune dentry_operation (see
> >> >> >> >> >> >> > linux-fsdevel email from July 8).
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Anyway, long story short, if you're seeing repeated lookups on a dentry
> >> >> >> >> >> >> > that isn't changing, something is broken.  Can you describe the workload
> >> >> >> >> >> >> > in more detail?  Which versions of the client and mds are you running?
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > sage
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> --
> >> >> >> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> >> >> >> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> >> >> >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >>
> >> >>
> >>
> 
> 
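
For anyone following the thread: the caching rules Sage describes in his first reply (FILE_SHARED on the parent directory, a per-dentry lease, and the 'complete' flag for negative lookups) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel client's code. The names DirInode, Dentry, lookup_needs_mds and revoke_for_modify, the FILE_SHARED bit value, and the use of plain float timestamps for lease expiry are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical bit value, for illustration only (not the kernel's constant).
FILE_SHARED = 0x1


@dataclass
class Dentry:
    name: str
    # Absolute expiry time of the per-dentry lease; 0.0 means "no lease".
    lease_expires: float = 0.0


@dataclass
class DirInode:
    caps: int = 0            # capability bits issued by the MDS on this directory
    complete: bool = False   # client believes it holds the entire directory
    dentries: Dict[str, Dentry] = field(default_factory=dict)


def lookup_needs_mds(parent: DirInode, name: str, now: float) -> bool:
    """Return True if a lookup of `name` must go to the MDS in this model."""
    dentry = parent.dentries.get(name)
    if dentry is None:
        # Negative lookup: answerable locally only if the directory is
        # marked 'complete'.
        return not parent.complete
    if parent.caps & FILE_SHARED:
        # The whole directory is static, so the cached dentry can be trusted.
        return False
    if dentry.lease_expires > now:
        # A still-valid per-dentry lease also makes the cached dentry usable.
        return False
    return True


def revoke_for_modify(parent: DirInode, name: str) -> None:
    """Model the MDS revoking what it issued before modifying a dentry.

    Assumption: besides the per-dentry lease, FILE_SHARED on the parent is
    dropped too, since the directory is no longer static.
    """
    parent.caps &= ~FILE_SHARED
    dentry = parent.dentries.get(name)
    if dentry is not None:
        dentry.lease_expires = 0.0
```

In this model, a client that holds neither FILE_SHARED on the parent nor a live lease on the dentry goes back to the MDS on every lookup, which would match the repeated lookups on the top-level directory reported in this thread.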
