Re: CephFS Space Accounting and Quotas

On Monday, March 11, 2013 at 7:47 AM, Jim Schutt wrote:
> On 03/08/2013 07:05 PM, Greg Farnum wrote:
> > On Friday, March 8, 2013 at 2:45 PM, Jim Schutt wrote:
> > > On 03/07/2013 08:15 AM, Jim Schutt wrote:
> > > > On 03/06/2013 05:18 PM, Greg Farnum wrote:
> > > > > On Wednesday, March 6, 2013 at 3:14 PM, Jim Schutt wrote:
> > >  
> > > [snip]
> > >  
> > > > > > Do you want the MDS log at 10 or 20?
> > > > >  
> > > > > More is better. ;)
> > > >  
> > > > OK, thanks.
> > >  
> > >  
> > > I've sent some mds logs via private email...
> > >  
> > > -- Jim  
> >  
> > I'm going to need to probe into this a bit more, but on an initial
> > examination I see that most of your stats are actually happening very
> > quickly — it's just that occasionally they take quite a while.
>  
> Interesting...
>  
> > Going
> > through the MDS log for one of those, the inode in question is
> > flagged with "needsrecover" from its first appearance in the log —
> > that really shouldn't happen unless a client had write caps on it and
> > the client disappeared. Any ideas? The slowness is being caused by
> > the MDS going out and looking at every object which could be in the
> > file — there are a lot since the file has a listed size of 8GB.
>  
> For this run, the MDS logging slowed it down enough to cause the
> client caps to occasionally go stale. I don't think it's the cause
> of the issue, because I was having it before I turned MDS debugging
> up. My client caps never go stale at, e.g., debug mds 5.

Oh, so this might be behaviorally different from what you were seeing before? Drat.

You had said before that each newfstatat was taking tens of seconds, whereas in the strace log you sent along most of the individual calls were taking a bit less than 20 milliseconds. Do you have an strace of them individually taking much more than that, or were you just noticing that they took a long time in aggregate?
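If it would help, here's a rough sketch of one way to pull the per-call numbers out of an strace log and separate "individually slow" from "slow only in aggregate". It assumes the trace was captured with -T so each line ends with the time spent in the syscall in angle brackets; the log path and the one-second threshold are just placeholders:

#!/usr/bin/env python
# Sketch: scan an `strace -T` log and report newfstatat calls that were
# individually slow, versus merely adding up.  Path/threshold are placeholders.
import re
import sys

# With -T, strace appends the in-syscall time, e.g. "... ) = 0 <0.017512>"
CALL_RE = re.compile(r'newfstatat\(.*<(?P<secs>\d+\.\d+)>\s*$')

def report(path, threshold=1.0):
    total = 0.0
    slow = []
    with open(path) as f:
        for line in f:
            m = CALL_RE.search(line)
            if not m:
                continue
            secs = float(m.group('secs'))
            total += secs
            if secs >= threshold:
                slow.append((secs, line.strip()))
    print("aggregate time in newfstatat: %.3fs" % total)
    for secs, text in sorted(slow, reverse=True):
        print("%10.3fs  %s" % (secs, text))

if __name__ == '__main__':
    report(sys.argv[1])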
I suppose if you were going to run it again, capturing just the message logging (debug ms, without the full MDS debug) could also be helpful. That way we could at least check the message delays and see whether the MDS is doing other work in the course of answering a request.
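Separately, just to put a number on the per-object recovery probing I described above: assuming the file has the default layout (4 MiB objects, no striping), an 8 GB file maps to a couple of thousand RADOS objects, each of which gets probed. Back of the envelope:

# How many object probes a recovery of this file implies.  Assumes the
# default 4 MiB object size; a different layout changes the count.
file_size = 8 * 2**30             # ~8 GB listed size
object_size = 4 * 2**20           # default 4 MiB objects (assumption)

objects = (file_size + object_size - 1) // object_size
print("objects to probe: %d" % objects)    # -> 2048
# Even at ~10 ms per probe that's ~20 s if the probes aren't overlapped,
# which is roughly the same order as the slow stats you were seeing.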

> Otherwise, there were no signs of trouble while writing the files.
>  
> Can you suggest which kernel client debugging I might enable that
> would help understand what is happening? Also, I have the full
> MDS log from writing the files, if that will help. It's big (~10 GiB).
>  
> > (There are several other mysteries here that can probably be traced
> > to different varieties of non-optimal and buggy code as well — there
> > is a client which has write caps on the inode in question despite it
> > needing recovery, but the recovery isn't triggered until the stat
> > event occurs, etc).
>  
> OK, thanks for taking a look. Let me know if there is other
> logging I can enable that will be helpful.

I'm going to want to spend more time with the log I've got, but I'll think about whether there's a different set of data we can gather less disruptively.
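On the kernel client debugging question: the kernel client's dout() messages go through pr_debug, so if your kernel has dynamic debug you can switch them on per module and read them out of the kernel log. It's very chatty, but it doesn't require rebuilding anything. A rough sketch (the helper is made up; it assumes debugfs is mounted at /sys/kernel/debug and CONFIG_DYNAMIC_DEBUG is set):

#!/usr/bin/env python
# Sketch (hypothetical helper): toggle the kernel cephfs client's
# dout()/pr_debug output via dynamic debug.  Assumes debugfs is mounted
# at /sys/kernel/debug and CONFIG_DYNAMIC_DEBUG.  Output lands in the
# kernel log (dmesg / kern.log) and is extremely verbose.
import sys

CONTROL = "/sys/kernel/debug/dynamic_debug/control"

def set_ceph_debug(enable):
    flag = "+p" if enable else "-p"
    for module in ("ceph", "libceph"):    # fs/ceph and net/ceph
        with open(CONTROL, "w") as f:
            f.write("module %s %s\n" % (module, flag))

if __name__ == "__main__":
    set_ceph_debug(len(sys.argv) < 2 or sys.argv[1] != "off")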
-Greg


