Re: CephFS Space Accounting and Quotas

On Wed, 6 Mar 2013, Greg Farnum wrote:
> > 'ls -lh <dir>' seems to be just the thing if you already know <dir>.
> >  
> > And it's perfectly suitable for our use case of not scheduling
> > new jobs for users consuming too much space.
> >  
> > I was thinking I might need to find a subtree where all the
> > subdirectories are owned by the same user, on the theory that
> > all the files in such a subtree would be owned by that same
> > user. E.g., we might want such a capability to manage space per
> > user in shared project directories.
> >  
> > So, I tried 'find <dir> -type d -exec ls -lhd {} \;'
> >  
> > Unfortunately, that ended up doing a 'newfstatat' on each file
> > under <dir>, evidently to learn if it was a directory. The
> > result was that same slowdown for files written on other clients.
> >  
> > Is there some other way I should be looking for directories if I
> > don't already know what they are?

Normally the readdir result has the d_type field filled in to indicate 
whether the dentry is a directory or not, which makes the stat 
unnecessary.  I'm surprised that find isn't doing that properly already!  
It's possible we aren't populating a field we should be in our readdir 
code...
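
For reference, this is the pattern find should be able to use -- a rough 
sketch, not tested against our client; DT_UNKNOWN is the fallback case 
where a per-entry stat is still required:

#define _DEFAULT_SOURCE   /* for the DT_* constants on glibc */
#include <dirent.h>
#include <stdio.h>

/* List subdirectories of a path using d_type from readdir(),
 * avoiding a stat() on every entry.  If the filesystem reports
 * DT_UNKNOWN, the caller has to fall back to stat. */
int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : ".";
	DIR *dir = opendir(path);
	struct dirent *de;

	if (!dir) {
		perror("opendir");
		return 1;
	}
	while ((de = readdir(dir)) != NULL) {
		if (de->d_type == DT_DIR)
			printf("%s/%s\n", path, de->d_name);
		else if (de->d_type == DT_UNKNOWN)
			printf("%s/%s: d_type unknown, stat needed\n",
			       path, de->d_name);
	}
	closedir(dir);
	return 0;
}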

> > Also, this issue of stat on files created on other clients seems
> > like it's going to be problematic for many interactions our users
> > will have with the files created by their parallel compute jobs -
> > any suggestion on how to avoid or fix it?
> >  
> 
> Brief background: stat is required to provide file size information, and 
> so when you do a stat Ceph needs to find out the actual file size. If 
> the file is currently in use by somebody, that requires gathering up the 
> latest metadata from them. Separately, while Ceph allows a client and 
> the MDS to proceed with a bunch of operations (e.g., mknod) without 
> waiting for them to go to disk first, it requires that anything visible 
> to a third party (another client) be durable on disk for consistency 
> reasons.
> 
> These combine to mean that if you do a stat on a file which a client 
> currently has buffered writes for, that buffer must be flushed out to 
> disk before the stat can return. This is the usual cause of the slow 
> stats you're seeing. You should be able to adjust dirty data thresholds 
> to encourage faster writeouts, do fsyncs once a client is done with a 
> file, etc., in order to minimize the likelihood of running into this.

This is the current behavior.  There is a bug in the tracker to introduce 
a new lock state to optimize the stat case so that writers are paused but 
buffers aren't flushed.  It hasn't been prioritized, but is not terribly 
complex.
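
In the meantime, having jobs fsync their output once they're done with a 
file is the simplest way to avoid the stall.  A minimal sketch (the file 
name and contents are just placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int fd = open("output.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	const char buf[] = "job output\n";

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, buf, strlen(buf)) < 0)
		perror("write");
	/* Flush the buffered writes now, so a later stat from another
	 * client doesn't have to wait for the writeback. */
	if (fsync(fd) < 0)
		perror("fsync");
	close(fd);
	return 0;
}

For the dirty data thresholds Greg mentions, the userspace client's 
object cacher settings (client_oc_target_dirty, client_oc_max_dirty) are 
the relevant knobs, IIRC.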

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

