Re: CephFS Space Accounting and Quotas

On Wednesday, March 6, 2013 at 11:58 AM, Jim Schutt wrote:
> On 03/06/2013 12:13 PM, Greg Farnum wrote:
> > Check out the directory sizes with ls -l or whatever — those numbers are semantically meaningful! :)
>  
>  
> That is just exceptionally cool!
>  
> >  
> > Unfortunately we can't (currently) use those "recursive statistics"
> > to do proper hard quotas on subdirectories, as they're lazily
> > propagated following client ops rather than as part of the updates
> > themselves. (Lazily in the technical sense — it's actually quite fast
> > in general.) But they'd work fine for soft quotas if somebody wrote
> > the code, or to block writes with a slight time lag.
>  
>  
>  
> 'ls -lh <dir>' seems to be just the thing if you already know <dir>.
>  
> And it's perfectly suitable for our use case of not scheduling
> new jobs for users consuming too much space.
>  
> I was thinking I might need to find a subtree where all the
> subdirectories are owned by the same user, on the theory that
> all the files in such a subtree would be owned by that same
> user. E.g., we might want such a capability to manage space per
> user in shared project directories.
>  
> So, I tried 'find <dir> -type d -exec ls -lhd {} \;'
>  
> Unfortunately, that ended up doing a 'newfstatat' on each file
> under <dir>, evidently to learn whether it was a directory. The
> result was the same slowdown I'd already seen for files written
> on other clients.
>  
> Is there some other way I should be looking for directories if I
> don't already know what they are?
>  
> Also, this issue with stat on files created on other clients seems
> like it will be problematic for many of the interactions our users
> will have with files created by their parallel compute jobs -
> any suggestions on how to avoid or fix it?
>  

Brief background: stat is required to return file size information, so when you stat a file Ceph needs to find out its actual size. If the file is currently in use by somebody, that means gathering up the latest metadata from whoever has it open.
Separately, while Ceph allows a client and the MDS to proceed with a bunch of operations (e.g., mknod) without those changes going to disk first, it requires that anything visible to a third party (another client) be durable on disk, for consistency reasons.

These combine to mean that if you stat a file for which another client currently has buffered writes, that buffer must be flushed out to disk before the stat can return. This is the usual cause of the slow stats you're seeing. You should be able to adjust the dirty-data thresholds to encourage faster writeouts, have a client fsync a file once it's done with it, etc., in order to minimize the likelihood of running into this.
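For instance (just a sketch, with a made-up mount point and file name), the client that wrote a file can flush it by hand once the job is done with it, so a later stat from another node doesn't have to wait on writeback:

    # flush one file's dirty data from the node that wrote it;
    # the fsync() is what pushes the buffered writes out to the OSDs
    python -c 'import os, sys; fd = os.open(sys.argv[1], os.O_RDONLY); os.fsync(fd); os.close(fd)' /mnt/ceph/project/job42/output.dat

    # heavier hammer: flush everything dirty on this client
    sync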
Also, I'd have to check, but I believe opening a file with LAZY_IO (or similar) will weaken those requirements — it's probably not the solution you'd like here, but it's an option, and if this turns out to be a serious issue then config options to reduce consistency on certain operations are likely to make their way into the roadmap. :)
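And on the space-accounting side: if the client you're on exposes the CephFS virtual xattrs (I believe both the kernel client and ceph-fuse do), you can read the same recursive numbers that 'ls -lh <dir>' shows without walking the tree. For example (directory name made up):

    # recursive totals for one directory, as maintained by the MDS
    getfattr -n ceph.dir.rbytes /mnt/ceph/project/alice
    getfattr -n ceph.dir.rfiles /mnt/ceph/project/alice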
-Greg
