Hi,

The subject of quota enforcement came up in the IRC channel last week, so
I thought I'd resurrect this discussion.

> > On Fri, 30 Apr 2010, Henry C Chang wrote:
> > > In fact, I am trying to add "folder quota support" to ceph right now.
> > > My rough idea is as below:
> > >
> > > (1) Store the quota limit in the xattr of the folder;
> > > (2) When the client requests a new max size for writing content to a
> > >     file, the MDS authorizes the request according to the quota and
> > >     the rstat of the folder.
> >
> > One thing to keep in mind is that because the recursive (rstat) size
> > info is lazily propagated up the file tree, this won't work perfectly.
> > If you set a limit of 1GB on /foo and are writing data in /foo/bar/baz,
> > it won't stop you right at 1GB. Similarly, if you hit the limit and
> > delete some stuff, it will take time before the MDS notices and lets
> > you start writing again.
>
> Hmm... this would be a problem.
> From the perspective of a user, I would be happy if I could write more
> than my quota. However, I would get pissed off if I had deleted some
> stuff but still could not write anything and didn't know how long I had
> to wait.
>
> Is it possible to force the MDS to propagate the rstat info when files
> are deleted? Or can lazy propagation be bounded by a maximum interval
> (say, 5 seconds)?

The propagation is bounded by a tunable timeout (30 seconds by default).
That's per ancestor, so if you're three levels deep, the maximum delay is
3x that. In practice it's typically less, though, and I think we could
come up with something that forces propagation to happen faster in these
situations.

The reason the delay is there is just to limit the overhead of
maintaining the recursive stats: we don't want to update all ancestors
every time we change something, and because we're distributed over
multiple nodes we can't.

> > What is your use case?
>
> I want to create "depots" inside ceph:
> - Each depot has its own quota limit and can be resized as needed.
> - Multiple users can read/write the same depot concurrently.
>
> My original plan is to create a first-level folder for each depot
> (e.g., /mnt/ceph/depot1, /mnt/ceph/depot2, ...) and set a quota on it.
> Do you have any suggestions on implementing such a use case?

There's no reason to restrict this to first-level folders (if that's what
you were suggesting). We should allow a subdirectory quota to be set on
any directory, probably only if you are the owner.

We can make the user interface based on xattrs, since that's generally
nicer to interact with than an ioctl-based interface. That's not to say
the quota should necessarily be handled/stored internally as an xattr
(although it could be). It might make more sense to add a field to the
inode and extend the client/MDS protocol to manipulate it.

Either way, I think a coarse implementation could be done pretty easily,
where by 'coarse' I mean we don't necessarily stop writes exactly at the
limit (clients can write a bit past it before they start getting ENOSPC).

On IRC the subject of soft quotas also came up (where you're allowed over
the soft limit for some grace period before writes start failing). That's
also not terribly difficult to implement; we just need to store a
timestamp field as well so we know when the soft threshold was first
crossed.

Are you still interested in working on this?

sage
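
P.S. To make the xattr-based interface idea a bit more concrete, below is
a minimal userspace sketch of setting and reading back a per-directory
limit. The xattr name (ceph.quota.max_bytes) and the decimal-string
encoding are assumptions for illustration only; the real name and encoding
would be whatever the client/MDS protocol ends up exposing.

    // Sketch only: set and read back a per-directory quota via an xattr.
    // The xattr name and encoding are illustrative, not an actual interface.
    #include <sys/xattr.h>
    #include <cstdio>
    #include <cstring>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <dir> <max_bytes>\n", argv[0]);
            return 1;
        }
        const char *dir = argv[1];
        const char *limit = argv[2];    // e.g. "1073741824" for a 1GB quota

        // Set the per-directory limit (ownership checks would be enforced
        // on the server side).
        if (setxattr(dir, "ceph.quota.max_bytes", limit, strlen(limit), 0) < 0) {
            perror("setxattr");
            return 1;
        }

        // Read it back to confirm.
        char buf[64] = {0};
        if (getxattr(dir, "ceph.quota.max_bytes", buf, sizeof(buf) - 1) < 0) {
            perror("getxattr");
            return 1;
        }
        printf("%s: quota is %s bytes\n", dir, buf);
        return 0;
    }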
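
Along the same lines, the coarse soft/hard limit check can be as simple as
comparing the directory's (lazily propagated) recursive byte count against
the limits, and only failing writes once the soft limit has been exceeded
for longer than the grace period. The sketch below is illustrative only;
the struct and function names are made up and this is not actual MDS code.

    #include <cerrno>
    #include <cstdint>
    #include <cstdio>
    #include <ctime>

    // Hypothetical per-directory quota state.
    struct dir_quota {
        uint64_t hard_limit = 0;        // bytes; 0 = unlimited
        uint64_t soft_limit = 0;        // bytes; 0 = unlimited
        time_t   soft_crossed_at = 0;   // when usage first exceeded soft_limit
        time_t   grace = 7 * 24 * 3600; // grace period in seconds
    };

    // Coarse check: 'rbytes' is the directory's recursively propagated byte
    // count, 'requested' is the additional max size the client is asking
    // for. Returns 0 if the write may proceed, -ENOSPC otherwise. Because
    // the rstats propagate lazily, enforcement is only approximate.
    int check_quota(dir_quota &q, uint64_t rbytes, uint64_t requested,
                    time_t now)
    {
        uint64_t projected = rbytes + requested;

        if (q.hard_limit && projected > q.hard_limit)
            return -ENOSPC;                 // hard limit: fail immediately

        if (q.soft_limit && projected > q.soft_limit) {
            if (q.soft_crossed_at == 0)
                q.soft_crossed_at = now;    // remember when we went over
            else if (now - q.soft_crossed_at > q.grace)
                return -ENOSPC;             // grace period has expired
        } else {
            q.soft_crossed_at = 0;          // back under the soft limit
        }
        return 0;
    }

    int main()
    {
        dir_quota q;
        q.soft_limit = 1ull << 30;          // 1GB soft limit
        q.hard_limit = 2ull << 30;          // 2GB hard limit
        q.grace = 60;                       // short grace, just for the demo

        time_t now = time(nullptr);
        printf("under soft limit:        %d\n",
               check_quota(q, 500ull << 20, 1 << 20, now));
        printf("over soft, within grace: %d\n",
               check_quota(q, 1500ull << 20, 1 << 20, now));
        printf("over soft, grace passed: %d\n",
               check_quota(q, 1500ull << 20, 1 << 20, now + 120));
        printf("over hard limit:         %d\n",
               check_quota(q, 2200ull << 20, 1 << 20, now + 120));
        return 0;
    }

The timestamp field here is exactly the extra bit of state mentioned above
for tracking when the soft threshold was first crossed.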