Yes, I am interested, and that's what I am doing right now. In fact, we have a clone of ceph on github and already have a "quick" implementation. You can get it from:

http://github.com/tcloud/ceph/tree/folder-quota
http://github.com/tcloud/ceph-client-standalone/tree/folder-quota

To allow switching quota on/off, we added an option/configuration on both the client and server sides. To enable folder quota, you mount ceph with "-o folder_quota=1" on the client side, and add "folder quota = 1" to the global section of the ceph config file on the server side. We also implemented a tool to set/unset/get/list quota limits on folders. (A rough sketch of the setup is at the end of this message.)

To enforce the quota more precisely, however, our implementation sacrifices write throughput and introduces more traffic:

1. We modified the max_size request-reply behaviour between the client and the mds. Our client requests a new max_size only when endoff > max_size, i.e., it does not pre-request a larger max_size as it approaches the current one.

2. Our client requests a constant 4 MB (the object size) every time, whereas it used to request larger and larger amounts. This degrades the throughput significantly.

Anyway, this is just the initial implementation. I will take your comments into consideration and try to revise it. Of course, I will need your help on the rstat propagation issue because I have no clue right now and will have to dig into the mds source code more to understand the existing implementation. :)

A few questions about ceph testing:
- When will a subtree be fragmented?
- Can I force a subtree to be fragmented to facilitate testing?
- How do I know which mds is authoritative for a particular fragment?

Thanks,
Henry

On Wed, Jun 2, 2010 at 3:17 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> Hi,
>
> The subject of quota enforcement came up in the IRC channel last week so
> I thought I'd resurrect this discussion.
>
>> > On Fri, 30 Apr 2010, Henry C Chang wrote:
>> > > In fact, I am trying to add "folder quota support" to ceph right now.
>> > > My rough idea is as below:
>> > >
>> > > (1) Store the quota limit in the xattr of the folder;
>> > > (2) When the client requests a new max size for writing content to a
>> > > file, MDS authorizes the request according to the quota and the rstat
>> > > of the folder.
>> >
>> > One thing to keep in mind is that because the recursive rsize info is
>> > lazily propagated up the file tree, this won't work perfectly. If you
>> > set a limit of 1GB on /foo and are writing data in /foo/bar/baz, it
>> > won't stop you right at 1GB. Similarly, if you hit the limit, and
>> > delete some stuff, it will take time before the MDS notices and lets
>> > you start writing again.
>> >
>> Hmm... this would be a problem.
>> From the perspective of a user, I would be happy if I can write more
>> than my quota. However, I would get pissed off if I have deleted some
>> stuff but still cannot write anything and don't know how long I have to
>> wait.
>>
>> Is it possible to force the MDS to propagate rsize info when files are
>> deleted? Or, can lazy propagation be bounded to a maximum interval
>> (say 5 seconds)?
>
> The propagation is bounded by a tunable timeout (30 seconds by default,
> but adjustable). That's per ancestor, so if you're three levels deep,
> the max is 3x that. In practice, it's typically less, though, and I
> think we could come up with something that would force propagation to
> happen faster in these situations. The reason it's there is just to
> limit the overhead of maintaining the recursive stats.
> We don't want to update all ancestors every time we change something,
> and because we're distributed over multiple nodes we can't.
>
>> > What is your use case?
>>
>> I want to create "depots" inside ceph:
>> - Each depot has its own quota limit and can be resized as needed.
>> - Multiple users can read/write the same depot concurrently.
>>
>> My original plan is to create a first-level folder for each depot
>> (e.g., /mnt/ceph/depot1, /mnt/ceph/depot2, ...) and set quota on it.
>> Do you have any suggestion on implementing such a use case?
>
> There's no reason to restrict this to first-level folders (if that's
> what you were suggesting). We should allow a subdir quota to be set on
> any directory, probably iff you are the owner. We can make the user
> interface based on xattrs, since that's generally nicer to interact
> with than an ioctl-based interface. That's not to say the quota should
> necessarily be handled/stored internally as an xattr (although it could
> be). It might make more sense to add a field to the inode and extend
> the client/mds protocol to manipulate it.
>
> Either way, I think a coarse implementation could be done pretty
> easily, where by 'coarse' I mean we don't necessarily stop writes
> exactly at the limit (they can write a bit more before they start
> getting ENOSPC).
>
> On IRC the subject of soft quotas also came up (where you're allowed
> over the soft limit for some grace period before writes start failing).
> That's also not terribly difficult to implement (we just need to store
> some timestamp field as well so we know when they initially cross the
> soft threshold).
>
> Are you still interested in working on this?
>
> sage
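
P.S. To make the user-facing side concrete, here is a rough sketch of the setup and of the xattr-style interface you suggest. The mount option and the config line are the ones from our folder-quota branch above; the monitor address, mount point, and the "ceph.quota.max_bytes" attribute name are placeholders I made up for illustration, not an existing interface:

  # server side: enable folder quota in the [global] section of ceph.conf
  [global]
          folder quota = 1

  # client side: mount the kernel client with folder quota enabled
  # (monhost:/ and /mnt/ceph are placeholders)
  mount -t ceph monhost:/ /mnt/ceph -o folder_quota=1

  # possible xattr-based interface for per-folder limits
  # (attribute name and value are illustrative only)
  setfattr -n ceph.quota.max_bytes -v 1073741824 /mnt/ceph/depot1   # 1 GB limit
  getfattr -n ceph.quota.max_bytes /mnt/ceph/depot1                 # read it back
  setfattr -x ceph.quota.max_bytes /mnt/ceph/depot1                 # remove the limit

If we go this way, most of our separate quota tool could be replaced by plain setfattr/getfattr calls, with the mds still doing the actual enforcement against the folder's rstat.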