Hi Sage,

On Thu, Jun 3, 2010 at 3:48 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Wed, 2 Jun 2010, Henry C Chang wrote:
>> Yes, I am interested and that's what I am doing right now.
>> In fact, we have a clone of ceph on github, and already have a "quick"
>> implementation. You can get it from:
>>
>> http://github.com/tcloud/ceph/tree/folder-quota
>> http://github.com/tcloud/ceph-client-standalone/tree/folder-quota
>
> Oh, cool. I'll take a look at this today.
>
>> To allow switching quota on/off, we added an option/configuration on
>> both the client and server sides. To enable folder quota, you need to
>> mount ceph with "-o folder_quota=1" on the client side. On the server
>> side, you need to add "folder quota = 1" to the global section of the
>> ceph config file. We also implemented a tool to set/unset/get/list
>> quota limits on folders.
>>
>> To enforce the quota more precisely, however, our implementation
>> sacrifices write throughput and introduces more traffic:
>>
>> 1. We modified the max_size request-reply behaviour between the client
>> and the mds. Our client requests a new max_size only when
>> endoff > max_size (i.e., it will not pre-request a larger max_size as
>> it approaches the current max_size).
>>
>> 2. Our client requests a constant 4 MB (the object size) every time.
>> This degrades throughput significantly. (It used to request more and
>> more.)
>
> Is this just to reduce the amount by which we might overshoot? I would
> try to make it a tunable, maybe ('max size slop' or something) so that
> it preserves the current doubling logic but caps it at some value, so
> the admin can trade throughput vs quota precision.

Great!

> And/or we can also make it dynamically reduce that window as the user
> approaches the limit.

Yes. But if there are multiple clients writing to the same subtree
concurrently, it is a little bit difficult to tell whether we are
approaching the limit... we would need to know how many clients are
writing to that subtree...

>> Anyway, it is the initial implementation. I will take your comments
>> into consideration and try to revise the current implementation. Of
>> course, I will need your help on the rstat propagation issue 'cause I
>> have no clue right now and have to dig into the mds source code more
>> to understand the existing implementation. :)
>
> Sure.
>
>> A few questions about ceph testing:
>> - When will a subtree be fragmented?
>> - Can I force a subtree to be fragmented to facilitate testing?
>
> By default the load balancer goes every 30 seconds. You can turn on mds
> 'thrashing' that will export random directories to random nodes (to
> stress test the migration), but that is probably overkill.
>
> It would probably be best to add something to MDS.cc's handle_command
> that lets the admin explicitly initiate a subtree migration, via
> something like
>
>   $ ceph mds tell 0 export_dir /foo/bar 2   # send /foo/bar from mds0 to mds2
>
> I just pushed something to do that to unstable.. let me know if you run
> into problems with it.

The export_dir command works well and gives us a convenient way to test
multi-mds scenarios. Not surprisingly, our current implementation does
not work in a multi-mds environment... :)

My test setup: under the mount point, I created /volume, /volume/aaa and
/volume/bbb. mds0 is authoritative for /volume and /volume/aaa; mds1 is
authoritative for /volume/bbb.
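In case it's useful, the setup boils down to roughly the following
commands (the monitor address, the mount point and the name/syntax of
our quota tool are just placeholders for illustration; the folder_quota
mount option, the "folder quota" config entry and the export_dir command
are the real ones):

  # ceph.conf on the server side -- enable folder quota globally:
  #   [global]
  #       folder quota = 1

  # mount the kernel client with folder quota enabled
  mount -t ceph mon1:/ /mnt/ceph -o folder_quota=1

  # create the test tree under the mount point
  mkdir -p /mnt/ceph/volume/aaa /mnt/ceph/volume/bbb

  # push the /volume/bbb subtree to mds1 so the tree spans two MDSes
  ceph mds tell 0 export_dir /volume/bbb 1

  # set a 250M quota on /volume with our tool (name/arguments illustrative)
  folder_quota set /mnt/ceph/volume 250M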
Quota is set on /volume: 250M.

Test case 0: pass
  cp 100M file to /volume/aaa/a0
  cp 100M file to /volume/aaa/a1
  cp 100M file to /volume/aaa/a2  ==> quota exceeded error is expected here

Test case 1: pass
  cp 100M file to /volume/bbb/b0
  cp 100M file to /volume/bbb/b1
  cp 100M file to /volume/aaa/a1  ==> quota exceeded error is expected here

Test case 2: failed
  cp 100M file to /volume/bbb/b0
  cp 100M file to /volume/bbb/b1
  cp 100M file to /volume/bbb/b2  ==> quota exceeded error is expected here

It seems that the rstats are propagated up (from mds1 to mds0) quickly
enough (case 1); however, the ancestor replica (/volume) on mds1 is not
updated (case 2). I wonder how and when the replicas get updated. I'm
still digging through the source code to find out where. :(

Henry
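P.S. In case it helps, case 2 can be reproduced with something like the
following on a client mounted with -o folder_quota=1 (dd just stands in
for copying a 100M file; paths are the placeholders from above):

  dd if=/dev/zero of=/mnt/ceph/volume/bbb/b0 bs=1M count=100
  dd if=/dev/zero of=/mnt/ceph/volume/bbb/b1 bs=1M count=100
  # the third write takes /volume past its 250M quota, so a quota
  # exceeded error is expected here -- but it currently succeeds
  dd if=/dev/zero of=/mnt/ceph/volume/bbb/b2 bs=1M count=100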