Hi Sage,

Thanks for your comments, much appreciated.

On Tuesday, 27 August 2013, at 10:19:46, Sage Weil wrote:
> Hi Guido!
>
> On Tue, 27 Aug 2013, Guido Winkelmann wrote:
[...]
> > - There is no dynamic tiered storage, and there probably never will
> > be, if I understand the architecture correctly.
> > You can have different pools with different performance
> > characteristics (like one on cheap and large 7200 RPM disks, and
> > another on SSDs), but once you have put a given bunch of data on one
> > pool, it is pretty much stuck there. (I.e. you cannot move it to
> > another pool without very tight and very manual coordination with
> > all clients using it.)
>
> This is a key item on the roadmap for Emperor (nov) and Firefly (feb).
> We are building two capabilities: 'cache pools' that let you put fast
> storage in front of your main data pool, and a tiered 'cold' pool that
> lets you bleed cold objects off to a cheaper, slower tier

Sounds interesting. Will that work on entire PGs or on single objects?
How do you keep track of which object lies in which pool without
resorting to a lookup step before every operation? Will that feature
retain backwards compatibility with older Ceph clients?

> (probably using erasure coding.. which is also coming in firefly).

... which happens to address another issue I forgot to mention.

> > - There is no active data deduplication, and, again, if I understand
> > the architecture correctly, there probably never will be.
> > There is, however, sparse allocation and COW-cloning for RBD volumes,
> > which does something similar. Under certain conditions, it is even
> > possible to use the discard option of modern filesystems to
> > automatically keep unused regions of an RBD volume sparse.
>
> You can do two things:
>
> - Do dedup inside an osd. Btrfs is growing this capability, and ZFS
> already has it. This is not ideal because data is randomly distributed
> across nodes.
>
> - You can build dedup on top of rados, for example by naming objects
> after a hash of their content. This will never be a 'magic and
> transparent dedup for all rados apps' because CAS is based on naming
> objects from content, and rados fundamentally places data based on
> name and eschews metadata. That means there isn't normally a way to
> point to the content unless there is some MDS on top of rados. Someday
> CephFS will get this, but raw librados users and RBD won't get it for
> free.

I read that as TL;DR: no real deduplication, unless an application
builds its own on top of rados. (I have tried to sketch what I think
that would look like further down.)

> > - Bad support for multiple customers accessing the same cluster.
> > This is assuming that, if you have multiple customers, it is
> > imperative that any one given customer must be unable to access or
> > even modify the data of any other customer. You can have
> > authorization on the pool layer, but it has been reported that Ceph
> > reacts badly to defining a large number of pools. Multi-customer
> > support in CephFS is non-existent. RadosGW probably supports
> > multi-customer, but I haven't tried it.
>
> The just-released Dumpling included support for rados namespaces,
> which are designed to address exactly this issue. Namespaces exist
> "inside" pools, and the auth capabilities can restrict access to a
> specific namespace.

I'm having some trouble finding this in the documentation. Can you give
me a pointer here?
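Coming back to the deduplication point for a second, just to check that
I am reading the CAS idea correctly: below is a rough, untested sketch
of what "naming objects after a hash of their content" could look like
through the python-rados bindings. The pool name, the object names and
the helper functions are all made up for illustration, and the little
"index" object at the end stands in for exactly the metadata layer
that, as you say, rados itself will not provide.

import hashlib
import rados

# Connect with the usual ceph.conf and default client credentials.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('data')   # pool name is just an example

def cas_put(data):
    # Name the object after the SHA-256 of its content.  Storing the
    # same content twice simply rewrites the one identical object.
    name = hashlib.sha256(data).hexdigest()
    ioctx.write_full(name, data)
    return name

def cas_get(name):
    size = ioctx.stat(name)[0]       # stat() returns (size, mtime)
    return ioctx.read(name, length=size)

# The catch: something has to remember which hash belongs to which
# logical name.  This tiny "index" object stands in for that extra
# metadata layer.
ref = cas_put(b'some chunk of file data')
ioctx.write_full('index:my-file:0', ref.encode())

ioctx.close()
cluster.shutdown()

If that is about right, then every librados application would have to
maintain an index like that itself, which matches your point about
needing some MDS on top of rados.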
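And on the namespaces: this is how I picture using them from librados,
assuming the python-rados bindings expose set_namespace() for this. The
client id, pool and namespace names below are invented, and the cap
syntax in the comment is only my reading of your description.

import rados

# Connect as a per-customer client; the id and pool are only examples.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf',
                      rados_id='customer-a')
cluster.connect()
ioctx = cluster.open_ioctx('shared-pool')

# Everything written through this ioctx now lands in the 'customer-a'
# namespace inside the pool, separate from objects of the same name in
# other namespaces.
ioctx.set_namespace('customer-a')
ioctx.write_full('greeting', b'hello from customer A')

# On the auth side, I would expect the customer's key to carry a cap
# restricted to that namespace, something like
#   osd 'allow rw pool=shared-pool namespace=customer-a'
# (going by your description; I have not verified the exact syntax).

ioctx.close()
cluster.shutdown()

If that is roughly how it is meant to work, it would cover the
multi-customer case without needing one pool per customer.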
> > - No dynamic partitioning for CephFS
> > The original paper talked about dynamic partitioning of the CephFS
> > namespace, so that multiple Metadata Servers could share the
> > workload of a large number of CephFS clients. This isn't implemented
> > yet (or implemented but not working properly?), and the only
> > currently supported multi-MDS configuration is 1 active / n standby.
> > This limits the scalability of CephFS. It looks to me like CephFS is
> > not a major focus of the development team at this time.
>
> This has been implemented since ~2006. We do not recommend it for
> production because it has not had the QA attention it deserves. That
> said, Zheng Yan has been doing a lot of great work here recently and
> things have improved considerably. Please try it! You just need to do
> 'ceph mds set_max_mds 3' (or whatever) to tell ceph how many active
> ceph-mds daemons you want.

Okay, I think I will try this.

	Guido

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com