Hi,

I have been running a small Ceph cluster for experimentation for a while, and now my employer has asked me to give a little talk about my findings. One important part of that is, of course, going to be the practical limitations of Ceph. Here is my list so far:

- Ceph is not supported by VMware ESX. That may change in the future, but seeing how VMware is now owned by EMC, they might make it a political decision not to support Ceph. Apparently you can import an RBD volume on a Linux server and then re-export it to a VMware host as an iSCSI target, but doing so would introduce a bottleneck and a single point of failure, which rather defeats the purpose of having a Ceph cluster in the first place.

- Ceph is not supported by Windows clients, or, as far as I can tell, by anything that isn't a very recent version of Linux. (User-space clients work in some cases; see sketch 1 in the P.S. below.)

- There is no dynamic tiered storage, and there probably never will be, if I understand the architecture correctly. You can have different pools with different performance characteristics (say, one on cheap, large 7200 RPM disks and another on SSDs), but once you have put a given bunch of data into one pool, it is pretty much stuck there: you cannot move it to another pool without very tight and very manual coordination with all clients using it (see sketch 2 in the P.S. below).

- There is no active data deduplication, and, again, if I understand the architecture correctly, there probably never will be. There is, however, sparse allocation and COW cloning for RBD volumes, which achieves something similar. Under certain conditions it is even possible to use the discard option of modern filesystems to automatically keep unused regions of an RBD volume sparse (see sketch 3 in the P.S. below).

- Bad support for multiple customers accessing the same cluster. This is assuming that, if you have multiple customers, it is imperative that any one customer must be unable to access, let alone modify, the data of any other customer. You can have authorization at the pool level (see sketch 4 in the P.S. below), but it has been reported that Ceph reacts badly to a large number of pools being defined. Multi-customer support in CephFS is non-existent. RadosGW probably supports multiple customers, but I haven't tried it.

- No dynamic partitioning for CephFS. The original paper talked about dynamic partitioning of the CephFS namespace, so that multiple metadata servers could share the workload of a large number of CephFS clients. This isn't implemented yet (or implemented but not working properly?), and the only currently supported multi-MDS configuration is 1 active / n standby. This limits the scalability of CephFS. It looks to me like CephFS is not a major focus of the development team at this time.

Can you give me some comments on that? Am I totally wrong on some of those points? Have I forgotten any important limitations?

Regards,
Guido
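
P.S. A few rough Python sketches using the librados/librbd bindings, to illustrate some of the points above. All pool, image, and client names are made up for the examples, so treat them as illustrations rather than recipes.

Sketch 1 - a user-space client. This is roughly what I mean by user-space clients working in some cases: the librados Python binding talks to the cluster directly over the network, so the client does not need a recent kernel or the kernel RBD/CephFS modules at all.

    import rados

    # Connect using the cluster config file and the default client.admin key.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # 'rbd' is just used as an example pool name here.
    ioctx = cluster.open_ioctx('rbd')
    ioctx.write_full('greeting', b'hello from user space')
    print(ioctx.read('greeting'))

    ioctx.close()
    cluster.shutdown()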
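Sketch 2 - "moving" an RBD image between pools. As far as I can tell there is no transparent migration; the best you can do is a full client-side copy like this, and then repoint every consumer of the image at the new pool by hand, which is the tight manual coordination I mean.

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Hypothetical pools: 'sata' on cheap 7200 RPM disks, 'ssd' on SSDs.
    src = cluster.open_ioctx('sata')
    dst = cluster.open_ioctx('ssd')

    img = rbd.Image(src, 'vm-disk-1')
    try:
        # Full data copy into the other pool; the original stays where it is.
        img.copy(dst, 'vm-disk-1')
    finally:
        img.close()

    src.close()
    dst.close()
    cluster.shutdown()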
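Sketch 3 - sparse allocation, COW cloning and discard on RBD. Clones need a protected snapshot and a format 2 parent image; discarding a range hands the space back, which is what the filesystem discard option relies on.

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    # Snapshot and protect a (hypothetical) golden image so it can be cloned.
    parent = rbd.Image(ioctx, 'golden-image')
    try:
        parent.create_snap('base')
        parent.protect_snap('base')
    finally:
        parent.close()

    # COW clone: shares all unmodified data with the parent snapshot.
    rbd.RBD().clone(ioctx, 'golden-image', 'base', ioctx, 'vm-clone-1',
                    features=rbd.RBD_FEATURE_LAYERING)

    # Discard a range in the clone; the region becomes sparse again.
    child = rbd.Image(ioctx, 'vm-clone-1')
    try:
        child.discard(0, 4 * 1024 * 1024)
    finally:
        child.close()

    ioctx.close()
    cluster.shutdown()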
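Sketch 4 - per-pool authorization from the client side. This assumes a key client.customer1 was created beforehand with OSD caps restricted to its own pool (e.g. allow rw pool=customer1); with that, I/O against another customer's pool should be rejected by the OSDs, but as said above this doesn't scale to a large number of pools.

    import rados

    # Connect as the restricted per-customer key instead of client.admin.
    cluster = rados.Rados(
        conffile='/etc/ceph/ceph.conf',
        rados_id='customer1',
        conf={'keyring': '/etc/ceph/ceph.client.customer1.keyring'})
    cluster.connect()

    # Allowed: the caps cover this pool.
    ioctx = cluster.open_ioctx('customer1')
    ioctx.write_full('object-1', b'customer data')
    ioctx.close()

    cluster.shutdown()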