Unfortunately it seems that CephFS doesn't currently support Hadoop 2.*. The next step will be to try Tachyon on top of Ceph. Has anybody already tried such a constellation?

-----Original Message-----
From: Lionel Bouton [mailto:lionel+ceph@xxxxxxxxxxx]
Sent: Tuesday, July 07, 2015 7:49 PM
To: Dmitry Meytin
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: FW: Ceph data locality

On 07/07/15 18:20, Dmitry Meytin wrote:
> Exactly because of that issue I've reduced the number of Ceph replicas to 2, and the number of HDFS copies is also 2 (so we're talking about 4 copies in total).
> I want (but haven't tried yet) to change the Ceph replication to 1 and change HDFS back to 3.

You are stacking one distributed storage network on top of another; no wonder you find the performance below your expectations. You could (should?) use CephFS instead of HDFS on RBD-backed VMs, as the latter is clearly redundant and inefficient.

Note that if you instead use size=1 for your RBD pool (which will probably still be slower than using Hadoop with CephFS) and lose even a single disk, you will probably freeze most or all of your VMs (as their disks will be spread across all physical disks of your Ceph cluster) and almost certainly corrupt all of their filesystems.

See http://ceph.com/docs/master/cephfs/hadoop/

If this doesn't work for you, I'd suggest separating the VM system disks from the Hadoop storage and running the Hadoop storage nodes on bare metal. The VMs could be backed either by local disks or by RBD if need be, but in any case they should avoid creating large IO spikes that could disturb the Hadoop storage nodes.

Lionel
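
For reference, the replica arithmetic discussed above is tuned in two places. On the Ceph side, replication is a per-pool property set with the standard ceph CLI; a sketch, assuming the VM images live in the default "rbd" pool (substitute your own pool name):

    # show the current replica count of the pool backing the VM disks
    ceph osd pool get rbd size

    # keep two copies in Ceph (size=1 removes all redundancy -- see the
    # warning above about losing a single disk)
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 1

min_size is how many copies must be available for the pool to keep accepting IO; with size=2 and min_size=1 the pool stays writable after a single failure.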
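On the HDFS side, the copy count comes from dfs.replication in hdfs-site.xml, and it only applies to newly written files, so data written under the old factor has to be re-replicated explicitly. A sketch, with the value 2 mirroring the setup described above:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>

    # re-replicate files written with the old factor; -w waits for completion
    hadoop fs -setrep -w 2 /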
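And for anyone trying the CephFS route from the linked docs: with the Hadoop 1.x bindings (the hadoop-cephfs.jar and the libcephfs Java bindings on Hadoop's classpath), Hadoop is pointed at CephFS through a few core-site.xml properties. A minimal sketch, with property names taken from the doc above; the monitor address is a placeholder for one of your own monitors:

    <property>
      <name>fs.default.name</name>
      <value>ceph://mon-host:6789/</value>
    </property>
    <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>
    <property>
      <name>ceph.conf.file</name>
      <value>/etc/ceph/ceph.conf</value>
    </property>

With this in place the replication count is governed by the CephFS data pool, so the double-replication problem described above disappears entirely.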