You do not want to mix Ceph with Hadoop, because you will lose data locality, which is the main point of Hadoop systems. Every read/write request will go over the network, which is not optimal.

On Fri, Apr 24, 2020 at 9:04 AM <jesper@xxxxxxxx> wrote:
>
> Hi
>
> We have a 3-year-old Hadoop cluster - up for refresh - so it is time
> to evaluate options. The "only" use case is running an HBase installation,
> which is important for us, and migrating out of HBase would be a hassle.
>
> Our Ceph usage has expanded and in general - we really like what we see.
>
> Thus - can this be "sanely" consolidated somehow? I have seen this:
> https://docs.ceph.com/docs/jewel/cephfs/hadoop/
> But it seems really, really bogus to me.
>
> It recommends that you set:
> pool 3 'hadoop1' rep size 1 min_size 1
>
> Which would - if I understand correctly - be disastrous. The Hadoop end
> would replicate 3x across nodes - but within Ceph the replication would
> be 1. Replication 1 in Ceph means pulling an OSD node would "guarantee"
> that its PGs go inactive - which could be OK - but there is nothing
> guaranteeing that the other Hadoop replicas are not served out of the same
> OSD node/PG. In that case, rebooting an OSD node would make the Hadoop
> cluster unavailable.
>
> Is anyone serving HBase out of Ceph - how does the stack and
> configuration look? If I went for 3x replication in both Ceph and HDFS
> then it would definitely work, but 9 copies of the dataset is a bit more
> than what looks feasible at the moment.
>
> Thanks for your reflections/input.
>
> Jesper
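
A practical note on the pool settings quoted from the jewel docs: they can be
inspected and changed with the standard ceph CLI. A minimal sketch, assuming
the pool is named 'hadoop1' as in the docs example:

    # Inspect the current replication of the pool:
    ceph osd pool get hadoop1 size
    ceph osd pool get hadoop1 min_size

    # To keep the redundancy inside Ceph instead of relying on
    # HDFS-level copies, raise the replica count:
    ceph osd pool set hadoop1 size 3
    ceph osd pool set hadoop1 min_size 2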
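
On the "replicas on the same OSD node" worry: where Ceph places its copies is
decided by the CRUSH rule's failure domain (host, for the default replicated
rule), and you can see where any individual object lands. A sketch for
checking this, with 'someobject' as a hypothetical object name:

    # Show the host/OSD layout the CRUSH map works with:
    ceph osd tree

    # Show which PG and which OSDs (the acting set) one object maps to:
    ceph osd map hadoop1 someobject

Note that CRUSH only separates Ceph's own replicas; it knows nothing about
which CephFS files are HDFS-level replicas of each other, which is exactly
the failure scenario described above.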
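
On the 9x overhead: 3x in HDFS on top of 3x in Ceph multiplies to 3 x 3 = 9
copies. The inverse of the docs' suggestion - size 3 in Ceph and replication
1 at the HDFS layer - brings that back to 3 copies, at the price of the
locality loss mentioned at the top; whether that is acceptable for HBase is
the open question here. As a sketch, existing HDFS paths could be
re-replicated like this (the '/hbase' path is a hypothetical example):

    # Set the HDFS replication factor of existing files to 1 and wait
    # for the change to complete ('-w'):
    hdfs dfs -setrep -w 1 /hbase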