I think the idea behind a pool size of 1 is that Hadoop already writes copies to 2 other pools(?). However, that leaves the possibility that PGs of these 3 pools end up sharing an OSD, and if that OSD fails, you lose data in those pools. I have no idea what the chances are that the same data from different pools can end up on the same OSD. One way to rule it out would be to give each pool its own CRUSH rule; see the sketch at the end of this message.

-----Original Message-----
To: ceph-users@xxxxxxx
Subject: HBase/HDFS on Ceph/CephFS

Hi

We have a 3-year-old Hadoop cluster - up for refresh - so it is time to evaluate options. The "only" use case is running an HBase installation, which is important for us, and migrating out of HBase would be a hassle. Our Ceph usage has expanded, and in general we really like what we see. Thus - can this be "sanely" consolidated somehow?

I have seen this: https://docs.ceph.com/docs/jewel/cephfs/hadoop/
But it seems really, really bogus to me. It recommends that you set:

pool 3 'hadoop1' rep size 1 min_size 1

Which would - if I understand correctly - be disastrous. The Hadoop end would replicate 3x across, but within Ceph the replication would be 1. The 1x replication in Ceph means that pulling an OSD node would "guarantee" that the PGs go inactive - which could be OK - but there is nothing guaranteeing that the other Hadoop replicas are not served out of the same OSD node/PG? In which case, rebooting an OSD node would make the Hadoop cluster unavailable.

Is anyone serving HBase out of Ceph - how does the stack and configuration look?

If I went for 3x replication in both Ceph and HDFS, it would definitely work, but 9 copies of the dataset is a bit more than what looks feasible at the moment.

Thanks for your reflections/input.

Jesper
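
A minimal sketch of how the three size-1 pools could be pinned to disjoint OSDs, so that the HDFS-level replicas can never share an OSD. This assumes a Luminous-or-later cluster and that the CRUSH map already contains three separate buckets (called rack1/rack2/rack3 here purely for illustration); the rule names, pool names, PG counts and object name are likewise made up for the example:

# One CRUSH rule per bucket, so each pool's PGs are confined to a
# different set of hosts/OSDs (rack1-3 are assumed to exist already).
ceph osd crush rule create-replicated hadoop-rack1 rack1 host
ceph osd crush rule create-replicated hadoop-rack2 rack2 host
ceph osd crush rule create-replicated hadoop-rack3 rack3 host

# One size-1 pool per rule; redundancy then comes from the 3x HDFS
# replication across the pools, not from Ceph itself.
ceph osd pool create hadoop1 128 128 replicated hadoop-rack1
ceph osd pool create hadoop2 128 128 replicated hadoop-rack2
ceph osd pool create hadoop3 128 128 replicated hadoop-rack3
for p in hadoop1 hadoop2 hadoop3; do
    ceph osd pool set $p size 1       # newer releases may refuse size 1
    ceph osd pool set $p min_size 1   # unless mon_allow_pool_size_one is set
done

# Sanity check: which OSDs does an object in each pool map to?
ceph osd map hadoop1 some-object-name

The pools would still have to be added as CephFS data pools (ceph fs add_data_pool ...) and listed on the Hadoop side as in the docs linked above. Pulling one of the racks/hosts of course still makes that pool's PGs inactive; the point is only that the other two HDFS replicas are then guaranteed to live on different OSDs.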