RBD is not a workable solution unless you are willing to pay the cost of double replication in both HDFS and Ceph. I think the right approach is to look at other implementations of the Hadoop FileSystem interface, namely s3a and the local filesystem.

s3a is straightforward: Ceph RGW provides an S3 interface, and s3a is stable and well tested in the Hadoop ecosystem, so you can just run it. A few vendors also offer in-house solutions that integrate librgw directly into the s3a driver, which saves one extra hop and avoids the management/load-balancing cost of running a separate RGW cluster.

The local-filesystem route is a bit trickier. We recently ran a POC that mounts CephFS on every Hadoop node and configures Hadoop to use LocalFS with replication = 1. The result is that each piece of data is written into CephFS only once, and CephFS takes care of durability. There used to be a libcephfs JNI binding, but it is significantly out of date and appears to be abandoned, which is a pity.

With both approaches you of course lose data locality, but you trade it for better scalability and compute/storage separation. Rough config sketches for both are appended below the quoted thread.

-Xiaoxi

On Fri, Apr 24, 2020 at 4:00 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
> I think the idea behind a pool size of 1 is that Hadoop already writes
> copies to 2 other pools(?).
>
> However, that leaves the possibility that PGs of these 3 pools may
> share an OSD, and if that OSD fails, you lose data in these pools. I
> have no idea what the chances are that the same data from different
> pools ends up on the same OSD.
>
>
> -----Original Message-----
> To: ceph-users@xxxxxxx
> Subject: HBase/HDFS on Ceph/CephFS
>
> Hi
>
> We have a 3-year-old Hadoop cluster - up for refresh - so it is time to
> evaluate options. The "only" use case is running an HBase installation,
> which is important to us, and migrating out of HBase would be a hassle.
>
> Our Ceph usage has expanded, and in general we really like what we see.
>
> Thus: can this be "sanely" consolidated somehow? I have seen this:
> https://docs.ceph.com/docs/jewel/cephfs/hadoop/
> but it seems really, really bogus to me.
>
> It recommends that you set:
> pool 3 'hadoop1' rep size 1 min_size 1
>
> which would - if I understand correctly - be disastrous. The Hadoop end
> would replicate 3x across nodes, but within Ceph the replication would
> be 1. Replication of 1 in Ceph means that pulling an OSD node would
> "guarantee" that its PGs go inactive - which could be OK - but there is
> nothing guaranteeing that the other Hadoop replicas are not served out
> of the same OSD node/PG. In that case, rebooting an OSD node would make
> the Hadoop cluster unavailable.
>
> Is anyone serving HBase out of Ceph - and how does the stack and
> configuration look? If I went for 3x replication in both Ceph and HDFS
> it would definitely work, but 9 copies of the dataset is a bit more
> than what looks feasible at the moment.
>
> Thanks for your reflections/input.
>
> Jesper
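A minimal core-site.xml / hbase-site.xml sketch of the s3a route. The RGW endpoint, bucket name and credentials below are placeholders, not values from any real deployment:

  <!-- core-site.xml: point s3a at the RGW endpoint instead of AWS -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://rgw.example.com:7480</value>
  </property>
  <property>
    <!-- RGW is usually reached with path-style bucket URLs -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>HADOOP_S3_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>HADOOP_S3_SECRET_KEY</value>
  </property>

  <!-- hbase-site.xml: keep the HBase data in a dedicated bucket -->
  <property>
    <name>hbase.rootdir</name>
    <value>s3a://hbase-bucket/hbase</value>
  </property>

One caveat: as far as I know s3a does not provide the hflush/hsync semantics the HBase WAL expects, so the WAL may still need to live on a different filesystem; treat the above purely as a starting point.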
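And a sketch of the LocalFS-on-CephFS POC, assuming CephFS is mounted at /mnt/cephfs on every node with a CephX client called "hadoop" (mount point, monitor address and secret file path are placeholders):

  # on every Hadoop/HBase node: mount CephFS with the kernel client
  mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
      -o name=hadoop,secretfile=/etc/ceph/hadoop.secret

  <!-- core-site.xml: use the local filesystem, no HDFS at all -->
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>

  <!-- hbase-site.xml: the root dir sits on the shared CephFS mount -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///mnt/cephfs/hbase</value>
  </property>

Since every node sees the same tree through the shared mount, durability is handled entirely by Ceph and no HDFS-side replication is involved.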