Re: [PATCH] Expose Ceph data location information to Hadoop

> I didn't follow your numeric examples above---I missed how you mapped Offsets to Object Numbers---but I follow you on striping meaning different data locations for what Hadoop would think would be one Ceph object in one place.
I didn't explicitly describe the mapping algorithm, but it can be found in the function "ceph_calc_file_object_mapping(...)" in the kernel client. If you execute the algorithm with the parameters in my example you can reproduce the mapping I presented.
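For anyone without the kernel source at hand, here is a minimal Python sketch of that striping arithmetic as I understand it. The function name, parameter names, and return shape below are my own for illustration and are not the kernel's signature; treat the kernel function as the authority.

```python
def map_file_offset(offset, length, stripe_unit, stripe_count, object_size):
    """Map a file extent to (object_no, object_offset, extent_len).

    Illustrative sketch only. extent_len is clipped so that the returned
    extent never crosses a stripe-unit boundary.
    """
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")

    su_per_object = object_size // stripe_unit

    block_no = offset // stripe_unit        # index of the stripe unit within the file
    stripe_no = block_no // stripe_count    # which stripe (row) that unit belongs to
    stripe_pos = block_no % stripe_count    # which object in the object set (column)

    object_set_no = stripe_no // su_per_object
    object_no = object_set_no * stripe_count + stripe_pos

    su_offset = offset % stripe_unit        # offset inside the stripe unit
    object_offset = (stripe_no % su_per_object) * stripe_unit + su_offset
    extent_len = min(length, stripe_unit - su_offset)

    return object_no, object_offset, extent_len
```

With a hypothetical layout (not the one from my earlier example) of 1 MiB stripe units, a stripe count of 4, and 4 MiB objects, file offsets 0 and 1 MiB map to objects 0 and 1, while offset 4 MiB wraps back to object 0 at object offset 1 MiB. That is the case where consecutive ranges of what Hadoop would treat as one block end up in different places.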

>> 
>> The more natural (and general) solution is to consider the stripe unit to be the _unit_ of Hadoop blocks, not entire objects. When stripe unit and block size are the same the result is analogous to HDFS's treatment of blocks.
> I agree with you, and push forward one more step:  Ceph and Hadoop should just think of a block/object as the same size.
Per Sage's response, a Hadoop block can be equal to a Ceph stripe unit.


> One of the TODO's is exposing Ceph's object size to Hadoop, and that "read" interface for block size will probably need to expand to a "write" interface to reduce confusion with folks configuring Hadoop to use a block size of N bytes.
How is the configured block size relevant in Hadoop? It seems to me to be specific to HDFS. The analogy in Ceph would be configuring the file layout parameters.

> After writing this code, I do like seeing the words "scrap" and "JNI" so close in the same sentence.  That's more up to the Hadoop community, though; I don't know how well-accepted JNA is in their code base.
One solution might be a lazily populated sysfs interface for retrieving object information for a given file, circumventing the problem Java has calling IOCTLs. But that's another conversation.

> The ioctl struct ceph_ioctl_dataloc already returns the primary copy's object offset for an input file offset, though I think it would be a little more useful if it included replica offsets.
I can submit a patch for this. Sage, I remember you mentioning that reading from replicas might pose (scalability?) problems. Any thoughts on this?

