> I didn't follow your numeric examples above---I missed how you mapped Offsets to Object Numbers---but I follow you on striping meaning different data locations for what Hadoop would think would be one Ceph object in one place.

I didn't explicitly describe the mapping algorithm, but it can be found in the function "ceph_calc_file_object_mapping(...)" in the kernel client. If you execute the algorithm with the parameters in my example, you can reproduce the mapping I presented.

>>
>> The more natural (and general) solution is to consider the stripe unit to be the _unit_ of Hadoop blocks, not entire objects. When stripe unit and block size are the same, the result is analogous to HDFS's treatment of blocks.
>
> I agree with you, and push forward one more step: Ceph and Hadoop should just think of a block/object as the same size.

Per Sage's response, a Hadoop block can be equal to the Ceph stripe unit.

> One of the TODO's is exposing Ceph's object size to Hadoop, and that "read" interface for block size will probably need to expand to a "write" interface to reduce confusion with folks configuring Hadoop to use a block size of N bytes.

How is the configured block size relevant in Hadoop? This seems to me to be specific to HDFS. The analogy would be to configure the file layout parameters in Ceph.

> After writing this code, I do like seeing the words "scrap" and "JNI" so close in the same sentence.

That's more up to the Hadoop community, though; I don't know how well-accepted JNA is in their code base. One solution might be a lazily populated sysfs interface for retrieving object information for a given file, circumventing the problem Java has calling ioctls. But that's another conversation.

> The ioctl struct ceph_ioctl_dataloc already returns the primary copy's object offset for an input file offset, though I think it would be a little more useful if it included replica offsets. I can submit a patch for this.
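Going back to the offset-to-object mapping: for anyone who wants to reproduce it without reading the kernel source, here is a minimal Python sketch of the striping arithmetic in terms of the three layout parameters (stripe unit, stripe count, object size). It is written from the layout definitions, not copied from ceph_calc_file_object_mapping(...), and it assumes object_size is a multiple of stripe_unit. The names Layout, map_offset and stripe_unit_blocks are hypothetical. The second helper treats each stripe unit as one Hadoop block, as proposed above.

```python
from typing import Iterator, NamedTuple, Tuple


class Layout(NamedTuple):
    """Hypothetical stand-in for a Ceph file layout (sketch only)."""
    stripe_unit: int   # bytes per stripe unit
    stripe_count: int  # number of objects each stripe is spread across
    object_size: int   # bytes per object; assumed a multiple of stripe_unit


def map_offset(layout: Layout, off: int) -> Tuple[int, int]:
    """Map a file offset to (object number, offset within that object).

    Reimplements the striping arithmetic for illustration; not the kernel code.
    """
    su, sc = layout.stripe_unit, layout.stripe_count
    stripes_per_object = layout.object_size // su
    blockno = off // su                        # index of the stripe unit in the file
    stripeno = blockno // sc                   # index of the stripe (row across objects)
    stripepos = blockno % sc                   # which object within that stripe
    objsetno = stripeno // stripes_per_object  # which object set holds the stripe
    objno = objsetno * sc + stripepos
    objoff = (stripeno % stripes_per_object) * su + off % su
    return objno, objoff


def stripe_unit_blocks(layout: Layout, start: int,
                       length: int) -> Iterator[Tuple[int, int, int]]:
    """Treat each stripe unit as one Hadoop-style block.

    Yields (block_index, object_number, object_offset) for every stripe unit
    overlapping the byte range [start, start + length).
    """
    if length <= 0:
        return
    su = layout.stripe_unit
    first = start // su
    last = (start + length - 1) // su
    for b in range(first, last + 1):
        objno, objoff = map_offset(layout, b * su)
        yield b, objno, objoff


if __name__ == "__main__":
    # Example layout: 1 MB stripe units, 4-way striping, 4 MB objects.
    layout = Layout(stripe_unit=1 << 20, stripe_count=4, object_size=4 << 20)
    for blk, obj, ooff in stripe_unit_blocks(layout, 0, 6 << 20):
        print(blk, obj, ooff)
```

With a stripe count of 1 and an object size equal to the stripe unit, every block lands in its own object at offset 0, which is the case analogous to HDFS blocks.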
Sage, I remember you mentioning that reading from replicas might pose (scalability?) problems. Any thoughts on this?