> Yeah, not pretty. A shared library wouldn't really help here either,
> right? And a command-line tool means additional overhead. JNI (or
> equivalent) calling an ioctl seems like the most appropriate tool.

I like the motivation behind the command-line tool, but agree with you
on the overhead issues. The only other method for arbitrary communication
between processes that comes to mind for this situation is a socket-based
approach. This could take two forms:

1) A user-space daemon to service requests from Hadoop
2) A socket between kernel and user-space to service requests

The former is unattractive because it requires additional client setup,
while the latter also poses challenges. However, if this approach seems
attractive, we could begin to experiment with the second option in
DebugFS to avoid ABI lock-in?

One thing all solutions have in common is that the cost on the Hadoop end
is a one-time cost. While overhead is important, a number of inefficient
lookups may easily be masked by the start-up costs associated with
Hadoop's infrastructure.

>
>>> The ioctl struct ceph_ioctl_dataloc already returns the primary copy's
>>> object offset for an input file offset, though I think it would be a
>>> little more useful if it included replica offsets.
>> I can submit a patch for this. Sage, I remember you mentioning that
>> reading from replicas might pose (scalability?) problems. Any thoughts
>> on this?
>
> There are two things. First, we'd need a DATALOC_V2 ioctl that would
> return locations for all replicas of the object. Is Hadoop smart about
> scheduling jobs on the best replica?

Good question. I'm not sure what its scheduling policy is, but replica
location is a key component of the Hadoop API, providing the information
to the scheduler by default.

> Before going to the trouble, though, I want to make sure we'll really
> benefit from all of that...

I agree. This enhancement is orthogonal to the overall design.
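
To make the DATALOC_V2 idea a bit more concrete, here is a rough
user-space sketch of what a per-replica result could carry and how a
caller might prefer a local replica. Everything in it is hypothetical:
the struct name, its fields, SKETCH_MAX_REPLICAS, and pick_replica() are
made up for illustration and are not taken from the Ceph tree or from
Hadoop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_REPLICAS 8   /* hypothetical cap, not from Ceph */

/* Hypothetical V2 result: maps a file offset to an object offset, as the
 * existing dataloc ioctl is described as doing, but lists one OSD id per
 * replica instead of only the primary. */
struct dataloc_v2_sketch {
    uint64_t file_offset;               /* in:  offset into the file */
    uint64_t object_offset;             /* out: offset within the object */
    uint32_t num_replicas;              /* out: valid entries in osd[] */
    int64_t  osd[SKETCH_MAX_REPLICAS];  /* out: osd[0] is the primary */
};

/* Return the index of the replica stored on local_osd, or 0 (the
 * primary) when no replica is local. */
static size_t pick_replica(const struct dataloc_v2_sketch *loc,
                           int64_t local_osd)
{
    assert(loc->num_replicas > 0 &&
           loc->num_replicas <= SKETCH_MAX_REPLICAS);
    for (size_t i = 0; i < loc->num_replicas; i++) {
        if (loc->osd[i] == local_osd)
            return i;
    }
    return 0;
}
```

Falling back to the primary keeps today's behaviour when nothing is
local, so a caller would only see a difference where a local replica
actually exists.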

Thanks,
Noah