On Jun 18, 2012, at 10:50 AM, Jeff Darcy wrote:

> On Sat, 2012-06-16 at 15:38 -0400, Vlad wrote:
>> Greetings,
>>
>> I am not sure I fully understand the relationship between the glusterfs
>> project per se and https://github.com/gluster/hadoop-glusterfs, but
>> I'd like to follow up on the "Hadoop Connector" mention here
>> (http://www.gluster.org/community/documentation/index.php/Hadoop) and
>> the getFileBlockLocations API mention here
>> (http://community.gluster.org/q/how-gluster-supports-map-reduce-in-absence-of-metadata-of-input-data-chunks-spread-over-multiple-machines-see-desciption/):
>>
>> - does 3.3 install any kind of a C lib that would allow for "where
>>   data actually landed" queries?
>>
>> This would be very useful to users who want to structure their HPC
>> jobs in the map/reduce style, but without using Hadoop specifically.
>> (Really, the main innovation in MapReduce is colocation of calculation
>> and data, and I'd rather use a fs that's mountable in the classic sense
>> as opposed to HDFS).
>
> The getFileBlockLocations API is implemented using a "magic"
> extended-attribute request. The extended attribute is
> trusted.glusterfs.pathinfo; if you try to fetch that, we dynamically
> construct a reply describing where the data went. I suppose we could
> provide a C/C++ library to parse that into some sort of structure that's
> easier to use from within a program, but AFAIK that has not been done.

Yes, I've figured out from the Java implementation that the API reads
the trusted.glusterfs.pathinfo xattr. This could be done in C/C++ easily
enough without a dedicated lib, but the fact that the attribute is in the
trusted.* namespace implies that the reading process must be privileged,
if I am not mistaken. It seems to me that this will be a problem for many
end-user workloads...
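
To make the concern concrete, here is a minimal sketch of reading the attribute with no dedicated library. It is in Python for brevity, but os.getxattr is a thin wrapper over the same getxattr(2) call a C/C++ program would make. The function name read_pathinfo is my own (hypothetical), and I am assuming the reply is plain text; I have not tried to parse it, since its layout is not something I can confirm here. On Linux, trusted.* attributes are visible only to processes with CAP_SYS_ADMIN, so an unprivileged caller should see the attribute as simply missing.

```python
import errno
import os
import sys

# Virtual xattr named in this thread; GlusterFS builds the reply on request.
PATHINFO_XATTR = "trusted.glusterfs.pathinfo"

# Errors that mean "no pathinfo available to this caller" rather than a bug.
_UNAVAILABLE = (errno.ENODATA, errno.ENOTSUP, errno.EPERM, errno.EACCES)


def read_pathinfo(path):
    """Return the raw pathinfo reply for `path` as text, or None.

    None covers three cases: the file is not on a GlusterFS mount, the
    filesystem has no xattr support, or the caller lacks the privilege
    needed to read the trusted.* namespace (CAP_SYS_ADMIN on Linux).
    """
    try:
        raw = os.getxattr(path, PATHINFO_XATTR)  # Linux-only in the stdlib
    except OSError as exc:
        if exc.errno in _UNAVAILABLE:
            return None
        raise  # e.g. ENOENT for a missing file: let the caller see it
    # Assumption: the reply is text; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        info = read_pathinfo(name)
        print(f"{name}: {info if info is not None else '(not available)'}")
```

Run as an ordinary user against a file on a Gluster mount, I would expect this to report "(not available)", which is exactly the problem for end-user workloads.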