On Sat, 2012-06-16 at 15:38 -0400, Vlad wrote:
> Greetings,
>
> I am not sure I understand fully the relationship b/w glusterfs
> project per se and https://github.com/gluster/hadoop-glusterfs, but
> I'd like to follow up on the "Hadoop Connector" mention here
> (http://www.gluster.org/community/documentation/index.php/Hadoop) and
> getFileBlockLocations API mention here
> (http://community.gluster.org/q/how-gluster-supports-map-reduce-in-absence-of-metadata-of-input-data-chunks-spread-over-multiple-machines-see-desciption/):
>
> - does 3.3 install any kind of a C lib that would allow for "where
> data actually landed" queries?
>
> This would be very useful to users who want to structure their HPC
> jobs in the map/reduce style, but without using Hadoop specifically.
> (Really, the main innovation in MapReduce is colocation of calculation
> and data and I'd rather use a fs that's mountable in the classic sense
> as opposed to HDFS).

The getFileBlockLocations API is implemented using a "magic"
extended-attribute request. The extended attribute is
trusted.glusterfs.pathinfo; if you try to fetch it on a file, we
dynamically construct a reply describing where the data went. I suppose
we could provide a C/C++ library to parse that reply into some sort of
structure that's easier to use from within a program, but AFAIK that
has not been done.
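
In the meantime, the attribute can be read and picked apart from a script. What follows is only a rough Python sketch, not anything shipped with GlusterFS. The helper names (parse_pathinfo, file_locations), the sample reply, and the regular expression are all assumptions for illustration; in particular, the reply layout shown in SAMPLE_REPLY is an assumed example of what a distributed-replicated volume might return, so check it against the output of your own volume first. Reading trusted.* attributes normally requires root, and os.getxattr is available on Linux only.

```python
import os
import re

PATHINFO_XATTR = "trusted.glusterfs.pathinfo"

# Assumed example of a pathinfo reply; real replies depend on the volume layout.
SAMPLE_REPLY = (
    "(<DISTRIBUTE:vol-dht> (<REPLICATE:vol-replicate-0> "
    "<POSIX(/export/brick1):server1:/export/brick1/data.bin> "
    "<POSIX(/export/brick2):server2:/export/brick2/data.bin>))"
)

# Matches entries shaped like <POSIX(brick-root):host:path-on-brick>.
_BRICK_RE = re.compile(
    r"<POSIX\((?P<brick>[^)]*)\):(?P<host>[^:>]+):(?P<path>[^>]+)>"
)


def parse_pathinfo(raw):
    """Return a list of (host, path_on_brick) tuples found in a pathinfo reply."""
    return [(m.group("host"), m.group("path")) for m in _BRICK_RE.finditer(raw)]


def file_locations(path):
    """Fetch the pathinfo xattr for a file on a GlusterFS mount and parse it."""
    raw = os.getxattr(path, PATHINFO_XATTR)  # returns bytes
    return parse_pathinfo(raw.decode("utf-8", errors="replace"))
```

Calling file_locations() on a file under the mount point (the path is whatever your mount uses) would then give a list of host/path pairs that a job scheduler could use to place work near the data. The same lookup is possible from C with getxattr(2); Python is used here only to keep the sketch short.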