C/C++ lib equivalent of hadoop-glusterfs

vlad at demoninsight.com (Vlad) · Mon, 18 Jun 2012 23:40:57 -0400

On Jun 18, 2012, at 10:50 AM, Jeff Darcy wrote:

> On Sat, 2012-06-16 at 15:38 -0400, Vlad wrote:
>> Greetings,
>> 
>> I am not sure I fully understand the relationship between the glusterfs
>> project per se and https://github.com/gluster/hadoop-glusterfs, but
>> I'd like to follow up on the "Hadoop Connector" mention here
>> (http://www.gluster.org/community/documentation/index.php/Hadoop) and
>> getFileBlockLocations API mention here
>> (http://community.gluster.org/q/how-gluster-supports-map-reduce-in-absence-of-metadata-of-input-data-chunks-spread-over-multiple-machines-see-desciption/):
>> 
>> - does 3.3 install any kind of a C lib that would allow for "where
>> data actually landed" queries?
>> 
>> This would be very useful to users who want to structure their HPC
>> jobs in the map/reduce style, but without using Hadoop specifically.
>> (Really, the main innovation in MapReduce is colocation of calculation
>> and data and I'd rather use a fs that's mountable in the classic sense
>> as opposed to HDFS).
> 
> The getFileBlockLocations API is implemented using a "magic"
> extended-attribute request.  The extended attribute is
> trusted.glusterfs.pathinfo; if you try to fetch that, we dynamically
> construct a reply describing where the data went.  I suppose we could
> provide a C/C++ library to parse that into some sort of structure that's
> easier to use from within a program, but AFAIK that has not been done.

Yes, I've figured out from the Java implementation that the API reads the trusted.glusterfs.pathinfo xattr. This could be done in C/C++ easily enough without a dedicated lib, but the fact that the attribute is in the trusted.* namespace implies that the reading process must be privileged (on Linux, trusted.* attributes are only accessible to processes with CAP_SYS_ADMIN), if I am not mistaken. It seems to me that this will be a problem for many end-user workloads...
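
For reference, here is a minimal sketch of reading the attribute directly in C, assuming a Linux client with the volume mounted via FUSE and the standard getxattr(2) call. The helper name read_xattr is hypothetical, not part of any GlusterFS library, and I am assuming the mount answers the usual size query (size 0) for this virtual attribute. The format of the returned string is not parsed here.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Hypothetical helper: fetch an extended attribute as a malloc'd,
 * NUL-terminated string. Returns NULL on failure with errno set
 * (e.g. ENOENT for a missing file, ENODATA for a missing attribute
 * or for a trusted.* attribute read without CAP_SYS_ADMIN).
 * The caller must free() the result.
 */
char *read_xattr(const char *path, const char *name)
{
    for (;;) {
        /* A size of 0 asks the kernel for the current value length. */
        ssize_t len = getxattr(path, name, NULL, 0);
        if (len < 0)
            return NULL;

        char *buf = malloc((size_t)len + 1);
        if (buf == NULL)
            return NULL;

        ssize_t got = getxattr(path, name, buf, (size_t)len);
        if (got >= 0) {
            buf[got] = '\0';
            return buf;
        }

        int saved = errno;
        free(buf);
        if (saved != ERANGE) {
            errno = saved;
            return NULL;
        }
        /* ERANGE: the value grew between the two calls, so retry. */
    }
}
```

Calling read_xattr("/mnt/gluster/some/file", "trusted.glusterfs.pathinfo") as root should return the dynamically constructed location string (the mount point here is just an example path). Run as an ordinary user, I would expect NULL, which is exactly the privilege problem described above.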