> I second that question.
>
> Extended attributes are pretty much critical for Disco. It uses them to
> decide where to execute tasks, to optimize data locality:
>
> http://github.com/tuulos/disco/blob/c1d4ffadeba40af8a8547dd6afce562d267e464e/pydisco/disco/dfs/gluster.py#L36
>
> If the extended attributes are really removed (I haven't upgraded yet to
> 2.0.6), what's the official way of finding out where files are physically
> stored?

We removed the listing of Replicate's internal extended attribute records because we found that commands like 'rsync -X' would overwrite those attributes and leave the filesystem in an inconsistent state.

Ville, thanks for pointing that out. We were not aware that these extended attributes had found a new purpose for themselves this way :-) They were not intended to be used this way at all. For the purpose you describe, however, we have introduced the virtual extended attribute "trusted.glusterfs.location", which returns the hostname of the storage/posix volume on which the file resides. This feature is currently available only in mainline:

http://git.gluster.com/?p=glusterfs.git;a=commit;h=5be3c142978257032bd11ad420382859fc204702

In fact, the above patch was brought in with the intention of making GlusterFS fit nicely into map/reduce frameworks in the future. Now that you mention that this "feature" was already being used and got broken in 2.0.6 (which we were not aware of), we'll get the "official way" of getting the hostname backported to 2.0.7. Note that the new method returns the server's hostname, not a volume name. So the gluster.py in disco.git might have to be modified to look for this "official" xattr first and then fall back to the old style.

We also want feedback from you guys about if/how you want the location reported for a file that lives on multiple servers (for example, Replicate could return multiple locations, and Stripe has the content distributed across servers, possibly replicated as well).
How and to what extent do the map/reduce frameworks make use of such information? Does record-level location make sense at all?

Thanks,
Avati
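
A minimal sketch of the lookup order suggested above (the "official" xattr first, the old style as a fallback). It assumes Python 3 on Linux, where os.getxattr is available, and that the xattr value is a plain hostname returned as bytes. The file_location function and its legacy_lookup argument are hypothetical names used only for illustration; legacy_lookup stands in for whatever old-style code path gluster.py already has and is not something GlusterFS provides.

```python
import os

# Attribute name as given in this mail; per the mail it exists only in
# mainline for now, with a backport planned for 2.0.7.
LOCATION_XATTR = "trusted.glusterfs.location"


def file_location(path, legacy_lookup=None, getxattr=None):
    """Return the hostname storing `path`, or None if it cannot be found.

    `legacy_lookup` is a hypothetical callable (path -> hostname or None)
    standing in for the existing old-style lookup.
    `getxattr` defaults to os.getxattr and can be replaced in tests.
    """
    if getxattr is None:
        getxattr = os.getxattr  # Linux only
    try:
        raw = getxattr(path, LOCATION_XATTR)
    except OSError:
        # Attribute missing or unsupported, e.g. an older release
        # or a path that is not on a GlusterFS mount.
        raw = None
    if raw:
        # Assumption: the value is a hostname, possibly NUL-terminated.
        return raw.decode("utf-8", "replace").rstrip("\x00").strip()
    if legacy_lookup is not None:
        return legacy_lookup(path)
    return None
```

A caller would pass its existing lookup as the fallback, for example file_location(path, legacy_lookup=old_style_lookup), where old_style_lookup is again a placeholder name.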