Re: xattrs and bug 9

On Fri, 14 Aug 2009, Anand Avati wrote:

I second that question.

Extended attributes are pretty much critical for Disco. It uses them to
decide where to execute tasks, to optimize data locality:

http://github.com/tuulos/disco/blob/c1d4ffadeba40af8a8547dd6afce562d267e464e/pydisco/disco/dfs/gluster.py#L36

If the extended attributes are really removed (I haven't upgraded yet to
2.0.6), what's the official way of finding out where files are physically
stored?

The reason we removed listing of Replicate's internal extended
attribute records was because we found commands like 'rsync -X' would
mess up and overwrite the extended attributes taking the filesystem to
an inconsistent state.

Ville, thanks for pointing that out. We were not aware that these extended
attributes had found a new purpose for themselves this way :-) They
were not intended to be used this way at all. But for the same purpose
what you are talking about, we have introduced the virtual extended
attribute "trusted.glusterfs.location" which returns the hostname of
the storage/posix volume on which the file resides. But, this feature
is available only in mainline.

http://git.gluster.com/?p=glusterfs.git;a=commit;h=5be3c142978257032bd11ad420382859fc204702

Great! I'll update our systems to the latest git snapshot.

In fact the above patch was brought in with the intention of making
GlusterFS fit into map/reduce frameworks nicely in the future. Now
that you mention that this "feature" was already being used and got
broken in 2.0.6 (which we were not aware of), we'll get the "official
way" of getting the hostname backported in 2.0.7. Note that the new
method will return the server's hostname and not any volume name. So
the gluster.py in disco.git might have to be modified to first look
for this "official" xattr and then fall back to the old style.
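
The lookup order described above could be sketched roughly as follows. This is a minimal sketch, not the actual gluster.py code: only the attribute name "trusted.glusterfs.location" comes from this thread. The function name locate, the legacy_lookup callback (a stand-in for whatever old-style lookup the caller already has), and the assumption that the value is a plain UTF-8 hostname are all hypothetical. It relies on Python 3's os.getxattr, which is Linux-only, and reading trusted.* attributes normally requires root.

```python
import errno
import os

# Attribute name taken from the thread; everything else here is an assumption.
LOCATION_XATTR = "trusted.glusterfs.location"


def locate(path, legacy_lookup=None, getxattr=None):
    """Return the hostname storing `path`, or None if it cannot be determined.

    legacy_lookup: hypothetical callable(path) -> hostname implementing the
                   old-style lookup; used only when the new xattr is missing.
    getxattr:      injectable for testing; defaults to os.getxattr (Linux only).
    """
    if getxattr is None:
        getxattr = os.getxattr
    try:
        raw = getxattr(path, LOCATION_XATTR)
    except OSError as err:
        # ENODATA: attribute absent (e.g. a release without the new xattr).
        # ENOTSUP: the filesystem does not support extended attributes.
        # EPERM:   trusted.* attributes are not readable by this user.
        if err.errno not in (errno.ENODATA, errno.ENOTSUP, errno.EPERM):
            raise
        return legacy_lookup(path) if legacy_lookup else None
    # Assumption: the value is a hostname, possibly NUL-terminated.
    return raw.decode("utf-8").rstrip("\x00").strip()
```

Passing the old lookup in as a callback keeps the fallback explicit and lets the new attribute take priority wherever it is available.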

Hostname is even better for us than the volume name. Currently the user has to provide a separate mapping for Disco that maps volume names to hostnames.

We also want feedback from you guys on whether and how you want the
location of a file on multiple servers reported (for example, Replicate
could return multiple locations, and stripe has the content distributed
across servers, possibly replicated as well). How and to what extent do
the map/reduce frameworks make use of such information? Does
record-level location make sense at all?

Yes, we need locations of all replicas for each file. The current mechanism lists all replicas for each input, so Disco can resort to replicas if the master copy fails.

It would be great if trusted.glusterfs.location could return a list of hostnames. The list should be ordered according to Gluster's own preference for accessing the file, i.e. the second item should be the host that Gluster uses if the master copy fails, and so on. This ensures that Disco can preserve data locality even if individual volumes fail.
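
To illustrate how a scheduler could use such an ordered list, here is a minimal sketch. It assumes the list has already been obtained as a Python list of hostnames in Gluster's preference order. No list-returning attribute exists yet, so no wire format is assumed, and the names pick_host and is_alive are hypothetical.

```python
def pick_host(replicas, is_alive):
    """Return the first host, in preference order, that is currently usable.

    replicas: hostnames ordered by the storage layer's access preference
              (master copy first, then the fallbacks).
    is_alive: hypothetical callable(hostname) -> bool supplied by the
              scheduler, e.g. backed by its own node health checks.
    Returns None when no replica is reachable.
    """
    for host in replicas:
        if is_alive(host):
            return host
    return None
```

For example, pick_host(["node1", "node2"], is_alive) would schedule the task on node2 only when node1 is reported down, so the task still runs next to a copy of its input.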

Striped files are not supported by Disco directly, so it doesn't do anything clever with them (yet). In general being able to query as much information as possible about files is beneficial.

It has been a deliberate choice to keep the storage layer separate from Disco. An upside of this design decision is that you're free to choose the best storage layer for your problem domain. For instance, I'm positive that Gluster is a good match for many ad hoc data analysis tasks and rapid development in general. A downside is that coordination between the storage layer and the computation layer isn't always optimal.

I became interested in Gluster because a custom translator seemed like a reasonable way to bridge this gap. I was happy to notice that 95% of the benefits could be achieved with default translators, without the burden of maintaining a custom one.

I'm sure it'd benefit everybody if Gluster could continue supporting systems on top of it with minimal hassle by exposing ways to interact(*) with the system other than custom translators. In this respect, extended attributes and things like libglusterfs are really welcome features.

(*) In addition to querying the status of GlusterFS (e.g. using extended attributes), it would be useful to _give_ information to Gluster as well. For instance, I currently have to run two GlusterFS instances in parallel (inputfs and resultsfs in http://discoproject.org/doc/start/dfs.html), since only some directories need to be replicated (input data) whereas others are used over NUFA without replication (intermediate results). Disco could tag the latter temporary files with a special extended attribute, or make a call to libglusterfs, so that Gluster would know that replication is not needed.



Ville



