On Thu, Sep 5, 2013 at 2:53 PM, Stephen Watt <swatt at redhat.com> wrote:

> Hi Folks
>
> We are pleased to announce a major update to the glusterfs-hadoop project
> with the release of version 2.1. The glusterfs-hadoop project provides an
> Apache licensed Hadoop FileSystem plugin which enables Apache Hadoop 1.x
> and 2.x to run directly on top of GlusterFS. This release includes a
> re-architected plugin which now extends existing functionality within
> Hadoop to run on local and POSIX File Systems.
>
> -- Overview --
>
> Apache Hadoop has a pluggable FileSystem Architecture. This means that if
> you have a filesystem or object store that you would like to use with
> Hadoop, you can create a Hadoop FileSystem plugin for it which will act as
> a mediator between the generic Hadoop FileSystem interface and your
> filesystem of choice. A popular example would be that over a million Hadoop
> clusters are spun up on Amazon every year, a lot of which use Amazon S3 as
> the Hadoop FileSystem.
>
> In order to configure the plugin, a specific deployment configuration is
> required. Firstly, it is required that the Hadoop JobTracker and
> TaskTrackers (or the Hadoop 2.x equivalents) are installed on servers
> within the gluster trusted storage pool for a given gluster volume. The
> JobTracker uses the plugin to query the extended attributes for job input
> files in gluster to ascertain file placement as well as the distribution of
> file replicas across the cluster. The TaskTrackers use the plugin to
> leverage a local fuse mount of the gluster volume in order to access the
> data required for the tasks. When the JobTracker receives a Hadoop job, it
> uses the locality information it ascertains via the plugin to send the
> tasks for the Hadoop Job to Hadoop TaskTrackers on servers that have the
> data required for the task within their local bricks. This ensures data is
> read from disk and not over the network.
> Please see the attached diagram which provides an overview of the entire
> solution for a Hadoop 1.x deployment.
>
> The community project, along with the documentation and available
> releases, is hosted within the Gluster Forge at
> http://forge.gluster.org/hadoop. The glusterfs-hadoop project will also
> be available within the Fedora 20 release later this year, alongside fellow
> Fedora newcomer Apache Hadoop and the already available gluster project.
> The glusterfs-hadoop project team welcomes contributions and participation
> from the broader community.
>
> Stay tuned for upcoming posts around GlusterFS integration into the Apache
> Ambari and Fedora projects.
>
> Regards
> The glusterfs-hadoop project team

Congratulations! This is great news!!

Avati
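
For readers new to the design, the locality-aware placement described in the announcement (tasks sent to TaskTrackers whose local bricks hold the data) can be sketched roughly as below. This is only an illustration in Python, not the plugin's code: the function name assign_tasks, the host names, and the least-loaded tie-break are assumptions made for the example, and the replica map is written by hand here, whereas the announcement says the JobTracker obtains placement by querying gluster extended attributes.

```python
def assign_tasks(replica_hosts, tracker_hosts):
    """Pick a TaskTracker host for each input file, preferring local replicas.

    replica_hosts: dict of file path -> list of hosts whose bricks hold a replica
                   (hypothetical stand-in for the placement information)
    tracker_hosts: iterable of hosts that run a TaskTracker
    Returns a dict of file path -> (chosen host, True if the read is local).
    """
    trackers = list(tracker_hosts)
    if not trackers:
        raise ValueError("at least one TaskTracker host is required")

    load = {host: 0 for host in trackers}  # tasks assigned so far per host
    assignment = {}
    for path in sorted(replica_hosts):
        # Hosts that both hold a replica and run a TaskTracker.
        local = [h for h in replica_hosts[path] if h in load]
        # Fall back to any tracker (a network read) if no replica is local.
        candidates = local or trackers
        # Assumed tie-break: least-loaded host, then host name.
        host = min(candidates, key=lambda h: (load[h], h))
        load[host] += 1
        assignment[path] = (host, bool(local))
    return assignment


replicas = {
    "/in/part-0": ["node1", "node2"],
    "/in/part-1": ["node2", "node3"],
    "/in/part-2": ["node9"],  # replica on a host with no TaskTracker
}
plan = assign_tasks(replicas, ["node1", "node2", "node3"])
for path, (host, is_local) in plan.items():
    print(path, host, "local" if is_local else "remote")
```

In this toy run the first two files are scheduled on hosts that hold a replica, and only the third falls back to a remote read, which is the case the deployment requirement (Hadoop daemons installed inside the trusted storage pool) is meant to avoid.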