Enabling Apache Hadoop on GlusterFS: glusterfs-hadoop 2.1 released

harsha at harshavardhana.net (Harshavardhana) · Fri, 6 Sep 2013 01:13:39 -0700

+1

On Thu, Sep 5, 2013 at 4:18 PM, Anand Avati <avati at gluster.org> wrote:

>
> On Thu, Sep 5, 2013 at 2:53 PM, Stephen Watt <swatt at redhat.com> wrote:
>
>> Hi Folks
>>
>> We are pleased to announce a major update to the glusterfs-hadoop project
>> with the release of version 2.1. The glusterfs-hadoop project provides an
>> Apache licensed Hadoop FileSystem plugin which enables Apache Hadoop 1.x
>> and 2.x to run directly on top of GlusterFS. This release includes a
>> re-architected plugin which now extends existing functionality within
>> Hadoop to run on local and POSIX File Systems.
>>
>> -- Overview --
>>
>> Apache Hadoop has a pluggable FileSystem Architecture. This means that if
>> you have a filesystem or object store that you would like to use with
>> Hadoop, you can create a Hadoop FileSystem plugin for it which will act as
>> a mediator between the generic Hadoop FileSystem interface and your
>> filesystem of choice. A popular example is Amazon S3: over a million
>> Hadoop clusters are spun up on Amazon every year, many of which use S3 as
>> the Hadoop FileSystem.
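
To make the "mediator" idea concrete, here is a minimal Python sketch of scheme-based plugin dispatch. It is not the Hadoop API or the glusterfs-hadoop code: the names `FileSystem`, `InMemoryFileSystem`, `REGISTRY`, `register` and `open_uri` are hypothetical, and the in-memory backend only stands in for a real store.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Generic interface the framework codes against (hypothetical)."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class InMemoryFileSystem(FileSystem):
    """Stand-in backend; a real plugin would talk to the actual store."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]


# Registry mapping a URI scheme to the plugin instance that serves it.
REGISTRY = {}


def register(scheme, fs):
    REGISTRY[scheme] = fs


def open_uri(uri):
    """Pick the plugin by URI scheme and delegate the read to it."""
    parsed = urlparse(uri)
    if parsed.scheme not in REGISTRY:
        raise ValueError(f"no filesystem registered for {parsed.scheme!r}")
    return REGISTRY[parsed.scheme].read(parsed.path)


# Assumption for illustration: a "glusterfs" scheme backed by the stand-in.
register("glusterfs", InMemoryFileSystem({"/data/input.txt": b"hello"}))
print(open_uri("glusterfs:///data/input.txt"))
```

The caller only ever sees `open_uri`; swapping the storage backend means registering a different implementation under the scheme, which is the property the pluggable architecture relies on.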
>>
>> The plugin requires a specific deployment configuration. The Hadoop
>> JobTracker and TaskTrackers (or the Hadoop 2.x equivalents) must be
>> installed on servers within the gluster trusted storage pool for a given
>> gluster volume. The
>> JobTracker uses the plugin to query the extended attributes for job input
>> files in gluster to ascertain file placement as well as the distribution of
>> file replicas across the cluster. The TaskTrackers use the plugin to
>> leverage a local fuse mount of the gluster volume in order to access the
>> data required for the tasks. When the JobTracker receives a Hadoop job, it
>> uses the locality information it ascertains via the plugin to send the
>> tasks for the Hadoop Job to Hadoop TaskTrackers on servers that have the
>> data required for the task within their local bricks. This ensures data is
>> read from disk and not over the network. Please see the attached diagram
>> which provides an overview of the entire solution for a Hadoop 1.x
>> deployment.
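
The locality-based task placement described above can be sketched roughly as follows. This is an illustrative Python model, not the plugin's or the JobTracker's actual scheduling code: `assign_tasks` is a hypothetical helper, the replica map is assumed to have already been derived from the extended attributes, and the least-loaded tie-break is my own simplification.

```python
def assign_tasks(replica_hosts, trackers):
    """Map each input file to a tracker, preferring hosts with a local replica.

    replica_hosts: dict of file path -> list of hosts whose bricks hold it
                   (assumed to come from the volume's extended attributes).
    trackers:      list of hosts running a TaskTracker.
    """
    if not trackers:
        raise ValueError("no trackers available")
    load = {t: 0 for t in trackers}
    assignment = {}
    for path in sorted(replica_hosts):
        # Hosts that both hold a replica and run a TaskTracker.
        local = [h for h in replica_hosts[path] if h in load]
        # Fall back to any tracker (a remote read) if no such host exists.
        candidates = local or trackers
        # Least-loaded candidate; ties broken by host name for determinism.
        chosen = min(candidates, key=lambda h: (load[h], h))
        assignment[path] = chosen
        load[chosen] += 1
    return assignment


replicas = {
    "/in/a": ["node1", "node2"],
    "/in/b": ["node1", "node2"],
    "/in/c": ["node3"],
    "/in/d": ["node9"],  # no TaskTracker on node9, so this read is remote
}
print(assign_tasks(replicas, ["node1", "node2", "node3"]))
```

Files `/in/a` through `/in/c` land on hosts that hold them locally; only `/in/d` falls back to a remote read, which is why the trackers need to run inside the trusted storage pool.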
>>
>> The community project, along with the documentation and available
>> releases, is hosted within the Gluster Forge at
>> http://forge.gluster.org/hadoop. The glusterfs-hadoop project will also
>> be available within the Fedora 20 release later this year, alongside fellow
>> Fedora newcomer Apache Hadoop and the already available gluster project.
>> The glusterfs-hadoop project team welcomes contributions and participation
>> from the broader community.
>>
>> Stay tuned for upcoming posts around GlusterFS integration into the
>> Apache Ambari and Fedora projects.
>>
>> Regards
>> The glusterfs-hadoop project team
>> _______________________________________________
>> Announce mailing list
>> Announce at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/announce
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
> Congratulations! This is great news!!
>
> Avati
>

-- 
*Religious confuse piety with mere ritual, the virtuous confuse regulation
with outcomes*