Re: hadoop namenode not starting due to bindException while deploying hadoop with cephFS

Yes. I have set up ceph and hadoop on each node. ceph health is OK, and hadoop works fine when I use HDFS (I ran the same command with HDFS and it worked). One node is the admin (running the jobtracker) and the other 4 are slaves (running tasktrackers). The problem occurs only when I change the hadoop/conf/core-site.xml file to incorporate cephFS. The error does not mention ceph at all, so I am confused about why it is happening.
I have another question: when running hadoop with cephFS, can the hadoop input data be in any directory, or does it have to be in the directory where cephFS has been mounted?

Regards,

Ridwan Rashid Noel
Doctoral Student,
Dept. of Computer Science,
University of Texas at San Antonio
Contact# 210-773-9966

On Mar 26, 2015 10:12 AM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
On Wed, Mar 25, 2015 at 8:10 PM, Ridwan Rashid Noel <ridwan064@xxxxxxxxx> wrote:
> Hi Greg,
>
> Thank you for your response. I have understood that I should be starting
> only the mapred daemons when using cephFS instead of HDFS. I have fixed that
> and trying to run hadoop wordcount job using this instruction:
>
> bin/hadoop jar hadoop*examples*.jar wordcount /tmp/wc-input /tmp/wc-output
>
> but I am getting this error
>
> 15/03/26 02:54:35 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 15/03/26 02:54:35 INFO input.FileInputFormat: Total input paths to process :
> 1
> 15/03/26 02:54:35 WARN snappy.LoadSnappy: Snappy native library not loaded
> 15/03/26 02:54:35 INFO mapred.JobClient: Running job: job_201503260253_0001
> 15/03/26 02:54:36 INFO mapred.JobClient:  map 0% reduce 0%
> 15/03/26 02:54:36 INFO mapred.JobClient: Task Id :
> attempt_201503260253_0001_m_000021_0, Status : FAILED
> Error initializing attempt_201503260253_0001_m_000021_0:
> java.io.FileNotFoundException: File
> file:/tmp/hadoop-ceph/mapred/system/job_201503260253_0001/jobToken does not
> exist.
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
>         at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>         at
> org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4445)
>         at
> org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
>         at
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
>         at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
>         at java.lang.Thread.run(Thread.java:745)

I'm not an expert at setting up Hadoop, but these errors are coming
out of the "RawLocalFileSystem", which I think means that the worker node
is trying to use a local FS instead of Ceph. Did you set up each node
to access Ceph? Have you set up and used Hadoop previously?
-Greg
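
One way to check the point above on each node is to look at which
filesystem scheme that node's core-site.xml actually selects. The sketch
below is only an illustration, not part of Hadoop or Ceph: the helper
names (read_properties, default_fs_scheme) and the sample XML are
hypothetical, and it assumes the Hadoop 1.x behaviour that
fs.default.name falls back to file:/// when the property is absent. A
node that reports "file" rather than "ceph" would be consistent with the
RawLocalFileSystem frames in the stack trace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumption: Hadoop 1.x uses the local filesystem when fs.default.name is unset.
HADOOP1_DEFAULT_FS = "file:///"

# Sample configuration mirroring the fs.default.name value quoted in this thread.
CEPH_SITE = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>ceph://10.242.144.225:6789</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict of name -> value for every <property> in a Hadoop XML config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def default_fs_scheme(xml_text):
    """Return the URI scheme of fs.default.name, or 'file' if it is not set."""
    props = read_properties(xml_text)
    uri = props.get("fs.default.name", HADOOP1_DEFAULT_FS)
    return urlparse(uri).scheme or "file"


if __name__ == "__main__":
    # In practice, read the text of conf/core-site.xml on each node instead.
    print(default_fs_scheme(CEPH_SITE))
```

Running the same check against the real conf/core-site.xml on the admin
node and on every slave shows whether all of them agree; a config that
was changed only on the admin node is one possible cause, though the
thread does not confirm it.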

>
> .....
>
> I have used the core-site.xml configurations as mentioned in
> http://ceph.com/docs/master/cephfs/hadoop/
> Please tell me how can this problem be solved?
>
> Regards,
>
> Ridwan Rashid Noel
>
> Doctoral Student,
> Department of Computer Science,
> University of Texas at San Antonio
>
> Contact# 210-773-9966
>
> On Fri, Mar 20, 2015 at 4:04 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Fri, Mar 20, 2015 at 1:05 PM, Ridwan Rashid <ridwan064@xxxxxxxxx>
>> wrote:
>> > Gregory Farnum <greg@...> writes:
>> >
>> >>
>> >> On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid <ridwan064@...> wrote:
>> >> > Hi,
>> >> >
>> >> > I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop
>> >> > with
>> >> > cephFS. I have installed hadoop-1.1.1 in the nodes and changed the
>> >> > conf/core-site.xml file according to the ceph documentation
>> >> > http://ceph.com/docs/master/cephfs/hadoop/ but after changing the
>> >> > file the
>> >> > namenode is not starting (namenode can be formatted) but the other
>> >> > services(datanode, jobtracker, tasktracker) are running in hadoop.
>> >> >
>> >> > The default hadoop works fine but when I change the core-site.xml
>> >> > file as
>> >> > above I get the following bindException as can be seen from the
>> >> > namenode
>> > log:
>> >> >
>> >> >
>> >> > 2015-03-19 01:37:31,436 ERROR
>> >> > org.apache.hadoop.hdfs.server.namenode.NameNode:
>> >> > java.net.BindException:
>> >> > Problem binding to node1/10.242.144.225:6789 : Cannot assign
>> >> > requested
>> > address
>> >> >
>> >> >
>> >> > I have one monitor for the ceph cluster (node1/10.242.144.225) and I
>> >> > included in the core-site.xml file ceph://10.242.144.225:6789 as the
>> >> > value
>> >> > of fs.default.name. The 6789 port is the default port being used by
>> >> > the
>> >> > monitor node of ceph, so that may be the reason for the bindException
>> >> > but
>> >> > the ceph documentation mentions that it should be included like this
>> >> > in the
>> >> > core-site.xml file. It would be really helpful to get some pointers
>> >> > to where
>> >> > I am doing wrong in the setup.
>> >>
>> >> I'm a bit confused. The NameNode is only used by HDFS, and so
>> >> shouldn't be running at all if you're using CephFS. Nor do I have any
>> >> idea why you've changed anything in a way that tells the NameNode to
>> >> bind to the monitor's IP address; none of the instructions that I see
>> >> can do that, and they certainly shouldn't be.
>> >> -Greg
>> >>
>> >
>> > Hi Greg,
>> >
>> > I want to run a hadoop job (e.g. terasort) and want to use cephFS
>> > instead of
>> > HDFS. In "Using Hadoop with cephFS" documentation in
>> > http://ceph.com/docs/master/cephfs/hadoop/ if you look into the Hadoop
>> > configuration section, the first property fs.default.name has to be set
>> > as
>> > the ceph URI and in the notes it's mentioned as ceph://[monaddr:port]/.
>> > My
>> > core-site.xml of hadoop conf looks like this
>> >
>> > <configuration>
>> >
>> > <property>
>> >     <name>fs.default.name</name>
>> >     <value>ceph://10.242.144.225:6789</value>
>> > </property>
>>
>> Yeah, that all makes sense. But I don't understand why or how you're
>> starting up a NameNode at all, nor what config values it's drawing
>> from to try and bind to that port. The NameNode is the problem because
>> it shouldn't even be invoked.
>> -Greg
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
