Re: Ceph & Hbase

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 7 Jan 2016 12:52:24 -0800

On Thu, Jan 7, 2016 at 5:56 AM, Jose M <soloninguno@xxxxxxxxxxx> wrote:
> Hi,
>
> Following Yan's suggestion that something might be wrong with the Ceph configuration, I started again from scratch, this time configuring Ceph with three nodes (one MON, two OSDs).
>
> After starting HBase, it gets a few steps further but fails again, this time on a path whose name starts with a dot (a hidden file).
>
> 2016-01-06 14:36:08,509 INFO  [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:16010
> 2016-01-06 14:36:08,516 INFO  [main] master.HMaster: hbase.rootdir=ceph://ceph-mon:6789/hbase, hbase.cluster.distributed=true
> 2016-01-06 14:36:08,537 INFO  [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/192.168.1.196,16000,1452090965392
> 2016-01-06 14:36:08,750 INFO  [192.168.1.196:16000.activeMasterManager] master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/192.168.1.196,16000,1452090965392 from backup master directory
> 2016-01-06 14:36:08,771 INFO  [192.168.1.196:16000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=192.168.1.196,16000,1452090965392
> 2016-01-06 14:36:08,845 INFO  [master//192.168.1.196:16000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4d0894c1 connecting to ZooKeeper ensemble=localhost:2181
> 2016-01-06 14:36:08,845 INFO  [master//192.168.1.196:16000] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x4d0894c10x0, quorum=localhost:2181, baseZNode=/hbase
> 2016-01-06 14:36:08,866 INFO  [master//192.168.1.196:16000-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2016-01-06 14:36:08,868 INFO  [master//192.168.1.196:16000-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
> 2016-01-06 14:36:08,873 INFO  [master//192.168.1.196:16000-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15213a42e100007, negotiated timeout = 90000
> 2016-01-06 14:36:08,875 INFO  [master//192.168.1.196:16000] client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
> 2016-01-06 14:36:09,022 FATAL [192.168.1.196:16000.activeMasterManager] master.HMaster: Failed to become active master
> java.io.IOException: Error accessing ceph://ceph-mon:6789/hbase/data/hbase/meta/.tabledesc
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523)
>         at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1721)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:369)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:350)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:331)
>         at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.needsMigration(FSTableDescriptorMigrationToSubdir.java:58)
>         at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.migrateFSTableDescriptorsIfNecessary(FSTableDescriptorMigrationToSubdir.java:45)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:481)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:126)
>         at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:649)
>         at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182)
>         at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-01-06 14:36:09,025 FATAL [192.168.1.196:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Error accessing ceph://ceph-mon:6789/hbase/data/hbase/meta/.tabledesc
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1523)
>         at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1721)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:369)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:350)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:331)
>         at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.needsMigration(FSTableDescriptorMigrationToSubdir.java:58)
>         at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.migrateFSTableDescriptorsIfNecessary(FSTableDescriptorMigrationToSubdir.java:45)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:481)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:126)
>         at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:649)
>         at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182)
>         at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646)
>         at java.lang.Thread.run(Thread.java:745)
>
> I found an old message on the ceph-users mailing list describing the same problem, but with no real answer:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/039001.html
>
> Then I realized that .tabledesc was a directory, so I decided to create it manually with
>      hadoop fs -mkdir /hbase/data/hbase/meta/.tabledesc
>
> After starting the HBase master again, I got another error, a NullPointerException in Globber.java.
>
> 2016-01-06 19:38:03,067 INFO  [192.168.1.196:16000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=192.168.1.196,16000,1452109080969
> 2016-01-06 19:38:03,133 INFO  [master//192.168.1.196:16000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4ad81458 connecting to ZooKeeper ensemble=localhost:2181
> 2016-01-06 19:38:03,133 INFO  [master//192.168.1.196:16000] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x4ad814580x0, quorum=localhost:2181, baseZNode=/hbase
> 2016-01-06 19:38:03,135 INFO  [master//192.168.1.196:16000-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2016-01-06 19:38:03,136 INFO  [master//192.168.1.196:16000-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
> 2016-01-06 19:38:03,140 INFO  [master//192.168.1.196:16000-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15213a42e10000b, negotiated timeout = 90000
> 2016-01-06 19:38:03,143 INFO  [master//192.168.1.196:16000] client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
> 2016-01-06 19:38:03,272 INFO  [192.168.1.196:16000.activeMasterManager] util.FSTableDescriptorMigrationToSubdir: Migrating user tables
> 2016-01-06 19:38:03,290 INFO  [192.168.1.196:16000.activeMasterManager] util.FSTableDescriptorMigrationToSubdir: Migrating system tables
> 2016-01-06 19:38:03,292 INFO  [192.168.1.196:16000.activeMasterManager] util.FSTableDescriptorMigrationToSubdir: Migration complete.
> 2016-01-06 19:38:03,305 INFO  [192.168.1.196:16000.activeMasterManager] ceph.CephFileSystem: selectDataPool path=ceph://ceph-mon:6789/hbase/data/hbase/meta/.tmp/.tableinfo.0000000001 pool:repl=cephfs_data:2 wanted=3

I also know nothing about HBase (and plain Hadoop definitely works,
at least with the right configuration, because some students did a
lot of work with it), but this line makes me wonder whether it's
unhappy because you've only got two OSD nodes and it wants 3x
replication: the data pool cephfs_data has replication 2, while 3
copies were requested ("wanted=3").

The other thing you can do is turn on regular Ceph client logging and
see whether it reports errors, or whether they're all happening in
Java. If they're in Java, that won't necessarily tell us whether the
problem is in HBase or in the CephFileSystem interface code, but it
will get you closer to the right place.
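One way to switch that client logging on from the Hadoop side is sketched below. This is a minimal sketch that assumes the CephFS Hadoop bindings honor the ceph.conf.options property, a comma-separated list of Ceph key=value options described in the Ceph documentation for Hadoop. The debug levels and the log path /tmp/ceph-hbase-client.log are illustrative assumptions. The same debug client, debug ms and log file settings could instead go in the [client] section of ceph.conf.

```xml
<!-- Assumption: CephFileSystem passes ceph.conf.options through to the
     libcephfs client. Debug levels and log path are illustrative only. -->
<property>
  <name>ceph.conf.options</name>
  <value>debug_client=20,debug_ms=1,log_file=/tmp/ceph-hbase-client.log</value>
</property>
```

After restarting the master, any errors seen by libcephfs should show up in that file. If the file stays clean while the Java stack trace repeats, the failure is more likely in the Java layer.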
-Greg

> 2016-01-06 19:38:03,418 FATAL [192.168.1.196:16000.activeMasterManager] master.HMaster: Failed to become active master
> java.lang.NullPointerException
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:218)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
>         at org.apache.hadoop.hbase.util.FSUtils.getTableDirs(FSUtils.java:1368)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.checkTempDir(MasterFileSystem.java:506)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:149)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:126)
>         at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:649)
>         at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182)
>         at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-01-06 19:38:03,422 FATAL [192.168.1.196:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
> java.lang.NullPointerException
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:218)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
>         at org.apache.hadoop.hbase.util.FSUtils.getTableDirs(FSUtils.java:1368)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.checkTempDir(MasterFileSystem.java:506)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:149)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:126)
>         at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:649)
>         at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182)
>         at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-01-06 19:38:03,424 INFO  [192.168.1.196:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
>
> Can anyone give me a hint on this? It seems not many people use Ceph+HBase, but it doesn't hurt to ask :)
>
> Here is my current hbase-site.xml, just in case:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>ceph://ceph-mon:6789/hbase</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.dataDir</name>
>     <value>ceph://ceph-mon:6789/zookeeper</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.clientPort</name>
>     <value>2181</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>localhost</value>
>   </property>
>   <property>
>     <name>fs.ceph.impl</name>
>     <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
>   </property>
>   <property>
>     <name>fs.AbstractFileSystem.ceph.impl</name>
>     <value>org.apache.hadoop.fs.ceph.CephFs</value>
>   </property>
> </configuration>
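One setting in this file looks suspicious, independent of the errors above. hbase.zookeeper.property.dataDir is handed to ZooKeeper as its dataDir, which ZooKeeper treats as a local filesystem path. A ceph:// URI therefore will not place ZooKeeper's data in CephFS. This only matters if HBase manages ZooKeeper itself.

The sketch below points the property at a local directory. /var/lib/zookeeper is an assumed example path; any local directory writable by the HBase user should do.

```xml
<!-- ZooKeeper keeps its snapshots on local disk; /var/lib/zookeeper is an
     assumed example path, not a required location. -->
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/var/lib/zookeeper</value>
</property>
```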
>
> Thanks in advance!
> ________________________________________
> From: Yan, Zheng <ukernel@xxxxxxxxx>
> Sent: Thursday, December 31, 2015 02:55 AM
> To: Jose M
> Subject: Re: Ceph & Hbase
>
>
> I have no knowledge of Hadoop/HBase. The "Permission denied" exception
> on mount is likely caused by an incorrect Ceph configuration (I didn't
> see any Ceph-related options in the HBase config).
>
> The following URL is a message from a user who claims to have run HBase successfully:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/002856.html
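The sketch below shows the kind of Ceph-specific properties Yan is referring to. The option names (ceph.conf.file, ceph.auth.id, ceph.auth.keyring) come from the Ceph documentation for the Hadoop bindings. The file paths and the client id "admin" are assumptions; adjust them to where the cluster's config and keyring actually live on the HBase nodes.

These would go inside the <configuration> element. HBase hands its own configuration to the FileSystem, so hbase-site.xml should work, although core-site.xml is the more conventional place.

```xml
<!-- Assumed paths and client id; check them against the actual cluster. -->
<property>
  <name>ceph.conf.file</name>
  <value>/etc/ceph/ceph.conf</value>
</property>
<property>
  <name>ceph.auth.id</name>
  <value>admin</value>
</property>
<property>
  <name>ceph.auth.keyring</name>
  <value>/etc/ceph/ceph.client.admin.keyring</value>
</property>
```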
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com