2011/9/27 Cédric Morandin <cedric.morandin@xxxxxxxx>:
> Hi Wido,
>
> Thanks for your answer and your kind help.
> I tried to give you all the useful information, but maybe something is missing.
> Let me know if you want me to run more tests.
>
> Please find the output of ceph -s below:
> [root@node91 ~]# ceph -s
> 2011-09-26 22:48:08.048659 pg v297: 792 pgs: 792 active+clean; 24 KB data, 80512 KB used, 339 GB / 340 GB avail
> 2011-09-26 22:48:08.049742 mds e5: 1/1/1 up {0=alpha=up:active}, 1 up:standby
> 2011-09-26 22:48:08.049764 osd e5: 4 osds: 4 up, 4 in
> 2011-09-26 22:48:08.049800 log 2011-09-26 19:38:14.372125 osd3 138.96.126.95:6800/2973 242 : [INF] 2.1p3 scrub ok
> 2011-09-26 22:48:08.049847 mon e1: 3 mons at {alpha=138.96.126.91:6789/0,beta=138.96.126.92:6789/0,gamma=138.96.126.93:6789/0}
>
> The same command ten minutes later, when cfuse hangs on the client node:
>
> [root@node91 ~]# ceph -s
> 2011-09-26 23:07:49.403774 pg v335: 792 pgs: 101 active, 276 active+clean, 415 active+clean+degraded; 4806 KB data, 114 MB used, 339 GB / 340 GB avail; 24/56 degraded (42.857%)
> 2011-09-26 23:07:49.404847 mds e5: 1/1/1 up {0=alpha=up:active}, 1 up:standby
> 2011-09-26 23:07:49.404867 osd e13: 4 osds: 2 up, 4 in
> 2011-09-26 23:07:49.404929 log 2011-09-26 23:07:46.093670 mds0 138.96.126.91:6800/4682 2 : [INF] closing stale session client4124 138.96.126.91:0/5563 after 455.778957
> 2011-09-26 23:07:49.404966 mon e1: 3 mons at {alpha=138.96.126.91:6789/0,beta=138.96.126.92:6789/0,gamma=138.96.126.93:6789/0}
>
> [root@node91 ~]# /etc/init.d/ceph -a status
> === mon.alpha ===
> running...
> === mon.beta ===
> running...
> === mon.gamma ===
> running...
> === mds.alpha ===
> running...
> === mds.beta ===
> running...
> === osd.0 ===
> dead.
> === osd.1 ===
> running...
> === osd.2 ===
> running...
> === osd.3 ===
> dead.
>
> Finally, here are the last lines of the osd.0 log:
>
> 2011-09-26 22:57:06.822182 7faf6a6f8700 -- 138.96.126.92:6802/3157 >> 138.96.126.93:6801/3162 pipe(0x7faf50001320 sd=20 pgs=0 cs=0 l=0).accept connect_seq 2 vs existing 1 state 3
> 2011-09-26 23:07:09.084901 7faf8e1b5700 FileStore: sync_entry timed out after 600 seconds.
> ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
> 2011-09-26 23:07:09.084934 1: (SafeTimer::timer_thread()+0x323) [0x5c95a3]
> 2011-09-26 23:07:09.084943 2: (SafeTimerThread::entry()+0xd) [0x5cbc7d]
> 2011-09-26 23:07:09.084950 3: /lib64/libpthread.so.0() [0x31fec077e1]
> 2011-09-26 23:07:09.084957 4: (clone()+0x6d) [0x31fe4e18ed]
> 2011-09-26 23:07:09.084963 *** Caught signal (Aborted) **
> in thread 0x7faf8e1b5700
> ceph version 0.34 (commit:2f039eeeb745622b866d80feda7afa055e15f6d6)
> 1: /usr/bin/cosd() [0x649ca9]
> 2: /lib64/libpthread.so.0() [0x31fec0f4c0]
> 3: (gsignal()+0x35) [0x31fe4329a5]
> 4: (abort()+0x175) [0x31fe434185]
> 5: (__assert_fail()+0xf5) [0x31fe42b935]
> 6: (SyncEntryTimeout::finish(int)+0x130) [0x683400]
> 7: (SafeTimer::timer_thread()+0x323) [0x5c95a3]
> 8: (SafeTimerThread::entry()+0xd) [0x5cbc7d]
> 9: /lib64/libpthread.so.0() [0x31fec077e1]
> 10: (clone()+0x6d) [0x31fe4e18ed]

Maybe the underlying filesystem (btrfs/ext4) is busy or hung, which results in the sync_commit taking more than 600 seconds. The OSD then thinks it is dead and uses ceph_abort to terminate the cosd process.
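To check that hypothesis, a quick sanity test of the underlying filesystem on the affected OSD host might look like the sketch below (the path follows your "osd data = /data/$name" layout; the test file name is just an example):

    # on node92 (osd.0): look for kernel/btrfs errors around the time of the crash
    dmesg | tail -n 50

    # time a synchronous write on the OSD's data partition; if this stalls for
    # minutes, the filesystem itself is stuck and the cosd abort is only a symptom
    time dd if=/dev/zero of=/data/osd.0/ddtest bs=1M count=100 conv=fsync
    rm /data/osd.0/ddtest

If the dd hangs as well, the problem is below cosd, in btrfs or the disk.
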
> ceph.conf:
>
> [global]
> max open files = 131072
> log file = /var/log/ceph/$name.log
> pid file = /var/run/ceph/$name.pid
> [mon]
> mon data = /data/$name
> mon clock drift allowed = 1
> [mon.alpha]
> host = node91
> mon addr = 138.96.126.91:6789
> [mon.beta]
> host = node92
> mon addr = 138.96.126.92:6789
> [mon.gamma]
> host = node93
> mon addr = 138.96.126.93:6789
> [mds]
> keyring = /data/keyring.$name
> [mds.alpha]
> host = node91
> [mds.beta]
> host = node92
> [osd]
> osd data = /data/$name
> osd journal = /data/$name/journal
> osd journal size = 1000
> [osd.0]
> host = node92
> [osd.1]
> host = node93
> [osd.2]
> host = node94
> [osd.3]
> host = node95
>
> ----
>
> Thank you once more for your help.
>
> Regards,
>
> Cédric
>
> On 23 Sept 2011, at 19:20, Wido den Hollander wrote:
>
>> Hi,
>>
>> Could you send us your ceph.conf and the output of "ceph -s"?
>>
>> Wido
>>
>> On Fri, 2011-09-23 at 17:58 +0200, Cedric Morandin wrote:
>>> Hi everybody,
>>>
>>> I didn't find any ceph-users list, so I am posting here. If this is not the right place, please let me know.
>>> I am currently trying to test Ceph, but I am probably doing something wrong because I see some really strange behaviour.
>>>
>>> Context:
>>> Ceph is compiled and installed on five CentOS 6 machines.
>>> A btrfs partition is available on each machine.
>>> This partition is mounted under /data/osd.[0-3].
>>> Clients use cfuse compiled for FC11 (2.6.29.4-167.fc11.x86_64).
>>>
>>> What happens:
>>> I configured everything in ceph.conf and started the Ceph daemons on all nodes.
>>> When I issue "ceph health", I get a HEALTH_OK answer.
>>> I can access the filesystem through cfuse and create some files on it, but when I try to create files bigger than 2 or 3 MB, the filesystem hangs.
>>> When I try to copy an entire directory (the Ceph sources, for instance), I have the same problem.
>>> When the system is in this state, the cosd daemon dies on the OSD machines: [INF] osd0 out (down for 304.836218)
>>> Even killing it doesn't release the mount point:
>>> cosd 9170 root 10uW REG 8,6 8 2506754 /data/osd.0/fsid
>>> cosd 9170 root 11r DIR 8,6 4096 2506753 /data/osd.0
>>> cosd 9170 root 12r DIR 8,6 24576 2506755 /data/osd.0/current
>>> cosd 9170 root 13u REG 8,6 4 2506757 /data/osd.0/current/commit_op_seq
>>>
>>> I tried to change some parameters, but I always end up with the same problem:
>>> I tried both the 0.34 and 0.35 releases, with both btrfs and ext3 (mounted with the user_xattr option).
>>> I also tried the cfuse client on one of the CentOS 6 machines.
>>>
>>> I read everything on http://ceph.newdream.net/wiki but I can't figure out the problem.
>>> Does anybody have a clue about the problem's origin?
>>>
>>> Regards,
>>>
>>> Cedric Morandin
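
As a follow-up on the sync_entry timeout above: if I remember correctly, the 600-second limit comes from the filestore commit timeout option, and you can turn up osd/filestore debugging to see where the sync thread is stuck before the abort. A minimal sketch of an [osd] section for the next test run (option names are from memory, please double-check them against your 0.34/0.35 build):

    [osd]
        osd data = /data/$name
        osd journal = /data/$name/journal
        osd journal size = 1000
        ; verbose logging to see what the sync thread is doing before the abort
        debug osd = 20
        debug filestore = 20
        ; the sync_entry timeout that triggered the abort; 600s should be the default
        filestore commit timeout = 600

Raising the timeout would only hide the symptom; the interesting question is why the filesystem needs more than ten minutes to commit an almost empty store.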