Hi guys,

I ran some tests across 4 scenarios:

(1) 5 osds in 2 hosts, ceph-0.26: one of the osds core dumps when
    creating several rbd images.
(2) 4 osds in 4 hosts, ceph-0.26: OK
(3) 5 osds in 2 hosts, ceph-0.27.1: OK
(4) 4 osds in 4 hosts, ceph-0.27.1: OK

BTW, I have 2 questions:

1. In these scenarios, after I execute "cclass -a" and "ceph class
activate rbd 1.3", I need to wait several seconds before creating an
rbd image; otherwise I get "librbd: failed to assign a block name for
image". Is this expected?

2. I ran some tests with a modified testlibrbd.c; the added code looks
like this:

gettimeofday(&tv1, NULL);
for (i = 0; i < num_test; i++)
        write_test_data(image, test_data, TEST_IO_SIZE * i, TEST_IO_SIZE);
gettimeofday(&tv2, NULL);
t1 = tv2.tv_sec - tv1.tv_sec;
temp = (float)t1 + (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
speed = 1.0 * TEST_IO_SIZE * num_test / temp / 1024 / 1024;
printf("time used: temp=%.3f\n", temp);
printf("write speed: %.2f MB/s\n", speed);

The results I got are very slow:

time used: temp=46.611
write speed: 0.21 MB/s
time used: temp=14.706
read speed: 0.68 MB/s
time used: temp=45.453
aio write speed: 0.22 MB/s
time used: temp=14.759
aio read speed: 0.68 MB/s

During the test, though, some cosd processes were running with high
CPU usage.
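For reference, here is a self-contained sketch of that timing harness;
it calls the librbd C API's rbd_write() directly (write_test_data in
testlibrbd.c is essentially a thin wrapper around it), and TEST_IO_SIZE,
the buffer contents, and the helper names are placeholders of mine, not
the exact testlibrbd.c code:

/*
 * Sketch of the timing loop above. "image" must already have been
 * opened with rbd_open(); num_test is the number of sequential writes.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/time.h>
#include <rbd/librbd.h>

#define TEST_IO_SIZE 4096

/* seconds elapsed between two gettimeofday() samples */
static double elapsed(const struct timeval *a, const struct timeval *b)
{
        return (b->tv_sec - a->tv_sec) +
               (b->tv_usec - a->tv_usec) / 1000000.0;
}

void time_writes(rbd_image_t image, int num_test)
{
        char test_data[TEST_IO_SIZE];
        struct timeval tv1, tv2;
        double temp, speed;
        int i;

        memset(test_data, 'a', sizeof(test_data)); /* dummy payload */

        gettimeofday(&tv1, NULL);
        for (i = 0; i < num_test; i++)
                rbd_write(image, (uint64_t)TEST_IO_SIZE * i,
                          TEST_IO_SIZE, test_data);
        gettimeofday(&tv2, NULL);

        temp = elapsed(&tv1, &tv2);
        speed = 1.0 * TEST_IO_SIZE * num_test / temp / 1024 / 1024;
        printf("time used: temp=%.3f\n", temp);
        printf("write speed: %.2f MB/s\n", speed);
}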
Thx!
Simon

2011/5/10 Yehuda Sadeh Weinraub <yehudasa@xxxxxxxxx>:
> On Tue, May 10, 2011 at 7:15 AM, Simon Tian <aixt2006@xxxxxxxxx> wrote:
>> Hi,
>>
>> As you said, one of the osds crashed:
>> ================= log ========================
>> 2011-05-10 21:46:38.990311 4bc90940 osd2 8 pg[3.13a( v 8'1 (0'0,8'1]
>> n=1 ec=2 les=6 5/5/4) [2,3] r=0 mlcod 0'0 active+clean]
>> oi.user_version=8'2 is_modify=0
>> 2011-05-10 21:46:38.990386 4bc90940 osd2 8 pg[3.13a( v 8'1 (0'0,8'1]
>> n=1 ec=2 les=6 5/5/4) [2,3] r=0 mlcod 0'0 active+clean]
>> oi.user_version=8'2 is_modify=1
>> *** Caught signal (Segmentation fault) **
>>  in thread 0x45382940
>> =========================================
>>
>> I tried again. This time "rbd create foo --size 1024" succeeded, but
>> when I ran the code of testlibrbd.c, one of the osds crashed again:
>> ================= log ========================
>> 2011-05-10 22:08:20.008871 4c115940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
>> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean
>> snaptrimq=[1~1]] dump_watchers testimg.rbd/head testimg.rbd/head(9'4
>> client4107.0:14 wrlock_by=unknown0.0:0)
>> 2011-05-10 22:08:20.008903 4c115940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
>> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean
>> snaptrimq=[1~1]]  * obc->watcher: client4107 session=0xc80990
>> 2011-05-10 22:08:20.008925 4c115940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
>> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean
>> snaptrimq=[1~1]]  * oi->watcher: client4107 cookie=2
>> 2011-05-10 22:08:20.009232 4b914940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
>> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean]
>> oi.user_version=10'5 is_modify=1
>> 2011-05-10 22:08:20.009267 4b914940 expires 2011-05-10 23:08:19.890032
>> now 2011-05-10 22:08:20.009260
>> 2011-05-10 22:08:20.009284 napshots_list
>> 2011-05-10 22:08:20.009307 4b914940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
>> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean]
>> oi.user_version=10'5 is_modify=0
>> 2011-05-10 22:08:20.009375 4b914940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
>> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean]
>> oi.user_version=10'5 is_modify=1
>> *** Caught signal (Segmentation fault) **
>>  in thread 0x4eb1c940
>> =========================================
>>
> Can you by any chance get a backtrace for that crash (gdb cosd core;
> bt)? You might need to have the debug packages installed.
> Also, note that you're not running the latest version, so you might be
> hitting something that was already fixed (not that I remember anything
> specific, but it might be worth a try).
>
> Thanks,
> Yehuda
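
Expanding your gdb command, this is roughly the session I will run once
I reproduce the crash (the /usr/bin/cosd path and the core file landing
in cosd's working directory are assumptions for my setup, and I'll have
the debug packages installed first):

gdb /usr/bin/cosd core        # load the cosd binary together with the core file
(gdb) bt                      # backtrace of the thread that caught the signal
(gdb) thread apply all bt     # backtraces for every thread, in case it helps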