Re: rbd create error with 0.26

On Fri, 13 May 2011, Simon Tian wrote:
> Hi guys,
> 
>      I did some tests, in 4 scenarios:
> (1) 5 osds in 2 hosts, ceph-0.26      one of the osds core dumps
> when creating several rbd images.
> (2) 4 osds in 4 hosts, ceph-0.26      OK
> (3) 5 osds in 2 hosts, ceph-0.27.1   OK
> (4) 4 osds in 4 hosts, ceph-0.27.1   OK
> 
> 
> BTW, I have 2 questions:
> 1.   In these scenarios, after I execute "cclass -a" and "ceph class
> activate rbd 1.3", I need to wait several seconds before creating an
> rbd image, otherwise "librbd: failed to assign a block name for image"
> comes out. Is this all right?

For 0.27 and previous, yes.  The class distribution mechanism has been
removed in v0.28, and it is now the administrator's (make install's,
dpkg's, rpm's) responsibility to put the .so in the right directory.
This is much simpler all around and easier to debug.  If you are still
having class loading problems you should try the 'next' branch, which
will soon become v0.28.
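
To illustrate (the paths here are only an assumption and depend on how
ceph was built and packaged -- the OSD loads class plugins from its
'osd class dir'), the manual step is just copying the plugin into place:

	# hypothetical paths: check where your build put libcls_rbd.so
	# and what 'osd class dir' is set to on the osds
	cp src/.libs/libcls_rbd.so /usr/lib/rados-classes/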

> 2.   I ran some tests with a modified testlibrbd.c; the added code
> looks like:
> 
> gettimeofday(&tv1, NULL);
> for (i = 0; i < num_test; i++)
>   write_test_data(image, test_data, TEST_IO_SIZE * i, TEST_IO_SIZE);
> gettimeofday(&tv2, NULL);
> t1 = tv2.tv_sec - tv1.tv_sec;
> temp = (float)t1 + (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
> speed = 1.0 * TEST_IO_SIZE * num_test / temp / 1024 / 1024;
> printf("time used: temp=%.3f\n", temp);
> printf("write speed: %.2f MB/s\n", speed);
> 
> The results I got are very slow:
> time used: temp=46.611
> write speed: 0.21 MB/s
> time used: temp=14.706
> read speed: 0.68 MB/s
> time used: temp=45.453
> aio write speed: 0.22 MB/s
> time used: temp=14.759
> aio read speed: 0.68 MB/s
> 
> But during the test, some cosd processes were running with high CPU usage.

What is the IO size?  Is write_test_data synchronous?
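
If each write in write_test_data is synchronous (waits for the previous
write to commit before issuing the next), you are mostly measuring
per-IO round-trip latency, not throughput.  A rough, untested sketch of
overlapping the IOs with the librbd aio calls (rbd_aio_create_completion,
rbd_aio_write, rbd_aio_wait_for_complete, rbd_aio_release, plus malloc
from <stdlib.h>; image, test_data, TEST_IO_SIZE and num_test are assumed
to be the ones from testlibrbd.c) would be

	/* queue all the writes first, then wait for them; error checking omitted */
	rbd_completion_t *comps = malloc(num_test * sizeof(*comps));
	for (i = 0; i < num_test; i++) {
		rbd_aio_create_completion(NULL, NULL, &comps[i]);
		rbd_aio_write(image, TEST_IO_SIZE * i, TEST_IO_SIZE,
			      test_data, comps[i]);
	}
	for (i = 0; i < num_test; i++) {
		rbd_aio_wait_for_complete(comps[i]);   /* block until this IO is done */
		rbd_aio_release(comps[i]);
	}
	free(comps);

and then comparing the elapsed time against the synchronous loop.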

For simple write benchmarking you can also use

	rados mkpool foo
	rados -p foo bench <seconds> write -b <blocksize> -t <threads>

and you'll see latency and throughput.  The block size defaults to 4 MB
and the number of "threads" (parallel IOs) defaults to 16, IIRC.

sage


> 
> Thx!
> Simon
> 
> 
> 2011/5/10 Yehuda Sadeh Weinraub <yehudasa@xxxxxxxxx>:
> > On Tue, May 10, 2011 at 7:15 AM, Simon Tian <aixt2006@xxxxxxxxx> wrote:
> >> Hi,
> >>
> >>    As you said, one of the osds crashed:
> >> ================= log ========================
> >> 2011-05-10 21:46:38.990311 4bc90940 osd2 8 pg[3.13a( v 8'1 (0'0,8'1]
> >> n=1 ec=2 les=6 5/5/4) [2,3] r=0 mlcod 0'0 active+clean]
> >> oi.user_version=8'2 is_modify=0
> >> 2011-05-10 21:46:38.990386 4bc90940 osd2 8 pg[3.13a( v 8'1 (0'0,8'1]
> >> n=1 ec=2 les=6 5/5/4) [2,3] r=0 mlcod 0'0 active+clean]
> >> oi.user_version=8'2 is_modify=1
> >> *** Caught signal (Segmentation fault) **
> >>  in thread 0x45382940
> >> =========================================
> >>
> >> I tried again; this time I ran "rbd create foo --size 1024"
> >> successfully, but when I ran the code of testlibrbd.c, one of the osds
> >> crashed again:
> >> ================= log ========================
> >> 2011-05-10 22:08:20.008871 4c115940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
> >> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean
> >> snaptrimq=[1~1]] dump_watchers testimg.rbd/head testimg.rbd/head(9'4
> >> client4107.0:14 wrlock_by=unknown0.0:0)
> >> 2011-05-10 22:08:20.008903 4c115940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
> >> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean
> >> snaptrimq=[1~1]]  * obc->watcher: client4107 session=0xc80990
> >> 2011-05-10 22:08:20.008925 4c115940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
> >> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean
> >> snaptrimq=[1~1]]  * oi->watcher: client4107 cookie=2
> >> 2011-05-10 22:08:20.009232 4b914940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
> >> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean]
> >> oi.user_version=10'5 is_modify=1
> >> 2011-05-10 22:08:20.009267 4b914940 expires 2011-05-10 23:08:19.890032
> >> now 2011-05-10 22:08:20.009260
> >> 2011-05-10 22:08:20.009284 napshots_list
> >> 2011-05-10 22:08:20.009307 4b914940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
> >> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean]
> >> oi.user_version=10'5 is_modify=0
> >> 2011-05-10 22:08:20.009375 4b914940 osd3 10 pg[4.1( v 9'4 (9'2,9'4]
> >> n=1 ec=9 les=9 9/9/9) [3,0] r=0 mlcod 9'3 active+clean]
> >> oi.user_version=10'5 is_modify=1
> >> *** Caught signal (Segmentation fault) **
> >>  in thread 0x4eb1c940
> >> =========================================
> >>
> > Can you by any chance get a backtrace for that crash (gdb cosd core;
> > bt)? You might need to have the debug packages installed.
> > Also, note that you're not running the latest version so you might be
> > hitting something that was already fixed (not that I remember anything
> > specific, but it might be worth a try).
> >
> > Thanks,
> > Yehuda
> >
