Hi all,

I deployed Ceph 0.56.6. I have one server running the OSD daemons (ext4) and one server running the MON + MDS. The OSD server uses a RAID 6 array with 44 TB capacity, which I divided into two ext4 partitions, each backing one OSD.

ceph -s:

  health HEALTH_OK
  monmap e1: 1 mons at {0=10.160.0.70:6789/0}, election epoch 1, quorum 0 0
  osdmap e34: 2 osds: 2 up, 2 in
  pgmap v1207: 576 pgs: 576 active+clean; 79200 MB data, 194 GB used, 38569 GB / 40811 GB avail; 60490KB/s wr, 14op/s
  mdsmap e18: 1/1/1 up {0=1=up:active}, 1 up:standby

ceph osd tree:

  # id    weight  type name              up/down  reweight
  -1      44      root default
  -3      44          rack unknownrack
  -2      44              host Ceph-store
  0       22                  osd.0      up       1
  1       22                  osd.1      up       1

But when I upload data to the mounted Ceph filesystem, I see these errors:

2013-05-10 21:59:00.500316 osd.1 [WRN] 3 slow requests, 1 included below; oldest blocked for > 80.457194 secs
2013-05-10 21:59:00.500326 osd.1 [WRN] slow request 80.457194 seconds old, received at 2013-05-10 21:57:40.043056: osd_op(mds.0.5:437 200.00000001 [write 1189391~25263] 1.6e5f474) v4 currently no flag points reached
2013-05-10 21:59:05.500955 osd.1 [WRN] 4 slow requests, 1 included below; oldest blocked for > 85.457859 secs
2013-05-10 21:59:05.500960 osd.1 [WRN] slow request 40.456829 seconds old, received at 2013-05-10 21:58:25.044086: osd_op(mds.0.5:441 200.00000001 [write 1226678~7515] 1.6e5f474) v4 currently no flag points reached
2013-05-10 21:59:05.045241 osd.0 [WRN] 1 slow requests, 1 included below; oldest blocked for > 40.001093 secs
2013-05-10 21:59:05.045246 osd.0 [WRN] slow request 40.001093 seconds old, received at 2013-05-10 21:58:25.044108: osd_op(mds.0.5:442 200.00000000 [writefull 0~84] 1.844f3494) v4 currently no flag points reached
2013-05-10 21:59:26.577860 mon.0 [INF] pgmap v1216: 576 pgs: 576 active+clean; 84095 MB data, 203 GB used, 38559 GB / 40811 GB avail; 19867KB/s wr, 4op/s
2013-05-10 21:59:10.501512 osd.1 [WRN] 4 slow requests, 1 included below; oldest blocked for > 90.458411 secs
2013-05-10 21:59:10.501518 osd.1 [WRN] slow request 80.645454 seconds old, received at 2013-05-10 21:57:49.856013: osd_op(mds.0.5:439 200.00000001 [write 1214654~1503] 1.6e5f474) v4 currently no flag points reached
2013-05-10 21:59:32.040478 mon.0 [INF] pgmap v1217: 576 pgs: 576 active+clean; 84667 MB data, 204 GB used, 38558 GB / 40811 GB avail; 104MB/s wr, 26op/s
2013-05-10 21:59:15.502405 osd.1 [WRN] 4 slow requests, 1 included below; oldest blocked for > 95.459295 secs
2013-05-10 21:59:15.502414 osd.1 [WRN] slow request 80.458998 seconds old, received at 2013-05-10 21:57:55.043353: osd_op(mds.0.5:440 200.00000001 [write 1216157~10521] 1.6e5f474) v4 currently no flag points reached
2013-05-10 22:00:11.662631 mon.0 [INF] pgmap v1218: 576 pgs: 576 active+clean; 85451 MB data, 205 GB used, 38557 GB / 40811 GB avail; 20290KB/s wr, 5op/s
2013-05-10 22:00:17.109001 mon.0 [INF] pgmap v1219: 576 pgs: 576 active+clean; 86007 MB data, 206 GB used, 38556 GB / 40811 GB avail; 101MB/s wr, 27op/s

These warnings appear continuously, without a break.

Does kernel 2.6.32-358 not support sync? I need help!

Thanks
It sounds like your disks are unhappy with the Ceph-OSD workload (not too surprising on top of a RAID-6 and ext4). Can you run "ceph osd tell \* bench" and watch the results in a separate "ceph -w" window?
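For reference, the two-window setup described above might look like this (a sketch only — it assumes you run it on a node with an admin keyring; `ceph osd tell \* bench` is the syntax quoted above for this 0.56.x cluster, and the backslash keeps the shell from globbing the `*`):

```shell
# Window 1: stream the cluster log; the bench results for each OSD
# will show up here as [INF] lines when they finish.
ceph -w

# Window 2: ask every OSD to run its built-in write benchmark
# against its own backing store (default is ~1 GB of 4 MB writes).
ceph osd tell \* bench
```

Comparing the MB/s each OSD reports will show whether one of the two partitions on the RAID-6 array is dramatically slower than the other, or whether both are slow.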
You can also try some more basic tests. See how each array does with multiple synchronous write streams hitting it.
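As a rough sketch of such a basic test (outside of Ceph entirely): write with `oflag=sync` so every block is flushed before the next one, first with one stream and then with several in parallel. The target directory here is a placeholder — point it at the mount of one of your OSD data partitions:

```shell
# Placeholder path; replace with a directory on the OSD's ext4 partition.
TARGET=${TARGET:-/tmp/ceph-synctest}
mkdir -p "$TARGET"

# One synchronous stream (32 MB in 4 MB blocks): note the MB/s dd reports.
dd if=/dev/zero of="$TARGET/stream.0" bs=4M count=8 oflag=sync

# Four synchronous streams at once, loosely mimicking concurrent OSD
# writes; on RAID-6 the per-stream throughput often collapses here.
for i in 1 2 3 4; do
  dd if=/dev/zero of="$TARGET/stream.$i" bs=4M count=8 oflag=sync &
done
wait
```

If a single stream is already slow, or the parallel streams drop to a small fraction of the single-stream rate, the array itself can't keep up with the OSD's sync-heavy write pattern, which would explain the slow-request warnings.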
-Greg
--
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com