Single OSD down


 



Hi All,

 

Ceph: 0.80.9 (as of Monday 13th), previously 0.80.8

OS: Ubuntu 12.04, 3.13.0-44-generic #73~precise1-Ubuntu SMP Wed Dec 17 00:39:15 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Servers: Dell R515, 1 x 2.7GHz 6C AMD CPU w/ 32GB RAM & 10 x 3TB OSDs w/ 2 x Intel DC S3700 100GB journals (5 OSDs per SSD)

 

 

I recently had an issue with an OSD going offline. The disk seems fine, i.e. the SMART details etc. came back OK.

The cluster has been in a clean state, and I restarted OSD 34 today. It seems to be rebalancing, and there is no spam of errors in the OSD logs so far, just a few messages like the ones below:

 

2015-04-21 01:20:29.812221 7f1588382700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6812/46680 pipe(0x2bdc2000 sd=219 :57546 s=2 pgs=105 cs=1 l=0 c=0x2c084580).fault with nothing to send, going to standby

2015-04-21 01:21:58.067287 7f1587675700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6807/45051 pipe(0x2cd93a00 sd=169 :41651 s=2 pgs=91 cs=1 l=0 c=0x2c8af160).fault with nothing to send, going to standby

2015-04-21 01:22:28.071296 7f156ede9700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6808/45413 pipe(0x1cb4d780 sd=77 :6800 s=2 pgs=336 cs=3 l=0 c=0x2cd88420).fault with nothing to send, going to standby

2015-04-21 01:25:43.943967 7f1587d7c700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6816/48793 pipe(0x2cd93c80 sd=240 :58577 s=2 pgs=280 cs=1 l=0 c=0x2cd8db00).fault with nothing to send, going to standby

2015-04-21 01:37:58.597352 7f156ede9700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6808/45413 pipe(0x1cb4d780 sd=77 :50564 s=2 pgs=339 cs=5 l=0 c=0x2cd88420).fault with nothing to send, going to standby
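
To see whether these standby messages cluster around particular peers, the lines can be tallied per peer address. This is only a rough sketch: the regular expression is inferred from the excerpt above rather than from any Ceph format specification, and the names `LINE_RE`, `count_faults_by_peer` and `sample` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the messenger lines above (an assumption, not an
# official format): <date> <time> <thread> <level> -- <local> >> <peer> pipe(...).<event>
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<thread>\S+)\s+(?P<level>\d+) -- "
    r"(?P<local>\S+) >> (?P<peer>\S+) pipe\(.*?\)\.(?P<event>.*)$"
)

def count_faults_by_peer(lines):
    """Count 'fault ...' messenger events per peer address."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("event").startswith("fault"):
            counts[m.group("peer")] += 1
    return counts

# Three of the lines quoted above, used as sample input.
sample = [
    "2015-04-21 01:20:29.812221 7f1588382700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6812/46680 pipe(0x2bdc2000 sd=219 :57546 s=2 pgs=105 cs=1 l=0 c=0x2c084580).fault with nothing to send, going to standby",
    "2015-04-21 01:22:28.071296 7f156ede9700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6808/45413 pipe(0x1cb4d780 sd=77 :6800 s=2 pgs=336 cs=3 l=0 c=0x2cd88420).fault with nothing to send, going to standby",
    "2015-04-21 01:37:58.597352 7f156ede9700  0 -- 10.100.128.13:6800/28635 >> 10.100.128.13:6808/45413 pipe(0x1cb4d780 sd=77 :50564 s=2 pgs=339 cs=5 l=0 c=0x2cd88420).fault with nothing to send, going to standby",
]

for peer, n in count_faults_by_peer(sample).most_common():
    print(peer, n)
```

On a real log, the same function could be fed an open file object instead of the `sample` list.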

 

 

 

The only thing that has changed is that I applied Ceph and OS updates on Monday 13th, going from 0.80.8 to 0.80.9. Apart from that it has been running fairly smoothly, with a bit higher latency than I would like, but that seems related to my OSDs/disks.

I’ve attached the full OSD log file below, and here is a quick snippet. The log is pretty much full of messages very similar to the ones below, repeating for quite a while.

 

 

2015-04-16 00:43:10.187710 7f9634fd7700 20 -- 10.100.128.13:6800/43939 >> 10.100.128.12:6816/8982 pipe(0x2bf92500 sd=428 :6800 s=4 pgs=6 cs=1 l=0 c=0xe26adc0).join

2015-04-16 00:43:10.187786 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper reaped pipe 0x2bf92500 10.100.128.12:6816/8982

2015-04-16 00:43:10.187799 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper deleted pipe 0x2bf92500

2015-04-16 00:43:10.187802 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper done

2015-04-16 00:43:10.187820 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper

2015-04-16 00:43:10.187818 7f960741d700 10 -- 10.100.128.13:6800/43939 >> 10.100.128.10:6818/20488 pipe(0x2aefcc80 sd=357 :6800 s=4 pgs=27 cs=1 l=0 c=0x19547dc0).reader done

2015-04-16 00:43:10.187824 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper reaping pipe 0x2aefcc80 10.100.128.10:6818/20488

2015-04-16 00:43:10.187829 7f9634fd7700 10 -- 10.100.128.13:6800/43939 >> 10.100.128.10:6818/20488 pipe(0x2aefcc80 sd=357 :6800 s=4 pgs=27 cs=1 l=0 c=0x19547dc0).discard_queue

2015-04-16 00:43:10.187837 7f9634fd7700 10 -- 10.100.128.13:6800/43939 >> 10.100.128.10:6818/20488 pipe(0x2aefcc80 sd=357 :6800 s=4 pgs=27 cs=1 l=0 c=0x19547dc0).unregister_pipe – not

 

 

This repeats until the OSD eventually shuts down; I’m not quite sure why. The shutdown sequence looks like this:

 

 

2015-04-16 00:43:10.187846 7f9634fd7700 20 -- 10.100.128.13:6800/43939 >> 10.100.128.10:6818/20488 pipe(0x2aefcc80 sd=357 :6800 s=4 pgs=27 cs=1 l=0 c=0x19547dc0).join

2015-04-16 00:43:10.187958 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper reaped pipe 0x2aefcc80 10.100.128.10:6818/20488

2015-04-16 00:43:10.187973 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper deleted pipe 0x2aefcc80

2015-04-16 00:43:10.187976 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper done

2015-04-16 00:43:10.189618 7f963a08c780 10 accepter.stop accepter

2015-04-16 00:43:10.196239 7f9629d96700 20 accepter.accepter poll got 1

2015-04-16 00:43:10.196248 7f9629d96700 20 accepter.accepter closing

2015-04-16 00:43:10.196260 7f9629d96700 10 accepter.accepter stopping

2015-04-16 00:43:10.197310 7f963a08c780 20 -- 10.100.96.13:6801/43939 wait: stopped accepter thread

2015-04-16 00:43:10.197328 7f963a08c780 20 -- 10.100.96.13:6801/43939 wait: stopping reaper thread

2015-04-16 00:43:10.197344 7f9630ce4700 10 -- 10.100.96.13:6801/43939 reaper_entry done

2015-04-16 00:43:10.197450 7f963a08c780 20 -- 10.100.96.13:6801/43939 wait: stopped reaper thread

2015-04-16 00:43:10.197461 7f963a08c780 10 -- 10.100.96.13:6801/43939 wait: closing pipes

2015-04-16 00:43:10.197464 7f963a08c780 10 -- 10.100.96.13:6801/43939 reaper

2015-04-16 00:43:10.197467 7f963a08c780 10 -- 10.100.96.13:6801/43939 reaper done

2015-04-16 00:43:10.197469 7f963a08c780 10 -- 10.100.96.13:6801/43939 wait: waiting for pipes  to close

2015-04-16 00:43:10.197472 7f963a08c780 10 -- 10.100.96.13:6801/43939 wait: done.

2015-04-16 00:43:10.197474 7f963a08c780  1 -- 10.100.96.13:6801/43939 shutdown complete.

2015-04-16 00:43:10.197480 7f963a08c780 10 -- 10.100.128.13:0/43939 wait: waiting for dispatch queue

2015-04-16 00:43:10.197555 7f963a08c780 10 -- 10.100.128.13:0/43939 wait: dispatch queue is stopped

2015-04-16 00:43:10.197560 7f963a08c780 20 -- 10.100.128.13:0/43939 wait: stopping reaper thread

2015-04-16 00:43:10.197575 7f96314e5700 10 -- 10.100.128.13:0/43939 reaper_entry done

2015-04-16 00:43:10.197680 7f963a08c780 20 -- 10.100.128.13:0/43939 wait: stopped reaper thread

2015-04-16 00:43:10.197690 7f963a08c780 10 -- 10.100.128.13:0/43939 wait: closing pipes

2015-04-16 00:43:10.197693 7f963a08c780 10 -- 10.100.128.13:0/43939 reaper

2015-04-16 00:43:10.197695 7f963a08c780 10 -- 10.100.128.13:0/43939 reaper done

2015-04-16 00:43:10.197697 7f963a08c780 10 -- 10.100.128.13:0/43939 wait: waiting for pipes  to close

2015-04-16 00:43:10.197699 7f963a08c780 10 -- 10.100.128.13:0/43939 wait: done.

2015-04-16 00:43:10.197701 7f963a08c780  1 -- 10.100.128.13:0/43939 shutdown complete.

2015-04-16 00:43:10.197704 7f963a08c780 10 -- 10.100.96.13:6802/43939 wait: waiting for dispatch queue

2015-04-16 00:43:10.197779 7f963a08c780 10 -- 10.100.96.13:6802/43939 wait: dispatch queue is stopped

2015-04-16 00:43:10.197787 7f963a08c780 20 -- 10.100.96.13:6802/43939 wait: stopping accepter thread

2015-04-16 00:43:10.197789 7f963a08c780 10 accepter.stop accepter

2015-04-16 00:43:10.197812 7f9627591700 20 accepter.accepter poll got 1

2015-04-16 00:43:10.197828 7f9627591700 20 accepter.accepter closing

2015-04-16 00:43:10.197842 7f9627591700 10 accepter.accepter stopping

2015-04-16 00:43:10.197888 7f963a08c780 20 -- 10.100.96.13:6802/43939 wait: stopped accepter thread

2015-04-16 00:43:10.197897 7f963a08c780 20 -- 10.100.96.13:6802/43939 wait: stopping reaper thread

2015-04-16 00:43:10.197912 7f9631ce6700 10 -- 10.100.96.13:6802/43939 reaper_entry done

2015-04-16 00:43:10.197986 7f963a08c780 20 -- 10.100.96.13:6802/43939 wait: stopped reaper thread

2015-04-16 00:43:10.198002 7f963a08c780 10 -- 10.100.96.13:6802/43939 wait: closing pipes

2015-04-16 00:43:10.198005 7f963a08c780 10 -- 10.100.96.13:6802/43939 reaper

2015-04-16 00:43:10.198008 7f963a08c780 10 -- 10.100.96.13:6802/43939 reaper done

2015-04-16 00:43:10.198011 7f963a08c780 10 -- 10.100.96.13:6802/43939 wait: waiting for pipes  to close

2015-04-16 00:43:10.198013 7f963a08c780 10 -- 10.100.96.13:6802/43939 wait: done.

2015-04-16 00:43:10.198016 7f963a08c780  1 -- 10.100.96.13:6802/43939 shutdown complete.

2015-04-16 00:43:10.198019 7f963a08c780 10 -- 10.100.128.13:6801/43939 wait: waiting for dispatch queue

2015-04-16 00:43:10.198038 7f963a08c780 10 -- 10.100.128.13:6801/43939 wait: dispatch queue is stopped

2015-04-16 00:43:10.198043 7f963a08c780 20 -- 10.100.128.13:6801/43939 wait: stopping accepter thread

2015-04-16 00:43:10.198045 7f963a08c780 10 accepter.stop accepter

2015-04-16 00:43:10.198061 7f962658f700 20 accepter.accepter poll got 1

2015-04-16 00:43:10.198067 7f962658f700 20 accepter.accepter closing

2015-04-16 00:43:10.198079 7f962658f700 10 accepter.accepter stopping

2015-04-16 00:43:10.198126 7f963a08c780 20 -- 10.100.128.13:6801/43939 wait: stopped accepter thread

2015-04-16 00:43:10.198134 7f963a08c780 20 -- 10.100.128.13:6801/43939 wait: stopping reaper thread

2015-04-16 00:43:10.198150 7f9633cea700 10 -- 10.100.128.13:6801/43939 reaper_entry done

2015-04-16 00:43:10.198205 7f963a08c780 20 -- 10.100.128.13:6801/43939 wait: stopped reaper thread

2015-04-16 00:43:10.198213 7f963a08c780 10 -- 10.100.128.13:6801/43939 wait: closing pipes

2015-04-16 00:43:10.198216 7f963a08c780 10 -- 10.100.128.13:6801/43939 reaper

2015-04-16 00:43:10.198218 7f963a08c780 10 -- 10.100.128.13:6801/43939 reaper done

2015-04-16 00:43:10.198220 7f963a08c780 10 -- 10.100.128.13:6801/43939 wait: waiting for pipes  to close

2015-04-16 00:43:10.198223 7f963a08c780 10 -- 10.100.128.13:6801/43939 wait: done.

2015-04-16 00:43:10.198225 7f963a08c780  1 -- 10.100.128.13:6801/43939 shutdown complete.

2015-04-16 00:43:10.198227 7f963a08c780 10 -- 10.100.128.13:6800/43939 wait: waiting for dispatch queue

2015-04-16 00:43:10.198241 7f963a08c780 10 -- 10.100.128.13:6800/43939 wait: dispatch queue is stopped

2015-04-16 00:43:10.198245 7f963a08c780 20 -- 10.100.128.13:6800/43939 wait: stopping accepter thread

2015-04-16 00:43:10.198248 7f963a08c780 10 accepter.stop accepter

2015-04-16 00:43:10.198259 7f9628d94700 20 accepter.accepter poll got 1

2015-04-16 00:43:10.198265 7f9628d94700 20 accepter.accepter closing

2015-04-16 00:43:10.198276 7f9628d94700 10 accepter.accepter stopping

2015-04-16 00:43:10.198310 7f963a08c780 20 -- 10.100.128.13:6800/43939 wait: stopped accepter thread

2015-04-16 00:43:10.198315 7f963a08c780 20 -- 10.100.128.13:6800/43939 wait: stopping reaper thread

2015-04-16 00:43:10.198330 7f9634fd7700 10 -- 10.100.128.13:6800/43939 reaper_entry done

2015-04-16 00:43:10.198391 7f963a08c780 20 -- 10.100.128.13:6800/43939 wait: stopped reaper thread

2015-04-16 00:43:10.198399 7f963a08c780 10 -- 10.100.128.13:6800/43939 wait: closing pipes

2015-04-16 00:43:10.198401 7f963a08c780 10 -- 10.100.128.13:6800/43939 reaper

2015-04-16 00:43:10.198406 7f963a08c780 10 -- 10.100.128.13:6800/43939 reaper done

2015-04-16 00:43:10.198409 7f963a08c780 10 -- 10.100.128.13:6800/43939 wait: waiting for pipes  to close

2015-04-16 00:43:10.198411 7f963a08c780 10 -- 10.100.128.13:6800/43939 wait: done.

2015-04-16 00:43:10.198413 7f963a08c780  1 -- 10.100.128.13:6800/43939 shutdown complete.

2015-04-16 00:43:10.198416 7f963a08c780 10 -- 10.100.96.13:6830/43939 wait: waiting for dispatch queue

2015-04-16 00:43:10.198429 7f963a08c780 10 -- 10.100.96.13:6830/43939 wait: dispatch queue is stopped

2015-04-16 00:43:10.198433 7f963a08c780 20 -- 10.100.96.13:6830/43939 wait: stopping accepter thread

2015-04-16 00:43:10.198436 7f963a08c780 10 accepter.stop accepter

2015-04-16 00:43:10.198450 7f962558d700 20 accepter.accepter poll got 1

2015-04-16 00:43:10.198457 7f962558d700 20 accepter.accepter closing

2015-04-16 00:43:10.198465 7f962558d700 10 accepter.accepter stopping

2015-04-16 00:43:10.198495 7f963a08c780 20 -- 10.100.96.13:6830/43939 wait: stopped accepter thread

2015-04-16 00:43:10.198500 7f963a08c780 20 -- 10.100.96.13:6830/43939 wait: stopping reaper thread

2015-04-16 00:43:10.198517 7f96347d6700 10 -- 10.100.96.13:6830/43939 reaper_entry done

2015-04-16 00:43:10.198565 7f963a08c780 20 -- 10.100.96.13:6830/43939 wait: stopped reaper thread

2015-04-16 00:43:10.198578 7f963a08c780 10 -- 10.100.96.13:6830/43939 wait: closing pipes

2015-04-16 00:43:10.198581 7f963a08c780 10 -- 10.100.96.13:6830/43939 reaper

2015-04-16 00:43:10.198583 7f963a08c780 10 -- 10.100.96.13:6830/43939 reaper done

2015-04-16 00:43:10.198586 7f963a08c780 10 -- 10.100.96.13:6830/43939 wait: waiting for pipes  to close

2015-04-16 00:43:10.198588 7f963a08c780 10 -- 10.100.96.13:6830/43939 wait: done.

2015-04-16 00:43:10.198590 7f963a08c780  1 -- 10.100.96.13:6830/43939 shutdown complete.
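
Most of the lines above are logged at levels 10 and 20, which makes the few low-level messages hard to spot. A small filter along these lines might help cut through the noise. It is a sketch that assumes the column after the thread id is the message's log level, as the excerpt suggests; `PREFIX_RE`, `filter_by_level` and `sample` are illustrative names, not part of any Ceph tooling.

```python
import re

# Assumed line layout, inferred from the excerpt: "<date> <time> <thread-id> <level> <message>"
PREFIX_RE = re.compile(r"^(\S+ \S+) ([0-9a-f]+)\s+(-?\d+) (.*)$")

def filter_by_level(lines, max_level=1):
    """Keep only lines whose level column is <= max_level."""
    kept = []
    for line in lines:
        stripped = line.strip()
        m = PREFIX_RE.match(stripped)
        if m and int(m.group(3)) <= max_level:
            kept.append(stripped)
    return kept

# Three consecutive lines from the excerpt above, used as sample input.
sample = [
    "2015-04-16 00:43:10.197469 7f963a08c780 10 -- 10.100.96.13:6801/43939 wait: waiting for pipes  to close",
    "2015-04-16 00:43:10.197472 7f963a08c780 10 -- 10.100.96.13:6801/43939 wait: done.",
    "2015-04-16 00:43:10.197474 7f963a08c780  1 -- 10.100.96.13:6801/43939 shutdown complete.",
]

for line in filter_by_level(sample):
    print(line)
```

Run over the full log, this would leave only the level 0 and 1 messages (and any negative-level ones, if present), which is where a reason for the shutdown would most likely show up.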

 

 

Full OSD log below

https://drive.google.com/file/d/0B578d6cBmDPYQ1lCMUR2Y0tLNTA/view?usp=sharing

 

 

Regards,

Quenten Grasso

 

