Greg,

Thanks for following up - I hope you had a GREAT vacation. I eventually
deleted and re-added the rbd pool, which fixed the hanging problem but left
me with 114 stuck PGs. Sam suggested that I permanently remove the down
OSDs, and after a few hours of rebalancing everything is working fine :-)
(ceph auth del osd.x ; ceph osd crush rm osd.x ; ceph osd rm osd.x).

Jeff
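
For anyone who finds this thread later, here is the removal sequence above
written out as a minimal sketch. The OSD ids below are placeholders, not
necessarily the OSDs that were actually down here - substitute whatever
"ceph osd tree" reports as down in your cluster:

    # Permanently remove OSDs that are already down, as described above.
    # The ids (11 12 13) are placeholders for illustration only.
    for id in 11 12 13; do
        ceph auth del osd.$id        # delete the OSD's cephx key
        ceph osd crush rm osd.$id    # remove it from the CRUSH map (starts rebalancing)
        ceph osd rm osd.$id          # remove the OSD from the cluster map
    done
    ceph -s                          # watch recovery progress until HEALTH_OK
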
On Wed, Aug 14, 2013 at 01:54:16PM -0700, Gregory Farnum wrote:
> On Thu, Aug 1, 2013 at 9:57 AM, Jeff Moskow <jeff@xxxxxxx> wrote:
> > Greg,
> >
> > Thanks for the hints. I looked through the logs and found OSDs with
> > RETRYs. I marked those "out" (marked in orange) and let ceph rebalance.
> > Then I ran the bench command.
> > I now have many more errors than before :-(.
> >
> >     health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 151 pgs stuck unclean
> >
> > Note that the incomplete pg is still the same (2.1f6).
> >
> > Any ideas on what to try next?
> >
> > 2013-08-01 12:39:38.349011 osd.4 172.16.170.2:6801/1778 1154 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 18.085318 sec at 57979 KB/sec
> > 2013-08-01 12:39:38.499002 osd.5 172.16.170.2:6802/19375 454 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 18.232358 sec at 57511 KB/sec
> > 2013-08-01 12:39:44.077347 osd.3 172.16.170.2:6800/1647 1211 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 23.813801 sec at 44032 KB/sec
> > 2013-08-01 12:39:49.118812 osd.16 172.16.170.4:6802/1837 746 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 28.453320 sec at 36852 KB/sec
> > 2013-08-01 12:39:48.468020 osd.15 172.16.170.4:6801/1699 821 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 27.802566 sec at 37715 KB/sec
> > 2013-08-01 12:39:54.369364 osd.0 172.16.170.1:6800/3783 948 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 34.076451 sec at 30771 KB/sec
> > 2013-08-01 12:39:48.618080 osd.14 172.16.170.4:6800/1572 16161 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 27.952574 sec at 37512 KB/sec
> > 2013-08-01 12:39:54.382830 osd.2 172.16.170.1:6803/22033 222 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 34.090170 sec at 30758 KB/sec
> > 2013-08-01 12:40:03.458096 osd.6 172.16.170.3:6801/1738 1582 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 43.143180 sec at 24304 KB/sec
> > 2013-08-01 12:40:03.724504 osd.10 172.16.170.3:6800/1473 1238 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 43.409558 sec at 24155 KB/sec
> > 2013-08-01 12:40:02.426650 osd.8 172.16.170.3:6803/2013 8272 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 42.111713 sec at 24899 KB/sec
> > 2013-08-01 12:40:02.997093 osd.7 172.16.170.3:6802/1864 1094 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 42.682079 sec at 24567 KB/sec
> > 2013-08-01 12:40:02.867046 osd.9 172.16.170.3:6804/2149 2258 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 42.551771 sec at 24642 KB/sec
> > 2013-08-01 12:39:54.360014 osd.1 172.16.170.1:6801/4243 3060 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 34.070725 sec at 30776 KB/sec
> > 2013-08-01 12:42:56.984632 osd.11 172.16.170.5:6800/28025 43996 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 216.687559 sec at 4839 KB/sec
> > 2013-08-01 12:43:21.271481 osd.13 172.16.170.5:6802/1872 1056 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 240.974360 sec at 4351 KB/sec
> > 2013-08-01 12:43:39.320462 osd.12 172.16.170.5:6801/1700 1348 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 259.023646 sec at 4048 KB/sec
>
> Sorry for the slow reply; I've been out on vacation. :)
> Looking through this list, I'm noticing that many of your OSDs are
> reporting 4 MB/s write speeds, and they don't correspond to the ones you
> marked out (though if your cluster was somehow under load, that could
> have something to do with the very different speed reports).
>
> You still want to look at the pg statistics for the stuck PG; I'm not
> seeing that anywhere?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

--
===============================================================================
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
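
For completeness, the PG statistics Greg asks about can be pulled with
commands along these lines (2.1f6 is the incomplete PG mentioned earlier in
the thread; this is just a sketch of the standard "ceph pg" subcommands, not
output from this cluster):

    ceph health detail           # lists which PGs are stuck/incomplete and why
    ceph pg dump_stuck unclean   # summary of PGs stuck unclean
    ceph pg map 2.1f6            # shows which OSDs the incomplete PG maps to
    ceph pg 2.1f6 query          # full peering and recovery state for that PG
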