Re: "rbd ls -l" hangs

Greg,

        Thanks for following up - I hope you had a GREAT vacation.

        I eventually deleted and re-added the rbd pool, which fixed the hanging problem but
left me with 114 stuck PGs.

        Sam suggested that I permanently remove the down OSDs, and after a few hours of
rebalancing everything is working fine :-)

(ceph auth del osd.x ; ceph osd crush rm osd.x ; ceph osd rm osd.x).
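
In case it helps anyone with several down OSDs, the three commands above can be
wrapped in a small shell function. This is only a sketch: the function name
remove_osds and the DRY_RUN variable are made up for illustration, and it
defaults to printing the commands rather than running them, so nothing changes
in the cluster unless DRY_RUN=0 is set explicitly.

```shell
#!/bin/sh
# Sketch only: remove_osds and DRY_RUN are illustrative names, not part of ceph.
# For each OSD id given, emit the three removal commands from this thread,
# in the same order. With DRY_RUN=1 (the default) the commands are printed;
# with DRY_RUN=0 they are executed and the function stops at the first failure.
remove_osds() {
    for id in "$@"; do
        for cmd in \
            "ceph auth del osd.$id" \
            "ceph osd crush rm osd.$id" \
            "ceph osd rm osd.$id"
        do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "$cmd"
            else
                # Unquoted on purpose so the string splits into command + args.
                $cmd || return 1
            fi
        done
    done
}
```

Running "remove_osds 7 8" (placeholder ids) prints the six commands for review;
"DRY_RUN=0 remove_osds 7 8" would actually run them. Only do this for OSDs that
are down for good, since the data they held has to be rebuilt from the
remaining replicas.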

Jeff

On Wed, Aug 14, 2013 at 01:54:16PM -0700, Gregory Farnum wrote:
> On Thu, Aug 1, 2013 at 9:57 AM, Jeff Moskow <jeff@xxxxxxx> wrote:
> > Greg,
> >
> >     Thanks for the hints.  I looked through the logs and found OSD's with
> > RETRY's.  I marked those "out" (marked in orange) and let ceph rebalance.
> > Then I ran the bench command.
> > I now have many more errors than before :-(.
> >
> > health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 151 pgs stuck
> > unclean
> >
> > Note that the incomplete pg is still the same (2.1f6).
> >
> > Any ideas on what to try next?
> >
> > 2013-08-01 12:39:38.349011 osd.4 172.16.170.2:6801/1778 1154 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 18.085318 sec at 57979 KB/sec
> > 2013-08-01 12:39:38.499002 osd.5 172.16.170.2:6802/19375 454 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 18.232358 sec at 57511 KB/sec
> > 2013-08-01 12:39:44.077347 osd.3 172.16.170.2:6800/1647 1211 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 23.813801 sec at 44032 KB/sec
> > 2013-08-01 12:39:49.118812 osd.16 172.16.170.4:6802/1837 746 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 28.453320 sec at 36852 KB/sec
> > 2013-08-01 12:39:48.468020 osd.15 172.16.170.4:6801/1699 821 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 27.802566 sec at 37715 KB/sec
> > 2013-08-01 12:39:54.369364 osd.0 172.16.170.1:6800/3783 948 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 34.076451 sec at 30771 KB/sec
> > 2013-08-01 12:39:48.618080 osd.14 172.16.170.4:6800/1572 16161 : [INF]
> > bench: wrote 1024 MB in blocks of 4096 KB in 27.952574 sec at 37512 KB/sec
> > 2013-08-01 12:39:54.382830 osd.2 172.16.170.1:6803/22033 222 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 34.090170 sec at 30758 KB/sec
> > 2013-08-01 12:40:03.458096 osd.6 172.16.170.3:6801/1738 1582 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 43.143180 sec at 24304 KB/sec
> > 2013-08-01 12:40:03.724504 osd.10 172.16.170.3:6800/1473 1238 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 43.409558 sec at 24155 KB/sec
> > 2013-08-01 12:40:02.426650 osd.8 172.16.170.3:6803/2013 8272 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 42.111713 sec at 24899 KB/sec
> > 2013-08-01 12:40:02.997093 osd.7 172.16.170.3:6802/1864 1094 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 42.682079 sec at 24567 KB/sec
> > 2013-08-01 12:40:02.867046 osd.9 172.16.170.3:6804/2149 2258 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 42.551771 sec at 24642 KB/sec
> > 2013-08-01 12:39:54.360014 osd.1 172.16.170.1:6801/4243 3060 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 34.070725 sec at 30776 KB/sec
> > 2013-08-01 12:42:56.984632 osd.11 172.16.170.5:6800/28025 43996 : [INF]
> > bench: wrote 1024 MB in blocks of 4096 KB in 216.687559 sec at 4839 KB/sec
> > 2013-08-01 12:43:21.271481 osd.13 172.16.170.5:6802/1872 1056 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 240.974360 sec at 4351 KB/sec
> > 2013-08-01 12:43:39.320462 osd.12 172.16.170.5:6801/1700 1348 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 259.023646 sec at 4048 KB/sec
> 
> Sorry for the slow reply; I've been out on vacation. :)
> Looking through this list, I'm noticing that many of your OSDs are
> reporting 4MB/s write speeds and they don't correspond to the ones you
> marked out (though if your cluster was somehow under load that could
> have something to do with the very different speed reports).
> 
> You still want to look at the pg statistics for the stuck PG; I'm not
> seeing that anywhere?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
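
A quick way to pick the slow OSDs out of bench output like the above is to
filter on the reported KB/sec figure. The sketch below assumes each bench
message is on a single line (the lines quoted in this mail were wrapped by the
mail client), and the function name slow_osds and the threshold argument are
illustrative, not anything ceph provides.

```shell
#!/bin/sh
# Sketch only: slow_osds is an illustrative helper, not a ceph command.
# Reads bench log lines on stdin and prints "osd.N rate" for every OSD whose
# reported write rate (KB/sec) is below the threshold given as $1.
# Assumes one unwrapped log line per bench result.
slow_osds() {
    awk -v limit="$1" '
        /bench: wrote/ {
            osd = ""; rate = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^osd\.[0-9]+$/) osd = $i       # e.g. osd.11
                if ($i == "KB/sec")       rate = $(i - 1) # number before the unit
            }
            if (osd != "" && rate != "" && rate + 0 < limit + 0)
                print osd, rate
        }'
}
```

For example, piping the cluster log through "slow_osds 10000" lists every OSD
that wrote at under 10000 KB/sec; the threshold is arbitrary and should be
chosen relative to what the healthy OSDs report.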

-- 
===============================================================================
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



