Re: [EXTERNAL] Re: jewel blocked requests

"WRIGHT, JON R (JON R)" <jonrodwright@xxxxxxxxx> · Thu, 22 Sep 2016 15:57:04 -0400

Thanks for responding, and I apologize for taking so long to get back to
you.  All the osds report

ceph version 10.2.2

The osd with the bad disk is down and out of the cluster.  The disk may 
get replaced today.   I'm still getting blocked request messages, but at 
a significantly lower rate.  Most of the current messages are associated 
with two hosts.
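
One rough way to see which OSDs the remaining messages point at is to tally the per-OSD lines from `ceph health detail`. The sketch below assumes those lines look like `N ops are blocked > S sec on osd.X`; that format is an assumption, so adjust the regex if your output differs. The sample text is made up for illustration and is not output from this cluster.

```python
import re
from collections import Counter

# Assumed per-OSD line format in `ceph health detail`; adjust if yours differs.
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > ([\d.]+) sec on (osd\.\d+)")


def blocked_ops_per_osd(health_detail_text):
    """Sum the blocked-op counts reported for each OSD."""
    totals = Counter()
    for line in health_detail_text.splitlines():
        match = BLOCKED_RE.search(line)
        if match:
            totals[match.group(3)] += int(match.group(1))
    return totals


# Made-up sample input (hypothetical OSD ids and counts).
SAMPLE = """\
HEALTH_WARN 12 requests are blocked > 32 sec; 2 osds have slow requests
8 ops are blocked > 65.536 sec on osd.14
3 ops are blocked > 32.768 sec on osd.14
1 ops are blocked > 32.768 sec on osd.27
2 osds have slow requests
"""

if __name__ == "__main__":
    # Print OSDs with the most blocked ops first.
    for osd, count in blocked_ops_per_osd(SAMPLE).most_common():
        print(osd, count)
```

The OSD ids can then be matched to hosts with `ceph osd tree`.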

Jon

On 9/19/2016 7:45 PM, Will.Boege wrote:
Sorry, make that 'ceph tell osd.* version'
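
To spot any OSD still on an older release, the output of that command can be grouped by version. This is only a sketch: it assumes each OSD is introduced by a line starting with `osd.N:` and is followed by a string containing `ceph version X.Y.Z`. The sample text, OSD ids, and the `<sha>` placeholder are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed output shape of `ceph tell osd.* version`; adjust if yours differs.
OSD_RE = re.compile(r"^(osd\.\d+):")
VERSION_RE = re.compile(r"ceph version (\S+)")


def osds_by_version(tell_output):
    """Map each reported version string to the list of OSDs running it."""
    groups = defaultdict(list)
    current = None
    for raw in tell_output.splitlines():
        line = raw.strip()
        osd = OSD_RE.match(line)
        if osd:
            current = osd.group(1)
        version = VERSION_RE.search(line)
        if version and current is not None:
            groups[version.group(1)].append(current)
            current = None
    return dict(groups)


# Made-up sample input, not output from a real cluster.
SAMPLE_TELL = """\
osd.0: {
    "version": "ceph version 10.2.2 (<sha>)"
}
osd.1: {
    "version": "ceph version 0.94.7 (<sha>)"
}
osd.2: {
    "version": "ceph version 10.2.2 (<sha>)"
}
"""

if __name__ == "__main__":
    for version, osds in sorted(osds_by_version(SAMPLE_TELL).items()):
        print(version, ", ".join(osds))
```

More than one key in the result would mean the OSDs are not all on the same release.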

On Sep 19, 2016, at 2:55 PM, WRIGHT, JON R (JON R) <jonrodwright@xxxxxxxxx> wrote:

When you say client, we're actually doing everything through Openstack vms and cinder block devices.

librbd and librados are:

/usr/lib/librbd.so.1.0.0

/usr/lib/librados.so.2

But I think this problem may have been related to a disk going bad.  We got disk I/O errors over the weekend and are replacing a disk, and I think the blocked requests may have all been associated with PGs that included the bad OSD/disk.

Would this make sense?

Jon

On 9/15/2016 3:49 AM, Wido den Hollander wrote:

On 13 September 2016 at 18:54, "WRIGHT, JON R (JON R)" <jonrodwright@xxxxxxxxx> wrote:

VM Client OS: ubuntu 14.04

Openstack: kilo

libvirt: 1.2.12

nova-compute-kvm: 1:2015.1.4-0ubuntu2

What librados/librbd version are you running on the client?

Wido

Jon

On 9/13/2016 11:17 AM, Wido den Hollander wrote:

On 13 September 2016 at 15:58, "WRIGHT, JON R (JON R)" <jonrodwright@xxxxxxxxx> wrote:

Yes, I do have old clients running.  The clients are all vms.  Is it
typical that vm clients have to be rebuilt after a ceph upgrade?

No, not always, but it is just that I saw this happening recently after a Jewel upgrade.

What version are the client(s) still running?

Wido

Thanks,

Jon

On 9/12/2016 4:05 PM, Wido den Hollander wrote:
On 12 September 2016 at 18:47, "WRIGHT, JON R (JON R)" <jonrodwright@xxxxxxxxx> wrote:

Since upgrading to Jewel from Hammer, we've started to see HEALTH_WARN
because of 'blocked requests > 32 sec'.   Seems to be related to writes.

Has anyone else seen this?  Or can anyone suggest what the problem might be?

Do you by any chance have old clients connecting? I saw this after a Jewel upgrade as well and it was because of very old clients still connecting to the cluster.

Wido

Thanks!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com