Hello!

On Mon, Oct 05, 2015 at 09:35:26PM -0600, robert wrote:

> With some off-list help, we have adjusted osd_client_message_cap=10000.
> This seems to have helped a bit and we have seen some OSDs have a value
> up to 4,000 for client messages. But it does not solve the problem with
> the blocked I/O.

> One thing that I have noticed is that almost exactly 30 seconds elapse
> between when an OSD boots and the first blocked I/O message. I don't know
> if the OSD doesn't have time to get its brain right about a PG before it
> starts servicing it or what exactly.

I have problems like yours in my cluster. All of them can be fixed by
restarting some OSDs, but I cannot keep restarting my OSDs from time to
time.

The problem occurs when a client is writing to an RBD volume or when a
volume is recovering. A typical message (this one was logged during
recovery) is:

[WRN] slow request 30.929654 seconds old, received at 2015-10-06
13:00:41.412329: osd_op(client.1068613.0:192715
rbd_data.dc7650539e6a.0000000000000820 [set-alloc-hint object_size 4194304
write_size 4194304,write 3371008~4096] 5.d66fd55d snapc c=[c]
ack+ondisk+write+known_if_redirected e4009) currently waiting for subops
from 51

Restarting osd.51 in such a scenario fixes the problem. There are no slow
requests while I/O is low; they only appear when I do something like
uploading an image.

Some time ago I had too many OSDs that had been created but were never
used. Back then, when going down for a restart, those OSDs did not inform
the mon about it. Removing the unused OSD entries fixed that issue, but
when I run "ceph osd crush dump" I can still see them. Maybe that is the
root of the problem? I tried the getcrushmap/edit/setcrushmap workflow
(sketched below), but the entries are still in place.

Maybe my experience will help you find the answer. I hope it will fix my
problems too :)

--
WBR, Max A. Krasilnikov
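
For reference, a minimal sketch of the cleanup steps discussed above,
assuming a Hammer-era (0.94.x) ceph CLI; osd.12 is a placeholder ID for a
stale, never-used OSD entry, not an ID from either cluster:

# Raise the message cap on all running OSDs without a restart (the same
# option robert adjusted, osd_client_message_cap=10000).
ceph tell osd.* injectargs '--osd_client_message_cap 10000'

# Look for leftover entries in the crush map.
ceph osd crush dump

# Remove a stale OSD completely: crush entry, auth key, and osd map entry.
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12

# Or edit the crush map by hand (the getcrushmap/edit/setcrushmap workflow
# mentioned above).
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... delete the unused device/bucket entries in crushmap.txt ...
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

Note that "ceph osd crush remove" (or editing the decompiled map) only
touches the crush map; the "ceph auth del" and "ceph osd rm" steps are what
remove the stale ID from the auth database and the osd map as well.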