Hi,

we're currently expanding our cluster to increase the number of IOPS we can provide to clients. We're still on Hammer but in the process of upgrading to Jewel.

Over the last few days we started adding pure-SSD OSDs (based on the MICRON S610DC-3840), and the slow requests we've seen in the past have started to show a different pattern. I'm currently seeing these:

2016-12-05 15:13:37.527469 osd.60 172.22.4.46:6818/19894 8080 : cluster [WRN] 5 slow requests, 1 included below; oldest blocked for > 31.675358 secs
2016-12-05 15:13:37.527478 osd.60 172.22.4.46:6818/19894 8081 : cluster [WRN] slow request 31.674886 seconds old, received at 2016-12-05 15:13:05.852525: osd_op(client.518589944.0:2734750 rbd_data.1e2b40f879e2a9e3.00000000000000a2 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1892352~4096] 277.ceaf1c22 ack+ondisk+write+known_if_redirected e1107736) currently waiting for rw locks

Since slow requests are something that happens to us a lot, I'm willing to invest some time to understand them in more depth. I'd be happy to either write an open source tool to help interpret and diagnose them, or at least write a blog post. Neither the documentation nor Google says much about how to interpret these messages.

So, two questions:

- Any hints (besides meticulously reading the source) on interpreting these slow request messages in detail?
- Specifically, "waiting for rw locks" is new to us. Can someone enlighten me on what it means, given the message above?

Cheers,
Christian

--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Germany
Commercial register: HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian Zagrodnick
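As a starting point for the kind of interpretation tool mentioned above, here is a minimal Python sketch that splits a single "slow request" line into its fields and counts how often each "currently ..." state occurs. It is a sketch under stated assumptions, not a reference parser: the regular expression is derived only from the one Hammer-era sample line quoted in the message, so other Ceph releases or op types may print the line differently. Reading `client.518589944.0:2734750` as entity name, incarnation and per-client transaction id, `277.ceaf1c22` as the placement group id printed as pool plus hash, and `e1107736` as the OSD map epoch reflects how these fields are commonly printed, but should be treated as assumptions. All names (`SlowRequest`, `parse_slow_request`, `count_states`, `SLOW_REQUEST_RE`, `SAMPLE`) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical pattern, derived only from the sample line in this thread.
SLOW_REQUEST_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<entity>\S+)\.(?P<inc>\d+):(?P<tid>\d+) "  # assumed name.inc:tid
    r"(?P<object>\S+) "
    r"\[(?P<ops>[^\]]*)\] "
    r"(?P<pgid>\S+) "                                          # assumed pool.hash
    r"(?P<flags>\S+) "
    r"e(?P<epoch>\d+)\) "                                      # assumed OSD map epoch
    r"currently (?P<state>.+)$"
)
# The reporting daemon appears before the message, e.g. "osd.60 ".
OSD_RE = re.compile(r"\b(osd\.\d+) ")


@dataclass
class SlowRequest:
    reporting_osd: Optional[str]
    age_s: float
    received_at: str
    client: str
    incarnation: int
    tid: int
    object_name: str
    ops: list[str]
    pgid: str
    flags: list[str]
    osdmap_epoch: int
    state: str


def parse_slow_request(line: str) -> Optional[SlowRequest]:
    """Return the parsed fields, or None if the line is not a per-op slow request."""
    m = SLOW_REQUEST_RE.search(line.strip())
    if m is None:
        return None
    osd = OSD_RE.search(line)
    return SlowRequest(
        reporting_osd=osd.group(1) if osd else None,
        age_s=float(m["age"]),
        received_at=m["received"],
        client=m["entity"],
        incarnation=int(m["inc"]),
        tid=int(m["tid"]),
        object_name=m["object"],
        ops=m["ops"].split(","),      # sub-ops are comma-separated inside [...]
        pgid=m["pgid"],
        flags=m["flags"].split("+"),  # e.g. ack, ondisk, write, ...
        osdmap_epoch=int(m["epoch"]),
        state=m["state"],
    )


def count_states(lines: Iterable[str]) -> Counter:
    """Count the 'currently ...' states over all lines that parse."""
    parsed = (parse_slow_request(line) for line in lines)
    return Counter(r.state for r in parsed if r is not None)


SAMPLE = (
    "2016-12-05 15:13:37.527478 osd.60 172.22.4.46:6818/19894 8081 : cluster [WRN] "
    "slow request 31.674886 seconds old, received at 2016-12-05 15:13:05.852525: "
    "osd_op(client.518589944.0:2734750 rbd_data.1e2b40f879e2a9e3.00000000000000a2 "
    "[stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1892352~4096] "
    "277.ceaf1c22 ack+ondisk+write+known_if_redirected e1107736) currently waiting for rw locks"
)

if __name__ == "__main__":
    print(parse_slow_request(SAMPLE))
    print(count_states([SAMPLE]))
```

The summary line ("5 slow requests, 1 included below") deliberately does not match and yields None. One possible use, offered only as an idea: grouping parsed records by `object_name` and `state` over a longer log could show whether several slow ops pile up on the same RBD object, which may help narrow down cases like the "waiting for rw locks" one. The sketch itself does not explain what any state means.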