There are a lot of next steps on
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
You probably want to look at the bits about using the admin socket and
diagnosing slow requests. :)

-Greg

On Sun, Feb 8, 2015 at 8:48 PM, Matthew Monaco <matt@xxxxxxxxx> wrote:
> Hello!
>
> *** Shameless plug: Sage, I'm working with Dirk Grunwald on this cluster; I
> believe some of the members of your thesis committee were students of his =)
>
> We have a modest cluster at CU Boulder and are frequently plagued by
> "requests are blocked" issues. I'd greatly appreciate any insight or
> pointers. The issue is not specific to any one OSD; I'm pretty sure they've
> all shown up in ceph health detail at this point.
>
> We have 8 identical nodes:
>
> - 5 * 1TB Seagate enterprise SAS drives
>   - btrfs
> - 1 * Intel 480G S3500 SSD
>   - with 5*16G partitions as journals
>   - also hosting the OS, unfortunately
> - 64G RAM
> - 2 * Xeon E5-2630 v2
>   - so 24 hyperthreads @ 2.60 GHz
> - 10G-ish IPoIB for networking
>
> So the cluster has 40TB over 40 OSDs total with a very straightforward
> crushmap. These nodes are also (unfortunately, for the time being) OpenStack
> compute nodes, and 99% of the usage is OpenStack volumes/images. I see a lot
> of kernel messages like:
>
> ib_mthca 0000:02:00.0: Async event 16 for bogus QP 00dc0408
>
> which may or may not be correlated with the Ceph hangs.
>
> Other info: we have 3 mons on 3 of the 8 nodes listed above. The OpenStack
> volumes pool has 4096 PGs and is sized 3. This is probably too many PGs, but
> came from an initial misunderstanding of the formula in the documentation.
>
> Thanks,
> Matt
>
> PS - I'm trying to secure funds to get an additional 8 nodes, with a little
> less RAM and CPU, to move the OSDs to, with dual 10G Ethernet and a SATA DOM
> for the OS so the SSD will be strictly journal. I may even be able to get an
> additional SSD or two per node to use for caching, or simply to set a higher
> primary affinity.
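To make the admin-socket pointer above a bit more concrete: below is a rough,
untested sketch of the kind of thing I mean. It has to run on the node that
hosts the OSD (the admin socket is local-only), it assumes the ceph CLI and
the default "ceph daemon osd.<id> ..." socket lookup, and the 30-second
threshold just mirrors the default "osd op complaint time" behind the
"requests are blocked" warnings. The JSON field names returned by
dump_historic_ops / dump_ops_in_flight vary a bit between releases, so treat
the parsing as illustrative.

  #!/usr/bin/env python
  # Rough sketch: summarize slow/historic ops on one local OSD via the
  # admin socket.  Assumes the ceph CLI is installed and that
  # "ceph daemon osd.<id> <cmd>" can find the OSD's socket on this host.
  import json
  import subprocess
  import sys

  def admin_socket(osd_id, command):
      # Equivalent to running: ceph daemon osd.<id> <command>
      out = subprocess.check_output(
          ["ceph", "daemon", "osd.%d" % osd_id, command])
      return json.loads(out.decode("utf-8"))

  def report_slow(osd_id, threshold=30.0):
      # 30s matches the default "osd op complaint time"; adjust to taste.
      historic = admin_socket(osd_id, "dump_historic_ops")
      for op in historic.get("ops", []):
          # Field names differ by release; fall back from duration to age.
          duration = float(op.get("duration", op.get("age", 0)))
          if duration >= threshold:
              print("osd.%d  %7.1fs  %s" % (osd_id, duration,
                                            op.get("description", "?")))
      in_flight = admin_socket(osd_id, "dump_ops_in_flight")
      n = in_flight.get("num_ops", len(in_flight.get("ops", [])))
      print("osd.%d currently has %d op(s) in flight" % (osd_id, n))

  if __name__ == "__main__":
      report_slow(int(sys.argv[1]))   # e.g.: python slow_ops.py 12

On the PG count quoted above: the rule of thumb in the docs is roughly
(OSD count * 100) / replica count, rounded up to the next power of two, so
40 OSDs at size 3 works out to about 1333, i.e. 2048 rather than 4096. With
4096 PGs at size 3 each OSD ends up holding around 4096 * 3 / 40 = ~307 PG
copies, well above the ~100-per-OSD target; that adds peering and memory
overhead, though by itself it usually isn't what blocks requests.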