On 02/09/2015 08:20 AM, Gregory Farnum wrote:
> There are a lot of next steps on
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
>
> You probably want to look at the bits about using the admin socket, and
> diagnosing slow requests. :)
> -Greg

Yeah, I've been through most of that. It's still been difficult to pinpoint
what's causing the blocking.

Can I get some clarification on this comment from the docs:

> Ceph acknowledges writes after journaling, so fast SSDs are an attractive
> option to accelerate the response time -- particularly when using the ext4
> or XFS filesystems. By contrast, the btrfs filesystem can write and journal
> simultaneously.

Does this mean btrfs doesn't need a separate journal partition/block device?
I.e., is what ceph-disk does when creating an OSD with --fs-type btrfs (a 5G
journal partition, with the rest of the disk as a btrfs partition) entirely
non-optimal?

I just don't get the "by contrast": if the OSD is btrfs on a rotational disk,
why doesn't putting the journal on an SSD help (or help as much?), given that
writes are acknowledged after journaling?

> On Sun, Feb 8, 2015 at 8:48 PM, Matthew Monaco <matt@xxxxxxxxx> wrote:
>> Hello!
>>
>> *** Shameless plug: Sage, I'm working with Dirk Grunwald on this cluster;
>> I believe some of the members of your thesis committee were students of
>> his =)
>>
>> We have a modest cluster at CU Boulder and are frequently plagued by
>> "requests are blocked" issues. I'd greatly appreciate any insight or
>> pointers. The issue is not specific to any one OSD; I'm pretty sure
>> they've all shown up in "ceph health detail" at this point.
>>
>> We have 8 identical nodes:
>>
>> - 5 x 1TB Seagate enterprise SAS drives (btrfs)
>> - 1 x Intel 480G S3500 SSD
>>   - with 5 x 16G partitions as journals
>>   - also hosting the OS, unfortunately
>> - 64G RAM
>> - 2 x Xeon E5-2630 v2, so 24 hyperthreads @ 2.60 GHz
>> - 10G-ish IPoIB for networking
>>
>> So the cluster has 40TB over 40 OSDs total, with a very straightforward
>> crushmap. These nodes are also (unfortunately, for the time being)
>> OpenStack compute nodes, and 99% of the usage is OpenStack volumes/images.
>> I see a lot of kernel messages like:
>>
>>   ib_mthca 0000:02:00.0: Async event 16 for bogus QP 00dc0408
>>
>> which may or may not be correlated with the Ceph hangs.
>>
>> Other info: we have 3 mons on 3 of the 8 nodes listed above. The OpenStack
>> volumes pool has 4096 PGs and is sized 3. This is probably too many PGs,
>> but it came from an initial misunderstanding of the formula in the
>> documentation.
>>
>> Thanks,
>> Matt
>>
>> PS - I'm trying to secure funds to get an additional 8 nodes, with a
>> little less RAM and CPU, to move the OSDs to, with dual 10G Ethernet and a
>> SATA DOM for the OS so the SSD will be strictly journal. I may even be
>> able to get an additional SSD or two per node to use for caching, or
>> simply to set a higher primary affinity.
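
A few concrete follow-ups on the above. First, when requests show up as
blocked, the admin-socket poking I've been doing looks roughly like this
(the OSD id is just an example; these have to be run on the node hosting
that OSD):

    # ops currently stuck in this OSD, and how long they've been waiting
    ceph daemon osd.12 dump_ops_in_flight

    # recently completed slow ops, with timestamps for each stage the op
    # went through (reached_pg, waiting for subops, commit, done, ...)
    ceph daemon osd.12 dump_historic_ops

    # journal/filestore latency counters for the same OSD
    ceph daemon osd.12 perf dump

So far nothing in that output has pointed me at a single culprit.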
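
To be concrete about the btrfs/journal question: what I'm comparing is
essentially these two ways of preparing an OSD (device names purely
hypothetical; the first is what I understand ceph-disk does by default, the
second is roughly what we do now):

    # journal partition + btrfs data partition, both on the same rotational disk
    ceph-disk prepare --fs-type btrfs /dev/sdb

    # btrfs data on the spinner, journal on a pre-made SSD partition
    ceph-disk prepare --fs-type btrfs /dev/sdb /dev/sda5

Is the second form pointless (or just less useful) when the data filesystem
is btrfs?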
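
Also, on the PG count: if I'm now reading the rule of thumb in the docs
correctly, we should have ended up around

    (40 OSDs x 100) / 3 replicas ~= 1333  ->  rounded up to a power of two = 2048

rather than the 4096 we actually created for the volumes pool.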
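
Finally, by "set a higher primary affinity" in the PS I mean something along
these lines, so the SSD-backed OSDs would serve most reads (OSD ids and
weights purely illustrative, and I believe the mons have to be configured to
allow it first):

    # in ceph.conf on the mons, if I recall correctly:
    #   mon osd allow primary affinity = true

    ceph osd primary-affinity osd.40 1.0    # SSD-backed OSD: preferred as primary
    ceph osd primary-affinity osd.3  0.25   # HDD-backed OSD: rarely primary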