Hello,

Nice cluster, I wouldn't mind getting my hands on her ample nacelles,
er, wrong movie. ^o^

On Thu, 18 Dec 2014 21:35:36 -0600 Sean Sullivan wrote:

> Hello Yall!
>
> I can't figure out why my gateways are performing so poorly and I am not
> sure where to start looking. My RBD mounts seem to be performing fine
> (over 300 MB/s)
>
I wouldn't call 300MB/s writes fine with a cluster of this size. How are
you testing this (which tool, settings, from where)? Something like
rados bench or fio would give us numbers we can actually compare.

> while uploading a 5G file to Swift/S3 takes 2m32s
> (32MBps i believe). If we try a 1G file it's closer to 8MBps. Testing
> with nuttcp shows that I can transfer from a client with 10G interface
> to any node on the ceph cluster at the full 10G and ceph can transfer
> close to 20G between itself. I am not really sure where to start looking
> as outside of another issue which I will mention below I am clueless.
>
I know nuttin about radosgw, but I wouldn't be surprised if the
difference you see here is based on how that is eventually written to
the storage (smaller chunks than what you're using to test RBD
performance).

> I have a weird setup

I'm always interested in monster storage nodes, care to share what case
this is?

> [osd nodes]
> 60 x 4TB 7200 RPM SATA Drives

What maker/model?

> 12 x 400GB s3700 SSD drives

Journals, one assumes.

> 3 x SAS2308 PCI-Express Fusion-MPT cards (drives are split evenly across
> the 3 cards)

I smell a port-expander or 3 on your backplane. And while making sure
that your SSDs get undivided 6Gb/s love would probably help, you still
have plenty of bandwidth here (4.5Gb/s per drive), so no real issue.

> 512 GB of RAM

Sufficient.

> 2 x CPU E5-2670 v2 @ 2.50GHz

Vastly, and I mean VASTLY insufficient. 2 x 10 cores at 2.5GHz gives you
50GHz for 60 OSDs, which is still 10GHz short of the (optimistic IMHO)
recommendation of 1GHz per OSD w/o SSD journals. With SSD journals my
experience shows that with certain write patterns even 3.5GHz per OSD
isn't sufficient. (there are several threads about this here)

> 2 x 10G interfaces LACP bonded for cluster traffic
> 2 x 10G interfaces LACP bonded for public traffic (so a total of 4 10G
> ports)
>
Your journals could handle 5.5GB/s, so you're limiting yourself here a
bit, but not too horribly.

If I had been given this hardware, I would have RAIDed things (different
controller) to keep the number of OSDs per node to something the CPUs
(any CPU really!) can handle. Something like 16 x 4HDD RAID10 + SSDs +
spares (if possible) for performance, or 8 x 8HDD RAID6 + SSDs + spares
for capacity. That still gives you 336 or 168 OSDs, allows for a
replication size of 2 and as a bonus you'll probably never have to deal
with a failed OSD. ^o^

> [monitor nodes and gateway nodes]
> 4 x 300G 1500RPM SAS drives in raid 10

I would have used Intel DC S3700s here as well, mons love their leveldb
to be fast, but

> 1 x SAS 2208

combined with this it should be fine.

> 64G of RAM
> 2 x CPU E5-2630 v2
> 2 x 10G interfaces LACP bonded for public traffic (total of 2 10G ports)
>
>
> Here is a pastebin dump of my details, I am running ceph giant 0.87
> (c51c8f9d80fa4e0168aa52685b8de40e42758578) and kernel 3.13.0-40-generic
> across the entire cluster.
>
> http://pastebin.com/XQ7USGUz -- ceph health detail

That looks positively scary, blocked requests for hours...
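
If it were my cluster, the first thing I'd want to know is which OSDs
those blocked requests pile up on and what they are stuck waiting for.
A rough sketch of what I'd run (from memory, so double-check; osd.42 is
just a placeholder, pick a real ID from the health output and run the
daemon command on the node hosting that OSD):

  # which OSDs are being reported with slow/blocked requests
  ceph health detail | grep -i blocked
  # per-OSD commit/apply latency, to spot the stragglers
  ceph osd perf
  # on the node hosting a suspect OSD, dump its slowest recent ops
  # via the admin socket
  ceph daemon osd.42 dump_historic_ops

The historic ops output usually gives a decent hint whether an OSD is
waiting on its journal, the filestore or a peer OSD.
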
> http://pastebin.com/8DCzrnq1 -- /etc/ceph/ceph.conf
> http://pastebin.com/BC3gzWhT -- ceph osd tree

scroll, scroll, woah! ^o^

> http://pastebin.com/eRyY4H4c -- /var/log/radosgw/client.radosgw.rgw03.log
> http://paste.ubuntu.com/9565385/ -- crushmap (pastebin wouldn't let me)
>
>
> We ran into a few issues with density (conntrack limits, pid limit, and
> number of open files) all of which I adjusted by bumping the ulimits in
> /etc/security/limits.d/ceph.conf or sysctl. I am no longer seeing any
> signs of these limits being hit so I have not included my limits or
> sysctl conf. If you like this as well let me know and I can include it.
>
> One of the issues I am seeing is that OSDs have started to flop/ be
> marked as slow. The cluster was HEALTH_OK with all of the disks added
> for over 3 weeks before this behaviour started.

Anything changed? In particular I assume this is a new cluster, has much
data been added?
A "ceph -s" output would be nice and educational.

Can you correlate the time when you start seeing slow, blocked requests
with scrubs or deep-scrubs?
If so, try setting your cluster temporarily to noscrub and nodeep-scrub
and see if that helps (exact commands in the PS at the bottom of this
mail). In case it does, setting "osd_scrub_sleep" (start with something
high like 1.0 or 0.5 and lower it until it hurts again) should help
permanently. I have a cluster that could scrub things in minutes until
the amount of objects/data and steady load reached a threshold and now
it's hours.
In this context, check the fragmentation of your OSDs (also covered in
the PS).

How busy (ceph.log ops/s) is your cluster at these times?

> RBD transfers seem to be
> fine for the most part which makes me think that this has little baring
> on the gateway issue but it may be related. Rebooting the OSD seems to
> fix this issue.
>
Do you see the same OSDs misbehaving over and over again or is this
fully random?

How busy are your storage nodes? CPU-wise mostly; atop is a nice tool to
check this.
My guess w/o further data at this point would be that you're running out
of CPU at certain times, explaining your flopping OSDs.

Regards,

Christian

> I would like to figure out the root cause of both of these issues and
> post the results back here if possible (perhaps it can help other
> people). I am really looking for a place to start looking at as the
> gateway just outputs that it is posting data and all of the logs
> (outside of the monitors reporting down osds) seem to show a fully
> functioning cluster.
>
> Please help. I am in the #ceph room on OFTC every day as 'seapasulli' as
> well.

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
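
PS: To save you some digging, these are roughly the commands I had in
mind for the scrub and fragmentation checks above. From memory, so
double-check them; /dev/sdb1 is just an example OSD data partition and
the fragmentation check assumes your OSDs sit on XFS:

  # temporarily disable (deep-)scrubbing cluster-wide
  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # watch the slow requests for a while, then re-enable with
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub

  # if that helped, throttle scrubbing instead of disabling it
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.5'
  # and add "osd scrub sleep = 0.5" to the [osd] section of ceph.conf
  # so it survives restarts

  # XFS fragmentation of one OSD's data partition
  xfs_db -r -c frag /dev/sdb1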