On Mon, Dec 22, 2014 at 2:57 PM, Sean Sullivan <seapasulli@xxxxxxxxxxxx> wrote:
Thanks Craig!
I think that this may very well be my issue with the OSDs dropping out, but I am still not certain, since I had the cluster up for a while running rados bench for a few days without any status changes.
Mine were fine for a while too, through several benchmarks and a large RadosGW import. My problems were memory pressure plus an XFS bug, so it took a while to manifest. When it did, all of the ceph-osd processes on that node would have periods of ~30 seconds with 100% CPU. Some OSDs would get kicked out. Once that started, it was a downward spiral of recovery causing increasing load causing more OSDs to get kicked out...
Once I found the memory problem, I cronned a buffer flush, and that usually kept things from getting too bad.
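For what it's worth, the cron entry is nothing fancy; something along these lines (the interval and which caches get dropped here are just an example, not exactly what I run):

    # /etc/cron.d/drop-caches -- periodically flush the page cache plus the
    # dentry/inode slab caches to relieve memory pressure on the OSD nodes.
    # Every 15 minutes is arbitrary; tune it to how quickly memory fills up.
    */15 * * * * root /bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches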
I was able to see on the CPU graphs that CPU was increasing before the problems started. Once CPU got close to 100% usage on all cores, that's when the OSDs started dropping out. Hard to say if it was the CPU itself, or if the CPU was just a symptom of the memory pressure plus XFS bug.
The really big issue I have right now is the radosgw one. Once I figure out the root cause of the slow radosgw performance and correct it, that should hopefully buy me enough time to figure out the slow OSD issue.
It just doesn't make sense that I am getting 8 Mbps per client whether I run 1 client or 60, while rbd and rados benchmarks shoot well above 600 MB/s (above 1000 MB/s as well).
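For reference, the rados side of those numbers comes from plain rados bench runs, something like this (pool name, run length, and thread count are just examples):

    # 60-second write benchmark against a throwaway pool, 32 concurrent ops
    rados bench -p testpool 60 write -t 32 --no-cleanup
    # sequential read of the objects written above
    rados bench -p testpool 60 seq -t 32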
That is strange. I was able to get >300 Mbps per client on a 3-node cluster with GigE. I expected that each client would saturate the GigE on its own, but 300 Mbps is more than enough for now.
I am using Ceph's patched Apache and FastCGI module, but otherwise it's a pretty standard Apache setup. My RadosGW processes are using a fair amount of CPU, but as long as you have some idle CPU, that shouldn't be the bottleneck.
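The vhost is more or less the stock one from the radosgw install docs, roughly like this (hostname, socket path, and docroot are placeholders for my real values):

    FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

    <VirtualHost *:80>
        ServerName rgw.example.com
        DocumentRoot /var/www
        RewriteEngine On
        RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
        <IfModule mod_fastcgi.c>
            <Directory /var/www>
                Options +ExecCGI
                AllowOverride All
                SetHandler fastcgi-script
                Order allow,deny
                Allow from all
                AuthBasicAuthoritative Off
            </Directory>
        </IfModule>
        AllowEncodedSlashes On
        ServerSignature Off
    </VirtualHost>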
May I ask how you are monitoring your cluster's logs? Are you just using rsyslog, or do you have a Logstash-type system set up? Load-wise, I do not see a spike until I pull an OSD out of the cluster, or stop and then start an OSD without marking nodown.
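(By "marking nodown" I mean setting the cluster flags around the restart, roughly like this; the OSD id and the restart command are just examples and depend on the init system:)

    ceph osd set nodown           # don't mark OSDs down while I bounce this one
    ceph osd set noout            # don't trigger rebalancing either
    service ceph restart osd.12   # example id; "restart ceph-osd id=12" on Upstart boxes
    ceph osd unset nodown
    ceph osd unset noout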
I'm monitoring the cluster with Zabbix, and that gives me pretty much the same info that I'd get from the logs. I am planning to start pushing the logs to Logstash soon, once my Logstash setup is able to handle the extra load.
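Most of the Zabbix side is just agent items wrapping the ceph CLI. A trivial sketch of the kind of UserParameter I mean (key names are made up, not from my actual template, and the agent needs a keyring it can read):

    # zabbix_agentd.conf on a monitor node
    UserParameter=ceph.health,ceph health 2>/dev/null | awk '{print $1}'
    UserParameter=ceph.osd.down,ceph osd dump 2>/dev/null | grep -c ' down '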
I do think that CPU is probably the cause of the slow OSD issue, though, as it makes the most logical sense. Did you end up dropping Ceph and moving to ZFS, or did you stick with it and try to mitigate the problem via the file flusher and other tweaks?
I'm still on Ceph. I worked around the memory pressure by reformatting my XFS filesystems to use regular-sized inodes. It was a rough couple of months, but everything has been stable for the last two months.
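Concretely, that meant re-running mkfs on each OSD data disk with the default inode size instead of the 2 KB inodes that were commonly recommended at the time; roughly this, done one OSD at a time while the cluster backfilled (device is a placeholder):

    # the large-inode format I was moving away from:
    #   mkfs.xfs -f -i size=2048 /dev/sdX1
    # the regular default-inode format I reformatted to:
    mkfs.xfs -f /dev/sdX1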
I do still want to use ZFS on my OSDs. It's got all the features of BtrFS, with the extra feature of being production ready. It's just not production ready in Ceph yet. It's coming along nicely though, and I hope to reformat one node to be all ZFS sometime next year.