On Mon, Dec 22, 2014 at 2:57 PM, Sean Sullivan <seapasulli@xxxxxxxxxxxx> wrote:
Thanks Craig!
I think that this may very well be my issue with the OSDs dropping out, but I am still not certain, since I had the cluster up for a while running rados bench for a few days without any status changes.
Mine were fine for a while too, through several benchmarks and a large RadosGW import. My problems were memory pressure plus an XFS bug, so it took a while to manifest. When it did, all of the ceph-osd processes on that node would have periods of ~30 seconds with 100% CPU. Some OSDs would get kicked out. Once that started, it was a downward spiral of recovery causing increasing load causing more OSDs to get kicked out...
Once I found the memory problem, I cronned a buffer flush, and that usually kept things from getting too bad.
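For what it's worth, the cron entry is nothing fancy; something along these lines (the interval and which caches get dropped here are just an example, not exactly what I run):

    # /etc/cron.d/drop-caches -- periodically flush the page cache plus the
    # dentry/inode slab caches to relieve memory pressure on the OSD nodes.
    # Every 15 minutes is arbitrary; tune it to how quickly memory fills up.
    */15 * * * * root /bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches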
I was able to see on the CPU graphs that CPU was increasing before the problems started. Once CPU got close to 100% usage on all cores, that's when the OSDs started dropping out. Hard to say if it was the CPU itself, or if the CPU was just a symptom of the memory pressure plus XFS bug.
The really big issue I have right now is the radosgw one. Once I figure out the root cause of the slow radosgw performance and correct it, that should hopefully buy me enough time to figure out the slow OSD issue.
It just doesn't make sense that I am getting 8 Mbps per client whether I run 1 client or 60, while rbd and rados benchmarks shoot well above 600 MB/s (above 1000 MB/s as well).
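For reference, the rados side of those numbers comes from plain rados bench runs, something like this (pool name, run length, and thread count are just examples):

    # 60-second write benchmark against a throwaway pool, 32 concurrent ops
    rados bench -p testpool 60 write -t 32 --no-cleanup
    # sequential read of the objects written above
    rados bench -p testpool 60 seq -t 32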
That is strange. I was able to get >300 Mbps per client on a 3-node cluster with GigE. I expected that each client would saturate the GigE on its own, but 300 Mbps is more than enough for now.
I am using Ceph's patched Apache and FastCGI module, but otherwise it's a pretty standard Apache setup. My RadosGW processes are using a fair amount of CPU, but as long as you have some idle CPU, that shouldn't be the bottleneck.
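The vhost is more or less the stock one from the radosgw install docs, roughly like this (hostname, socket path, and docroot are placeholders for my real values):

    FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

    <VirtualHost *:80>
        ServerName rgw.example.com
        DocumentRoot /var/www
        RewriteEngine On
        RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
        <IfModule mod_fastcgi.c>
            <Directory /var/www>
                Options +ExecCGI
                AllowOverride All
                SetHandler fastcgi-script
                Order allow,deny
                Allow from all
                AuthBasicAuthoritative Off
            </Directory>
        </IfModule>
        AllowEncodedSlashes On
        ServerSignature Off
    </VirtualHost>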
May I ask how you are monitoring your cluster's logs? Are you just using rsyslog, or do you have a Logstash-type system set up? Load-wise, I do not see a spike until I pull an OSD out of the cluster, or stop and then start an OSD without marking nodown.
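(By "marking nodown" I mean setting the cluster flags around the restart, roughly like this; the OSD id and the restart command are just examples and depend on the init system:)

    ceph osd set nodown           # don't mark OSDs down while I bounce this one
    ceph osd set noout            # don't trigger rebalancing either
    service ceph restart osd.12   # example id; "restart ceph-osd id=12" on Upstart boxes
    ceph osd unset nodown
    ceph osd unset noout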
I'm monitoring the cluster with Zabbix, and that gives me pretty much the same info that I'd get from the logs. I am planning to start pushing the logs to Logstash soon, once my Logstash setup is able to handle the extra load.
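Most of the Zabbix side is just agent items wrapping the ceph CLI. A trivial sketch of the kind of UserParameter I mean (key names are made up, not from my actual template, and the agent needs a keyring it can read):

    # zabbix_agentd.conf on a monitor node
    UserParameter=ceph.health,ceph health 2>/dev/null | awk '{print $1}'
    UserParameter=ceph.osd.down,ceph osd dump 2>/dev/null | grep -c ' down '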
I do think that CPU is probably the cause of the slow OSD issue, though, as it makes the most logical sense. Did you end up dropping Ceph and moving to ZFS, or did you stick with it and try to mitigate the problem via the file flusher and other tweaks?
I'm still on Ceph. I worked around the memory pressure by reformatting my XFS filesystems to use regular-sized inodes. It was a rough couple of months, but everything has been stable for the last two months.
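Concretely, that meant re-running mkfs on each OSD data disk with the default inode size instead of the 2 KB inodes that were commonly recommended at the time; roughly this, done one OSD at a time while the cluster backfilled (device is a placeholder):

    # the large-inode format I was moving away from:
    #   mkfs.xfs -f -i size=2048 /dev/sdX1
    # the regular default-inode format I reformatted to:
    mkfs.xfs -f /dev/sdX1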
I do still want to use ZFS on my OSDs. It's got all the features of BtrFS, with the extra feature of being production ready. It's just not production ready in Ceph yet. It's coming along nicely though, and I hope to reformat one node to be all ZFS sometime next year.