Re: 1256 OSD/21 server ceph cluster performance issues.

Awesome! I have yet to hear any ZFS-in-Ceph chat, nor have I seen it come up on the mailing lists that I've caught. I would assume it would function pretty well, considering how long ZFS has been in use alongside some production systems I have seen. I have little to no experience with it personally, though.

I thought the rados issue was weird as well. Even with a degraded cluster I feel like I should be getting better throughput, unless I hit an object with a bunch of bad PGs or something. We are using two dual-port 10G cards in LACP to get over 10G on average, and we have separate gateway nodes (went with the Supermicro kit after all), so CPU on those nodes shouldn't be an issue. CPU usage there is extremely low right now, which is again surprising.
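The bond itself is a pretty stock 802.3ad setup; roughly what the /etc/network/interfaces stanza looks like (interface names and addresses here are placeholders, not our exact config):

    auto bond0
    iface bond0 inet static
        address 10.0.0.10
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100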

I honestly think this is some kind of radosgw bug in Giant, as I have another Giant cluster with the exact same config that is performing much better on much less hardware. Hopefully it is indeed a bug of some sort and not yet another screw-up on my end. Better yet, hopefully I find the bug and fix it for others to find and profit from ^_^.

Thanks for all of your help!


On 12/22/2014 05:26 PM, Craig Lewis wrote:


On Mon, Dec 22, 2014 at 2:57 PM, Sean Sullivan <seapasulli@xxxxxxxxxxxx> wrote:
Thanks Craig!

I think that this may very well be my issue with OSDs dropping out, but I am still not certain, as I had the cluster up for a small period while running rados bench for a few days without any status changes.
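For reference, the benchmark runs were along these lines (pool name and duration here are just illustrative, not the exact invocations):

    rados bench -p bench-pool 300 write --no-cleanup
    rados bench -p bench-pool 300 seq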

Mine were fine for a while too, through several benchmarks and a large RadosGW import.  My problems were memory pressure plus an XFS bug, so it took a while to manifest.  When it did, all of the ceph-osd processes on that node would have periods of ~30 seconds at 100% CPU.  Some OSDs would get kicked out.  Once that started, it was a downward spiral: recovery caused more load, which caused more OSDs to get kicked out...

Once I found the memory problem, I cronned a buffer flush, and that usually kept things from getting too bad.
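The cron job was nothing fancy; something along these lines (the interval and the drop_caches level shown are just illustrative):

    # /etc/cron.d entry: sync dirty data and drop the page cache every 15 minutes
    */15 * * * *  root  sync && echo 1 > /proc/sys/vm/drop_caches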

I was able to see on the CPU graphs that CPU usage was climbing before the problems started.  Once it got close to 100% on all cores, the OSDs started dropping out.  It's hard to say whether it was the CPU itself, or whether the CPU was just a symptom of the memory pressure plus the XFS bug.


 
The real big issue I have right now is the radosgw one. Once I figure out the root cause of the slow radosgw performance and correct it, that should hopefully buy me enough time to figure out the slow OSD issue.

It just doesn't make sense that I am getting 8 Mbps per client whether I run 1 client or 60, while rbd and rados shoot well above 600 MB/s (above 1000 as well).

That is strange.  I was able to get >300 Mbps per client on a 3-node cluster with GigE.  I expected each client to saturate the GigE on its own, but 300 Mbps is more than enough for now.

I am using the Ceph Apache and FastCGI modules, but otherwise it's a pretty standard Apache setup.  My RadosGW processes are using a fair amount of CPU, but as long as you have some idle CPU, that shouldn't be the bottleneck.
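Roughly the setup from the docs, i.e. mod_fastcgi pointing at an external radosgw process (the hostname, socket path, and instance name here are placeholders rather than my exact config):

    FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

    <VirtualHost *:80>
        ServerName rgw.example.com
        DocumentRoot /var/www
        RewriteEngine On
        RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    </VirtualHost>

with /var/www/s3gw.fcgi being a one-line wrapper:

    #!/bin/sh
    exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway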
 

 

May I ask how you are monitoring your cluster's logs? Are you just using rsyslog, or do you have a Logstash-type system set up? Load-wise, I do not see a spike until I pull an OSD out of the cluster, or stop and then start an OSD without setting nodown.

I'm monitoring the cluster with Zabbix, and that gives me pretty much the same info that I'd get from the logs.  I am planning to start pushing the logs to Logstash as soon as my Logstash setup is able to handle the extra load.
 

I do think that CPU is probably the cause of the slow OSD issue, though, as it makes the most logical sense. Did you end up dropping Ceph and moving to ZFS, or did you stick with it and try to mitigate the problem via the file flusher and other tweaks?


I'm still on Ceph.  I worked around the memory pressure by reformatting my XFS filesystems to use regular-sized inodes.  It was a rough couple of months, but everything has been stable for the last two months.
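Concretely, that reformat just meant recreating each OSD filesystem with the default 256-byte inodes instead of the larger inodes (e.g. -i size=2048) that were often recommended for Ceph OSDs (the device name below is a placeholder):

    mkfs.xfs -f -i size=256 /dev/sdX1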

I do still want to use ZFS on my OSDs.  It has all the features of Btrfs, with the extra feature of being production-ready.  It's just not production-ready in Ceph yet.  It's coming along nicely though, and I hope to reformat one node to be all ZFS sometime next year.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
