OK, so the good news is that RADOS appears to be doing well. I'd say the next
step is to follow some of the recommendations here:
http://ceph.com/docs/master/radosgw/troubleshooting/
If you examine objecter_requests and the perf counters during your
Cosbench write test, it might help explain where the requests are
backing up. Another thing to look for (as noted in the above URL) is
HTTP errors in the Apache logs (if relevant).
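For example, something like the following via the admin socket (adjust
the socket path and instance name to match your radosgw;
client.radosgw.gateway is just a common default):
ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok objecter_requests
ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok perf dump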
Other general thoughts: when you upgraded to Hammer, did you change the
RGW configuration at all? Are you using civetweb now? Does the
rgw.buckets pool have enough PGs?
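A quick way to check (substitute your actual bucket data pool name if
it differs):
ceph osd pool get rgw.buckets pg_num
ceph osd pool get rgw.buckets pgp_num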
Mark
On 07/21/2015 08:17 PM, Florent MONTHEL wrote:
Hi Mark
I get something like 600 write IOPS on the EC pool and 800 write IOPS on the 3-replica pool with rados bench.
With radosgw I get 30-40 write IOPS with Cosbench (one radosgw; the same with two) and the servers are almost idle:
- 0.005 core for the radosgw process
- 0.01 core for the osd process
I don't know whether we can hit .rgw* pool locking or something like that with Hammer (or whether this situation is specific to me).
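One way I could check for requests stuck on the OSD side (assuming the default admin socket path) would be something like:
ceph health detail
ceph daemon osd.0 dump_ops_in_flight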
On the 100% read profile, the radosgw and Ceph servers work very well, with more than 6000 IOPS on one radosgw server:
- 7 cores for the radosgw process
- 1 core for each osd process
- 0.5 core for each Apache process
Thanks
Sent from my iPhone
On 14 Jul 2015, at 21:03, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Hi Florent,
10x degradation is definitely unusual! A couple of things to look at:
Are 8K rados bench writes to the rgw.buckets pool slow? You can test with something like:
rados -p rgw.buckets bench 30 write -t 256 -b 8192
You may also want to try targeting a specific RGW server to make sure the RR-DNS setup isn't interfering (at least while debugging). It may also be worth creating a new replicated pool and trying writes to that pool as well, to see whether there's much of a difference.
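For example (the pool name and PG count below are just placeholders; pick whatever fits your cluster):
ceph osd pool create test-rep 128 128 replicated
rados -p test-rep bench 30 write -t 256 -b 8192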
Mark
On 07/14/2015 07:17 PM, Florent MONTHEL wrote:
Yes, of course. Thanks Mark.
Infrastructure: 5 servers with 10 SATA disks each (50 OSDs in total), 10 GbE network, EC 2+1 on the rgw.buckets pool, and 2 radosgw instances in an RR-DNS setup installed on 2 of the cluster servers.
No SSD drives are used.
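For reference, a 2+1 EC pool like this is typically created along these lines (the profile name and PG counts here are illustrative, not our exact values):
ceph osd erasure-code-profile set ec-2-1 k=2 m=1
ceph osd pool create rgw.buckets 256 256 erasure ec-2-1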
We're using Cosbench to run:
- 8 KB object size, 100% read with 256 workers: better results with Hammer
- 8 KB object size, 80% read / 20% write with 256 workers: real degradation between Firefly and Hammer (throughput divided by something like 10)
- 8 KB object size, 100% write with 256 workers: real degradation between Firefly and Hammer (throughput divided by something like 10)
Thanks
Sent from my iPhone
On 14 Jul 2015, at 19:57, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
On 07/14/2015 06:42 PM, Florent MONTHEL wrote:
Hi All,
I've just upgraded our Ceph cluster from Firefly 0.80.8 (Red Hat Ceph 1.2.3) to Hammer (Red Hat Ceph 1.3). Usage: radosgw with Apache 2.4.19 in MPM prefork mode.
I'm experiencing a huge write performance degradation just after the upgrade (measured with Cosbench).
Have you already run performance tests comparing Hammer and Firefly?
No problem with read performance, which was amazing.
Hi Florent,
Can you talk a little bit about how your write tests are set up? How many concurrent IOs, and what size? Also, do you see similar problems with rados bench?
We have done some testing and haven't seen significant performance degradation, except when switching to civetweb, which appears to perform deletes more slowly than what we saw with apache+fcgi.
Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com