Re: Impact of fancy striping

Nicolas,

Does atop show anything out of the ordinary when you run the benchmark (both on the Ceph nodes and the node you run the benchmark from)?
It should give a good indication of what is limiting your performance.
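For example, something like this on an OSD node while the benchmark is running (standard atop and sysstat tools; watch the journal disk in particular):

    # overall system activity, refreshed every 2 seconds
    atop 2

    # extended per-device statistics; watch the journal disk's row
    # for high await and %util values
    iostat -x 2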

I would highly recommend against using a 9-disk RAID0 for the OSDs:
* I expect it to be significantly slower.
* Failure of one disk means re-syncing 9x the amount of data, which could take ages while performance is significantly reduced.
* There is a significant chance of a catastrophic failure losing all data.
   If you really want to use RAID, I would use RAID 10 and do 2 replicas instead of 3 (see the example below).
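For the 2-replica variant, the replication level is set per pool; a minimal sketch, assuming the default "rbd" pool (the pool name is just an example):

    # keep 2 copies of each object instead of 3
    ceph osd pool set rbd size 2
    # allow I/O to continue with a single remaining copy
    ceph osd pool set rbd min_size 1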


Cheers,
Robert van Leeuwen






From: ceph-users-bounces@xxxxxxxxxxxxxx [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of nicolasc [nicolas.canceill@xxxxxxxxxxx]
Sent: Thursday, December 12, 2013 5:23 PM
To: Craig Lewis; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Impact of fancy striping

Hi James, Robert, Craig,

Thank you for those informative answers! You all pointed out interesting issues.

I know that losing one SAS disk in the RAID0 means losing all the journals, but this is only for testing, so I do not mind.

I do not think sequential write speed to the RAID0 array is the bottleneck (I benchmarked it at more than 500 MB/s). However, I failed to realize that the synchronous writes of several OSDs would become random instead of sequential; thank you for explaining that.

I want to try this setup with several journals on a single partition (to mitigate seek time), and I also want to try replacing my 9 OSDs (per node) with one big RAID0 array of 9 disks, leaving replication to Ceph. But first I wanted to get an idea of SSD performance, so I created a 1 GB RAMdisk for every OSD journal.
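In case it is useful to anyone, here is a sketch of how such RAM journals can be set up (device names, the OSD id and the journal path below are examples based on Ceph's defaults; the brd ramdisks disappear on reboot, so this is strictly for testing):

    # create 9 RAM block devices of 1 GB each (rd_size is in KB)
    modprobe brd rd_nr=9 rd_size=1048576

    # for each OSD (shown here for osd.0): stop it, flush the old journal,
    # point the journal symlink at a ramdisk, recreate it, and restart
    service ceph stop osd.0
    ceph-osd -i 0 --flush-journal
    ln -sf /dev/ram0 /var/lib/ceph/osd/ceph-0/journal
    ceph-osd -i 0 --mkjournal
    service ceph start osd.0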

Shockingly, even with every journal on a dedicated RAMdisk, I still witnessed less than 100 MB/s of sequential writes with 4 MB blocks. This happens when writing to an RBD image, regardless of the format, the size, the striping pattern, or whether the image is mounted (with XFS on it) or accessed directly.
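The kind of sequential-write test meant here looks roughly like this (pool, image and device names are examples, and the mapped device path may differ on your system):

    # 4 MB sequential writes straight to RADOS, bypassing RBD
    rados bench -p rbd 60 write -t 16 -b 4194304

    # 4 MB direct sequential writes to a mapped RBD image
    rbd map testimg --pool rbd
    dd if=/dev/zero of=/dev/rbd0 bs=4M count=1024 oflag=direct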

So maybe my journal setup is not ideal, but the bottleneck seems to be somewhere else. Any idea at all about striping? Or maybe the pool/PG configuration? (I blindly followed the PG ratios indicated in the docs.)
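For reference, the rule of thumb from the docs is roughly (number of OSDs x 100) / replica count, rounded up to the nearest power of two; as an illustration only, a cluster of 27 OSDs with 3 replicas would give 27 x 100 / 3 = 900, rounded up to 1024 PGs:

    # example only: create a test pool with 1024 placement groups
    ceph osd pool create testpool 1024 1024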

Thank you all for your help. Best regards,

Nicolas Canceill
Scalable Storage Systems
SURFsara (Amsterdam, NL)




On 12/06/2013 07:31 PM, Robert van Leeuwen wrote:
If I understand correctly you have one SAS disk as a journal for multiple OSDs.
If you do small synchronous writes it will become an IO bottleneck pretty quickly:
with multiple journals on the same disk, the writes are no longer sequential writes to one journal but 4k writes to x journals, making them fully random.
I would expect a performance of 100 to 200 IOPS max.
Doing an iostat -x or atop should show this bottleneck immediately.
This is also the reason to go with SSDs: they have reasonable random IO performance.
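A quick way to see the effect on the journal disk itself is a sketch like this with fio (/dev/sdX is a placeholder, and writing to it is destructive):

    # sequential 4k writes to the journal disk
    fio --name=seq --filename=/dev/sdX --rw=write --bs=4k \
        --direct=1 --ioengine=libaio --iodepth=16 --runtime=30 --time_based

    # the same load as fully random writes, roughly what several
    # journals sharing one spindle produce
    fio --name=rand --filename=/dev/sdX --rw=randwrite --bs=4k \
        --direct=1 --ioengine=libaio --iodepth=16 --runtime=30 --time_based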

Cheers,
Robert van Leeuwen

Sent from my iPad

On 6 Dec 2013, at 17:05, "nicolasc" <nicolas.canceill@xxxxxxxxxxx> wrote:


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
