Re: rbd over xfs slow performances

On Thu, Apr 18, 2013 at 09:05:12AM -0500, Mark Nelson wrote:
> 
> So Ceph pseudo-randomly distributes data to different OSDs, which
> means that you are more or less limited by the slowest OSD in your
> system.  That is, if one node can only process X objects per second,
> outstanding operations will slowly back up on it until you max out
> the number of outstanding operations that are allowed and the other
> OSDs get starved while the slow one tries to catch up.
> 
> So let's say 50MB/s per device, to match your slow one.
> 
> 1) If you put your journals on the same devices, you are doing 2
> writes for every incoming write since we do full data journalling.
> Assuming that's the case, we are down to 25MB/s.
> 

I increased the flush interval to nearly 30 seconds and disabled the
filestore flusher; now I'm close to those 25MB/s, as long as I don't
write for more than 30 seconds at a time.

For longer writes, I get ~15MB/s.
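
For reference, here is roughly what I set in ceph.conf (option names
as in the filestore docs; the values are just what I picked, so treat
them as my assumption about the right knobs, not a recommendation):

    [osd]
        filestore flusher = false
        filestore min sync interval = 29
        filestore max sync interval = 30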

> 2) Now, are you writing to a pool that has 2X replication?  If so,
> you are writing out an object to both devices for every write, but
> also incurring extra latency because the primary OSD will wait until
> it has replicated a write to the secondary OSD before it can
> acknowledge to the client.  With replication of 2 and 2 servers,
> that means that our aggregate throughput at best can only be 25MB/s
> if each server can only individually do 25MB/s.  In reality because
> of the extra overhead involved, it will probably be less.
> 

Is there some parameter to make this replication asynchronous, i.e.
send the ack once the write has reached the other server's buffers,
rather than the other server's disk?

(I understand, of course, the risk of losing data in that case.)
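
In the meantime, I suppose the client-side librbd writeback cache can
at least hide some of that latency from the application, even though
it doesn't change when the OSDs ack. A sketch, with option names from
the rbd cache documentation and arbitrary example values:

    [client]
        rbd cache = true
        rbd cache size = 33554432        # 32 MB, example value
        rbd cache max dirty = 25165824   # example value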

> 3) Now we must also account for the extra overhead that XFS causes.
> We suggest XFS because it's stable, but especially on ceph prior to
> version 0.58, it's not typically as fast as BTRFS/EXT4. 

Well, as all my servers already use ext4, I'll give it a try, but I
don't expect to gain more than a few percent in performance ;)

> Some things
> that might help are using noatime and inode64, making sure you are

Yes, I already use those options.
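
Concretely, the fstab entries look like this (device and mount point
are placeholders, not my actual layout):

    # XFS data disk for one OSD
    /dev/sdb1  /var/lib/ceph/osd/ceph-0  xfs  rw,noatime,inode64  0 0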

> describing your RAID array to XFS, and making sure your partitions are
> properly aligned for the RAID. 

Well, I don't know how to check this yet, but I'll try to find out;
my first guess is below ;)
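
From the xfs man pages, I guess it would be something like this (the
stripe unit and disk count are placeholders for the controller's real
geometry):

    # describe the RAID geometry to XFS at mkfs time
    # (su = stripe unit per data disk, sw = number of data disks)
    mkfs.xfs -d su=64k,sw=8 /dev/sdb1

    # check what an existing filesystem was created with
    # (sunit/swidth are reported in filesystem blocks)
    xfs_info /var/lib/ceph/osd/ceph-0

    # check that partition starts are aligned (sector units)
    parted /dev/sdb unit s print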

> One other suggestion:  If your
> controllers have WB cache, enabling it can really help in some
> cases.
> 

Of course, that's the first thing I check on a server ;)
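
For the record, on an LSI MegaRAID controller that check looks like
this (other vendors ship their own tools, so this is just one case):

    MegaCli -LDGetProp -Cache -LAll -aAll   # show current cache policy
    MegaCli -LDSetProp WB -LAll -aAll       # set logical drives to write-back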

Thank you very much for these explanations!


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



