Re: rbd cache on full ssd cluster

On Fri, Mar 11, 2016 at 2:01 AM, Christian Balzer <chibi@xxxxxxx> wrote:


Hello,

As always, there are many similar threads on this list; searching for them
and reading up is well worth your time.

On Thu, 10 Mar 2016 16:55:03 +0200 Yair Magnezi wrote:

> Hello Cephers.
>
> I wonder if anyone has experience with a full SSD cluster.
> We're testing Ceph ("Firefly") with 4 nodes (Supermicro
> SYS-F628R3-R72BPT) * 1TB SSD, total of 12 OSDs.
> Our network is 10 Gbit/s.
We need many more relevant details, from SW versions (kernel, OS, Ceph) and
configuration (replica size of your pool) to precise HW info.

    H/W --> 4 nodes, Supermicro (SYS-F628R3-R72BPT); every node has 64 GB of memory,
            MegaRAID SAS 2208: RAID0, 4 * 1 TB SSD (SAMSUNG MZ7KM960HAHP-00005)

    Cluster --> 4 nodes, 12 OSDs, replica size = 2, Ubuntu 14.04.1 LTS

In particular your SSDs, exact maker/version/size.
Where are your journals?

    SAMSUNG MZ7KM960HAHP-00005, 893.752 GB
    Journals are on the same drives as the data (all SSD, as mentioned)
 
Also, Firefly is EOL; Hammer, and even more so the upcoming Jewel, have
significant improvements with SSDs.

> We used ceph-deploy for installation with all defaults (followed the
> Ceph documentation for integration with OpenStack).
> As far as we understand, there is no need to enable the RBD cache as
> we're running on full SSD.
RBD cache, as in the client-side librbd cache, is always very helpful, fast
backing storage or not.
It can significantly reduce the number of small writes, something Ceph has
to do a lot of heavy lifting for.
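
To illustrate why a write-back cache can cut down the number of small writes, here is a toy sketch in Python. It is not librbd's actual implementation; the function name and the sample extents are made up for illustration. It only shows the general idea of merging adjacent dirty extents before they are flushed to the cluster.

```python
def coalesce_writes(writes):
    """Merge overlapping or adjacent (offset, length) extents.

    Toy model only: a real cache also tracks object boundaries,
    dirty limits and flush timing, none of which is modelled here.
    """
    merged = []
    for offset, length in sorted(writes):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Extent touches or overlaps the previous one: grow it.
            prev_offset, prev_length = merged[-1]
            new_end = max(prev_offset + prev_length, offset + length)
            merged[-1] = (prev_offset, new_end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


# Eight sequential 4 KiB writes plus one isolated 4 KiB write (hypothetical data).
small_writes = [(i * 4096, 4096) for i in range(8)] + [(1048576, 4096)]
flushed = coalesce_writes(small_writes)
print(len(small_writes), "writes buffered ->", len(flushed), "writes flushed")
```

In this example nine small writes collapse into two larger ones. Purely random small writes would merge far less often, so the benefit depends on the workload.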

> Benchmarking the cluster shows very poor performance, for writes but mostly
> for reads (clients are OpenStack but also VMware instances).

Benchmarking how (exact command line for fio for example) and with what
results?
You say poor, but that might be "normal" for your situation, we can't
really tell w/o hard data.

   
   
   root@open-compute1:~# fio --name=randread --ioengine=libaio --iodepth=1 --rw=randread --bs=4k --direct=1 --size=256M --numjobs=10 --runtime=120 --group_reporting --directory=/ceph_test2
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.3
Starting 10 processes
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
randread: Laying out IO file(s) (1 file(s) / 256MB)
Jobs: 10 (f=10): [rrrrrrrrrr] [100.0% done] [4616KB/0KB/0KB /s] [1154/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=10): err= 0: pid=25393: Mon Mar 14 09:17:24 2016
  read : io=597360KB, bw=4976.5KB/s, iops=1244, runt=120038msec
    slat (usec): min=4, max=497, avg=22.91, stdev=14.70
    clat (usec): min=154, max=57106, avg=8007.97, stdev=14477.89
     lat (usec): min=276, max=57125, avg=8031.36, stdev=14477.36
    clat percentiles (usec):
     |  1.00th=[  350],  5.00th=[  390], 10.00th=[  414], 20.00th=[  454],
     | 30.00th=[  494], 40.00th=[  540], 50.00th=[  612], 60.00th=[  732],
     | 70.00th=[ 1064], 80.00th=[10304], 90.00th=[37632], 95.00th=[38656],
     | 99.00th=[40192], 99.50th=[41216], 99.90th=[43264], 99.95th=[43776],
     | 99.99th=[44800]
    bw (KB  /s): min=  314, max=  967, per=10.01%, avg=498.08, stdev=83.91
    lat (usec) : 250=0.01%, 500=31.64%, 750=29.32%, 1000=8.21%
    lat (msec) : 2=5.22%, 4=3.35%, 10=2.22%, 20=0.46%, 50=19.56%
    lat (msec) : 100=0.01%
  cpu          : usr=0.14%, sys=0.41%, ctx=153613, majf=0, minf=78
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=149340/w=0/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=597360KB, aggrb=4976KB/s, minb=4976KB/s, maxb=4976KB/s, mint=120038msec, maxt=120038msec

Disk stats (read/write):
  rbd0: ios=149207/3, merge=0/3, ticks=1194356/0, in_queue=1194452, util=100.00%
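
A quick cross-check of the fio figures above, as a minimal Python sketch that uses only values printed in the output. With 10 jobs at iodepth=1 there are at most 10 I/Os in flight, so the IOPS should be roughly the number of in-flight I/Os divided by the average latency. The variable names are arbitrary.

```python
# Figures copied from the fio output above.
block_size_kb = 4
iops_reported = 1244
avg_lat_usec = 8031.36
numjobs = 10
iodepth = 1
total_ios = 149340
runtime_msec = 120038

# Bandwidth is IOPS times block size (compare with bw=4976.5KB/s).
bw_from_iops_kb_s = iops_reported * block_size_kb

# IOPS recomputed from issued I/Os over the runtime.
iops_from_totals = total_ios / (runtime_msec / 1000.0)

# With a fixed number of I/Os in flight, throughput is in_flight / latency.
in_flight = numjobs * iodepth
iops_from_latency = in_flight / (avg_lat_usec / 1e6)

# Latency histogram buckets from the "lat (usec)" / "lat (msec)" lines, in percent.
lat_buckets = {
    "<=250us": 0.01, "<=500us": 31.64, "<=750us": 29.32, "<=1ms": 8.21,
    "<=2ms": 5.22, "<=4ms": 3.35, "<=10ms": 2.22, "<=20ms": 0.46,
    "<=50ms": 19.56, "<=100ms": 0.01,
}
sub_ms_share = sum(lat_buckets[k] for k in ("<=250us", "<=500us", "<=750us", "<=1ms"))
slow_share = lat_buckets["<=50ms"] + lat_buckets["<=100ms"]

print("bandwidth from IOPS: %d KB/s" % bw_from_iops_kb_s)
print("IOPS from totals:    %.1f" % iops_from_totals)
print("IOPS from latency:   %.1f" % iops_from_latency)
print("requests under 1 ms: %.2f%%" % sub_ms_share)
print("requests over 20 ms: %.2f%%" % slow_share)
```

The three IOPS figures agree, so the result is latency-bound: about 69% of the reads finish in under 1 ms, while close to 20% take between 20 and 50 ms and dominate the 8 ms average.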

    
conf file (client side) -->

[global]
fsid = 609317d9-c8ee-462f-a82f-f5c28c6c561b
mon_initial_members = open-ceph1,open-ceph2,open-ceph3
mon_host = 10.63.4.101,10.63.4.102,10.63.4.103
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
public_network = 10.63.4.0/23

filestore_flusher = false

[client]
rbd cache = true
cache writethrough until flush = true
rbd_readahead_trigger_requests = 50
rbd_readahead_max_bytes = 4096
rbd_readahead_disable_after_bytes = 0
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/ceph/
rbd concurrent management ops = 20
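
One thing worth checking in a [client] section like this is whether every option name is spelled the way librbd documents it. The Python sketch below does that, under stated assumptions: the KNOWN set is a short hand-written list of documented librbd option names, not an exhaustive one, and the helper name is made up. Ceph treats spaces and underscores in option names interchangeably, so names are normalized before comparing.

```python
import configparser

# The rbd-related lines of the [client] section posted above, values unchanged.
POSTED = """
[client]
rbd cache = true
cache writethrough until flush = true
rbd_readahead_trigger_requests = 50
rbd_readahead_max_bytes = 4096
rbd_readahead_disable_after_bytes = 0
rbd concurrent management ops = 20
"""

# Assumption: a deliberately short list of documented librbd option names.
KNOWN = {
    "rbd_cache",
    "rbd_cache_writethrough_until_flush",
    "rbd_readahead_trigger_requests",
    "rbd_readahead_max_bytes",
    "rbd_readahead_disable_after_bytes",
    "rbd_concurrent_management_ops",
}


def normalize(name):
    """Turn 'rbd cache' and 'rbd_cache' into the same key."""
    return "_".join(name.split())


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(POSTED)

unrecognized = [
    normalize(key) for key in parser["client"] if normalize(key) not in KNOWN
]
print("not in the known list:", unrecognized)
```

Against this list, `cache writethrough until flush` is flagged; the documented name carries an `rbd` prefix (`rbd cache writethrough until flush`), so the line as posted may not have the intended effect.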



"Poor" write performance would be indicative of SSDs that are unsuitable for
Ceph.

> Any input is much appreciated (we especially want to know which parameter
> is crucial for read performance in a full SSD cluster).
>

read_ahead in your clients can improve things, but I guess your cluster
has more fundamental problems than this.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028552.html


Thanks
 
 
Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

