Primary workload - RBD, KVM
On Friday, 14 August 2015, Ben Hines wrote:
Nice to hear that you have had no SSD failures yet in 10 months.
How many OSDs are you running, and what is your primary ceph workload?
(RBD, rgw, etc?)
-Ben
On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович
<megov@xxxxxxxxxx> wrote:
> Hi!
>
>
> Of course, it isn't cheap at all, but we use the Intel DC S3700 200GB for
> ceph journals and the DC S3700 400GB in the SSD pool: same hosts, separate
> root in the crushmap.
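>
> A rough sketch of how such a separate root can be built (the bucket and
> rule names are illustrative, not our exact commands):
>
> # second root with an SSD-only host bucket under it
> ceph osd crush add-bucket ssd root
> ceph osd crush add-bucket node1-ssd host
> ceph osd crush move node1-ssd root=ssd
> # place an SSD OSD under that host with weight 1.0
> ceph osd crush set osd.2 1.0 root=ssd host=node1-ssd
> # rule that keeps replicas inside the ssd root, one copy per host
> ceph osd crush rule create-simple ssd-rule ssd host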
>
> The SSD pool is not yet in production; the journalling SSDs have worked
> under production load for 10 months. They're in good condition - no
> faults, no degradation.
>
> We deliberately took 200GB SSDs for the journals to reduce costs, and we
> also run a higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs,
> while the recommendation is 1:3 to 1:6.
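>
> As an illustration of that ratio (device name and partition sizes below
> are only an example, not our exact layout), one 200GB SSD can be
> pre-partitioned into a dozen small journal partitions:
>
> # 12 x 15GiB journal partitions on one SSD, tagged with the
> # ceph journal partition type GUID so ceph-disk recognises them
> for i in $(seq 1 12); do
>     sgdisk --new=${i}:0:+15G \
>            --typecode=${i}:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
> done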
>
> So, as a conclusion - I'd recommend you get a bigger budget and buy
> durable and fast SSDs for Ceph.
>
> Megov Igor
> CIO, Yuterra
>
> ________________________________
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of
> Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx>
> Sent: August 13, 2015 15:54
> To: Jan Schermer
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: CEPH cache layer. Very slow
>
> So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than
> the Intel S3500 240G (((
>
> Any other models? (((
>
> 2015-08-13 15:45 GMT+03:00 Jan Schermer <jan@xxxxxxxxxxx>:
>>
>> I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO
>> and not just "PRO" or "DC EVO"!).
>> Those were very cheap but are out of stock at the moment (here).
>> Faster than the Intels, cheaper, and a slightly different technology (3D
>> V-NAND), which IMO makes them superior without needing many tricks to do
>> their job.
>>
>> Jan
>>
>> On 13 Aug 2015, at 14:40, Voloshanenko Igor <igor.voloshanenko@xxxxxxxxx>
>> wrote:
>>
>> Tnx, Irek! Will try!
>>
>> But another question to everyone: which SSDs are good enough for CEPH now?
>>
>> I'm looking into the S3500 240G (I have some S3500 120G drives which show
>> great results - around 8x better than the Samsungs).
>>
>> Could you give advice on other vendors/models at the same or lower price
>> level as the S3500 240G?
>>
>> 2015-08-13 12:11 GMT+03:00 Irek Fasikhov <malmyzh@xxxxxxxxx>:
>>>
>>> Hi, Igor.
>>> Try applying the patch here:
>>>
>>> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
>>>
>>> P.S. I no longer track changes in this direction (kernel), because we
>>> already use the recommended SSDs.
>>>
>>> Best regards, Irek Fasikhov
>>> Mob.: +79229045757
>>>
>>> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor
>>> <igor.voloshanenko@xxxxxxxxx>:
>>>>
>>>> So, after testing an SSD (I wiped 1 SSD and used it for the tests):
>>>>
>>>> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 \
>>>>   --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based \
>>>>   --group_reporting --name=journal-test
>>>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
>>>> fio-2.1.3
>>>> Starting 1 process
>>>> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta 00m:00s]
>>>> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42 2015
>>>>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
>>>>     clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>>>>      lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
>>>>     clat percentiles (usec):
>>>>      |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
>>>>      | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
>>>>      | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
>>>>      | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
>>>>      | 99.99th=[14912]
>>>>     bw (KB /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
>>>>     lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
>>>>   cpu          : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
>>>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      issued    : total=r=0/w=17243/d=0, short=r=0/w=0/d=0
>>>>
>>>> Run status group 0 (all jobs):
>>>>   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s, mint=60001msec, maxt=60001msec
>>>>
>>>> Disk stats (read/write):
>>>>   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%
>>>>
>>>> So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s.
>>>>
>>>> I tried to change the cache mode:
>>>> echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
>>>> echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
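>>>>
>>>> (to hit all disks at once, a loop like this should do it, run as root:)
>>>>
>>>> for f in /sys/class/scsi_disk/*/cache_type; do
>>>>     echo "temporary write through" > "$f"
>>>> done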
>>>>
>>>> No luck, still the same poor results. I also found this article:
>>>> https://lkml.org/lkml/2013/11/20/264 pointing to an old, very simple
>>>> patch which disables CMD_FLUSH:
>>>> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
>>>>
>>>> Does anybody have better ideas on how to improve this (or how to disable
>>>> CMD_FLUSH without recompiling the kernel)? I use Ubuntu and 4.0.4 for now
>>>> (the 4.x branch, because the SSD 850 Pro has an issue with NCQ TRIM, and
>>>> before 4.0.4 this exception was not included in libata).
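>>>>
>>>> (For what it's worth, comparing the same fio run with and without O_SYNC
>>>> should show how much the flushes cost - a big gap between the two numbers
>>>> would point at CMD_FLUSH. Just a sketch, same device as above:)
>>>>
>>>> fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k \
>>>>     --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=flush-test
>>>> fio --filename=/dev/sda --direct=1 --sync=0 --rw=write --bs=4k \
>>>>     --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=noflush-test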
>>>>
>>>> 2015-08-12 19:17 GMT+03:00 Pieter Koorts <pieter.koorts@xxxxxx>:
>>>>>
>>>>> Hi Igor
>>>>>
>>>>> I suspect you have very much the same problem as me.
>>>>>
>>>>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg22260.html
>>>>>
>>>>> Basically, Samsung drives (like many SATA SSDs) are very much hit and
>>>>> miss, so you will need to test them as described here to see if they
>>>>> are any good:
>>>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>>>>>
>>>>> To give you an idea, my average write performance went from 11MB/s
>>>>> (with the Samsung SSDs) to 30MB/s (without any SSD). This is a very
>>>>> small cluster.
>>>>>
>>>>> Pieter
>>>>>
>>>>> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor
>>>>> <igor.voloshanenko@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi all, we have set up a CEPH cluster with 60 OSDs of 2 different types
>>>>> (5 nodes, 12 disks each: 10 HDD, 2 SSD).
>>>>>
>>>>> We also cover this with a custom crushmap with 2 roots:
>>>>>
>>>>> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>>>> -100 5.00000 root ssd
>>>>> -102 1.00000 host ix-s2-ssd
>>>>> 2 1.00000 osd.2 up 1.00000 1.00000
>>>>> 9 1.00000 osd.9 up 1.00000 1.00000
>>>>> -103 1.00000 host ix-s3-ssd
>>>>> 3 1.00000 osd.3 up 1.00000 1.00000
>>>>> 7 1.00000 osd.7 up 1.00000 1.00000
>>>>> -104 1.00000 host ix-s5-ssd
>>>>> 1 1.00000 osd.1 up 1.00000 1.00000
>>>>> 6 1.00000 osd.6 up 1.00000 1.00000
>>>>> -105 1.00000 host ix-s6-ssd
>>>>> 4 1.00000 osd.4 up 1.00000 1.00000
>>>>> 8 1.00000 osd.8 up 1.00000 1.00000
>>>>> -106 1.00000 host ix-s7-ssd
>>>>> 0 1.00000 osd.0 up 1.00000 1.00000
>>>>> 5 1.00000 osd.5 up 1.00000 1.00000
>>>>> -1 5.00000 root platter
>>>>> -2 1.00000 host ix-s2-platter
>>>>> 13 1.00000 osd.13 up 1.00000 1.00000
>>>>> 17 1.00000 osd.17 up 1.00000 1.00000
>>>>> 21 1.00000 osd.21 up 1.00000 1.00000
>>>>> 27 1.00000 osd.27 up 1.00000 1.00000
>>>>> 32 1.00000 osd.32 up 1.00000 1.00000
>>>>> 37 1.00000 osd.37 up 1.00000 1.00000
>>>>> 44 1.00000 osd.44 up 1.00000 1.00000
>>>>> 48 1.00000 osd.48 up 1.00000 1.00000
>>>>> 55 1.00000 osd.55 up 1.00000 1.00000
>>>>> 59 1.00000 osd.59 up 1.00000 1.00000
>>>>> -3 1.00000 host ix-s3-platter
>>>>> 14 1.00000 osd.14 up 1.00000 1.00000
>>>>> 18 1.00000 osd.18 up 1.00000 1.00000
>>>>> 23 1.00000 osd.23 up 1.00000 1.00000
>>>>> 28 1.00000 osd.28 up 1.00000 1.00000
>>>>> 33 1.00000 osd.33 up 1.00000 1.00000
>>>>> 39 1.00000 osd.39 up 1.00000 1.00000
>>>>> 43 1.00000 osd.43 up 1.00000 1.00000
>>>>> 47 1.00000 osd.47 up 1.00000 1.00000
>>>>> 54 1.00000 osd.54 up 1.00000 1.00000
>>>>> 58 1.00000 osd.58 up 1.00000 1.00000
>>>>> -4 1.00000 host ix-s5-platter
>>>>> 11 1.00000 osd.11 up 1.00000 1.00000
>>>>> 16 1.00000 osd.16 up 1.00000 1.00000
>>>>> 22 1.00000 osd.22 up 1.00000 1.00000
>>>>> 26 1.00000 osd.26 up 1.00000 1.00000
>>>>> 31 1.00000 osd.31 up 1.00000 1.00000
>>>>> 36 1.00000 osd.36 up 1.00000 1.00000
>>>>> 41 1.00000 osd.41 up 1.00000 1.00000
>>>>> 46 1.00000 osd.46 up 1.00000 1.00000
>>>>> 51 1.00000 osd.51 up 1.00000 1.00000
>>>>> 56 1.00000 osd.56 up 1.00000 1.00000
>>>>> -5 1.00000 host ix-s6-platter
>>>>> 12 1.00000 osd.12 up 1.00000 1.00000
>>>>> 19 1.00000 osd.19 up 1.00000 1.00000
>>>>> 24 1.00000 osd.24 up 1.00000 1.00000
>>>>> 29 1.00000 osd.29 up 1.00000 1.00000
>>>>> 34 1.00000 osd.34 up 1.00000 1.00000
>>>>> 38 1.00000 osd.38 up 1.00000 1.00000
>>>>> 42 1.00000 osd.42 up 1.00000 1.00000
>>>>> 50 1.00000 osd.50 up 1.00000 1.00000
>>>>> 53 1.00000 osd.53 up 1.00000 1.00000
>>>>> 57 1.00000 osd.57 up 1.00000 1.00000
>>>>> -6 1.00000 host ix-s7-platter
>>>>> 10 1.00000 osd.10 up 1.00000 1.00000
>>>>> 15 1.00000 osd.15 up 1.00000 1.00000
>>>>> 20 1.00000 osd.20 up 1.00000 1.00000
>>>>> 25 1.00000 osd.25 up 1.00000 1.00000
>>>>> 30 1.00000 osd.30 up 1.00000 1.00000
>>>>> 35 1.00000 osd.35 up 1.00000 1.00000
>>>>> 40 1.00000 osd.40 up 1.00000 1.00000
>>>>> 45 1.00000 osd.45 up 1.00000 1.00000
>>>>> 49 1.00000 osd.49 up 1.00000 1.00000
>>>>> 52 1.00000 osd.52 up 1.00000 1.00000
>>>>>
>>>>>
>>>>> Then we create 2 pools: 1 on the HDDs (platters), 1 on the SSDs,
>>>>> and put the SSD pool in front of the HDD pool as a cache tier.
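>>>>>
>>>>> (The tiering setup was roughly the standard sequence; the pool names
>>>>> here are illustrative, not our exact ones:)
>>>>>
>>>>> ceph osd tier add platters ssd-cache
>>>>> ceph osd tier cache-mode ssd-cache writeback
>>>>> ceph osd tier set-overlay platters ssd-cache
>>>>> ceph osd pool set ssd-cache hit_set_type bloom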
>>>>>
>>>>> Now we receive very bad performance results from the cluster. Even with
>>>>> rados bench we get very unstable performance, sometimes dropping to zero
>>>>> speed, which creates very big issues for our clients.
>>>>>
>>>>> I tried to tune all possible values, including the OSD settings, but
>>>>> still no luck.
>>>>>
>>>>> Also a very unbelievable situation: when I do
>>>>> ceph tell ... bench on an SSD OSD, I receive about 20MB/s;
>>>>> for an HDD OSD - 67MB/s...
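>>>>>
>>>>> (i.e. something like this, osd.2 being in the ssd root and osd.13 in
>>>>> the platter root of the tree above:)
>>>>>
>>>>> ceph tell osd.2 bench     # SSD OSD: ~20MB/s
>>>>> ceph tell osd.13 bench    # HDD OSD: ~67MB/s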
>>>>>
>>>>> I don't understand why the cache pool, which consists of SSDs, works so
>>>>> badly... We used the Samsung 850 Pro 256GB as the SSDs.
>>>>>
>>>>> Can you guys give me advice please...
>>>>>
>>>>> Also a very strange thing: when I set the cache-mode to forward and try
>>>>> to flush-evict all objects (not all objects get evicted - some are busy,
>>>>> locked on the KVM side), I then receive quite stable results from rados
>>>>> bench:
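>>>>>
>>>>> (roughly these commands - the cache pool name is illustrative:)
>>>>>
>>>>> ceph osd tier cache-mode ssd-cache forward
>>>>> rados -p ssd-cache cache-flush-evict-all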
>>>>>
>>>>> Total time run: 30.275871
>>>>> Total writes made: 2076
>>>>> Write size: 4194304
>>>>> Bandwidth (MB/sec): 274.278
>>>>>
>>>>> Stddev Bandwidth: 75.1445
>>>>> Max bandwidth (MB/sec): 368
>>>>> Min bandwidth (MB/sec): 0
>>>>> Average Latency: 0.232892
>>>>> Stddev Latency: 0.240356
>>>>> Max latency: 2.01436
>>>>> Min latency: 0.0716344
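>>>>>
>>>>> (that run was something like: rados -p platters bench 30 write - pool
>>>>> name illustrative)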
>>>>>
>>>>> Without the zero drops, etc... So I don't understand how this is
>>>>> possible.
>>>>>
>>>>> Also an interesting thing: when I disable the overlay for the pool,
>>>>> rados bench goes back to around 70MB/s, as for an ordinary HDD; but at
>>>>> the same time, rados bench on the SSD pool, which is no longer used,
>>>>> still shows the same bad results...
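>>>>>
>>>>> (overlay disabled with something like: ceph osd tier remove-overlay
>>>>> platters)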
>>>>>
>>>>> So please, give me some direction to dig...
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com