Re: About use same SSD for OS and Journal

Hi,

Mike Dawson schrieb:
> Kurt,
>
> When you had OS and osd journals co-located, how many osd journals
> were on the SSD containing the OS?
5 OSD journals + MON journal + OS
>
> You mention you now use a 5:1 ratio. Was the ratio something like 11:1
> before (one SSD for OS plus 11 osd journals to 11 OSDs in a 12-disk
> chassis)?
We now have two 15k SAS2 HDDs in RAID 1 for OS and MON, 10 OSDs (2 TB and
3 TB SAS2 HDDs) and two SAS2 SSDs, each holding the journals for 5 OSDs on
raw partitions, in a 16-slot chassis (3 nodes).
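For reference, a minimal sketch of how pointing the journals at raw SSD
partitions can look in ceph.conf; the device names and OSD ids below are
just placeholders, not our actual layout:

# Placeholders: adjust device names and OSD ids to your own partitioning.
# One raw SSD partition per journal; with a block device as journal,
# "osd journal size = 0" lets the whole partition be used.
[osd.0]
    osd journal = /dev/sdb1
    osd journal size = 0

[osd.1]
    osd journal = /dev/sdb2
    osd journal size = 0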
>
> Also, what throughput per drive were you seeing on the cluster during
> the periods where things got laggy due to backfills, etc?
Hmm, we didn't measure anything then. We were really in a state of
emergency, because most of the KVM hosts, which have their disks on the
Ceph cluster, had massive problems with IO, applications crashing and so on.
The trigger of the problem was disks getting full and therefore OSD
near-full warnings. I then did two things which obviously stressed the
cluster remarkably: I increased the number of PGs for two pools, as they
were too low and the utilization of the disks was not balanced, and I
added 2 OSDs.
The backfilling was extremely slow then, about 0.001% per 10 minutes (at
around 20% degradation), and the SSD was at 100% utilization the whole
time. All the parameters were at their defaults and this was Cuttlefish.
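For reference, a rough sketch of the kind of commands involved; the pool
name and PG counts are placeholders, not the values we used, and raising
pg_num in smaller steps would probably have kept the backfill load more
manageable:

# Placeholders: adjust pool name and target counts to your cluster.
ceph osd pool set rbd pg_num 512
ceph osd pool set rbd pgp_num 512   # pgp_num has to follow pg_num before data starts moving
# Watch per-disk load while backfilling, e.g.:
iostat -x 5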
>
> Last, did you attempt to throttle using ceph config setting in the old
> setup? 
What saved us was exactly that, i.e. throttling. What we ended up with was:
osd recovery max active = 2
osd max backfills = 5
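For completeness, these can either live in the [osd] section of ceph.conf
or be injected into the running OSDs; a sketch, syntax as I remember it on
Dumpling, so double-check against your version:

# Persistent, in ceph.conf:
[osd]
    osd recovery max active = 2
    osd max backfills = 5

# Or injected at runtime (per OSD with osd.N if osd.* isn't accepted on your release):
ceph tell osd.* injectargs '--osd-recovery-max-active 2 --osd-max-backfills 5'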
That sped up the recovery and left enough resources for the KVM hosts to
"survive", although IO-intensive applications were still "not happy".
> Do you need to throttle in your current setup?
That's a very good question ;-) Right now it's still throttled, but we
need to add 8 more OSDs to the cluster and a further node in the near
future. We are on Dumpling now and rebuilt the nodes as mentioned above,
but I'm not sure what the right strategy for adding OSDs is. Given the
experience we had, I'm a little reluctant to just set the defaults again
and hope everything will work out right ;-).
But I think I'll ask about that in a separate mail on the list shortly.
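One approach I'm considering, as a sketch only (the OSD id and weights
below are placeholders, and we haven't tried this on our cluster yet): keep
the throttles above in place and bring each new OSD in with a low CRUSH
weight, raising it in steps so only a fraction of the data moves at a time.

# Placeholder OSD id and weights; final weight roughly the disk size in TB.
ceph osd crush reweight osd.30 0.2
# wait for the cluster to settle back to HEALTH_OK, then continue:
ceph osd crush reweight osd.30 0.5
ceph osd crush reweight osd.30 1.0
ceph osd crush reweight osd.30 2.0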

Best regards,
Kurt
>
> Thanks,
> Mike Dawson
>
>
> On 10/24/2013 10:40 AM, Kurt Bauer wrote:
>> Hi,
>>
>> we had a setup like this and ran into trouble, so I would strongly
>> discourage you from setting it up like this. Under normal circumstances
>> there's no problem, but when the cluster is under heavy load, for
>> example when it has a lot of PGs backfilling, for whatever reason
>> (increasing the number of PGs, adding OSDs, ...), there's obviously a
>> lot written to the journals.
>> What we saw then was extremely laggy behavior of the cluster, and when
>> looking at the iostats of the SSD, it was at 100% most of the time. I
>> don't know exactly what causes this and why the SSDs can't cope with the
>> amount of IO, but separating OS and journals did the trick. We now have
>> quick 15k HDDs in RAID 1 for OS and Monitor journal, and one SSD per 5
>> OSD journals, with one partition per journal (used as a raw partition).
>>
>> Hope that helps,
>> best regards,
>> Kurt
>>
>> Martin Catudal schrieb:
>>> Hi,
>>> Here is my scenario:
>>> I will have a small cluster (4 nodes) with four 4 TB OSDs per node.
>>>
>>> I will have the OS installed on two SSDs in a RAID 1 configuration.
>>>
>>> Has any of you successfully and efficiently run a Ceph cluster that is
>>> built with the journal on a separate partition on the OS SSDs?
>>>
>>> I know that a lot of IO may occur on the journal SSD and I'm scared of
>>> having my OS suffer from too much IO.
>>>
>>> Any background experience?
>>>
>>> Martin
>>>
>>>
>>>

-- 
Kurt Bauer <kurt.bauer@xxxxxxxxxxxx>
Vienna University Computer Center - ACOnet - VIX
Universitaetsstrasse 7, A-1010 Vienna, Austria, Europe
Tel: ++43 1 4277 - 14070 (Fax: - 9140)  KB1970-RIPE


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
