Re: OSD memory usage

Assuming you're using BlueStore, you could experiment with the cache
settings (http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/).

In your case, setting bluestore_cache_size_hdd lower than the default
of 1GB might help with the RAM usage.

Various people have reported solving OOM issues by setting this to
512MB; I'm not sure what the performance impact might be.
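
For reference, a minimal sketch of what that could look like in ceph.conf
on the OSD hosts (assuming BlueStore OSDs on Luminous; 512MB expressed in
bytes):

    [osd]
    # Assumption: cap the BlueStore cache for HDD-backed OSDs at 512MB
    # instead of the default 1GB mentioned above
    bluestore_cache_size_hdd = 536870912

The OSDs would likely need a restart to pick up the new value.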

On Tue, Sep 12, 2017 at 6:15 AM,  <bulk.schulz@xxxxxxxxxxx> wrote:
> Please excuse my brain-fart.  We're using 24 disks on the servers in
> question.  Only after discussing this further with a colleague did we
> realize this.
>
> This brings us right down to the minimum spec, which generally isn't a
> good idea.
>
> Sincerely
>
> -Dave
>
>
> On 11/09/17 11:38 AM, bulk.schulz@xxxxxxxxxxx wrote:
>>
>> Hi Everyone,
>>
>> I wonder if someone out there has a similar problem to this?
>>
>> I keep having issues with memory usage.  I have 2 OSD servers with 48GB
>> of memory and 12 2TB OSDs.  I seem to have significantly more memory than
>> the minimum spec, but these two machines with 2TB drives OOM-kill and
>> crash periodically -- basically this happens any time the cluster goes
>> into recovery for even one OSD.
>>
>> 12 drives * 2TB = 24TB.  By the 1GB of RAM per 1TB of disk rule, I
>> should need only 24GB or so.
>>
>> I am testing and benchmarking at this time, so most changes are fine.  I
>> am abusing this filesystem considerably by running 14 clients, each doing
>> something that is more or less dd to a different file, but that's the
>> point :)
>>
>> When it's working, the performance is really good: 3GB/s with a 3x
>> replicated data pool, up to around 10GB/s with 1x replication (just for
>> kicks and giggles).  My bottleneck is likely the SAS channels to those
>> disks.
>>
>> I'm using the 12.2.0 release running on CentOS 7.
>>
>> Testing CephFS with one MDS and 3 monitors.  The MON/MDS are not on the
>> servers in question.
>>
>> Total of around 350 OSDs (all spinning disk), most of which are 1TB
>> drives on 15 somewhat older servers with Xeon E5620s.
>>
>> Dual QDR InfiniBand (20Gbit) fabrics (1 for the cluster network and 1
>> for clients).
>>
>> Any thoughts?  Am I missing some tuning parameter in /proc or something?
>>
>> Thanks
>> -Dave
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


