On Wed, Nov 1, 2017 at 11:54 PM, Mazzystr <mazzystr@xxxxxxxxx> wrote:
> I experienced this as well on tiny Ceph cluster testing...
>
> HW spec - 3x
> Intel i7-4770K quad core
> 32Gb m2/ssd
> 8Gb memory
> Dell PERC H200
> 6 x 3Tb Seagate
> Centos 7.x
> Ceph 12.x
>
> I also run 3 memory hungry procs on the Ceph nodes. Obviously there is a
> memory problem here. Here are the steps I took to avoid oom-killer killing
> the node ...
>
> /etc/rc.local -
> for i in $(pgrep ceph-mon); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-osd); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-mgr); do echo 50 > /proc/$i/oom_score_adj; done
>
> /etc/sysctl.conf -
> vm.swappiness = 100
> vm.vfs_cache_pressure = 1000

This is generally not a good idea. Just sayin'

$ grep -A17 ^vfs_cache_pressure sysctl/vm.txt
vfs_cache_pressure
------------------

This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact. Reclaim code needs to take various locks to find freeable
directory and inode objects. With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.
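Before cranking that knob up to 1000 it's worth checking how much memory the
dentry and inode caches are actually holding on one of these nodes. A rough
sketch (run as root; the slab names and the /proc/slabinfo column layout below
are assumptions for a stock CentOS 7 box using xfs, so adjust to taste):

# how big are the dentry/inode slabs right now?
# (/proc/slabinfo columns: name, active_objs, num_objs, objsize, ...)
$ sysctl vm.vfs_cache_pressure
$ sudo grep -E '^(dentry|inode_cache|xfs_inode)' /proc/slabinfo | \
    awk '{printf "%-12s %10d objs %10.0f KB\n", $1, $2, $2 * $4 / 1024}'

If those numbers are small compared to what the OSD processes are using, a
high vfs_cache_pressure frees next to nothing and only adds reclaim overhead.
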
> vm.min_free_kbytes = 512
>
> /etc/ceph/ceph.conf -
> [osd]
> bluestore_cache_size = 52428800
> bluestore_cache_size_hdd = 52428800
> bluestore_cache_size_ssd = 52428800
> bluestore_cache_kv_max = 52428800
>
> You're going to see memory page-{in,out} skyrocket with this setup but it
> should keep oom-killer at bay until a memory fix can be applied. Client
> performance to the cluster wasn't spectacular but wasn't terrible. I was
> seeing +/- 60Mb/sec of bandwidth.
>
> Ultimately I upgraded the nodes to 16Gb
>
> /Chris C
>
> On Tue, Oct 31, 2017 at 10:30 PM, shadow_lin <shadow_lin@xxxxxxx> wrote:
>>
>> Hi Sage,
>> We have tried compiling the latest ceph source code from github.
>> The build is ceph version 12.2.1-249-g42172a4
>> (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable).
>> The memory problem seems better, but the memory usage of the osd still
>> keeps increasing as more data is written into the rbd image, and the
>> memory usage won't drop after the write is stopped.
>> Could you specify in which commit the memory bug was fixed?
>> Thanks
>> 2017-11-01
>> ________________________________
>> lin.yunfan
>> ________________________________
>>
>> From: Sage Weil <sage@xxxxxxxxxxxx>
>> Sent: 2017-10-24 20:03
>> Subject: Re: [luminous] OSD memory usage increase when writing a lot
>> of data to cluster
>> To: "shadow_lin" <shadow_lin@xxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>
>> On Tue, 24 Oct 2017, shadow_lin wrote:
>> > Hi All,
>> > The cluster has 24 osds with 24 8TB hdds.
>> > Each osd server has 2GB ram and runs 2 OSDs with 2 8TB HDDs. I know the
>> > memory is below the recommended value, but this osd server is an ARM
>> > server so I can't do anything to add more ram.
>> > I created a replicated (2 rep) pool and a 20TB image and mounted it to
>> > the test server with an xfs fs.
>> >
>> > I have set the ceph.conf to this (as other related posts suggested):
>> > [osd]
>> > bluestore_cache_size = 104857600
>> > bluestore_cache_size_hdd = 104857600
>> > bluestore_cache_size_ssd = 104857600
>> > bluestore_cache_kv_max = 103809024
>> >
>> > osd map cache size = 20
>> > osd map max advance = 10
>> > osd map share max epochs = 10
>> > osd pg epoch persisted max stale = 10
>> >
>> > The bluestore cache settings did improve the situation, but if I try to
>> > write 1TB of data with dd (dd if=/dev/zero of=test bs=1G count=1000) to
>> > the rbd, the osd will eventually be killed by the oom killer.
>> > If I only write about 100G of data at once then everything is fine.
>> >
>> > Why does the osd memory usage keep increasing while writing?
>> > Is there anything I can do to reduce the memory usage?
>>
>> There is a bluestore memory bug that was fixed just after 12.2.1 was
>> released; it will be fixed in 12.2.2. In the meantime, you can consider
>> running the latest luminous branch (not fully tested) from
>> https://shaman.ceph.com/builds/ceph/luminous.
>>
>> sage
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Cheers,
Brad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com