On Wed, Nov 1, 2017 at 11:54 PM, Mazzystr <mazzystr@xxxxxxxxx> wrote:
> I experienced this as well on tiny Ceph cluster testing...
>
> HW spec - 3x
> Intel i7-4770K quad core
> 32Gb m2/ssd
> 8Gb memory
> Dell PERC H200
> 6 x 3Tb Seagate
> Centos 7.x
> Ceph 12.x
>
> I also run 3 memory hungry procs on the Ceph nodes. Obviously there is a
> memory problem here. Here are the steps I took to avoid oom-killer killing
> the node ...
>
> /etc/rc.local -
> for i in $(pgrep ceph-mon); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-osd); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-mgr); do echo 50 > /proc/$i/oom_score_adj; done
>
> /etc/sysctl.conf -
> vm.swappiness = 100
> vm.vfs_cache_pressure = 1000

This is generally not a good idea. Just sayin'

$ grep -A17 ^vfs_cache_pressure sysctl/vm.txt
vfs_cache_pressure
------------------

This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact. Reclaim code needs to take various locks to find freeable
directory and inode objects. With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.
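Before cranking that knob up to 1000 it's worth checking how much memory the
dentry and inode caches are actually holding on one of these nodes. A rough
sketch (run as root; the slab names and the /proc/slabinfo column layout below
are assumptions for a stock CentOS 7 box using xfs, so adjust to taste):

# how big are the dentry/inode slabs right now?
# (/proc/slabinfo columns: name, active_objs, num_objs, objsize, ...)
$ sysctl vm.vfs_cache_pressure
$ sudo grep -E '^(dentry|inode_cache|xfs_inode)' /proc/slabinfo | \
    awk '{printf "%-12s %10d objs %10.0f KB\n", $1, $2, $2 * $4 / 1024}'

If those numbers are small compared to what the OSD processes are using, a
high vfs_cache_pressure frees next to nothing and only adds reclaim overhead.
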
> vm.min_free_kbytes = 512
>
> /etc/ceph/ceph.conf -
> [osd]
> bluestore_cache_size = 52428800
> bluestore_cache_size_hdd = 52428800
> bluestore_cache_size_ssd = 52428800
> bluestore_cache_kv_max = 52428800
>
> You're going to see memory page-{in,out} skyrocket with this setup but it
> should keep oom-killer at bay until a memory fix can be applied. Client
> performance to the cluster wasn't spectacular but wasn't terrible. I was
> seeing +/- 60Mb/sec of bandwidth.
>
> Ultimately I upgraded the nodes to 16Gb
>
> /Chris C
>
> On Tue, Oct 31, 2017 at 10:30 PM, shadow_lin <shadow_lin@xxxxxxx> wrote:
>>
>> Hi Sage,
>> We have tried compiling the latest ceph source code from github.
>> The build is ceph version 12.2.1-249-g42172a4
>> (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable).
>> The memory problem seems better, but the memory usage of the osd still
>> keeps increasing as more data is written into the rbd image, and the
>> memory usage won't drop after the write is stopped.
>> Could you specify in which commit the memory bug was fixed?
>> Thanks
>> 2017-11-01
>> ________________________________
>> lin.yunfan
>> ________________________________
>>
>> From: Sage Weil <sage@xxxxxxxxxxxx>
>> Sent: 2017-10-24 20:03
>> Subject: Re: [luminous] OSD memory usage increase when writing a lot
>> of data to cluster
>> To: "shadow_lin" <shadow_lin@xxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>
>> On Tue, 24 Oct 2017, shadow_lin wrote:
>> > Hi All,
>> > The cluster has 24 osds with 24 8TB hdds.
>> > Each osd server has 2GB ram and runs 2 OSDs with 2 8TB HDDs. I know the
>> > memory is below the recommended value, but this osd server is an ARM
>> > server so I can't do anything to add more ram.
>> > I created a replicated (2 rep) pool and a 20TB image and mounted it to
>> > the test server with an xfs fs.
>> >
>> > I have set the ceph.conf to this (as other related posts suggested):
>> > [osd]
>> > bluestore_cache_size = 104857600
>> > bluestore_cache_size_hdd = 104857600
>> > bluestore_cache_size_ssd = 104857600
>> > bluestore_cache_kv_max = 103809024
>> >
>> > osd map cache size = 20
>> > osd map max advance = 10
>> > osd map share max epochs = 10
>> > osd pg epoch persisted max stale = 10
>> >
>> > The bluestore cache settings did improve the situation, but if I try to
>> > write 1TB of data with dd (dd if=/dev/zero of=test bs=1G count=1000) to
>> > the rbd, the osd will eventually be killed by the oom killer.
>> > If I only write about 100G of data at once then everything is fine.
>> >
>> > Why does the osd memory usage keep increasing while writing?
>> > Is there anything I can do to reduce the memory usage?
>>
>> There is a bluestore memory bug that was fixed just after 12.2.1 was
>> released; it will be fixed in 12.2.2. In the meantime, you can consider
>> running the latest luminous branch (not fully tested) from
>> https://shaman.ceph.com/builds/ceph/luminous.
>>
>> sage
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Cheers,
Brad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com