Re: 13.2.4 odd memory leak?

On 3/8/19 8:12 AM, Steffen Winther Sørensen wrote:


On 8 Mar 2019, at 14.30, Mark Nelson <mnelson@xxxxxxxxxx> wrote:


On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:

On 5 Mar 2019, at 10.02, Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:

Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.
Yep, thanks, setting it to 1G+256M worked :)
Hope this won't bloat memory during the coming weekend's VM backups through CephFS.
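For anyone following along, a minimal ceph.conf sketch of that workaround (1G + 256M = 1342177280 bytes; the plain-byte form matches the config shown further down this thread, but verify against your own setup):

[osd]
  ; keep osd_memory_target at or above roughly 1.2 GB on 13.2.4 to avoid the abort shown below
  osd memory target = 1342177280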



FWIW, setting it to 1.2GB will almost certainly result in the bluestore caches being stuck at cache_min, i.e. 128MB, and the autotuner may not be able to keep the OSD memory that low.  I typically recommend a bare minimum of 2GB per OSD, and on SSD/NVMe-backed OSDs 3-4+ GB can improve performance significantly.
This is a smaller dev cluster without much IO: 4 nodes, each with 16GB RAM and 6 HDD OSDs.

Just want to avoid consuming swap, which bloated after patching from 13.2.2 to 13.2.4 and then performing VM snapshots to CephFS. Otherwise the cluster has been fine for ages…
/Steffen


Understood.  We struggled with whether we should have separate HDD and SSD defaults for osd_memory_target, but we were seeing other users run into problems setting the global default vs. the ssd/hdd defaults and then not seeing the expected behavior.  We decided on a single osd_memory_target to keep the whole thing simpler, with only one parameter to set.  The 4GB/OSD default is aggressive, but it can dramatically improve performance on NVMe, and we figured it communicates to users where we think the sweet spot is (and as devices and data sets get larger, this will only become more important).
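For what it's worth, a sketch of how a site could still pick different targets per device class itself, using the centralized config database that arrived in Mimic (the class: mask is how I recall the Mimic syntax, so please verify against the docs before relying on it):

  # modest default for the HDD-backed OSDs
  ceph config set osd osd_memory_target 2147483648
  # larger target only for OSDs on the ssd device class
  ceph config set osd/class:ssd osd_memory_target 4294967296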


Mark








On Tue, Mar 5, 2019 at 9:00 AM Steffen Winther Sørensen
<stefws@xxxxxxxxx> wrote:


On 4 Mar 2019, at 16.09, Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:

Bloated to ~4 GB per OSD and you are on HDDs?

Something like that, yes.


13.2.3 backported the cache auto-tuning which targets 4 GB memory
usage by default.


See https://ceph.com/releases/13-2-4-mimic-released/

Right, thanks…


The bluestore_cache_* options are no longer needed. They are replaced
by osd_memory_target, defaulting to 4GB. BlueStore will expand
and contract its cache to attempt to stay within this
limit. Users upgrading should note this is a higher default
than the previous bluestore_cache_size of 1GB, so OSDs using
BlueStore will use more memory by default.
For more details, see the BlueStore docs.

Adding an 'osd memory target' value to our ceph.conf and restarting an OSD just makes the OSD abort and dump like this:

[osd]
  ; this key makes 13.2.4 OSDs abort???
  osd memory target = 1073741824

  ; other OSD key settings
  osd pool default size = 2  # Write an object 2 times.
  osd pool default min size = 1 # Allow writing one copy in a degraded state.

  osd pool default pg num = 256
  osd pool default pgp num = 256

  client cache size = 131072
  osd client op priority = 40
  osd op threads = 8
  osd client message size cap = 512
  filestore min sync interval = 10
  filestore max sync interval = 60

  recovery max active = 2
  recovery op priority = 30
  osd max backfills = 2




osd log snippet:
 -472> 2019-03-05 08:36:02.233 7f2743a8c1c0  1 -- - start start
 -471> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 init /var/lib/ceph/osd/ceph-12 (looks like hdd)
 -470> 2019-03-05 08:36:02.234 7f2743a8c1c0  2 osd.12 0 journal /var/lib/ceph/osd/ceph-12/journal
 -469> 2019-03-05 08:36:02.234 7f2743a8c1c0  1 bluestore(/var/lib/ceph/osd/ceph-12) _mount path /var/lib/ceph/osd/ceph-12
 -468> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev create path /var/lib/ceph/osd/ceph-12/block type kernel
 -467> 2019-03-05 08:36:02.235 7f2743a8c1c0  1 bdev(0x55b31af4a000 /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
 -466> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 bdev(0x55b31af4a000 /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c800000, 137 GiB) block_size 4096 (4 KiB) rotational
 -465> 2019-03-05 08:36:02.236 7f2743a8c1c0  1 bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
 -464> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev create path /var/lib/ceph/osd/ceph-12/block type kernel
 -463> 2019-03-05 08:36:02.237 7f2743a8c1c0  1 bdev(0x55b31af4aa80 /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
 -462> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bdev(0x55b31af4aa80 /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c800000, 137 GiB) block_size 4096 (4 KiB) rotational
 -461> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-12/block size 137 GiB
 -460> 2019-03-05 08:36:02.238 7f2743a8c1c0  1 bluefs mount
 -459> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option compaction_readahead_size = 2097152
 -458> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option compression = kNoCompression
 -457> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option max_write_buffer_number = 4
 -456> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option min_write_buffer_number_to_merge = 1
 -455> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option recycle_log_file_num = 4
 -454> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option writable_file_max_buffer_size = 0
 -453> 2019-03-05 08:36:02.339 7f2743a8c1c0  0  set rocksdb option write_buffer_size = 268435456
 -452> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option compaction_readahead_size = 2097152
 -451> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option compression = kNoCompression
 -450> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option max_write_buffer_number = 4
 -449> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option min_write_buffer_number_to_merge = 1
 -448> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option recycle_log_file_num = 4
 -447> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option writable_file_max_buffer_size = 0
 -446> 2019-03-05 08:36:02.340 7f2743a8c1c0  0  set rocksdb option write_buffer_size = 268435456
 -445> 2019-03-05 08:36:02.340 7f2743a8c1c0  1 rocksdb: do_open column families: [default]
 -444> 2019-03-05 08:36:02.341 7f2743a8c1c0  4 rocksdb: RocksDB version: 5.13.0
 -443> 2019-03-05 08:36:02.342 7f2743a8c1c0  4 rocksdb: Git sha rocksdb_build_git_sha:@0@
 -442> 2019-03-05 08:36:02.342 7f2743a8c1c0  4 rocksdb: Compile date Jan  4 2019
...
 -271> 2019-03-05 08:36:02.431 7f2743a8c1c0  1 freelist init
 -270> 2019-03-05 08:36:02.535 7f2743a8c1c0  1 bluestore(/var/lib/ceph/osd/ceph-12) _open_alloc opening allocation metadata
 -269> 2019-03-05 08:36:02.714 7f2743a8c1c0  1 bluestore(/var/lib/ceph/osd/ceph-12) _open_alloc loaded 93 GiB in 31828 extents
 -268> 2019-03-05 08:36:02.722 7f2743a8c1c0  2 osd.12 0 journal looks like hdd
 -267> 2019-03-05 08:36:02.722 7f2743a8c1c0  2 osd.12 0 boot
 -266> 2019-03-05 08:36:02.723 7f272a0f3700  5 bluestore.MempoolThread(0x55b31af46a30) _tune_cache_size target: 1073741824 heap: 64675840 unmapped: 786432 mapped: 63889408 old cache_size: 134217728 new cache size: 17349132402135320576
 -265> 2019-03-05 08:36:02.723 7f272a0f3700  5 bluestore.MempoolThread(0x55b31af46a30) _trim_shards cache_size: 17349132402135320576 kv_alloc: 134217728 kv_used: 5099462 meta_alloc: 0 meta_used: 21301 data_alloc: 0 data_used: 0
...
2019-03-05 08:36:40.166 7f03fc57f700  1 osd.12 pg_epoch: 7063 pg[2.93( v 6687'5 (0'0,6687'5] local-lis/les=7015/7016 n=1 ec=103/103 lis/c 7015/7015 les/c/f 7016/7016/0 7063/7063/7063) [12,19] r=0 lpr=7063 pi=[7015,7063)/1 crt=6687'5 lcod 0'0 mlcod 0'0 unknown NOTIFY mbc={}] start_peering_interval up [19] -> [12,19], acting [19] -> [12,19], acting_primary 19 -> 12, up_primary 19 -> 12, role -1 -> 0, features acting 4611087854031142907 upacting 4611087854031142907
2019-03-05 08:36:40.167 7f03fc57f700  1 osd.12 pg_epoch: 7063 pg[2.93( v 6687'5 (0'0,6687'5] local-lis/les=7015/7016 n=1 ec=103/103 lis/c 7015/7015 les/c/f 7016/7016/0 7063/7063/7063) [12,19] r=0 lpr=7063 pi=[7015,7063)/1 crt=6687'5 lcod 0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
2019-03-05 08:36:40.167 7f03fb57d700  1 osd.12 pg_epoch: 7061 pg[2.40( v 6964'703 (0'0,6964'703] local-lis/les=6999/7000 n=1 ec=103/103 lis/c 6999/6999 les/c/f 7000/7000/0 7061/7061/6999) [8] r=-1 lpr=7061 pi=[6999,7061)/1 crt=6964'703 lcod 0'0 unknown mbc={}] start_peering_interval up [8,12] -> [8], acting [8,12] -> [8], acting_primary 8 -> 8, up_primary 8 -> 8, role 1 -> -1, features acting 4611087854031142907 upacting 4611087854031142907
  1/ 5 heartbeatmap
  1/ 5 perfcounter
  1/ 5 rgw
  1/ 5 rgw_sync
  1/10 civetweb
  1/ 5 javaclient
  1/ 5 asok
  1/ 1 throttle
  0/ 0 refs
  1/ 5 xio
  1/ 5 compressor
  1/ 5 bluestore
  1/ 5 bluefs
  1/ 3 bdev
  1/ 5 kstore
  4/ 5 rocksdb
  4/ 5 leveldb
  4/ 5 memdb
  1/ 5 kinetic
  1/ 5 fuse
  1/ 5 mgr
  1/ 5 mgrc
  1/ 5 dpdk
  1/ 5 eventtrace
 -2/-2 (syslog threshold)
 -1/-1 (stderr threshold)
 max_recent     10000
 max_new         1000
 log_file /var/log/ceph/ceph-osd.12.log
--- end dump of recent events ---

2019-03-05 08:36:07.750 7f272a0f3700 -1 *** Caught signal (Aborted) **
in thread 7f272a0f3700 thread_name:bstore_mempool

ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
1: (()+0x911e70) [0x55b318337e70]
2: (()+0xf5d0) [0x7f2737a4e5d0]
3: (gsignal()+0x37) [0x7f2736a6f207]
4: (abort()+0x148) [0x7f2736a708f8]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x242) [0x7f273aec62b2]
6: (()+0x25a337) [0x7f273aec6337]
7: (()+0x7a886e) [0x55b3181ce86e]
8: (BlueStore::MempoolThread::entry()+0x3b0) [0x55b3181d0060]
9: (()+0x7dd5) [0x7f2737a46dd5]
10: (clone()+0x6d) [0x7f2736b36ead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
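That "new cache size: 17349132402135320576" a few lines above looks like an unsigned wrap-around (a negative intermediate stored in a 64-bit unsigned value). I haven't checked the exact arithmetic inside _tune_cache_size, so this is only an illustration of the effect under that assumption, not the actual Ceph code:

  #include <cstdint>
  #include <iostream>

  int main() {
      // Hypothetical numbers: if the autotuner ends up subtracting more
      // "untunable" memory than the configured target allows for, the
      // negative result wraps around when held in a uint64_t.
      uint64_t target   = 1073741824;   // 1 GiB osd_memory_target
      uint64_t overhead = 1288490188;   // assume ~1.2 GiB of untunable usage
      uint64_t new_cache_size = target - overhead;   // wraps below zero
      std::cout << "new cache size: " << new_cache_size << "\n";
      return 0;
  }

This prints a similarly absurd ~1.8e19 value, the same order of magnitude as what the OSD logged before the abort, which would also explain why Paul's advice to stay above ~1.2GB avoids the crash.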


Even without the 'osd memory target' conf key, the OSD reports on start:

bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 1073741824
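One way to check which value a running OSD actually picked up is the admin socket (a sketch; adjust the OSD id and verify the option names on your version):

  ceph daemon osd.12 config get osd_memory_target
  ceph daemon osd.12 config show | grep -E 'memory_target|bluestore_cache'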

Any hints appreciated!

/Steffen


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Mar 4, 2019 at 3:55 PM Steffen Winther Sørensen
<stefws@xxxxxxxxx> wrote:


List Members,

Patched a CentOS 7 based cluster from 13.2.2 to 13.2.4 last Monday; everything appeared to be working fine.

Only this morning I found all OSDs in the cluster bloated in memory footprint, possibly after the weekend backup through MDS.

Anyone else seeing a possible memory leak in 13.2.4 OSDs, perhaps primarily when using MDS?

TIA

/Steffen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



