I'm running Kraken built from Git right now, and I've found that my OSDs eat as much memory as they can get before they're killed by the OOM killer. I understand that Bluestore is experimental, but I thought this behaviour should be known.
My setup:
- Xeon D-1540, 32GB DDR4 ECC RAM
- Arch Linux
- Single node, 4 8TB OSDs, each prepared with "ceph-disk prepare --bluestore /dev/sdX" (see the example below this list)
- Built from Git fac6335a1eea12270f76cf2c7814648669e6515a
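For anyone trying to reproduce, bringing the OSDs up looks roughly like this (device names are placeholders for my four 8TB disks, and udev may handle the activate step automatically):

    for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
        # create a bluestore OSD on the whole disk
        ceph-disk prepare --bluestore "$dev"
        # activate the small data partition that ceph-disk created (partition 1)
        ceph-disk activate "${dev}1"
    done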
Steps to reproduce:
- Start mon
- Start OSDs
- ceph osd pool create pool 256 256 erasure myprofile storage (an example profile definition follows after these steps)
- rados bench -p pool <time> write -t 32
- ceph osd pool delete pool
- ceph osd pool create pool 256 256 replicated
- rados bench -p pool <time> write -t 32
- ceph osd pool delete pool
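If you don't already have an erasure profile when reproducing this, something along these lines should work on a single node with 4 OSDs (the k/m values are only an example, not necessarily what I used, and on newer builds the parameter may be called crush-failure-domain):

    # example erasure profile for a 4-OSD single node
    ceph osd erasure-code-profile set myprofile k=2 m=2 ruleset-failure-domain=osd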
The OSDs start at ~500M used each (according to "ceph tell osd.0 heap stats") before any PGs are assigned to them. After creating and peering the PGs, they're at ~514M each.
After running rados bench for 10s, memory is at ~727M each. Running pprof on a dump shows the top entry as:
218.9 96.1% 96.1% 218.9 96.1% ceph::buffer::create_aligned
Running rados bench for another 10s pushes memory to 836M each, and pprof again shows similar results:
305.2 96.8% 96.8% 305.2 96.8% ceph::buffer::create_aligned
I can continue this process until the OSDs are killed by OOM.
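For reference, this is roughly how I've been collecting and reading the heap dumps (the binary and log paths below are the defaults and may differ on your system; the tool may be called google-pprof depending on the distro):

    # enable the tcmalloc heap profiler on one OSD and take a dump
    ceph tell osd.0 heap start_profiler
    ceph tell osd.0 heap dump
    # dumps land in the OSD's log directory, e.g. /var/log/ceph/osd.0.profile.0001.heap
    pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap
    # stop profiling when finished
    ceph tell osd.0 heap stop_profiler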
This only happens with Bluestore; other backends (like filestore) work fine.
When I delete the pool, the OSDs release the memory and return to their ~500M resting point.
Repeating the test with a replicated pool results in the OSDs consuming elevated memory (~610M peak) while writing, but they return to their resting levels once the writes stop.
It'd be great if I could do something about this myself, but I don't understand the code very well, and I can't work out whether there's a way to trace the call paths behind the memory allocations the way there is for CPU usage.
Any advice or solution would be much appreciated.
Thanks!
Lucas