Re: Nautilus: (Minority of) OSDs with huge buffer_anon usage - triggering OOMkiller in worst cases.

Hi Sam,


I saw your comment in the other thread but wanted to reply here since you provided the mempool and perf counters.  It looks like the priority cache is (as in Harald's case) shrinking all of the caches to their minimum values to try to compensate for everything collecting in buffer_anon.  Notice how there are only ~8000 items in the onode cache and 127 items in the data cache. This is another indication that something isn't being cleaned up properly in buffer_anon.
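For illustration (my sketch, not part of the original message), the imbalance is easy to see by ranking the pools from the mempool dump quoted below by bytes. The figures are copied from Sam's dump, whose shape matches the output of `ceph daemon osd.<id> dump_mempools`:

```python
# Rank mempool pools by bytes to show buffer_anon dominating.
# Figures are taken from the mempool dump quoted later in this thread.
by_pool = {
    "bluestore_alloc": 45034976,
    "bluestore_cache_data": 65675264,
    "bluestore_cache_onode": 4634000,
    "bluestore_cache_other": 62469216,
    "buffer_anon": 40719040439,
    "buffer_meta": 600172584,
    "osd_pglog": 156701043,
}
total = 41682277138  # "total"."bytes" from the same dump

ranked = sorted(by_pool.items(), key=lambda kv: kv[1], reverse=True)
for name, nbytes in ranked[:3]:
    print(f"{name}: {nbytes / total:.1%} of tracked memory")
```

buffer_anon alone accounts for roughly 98% of all tracked memory, which is why the autotuner has nothing left to give the onode and data caches.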


I don't see a new tracker ticket from Harald; would you mind creating one for this and including the relevant information from your cluster?  That would be most helpful: https://tracker.ceph.com/


On a side note, we haven't seen this in our test framework, so there must be some specific combination of workload and settings causing it.


Thanks,

Mark


On 5/21/20 5:28 AM, aoanla@xxxxxxxxx wrote:
Hi,

Following on from various woes, we see an odd and unhelpful behaviour with some OSDs on our cluster currently.
A minority of OSDs seem to have runaway memory usage, rising to 10s of GB, whilst other OSDs on the same host behave sensibly. This started when we moved from Mimic -> Nautilus, as far as we can tell.

In the best case, this causes some nodes to start swapping [and reduces their performance]. In the worst case, it triggers the OOM killer.

I have dumped the mempool for these OSDs, which shows that almost all the memory is in the buffer_anon pool.
The perf dump shows that the OSD is targeting the 4GB limit that's set for it, but for some reason is unable to stay within it due to stuff in the priority_cache (which seems to be mostly what is filling buffer_anon).

Can anyone advise on what we should do next?

(mempool dump and excerpt of perf dump at end of email).

Thanks for any help,

Sam Skipsey

MEMPOOL DUMP
{
     "mempool": {
         "by_pool": {
             "bloom_filter": {
                 "items": 0,
                 "bytes": 0
             },
             "bluestore_alloc": {
                 "items": 5629372,
                 "bytes": 45034976
             },
             "bluestore_cache_data": {
                 "items": 127,
                 "bytes": 65675264
             },
             "bluestore_cache_onode": {
                 "items": 8275,
                 "bytes": 4634000
             },
             "bluestore_cache_other": {
                 "items": 2967913,
                 "bytes": 62469216
             },
             "bluestore_fsck": {
                 "items": 0,
                 "bytes": 0
             },
             "bluestore_txc": {
                 "items": 145,
                 "bytes": 100920
             },
             "bluestore_writing_deferred": {
                 "items": 335,
                 "bytes": 13160884
             },
             "bluestore_writing": {
                 "items": 1406,
                 "bytes": 5379120
             },
             "bluefs": {
                 "items": 1105,
                 "bytes": 24376
             },
             "buffer_anon": {
                 "items": 13705143,
                 "bytes": 40719040439
             },
             "buffer_meta": {
                 "items": 6820143,
                 "bytes": 600172584
             },
             "osd": {
                 "items": 96,
                 "bytes": 1138176
             },
             "osd_mapbl": {
                 "items": 59,
                 "bytes": 7022524
             },
             "osd_pglog": {
                 "items": 491049,
                 "bytes": 156701043
             },
             "osdmap": {
                 "items": 107885,
                 "bytes": 1723616
             },
             "osdmap_mapping": {
                 "items": 0,
                 "bytes": 0
             },
             "pgmap": {
                 "items": 0,
                 "bytes": 0
             },
             "mds_co": {
                 "items": 0,
                 "bytes": 0
             },
             "unittest_1": {
                 "items": 0,
                 "bytes": 0
             },
             "unittest_2": {
                 "items": 0,
                 "bytes": 0
             }
         },
         "total": {
             "items": 29733053,
             "bytes": 41682277138
         }
     }
}

PERF DUMP excerpt:

"prioritycache": {
         "target_bytes": 4294967296,
         "mapped_bytes": 38466584576,
         "unmapped_bytes": 425984,
         "heap_bytes": 38467010560,
         "cache_bytes": 134217728
     },
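For reference, a minimal sketch (my addition, not from the original message) comparing the prioritycache counters above: mapped_bytes is nearly nine times target_bytes, while the caches themselves have been squeezed to a tiny fraction of mapped memory:

```python
# Compare the heap actually mapped by the OSD against its configured
# memory target, using the prioritycache counters quoted above.
target_bytes = 4294967296    # configured target (4 GiB)
mapped_bytes = 38466584576   # ~35.8 GiB actually mapped
cache_bytes = 134217728      # 128 MiB: caches shrunk to near-minimum

overshoot = mapped_bytes / target_bytes
print(f"mapped is {overshoot:.1f}x the target; "
      f"caches hold only {cache_bytes / mapped_bytes:.1%} of mapped memory")
```

In other words, almost none of the mapped memory is under the cache autotuner's control, consistent with the growth living in buffer_anon.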
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
