Re: OSDs taking too much memory, for buffer_anon

Hi Mark

Thank you! This is 14.2.8, on Ubuntu Bionic. Some hosts run kernel 4.15, some 5.3, but that does not seem to make a difference here. Transparent Huge Pages are not in use, according to
grep -i AnonHugePages /proc/meminfo
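
For completeness, here is the fuller check, and the manual restriction Mark suggests below, assuming the usual sysfs path:

    # current THP mode (the bracketed value is the active one)
    cat /sys/kernel/mm/transparent_hugepage/enabled
    # restrict THP to madvise, effective until the next reboot
    echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled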

Workload is a mix of OpenStack volumes (replicated) and RGW on EC 8+3. EC pool with 1024 PGs, 900M objects.

Around 500 HDD OSDs (4 and 8 TB) and 30 SSD OSDs (2 TB). The maximum number of PGs per OSD is only 123. The HDD OSDs have their DB on SSD, but unfortunately with a bit less than 30 GB each. I have seen 200 GB and more of slow_bytes; compression of the DB seems to help a lot.
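
In case it is useful to someone else, this is how I check the spillover and trigger a manual compaction, with osd.0 as a stand-in:

    # BlueFS usage, including bytes spilled to the slow device
    ceph daemon osd.0 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'
    # compact the OSD's RocksDB via the admin socket
    ceph daemon osd.0 compact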

No BlueStore compression.

I had a look at the related thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/JQ72K5LK3YFFETNNL4MX6HHZLF5GBYDT/

Today I saw a correlation that may match your thoughts: during one hour with a high number of write IOPS (not throughput) on the EC pool, available memory decreased drastically.
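
For reference, I watched it roughly like this (the pool name is a placeholder):

    # per-pool client IO rates, to spot the write-heavy phases
    ceph osd pool stats <ec-pool-name>
    # available memory on an OSD host over the same window
    free -m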

Cheers
 Harry

On 20.05.20 15:15, Mark Nelson wrote:
Hi Harald,


Thanks! So you can see from the perf dump that the target bytes are a little below 4 GB, but the mapped bytes are around 7 GB. The priority cache manager has reacted by setting "cache_bytes" to 128 MB, which is the global minimum, and each cache is getting 64 MB (the local minimum per cache). Ultimately this means the priority cache manager has told all of the caches to shrink to their smallest possible values, so it's doing the right thing. The next question is why buffer_anon is so huge. Looking at the mempool stats, there are not that many items, but still a lot of memory used; on average the items in buffer_anon are ~150 KB each. It can't be just buffer_anon though: you've got several gigabytes of mapped memory in use beyond that, and around 4 GB of unmapped memory that tcmalloc should be freeing on every iteration of the priority cache manager.
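
A quick way to cross-check what tcmalloc itself reports, and to explicitly release the unmapped pages (osd.0 as an example):

    # tcmalloc's view of mapped vs. unmapped heap
    ceph tell osd.0 heap stats
    # ask tcmalloc to return unmapped pages to the OS
    ceph tell osd.0 heap release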


So, next questions: what version of Ceph is this, and do you have transparent huge pages enabled? We automatically disable THP now, but if you are running an older version you might want to disable it (or at least set it to madvise) manually. Also, what kind of workload is hitting the OSDs? If you can reliably make the memory grow, you could run a heap profile while the workload is going on and see where the memory is being used.
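
A rough sketch of the heap profiling steps, with osd.0 as an example (the dump location and the pprof binary name may differ on your distribution):

    # start profiling while the suspect workload is running
    ceph tell osd.0 heap start_profiler
    # after a while, write a dump and stop
    ceph tell osd.0 heap dump
    ceph tell osd.0 heap stop_profiler
    # analyze the dump, typically written to the OSD's log directory
    pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap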


Mark


On 5/20/20 7:36 AM, Harald Staub wrote:
Hi Mark

Thank you for your explanations! Some numbers from this example OSD below.

Cheers
 Harry

From dump mempools:

            "buffer_anon": {
                "items": 29012,
                "bytes": 4584503367
            },

From perf dump:

    "prioritycache": {
        "target_bytes": 3758096384,
        "mapped_bytes": 7146692608,
        "unmapped_bytes": 3825983488,
        "heap_bytes": 10972676096,
        "cache_bytes": 134217728
    },
    "prioritycache:data": {
        "pri0_bytes": 0,
        "pri1_bytes": 0,
        "pri2_bytes": 0,
        "pri3_bytes": 0,
        "pri4_bytes": 0,
        "pri5_bytes": 0,
        "pri6_bytes": 0,
        "pri7_bytes": 0,
        "pri8_bytes": 0,
        "pri9_bytes": 0,
        "pri10_bytes": 0,
        "pri11_bytes": 0,
        "reserved_bytes": 67108864,
        "committed_bytes": 67108864
    },
    "prioritycache:kv": {
        "pri0_bytes": 0,
        "pri1_bytes": 0,
        "pri2_bytes": 0,
        "pri3_bytes": 0,
        "pri4_bytes": 0,
        "pri5_bytes": 0,
        "pri6_bytes": 0,
        "pri7_bytes": 0,
        "pri8_bytes": 0,
        "pri9_bytes": 0,
        "pri10_bytes": 0,
        "pri11_bytes": 0,
        "reserved_bytes": 67108864,
        "committed_bytes": 67108864
    },
    "prioritycache:meta": {
        "pri0_bytes": 0,
        "pri1_bytes": 0,
        "pri2_bytes": 0,
        "pri3_bytes": 0,
        "pri4_bytes": 0,
        "pri5_bytes": 0,
        "pri6_bytes": 0,
        "pri7_bytes": 0,
        "pri8_bytes": 0,
        "pri9_bytes": 0,
        "pri10_bytes": 0,
        "pri11_bytes": 0,
        "reserved_bytes": 67108864,
        "committed_bytes": 67108864
    },

On 20.05.20 14:05, Mark Nelson wrote:
Hi Harald,


Any idea what the priority_cache_manager perf counters show? (Or you can also enable debug osd / debug priority_cache_manager.) The OSD memory autotuning works by shrinking the BlueStore and RocksDB caches toward some target value, to try to keep the mapped memory of the process below the osd_memory_target. In some cases it's possible that something other than the caches is using the memory (usually pglog), or that there is a ton of pinned data in the cache that for some reason can't be evicted. Knowing the cache tuning stats might help tell whether it's trying to shrink the caches and can't for some reason, or whether something else is going on.
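
Something like the following should show both, with osd.0 as an example:

    # the priority cache counters, via the admin socket
    ceph daemon osd.0 perf dump prioritycache
    # the mempool breakdown, e.g. to see whether osd_pglog is the culprit
    ceph daemon osd.0 dump_mempools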


Thanks,

Mark



On 5/20/20 6:10 AM, Harald Staub wrote:
As a follow-up to our recent memory problems with OSDs (with high pglog values: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/LJPJZPBSQRJN5EFE632CWWPK3UMGG3VF/#XHIWAIFX4AXZK5VEFOEBPS5TGTH33JZO ), we also see high buffer_anon values, e.g. more than 4 GB with "osd memory target" set to 3 GB. Is there a way to restrict it?
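
For reference, the knobs involved as far as I can see (osd.0 as an example; 3221225472 bytes = 3 GB):

    # the per-daemon view of the autotuning target
    ceph daemon osd.0 config get osd_memory_target
    # set it cluster-wide via the mon config store
    ceph config set osd osd_memory_target 3221225472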

As it is called "anon", I guess it would first be necessary to find out what exactly is behind it?

Well, maybe it is just as Wido said: with lots of small objects, there will be several problems.

Cheers
 Harry
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




