Re: [CEPH] OSD Memory Usage

Hello.

I will read more about it.

Thank you :)

Nguyen Huu Khoi


On Thu, Nov 16, 2023 at 3:21 PM Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> Orch ps seems to show the virtual set size rather than the resident set size.
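>
> A quick way to cross-check is to compare with what the kernel reports on
> the OSD host, e.g. with standard procps ps (adjust to your environment):
>
> ps -o pid,rss,vsz,comm -C ceph-osd
>
> RSS there should roughly match each OSD's resident memory, while VSZ will
> be noticeably larger.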
>
> /Z
>
> On Thu, 16 Nov 2023 at 09:43, Nguyễn Hữu Khôi <nguyenhuukhoinw@xxxxxxxxx>
> wrote:
>
>> Hello,
>> Yes, I see that it does not exceed the RSS, but in "ceph orch ps" it is
>> over the target. MEM USE includes cache, am I right?
>>
>> NAME    HOST      PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>> osd.7   sg-osd01         running (3d)  8m ago     4w   4231M    4096M    17.2.6   90a2664234e1  922185643cb8
>> osd.8   sg-osd03         running (3d)  7m ago     4w   3407M    4096M    17.2.6   90a2664234e1  0ec74fe54bbe
>> osd.9   sg-osd01         running (3d)  8m ago     4w   4575M    4096M    17.2.6   90a2664234e1  c2f1c1ee2087
>> osd.10  sg-osd03         running (3d)  7m ago     4w   3821M    4096M    17.2.6   90a2664234e1  fecbd5e910de
>> osd.11  sg-osd01         running (3d)  8m ago     4w   3578M    4096M    17.2.6   90a2664234e1  f201704e9026
>> osd.12  sg-osd03         running (3d)  7m ago     4w   3076M    4096M    17.2.6   90a2664234e1  e741b67b6582
>> osd.13  sg-osd01         running (3d)  8m ago     4w   3688M    4096M    17.2.6   90a2664234e1  bffa59278fc2
>> osd.14  sg-osd03         running (3d)  7m ago     4w   3652M    4096M    17.2.6   90a2664234e1  7d9eb3fb9c1e
>> osd.15  sg-osd01         running (3d)  8m ago     4w   3343M    4096M    17.2.6   90a2664234e1  d96a425ae5c9
>> osd.16  sg-osd03         running (3d)  7m ago     4w   2492M    4096M    17.2.6   90a2664234e1  637c43176fdc
>> osd.17  sg-osd01         running (3d)  8m ago     4w   3011M    4096M    17.2.6   90a2664234e1  a39456dd2c0c
>> osd.18  sg-osd03         running (3d)  7m ago     4w   2341M    4096M    17.2.6   90a2664234e1  7b750672391b
>> osd.19  sg-osd01         running (3d)  8m ago     4w   2672M    4096M    17.2.6   90a2664234e1  6358234e95f5
>> osd.20  sg-osd03         running (3d)  7m ago     4w   3297M    4096M    17.2.6   90a2664234e1  2ecba6b066fd
>> osd.21  sg-osd01         running (3d)  8m ago     4w   5147M    4096M    17.2.6   90a2664234e1  1d0e4efe48bd
>> osd.22  sg-osd03         running (3d)  7m ago     4w   3432M    4096M    17.2.6   90a2664234e1  5bb6d4f71f9d
>> osd.23  sg-osd03         running (3d)  7m ago     4w   2893M    4096M    17.2.6   90a2664234e1  f7e1948e57d5
>> osd.24  sg-osd02         running (3d)  7m ago     12d  3007M    4096M    17.2.6   90a2664234e1  85d896abe467
>> osd.25  sg-osd02         running (3d)  7m ago     12d  2666M    4096M    17.2.6   90a2664234e1  9800cd8ff1a1
>> osd.26  sg-osd02         running (3d)  7m ago     12d  2918M    4096M    17.2.6   90a2664234e1  f2e0b2d50625
>> osd.27  sg-osd02         running (3d)  7m ago     12d  3586M    4096M    17.2.6   90a2664234e1  ee2fa3a9b40a
>> osd.28  sg-osd02         running (3d)  7m ago     12d  2391M    4096M    17.2.6   90a2664234e1  4cf7adf9f60a
>> osd.29  sg-osd02         running (3d)  7m ago     12d  5642M    4096M    17.2.6   90a2664234e1  8c1ba98a1738
>> osd.30  sg-osd02         running (3d)  7m ago     12d  4728M    4096M    17.2.6   90a2664234e1  e308497de2e5
>> osd.31  sg-osd02         running (3d)  7m ago     12d  3615M    4096M    17.2.6   90a2664234e1  89b80d464627
>> osd.32  sg-osd02         running (3d)  7m ago     12d  1703M    4096M    17.2.6   90a2664234e1  1e4608786078
>> osd.33  sg-osd02         running (3d)  7m ago     12d  3039M    4096M    17.2.6   90a2664234e1  16e04a1da987
>> osd.34  sg-osd02         running (3d)  7m ago     12d  2434M    4096M    17.2.6   90a2664234e1  014076e28182
>>
>>
>>
>> By the way, as you said, I feel this value does not have much impact:
>> whether we set 1 GB or 4 GB, the OSDs can still consume much more memory
>> when they need it.
>>
>> Nguyen Huu Khoi
>>
>>
>> On Thu, Nov 16, 2023 at 2:13 PM Zakhar Kirpichenko <zakhar@xxxxxxxxx>
>> wrote:
>>
>>> You're most welcome!
>>>
>>> I'd say that real leak issues are very rare. For example, these are my
>>> OSDs with a 16 GB memory target, which have been running for quite a
>>> while; as you can see, they don't exceed 16 GB RSS:
>>>
>>>     PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>   92298 167   20   0   18.7g  15.8g  12264 S   1.3   4.2   1974:06 ceph-osd
>>>   94527 167   20   0   19.5g  15.8g  12248 S   2.3   4.2   2287:26 ceph-osd
>>>   93749 167   20   0   19.1g  15.7g  12804 S   2.3   4.2   1768:22 ceph-osd
>>>   89534 167   20   0   20.1g  15.7g  12412 S   4.0   4.2   2512:18 ceph-osd
>>> 3706552 167   20   0   20.5g  15.7g  15588 S   2.3   4.2   1385:26 ceph-osd
>>>   90297 167   20   0   19.5g  15.6g  12432 S   3.0   4.1   2261:00 ceph-osd
>>>    9799 167   20   0   22.9g  15.4g  12432 S   2.0   4.1   2494:00 ceph-osd
>>>    9778 167   20   0   23.1g  15.3g  12556 S   2.6   4.1   2591:25 ceph-osd
>>>    9815 167   20   0   23.4g  15.1g  12584 S   2.0   4.0   2722:28 ceph-osd
>>>    9809 167   20   0   22.3g  15.1g  12068 S   3.6   4.0   5234:52 ceph-osd
>>>    9811 167   20   0   23.4g  14.9g  12952 S   2.6   4.0   2593:19 ceph-osd
>>>    9819 167   20   0   23.9g  14.9g  12636 S   2.6   4.0   3043:19 ceph-osd
>>>    9820 167   20   0   23.3g  14.8g  12884 S   2.0   3.9   3073:43 ceph-osd
>>>    9769 167   20   0   22.4g  14.7g  12612 S   2.6   3.9   2840:22 ceph-osd
>>>    9836 167   20   0   24.0g  14.7g  12648 S   2.6   3.9   3300:34 ceph-osd
>>>    9818 167   20   0   22.0g  14.7g  12152 S   2.3   3.9   5729:06 ceph-osd
>>>
>>> Long story short, if you set reasonable targets, OSDs are unlikely to
>>> exceed them during normal operations. If you set memory targets too low, it
>>> is likely that they will be exceeded as OSDs need reasonable amounts of
>>> memory to operate.
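>>>
>>> For instance, a sketch of setting and verifying a 16 GB target for all
>>> OSDs would look something like this (adjust the value to your own budget):
>>>
>>> ceph config set osd osd_memory_target 17179869184
>>> ceph config get osd osd_memory_target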
>>>
>>> /Z
>>>
>>> On Thu, 16 Nov 2023 at 08:37, Nguyễn Hữu Khôi <nguyenhuukhoinw@xxxxxxxxx>
>>> wrote:
>>>
>>>> Hello. Thank you very much for your explanation.
>>>>
>>>> Because I thought that osd_memory_target would help me limit OSD memory
>>>> usage, which would help prevent memory leaks - I googled it and many
>>>> people talked about memory leaks. A nice man on this forum, @Anthony
>>>> D'Atri <aad@xxxxxxxxxxxxxx>, helped me understand that it won't limit
>>>> OSD memory usage.
>>>>
>>>> I set it to 1 GB because I wanted to see how this option works.
>>>>
>>>> I will read about and test the cache options.
>>>>
>>>> Nguyen Huu Khoi
>>>>
>>>>
>>>> On Thu, Nov 16, 2023 at 12:23 PM Zakhar Kirpichenko <zakhar@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> osd_memory_target is a "target", i.e. an OSD makes an effort to consume
>>>>> up to the specified amount of RAM, but it won't consume less than is
>>>>> required for its operation and caches, which have minimum values such as
>>>>> osd_memory_cache_min, bluestore_cache_size, bluestore_cache_size_hdd,
>>>>> bluestore_cache_size_ssd, etc. The recommended and default OSD memory
>>>>> target is 4 GB.
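>>>>>
>>>>> If you want to check what those minimums are on your cluster, you can
>>>>> query them, for example:
>>>>>
>>>>> ceph config get osd osd_memory_cache_min
>>>>> ceph config get osd bluestore_cache_size_ssd
>>>>> ceph config get osd bluestore_cache_size_hdd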
>>>>>
>>>>> Your nodes have a sufficient amount of RAM, thus I don't see why you
>>>>> would want to reduce OSD memory consumption below the recommended defaults,
>>>>> especially considering that in-memory caches are important for Ceph
>>>>> operations as they're many times faster than the fastest storage devices. I
>>>>> run my OSDs with osd_memory_target=17179869184 (16 GB) and it helps,
>>>>> especially with slower HDD-backed OSDs.
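>>>>>
>>>>> As a rough calculation: with 11 NVMe OSDs per node, even a 16 GB target
>>>>> only adds up to about 11 x 16 GB = 176 GB per node, which still leaves
>>>>> plenty of your 512 GB for the OS and other daemons.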
>>>>>
>>>>> /Z
>>>>>
>>>>> On Thu, 16 Nov 2023 at 01:02, Nguyễn Hữu Khôi <
>>>>> nguyenhuukhoinw@xxxxxxxxx> wrote:
>>>>>
>>>>>> Hello,
>>>>>> I am using a Ceph cluster. After monitoring it, I set:
>>>>>>
>>>>>> ceph config set osd osd_memory_target_autotune false
>>>>>>
>>>>>> ceph config set osd osd_memory_target 1G
>>>>>>
>>>>>> Then I restarted all OSD services and ran the test again. I just use
>>>>>> fio commands from multiple clients, and I see that OSD memory
>>>>>> consumption is over 1 GB. Could you help me understand this case?
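>>>>>>
>>>>>> (For illustration, an fio run against an RBD image would look roughly
>>>>>> like the following; the pool and image names are placeholders, not my
>>>>>> exact commands:)
>>>>>>
>>>>>> fio --name=randwrite --ioengine=rbd --clientname=admin \
>>>>>>     --pool=testpool --rbdname=testimage \
>>>>>>     --rw=randwrite --bs=4k --iodepth=32 --time_based --runtime=300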
>>>>>>
>>>>>> Ceph version: Quincy
>>>>>>
>>>>>> OSD: 3 nodes with 11 NVMe drives each and 512 GB RAM per node.
>>>>>>
>>>>>> CPU: 2-socket Xeon Gold 6138 with 56 cores per socket.
>>>>>>
>>>>>> Network: 25Gbps x 2 for public network and 25Gbps x 2 for storage
>>>>>> network.
>>>>>> MTU is 9000
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>>
>>>>>> Nguyen Huu Khoi
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>>
>>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



