Re: [CEPH] OSD Memory Usage

Orch ps seems to show the virtual set size (VSZ) rather than the resident set size (RSS).
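
A quick way to compare the two on an OSD host is plain ps; a minimal sketch
(VSZ/RSS are reported in KiB, and the process name assumes the usual ceph-osd
daemon, which is visible from the host even in a containerized deployment):

  # VSZ = virtual size, RSS = memory actually resident in RAM
  ps -o pid,vsz,rss,cmd -C ceph-osd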

/Z

On Thu, 16 Nov 2023 at 09:43, Nguyễn Hữu Khôi <nguyenhuukhoinw@xxxxxxxxx>
wrote:

> Hello,
> Yes, I see it does not exceed RSS, but in "ceph orch ps" it is over the
> target. Does MEM USE include cache, is that right?
>
> NAME     HOST      PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> osd.7    sg-osd01         running (3d)  8m ago     4w   4231M    4096M    17.2.6   90a2664234e1  922185643cb8
> osd.8    sg-osd03         running (3d)  7m ago     4w   3407M    4096M    17.2.6   90a2664234e1  0ec74fe54bbe
> osd.9    sg-osd01         running (3d)  8m ago     4w   4575M    4096M    17.2.6   90a2664234e1  c2f1c1ee2087
> osd.10   sg-osd03         running (3d)  7m ago     4w   3821M    4096M    17.2.6   90a2664234e1  fecbd5e910de
> osd.11   sg-osd01         running (3d)  8m ago     4w   3578M    4096M    17.2.6   90a2664234e1  f201704e9026
> osd.12   sg-osd03         running (3d)  7m ago     4w   3076M    4096M    17.2.6   90a2664234e1  e741b67b6582
> osd.13   sg-osd01         running (3d)  8m ago     4w   3688M    4096M    17.2.6   90a2664234e1  bffa59278fc2
> osd.14   sg-osd03         running (3d)  7m ago     4w   3652M    4096M    17.2.6   90a2664234e1  7d9eb3fb9c1e
> osd.15   sg-osd01         running (3d)  8m ago     4w   3343M    4096M    17.2.6   90a2664234e1  d96a425ae5c9
> osd.16   sg-osd03         running (3d)  7m ago     4w   2492M    4096M    17.2.6   90a2664234e1  637c43176fdc
> osd.17   sg-osd01         running (3d)  8m ago     4w   3011M    4096M    17.2.6   90a2664234e1  a39456dd2c0c
> osd.18   sg-osd03         running (3d)  7m ago     4w   2341M    4096M    17.2.6   90a2664234e1  7b750672391b
> osd.19   sg-osd01         running (3d)  8m ago     4w   2672M    4096M    17.2.6   90a2664234e1  6358234e95f5
> osd.20   sg-osd03         running (3d)  7m ago     4w   3297M    4096M    17.2.6   90a2664234e1  2ecba6b066fd
> osd.21   sg-osd01         running (3d)  8m ago     4w   5147M    4096M    17.2.6   90a2664234e1  1d0e4efe48bd
> osd.22   sg-osd03         running (3d)  7m ago     4w   3432M    4096M    17.2.6   90a2664234e1  5bb6d4f71f9d
> osd.23   sg-osd03         running (3d)  7m ago     4w   2893M    4096M    17.2.6   90a2664234e1  f7e1948e57d5
> osd.24   sg-osd02         running (3d)  7m ago     12d  3007M    4096M    17.2.6   90a2664234e1  85d896abe467
> osd.25   sg-osd02         running (3d)  7m ago     12d  2666M    4096M    17.2.6   90a2664234e1  9800cd8ff1a1
> osd.26   sg-osd02         running (3d)  7m ago     12d  2918M    4096M    17.2.6   90a2664234e1  f2e0b2d50625
> osd.27   sg-osd02         running (3d)  7m ago     12d  3586M    4096M    17.2.6   90a2664234e1  ee2fa3a9b40a
> osd.28   sg-osd02         running (3d)  7m ago     12d  2391M    4096M    17.2.6   90a2664234e1  4cf7adf9f60a
> osd.29   sg-osd02         running (3d)  7m ago     12d  5642M    4096M    17.2.6   90a2664234e1  8c1ba98a1738
> osd.30   sg-osd02         running (3d)  7m ago     12d  4728M    4096M    17.2.6   90a2664234e1  e308497de2e5
> osd.31   sg-osd02         running (3d)  7m ago     12d  3615M    4096M    17.2.6   90a2664234e1  89b80d464627
> osd.32   sg-osd02         running (3d)  7m ago     12d  1703M    4096M    17.2.6   90a2664234e1  1e4608786078
> osd.33   sg-osd02         running (3d)  7m ago     12d  3039M    4096M    17.2.6   90a2664234e1  16e04a1da987
> osd.34   sg-osd02         running (3d)  7m ago     12d  2434M    4096M    17.2.6   90a2664234e1  014076e28182
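
One way to see how much of an OSD's footprint is cache is the mempool
breakdown; a minimal sketch, run on the host where the daemon lives (osd.7 is
just an example id, and in a cephadm deployment the command goes inside
"cephadm shell"):

  # the bluestore_cache_* pools show the BlueStore cache portion
  ceph daemon osd.7 dump_mempools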
>
>
>
> BTW, as you said, I feel this value does not have much impact: whether we
> set 1 GB or 4 GB, OSDs can still consume much more memory when they need it.
>
> Nguyen Huu Khoi
>
>
> On Thu, Nov 16, 2023 at 2:13 PM Zakhar Kirpichenko <zakhar@xxxxxxxxx>
> wrote:
>
>> You're most welcome!
>>
>> I'd say that real leak issues are very rare. For example, these are my
>> OSDs with a 16 GB memory target, which have been running for quite a while;
>> as you can see, they don't exceed 16 GB RSS:
>>
>>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>   92298 167       20   0   18.7g  15.8g  12264 S   1.3   4.2   1974:06 ceph-osd
>>   94527 167       20   0   19.5g  15.8g  12248 S   2.3   4.2   2287:26 ceph-osd
>>   93749 167       20   0   19.1g  15.7g  12804 S   2.3   4.2   1768:22 ceph-osd
>>   89534 167       20   0   20.1g  15.7g  12412 S   4.0   4.2   2512:18 ceph-osd
>> 3706552 167       20   0   20.5g  15.7g  15588 S   2.3   4.2   1385:26 ceph-osd
>>   90297 167       20   0   19.5g  15.6g  12432 S   3.0   4.1   2261:00 ceph-osd
>>    9799 167       20   0   22.9g  15.4g  12432 S   2.0   4.1   2494:00 ceph-osd
>>    9778 167       20   0   23.1g  15.3g  12556 S   2.6   4.1   2591:25 ceph-osd
>>    9815 167       20   0   23.4g  15.1g  12584 S   2.0   4.0   2722:28 ceph-osd
>>    9809 167       20   0   22.3g  15.1g  12068 S   3.6   4.0   5234:52 ceph-osd
>>    9811 167       20   0   23.4g  14.9g  12952 S   2.6   4.0   2593:19 ceph-osd
>>    9819 167       20   0   23.9g  14.9g  12636 S   2.6   4.0   3043:19 ceph-osd
>>    9820 167       20   0   23.3g  14.8g  12884 S   2.0   3.9   3073:43 ceph-osd
>>    9769 167       20   0   22.4g  14.7g  12612 S   2.6   3.9   2840:22 ceph-osd
>>    9836 167       20   0   24.0g  14.7g  12648 S   2.6   3.9   3300:34 ceph-osd
>>    9818 167       20   0   22.0g  14.7g  12152 S   2.3   3.9   5729:06 ceph-osd
>>
>> Long story short, if you set reasonable targets, OSDs are unlikely to
>> exceed them during normal operations. If you set memory targets too low, it
>> is likely that they will be exceeded as OSDs need reasonable amounts of
>> memory to operate.
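
For reference, a minimal sketch of setting a reasonable target and checking
what a daemon actually resolves it to (4 GiB here, osd.7 only as an example):

  ceph config set osd osd_memory_target 4294967296
  ceph config get osd osd_memory_target
  # value as seen by a specific running daemon
  ceph config show osd.7 osd_memory_target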
>>
>> /Z
>>
>> On Thu, 16 Nov 2023 at 08:37, Nguyễn Hữu Khôi <nguyenhuukhoinw@xxxxxxxxx>
>> wrote:
>>
>>> Hello. Thank you very much for your explanation.
>>>
>>> Because I thought that osd_memory_target would help me limit OSD memory
>>> usage, which would help prevent memory leaks, I googled it and found many
>>> people talking about memory leaks. A nice man on this forum, @Anthony D'Atri
>>> <aad@xxxxxxxxxxxxxx>, helped me understand that it won't put a hard limit
>>> on OSD memory usage.
>>>
>>> I set it to 1 GB because I wanted to see how this option works.
>>>
>>> I will read and test with caches options.
>>>
>>> Nguyen Huu Khoi
>>>
>>>
>>> On Thu, Nov 16, 2023 at 12:23 PM Zakhar Kirpichenko <zakhar@xxxxxxxxx>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> osd_memory_target is a "target", i.e. an OSD makes an effort to keep its
>>>> consumption around the specified amount of RAM, but it won't consume less
>>>> than what is required for its operation and caches, which have minimum
>>>> values such as osd_memory_cache_min, bluestore_cache_size,
>>>> bluestore_cache_size_hdd, bluestore_cache_size_ssd, etc. The recommended
>>>> and default OSD memory target is 4 GB.
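
If you want to see what those floors are on your cluster, querying the options
mentioned above should be enough (defaults unless you have overridden them):

  ceph config get osd osd_memory_cache_min
  ceph config get osd bluestore_cache_size_ssd
  ceph config get osd bluestore_cache_size_hdd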
>>>>
>>>> Your nodes have a sufficient amount of RAM, thus I don't see why you
>>>> would want to reduce OSD memory consumption below the recommended defaults,
>>>> especially considering that in-memory caches are important for Ceph
>>>> operations as they're many times faster than the fastest storage devices. I
>>>> run my OSDs with osd_memory_target=17179869184 (16 GB) and it helps,
>>>> especially with slower HDD-backed OSDs.
>>>>
>>>> /Z
>>>>
>>>> On Thu, 16 Nov 2023 at 01:02, Nguyễn Hữu Khôi <
>>>> nguyenhuukhoinw@xxxxxxxxx> wrote:
>>>>
>>>>> Hello,
>>>>> I am running a Ceph cluster. After monitoring it, I set:
>>>>>
>>>>> ceph config set osd osd_memory_target_autotune false
>>>>>
>>>>> ceph config set osd osd_memory_target 1G
>>>>>
>>>>> Then I restarted all OSD services and tested again: I ran fio from
>>>>> multiple clients and saw that OSD memory consumption went over 1 GB.
>>>>> Could you help me understand this case?
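
The exact fio invocation isn't given here; as a rough illustration, something
along these lines against an RBD image would generate a similar load (pool,
image and client names are placeholders):

  fio --name=osd-mem-test --ioengine=rbd --clientname=admin --pool=testpool \
      --rbdname=testimage --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
      --time_based --runtime=300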
>>>>>
>>>>> Ceph version: Quincy
>>>>>
>>>>> OSDs: 3 nodes with 11 NVMe drives each and 512 GB of RAM per node.
>>>>>
>>>>> CPU: 2-socket Xeon Gold 6138, 56 cores per socket.
>>>>>
>>>>> Network: 2 x 25 Gbps for the public network and 2 x 25 Gbps for the
>>>>> storage network, MTU 9000.
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>>
>>>>> Nguyen Huu Khoi
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>
>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



