Re: Ceph Nautilus 14.2.22 slow OSD memory leak?


 



On Wed, 10 Jan 2024 at 19:20, huxiaoyu@xxxxxxxxxxxx
<huxiaoyu@xxxxxxxxxxxx> wrote:
> Dear Ceph folks,
>
> I am responsible for two Ceph clusters running Nautilus 14.2.22, one with replication 3 and the other with EC 4+2. After around 400 days of running quietly and smoothly, both clusters recently ran into similar problems: some OSDs consume ca. 18 GB while the memory target is set at 2 GB.
>
> What could be going wrong in the background? Does this point to a slow OSD memory leak in 14.2.22 that I do not know about yet?

While I am sorry not to be able to help with the actual problem, I
just wanted to comment that the memory target and user-selectable RAM
sizes only cover part of what the OSD will use, even in the absence
of bugs or memory leaks. You can tell the OSD to aim for 2 or 6 or
12 G and it will work towards that goal for the resizeable buffers,
caches and so on, but some allocations are outside your control and
can (at least during recovery and similar events) eat lots of RAM
regardless of your preferences. The OSD will allocate whatever it
feels it needs in order to fix itself, without considering the
targets you may have set for it.
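
For anyone wanting to check this on their own cluster, something like
the following shows the difference between the target and what the
OSD itself tracks (Nautilus-era commands; osd.12 is just an example
id, and "ceph daemon" has to run on the host carrying that OSD):

  # what the OSD is aiming for (bytes)
  ceph config get osd.12 osd_memory_target

  # what the bluestore caches and other mempools hold right now
  ceph daemon osd.12 dump_mempools

  # tcmalloc's view of the heap, including pages not yet
  # returned to the OS (needs the tcmalloc allocator)
  ceph tell osd.12 heap stats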

It seems very strange that your OSDs would all jump to 9x the
expected RAM at the same time, and I really hope you can figure out
why and get back to normal operation. I mainly wanted to add this
comment so that others sizing their OSD hosts don't think: "if I set
the target to 2 G, then I can run 8 OSDs on this 16 G RAM box and all
will be fine, since no OSD will ever eat more than 2". That holds
while everything is fine and healthy, and while the cluster is new
and almost empty, but not in all situations.
When/if you do hit a memory leak, having tons of RAM only delays the
inevitable, of course, so simply buying far too much RAM is not a
solution either.
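
If you want to see how far off target the daemons actually are, a
quick sanity check is to compare each ceph-osd process's RSS with the
configured target (run on each OSD host; RSS is in kB here):

  ps -eo rss,args --sort=-rss | grep '[c]eph-osd' | head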

I would at least try setting noout+norebalance, restarting one of the
problematic OSDs, and seeing whether it quickly returns to this huge
memory overdraw or not.
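
For reference, the test I have in mind would be roughly this,
assuming systemd-managed OSDs and osd.12 as one of the bad ones:

  ceph osd set noout
  ceph osd set norebalance
  systemctl restart ceph-osd@12   # on the host carrying osd.12
  # watch its memory for a while, then:
  ceph osd unset norebalance
  ceph osd unset noout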

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


