Re: OSD hanging on 12.2.12 by message worker

Hi Max,

I don't think this is an allocator-related issue. The symptoms that prompted us to switch from the stupid allocator to the bitmap allocator were:

- write op latency gradually increasing over time (days, not hours)

- perf showing a significant amount of time spent in allocator-related functions

- an OSD restart was the only remedy.

None of this was related to network activity or client restarts.
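
One way to check for the first symptom is to compare two snapshots of the OSD perf counters taken some time apart. The sketch below is only an illustration: it assumes the JSON from `ceph daemon osd.<id> perf dump` contains an `osd` section with an `op_w_latency` counter holding `avgcount` and `sum`, and the function name is made up for this example.

```python
def interval_write_latency(earlier, later):
    """Average write-op latency in seconds between two perf dump samples.

    Both arguments are dicts parsed from the JSON perf dump of one OSD.
    The key layout (osd -> op_w_latency -> avgcount/sum) is an assumption.
    """
    a = earlier["osd"]["op_w_latency"]
    b = later["osd"]["op_w_latency"]
    ops = b["avgcount"] - a["avgcount"]
    if ops <= 0:
        # No writes completed in the interval, or the counters were
        # reset (for example by an OSD restart).
        return None
    return (b["sum"] - a["sum"]) / ops
```

If this per-interval value keeps climbing over several days on the same OSD, that would match what we saw. A value that jumps only around client restarts would point elsewhere.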


Thanks,

Igor


On 6/7/2019 11:05 AM, Max Vernimmen wrote:
Thank you for the suggestion to use the bitmap allocator. I looked at the ceph documentation and could find no mention of this setting. This makes me wonder how safe and production-ready this setting really is. I'm hesitant to apply it to our production environment.
If the allocator setting helps to resolve the problem, then it looks to me like there is a bug in the 'stupid' allocator that is causing this behavior. Would this qualify for a bug report, or is more debugging needed before I can file one?

On Thu, Jun 6, 2019 at 11:18 AM Stefan Kooman <stefan@xxxxxx> wrote:
Quoting Max Vernimmen (vernimmen@xxxxxxxxxxxxx):
>
> This is happening several times per day after we made several changes at
> the same time:
>
>    - add physical ram to the ceph nodes
>    - move from fixed 'bluestore cache size hdd|ssd' and 'bluestore cache kv
>    max' to 'bluestore cache autotune = 1' and 'osd memory target =
>    20401094656'.
>    - update ceph from 12.2.8 to 12.2.11
>    - update clients from 12.2.8 to 12.2.11
>
> We have since upgraded the ceph nodes to 12.2.12 but it did not help to fix
> this problem.

Have you already tried the new bitmap allocator for the OSDs (available
since 12.2.12)? The settings are:

[osd]

# MEMORY ALLOCATOR
bluestore_allocator = bitmap
bluefs_allocator = bitmap
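
To confirm that every node's ceph.conf actually carries these two settings, a small check along the following lines could be run against the file contents. This is a sketch under assumptions: the function name is hypothetical, it treats ceph.conf as plain INI text, and it only normalizes spaces in option names to underscores.

```python
import configparser


def allocator_settings(conf_text):
    """Return {(section, option): value} for the two allocator options.

    conf_text is the content of a ceph.conf-style file. Option names
    written with spaces ("bluestore allocator") are normalized to the
    underscore form before matching.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    wanted = ("bluestore_allocator", "bluefs_allocator")
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            normalized = key.strip().replace(" ", "_")
            if normalized in wanted:
                found[(section, normalized)] = value.strip()
    return found
```

This only reports what the file says, not what a running OSD is using. The running value can be queried through the admin socket, for example with `ceph daemon osd.<id> config get bluestore_allocator`.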

The issues you are reporting sound like a problem many of us have seen on
luminous and mimic clusters, and which has been identified as being caused
by the "stupid" allocator.

Gr. Stefan


--
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx


--
Max Vernimmen
Senior DevOps Engineer
Textkernel

------------------------------------------------------------------------------
Textkernel BV, Nieuwendammerkade 26/a5, 1022 AB, Amsterdam, NL
-----------------------------------------------------------------------------

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com