Re: CephFS metadata: Large omap object found

Eugen Block <eblock@xxxxxx> · Tue, 01 Oct 2019 10:34:55 +0000

Thank you, Paul.

The thresholds were recently reduced by a factor of 10. I guess you
have a lot of (open) files? Maybe use more active MDS servers?

We'll consider adding more MDS servers, although the workload hasn't  
been an issue yet.
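
For reference, a minimal sketch of how a second active MDS rank could be enabled. The file system name `cephfs` comes from the `ceph fs volume ls` output in the original post. The target of 2 ranks and the `run` dry-run helper are assumptions for illustration, and a standby daemon has to be available to take over the new rank.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

FS_NAME="cephfs"   # name from `ceph fs volume ls` in the original post
MAX_MDS=2          # assumed target; pick what fits the workload

# Raise the number of active MDS ranks for the file system.
run ceph fs set "$FS_NAME" max_mds "$MAX_MDS"
```

With the default dry run this only prints the command. Setting DRY_RUN=0 would execute it against the cluster.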

Or increase the thresholds, I wouldn't worry at all about 200k omap
keys if you are running on reasonable hardware.
The usual argument for a low number of omap keys is recovery time, but
if you are running a metadata-heavy workload on something that has
problems recovering 200k keys in less than a few seconds, then you are
doing something wrong anyways.
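
A sketch of how the key threshold could be raised through the centralized config, assuming a Nautilus cluster. The option name is the one shown in the `config show` output of the original post. The value 500000 and the `run` dry-run helper are placeholders for illustration, not recommendations.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

OPTION="osd_deep_scrub_large_omap_object_key_threshold"
NEW_KEY_THRESHOLD=500000   # hypothetical value, choose one that fits the cluster

# Set the threshold for all OSDs, then read it back to confirm.
run ceph config set osd "$OPTION" "$NEW_KEY_THRESHOLD"
run ceph config get osd "$OPTION"
```

The new value should only affect the result of later deep-scrubs, so an existing warning would presumably stay until the affected PG has been deep-scrubbed again.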

We haven't had any issues with MDS failovers or recovery so far, so I
guess higher thresholds would be fine.
To get rid of the warning (for a week), it was sufficient to issue a
deep-scrub on the affected PG while the number of keys reported by
listomapkeys was below 200k. Maybe we were just "lucky" until now
because the deep-scrubs run outside of business hours, when the number
of open files should be lower.
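
The check described above can be sketched roughly like this. Pool, object, and PG id are the ones from the original post. The `below_threshold` helper is a hypothetical name, and the deep-scrub call is left commented out on purpose.

```shell
#!/bin/sh
POOL="cephfs-metadata"     # metadata pool from the original post
OBJ="mds0_openfiles.0"     # object named in the warning
PGID="36.6"                # PG that reported the large omap object
THRESHOLD=200000           # key threshold shown in the post

# below_threshold COUNT LIMIT: succeeds when COUNT is strictly below LIMIT.
below_threshold() {
    [ "$1" -lt "$2" ]
}

# Only talk to the cluster when the rados CLI is available.
if command -v rados >/dev/null 2>&1; then
    # listomapkeys prints one key per line, so the line count is the key count.
    keys=$(rados -p "$POOL" listomapkeys "$OBJ" | wc -l | tr -d ' ')
    if below_threshold "$keys" "$THRESHOLD"; then
        echo "$keys keys: below threshold, a deep-scrub of $PGID should clear the warning"
        # ceph pg deep-scrub "$PGID"
    else
        echo "$keys keys: not below threshold, a deep-scrub would likely warn again"
    fi
fi
```

Counting the keys first avoids triggering a deep-scrub that would only reproduce the warning.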

Anyway, thank you for your input. It seems this is not a problem
at the moment.

Regards,
Eugen

Quoting Paul Emmerich <paul.emmerich@xxxxxxxx>:

The thresholds were recently reduced by a factor of 10. I guess you
have a lot of (open) files? Maybe use more active MDS servers?

Or increase the thresholds, I wouldn't worry at all about 200k omap
keys if you are running on reasonable hardware.
The usual argument for a low number of omap keys is recovery time, but
if you are running a metadata-heavy workload on something that has
problems recovering 200k keys in less than a few seconds, then you are
doing something wrong anyways.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, Oct 1, 2019 at 9:10 AM Eugen Block <eblock@xxxxxx> wrote:

Hi all,

We have a new issue in our Nautilus cluster.
The large omap warning seems to be more common with RGW, but we
currently only use CephFS and RBD. I found one thread [1] about the
metadata pool, but it doesn't really help in our case.

The deep-scrub of PG 36.6 brought up this message (deep-scrub finished
with "ok"):

2019-09-30 20:18:22.548401 osd.9 (osd.9) 275 : cluster [WRN] Large omap object found. Object: 36:654134d2:::mds0_openfiles.0:head Key count: 238621 Size (bytes): 9994510

I checked xattr (none) and omapheader:

ceph01:~ # rados -p cephfs-metadata listxattr mds0_openfiles.0
ceph01:~ # rados -p cephfs-metadata getomapheader mds0_openfiles.0
header (42 bytes) :
00000000  13 00 00 00 63 65 70 68  20 66 73 20 76 6f 6c 75  |....ceph fs volu|
00000010  6d 65 20 76 30 31 31 01  01 0d 00 00 00 74 c3 12  |me v011......t..|
00000020  00 00 00 00 00 01 00 00  00 00                    |..........|
0000002a

ceph01:~ # ceph fs volume ls
[
   {
     "name": "cephfs"
   }
]

The respective OSD has default thresholds regarding large_omap:

ceph02:~ # ceph daemon osd.9 config show | grep large_omap
     "osd_deep_scrub_large_omap_object_key_threshold": "200000",
     "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",

Can anyone point me to a solution for this?
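
As a quick sanity check on the numbers quoted above, the following sketch compares the values from the warning with the two thresholds. It assumes the OSD warns when either value is strictly greater than its threshold; the `exceeds` helper is a hypothetical name.

```shell
#!/bin/sh
KEY_COUNT=238621            # "Key count" from the warning
SIZE_BYTES=9994510          # "Size (bytes)" from the warning
KEY_THRESHOLD=200000        # osd_deep_scrub_large_omap_object_key_threshold
SUM_THRESHOLD=1073741824    # osd_deep_scrub_large_omap_object_value_sum_threshold

# exceeds VALUE LIMIT: succeeds when VALUE is strictly greater than LIMIT.
exceeds() {
    [ "$1" -gt "$2" ]
}

if exceeds "$KEY_COUNT" "$KEY_THRESHOLD"; then
    echo "key count is over the threshold by $((KEY_COUNT - KEY_THRESHOLD)) keys"
fi
if ! exceeds "$SIZE_BYTES" "$SUM_THRESHOLD"; then
    echo "reported size is within the size threshold"
fi

# Integer division: reported bytes per key, rounded down.
echo "reported bytes per key: $((SIZE_BYTES / KEY_COUNT))"
```

Under that assumption only the key count trips the warning; the reported size of roughly 10 MB is far below the 1 GiB size threshold.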

Best regards,
Eugen

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033813.html
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx