Re: Large omap objects - how to fix ?

Thanks for everyone's comments, including the thread hijackers :)

I solved this in our infrastructure slightly differently:

1) find largest omap(s)
# for i in `rados -p .bbp-gva-master.rgw.buckets.index ls`; do echo -n "$i:"; rados -p .bbp-gva-master.rgw.buckets.index listomapkeys $i |wc -l; done > omapkeys
# sort -t: -k2 -r -n omapkeys  |head -1
.dir.bbp-gva-master.125103342.18:7558822
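
The counts collected in the omapkeys file can also be filtered against a threshold instead of only taking the top entry. The following is a minimal sketch, assuming the "object:count" format written in step 1. The function name omap_over is made up for illustration, and the threshold in the usage line is an assumed value that should be matched to whatever large-omap key threshold your OSDs are configured with.

```shell
#!/bin/bash
# Hypothetical helper (not part of Ceph): read "object:count" lines on stdin
# and print those whose key count exceeds a threshold, largest first.
omap_over() {
    local threshold="$1"
    # The count is taken from the last colon-separated field, so an object
    # name that itself contains ':' does not confuse the comparison.
    awk -F: -v t="$threshold" '($NF + 0) > t { print $NF ":" $0 }' \
        | sort -t: -k1,1 -n -r \
        | cut -d: -f2-
}

# Example (the threshold is an assumed value, adjust to your cluster):
# omap_over 2000000 < omapkeys
```

This lists every candidate at once, which helps when more than one index object is over the limit.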

2) confirm that the above index is not used by any buckets
# cat bucketstats
#!/bin/bash
for bucket in $(radosgw-admin bucket list | jq -r .[]); do
    bucket_id=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.bucket_id)
    marker=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.marker)
    echo "$bucket:$bucket_id:$marker"
done
# ./bucketstats > bucketstats.out
# grep 125103342.18 bucketstats.out
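
Note that grep treats the '.' in the id as a wildcard, and an empty result has to be read by eye. A minimal sketch of the same check with fixed-string matching and an exit status, assuming the "bucket:bucket_id:marker" layout of bucketstats.out; the function name index_in_use is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) and print the matching lines if the
# given id fragment appears in the bucket_id or marker column of a
# "bucket:bucket_id:marker" file; fail (exit 1) if no bucket references it.
index_in_use() {
    local id="$1" file="$2"
    # index() does a plain substring search, so '.' is matched literally.
    awk -F: -v id="$id" '
        index($2, id) || index($3, id) { print; found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$file"
}

# Example:
# if index_in_use 125103342.18 bucketstats.out; then
#     echo "still referenced by a bucket, do not delete"
# else
#     echo "no bucket references this index"
# fi
```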

3) delete the rados object
# rados -p .bbp-gva-master.rgw.buckets.index rm .dir.bbp-gva-master.125103342.18

4) perform a deep scrub on the PGs that were affected
# for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail -n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
137.1b: 1
137.36: 1
# ceph pg deep-scrub 137.1b
# ceph pg deep-scrub 137.36
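
The grep ": 1" filter above would miss a PG that reports 2 or more large omap objects. A minimal sketch of a numeric filter for the same "pgid: count" lines; the function name pgs_with_large_omap is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: read "pgid: count" lines on stdin and print the pgid
# of every PG whose count is greater than zero. Adding 0 forces a numeric
# comparison and ignores a trailing comma left over from the JSON output.
pgs_with_large_omap() {
    awk -F': *' '($2 + 0) > 0 { print $1 }'
}

# Example, fed with the output of the loop from step 4 saved to a file
# (pgcounts is an assumed file name):
# pgs_with_large_omap < pgcounts | while read -r pg; do
#     ceph pg deep-scrub "$pg"
# done
```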

Kind regards,

Ben Morrice

______________________________________________________________________
Ben Morrice | e: ben.morrice@xxxxxxx | t: +41-21-693-9670
EPFL / BBP
Biotech Campus
Chemin des Mines 9
1202 Geneva
Switzerland

On 10/31/2018 11:02 AM, Alexandru Cucu wrote:
Hi,

Didn't know that auto resharding does not remove old instances. Wrote
my own script for cleanup as I've discovered this before reading your
message.
Not very well tested, but here it is:

for bucket in $(radosgw-admin bucket list | jq -r .[]); do
    bucket_id=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.bucket_id)
    marker=$(radosgw-admin metadata get bucket:${bucket} | jq -r .data.bucket.marker)
    for instance in $(radosgw-admin metadata list bucket.instance | jq -r .[] | grep "^${bucket}:" | grep -v ${bucket_id} | grep -v ${marker} | cut -f2 -d':'); do
         radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
         radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
    done
done
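
Since the script above purges immediately, it may be worth listing the candidates first. A minimal dry-run sketch of the instance selection, assuming "bucket:instance_id" lines without a tenant prefix (the shape the script gets from the bucket.instance metadata list). The function name stale_instances is made up for illustration, and it compares ids exactly instead of using grep -v patterns.

```shell
#!/bin/bash
# Hypothetical helper: read "bucket:instance_id" lines on stdin and print the
# instance ids of one bucket that are neither its current bucket_id nor its
# marker. Nothing is deleted; the output is only a list to review.
stale_instances() {
    local bucket="$1" bucket_id="$2" marker="$3"
    # Appending "" forces string comparison even for ids that look numeric.
    awk -F: -v b="$bucket" -v id="$bucket_id" -v m="$marker" '
        ($1 "") == (b "") && ($2 "") != (id "") && ($2 "") != (m "") { print $2 }
    '
}

# Example, reusing the variables from the script above:
# radosgw-admin metadata list bucket.instance | jq -r '.[]' \
#     | stale_instances "$bucket" "$bucket_id" "$marker"
```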


On Tue, Oct 30, 2018 at 3:30 PM Tomasz Płaza <tomasz.plaza@xxxxxxxxxx> wrote:
Hi hijackers,

Please read: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030317.html

TL;DR: Ceph should reshard big indexes, but after that it leaves the old ones to be removed manually. Starting from some version, deep-scrub reports indexes above some threshold as HEALTH_WARN. You should find it in the OSD logs. If you do not have logs, just run listomapkeys on every object in default.rgw.buckets.index and find the biggest ones... it should be safe to remove those (radosgw-admin bi purge), but I cannot guarantee it.


On 26.10.2018 at 17:18, Florian Engelmann wrote:

Hi,

hijacking the hijacker! Sorry!

radosgw-admin bucket reshard --bucket somebucket --num-shards 8
*** NOTICE: operation will not remove old bucket index objects ***
***         these will need to be removed manually             ***
tenant:
bucket name: somebucket
old bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.1923153.1
new bucket instance id: cb1594b3-a782-49d0-a19f-68cd48870a63.3119759.1
total entries: 1000 2000 3000 ... 205000 206000 207000 207660

What to do now?

ceph -s is still:

    health: HEALTH_WARN
            1 large omap objects

But I have no idea how to:
*** NOTICE: operation will not remove old bucket index objects ***
***         these will need to be removed manually             ***
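
Other replies in this thread suggest radosgw-admin bi purge with --bucket and --bucket-id for the old instance, without guaranteeing that it is safe. A minimal sketch for pulling the old instance id out of saved reshard output, assuming the output format shown above; the function name old_instance_id and the file name reshard.log are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: read saved `radosgw-admin bucket reshard` output on
# stdin and print the value of the "old bucket instance id" line.
old_instance_id() {
    sed -n 's/^old bucket instance id: *//p'
}

# Example (reshard.log is an assumed file holding the output shown above):
# old=$(old_instance_id < reshard.log)
# echo "candidate: radosgw-admin bi purge --bucket=somebucket --bucket-id=$old"
```

The example only prints the purge command so it can be reviewed before anything is removed.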


All the best,
Flo


On 10/26/18 at 3:56 PM, Alexandru Cucu wrote:

Hi,

Sorry to hijack this thread. I have a similar issue also with 12.2.8
recently upgraded from Jewel.

In my case all buckets are within limits:
     # radosgw-admin bucket limit check | jq '.[].buckets[].fill_status' | uniq
     "OK"

     # radosgw-admin bucket limit check | jq '.[].buckets[].objects_per_shard'  | sort -n | uniq
     0
     1
     30
     109
     516
     5174
     50081
     50088
     50285
     50323
     50336
     51826

rgw_max_objs_per_shard is set to the default of 100k

---
Alex Cucu

On Fri, Oct 26, 2018 at 4:09 PM Ben Morrice <ben.morrice@xxxxxxx> wrote:


Hello all,

After a recent Luminous upgrade (now running 12.2.8 with all OSDs
migrated to bluestore; previously 11.2.0 on filestore) I am
currently seeing the warning 'large omap objects'.
I know this is related to large buckets in radosgw, and that Luminous
supports 'dynamic sharding'. However, I feel that something is missing
from our configuration and I'm a bit confused about what the right approach
is to fix it.

First a bit of background info:

We previously had a multi site radosgw installation, however recently we
decommissioned the second site. With the radosgw multi-site
configuration we had 'bucket_index_max_shards = 0'. Since
decommissioning the second site, I have removed the secondary zonegroup
and changed 'bucket_index_max_shards' to be 16 for the single primary zone.
None of our buckets have a 'num_shards' field when running
'radosgw-admin bucket stats --bucket <bucketname>'.
Is this normal?

Also, I'm finding it difficult to work out exactly what to do with the
buckets that are affected by 'large omap' (see commands below).
My interpretation of 'search the cluster log' is also listed below.

What do I need to do with the buckets below to get back to an overall
ceph HEALTH_OK state? :)


# ceph health detail
HEALTH_WARN 2 large omap objects
2 large objects found in pool '.bbp-gva-master.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

# ceph osd pool get .bbp-gva-master.rgw.buckets.index pg_num
pg_num: 64

# for i in `ceph pg ls-by-pool .bbp-gva-master.rgw.buckets.index | tail -n +2 | awk '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
137.1b: 1
137.36: 1

# cat buckets
#!/bin/bash
buckets=`radosgw-admin metadata list bucket |grep \" | cut -d\" -f2`
for i in $buckets
do
    id=`radosgw-admin bucket stats --bucket $i |grep \"id\" | cut -d\" -f4`
    pg=`ceph osd map .bbp-gva-master.rgw.buckets.index ${id} | awk '{print $11}' | cut -d\( -f2 | cut -d\) -f1`
    echo "$i:$id:$pg"
done
# ./buckets > pglist
# egrep '137.1b|137.36' pglist |wc -l
192
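
To see how those entries are spread across the PGs rather than only a total, the pglist file can be summarised per PG. A minimal sketch, assuming the "bucket:id:pg" format written by the script above; the function name buckets_per_pg is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: read "bucket:id:pg" lines on stdin and print
# "pg count" pairs, with the PG holding the most buckets first.
buckets_per_pg() {
    awk -F: '{ n[$NF]++ } END { for (pg in n) print pg, n[pg] }' \
        | sort -k2,2nr -k1,1
}

# Example:
# buckets_per_pg < pglist
```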

The following doesn't appear to change anything:

# for bucket in `cut -d: -f1 pglist`; do radosgw-admin reshard add --bucket $bucket --num-shards 8; done

# radosgw-admin reshard process



--
Kind regards,

Ben Morrice


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
