Re: pgs incomplete and inactive


 



Hello Josef,
I would suggest setting up a bigger disk (if not a physical one, then maybe an LVM volume built from two smaller disks) and cloning the OSD data dir to the new disk (remember to preserve extended attributes!), then trying to bring the OSD back into the cluster.
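
Roughly along these lines - the device names and mount point are only examples,
adjust to your setup (this assumes the OSD in question is osd.5, as in your logs):

  # create one big volume out of the two spare disks (example device names)
  pvcreate /dev/sdx /dev/sdy
  vgcreate osd5-rescue /dev/sdx /dev/sdy
  lvcreate -l 100%FREE -n osd5 osd5-rescue
  mkfs.xfs /dev/osd5-rescue/osd5
  mount /dev/osd5-rescue/osd5 /mnt/osd5-new

  # -a keeps owners/permissions/timestamps, -X extended attributes, -H hard links
  rsync -aXH /var/lib/ceph/osd/ceph-5/ /mnt/osd5-new/

Then mount the copy at /var/lib/ceph/osd/ceph-5, check that it is still owned by
ceph:ceph, and try starting the OSD again.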

--
Tomasz Kuzemko
tomasz.kuzemko@xxxxxxxxxxxx

________________________________________
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Josef Zelenka <josef.zelenka@xxxxxxxxxxxxxxxx>
Sent: Monday, 27 August 2018 13:29
To: Paul Emmerich; ceph-users@xxxxxxxxxxxxxx
Subject: Re: pgs incomplete and inactive

The full ratio was ignored; that's most likely why this happened. Deleting PGs
won't help, because they only account for a few KBs of space - the OSD is 40 GB
and 39.8 GB of it is taken up by omap - which is also why I can't move or
extract anything. Any clue on how to compact the omap or move the omap dir
elsewhere?
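
(I was wondering if something along these lines would be safe - move the omap
dir to a disk with free space and symlink it back; paths are just examples:

  mv /var/lib/ceph/osd/ceph-5/current/omap /mnt/spare/osd5-omap
  ln -s /mnt/spare/osd5-omap /var/lib/ceph/osd/ceph-5/current/omap
  chown -h ceph:ceph /var/lib/ceph/osd/ceph-5/current/omap

- or is that asking for trouble?)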



On 27/08/18 12:34, Paul Emmerich wrote:
> Don't ever let an OSD run 100% full; that's usually bad news.
> Two ways to salvage this:
>
> 1. You can try to extract the PGs with ceph-objectstore-tool and
> inject them into another OSD; Ceph will find them and recover
> 2. You seem to be using Filestore, so you should easily be able to
> just delete a whole PG on the full OSD's file system to make space
> (preferably one that is already recovered and active+clean even
> without the dead OSD)
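>
> A rough sketch of option 1 - the pgid, target OSD and backup path are only
> examples:
>
>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
>       --journal-path /var/lib/ceph/osd/ceph-5/journal \
>       --op export --pgid 1.2a --file /mnt/backup/1.2a.export
>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
>       --journal-path /var/lib/ceph/osd/ceph-7/journal \
>       --op import --file /mnt/backup/1.2a.export
>
> Both OSDs need to be stopped while you run this; after the import, start the
> target OSD and recovery will pick the PG up from there.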
>
>
> Paul
>
> 2018-08-27 10:44 GMT+02:00 Josef Zelenka <josef.zelenka@xxxxxxxxxxxxxxxx>:
>> Hi, I've had a very ugly thing happen to me over the weekend. Some of my
>> OSDs in a root that handles metadata pools overflowed to 100% disk usage due
>> to omap size (even though I had a 97% full ratio set, which is odd) and refused
>> to start. There were some PGs on those OSDs that went down with them. I have
>> tried compacting the omap, moving files away etc., but nothing helped - I can't
>> export the PGs; I get errors like this:
>>
>> 2018-08-27 04:42:33.436182 7fcb53382580  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1535359353436170, "job": 1, "event": "recovery_started",
>> "log_files": [5504, 5507]}
>> 2018-08-27 04:42:33.436194 7fcb53382580  4 rocksdb:
>> [/build/ceph-12.2.5/src/rocksdb/db/db_impl_open.cc:482] Recovering log #5504
>> mode 2
>> 2018-08-27 04:42:35.422502 7fcb53382580  4 rocksdb:
>> [/build/ceph-12.2.5/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all
>> background work
>> 2018-08-27 04:42:35.431613 7fcb53382580  4 rocksdb:
>> [/build/ceph-12.2.5/src/rocksdb/db/db_impl.cc:343] Shutdown complete
>> 2018-08-27 04:42:35.431716 7fcb53382580 -1 rocksdb: IO error: No space left
>> on device/var/lib/ceph/osd/ceph-5//current/omap/005507.sst: No space left on
>> device
>> Mount failed with '(1) Operation not permitted'
>> 2018-08-27 04:42:35.432945 7fcb53382580 -1
>> filestore(/var/lib/ceph/osd/ceph-5/) mount(1723): Error initializing rocksdb
>> :
>>
>> I decided to take the loss, mark the OSDs as lost and remove them from
>> the cluster; however, this left 4 PGs hanging in an incomplete + inactive state,
>> which apparently prevents my radosgw from starting. Is there another way to
>> export/import the PGs into their new OSDs, or to recreate them? I'm running
>> Luminous 12.2.5 on Ubuntu 16.04.
>>
>> Thanks
>>
>> Josef
>>
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



