Re: Remove empty orphaned PGs not mapped to a pool

Eugen Block <eblock@xxxxxx> · Thu, 05 Oct 2023 10:12:52 +0000

Maybe the mismatching OSD versions had an impact on the unclean tier  
removal, but this is just a guess. I couldn't reproduce it in a  
Pacific test cluster, the removal worked fine without leaving behind  
empty PGs. But I had only a few rbd images in that pool so it's not  
really representative, I guess.
Hopefully Joachim has some more details on how they resolved this situation.

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello Eugen, Hello Joachim,

@Joachim: Interesting! And you got empty PGs, too? How did you solve  
the problem?

@Eugen: This is one of our biggest clusters and we're in the process
of migrating from Nautilus to Octopus and from CentOS to Ubuntu.

The cache tier pool's OSDs were still running version 14 (Nautilus).
Most of the other OSDs are already on version 15 (Octopus).

So I tested the command:

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid 3.0 --force

in a test cluster environment and this worked fine.

But the test scenario was not comparable to our production environment,
and the PG wasn't empty.

I have not yet found a way to reproduce the same situation in the test
environment.
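
For reference, a sketch of how the command above could be generated for a
whole list of PG IDs rather than typed once per PG. ceph-objectstore-tool
works on the object store offline, so the OSD has to be stopped before any
of these commands are run. print_remove_cmds is a hypothetical dry-run
helper, not part of Ceph: it only prints the commands and executes nothing,
and the data path is a placeholder exactly as in the command above.

```shell
# Hypothetical dry-run helper (not a Ceph command).
# Reads one PG ID per line on stdin and prints the matching
# ceph-objectstore-tool invocation for the given OSD data path.
# Nothing is executed; review the output before running it by hand.
print_remove_cmds() {
    data_path="$1"
    while IFS= read -r pgid; do
        # Skip blank lines in the input.
        [ -n "$pgid" ] || continue
        printf 'ceph-objectstore-tool --data-path %s --op remove --pgid %s --force\n' \
            "$data_path" "$pgid"
    done
}

# Example (prints two commands, runs none):
#   printf '3.0\n3.1\n' | print_remove_cmds /path/to/osd
```

Printing instead of executing is deliberate: whether forced removal is safe
on the production cluster is exactly the open question here.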

Best,
Malte

On 05.10.23 at 11:03, Eugen Block wrote:
I know, I know... but since we are already using it (for years) I  
have to check how to remove it safely, maybe as long as we're on  
Pacific. ;-)

Quoting Joachim Kraftmayer - ceph ambassador
<joachim.kraftmayer@xxxxxxxxx>:

@Eugen

We saw the same problems 8 years ago. I can only recommend
never using cache tiering in production.
At Cephalocon this was part of my talk, and as far as I remember
cache tiering will also disappear from Ceph soon.

Cache tiering has been deprecated in the Reef release as it has  
lacked a maintainer for a very long time. This does not mean it  
will be certainly removed, but we may choose to remove it without  
much further notice.

https://docs.ceph.com/en/latest/rados/operations/cache-tiering/

Regards, Joachim

___________________________________
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

On 05.10.23 at 10:02, Eugen Block wrote:
Which ceph version is this? I'm trying to understand how removing  
a pool leaves the PGs of that pool... Do you have any logs or  
something from when you removed the pool?
We'll have to deal with a cache tier in the foreseeable future as
well, so this is quite relevant for us too. Maybe I'll try to
reproduce it in a test cluster first.
Are those SSDs exclusively for the cache tier or are they used by  
other pools as well? If they were used only for the cache tier  
you should be able to just remove them without any risk. But as I  
said, I'd rather try to understand before purging them.

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello Eugen,

yes, we followed the documentation and everything worked fine.  
The cache is gone.

Removing the pool worked well. Everything is clean.

The PGs are empty and active+clean.

Possible solutions:

1.

ceph pg {pg-id} mark_unfound_lost delete

I do not think this is the right way, since it is meant for PGs with
unfound objects. It might work, though.

2.

Mark each of the three disks as lost:

ceph osd lost {osd-id}

I am not sure how the cluster will react to this.

3.

ceph-objectstore-tool --data-path /path/to/osd --op remove --pgid 3.0 --force

But will the cluster accept that the PG has been removed?

4.

The three disks are still present in the CRUSH rule, class
ssd, each OSD under its own host entry.

What if I remove them from crush?

Do you have a better idea, Eugen?

Best,
Malte

On 04.10.23 at 09:21, Eugen Block wrote:
Hi,

just for clarity, you're actually talking about the cache tier  
as described in the docs [1]? And you followed the steps until  
'ceph osd tier remove cold-storage hot-storage' successfully?  
And the pool really has been deleted ('ceph osd pool ls detail')?

[1] https://docs.ceph.com/en/latest/rados/operations/cache-tiering/#removing-a-cache-tier
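
One way to double-check the last point is to look for the pool ID in the
'ceph osd pool ls detail' output. A minimal sketch, assuming that each pool
line of that output starts with "pool <id>" and that the removed cache pool
had ID 3 (inferred from the PG IDs 3.x quoted in this thread).
pool_id_listed is a hypothetical helper name, not a Ceph command:

```shell
# Hypothetical helper (not a Ceph command).
# Reads `ceph osd pool ls detail` output on stdin and succeeds (exit 0)
# only if a line of the form "pool <id> ..." exists for the given ID.
pool_id_listed() {
    awk -v id="$1" '
        $1 == "pool" && $2 == id { found = 1 }
        END { exit !found }
    '
}

# Example, assuming pool ID 3:
#   if ceph osd pool ls detail | pool_id_listed 3; then
#       echo "pool 3 still exists"
#   else
#       echo "pool 3 is gone"
#   fi
```

If the pool is gone but PGs named 3.x are still reported, those PGs no
longer map to any pool.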

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello,

we removed an SSD cache tier and its pool.

The PGs for the pool do still exist.

The cluster is healthy.

The PGs are empty and they reside on the cache tier pool's SSDs.

We would like to take out the disks, but that is not possible. The
cluster still sees the PGs and responds with a HEALTH_WARN.

Because the replication size is three, the 128 PGs still exist on
three of the 24 OSDs. We were able to remove the other OSDs.

Summary:

- pool removed
- 3 x 128 empty PGs still exist
- 3 of 24 OSDs still exist

How is it possible to remove these empty and healthy PGs?

The only way I found was something like:

ceph pg {pg-id} mark_unfound_lost delete

Is that the right way?

Some output of:

ceph pg ls-by-osd 23

PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE         SINCE  VERSION  REPORTED         UP            ACTING        SCRUB_STAMP                      DEEP_SCRUB_STAMP
3.0  0        0         0          0        0      0            0           0    active+clean  27h    0'0      2627265:196316   [15,6,23]p15  [15,6,23]p15  2023-09-28T12:41:52.982955+0200  2023-09-27T06:48:23.265838+0200
3.1  0        0         0          0        0      0            0           0    active+clean  9h     0'0      2627266:19330    [6,23,15]p6   [6,23,15]p6   2023-09-29T06:30:57.630016+0200  2023-09-27T22:58:21.992451+0200
3.2  0        0         0          0        0      0            0           0    active+clean  2h     0'0      2627265:1135185  [23,15,6]p23  [23,15,6]p23  2023-09-29T13:42:07.346658+0200  2023-09-24T14:31:52.844427+0200
3.3  0        0         0          0        0      0            0           0    active+clean  13h    0'0      2627266:193170   [6,15,23]p6   [6,15,23]p6   2023-09-29T01:56:54.517337+0200  2023-09-27T17:47:24.961279+0200
3.4  0        0         0          0        0      0            0           0    active+clean  14h    0'0      2627265:2343551  [23,6,15]p23  [23,6,15]p23  2023-09-29T00:47:47.548860+0200  2023-09-25T09:39:51.259304+0200
3.5  0        0         0          0        0      0            0           0    active+clean  2h     0'0      2627265:194111   [15,6,23]p15  [15,6,23]p15  2023-09-29T13:28:48.879959+0200  2023-09-26T15:35:44.217302+0200
3.6  0        0         0          0        0      0            0           0    active+clean  6h     0'0      2627265:2345717  [23,15,6]p23  [23,15,6]p23  2023-09-29T09:26:02.534825+0200  2023-09-27T21:56:57.500126+0200
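
To list every empty PG of the removed pool on an OSD instead of reading the
output by eye, it can be filtered. A minimal sketch, assuming the plain-text
column layout shown above (PG ID in column 1, OBJECTS in column 2) and pool
ID 3. empty_pgs_of_pool is a hypothetical helper, not part of Ceph:

```shell
# Hypothetical helper (not a Ceph command).
# Reads `ceph pg ls-by-osd <osd-id>` plain-text output on stdin and prints
# the IDs of PGs that belong to the given pool and hold zero objects.
empty_pgs_of_pool() {
    awk -v pool="$1" '
        # A PG ID has the form <pool-id>.<hex>; require an exact pool prefix
        # so that pool 3 does not also match pool 13 or pool 30.
        index($1, pool ".") == 1 && $2 == 0 { print $1 }
    '
}

# Example, assuming OSD 23 and pool ID 3:
#   ceph pg ls-by-osd 23 | empty_pgs_of_pool 3
```

The header line and PGs of other pools are skipped because their first
column does not start with "3.".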

Best regards,
Malte
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
