Hi Dan and Patrick,

I created a tracker item for the snapshot issue: https://tracker.ceph.com/issues/52581

Patrick, could you please take a quick look at it?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 07 September 2021 15:30
To: Dan van der Ster; Patrick Donnelly
Cc: ceph-users
Subject: Re: MDS daemons stuck in resolve, please help

Hi Dan and Patrick,

I collected some additional information by trying the following: delete a snapshot, then add a snapshot. My main concern was that no snaptrim operations would be executed. However, this is not the case. After removing a snapshot, the PGs on the respective pools started snaptrimming. So, no issue here.

About the extra snapshots in pool con-fs2-data2:

1) Before deleting the snapshot:

pool 12 'con-fs2-meta1' (no removed_snaps list shown)
pool 13 'con-fs2-meta2' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 14 'con-fs2-data' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 17 'con-fs2-data-ec-ssd' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 19 'con-fs2-data2' removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]

# ceph daemon mds.ceph-23 dump snaps | grep snapid
    "snapid": 400,
    "snapid": 445,
    "snapid": 770,
    "snapid": 772,
    "snapid": 774,
    "snapid": 776,
    "snapid": 778,
    "snapid": 780,
    "snapid": 782,
    "snapid": 784,
    "snapid": 786,
    "snapid": 788,

2) After deleting snapshot 772 and adding snapshot 791:

pool 12 'con-fs2-meta1'
pool 13 'con-fs2-meta2' removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 14 'con-fs2-data' removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
    removed_snaps_queue [304~1,316~1]
pool 17 'con-fs2-data-ec-ssd' removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 19 'con-fs2-data2' removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
    removed_snaps_queue [304~1,316~1]

# ceph daemon mds.ceph-23 dump snaps | grep snapid
    "snapid": 400,
    "snapid": 445,
    "snapid": 770,
    "snapid": 774,
    "snapid": 776,
    "snapid": 778,
    "snapid": 780,
    "snapid": 782,
    "snapid": 784,
    "snapid": 786,
    "snapid": 788,
    "snapid": 791,

The removed snaps set was correctly updated in the fields 303~3 and 315~2 (a small helper for converting these hex interval fields to decimal snap IDs is in the PS below). The problematic snapshots are the ones still recorded for pool con-fs2-data2 in the set [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3], which should not be there. They correspond to decimal snap IDs 727 729 731 733 735 737 739 741 743 745 747.

Some relevant history: we had pool con-fs2-data as the data pool on "/" from the beginning. About three weeks ago we replaced the data pool on the root with con-fs2-data2. The extra snapshots might date back to the time right after exchanging the data pool on "/". Maybe we hit a bug that occurs when changing directory layouts while snapshots are present on the file system?

The 11 extra snapshots seem to cause severe performance issues. I would be most grateful for any advice on how to get rid of them. The corresponding fs snapshots were deleted at least a week ago.

Many thanks and best regards!
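PS: For anyone who wants to follow the hex arithmetic without a calculator, here is a throwaway Python sketch (nothing official; it just assumes the removed_snaps fields are the usual interval-set notation, i.e. comma-separated "start~length" pairs in hex) that expands such a set into decimal snap IDs. As a sanity check, it shows that the updated field 303~3 is exactly the old entries 303~1 and 305~1 plus the deleted snapshot 772 (hex 304).

#!/usr/bin/env python3
# Sketch: expand a removed_snaps interval set ("start~length" in hex)
# into decimal snap IDs, to make the OSD map output easier to read.

def expand_removed_snaps(intervals):
    """Return the decimal snap IDs covered by e.g. '303~1,305~1'."""
    snapids = []
    for part in intervals.split(","):
        start, length = (int(x, 16) for x in part.split("~"))
        snapids.extend(range(start, start + length))
    return snapids

# The field that was updated by deleting snapshot 772:
print(expand_removed_snaps("303~3"))                        # [771, 772, 773]
# The two old entries it replaced, plus the deleted snapshot:
print(sorted(expand_removed_snaps("303~1,305~1") + [772]))  # [771, 772, 773]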
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx