Hi Dan and Patrick,

I created a tracker item for the snapshot issue: https://tracker.ceph.com/issues/52581

Patrick, could you please take a quick look at it?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 07 September 2021 15:30
To: Dan van der Ster; Patrick Donnelly
Cc: ceph-users
Subject: Re: MDS daemons stuck in resolve, please help

Hi Dan and Patrick,

I collected some additional information by trying the following: delete a snapshot, then add a snapshot. My main concern was that no snaptrim operations would be executed. However, this is not the case. After removing a snapshot, the PGs on the respective pools started snaptrimming. So, no issue here.

About the extra snapshots in pool con-fs2-data2:

1) Before deleting the snapshot:

pool 12 'con-fs2-meta1' (no removed_snaps list shown)
pool 13 'con-fs2-meta2' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 14 'con-fs2-data' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 17 'con-fs2-data-ec-ssd' removed_snaps [2~18e,191~2c,1be~144,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]
pool 19 'con-fs2-data2' removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~1,305~1,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~1]

# ceph daemon mds.ceph-23 dump snaps | grep snapid
    "snapid": 400,
    "snapid": 445,
    "snapid": 770,
    "snapid": 772,
    "snapid": 774,
    "snapid": 776,
    "snapid": 778,
    "snapid": 780,
    "snapid": 782,
    "snapid": 784,
    "snapid": 786,
    "snapid": 788,

2) After deleting snapshot 772 and adding snapshot 791:

pool 12 'con-fs2-meta1'
pool 13 'con-fs2-meta2' removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 14 'con-fs2-data' removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
    removed_snaps_queue [304~1,316~1]
pool 17 'con-fs2-data-ec-ssd' removed_snaps [2~18e,191~2c,1be~144,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
pool 19 'con-fs2-data2' removed_snaps [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3,307~1,309~1,30b~1,30d~1,30f~1,311~1,313~1,315~2]
    removed_snaps_queue [304~1,316~1]

# ceph daemon mds.ceph-23 dump snaps | grep snapid
    "snapid": 400,
    "snapid": 445,
    "snapid": 770,
    "snapid": 774,
    "snapid": 776,
    "snapid": 778,
    "snapid": 780,
    "snapid": 782,
    "snapid": 784,
    "snapid": 786,
    "snapid": 788,
    "snapid": 791,

The removed snaps set was correctly updated in the fields 303~3 and 315~2 (a small helper for converting these hex interval fields to decimal snap IDs is in the PS below). The problematic snapshots are the ones still recorded for pool con-fs2-data2 in the set [2d6~1,2d8~1,2da~1,2dc~1,2de~1,2e0~1,2e2~1,2e4~1,2e6~1,2e8~1,2ea~18,303~3], which should not be there. They correspond to decimal snap IDs 727 729 731 733 735 737 739 741 743 745 747.

Some relevant history: we had pool con-fs2-data as the data pool on "/" from the beginning. About three weeks ago we replaced the data pool on the root with con-fs2-data2. The extra snapshots might date back to the time right after exchanging the data pool on "/". Maybe we hit a bug that occurs when changing directory layouts while snapshots are present on the file system?

The 11 extra snapshots seem to cause severe performance issues. I would be most grateful for any advice on how to get rid of them. The corresponding fs snapshots were deleted at least a week ago.

Many thanks and best regards!
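PS: For anyone who wants to follow the hex arithmetic without a calculator, here is a throwaway Python sketch (nothing official; it just assumes the removed_snaps fields are the usual interval-set notation, i.e. comma-separated "start~length" pairs in hex) that expands such a set into decimal snap IDs. As a sanity check, it shows that the updated field 303~3 is exactly the old entries 303~1 and 305~1 plus the deleted snapshot 772 (hex 304).

#!/usr/bin/env python3
# Sketch: expand a removed_snaps interval set ("start~length" in hex)
# into decimal snap IDs, to make the OSD map output easier to read.

def expand_removed_snaps(intervals):
    """Return the decimal snap IDs covered by e.g. '303~1,305~1'."""
    snapids = []
    for part in intervals.split(","):
        start, length = (int(x, 16) for x in part.split("~"))
        snapids.extend(range(start, start + length))
    return snapids

# The field that was updated by deleting snapshot 772:
print(expand_removed_snaps("303~3"))                        # [771, 772, 773]
# The two old entries it replaced, plus the deleted snapshot:
print(sorted(expand_removed_snaps("303~1,305~1") + [772]))  # [771, 772, 773]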
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx