Hi Patrick, Xiubo and List,

Finally we managed to get the filesystem repaired and running again! YEAH, I'm so happy!!
Big thanks for your support, Patrick and Xiubo! (Would love to invite you for a beer!)
Please see some comments and (important?) questions below:

On 6/25/24 03:14, Patrick Donnelly wrote:
> On Mon, Jun 24, 2024 at 5:22 PM Dietmar Rieder <dietmar.rieder@xxxxxxxxxxx> wrote:
> >
> > (resending this, as the original message seems not to have made it through among all the spam recently sent to the list; my apologies if it shows up twice at some point)
> >
> > Hi List,
> >
> > we are still struggling to get our cephfs back online. This is an update to let you know what we have done so far, and we kindly ask for any input that helps us decide how to proceed:
> >
> > After resetting the journals, Xiubo suggested (in a PM) that we go on with the disaster recovery procedure:
> >
> > cephfs-data-scan init skipped creating the inodes 0x0x1 and 0x0x100
> >
> > [root@ceph01-b ~]# cephfs-data-scan init
> > Inode 0x0x1 already exists, skipping create. Use --force-init to overwrite the existing object.
> > Inode 0x0x100 already exists, skipping create. Use --force-init to overwrite the existing object.
> >
> > We did not use --force-init and proceeded with scan_extents using a single worker, which was indeed very slow.
> >
> > After ~24h we interrupted the scan_extents and restarted it with 32 workers, which went through in about 2h15min without any issue.
> >
> > Then I started scan_inodes with 32 workers; this also finished after ~50 min, with no output on stderr or stdout.
> >
> > I went on with scan_links, which after ~45 minutes threw the following error:
> >
> > # cephfs-data-scan scan_links
> > Error ((2) No such file or directory)
>
> Not sure what this indicates necessarily. You can try to get more debug information using:
>
> [client]
>     debug mds = 20
>     debug ms = 1
>     debug client = 20
>
> in the local ceph.conf for the node running cephfs-data-scan.
I did that and restarted "cephfs-data-scan scan_links". It didn't produce any additional debug output; however, this time it just went through without error (~50 min).
We then reran "cephfs-data-scan cleanup" and it also finished without error after about 10h.
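The parallel runs mentioned above split scan_extents and scan_inodes across 32 workers. Below is a minimal Python sketch of that split. It assumes the `--worker_n`/`--worker_m` options described in the CephFS disaster-recovery documentation, and it assumes the file system's default data pool. Pool and `--filesystem` arguments are left out because they vary between releases, so check `cephfs-data-scan --help` on your version. The helper names `worker_commands` and `run_phase` are hypothetical. By default it only does a dry run that prints the command lines.

```python
import subprocess
import sys
from typing import List

# Only the two phases that were run with 32 workers in this thread are split here.
PARALLEL_PHASES = ("scan_extents", "scan_inodes")


def worker_commands(phase: str, workers: int) -> List[List[str]]:
    """Return one cephfs-data-scan argv per worker for the given phase."""
    if phase not in PARALLEL_PHASES:
        raise ValueError(f"{phase} is not split across workers in this sketch")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [
        ["cephfs-data-scan", phase, "--worker_n", str(n), "--worker_m", str(workers)]
        for n in range(workers)
    ]


def run_phase(phase: str, workers: int, dry_run: bool = True) -> int:
    """Run (or just print) all workers of a phase; return the first non-zero exit code."""
    cmds = worker_commands(phase, workers)
    if dry_run:
        for cmd in cmds:
            print(" ".join(cmd))
        return 0
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    # Wait for every worker: a phase is only complete when all of them have exited.
    codes = [p.wait() for p in procs]
    return next((c for c in codes if c != 0), 0)


if __name__ == "__main__":
    # Phases run strictly in order: every scan_extents worker must finish
    # before any scan_inodes worker starts.
    for phase in PARALLEL_PHASES:
        rc = run_phase(phase, 32, dry_run=True)
        if rc != 0:
            sys.exit(rc)
```

Waiting on all workers of one phase before starting the next is the key design point. Mixing scan_extents and scan_inodes workers would let scan_inodes read incomplete size and mtime information.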
We then marked the fs as repaired, and all seems to work fine again:

[root@ceph01-b ~]# ceph mds repaired 0
repaired: restoring rank 1:0
[root@ceph01-b ~]# ceph -s
  cluster:
    id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 6d)
    mgr: cephmon-01.dsxcho(active, since 6d), standbys: cephmon-02.nssigg, cephmon-03.rgefle
    mds: 1/1 daemons up, 5 standby
    osd: 336 osds: 336 up (since 2M), 336 in (since 4M)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 6401 pgs
    objects: 284.68M objects, 623 TiB
    usage:   890 TiB used, 3.1 PiB / 3.9 PiB avail
    pgs:     6206 active+clean
             140  active+clean+scrubbing
             55   active+clean+scrubbing+deep

  io:
    client:   3.9 MiB/s rd, 84 B/s wr, 482 op/s rd, 1.11k op/s wr

[root@ceph01-b ~]# ceph fs status
cephfs - 0 clients
======
RANK  STATE   MDS                        ACTIVITY     DNS   INOS   DIRS  CAPS
 0    active  default.cephmon-03.xcujhz  Reqs: 0 /s   124k  60.3k  1993  0
POOL                   TYPE      USED   AVAIL
ssd-rep-metadata-pool  metadata  298G   63.4T
sdd-rep-data-pool      data      10.2T  84.5T
hdd-ec-data-pool       data      808T   1929T
STANDBY MDS
default.cephmon-01.cepqjp
default.cephmon-01.pvnqad
default.cephmon-02.duujba
default.cephmon-02.nyfook
default.cephmon-03.chjusj
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
The MDS log, however, shows some "bad backtrace on directory inode" messages:

2024-06-25T18:45:36.575+0000 7f8594659700 1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8082 from mon.1
2024-06-25T18:45:36.575+0000 7f8594659700 1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:36.575+0000 7f8594659700 1 mds.0.8082 handle_mds_map state change up:standby --> up:replay
2024-06-25T18:45:36.575+0000 7f8594659700 1 mds.0.8082 replay_start
2024-06-25T18:45:36.575+0000 7f8594659700 1 mds.0.8082 waiting for osdmap 34331 (which blocklists prior instance)
2024-06-25T18:45:36.581+0000 7f858de4c700 0 mds.0.cache creating system inode with ino:0x100
2024-06-25T18:45:36.581+0000 7f858de4c700 0 mds.0.cache creating system inode with ino:0x1
2024-06-25T18:45:36.589+0000 7f858ce4a700 1 mds.0.journal EResetJournal
2024-06-25T18:45:36.589+0000 7f858ce4a700 1 mds.0.sessionmap wipe start
2024-06-25T18:45:36.589+0000 7f858ce4a700 1 mds.0.sessionmap wipe result
2024-06-25T18:45:36.589+0000 7f858ce4a700 1 mds.0.sessionmap wipe done
2024-06-25T18:45:36.589+0000 7f858ce4a700 1 mds.0.8082 Finished replaying journal
2024-06-25T18:45:36.589+0000 7f858ce4a700 1 mds.0.8082 making mds journal writeable
2024-06-25T18:45:37.578+0000 7f8594659700 1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8083 from mon.1
2024-06-25T18:45:37.578+0000 7f8594659700 1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:37.578+0000 7f8594659700 1 mds.0.8082 handle_mds_map state change up:replay --> up:reconnect
2024-06-25T18:45:37.578+0000 7f8594659700 1 mds.0.8082 reconnect_start
2024-06-25T18:45:37.578+0000 7f8594659700 1 mds.0.8082 reopen_log
2024-06-25T18:45:37.578+0000 7f8594659700 1 mds.0.8082 reconnect_done
2024-06-25T18:45:38.579+0000 7f8594659700 1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8084 from mon.1
2024-06-25T18:45:38.579+0000 7f8594659700 1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:38.579+0000 7f8594659700 1 mds.0.8082 handle_mds_map state change up:reconnect --> up:rejoin
2024-06-25T18:45:38.579+0000 7f8594659700 1 mds.0.8082 rejoin_start
2024-06-25T18:45:38.583+0000 7f8594659700 1 mds.0.8082 rejoin_joint_start
2024-06-25T18:45:38.592+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
2024-06-25T18:45:38.680+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
2024-06-25T18:45:38.754+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d90
2024-06-25T18:45:38.754+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d9f
2024-06-25T18:45:38.785+0000 7f858fe50700 1 mds.0.8082 rejoin_done
2024-06-25T18:45:39.582+0000 7f8594659700 1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8085 from mon.1
2024-06-25T18:45:39.582+0000 7f8594659700 1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:39.582+0000 7f8594659700 1 mds.0.8082 handle_mds_map state change up:rejoin --> up:active
2024-06-25T18:45:39.582+0000 7f8594659700 1 mds.0.8082 recovery_done -- successful recovery!
2024-06-25T18:45:39.584+0000 7f8594659700 1 mds.0.8082 active_start
2024-06-25T18:45:39.585+0000 7f8594659700 1 mds.0.8082 cluster recovered.
2024-06-25T18:45:42.409+0000 7f8591e54700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2024-06-25T18:57:28.213+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x4
Is there anything we can do to get rid of the "bad backtrace on directory inode" errors?
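To see exactly which directory inodes are affected, the "bad backtrace" errors can be extracted from the MDS log. This is a small Python sketch that assumes log lines in the format quoted above. The function name `bad_backtrace_inodes` and the log path in the usage comment are hypothetical, and with cephadm the MDS log may live in journald rather than in a file. The sketch only collects inode numbers and does not repair anything.

```python
import re
import sys
from typing import Iterable, List

# Matches the cluster-log error emitted by the MDS, e.g.
# "... log [ERR] : bad backtrace on directory inode 0x10003e42340"
BAD_BACKTRACE = re.compile(r"bad backtrace on directory inode (0x[0-9a-fA-F]+)")


def bad_backtrace_inodes(lines: Iterable[str]) -> List[str]:
    """Return the distinct inode numbers, in order of first appearance."""
    found = {}  # a dict keeps insertion order and de-duplicates in O(1)
    for line in lines:
        # finditer also copes with several log entries fused onto one line
        for match in BAD_BACKTRACE.finditer(line):
            found.setdefault(match.group(1), None)
    return list(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python3 bad_backtraces.py /var/log/ceph/ceph-mds.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ino in bad_backtrace_inodes(log):
            print(ino)
```

Applied to the excerpt above, this would list 0x10003e42340, 0x10003e45d8b, 0x10003e45d90, 0x10003e45d9f and 0x4.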
Some more questions:

1. As Xiubo suggested, we now tried to mount the filesystem with the "nowsync" option <https://tracker.ceph.com/issues/61009#note-26>:
[root@ceph01-b ~]# mount -t ceph cephfs_user@.cephfs=/ /mnt/cephfs -o secretfile=/etc/ceph/ceph.client.cephfs_user.secret,nowsync
However, the option does not seem to show up in /proc/mounts:

[root@ceph01-b ~]# grep ceph /proc/mounts
cephfs_user@aae23c5c-a98b-11ee-b44d-00620b05cac4.cephfs=/ /mnt/cephfs ceph rw,relatime,name=cephfs_user,secret=<hidden>,ms_mode=prefer-crc,acl,mon_addr=10.1.3.21:3300/10.1.3.22:3300/10.1.3.23:3300 0 0
The kernel version is 5.14.0 (from Rocky 9.3):

[root@ceph01-b ~]# uname -a
Linux ceph01-b 5.14.0-362.24.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 13 17:33:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Is this expected? How can we make sure that the filesystem uses 'nowsync', so that we do not hit the bug <https://tracker.ceph.com/issues/61009> again?
2. There are now two empty files in lost+found. Is it safe to remove them?

[root@ceph01-b lost+found]# ls -la
total 0
drwxr-xr-x 2 root root 1 Jan  1  1970 .
drwxr-xr-x 4 root root 2 Mar 13 21:22 ..
-r-x------ 1 root root 0 Jun 20 23:50 100037a50e2
-r-x------ 1 root root 0 Jun 20 19:05 200049612e5

3. Are there any specific steps we should perform now (a scrub or similar) before we put the filesystem back into production?
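Regarding question 1, the options the kernel reports for each CephFS mount can also be checked programmatically. This is a minimal Python sketch that assumes the standard /proc/mounts layout: device, mount point, fs type, comma-separated options, dump, pass. The helper names are hypothetical. The sketch can only see what the kernel chooses to print, so a missing "nowsync" here gives the same answer as the grep above and says nothing more about the client's internal state.

```python
import os
from typing import Dict, List


def ceph_mount_options(proc_mounts: str) -> Dict[str, List[str]]:
    """Map each kernel CephFS mount point to its list of reported options."""
    mounts: Dict[str, List[str]] = {}
    for line in proc_mounts.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        # Spaces in paths appear escaped as \040 and are not decoded here.
        if len(fields) >= 4 and fields[2] == "ceph":
            mounts[fields[1]] = fields[3].split(",")
    return mounts


def reports_nowsync(proc_mounts: str, mountpoint: str) -> bool:
    """True if the kernel lists 'nowsync' among the options of this mount."""
    options = ceph_mount_options(proc_mounts).get(mountpoint)
    if options is None:
        raise KeyError(f"no ceph mount at {mountpoint}")
    return "nowsync" in options


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts", encoding="utf-8") as f:
        text = f.read()
    for mnt, opts in ceph_mount_options(text).items():
        print(mnt, "nowsync" in opts)
```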
Best & thanks again,
Dietmar
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx