Re: [EXTERN] Re: Urgent help with degraded filesystem needed

Hi Patrick, Xiubo and List,

finally we managed to get the filesystem repaired and running again! YEAH, I'm so happy!!

Big thanks for your support, Patrick and Xiubo! (Would love to invite you for a beer!)


Please see some comments and (important?) questions below:

On 6/25/24 03:14, Patrick Donnelly wrote:
On Mon, Jun 24, 2024 at 5:22 PM Dietmar Rieder
<dietmar.rieder@xxxxxxxxxxx> wrote:

(Resending this; the original message does not seem to have made it through all the spam recently sent to the list. My apologies if it shows up twice at some point.)

Hi List,

we are still struggling to get our cephfs back online. This is an update on what we have done so far, and we kindly ask for any input on how to proceed:

After we reset the journals, Xiubo suggested (in a PM) that we go on with the disaster recovery procedure:

cephfs-data-scan init skipped creating the inodes 0x0x1 and 0x0x100

[root@ceph01-b ~]# cephfs-data-scan init
Inode 0x0x1 already exists, skipping create.  Use --force-init to overwrite the existing object.
Inode 0x0x100 already exists, skipping create.  Use --force-init to overwrite the existing object.

We did not use --force-init and proceeded with scan_extents using a single worker, which was indeed very slow.

After ~24h we interrupted scan_extents and restarted it with 32 workers, which went through in about 2h15min without any issue.

Then I started scan_inodes with 32 workers; this also finished after ~50min, with no output on stderr or stdout.
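
For anyone repeating the parallel runs, here is a minimal sketch of how the per-worker command lines could be generated. It is not the exact invocation used above. It assumes the `--worker_n` / `--worker_m` sharding options described in the CephFS disaster recovery documentation; the helper name `worker_commands` and the optional `data_pool` argument are made up for illustration. The sketch only prints the commands, it does not run them.

```python
import shlex


def worker_commands(phase, workers, data_pool=None):
    """Return one cephfs-data-scan command line per worker.

    phase:     "scan_extents" or "scan_inodes" (the two sharded phases)
    workers:   total number of workers (passed as --worker_m)
    data_pool: optional pool name appended to each command (assumption:
               whether it is required depends on the Ceph release)
    """
    if phase not in ("scan_extents", "scan_inodes"):
        raise ValueError("only scan_extents and scan_inodes are sharded here")
    if workers < 1:
        raise ValueError("workers must be >= 1")

    commands = []
    for n in range(workers):
        argv = ["cephfs-data-scan", phase,
                "--worker_n", str(n), "--worker_m", str(workers)]
        if data_pool:
            argv.append(data_pool)
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Print the 32 command lines for the scan_extents phase.
    for line in worker_commands("scan_extents", 32):
        print(line)
```

Per the disaster recovery documentation, all workers should have finished scan_extents before any worker starts scan_inodes, so the two phases would be launched one after the other.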

I went on with scan_links, which after ~45 minutes threw the following error:

# cephfs-data-scan scan_links
Error ((2) No such file or directory)

Not sure what this indicates necessarily. You can try to get more
debug information using:

[client]
   debug mds = 20
   debug ms = 1
   debug client = 20

in the local ceph.conf for the node running cephfs-data-scan.

I did that and restarted "cephfs-data-scan scan_links".

It didn't produce any additional debug output; however, this time it just went through without error (~50 min).

We then reran "cephfs-data-scan cleanup" and it also finished without error after about 10h.

We then set the fs as repaired and all seems to work fine again:

[root@ceph01-b ~]# ceph mds repaired 0
repaired: restoring rank 1:0

[root@ceph01-b ~]# ceph -s
  cluster:
    id:     aae23c5c-a98b-11ee-b44d-00620b05cac4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cephmon-01,cephmon-03,cephmon-02 (age 6d)
    mgr: cephmon-01.dsxcho(active, since 6d), standbys: cephmon-02.nssigg, cephmon-03.rgefle
    mds: 1/1 daemons up, 5 standby
    osd: 336 osds: 336 up (since 2M), 336 in (since 4M)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 6401 pgs
    objects: 284.68M objects, 623 TiB
    usage:   890 TiB used, 3.1 PiB / 3.9 PiB avail
    pgs:     6206 active+clean
             140  active+clean+scrubbing
             55   active+clean+scrubbing+deep

  io:
    client:   3.9 MiB/s rd, 84 B/s wr, 482 op/s rd, 1.11k op/s wr


[root@ceph01-b ~]# ceph fs status
cephfs - 0 clients
======
RANK  STATE              MDS                ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  default.cephmon-03.xcujhz  Reqs:    0 /s   124k  60.3k  1993      0
         POOL            TYPE     USED  AVAIL
ssd-rep-metadata-pool  metadata   298G  63.4T
  sdd-rep-data-pool      data    10.2T  84.5T
   hdd-ec-data-pool      data     808T  1929T
       STANDBY MDS
default.cephmon-01.cepqjp
default.cephmon-01.pvnqad
default.cephmon-02.duujba
default.cephmon-02.nyfook
default.cephmon-03.chjusj
MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)


The MDS log, however, shows some "bad backtrace on directory inode" messages:

2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8082 from mon.1
2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:standby --> up:replay
2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082 replay_start
2024-06-25T18:45:36.575+0000 7f8594659700  1 mds.0.8082  waiting for osdmap 34331 (which blocklists prior instance)
2024-06-25T18:45:36.581+0000 7f858de4c700  0 mds.0.cache creating system inode with ino:0x100
2024-06-25T18:45:36.581+0000 7f858de4c700  0 mds.0.cache creating system inode with ino:0x1
2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.journal EResetJournal
2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.sessionmap wipe start
2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.sessionmap wipe result
2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.sessionmap wipe done
2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.8082 Finished replaying journal
2024-06-25T18:45:36.589+0000 7f858ce4a700  1 mds.0.8082 making mds journal writeable
2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8083 from mon.1
2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:replay --> up:reconnect
2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 reconnect_start
2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 reopen_log
2024-06-25T18:45:37.578+0000 7f8594659700  1 mds.0.8082 reconnect_done
2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8084 from mon.1
2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:reconnect --> up:rejoin
2024-06-25T18:45:38.579+0000 7f8594659700  1 mds.0.8082 rejoin_start
2024-06-25T18:45:38.583+0000 7f8594659700  1 mds.0.8082 rejoin_joint_start
2024-06-25T18:45:38.592+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
2024-06-25T18:45:38.680+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d8b
2024-06-25T18:45:38.754+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d90
2024-06-25T18:45:38.754+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e45d9f
2024-06-25T18:45:38.785+0000 7f858fe50700  1 mds.0.8082 rejoin_done
2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.default.cephmon-03.xcujhz Updating MDS map to version 8085 from mon.1
2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.0.8082 handle_mds_map i am now mds.0.8082
2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.0.8082 handle_mds_map state change up:rejoin --> up:active
2024-06-25T18:45:39.582+0000 7f8594659700  1 mds.0.8082 recovery_done -- successful recovery!
2024-06-25T18:45:39.584+0000 7f8594659700  1 mds.0.8082 active_start
2024-06-25T18:45:39.585+0000 7f8594659700  1 mds.0.8082 cluster recovered.
2024-06-25T18:45:42.409+0000 7f8591e54700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2024-06-25T18:57:28.213+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x4


Is there anything that we can do about this, to get rid of the "bad backtrace on directory inode"?
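
To make it easier to see which directories are affected, the inode numbers can be pulled out of the MDS log. The following is only a rough sketch that assumes the message wording shown in the excerpt above; the function name `bad_backtrace_inodes` is invented for this example, and the sample text is copied from the log lines in this mail.

```python
import re

# Matches the message text as it appears in the MDS log excerpt above.
BAD_BACKTRACE = re.compile(
    r"bad backtrace on directory inode (0x[0-9a-fA-F]+)"
)


def bad_backtrace_inodes(log_text):
    """Return the distinct inode numbers reported, in first-seen order."""
    # dict.fromkeys() drops duplicates while preserving insertion order.
    return list(dict.fromkeys(BAD_BACKTRACE.findall(log_text)))


SAMPLE = """\
2024-06-25T18:45:38.592+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x10003e42340
2024-06-25T18:45:38.785+0000 7f858fe50700  1 mds.0.8082 rejoin_done
2024-06-25T18:57:28.213+0000 7f858e64d700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x4
"""

if __name__ == "__main__":
    for ino in bad_backtrace_inodes(SAMPLE):
        print(ino)
```

In practice the text would come from the MDS log file of the active daemon rather than from an embedded sample.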


Some more questions:

1.
As Xiubo suggested, we now tried to mount the filesystem with the "nowsync" option <https://tracker.ceph.com/issues/61009#note-26>:

[root@ceph01-b ~]# mount -t ceph cephfs_user@.cephfs=/ /mnt/cephfs -o secretfile=/etc/ceph/ceph.client.cephfs_user.secret,nowsync

However, the option does not seem to show up in /proc/mounts:

[root@ceph01-b ~]# grep ceph /proc/mounts
cephfs_user@aae23c5c-a98b-11ee-b44d-00620b05cac4.cephfs=/ /mnt/cephfs ceph rw,relatime,name=cephfs_user,secret=<hidden>,ms_mode=prefer-crc,acl,mon_addr=10.1.3.21:3300/10.1.3.22:3300/10.1.3.23:3300 0 0

The kernel version is 5.14.0 (from Rocky 9.3)

[root@ceph01-b ~]# uname -a
Linux ceph01-b 5.14.0-362.24.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 13 17:33:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Is this expected? How can we make sure that the filesystem uses 'nowsync', so that we do not hit the bug <https://tracker.ceph.com/issues/61009> again?
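
The check against /proc/mounts can also be scripted, for example to run it on every client. This is a minimal sketch under the assumption that the fourth whitespace-separated field of each /proc/mounts line is the comma-separated option list; the helper names `mount_options` and `has_option` are invented here. It only reports what /proc/mounts lists and cannot tell whether the kernel omits an option from that listing.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list reported for mount_point, or None if absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def has_option(mounts_text, mount_point, option):
    """True if option is listed for mount_point in the given mounts text."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and option in opts


# Sample line copied from the /proc/mounts output in this mail.
SAMPLE = (
    "cephfs_user@aae23c5c-a98b-11ee-b44d-00620b05cac4.cephfs=/ /mnt/cephfs "
    "ceph rw,relatime,name=cephfs_user,secret=<hidden>,ms_mode=prefer-crc,"
    "acl,mon_addr=10.1.3.21:3300/10.1.3.22:3300/10.1.3.23:3300 0 0\n"
)

if __name__ == "__main__":
    print(has_option(SAMPLE, "/mnt/cephfs", "nowsync"))
```

On a live client the same functions could be fed the contents of /proc/mounts instead of the embedded sample.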


2.
There are two empty files in lost+found now. Is it safe to remove them?

[root@ceph01-b lost+found]# ls -la
total 0
drwxr-xr-x 2 root root 1 Jan  1  1970 .
drwxr-xr-x 4 root root 2 Mar 13 21:22 ..
-r-x------ 1 root root 0 Jun 20 23:50 100037a50e2
-r-x------ 1 root root 0 Jun 20 19:05 200049612e5

3.
Are there any specific steps that we should perform now (scrub or similar things) before we put the filesystem into production again?


Best & thanks again
  Dietmar
