Re: How to recover from active+clean+inconsistent+failed_repair?

Sagara Wijetunga <sagarawmw@xxxxxxxxx> · Tue, 3 Nov 2020 09:39:18 +0000 (UTC)

 Hi Frank
1. We will disable the disk controller and disk-level caching to avoid future issues.
2. My pools are:
ceph osd lspools
    2 cephfs_metadata
    3 cephfs_data
    4 rbd
The PG that is now inconsistent is 3.b; therefore, it belongs to the cephfs_data pool.
The following also shows that PG 3.b belongs to cephfs_data:
ceph pg ls-by-pool cephfs_data | grep 3.b
3.b     6992        0         0       0  9649392528           0          0 3005 active+clean+inconsistent   ...
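
A PG ID has the form <pool-id>.<hex-seed>, so the pool can be read directly from the prefix. Note that `grep 3.b` treats the dot as a wildcard and also matches IDs such as 3.b0 or 13.b. A minimal sketch of a stricter check follows; the helper names pg_pool_id and pg_row are hypothetical, not part of Ceph:

```shell
# Hypothetical helper: print the pool ID encoded in a PG ID
# (everything before the first dot).
pg_pool_id() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: keep only rows whose first column is
# exactly the given PG ID (no regex matching, no partial matches).
pg_row() {
    awk -v pg="$1" '$1 == pg'
}
```

For example, `ceph pg ls-by-pool cephfs_data | pg_row 3.b` should print only the row for PG 3.b, and `pg_pool_id 3.b` prints the pool ID to compare against `ceph osd lspools`.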

3. The deep scrub shows only one object with an issue: soid 3:d577e975:::1000023675e.00000000
This object seems to be lost.
rados -p cephfs_metadata ls | grep 1000023675e.00000000
rados -p cephfs_data ls | grep 1000023675e.00000000
rados -p rbd ls | grep 1000023675e.00000000

4. I tried to find which files are affected by this issue, but I get "No such file or directory" for the path. I have mounted ceph on home properly, as before.
cephfs-data-scan -c /etc/ceph/ceph.conf pg_files /home/sagara 3.b
2020-11-03T17:06:21.770+0800 7f3f213ab100 -1 pgeffects.hit_dir: Failed to open path: (2) No such file or directory
How do I see which files are affected by this issue?
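
One possible approach, sketched under assumptions: CephFS data objects are normally named <inode-in-hex>.<block-index>, so the object name can be converted to a decimal inode number and looked up on the mounted file system with `find -inum`. The helper names below are hypothetical, and the lookup only works if the file's metadata still exists and the given path really is the CephFS mount point:

```shell
# Hypothetical helper: convert a CephFS data object name
# (<inode-hex>.<block-index>) to the decimal inode number.
oid_to_inode() {
    # Strip the ".<block-index>" suffix, then let printf parse the hex value.
    printf '%d\n' "0x${1%%.*}"
}

# Hypothetical helper: search a mounted CephFS tree for that inode.
# $1 = object name, $2 = CephFS mount point (assumed).
find_file_for_oid() {
    find "$2" -inum "$(oid_to_inode "$1")" 2>/dev/null
}
```

Usage would be along the lines of `find_file_for_oid 1000023675e.00000000 /home/sagara`. This walks the whole tree, so it can take a long time on a large file system.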

5. What should the course of action be now to bring the cluster back to "active+clean" so that we can move forward? I don't mind rolling back the PG that has the issue, since I have a file-level backup. If rolling back the PG is the way forward, how do I do it?

Thank you.
Best regards
Sagara

    On Monday, November 2, 2020, 11:29:55 PM GMT+8, Frank Schilder <frans@xxxxxx> wrote:  

 > But there can be an on-chip disk controller on the motherboard, I'm not sure.

There is always some kind of controller. Could be on-board. Usually, the cache settings are accessible when booting into the BIOS set-up.

> If your worry is fsync persistence

No, what I worry about is volatile write cache, which is usually enabled by default. This cache exists on the disk as well as on the controller. To avoid losing writes on power failure, the controller needs to be in write-through mode and the disk write cache needs to be disabled. The latter can be done with smartctl, the former in the BIOS setup.
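
As a rough sketch of the disk-side part: smartctl can report the cache state with `-g wcache` and disable it with `-s wcache,off`. The hypothetical dry-run helper below only prints the commands for review instead of running them. The device names are placeholders, and on some drives the setting may not survive a power cycle, so it is worth re-checking after a reboot:

```shell
# Hypothetical dry-run helper: print (do not run) the smartctl commands
# that show and then disable the volatile write cache for each device.
wcache_off_cmds() {
    for dev in "$@"; do
        printf 'smartctl -g wcache %s\n' "$dev"      # show current setting
        printf 'smartctl -s wcache,off %s\n' "$dev"  # disable write cache
    done
}
```

For example, `wcache_off_cmds /dev/sda /dev/sdb` lists four commands, which can then be run as root once the device list has been checked.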

Did you test power failure? If so, how often? On how many hosts simultaneously? Pulling network cables will not trigger cache-related problems. The problem with write cache is that you rely on a lot of bells and whistles, some of which usually fail. With ceph, this will lead to exactly the problem you are observing now.

Your pool configuration looks OK. You need to find out exactly where the scrub errors are located. It looks like metadata damage, and you might lose some data. Be careful to do only read-only admin operations for now.
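
A minimal sketch of such a read-only inspection, assuming the pool and PG named in this thread. The wrapper name inspect_inconsistency is hypothetical; the commands it calls only report state, and `rados list-inconsistent-obj` relies on the results of the most recent scrub of that PG:

```shell
# Hypothetical helper: run read-only inspection commands for one pool and PG.
# $1 = pool name, $2 = PG ID. Nothing here modifies cluster state.
inspect_inconsistency() {
    pool="$1"
    pgid="$2"
    ceph health detail                                        # which PGs report scrub errors
    rados list-inconsistent-pg "$pool"                        # inconsistent PGs in the pool
    rados list-inconsistent-obj "$pgid" --format=json-pretty  # per-object error details
}
```

It would be called as `inspect_inconsistency cephfs_data 3.b`; the per-object output should show which shard or attribute the scrub flagged.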

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
