Re: How to recover from active+clean+inconsistent+failed_repair?

> But there may be an on-chip disk controller on the motherboard; I'm not sure.

There is always some kind of controller. Could be on-board. Usually, the cache settings are accessible when booting into the BIOS set-up.

> If your worry is fsync persistence

No, what I worry about is the volatile write cache, which is usually enabled by default. This cache exists on the disk as well as on the controller. To avoid losing writes on power failure, the controller needs to be in write-through mode and the disk write cache disabled. The latter can be done with smartctl, the former in the BIOS setup.
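
For example, something along these lines should show and disable the on-disk write cache (untested on your hardware, /dev/sda is just a placeholder for your data disk):

  smartctl -g wcache /dev/sda       # show current write-cache setting
  smartctl -s wcache,off /dev/sda   # disable the volatile write cache
  hdparm -W 0 /dev/sda              # alternative for SATA disks

Note that on some drives this setting may not survive a power cycle, so you may want to apply it at boot.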

Did you test power failure? If so, how often? On how many hosts simultaneously? Pulling network cables will not trigger cache-related problems. The problem with the write cache is that you rely on a lot of bells and whistles, some of which usually fail. With Ceph, this leads to exactly the problem you are observing now.

Your pool configuration looks OK. You need to find out where exactly the scrub errors are located. It looks like meta-data damage and you might lose some data. Be careful to run only read-only admin operations for now.
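
A read-only way to inspect the inconsistencies is something like this (3.b taken from your output; the exact JSON fields may vary by release):

  rados list-inconsistent-obj 3.b --format=json-pretty

This should list the affected objects and which shard is missing the snapset/info keys without changing anything on the cluster.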

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Sagara Wijetunga <sagarawmw@xxxxxxxxx>
Sent: 02 November 2020 16:08:58
To: ceph-users@xxxxxxx; Frank Schilder
Subject: Re:  Re: How to recover from active+clean+inconsistent+failed_repair?

> Hmm, I'm getting a bit confused. Could you also send the output of "ceph osd pool ls detail".

File ceph-osd-pool-ls-detail.txt attached.


> Did you look at the disk/controller cache settings?

I don't have disk controllers on the Ceph machines. The hard disk is directly attached to the motherboard via a SATA cable. But there may be an on-chip disk controller on the motherboard; I'm not sure.

If your worry is fsync persistence, I have thoroughly tested database fsync reliability on Ceph RBD with hundreds of transactions per second, pulling the network cable, restarting the database machine, etc. while inserts were going on, and I did not lose a single transaction. I simulated this many times and persistence on my Ceph cluster was perfect (i.e. not a single loss).


> I think you should start a deep-scrub with "ceph pg deep-scrub 3.b" and record the output of "ceph -w | grep '3\.b'" (note the single quotes).

> The error messages you included in one of your first e-mails cover only 1 out of 3 scrub errors (3 lines for 1 error). We need to find all 3 errors.

I ran "ceph pg deep-scrub 3.b" again; here is the whole output of ceph -w:


2020-11-02 22:33:48.224392 osd.0 [ERR] 3.b shard 2 soid 3:d577e975:::1000023675e.00000000:head : candidate had a missing snapset key, candidate had a missing info key
2020-11-02 22:33:48.224396 osd.0 [ERR] 3.b soid 3:d577e975:::1000023675e.00000000:head : failed to pick suitable object info
2020-11-02 22:35:30.087042 osd.0 [ERR] 3.b deep-scrub 3 errors


Btw, I'm very grateful for your perseverance on this.


Best regards

Sagara

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



