Re: How to recover from active+clean+inconsistent+failed_repair?

 > Hmm, I'm getting a bit confused. Could you also send the output of "ceph osd pool ls detail".

File ceph-osd-pool-ls-detail.txt attached.

> Did you look at the disk/controller cache settings?
I don't have dedicated disk controllers in the Ceph machines; the hard disks are attached directly to the motherboard via SATA cables. There may be an on-chip disk controller on the motherboard, but I'm not sure.
If your worry is fsync persistence: I have thoroughly tested database fsync reliability on Ceph RBD at hundreds of transactions per second, pulling the network cable, restarting the database machine, etc. while inserts were in progress, and I did not lose a single transaction. I repeated this many times and persistence on my Ceph cluster was perfect (i.e. not a single loss).
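
For context, the shape of that test was roughly the sketch below. This is an illustrative sketch only: I'm assuming a PostgreSQL-style client here, testdb, durability_test and acked_ids.log are placeholder names, connection options are omitted, and the database's data directory sits on a filesystem backed by an RBD image.

    # Create a test table on the RBD-backed database.
    psql -d testdb -c "CREATE TABLE IF NOT EXISTS durability_test (id bigserial PRIMARY KEY, ts timestamptz DEFAULT now());"

    # Insert in a loop, one committed (fsynced) transaction per insert,
    # logging every id the database acknowledged as committed.
    while true; do
        psql -d testdb -qAt -c "INSERT INTO durability_test DEFAULT VALUES RETURNING id;" >> acked_ids.log
    done

    # While the loop runs: pull the network cable, hard-reset the database machine, etc.
    # After recovery, every id recorded in acked_ids.log must still be in the table;
    # a missing id would mean an acknowledged commit was lost.
    psql -d testdb -qAt -c "SELECT count(*) FROM durability_test;"
    wc -l < acked_ids.log

The point is that every acknowledged insert implies an fsync all the way down to the RBD image, which is why I'm fairly confident the write path persists data correctly.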

> I think you should start a deep-scrub with "ceph pg deep-scrub 3.b" and record the output of "ceph -w | grep '3\.b'" (note the single quotes).

> The error messages you included in one of your first e-mails are only on 1 out of 3 scrub errors (3 lines for 1 error). We need to find all 3 errors.

I ran "ceph pg deep-scrub 3.b" again; here is the whole output of "ceph -w":

2020-11-02 22:33:48.224392 osd.0 [ERR] 3.b shard 2 soid 3:d577e975:::1000023675e.00000000:head : candidate had a missing snapset key, candidate had a missing info key
2020-11-02 22:33:48.224396 osd.0 [ERR] 3.b soid 3:d577e975:::1000023675e.00000000:head : failed to pick suitable object info
2020-11-02 22:35:30.087042 osd.0 [ERR] 3.b deep-scrub 3 errors
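
If it helps, I believe the rados tool can also give a machine-readable listing of all scrub errors in that PG (run from a node with admin credentials); I'm happy to send that output as well:

    rados list-inconsistent-obj 3.b --format=json-pretty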

Btw, I'm very grateful for your perseverance on this.

Best regards
Sagara

  
ceph osd pool ls detail
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 4051 lfor 0/0/3797 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 target_size_ratio 0.8 application cephfs

pool 3 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3736 lfor 0/3266/3582 flags hashpspool stripe_width 0 target_size_ratio 0.8 application cephfs

pool 4 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 4156 lfor 0/4156/4154 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.8 application rbd
	removed_snaps [1~3]
