Re: 1 pg inconsistent

Yep, osd.24 has experienced a read error. If you check the system log on the osd.24 host, you'll probably find some relevant kernel messages about SATA errors; the LBA of the sector that triggered the read error is logged there.
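
For example, something along these lines usually surfaces them (the exact wording of the errors will vary by controller and kernel):

dmesg -T | grep -iE 'ata|medium error|i/o error'       # kernel ring buffer with readable timestamps
journalctl -k | grep -iE 'ata|medium error|i/o error'  # same from the persistent journal, if the host runs systemd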

If you run smartctl -x on the SATA disk behind osd.24, you'll probably find that it has started accumulating "pending sectors", or that its "reallocated sector count" has risen significantly. It may also show a significantly higher "multi-zone error rate" than other disks of the same age.
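
A quick way to pull just those attributes out (substitute the actual device backing osd.24 for /dev/sdX):

smartctl -x /dev/sdX | grep -iE 'Current_Pending_Sector|Reallocated_Sector_Ct|Multi_Zone_Error_Rate'

Non-zero and growing Current_Pending_Sector or Reallocated_Sector_Ct values are the usual smoking gun.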

The immediate action would be to decommission the disk.
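
One common sequence for that, assuming you let Ceph rebuild the third replica elsewhere before pulling the drive (adjust to your own removal procedure):

ceph osd out 24                            # stop mapping new data to it and start backfilling its PGs elsewhere
ceph -s                                    # wait here until all PGs are active+clean again
systemctl stop ceph-osd@24                 # then stop the daemon on the osd.24 host
ceph osd purge 24 --yes-i-really-mean-it   # and remove it from the CRUSH/OSD maps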

Once it's decoupled from Ceph, you can run the disk vendor's diagnostics suite on the drive, and it may come back clean in the end, but even so, I'd only use it for non-critical stuff thereafter.
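
If you don't have the vendor tool handy, a long SMART self-test is a reasonable stand-in:

smartctl -t long /dev/sdX      # kicks off the extended offline self-test (can take hours)
smartctl -l selftest /dev/sdX  # check the result once it has finished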

Best regards.



On 17/07/2020 12.38, Abhimnyu Dhobale wrote:
Thanks For your reply,

Please find the below output and suggest.

[root@vpsapohmcs01 ~]# rados -p vpsacephcl01 list-inconsistent-obj 1.3c9 --format=json-pretty
{
     "epoch": 845,
     "inconsistents": [
         {
             "object": {
                 "name": "rbd_data.515c96b8b4567.000000000000c377",
                 "nspace": "",
                 "locator": "",
                 "snap": "head",
                 "version": 21101
             },
             "errors": [],
             "union_shard_errors": [
                 "read_error"
             ],
             "selected_object_info": {
                 "oid": {
                     "oid": "rbd_data.515c96b8b4567.000000000000c377",
                     "key": "",
                     "snapid": -2,
                     "hash": 867656649,
                     "max": 0,
                     "pool": 1,
                     "namespace": ""
                 },
                 "version": "853'21101",
                 "prior_version": "853'21100",
                 "last_reqid": "client.2317742.0:24909022",
                 "user_version": 21101,
                 "size": 4194304,
                 "mtime": "2020-07-16 21:02:20.564245",
                 "local_mtime": "2020-07-16 21:02:20.572003",
                 "lost": 0,
                 "flags": [
                     "dirty",
                     "omap_digest"
                 ],
                 "truncate_seq": 0,
                 "truncate_size": 0,
                 "data_digest": "0xffffffff",
                 "omap_digest": "0xffffffff",
                 "expected_object_size": 4194304,
                 "expected_write_size": 4194304,
                 "alloc_hint_flags": 0,
                 "manifest": {
                     "type": 0
                 },
                 "watchers": {}
             },
             "shards": [
                 {
                     "osd": 5,
                     "primary": false,
                     "errors": [],
                     "size": 4194304,
                     "omap_digest": "0xffffffff",
                     "data_digest": "0x8ebd7de4"
                 },
                 {
                     "osd": 19,
                     "primary": true,
                     "errors": [],
                     "size": 4194304,
                     "omap_digest": "0xffffffff",
                     "data_digest": "0x8ebd7de4"
                 },
                 {
                     "osd": 24,
                     "primary": false,
                     "errors": [
                         "read_error"
                     ],
                     "size": 4194304
                 }
             ]
         }
     ]
}



On Tue, Jul 14, 2020 at 6:40 PM Eric Smith <Eric.Smith@xxxxxxxxxx> wrote:

If you run (Substitute your pool name for <pool>):

rados -p <pool> list-inconsistent-obj 1.574 --format=json-pretty

You should get some detailed information about which piece of data
actually has the error and you can determine what to do with it from there.

-----Original Message-----
From: Abhimnyu Dhobale <adhobale8@xxxxxxxxx>
Sent: Tuesday, July 14, 2020 5:13 AM
To: ceph-users@xxxxxxx
Subject:  1 pg inconsistent

Good Day,

Ceph is showing below error frequently. every time after pg repair it is
resolved.

[root@vpsapohmcs01 ~]# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.574 is active+clean+inconsistent, acting [19,25,2]

[root@vpsapohmcs02 ~]# cat /var/log/ceph/ceph-osd.19.log | grep error
2020-07-12 11:42:11.824 7f864e0b2700 -1 log_channel(cluster) log [ERR] : 1.574 shard 25 soid 1:2ea0a7a3:::rbd_data.515c96b8b4567.0000000000007a7c:head : candidate had a read error
2020-07-12 11:42:15.035 7f86520ba700 -1 log_channel(cluster) log [ERR] : 1.574 deep-scrub 1 errors

[root@vpsapohmcs01 ~]# ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic
(stable)

Request you to please suggest.

--
Thanks & Regards
Abhimnyu Dhobale



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


