Yep, osd.24 has experienced a read error. If you check the system log
on the osd.24 host, you'll probably find some relevant kernel messages
about SATA errors. The LBA of the sector that triggered the read error
is logged in those kernel messages.
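Something along these lines usually surfaces them (the grep patterns are
only examples; run this on the osd.24 host):

    dmesg -T | grep -iE 'ata[0-9]+|i/o error|unc'
    # or, via journald:
    journalctl -k | grep -iE 'ata[0-9]+|i/o error|unc'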
If you run smartctl -x on the osd.24 SATA disk device, you'll probably
find that the disk has started accumulating "pending sectors" or has even
increased its "reallocated sector count" significantly. It may also show
a significantly higher "multi zone error rate" than other disks of the
same age.
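For a quick look at just those attributes, something like this works on
most SATA drives (/dev/sdX is a placeholder for the actual device, and
attribute names can differ slightly between vendors):

    smartctl -x /dev/sdX          # full report: attributes, device error log, PHY event counters
    smartctl -A /dev/sdX | grep -iE 'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable|Multi_Zone_Error_Rate'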
Immediate action would be to decommission the disk.
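Roughly like this, assuming the remaining replicas are healthy and the
cluster has room to rebalance (a sketch only; adjust to your deployment
tooling):

    ceph osd out 24                             # stop placing data on it; backfill moves PGs away
    # wait for 'ceph -s' to show recovery/backfill complete, then on the osd.24 host:
    systemctl stop ceph-osd@24
    ceph osd purge 24 --yes-i-really-mean-it    # remove it from the CRUSH map, auth and the OSD map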
Once it is decoupled from Ceph, you may run the disk vendor's diagnostics
suite on the drive, and it may come out clean in the end, but even so,
I'd only use it for non-critical stuff thereafter.
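If the vendor tool isn't handy, a SMART extended self-test gives a
similar (generic) health check:

    smartctl -t long /dev/sdX     # kick off the extended self-test (takes hours on large drives)
    smartctl -l selftest /dev/sdX # check the result once it has finished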
Best regards.
On 17/07/2020 12.38, Abhimnyu Dhobale wrote:
Thanks for your reply.
Please find the output below and advise.
[root@vpsapohmcs01 ~]# rados -p vpsacephcl01 list-inconsistent-obj 1.3c9
--format=json-pretty
{
"epoch": 845,
"inconsistents": [
{
"object": {
"name": "rbd_data.515c96b8b4567.000000000000c377",
"nspace": "",
"locator": "",
"snap": "head",
"version": 21101
},
"errors": [],
"union_shard_errors": [
"read_error"
],
"selected_object_info": {
"oid": {
"oid": "rbd_data.515c96b8b4567.000000000000c377",
"key": "",
"snapid": -2,
"hash": 867656649,
"max": 0,
"pool": 1,
"namespace": ""
},
"version": "853'21101",
"prior_version": "853'21100",
"last_reqid": "client.2317742.0:24909022",
"user_version": 21101,
"size": 4194304,
"mtime": "2020-07-16 21:02:20.564245",
"local_mtime": "2020-07-16 21:02:20.572003",
"lost": 0,
"flags": [
"dirty",
"omap_digest"
],
"truncate_seq": 0,
"truncate_size": 0,
"data_digest": "0xffffffff",
"omap_digest": "0xffffffff",
"expected_object_size": 4194304,
"expected_write_size": 4194304,
"alloc_hint_flags": 0,
"manifest": {
"type": 0
},
"watchers": {}
},
"shards": [
{
"osd": 5,
"primary": false,
"errors": [],
"size": 4194304,
"omap_digest": "0xffffffff",
"data_digest": "0x8ebd7de4"
},
{
"osd": 19,
"primary": true,
"errors": [],
"size": 4194304,
"omap_digest": "0xffffffff",
"data_digest": "0x8ebd7de4"
},
{
"osd": 24,
"primary": false,
"errors": [
"read_error"
],
"size": 4194304
}
]
}
]
}
On Tue, Jul 14, 2020 at 6:40 PM Eric Smith <Eric.Smith@xxxxxxxxxx> wrote:
If you run the following (substituting your pool name for <pool>):
rados -p <pool> list-inconsistent-obj 1.574 --format=json-pretty
You should get some detailed information about which piece of data
actually has the error and you can determine what to do with it from there.
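Once you can see which shard carries the read_error, and the remaining
replicas agree with each other, a repair is usually the next step, e.g.:

    ceph pg repair 1.574    # the primary rewrites the bad copy from a healthy replica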
-----Original Message-----
From: Abhimnyu Dhobale <adhobale8@xxxxxxxxx>
Sent: Tuesday, July 14, 2020 5:13 AM
To: ceph-users@xxxxxxx
Subject: 1 pg inconsistent
Good Day,
Ceph is showing the error below frequently. Every time, it is resolved
after a pg repair.
[root@vpsapohmcs01 ~]# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.574 is active+clean+inconsistent, acting [19,25,2]
[root@vpsapohmcs02 ~]# cat /var/log/ceph/ceph-osd.19.log | grep error
2020-07-12 11:42:11.824 7f864e0b2700 -1 log_channel(cluster) log [ERR] : 1.574 shard 25 soid 1:2ea0a7a3:::rbd_data.515c96b8b4567.0000000000007a7c:head : candidate had a read error
2020-07-12 11:42:15.035 7f86520ba700 -1 log_channel(cluster) log [ERR] : 1.574 deep-scrub 1 errors
[root@vpsapohmcs01 ~]# ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
Please advise.
--
Thanks & Regards
Abhimnyu Dhobale
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx