Re: Object read error - enough copies available

On 30.05.19 at 17:00, Oliver Freyermuth wrote:
> Dear Cephalopodians,
> 
> I found the messages:
>  2019-05-30 16:08:51.656363 [ERR]  Error -5 reading object 2:0979ae43:::10002954ea6.0000007c:head
>  2019-05-30 16:08:51.760660 [WRN]  Error(s) ignored for 2:0979ae43:::10002954ea6.0000007c:head enough copies available 
> just now in our logs (Mimic 13.2.5). However, everything stayed HEALTH_OK and seems fine. Pool 2 is an EC pool containing CephFS. 
> 
> Up to now I've never had to delve into the depths of RADOS, so I have some questions. If there are docs and I missed them, just redirect me :-). 
> 
> - How do I find the OSDs / PG for that object (is the PG contained in the name?)
>   I'd love to check SMART in more detail and deep-scrub that PG to see if this was just a hiccup, or a permanent error. 

I've made some progress - and am putting it on the list in the hope it can also help others:
# ceph osd map cephfs_data 10002954ea6.0000007c
osdmap e40907 pool 'cephfs_data' (2) object '10002954ea6.0000007c' -> pg 2.c2759e90 (2.e90) -> up ([196,101,14,156,47,177], p196) acting ([196,101,14,156,47,177], p196)
# ceph pg deep-scrub 2.e90
instructing pg 2.e90s0 on osd.196 to deep-scrub
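
(To answer my own first question: the PG is not literally contained in the object name, but derived from it - 2.c2759e90 is the pool id plus a hash of the name, which gets masked down to pg 2.e90 by the pool's pg_num; the 0979ae43 in the log lines is the same hash, bit-reversed.)

The deep-scrub runs asynchronously, so - if I understand correctly - one can watch for its outcome with e.g.:
# ceph -w
# ceph health detail
The former shows the "deep-scrub starts" / "deep-scrub ... errors" lines from the cluster log, the latter should list OSD_SCRUB_ERRORS / PG_DAMAGED once the scrub has recorded the inconsistency.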

Checking the OSD logs (osd 196), I find:
-----------------------------------------
2019-05-30 16:08:51.759 7f46b36ac700  0 log_channel(cluster) log [WRN] : Error(s) ignored for 2:0979ae43:::10002954ea6.0000007c:head enough copies available
2019-05-30 17:13:39.817 7f46b36ac700  0 log_channel(cluster) log [DBG] : 2.e90 deep-scrub starts
2019-05-30 17:19:51.013 7f46b36ac700 -1 log_channel(cluster) log [ERR] : 2.e90 shard 14(2) soid 2:0979ae43:::10002954ea6.0000007c:head : candidate had a read error
2019-05-30 17:23:52.360 7f46b36ac700 -1 log_channel(cluster) log [ERR] : 2.e90s0 deep-scrub 0 missing, 1 inconsistent objects
2019-05-30 17:23:52.360 7f46b36ac700 -1 log_channel(cluster) log [ERR] : 2.e90 deep-scrub 1 errors
-----------------------------------------
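
If I read the docs right, the inconsistent shard(s) can now also be listed in machine-readable form with:
# rados list-inconsistent-obj 2.e90 --format=json-pretty
which should report a "read_error" on the affected shard (here osd.14).
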
And now the cluster is in HEALTH_ERR, as expected. So this would probably have happened automatically once the next scheduled deep-scrub hit that PG - but wouldn't it be better to alert the operator immediately,
e.g. by scheduling an immediate deep-scrub after a read error?

I presume "shard 14(2)" means: "Shard on OSD 14, third (index 2) in the acting set". Correct? 

Checking that OSD's logs (osd.14), I do indeed find:
-----------------------------------------
2019-05-30 16:08:51.566 7f2e7dc15700 -1 bdev(0x55ae2eade000 /var/lib/ceph/osd/ceph-14/block) _aio_thread got r=-5 ((5) Input/output error)
2019-05-30 16:08:51.566 7f2e7dc15700 -1 bdev(0x55ae2eade000 /var/lib/ceph/osd/ceph-14/block) _aio_thread translating the error to EIO for upper layer
2019-05-30 16:08:51.655 7f2e683ea700 -1 log_channel(cluster) log [ERR] : Error -5 reading object 2:0979ae43:::10002954ea6.0000007c:head
-----------------------------------------
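
For anyone wanting to reproduce this: one way to locate the backing device for a SMART check should be (assuming smartmontools on the OSD host; /dev/sdX is a placeholder for the actual device):
# ceph osd metadata 14       # shows "hostname" and the backing block device(s) of osd.14
# smartctl -a /dev/sdX       # then, on that host
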
The underlying disk has one problematic sector in SMART. Issuing:
# ceph pg repair 2.e90
triggered a rewrite of the bad sector, which allowed the disk to reallocate it, and Ceph is HEALTH_OK again.
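
To double-check afterwards, one can presumably re-run the deep-scrub and look at the reallocated sector count in SMART:
# ceph pg deep-scrub 2.e90
# smartctl -A /dev/sdX       # Reallocated_Sector_Ct should have gone up by one (sdX again a placeholder)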

So my issue is solved, but two questions remain:
- Is it intentional that the error is "ignored" until the next deep-scrub happens? 

- Is there also a way to map the object name to a CephFS file, and vice versa? (A rough sketch of my understanding follows below.)
  In one direction (file / inode to object), it seems this approach should work:
  http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005384.html
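
For the other direction (object name to file), my understanding is that the prefix of the object name is simply the file's inode number in hex (data objects are named "<hex inode>.<stripe block>"), so something along these lines should work - the find can be slow on a large tree, and the mount point is a placeholder:
# printf '%d\n' 0x10002954ea6                # hex inode from the object name -> decimal inode
# find /mnt/cephfs -inum <decimal inode>     # locate the file by inode on a CephFS mount
And file to object-name prefix: printf '%x\n' "$(stat -c %i /mnt/cephfs/path/to/file)"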

Cheers and thanks,
	Oliver
