Re: Odd single VM ceph error

Sage Weil <sage@xxxxxxxxxxxx> · Fri, 15 Jan 2016 12:28:32 -0500 (EST)

On Thu, 14 Jan 2016, Robert LeBlanc wrote:
> We have a single VM that is acting odd. We had 7 SSD OSDs (out of 40) go
> down over a period of about 12 hours. These are a cache tier and have size
> 4, min_size 2. I'm not able to make heads or tails of the error and hoped
> someone here could help.
> 
> 2016-01-14 23:09:54.559121 osd.136 [ERR] 13.503 copy from
> f8bedd03/rbd_data.48a6325f5e3f87.000000000000683d/head//13 to
> f8bedd03/rbd_data.48a6325f5e3f87.000000000000683d/head//13 data digest
> 0x92bc163c != source 0x8fe2d0a9
> 
> The PG fully recovered, and then the same error appeared on a different OSD:
> 
> 2016-01-15 00:39:25.321469 osd.12 [ERR] 13.503 copy from
> f8bedd03/rbd_data.48a6325f5e3f87.000000000000683d/head//13 to
> f8bedd03/rbd_data.48a6325f5e3f87.000000000000683d/head//13 data digest
> 0x92bc163c != source 0x8fe2d0a9
> 
> A deep scrub of the PG comes back clean, and hashes of the files on all OSDs
> match. The file system on this VM keeps going read-only.
> 
> The OSD file system is ext4 and this is Ceph 0.94.5.

You're using cache tiering, I take it? I think the error is in the base
tier, while the PG mentioned in the log is in the cache tier.
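
To compare the two reports side by side, it may help to pull the fields out
of the log lines. The following is only a rough sketch: the regular
expression and the field names (copied_digest, source_digest, pool, object)
are assumptions inferred from the two lines quoted above, not a documented
Ceph log format, and parse_copy_error is a hypothetical helper.

```python
import re

# First error line from the report above, re-joined onto one line.
LOG_LINE = (
    "2016-01-14 23:09:54.559121 osd.136 [ERR] 13.503 copy from "
    "f8bedd03/rbd_data.48a6325f5e3f87.000000000000683d/head//13 to "
    "f8bedd03/rbd_data.48a6325f5e3f87.000000000000683d/head//13 data digest "
    "0x92bc163c != source 0x8fe2d0a9"
)

# Assumed layout, inferred only from the quoted lines:
#   <osd> [ERR] <pgid> copy from <src> to <dst> data digest <a> != source <b>
PATTERN = re.compile(
    r"(?P<osd>osd\.\d+) \[ERR\] (?P<pg>\d+\.[0-9a-f]+) copy from "
    r"(?P<src>\S+) to (?P<dst>\S+) data digest "
    r"(?P<copied_digest>0x[0-9a-f]+) != source (?P<source_digest>0x[0-9a-f]+)"
)


def parse_copy_error(line):
    """Return the fields of a 'copy from' digest error, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The PG id is printed as <pool>.<pg>, so the part before the dot is the pool.
    fields["pool"] = int(fields["pg"].split(".")[0])
    # Assumption: the object name is the second '/'-separated component.
    fields["object"] = fields["src"].split("/")[1]
    return fields


info = parse_copy_error(LOG_LINE)
print(info["osd"], info["pg"], info["object"])
print("copied:", info["copied_digest"], "source:", info["source_digest"])
```

Running this over both lines shows the same object and the same pair of
digests reported by osd.136 and later by osd.12, so the mismatch follows the
object rather than a particular cache-tier OSD.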

sage
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com