Re: PG reported as inconsistent in status, but no inconsistencies visible to rados

Forgot to send to the list with the first reply.

I'm honestly not sure exactly when it happened.  I hadn't looked at ceph status for several days before discovering the issue and posting to the mailing list.  I've seen one or two inconsistent PG issues crop up at random in the month or so since these nodes were spun up, but nothing I couldn't resolve.

There was an issue with one of the Proxmox VE nodes that store VM data in the ceph cluster: a network driver problem caused the NIC to be disabled.  That was a week or two ago, and has since been resolved.  While the problematic PG is in the pool used by Proxmox, I wouldn't expect that problem to be able to cause store-level corruption on the OSDs.

Other than that, nothing of interest has happened that I'm aware of, though I don't yet have good monitoring on these nodes.

I'll put something in the tracker later today.

Thank you for your help.

-----Original Message-----
From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx] 
Sent: Wednesday, August 23, 2017 4:44 AM
To: Edward R Huyer <erhvks@xxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  PG reported as inconsistent in status, but no inconsistencies visible to rados

On Wed, Aug 23, 2017 at 12:47 AM, Edward R Huyer <erhvks@xxxxxxx> wrote:
> Neat, hadn't seen that command before.  Here's the fsck log from the 
> primary OSD:  https://pastebin.com/nZ0H5ag3
>
> Looks like the OSD's bluestore "filesystem" itself has some underlying errors, though I'm not sure what to do about them.

Hmmm... Can you tell us any more about how/when this happened?

Any corresponding event at all? Any interesting log entries around the same time?

Could you also open a tracker for this (or let me know and I can open one for you)? That way we can continue the investigation there.

>
> -----Original Message-----
> From: Brad Hubbard [mailto:bhubbard@xxxxxxxxxx]
> Sent: Monday, August 21, 2017 7:05 PM
> To: Edward R Huyer <erhvks@xxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  PG reported as inconsistent in status, but no inconsistencies visible to rados
>
> Could you provide the output of 'ceph-bluestore-tool fsck' for one of these OSDs?
>
> On Tue, Aug 22, 2017 at 2:53 AM, Edward R Huyer <erhvks@xxxxxxx> wrote:
>> This is an odd one.  My cluster is reporting an inconsistent pg in 
>> ceph status and ceph health detail.  However, rados 
>> list-inconsistent-obj and rados list-inconsistent-snapset both report 
>> no inconsistencies.  Scrubbing the pg results in these errors in the osd logs:
>>
>>
>>
>> OSD 63 (primary):
>>
>> 2017-08-21 12:41:03.580068 7f0b36629700 -1
>> bluestore(/var/lib/ceph/osd/ceph-63) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x6b6b9184, expected 0x6706be76, 
>> device location [0x23f39d0000~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#
>>
>> 2017-08-21 12:41:03.961945 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa soid 9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e:
>> failed to pick suitable object info
>>
>> 2017-08-21 12:41:15.357484 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa deep-scrub 3 errors
>>
>>
>>
>> OSD 50:
>>
>> 2017-08-21 12:41:03.592918 7f264be6d700 -1
>> bluestore(/var/lib/ceph/osd/ceph-50) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x64a1e2b1, expected 0x6706be76, 
>> device location [0x3418830000~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#
>>
>>
>>
>> OSD 46:
>>
>> 2017-08-21 12:41:03.531394 7fb396b1f700 -1
>> bluestore(/var/lib/ceph/osd/ceph-46) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x7aa05c01, expected 0x6706be76, 
>> device location [0x1d6e1e0000~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#
>>
>>
>>
>> This is on Ceph 12.1.4 (previously 12.1.1).
>>
>>
>>
>> Thoughts?
>>
>>
>>
>> -----
>>
>> Edward Huyer
>>
>> School of Interactive Games and Media
>>
>> Rochester Institute of Technology
>>
>> Golisano 70-2373
>>
>> 152 Lomb Memorial Drive
>>
>> Rochester, NY 14623
>>
>> 585-475-6651
>>
>> erhvks@xxxxxxx
>>
>>
>>
>> Obligatory Legalese:
>>
>> The information transmitted, including attachments, is intended only 
>> for the
>> person(s) or entity to which it is addressed and may contain 
>> confidential and/or privileged material. Any review, retransmission, 
>> dissemination or other use of, or taking of any action in reliance 
>> upon this information by persons or entities other than the intended 
>> recipient is prohibited. If you received this in error, please 
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Cheers,
> Brad



--
Cheers,
Brad
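
For anyone comparing the _verify_csum lines quoted in the original report, a small script can make the pattern easier to see. The following is only a sketch: the regular expression is written against the three log lines in this thread (rejoined onto single lines), and the names CSUM_RE, parse_csum_errors and SAMPLE_LOG are made up for illustration, not part of any Ceph tool. Other Ceph releases may format these messages differently.

```python
import re
from collections import defaultdict

# Pattern written against the _verify_csum lines quoted in this thread only.
# Assumption: each log entry has been rejoined onto a single line.
CSUM_RE = re.compile(
    r"bluestore\((?P<path>[^)]+)\) _verify_csum bad (?P<algo>\w+)/0x[0-9a-f]+ "
    r"checksum at blob offset 0x[0-9a-f]+, "
    r"got (?P<got>0x[0-9a-f]+), expected (?P<expected>0x[0-9a-f]+), "
    r"device location \[(?P<device>[^\]]+)\], "
    r"logical extent [^,]+, object (?P<obj>\S+)"
)


def parse_csum_errors(lines):
    """Return one dict per line that matches the checksum-error pattern."""
    records = []
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            records.append(match.groupdict())
    return records


# The three entries from the report, one per OSD (63, 50, 46).
OBJ = "#9:55bf7cc6:::rbd_data.33992ae8944a.000000000000200f:e#"
SAMPLE_LOG = [
    "2017-08-21 12:41:03.580068 7f0b36629700 -1 "
    "bluestore(/var/lib/ceph/osd/ceph-63) _verify_csum bad crc32c/0x1000 "
    "checksum at blob offset 0x0, got 0x6b6b9184, expected 0x6706be76, "
    "device location [0x23f39d0000~1000], logical extent 0x0~1000, "
    "object " + OBJ,
    "2017-08-21 12:41:03.592918 7f264be6d700 -1 "
    "bluestore(/var/lib/ceph/osd/ceph-50) _verify_csum bad crc32c/0x1000 "
    "checksum at blob offset 0x0, got 0x64a1e2b1, expected 0x6706be76, "
    "device location [0x3418830000~1000], logical extent 0x0~1000, "
    "object " + OBJ,
    "2017-08-21 12:41:03.531394 7fb396b1f700 -1 "
    "bluestore(/var/lib/ceph/osd/ceph-46) _verify_csum bad crc32c/0x1000 "
    "checksum at blob offset 0x0, got 0x7aa05c01, expected 0x6706be76, "
    "device location [0x1d6e1e0000~1000], logical extent 0x0~1000, "
    "object " + OBJ,
]

records = parse_csum_errors(SAMPLE_LOG)

# Group by object so the replicas of one object can be compared side by side.
by_object = defaultdict(list)
for rec in records:
    by_object[rec["obj"]].append(rec)

for obj, recs in by_object.items():
    expected = sorted({r["expected"] for r in recs})
    print(obj)
    print("  expected checksum(s):", ", ".join(expected))
    for r in recs:
        print("  {}: got {} at {}".format(r["path"], r["got"], r["device"]))
```

Run against the quoted lines, this groups all three entries under the same object and shows that the expected checksum is identical on every OSD while the computed ("got") value differs on each one. It only summarizes what the logs already say and does not identify a cause.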


