Re: PG inconsistency

IIRC, the EIO we had also correlated with a SMART status that showed the disk was bad enough for a warranty replacement -- so yes, I replaced the disk in these cases.

Cheers, Dan
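
A minimal Python sketch for lining up inconsistent PGs with the OSDs (and so the disks) behind them, before checking SMART on those disks. It assumes the text comes from `ceph health detail` and that inconsistent PGs appear as lines shaped like "pg <pgid> is <state>, acting [<osds>]"; that line shape, the sample text, and the function name inconsistent_pgs are illustrative assumptions and may need adjusting for a given Ceph release.

```python
import re

# Assumed line shape from `ceph health detail`; verify against your Ceph release.
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]+)\]")


def inconsistent_pgs(health_detail_text):
    """Return {pgid: [osd ids]} for PGs whose state includes 'inconsistent'."""
    result = {}
    for line in health_detail_text.splitlines():
        match = PG_LINE.match(line.strip())
        # The state is a '+'-joined list such as active+clean+inconsistent.
        if match and "inconsistent" in match.group(2).split("+"):
            result[match.group(1)] = [int(osd) for osd in match.group(3).split(",")]
    return result


# Hypothetical sample output, for illustration only.
sample = """HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 3.1f is active+clean+inconsistent, acting [12,4,31]
2 scrub errors"""

print(inconsistent_pgs(sample))
```

The OSD ids returned for each PG can then be mapped to their hosts and devices by hand to decide which disks to inspect.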

On Thu Nov 06 2014 at 2:44:08 PM GuangYang <yguang11@xxxxxxxxxxx> wrote:
Thanks Dan. By "killed/formatted/replaced the OSD", did you mean you replaced the disk? I'm not a filesystem expert, but I would like to understand what happened underneath the EIO and whether it reveals something (e.g. a hardware issue).

In our case, we are using 6TB drives, so there is a lot of data to migrate, and since backfilling/recovery increases latency, we hope to avoid that as much as we can.

Thanks,
Guang

________________________________
> From: daniel.vanderster@xxxxxxx
> Date: Thu, 6 Nov 2014 13:36:46 +0000
> Subject: Re: PG inconsistency
> To: yguang11@xxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>
> Hi,
> I've only ever seen (1), EIO to read a file. In this case I've always
> just killed / formatted / replaced that OSD completely -- that moves
> the PG to a new master and the new replication "fixes" the
> inconsistency. This way, I've never had to pg repair. I don't know if
> this is a best or even good practise, but it works for us.
> Cheers, Dan
>
> On Thu Nov 06 2014 at 2:24:32 PM GuangYang
> <yguang11@xxxxxxxxxxx<mailto:yguang11@xxxxxxxxxxx>> wrote:
> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster.
> There were two major patterns leading to inconsistency, as I observed:
> 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even
> though there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would
> also like to check with the community on the best way to handle such
> issues (e.g. should we run fsck / xfs_repair when such an issue
> happens).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is
> a hardware issue which needs to be repaired physically, or should I
> run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
>
> Thanks,
> Guang
                                         