Re: OSD does not die when disk has failures

Hi Robert,

One theoretically possible benefit of not crashing (not implemented in
Ceph) is that an OSD could request the unreadable piece of data from
other OSDs and rewrite it on the disk in place. When a defective sector
is rewritten, most HDDs and SSDs keep the original sector marked as bad
and remap a spare to serve in its place. The end result is that the
block device no longer exposes bad sectors to applications. Doing this,
rather than throwing away an SSD that has a single defective block,
could reduce electronic waste. Note that this is not a good approach
for HDDs, where defects tend to multiply.
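
To make the idea concrete, here is a minimal sketch of such a repair
path. It is a toy simulation, not Ceph code: SimulatedDisk,
read_with_repair and the spare-sector accounting are all hypothetical
names and simplifications of my own, and a real drive does the
remapping in firmware.

```python
import errno


class SimulatedDisk:
    """Toy block device: writing to a bad sector remaps it to a spare."""

    def __init__(self, sectors, spares=4):
        self.data = dict(sectors)  # logical sector -> bytes
        self.bad = set()           # logical sectors that fail on read
        self.spares = spares       # spare sectors left for remapping
        self.remapped = 0          # like a SMART "reallocated" counter

    def read(self, sector):
        if sector in self.bad:
            raise OSError(errno.EIO, "unrecoverable read error")
        return self.data[sector]

    def write(self, sector, payload):
        if sector in self.bad:
            if self.spares == 0:
                raise OSError(errno.EIO, "no spare sectors left")
            # The drive keeps the physical sector marked bad internally
            # and serves the logical address from a spare instead.
            self.spares -= 1
            self.remapped += 1
            self.bad.discard(sector)
        self.data[sector] = payload


def read_with_repair(local, replicas, sector):
    """Read a sector; on EIO, fetch a good copy and rewrite it in place."""
    try:
        return local.read(sector)
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise
    for replica in replicas:
        try:
            good = replica.read(sector)
        except OSError:
            continue  # this copy is damaged too, try the next one
        local.write(sector, good)  # triggers the remap on the local disk
        if local.read(sector) != good:
            raise OSError(errno.EIO, "rewrite did not stick")
        return good
    raise OSError(errno.EIO, "no healthy copy available")
```

In this model a read that hits a bad sector still returns the data, and
the only lasting trace is the incremented remapped counter.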

Source: I still have an OCZ Intrepid 3700 SSD with 18 remapped
sectors. All of them appeared during a misguided test through a
USB-to-SATA adapter, which apparently could not provide enough power.
Eight years later, it still works and still has only these 18 remapped
sectors.

Anyway, all of the above is of only theoretical importance, as the
code to hide/cure disk defects that way does not exist.

On Thu, Mar 21, 2024 at 5:15 AM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
> Hi Robert,
>
> I presume the plan was to support handling EIO at upper layers. But
> apparently that hasn't been completed. Or there are some bugs...
>
> Will take a look.
>
>
> Thanks,
>
> Igor
>
> On 3/19/2024 3:36 PM, Robert Sander wrote:
> > Hi,
> >
> > On 3/19/24 13:00, Igor Fedotov wrote:
> >>
> >> translating EIO to upper layers rather than crashing an OSD is a
> >> valid default behavior. One can alter this by setting
> >> bluestore_fail_eio parameter to true.
> >
> > What benefit lies in this behavior when in the end client IO stalls?
> >
> > Regards
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



-- 
Alexander E. Patrakov



