bluestore behavior on disk sector read errors


 



Hi,

 

Every now and then, sectors die on disks.

When this happens on my bluestore (kraken) OSDs, I get 1 PG that becomes degraded.

The exact status is:

 

HEALTH_ERR 1 pgs inconsistent; 1 scrub errors

pg 12.127 is active+clean+inconsistent, acting [141,67,85]

 

If I run # rados list-inconsistent-obj 12.127 --format=json-pretty

I get:

(…)

                    "osd": 112,

                    "errors": [

                        "read_error"

                    ],

                    "size": 4194304

 

When this happens, I’m forced to manually run “ceph pg repair” on the inconsistent PG after making sure it was indeed a read error. I feel this should not be a manual process.
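In case it helps, the manual check-then-repair loop can probably be scripted. Below is a rough, untested sketch (the shard_errors helper is mine, and the parsing of the health/JSON output is an assumption based on the snippets above); it repairs a PG only when every reported shard error is a read_error:

```shell
#!/bin/sh
# Rough sketch -- not tested against a live cluster; review before use.
# Mirrors the manual "check, then ceph pg repair" workflow: repair only
# the PGs whose scrub errors are all read errors.

# Print the distinct shard error strings from list-inconsistent-obj JSON
# read on stdin. Uses python3 to avoid a jq dependency.
shard_errors() {
    python3 -c '
import json, sys
errs = set()
for obj in json.load(sys.stdin).get("inconsistents", []):
    for shard in obj.get("shards", []):
        errs.update(shard.get("errors", []))
print("\n".join(sorted(errs)))
'
}

repair_read_error_pgs() {
    # "pg 12.127 is active+clean+inconsistent, acting [...]" -> "12.127"
    ceph health detail | awk '$1 == "pg" && /inconsistent/ {print $2}' |
    while read -r pg; do
        errs=$(rados list-inconsistent-obj "$pg" --format=json | shard_errors)
        if [ "$errs" = "read_error" ]; then
            echo "repairing $pg (read errors only)"
            ceph pg repair "$pg"
        else
            echo "skipping $pg: errors are [$errs]"
        fi
    done
}

# Only touch the cluster when the ceph CLI is actually present.
if command -v ceph >/dev/null 2>&1; then
    repair_read_error_pgs
fi
```

Obviously you would want to cap how many repairs this kicks off at once before letting it loose on a big cluster.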

 

If I go onto the machine and look at the syslog, I indeed see that a sector read error happened once or twice.

But if I then try to read the sector manually, it succeeds, presumably because the disk reallocated it in the meantime.

Last time this happened, I ran badblocks on the disk and it found no issue…
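As a side note on checking the reallocation theory: the SMART counters are usually more telling than re-reading the sector or running badblocks. A minimal sketch, assuming smartmontools is installed and /dev/sdX is a placeholder for the OSD’s actual data disk:

```shell
#!/bin/sh
# Filter the SMART attributes that track remapped / pending sectors.
smart_sector_attrs() {
    grep -Ei 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# /dev/sdX is a placeholder -- substitute the OSD's data disk.
if command -v smartctl >/dev/null 2>&1; then
    smartctl -A /dev/sdX | smart_sector_attrs || true
fi
```

A nonzero raw value on Reallocated_Sector_Ct would be consistent with the sector having been remapped after the failed read, which would also explain why badblocks finds nothing afterwards.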

 

My questions therefore are:

 

why doesn’t bluestore retry reading the sector (in case of transient errors)? (maybe it does)

why isn’t the PG automatically repaired when a read error is detected?

what will happen when the disks get old and accumulate up to 2048 bad sectors before the controllers/SMART declare them “failure predicted”?

I can’t imagine manually fixing up to N×2048 PGs in an infrastructure of N disks, where N could reach the sky…

 

Ideas ?

 

Thanks && regards

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
