Re: [External Email] Re: ceph-objectstore-tool core dump


 



On 10/4/21 11:57 AM, Dave Hall wrote:
> I also had a delay on the start of the repair scrub when I was dealing with
> this issue.  I ultimately increased the number of simultaneous scrubs, but
> I think you could also temporarily disable scrubs and then re-issue the 'pg
> repair'.  (But I'm not one of the experts on this.)
> 
> My perception is that between EC pools, large HDDs, and the overall OSD
> count, there might need to be some tuning to ensure that scrubs can get
> scheduled: a large HDD holds pieces of more PGs, and each PG in an EC pool
> is spread across more disks than in a replicated pool.  Thus, especially
> if the number of OSDs is not large, there is an increased chance that more
> than one scrub will want to read the same OSD, which becomes a scheduling
> nightmare if the number of simultaneous scrubs is low and client traffic
> is given priority.
> 
> -Dave

That seemed to be the case.  After ~24 hours, only 1 of the 8 repair
tasks had completed.  Unfortunately, it found another error that wasn't
present before.

After checking the SMART logs, it looks like this particular disk is
failing.  No sense in pursuing this any further; I'll be replacing it
with a spare instead.

I'll look into disabling scrubs the next time I need to schedule a
repair.  Hopefully the repair jobs will then start a bit sooner.
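
For anyone following along, a rough sketch of the two approaches
discussed above (the value 3 for osd_max_scrubs is only illustrative,
and 2.1f is a made-up PG id; adjust for your cluster):

```shell
# Option A: raise the per-OSD scrub concurrency (default is 1),
# so repair scrubs are less likely to queue behind regular scrubs.
ceph config set osd osd_max_scrubs 3

# Option B: temporarily pause regular scrubbing instead.
ceph osd set noscrub
ceph osd set nodeep-scrub

# Then issue the repair for the inconsistent PG.
ceph pg repair 2.1f

# Once the repair has run, re-enable scrubbing.
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```

Note that `ceph config set` needs a Mimic or later cluster; on older
releases the equivalent would be injecting the option via `ceph tell`.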

Regards,

--Mike
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


