Re: raid6check extremely slow ?

Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx> · Tue, 12 May 2020 18:11:51 +0200

On Tue, May 12, 2020 at 04:27:59PM +1000, Adam Goryachev wrote:
> 
> On 12/5/20 11:52, Giuseppe Bilotta wrote:
> > On Mon, May 11, 2020 at 11:16 PM Guoqing Jiang
> > <guoqing.jiang@xxxxxxxxxxxxxxx> wrote:
> > > On 5/11/20 11:12 PM, Guoqing Jiang wrote:
> > > > On 5/11/20 10:53 PM, Giuseppe Bilotta wrote:
> > > > > Would it be possible/effective to lock multiple stripes at once? Lock,
> > > > > say, 8 or 16 stripes, process them, unlock. I'm not familiar with the
> > > > > internals, but if locking is O(1) on the number of stripes (at least
> > > > > if they are consecutive), this would help reduce (potentially by a
> > > > > factor of 8 or 16) the costs of the locks/unlocks at the expense of
> > > > > longer locks and their influence on external I/O.
> > > > > 
> > > > Hmm, maybe something like.
> > > > 
> > > > check_stripes
> > > > 
> > > >      -> mddev_suspend
> > > > 
> > > >      while (whole_stripe_num--) {
> > > >          check each stripe
> > > >      }
> > > > 
> > > >      -> mddev_resume
> > > > 
> > > > 
> > > > Then just need to call suspend/resume once.
> > > But basically, the array can't process any new requests when checking is
> > Yeah, locking the entire device might be excessive (especially if it's
> > a big one). Using a granularity larger than 1 but smaller than the
> > whole device could be a compromise. Since the “no lock” approach seems
> > to be about an order of magnitude faster (at least in Piergiorgio's
> > benchmark), my guess was that something between 8 and 16 could bring
> > the speed up to be close to the “no lock” case without having dramatic
> > effects on I/O. Reading all 8/16 stripes before processing (assuming
> > sufficient memory) might even lead to better disk utilization during
> > the check.
> 
> I know very little about this, but could you perhaps lock 2 x 16 stripes,
> and then after you complete the first 16, release the first 16, lock the 3rd
> 16 stripes, and while waiting for the lock continue to process the 2nd set
> of 16?

For some reason I don not know, the unlock
is global.
If I recall correctly, this was the way
Neil mentioned is "more" correct.

> Would that allow you to do more processing and less waiting for
> lock/release?

I think the general concept of pipelineing
is good, this would really improve the
performances of the whole thing.
If we could just multithread, I suspect
it could improve.

We need to solve the unlock problem...

bye,

> 
> Regards,
> Adam

-- 

piergiorgio