On Tue, 6 Dec 2011 20:50:48 -0800 Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx> wrote:

> I'm not sure whether that is what I mean. To illustrate my problem,
> let me put iostat -x -d 1 output below:
>
> Device:  rrqm/s   wrqm/s     r/s      w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sdb        0.00     0.00  163.00     1.00   1304.00       8.00      8.00      0.26    1.59   1.59  26.00
> sdc        0.00     0.00   93.00     1.00    744.00       8.00      8.00      0.24    2.55   2.45  23.00
> sde        0.00     0.00   56.00     1.00    448.00       8.00      8.00      0.22    3.86   3.86  22.00
> sdd        0.00     0.00   88.00     1.00    704.00       8.00      8.00      0.18    2.02   2.02  18.00
> md_d0      0.00     0.00  401.00     0.00   3208.00       0.00      8.00      0.00    0.00   0.00   0.00
>
> ==> This is normal operation: because of the page cache, only reads
> are being submitted to the MD device.
>
> Device:  rrqm/s   wrqm/s     r/s      w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        0.00     0.00    0.00     0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
> sdb        0.00  1714.00    4.00   277.00     32.00   14810.00     52.82     34.04  105.05   2.92  82.00
> sdc        0.00  1685.00   12.00   270.00     96.00   14122.00     50.42     42.56  131.03   3.09  87.00
> sde        0.00  1385.00    8.00   261.00     64.00   12426.00     46.43     29.76   99.44   3.35  90.00
> sdd        0.00  1350.00    8.00   228.00     64.00   10682.00     45.53     40.93  133.56   3.69  87.00
> md_d0      0.00     0.00   32.00 16446.00    256.00  131568.00      8.00      0.00    0.00   0.00   0.00
>
> ==> A huge page flush kicks in; note that read requests on the MD
> device are being starved.
>
> Device:  rrqm/s   wrqm/s     r/s      w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        0.00     0.00    0.00     0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
> sdb        0.00  1542.00    4.00   264.00     32.00   11760.00     44.00     66.58  230.22   3.73 100.00
> sdc        0.00  1185.00    0.00   272.00      0.00    9672.00     35.56     63.40  215.88   3.68 100.00
> sde        0.00  1352.00    0.00   298.00      0.00   12488.00     41.91     35.56  126.34   3.36 100.00
> sdd        0.00   996.00    0.00   294.00      0.00   10120.00     34.42     76.79  270.37   3.40 100.00
> md_d0      0.00     0.00    4.00     0.00     32.00       0.00      8.00      0.00    0.00   0.00   0.00
>
> ==> The huge page flush is still running; almost no reads are being
> served.
>
> This is the problem: when the page flush kicks in, MD appears to
> refuse incoming reads. All underlying devices use the deadline
> scheduler, tuned to favor reads; still, it doesn't help, since MD
> simply doesn't submit new reads to the underlying devices.

The counters are updated when a request completes, not when it is
submitted, so you cannot tell from this data whether md is submitting
the read requests or not.

What kernel are you working with? If it doesn't contain the commit
identified below, can you try with that and see whether it makes a
difference?

Thanks,
NeilBrown

> 2011/12/6 NeilBrown <neilb@xxxxxxx>:
> > On Tue, 6 Dec 2011 20:04:33 -0800 Yucong Sun (叶雨飞) <sunyucong@xxxxxxxxx>
> > wrote:
> >
> >> The problem with using page flush as a write cache here is that
> >> writes to MD don't go through an IO scheduler, which is a very big
> >> problem: when the flush thread decides to write to MD, it's
> >> impossible to control the write speed or to prioritize reads over
> >> writes. Every request is basically FIFO, and when the flush size is
> >> big, no reads can be served.
> >>
> >
> > I'm not sure I understand....
> >
> > Requests don't go through an IO scheduler before they hit md, but
> > they do after md sends them on down, so they can be re-ordered there.
> >
> > There was a bug where raid10 would allow an arbitrary number of
> > writes to queue up, so that the flushing code didn't know when to
> > stop.
> >
> > This was fixed by
> > commit 34db0cd60f8a1f4ab73d118a8be3797c20388223
> > nearly 2 months ago :-)
> >
> > NeilBrown
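The point about completion-time accounting can be checked directly: the
kernel exposes per-device in-flight counters in sysfs, which move at
submission time rather than completion time. A minimal sketch (the
device names are taken from the iostat output above and are assumptions
about the reporter's setup, not part of the thread):

```shell
#!/bin/sh
# Sketch: show requests submitted but not yet completed, per device.
# /sys/block/<dev>/inflight holds two counters: reads in flight and
# writes in flight. Unlike the iostat fields quoted above, these are
# incremented at submission time, so they can reveal whether md is
# issuing reads at all during a flush.
for dev in sdb sdc sdd sde md_d0; do
    f="/sys/block/$dev/inflight"
    [ -r "$f" ] || continue        # skip devices absent on this host
    read -r reads writes < "$f"
    printf '%-8s reads_inflight=%s writes_inflight=%s\n' \
        "$dev" "$reads" "$writes"
done
```

Run in a loop (e.g. under watch) during a heavy flush, a persistently
zero reads_inflight on the member disks would support the theory that
md is holding reads back.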
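For reference, "tuned to favor read" with the deadline scheduler is
normally done through the per-queue sysfs tunables. A sketch, with
illustrative values that are not taken from the thread; writes are
silenced and ignored so it is safe to run where the devices, the
scheduler, or root privileges are absent:

```shell
#!/bin/sh
# Sketch: bias the deadline elevator toward reads on the member disks.
# Tunable names are the standard deadline iosched knobs; the values
# below are illustrative assumptions, not settings from the thread.
for dev in sdb sdc sdd sde; do
    q="/sys/block/$dev/queue"
    [ -d "$q" ] || continue
    echo deadline 2>/dev/null > "$q/scheduler"          || :
    echo 100  2>/dev/null > "$q/iosched/read_expire"    || :  # reads expire fast (ms)
    echo 5000 2>/dev/null > "$q/iosched/write_expire"   || :  # writes may wait (ms)
    echo 4    2>/dev/null > "$q/iosched/writes_starved" || :  # read batches per write batch
done
```

As the reply above notes, these settings only act below md: nothing
reorders or prioritizes requests before they reach md itself.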