On Tue, 2018-08-28 at 07:38 -0700, Bart Van Assche wrote: > On 08/27/18 22:34, Damien Le Moal wrote: > > With zonemode=zbd, for random read operations with read_beyond_wp=0, > > the zbd code will always adjust an I/O offset so that the I/O falls in > > a non empty zone. The adjustment however always sets the I/O offset to > > the start of the zone, resulting in a high device read cache hit rate > > if the device has very few zones written. > > > > Fix this by improving the randomness of the I/Os by adjusting the I/O > > offset to a random value with the range of written data in the zone. > > > > Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx> > > --- > > zbd.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/zbd.c b/zbd.c > > index 56197693..a985e022 100644 > > --- a/zbd.c > > +++ b/zbd.c > > @@ -1179,7 +1179,15 @@ enum io_u_action zbd_adjust_block(struct > > thread_data *td, struct io_u *io_u) > > io_u->buflen); > > goto eof; > > } > > - io_u->offset = zb->start << 9; > > + range = ((zb->wp - zb->start) << 9) - io_u->buflen; > > + if (td_random(td) && range >= 0) > > + io_u->offset = (zb->start << 9) + > > + ((io_u->offset - (zb->start << 9)) % > > + (range + 1)) / min_bs * min_bs; > > This looks wrong to me: it is not guaranteed that io_u->offset >= > (zb->start << 9). Please use the original zb in that expression. Yes indeed. Will do. > Additionally, since that expression is only evaluated if range >=0, > there is no need to add "+ 1" to "range". Thinking more about this, since zbd_find_zone() only guarantees that the zone returned has min_bs data in it, range can in fact still be <= 0 if io_u->buflen is larger than min_bs. Is it assumed here that min_bs and io_u->buflen are always equal ? If yes, then range < 0 is not possible, but range == 0 is definitely still possible and so the +1 is needed. Looking at get_next_buflen(), io_u->buflen can be larger than min_bs. So either we need to modify zbd_find_zone() to return a zone with at least io_u->buflen bytes in it, or we need to handle the range < 0 case... Am I getting this wrong ? > Thanks, > > Bart. > -- Damien Le Moal Western Digital