Re: poor read performance on rbd+LVM, LVM overload

Mike Snitzer <snitzer@xxxxxxxxxx> · Mon, 21 Oct 2013 11:01:29 -0400

On Mon, Oct 21 2013 at 10:11am -0400,
Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote:
> > It looks like without LVM we're getting 128KB requests (which IIRC is 
> > typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
> > fuzzy here, but I seem to recall a property on the request_queue or device 
> > that affected this.  RBD is currently doing
> 
> Unfortunately most device mapper modules still split all I/O into 4k
> chunks before handling them.  They rely on the elevator to merge them
> back together down the line, which isn't overly efficient but should at
> least provide larger segments for the common cases.

It isn't DM that splits the IO into 4K chunks; it is the VM subsystem,
no?  Unless care is taken to assemble larger bios (higher up the IO
stack, e.g. in XFS), all buffered IO will arrive at bio-based DM targets
in $PAGE_SIZE granularity.

I would expect direct IO to perform better here because it will make use
of bio_add_page to build up larger IOs.

Taking a step back, the rbd driver is exposing both the minimum_io_size
and optimal_io_size as 4M.  This symmetry will cause XFS to _not_ detect
the exposed limits as striping.  Therefore, AFAIK, XFS won't take steps
to respect the limits when it assembles its bios (via bio_add_page).
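
A quick way to check what a device exposes is to read the two limits
from sysfs.  The sketch below assumes the usual
/sys/block/<dev>/queue/ layout; the device name rbd0 in the comment is
hypothetical.  The looks_striped() rule is only an illustration of the
symmetry argument above (striping is assumed when optimal_io_size is a
larger multiple of minimum_io_size); it is not the actual XFS or
mkfs.xfs logic.

```python
from pathlib import Path


def read_io_limits(queue_dir):
    """Return (minimum_io_size, optimal_io_size) in bytes from a sysfs queue directory."""
    q = Path(queue_dir)
    min_io = int((q / "minimum_io_size").read_text().strip())
    opt_io = int((q / "optimal_io_size").read_text().strip())
    return min_io, opt_io


def looks_striped(min_io, opt_io):
    """Illustrative rule (an assumption): equal limits are not treated as striping."""
    return min_io > 0 and opt_io > min_io and opt_io % min_io == 0


if __name__ == "__main__":
    # Hypothetical device name; substitute the device under test.
    queue = Path("/sys/block/rbd0/queue")
    if queue.is_dir():
        min_io, opt_io = read_io_limits(queue)
        print(min_io, opt_io, looks_striped(min_io, opt_io))
```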

Sage, any reason why you don't use traditional raid geometry based IO
limits?  E.g.:

minimum_io_size = raid chunk size
optimal_io_size = raid chunk size * N stripes (aka full stripe)
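
As a worked instance of the two formulas above, a minimal sketch
follows.  The function name and the sample values (64KB chunk, 4
stripes) are made up for illustration; how an rbd image would map onto
a chunk size and stripe count is not settled in this thread.

```python
def raid_io_limits(chunk_size, n_stripes):
    """Return (minimum_io_size, optimal_io_size) in bytes for a raid-style geometry."""
    if chunk_size <= 0 or n_stripes <= 0:
        raise ValueError("chunk_size and n_stripes must be positive")
    minimum_io_size = chunk_size               # one raid chunk
    optimal_io_size = chunk_size * n_stripes   # one full stripe
    return minimum_io_size, optimal_io_size


# Illustrative values only: 64KB chunk across 4 stripes.
min_io, opt_io = raid_io_limits(64 * 1024, 4)
print(min_io, opt_io)
```

With more than one stripe the two limits differ, which is the
asymmetry the striping detection described above relies on.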
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com