On Mon, 2011-05-02 at 09:36 -0400, Adam Chasen wrote:
> Lowering rr_min_io provides marginal improvement. I see 6MB/s
> improvement at an rr_min_io of 3 vs 100. I played around with it
> before all the way down to 1. People seem to settle on 3. Still, I am
> not seeing the bandwidth I assume it should be (4 aggregated links).
>
> Some additional information. If I attempt to pull from my two
> multipath devices simultaneously (different LUNs, but same iSCSI
> connections) then I can pull additional data (50MB/s vs 27-30MB/s
> from each link).
>
> Adam
>
> This is a response to a direct email I sent to someone who had a
> similar issue on this list a while back:
> Date: Sat, 30 Apr 2011 00:13:20 +0200
> From: Bart Coninckx <bart.coninckx@xxxxxxxxxx>
> Hi Adam,
>
> I believe setting rr_min_io to 3 instead of 100 improved things
> significantly.
> What is still an unexplainable issue though is dd-ing to the multipath
> device (very slow) while reading from it is very fast. Doing the same
> piped over SSH to the original devices on the iSCSI server was OK, so it
> seems like either an iSCSI or still a multipath issue.
>
> But I definitely remember that lowering rr_min_io helped quite a bit.
> I think the paths are switched faster this way, resulting in more speed.
>
> Good luck,
>
> b.
>
> On Mon, May 2, 2011 at 3:25 AM, Pasi Kärkkäinen <pasik@xxxxxx> wrote:
> > On Thu, Apr 28, 2011 at 11:55:55AM -0400, Adam Chasen wrote:
> >>
> >> [root@zed ~]# multipath -ll
> >> 3600c0ff000111346d473554d01000000 dm-3 DotHill,DH3000
> >> size=1.1T features='0' hwhandler='0' wp=rw
> >> `-+- policy='round-robin 0' prio=1 status=active
> >>   |- 88:0:0:0 sdd 8:48  active ready running
> >>   |- 86:0:0:0 sdc 8:32  active ready running
> >>   |- 89:0:0:0 sdg 8:96  active ready running
> >>   `- 87:0:0:0 sdf 8:80  active ready running
> >> 3600c0ff00011148af973554d01000000 dm-2 DotHill,DH3000
> >> size=1.1T features='0' hwhandler='0' wp=rw
> >> `-+- policy='round-robin 0' prio=1 status=active
> >>   |- 89:0:0:1 sdk 8:160 active ready running
> >>   |- 88:0:0:1 sdi 8:128 active ready running
> >>   |- 86:0:0:1 sdh 8:112 active ready running
> >>   `- 87:0:0:1 sdl 8:176 active ready running
> >>
> >> /etc/multipath.conf
> >> defaults {
> >>         path_grouping_policy    multibus
> >>         rr_min_io               100
> >> }
> >
> > Did you try a lower value for rr_min_io ?
> >
> > -- Pasi
> >
> >>
> >> multipath-tools v0.4.9 (05/33, 2016)
> >> 2.6.35.11-2-fl.smp.gcc4.4.x86_64

<snip>

I'm quite curious to see what you ultimately find on this, as we have a similar setup (four paths to an iSCSI SAN) and have struggled quite a bit. We had settled on using multipath for failover, but load balancing with software RAID0 across the four devices. That seemed to provide more even scaling under various IO patterns, until we realized we could not take a transactionally consistent snapshot of the SAN, because we would not know which RAID transactions had been committed at the time of the snapshot. Thus, we are planning to implement multibus.

What scheduler are you using? We found that the default cfq scheduler in our kernel versions (2.6.28 and 2.6.29) did not scale at all with the number of parallel iSCSI sessions. Deadline or noop scaled almost linearly. We then realized that our SAN (Nexenta running ZFS) was doing its own optimization of writes to the physical media (we assumed that's what the scheduler is for), so we had no need for the overhead of any scheduler and set ours to noop, except for local disks.

I'm also very curious about your findings on rr_min_io.
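For reference, a sketch of the configuration this thread converges on: Adam's /etc/multipath.conf quoted above, with rr_min_io dropped from 100 to the 3 that both Adam and Bart report helping. The exact value is something to benchmark per setup rather than a recommendation, and on later kernels with request-based dm-multipath the corresponding knob is rr_min_io_rq.

    # /etc/multipath.conf -- sketch only: same multibus layout as quoted
    # above, with rr_min_io lowered per the results reported in this thread.
    defaults {
            path_grouping_policy    multibus
            rr_min_io               3
    }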
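On the scheduler point, a minimal sketch of checking and switching the elevator at runtime, using the sdc/sdd/sdf/sdg path names from the multipath -ll output above. The udev rule is only one illustrative way to pin noop on the array paths while leaving local disks on the default scheduler; the rule file name and the DotHill vendor match are assumptions to adapt to the actual setup.

    # Show the current scheduler for one of the iSCSI paths
    # (the active elevator is printed in brackets).
    cat /sys/block/sdc/queue/scheduler

    # Switch that path to noop at runtime; repeat for sdd, sdf, sdg.
    echo noop > /sys/block/sdc/queue/scheduler

    # Illustrative udev rule to make it persistent for the DotHill paths
    # only, e.g. in /etc/udev/rules.d/99-dothill-noop.rules:
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", \
        ATTRS{vendor}=="DotHill*", ATTR{queue/scheduler}="noop"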
I cannot find my benchmarks, but we tested various rr_min_io settings heavily. I do not recall whether we saw more even scaling with 10 or 100, but I remember being surprised that performance with it set to 1 was poor. I would have thought that, in a bonded environment, changing paths on every iSCSI command would give optimal performance. Can anyone explain why it does not? We speculated that it either added too much overhead to manage the constant switching, or that it is simply the nature of iSCSI. Does each iSCSI command need to be acknowledged before the next one can be sent? If so, does multibus not increase the throughput of any individual iSCSI stream, but only help as we multiplex iSCSI streams?

If that is the case, it would exacerbate the already significant problem of Linux, iSCSI, and latency. We have found that for any Linux disk IO that goes through the file system, iSCSI performance is quite poor, because it is latency bound due to the maximum 4KB page size. I'm only parroting what others have told me, so correct me if I am wrong. Since iSCSI can only commit 4KB at a time in Linux (unless you bypass the file system with raw devices, dd, or direct writes in something like Oracle), and since each write needs to be acknowledged before the next is sent, and because sending 4KB down a high-speed pipe like 10Gbps or even 1Gbps comes nowhere near saturating the link, Linux iSCSI IO is latency bound, and no amount of additional bandwidth or number of bonded channels will increase the throughput of an individual iSCSI stream. Only minimizing latency will.

I hope some of that might have helped, and I look forward to hearing about your optimization of multibus.

Thanks - John
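One way to sanity-check the latency-bound picture described above: with a single 4KB request outstanding per round trip, a 0.25 ms command round trip caps one stream at roughly 4KB / 0.25ms, about 16MB/s, no matter how many 1Gbps links are bonded; larger requests or more parallel streams are what raise it. A rough, non-destructive read test against one of the multipath maps from the output above (the WWID is taken from Adam's multipath -ll; block sizes, counts and the device name are only illustrative):

    # Direct I/O reads so the page cache does not hide per-command round trips.
    # Small requests: throughput should be limited by latency, not link speed.
    dd if=/dev/mapper/3600c0ff000111346d473554d01000000 of=/dev/null \
       bs=4k count=100000 iflag=direct

    # Large requests: if this runs several times faster, the single stream
    # was latency bound rather than bandwidth bound.
    dd if=/dev/mapper/3600c0ff000111346d473554d01000000 of=/dev/null \
       bs=1M count=2000 iflag=direct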