On Thu, Jan 28, 2016 at 05:37:33PM -0500, Mike Snitzer wrote:
> On Thu, Jan 28 2016 at 4:23pm -0500,
> Benjamin Marzinski <bmarzins@xxxxxxxxxx> wrote:
> 
> > I'd like to attend LSF/MM 2016 to participate in any discussions about
> > redesigning how device-mapper multipath operates. I spend a significant
> > chunk of time dealing with issues around multipath and I'd like to
> > be part of any discussion about redesigning it.
> > 
> > In addition, I'd be interested in discussions that deal with how
> > device-mapper targets are dealing with blk-mq in general. For instance,
> > it looks like the current dm-multipath blk-mq implementation is running
> > into performance bottlenecks, and changing how path selection works into
> > something that allows for more parallelism is a worthy discussion.
> 
> At this point this isn't the sexy topic we'd like it to be -- not too
> sure how a 30 minute session on this will go. The devil is really in
> the details. Hopefully we can have more details once LSF rolls around
> to make an in-person discussion productive.
> 
> I've spent the past few days working on this, and while there are
> certainly various questions, it is pretty clear that DM multipath's
> m->lock (spinlock) is really _not_ a big bottleneck. It is an obvious
> one for sure, but I removed the spinlock entirely (debug only) and then
> the 'perf report -g' was completely benign -- no obvious bottlenecks.
> Yet DM mpath performance on a really fast null_blk device, ~1850K read
> IOPs, was still only ~950K -- as Jens rightly pointed out to me today:
> 
> "sure, it's slower but taking a step back, it's about making sure we
> have a pretty low overhead, so actual application workloads don't spend
> a lot of time in the kernel
> 
> ~1M IOPS is a _lot_".
> 
> But even still, DM mpath is dropping 50% of potential IOPs on the floor.
> There must be something inherently limiting in all the extra work done
> to: 1) stack blk-mq devices (2 completely different sw -> hw mappings)
> 2) clone top-level blk-mq requests for submission on the underlying
> blk-mq paths.
> 
> Anyway, my goal is to have my contribution to this LSF session be all
> about what was wrong and how it has been fixed ;)
> 
> But given how much harder analyzing this problem has become, I'm less
> encouraged I'll be able to do so.
> 
> > But it would also be worth looking into changes about how the dm blk-mq
> > implementation deals with the mapping between its swqueues and
> > hwqueue(s). Right now all the dm mapping is done in .queue_rq, instead
> > of in .map_queue, but I'm not convinced it belongs there.
> 
> blk-mq's .queue_rq hook is the logical place to do the mpath mapping, as
> it deals with getting a request from the underlying paths.
> 
> blk-mq's .map_queue is all about mapping sw to hw queues. It is very
> blk-mq specific and isn't something DM has a role in -- cannot yet see
> why it'd need to.

At the moment we only have one hwqueue, but we could have one hwqueue
per path. Then .queue_rq would just be in charge of handing the request
down to the underlying device. In that setup, instead of using a default
mapping of all swqueues to one hwqueue in .map_queue, we would be
mapping to the hardware queue for the path. I'd have to look through the
blk-mq code more to know whether either method has an obvious advantage,
but it seems like this way, if different cpus were using different paths
(with the per-cpu load balancing), you wouldn't constantly be accessing
the same hwqueue from different cpus.
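To make that concrete, here's a rough user-space toy (not actual kernel
code; NR_CPUS, NR_PATHS and the two map functions are all made-up names)
that just contrasts the two mappings -- everything funneling into the
single hwqueue we have today versus each CPU's swqueue mapping to a
per-path hwqueue:

/*
 * Toy model of the two mapping schemes, not kernel code.
 *
 * "single":   every CPU maps to hwqueue 0 and the path is picked
 *             per-request in queue_rq, so all CPUs share one hwqueue.
 * "per-path": one hwqueue per path; the per-CPU mapping picks the
 *             hwqueue (and therefore the path) up front.
 */
#include <stdio.h>

#define NR_CPUS  4
#define NR_PATHS 2

static int map_queue_single(int cpu)
{
	(void)cpu;
	return 0;			/* everyone lands on hwqueue 0 */
}

static int map_queue_per_path(int cpu)
{
	return cpu % NR_PATHS;		/* crude per-cpu load balancing */
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu %d: single -> hwq %d, per-path -> hwq %d\n",
		       cpu, map_queue_single(cpu), map_queue_per_path(cpu));
	return 0;
}

With the per-path mapping, a cpu that keeps submitting down the same path
never touches a hwqueue another cpu is using, which is exactly the
cross-cpu traffic I'm worried about above.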
Although I suppose you may do better just by leaving multipath_map where
it is now, and just adjusting the number of hardware queues. Speaking of
which, have you tried fiddling around with that in your tests?

> > There's also the issue that the bio targets may scale better on blk-mq
> > devices than the blk-mq targets.
> 
> Why is that surprising? request-based DM (and block core) has quite a
> bit more work that it does.
> 
> bio-based DM targets take a ~20% IOPs hit, whereas blk-mq request-based
> DM takes a ~50% hit. I'd _love_ for request-based DM to get to only a
> ~20% hit. (And for the bio-based 20% hit to be reduced further).

Right. But like I said in an earlier email, if bio-based mpath would give
us better performance on this class of devices, then all the blk-mq
performance work helps both multipath and the other targets. I realize
that bio-based multipath had issues other than simply IO performance that
caused us to switch, like a lack of good error information. But if the
performance gap between request-based and bio-based dm persists for
blk-mq devices (even assuming both improve), then we should at least
revisit the issues with bio-based multipath to see which set of problems
looks easiest to tackle.

-Ben

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel