On Wed, Jun 1, 2011 at 2:20 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Tue, May 31, 2011 at 06:30:09PM -0500, Anthony Liguori wrote:
>
> [..]
>> The level of consistency will then depend on whether you overcommit
>> your hardware and how you have it configured.
>
> Agreed.
>
>>
>> Consistency is very hard because at the end of the day, you still
>> have shared resources. Even with blkio, I presume one guest can
>> still impact another guest by forcing the disk to do excessive
>> seeking or something of that nature.
>>
>> So absolute consistency can't be the requirement for the use-case.
>> The use-cases we are really interested in are more about providing caps
>> than anything else.
>
> I think both qemu and kernel can do the job. The only thing which
> seriously favors a throttling implementation in qemu is the ability
> to handle a wide variety of backend files (NFS, qcow, libcurl based
> devices etc).
>
> So what I am arguing is that your previous reason that qemu can do
> a better job because it knows the effective IOPS of the guest is not
> necessarily a very good reason. To me the simplicity of being able to
> handle everything as a file and do the throttling there is the most
> compelling reason to do this implementation in qemu.

The variety of backends is the reason to go for a QEMU-based approach.
If there were kernel mechanisms to handle non-block backends, that would
be great - cgroups for NFS, perhaps? But for something like Sheepdog or
Ceph it becomes quite hard to do in the kernel at all, since they are
userspace libraries that speak their protocol over sockets, and from the
kernel you really have no insight into which I/O operations they are
performing.

One issue that concerns me is how effective iops and throughput are as
capping mechanisms. If you cap throughput you mostly affect sequential
I/O but do little against random I/O, which can hog the disk with a
seeky access pattern. If you limit iops you can cap random I/O, but you
artificially limit sequential I/O, which may be able to sustain a high
number of iops without hogging the disk at all because it barely seeks.
One proposed solution (I think Christoph Hellwig suggested it) is to
merge sequential I/Os for accounting purposes, so that a run of
sequential requests only counts as 1 iop - a rough sketch of that
accounting is appended at the end of this mail.

I like the idea of a proportional share of disk utilization, but doing
that from QEMU is problematic: we only know when we issued an I/O to the
kernel, not when the disk actually services it. There could be queue
wait times in the block layer that we don't know about, so we end up
with a magic number for disk utilization which may not be very
meaningful.

So given the constraints and the backends we need to support, iops and
throughput limits implemented in QEMU seem like the approach we need. A
token-bucket sketch of what that accounting could look like is also
appended below.

Stefan
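
P.S. To make the sequential-merge accounting concrete, here is a rough,
self-contained sketch. The names (SeqAccount, seq_account_charge) are
invented for illustration and are not existing QEMU or kernel APIs; the
idea is simply that a request starting exactly where the previous one
ended is charged 0 iops, and everything else is charged 1.

/*
 * Sketch of sequential-merge iop accounting: a run of requests where each
 * one starts where the previous one ended is charged as a single iop.
 * SeqAccount and seq_account_charge() are invented names for illustration.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t next_offset;   /* where a sequential successor would start */
    bool     have_last;     /* false until the first request is seen */
} SeqAccount;

/*
 * Return the number of iops to charge for a request: 0 if it continues
 * the previous request sequentially, 1 otherwise.
 */
static unsigned seq_account_charge(SeqAccount *s, uint64_t offset,
                                   uint64_t len)
{
    bool sequential = s->have_last && offset == s->next_offset;

    s->have_last = true;
    s->next_offset = offset + len;

    return sequential ? 0 : 1;
}

int main(void)
{
    SeqAccount s = { 0 };
    unsigned iops = 0;

    /* Three sequential 64 KB reads followed by one random 4 KB read. */
    iops += seq_account_charge(&s, 0, 65536);
    iops += seq_account_charge(&s, 65536, 65536);
    iops += seq_account_charge(&s, 131072, 65536);
    iops += seq_account_charge(&s, 1048576, 4096);

    /* The sequential run counts once and the seek counts once: 2 iops. */
    printf("charged %u iops for 4 requests\n", iops);
    return 0;
}

In practice you would want per-device state and probably some tolerance
for small gaps or reordered requests, but the accounting itself is just
this.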
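
P.P.S. Along the same lines, here is a sketch of the dual token-bucket
accounting that per-drive iops and throughput limits in QEMU could be
built on. Again, the IoLimit struct and iolimit_* helpers are made-up
names rather than existing QEMU code: each request is charged against
both an iops bucket and a bytes/s bucket and has to wait when either
bucket is empty.

/*
 * Minimal sketch of dual token-bucket accounting for per-drive iops and
 * throughput limits.  IoLimit and the iolimit_* helpers are invented
 * names for illustration; this is not existing QEMU code.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    double iops_limit;   /* requests per second, 0 = unlimited */
    double bps_limit;    /* bytes per second,    0 = unlimited */
    double io_tokens;    /* accumulated request credits */
    double byte_tokens;  /* accumulated byte credits */
    double last_ns;      /* timestamp of the last refill */
} IoLimit;

/* Refill both buckets according to the time elapsed since the last call. */
static void iolimit_refill(IoLimit *l, double now_ns)
{
    double delta_s = (now_ns - l->last_ns) / 1e9;

    l->io_tokens += l->iops_limit * delta_s;
    l->byte_tokens += l->bps_limit * delta_s;

    /* Cap the burst at one second's worth of credit. */
    if (l->io_tokens > l->iops_limit) {
        l->io_tokens = l->iops_limit;
    }
    if (l->byte_tokens > l->bps_limit) {
        l->byte_tokens = l->bps_limit;
    }
    l->last_ns = now_ns;
}

/*
 * Decide whether a request of 'bytes' may be submitted now.  If not, the
 * caller would queue it and retry later (e.g. from a timer callback).
 */
static bool iolimit_allow(IoLimit *l, double now_ns, uint64_t bytes)
{
    iolimit_refill(l, now_ns);

    if (l->iops_limit > 0 && l->io_tokens < 1.0) {
        return false;
    }
    if (l->bps_limit > 0 && l->byte_tokens < (double)bytes) {
        return false;
    }
    if (l->iops_limit > 0) {
        l->io_tokens -= 1.0;
    }
    if (l->bps_limit > 0) {
        l->byte_tokens -= (double)bytes;
    }
    return true;
}

int main(void)
{
    /* 100 iops and 1 MB/s; both buckets start empty. */
    IoLimit l = { .iops_limit = 100, .bps_limit = 1 << 20 };
    double now = 0;
    int submitted = 0;

    /* Try to issue a 4 KB request every millisecond for one second. */
    for (int i = 0; i < 1000; i++) {
        now += 1e6;                     /* advance by 1 ms in nanoseconds */
        if (iolimit_allow(&l, now, 4096)) {
            submitted++;
        }
    }
    printf("submitted %d requests in 1s with a 100 iops cap\n", submitted);
    return 0;
}

In QEMU the "reject" path would queue the request and resubmit it from a
timer callback rather than dropping it; main() here just simulates one
second of 4 KB requests to show the iops bucket becoming the binding
limit.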