On 26.10.2011 13:39, Daniel P. Berrange wrote:
> On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote:
>> On 26.10.2011 11:57, Daniel P. Berrange wrote:
>>> On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
>>>> Kevin Wolf <kwolf@xxxxxxxxxx> writes:
>>>>
>>>>> On 25.10.2011 16:06, Anthony Liguori wrote:
>>>>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote:
>>>>>>> On 25.10.2011 15:05, Anthony Liguori wrote:
>>>>>>>> I'd be much more open to changing the default mode to cache=none FWIW since the
>>>>>>>> risk of data loss there is much, much lower.
>>>>>>>
>>>>>>> I think people said that they'd rather not have cache=none as default
>>>>>>> because O_DIRECT doesn't work everywhere.
>>>>>>
>>>>>> Where doesn't it work these days? I know it doesn't work on tmpfs. I know it
>>>>>> works on ext[234], btrfs, nfs.
>>>>>
>>>>> Besides file systems (and probably OSes) that don't support O_DIRECT,
>>>>> there's another case: Our defaults don't work on 4k sector disks today.
>>>>> You need to explicitly specify the logical_block_size qdev property for
>>>>> cache=none to work on them.
>>>>>
>>>>> And changing this default isn't trivial as the right value doesn't only
>>>>> depend on the host disk, but it's also guest visible. The only way out
>>>>> would be bounce buffers, but I'm not sure that doing that silently is a
>>>>> good idea...
>>>>
>>>> Sector size is a device property.
>>>>
>>>> If the user asks for a 4K sector disk, and the backend can't support
>>>> that, we need to reject the configuration. Just like we reject
>>>> read-only backends for read/write disks.
>>>
>>> I don't see why we need to reject a guest disk with 4k sectors,
>>> just because the host disk only has 512 byte sectors. A guest
>>> sector size that's a larger multiple of host sector size should
>>> work just fine. It just means any guest sector write will update
>>> 8 host sectors at a time.
>>> We only have problems if guest sector
>>> size is not a multiple of host sector size, in which case bounce
>>> buffers are the only option (other than rejecting the config
>>> which is not too nice).
>>>
>>> IIUC, current QEMU behaviour is
>>>
>>>              Guest 512    Guest 4k
>>>   Host 512   * OK         OK
>>>   Host 4k    * I/O Err    OK
>>>
>>> '*' marks defaults
>>>
>>> IMHO, QEMU needs to work without I/O errors in all of these
>>> combinations, even if this means having to use bounce buffers
>>> in some of them. That said, IMHO the default should be for
>>> QEMU to avoid bounce buffers, which implies it should either
>>> choose guest sector size to match host sector size, or it
>>> should unconditionally use 4k guest. IMHO we need the former
>>>
>>>              Guest 512    Guest 4k
>>>   Host 512   *OK          OK
>>>   Host 4k     OK          *OK
>>
>> I'm not sure if a 4k host should imply a 4k guest by default. This means
>> that some guests wouldn't be able to run on a 4k host. On the other
>> hand, for those guests that can do 4k, it would be the much better option.
>>
>> So I think this decision is the hard thing about it.
>
> I guess it somewhat depends whether we want to strive for
>
>  1. Give the user the fastest working config by default
>  2. Give the user a working config by default
>  3. Give the user the fastest (possibly broken) config by default
>
> IMHO 3 is not a serious option, but I could see 2 as a reasonable
> tradeoff to avoid complexity in choosing QEMU defaults. The user
> would have a working config with 512 sectors, but sub-optimal perf
> on 4k hosts due to bounce buffering. Ideally libvirt or other
> higher app would be setting the best block size that a guest
> can support by default, so bounce buffers would rarely be needed.
> So only people using QEMU directly without setting a block size
> would ordinarily suffer the bounce buffer perf hit on a 4k host

Yes, I'm currently tending towards this plus a warning on stderr if
bounce buffering is used.
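
For reference, the rule Daniel describes in the quoted text can be written down as a small check. This is only an illustrative Python sketch, not QEMU code: needs_bounce_buffer and check_sector_config are made-up names, and it assumes that direct (O_DIRECT-style) I/O works without copying exactly when the guest sector size is a multiple of the host sector size.

```python
import sys


def needs_bounce_buffer(guest_sector: int, host_sector: int) -> bool:
    """Return True if guest requests cannot be passed through unmodified.

    Assumption taken from the thread: a guest sector size that is a
    multiple of the host sector size is fine; anything else would need
    bounce buffers.
    """
    if guest_sector <= 0 or host_sector <= 0:
        raise ValueError("sector sizes must be positive")
    return guest_sector % host_sector != 0


def check_sector_config(guest_sector: int, host_sector: int) -> str:
    """Classify a guest/host combination; warn on stderr in the slow case."""
    if needs_bounce_buffer(guest_sector, host_sector):
        print(
            f"warning: guest sector size {guest_sector} is not a multiple "
            f"of host sector size {host_sector}; using bounce buffers",
            file=sys.stderr,
        )
        return "bounce"
    return "direct"


if __name__ == "__main__":
    # Walk the same 2x2 matrix as in the tables above.
    for host in (512, 4096):
        for guest in (512, 4096):
            mode = check_sector_config(guest, host)
            print(f"host={host:4d} guest={guest:4d} -> {mode}")
```

Only the host 4k / guest 512 combination ends up in the "bounce" case here, which is the one that currently gives I/O errors with cache=none.
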

Or, coming back to the original subject of this discussion, we can
default to cache=writeback and forget about alignment. If you specify
cache=none, you have to take care to explicitly specify a block
size > 512 bytes, too.

Maybe the best is actually to do both: Default to cache=writeback,
completely avoiding bounce buffers. If the user specifies cache=none,
but doesn't change the sector size of the virtual disk, print a warning
and enable bounce buffers.

Kevin
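
To make the combined proposal a bit more concrete, here is a rough Python sketch of the decision logic. All names (DriveConfig, resolve_drive_config) are hypothetical and this is not a QEMU interface. The 512-byte default guest sector size and the "warn and enable bounce buffers" fallback for cache=none come from the discussion; the sketch only handles the two cache modes mentioned in this mail and assumes the host sector size is already known.

```python
import sys
from dataclasses import dataclass
from typing import Optional

DEFAULT_GUEST_SECTOR = 512  # default guest-visible sector size, per the thread


@dataclass
class DriveConfig:
    cache: str            # "writeback" or "none"
    guest_sector: int     # guest-visible logical block size in bytes
    bounce_buffers: bool  # True if requests must be copied for alignment


def resolve_drive_config(host_sector: int,
                         cache: Optional[str] = None,
                         guest_sector: Optional[int] = None) -> DriveConfig:
    """Apply the proposed defaults (hypothetical helper, for illustration)."""
    cache = cache or "writeback"
    guest_sector = guest_sector or DEFAULT_GUEST_SECTOR
    if cache not in ("writeback", "none"):
        raise ValueError(f"cache mode not covered by this sketch: {cache}")

    # Default case: cache=writeback, alignment is not a concern.
    if cache == "writeback":
        return DriveConfig(cache, guest_sector, bounce_buffers=False)

    # cache=none uses O_DIRECT, so guest requests must be aligned to the
    # host sector size; otherwise warn and fall back to bounce buffers.
    misaligned = guest_sector % host_sector != 0
    if misaligned:
        print(
            f"warning: cache=none with {guest_sector}-byte guest sectors on "
            f"a {host_sector}-byte host disk; enabling bounce buffers",
            file=sys.stderr,
        )
    return DriveConfig(cache, guest_sector, bounce_buffers=misaligned)
```

With these rules, resolve_drive_config(4096) stays on cache=writeback without bounce buffers, while resolve_drive_config(4096, cache="none") warns and enables them unless a 4096-byte guest sector size is passed explicitly.
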