On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote:
> Am 26.10.2011 11:57, schrieb Daniel P. Berrange:
> > On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
> >> Kevin Wolf <kwolf@xxxxxxxxxx> writes:
> >>
> >>> Am 25.10.2011 16:06, schrieb Anthony Liguori:
> >>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote:
> >>>>> Am 25.10.2011 15:05, schrieb Anthony Liguori:
> >>>>>> I'd be much more open to changing the default mode to cache=none
> >>>>>> FWIW since the risk of data loss there is much, much lower.
> >>>>>
> >>>>> I think people said that they'd rather not have cache=none as
> >>>>> default because O_DIRECT doesn't work everywhere.
> >>>>
> >>>> Where doesn't it work these days? I know it doesn't work on tmpfs.
> >>>> I know it works on ext[234], btrfs, nfs.
> >>>
> >>> Besides file systems (and probably OSes) that don't support O_DIRECT,
> >>> there's another case: our defaults don't work on 4k sector disks
> >>> today. You need to explicitly specify the logical_block_size qdev
> >>> property for cache=none to work on them.
> >>>
> >>> And changing this default isn't trivial, as the right value doesn't
> >>> only depend on the host disk; it's also guest visible. The only way
> >>> out would be bounce buffers, but I'm not sure that doing that
> >>> silently is a good idea...
> >>
> >> Sector size is a device property.
> >>
> >> If the user asks for a 4K sector disk, and the backend can't support
> >> that, we need to reject the configuration. Just like we reject
> >> read-only backends for read/write disks.
> >
> > I don't see why we need to reject a guest disk with 4k sectors just
> > because the host disk only has 512 byte sectors. A guest sector size
> > that's a larger multiple of the host sector size should work just
> > fine; it just means any guest sector write will update 8 host
> > sectors at a time.
> > We only have problems if the guest sector size is not a multiple of
> > the host sector size, in which case bounce buffers are the only
> > option (other than rejecting the config, which is not too nice).
> >
> > IIUC, current QEMU behaviour is
> >
> >             Guest 512   Guest 4k
> >   Host 512  * OK        OK
> >   Host 4k   * I/O Err   OK
> >
> >   '*' marks defaults
> >
> > IMHO, QEMU needs to work without I/O errors in all of these
> > combinations, even if this means having to use bounce buffers in
> > some of them. That said, IMHO the default should be for QEMU to
> > avoid bounce buffers, which implies it should either choose a guest
> > sector size matching the host sector size, or it should
> > unconditionally use a 4k guest. IMHO we need the former:
> >
> >             Guest 512   Guest 4k
> >   Host 512  * OK        OK
> >   Host 4k   OK          * OK
>
> I'm not sure if a 4k host should imply a 4k guest by default. This
> means that some guests wouldn't be able to run on a 4k host. On the
> other hand, for those guests that can do 4k, it would be the much
> better option.
>
> So I think this decision is the hard thing about it.

I guess it somewhat depends on whether we want to strive for

  1. Give the user the fastest working config by default
  2. Give the user a working config by default
  3. Give the user the fastest (possibly broken) config by default

IMHO 3 is not a serious option, but I could see 2 as a reasonable
tradeoff to avoid complexity in choosing QEMU defaults. The user would
have a working config with 512-byte sectors, but sub-optimal perf on
4k hosts due to bounce buffering.

Ideally libvirt or another higher-level app would be setting the best
block size that a guest can support by default, so bounce buffers
would rarely be needed.
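The compatibility matrix above really boils down to a divisibility check. A rough sketch of the policy being argued for (function names are mine, for illustration; nothing like this exists as-is in QEMU):

```python
def pick_strategy(guest_bs, host_bs):
    """Decide how guest sector I/O maps onto the host disk.

    A guest sector size that is a whole multiple of the host sector
    size can be handled directly (e.g. one 4k guest write touches
    eight 512-byte host sectors); anything else needs bounce buffers,
    or has to be rejected.
    """
    if guest_bs % host_bs == 0:
        return "direct"
    return "bounce"

def default_guest_block_size(host_bs):
    # The "former" option above: default the guest sector size to the
    # host's, so the default configuration never needs bounce buffers.
    return host_bs
```

With these, every cell in the second matrix comes out "direct", and bounce buffering only kicks in when the user explicitly asks for a 512-byte guest on a 4k host.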
So only people using QEMU directly without setting a block size would
ordinarily suffer the bounce buffer perf hit on a 4k host.

Daniel
--
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org                    -o-   http://virt-manager.org :|
|: http://autobuild.org       -o-  http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org  -o-     http://live.gnome.org/gtk-vnc :|