On Tue 20-07-10 17:41:33, Michael Tokarev wrote:
> 20.07.2010 16:46, Jan Kara wrote:
> >   Hi,
> >
> > On Fri 02-07-10 16:46:28, Michael Tokarev wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> I noticed that qcow2 images, esp. fresh ones (so that they
> >> receive lots of metadata updates), are very slow on my
> >> machine.  And on IRC (#kvm), Sheldon Hearn found that on
> >> ext3 it is fast again.
> >>
> >> So I tested different combinations for a bit, and observed
> >> the following:
> >>
> >> For a fresh qcow2 file, with default qemu cache settings,
> >> copying the kernel source is about 10 times slower on ext4
> >> than on ext3.  A second copy (rewrite) is significantly
> >> faster in both cases (as expected), but still ~20% slower
> >> on ext4 than on ext3.
> >>
> >> The normal cache mode in qemu is writethrough, which translates
> >> to the O_SYNC file open mode.
> >>
> >> With cache=none, which translates to O_DIRECT, metadata-
> >> intensive writes (fresh qcow) are about as slow as on
> >> ext4 with O_SYNC, and rewrite is expectedly faster, but
> >> now there is _no_ difference in speed between ext3 and ext4.
> >>
> >> I did a series of straces of the writer processes -- the time
> >> spent in pwrite() syscalls is significantly larger for
> >> ext4 with O_SYNC than for ext3 with O_SYNC; the difference is
> >> about 50 times.
> >>
> >> Also, with the slower I/O in the ext4 case, qemu-kvm starts more
> >> I/O threads, which, as it seems, slows the whole thing down even
> >> further - I changed max_threads from the default 64 to 16, and
> >> the speed improved slightly.  Here the difference is again quite
> >> significant: on ext3 qemu spawns only 8 threads, while on
> >> ext4 all 64 I/O threads are spawned almost immediately.
> >>
> >> So I have two questions:
> >>
> >>  1. Why is ext4 O_SYNC so slow compared with ext3 O_SYNC?
> >>     This is observed on 2.6.32 and 2.6.34 kernels; barriers
> >>     or data={writeback|ordered} made no difference.  I tested
> >>     the whole thing on a partition on a single drive; sheldonh
> >>     used ext[34]fs on top of lvm on a raid1 volume.
> >   Do I get it right that you have ext3/4 which carries fs images used by
> > KVM? What you describe is strange. Up to this moment it sounded to me like
> > a difference in barrier settings on the host, but you seem to have tried
> > that. Just stabbing in the dark - could you try the nodelalloc mount option
> > of ext4?
>
> Yes, exactly, a guest filesystem image stored on ext3 or
> ext4.  And yes, I suspected barriers too, but immediately
> ruled that out, since barrier or no barrier does not matter
> in this test.
>
> I'll try nodelalloc, but I'm not sure when: right now I'm on
> vacation, typing from a hotel, and my home machine with all
> the guest images and the like is turned off and - for some
> reason - I can't wake it up over ethernet; it seemingly ignores
> WOL packets.  Too bad I don't have any guest image here on my
> notebook.
>
> >>  2. The number of threads spawned for I/O... this is a good
> >>     question, how to find an adequate cap.  Different hw has
> >>     different capabilities, and we may have more users doing
> >>     I/O at the same time...
> >   Maybe you could measure your total throughput over some period,
> > try increasing the number of threads in the next period, and if it
> > helps significantly, use the larger number; otherwise go back to a
> > smaller number?
>
> Well, this is, again, a good question -- it's how qemu works right
> now, spawning up to 64 I/O threads for all the I/O requests the guest
> submits.  The slower the I/O, the more threads can be spawned.
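  Something like this is what I had in mind - only a rough sketch with
made-up names, nothing like real qemu code:

	/*
	 * end_of_period() would be called once per measurement period
	 * with the number of bytes completed during that period.
	 */
	static int io_thread_cap = 8;	  /* cap on the I/O thread pool */
	static int prev_cap = 8;
	static unsigned long prev_bytes;  /* bytes done in previous period */

	static void end_of_period(unsigned long bytes_done)
	{
		if (bytes_done > prev_bytes + prev_bytes / 10) {
			/* larger cap helped (>10%) - keep it, probe higher */
			prev_cap = io_thread_cap;
			if (io_thread_cap < 64)
				io_thread_cap *= 2;
		} else {
			/* no significant gain - fall back to previous cap */
			io_thread_cap = prev_cap;
		}
		prev_bytes = bytes_done;
	}

The period length and the 10% threshold are arbitrary, of course; the point
is only that the cap adapts to what the storage actually sustains instead of
staying at a fixed 64.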
> Working that part out is a separate, difficult job.
>
> The main question here is why ext4 is so slow for O_[D]SYNC writes.
  Yes.

> Besides, a quite similar topic was discussed meanwhile, in a different
> thread titled "BTRFS: Unbelievably slow with kvm/qemu" -- see e.g.
> http://marc.info/?t=127891236700003&r=1&w=2 .  In particular, this
> message http://marc.info/?l=linux-kernel&m=127913696420974 shows
> a comparison table for a few filesystems and qemu/kvm usage, but on
> raw files instead of qcow.
  Thanks for the pointer. But in the comparison Christoph did, ext4 came
out slightly faster than ext3 when the barrier options were equivalent,
which is what I would expect... So what is the difference?

> The different qemu/kvm guest fs image options are (partial list):
>
> Raw disk image in a file on the host.  Either pre-allocated or
> (initially) sparse.  The pre-allocated case should - in
> theory - work equally well on all filesystems, while the sparse
> case should differ per filesystem, depending on how different
> filesystems allocate data.
>
> qcow[2] image in a file on the host.  This one is never sparse,
> but unlike raw it also contains some qemu-specific metadata,
> like which blocks are allocated and in which place, sorta
> like lvm.  Initially it is created empty (with only a header),
> and when the guest performs writes, new blocks are allocated and
> the metadata gets updated.  This requires some more writes than
> the guest performs, and quite a few syncs (with O_SYNC they're
> automatic).

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
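To take qemu out of the picture entirely, the O_SYNC effect should be
reproducible with a small host-side test along these lines - only a sketch,
with arbitrary block sizes and offsets rather than qcow2's real layout; it
interleaves larger data writes with small metadata-style updates near the
start of a file opened with O_SYNC:

	/* cc -O2 -o osync-test osync-test.c */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		const char *path = argc > 1 ? argv[1] : "osync-test.img";
		static char data[65536];	/* "guest data" block */
		static char meta[512];		/* "image metadata" update */
		int fd, i;

		memset(data, 0xaa, sizeof(data));
		memset(meta, 0x55, sizeof(meta));

		/* O_SYNC is what qemu's default cache=writethrough uses */
		fd = open(path, O_RDWR | O_CREAT | O_SYNC, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		for (i = 0; i < 1000; i++) {
			/* data written further and further into the file ... */
			if (pwrite(fd, data, sizeof(data),
				   (off_t)(i + 16) * sizeof(data)) < 0)
				perror("pwrite data");
			/* ... plus a small metadata-style update near the front */
			if (pwrite(fd, meta, sizeof(meta),
				   (off_t)(i % 16) * sizeof(meta)) < 0)
				perror("pwrite meta");
		}
		close(fd);
		return 0;
	}

Timing the same binary (e.g. with time(1)) against a file on an ext3 and on
an ext4 partition of the same disk should show whether the ~50x pwrite()
difference seen in the straces survives without qemu involved.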