On Tue 20-07-10 17:41:33, Michael Tokarev wrote:
> 20.07.2010 16:46, Jan Kara wrote:
> >   Hi,
> >
> > On Fri 02-07-10 16:46:28, Michael Tokarev wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> I noticed that qcow2 images, esp. fresh ones (so that they
> >> receive lots of metadata updates), are very slow on my
> >> machine.  And on IRC (#kvm), Sheldon Hearn found that on
> >> ext3 it is fast again.
> >>
> >> So I tested different combinations for a bit, and observed
> >> the following:
> >>
> >> For a fresh qcow2 file, with default qemu cache settings,
> >> copying the kernel source is about 10 times slower on ext4
> >> than on ext3.  A second copy (rewrite) is significantly
> >> faster in both cases (as expected), but still ~20% slower
> >> on ext4 than on ext3.
> >>
> >> The normal cache mode in qemu is writethrough, which translates
> >> to the O_SYNC file open mode.
> >>
> >> With cache=none, which translates to O_DIRECT, metadata-
> >> intensive writes (fresh qcow) are about as slow as on
> >> ext4 with O_SYNC, and rewrite is expectedly faster, but
> >> now there is _no_ difference in speed between ext3 and ext4.
> >>
> >> I did a series of straces of the writer processes -- the time
> >> spent in pwrite() syscalls is significantly larger for
> >> ext4 with O_SYNC than for ext3 with O_SYNC; the difference is
> >> about 50 times.
> >>
> >> Also, with the slower I/O in the ext4 case, qemu-kvm starts more
> >> I/O threads, which, as it seems, slows the whole thing down even
> >> further - I changed max_threads from the default 64 to 16, and
> >> the speed improved slightly.  Here the difference is again quite
> >> significant: on ext3 qemu spawns only 8 threads, while on
> >> ext4 all 64 I/O threads are spawned almost immediately.
> >>
> >> So I have two questions:
> >>
> >>  1. Why is ext4 O_SYNC so slow compared with ext3 O_SYNC?
> >>     This is observed on 2.6.32 and 2.6.34 kernels; barriers
> >>     or data={writeback|ordered} made no difference.  I tested
> >>     the whole thing on a partition on a single drive; sheldonh
> >>     used ext[34]fs on top of lvm on a raid1 volume.
> >   Do I get it right that you have ext3/4 which carries fs images used by
> > KVM? What you describe is strange. Up to this moment it sounded to me like
> > a difference in barrier settings on the host, but you seem to have tried
> > that. Just stabbing in the dark - could you try the nodelalloc mount option
> > of ext4?
>
> Yes, exactly, a guest filesystem image stored on ext3 or
> ext4.  And yes, I suspected barriers too, but immediately
> ruled that out, since barrier or no barrier does not matter
> in this test.
>
> I'll try nodelalloc, but I'm not sure when: right now I'm on
> vacation, typing from a hotel, and my home machine with all
> the guest images and the like is turned off and - for some
> reason - I can't wake it up over ethernet; it seemingly ignores
> WOL packets.  Too bad I don't have any guest image here on my
> notebook.
>
> >>  2. The number of threads spawned for I/O... this is a good
> >>     question, how to find an adequate cap.  Different hw has
> >>     different capabilities, and we may have more users doing
> >>     I/O at the same time...
> >   Maybe you could measure your total throughput over some period,
> > try increasing the number of threads in the next period, and if it
> > helps significantly, use the larger number; otherwise go back to a
> > smaller number?
>
> Well, this is, again, a good question -- it's how qemu works right
> now, spawning up to 64 I/O threads for all the I/O requests the guest
> submits.  The slower the I/O, the more threads can be spawned.
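  Something like this is what I had in mind - only a rough sketch with
made-up names, nothing like real qemu code:

	/*
	 * end_of_period() would be called once per measurement period
	 * with the number of bytes completed during that period.
	 */
	static int io_thread_cap = 8;	  /* cap on the I/O thread pool */
	static int prev_cap = 8;
	static unsigned long prev_bytes;  /* bytes done in previous period */

	static void end_of_period(unsigned long bytes_done)
	{
		if (bytes_done > prev_bytes + prev_bytes / 10) {
			/* larger cap helped (>10%) - keep it, probe higher */
			prev_cap = io_thread_cap;
			if (io_thread_cap < 64)
				io_thread_cap *= 2;
		} else {
			/* no significant gain - fall back to previous cap */
			io_thread_cap = prev_cap;
		}
		prev_bytes = bytes_done;
	}

The period length and the 10% threshold are arbitrary, of course; the point
is only that the cap adapts to what the storage actually sustains instead of
staying at a fixed 64.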
> Working that part out is a separate, difficult job.
>
> The main question here is why ext4 is so slow for O_[D]SYNC writes.
  Yes.

> Besides, a quite similar topic was discussed meanwhile, in a different
> thread titled "BTRFS: Unbelievably slow with kvm/qemu" -- see e.g.
> http://marc.info/?t=127891236700003&r=1&w=2 .  In particular, this
> message http://marc.info/?l=linux-kernel&m=127913696420974 shows
> a comparison table for a few filesystems and qemu/kvm usage, but on
> raw files instead of qcow.
  Thanks for the pointer. But in the comparison Christoph did, ext4 came
out slightly faster than ext3 when the barrier options were equivalent,
which is what I would expect... So what is the difference?

> The different qemu/kvm guest fs image options are (partial list):
>
> Raw disk image in a file on the host.  Either pre-allocated or
> (initially) sparse.  The pre-allocated case should - in
> theory - work equally well on all filesystems, while the sparse
> case should differ per filesystem, depending on how different
> filesystems allocate data.
>
> qcow[2] image in a file on the host.  This one is never sparse,
> but unlike raw it also contains some qemu-specific metadata,
> like which blocks are allocated and in which place, sorta
> like lvm.  Initially it is created empty (with only a header),
> and when the guest performs writes, new blocks are allocated and
> the metadata gets updated.  This requires some more writes than
> the guest performs, and quite a few syncs (with O_SYNC they're
> automatic).

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
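To take qemu out of the picture entirely, the O_SYNC effect should be
reproducible with a small host-side test along these lines - only a sketch,
with arbitrary block sizes and offsets rather than qcow2's real layout; it
interleaves larger data writes with small metadata-style updates near the
start of a file opened with O_SYNC:

	/* cc -O2 -o osync-test osync-test.c */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		const char *path = argc > 1 ? argv[1] : "osync-test.img";
		static char data[65536];	/* "guest data" block */
		static char meta[512];		/* "image metadata" update */
		int fd, i;

		memset(data, 0xaa, sizeof(data));
		memset(meta, 0x55, sizeof(meta));

		/* O_SYNC is what qemu's default cache=writethrough uses */
		fd = open(path, O_RDWR | O_CREAT | O_SYNC, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		for (i = 0; i < 1000; i++) {
			/* data written further and further into the file ... */
			if (pwrite(fd, data, sizeof(data),
				   (off_t)(i + 16) * sizeof(data)) < 0)
				perror("pwrite data");
			/* ... plus a small metadata-style update near the front */
			if (pwrite(fd, meta, sizeof(meta),
				   (off_t)(i % 16) * sizeof(meta)) < 0)
				perror("pwrite meta");
		}
		close(fd);
		return 0;
	}

Timing the same binary (e.g. with time(1)) against a file on an ext3 and on
an ext4 partition of the same disk should show whether the ~50x pwrite()
difference seen in the straces survives without qemu involved.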