On Wed, Aug 04, 2010 at 05:59:40PM +0200, Alexander Graf wrote:
>
> On 04.08.2010, at 17:48, Gleb Natapov wrote:
>
> > On Wed, Aug 04, 2010 at 05:31:12PM +0200, Alexander Graf wrote:
> >>
> >> On 04.08.2010, at 17:25, Gleb Natapov wrote:
> >>
> >>> On Wed, Aug 04, 2010 at 09:57:17AM -0500, Anthony Liguori wrote:
> >>>> On 08/04/2010 09:51 AM, David S. Ahern wrote:
> >>>>>
> >>>>> On 08/03/10 12:43, Avi Kivity wrote:
> >>>>>> libguestfs does not depend on an x86 architectural feature.
> >>>>>> qemu-system-x86_64 emulates a PC, and PCs don't have -kernel. We should
> >>>>>> discourage people from depending on this interface for production use.
> >>>>> That is a feature of qemu - and an important one to me as well. Why
> >>>>> should it be discouraged? You end up at the same place -- a running
> >>>>> kernel and in-ram filesystem; why require going through a bootloader
> >>>>> just because the hardware case needs it?
> >>>>
> >>>> It's smoke and mirrors. We're still providing a boot loader, it's
> >>>> just a tiny one that we've written solely for this purpose.
> >>>>
> >>>> And it works fine for production use. The question is whether we
> >>>> ought to be aggressively optimizing it for large initrd sizes. To
> >>>> be honest, after a lot of discussion of possibilities, I've come to
> >>>> the conclusion that it's just not worth it.
> >>>>
> >>>> There are better ways, like using string I/O and optimizing the PIO
> >>>> path in the kernel. That should cut down the 1s slowdown with a
> >>>> 100MB initrd by a bit. But honestly, shaving a couple hundred ms
> >>>> further off the initrd load is just not worth it using the current
> >>>> model.
> >>>>
> >>> The slowdown is not 1s any more. String PIO emulation had many bugs
> >>> that were fixed in 2.6.35. I measured how long it takes to load 100M
> >>> via the fw_cfg interface on an older kernel and on 2.6.35. On older
> >>> kernels on my machine it took ~2-3 seconds; on 2.6.35 it took 26s.
> >>> Some optimizations that were already committed bring it down to 20s.
> >>> I have a prototype that gets it to 11s. I don't see how we can get
> >>> below that, certainly not back to ~2-3s.
> >>
> >> What exactly is the reason for the slowdown? It can't be only boundary
> >> and permission checks, right?
> >>
> >>
> > The big part of the slowdown right now is that the write into memory is
> > done for each byte. That means for each byte we call kvm_write_guest()
> > and kvm_mmu_pte_write(). The second call is needed in case the memory
> > the instruction is writing to is shadowed. Previously we didn't check
> > for that at all. This can be mitigated by introducing a write cache,
> > doing combined writes into memory, and unshadowing the page if there is
> > more than one write into it. This optimization saves ~10 seconds.
> > Currently string
>
> Ok, so you tackled that bit already.
>
> > emulation enters the guest from time to time to check whether event
> > injection is needed, and the read from userspace is done in 1K chunks,
> > not 4K like it was, but when I made the reads 4K and disabled guest
> > reentry I didn't see any speed improvement worth talking about.
>
> So what are we wasting those 10 seconds on then? Does perf tell you anything useful?
>
Not 10, but 7-8 seconds. After applying the cache fix, nothing definite as
far as I remember (I last ran it almost two weeks ago, need to rerun). The
code always goes through the emulator now and checks the direction flag to
update SI/DI accordingly. The emulator is a big switch, and it calls various
callbacks that may also slow things down.

--
			Gleb.
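
To make the write-combining idea above concrete, here is a minimal kernel-style
C sketch. It is only an illustration of the technique Gleb describes, not his
actual prototype; struct write_cache, write_cache_flush(), emulator_cached_write()
and WC_SIZE are invented names for this example, and only kvm_write_guest() is
an existing KVM helper.

#include <linux/kvm_host.h>
#include <linux/string.h>

#define WC_SIZE 4096

struct write_cache {
	gpa_t gpa;	/* guest physical address of the first buffered byte */
	u32 len;	/* number of bytes currently buffered */
	u8 buf[WC_SIZE];
};

/* Push all buffered bytes into guest memory with one kvm_write_guest(). */
static int write_cache_flush(struct kvm *kvm, struct write_cache *wc)
{
	int r = 0;

	if (wc->len)
		r = kvm_write_guest(kvm, wc->gpa, wc->buf, wc->len);
	wc->len = 0;
	/*
	 * A real implementation would also have to handle shadowed pages here
	 * (what kvm_mmu_pte_write() does per byte today), and could unshadow a
	 * page that sees more than one write, as described in the thread.
	 */
	return r;
}

/* Buffer one emulated write; flush when it is not contiguous or the buffer is full. */
static int emulator_cached_write(struct kvm *kvm, struct write_cache *wc,
				 gpa_t gpa, const void *data, u32 len)
{
	int r;

	if (wc->len &&
	    (gpa != wc->gpa + wc->len || wc->len + len > WC_SIZE)) {
		r = write_cache_flush(kvm, wc);
		if (r)
			return r;
	}
	if (len > WC_SIZE)	/* oversized write: bypass the cache */
		return kvm_write_guest(kvm, gpa, data, len);
	if (!wc->len)
		wc->gpa = gpa;
	memcpy(wc->buf + wc->len, data, len);
	wc->len += len;
	return 0;
}

With a scheme like this, the per-byte writes of a REP OUTS/MOVS emulation end up
as one guest-memory write per 4K (or per flush point), which is where the ~10s
saving mentioned above would come from.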