Any comments? Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

Takuya Yoshikawa <yoshikawa.takuya@xxxxxxxxxxxxx> · Mon, 24 May 2010 16:05:29 +0900

(2010/05/17 18:06), Takuya Yoshikawa wrote:

User-allocated bitmaps have the advantage of reducing pinned memory.
However, we have plenty more pinned memory allocated in memory slots, so
on their own, user-allocated bitmaps don't justify this change.

Sorry for pinging several times.

Given that, what do you think about the question I sent last week?

=== REPOST 1 ===
 >>
 >> mark_page_dirty is called with the mmu_lock spinlock held in set_spte.
 >> Must find a way to move it outside of the spinlock section.

I am now trying to solve this spinlock problem, but the spinlock section
looks too wide to solve with a simple workaround.

Sorry, but I have to admit that the mmu_lock spinlock problem had completely
slipped my mind. Although I have looked through the code, it does not seem
easy to move set_bit_user outside of the spinlock section without breaking
the semantics of what the lock protects.

So this may take some time to solve.

But personally, I want to do something about x86's "vmalloc() every time"
problem, even if moving dirty bitmaps to user space cannot be achieved soon.

With that in mind, would you mind if we did double buffering without
moving dirty bitmaps to user space?

So I would be happy to get any comments from you on other options like
this.

Thanks,
  Takuya

I know that vmalloc() resources are precious on x86, but even now, at the
time of get_dirty_log, we use the same amount of memory as double buffering
would.
=== 1 END ===
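For illustration, here is a minimal user-space model of the double-buffering idea, assuming each slot keeps two dirty bitmaps allocated once at slot creation, so that get_dirty_log only clears the idle bitmap and swaps an index instead of calling vmalloc() every time. The struct and function names (dirty_log_model, dirty_log_switch() and so on) are hypothetical and are not KVM's API; dirty_log_mark() merely stands in for what mark_page_dirty() does to the active bitmap. Locking, write-protecting the slot and copying the bitmap to user space are all omitted, so this is a single-threaded sketch, not kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of one slot's dirty log (NOT struct kvm_memory_slot). */
struct dirty_log_model {
    size_t npages;             /* pages covered by the slot */
    size_t nlongs;             /* bitmap length in unsigned longs */
    unsigned long *bitmaps[2]; /* both allocated once, at slot creation */
    int active;                /* index of the bitmap currently written */
};

static int dirty_log_init(struct dirty_log_model *log, size_t npages)
{
    log->npages = npages;
    log->nlongs = (npages + BITS_PER_LONG - 1) / BITS_PER_LONG;
    log->active = 0;
    log->bitmaps[0] = calloc(log->nlongs, sizeof(unsigned long));
    log->bitmaps[1] = calloc(log->nlongs, sizeof(unsigned long));
    if (!log->bitmaps[0] || !log->bitmaps[1]) {
        free(log->bitmaps[0]);
        free(log->bitmaps[1]);
        return -1;
    }
    return 0;
}

static void dirty_log_free(struct dirty_log_model *log)
{
    free(log->bitmaps[0]);
    free(log->bitmaps[1]);
}

/* Stand-in for mark_page_dirty(): set the page's bit in the active bitmap. */
static void dirty_log_mark(struct dirty_log_model *log, size_t offset)
{
    assert(offset < log->npages);
    log->bitmaps[log->active][offset / BITS_PER_LONG] |=
        1UL << (offset % BITS_PER_LONG);
}

static int bitmap_test(const unsigned long *bitmap, size_t nr)
{
    return (int)((bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/*
 * Double-buffered "get dirty log": clear the idle bitmap, make it active,
 * and return the bitmap that collected the previous round's dirty bits.
 * No allocation happens here. The returned bitmap is only valid until the
 * next switch, which clears it for reuse.
 */
static const unsigned long *dirty_log_switch(struct dirty_log_model *log)
{
    int old = log->active;

    memset(log->bitmaps[!old], 0, log->nlongs * sizeof(unsigned long));
    log->active = !old;
    return log->bitmaps[old];
}
```

Each switch costs one memset() of the idle bitmap and no allocation. The price is that both bitmaps stay allocated for the lifetime of the slot, and the caller must finish with the returned bitmap before the next switch, because that switch clears it for reuse.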

Perhaps if we optimize memory slot write protection (I have some ideas
about this) we can make the performance improvement more pronounced.

It's really nice!

Even now, we can measure the performance improvement from introducing the
switch ioctl when the guest is relatively idle, so the combination will be
really effective!

=== REPOST 2 ===
 >>
 >> Can you post such a test, for an idle large guest?
 >
 > OK, I'll do it!

First, the results of the "low workload test" (running top during migration):

4GB guest
only slots[1] (len=3757047808) measured
*****************************************
get.org  get.opt  switch.opt

1060875   310292      190335
1076754   301295      188600
 655504   318284      196029
 529769   301471         325
 694796    70216      221172
 651868   353073      196184
 543339   312865      213236
1061938    72785      203090
 689527   323901      249519
 621364   323881         473
1063671    70703      192958
 915903   336318      174008
1046462   332384         782
1037942    72783      190655
 680122   318305      243544
 688156   314935      193526
 558658   265934      190550
 652454   372135      196270
 660140    68613         352
1101947   378642      186575
    ...      ...         ...
*****************************************

As expected, the difference now shows up more clearly.

In this case, switch.opt cut the time by about 1/3 (about .1 msec) per
iteration compared to get.opt.

And the cleaner the slot, the bigger this ratio.
=== 2 END ===
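As a quick arithmetic check of the "about 1/3" figure, the sketch below averages the 20 rows listed in the table. The rows elided as "..." are not included, so this only approximates the full run; the table does not state its units, and the array and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The 20 rows shown in the table above; units are not stated there. */
static const long get_org[] = {
    1060875, 1076754, 655504, 529769, 694796, 651868, 543339, 1061938, 689527, 621364,
    1063671, 915903, 1046462, 1037942, 680122, 688156, 558658, 652454, 660140, 1101947,
};
static const long get_opt[] = {
    310292, 301295, 318284, 301471, 70216, 353073, 312865, 72785, 323901, 323881,
    70703, 336318, 332384, 72783, 318305, 314935, 265934, 372135, 68613, 378642,
};
static const long switch_opt[] = {
    190335, 188600, 196029, 325, 221172, 196184, 213236, 203090, 249519, 473,
    192958, 174008, 782, 190655, 243544, 193526, 190550, 196270, 352, 186575,
};

#define NROWS (sizeof(get_opt) / sizeof(get_opt[0]))

static long column_sum(const long *v, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

static double column_mean(const long *v, size_t n)
{
    assert(n > 0);
    return (double)column_sum(v, n) / (double)n;
}

/* Print each column's mean and the switch.opt / get.opt ratio. */
static void print_summary(void)
{
    double org = column_mean(get_org, NROWS);
    double opt = column_mean(get_opt, NROWS);
    double sw = column_mean(switch_opt, NROWS);

    printf("mean get.org    = %.1f\n", org);
    printf("mean get.opt    = %.1f\n", opt);
    printf("mean switch.opt = %.1f\n", sw);
    printf("switch.opt / get.opt = %.3f\n", sw / opt);
}
```

On these 20 rows the column sums are 15991189, 5218815 and 3228183, so mean switch.opt / mean get.opt is about 0.62 (a reduction of roughly 38%), and the mean difference is about 99,500 units, which is consistent with the ".1 msec" estimate if the values are nanoseconds.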
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
