On 08/30/2010 09:06 AM, Glauber Costa wrote:
> Hi,
>
> So, this is basically the same as v1, with three major
> differences:
> 1) I am posting to lkml for wider audience
> 2) patch 2/7 fixes one problem I mentioned would happen in
> smp systems, which is, we only update kvmclock when we
> change pcpu
> 3) softlockup algorithm is changed. Again, as marcelo pointed
> out, this is open to discussion, and I am not dropping it
> so more people can step in.
>
> I have some other patches under local test for a slightly modified
> guest part accounting, and I do somehow support extending
> the interface, and changing to nsecs (maybe not 100 %, but...). But
> I am posting in this state so we can have lkml people to step
> in earlier.
>
> Reminder of the previous cover-letter:
>
> There are two parts of it: the guest and host part.
>
> The proposal for the guest part, is to just change the
> common time accounting, and try to identify at that spot,
> whether or not we should account any steal time. I considered
> this idea less cumbersome than trying to cook a clockevents
> implementation ourselves, since I see little value in it.
> I am, however, pretty open to suggestions.

What's the relationship between clockevents and stolen time? Are you
thinking of some form of timer that counts unstolen time or something?

> For the host<->guest communications, I am using a shared
> page, in the same way as pvclock. Because of that, I am just
> hijacking pvclock structure anyway. There is a 32-bit field
> floating by, that gives us enough room for 8 years of steal
> time (we use msec resolution).

Please don't. The pvclock structure is already getting a bit packed
with stuff, and stolen time isn't really part of its job. In Xen we
have a separate runstate structure which allows the guest to get a
detailed breakdown of the time each vcpu spends in each state (which
are guaranteed to sum to the system time).
We can use that to compute how much time has been stolen (time spent in
the RUNNABLE state). You might consider a similar ABI for KVM, even if
you can't (yet) fill out all the time values.

> The main idea is to timestamp our exit and entry through
> sched notifiers, and export the value at pvclock updates.
> This obviously has some disadvantages: by doing this we
> are giving up future ideas about only updating
> this structure once, and even right now, won't work
> on pinned-smp (since we don't update pvclock if we
> haven't changed cpus.)
>
> But again, it is just an RFC, and I'd like to feel the
> reception of the idea as a whole.

The Xen code has always accounted for stolen time, so it appears in
top, vmstat, etc, and gives a user/admin some idea about how much their
domain is suffering from competition. This doesn't require any kernel
changes aside from some calls to account_steal_ticks(); we do this
every timer interrupt, accumulating whole ticks as we get them.

But I've not successfully managed to make the scheduler work well with
stolen time. I did experiment with making sched_clock return unstolen
time, on the grounds that it would give the scheduler more information
about how long things actually executed for. But it's meaningless for
measuring sleep/idle times, and it causes the different cpus' timebases
to drift severely, which causes other things in the kernel to get upset.

So I think any work in the scheduler area is interesting, and probably
worth posting separately unless it's too entangled in the stolen time
accounting area.

    J

> Have a nice review.
>
> Glauber Costa (7):
>   change headers preparing for steal time
>   always call kvm_write_guest
>   measure time out of guest
>   change kernel accounting to include steal time
>   kvm steal time implementation
>   touch softlockup watchdog
>   tell guest about steal time feature
>
>  arch/x86/include/asm/kvm_host.h    |    2 +
>  arch/x86/include/asm/kvm_para.h    |    1 +
>  arch/x86/include/asm/pvclock-abi.h |    4 ++-
>  arch/x86/kernel/kvmclock.c         |   40 ++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                 |   26 ++++++++++++++++++----
>  include/linux/sched.h              |    1 +
>  kernel/sched.c                     |   29 ++++++++++++++++++++++++++
>  7 files changed, 97 insertions(+), 6 deletions(-)
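
To make the "accumulate whole ticks at each timer interrupt" idea from
the reply concrete, here is a minimal userspace sketch. Everything in it
is an assumption made for illustration: the runstate_snapshot and
steal_accumulator structures, the RS_* state names and
steal_ticks_since_last() are hypothetical and are not the Xen ABI or any
existing kernel interface. It only shows the arithmetic: take the growth
in cumulative RUNNABLE time since the last tick, report the whole ticks,
and carry the remainder forward.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vcpu states; the names are illustrative only. */
enum runstate { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED, RS_OFFLINE, RS_COUNT };

/* Hypothetical shared record: cumulative nanoseconds spent in each state. */
struct runstate_snapshot {
    uint64_t time_ns[RS_COUNT];
};

/* Guest-side bookkeeping kept between timer interrupts. */
struct steal_accumulator {
    uint64_t last_runnable_ns; /* RUNNABLE total seen at the previous call */
    uint64_t residual_ns;      /* stolen time not yet adding up to a tick */
};

/*
 * Return the number of whole ticks of stolen time that have built up
 * since the previous call, keeping any sub-tick remainder for later.
 * Stolen time is taken to be time spent RUNNABLE but not running.
 */
static uint64_t steal_ticks_since_last(struct steal_accumulator *acc,
                                       const struct runstate_snapshot *snap,
                                       uint64_t tick_ns)
{
    assert(tick_ns > 0);

    uint64_t runnable = snap->time_ns[RS_RUNNABLE];
    uint64_t delta = runnable - acc->last_runnable_ns;

    acc->last_runnable_ns = runnable;
    acc->residual_ns += delta;

    uint64_t ticks = acc->residual_ns / tick_ns;
    acc->residual_ns -= ticks * tick_ns;
    return ticks;
}
```

In a kernel the returned count would presumably be what gets handed to
account_steal_ticks() from the timer interrupt. Carrying the remainder
forward means short stolen intervals are not lost, they are simply
reported once they add up to a full tick.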