PVnize instructions

I figured I'd go and try to find out what the emulation distribution looks like
in random use cases. The one I measured here was:

$ for i in `seq 1000`; do ls -la > /dev/null; done

inside the guest. This should give pretty good hints about process-spawning
overhead. Below are the results, showing which instructions get emulated most often.

Number of invocations | Opcode (decimal) | OP | XOP | asm name | SPR number | SPR name
(see the decode sketch below the table)

00488520   2101346470   OP: 31   XOP: 83    mfmsr
00487702   1275068452   OP: 19   XOP: 18    rfid
00244822   2108900006   OP: 31   XOP: 339   mfspr    275   SPRN_SPRG3
00244799   2107310758   OP: 31   XOP: 339   mfspr    27    SPR_SRR1
00243110   2103116710   OP: 31   XOP: 467   mtspr    27    SPR_SRR1
00242910   2107245478   OP: 31   XOP: 467   mtspr    26    SPR_SRR0
00242854   2105148070   OP: 31   XOP: 339   mfspr    26    SPR_SRR0
00206254   2101412196   OP: 31   XOP: 178   mtmsrd
00163540   2103509348   OP: 31   XOP: 178   mtmsrd
00162348   2108769190   OP: 31   XOP: 467   mtspr    273   SPRN_SPRG1
00158986   2100380326   OP: 31   XOP: 339   mfspr    273   SPRN_SPRG1
00142246   2080375332   OP: 31   XOP: 274   tlbiel
00122541   2107311014   OP: 31   XOP: 467   mtspr    27    SPR_SRR1
00122527   2105148326   OP: 31   XOP: 467   mtspr    26    SPR_SRR0
00089577   2102592166   OP: 31   XOP: 339   mfspr    19    SPR_DAR
00089562   2102526630   OP: 31   XOP: 339   mfspr    18    SPR_DSISR
00082629   2103443622   OP: 31   XOP: 83    mfmsr
00080937   2098922406   OP: 31   XOP: 467   mtspr    27    SPR_SRR1
00080937   2096759718   OP: 31   XOP: 467   mtspr    26    SPR_SRR0
00054759   2080393764   OP: 31   XOP: 274   tlbiel
00042033   2080440676   OP: 31   XOP: 178   mtmsrd
00042013   2080374950   OP: 31   XOP: 83    mfmsr
00040733   2099315044   OP: 31   XOP: 178   mtmsrd
00039939   2081817254   OP: 31   XOP: 339   mfspr    22    SPR_DECR
00039401   2088829284   OP: 31   XOP: 178   mtmsrd
00039386   2088436646   OP: 31   XOP: 467   mtspr    27    SPR_SRR1
00039377   2088763558   OP: 31   XOP: 83    mfmsr
00039343   2086273958   OP: 31   XOP: 467   mtspr    26    SPR_SRR0
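
For reference, the OP, XOP and SPR-number columns fall straight out of the raw
instruction word; the second column is just that word printed in decimal. A
minimal decode sketch (not the tool that produced the table; the example value
is the SPRG3 mfspr from above):

#include <stdint.h>
#include <stdio.h>

/* Decode the fields shown in the table from one raw instruction word. */
void decode(uint32_t insn)
{
	uint32_t op   = insn >> 26;              /* primary opcode (bits 0-5)    */
	uint32_t xop  = (insn >> 1) & 0x3ff;     /* extended opcode (bits 21-30) */
	/* mfspr/mtspr store the 10-bit SPR number with its 5-bit halves swapped */
	uint32_t sprn = (((insn >> 11) & 0x1f) << 5) | ((insn >> 16) & 0x1f);

	printf("OP: %u  XOP: %u  spr: %u\n",
	       (unsigned)op, (unsigned)xop, (unsigned)sprn);
}

int main(void)
{
	decode(2108900006u);   /* -> OP: 31  XOP: 339  spr: 275 (SPRG3) */
	return 0;
}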


Obviously we could PV mfmsr. Most of the mfsprs and mtsprs can also easily be
replaced by a std/ld to a negative address backed by a magic page. Rfid is
pretty much impossible to replace, and mtmsrd is _very_ difficult without more
logic inside the guest. The only way around tlbiel would be a queuing
invalidation mechanism, and I doubt that's feasible as the kernel expects the
page to be gone instantly.
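
To illustrate the magic page trick, here is a rough sketch (not real patch
code; MAGIC_PAGE_EA and MSR_OFFS are made-up placeholders): an emulated
mfmsr rT gets rewritten in place into a plain ld rT from a cached MSR slot.
With rA = 0 the base is literally zero, so a negative displacement lands at
the very top of the effective address space, which is where the magic page
would be mapped.

/*
 * Sketch only. Assumes a shared "magic page" at a negative effective
 * address that mirrors MSR; offsets and addresses are illustrative.
 */
#include <stdint.h>

#define MAGIC_PAGE_EA	(-4096)	/* assumed magic page location  */
#define MSR_OFFS	0	/* assumed offset of cached MSR */

/* Encode "ld rT, ds(rA)": DS-form, primary opcode 58, low two bits 00. */
uint32_t make_ld(unsigned rt, unsigned ra, int16_t ds)
{
	return (58u << 26) | (rt << 21) | (ra << 16) | ((uint16_t)ds & 0xfffcu);
}

/* If *insn is mfmsr (OP 31, XOP 83), turn it into a load from the magic page. */
int patch_mfmsr(uint32_t *insn)
{
	unsigned op  = *insn >> 26;
	unsigned xop = (*insn >> 1) & 0x3ff;
	unsigned rt  = (*insn >> 21) & 0x1f;

	if (op != 31 || xop != 83)
		return 0;

	/* ld rT, MAGIC_PAGE_EA + MSR_OFFS(0) -- rA = 0 means a base of zero */
	*insn = make_ld(rt, 0, MAGIC_PAGE_EA + MSR_OFFS);
	return 1;
}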

Overall, this looks pretty promising though. Apparently > 60% of the
emulated instructions can be patched into non-emulated ones fairly easily.
So this is definitely the next piece of low-hanging performance fruit to grab!


Alex

