RE: [PATCH] Map guest (ts,tid) to individual shadow tid

"Liu Yu-B13201" <Yu.Liu@xxxxxxxxxxxxx> · Fri, 10 Apr 2009 17:13:26 +0800

> -----Original Message-----
> From: Hollis Blanchard [mailto:hollisb@xxxxxxxxxx] 
> Sent: Friday, April 10, 2009 4:16 AM
> To: Liu Yu-B13201
> Cc: kvm-ppc@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH] Map guest (ts,tid) to individual shadow tid
> 
> On Thu, 2009-04-09 at 11:44 +0800, Liu Yu-B13201 wrote:
> > > -----Original Message-----
> > > From: Hollis Blanchard [mailto:hollisb@xxxxxxxxxx] 
> > > Sent: Thursday, April 09, 2009 12:43 AM
> > > To: Liu Yu-B13201
> > > Cc: kvm-ppc@xxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH] Map guest (ts,tid) to individual shadow tid
> > > 
> > > On Tuesday 07 April 2009 21:11:11 Liu Yu-B13201 wrote:
> > > > 
> > > > > -----Original Message-----
> > > > > From: Hollis Blanchard [mailto:hollisb@xxxxxxxxxx] 
> > > > > Sent: Tuesday, April 07, 2009 11:41 PM
> > > > > To: Liu Yu-B13201
> > > > > Cc: kvm-ppc@xxxxxxxxxxxxxxx
> > > > > Subject: Re: [PATCH] Map guest (ts,tid) to individual 
> shadow tid
> > > > > 
> > > > > On Tue, 2009-04-07 at 17:51 +0800, Liu Yu wrote:
> > > > > > Hi guys,
> > > > > > 
> > > > > > Thanks to Hollis's idea.
> > > > > > This patch help to make KVMPPC be more friendly to OSes 
> > > > > other than Linux.
> > > > > 
> > > > > Wow, that was quick. :)
> > > > > 
> > > > > Have you tested this code? Doesn't this break the 
> assumption in
> > > > > kvmppc_e500_tlb1_invalidate() (and probably elsewhere) 
> > > that stid ==
> > > > > gtid?
> > > > 
> > > > Yes, have taken a simple test.
> > > 
> > > Once we can reduce the number of TLB flushes (see below) 
> it will be 
> > > interesting to see if there's a performance impact.
> > > 
> > > >Good catch, it needs to handle it here in 
> > > kvmppc_e500_tlb1_invalidate().
> > > >Thanks.
> > > >But it's ok for now, because TLB1 only contains tid=0 
> > > entries, and tid=0
> > > >always be mapped to stid=0.
> > > >That's why the test is fine...
> > > 
> > > OK. Still, it makes me nervous to break such a simple 
> > > assumption. We should 
> > > introduce nice accessors to make it difficult to code it 
> > > wrong in the future. 
> > > I'm honestly surprised that kvmppc_e500_tlb1_invalidate() is 
> > > the only affected 
> > > site.
> > 
> > It is the only site. We don't pay much attention on shadow tid.
> > For 500, shadow tid is inherit from guest tid.
> > As guest tlb1 mapping is broken into 4K shadow mappings,
> > the kvmppc_e500_tlb1_invalidate() need to check tid to find all 4k
> > shadow mappings related to an guest mapping.
> > 
> > Actually, I have been thinking about remove all shadow tlb 
> like 440 did.
> > This maybe helpful to succedent work such as huge tlb mapping.
> > After doing that, kvmppc_e500_tlb1_invalidate() won't has 
> this assumption.
> > 
> > Anyway, the patch is an RFC, not aimed at getting applied.
> > Just make something to discuss. :)
> 
> It's a very good discussion. :)
> 
> > But there is one thing I haven't thought through.
> > That is how should we handle guest kernel mapping (tid=0) 
> after apply this trick?
> > As it's supposed to share by all PIDs.
> > Any suggestion?
> 
> Hmm, that's a good question.
> 
> On e500, you could use the PID1 register, but that doesn't appear in
> 440, and won't be available in future implementations anyways.
> 
> In general, I don't know... I'm reluctant to sacrifice the other PID
> hack we use, which allows us to flip between guest user and 
> guest kernel
> without flushing the TLB. Since that's probably way more frequent than
> context switches, maybe it's worth suffering the TLB flush on context
> switch.
> 
> It would be OK to map gPID 0 to another arbitrary host PID, except we
> need copy_to_user() to work in the guest. To elaborate, if we had this
> address space:
> 
> 0               0xc        0xf
> |     gTID 3     |  gTID 0  |          gPID=3
> 
> we could implement it like this:
> 
> 0               0xc        0xf
> |     sTID 27    |  sTID 9  |
> 
> and switch sPID between 9 and 27 when jumping between guest user and
> guest kernel. However, if we're in the guest kernel (sTID=9 mappings)
> with sPID=9, copy_to_user() will fault because sTID=27 mappings aren't
> visible.
> 
> When those faults occur, we could create a new userspace mapping with
> sTID 9, aliasing the sTID 27 mappings. Would that work? Odds are we
> won't get more than 1 or 2 of these at a time, but we'd need to track
> these mappings so that guest updates to the sTID=27 mappings 
> also modify
> or flush our extra sTID=9 mappings. 
> 
> That might be the only complexity, and might not be that bad. I don't
> know, what do you think? If e500 can selectively flush by PID, maybe
> that's better, but 440 can't so we'd want some scheme like I 
> described.
> 

E500 cannot selectively flush by PID either... Seems e500mc could.
It's a good idea that use PID1 to map guest kernel's stid for e500,
but there is another problem that how to handle guest mapping access
privilege.

I think we should apart this problem into two to make it clear.
1). how to handle guest share mapping(tid=0).
2). how to handle guest mapping without user access privilege.
What I mean is there may be non-zero tid mapping without user access
privilege,
Although this seems don't exist in Linux.

I have two different ways in mind to solve it.

1. utilize e500 PID1 as you mentioned
We can set PID1 when enter guest kernel and clear PID1 when enter guest
userspace.

Pro.
   this method can minimize TLB flush.
Con.
   it's based on the assumption that all guest access limited mappings
always have tid=0,
   or non-zero mappings always have user access bits.
   It's fine for Linux but not sure for other OSes.

2. utilize e500 duel TLBs, put different kind of guest mappings into
different TLBs
Only chosen mappings go into TLB1, while others go into TLB0.
If a flush is inavoidable, only need to flush TLB1.
As TLB1 just has 16 entry, the flush don't cost much.
The main cost will be how many circles the flushing take, instead of
endless tlb miss after flush.

For example, to solve method 1 con.
if we meet an non-zero mapping without user access bits,
We could put it into TLB1, then when enter guest userspace, we only
flush TLB1,
while TLB0 still keep most of mappings.

If large tlb trick is applied, then large mappings have no choise but go
into TLB1.
But flushing TLB1 still doesn't cost much.
In most cases, if we flush TLB1 when enter userspace, we will meet one
tlb miss when enter in kernel again.

Pro.
    It could help to provide a universal solution. (not sure we want it)
Con.
   Need flush TLB1 under certain cases.

Moreover when guest large mapping is broken into 4K shadow mappings,
Put them into TLB0 instead of TLB1 may be helpful to reduce the tlb
miss.
As TLB1 is too small while TLB0 has a big room.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html