RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

Bhushan Bharat-R65777 <R65777@xxxxxxxxxxxxx> · Wed, 24 Jul 2013 09:11:27 +0000

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@xxxxxxx]
> Sent: Wednesday, July 24, 2013 1:55 PM
> To: "“tiejun.chen”"
> Cc: Bhushan Bharat-R65777; kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx list;
> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
> managed pages
> 
> 
> On 24.07.2013, at 04:26, “tiejun.chen” wrote:
> 
> > On 07/18/2013 06:27 PM, Alexander Graf wrote:
> >>
> >> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
> >>
> >>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
> >>>>
> >>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
> >>>>
> >>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
> >>>>>>
> >>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Bhushan Bharat-R65777
> >>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
> >>>>>>>> To: '" tiejun.chen "'
> >>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> >>>>>>>> agraf@xxxxxxx; Wood Scott-
> >>>>>>>> B07421
> >>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
> >>>>>>>> for kernel managed pages
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@xxxxxxxxxxxxx]
> >>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
> >>>>>>>>> To: Bhushan Bharat-R65777
> >>>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> >>>>>>>>> agraf@xxxxxxx; Wood
> >>>>>>>>> Scott-
> >>>>>>>>> B07421
> >>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> >>>>>>>>> only for kernel managed pages
> >>>>>>>>>
> >>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: kvm-ppc-owner@xxxxxxxxxxxxxxx
> >>>>>>>>>>> [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On Behalf Of " tiejun.chen "
> >>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
> >>>>>>>>>>> To: Bhushan Bharat-R65777
> >>>>>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> >>>>>>>>>>> agraf@xxxxxxx; Wood
> >>>>>>>>>>> Scott-
> >>>>>>>>>>> B07421
> >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> >>>>>>>>>>> only for kernel managed pages
> >>>>>>>>>>>
> >>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@xxxxxxxxxxxxx]
> >>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
> >>>>>>>>>>>>> To: Bhushan Bharat-R65777
> >>>>>>>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> >>>>>>>>>>>>> agraf@xxxxxxx; Wood
> >>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
> >>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
> >>>>>>>>>>>>> only for kernel managed pages
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
> >>>>>>>>>>>>>> If there is a struct page for the requested mapping then
> >>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent,
> >>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set  "I +
> >>>>>>>>>>>>>> G"  (cache inhibited,
> >>>>>>>>>>>>>> guarded)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned
> >>>>>>>>>>>>>> device
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan
> >>>>>>>>>>>>>> <bharat.bhushan@xxxxxxxxxxxxx>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>    arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
> >>>>>>>>>>>>>>    1 files changed, 12 insertions(+), 5 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
> >>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> >>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
> >>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
> >>>>>>>>>>>>> usermode)
> >>>>>>>>>>>>>>    	return mas3;
> >>>>>>>>>>>>>>    }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
> >>>>>>>>>>>>>> usermode)
> >>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
> >>>>>>>>>>>>>> +pfn_t pfn)
> >>>>>>>>>>>>>>    {
> >>>>>>>>>>>>>> +	u32 mas2_attr;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +	if (!pfn_valid(pfn)) {
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
> >>>>>>>>>>>>
> >>>>>>>>>>>> What I understand from this function (someone can correct
> >>>>>>>>>>>> me) is that it
> >>>>>>>>>>> returns "false" when the page is managed by kernel and is
> >>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not
> >>>>>>>>>>> matter whether the page is reserved or not, if it is kernel
> >>>>>>>>>>> visible page then it
> >>>>>>>> is DDR.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I think you are setting I|G by addressing all mmio pages,
> >>>>>>>>>>> right? If so,
> >>>>>>>>>>>
> >>>>>>>>>>>      KVM: direct mmio pfn check
> >>>>>>>>>>>
> >>>>>>>>>>>      Userspace may specify memory slots that are backed by
> >>>>>>>>>>> mmio pages rather than
> >>>>>>>>>>>      normal RAM.  In some cases it is not enough to identify
> >>>>>>>>>>> these mmio
> >>>>>>>>> pages
> >>>>>>>>>>>      by pfn_valid().  This patch adds checking the PageReserved as
> well.
> >>>>>>>>>>
> >>>>>>>>>> Do you know what are those "some cases" and how checking
> >>>>>>>>>> PageReserved helps in
> >>>>>>>>> those cases?
> >>>>>>>>>
> >>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this
> >>>>>>>>> should be chronically persistent as I understand ;-)
> >>>>>>>>
> >>>>>>>> Then I will wait till someone educate me :)
> >>>>>>>
> >>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do
> not want to call this for all tlbwe operation unless it is necessary.
> >>>>>>
> >>>>>> It certainly does more than we need and potentially slows down the fast
> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to
> check for pages that are declared reserved on the host. This happens in 2 cases:
> >>>>>>
> >>>>>>   1) Non cache coherent DMA
> >>>>>>   2) Memory hot remove
> >>>>>>
> >>>>>> The non coherent DMA case would be interesting, as with the mechanism as
> it is in place in Linux today, we could potentially break normal guest operation
> if we don't take it into account. However, it's Kconfig guarded by:
> >>>>>>
> >>>>>>         depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> >>>>>>         default n if PPC_47x
> >>>>>>         default y
> >>>>>>
> >>>>>> so we never hit it with any core we care about ;).
> >>>>>>
> >>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry
> about that one either.
> >>>>>
> >>>>> Thanks for this good information :)
> >>>>>
> >>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside
> kvm_is_mmio_pfn() to make sure that check is only valid when that is really
> needed? This can decrease those unnecessary performance loss.
> >>>>>
> >>>>> If I'm wrong please correct me :)
> >>>>
> >>>> You're perfectly right, but this is generic KVM code. So it gets run across
> all architectures. What if someone has the great idea to add a new case here for
> x86, but doesn't tell us? In that case we potentially break x86.
> >>>>
> >>>> I'd rather not like to break x86 :).
> >>>>
> >>>> However, it'd be very interesting to see a benchmark with this change. Do
> you think you could just rip out the whole reserved check and run a few
> benchmarks and show us the results?
> >>>>
> >>>
> >>> Often what case should be adopted to validate this scenario?
> >>
> >> Something which hammers the TLB emulation heavily. I usually just run
> >> /bin/echo a thousand times in "time" and see how long it takes ;)
> >>
> >
> > I tried to run five times with this combination, "time `for ((i=0; i<5000;
> i++));  do /bin/echo; done`", to calculate the average value with this change:
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > 1580dd4..5e8635b 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >
> > bool kvm_is_mmio_pfn(pfn_t pfn)
> > {
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> 
> I'd feel safer if we narrow this down to e500.
> 
> > +       /*
> > +        * Currently only in memory hot remove case we may still need this.
> > +        */
> >        if (pfn_valid(pfn)) {
> 
> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> 
> >                int reserved;
> >                struct page *tail = pfn_to_page(pfn); @@ -124,6 +128,7
> > @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >                }
> >                return PageReserved(tail);
> >        }
> > +#endif
> >
> >        return true;
> > }
> >
> > Before apply this change:
> >
> > real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5=
> 1m21.376s
> > user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5=
> 0m23.433s
> > sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> >
> > After apply this change:
> >
> > real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5=
> 1m20.667s
> > user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5=
> 0m22.615s
> > sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> >
> > So,
> >
> > real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> 
> Very nice, so there is a real world performance benefit to doing this. Then yes,
> I think it would make sense to change the global helper function to be fast on
> e500 and use that one from e500_shadow_mas2_attrib() instead.

Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as Scott commented?

-Bharat

> 
> Gleb, Paolo, any hard feelings?
> 
> 
> Alex
> 

��.n��������+%������w��{.n�����o�^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�