> -----Original Message----- > From: Alexander Graf [mailto:agraf@xxxxxxx] > Sent: Wednesday, July 24, 2013 1:55 PM > To: "“tiejun.chen”" > Cc: Bhushan Bharat-R65777; kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx list; > Wood Scott-B07421; Gleb Natapov; Paolo Bonzini > Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel > managed pages > > > On 24.07.2013, at 04:26, “tiejun.chen” wrote: > > > On 07/18/2013 06:27 PM, Alexander Graf wrote: > >> > >> On 18.07.2013, at 12:19, “tiejun.chen” wrote: > >> > >>> On 07/18/2013 06:12 PM, Alexander Graf wrote: > >>>> > >>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote: > >>>> > >>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote: > >>>>>> > >>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: > >>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: Bhushan Bharat-R65777 > >>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM > >>>>>>>> To: '" tiejun.chen "' > >>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; > >>>>>>>> agraf@xxxxxxx; Wood Scott- > >>>>>>>> B07421 > >>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only > >>>>>>>> for kernel managed pages > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@xxxxxxxxxxxxx] > >>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM > >>>>>>>>> To: Bhushan Bharat-R65777 > >>>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; > >>>>>>>>> agraf@xxxxxxx; Wood > >>>>>>>>> Scott- > >>>>>>>>> B07421 > >>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency > >>>>>>>>> only for kernel managed pages > >>>>>>>>> > >>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> -----Original Message----- > >>>>>>>>>>> From: kvm-ppc-owner@xxxxxxxxxxxxxxx > >>>>>>>>>>> [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On Behalf Of " tiejun.chen " > >>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM > >>>>>>>>>>> To: Bhushan Bharat-R65777 > >>>>>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; > >>>>>>>>>>> agraf@xxxxxxx; Wood > >>>>>>>>>>> Scott- > >>>>>>>>>>> B07421 > >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency > >>>>>>>>>>> only for kernel managed pages > >>>>>>>>>>> > >>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> -----Original Message----- > >>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@xxxxxxxxxxxxx] > >>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM > >>>>>>>>>>>>> To: Bhushan Bharat-R65777 > >>>>>>>>>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; > >>>>>>>>>>>>> agraf@xxxxxxx; Wood > >>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777 > >>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency > >>>>>>>>>>>>> only for kernel managed pages > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote: > >>>>>>>>>>>>>> If there is a struct page for the requested mapping then > >>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent, > >>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set "I + > >>>>>>>>>>>>>> G" (cache inhibited, > >>>>>>>>>>>>>> guarded) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned > >>>>>>>>>>>>>> device > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan > >>>>>>>>>>>>>> <bharat.bhushan@xxxxxxxxxxxxx> > >>>>>>>>>>>>>> --- > >>>>>>>>>>>>>> arch/powerpc/kvm/e500_mmu_host.c | 17 ++++++++++++----- > >>>>>>>>>>>>>> 1 files changed, 12 insertions(+), 5 deletions(-) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c > >>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c > >>>>>>>>>>>>>> index 1c6a9d7..089c227 100644 > >>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c > >>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c > >>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32 > >>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int > >>>>>>>>>>>>> usermode) > >>>>>>>>>>>>>> return mas3; > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int > >>>>>>>>>>>>>> usermode) > >>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, > >>>>>>>>>>>>>> +pfn_t pfn) > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> + u32 mas2_attr; > >>>>>>>>>>>>>> + > >>>>>>>>>>>>>> + mas2_attr = mas2 & MAS2_ATTRIB_MASK; > >>>>>>>>>>>>>> + > >>>>>>>>>>>>>> + if (!pfn_valid(pfn)) { > >>>>>>>>>>>>> > >>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()? > >>>>>>>>>>>> > >>>>>>>>>>>> What I understand from this function (someone can correct > >>>>>>>>>>>> me) is that it > >>>>>>>>>>> returns "false" when the page is managed by kernel and is > >>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not > >>>>>>>>>>> matter whether the page is reserved or not, if it is kernel > >>>>>>>>>>> visible page then it > >>>>>>>> is DDR. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, > >>>>>>>>>>> right? If so, > >>>>>>>>>>> > >>>>>>>>>>> KVM: direct mmio pfn check > >>>>>>>>>>> > >>>>>>>>>>> Userspace may specify memory slots that are backed by > >>>>>>>>>>> mmio pages rather than > >>>>>>>>>>> normal RAM. In some cases it is not enough to identify > >>>>>>>>>>> these mmio > >>>>>>>>> pages > >>>>>>>>>>> by pfn_valid(). This patch adds checking the PageReserved as > well. > >>>>>>>>>> > >>>>>>>>>> Do you know what are those "some cases" and how checking > >>>>>>>>>> PageReserved helps in > >>>>>>>>> those cases? > >>>>>>>>> > >>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this > >>>>>>>>> should be chronically persistent as I understand ;-) > >>>>>>>> > >>>>>>>> Then I will wait till someone educate me :) > >>>>>>> > >>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do > not want to call this for all tlbwe operation unless it is necessary. > >>>>>> > >>>>>> It certainly does more than we need and potentially slows down the fast > path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to > check for pages that are declared reserved on the host. This happens in 2 cases: > >>>>>> > >>>>>> 1) Non cache coherent DMA > >>>>>> 2) Memory hot remove > >>>>>> > >>>>>> The non coherent DMA case would be interesting, as with the mechanism as > it is in place in Linux today, we could potentially break normal guest operation > if we don't take it into account. However, it's Kconfig guarded by: > >>>>>> > >>>>>> depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON > >>>>>> default n if PPC_47x > >>>>>> default y > >>>>>> > >>>>>> so we never hit it with any core we care about ;). > >>>>>> > >>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry > about that one either. > >>>>> > >>>>> Thanks for this good information :) > >>>>> > >>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside > kvm_is_mmio_pfn() to make sure that check is only valid when that is really > needed? This can decrease those unnecessary performance loss. > >>>>> > >>>>> If I'm wrong please correct me :) > >>>> > >>>> You're perfectly right, but this is generic KVM code. So it gets run across > all architectures. What if someone has the great idea to add a new case here for > x86, but doesn't tell us? In that case we potentially break x86. > >>>> > >>>> I'd rather not like to break x86 :). > >>>> > >>>> However, it'd be very interesting to see a benchmark with this change. Do > you think you could just rip out the whole reserved check and run a few > benchmarks and show us the results? > >>>> > >>> > >>> Often what case should be adopted to validate this scenario? > >> > >> Something which hammers the TLB emulation heavily. I usually just run > >> /bin/echo a thousand times in "time" and see how long it takes ;) > >> > > > > I tried to run five times with this combination, "time `for ((i=0; i<5000; > i++)); do /bin/echo; done`", to calculate the average value with this change: > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index > > 1580dd4..5e8635b 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -102,6 +102,10 @@ static bool largepages_enabled = true; > > > > bool kvm_is_mmio_pfn(pfn_t pfn) > > { > > +#ifdef CONFIG_MEMORY_HOTPLUG > > I'd feel safer if we narrow this down to e500. > > > + /* > > + * Currently only in memory hot remove case we may still need this. > > + */ > > if (pfn_valid(pfn)) { > > We still have to check for pfn_valid, no? So the #ifdef should be down here. > > > int reserved; > > struct page *tail = pfn_to_page(pfn); @@ -124,6 +128,7 > > @@ bool kvm_is_mmio_pfn(pfn_t pfn) > > } > > return PageReserved(tail); > > } > > +#endif > > > > return true; > > } > > > > Before apply this change: > > > > real (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= > 1m21.376s > > user (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= > 0m23.433s > > sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s > > > > After apply this change: > > > > real (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= > 1m20.667s > > user (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= > 0m22.615s > > sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s > > > > So, > > > > real (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6% > > user (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5% > > sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7% > > Very nice, so there is a real world performance benefit to doing this. Then yes, > I think it would make sense to change the global helper function to be fast on > e500 and use that one from e500_shadow_mas2_attrib() instead. Are not we going to use page_is_ram() from e500_shadow_mas2_attrib() as Scott commented? -Bharat > > Gleb, Paolo, any hard feelings? > > > Alex > ��.n��������+%������w��{.n�����o�^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�