On Thu, May 06, 2021 at 03:45:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of May 6, 2021 1:46 am:
> >
> > +static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
> > +				     unsigned long id, unsigned long target,
> > +				     unsigned long type, unsigned long pg_sizes,
> > +				     unsigned long start, unsigned long end)
> > +{
> > +	unsigned long psize;
> > +	struct mmu_psize_def *def;
> > +
> > +	if (!kvm_is_radix(vcpu->kvm))
> > +		return H_UNSUPPORTED;
> > +
> > +	if (end < start)
> > +		return H_P5;
> > +
> > +	/*
> > +	 * Partition-scoped invalidation for nested guests.
> > +	 * Not yet supported
> > +	 */
> > +	if (type & H_RPTI_TYPE_NESTED)
> > +		return H_P3;
> > +
> > +	/*
> > +	 * Process-scoped invalidation for L1 guests.
> > +	 */
> > +	for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
> > +		def = &mmu_psize_defs[psize];
> > +		if (!(pg_sizes & def->h_rpt_pgsize))
> > +			continue;
>
> Not that it really matters but why did you go this approach rather than
> use a bitmask iteration over h_rpt_pgsize?

If you are asking why I am not simply looping over the @pg_sizes bitmask
from the hcall arguments: I was doing that in an earlier version, but
David suggested that it would be good to have the H_RPT_INVALIDATE page
size encodings within mmu_psize_defs[]. Based on that, I now populate
mmu_psize_defs[] during radix page size initialization and use it here
to check which of those page sizes have been set in @pg_sizes. (Sketch
(1) in the postscript below shows roughly how that initialization looks.)

> I would actually prefer to put this loop into the TLB invalidation code
> itself.

Yes, I could easily move it there.

> The reason is that not all flush types are based on page size. You only
> need to do IS=1/2/3 flushes once and it takes out all page sizes.

I see. So we have to do explicit flushing for the different page sizes
only when doing range-based invalidation (IS=0). For the rest of the
cases (IS=1/2/3), that's not necessary. (Sketch (2) below shows the loop
moved into the flush routine along these lines.)

> You don't need to do all these optimisations right now, but it would
> be good to make them possible to implement.

Sure.

> > +void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
> > +			     unsigned long type, unsigned long page_size,
> > +			     unsigned long psize, unsigned long start,
> > +			     unsigned long end)
> > +{
> > +	/*
> > +	 * A H_RPTI_TYPE_ALL request implies RIC=3, hence
> > +	 * do a single IS=1 based flush.
> > +	 */
> > +	if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
> > +		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_ALL);
> > +		return;
> > +	}
> > +
> > +	if (type & H_RPTI_TYPE_PWC)
> > +		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
> > +
> > +	if (start == 0 && end == -1) /* PID */
> > +		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
> > +	else /* EA */
> > +		_tlbie_va_range_lpid(start, end, pid, lpid, page_size,
> > +				     psize, false);
>
> At least one thing that is probably needed is to use the
> single_page_flush_ceiling to flip the va range flush over to a pid
> flush, so the guest can't cause problems in the hypervisor with an
> enormous range.

Yes, that makes sense. I shall do this and the above as later
optimizations. (Sketch (3) below shows the ceiling check.)

Regards,
Bharata.
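P.S. A few rough sketches of the above, for reference. All of these are
untested and the helper/variable names are tentative, so please treat
them as illustrations rather than final code.

(1) Populating the H_RPT_INVALIDATE page size encodings in
mmu_psize_defs[] during radix page size initialization. This assumes a
small helper (called psize_to_rpti_pgsize() here) that maps a Linux
page size index to the corresponding H_RPTI_PAGE_* encoding:

/* Tentative helper: map a linux psize index to H_RPTI_PAGE_* */
static inline unsigned long psize_to_rpti_pgsize(unsigned long psize)
{
	if (psize == MMU_PAGE_4K)
		return H_RPTI_PAGE_4K;
	if (psize == MMU_PAGE_64K)
		return H_RPTI_PAGE_64K;
	if (psize == MMU_PAGE_2M)
		return H_RPTI_PAGE_2M;
	if (psize == MMU_PAGE_1G)
		return H_RPTI_PAGE_1G;
	return H_RPTI_PAGE_ALL;
}

/* ...and during radix page size initialization: */
for (psize = 0; psize < MMU_PAGE_COUNT; psize++)
	mmu_psize_defs[psize].h_rpt_pgsize = psize_to_rpti_pgsize(psize);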
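(2) Moving the page size loop into the flush routine itself, so that
the page-size-independent IS=1 flushes are issued only once and the
per-page-size iteration is confined to the range-based (IS=0) case.
The signature change (passing @pg_sizes down instead of a single page
size) is part of the sketch:

void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
			     unsigned long type, unsigned long pg_sizes,
			     unsigned long start, unsigned long end)
{
	unsigned long psize;
	struct mmu_psize_def *def;

	/* H_RPTI_TYPE_ALL implies RIC=3: one IS=1 flush takes out everything */
	if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_ALL);
		return;
	}

	if (type & H_RPTI_TYPE_PWC)
		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);

	/* Full PID flush is page-size independent: do it once and return */
	if (start == 0 && end == -1) {
		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
		return;
	}

	/* Range-based (IS=0) flush: iterate only the requested page sizes */
	for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
		def = &mmu_psize_defs[psize];
		if (!(pg_sizes & def->h_rpt_pgsize))
			continue;
		_tlbie_va_range_lpid(start, end, pid, lpid,
				     (1UL << def->shift), psize, false);
	}
}

The hcall handler then no longer needs its own loop and can pass
@pg_sizes straight down.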
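(3) Flipping an enormous EA range over to a PID flush. I am assuming
the existing tlb_single_page_flush_ceiling from radix_tlb.c as the
threshold; the exact knob to use is still to be decided. This would
replace the _tlbie_va_range_lpid() call in the loop of sketch (2),
with an unsigned long nr_pages added to the local declarations:

		/*
		 * If the range is too large, a single PID-wide flush
		 * is cheaper and also bounds the amount of work the
		 * guest can cause in the hypervisor. Once the PID
		 * flush is done, the remaining page sizes are covered
		 * as well, so stop iterating.
		 */
		nr_pages = (end - start) >> def->shift;
		if (nr_pages > tlb_single_page_flush_ceiling) {
			_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
			return;
		}
		_tlbie_va_range_lpid(start, end, pid, lpid,
				     (1UL << def->shift), psize, false);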