On Thu, Apr 04, 2019 at 08:33:51PM +0100, Andrew Murray wrote:
> On Thu, Apr 04, 2019 at 05:21:28PM +0100, Will Deacon wrote:
> > On Thu, Mar 28, 2019 at 10:37:31AM +0000, Andrew Murray wrote:
> > > +exclude_kernel
> > > +--------------
> > > +
> > > +This attribute excludes the kernel.
> > > +
> > > +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
> > > +at EL1.
> > > +
> > > +This attribute will exclude EL1 and additionally EL2 on a VHE system.
> >
> > I find this last sentence a bit confusing, because it can be read to imply
> > that if you don't set exclude_kernel and you're in a guest on a VHE system,
> > then you can profile EL2.
>
> Yes this could be misleading.
>
> However from the perspective of the guest, when exclude_kernel is not set we
> do indeed allow the guest to program its PMU with ARMV8_PMU_INCLUDE_EL2 - and
> thus the statement above is correct in terms of what the kernel believes it is
> doing.
>
> I think these statements are less confusing if we treat the exception levels
> as those 'detected' by the running context (e.g. consider the impact of nested
> virt here) - and if we ignore what the hypervisor (KVM) does outside (e.g.
> stops counting upon switching between guest/host, translating PMU filters in
> kvm_pmu_set_counter_event_type etc, etc). This then makes this document useful
> for those wishing to change this logic (which is the intent) rather than those
> trying to understand how we filter for EL levels as seen bare-metal.
>
> With regards to the example you gave (exclude_kernel, EL2) - yes we want the
> kernel to believe it can count EL2 - because one day we may want to update
> KVM to allow the guest to count its hypervisor overhead (e.g. host kernel
> time associated with the guest).

If we were to support this in the future, then exclude_hv will suddenly start
meaning something in a guest, so this could be considered to be an ABI break.

> I could write some preface that describes this outlook.
> Alternatively I could just spell out what happens on a guest, e.g.
>
> "For the host this attribute will exclude EL1 and additionally EL2 on a VHE
> system.
>
> For the guest this attribute will exclude EL1."
>
> Though I'm less comfortable with this, as the last statement "For the guest this
> attribute will exclude EL1." describes the product of both
> kvm_pmu_set_counter_event_type and armv8pmu_set_event_filter which is confusing
> to work out and also makes an assumption that we don't have nested virt (true
> for now at least) and also reasons about bare-metal EL levels which probably
> aren't that useful for someone changing this logic or understanding what the
> flags do for their performance analysis.
>
> Do you have a preference for how this is improved?

I think you should be explicit about what is counted. If we don't count EL2
when profiling in a guest (regardless of the exclude_* flags), then we should
say that. By not documenting this we don't actually buy ourselves room to
change things in future, we just have an emergent behaviour which isn't
covered by our docs.

Will
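
To illustrate what an explicit statement of the exclude_kernel behaviour might
cover, here is a minimal Python sketch of the host/guest wording proposed in the
thread. It is a toy model, not kernel code: the `Context` type and the
`excluded_by_exclude_kernel` function are hypothetical names invented for this
example, the exception levels are those seen bare-metal, and nested
virtualisation is assumed to be absent.

```python
from typing import FrozenSet, NamedTuple


class Context(NamedTuple):
    """Where the perf event is opened (hypothetical type for this sketch)."""
    in_guest: bool  # True when profiling from inside a KVM guest
    vhe: bool       # True when the host is a VHE system


def excluded_by_exclude_kernel(ctx: Context) -> FrozenSet[str]:
    """Bare-metal exception levels that exclude_kernel stops counting.

    Models only the wording quoted in the thread; assumes no nested virt.
    """
    if ctx.in_guest:
        # "For the guest this attribute will exclude EL1."
        # Guest kernels always run at EL1, so VHE makes no difference here.
        return frozenset({"EL1"})
    if ctx.vhe:
        # "For the host this attribute will exclude EL1 and additionally
        # EL2 on a VHE system." (the host kernel runs at EL2 with VHE)
        return frozenset({"EL1", "EL2"})
    # Without VHE the host kernel runs at EL1.
    return frozenset({"EL1"})


if __name__ == "__main__":
    for ctx in (Context(in_guest=False, vhe=True),
                Context(in_guest=False, vhe=False),
                Context(in_guest=True, vhe=True)):
        print(ctx, sorted(excluded_by_exclude_kernel(ctx)))
```

The sketch deliberately says nothing about whether EL2 is ever counted from
inside a guest; that is the behaviour the reply above asks to have documented
explicitly rather than left implicit.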