Hello Fabiano, Thanks for reviewing! (answers inline) On Fri, 2021-02-05 at 10:09 -0300, Fabiano Rosas wrote: > Leonardo Bras <leobras.c@xxxxxxxxx> writes: > > > Before guest entry, TBU40 register is changed to reflect guest timebase. > > After exitting guest, the register is reverted to it's original value. > > > > If one tries to get the timestamp from host between those changes, it > > will present an incorrect value. > > > > An example would be trying to add a tracepoint in > > kvmppc_guest_entry_inject_int(), which depending on last tracepoint > > acquired could actually cause the host to crash. > > > > Save the Timebase Offset to PACA and use it on sched_clock() to always > > get the correct timestamp. > > > > Signed-off-by: Leonardo Bras <leobras.c@xxxxxxxxx> > > Suggested-by: Paul Mackerras <paulus@xxxxxxxxxx> > > --- > > Changes since v1: > > - Subtracts offset only when CONFIG_KVM_BOOK3S_HANDLER and > > CONFIG_PPC_BOOK3S_64 are defined. > > --- > > arch/powerpc/include/asm/kvm_book3s_asm.h | 1 + > > arch/powerpc/kernel/asm-offsets.c | 1 + > > arch/powerpc/kernel/time.c | 8 +++++++- > > arch/powerpc/kvm/book3s_hv.c | 2 ++ > > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++ > > 5 files changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h > > index 078f4648ea27..e2c12a10eed2 100644 > > --- a/arch/powerpc/include/asm/kvm_book3s_asm.h > > +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h > > @@ -131,6 +131,7 @@ struct kvmppc_host_state { > > u64 cfar; > > u64 ppr; > > u64 host_fscr; > > + u64 tb_offset; /* Timebase offset: keeps correct > > timebase while on guest */ > > Couldn't you use the vc->tb_offset_applied for this? We have a reference > for the vcore in the hstate already. But it's a pointer, which means we would have to keep checking for NULL every time we need sched_clock(). Potentially it would cost a cache miss for PACA memory region that contain vc, another for getting the part of *vc that contains the tb_offset_applied, instead of only one for PACA struct region that contains tb_offset. On the other hand, it got me thinking: If the offset is applied per cpu, why don't we get this info only in PACA, instead of in vc? It could be a general way to get an offset applied for any purpose and still get the sched_clock() right. (Not that I have any idea of any other purpose we could use it) Best regards! Leonardo Bras > > > #endif > > }; > > > > diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c > > index b12d7c049bfe..0beb8fdc6352 100644 > > --- a/arch/powerpc/kernel/asm-offsets.c > > +++ b/arch/powerpc/kernel/asm-offsets.c > > @@ -706,6 +706,7 @@ int main(void) > > HSTATE_FIELD(HSTATE_CFAR, cfar); > > HSTATE_FIELD(HSTATE_PPR, ppr); > > HSTATE_FIELD(HSTATE_HOST_FSCR, host_fscr); > > + HSTATE_FIELD(HSTATE_TB_OFFSET, tb_offset); > > #endif /* CONFIG_PPC_BOOK3S_64 */ > > > > #else /* CONFIG_PPC_BOOK3S */ > > diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c > > index 67feb3524460..f27f0163792b 100644 > > --- a/arch/powerpc/kernel/time.c > > +++ b/arch/powerpc/kernel/time.c > > @@ -699,7 +699,13 @@ EXPORT_SYMBOL_GPL(tb_to_ns); > > */ > > notrace unsigned long long sched_clock(void) > > { > > - return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift; > > + u64 tb = get_tb() - boot_tb; > > + > > +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_HANDLER) > > + tb -= local_paca->kvm_hstate.tb_offset; > > +#endif > > + > > + return mulhdu(tb, tb_to_ns_scale) << tb_to_ns_shift; > > } > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > > index b3731572295e..c08593c63353 100644 > > --- a/arch/powerpc/kvm/book3s_hv.c > > +++ b/arch/powerpc/kvm/book3s_hv.c > > @@ -3491,6 +3491,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, > > if ((tb & 0xffffff) < (new_tb & 0xffffff)) > > mtspr(SPRN_TBU40, new_tb + 0x1000000); > > vc->tb_offset_applied = vc->tb_offset; > > + local_paca->kvm_hstate.tb_offset = vc->tb_offset; > > } > > > > if (vc->pcr) > > @@ -3594,6 +3595,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, > > if ((tb & 0xffffff) < (new_tb & 0xffffff)) > > mtspr(SPRN_TBU40, new_tb + 0x1000000); > > vc->tb_offset_applied = 0; > > + local_paca->kvm_hstate.tb_offset = 0; > > } > > > > mtspr(SPRN_HDEC, 0x7fffffff); > > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > index b73140607875..8f7a9f7f4ee6 100644 > > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > > @@ -632,6 +632,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300) > > cmpdi r8,0 > > beq 37f > > std r8, VCORE_TB_OFFSET_APPL(r5) > > + std r8, HSTATE_TB_OFFSET(r13) > > mftb r6 /* current host timebase */ > > add r8,r8,r6 > > mtspr SPRN_TBU40,r8 /* update upper 40 bits */ > > @@ -1907,6 +1908,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > > beq 17f > > li r0, 0 > > std r0, VCORE_TB_OFFSET_APPL(r5) > > + std r0, HSTATE_TB_OFFSET(r13) > > mftb r6 /* current guest timebase */ > > subf r8,r8,r6 > > mtspr SPRN_TBU40,r8 /* update upper 40 bits */