On 09/02/17 14:51, David Gibson wrote:
> On Tue, Feb 07, 2017 at 06:17:09PM +1100, Alexey Kardashevskiy wrote:
>> For the emulated devices it does not matter much if we get a broken TCE
>> half way handling a TCE list but for VFIO it will matter as it has
>> more chances to fail so we try to do our best and check as much as we
>> can before proceeding.
>>
>> This separates a guest view table update from validation. No change in
>> behavior is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
>> ---
>>  arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++++++
>>  arch/powerpc/kvm/book3s_64_vio_hv.c | 8 ++++++--
>>  2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 15df8ae627d9..9a7b7fca5e84 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -282,6 +282,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  		ret = kvmppc_tce_validate(stt, tce);
>>  		if (ret != H_SUCCESS)
>>  			goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		if (get_user(tce, tces + i)) {
>> +			ret = H_TOO_HARD;
>> +			goto unlock_exit;
>> +		}
>> +		tce = be64_to_cpu(tce);
>
> This doesn't look safe. The contents of user memory could change
> between the two get_user()s, meaning that you're no longer guaranteed
> a TCE loaded into kernel has been validated at all.
>
> I think you need to either:
>
> a) Make sure things safe against a bad TCE being loaded into a TCE
>    table and move all validation to where the TCE is used, rather
>    than loaded
>
> or
>
> b) Copy the whole set of indirect entries to a temporary in-kernel
>    buffer, then validate, then load into the actual TCE table.

Correct :( The problem is I do not know how far I want to go in reverting
the state as it was when I started handling H_PUT_TCE_INDIRECT.
For example: 1 container, 2 IOMMU groups with shared tables disabled, so
2 tables, a 512-TCE request, and TCE#100 does not translate to a host
physical address.

To do a) I'll need to remember the old content of each hardware table
entry: when I reach TCE#100, I'll need to revert to the initial state,
which means writing the old TCEs back to all affected hardware tables and
updating the reference counters of all affected preregistered areas.
Well, the actual tables must not have different addresses (BUG_ON? is it
worth testing, while writing to the hardware tables, that the values I am
replacing are the same in all tables?) so I can have just a single array
of old TCEs from the hardware tables in vcpu.

To do b) I'll need:

1. a copy of the TCEs from the guest in vcpu; I populate it via
get_user() to make sure they won't change;

2. an array of userspace addresses translated from the given TCEs; and
in order to make sure these addresses won't go away, I'll need to
reference each preregistered memory area via mm_iommu_mapped_inc().
When I reach TCE#100, I'll have to revert the change, i.e. call
mm_iommu_mapped_dec().

So I will end up having 2 arrays in a vcpu and simpler reverting code.

Or I can do a simpler version of b) which would store the guest TCEs in
kvm_vcpu_arch::tces[512] and use them after checking. If a malicious
guest does something bad and I return from H_PUT_TCE_INDIRECT in the
middle of a request, some preregistered regions will stay referenced
till the guest is killed or rebooted (and this will prevent the memory
from being unregistered) - but this means no harm to the host; and with
preregistered RAM, there is no valid reason for H_PUT_TCE_INDIRECT to
fail for a well-behaved guest.

Which approach to pick?

LoPAPR says:

===
If the TCE parameter represents the logical page address of a page that
is not valid for the calling partition, return H_Parameter.
===

>>
>>  		kvmppc_tce_put(stt, entry + i, tce);
>>  	}
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 918af76ab2b6..f8a54b7c788e 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -237,7 +237,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  {
>>  	struct kvmppc_spapr_tce_table *stt;
>>  	long i, ret = H_SUCCESS;
>> -	unsigned long tces, entry, ua = 0;
>> +	unsigned long tces, entry, tce, ua = 0;
>>  	unsigned long *rmap = NULL;
>>
>>  	stt = kvmppc_find_table(vcpu->kvm, liobn);
>> @@ -279,11 +279,15 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  	}
>>
>>  	for (i = 0; i < npages; ++i) {
>> -		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>> +		tce = be64_to_cpu(((u64 *)tces)[i]);
>>
>>  		ret = kvmppc_tce_validate(stt, tce);
>>  		if (ret != H_SUCCESS)
>>  			goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		tce = be64_to_cpu(((u64 *)tces)[i]);
>
> Same problem here.
>
>>
>>  		kvmppc_tce_put(stt, entry + i, tce);
>>  	}

-- 
Alexey