On Thu, 18 Nov 2021, Juergen Gross wrote: > On 18.11.21 09:47, Jan Beulich wrote: > > On 18.11.2021 06:32, Juergen Gross wrote: > > > On 18.11.21 03:37, Stefano Stabellini wrote: > > > > --- a/drivers/xen/xenbus/xenbus_probe.c > > > > +++ b/drivers/xen/xenbus/xenbus_probe.c > > > > @@ -951,6 +951,28 @@ static int __init xenbus_init(void) > > > > err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v); > > > > if (err) > > > > goto out_error; > > > > + /* > > > > + * Return error on an invalid value. > > > > + * > > > > + * Uninitialized hvm_params are zero and return no error. > > > > + * Although it is theoretically possible to have > > > > + * HVM_PARAM_STORE_PFN set to zero on purpose, in reality it > > > > is > > > > + * not zero when valid. If zero, it means that Xenstore hasn't > > > > + * been properly initialized. Instead of attempting to map a > > > > + * wrong guest physical address return error. > > > > + */ > > > > + if (v == 0) { > > > > > > Make this "if (v == ULONG_MAX || v== 0)" instead? > > > This would result in the same err on a new and an old hypervisor > > > (assuming we switch the hypervisor to init params with ~0UL). Sure, I can do that > > > > + err = -ENOENT; > > > > + goto out_error; > > > > + } > > > > + /* > > > > + * ULONG_MAX is invalid on 64-bit because is INVALID_PFN. > > > > + * On 32-bit return error to avoid truncation. > > > > + */ > > > > + if (v >= ULONG_MAX) { > > > > + err = -EINVAL; > > > > + goto out_error; > > > > + } > > > > > > Does it make sense to continue the system running in case of > > > truncation? This would be a 32-bit guest with more than 16TB of RAM > > > and the Xen tools decided to place the Xenstore ring page above the > > > 16TB boundary. This is a completely insane scenario IMO. > > > > > > A proper panic() in this case would make diagnosis of that much > > > easier (me having doubts that this will ever be hit, though). > > > > While I agree panic() may be an option here (albeit I'm not sure why > > that would be better than trying to cope with 0 and hence without > > I could imagine someone wanting to run a guest without Xenstore access, > which BTW will happen in case of a guest created by the hypervisor at > boot time. > > > xenbus), I'd like to point out that the amount of RAM assigned to a > > guest is unrelated to the choice of GFNs for the various "magic" > > items. > > Yes, but this would still be a major tools problem which probably > would render the whole guest rather unusable. First let's distinguish between an error due to "hvm_param not initialized" and an error due to more serious conditions, such as "pfn above max". "hvm_param not initialized" could mean v == 0 (as it would be today) or v == ~0UL (if we change Xen to initialize all hvm_param to ~0UL). I don't think we want to panic in these cases as they are not actually true erroneous configurations. We should just stop trying to initialize xenstore and continue with the rest. The "pfn above max" case could happen if v is greater than the max pfn. This is a true error in the configuration because the toolstack should know that the guest is 32-bit so it should give it a pfn that the guest is able to use. As Jan wrote in another email, for 32-bit the actual limit depends on the physical address bits but actually Linux has never been able to cope with a pfn > ULONG_MAX on 32-bit because xen_store_gfn is defined as unsigned long. So Linux 32-bit has been truncating HVM_PARAM_STORE_PFN all along. There is also an argument that depending on kconfig Linux 32-bit might only be able to handle addresses < 4G, so I don't think the toolstack can assume that a 32-bit guest is able to cope with HVM_PARAM_STORE_PFN > ULONG_MAX. If Linux is 32-bit and HVM_PARAM_STORE_PFN > ULONG_MAX, even if HVM_PARAM_STORE_PFN < address_bits_max I think it would be fair to still consider it an error, but I can see it could be argued either way. Certainly if HVM_PARAM_STORE_PFN > address_bits_max is an error. In any case, I think it is still better for Linux to stop trying to initialize Xenstore but continue with the rest because there is a bunch of other useful things Linux can do without it. Panic should only be the last resort if there is nothing else to do. In this case we haven't even initialized the service and the service is not essential, at least it is not essential in certain ARM setups. So in conclusion, I think this patch should: - if v == 0 return error (uninitialized) - if v == ~0ULL (INVALID_PFN) return error (uinitialized) - if v >= ~0UL (32-bit) return error (even if this case could be made to work for v < max_address_bits depending on kconfig) Which leads to something like: /* uninitialized */ if (v == 0 || v == ~0ULL) { err = -ENOENT; goto out_error; } /* * Avoid truncation on 32-bit. * TODO: handle addresses >= 4G */ if ( v >= ~0UL ) { err = -EINVAL; goto out_error; }