On 9/25/20 6:28 PM, Anchal Agarwal wrote:
> On Fri, Sep 25, 2020 at 04:02:58PM -0400, boris.ostrovsky@xxxxxxxxxx wrote:
>> On 9/25/20 3:04 PM, Anchal Agarwal wrote:
>>> On Tue, Sep 22, 2020 at 11:17:36PM +0000, Anchal Agarwal wrote:
>>>> On Tue, Sep 22, 2020 at 12:18:05PM -0400, boris.ostrovsky@xxxxxxxxxx wrote:
>>>>> On 9/21/20 5:54 PM, Anchal Agarwal wrote:
>>>>> Also, wrt KASLR stuff, that issue is still seen sometimes but I haven't had
>>>>> bandwidth to dive deep into the issue and fix it.
>>
>> So what's the plan there? You first mentioned this issue early this year and judged by your response it is not clear whether you will ever spend time looking at it.
>>
> I do want to fix it and did do some debugging earlier this year, just haven't
> gotten back to it. Also, wanted to understand if the issue is a blocker to this
> series?

Integrating code with known bugs is less than ideal.

3% failure for this feature seems to be a manageable number from the reproducibility perspective --- you should be able to script this and each iteration should take way under a minute, no?

> I had some theories when debugging around this like if the random base address picked by kaslr for the
> resuming kernel mismatches the suspended kernel and just jogging my memory, I didn't find that as the case.
> Another hunch was if physical address of registered vcpu info at boot is different from what suspended kernel
> has and that can cause CPUs to get stuck when coming online.

I'd think if this were the case you'd have 100% failure rate.
And we are also re-registering vcpu info on xen restore and I am not aware of any failures due to KASLR.

> The issue was only
> reproducible 3% of the time out of 3000 runs, hence it's hard to just reproduce this.
>
> Moreover, I also wanted to get an insight on if hibernation works correctly with KASLR
> generally and it's only Xen causing the issue?

With KASLR being on by default I'd be surprised if it didn't.

-boris
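
A rough sketch of the reproducibility arithmetic discussed above, not something taken from the thread itself. It assumes each hibernate/resume cycle fails independently with probability 0.03 (the thread only reports the aggregate 3% over 3000 runs, so independence is an assumption) and takes 60 seconds as an upper bound per iteration. The function name `runs_needed` is hypothetical.

```python
import math


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of runs N such that the chance of seeing at least
    one failure, 1 - (1 - failure_rate)**N, reaches `confidence`.

    Assumes every run fails independently with the same probability.
    """
    if not (0.0 < failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    rate = 0.03             # failure rate reported in the thread
    seconds_per_run = 60    # assumed upper bound per iteration
    for conf in (0.95, 0.99):
        n = runs_needed(rate, conf)
        minutes = n * seconds_per_run / 60
        print(f"{conf:.0%} chance of a hit: {n} runs, at most ~{minutes:.0f} min")
```

Under these assumptions, a scripted loop of roughly a hundred to a hundred and fifty cycles would very likely hit the failure at least once.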