On Fri, May 25, 2018 at 10:45:17AM +0100, Dave Martin wrote: > On Fri, May 25, 2018 at 11:00:20AM +0200, Christoffer Dall wrote: > > On Thu, May 24, 2018 at 03:37:15PM +0100, Dave Martin wrote: > > > On Thu, May 24, 2018 at 12:06:59PM +0200, Christoffer Dall wrote: > > > > On Thu, May 24, 2018 at 10:50:56AM +0100, Dave Martin wrote: > > [...] > > > > > I'm not sure what the reader is to make of that. Do you not mean the > > > > TIF_FOREIGN_FPSTATE is always true for kernel threads? > > > > > > Again, this is probably a red herring. TIF_FOREIGN_FPSTATE is always > > > true for kernel threads prior to the patch, except (randomly) for the > > > init task. > > > > That was really what my initial question was about, and what I thought > > the commit message should make abundantly clear, because that ties the > > message together with the code. > > > > > > > > This change is not really about TIF_FOREIGN_FPSTATE at all, rather > > > that there is nothing to justify handling kernel threads differently, > > > or even distinguishing kernel threads from user threads at all in this > > > code. > > > > Understood. > > And my bad was that I hadn't gone to the effort of understanding my own > argument -- I'd glad to be called out on that. > > > > Part of the confusion (and I had confused myself) comes from the fact > > > that TIF_FOREIGN_FPSTATE is really a per-cpu property and doesn't make > > > sense as a per-task property -- i.e., the flag is meaningless for > > > scheduled-out tasks and we must explicitly "repair" it when scheduling > > > a task in anyway. I think it's a thread flag primarily so that it's > > > convenient to check alongside other thread flags in the ret_to_user > > > work loop. This is somewhat less of a justification now that loop was > > > ported to C. > > > > > > > > > > > > > The context switch logic is already deliberately optimised to defer > > > > > reloads of the regs until ret_to_user (or sigreturn as a special > > > > > case), and save them only if they have been previously loaded. > > > > > > Does it help to insert the following here? > > > > > > "These paths are the only places where the wrong_task and wrong_cpu > > > conditions can be made false, by calling fpsimd_bind_task_to_cpu()." > > > > > > > yes it does. > > > > > > > Kernel threads by definition never reach these paths. As a result, > > > > > > > > I'm struggling with the "As a result," here. Is this because reloads of > > > > regs in ret_to_user (or sigreturn) are the only places that can make > > > > wrong_cpu or wrong_task be false? > > > > > > See the proposed clarification above. Is that sufficient? > > > > > > > yes. > > > > > > (I'm actually wanting to understand this, not just bikeshedding the > > > > commit message, as new corner cases keep coming up on this logic.) > > > > > > That's a good thing, and I would really like to explain it in a > > > concise manner. See [*] below for the "concise" explanation -- it may > > > demonstrate why I've been evasive... > > > > > > > I don't think you've been evasive at all, I just think we reason about > > this in slightly different ways, and I was trying to convince myself why > > this change is safe and summarize that concisely. I think we've > > accomplished both :) > > OK, good. I reposted speculatively on this basis :) > > The commit message is in better shape now, and I very much appreciate > you kicking the tyres on my reasoning! > > [...] > > > > As an aside, the big wall of text before the definition of struct > > > fpsimd_last_state_struct is looking out of date and could use an > > > update to cover at least some of what is explained in [*] better. > > > > > > I'm currently considering that out of scope for this series, but I will > > > keep it in mind to refresh it in the not too distant future. > > > > > > > Fine with me. > > OK, good. > > [...] > > > > [*] The bigger picture: > > > > > > * Consider a relation (C,T) between cpus C and tasks T, such that > > [...] > > > > but by assuming that the code is already well-optimised, "unnecessary" > > > save/restore work will not be added. If this were not the case, it > > > could in any case be fixed independently. > > > > > > The observation of this _series_ is that we don't need to do very > > > much in order to be able to generalise the logic to accept KVM vcpus > > > in place of T. > > > > > > > Thanks for the explanation. > > -Christoffer > > Was this reasonably understandable? If so I could use it as a basis for > improving the comment block in fpsimd.c, but I'd want to squash it down > to the essentials. It's pretty verbose as it stands. Yes, I think that's a resonable way forward. The thing that I hadn't fully appreciated before is that you may have a valid relation (C,T) which you wish to invalidate whilst T may not be running on C at that particular time. > > (What I'd really like to do it take an axe to the logic so that we > end up with something that doesn't require anything like this amount > of explanation ... but that's more of an aspiration right now.) > I'll be happy to review a potentially simplified design, should you come up with one at some point in the future. Thanks, -Christoffer _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm