Re: [patch V3 01/13] entry: Provide generic syscall entry functionality

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Mon, 20 Jul 2020 08:50:02 +0200

Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>> On Jul 19, 2020, at 3:17 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> 
>> Andy Lutomirski <luto@xxxxxxxxxx> writes:
>>>> On Sat, Jul 18, 2020 at 7:16 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>>> Andy Lutomirski <luto@xxxxxxxxxx> writes:
>>>>> FWIW, TIF_USER_RETURN_NOTIFY is a bit of an odd duck: it's an
>>>>> entry/exit word *and* a context switch word.  The latter is because
>>>>> it's logically a per-cpu flag, not a per-task flag, and the context
>>>>> switch code moves it around so it's always set on the running task.
>>>> 
>>>> Gah, I missed the context switch thing of that. That stuff is hideous.
>>> 
>>> It's also delightful because anything that screws up that dance (such
>>> as failure to do the exit-to-usermode path exactly right) likely
>>> results in an insta-root-hole.  If we fail to run user return
>>> notifiers, we can run user code with incorrect syscall MSRs, etc.
>> 
>> Looking at it deeper, having that thing in the loop is a pointless
>> exercise. This really wants to be done _after_ the loop.
>> 
> As long as we’re confident that nothing after the loop can set the flag again.

Yes, because that's the direct way off to user space.
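
To illustrate the ordering, here is a rough user-space sketch, not the
actual patch. The flag values, the EXIT_LOOP_WORK mask and the handler
bodies are made up for illustration, and the context switch side of
TIF_USER_RETURN_NOTIFY is not modelled. The only point is that the work
loop never looks at the notify flag and that the flag is handled exactly
once after the loop:

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only; not the kernel's values. */
#define TIF_NEED_RESCHED        (1u << 0)
#define TIF_SIGPENDING          (1u << 1)
#define TIF_USER_RETURN_NOTIFY  (1u << 2)

/* Work that has to be handled in the loop (assumed mask). */
#define EXIT_LOOP_WORK          (TIF_NEED_RESCHED | TIF_SIGPENDING)

static unsigned int ti_flags;   /* stands in for the task's thread flags */
static int notifiers_fired;     /* counts how often the notifiers ran */

static void fire_user_return_notifiers(void)
{
	ti_flags &= ~TIF_USER_RETURN_NOTIFY;
	notifiers_fired++;
}

/* Handling one kind of work can raise another, hence the loop. */
static void handle_signal(void)
{
	ti_flags &= ~TIF_SIGPENDING;
	ti_flags |= TIF_NEED_RESCHED;   /* pretend delivery caused a wakeup */
}

static void schedule(void)
{
	ti_flags &= ~TIF_NEED_RESCHED;
}

static void exit_to_user_mode(void)
{
	while (ti_flags & EXIT_LOOP_WORK) {
		if (ti_flags & TIF_NEED_RESCHED)
			schedule();
		if (ti_flags & TIF_SIGPENDING)
			handle_signal();
	}

	/* One shot, after the loop. Nothing below may set the flag again. */
	if (ti_flags & TIF_USER_RETURN_NOTIFY)
		fire_user_return_notifiers();

	/* From here on it is the direct way to user space. */
	assert(!(ti_flags & (EXIT_LOOP_WORK | TIF_USER_RETURN_NOTIFY)));
}
```

With TIF_SIGPENDING and TIF_USER_RETURN_NOTIFY set on entry, the loop
runs twice in this sketch (signal, then resched) and the notifiers fire
once at the end.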

Thanks,

        tglx