Re: [RFC PATCH 2/2] tracing/user_events: Fixup enable faults asyncly

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Sun, 30 Oct 2022 07:45:33 -0400

On 2022-10-29 10:40, Mathieu Desnoyers wrote:

On 2022-10-28 18:42, Beau Belgrave wrote:

On Fri, Oct 28, 2022 at 06:19:10PM -0400, Mathieu Desnoyers wrote:

On 2022-10-27 18:40, Beau Belgrave wrote:

When events are enabled within the various tracing facilities, such as
ftrace/perf, the event_mutex is held. As events are enabled pages are
accessed. We do not want page faults to occur under this lock. Instead
queue the fault to a workqueue to be handled in a process context safe
way without the lock.

The enable address is disabled while the async fault-in occurs. This
ensures that we don't attempt fault-in more than is necessary. Once the
page has been faulted in, the address write is attempted again. If the
page couldn't fault-in, then we wait until the next time the event is
enabled to prevent any potential infinite loops.

I'm also unclear about how the system call initiating the enabled state
change is delayed (or not) when a page fault is queued.

It's not, if needed we could call schedule_delayed_work(). However, I
don't think we need it. When pin_user_pages_remote is invoked, it's with
FOLL_NOFAULT. This will tell us if we need to fault pages in, we then
call fixup_user_fault with unlocked value passed. This will cause the
fixup to retry and get the page in.

It's called out in the comments for this exact purpose (lucked out
here):
mm/gup.c

  * This is meant to be called in the specific scenario where for 

locking reasons

  * we try to access user memory in atomic context (within a 

pagefault_disable()

  * section), this returns -EFAULT, and we want to resolve the user 

fault before

  * trying again.

The fault in happens in a workqueue, this is the same way KVM does it's
async page fault logic, so it's not a new pattern. As soon as the
fault-in is done, we have a page we should be able to use, so we
re-attempt the write immediately. If the write fails, another queue
happens and we could loop, but not like the unmap() case I replied with
earlier.

I would expect that when a page fault is needed, after enqueuing work 

to the

worker thread, the system call initiating the state change would somehow
wait for a completion (after releasing the user events mutex). That

completion would be signaled by the worker thread either if the page 

fault

fails, or if the state change is done.

I didn't see a generic fault-in + notify anywhere, do you know of one I
could use? Otherwise, it seems the pattern used is check fault, fault-in
via workqueue, re-attempt.

I don't think we're talking about the same thing. I'll try stating my 

concern in a different way.

user_event_enabler_write() calls user_event_enabler_queue_fault() when 

it cannot perform the enabled state update immediately (because a page 

fault is needed).

But then AFAIU it returns immediately to the caller. The caller could 

very well expect that the event has been enabled, as requested, 

immediately when the enabler write returns. The fact that enabling the 

event can be delayed for an arbitrary amount of time due to page faults 

means that we have no hard guarantee that the event is enabled as 

requested upon return to the caller.

I'd like to add a completion there, to be waited for after 

user_event_enabler_queue_fault(), but before returning from 

user_event_enabler_create(). Waiting for the completion should be done 

without any mutexes held, so after releasing event_mutex.

The completion would be placed on the stack of 

user_event_enabler_create(), and a reference to the completion would be 

placed in the queued fault request. The completion notification would be 

emitted by the worker thread either when enabling is done, or if a page 

fault fails.

See include/linux/completion.h

Thoughts ?

Actually, after further thinking, I wonder if we need a worker thread at 

all to perform the page faults.

Could we simply handle the page faults from user_event_enabler_create() 

by releasing the mutex and re-trying ? From what contexts is 

user_event_enabler_create() called ? (any locks taken ? system call 

context ? irqs and preemption enabled or disabled ?)

Thanks,

Mathieu

Thanks,

Mathieu

Thoughts ?

Thanks,

Mathieu

Thanks,
-Beau

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com