Re: [PATCH v4] doc: brief user documentation for completion

Ingo Molnar <mingo@xxxxxxxxxx> · Wed, 18 Feb 2015 23:47:15 +0100

* Nicholas Mc Guire <der.herr@xxxxxxx> wrote:

> Signed-off-by: Nicholas Mc Guire <der.herr@xxxxxxx>
> ---
> 
> v3: cleanups and merged review notes from Jonathan Corbet <corbet@xxxxxxx>
> v4: english cleanup and an important ommission on scope from
>     Valdis Kletniek <Valdis.Kletnieks@xxxxxx>
> 
>  Documentation/scheduler/completion.txt |  245 ++++++++++++++++++++++++++++++++
>  1 file changed, 245 insertions(+)
>  create mode 100644 Documentation/scheduler/completion.txt
> 
> diff --git a/Documentation/scheduler/completion.txt b/Documentation/scheduler/completion.txt
> new file mode 100644
> index 0000000..72cbda2
> --- /dev/null
> +++ b/Documentation/scheduler/completion.txt
> @@ -0,0 +1,245 @@
> +completions - wait for completion handling
> +==========================================
> +
> +This document was originally written based on 3.18.0 (linux-next)
> +
> +Introduction:
> +-------------
> +
> +If you have one or more threads of execution that must wait for some process
> +to have reached a point or a specific state, completions can provide a race
> +free solution to this problem. Semantically they are somewhat like a
> +pthread_barriers and have similar use-cases.
> +
> +Completions are a code synchronization mechanism that is preferable to any

s/that is/which are ?

> +misuse of locks. Any time you think of using yield() or some quirky
> +msleep(1); loop to allow something else to proceed, you probably want to
> +look into using one of the wait_for_completion*() calls instead. The
> +advantage of using completions is clear intent of the code but also more

s/code but/code, but

> +efficient code as both threads can continue until the result is actually
> +needed.
> +
> +Completions are built on top of the generic event infrastructure in Linux,
> +with the event reduced to a simple flag appropriately called "done" in
> +struct completion, that tells the waiting threads of execution if they
> +can continue safely.
> +
> +As completions are scheduling related the code is found in

s/related the/related, the

> +kernel/sched/completion.c - for details on completion design and
> +implementation see completions-design.txt
> +
> +
> +Usage:
> +------
> +
> +There are three parts to the using completions, the initialization of the

s/to the using/to using

> +struct completion, the waiting part through a call to one of the variants of
> +wait_for_completion() and the signaling side through a call to complete(),
> +or complete_all(). Further there are some helper functions for checking the
> +state of completions.
> +
> +To use completions one needs to include <linux/completion.h> and
> +create a variable of type struct completion. The structure used for
> +handling of completions is:
> +
> +	struct completion {
> +		unsigned int done;
> +		wait_queue_head_t wait;
> +	};
> +
> +providing the wait queue to place tasks on for waiting and the flag for
> +indicating the state of affairs.
> +
> +Completions should be named to convey the intent of the waiter.  A good
> +example is:
> +
> +	wait_for_completion(&early_console_added);
> +
> +	complete(&early_console_added);
> +
> +Good naming (as always) helps code readability.
> +
> +
> +Initializing completions:
> +-------------------------
> +
> +Initialization of dynamically allocated completions, often embedded in
> +other structures, is done with:
> +
> +	void init_completion(&done);
> +
> +Initialization is accomplished by initializing the wait queue and setting
> +the default state to "not available", that is, "done" is set to 0.
> +
> +The re-initialization function, reinit_completion(), simply resets the
> +done element to "not available", thus again to 0, without touching the
> +wait queue. Calling init_completion() on the same completions object is

s/completions object/completion object

> +most likely a bug as it re-initializes the queue to an empty queue and
> +enqueued tasks could get "lost" - use reinit_completion() in that case.
> +
> +For static declaration and initialization, macros are available. These are:
> +
> +	static DECLARE_COMPLETION(setup_done)
> +
> +used for static declarations in file scope. Within functions the static
> +initialization should always use:
> +
> +	DECLARE_COMPLETION_ONSTACK(setup_done)
> +
> +suitable for automatic/local variables on the stack and will make lockdep
> +happy. Note also that one needs to making *sure* the completion passt to

s/needs to making *sure*/needs to make *sure*/
s/passt/passed ?

> +work threads remains in-scope, and no references remain to on-stack data
> +when the initiating function returns.
> +
> +
> +Waiting for completions:
> +------------------------
> +
> +For a thread of execution to wait for some concurrent work to finish, it
> +calls wait_for_completion() on the initialized completion structure.
> +A typical usage scenario is:
> +
> +
> +	structure completion setup_done;
> +	init_completion(&setup_done);
> +	initialze_work(...,&setup_done,...)
> +
> +	/* run non-dependent code */              /* do setup */
> +
> +	wait_for_completion(&seupt_done);         complete(setup_done)
> +
> +This is not implying any temporal order of wait_for_completion() and the

this does not imply any temporar order on?

> +call to complete() - if the call to complete() happened before the call
> +to wait_for_completion() then the waiting side simply will continue
> +immediately as all dependencies are satisfied.
> +
> +Note that wait_for_completion() is calling spin_lock_irq/spin_unlock_irq
> +so it can only be called safely when you know that interrupts are enabled.
> +Calling it from hard-irq context will result in hard to detect spurious
> +enabling of interrupts.

It's not just about hardirq contexts, but also about 
irqs-off atomic contexts.

> +
> +
> +wait_for_completion():
> +
> +	void wait_for_completion(struct completion *done):
> +
> +The default behavior is to wait without a timeout and mark the task as
> +uninterruptible. wait_for_completion() and its variants are only safe
> +in soft-interrupt or process context but not in hard-irq context.

I don't think wait_for_completion() is safe in softirq 
context ...

> +As all variants of wait_for_completion() can (obviously) block for a long
> +time, you probably don't want to call this with held locks - see also
> +try_wait_for_completion() below.

Not 'probably': you must not call it from atomic contexts, 
held spinlocks, elevated preempt count, disabled irqs, etc.

> +
> +
> +Variants available:
> +-------------------
> +
> +The below variants all return status and this status should be checked in
> +most(/all) cases - in cases where the status is deliberately not checked you
> +probably want to make a note explaining this (e.g. see
> +arch/arm/kernel/smp.c:__cpu_up()).
> +
> +A common problem that occurs is to have unclean assignment of return types,
> +so care should be taken with assigning return-values to variables of proper
> +type. Checking for the specific meaning of return values also has been found
> +to be quite inaccurate e.g. constructs like
> +if(!wait_for_completion_interruptible_timeout(...)) would execute the same

s/if(/if (/

> +code path for successful completion and for the interrupted case - which is
> +probably not what you want.
> +
> +
> +	int wait_for_completion_interruptible(struct completion *done)
> +
> +marking the task TASK_INTERRUPTIBLE. If a signal was received while waiting.
> +It will return -ERESTARTSYS and 0 otherwise.

s/marking/This function marks

The two final sentences should probably be one?

> +
> +
> +	unsigned long wait_for_completion_timeout(struct completion *done,
> +		unsigned long timeout)
> +
> +The task is marked as TASK_UNINTERRUPTIBLE and will wait at most timeout

s/timeout/'timeout'

> +(in jiffies). If timeout occurs it return 0 else the remaining time in

s/it return 0/it returns 0,

> +jiffies (but at least 1). Timeouts are preferably passed by msecs_to_jiffies()
> +or usecs_to_jiffies(). If the returned timeout value is deliberately ignored
> +a comment should probably explain why (e.g. see drivers/mfd/wm8350-core.c
> +wm8350_read_auxadc())
> +
> +
> +	long wait_for_completion_interruptible_timeout(
> +		struct completion *done, unsigned long timeout)
> +
> +passing a timeout in jiffies and marking the task as TASK_INTERRUPTIBLE. If a

s/passing/This function passes

> +signal was received it will return -ERESTARTSYS, 0 if completion timed-out and

s/timed-out/timed out

> +the remaining time in jiffies if completion occurred.
> +
> +Further variants include _killable which passes TASK_KILLABLE as the
> +designated tasks state and will return a -ERESTARTSYS if interrupted or

s/return a -ERESTARTSYS/return -ERESTARTSYS

> +else 0 if completions was achieved as well as a _timeout variant.

s/if completions was achieved/if completion was achieved/

> +
> +	long wait_for_completion_killable(struct completion *done)
> +	long wait_for_completion_killable_timeout(struct completion *done,
> +		unsigned long timeout)
> +
> +
> +The _io variants wait_for_completion_io behave the same as the non-_io

s/wait_for_completion_io/wait_for_completion_io()

> +variants, except for accounting waiting time as waiting on IO, which has
> +an impact on how scheduling is calculated.

s/how scheduling is calculated/how the task is accounted in scheduling stats

> +
> +	void wait_for_completion_io(struct completion *done)
> +	unsigned long wait_for_completion_io_timeout(struct completion *done
> +		unsigned long timeout)
> +
> +
> +Signaling completions:
> +----------------------
> +
> +A thread of execution that wants to signal that the conditions for

s/thread of execution/thread

> +continuation have been achieved calls complete() to signal exactly one
> +of the waiters that it can continue.
> +
> +	void complete(struct completion *done)
> +
> +or calls complete_all to signal all current and future waiters.

s/complete_all/complete_all()

> +
> +	void complete_all(struct completion *done)
> +
> +
> +The signaling will work as expected even if completions are signaled before
> +a thread starts waiting. This is achieved by the waiter "consuming"
> +(decrementing) the done element of struct completion. Waiting threads
> +wakeup order is the same in which they were enqueued (FIFO order).
> +
> +If complete() is called multiple times then this will allow for that number
> +of waiters to continue - each call to complete() will simply increment the
> +done element. Calling complete_all() multiple times is a bug though. Both
> +complete() and complete_all() can be called in hard-irq context safely.
> +
> +There only can be one thread calling complete() or complete_all() on a
> +particular struct completions at any time - serialized through the wait

s/struct completions/struct completion

> +queue spinlock. Any such concurrent calls to complete() or complete_all()
> +probably are a design bug.
> +
> +Signaling completion from hard-irq context is fine as it will appropriately
> +lock with spin_lock_irqsave/spin_unlock_irqrestore.
> +
> +
> +try_wait_for_completion()/completion_done():
> +--------------------------------------------
> +
> +The try_wait_for_completion will not put the thread on the wait queue but

s/The try_wait_for_completion will/
  The try_wait_for_completion() function will/

> +rather returns false if it would need to enqueue (block) the thread, else it

s/if it would need to enqueue/if it needed to enqueue/ ?

> +consumes any posted completions and returns true.
> +
> +     bool try_wait_for_completion(struct completion *done)
> +
> +
> +Finally to check state of a completions without changing it in any way is

s/completions/completion

> +provided by completion_done() returning false if there are any posted

s/if there are any/if there is any

> +completion that was not yet consumed by waiters implying that there are
> +waiters and true otherwise;
> +
> +     bool completion_done(struct completion *done)
> +
> +Both try_wait_for_completion() and completion_done() are safe to be called in
> +hard-irq context.

irq atomic context != hard-irq context.

This needs to be fixed throughout the document.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html