On Mon 2015-09-28 13:03:14, Tejun Heo wrote:
> Hello, Petr.
>
> On Fri, Sep 25, 2015 at 01:26:17PM +0200, Petr Mladek wrote:
> > 1) PENDING state plus -EAGAIN/busy loop cycle
> > ---------------------------------------------
> >
> > IMHO, we want to use the timer because it is an elegant solution.
> > Then we must release the lock when the timer is running. The lock
> > must be taken by the timer->function(). And there is a small window
> > when the timer is no longer pending but timer->function() is not yet
> > running:
> >
> > CPU0                            CPU1
> >
> > run_timer_softirq()
> >   __run_timers()
> >     detach_expired_timer()
> >       detach_timer()
> >         # clear_pending
> >
> >                                 try_to_grab_pending_kthread_work()
> >                                   del_timer()
> >                                     # fails because not pending
> >
> >                                   test_and_set_bit(KTHREAD_WORK_PENDING_BIT)
> >                                     # fails because already set
> >
> >                                   if (!list_empty(&work->node))
> >                                     # fails because still not queued
> >
> >                   !!! problematic window !!!
> >
> >     call_timer_fn()
> >       queue_kthread_work()
>
> Let's say each work item has a state variable which is protected by a
> lock and the state can be one of IDLE, PENDING, CANCELING. Let's also
> assume that all cancelers synchronize with each other via a mutex, so
> we only have to worry about a single canceler. Wouldn't something like
> the following work while being a lot simpler?
>
> Delayed queueing and execution:
>
> 1. Lock and check whether the state is IDLE. If not, nothing to do.
>
> 2. Set the state to PENDING, schedule the timer, and unlock.
>
> 3. On expiration, timer_fn grabs the lock and sees whether the state
>    is still PENDING. If so, schedule the work item for execution;
>    otherwise, nothing to do.
>
> 4. After dequeueing from the execution queue with the lock held, the
>    worker is marked as executing the work item and the state is reset
>    to IDLE.
>
> Canceling:
>
> 1. Lock, dequeue, and set the state to CANCELING.
>
> 2. Unlock and perform del_timer_sync().
>
> 3. Flush the work item.
>
> 4. Lock, reset the state to IDLE, and unlock.
>
> > 2) CANCEL state plus custom waitqueue
> > -------------------------------------
> >
> > cancel_kthread_work_sync() has to wait for a running work. It might
> > take quite some time. Therefore we must not block others with a
> > spinlock, and others must not busy-wait on the spinlock either.
>
> Hmmm? Cancelers can synchronize among themselves using a mutex, and
> the actual wait for the work item can use flushing.
>
> > IMHO, the proposed and rather complex solutions are needed in both
> > cases.
> >
> > Or did I miss a possible trick, please?
>
> I probably have missed something in the above and it is not completely
> correct, but I do think it can be way simpler than how workqueue does
> it.
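If I read the proposal correctly, the timer path (step 3) would look
something like the sketch below. This is only my understanding; the
function name is made up and the queueing side is simplified:

static void delayed_kthread_work_timer_fn(unsigned long __data)
{
	struct kthread_work *work = (struct kthread_work *)__data;
	unsigned long flags;

	spin_lock_irqsave(&work->lock, flags);
	if (work->flags == KTHREAD_WORK_PENDING) {
		/*
		 * Schedule the work item for execution. And here is
		 * the catch: touching worker->work_list safely needs
		 * worker->lock as well, see below.
		 */
		list_add_tail(&work->node, &work->worker->work_list);
	}
	spin_unlock_irqrestore(&work->lock, flags);
}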
I have played with this idea and it opens a can of worms with locking
problems. It ends up looking even more complicated.

Let me show it on a snippet of code:

struct kthread_worker {
	spinlock_t lock;
	struct list_head work_list;
	struct kthread_work *current_work;
};

enum {
	KTHREAD_WORK_IDLE,
	KTHREAD_WORK_PENDING,
	KTHREAD_WORK_CANCELING,
};

struct kthread_work {
	unsigned int flags;
	spinlock_t lock;
	struct list_head node;
	kthread_work_func_t func;
	struct kthread_worker *worker;
};

/* the main kthread worker cycle */
int kthread_worker_fn(void *worker_ptr)
{
	struct kthread_worker *worker = worker_ptr;
	struct kthread_work *work;

repeat:
	set_current_state(TASK_INTERRUPTIBLE);

	if (kthread_should_stop()) {
		__set_current_state(TASK_RUNNING);
		return 0;
	}

	work = NULL;
	spin_lock_irq(&worker->lock);
	if (!list_empty(&worker->work_list)) {
		work = list_first_entry(&worker->work_list,
					struct kthread_work, node);
		/* worker->lock first, work->lock second */
		spin_lock(&work->lock);
		list_del_init(&work->node);
		work->flags = KTHREAD_WORK_IDLE;
		spin_unlock(&work->lock);
	}
	worker->current_work = work;
	spin_unlock_irq(&worker->lock);

	if (work) {
		__set_current_state(TASK_RUNNING);
		work->func(work);
	} else if (!freezing(current))
		schedule();

	goto repeat;
}
EXPORT_SYMBOL_GPL(kthread_worker_fn);

static void __queue_kthread_work(struct kthread_worker *worker,
				 struct kthread_work *work)
{
	list_add_tail(&work->node, &worker->work_list);
	work->worker = worker;
}

bool queue_kthread_work(struct kthread_worker *worker,
			struct kthread_work *work)
{
	bool ret = false;
	unsigned long flags;

	/*
	 * Lock the worker first to avoid ABBA deadlock.
	 * What if the work is already queued to another worker?
	 */
	spin_lock_irqsave(&worker->lock, flags);
	spin_lock(&work->lock);

	if (work->flags != KTHREAD_WORK_IDLE)
		goto unlock_out;

	__queue_kthread_work(worker, work);
	work->flags = KTHREAD_WORK_PENDING;
	ret = true;

unlock_out:
	spin_unlock(&work->lock);
	spin_unlock_irqrestore(&worker->lock, flags);

	return ret;
}

/* cancelers waiting for the cancel leader sleep here */
static DECLARE_WAIT_QUEUE_HEAD(cancel_waitq);

bool cancel_kthread_work_sync(struct kthread_work *work)
{
	struct kthread_worker *worker;
	DEFINE_WAIT(cwait);	/* simplified; workqueue.c uses a custom wake function */
	bool flush = true;
	bool ret = false;
	unsigned long flags;

again:
	worker = READ_ONCE(work->worker);

	if (worker)
		spin_lock_irqsave(&worker->lock, flags);
	else
		local_irq_save(flags);
	spin_lock(&work->lock);

	/* the work might have been re-queued in the meantime */
	if (worker && worker != work->worker) {
		spin_unlock(&work->lock);
		spin_unlock_irqrestore(&worker->lock, flags);
		goto again;
	}

	switch (work->flags) {
	case KTHREAD_WORK_PENDING:
		list_del_init(&work->node);
		ret = true;
		/* fall through */
	case KTHREAD_WORK_IDLE:
		work->flags = KTHREAD_WORK_CANCELING;
		break;
	case KTHREAD_WORK_CANCELING:
		/*
		 * Someone else is canceling the work. Let's wait in a
		 * custom waitqueue until the work is flushed.
		 */
		prepare_to_wait_exclusive(&cancel_waitq, &cwait,
					  TASK_UNINTERRUPTIBLE);
		flush = false;
		break;
	}

	spin_unlock(&work->lock);
	if (worker)
		spin_unlock_irqrestore(&worker->lock, flags);
	else
		local_irq_restore(flags);

	if (flush) {
		kthread_work_flush(work);
		/*
		 * We are the cancel leader. Nobody else could
		 * manipulate the work.
		 */
		work->flags = KTHREAD_WORK_IDLE;
		/*
		 * Paired with prepare_to_wait() above so that either
		 * waitqueue_active() is visible here or the CANCELING
		 * state is visible there.
		 */
		smp_mb();
		if (waitqueue_active(&cancel_waitq))
			__wake_up(&cancel_waitq, TASK_NORMAL, 1, work);
	} else {
		if (READ_ONCE(work->flags) == KTHREAD_WORK_CANCELING)
			schedule();
		finish_wait(&cancel_waitq, &cwait);
		goto again;
	}

	return ret;
}

IMHO, we need both locks. The worker manipulates several works and
therefore needs its own lock. And we need a work-specific lock because
a work might be assigned to different workers over time and we have to
be sure that the operations on it, e.g. queueing, are really
serialized.

Now, kthread_worker_fn() needs to take the first work from the worker
list and then manipulate that work => it has to take worker->lock
first and work->lock second. It means that most other functions need
to take both locks in this order as well to avoid ABBA deadlock.
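For example, even the delayed-work timer callback sketched above would
have to take both locks in this order and re-check the worker, along
the lines of (again, just a sketch with made-up names):

static void delayed_kthread_work_timer_fn(unsigned long __data)
{
	struct kthread_work *work = (struct kthread_work *)__data;
	struct kthread_worker *worker;
	unsigned long flags;

again:
	/* we assume work->worker was set when the timer was armed */
	worker = READ_ONCE(work->worker);

	spin_lock_irqsave(&worker->lock, flags);
	spin_lock(&work->lock);

	/* the work might have been re-queued to another worker */
	if (worker != work->worker) {
		spin_unlock(&work->lock);
		spin_unlock_irqrestore(&worker->lock, flags);
		goto again;
	}

	if (work->flags == KTHREAD_WORK_PENDING)
		list_add_tail(&work->node, &worker->work_list);

	spin_unlock(&work->lock);
	spin_unlock_irqrestore(&worker->lock, flags);
}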
This causes quite some complications, e.g. in cancel_work(). And there
might be even more problems if we try to queue the same work to more
workers and start mixing the locks from the different workers.

I do not know. It is possible that you would prefer this solution to
the atomic bit operations. It is also possible that there is a trick
that would make it easier.

I would personally prefer to use the tricks from the workqueues. They
are proven to work and they make the code faster. I think that we
might need to queue a work in a performance-sensitive fast path.

IMHO, the code is still much easier than the workqueues because there
are no pools of kthreads behind each worker.

Best Regards,
Petr