Li Zefan <lizefan@xxxxxxxxxx> writes: > From: Lai Jiangshan <laijs@xxxxxxxxxxxxxx> > > commit 3aa62497594430ea522050b75c033f71f2c60ee6 upstream. This commit seems to be missing in the 3.5 kernel as well. I'm queuing it for the next release. Cheers, -- Luis > > Currently, when try_to_grab_pending() grabs a delayed work item, it > leaves its linked work items alone on the delayed_works. The linked > work items are always NO_COLOR and will cause future > cwq_activate_first_delayed() increase cwq->nr_active incorrectly, and > may cause the whole cwq to stall. For example, > > state: cwq->max_active = 1, cwq->nr_active = 1 > one work in cwq->pool, many in cwq->delayed_works. > > step1: try_to_grab_pending() removes a work item from delayed_works > but leaves its NO_COLOR linked work items on it. > > step2: Later on, cwq_activate_first_delayed() activates the linked > work item increasing ->nr_active. > > step3: cwq->nr_active = 1, but all activated work items of the cwq are > NO_COLOR. When they finish, cwq->nr_active will not be > decreased due to NO_COLOR, and no further work items will be > activated from cwq->delayed_works. the cwq stalls. > > Fix it by ensuring the target work item is activated before stealing > PENDING in try_to_grab_pending(). This ensures that all the linked > work items are activated without incorrectly bumping cwq->nr_active. > > tj: Updated comment and description. > > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx> > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx> > [lizf: backported to 3.4: adjust context] > Signed-off-by: Li Zefan <lizefan@xxxxxxxxxx> > --- > > The original commit was tagged with stable, and it was backported to 3.2 > and 3.6. > > --- > kernel/workqueue.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index a64b94e..9a0b579 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -1721,10 +1721,9 @@ static void move_linked_works(struct work_struct *work, struct list_head *head, > *nextp = n; > } > > -static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq) > +static void cwq_activate_delayed_work(struct work_struct *work) > { > - struct work_struct *work = list_first_entry(&cwq->delayed_works, > - struct work_struct, entry); > + struct cpu_workqueue_struct *cwq = get_work_cwq(work); > struct list_head *pos = gcwq_determine_ins_pos(cwq->gcwq, cwq); > > trace_workqueue_activate_work(work); > @@ -1733,6 +1732,14 @@ static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq) > cwq->nr_active++; > } > > +static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq) > +{ > + struct work_struct *work = list_first_entry(&cwq->delayed_works, > + struct work_struct, entry); > + > + cwq_activate_delayed_work(work); > +} > + > /** > * cwq_dec_nr_in_flight - decrement cwq's nr_in_flight > * @cwq: cwq of interest > @@ -2625,6 +2632,18 @@ static int try_to_grab_pending(struct work_struct *work) > smp_rmb(); > if (gcwq == get_work_gcwq(work)) { > debug_work_deactivate(work); > + > + /* > + * A delayed work item cannot be grabbed directly > + * because it might have linked NO_COLOR work items > + * which, if left on the delayed_list, will confuse > + * cwq->nr_active management later on and cause > + * stall. Make sure the work item is activated > + * before grabbing. > + */ > + if (*work_data_bits(work) & WORK_STRUCT_DELAYED) > + cwq_activate_delayed_work(work); > + > list_del_init(&work->entry); > cwq_dec_nr_in_flight(get_work_cwq(work), > get_work_color(work), -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html