From: Oleg Nesterov Date: Wed, 9 May 2007 09:34:07 +0000 (-0700) Subject: workqueue: fix flush_workqueue() vs CPU_DEAD race X-Git-Tag: v2.6.22-rc1~233 X-Git-Url: http://git.openpandora.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=d721304dce0ced0b3b0366996cc02929669708a8;p=pandora-kernel.git workqueue: fix flush_workqueue() vs CPU_DEAD race Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting the bug in my previous attempt. work->func() (and thus flush_workqueue()) must not use workqueue_mutex, this leads to deadlock when CPU_DEAD does kthread_stop(). However without this mutex held we can't detect CPU_DEAD in progress, which can move pending works to another CPU while the dead one is not on cpu_online_map. Change flush_workqueue() to use for_each_possible_cpu(). This means that flush_cpu_workqueue() may hit CPU which is already dead. However in that case !list_empty(&cwq->worklist) || cwq->current_work != NULL means that CPU_DEAD in progress, it will do kthread_stop() + take_over_work() so we can proceed and insert a barrier. We hold cwq->lock, so we are safe. Also, add migrate_sequence incremented by take_over_work() under cwq->lock. If take_over_work() happened before we checked this CPU, we should see the new value after spin_unlock(). Further possible changes: remove CPU_DEAD handling (along with take_over_work, migrate_sequence) from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag. CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates the new thread if cwq->thread == NULL. This way the workqueue/cpu-hotplug interaction is almost zero, workqueue_mutex just protects "workqueues" list, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away. Signed-off-by: Oleg Nesterov Cc: Srivatsa Vaddagiri Cc: "Pallipadi, Venkatesh" Cc: Gautham shenoy Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Reading git-diff-tree failed