writeback: do not lose wakeup events when forking bdi threads
authorArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
Fri, 27 Aug 2010 07:15:09 +0000 (09:15 +0200)
committerJens Axboe <jaxboe@fusionio.com>
Fri, 27 Aug 2010 07:16:18 +0000 (09:16 +0200)
This patch fixes the following issue:

INFO: task mount.nfs4:1120 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.nfs4    D 00000000fffc6a21     0  1120   1119 0x00000000
 ffff880235643948 0000000000000046 ffffffff00000000 ffffffff00000000
 ffff880235643fd8 ffff880235314760 00000000001d44c0 ffff880235643fd8
 00000000001d44c0 00000000001d44c0 00000000001d44c0 00000000001d44c0
Call Trace:
 [<ffffffff813bc747>] schedule_timeout+0x34/0xf1
 [<ffffffff813bc530>] ? wait_for_common+0x3f/0x130
 [<ffffffff8106b50b>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff813bc5c3>] wait_for_common+0xd2/0x130
 [<ffffffff8104159c>] ? default_wake_function+0x0/0xf
 [<ffffffff813beaa0>] ? _raw_spin_unlock+0x26/0x2a
 [<ffffffff813bc6bb>] wait_for_completion+0x18/0x1a
 [<ffffffff81101a03>] sync_inodes_sb+0xca/0x1bc
 [<ffffffff811056a6>] __sync_filesystem+0x47/0x7e
 [<ffffffff81105798>] sync_filesystem+0x47/0x4b
 [<ffffffff810e7ffd>] generic_shutdown_super+0x22/0xd2
 [<ffffffff810e80f8>] kill_anon_super+0x11/0x4f
 [<ffffffffa00d06d7>] nfs4_kill_super+0x3f/0x72 [nfs]
 [<ffffffff810e7b68>] deactivate_locked_super+0x21/0x41
 [<ffffffff810e7fd6>] deactivate_super+0x40/0x45
 [<ffffffff810fc66c>] mntput_no_expire+0xb8/0xed
 [<ffffffff810fc73b>] release_mounts+0x9a/0xb0
 [<ffffffff810fc7bb>] put_mnt_ns+0x6a/0x7b
 [<ffffffffa00d0fb2>] nfs_follow_remote_path+0x19a/0x296 [nfs]
 [<ffffffffa00d11ca>] nfs4_try_mount+0x75/0xaf [nfs]
 [<ffffffffa00d1790>] nfs4_get_sb+0x276/0x2ff [nfs]
 [<ffffffff810e7dba>] vfs_kern_mount+0xb8/0x196
 [<ffffffff810e7ef6>] do_kern_mount+0x48/0xe8
 [<ffffffff810fdf68>] do_mount+0x771/0x7e8
 [<ffffffff810fe062>] sys_mount+0x83/0xbd
 [<ffffffff810089c2>] system_call_fastpath+0x16/0x1b

The reason of this hang was a race condition: when the flusher thread is
forking a bdi thread, we use 'kthread_run()', so we run it _before_ we make it
visible in 'bdi->wb.task'. The bdi thread runs, does all works, and goes sleep.
'bdi->wb.task' is still NULL. And this is a dangerous time window.

If at this time someone queues a work for this bdi, he does not see the bdi
thread and wakes up the forker thread instead! But the forker has already
forked this bdi thread, but just did not make it visible yet!

The result is that we lose the wake up event for this bdi thread and the NFS4
code waits forever.

To fix the problem, we should use 'ktrhead_create()' for creating bdi threads,
then make them visible in 'bdi->wb.task', and only after this wake them up.
This is exactly what this patch does.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
mm/backing-dev.c

index eaa4a5b..c2bf86f 100644 (file)
@@ -445,8 +445,8 @@ static int bdi_forker_thread(void *ptr)
                switch (action) {
                case FORK_THREAD:
                        __set_current_state(TASK_RUNNING);
-                       task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
-                                          dev_name(bdi->dev));
+                       task = kthread_create(bdi_writeback_thread, &bdi->wb,
+                                             "flush-%s", dev_name(bdi->dev));
                        if (IS_ERR(task)) {
                                /*
                                 * If thread creation fails, force writeout of
@@ -457,10 +457,13 @@ static int bdi_forker_thread(void *ptr)
                                /*
                                 * The spinlock makes sure we do not lose
                                 * wake-ups when racing with 'bdi_queue_work()'.
+                                * And as soon as the bdi thread is visible, we
+                                * can start it.
                                 */
                                spin_lock_bh(&bdi->wb_lock);
                                bdi->wb.task = task;
                                spin_unlock_bh(&bdi->wb_lock);
+                               wake_up_process(task);
                        }
                        break;