From: Stephen Boyd
Date: Fri, 20 Jul 2012 18:14:38 +0000 (+0000)
Subject: cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch
X-Git-Tag: v3.6-rc1~155^2^2
X-Git-Url: http://git.openpandora.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=a9144436271583115a2230db15d0b6ae2c481d3c;p=pandora-kernel.git

Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_setspeed node eventually deadlocks with:

=============================================
[ INFO: possible recursive locking detected ]
3.4.0 #37 Tainted: G W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
 (s_active#13){++++.+}, at: [] sysfs_remove_dir+0x9c/0xb4

but task is already holding lock:
 (s_active#13){++++.+}, at: [] sysfs_write_file+0xe8/0x140

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(s_active#13);
  lock(s_active#13);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by filemonkey/122:
 #0: (&buffer->mutex){+.+.+.}, at: [] sysfs_write_file+0x28/0x140
 #1: (s_active#13){++++.+}, at: [] sysfs_write_file+0xe8/0x140

stack backtrace:
[] (unwind_backtrace+0x0/0x120) from [] (validate_chain+0x6f8/0x1054)
[] (validate_chain+0x6f8/0x1054) from [] (__lock_acquire+0x81c/0x8d8)
[] (__lock_acquire+0x81c/0x8d8) from [] (lock_acquire+0x18c/0x1e8)
[] (lock_acquire+0x18c/0x1e8) from [] (sysfs_addrm_finish+0xd0/0x180)
[] (sysfs_addrm_finish+0xd0/0x180) from [] (sysfs_remove_dir+0x9c/0xb4)
[] (sysfs_remove_dir+0x9c/0xb4) from [] (kobject_del+0x10/0x38)
[] (kobject_del+0x10/0x38) from [] (kobject_release+0xf0/0x194)
[] (kobject_release+0xf0/0x194) from [] (cpufreq_cpu_put+0xc/0x24)
[] (cpufreq_cpu_put+0xc/0x24) from [] (store+0x6c/0x74)
[] (store+0x6c/0x74) from [] (sysfs_write_file+0x10c/0x140)
[] (sysfs_write_file+0x10c/0x140) from [] (vfs_write+0xb0/0x128)
[] (vfs_write+0xb0/0x128) from [] (sys_write+0x3c/0x68)
[] (sys_write+0x3c/0x68) from [] (ret_fast_syscall+0x0/0x3c)

This is because store() in cpufreq.c indirectly calls
kobject_get() via cpufreq_cpu_get() and is the last one to call
kobject_put() via cpufreq_cpu_put(). Sysfs code should not call
kobject_get() or kobject_put() directly (see the comment around
sysfs_schedule_callback() for more information).

Fix this deadlock by introducing two new functions:

	struct cpufreq_policy *cpufreq_cpu_get_sysfs(unsigned int cpu)
	void cpufreq_cpu_put_sysfs(struct cpufreq_policy *data)

which do the same thing as cpufreq_cpu_{get,put}() but don't call
kobject functions.

To easily trigger this deadlock you can insert an msleep() with a
reasonably large value right after the fail label at the bottom
of the store() function in cpufreq.c and then write
scaling_setspeed in one task and offline the cpu in another. The
first task will hang and be detected by the hung task detector.

Signed-off-by: Stephen Boyd
Signed-off-by: Rafael J. Wysocki
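
The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch and not the code from this
patch: every name in it (struct model_policy, model_get(),
model_put_sysfs(), and so on) is hypothetical, the s_active reference
is reduced to a boolean, and the concurrent hotplug is replayed as a
fixed sequence of calls in a single thread. It assumes only what the
message states: the sysfs variants behave like the regular get/put
but leave the kobject reference count alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all names are hypothetical, not kernel APIs. */
struct model_policy {
	int kobj_refs;        /* models the kobject reference count */
	bool in_sysfs_store;  /* models the caller holding s_active */
	bool self_deadlock;   /* release ran while caller held s_active */
	bool released;
};

/*
 * Models kobject_release() -> sysfs_remove_dir(): removal waits for
 * active references to drain. If the caller holds one, it waits on
 * itself.
 */
static void model_release(struct model_policy *p)
{
	if (p->in_sysfs_store)
		p->self_deadlock = true;
	p->released = true;
}

/* Models cpufreq_cpu_get(): takes a kobject reference. */
static void model_get(struct model_policy *p)
{
	p->kobj_refs++;
}

/* Models cpufreq_cpu_put(): the last put triggers release. */
static void model_put(struct model_policy *p)
{
	assert(p->kobj_refs > 0);
	if (--p->kobj_refs == 0)
		model_release(p);
}

/* Model the *_sysfs variants: the kobject count is left untouched. */
static void model_get_sysfs(struct model_policy *p) { (void)p; }
static void model_put_sysfs(struct model_policy *p) { (void)p; }

/* Old store() path: returns true if it would wait on itself. */
bool store_with_kobject_refs(void)
{
	struct model_policy p = { .kobj_refs = 1 }; /* core's reference */

	p.in_sysfs_store = true;  /* sysfs_write_file() takes s_active */
	model_get(&p);            /* store() takes a kobject reference */
	model_put(&p);            /* hotplug drops the core's reference */
	model_put(&p);            /* store() ends up doing the last put */
	p.in_sysfs_store = false;
	return p.self_deadlock;
}

/* New store() path using the sysfs variants. */
bool store_with_sysfs_variants(void)
{
	struct model_policy p = { .kobj_refs = 1 };

	p.in_sysfs_store = true;
	model_get_sysfs(&p);
	model_put_sysfs(&p);
	p.in_sysfs_store = false; /* s_active dropped before removal */
	model_put(&p);            /* hotplug does the last put itself */
	return p.self_deadlock;
}
```

In this model store_with_kobject_refs() reports the self-wait because
the final put runs while the s_active flag is still set, whereas
store_with_sysfs_variants() leaves the final put to the hotplug path,
which runs it after the store handler has returned.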