From: James Smart Date: Mon, 10 Apr 2006 14:14:05 +0000 (-0400) Subject: [SCSI] FC transport: fixes for workq deadlocks X-Git-Tag: v2.6.17-rc2~30^2~4 X-Git-Url: http://git.openpandora.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=aedf349773e5877d716a89368d512b9baa3e8c7b;p=pandora-kernel.git [SCSI] FC transport: fixes for workq deadlocks As previously reported via Michael Reed, the FC transport took a hit in 2.6.15 (perhaps a little earlier) when we solved a recursion error. There are 2 deadlocks occurring: - With scan and the delete items sharing the same workq, flushing the workq for the delete code was getting it stalled behind a very long running scan code path. - There's a deadlock where scsi_remove_target() has to sit behind scsi_scan_target() due to contention over the scan_lock(). This patch resolves the 1st deadlock and significantly reduces the odds of the second. So far, we have only replicated the 2nd deadlock on a highly-parallel SMP system. More on the 2nd deadlock in a following email. This patch reworks the transport to: - Only use the scsi host workq for scanning - Use 2 other workq's internally. One for deletions, the other for scheduled deletions. Originally, we tried this with a single workq, but the occassional flushes of the scheduled queues was hitting the second deadlock with a slightly higher frequency. In the future, we'll look at the LLDD's and the transport to see if we can get rid of this extra overhead. - When moving to the other workq's we tightened up some object states and some lock handling. - Properly syncs adds/deletes - minor code cleanups - directly reference fc_host_attrs, rather than through attribute macros - flush the right workq on delayed work cancel failures. Large kudos to Michael Reed who has been working this issue for the last month. Signed-off-by: James Bottomley --- Reading git-diff-tree failed