From: NeilBrown Date: Tue, 12 Mar 2013 01:18:06 +0000 (+1100) Subject: md/raid5: ensure sync and DISCARD don't happen at the same time. X-Git-Tag: v3.9-rc4~2^2~1 X-Git-Url: http://git.openpandora.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=f8dfcffd0472a0f353f34a567ad3f53568914d04;p=pandora-kernel.git md/raid5: ensure sync and DISCARD don't happen at the same time. A number of problems can occur due to races between resync/recovery and discard. - if sync_request calls handle_stripe() while a discard is happening on the stripe, it might call handle_stripe_clean_event before all of the individual discard requests have completed (so some devices are still locked, but not all). Since commit ca64cae96037de16e4af92678814f5d4bf0c1c65 md/raid5: Make sure we clear R5_Discard when discard is finished. this will cause R5_Discard to be cleared for the parity device, so handle_stripe_clean_event() will not be called when the other devices do become unlocked, so their ->written will not be cleared. This ultimately leads to a WARN_ON in init_stripe and a lock-up. - If handle_stripe_clean_event() does clear R5_UPTODATE at an awkward time for resync, it can lead to s->uptodate being less than disks in handle_parity_checks5(), which triggers a BUG (because it is one). So: - keep R5_Discard on the parity device until all other devices have completed their discard request - make sure we don't try to have a 'discard' and a 'sync' action at the same time. This involves a new stripe flag to we know when a 'discard' is happening, and the use of R5_Overlap on the parity disk so when a discard is wanted while a sync is active, so we know to wake up the discard at the appropriate time. Discard support for RAID5 was added in 3.7, so this is suitable for any -stable kernel since 3.7. Cc: stable@vger.kernel.org (v3.7+) Reported-by: Jes Sorensen Tested-by: Jes Sorensen Signed-off-by: NeilBrown --- Reading git-diff-tree failed