drbd: Fix state change in case of connection timeout
author Philipp Reisner <philipp.reisner@linbit.com>
Mon, 10 Nov 2014 16:21:14 +0000 (17:21 +0100)
committer Jens Axboe <axboe@fb.com>
Mon, 10 Nov 2014 16:27:41 +0000 (09:27 -0700)
A connection timeout affects all volumes of a resource!
Under the following conditions:

 A resource with multiple volumes
  AND
 ko-count >= 1
  AND
 a write request triggers the timeout (ko-count * timeout)

DRBD's internal state gets confused. That in turn may
lead to very misleading follow-up failures, e.g.
"BUG: scheduling while atomic"

CC: stable@kernel.org # v3.17
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
drivers/block/drbd/drbd_req.c

index 90319b1..3b797cd 100644
@@ -1629,7 +1629,7 @@ void request_timer_fn(unsigned long data)
                 time_after(now, req_peer->pre_send_jif + ent) &&
                !time_in_range(now, connection->last_reconnect_jif, connection->last_reconnect_jif + ent)) {
                drbd_warn(device, "Remote failed to finish a request within ko-count * timeout\n");
-               _drbd_set_state(_NS(device, conn, C_TIMEOUT), CS_VERBOSE | CS_HARD, NULL);
+               _conn_request_state(connection, NS(conn, C_TIMEOUT), CS_VERBOSE | CS_HARD);
        }
        if (dt && oldest_submit_jif != now &&
                 time_after(now, oldest_submit_jif + dt) &&