pandora-kernel.git
13 years agodrbd: use dynamic_dev_dbg to optionally log uuid changes
Lars Ellenberg [Thu, 7 Oct 2010 13:18:08 +0000 (15:18 +0200)]
drbd: use dynamic_dev_dbg to optionally log uuid changes

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set
Philipp Reisner [Thu, 14 Oct 2010 09:58:20 +0000 (11:58 +0200)]
dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
13 years agodrbd: cleanup: change "<= 0" to "== 0"
Dan Carpenter [Wed, 11 Aug 2010 22:38:45 +0000 (00:38 +0200)]
drbd: cleanup: change "<= 0" to "== 0"

dt is unsigned so it's never less than zero.  We are calculating the
elapsed time, and that's never less than zero (unless there is a bug or
we invent time travel).  The comparison here is just to guard against
divide by zero bugs.
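
As an illustration of the point above, here is a minimal sketch of a
rate calculation with an unsigned elapsed time. The helper name
rate_per_sec() and its parameters are hypothetical and are not taken
from the DRBD source; only the "== 0" guard mirrors the commit.

```c
#include <stdint.h>

/* Hypothetical helper, not DRBD code: average rate over an elapsed
 * interval. dt_secs is unsigned, so it can never be negative; the only
 * value that needs a guard is zero, to avoid dividing by it. */
static uint64_t rate_per_sec(uint64_t units_done, unsigned long dt_secs)
{
    if (dt_secs == 0)   /* "<= 0" would behave the same but mislead readers */
        dt_secs = 1;
    return units_done / dt_secs;
}
```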

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
13 years agodrbd: relax the grace period of the md_sync timer again
Lars Ellenberg [Thu, 14 Oct 2010 13:01:21 +0000 (15:01 +0200)]
drbd: relax the grace period of the md_sync timer again

Consolidate the ifdef's for the debug level; it accidentally used both
DEBUG and DRBD_DEBUG_MD_SYNC.  Default to off.

For production, we can safely reduce the grace period for this timer
again to the value we used to have.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: add some more explicit drbd_md_sync
Lars Ellenberg [Thu, 14 Oct 2010 11:37:40 +0000 (13:37 +0200)]
drbd: add some more explicit drbd_md_sync

It sometimes may take a while for the after state change work to be
scheduled, which does drbd_md_sync. At convenient places, we should do
explicit drbd_md_sync to have the new state information on disk as soon
as possible.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: drop wrong debug asserts, fix recently introduced race
Lars Ellenberg [Thu, 14 Oct 2010 11:57:07 +0000 (13:57 +0200)]
drbd: drop wrong debug asserts, fix recently introduced race

 commit 2372c38caadeaebc68a5ee190782c2a0df01edc3
 drbd: fix for possible deadlock on IO error during resync

introduced a new ASSERT, which turns out to be wrong. Drop it.

Also serialize the state change to D_DISKLESS with the after state
change work of the -> D_FAILED transition, don't open a new race.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: cleanup useless leftover warn/error printk's
Lars Ellenberg [Wed, 13 Oct 2010 16:19:23 +0000 (18:19 +0200)]
drbd: cleanup useless leftover warn/error printk's

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: add explicit drbd_md_sync to drbd_resync_finished
Lars Ellenberg [Wed, 13 Oct 2010 15:37:54 +0000 (17:37 +0200)]
drbd: add explicit drbd_md_sync to drbd_resync_finished

As we usually update the generation UUIDs here, we should explicitly
sync them to disk.  So far this has been done only implicitly by related
code paths.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
Philipp Reisner [Wed, 13 Oct 2010 13:32:44 +0000 (15:32 +0200)]
drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED

This might happen if the disk gets dropped on the VERIFY_S node.
Although this is a cluster wide state transition, the VERIFY_T node
updates its connection state first. Then the ack packet for the
cluster wide state transition travels back, and the VERIFY_S node
stops producing P_OV_REQUEST packets.

There is absolutely nothing wrong with that.

Further, do not log "Can not satisfy peer's..." on the VERIFY_S
node in this case, but pretend that they had equal checksum.

[Bugz 327]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix for possible deadlock on IO error during resync
Lars Ellenberg [Tue, 14 Sep 2010 18:26:27 +0000 (20:26 +0200)]
drbd: fix for possible deadlock on IO error during resync

Scenario:

Something (say, flush-147:0) is in drbd_al_begin_io,
holding a local_cnt, waiting for the resync to make progress.

Disk fails, worker in after_state_ch does drbd_rs_cancel_all,
then waits for local_cnt to drop to zero.

flush-147:0 is woken by drbd_rs_cancel_all, needs to write an AL
transaction, and queues that on the worker.

Deadlock.

Fix: do not wait in the worker, have put_ldev() trigger the
state change D_FAILED -> D_DISKLESS when necessary.
put_ldev() cannot do the state change directly, as it may or may not
already hold various spinlocks. We queue a short work instead.
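
The idea of the fix can be sketched roughly as follows. This is a
simplified, single-threaded model, not the DRBD implementation: the
struct, its fields and the get_ref()/put_ref() names are hypothetical
stand-ins for local_cnt, D_FAILED and the queued work, and the real
code uses atomic counters.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the device and its reference counter. */
struct device {
    int local_cnt;          /* references held on the local disk */
    bool disk_failed;       /* models the D_FAILED state */
    bool diskless_queued;   /* models the queued "go diskless" work */
};

static void get_ref(struct device *d)
{
    d->local_cnt++;
}

/* Dropping the last reference on a failed disk only queues the work.
 * It never performs the state change itself, because the caller may
 * hold locks. Nobody has to wait for the counter to reach zero. */
static void put_ref(struct device *d)
{
    d->local_cnt--;
    if (d->local_cnt == 0 && d->disk_failed)
        d->diskless_queued = true;
}
```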

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix unlikely access after free and list corruption
Lars Ellenberg [Tue, 14 Sep 2010 18:40:41 +0000 (20:40 +0200)]
drbd: fix unlikely access after free and list corruption

Various cleanup paths have been incomplete, for the very unlikely case
that we cannot allocate enough bios from process context when submitting
on behalf of the peer or resync process.

Never observed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix for spurious fullsync (uuids rotated too fast)
Lars Ellenberg [Thu, 7 Oct 2010 14:07:55 +0000 (16:07 +0200)]
drbd: fix for spurious fullsync (uuids rotated too fast)

If it was an "empty" resync, the SyncSource may have already "finished"
the resync and rotated the UUIDs, before noticing the connection loss
(and generating a new uuid, if Primary, rotating again), while the
SyncTarget did not change its uuids at all, or only got to the previous
sync-uuid.
This would then again lead to a full sync on next handshake
(see also Bug #251).

Fix:
Use explicit resync finished notification even for empty resyncs,
do not finish an empty resync implicitly on the SyncSource.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: allow for explicit resync-finished notifications
Lars Ellenberg [Thu, 7 Oct 2010 13:55:39 +0000 (15:55 +0200)]
drbd: allow for explicit resync-finished notifications

Preparation patch so more drbd_send_state() usage on the peer
will not confuse drbd in receive_state().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: preparation commit, using full state in receive_state()
Lars Ellenberg [Thu, 22 Jul 2010 15:39:26 +0000 (17:39 +0200)]
drbd: preparation commit, using full state in receive_state()

no functional change, just using full state instead of just the .conn
part of it for comparisons.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: drbd_send_ack_dp must not rely on header information
Lars Ellenberg [Wed, 6 Oct 2010 09:46:55 +0000 (11:46 +0200)]
drbd: drbd_send_ack_dp must not rely on header information

drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
drbd: receiving of big packets, for payloads between 64kByte and 4GByte
introduced a new on-the-wire packet header format.  We must no longer
assume either format, but use the result of whatever drbd_recv_header
has decoded.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
Lars Ellenberg [Tue, 5 Oct 2010 18:13:58 +0000 (20:13 +0200)]
drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)

We used to be16_to_cpu the length field in our received packet header.
drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
    drbd: receiving of big packets, for payloads between 64kByte and 4GByte
changed this, but forgot to adjust a few places where we relied on
h->length being in native byte order.

This broke the receiving side of the RLE compressed bitmap exchange.
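
A minimal sketch of the general pattern: decode the big-endian length
exactly once, then hand only native-order values to the rest of the
code. The header layout and all names below are hypothetical and do
not reproduce the actual DRBD packet formats.

```c
#include <stdint.h>

/* Hypothetical on-the-wire header; both fields are big endian. */
struct wire_header {
    uint8_t command[2];
    uint8_t length[2];
};

/* Native-order result of decoding a wire_header. */
struct decoded_header {
    uint16_t command;
    uint16_t length;
};

/* Decode a big-endian 16 bit field, independent of host byte order. */
static uint16_t get_be16(const uint8_t b[2])
{
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Callers use the decoded copy and never look at the raw header again. */
static struct decoded_header decode_header(const struct wire_header *h)
{
    struct decoded_header d = { get_be16(h->command), get_be16(h->length) };
    return d;
}
```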

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Fixed a stupid copy and paste error
Philipp Reisner [Tue, 5 Oct 2010 14:50:17 +0000 (16:50 +0200)]
drbd: Fixed a stupid copy and paste error

This caused rs_planed to be out of sync with the content of the fifo.
That in turn could cause the resync to come to a complete halt.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Allow larger values for c-fill-target.
Philipp Reisner [Tue, 5 Oct 2010 09:19:39 +0000 (11:19 +0200)]
drbd: Allow larger values for c-fill-target.

Connections through a compressing proxy might have more bits
on the fly. Allow up to 500MByte instead of 50MByte.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix possible access after free
Lars Ellenberg [Tue, 14 Sep 2010 18:14:09 +0000 (20:14 +0200)]
drbd: fix possible access after free

If we release the page pointed to by md_io_tmpp, we need to zero out the
pointer, too, as that may be used later to decide whether we need to
allocate a new page again.

Impact: a previously freed page may be used and clobbered.  Depending on
what that particular page is being used for meanwhile, this may result
in silent data corruption of completely unrelated things.

Only of concern on devices with logical_block_size != 512 byte,
if you re-attach after becoming diskless once.
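
The pattern behind this fix, sketched in user space with malloc/free
instead of kernel pages. The struct and the two helper names are
hypothetical; only the md_io_tmpp field name is taken from the text
above.

```c
#include <stdlib.h>

/* Hypothetical container for the temporary buffer pointer. */
struct dev {
    void *md_io_tmpp;
};

/* Release the buffer and clear the pointer, so that a later
 * "do we need to allocate?" check sees NULL, not a stale address. */
static void release_tmp_page(struct dev *d)
{
    free(d->md_io_tmpp);
    d->md_io_tmpp = NULL;
}

/* Allocate lazily; this check is only safe because release clears. */
static void *get_tmp_page(struct dev *d, size_t size)
{
    if (!d->md_io_tmpp)
        d->md_io_tmpp = malloc(size);
    return d->md_io_tmpp;
}
```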

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: protocol compatibility for maximum packet sizes
Lars Ellenberg [Tue, 14 Sep 2010 13:56:29 +0000 (15:56 +0200)]
drbd: protocol compatibility for maximum packet sizes

Two missing corner cases to the "maximum packet size" handshake.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Track the reasons to suspend IO in dedicated state bits
Philipp Reisner [Wed, 8 Sep 2010 21:20:21 +0000 (23:20 +0200)]
drbd: Track the reasons to suspend IO in dedicated state bits

There are three ways to get IO suspended:

 * Loss of any access to data
 * Fence-peer-handler running
 * User requested to suspend IO

Track those in different bits, so that one condition clearing its
state bit does not interfere with the other two conditions.

Only when the user resumes IO he overrules all three bits.

The fact is hidden from the user, he sees only a single suspend
bit.
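
A rough sketch of this bookkeeping, assuming one flag per reason. The
enum values, struct and function names are hypothetical and do not
match the DRBD state fields.

```c
#include <stdbool.h>

/* One bit per reason for suspending IO (hypothetical names). */
enum susp_reason {
    SUSP_NO_DATA = 1 << 0,  /* loss of any access to data */
    SUSP_FENCING = 1 << 1,  /* fence-peer handler running */
    SUSP_USER    = 1 << 2   /* user requested suspend */
};

struct io_state {
    unsigned int susp;
};

static void suspend_io(struct io_state *s, unsigned int reason)
{
    s->susp |= reason;
}

/* One condition going away clears only its own bit. */
static void clear_reason(struct io_state *s, unsigned int reason)
{
    s->susp &= ~reason;
}

/* An explicit resume by the user overrules all three bits. */
static void user_resume(struct io_state *s)
{
    s->susp = 0;
}

/* What the user sees: a single suspend bit. */
static bool is_suspended(const struct io_state *s)
{
    return s->susp != 0;
}
```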

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: DIV_ROUND_UP not needed here
Lars Ellenberg [Mon, 13 Sep 2010 11:27:10 +0000 (13:27 +0200)]
drbd: DIV_ROUND_UP not needed here

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Fixed compatibility with protocol versions smaller than 95
Philipp Reisner [Thu, 9 Sep 2010 12:22:21 +0000 (14:22 +0200)]
drbd: Fixed compatibility with protocol versions smaller than 95

Forgot to consider the max size for the resync requests.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix for spurious full sync (becoming sync target looked like invalidate)
Lars Ellenberg [Wed, 21 Jul 2010 15:04:32 +0000 (17:04 +0200)]
drbd: fix for spurious full sync (becoming sync target looked like invalidate)

If a synctarget lost connection while being WFSyncUUID,
due to "state sanitizing", the attempted state change to SyncTarget
looked like an "invalidate" to after_state_ch() later,
thus caused a full sync on next handshake (Bug #318).

drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )

        from  : { cs:NetworkFailure ro:Secondary/Unknown ds:UpToDate/DUnknown r--- }
        to    : { cs:SyncTarget ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
        after sanitizing, resulted in
        state: { cs:NetworkFailure ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
        drbd0: disk( UpToDate -> Inconsistent )

Fix:
don't mask state transition errors in "sanitizing",
so the requested state change to SyncTarget fails,
instead of being implicitly "remapped" to invalidate.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: cosmetic, don't report resync for online-verify
Lars Ellenberg [Mon, 6 Sep 2010 10:13:20 +0000 (12:13 +0200)]
drbd: cosmetic, don't report resync for online-verify

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix spurious protocol error
Lars Ellenberg [Mon, 6 Sep 2010 10:31:37 +0000 (12:31 +0200)]
drbd: fix spurious protocol error

If we cannot satisfy a request (because our disk just broke),
we still need to drain the payload.  Or we'll get a protocol error
when interpreting the payload as DRBD packet header.
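
Why the payload has to be drained can be shown with a toy framing
format. Everything here is an assumption made for illustration: a one
byte command, a one byte payload length, and the names stream,
recv_bytes() and handle_packet(). It is not the DRBD wire protocol.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy byte stream standing in for the socket. */
struct stream {
    const uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Consume n bytes; copy them only if dst is not NULL. */
static int recv_bytes(struct stream *s, uint8_t *dst, size_t n)
{
    if (s->len - s->pos < n)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (dst)
            dst[i] = s->buf[s->pos + i];
    }
    s->pos += n;
    return 0;
}

/* Handle one packet and return its command, or -1 on a short read.
 * Even if the request cannot be served, the payload is consumed, so
 * the next read starts at the next packet header. */
static int handle_packet(struct stream *s, int can_serve, uint8_t *payload_out)
{
    uint8_t hdr[2]; /* hdr[0] = command, hdr[1] = payload length */

    if (recv_bytes(s, hdr, 2))
        return -1;
    if (recv_bytes(s, can_serve ? payload_out : NULL, hdr[1]))
        return -1;
    return hdr[0];
}
```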

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix potential kernel BUG (NULL deref)
Lars Ellenberg [Sat, 4 Sep 2010 23:13:24 +0000 (01:13 +0200)]
drbd: fix potential kernel BUG (NULL deref)

BUG trace would look like:
 lc_find
 drbd_rs_complete_io
 got_OVResult
 drbd_asender

Could be triggered by explicit, or IO-error policy based,
detach during online-verify.

We may only dereference mdev->resync, if we first get_ldev(), as the
disk may break any time, causing mdev->resync to disappear once all
ldev references have been returned.
Already in flight online-verify requests or replies may still come in,
which we then need to ignore.
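
The rule "take a reference before you dereference" can be sketched as
below. This is a simplified, single-threaded model with hypothetical
names (get_ref(), put_ref(), complete_verify_reply()); the real
get_ldev()/put_ldev() use atomic counters and the disk state.

```c
#include <stddef.h>

struct resync_cache {
    int entries;
};

struct device {
    int local_cnt;               /* references on the local disk */
    int disk_ok;                 /* 0 once the disk has gone away */
    struct resync_cache *resync; /* only valid while a reference is held */
};

/* Returns 1 and takes a reference if the disk is still usable. */
static int get_ref(struct device *d)
{
    if (!d->disk_ok)
        return 0;
    d->local_cnt++;
    return 1;
}

static void put_ref(struct device *d)
{
    d->local_cnt--;
}

/* A late online-verify reply: touch d->resync only under a reference,
 * and silently ignore the reply if the disk is already gone. */
static int complete_verify_reply(struct device *d)
{
    int n;

    if (!get_ref(d))
        return 0;
    n = d->resync->entries;
    put_ref(d);
    return n;
}
```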

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: don't count sendpage()d pages only referenced by tcp as in use
Lars Ellenberg [Mon, 6 Sep 2010 10:30:25 +0000 (12:30 +0200)]
drbd: don't count sendpage()d pages only referenced by tcp as in use

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Adding support for BIO/Request flags: REQ_FUA, REQ_FLUSH and REQ_DISCARD
Philipp Reisner [Wed, 25 Aug 2010 09:58:05 +0000 (11:58 +0200)]
drbd: Adding support for BIO/Request flags: REQ_FUA, REQ_FLUSH and REQ_DISCARD

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: drbd_md_sync before calling user space helpers
Lars Ellenberg [Mon, 19 Jul 2010 15:41:04 +0000 (17:41 +0200)]
drbd: drbd_md_sync before calling user space helpers

Just in case we have some pending meta data changes to sync, do it
before we call our userland helper, as that may take some time,
or even cause a hard reboot.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix race on meta-data update, addendum
Lars Ellenberg [Fri, 3 Sep 2010 08:00:09 +0000 (10:00 +0200)]
drbd: fix race on meta-data update, addendum

addendum to baa33ae4eaa4477b60af7c434c0ddd1d182c1ae7

The race:
    drbd_md_sync()
        if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
            return;
    ==> RACE with drbd_md_mark_dirty() rearming the timer.
        del_timer(&mdev->md_sync_timer);

    Fixed by moving the del_timer before the test_and_clear_bit.

Additionally only rearm the timer in drbd_md_mark_dirty, if MD_DIRTY was
not already set, reduce the grace period from five to one second, and
add an ifdef'ed debugging aid to find code paths missing an explicit
drbd_md_sync, if any, as those are the only relevant ones for this race.
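
The ordering described in this commit can be modelled in user space
with C11 atomics. This is only a sketch of the ordering, under the
assumption that a boolean can stand in for a pending timer; the struct
and function names are hypothetical and nothing here is kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct md_state {
    atomic_bool dirty;        /* models the MD_DIRTY bit */
    atomic_bool timer_armed;  /* models md_sync_timer being pending */
    int writes;               /* number of meta data writes issued */
};

static void md_init(struct md_state *s)
{
    atomic_init(&s->dirty, false);
    atomic_init(&s->timer_armed, false);
    s->writes = 0;
}

static void mark_dirty(struct md_state *s)
{
    /* Rearm the timer only if the dirty bit was not already set. */
    if (!atomic_exchange(&s->dirty, true))
        atomic_store(&s->timer_armed, true);
}

static bool md_sync(struct md_state *s)
{
    /* Cancel the timer first ... */
    atomic_store(&s->timer_armed, false);
    /* ... then test and clear the dirty bit. A timer armed by a
     * concurrent mark_dirty() after this point is not cancelled. */
    if (!atomic_exchange(&s->dirty, false))
        return false;
    s->writes++;
    return true;
}
```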

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Removed a race that could cause unexpected execution of w_make_resync_request()
Philipp Reisner [Wed, 1 Sep 2010 13:47:15 +0000 (15:47 +0200)]
drbd: Removed a race that could cause unexpected execution of w_make_resync_request()

The actual race happened in the drbd_start_resync() function, where
drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and
armed the timer.

If the timer fired before execution reaches the mod_timer statement
at the end of drbd_start_resync() the latter would cause an
unexpected call to w_make_resync_request().

Removed the STOP_SYNC_TIMER bit; the decision is now based on the
connection state.

The STOP_SYNC_TIMER bit probably originates from the time before
the state engine.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: implicitly create unconfigured devices on sync-after dependencies
Lars Ellenberg [Wed, 1 Sep 2010 12:39:30 +0000 (14:39 +0200)]
drbd: implicitly create unconfigured devices on sync-after dependencies

If pacemaker (for example) decided to initialize minor devices not in
the exact sync-after dependency order, the configuration partially
failed with an error "The sync-after minor number is invalid". (Bugz. #322)

We can avoid that by implicitly creating unconfigured minor devices,
if others depend on them.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix race on meta-data update
Lars Ellenberg [Wed, 1 Sep 2010 13:12:12 +0000 (15:12 +0200)]
drbd: fix race on meta-data update

The race:
drbd_md_mark_dirty()
drbd_md_sync()
    if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
        return;
    drbd_md_sync_page_io(mdev, mdev->ldev, sector, WRITE)
      ==> RACE
    clear_bit(MD_DIRTY, &mdev->flags); <== spurious

Fixed by removing the spurious clear_bit.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix race between deconfiguring and reconfiguring network
Lars Ellenberg [Wed, 1 Sep 2010 07:50:23 +0000 (09:50 +0200)]
drbd: fix race between deconfiguring and reconfiguring network

If a drbd_nl_net_conf hits the small window between the state change
to C_STANDALONE and the corresponding cleanup in after_state_ch,
that cleanup would throw away stuff we now need again,
and later trigger BUG_ON()s.

Fixed by properly serializing the new config request with
any pending cleanup.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Disable activity log updates when the whole device is out of sync
Philipp Reisner [Tue, 31 Aug 2010 10:00:50 +0000 (12:00 +0200)]
drbd: Disable activity log updates when the whole device is out of sync

When the complete device is marked as out of sync, we can disable
updates of the on disk AL. Currently AL updates are only disabled
if one uses the "invalidate-remote" command on an unconnected,
primary device, or when at attach time all bits in the bitmap are
set.

As of now, AL updates do not get disabled when all bits become
set due to application writes to an unconnected DRBD device.
While this is a missing feature, it is not considered important,
and might get added later.

BTW, after initializing a "one legged" DRBD device
    drbdadm create-md resX
    drbdadm -- --force primary resX
AL updates also get disabled, until the first connect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Actually allow BIOs up to 128k (was 32k).
Philipp Reisner [Mon, 23 Aug 2010 13:18:33 +0000 (15:18 +0200)]
drbd: Actually allow BIOs up to 128k (was 32k).

Now that we have multiple BIOs per ee and packets with a 32 bit length
field, it is time to use these goodies.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: receiving of big packets, for payloads between 64kByte and 4GByte
Philipp Reisner [Fri, 20 Aug 2010 12:35:10 +0000 (14:35 +0200)]
drbd: receiving of big packets, for payloads between 64kByte and 4GByte

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Sending of big packets, for payloads from 64KByte to 4GByte
Philipp Reisner [Fri, 20 Aug 2010 11:36:10 +0000 (13:36 +0200)]
drbd: Sending of big packets, for payloads from 64KByte to 4GByte

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Bugfix for regression introduced with f9bc8913c06022e
Philipp Reisner [Mon, 23 Aug 2010 14:17:13 +0000 (16:17 +0200)]
drbd: Bugfix for regression introduced with f9bc8913c06022e

If we intend to use the block_id member of an epoch entry,
we may not use the digest member.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Microfix: Assigning sector once is sufficient
Philipp Reisner [Mon, 23 Aug 2010 13:51:56 +0000 (15:51 +0200)]
drbd: Microfix: Assigning sector once is sufficient

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: new configuration parameter c-min-rate
Lars Ellenberg [Wed, 11 Aug 2010 21:40:24 +0000 (23:40 +0200)]
drbd: new configuration parameter c-min-rate

We now track the data rate of locally submitted resync related requests,
and can thus detect non-resync activity on the lower level device.

If the current sync rate is above c-min-rate, and the lower level device
appears to be busy, we throttle the resyncer.
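
The throttle decision described above boils down to a small predicate.
The function name, parameter names and the KiB/s unit are assumptions
for this sketch; how "busy" is detected is not modelled here.

```c
#include <stdbool.h>

/* Throttle the resyncer only if it is already running faster than
 * c-min-rate and the lower level device appears busy with other IO.
 * Names and units are hypothetical. */
static bool should_throttle(unsigned int cur_rate_kib,
                            unsigned int c_min_rate_kib,
                            bool lower_dev_busy)
{
    return lower_dev_busy && cur_rate_kib > c_min_rate_kib;
}
```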

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: reduce code duplication when receiving data requests
Lars Ellenberg [Wed, 11 Aug 2010 21:28:00 +0000 (23:28 +0200)]
drbd: reduce code duplication when receiving data requests

also canonicalize the return values of read_for_csum
and drbd_rs_begin_io to return -ESOMETHING, or 0 for success.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: use rolling marks for resync speed calculation
Lars Ellenberg [Wed, 11 Aug 2010 19:21:50 +0000 (21:21 +0200)]
drbd: use rolling marks for resync speed calculation

The current resync speed as displayed in /proc/drbd fluctuates a lot.
Using an array of rolling marks makes this calculation much more stable.
We used to have this (a long time ago with 0.7), but it got lost somehow.

If "stalled", do not discard the rest of the information, just add a
" (stalled)" tag to the progress line.

This patch also shortens a spinlock critical section somewhat, and
reduces the number of atomic operations in put_ldev.
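
A minimal sketch of the rolling marks idea: keep the last few
(time, blocks left) samples in a ring and compute the speed against
the oldest one, so a single slow or fast interval does not dominate.
The number of marks, the struct layout and all names are assumptions
and differ from the DRBD code.

```c
#include <stdint.h>

#define N_MARKS 8   /* assumed ring size */

struct marks {
    uint64_t left[N_MARKS];  /* blocks left to sync at each mark */
    uint64_t time[N_MARKS];  /* timestamp of each mark, in seconds */
    int last;                /* index of the most recent mark */
};

/* Start of resync: every slot holds the starting point. */
static void marks_init(struct marks *m, uint64_t left, uint64_t now)
{
    for (int i = 0; i < N_MARKS; i++) {
        m->left[i] = left;
        m->time[i] = now;
    }
    m->last = 0;
}

/* Record a new mark, overwriting the oldest one. */
static void marks_add(struct marks *m, uint64_t left, uint64_t now)
{
    m->last = (m->last + 1) % N_MARKS;
    m->left[m->last] = left;
    m->time[m->last] = now;
}

/* Blocks per second, measured from the oldest mark to now. */
static uint64_t marks_rate(const struct marks *m, uint64_t left, uint64_t now)
{
    int oldest = (m->last + 1) % N_MARKS;
    uint64_t dt = now - m->time[oldest];

    if (dt == 0)    /* guard against division by zero */
        dt = 1;
    return (m->left[oldest] - left) / dt;
}
```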

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: remove outdated comment and dead code
Lars Ellenberg [Wed, 11 Aug 2010 18:53:21 +0000 (20:53 +0200)]
drbd: remove outdated comment and dead code

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: let drbd_free_ee implicitly free any digest
Lars Ellenberg [Wed, 11 Aug 2010 18:42:55 +0000 (20:42 +0200)]
drbd: let drbd_free_ee implicitly free any digest

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Replaced some casts by an union. Improved comments
Philipp Reisner [Wed, 21 Jul 2010 08:20:17 +0000 (10:20 +0200)]
drbd: Replaced some casts by an union. Improved comments

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Bugfix: rs_in_flight could become wrong if read_for_csum() requested reschedule...
Philipp Reisner [Thu, 22 Jul 2010 13:27:27 +0000 (15:27 +0200)]
drbd: Bugfix: rs_in_flight could become wrong if read_for_csum() requested reschedule later

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: The new, smarter resync speed controller
Philipp Reisner [Tue, 6 Jul 2010 09:14:00 +0000 (11:14 +0200)]
drbd: The new, smarter resync speed controller

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: New sync_param packet, that includes the parameters of the new controller
Philipp Reisner [Tue, 6 Jul 2010 15:25:54 +0000 (17:25 +0200)]
drbd: New sync_param packet, that includes the parameters of the new controller

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: New sync parameters for the smart resync rate controller
Philipp Reisner [Mon, 5 Jul 2010 11:42:03 +0000 (13:42 +0200)]
drbd: New sync parameters for the smart resync rate controller

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix list corruption (recent regression)
Lars Ellenberg [Fri, 9 Jul 2010 21:28:10 +0000 (23:28 +0200)]
drbd: fix list corruption (recent regression)

The commit 288f422ec13667de40b278535d2a5fb5c77352c4
 drbd: Track all IO requests on the TL, not writes only
moved a list_add_tail(req, ) into a region where req
may have just been freed due to conflict detection.

Fix this by adding a proper cleanup section for that code path.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Initialize all members of sync_conf to their defaults [Bugz 315]
Philipp Reisner [Tue, 29 Jun 2010 15:35:34 +0000 (17:35 +0200)]
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Make sure tl_restart(, resend) can not get called multiple times for a new...
Philipp Reisner [Thu, 24 Jun 2010 14:24:25 +0000 (16:24 +0200)]
drbd: Make sure tl_restart(, resend) can not get called multiple times for a new connection

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Do not try to free tl_hash in drbd_disconnect() when IO is suspended
Philipp Reisner [Thu, 24 Jun 2010 12:34:40 +0000 (14:34 +0200)]
drbd: Do not try to free tl_hash in drbd_disconnect() when IO is suspended

We may not free tl_hash when IO is suspended, since we can not wait
until ap_bio_cnt reaches zero.

We can do this after susp reached 0, since then tl_clear was called.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Allow attach while IO is suspended
Philipp Reisner [Thu, 24 Jun 2010 10:05:53 +0000 (12:05 +0200)]
drbd: Allow attach while IO is suspended

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Allow tl_restart() to do IO completion while IO is suspended
Philipp Reisner [Wed, 23 Jun 2010 15:18:51 +0000 (17:18 +0200)]
drbd: Allow tl_restart() to do IO completion while IO is suspended

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Fixed a deadlock, probably only affected UP machines
Philipp Reisner [Wed, 23 Jun 2010 09:20:05 +0000 (11:20 +0200)]
drbd: Fixed a deadlock, probably only affected UP machines

After a disconnect (most likely mdev->net_cnt == 0), while we are
still in an unstable state (!drbd_state_is_stable()), an IO request
causes drbd_get_max_buffers() (called from __inc_ap_bio_cond(),
called from inc_ap_bio()) to wake up misc_wait. misc_wait is also
used in inc_ap_bio() to sleep until the outcome of
__inc_ap_bio_cond() changes. => Busy loop!

Solution: Have a dedicated wait queue for get_net_conf() and
put_net_conf().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Do not do a hard state change when establishing a connection [bugz 304]
Philipp Reisner [Wed, 16 Jun 2010 14:18:09 +0000 (16:18 +0200)]
drbd: Do not do a hard state change when establishing a connection [bugz 304]

Make sure the state engine can deny two primaries to connect

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Ensure that the peer was not rebootet in the meantime before resending TL
Philipp Reisner [Tue, 22 Jun 2010 12:03:27 +0000 (14:03 +0200)]
drbd: Ensure that the peer was not rebootet in the meantime before resending TL

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Delayed creation of current-UUID
Philipp Reisner [Fri, 11 Jun 2010 09:26:34 +0000 (11:26 +0200)]
drbd: Delayed creation of current-UUID

When a fencing policy of "resource-and-stonith" is configured,
and DRBD loses connection to its peer, we can delay the
creation of a new current-UUID until IO gets thawed.

That allows one to deploy fence-peer handlers that actually
commit suicide on the machine they are started on.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Run the fence-peer helper asynchronously
Philipp Reisner [Fri, 11 Jun 2010 11:56:33 +0000 (13:56 +0200)]
drbd: Run the fence-peer helper asynchronously

Since we can not thaw the transfer log, the next logical step is
to allow reconnects while the fence-peer handler runs.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Reduce the verbosity of some state transitions
Philipp Reisner [Thu, 10 Jun 2010 14:55:15 +0000 (16:55 +0200)]
drbd: Reduce the verbosity of some state transitions

State transitions in the space of non-allowed states used
to be very noisy. Reduce that, since that has little value
for the majority of the user base.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Removing a by now obsolete clause in the state sanitizing
Philipp Reisner [Thu, 10 Jun 2010 14:46:54 +0000 (16:46 +0200)]
drbd: Removing a by now obsolete clause in the state sanitizing

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Now we need to handle the ed_uuid of an diskless, unconnected primary correctly
Philipp Reisner [Mon, 21 Jun 2010 12:14:15 +0000 (14:14 +0200)]
drbd: Now we need to handle the ed_uuid of an diskless, unconnected primary correctly

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Disabled the crashed_primary detection for re-attach of last data while IO...
Philipp Reisner [Fri, 18 Jun 2010 14:03:20 +0000 (16:03 +0200)]
drbd: Disabled the crashed_primary detection for re-attach of last data while IO is frozen

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Do not allow a fencing-policy of resource-and-stonith with protocol A
Philipp Reisner [Fri, 18 Jun 2010 11:56:57 +0000 (13:56 +0200)]
drbd: Do not allow a fencing-policy of resource-and-stonith with protocol A

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Finished the "on-no-data-accessible suspend-io;" functionality
Philipp Reisner [Mon, 31 May 2010 08:14:17 +0000 (10:14 +0200)]
drbd: Finished the "on-no-data-accessible suspend-io;" functionality

When no data is accessible (neither a connection to the peer nor a
local disk), let the user choose to freeze all IO operations instead
of getting IO errors.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Removed redundant error checks in the request code path
Philipp Reisner [Mon, 10 May 2010 14:03:10 +0000 (16:03 +0200)]
drbd: Removed redundant error checks in the request code path

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: factored drbd_req_make_private_bio() out of drbd_req_new()
Philipp Reisner [Thu, 10 Jun 2010 11:30:36 +0000 (13:30 +0200)]
drbd: factored drbd_req_make_private_bio() out of drbd_req_new()

Preparing tl_thaw_dio()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Do not send two barriers without any writes between them
Philipp Reisner [Tue, 22 Jun 2010 09:26:48 +0000 (11:26 +0200)]
drbd: Do not send two barriers without any writes between them

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: factored tl_restart() out of tl_clear().
Philipp Reisner [Wed, 12 May 2010 15:08:26 +0000 (17:08 +0200)]
drbd: factored tl_restart() out of tl_clear().

If IO was frozen because of a temporary network outage, resend the
content of the transfer-log over the newly established connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: mod_req has now a return value
Philipp Reisner [Wed, 9 Jun 2010 12:07:43 +0000 (14:07 +0200)]
drbd: mod_req has now a return value

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Track all IO requests on the TL, not writes only
Philipp Reisner [Thu, 27 May 2010 13:07:43 +0000 (15:07 +0200)]
drbd: Track all IO requests on the TL, not writes only

With that the drbd_fail_pending_reads() function becomes obsolete.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: renamed drbd_tl_epoch.n_req to drbd_tl_epoch.n_writes
Philipp Reisner [Thu, 27 May 2010 12:49:27 +0000 (14:49 +0200)]
drbd: renamed drbd_tl_epoch.n_req to drbd_tl_epoch.n_writes

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agoamiga floppy: Compile failure fixes
Vivek Goyal [Sun, 26 Sep 2010 03:23:25 +0000 (12:23 +0900)]
amiga floppy: Compile failure fixes

o Compile fixes for amiga floppy driver.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoatari floppy: Stop sharing request queue across multiple gendisks
Vivek Goyal [Fri, 24 Sep 2010 18:35:45 +0000 (20:35 +0200)]
atari floppy: Stop sharing request queue across multiple gendisks

o Use one request queue per gendisk instead of sharing the queue.

o Don't have hardware. No compile testing or run time testing done. Completely
  untested.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoamiga floppy: Stop sharing request queue across multiple gendisks
Vivek Goyal [Fri, 24 Sep 2010 18:35:44 +0000 (20:35 +0200)]
amiga floppy: Stop sharing request queue across multiple gendisks

o Use one request queue per gendisk instead of sharing request queue

o Don't have hardware. No compile testing or run time testing done. Completely
  untested.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agofloppy: switch to one queue per drive instead of sharing a queue
Jens Axboe [Wed, 22 Sep 2010 07:32:36 +0000 (09:32 +0200)]
floppy: switch to one queue per drive instead of sharing a queue

Pretty straightforward conversion. Note that we now do round-robin
between the drives that have available requests; before, we simply
used the drive that the IO scheduler told us to. Since the IO
scheduler doesn't care about multiple devices per queue, the resulting
sort would not have made sense.
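
The round-robin selection described above can be illustrated with a minimal sketch. This is not the floppy driver's code: `N_DRIVES`, the `pending` array, and `pick_next_drive()` are hypothetical names, and the real driver works on per-drive request queues rather than counters.

```c
#include <assert.h>

#define N_DRIVES 4  /* hypothetical number of drives */

/*
 * Return the next drive after `last` that has pending requests,
 * wrapping around, or -1 if no drive has work queued.
 * pending[i] is an assumed per-drive count of waiting requests.
 */
static int pick_next_drive(const int pending[N_DRIVES], int last)
{
    assert(last >= 0 && last < N_DRIVES);

    for (int step = 1; step <= N_DRIVES; step++) {
        int drive = (last + step) % N_DRIVES;

        if (pending[drive] > 0)
            return drive;
    }
    return -1;
}
```

Starting the scan at `last + 1` means the drive that was just served is considered last, so one busy drive cannot starve the others.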

Fixed by Vivek to get rid of a double lock problem in set_next_request().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
13 years agocciss: remove some superfluous tests from cciss_bigpassthru()
Stephen M. Cameron [Thu, 26 Aug 2010 18:56:35 +0000 (13:56 -0500)]
cciss: remove some superfluous tests from cciss_bigpassthru()

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_big_passthru
Stephen M. Cameron [Thu, 26 Aug 2010 18:56:30 +0000 (13:56 -0500)]
cciss: factor out cciss_big_passthru

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_passthru
Stephen M. Cameron [Thu, 26 Aug 2010 18:56:25 +0000 (13:56 -0500)]
cciss: factor out cciss_passthru

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getluninfo
Stephen M. Cameron [Thu, 26 Aug 2010 18:56:20 +0000 (13:56 -0500)]
cciss: factor out cciss_getluninfo

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getdrivver
Stephen M. Cameron [Thu, 26 Aug 2010 18:56:15 +0000 (13:56 -0500)]
cciss: factor out cciss_getdrivver

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getfirmver
Stephen M. Cameron [Thu, 26 Aug 2010 18:56:10 +0000 (13:56 -0500)]
cciss: factor out cciss_getfirmver

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getbustypes
Stephen M. Cameron [Thu, 26 Aug 2010 18:56:05 +0000 (13:56 -0500)]
cciss: factor out cciss_getbustypes

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getheartbeat
Stephen M. Cameron [Thu, 26 Aug 2010 18:55:59 +0000 (13:55 -0500)]
cciss: factor out cciss_getheartbeat

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_setnodename
Stephen M. Cameron [Thu, 26 Aug 2010 18:55:54 +0000 (13:55 -0500)]
cciss: factor out cciss_setnodename

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getnodename
Stephen M. Cameron [Thu, 26 Aug 2010 18:55:49 +0000 (13:55 -0500)]
cciss: factor out cciss_getnodename

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_setintinfo
Stephen M. Cameron [Thu, 26 Aug 2010 18:55:44 +0000 (13:55 -0500)]
cciss: factor out cciss_setintinfo

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getintinfo
Stephen M. Cameron [Thu, 26 Aug 2010 18:55:39 +0000 (13:55 -0500)]
cciss: factor out cciss_getintinfo

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out cciss_getpciinfo
Stephen M. Cameron [Thu, 26 Aug 2010 18:55:34 +0000 (13:55 -0500)]
cciss: factor out cciss_getpciinfo

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoloop: add some basic read-only sysfs attributes
Milan Broz [Mon, 23 Aug 2010 13:16:00 +0000 (15:16 +0200)]
loop: add some basic read-only sysfs attributes

Create /sys/block/loopX/loop directory and provide these attributes:
 - backing_file
 - autoclear
 - offset
 - sizelimit

This loop directory is present only if the loop device is configured.

To be used in util-linux-ng (and possibly elsewhere, like udev rules)
where code needs to get loop attributes from the kernel (and not store
duplicate info in userspace).

Moreover, the loop ioctls are not even able to provide full backing
file info because of buffer limits.
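
A userspace consumer of the attributes listed above might look like the following sketch. The helper `read_loop_attr()` and its `sysfs_root` parameter are hypothetical; only the /sys/block/loopX/loop/ layout and the attribute names come from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one attribute (e.g. "backing_file") of a loop device into buf.
 * sysfs_root is normally "/sys"; it is a parameter only so the helper
 * can be pointed at a test directory. Returns 0 on success, -1 on error.
 */
static int read_loop_attr(const char *sysfs_root, const char *dev,
                          const char *attr, char *buf, size_t len)
{
    char path[256];
    FILE *f;
    int n;

    assert(buf != NULL && len > 0);

    n = snprintf(path, sizeof(path), "%s/block/%s/loop/%s",
                 sysfs_root, dev, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;  /* the loop directory is absent if the device is not configured */

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

For example, `read_loop_attr("/sys", "loop0", "backing_file", buf, sizeof(buf))` would fill `buf` with the backing file path when loop0 is configured.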

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoMerge branch 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev
Linus Torvalds [Mon, 23 Aug 2010 02:55:14 +0000 (19:55 -0700)]
Merge branch 'radix-tree' of git://git./linux/kernel/git/dgc/xfsdev

* 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
  radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
  radix-tree: clear all tags in radix_tree_node_rcu_free

13 years agoLinux 2.6.36-rc2 v2.6.36-rc2
Linus Torvalds [Mon, 23 Aug 2010 00:43:29 +0000 (17:43 -0700)]
Linux 2.6.36-rc2

13 years agoradix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
Dave Chinner [Mon, 23 Aug 2010 00:33:53 +0000 (10:33 +1000)]
radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags

Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree:
implement function radix_tree_range_tag_if_tagged") does not safely
set tags on intermediate tree nodes. The code walks down the tree
setting tags before it has fully resolved the path to the leaf under
the assumption there will be a leaf slot with the tag set in the
range it is searching.

Unfortunately, this is not a valid assumption - we can abort after
setting a tag on an intermediate node if we overrun the number of
tags we are allowed to set in a batch, or stop scanning because we
have passed the last scan index before we reach a leaf slot with
the tag we are searching for set.

As a result, we can leave the function with tags set on intermediate
nodes which can be tripped over later by tag-based lookups. The
result of these stale tags is that lookups may end prematurely or
livelock because the lookup cannot make progress.

The fix for the problem involves recording the traversal path we
take to the leaf nodes, and only propagating the tags back up the
tree once the tag is set in the leaf node slot. We are already
recording the path for efficient traversal, so there is no
additional overhead to do the intermediate node tag setting in
this manner.
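
The idea of tagging the leaf first and only then propagating upward along the recorded path can be sketched with a toy model. This is not the kernel's radix tree: `struct toy_node`, the constants, and `toy_tag_if_tagged()` are hypothetical, and the path is passed in as plain arrays instead of being built during the descent.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAP_SIZE   4  /* slots per node, illustrative only */
#define TOY_MAX_TAGS   3
#define TOY_MAX_HEIGHT 4

struct toy_node {
    bool tags[TOY_MAX_TAGS][TOY_MAP_SIZE];
};

/*
 * path[0] is the root and path[height - 1] the leaf-level node;
 * offset[i] is the slot followed in path[i]. If the leaf slot carries
 * iftag, set settag there and then propagate settag back up the path.
 * If it does not, no node is modified, so no stale tags remain on
 * intermediate nodes.
 */
static bool toy_tag_if_tagged(struct toy_node *path[], const int offset[],
                              int height, int iftag, int settag)
{
    struct toy_node *leaf;

    assert(height >= 1 && height <= TOY_MAX_HEIGHT);
    assert(iftag >= 0 && iftag < TOY_MAX_TAGS);
    assert(settag >= 0 && settag < TOY_MAX_TAGS);

    leaf = path[height - 1];
    if (!leaf->tags[iftag][offset[height - 1]])
        return false;

    leaf->tags[settag][offset[height - 1]] = true;

    /* Walk back up; stop early once an ancestor is already tagged. */
    for (int level = height - 2; level >= 0; level--) {
        if (path[level]->tags[settag][offset[level]])
            break;
        path[level]->tags[settag][offset[level]] = true;
    }
    return true;
}
```

Because intermediate nodes are written only after the leaf slot has been tagged, an early abort of the scan cannot leave a tag on an ancestor without a tagged leaf beneath it.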

This fixes a radix tree lookup livelock triggered by the new
writeback sync livelock avoidance code introduced in commit
f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback
livelock avoidance using page tagging").

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
13 years agoradix-tree: clear all tags in radix_tree_node_rcu_free
Dave Chinner [Mon, 23 Aug 2010 00:33:19 +0000 (10:33 +1000)]
radix-tree: clear all tags in radix_tree_node_rcu_free

Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement
writeback livelock avoidance using page tagging") introduced a new
radix tree tag, increasing the number of tags in each node from 2 to
3. It did not, however, fix up the code in
radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
and hence could leave stray tags set in the new tag array.

The result is that the livelock avoidance code added in the
above commit would hit stale tags when doing tag-based lookups,
resulting in livelocks when trying to traverse the tree.

Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
again in the future by using a loop to walk all the tags up to
RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
leaves behind.
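
A loop of this kind can be sketched as follows. The types and names are hypothetical stand-ins rather than the kernel's definitions, and clearing only the bit for slot 0 is an assumption of this sketch about which stray bits are left behind.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TAGS  3  /* stands in for RADIX_TREE_MAX_TAGS */
#define TOY_TAG_LONGS 1  /* one word of tag bits per tag, illustrative */

struct toy_node {
    unsigned long tags[TOY_MAX_TAGS][TOY_TAG_LONGS];
};

/*
 * Clear the slot-0 bit of every tag array. Looping up to the tag
 * count, rather than naming each tag, keeps the cleanup correct when
 * another tag is added later.
 */
static void toy_clear_stray_tags(struct toy_node *node)
{
    assert(node != NULL);

    for (int tag = 0; tag < TOY_MAX_TAGS; tag++)
        node->tags[tag][0] &= ~1UL;
}
```

Hard-coding the first two tags is the pattern that went stale when the third tag was introduced; bounding the loop by the constant avoids repeating that mistake.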

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Nick Piggin <npiggin@kernel.dk>
Acked-by: Jan Kara <jack@suse.cz>
13 years agoMerge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 22 Aug 2010 18:27:36 +0000 (11:27 -0700)]
Merge branch 'kvm-updates/2.6.36' of git://git./virt/kvm/kvm

* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PIT: free irq source id in handling error path
  KVM: destroy workqueue on kvm_create_pit() failures
  KVM: fix poison overwritten caused by using wrong xstate size

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt...
Linus Torvalds [Sun, 22 Aug 2010 18:03:27 +0000 (11:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/anholt/drm-intel

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (58 commits)
  drm/i915,intel_agp: Add support for Sandybridge D0
  drm/i915: fix render pipe control notify on sandybridge
  agp/intel: set 40-bit dma mask on Sandybridge
  drm/i915: Remove the conflicting BUG_ON()
  drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/
  drm/i915/suspend: Flush register writes before busy-waiting.
  i915: disable DAC on Ironlake also when doing CRT load detection.
  drm/i915: wait for actual vblank, not just 20ms
  drm/i915: make sure eDP PLL is enabled at the right time
  drm/i915: fix VGA plane disable for Ironlake+
  drm/i915: eDP mode set sequence corrections
  drm/i915: add panel reset workaround
  drm/i915: Enable RC6 on Ironlake.
  drm/i915/sdvo: Only set is_lvds if we have a valid fixed mode.
  drm/i915: Set up a render context on Ironlake
  drm/i915 invalidate indirect state pointers at end of ring exec
  drm/i915: Wake-up wait_request() from elapsed hang-check (v2)
  drm/i915: Apply i830 errata for cursor alignment
  drm/i915: Only update i845/i865 CURBASE when disabled (v2)
  drm/i915: FBC is updated within set_base() so remove second call in mode_set()
  ...

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg...
Linus Torvalds [Sun, 22 Aug 2010 17:08:52 +0000 (10:08 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/penberg/slab-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slab: fix object alignment
  slub: add missing __percpu markup in mm/slub_def.h