pandora-kernel.git
17 years ago[Bluetooth] Correct SCO buffer size for Belkin devices
Marcel Holtmann [Tue, 18 Jul 2006 15:47:40 +0000 (17:47 +0200)]
[Bluetooth] Correct SCO buffer size for Belkin devices

The Belkin F8T012 and F8T013 devices are both based on a Bluetooth chip
from Broadcom and their SCO buffer size values are wrong. The Bluetooth
core should correct these values.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
17 years ago[Bluetooth] Correct SCO buffer size for another Broadcom chip
Marcel Holtmann [Fri, 14 Jul 2006 14:01:52 +0000 (16:01 +0200)]
[Bluetooth] Correct SCO buffer size for another Broadcom chip

The SCO buffer size values on IBM/Lenovo ThinkPad laptops with a
Bluetooth chip from Broadcom are wrong. The USB Bluetooth driver
has to set a quirk to correct the SCO buffer size values.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
17 years ago[Bluetooth] Correct RFCOMM channel MTU for broken implementations
Marcel Holtmann [Fri, 14 Jul 2006 09:42:12 +0000 (11:42 +0200)]
[Bluetooth] Correct RFCOMM channel MTU for broken implementations

Some Bluetooth RFCOMM implementations try to negotiate a bigger channel
MTU than we can support for a particular session. The maximum MTU for
an RFCOMM session is limited through the L2CAP layer. So if the other
side proposes a channel MTU that is bigger than the underlying L2CAP
MTU, we should reduce it to the L2CAP MTU of the session minus five
bytes for the RFCOMM headers.
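
The clamping rule described above can be sketched as follows. This is only an illustration, not the code from the patch: the function name clamp_channel_mtu and the macro RFCOMM_HDR_OVERHEAD are made up for this example, and only the "L2CAP MTU minus five bytes" rule comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical name: bytes reserved for RFCOMM headers (from the message). */
#define RFCOMM_HDR_OVERHEAD 5

/*
 * Hypothetical helper: return the channel MTU to use when the peer
 * proposes `proposed` on a session whose L2CAP MTU is `l2cap_mtu`.
 */
static uint16_t clamp_channel_mtu(uint16_t proposed, uint16_t l2cap_mtu)
{
	/* Largest RFCOMM payload that still fits into one L2CAP frame. */
	uint16_t limit = l2cap_mtu > RFCOMM_HDR_OVERHEAD
			 ? (uint16_t)(l2cap_mtu - RFCOMM_HDR_OVERHEAD)
			 : 0;

	/* Accept smaller proposals as-is, reduce bigger ones to the limit. */
	return proposed > limit ? limit : proposed;
}
```

With an L2CAP MTU of 672, a proposed channel MTU of 1024 would be reduced to 667, while a proposal of 127 would be left alone.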

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
17 years ago[PKT_SCHED]: Fix regression in PSCHED_TADD{,2}.
Guillaume Chazarain [Mon, 24 Jul 2006 06:37:24 +0000 (23:37 -0700)]
[PKT_SCHED]: Fix regression in PSCHED_TADD{,2}.

In PSCHED_TADD and PSCHED_TADD2, if delta is less than tv.tv_usec (so,
less than USEC_PER_SEC too) then tv_res will be smaller than tv. The
assignment "(tv_res).tv_usec = __delta;" is wrong.  The fix is to
revert to the original code before
4ee303dfeac6451b402e3d8512723d3a0f861857 and change the 'if' to a
'while'.

[Shuya MAEDA: "while (__delta >= USEC_PER_SEC){ ... }" instead of
"while (__delta > USEC_PER_SEC){ ... }"]
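
A minimal sketch of the normalisation this message describes, written as a plain function rather than the kernel macro. The struct and function names (struct tv, tv_add_usec) are invented for illustration, and the sketch assumes delta is non-negative; only the "while (... >= USEC_PER_SEC)" carry loop is taken from the text above.

```c
#define USEC_PER_SEC 1000000L

/* Hypothetical stand-in for struct timeval. */
struct tv {
	long tv_sec;
	long tv_usec;
};

/* Add `delta` microseconds (assumed >= 0) to `tv`, keeping tv_usec normalised. */
static struct tv tv_add_usec(struct tv tv, long delta)
{
	struct tv res;
	long usec = tv.tv_usec + delta;

	res.tv_sec = tv.tv_sec;
	/* A 'while' (not an 'if') so that more than one second can carry over;
	 * '>=' (not '>') so that exactly USEC_PER_SEC also carries. */
	while (usec >= USEC_PER_SEC) {
		res.tv_sec++;
		usec -= USEC_PER_SEC;
	}
	res.tv_usec = usec;
	return res;
}
```

The two boundary cases worth checking are a sum of exactly one second (which must carry) and a delta of several seconds (which must carry more than once).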

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[DCCP]: Fix default sequence window size
Ian McDonald [Mon, 24 Jul 2006 06:33:28 +0000 (23:33 -0700)]
[DCCP]: Fix default sequence window size

When using the default sequence window size (100) I got the following in
my logs:

Jun 22 14:24:09 localhost kernel: [ 1492.114775] DCCP: Step 6 failed for
DATA packet, (LSWL(6279674225) <= P.seqno(6279674749) <=
S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <=
P.ackno(1125899906842620) <= S.AWH(18798206548), sending SYNC...
Jun 22 14:24:09 localhost kernel: [ 1492.115147] DCCP: Step 6 failed for
DATA packet, (LSWL(6279674225) <= P.seqno(6279674750) <=
S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <=
P.ackno(1125899906842620) <= S.AWH(18798206549), sending SYNC...

I went to alter the default sysctl and it didn't take for new sockets.
Below patch fixes this.

I think the default is too low but it is what the DCCP spec specifies.

As a side effect of this my rx speed using iperf goes from about 2.8 Mbits/sec
to 3.5. This is still far too slow but it is a step in the right direction.

Compile tested only for IPv6, but this is not a particularly complex change.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoIB/mthca: Initialize max_cmds before debug code prints it
Roland Dreier [Mon, 24 Jul 2006 16:36:50 +0000 (09:36 -0700)]
IB/mthca: Initialize max_cmds before debug code prints it

Read the max_cmds value from the response to the QUERY_FW command
before printing out the value, so that the real value goes into the
debug output.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipoib: Fix packet loss after hardware address update
Michael S. Tsirkin [Wed, 19 Jul 2006 14:44:37 +0000 (17:44 +0300)]
IB/ipoib: Fix packet loss after hardware address update

The neighbour ha field may get updated without destroying the
neighbour.  In this case, the ha field gets out of sync with the
address handle stored in ipoib_neigh->ah, with the result that
the ah field would point to an incorrect path, resulting in all
packets being lost.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipoib: Fix oops with ipoib_debug_mcast set
Or Gerlitz [Mon, 24 Jul 2006 07:42:00 +0000 (10:42 +0300)]
IB/ipoib: Fix oops with ipoib_debug_mcast set

Need to set mcast->ah before debug code dereferences it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mad: Validate MADs for spec compliance
Sean Hefty [Thu, 20 Jul 2006 08:25:50 +0000 (11:25 +0300)]
IB/mad: Validate MADs for spec compliance

Validate MADs sent by userspace clients for spec compliance with
C13-18.1.1 (prevent duplicate requests and responses sent on the
same port).  Without this, RMPP transactions get aborted because
of duplicate packets.

This patch is similar to that provided by Jack Morgenstein.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipath: ipath_skip_sge() can break if num_sge > 1
Ralph Campbell [Tue, 18 Jul 2006 01:21:24 +0000 (18:21 -0700)]
IB/ipath: ipath_skip_sge() can break if num_sge > 1

ipath_skip_sge() doesn't exactly duplicate the side effects of
ipath_copy_sge() if num_sge > 1 since it doesn't decrement ss->num_sge.
This could result in the sg_list being accessed out of bounds.
Since ipath_skip_sge() is almost always called with num_sge == 1,
the original "optimization" is almost never used.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipath: Fix ib_ipath driver to work with SRP
Ralph Campbell [Tue, 18 Jul 2006 01:19:54 +0000 (18:19 -0700)]
IB/ipath: Fix ib_ipath driver to work with SRP

I am still working on a proposal to remove the phys_to_virt() calls
in the ib_ipath driver.  In the mean time, this patch allows SRP
to work by fixing the R_Key check and conversion from IB address
to kernel virtual address.  It also returns the correct page size
for FMRs.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/ipath: Fix a data corruption
Ralph Campbell [Tue, 18 Jul 2006 01:18:36 +0000 (18:18 -0700)]
IB/ipath: Fix a data corruption

This patch fixes a problem where certain error packets are passed
to the InfiniBand layer for processing even though the packet
actually was received with an error.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/mthca: Fix SRQ limit event range check
Dotan Barak [Thu, 13 Jul 2006 08:05:49 +0000 (11:05 +0300)]
IB/mthca: Fix SRQ limit event range check

Mem-free HCAs always keep one spare SRQ WQE, so the SRQ limit cannot
be set beyond srq->max - 1.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/uverbs: Fix lockdep warnings
Roland Dreier [Sun, 23 Jul 2006 22:16:04 +0000 (15:16 -0700)]
IB/uverbs: Fix lockdep warnings

Lockdep warns because uverbs is trying to take uobj->mutex when it
already holds that lock.  This is because there are really multiple
types of uobjs even though all of their locks are initialized in
common code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years agoIB/uverbs: Fix unlocking in error paths
Michael S. Tsirkin [Mon, 17 Jul 2006 15:20:51 +0000 (18:20 +0300)]
IB/uverbs: Fix unlocking in error paths

ib_uverbs_create_ah() and ib_uverbs_create_srq() did not release the
PD's read lock in their error paths, which led to a deadlock when
destroying the PD.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
17 years ago[PATCH] Cpuset: fix ABBA deadlock with cpu hotplug lock
Paul Jackson [Sun, 23 Jul 2006 18:36:08 +0000 (11:36 -0700)]
[PATCH] Cpuset: fix ABBA deadlock with cpu hotplug lock

Fix ABBA deadlock between lock_cpu_hotplug() and the cpuset
callback_mutex lock.

It only happens on cpu_exclusive cpusets, due to the dynamic
sched domain code trying to take the cpu hotplug lock inside
the cpuset callback_mutex lock.

This bug has apparently been here for several months, but didn't
get hit until the right customer load on a large system.

This fix appears right from inspection, but it will take a few
more days running it on that customer's workload to be confident
we nailed it.  We don't have any other reproducible test case.

The cpu_hotplug_lock() tends to cover large runs of code.
The other places that hold both that lock and the cpuset callback
mutex lock always nest the cpuset lock inside the hotplug lock.
This place tries to do the reverse, risking an ABBA deadlock.

This is in the cpuset_rmdir() code, where we:
  * take the callback_mutex lock
  * mark the cpuset CS_REMOVED
  * call update_cpu_domains for cpu_exclusive cpusets
  * in that call, take the cpu_hotplug lock if the
    cpuset is marked for removal.

Thanks to Jack Steiner for identifying this deadlock.

The fix is to tear down the dynamic sched domain before we grab
the cpuset callback_mutex lock.  This way, the two locks are
serialized, with the hotplug lock taken and released before
trying for the cpuset lock.
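
The ordering problem can be illustrated with a tiny single-threaded lock-order checker. This is a sketch only and does not use real kernel locks: the enum, the held_mask bookkeeping and the two rmdir_* functions are invented here, and the only thing taken from the message is the rule that the hotplug lock is the outer lock and callback_mutex the inner one.

```c
/* Lower value = outer lock. Names are hypothetical stand-ins. */
enum lock_id { HOTPLUG_LOCK = 0, CALLBACK_MUTEX = 1 };

static unsigned int held_mask;	/* bit n set while lock n is held */

/* Returns 0 on success, -1 if taking `id` would invert the lock order. */
static int take(enum lock_id id)
{
	/* Any held lock of equal or inner rank forbids taking `id`. */
	if (held_mask >> id)
		return -1;
	held_mask |= 1u << id;
	return 0;
}

static void release(enum lock_id id)
{
	held_mask &= ~(1u << id);
}

/* Old order: hotplug lock requested while callback_mutex is held. */
static int rmdir_old_order(void)
{
	int ret;

	take(CALLBACK_MUTEX);
	ret = take(HOTPLUG_LOCK);	/* inversion: reported as -1 */
	if (ret == 0)
		release(HOTPLUG_LOCK);
	release(CALLBACK_MUTEX);
	return ret;
}

/* Fixed order: hotplug lock taken and released first, then callback_mutex. */
static int rmdir_fixed_order(void)
{
	if (take(HOTPLUG_LOCK))
		return -1;
	release(HOTPLUG_LOCK);		/* sched domain torn down here */
	if (take(CALLBACK_MUTEX))
		return -1;
	release(CALLBACK_MUTEX);
	return 0;
}
```

In this model the old sequence is flagged as an inversion, while the serialized sequence never holds both locks at once and so cannot take part in an ABBA cycle.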

I suspect that this bug was introduced when I changed the
cpuset locking from one lock to two.  The dynamic sched domain
dependency on cpu_exclusive cpusets and its hotplug hooks were
added to this code earlier, when cpusets had only a single lock.
It may well have been fine then.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agocpu hotplug: simplify and hopefully fix locking
Linus Torvalds [Sun, 23 Jul 2006 19:12:16 +0000 (12:12 -0700)]
cpu hotplug: simplify and hopefully fix locking

The CPU hotplug locking was quite messy, with a recursive lock to
handle the fact that the actual up/down sequence wanted to protect
itself from being re-entered, while the callbacks that it called
also tended to want to protect themselves from CPU events.

This splits the lock into two (one to serialize the whole hotplug
sequence, the other to protect against the CPU present bitmaps
changing). The latter still allows recursive usage because some
subsystems (ondemand policy for cpufreq at least) had already gotten
too used to the lax locking, but the locking mistakes are hopefully
now less fundamental, and we now warn about recursive lock usage
when we see it, in the hope that it can be fixed.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[cpufreq] ondemand: make shutdown sequence more robust
Linus Torvalds [Sun, 23 Jul 2006 19:05:00 +0000 (12:05 -0700)]
[cpufreq] ondemand: make shutdown sequence more robust

Shutting down the ondemand policy was fraught with potential
problems, causing issues for SMP suspend (which wants to hot-unplug
all but the last CPU).

This should fix at least the worst problems (divide-by-zero
and infinite wait for the workqueue to shut down).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 21 Jul 2006 23:44:45 +0000 (16:44 -0700)]
Merge /pub/scm/linux/kernel/git/davem/net-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [TIPC]: Removing useless casts
  [IPV4]: Fix nexthop realm dumping for multipath routes
  [DUMMY]: Avoid an oops when dummy_init_one() failed
  [IFB] After ifb_init_one() failed, i is increased. Decrease
  [NET]: Fix reversed error test in netif_tx_trylock
  [MAINTAINERS]: Mark LAPB as Oprhan.
  [NET]: Conversions from kmalloc+memset to k(z|c)alloc.
  [NET]: sun happymeal, little pci cleanup
  [IrDA]: Use alloc_skb() in IrDA TX path
  [I/OAT]: Remove pci_module_init() from Intel I/OAT DMA engine
  [I/OAT]: net/core/user_dma.c should #include <net/netdma.h>
  [SCTP]: ADDIP: Don't use an address as source until it is ASCONF-ACKed
  [SCTP]: Set chunk->data_accepted only if we are going to accept it.
  [SCTP]: Verify all the paths to a peer via heartbeat before using them.
  [SCTP]: Unhash the endpoint in sctp_endpoint_free().
  [SCTP]: Check for NULL arg to sctp_bucket_destroy().
  [PKT_SCHED] netem: Fix slab corruption with netem (2nd try)
  [WAN]: Converted synclink drivers to use netif_carrier_*()
  [WAN]: Cosmetic changes to N2 and C101 drivers
  [WAN]: Added missing netif_dormant_off() to generic HDLC
  ...

17 years ago[TIPC]: Removing useless casts
Panagiotis Issaris [Fri, 21 Jul 2006 22:52:20 +0000 (15:52 -0700)]
[TIPC]: Removing useless casts

Removing useless casts

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4]: Fix nexthop realm dumping for multipath routes
Patrick McHardy [Fri, 21 Jul 2006 22:09:55 +0000 (15:09 -0700)]
[IPV4]: Fix nexthop realm dumping for multipath routes

Routing realms exist per nexthop, but are only returned to userspace
for the first nexthop. This is due to the fact that iproute2 only
allows setting the realm for the first nexthop and the kernel refuses
multipath routes where only a single realm is present.

Dump all realms for multipath routes to enable iproute to correctly
display them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[DUMMY]: Avoid an oops when dummy_init_one() failed
Nicolas Dichtel [Fri, 21 Jul 2006 22:09:07 +0000 (15:09 -0700)]
[DUMMY]: Avoid an oops when dummy_init_one() failed

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IFB] After ifb_init_one() failed, i is increased. Decrease
Nicolas Dichtel [Fri, 21 Jul 2006 21:56:02 +0000 (14:56 -0700)]
[IFB] After ifb_init_one() failed, i is increased. Decrease

It before entering the loop that frees the other ifb devices.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Fix reversed error test in netif_tx_trylock
Herbert Xu [Fri, 21 Jul 2006 21:55:38 +0000 (14:55 -0700)]
[NET]: Fix reversed error test in netif_tx_trylock

A non-zero return value indicates success from spin_trylock,
not error.
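
To make the return-value convention concrete, here is a small userspace sketch. It uses a C11 atomic_flag as a stand-in for a spinlock, and every name in it (struct mock_dev, mock_spin_trylock, mock_tx_trylock) is hypothetical; it is not the kernel code, only an illustration of "non-zero means the lock was taken".

```c
#include <stdatomic.h>

/* Hypothetical device with a transmit lock and an owner field. */
struct mock_dev {
	atomic_flag xmit_lock;
	int xmit_lock_owner;	/* -1 when nobody owns the lock */
};

static void mock_dev_init(struct mock_dev *dev)
{
	atomic_flag_clear(&dev->xmit_lock);
	dev->xmit_lock_owner = -1;
}

/* Same convention as described above: non-zero on success, 0 on failure. */
static int mock_spin_trylock(atomic_flag *lock)
{
	/* test_and_set returns the previous state; "was clear" means we got it. */
	return !atomic_flag_test_and_set(lock);
}

static int mock_tx_trylock(struct mock_dev *dev, int cpu)
{
	int ok = mock_spin_trylock(&dev->xmit_lock);

	/* Record the owner only when the lock was actually acquired. */
	if (ok)
		dev->xmit_lock_owner = cpu;
	return ok;
}
```

Naming the result `ok` rather than `err` keeps the success test from being written the wrong way round.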

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[MAINTAINERS]: Mark LAPB as Oprhan.
David S. Miller [Fri, 21 Jul 2006 21:55:17 +0000 (14:55 -0700)]
[MAINTAINERS]: Mark LAPB as Oprhan.

Maintainer email no longer exists.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Conversions from kmalloc+memset to k(z|c)alloc.
Panagiotis Issaris [Fri, 21 Jul 2006 21:51:30 +0000 (14:51 -0700)]
[NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: sun happymeal, little pci cleanup
Jiri Slaby [Fri, 21 Jul 2006 21:51:02 +0000 (14:51 -0700)]
[NET]: sun happymeal, little pci cleanup

Use pci_register_driver instead of pci_module_init. Use PCI_DEVICE macro.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IrDA]: Use alloc_skb() in IrDA TX path
Samuel Ortiz [Fri, 21 Jul 2006 21:50:41 +0000 (14:50 -0700)]
[IrDA]: Use alloc_skb() in IrDA TX path

As pointed out by Christoph Hellwig, dev_alloc_skb() is not intended to be
used for allocating TX sk_buff. The IrDA stack was exclusively calling
dev_alloc_skb() on the TX path, and this patch fixes that.

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[I/OAT]: Remove pci_module_init() from Intel I/OAT DMA engine
Henrik Kretzschmar [Fri, 21 Jul 2006 21:50:13 +0000 (14:50 -0700)]
[I/OAT]: Remove pci_module_init() from Intel I/OAT DMA engine

Changes pci_module_init() to pci_register_driver().

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[I/OAT]: net/core/user_dma.c should #include <net/netdma.h>
Adrian Bunk [Fri, 21 Jul 2006 21:49:49 +0000 (14:49 -0700)]
[I/OAT]: net/core/user_dma.c should #include <net/netdma.h>

Every file should #include the headers containing the prototypes for
its global functions.

Especially in cases like this one where gcc can tell us through a
compile error that the prototype was wrong...

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: ADDIP: Don't use an address as source until it is ASCONF-ACKed
Sridhar Samudrala [Fri, 21 Jul 2006 21:49:25 +0000 (14:49 -0700)]
[SCTP]: ADDIP: Don't use an address as source until it is ASCONF-ACKed

This implements Rules D1 and D4 of Sec 4.3 in the ADDIP draft.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: Set chunk->data_accepted only if we are going to accept it.
Sridhar Samudrala [Fri, 21 Jul 2006 21:49:07 +0000 (14:49 -0700)]
[SCTP]: Set chunk->data_accepted only if we are going to accept it.

Currently there is a code path in sctp_eat_data() where it is possible
to set this flag even when we are dropping this chunk.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: Verify all the paths to a peer via heartbeat before using them.
Sridhar Samudrala [Fri, 21 Jul 2006 21:48:50 +0000 (14:48 -0700)]
[SCTP]: Verify all the paths to a peer via heartbeat before using them.

This patch implements Path Initialization procedure as described in
Sec 2.36 of RFC4460.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: Unhash the endpoint in sctp_endpoint_free().
Vlad Yasevich [Fri, 21 Jul 2006 21:48:26 +0000 (14:48 -0700)]
[SCTP]: Unhash the endpoint in sctp_endpoint_free().

This prevents a race between the close of a socket and receive of an
incoming packet.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SCTP]: Check for NULL arg to sctp_bucket_destroy().
Sridhar Samudrala [Fri, 21 Jul 2006 21:45:47 +0000 (14:45 -0700)]
[SCTP]: Check for NULL arg to sctp_bucket_destroy().

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[PKT_SCHED] netem: Fix slab corruption with netem (2nd try)
Guillaume Chazarain [Fri, 21 Jul 2006 21:45:25 +0000 (14:45 -0700)]
[PKT_SCHED] netem: Fix slab corruption with netem (2nd try)

CONFIG_DEBUG_SLAB found the following bug:
netem_enqueue() in sch_netem.c gets a pointer inside a slab object:
struct netem_skb_cb *cb = (struct netem_skb_cb *)skb->cb;
But then, the slab object may be freed:
skb = skb_unshare(skb, GFP_ATOMIC)
cb is still pointing inside the freed skb, so here is a patch to
initialize cb later, and make it clear that initializing it sooner
is a bad idea.

[From Stephen Hemminger: leave cb uninitialized in order to let gcc
complain in case of use before initialization]
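
The pattern can be sketched in userspace with a mock buffer type. Nothing below is the sch_netem.c code: struct mock_buf, mock_unshare() and enqueue_get_cb() are hypothetical, and mock_unshare() merely imitates the property described above, namely that it may free its argument and hand back a different object.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff with an embedded control block. */
struct mock_buf {
	int shared;
	char cb[48];
};

/* May free `b` and return a private copy, or NULL on allocation failure. */
static struct mock_buf *mock_unshare(struct mock_buf *b)
{
	struct mock_buf *copy;

	if (!b->shared)
		return b;
	copy = malloc(sizeof(*copy));
	if (copy) {
		memcpy(copy, b, sizeof(*copy));
		copy->shared = 0;
	}
	free(b);
	return copy;
}

/* Returns a pointer into the control block of the (possibly new) buffer. */
static char *enqueue_get_cb(struct mock_buf **pb)
{
	char *cb;	/* deliberately not initialized before the unshare */

	*pb = mock_unshare(*pb);
	if (!*pb)
		return NULL;

	/* Only now is it safe to derive a pointer into the object. */
	cb = (*pb)->cb;
	return cb;
}
```

Had cb been computed before the call to mock_unshare(), it would still point into the freed original whenever a copy was made.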

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[WAN]: Converted synclink drivers to use netif_carrier_*()
Krzysztof Halasa [Fri, 21 Jul 2006 21:44:55 +0000 (14:44 -0700)]
[WAN]: Converted synclink drivers to use netif_carrier_*()

WAN: Converted synclink drivers to use netif_carrier_*() instead
of hdlc_set_carrier().

Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[WAN]: Cosmetic changes to N2 and C101 drivers
Krzysztof Halasa [Fri, 21 Jul 2006 21:41:36 +0000 (14:41 -0700)]
[WAN]: Cosmetic changes to N2 and C101 drivers

WAN: Cosmetic changes to N2 and C101 drivers

Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[WAN]: Added missing netif_dormant_off() to generic HDLC
Krzysztof Halasa [Fri, 21 Jul 2006 21:41:01 +0000 (14:41 -0700)]
[WAN]: Added missing netif_dormant_off() to generic HDLC

WAN: Fixed a problem with PPP/raw HDLC/X.25 protocols not doing
netif_dormant_off() at startup.

Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4]: Get rid of redundant IPCB->opts initialisation
Herbert Xu [Fri, 21 Jul 2006 21:29:53 +0000 (14:29 -0700)]
[IPV4]: Get rid of redundant IPCB->opts initialisation

Now that we always zero the IPCB->opts in ip_rcv, it is no longer
necessary to do so before calling netif_rx for tunneled packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC64]: Update defconfig.
David S. Miller [Fri, 21 Jul 2006 21:19:45 +0000 (14:19 -0700)]
[SPARC64]: Update defconfig.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Fix length parameter verification in sys_getdomainname().
David S. Miller [Fri, 21 Jul 2006 21:12:39 +0000 (14:12 -0700)]
[SPARC]: Fix length parameter verification in sys_getdomainname().

Found by scrashme.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SERIAL] sunzilog: Fix instance enumeration.
David S. Miller [Thu, 20 Jul 2006 05:55:08 +0000 (22:55 -0700)]
[SERIAL] sunzilog: Fix instance enumeration.

Just do a linear enumeration so that we handle sun4d systems
correctly.  As a consequence, eliminate the hard coded keyboard and
mouse channel line values, use the CONS_{KEYB,MS} flags instead.

Also, report the keyboard/mouse Zilog channels just like the uart ones
do.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SERIAL] sunzilog: Remove duplicate IRQ registry in zs_probe().
David S. Miller [Thu, 20 Jul 2006 04:04:04 +0000 (21:04 -0700)]
[SERIAL] sunzilog: Remove duplicate IRQ registry in zs_probe().

We do it now in sunzilog_init() after all devices have been
probed.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Get sun4d SMP building again.
Raymond Burns [Tue, 18 Jul 2006 04:57:09 +0000 (21:57 -0700)]
[SPARC]: Get sun4d SMP building again.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Do not call sun4m_irq_rotate on sun4d.
Raymond Burns [Tue, 18 Jul 2006 04:50:55 +0000 (21:50 -0700)]
[SPARC]: Do not call sun4m_irq_rotate on sun4d.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Simplify and correct __cpu_find_by()
David S. Miller [Tue, 18 Jul 2006 04:49:58 +0000 (21:49 -0700)]
[SPARC]: Simplify and correct __cpu_find_by()

By using for_each_node_by_type().

Also, correct a spurious test in check_cpu_node() on sparc64.
It is only called with nodes that have device_type "cpu".

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Initialize iounit spinlock in iounit_init().
Raymond Burns [Tue, 18 Jul 2006 04:40:27 +0000 (21:40 -0700)]
[SPARC]: Initialize iounit spinlock in iounit_init().

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Fix initialization of sun4d SBUS interrupts.
David S. Miller [Tue, 18 Jul 2006 04:39:09 +0000 (21:39 -0700)]
[SPARC]: Fix initialization of sun4d SBUS interrupts.

1) Explicitly traverse to the root looking for the "sbi".
2) Grab the "board#" property from the sbi's parent and
   verify that this parent is an "io-unit" node.
3) Skip IRQ initialization when device lacks "reg" property.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SERIAL] sunzilog: Register IRQ after all devices have been probed.
David S. Miller [Tue, 18 Jul 2006 04:07:17 +0000 (21:07 -0700)]
[SERIAL] sunzilog: Register IRQ after all devices have been probed.

Otherwise we will deref half-initialized channel pointers
and crash in the interrupt handler.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC] sbus: Make sure sbus nodes are named uniquely.
David S. Miller [Tue, 18 Jul 2006 04:06:15 +0000 (21:06 -0700)]
[SPARC] sbus: Make sure sbus nodes are named uniquely.

Just name them "sbus%d" otherwise on sun4d we try to register
multiple entries named "sbi@0,0" which does not work.

Based upon a report from Raymond Burns.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Fix property name acquisition in prom.c
Bob Breuer [Tue, 18 Jul 2006 00:05:56 +0000 (17:05 -0700)]
[SPARC]: Fix property name acquisition in prom.c

On sparc32 the prom_{first,next}prop() interfaces work
a little differently.  The buffer argument is ignored on
sparc32 and the firmware just returns a raw pointer to
the property name.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SERIAL] sunsab: Get line numbers and table sizing correct.
David S. Miller [Mon, 17 Jul 2006 23:40:26 +0000 (16:40 -0700)]
[SERIAL] sunsab: Get line numbers and table sizing correct.

Table sizing code should look for "se" not "su" nodes.

The chip at the lower address should get the first index.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC64] Fix sunsab ports ordering
Marc Zyngier [Mon, 17 Jul 2006 22:53:32 +0000 (15:53 -0700)]
[SPARC64] Fix sunsab ports ordering

Register second SAB port before the first one, as serial A is wired to
it, and expected to appear as ttyS0.

Signed-off-by: Marc Zyngier <maz@misterjones.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC]: Kill prom_getname, unused and not implemented properly.
David S. Miller [Mon, 17 Jul 2006 05:19:40 +0000 (22:19 -0700)]
[SPARC]: Kill prom_getname, unused and not implemented properly.

The m68k port's sun3 asm/oplib.h had a stray reference too, so I
killed that off as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SPARC64]: Fix more of_device layer IRQ bugs, and correct PROMREG_MAX.
David S. Miller [Mon, 17 Jul 2006 05:10:44 +0000 (22:10 -0700)]
[SPARC64]: Fix more of_device layer IRQ bugs, and correct PROMREG_MAX.

Sabre and Psycho PCI controllers can have partial interrupt-map
properties, meaning that on-board devices don't match up to any
entries.  Instead, they are fully specified from the beginning and
we should pass them directly to the IRQ translator as-is.

Also, fill in the necessary translator slots for the "graphics"
and "expansion UPA" interrupts on Sabre, Psycho, and SYSIO SBUS.

Increase PROMREG_MAX to 24, as seen on SUNW,ffb devices.

Finally, prevent accidentally writing past the end of the of_device
struct resource[] and irqs[] arrays.  Spit out a log message when
we ignore some entries because there are too many of them.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
Linus Torvalds [Fri, 21 Jul 2006 19:04:53 +0000 (12:04 -0700)]
Merge /linux/kernel/git/jejb/scsi-rc-fixes-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (38 commits)
  [SCSI] More buffer->request_buffer changes
  [SCSI] mptfusion: bump version to 3.04.01
  [SCSI] mptfusion: misc fix's
  [SCSI] mptfusion: firmware download boot fix's
  [SCSI] mptfusion: task abort fix's
  [SCSI] mptfusion: sas nexus loss support
  [SCSI] mptfusion: sas loginfo update
  [SCSI] mptfusion: mptctl panic when loading
  [SCSI] mptfusion: sas enclosures with smart drive
  [SCSI] NCR_D700: misc fixes (section and argument ordering)
  [SCSI] scsi_debug: must_check fixes
  [SCSI] scsi_transport_sas: kill the use of channel
  [SCSI] scsi_transport_sas: add expander backlink
  [SCSI] hide EH backup data outside the scsi_cmnd
  [SCSI] ibmvscsi: handle inactive SCSI target during probe
  [SCSI] ibmvscsi: allocate lpevents for ibmvscsi on iseries
  [SCSI] aic7[9x]xx: Remove last vestiges of reverse_scan
  [SCSI] aha152x: stop poking at saved scsi_cmnd members
  [SCSI] st.c: Improve sense output
  [SCSI] lpfc 8.1.7: Change version number to 8.1.7
  ...

17 years agoMerge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
Linus Torvalds [Fri, 21 Jul 2006 19:03:57 +0000 (12:03 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/linux-2.6

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] sysfs_create_xxx return values.
  [S390] .align 4096 statements in head.S
  [S390] get_clock inline assembly.
  [S390] channel measurement interval display.
  [S390] xpram module parameter parsing - take 2.
  [S390] Fix gcc warning about unused return values.

17 years agoMerge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Fri, 21 Jul 2006 19:03:32 +0000 (12:03 -0700)]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6

* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  [PATCH] spidernet: rework tx queue handling
  [PATCH] spidernet: bug fix for init code
  [PATCH] sky2: NAPI poll fix
  [NET] ethtool: fix oops by testing correct struct member
  e1000: bump version to 7.1.9-k4
  e1000: fix panic on large frame receive when mtu=default
  e1000: remove CRC bytes from measured packet length
  e1000: Redo netpoll fix to address community concerns

17 years ago[S390] sysfs_create_xxx return values.
Heiko Carstens [Tue, 18 Jul 2006 11:46:58 +0000 (13:46 +0200)]
[S390] sysfs_create_xxx return values.

Take return values of sysfs_create_group & friends into account.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
17 years ago[S390] .align 4096 statements in head.S
Heiko Carstens [Tue, 18 Jul 2006 11:44:57 +0000 (13:44 +0200)]
[S390] .align 4096 statements in head.S

SLES9 binutils don't like .align 4096 statements in head.S. Work around this
by using .org statements.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
17 years ago[PATCH] spidernet: rework tx queue handling
Jens Osterkamp [Thu, 13 Jul 2006 09:54:08 +0000 (11:54 +0200)]
[PATCH] spidernet: rework tx queue handling

With this patch, TX queue descriptors are no longer chained by default.
The pointer to the next descriptor is set only when the next descriptor is
prepared for transfer. The mechanism for checking whether Spider is ready
has also changed: it no longer checks the CARDOWNED flag in the status of
the previous descriptor, but a TXDMAENABLED flag in Spider's register.

Signed-off-by: Maxim Shchetynin <maxim@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[PATCH] spidernet: bug fix for init code
Jens Osterkamp [Thu, 13 Jul 2006 09:54:23 +0000 (11:54 +0200)]
[PATCH] spidernet: bug fix for init code

We want to initialize the addr register before the data register.

Signed-off-by: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[PATCH] sky2: NAPI poll fix
Stephen Hemminger [Mon, 17 Jul 2006 13:54:34 +0000 (09:54 -0400)]
[PATCH] sky2: NAPI poll fix

When the sky2 driver gets lots of received packets at once, it can get stuck.
The NAPI poll routine gets called back to keep going, but since no IRQ bits
are set it doesn't make progress.

Increase the version, since this is a serious enough problem that I want to be
able to tell new from old problems.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoMerge branch 'upstream-fixes-jgarzik' of git://lost.foo-projects.org/~ahkok/git/netde...
Jeff Garzik [Mon, 17 Jul 2006 17:26:52 +0000 (13:26 -0400)]
Merge branch 'upstream-fixes-jgarzik' of git://lost.foo-projects.org/~ahkok/git/netdev-2.6 into upstream-fixes

17 years ago[NET] ethtool: fix oops by testing correct struct member
Jeff Garzik [Mon, 17 Jul 2006 16:54:40 +0000 (12:54 -0400)]
[NET] ethtool: fix oops by testing correct struct member

Noticed by Willy Tarreau.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[S390] get_clock inline assembly.
Andreas Krebbel [Mon, 17 Jul 2006 14:09:42 +0000 (16:09 +0200)]
[S390] get_clock inline assembly.

Add the missing volatile to the get_clock / get_cycles inline assemblies
to prevent consecutive calls from being optimized away.

Signed-off-by: Andreas Krebbel <krebbel1@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
17 years ago[S390] channel measurement interval display.
Cornelia Huck [Mon, 17 Jul 2006 14:09:28 +0000 (16:09 +0200)]
[S390] channel measurement interval display.

Display avg_sample_interval in nanoseconds, like it is documented.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
17 years ago[S390] xpram module parameter parsing - take 2.
Heiko Carstens [Mon, 17 Jul 2006 14:09:23 +0000 (16:09 +0200)]
[S390] xpram module parameter parsing - take 2.

Don't use memparse since the default size modifier is 'k'.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
17 years ago[S390] Fix gcc warning about unused return values.
Heiko Carstens [Mon, 17 Jul 2006 14:09:18 +0000 (16:09 +0200)]
[S390] Fix gcc warning about unused return values.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
17 years ago[libata] ata_piix: correct 'invalid MAP value' typo-caused error
Jeff Garzik [Tue, 11 Jul 2006 19:28:12 +0000 (15:28 -0400)]
[libata] ata_piix: correct 'invalid MAP value' typo-caused error

Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[libata] ata_piix: minor cleanups noticed in prior patch run
Jeff Garzik [Tue, 11 Jul 2006 17:11:17 +0000 (13:11 -0400)]
[libata] ata_piix: minor cleanups noticed in prior patch run

* delete unused PIIX_FLAG_COMBINED*
* port_enable should be u16 rather than u32

Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[libata] ata_piix: attempt to fix ICH8 support
Jeff Garzik [Tue, 11 Jul 2006 15:57:44 +0000 (11:57 -0400)]
[libata] ata_piix: attempt to fix ICH8 support

Take into account the fact that ICH8 changed the register layout of
the MAP and PCS register bits.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[libata] ata_piix: Consolidate PCS register writing
Jeff Garzik [Tue, 11 Jul 2006 15:48:50 +0000 (11:48 -0400)]
[libata] ata_piix: Consolidate PCS register writing

Prior to this patch, the driver would do this for each port:
read 8-bit PCS
write 8-bit PCS
read 8-bit PCS
write 8-bit PCS

In the field, flaky behavior has been observed related to this register.
In particular, these overzealous register writes can cause misdetection
problems.

Update to do the following once (not once per port) at boot:
read 16-bit PCS
if needs changing,
write 16-bit PCS

And thereafter, we only perform a 'read 16-bit PCS' per port.

This should eliminate all PCS writes in many cases, and be more friendly
in the cases where we do need to enable ports.
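
The read-once, write-only-if-needed sequence above can be sketched in plain C.
This is a minimal userspace model, not the driver code: fake_pcs,
pcs_read16(), pcs_write16(), pcs_init_once() and port_enabled() are
hypothetical names, and the "bit N enables port N" layout is an assumption
made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-bit PCS register. */
static uint16_t fake_pcs;
static unsigned int pcs_write_count;

static uint16_t pcs_read16(void)
{
	return fake_pcs;
}

static void pcs_write16(uint16_t val)
{
	fake_pcs = val;
	pcs_write_count++;
}

/* Done once at boot: write the register only if a wanted bit is missing. */
static void pcs_init_once(uint16_t port_enable)
{
	uint16_t pcs = pcs_read16();
	uint16_t want = pcs | port_enable;

	if (want != pcs)
		pcs_write16(want);
}

/* Done per port afterwards: a read and nothing else. */
static int port_enabled(unsigned int port)
{
	assert(port < 16);
	return (pcs_read16() >> port) & 1;
}
```

If the ports are already enabled, pcs_init_once() performs no write at all,
which is the "eliminate all PCS writes in many cases" behavior described
above.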

Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[PATCH] ata_piix: add host_set private structure
Tejun Heo [Wed, 28 Jun 2006 16:58:28 +0000 (01:58 +0900)]
[PATCH] ata_piix: add host_set private structure

Add host_set private structure piix_host_priv.  Currently the only
field is ->map which used to be stored directly at
host_set->private_data.  This change allows more host_set private
fields to be added.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoLinux 2.6.18-rc2 v2.6.18-rc2
Linus Torvalds [Sat, 15 Jul 2006 21:53:08 +0000 (14:53 -0700)]
Linux 2.6.18-rc2

Finishing up for the kernel summit. Ottawa, here I come.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy...
Linus Torvalds [Sat, 15 Jul 2006 21:43:30 +0000 (14:43 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: commit_mutex cleanups

17 years ago[PATCH] UML - fix utsname build breakage
Jeff Dike [Fri, 14 Jul 2006 19:52:23 +0000 (15:52 -0400)]
[PATCH] UML - fix utsname build breakage

Some -mm-only material leaked into a patch destined for mainline, and I didn't
notice.

This was the replacement of system_utsname with utsname() that's required by
the uts namespace patch.  This patch reverts those changes (which are correct
in -mm) so that mainline UML builds again.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agoDon't allow chmod() on the /proc/<pid>/ files
Linus Torvalds [Sat, 15 Jul 2006 19:26:45 +0000 (12:26 -0700)]
Don't allow chmod() on the /proc/<pid>/ files

This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.

The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..

This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.

Signed-off-by: Eugene Teo <eteo@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agoMark /proc MS_NOSUID and MS_NOEXEC
Linus Torvalds [Sat, 15 Jul 2006 19:20:05 +0000 (12:20 -0700)]
Mark /proc MS_NOSUID and MS_NOEXEC

Not that we really need this any more, but at the same time there's no
reason not to do this.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] sch_htb compile fix.
Dave Jones [Sat, 15 Jul 2006 07:41:12 +0000 (03:41 -0400)]
[PATCH] sch_htb compile fix.

net/sched/sch_htb.c: In function 'htb_change_class':
net/sched/sch_htb.c:1605: error: expected ';' before 'do_gettimeofday'

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Sat, 15 Jul 2006 04:57:45 +0000 (21:57 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  [CRYPTO] padlock: Fix alignment after aes_ctx rearrange

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Sat, 15 Jul 2006 04:57:23 +0000 (21:57 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64] Fix PSYCHO PCI controler init.
  [SPARC64] psycho: Fix pbm->name handling in pbm_register_toplevel_resources()
  [SERIAL] sunsab: Fix significant typo in sab_probe()
  [SERIAL] sunsu: Report keyboard and mouse ports in kernel log.
  [SPARC64]: Make sure IRQs are disabled properly during early boot.

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Sat, 15 Jul 2006 04:57:06 +0000 (21:57 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [VLAN]: __vlan_hwaccel_rx can use the faster ether_compare_addr
  [PKT_SCHED] HTB: initialize upper bound properly
  [IPV4]: Clear skb cb on IP input
  [NET]: Update frag_list in pskb_trim

17 years ago[PATCH] remove set_wmb - arch removal
Steven Rostedt [Fri, 14 Jul 2006 20:05:03 +0000 (16:05 -0400)]
[PATCH] remove set_wmb - arch removal

set_wmb should not be used in the kernel because it just confuses the
code more and has no benefit.  Since it is not currently used in the
kernel this patch removes it so that new code does not include it.

All archs define set_wmb(var, value) to do { var = value; wmb(); }
while(0) except ia64 and sparc which use a mb() instead.  But this is
still moot since it is not used anyway.
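
For reference, the shape of the removed helper can be sketched in userspace C.
This is only an illustration: the wmb() below is an assumed stand-in built on
a C11 release fence, which is not the kernel's per-arch implementation, and
publish() is a hypothetical name showing the open-coded form that callers
would write instead.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed userspace stand-in for the kernel's wmb(); not the real thing. */
#define wmb() atomic_thread_fence(memory_order_release)

/* The helper as described above: an assignment followed by a write barrier. */
#define set_wmb(var, value) do { (var) = (value); wmb(); } while (0)

/* The open-coded equivalent: same two steps, nothing hidden in a macro. */
static int publish(int *slot, int value)
{
	assert(slot);
	*slot = value;
	wmb();
	return *slot;
}
```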

Hasn't been tested on any archs but x86 and x86_64 (and only compile
tested).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] remove set_wmb - doc update
Steven Rostedt [Fri, 14 Jul 2006 20:05:01 +0000 (16:05 -0400)]
[PATCH] remove set_wmb - doc update

This patch removes the reference to set_wmb from memory-barriers.txt
since it shouldn't be used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] Remove down_write() from taskstats code invoked on the exit() path
Shailabh Nagar [Fri, 14 Jul 2006 07:24:47 +0000 (00:24 -0700)]
[PATCH] Remove down_write() from taskstats code invoked on the exit() path

In send_cpu_listeners(), which is called on the exit path, a down_write()
was protecting operations like skb_clone() and genlmsg_unicast() that do
GFP_KERNEL allocations.  If the oom-killer decides to kill tasks to satisfy
the allocations, the exit of those tasks could block on the same semaphore.

The down_write() was only needed to allow removal of invalid listeners from
the listener list.  The patch converts the down_write to a down_read and
defers the removal to a separate critical region.  This ensures that even
if the oom-killer is called, no other task's exit is blocked as it can
still acquire another down_read.

Thanks to Andrew Morton & Herbert Xu for pointing out the oom related
pitfalls, and to Chandra Seetharaman for suggesting this fix instead of
using something more complex like RCU.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task delay accounting taskstats interface: control exit data through...
Shailabh Nagar [Fri, 14 Jul 2006 07:24:47 +0000 (00:24 -0700)]
[PATCH] per-task delay accounting taskstats interface: control exit data through cpumasks

On systems with a large number of cpus, with even a modest rate of tasks
exiting per cpu, the volume of taskstats data sent on thread exit can
overflow a userspace listener's buffers.

One approach to avoiding overflow is to allow listeners to get data for a
limited and specific set of cpus.  By scaling the number of listeners
and/or the cpus they monitor, userspace can handle the statistical data
overload more gracefully.

In this patch, each listener registers to listen to a specific set of cpus
by specifying a cpumask.  The interest is recorded per-cpu.  When a task
exits on a cpu, its taskstats data is unicast to each listener interested
in that cpu.
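
The per-cpu bookkeeping described above can be sketched as follows. This is a
simplified model under stated assumptions, not the taskstats code:
register_listener() and recipients_for_exit() are hypothetical names, the
cpumask is a plain 32-bit word, listener ids are bare integers, and the table
sizes are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS      8	/* illustrative limits, not kernel values */
#define MAX_LISTENERS 4

/* listeners_on_cpu[c] holds the ids of the listeners interested in cpu c. */
static int listeners_on_cpu[MAX_CPUS][MAX_LISTENERS];
static int nr_listeners_on_cpu[MAX_CPUS];

/*
 * Record the listener's interest on every cpu set in its mask.
 * Returns 0 on success, -1 if a per-cpu list is already full.
 */
static int register_listener(int listener_id, uint32_t cpumask)
{
	for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (!(cpumask & (1u << cpu)))
			continue;
		if (nr_listeners_on_cpu[cpu] == MAX_LISTENERS)
			return -1;
		listeners_on_cpu[cpu][nr_listeners_on_cpu[cpu]++] = listener_id;
	}
	return 0;
}

/*
 * A task exits on 'cpu': copy out the ids that would each receive a
 * unicast of its data, and return how many there are.
 */
static int recipients_for_exit(int cpu, int out[MAX_LISTENERS])
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	for (int i = 0; i < nr_listeners_on_cpu[cpu]; i++)
		out[i] = listeners_on_cpu[cpu][i];
	return nr_listeners_on_cpu[cpu];
}
```

An exit on a cpu nobody registered for yields zero recipients, so a listener
only ever sees traffic from the cpus in its own mask.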

Thanks to Andrew Morton for pointing out the various scalability and
general concerns of previous attempts and for suggesting this design.

[akpm@osdl.org: build fix]
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task delay accounting: avoid send without listeners
Shailabh Nagar [Fri, 14 Jul 2006 07:24:46 +0000 (00:24 -0700)]
[PATCH] per-task delay accounting: avoid send without listeners

Don't send taskstats (per-pid or per-tgid) on thread exit when no one is
listening for such data.

Currently the taskstats interface allocates a structure, fills it in and
calls netlink to send out per-pid and per-tgid stats regardless of whether
a userspace listener for the data exists (netlink layer would check for
that and avoid the multicast).

As a result of this patch, the check for the no-listener case is performed
early, avoiding the redundant allocation and filling up of the taskstats
structures.

Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per task delay accounting taskstats interface: documentation fix
Shailabh Nagar [Fri, 14 Jul 2006 07:24:45 +0000 (00:24 -0700)]
[PATCH] per task delay accounting taskstats interface: documentation fix

Change documentation and example program to reflect the flow control issues
being addressed by the cpumask changes.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] delay accounting taskstats interface send tgid once
Shailabh Nagar [Fri, 14 Jul 2006 07:24:44 +0000 (00:24 -0700)]
[PATCH] delay accounting taskstats interface send tgid once

Send per-tgid data only once during exit of a thread group instead of once
with each member thread exit.

Currently, when a thread exits, besides its per-tid data, the per-tgid data
of its thread group is also sent out, if its thread group is non-empty.
The per-tgid data sent consists of the sum of per-tid stats for all
*remaining* threads of the thread group.

This patch modifies this sending in two ways:

- the per-tgid data is sent only when the last thread of a thread group
  exits.  This cuts down heavily on the overhead of sending/receiving
  per-tgid data, especially when other exploiters of the taskstats
  interface aren't interested in per-tgid stats

- the semantics of the per-tgid data sent are changed.  Instead of being
  the sum of per-tid data for remaining threads, the value now sent is the
  true total accumulated statistics for all threads that are/were part of
  the thread group.
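
The two changes above can be modelled in a few lines of C. This is a sketch
under assumptions, not the kernel implementation: struct tgroup and
thread_exit() are hypothetical names, and a single delay counter stands in
for the full taskstats structure.

```c
#include <assert.h>
#include <stdint.h>

struct tgroup {
	int live_threads;	/* threads that have not exited yet */
	uint64_t total_delay;	/* accumulated over all exited threads */
};

/*
 * Called when one thread of the group exits.  Its stats are folded into
 * the group total.  Returns 1 and stores the total in *out only when this
 * was the last thread, i.e. the one point where per-tgid data is sent.
 */
static int thread_exit(struct tgroup *tg, uint64_t thread_delay, uint64_t *out)
{
	assert(tg->live_threads > 0);
	tg->total_delay += thread_delay;
	if (--tg->live_threads > 0)
		return 0;
	*out = tg->total_delay;
	return 1;
}
```

Because each exiting thread adds to the total before it leaves, the value
reported at the end covers threads that are already gone, rather than only
the ones remaining.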

The patch also addresses a minor issue where the failure of one accounting
subsystem to fill in the taskstats structure caused the taskstats data not
to be sent at all.

The patch has been tested for stability by running cerberus for over 4 hours
on an SMP system.

[akpm@osdl.org: bugfixes]
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: /proc export of aggregated block I/O delays
Shailabh Nagar [Fri, 14 Jul 2006 07:24:43 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: /proc export of aggregated block I/O delays

Export I/O delays seen by a task through /proc/<tgid>/stats for use in top
etc.

Note that delays for I/O done for swapping in pages (swapin I/O) are lumped
together with all other I/O here (this is not the case in the netlink
interface, where the swapin I/O is kept distinct).

[akpm@osdl.org: printk warning fix]
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: documentation
Shailabh Nagar [Fri, 14 Jul 2006 07:24:42 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: documentation

Some documentation for delay accounting.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: delay accounting usage of taskstats interface
Shailabh Nagar [Fri, 14 Jul 2006 07:24:41 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: delay accounting usage of taskstats interface

Usage of taskstats interface by delay accounting.

Signed-off-by: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: taskstats interface
Shailabh Nagar [Fri, 14 Jul 2006 07:24:40 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: taskstats interface

Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC
family), for getting statistics of tasks and thread groups during their
lifetime and when they exit.  The interface is intended for use by multiple
accounting packages though it is being created in the context of delay
accounting.

This patch creates the interface without populating the fields of the data
that is sent to the user in response to a command or upon the exit of a task.
Each accounting package interested in using taskstats has to provide an
additional patch to add its stats to the common structure.

[akpm@osdl.org: cleanups, Kconfig fix]
Signed-off-by: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: utilities for genetlink usage
Balbir Singh [Fri, 14 Jul 2006 07:24:39 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: utilities for genetlink usage

Two utilities for simplifying usage of NETLINK_GENERIC interface.

Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: cpu delay collection via schedstats
Chandra Seetharaman [Fri, 14 Jul 2006 07:24:38 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: cpu delay collection via schedstats

Make the task-related schedstats functions callable by delay accounting even
if schedstats collection isn't turned on.  This removes the dependency of
delay accounting on schedstats.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: sync block I/O and swapin delay collection
Shailabh Nagar [Fri, 14 Jul 2006 07:24:37 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: sync block I/O and swapin delay collection

Unlike earlier iterations of the delay accounting patches, delays are now
collected only for the actual I/O waits rather than trying to cover the delays
seen in I/O submission paths.

Account separately for block I/O delays incurred as a result of swapin page
faults whose frequency can be affected by the task/process' rss limit.  Hence
swapin delays can act as feedback for rss limit changes independent of I/O
priority changes.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] per-task-delay-accounting: setup
Shailabh Nagar [Fri, 14 Jul 2006 07:24:36 +0000 (00:24 -0700)]
[PATCH] per-task-delay-accounting: setup

Initialization code related to the collection of per-task "delay" statistics,
which measure how long a task had to wait for cpu, sync block io, swapping etc.  The
collection of statistics and the interface are in other patches.  This patch
sets up the data structures and allows the statistics collection to be
disabled through a kernel boot parameter.

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
17 years ago[PATCH] list_is_last utility
Shailabh Nagar [Fri, 14 Jul 2006 07:24:35 +0000 (00:24 -0700)]
[PATCH] list_is_last utility

Add another list utility function to check for last element in a list.
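
A self-contained sketch of such a check on a circular doubly-linked list is
shown below. The struct and the list_init() / list_add_tail() helpers are
minimal userspace re-creations written for this example, not the kernel
header itself.

```c
#include <assert.h>

/* Minimal circular doubly-linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'entry' just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* 'entry' is the last element exactly when its successor is the head. */
static int list_is_last(const struct list_head *entry,
			const struct list_head *head)
{
	assert(entry && head);
	return entry->next == head;
}
```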

Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>