6 years agommc: block: replace __blk_end_request() with blk_end_request()
Subhash Jadavani [Thu, 7 Jun 2012 10:16:58 +0000 (15:46 +0530)]
mmc: block: replace __blk_end_request() with blk_end_request()

For completing any block request, MMC block driver is calling:

But if we analyze the sources of latency in kernel using ftrace,
__blk_end_request() function at times may take up to 6.5ms with
spinlock held and irq disabled.

__blk_end_request() calls couple of functions and ftrace output
shows that blk_update_bidi_request() function is almost taking 6ms.
There are 2 function to end the current request: ___blk_end_request()
and blk_end_request(). Both these functions do same thing except
that blk_end_request() function doesn't take up the spinlock
while calling the blk_update_bidi_request().

This patch replaces all __blk_end_request() calls with
blk_end_request() and __blk_end_request_all() calls with

Testing done: 20 process concurrent read/write on sd card
and eMMC. Ran this test for almost a day on multicore system
and no errors observed.

This change is not meant for improving MMC throughput; it's basically
about becoming fair to other threads/interrupts in the system. By
holding spin lock and interrupts disabled for longer duration, we
won't allow other threads/interrupts to run at all.  Actually slight
performance degradation at file system level can be expected as we
are not holding the spin lock during blk_update_bidi_request() which
means our mmcqd thread may get preempted for other high priority
thread or any interrupt in the system.

These are performance numbers (100MB file write) with eMMC running
in DDR mode:

Without this patch:
Name of the Test   Value   Unit
LMDD Read Test     53.79   MBPS
LMDD Write Test    18.86   MBPS
IOZONE  Read Test  51.65   MBPS
IOZONE  Write Test 24.36   MBPS

With this patch:
Name of the Test    Value  Unit
LMDD Read Test      52.94  MBPS
LMDD Write Test     16.70  MBPS
IOZONE  Read Test   52.08  MBPS
IOZONE  Write Test  23.29  MBPS

Read numbers are fine. Write numbers are bit down (especially LMDD
write), may be because write requests normally have large transfer
size and which means there are chances that while mmcq is executing
blk_update_bidi_request(), it may get interrupted by interrupts or
other high priority thread.

Signed-off-by: Subhash Jadavani <>
Reviewed-by: Namjae Jeon <>
Signed-off-by: Chris Ball <>
6 years agommc: block: fix the data timeout issue with ACMD22
Subhash Jadavani [Wed, 13 Jun 2012 11:40:43 +0000 (17:10 +0530)]
mmc: block: fix the data timeout issue with ACMD22

If multi block write operation fails for SD card, during
error handling we send the SD_APP_SEND_NUM_WR_BLKS (ACMD22)
to know how many blocks were already programmed by card.

But mmc_sd_num_wr_blocks() function which sends the ACMD22
calculates the data timeout value from csd.tacc_ns and
csd.tacc_clks parameters which will be 0 for block addressed
(>2GB cards) SD card. This would result in timeout_ns and
timeout_clks being 0 in the mmc_request passed to host driver.
This means host controller would program its data timeout timer
value with 0 which could result in DATA TIMEOUT errors from

To fix this issue, mmc_sd_num_wr_blocks() should instead
just call the mmc_set_data_timeout() to calculate the
data timeout value. mmc_set_data_timeout() function
ensures that non zero timeout value is set even for
block addressed SD cards.

Signed-off-by: Subhash Jadavani <>
Reviewed-by: Venkatraman S <>
Signed-off-by: Chris Ball <>
6 years agommc: card: Avoid null pointer dereference
Philippe De Swert [Wed, 11 Apr 2012 20:31:45 +0000 (23:31 +0300)]
mmc: card: Avoid null pointer dereference

After the null check on md the code jumped to cmd_done, which then
will dereference md in mmc_blk_put. This patch avoids the possible
null pointer dereference in that case.

Signed-off-by: Philippe De Swert <>
Reviewed-by: Namjae Jeon <>
Signed-off-by: Chris Ball <>
6 years agommc: card: Kill block requests if card is removed
Sujit Reddy Thumma [Thu, 8 Dec 2011 08:35:50 +0000 (14:05 +0530)]
mmc: card: Kill block requests if card is removed

Kill block requests when the host realizes that the card is
removed from the slot and is sure that subsequent requests
are bound to fail. Do this silently so that the block
layer doesn't output unnecessary error messages.

Signed-off-by: Sujit Reddy Thumma <>
Acked-by: Adrian Hunter <>
Signed-off-by: Chris Ball <>
6 years agommc: allow upper layers to know immediately if card has been removed
Adrian Hunter [Mon, 28 Nov 2011 14:22:00 +0000 (16:22 +0200)]
mmc: allow upper layers to know immediately if card has been removed

Add a function mmc_detect_card_removed() which upper layers can use to
determine immediately if a card has been removed. This function should
be called after an I/O request fails so that all queued I/O requests
can be errored out immediately instead of waiting for the card device
to be removed.

Signed-off-by: Adrian Hunter <>
Acked-by: Sujit Reddy Thumma <>
Signed-off-by: Chris Ball <>
6 years agommc: sdio: Fix to support any block size optimally
Stefan Nilsson XK [Wed, 26 Oct 2011 08:52:17 +0000 (10:52 +0200)]
mmc: sdio: Fix to support any block size optimally

This patch allows any block size to be set on the SDIO link,
and still have an arbitrary sized packet (adjusted in size by
using sdio_align_size) transferred in an optimal way
(preferably one transfer).

Previously if the block size was larger than the default of
512 bytes and the transfer size was exactly one block size
(possibly thanks to using sdio_align_size to get an optimal
transfer size), it was sent as a number of byte transfers instead
of one block transfer. Also if the number of blocks was
(max_blocks * N) + 1, the tranfer would be conducted with a number
of blocks and finished off with a number of byte transfers.

When doing this change it was also possible to break out the quirk
for broken byte mode in a much cleaner way, and collect the logic of
when to do byte or block transfer in one function instead of two.

Signed-off-by: Stefan Nilsson XK <>
Signed-off-by: Ulf Hansson <>
Acked-by: Linus Walleij <>
Signed-off-by: Chris Ball <>
6 years agoARM: OMAP: dma: fix error return code in omap_system_dma_probe()
Wei Yongjun [Tue, 16 Jul 2013 12:10:46 +0000 (20:10 +0800)]
ARM: OMAP: dma: fix error return code in omap_system_dma_probe()

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <>
Signed-off-by: Tony Lindgren <>
6 years agoARM: OMAP: dma: Remove the erroneous freeing of platform data
Rajendra Nayak [Thu, 13 Jun 2013 14:17:11 +0000 (19:47 +0530)]
ARM: OMAP: dma: Remove the erroneous freeing of platform data

Given p = pdev->dev.platform_data; and
      d = p->dma_attr;
the freeing of either one of these by the driver
seems just plain wrong.

Get rid of them in the .probe failure path as well as the

Signed-off-by: Rajendra Nayak <>
Signed-off-by: Tony Lindgren <>
6 years agoARM: OMAP: dma: Fix the dma_chan_link_map init order
R Sricharan [Thu, 13 Jun 2013 14:17:10 +0000 (19:47 +0530)]
ARM: OMAP: dma: Fix the dma_chan_link_map init order

Init dma_chan_link_map[lch] *after* its memset to 0.

Signed-off-by: R Sricharan <>
Signed-off-by: Tony Lindgren <>
6 years agoARM: OMAP: dma: Remove the wrong dev_id check
R Sricharan [Thu, 13 Jun 2013 14:17:09 +0000 (19:47 +0530)]
ARM: OMAP: dma: Remove the wrong dev_id check

Once a free channel is found, the check for dev_id == 0 does
not make any sense. Get rid of it.

Signed-off-by: R Sricharan <>
Signed-off-by: Tony Lindgren <>
6 years agoARM: OMAP: Fix the use of uninitialized dma_lch_count
Chen Gang [Fri, 11 Jan 2013 05:39:18 +0000 (13:39 +0800)]
ARM: OMAP: Fix the use of uninitialized dma_lch_count

  'omap_dma_reserve_channels' when used is suppose to be from command.
    so, it alreay has value before 1st call of omap_system_dma_probe.
    and it will never be changed again during running (not from ioctl).

  but 'dma_lch_count' is zero before 1st call of omap_system_dma_probe.
    so it will be failed for omap_dma_reserve_channels, when 1st call.

  so, need use 'd->lch_count' instead of 'dma_lch_count' for judging.

Signed-off-by: Chen Gang <>
Signed-off-by: Santosh Shilimkar <>
Signed-off-by: Tony Lindgren <>
6 years agowl1251: allow to disable dynamic ps
Grazvydas Ignotas [Sun, 7 Jun 2015 00:39:52 +0000 (03:39 +0300)]
wl1251: allow to disable dynamic ps

Some people with CC units reported they get better wifi without dynamic
PS enabled, so allow to disable it. I guess this could be related to
missing clock driver resistor tweak, which breaks chip's own PS.

6 years agopandora: defconfig: switch to minstrel default
Grazvydas Ignotas [Sun, 7 Jun 2015 00:17:11 +0000 (03:17 +0300)]
pandora: defconfig: switch to minstrel default

I don't know why we had pid there, most distributions seem to be
using minstrel, newer kernels even removed pid completely.

6 years agomac80211: cosmetics for minstrel_debugfs
Karl Beldan [Wed, 17 Apr 2013 12:08:26 +0000 (14:08 +0200)]
mac80211: cosmetics for minstrel_debugfs

This changes the minstrel stats ouput from:

rate     throughput  ewma prob   this prob  this succ/attempt   success    attempts
 BCD   6         0.0        0.0        0.0          0(  0)          0           0


rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
 BCD   6         0.0        0.0        0.0             0(  0)         0           0

Signed-off-by: Karl Beldan <>
Acked-by: Felix Fietkau <>
Signed-off-by: Johannes Berg <>
6 years agominstrel: update stats after processing status
Johannes Berg [Thu, 15 Nov 2012 17:27:56 +0000 (18:27 +0100)]
minstrel: update stats after processing status

Instead of updating stats before sending a packet,
update them after processing the packet's status.
This makes minstrel in line with minstrel_ht.

Signed-off-by: Johannes Berg <>
6 years agomac80211: correct size the argument to kzalloc in minstrel_ht
Thomas Huehn [Fri, 29 Jun 2012 13:26:27 +0000 (06:26 -0700)]
mac80211: correct size the argument to kzalloc in minstrel_ht

msp has type struct minstrel_ht_sta_priv not struct minstrel_ht_sta.

(This incorporates the fixup originally posted as "mac80211: fix kzalloc
memory corruption introduced in minstrel_ht". -- JWL)

Reported-by: Fengguang Wu <>
Reported-by: Dan Carpenter <>
Signed-off-by: Thomas Huehn <>
Acked-by: Johannes Berg <>
Signed-off-by: John W. Linville <>
6 years agomac80211: Don't sample max throughput rate in minstrel_ht
Helmut Schaa [Wed, 14 Mar 2012 12:31:11 +0000 (13:31 +0100)]
mac80211: Don't sample max throughput rate in minstrel_ht

The current max throughput rate is known to be good as otherwise it
wouldn't be the max throughput rate. Since rate sampling can introduce
some overhead (by adding RTS for example or due to not aggregating the
frame) don't sample the max throughput rate.

Signed-off-by: Helmut Schaa <>
Acked-by: Felix Fietkau <>
Signed-off-by: John W. Linville <>
6 years agomac80211: Disable MCS > 7 in minstrel_ht when STA uses static SMPS
Helmut Schaa [Fri, 9 Mar 2012 13:13:45 +0000 (14:13 +0100)]
mac80211: Disable MCS > 7 in minstrel_ht when STA uses static SMPS

Disable multi stream rates (MCS > 7) when a STA is in static SMPS mode
since it has only one active rx chain. Hence, it doesn't even make
sense to sample multi stream rates.

Signed-off-by: Helmut Schaa <>
Signed-off-by: John W. Linville <>
6 years agominstrel_ht: Remove unused function parameters
Patrick Kelle [Tue, 15 Nov 2011 15:44:48 +0000 (16:44 +0100)]
minstrel_ht: Remove unused function parameters

Remove unused function parameters in the following functions:

Signed-off-by: Patrick Kelle <>
Signed-off-by: John W. Linville <>
6 years agomac80211: Get rid of search loop for rate group index
Helmut Schaa [Mon, 14 Nov 2011 14:28:20 +0000 (15:28 +0100)]
mac80211: Get rid of search loop for rate group index

Finding the group index for a specific rate is done by looping through
all groups and returning if the correct one is found. This code is
called for each tx'ed frame and thus it makes sense to reduce its

Do this by calculating the group index by this formula based on the SGI
and HT40 flags as well as the stream number:

idx = (HT40 * 2 * MINSTREL_MAX_STREAMS) +
      (streams - 1)

Hence, the groups are ordered by th HT40 flag first, then by the SGI
flag and afterwards by the number of used streams.

This should reduce the runtime of minstrel_ht_get_group_idx

Signed-off-by: Helmut Schaa <>
Acked-by: Felix Fietkau <>
Signed-off-by: John W. Linville <>
6 years agomac80211: Check rate->idx before rate->count
Helmut Schaa [Mon, 14 Nov 2011 14:28:19 +0000 (15:28 +0100)]
mac80211: Check rate->idx before rate->count

The drivers are not required to fill in rate->count if rate->idx is set
to -1. Hence, we should first check rate->idx before accessing

Signed-off-by: Helmut Schaa <>
Acked-by: Felix Fietkau <>
Signed-off-by: John W. Linville <>
6 years agominstrel: Remove unused function parameter in calc_rate_durations()
Patrick Kelle [Thu, 10 Nov 2011 14:13:11 +0000 (15:13 +0100)]
minstrel: Remove unused function parameter in calc_rate_durations()

Signed-off-by: Patrick Kelle <>
Signed-off-by: John W. Linville <>
6 years agoRevert "pagemap: do not leak physical addresses to non-privileged userspace"
Grazvydas Ignotas [Sun, 7 Jun 2015 00:18:58 +0000 (03:18 +0300)]
Revert "pagemap: do not leak physical addresses to non-privileged userspace"

This reverts commit 1ffc3cd9a36b504c20ce98fe5eeb5463f389e1ac.

Don't need it on pandora - even if rowhammer worked, pandora is almost
never a multiuser system, and cache invalidate is a privileged instruction
already on ARM.

pagemap may also be useful for use c64_tools and such.

6 years agoMerge branch 'stable-3.2' into pandora-3.2
Grazvydas Ignotas [Sun, 7 Jun 2015 00:18:36 +0000 (03:18 +0300)]
Merge branch 'stable-3.2' into pandora-3.2

7 years agoLinux 3.2.69 v3.2.69
Ben Hutchings [Sat, 9 May 2015 22:16:42 +0000 (23:16 +0100)]
Linux 3.2.69

7 years agoRevert "KVM: s390: flush CPU on load control"
Ben Hutchings [Mon, 4 May 2015 23:34:49 +0000 (00:34 +0100)]
Revert "KVM: s390: flush CPU on load control"

This reverts commit 823f14022fd2335affc8889a9c7e1b60258883a3, which was
commit 2dca485f8740208604543c3960be31a5dd3ea603 upstream.  It
depends on functionality that is not present in 3.2.y.

Signed-off-by: Ben Hutchings <>
Cc: Christian Borntraeger <>
7 years agoipvs: uninitialized data with IP_VS_IPV6
Dan Carpenter [Sat, 6 Dec 2014 13:49:24 +0000 (16:49 +0300)]
ipvs: uninitialized data with IP_VS_IPV6

commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream.

The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

The same issue is there in app_tcp_pkt_in().  Thanks to Julian Anastasov
for noticing that.

Signed-off-by: Dan Carpenter <>
Acked-by: Julian Anastasov <>
Signed-off-by: Simon Horman <>
Signed-off-by: Ben Hutchings <>
Cc: Pablo Neira Ayuso <>
7 years agoipvs: rerouting to local clients is not needed anymore
Julian Anastasov [Thu, 18 Dec 2014 20:41:23 +0000 (22:41 +0200)]
ipvs: rerouting to local clients is not needed anymore

commit 579eb62ac35845686a7c4286c0a820b4eb1f96aa upstream.

commit f5a41847acc5 ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000c5 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner <>
Tested-by: Smart Weblications GmbH - Florian Wiessner <>
Signed-off-by: Julian Anastasov <>
Signed-off-by: Simon Horman <>
Signed-off-by: Ben Hutchings <>
Cc: Pablo Neira Ayuso <>
7 years agoIB/core: Avoid leakage from kernel to user space
Eli Cohen [Sun, 14 Sep 2014 13:47:52 +0000 (16:47 +0300)]
IB/core: Avoid leakage from kernel to user space

commit 377b513485fd885dea1083a9a5430df65b35e048 upstream.

Clear the reserved field of struct ib_uverbs_async_event_desc which is
copied to user space.

Signed-off-by: Eli Cohen <>
Reviewed-by: Yann Droneaud <>
Signed-off-by: Roland Dreier <>
Signed-off-by: Ben Hutchings <>
Cc: Yann Droneaud <>
7 years agospi: spidev: fix possible arithmetic overflow for multi-transfer message
Ian Abbott [Mon, 23 Mar 2015 17:50:27 +0000 (17:50 +0000)]
spi: spidev: fix possible arithmetic overflow for multi-transfer message

commit f20fbaad7620af2df36a1f9d1c9ecf48ead5b747 upstream.

`spidev_message()` sums the lengths of the individual SPI transfers to
determine the overall SPI message length.  It restricts the total
length, returning an error if too long, but it does not check for
arithmetic overflow.  For example, if the SPI message consisted of two
transfers and the first has a length of 10 and the second has a length
of (__u32)(-1), the total length would be seen as 9, even though the
second transfer is actually very long.  If the second transfer specifies
a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
overrun the spidev's pre-allocated tx buffer before it reaches an
invalid user memory address.  Fix it by checking that neither the total
nor the individual transfer lengths exceed the maximum allowed value.

Thanks to Dan Carpenter for reporting the potential integer overflow.

Signed-off-by: Ian Abbott <>
Signed-off-by: Mark Brown <>
[Ian Abbott: Note: original commit compares the lengths to INT_MAX
 instead of bufsiz due to changes in earlier commits.]
Signed-off-by: Ben Hutchings <>
7 years agonet: make skb_gso_segment error handling more robust
Florian Westphal [Mon, 20 Oct 2014 11:49:17 +0000 (13:49 +0200)]
net: make skb_gso_segment error handling more robust

commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 upstream.

skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal <>
Signed-off-by: David S. Miller <>
[Brad Spengler: backported to 3.2]
Signed-off-by: Brad Spengler <>
Signed-off-by: Ben Hutchings <>
7 years agotcp: avoid looping in tcp_send_fin()
Eric Dumazet [Thu, 23 Apr 2015 17:42:39 +0000 (10:42 -0700)]
tcp: avoid looping in tcp_send_fin()

[ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ]

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.

Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2:
 - Drop inapplicable change to sk_forced_wmem_schedule()
 - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/]
Signed-off-by: Ben Hutchings <>
7 years agoip_forward: Drop frames with attached skb->sk
Sebastian Pöhn [Mon, 20 Apr 2015 07:19:20 +0000 (09:19 +0200)]
ip_forward: Drop frames with attached skb->sk

[ Upstream commit 2ab957492d13bb819400ac29ae55911d50a82a13 ]

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

Remove useless comment

Signed-off-by: Sebastian Poehn <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agogianfar: Carefully free skbs in functions called by netpoll.
Eric W. Biederman [Tue, 11 Mar 2014 21:20:26 +0000 (14:20 -0700)]
gianfar: Carefully free skbs in functions called by netpoll.

commit c9974ad4aeb36003860100221a594f3c0ccc3f78 upstream.

netpoll can call functions in hard irq context that are ordinarily
called in lesser contexts.  For those functions use dev_kfree_skb_any
and dev_consume_skb_any so skbs are freed safely from hard irq

Signed-off-by: "Eric W. Biederman" <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: use only dev_kfree_skb() and not dev_consume_skb_any()]
Signed-off-by: Ben Hutchings <>
7 years agobenet: Call dev_kfree_skby_any instead of kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:19:50 +0000 (14:19 -0700)]
benet: Call dev_kfree_skby_any instead of kfree_skb.

commit d8ec2c02caa3515f35d6c33eedf529394c419298 upstream.

Replace free_skb with dev_kfree_skb_any in be_tx_compl_process as
which can be called in hard irq by netpoll, softirq context
by normal napi polling, and in normal sleepable context
by the network device close method.

Signed-off-by: "Eric W. Biederman" <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoixgb: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:18:42 +0000 (14:18 -0700)]
ixgb: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit f7e79913a1d6a6139211ead3b03579b317d25a1f upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agotg3: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:18:14 +0000 (14:18 -0700)]
tg3: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit 497a27b9e1bcf6dbaea7a466cfcd866927e1b431 upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agor8169: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:16:14 +0000 (14:16 -0700)]
r8169: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit 989c9ba104d9ce53c1ca918262f3fdfb33aca12a upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years ago8139too: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:15:36 +0000 (14:15 -0700)]
8139too: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit a2ccd2e4bd70122523a7bf21cec4dd6e34427089 upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years ago8139cp: Call dev_kfree_skby_any instead of kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:14:58 +0000 (14:14 -0700)]
8139cp: Call dev_kfree_skby_any instead of kfree_skb.

commit 508f81d517ed1f3f0197df63ea7ab5cd91b6f3b3 upstream.

Replace kfree_skb with dev_kfree_skb_any in cp_start_xmit
as it can be called in both hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agotcp: make connect() mem charging friendly
Eric Dumazet [Tue, 18 Nov 2014 07:06:20 +0000 (23:06 -0800)]
tcp: make connect() mem charging friendly

[ Upstream commit 355a901e6cf1b2b763ec85caa2a9f04fbcc4ab4a ]

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <>
Signed-off-by: Eric Dumazet <>
Acked-by: Yuchung Cheng <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2:
 - Drop the Fast Open changes
 - Adjust context]
Signed-off-by: Ben Hutchings <>
7 years agorxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
Al Viro [Sat, 14 Mar 2015 05:34:56 +0000 (05:34 +0000)]
rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()

[ Upstream commit 7d985ed1dca5c90535d67ce92ef6ca520302340a ]

[I would really like an ACK on that one from dhowells; it appears to be
quite straightforward, but...]

MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of
fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
in there.  It gets passed via flags; in fact, another such check in the same
function is done correctly - as flags & MSG_PEEK.

It had been that way (effectively disabled) for 8 years, though, so the patch
needs beating up - that case had never been tested.  If it is correct, it's
-stable fodder.

Signed-off-by: Al Viro <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agocaif: fix MSG_OOB test in caif_seqpkt_recvmsg()
Al Viro [Sat, 14 Mar 2015 05:22:21 +0000 (05:22 +0000)]
caif: fix MSG_OOB test in caif_seqpkt_recvmsg()

[ Upstream commit 3eeff778e00c956875c70b145c52638c313dfb23 ]

It should be checking flags, not msg->msg_flags.  It's ->sendmsg()
instances that need to look for that in ->msg_flags, ->recvmsg() ones
(including the other ->recvmsg() instance in that file, as well as
unix_dgram_recvmsg() this one claims to be imitating) check in flags.
Braino had been introduced in commit dcda13 ("caif: Bugfix - use MSG_TRUNC
in receive") back in 2010, so it goes quite a while back.

Signed-off-by: Al Viro <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agords: avoid potential stack overflow
Arnd Bergmann [Wed, 11 Mar 2015 21:46:59 +0000 (22:46 +0100)]
rds: avoid potential stack overflow

[ Upstream commit f862e07cf95d5b62a5fc5e981dd7d0dbaf33a501 ]

The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit withint the 1024 byte stack size warning limit on x86, but just
exceed that limit on ARM, which gives us this warning:

net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.

Signed-off-by: Arnd Bergmann <>
Acked-by: Sowmini Varadhan <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agonet: sysctl_net_core: check SNDBUF and RCVBUF for min length
Alexey Kodanev [Wed, 11 Mar 2015 11:29:17 +0000 (14:29 +0300)]
net: sysctl_net_core: check SNDBUF and RCVBUF for min length

[ Upstream commit b1cb59cf2efe7971d3d72a7b963d09a512d994c9 ]

sysctl has*/wmem_* parameters which can be
set to incorrect values. Given that 'struct sk_buff' allocates from
rcvbuf, incorrectly set buffer length could result to memory
allocation failures. For example, set them as follows:

    # sysctl net.core.rmem_default=64
      net.core.wmem_default = 64
    # sysctl net.core.wmem_default=64
      net.core.wmem_default = 64
    # ping localhost -s 1024 -i 0 > /dev/null

This could result to the following failure:

skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
kernel BUG at net/core/skbuff.c:102!
invalid opcode: 0000 [#1] SMP
task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
 ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
 ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
Call Trace:
 [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
 [<ffffffff81556f56>] sock_write_iter+0x146/0x160
 [<ffffffff811d9612>] new_sync_write+0x92/0xd0
 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
 [<ffffffff811da499>] SyS_write+0x59/0xd0
 [<ffffffff81651532>] system_call_fastpath+0x12/0x17
Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
      00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
      eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP <ffff88003ae8bc68>
Kernel panic - not syncing: Fatal exception

Moreover, the possible minimum is 1, so we can get another kernel panic:
BUG: unable to handle kernel paging request at ffff88013caee5c0
IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0

Signed-off-by: Alexey Kodanev <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: delete now-unused 'one' variable]
Signed-off-by: Ben Hutchings <>
7 years agonet: avoid to hang up on sending due to sysctl configuration overflow. [Wed, 23 Jan 2013 20:35:28 +0000 (20:35 +0000)]
net: avoid to hang up on sending due to sysctl configuration overflow.

commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream.

    I found if we write a larger than 4GB value to some sysctl
variables, the sending syscall will hang up forever, because these
variables are 32 bits, such large values make them overflow to 0 or

    This patch try to fix overflow or prevent from zero value setup
of below sysctl variables:





Signed-off-by: Eric Dumazet <>
Signed-off-by: Li Yu <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2:
 - Adjust context
 - Delete now-unused 'zero' variable]
Signed-off-by: Ben Hutchings <>
7 years agonet: ping: Return EAFNOSUPPORT when appropriate.
Lorenzo Colitti [Tue, 3 Mar 2015 14:16:16 +0000 (23:16 +0900)]
net: ping: Return EAFNOSUPPORT when appropriate.

[ Upstream commit 9145736d4862145684009d6a72a6e61324a9439e ]

1. For an IPv4 ping socket, ping_check_bind_addr does not check
   the family of the socket address that's passed in. Instead,
   make it behave like inet_bind, which enforces either that the
   address family is AF_INET, or that the family is AF_UNSPEC and
   the address is
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
   if the socket family is not AF_INET6. Return EAFNOSUPPORT
   instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
   instead of EINVAL if an incorrect socket address structure is
   passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
   IPv4, and it cannot easily be made to support IPv4 because
   the protocol numbers for ICMP and ICMPv6 are different. This
   makes connect(::ffff: fail with EAFNOSUPPORT instead
   of making the socket unusable.

Among other things, this fixes an oops that can be triggered by:

    int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    struct sockaddr_in6 sin6 = {
        .sin6_family = AF_INET6,
        .sin6_addr = in6addr_any,
    bind(s, (struct sockaddr *) &sin6, sizeof(sin6));

Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2:
 - Drop the IPv6 part
 - Adjust context, indentation]
Signed-off-by: Ben Hutchings <>
7 years agoudp: only allow UFO for packets from SOCK_DGRAM sockets
Michal Kubeček [Mon, 2 Mar 2015 17:27:11 +0000 (18:27 +0100)]
udp: only allow UFO for packets from SOCK_DGRAM sockets

[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ]

If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).

Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.

Signed-off-by: Michal Kubecek <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agousb: plusb: Add support for National Instruments host-to-host cable
Ben Shelton [Mon, 16 Feb 2015 19:47:06 +0000 (13:47 -0600)]
usb: plusb: Add support for National Instruments host-to-host cable

[ Upstream commit 42c972a1f390e3bc51ca1e434b7e28764992067f ]

The National Instruments USB Host-to-Host Cable is based on the Prolific
PL-25A1 chipset.  Add its VID/PID so the plusb driver will recognize it.

Signed-off-by: Ben Shelton <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agomacvtap: make sure neighbour code can push ethernet header
Eric Dumazet [Sat, 28 Feb 2015 02:35:35 +0000 (18:35 -0800)]
macvtap: make sure neighbour code can push ethernet header

[ Upstream commit 2f1d8b9e8afa5a833d96afcd23abcb8cdf8d83ab ]

Brian reported crashes using IPv6 traffic with macvtap/veth combo.

I tracked the crashes in neigh_hh_output()

-> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);

Neighbour code assumes headroom to push Ethernet header is
at least 16 bytes.

It appears macvtap has only 14 bytes available on arches
where NET_IP_ALIGN is 0 (like x86)

Effect is a corruption of 2 bytes right before skb->head,
and possible crashes if accessing non existing memory.

This fix should also increase IPv4 performance, as paranoid code
in ip_finish_output2() wont have to call skb_realloc_headroom()

Reported-by: Brian Rak <>
Tested-by: Brian Rak <>
Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agomacvtap: limit head length of skb allocated
Jason Wang [Wed, 13 Nov 2013 06:00:40 +0000 (14:00 +0800)]
macvtap: limit head length of skb allocated

commit 16a3fa28630331e28208872fa5341ce210b901c7 upstream.

We currently use hdr_len as a hint of head length which is advertised by
guest. But when guest advertise a very big value, it can lead to an 64K+
allocating of kmalloc() which has a very high possibility of failure when host
memory is fragmented or under heavy stress. The huge hdr_len also reduce the
effect of zerocopy or even disable if a gso skb is linearized in guest.

To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the
head, which guarantees an order 0 allocation each time.

Cc: Stefan Hajnoczi <>
Cc: Michael S. Tsirkin <>
Signed-off-by: Jason Wang <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agonet: reject creation of netdev names with colons
Matthew Thode [Wed, 18 Feb 2015 00:31:57 +0000 (18:31 -0600)]
net: reject creation of netdev names with colons

[ Upstream commit a4176a9391868bfa87705bcd2e3b49e9b9dd2996 ]

colons are used as a separator in netdev device lookup in dev_ioctl.c


Signed-off-by: Matthew Thode <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agoematch: Fix auto-loading of ematch modules.
Ignacy Gawędzki [Tue, 17 Feb 2015 19:15:20 +0000 (20:15 +0100)]
ematch: Fix auto-loading of ematch modules.

[ Upstream commit 34eea79e2664b314cab6a30fc582fdfa7a1bb1df ]

In tcf_em_validate(), after calling request_module() to load the
kind-specific module, set em->ops to NULL before returning -EAGAIN, so
that module_put() is not called again by tcf_em_tree_destroy().

Signed-off-by: Ignacy Gawędzki <>
Acked-by: Cong Wang <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoipv4: ip_check_defrag should not assume that skb_network_offset is zero
Alexander Drozdov [Thu, 5 Mar 2015 07:29:39 +0000 (10:29 +0300)]
ipv4: ip_check_defrag should not assume that skb_network_offset is zero

[ Upstream commit 3e32e733d1bbb3f227259dc782ef01d5706bdae0 ]

ip_check_defrag() may be used by af_packet to defragment outgoing packets.
skb_network_offset() of af_packet's outgoing packets is not zero.

Signed-off-by: Alexander Drozdov <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agogen_stats.c: Duplicate xstats buffer for later use
Ignacy Gawędzki [Fri, 13 Feb 2015 22:47:05 +0000 (14:47 -0800)]
gen_stats.c: Duplicate xstats buffer for later use

[ Upstream commit 1c4cff0cf55011792125b6041bc4e9713e46240f ]

The gnet_stats_copy_app() function gets called, more often than not, with its
second argument a pointer to an automatic variable in the caller's stack.
Therefore, to avoid copying garbage afterwards when calling
gnet_stats_finish_copy(), this data is better copied to a dynamically allocated
memory that gets freed after use.

[ remove a useless kfree()]

Signed-off-by: Ignacy Gawędzki <>
Signed-off-by: Cong Wang <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agortnetlink: call ->dellink on failure when ->newlink exists
WANG Cong [Fri, 13 Feb 2015 21:56:53 +0000 (13:56 -0800)]
rtnetlink: call ->dellink on failure when ->newlink exists

[ Upstream commit 7afb8886a05be68e376655539a064ec672de8a8e ]

Ignacy reported that when eth0 is down and add a vlan device
on top of it like:

  ip link add link eth0 name eth0.1 up type vlan id 1

We will get a refcount leak:

  unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2

The problem is when rtnl_configure_link() fails in rtnl_newlink(),
we simply call unregister_device(), but for stacked device like vlan,
we almost do nothing when we unregister the upper device, more work
is done when we unregister the lower device, so call its ->dellink().

Reported-by: Ignacy Gawedzki <>
Signed-off-by: Cong Wang <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoppp: deflate: never return len larger than output buffer
Florian Westphal [Wed, 28 Jan 2015 09:56:04 +0000 (10:56 +0100)]
ppp: deflate: never return len larger than output buffer

[ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ]

When we've run out of space in the output buffer to store more data, we
will call zlib_deflate with a NULL output buffer until we've consumed
remaining input.

When this happens, olen contains the size the output buffer would have
consumed iff we'd have had enough room.

This can later cause skb_over_panic when ppp_generic skb_put()s
the returned length.

Reported-by: Iain Douglas <>
Signed-off-by: Florian Westphal <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoping: Fix race in free in receive path [Fri, 23 Jan 2015 22:26:02 +0000 (22:26 +0000)]
ping: Fix race in free in receive path

[ Upstream commit fc752f1f43c1c038a2c6ae58cc739ebb5953ccb0 ]

An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().


Fix this incorrect free by cloning this skb and processing this cloned
skb instead.

This patch was suggested by Eric Dumazet

Signed-off-by: Subash Abhinov Kasiviswanathan <>
Cc: Eric Dumazet <>
Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agonetxen: fix netxen_nic_poll() logic
Eric Dumazet [Thu, 22 Jan 2015 15:56:18 +0000 (07:56 -0800)]
netxen: fix netxen_nic_poll() logic

[ Upstream commit 6088beef3f7517717bd21d90b379714dd0837079 ]

NAPI poll logic now enforces that a poller returns exactly the budget
when it wants to be called again.

If a driver limits TX completion, it has to return budget as well when
the limit is hit, not the number of received packets.

Reported-and-tested-by: Mike Galbraith <>
Signed-off-by: Eric Dumazet <>
Fixes: d75b1ade567f ("net: less interrupt masking in NAPI")
Cc: Manish Chopra <>
Acked-by: Manish Chopra <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoipv6: stop sending PTB packets for MTU < 1280
Hagen Paul Pfeifer [Thu, 15 Jan 2015 21:34:25 +0000 (22:34 +0100)]
ipv6: stop sending PTB packets for MTU < 1280

[ Upstream commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2 ]

Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.

See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.


Signed-off-by: Fernando Gont <>
Signed-off-by: Hagen Paul Pfeifer <>
Acked-by: Hannes Frederic Sowa <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agonet: rps: fix cpu unplug
Eric Dumazet [Fri, 16 Jan 2015 01:04:22 +0000 (17:04 -0800)]
net: rps: fix cpu unplug

[ Upstream commit ac64da0b83d82abe62f78b3d0e21cca31aea24fa ]

softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.

A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.

Based on initial patch from Prasad Sodagudi & Subash Abhinov

This version is better because we do not slow down packet processing,
only make migration safer.

Reported-by: Prasad Sodagudi <>
Reported-by: Subash Abhinov Kasiviswanathan <>
Signed-off-by: Eric Dumazet <>
Cc: Tom Herbert <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoip: zero sockaddr returned on error queue
Willem de Bruijn [Thu, 15 Jan 2015 18:18:40 +0000 (13:18 -0500)]
ip: zero sockaddr returned on error queue

[ Upstream commit f812116b174e59a350acc8e4856213a166a91222 ]

The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

    struct {
            struct sock_extended_err ee;
            struct sockaddr_in(6)    offender;
    } errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.

Signed-off-by: Willem de Bruijn <>

Also verified that there is no padding between and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agojfs: fix readdir regression
Dave Kleikamp [Mon, 23 Mar 2015 21:06:26 +0000 (16:06 -0500)]
jfs: fix readdir regression

Upstream commit 44512449, "jfs: fix readdir cookie incompatibility
with NFSv4", was backported incorrectly into the stable trees which
used the filldir callback (rather than dir_emit). The position is
being incorrectly passed to filldir for the . and .. entries.

The still-maintained stable trees that need to be fixed are 3.2.y,
3.4.y and 3.10.y.

Signed-off-by: Dave Kleikamp <>
Signed-off-by: Ben Hutchings <>
7 years agoNFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error
Trond Myklebust [Tue, 27 Mar 2012 22:31:25 +0000 (18:31 -0400)]
NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error

commit 14977489ffdb80d4caf5a184ba41b23b02fbacd9 upstream.

Signed-off-by: Trond Myklebust <>
[bwh: This is not merely a cleanup but also fixes a regression introduced by
 commit 3114ea7a24d3 ("NFSv4: Return the delegation if the server returns
 NFS4ERR_OPENMODE"), backported in 3.2.14]
Signed-off-by: Ben Hutchings <>
7 years agonet:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from...
Ani Sinha [Mon, 8 Sep 2014 21:49:59 +0000 (14:49 -0700)]
net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.

commit 6a2a2b3ae0759843b22c929881cc184b00cc63ff upstream.

Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage
value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will
break old binaries and any code for which there is no access to source code.
To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.

Signed-off-by: Ani Sinha <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoipv4: Missing sk_nulls_node_init() in ping_unhash().
David S. Miller [Sat, 2 May 2015 02:02:47 +0000 (22:02 -0400)]
ipv4: Missing sk_nulls_node_init() in ping_unhash().

commit a134f083e79fb4c3d0a925691e732c56911b4326 upstream.

If we don't do that, then the poison value is left in the ->pprev

This can cause crashes if we do a disconnect, followed by a connect().

Tested-by: Linus Torvalds <>
Reported-by: Wen Xu <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agofs: take i_mutex during prepare_binprm for set[ug]id executables
Jann Horn [Sun, 19 Apr 2015 00:48:39 +0000 (02:48 +0200)]
fs: take i_mutex during prepare_binprm for set[ug]id executables

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <>
Signed-off-by: Linus Torvalds <>
[bwh: Backported to 3.2:
 - Drop the task_no_new_privs() and user namespace checks
 - Open-code file_inode()
 - Adjust context]
Signed-off-by: Ben Hutchings <>
7 years agoipv6: Don't reduce hop limit for an interface
D.S. Ljungmark [Wed, 25 Mar 2015 08:28:15 +0000 (09:28 +0100)]
ipv6: Don't reduce hop limit for an interface

commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a upstream.

A local route may have a lower hop_limit set than global routes do.

RFC 3756, Section 4.2.7, "Parameter Spoofing"

>   1.  The attacker includes a Current Hop Limit of one or another small
>       number which the attacker knows will cause legitimate packets to
>       be dropped before they reach their destination.

>   As an example, one possible approach to mitigate this threat is to
>   ignore very small hop limits.  The nodes could implement a
>   configurable minimum hop limit, and ignore attempts to set it below
>   said limit.

Signed-off-by: D.S. Ljungmark <>
Acked-by: Hannes Frederic Sowa <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust ND_PRINTK() usage]
Signed-off-by: Ben Hutchings <>
7 years agonet: rds: use correct size for max unacked packets and bytes
Sasha Levin [Tue, 3 Feb 2015 13:55:58 +0000 (08:55 -0500)]
net: rds: use correct size for max unacked packets and bytes

commit db27ebb111e9f69efece08e4cb6a34ff980f8896 upstream.

Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.

This means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agonet: llc: use correct size for sysctl timeout entries
Sasha Levin [Sat, 24 Jan 2015 01:47:00 +0000 (20:47 -0500)]
net: llc: use correct size for sysctl timeout entries

commit 6b8d9117ccb4f81b1244aafa7bc70ef8fa45fc49 upstream.

The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agonetfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
Andrey Vagin [Fri, 28 Mar 2014 09:54:32 +0000 (13:54 +0400)]
netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len

commit 223b02d923ecd7c84cf9780bb3686f455d279279 upstream.

"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.

nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8

I have seen "len" up to 280 and my host has crashes w/o this patch.

The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.

Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <>
Cc: Patrick McHardy <>
Cc: Jozsef Kadlecsik <>
Cc: "David S. Miller" <>
Signed-off-by: Andrey Vagin <>
Signed-off-by: Pablo Neira Ayuso <>
Signed-off-by: Ben Hutchings <>
7 years agoALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support
Dmitry M. Fedin [Thu, 9 Apr 2015 14:37:03 +0000 (17:37 +0300)]
ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support

commit 3dc8523fa7412e731441c01fb33f003eb3cfece1 upstream.

Adds an entry for Creative USB X-Fi to the rc_config array in
mixer_quirks.c to allow use of volume knob on the device.
Adds support for newer X-Fi Pro card, known as "Model No. SB1095"
with USB ID "041e:3237"

Signed-off-by: Dmitry M. Fedin <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Ben Hutchings <>
7 years agoocfs2: _really_ sync the right range
Al Viro [Wed, 8 Apr 2015 21:00:32 +0000 (17:00 -0400)]
ocfs2: _really_ sync the right range

commit 64b4e2526d1cf6e6a4db6213d6e2b6e6ab59479a upstream.

"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.

Spotted by Joseph Qi <> back in January;
unfortunately, I'd missed his mail back then ;-/

Signed-off-by: Al Viro <>
Signed-off-by: Ben Hutchings <>
7 years agoDefer processing of REQ_PREEMPT requests for blocked devices
Bart Van Assche [Wed, 4 Mar 2015 09:31:47 +0000 (10:31 +0100)]
Defer processing of REQ_PREEMPT requests for blocked devices

commit bba0bdd7ad4713d82338bcd9b72d57e9335a664b upstream.

SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:

BUG: unable to handle kernel paging request at ffffc9001a6bc084
IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100)
Call Trace:
 [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
 [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
 [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
 [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
 [<ffffffff81223b37>] __blk_run_queue+0x27/0x30
 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
 [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
 [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
 [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
 [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
 [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
 [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
 [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
 [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
 [<ffffffff811589de>] vfs_write+0xce/0x140
 [<ffffffff81158b53>] sys_write+0x53/0xa0
 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b
 [<00007f611c9d9300>] 0x7f611c9d92ff

Reported-by: Max Gurtuvoy <>
Signed-off-by: Bart Van Assche <>
Reviewed-by: Mike Christie <>
Signed-off-by: James Bottomley <>
Signed-off-by: Ben Hutchings <>
7 years agobe2iscsi: Fix kernel panic when device initialization fails
John Soni Jose [Thu, 12 Feb 2015 01:15:47 +0000 (06:45 +0530)]
be2iscsi: Fix kernel panic when device initialization fails

commit 2e7cee027b26cbe7e6685a7a14bd2850bfe55d33 upstream.

Kernel panic was happening as iscsi_host_remove() was called on
a host which was not yet added.

Signed-off-by: John Soni Jose <>
Reviewed-by: Mike Christie <>
Signed-off-by: James Bottomley <>
Signed-off-by: Ben Hutchings <>
7 years agoxen-netfront: transmit fully GSO-sized packets
Jonathan Davies [Tue, 31 Mar 2015 10:05:15 +0000 (11:05 +0100)]
xen-netfront: transmit fully GSO-sized packets

commit 0c36820e2ab7d943ab1188230fdf2149826d33c0 upstream.

xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in

Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.

The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
6c09fa09d) in determining when to split an skb into two is
    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.

So the maximum permitted size of an skb is calculated to be

Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.

Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).

Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.

Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoIB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
Shachar Raindel [Wed, 18 Mar 2015 17:39:08 +0000 (17:39 +0000)]
IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

commit 8494057ab5e40df590ef6ef7d66324d3ae33356b upstream.

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.

Addresses: CVE-2014-8159
Signed-off-by: Shachar Raindel <>
Signed-off-by: Jack Morgenstein <>
Signed-off-by: Or Gerlitz <>
Signed-off-by: Roland Dreier <>
Signed-off-by: Ben Hutchings <>
7 years agomac80211: fix RX A-MPDU session reorder timer deletion
Johannes Berg [Wed, 1 Apr 2015 12:20:42 +0000 (14:20 +0200)]
mac80211: fix RX A-MPDU session reorder timer deletion

commit 788211d81bfdf9b6a547d0530f206ba6ee76b107 upstream.

There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:

 * tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
 * station is destroyed
 * reorder timer fires before ieee80211_free_tid_rx() runs,
   accessing the station, thus potentially crashing due to
   the use-after-free

The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.

Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.

Signed-off-by: Johannes Berg <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
Stefan Lippers-Hollmann [Mon, 30 Mar 2015 20:44:27 +0000 (22:44 +0200)]
x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk

commit 80313b3078fcd2ca51970880d90757f05879a193 upstream.

The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.

The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.

Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.

( Searching the web seems to suggest that other Bay Trail-D mainboards
  might be affected as well. )
Signed-off-by: Stefan Lippers-Hollmann <>
Cc: Matt Fleming <>
Signed-off-by: Ingo Molnar <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Add reboot quirk for Certec BPC600
Christian Gmeiner [Wed, 7 May 2014 07:01:54 +0000 (09:01 +0200)]
x86/reboot: Add reboot quirk for Certec BPC600

commit aadca6fa4068ad1f92c492bc8507b7ed350825a2 upstream.

Certec BPC600 needs reboot=pci to actually reboot.

Signed-off-by: Christian Gmeiner <>
Cc: Matthew Garrett <>
Cc: Li Aubrey <>
Cc: Andrew Morton <>
Cc: Dave Jones <>
Cc: Fenghua Yu <>
Cc: Linus Torvalds <>
Signed-off-by: Ingo Molnar <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Add reboot quirk for Dell Latitude E5410
Ville Syrjälä [Fri, 4 Oct 2013 12:16:04 +0000 (15:16 +0300)]
x86/reboot: Add reboot quirk for Dell Latitude E5410

commit 8412da757776727796e9edd64ba94814cc08d536 upstream.

Dell Latitude E5410 needs reboot=pci to actually reboot.

Signed-off-by: Ville Syrjälä <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
Masoud Sharbiani [Thu, 26 Sep 2013 17:30:43 +0000 (10:30 -0700)]
x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list

commit b5eafc6f07c95e9f3dd047e72737449cb03c9956 upstream.

Two entries for the same system type were added, with two different vendor
names: 'Dell' and 'Dell, Inc.'.

Since a prefix match is being used by the DMI parsing code, we can eliminate
the latter as redundant.

Reported-by: "H. Peter Anvin" <>
Signed-off-by: Masoud Sharbiani <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
Dave Jones [Wed, 25 Sep 2013 00:13:44 +0000 (20:13 -0400)]
x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround

commit 7a20c2fad61aa3624e83c671d36dbd36b2661476 upstream.

This seems to have been copied from the Optiplex 990 entry
above, but somoene forgot to change the ident text.

Signed-off-by: Dave Jones <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically
Masoud Sharbiani [Fri, 20 Sep 2013 22:59:07 +0000 (15:59 -0700)]
x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically

commit 4f0acd31c31f03ba42494c8baf6c0465150e2621 upstream.

Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.

Signed-off-by: Masoud Sharbiani <>
Signed-off-by: Vinson Lee <>
Cc: Robin Holt <>
Cc: Russell King <>
Cc: Guan Xuetao <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Remove quirk entry for SBC FITPC
David Hooper [Tue, 2 Oct 2012 14:26:53 +0000 (15:26 +0100)]
x86/reboot: Remove quirk entry for SBC FITPC

commit fcd8af585f587741c051f7124b8dee6c73c8629b upstream.

Remove the quirk for the SBC FITPC. It seems ot have been
required when the default was kbd reboot, but no longer required
now that the default is acpi reboot. Furthermore, BIOS reboot no
longer works for this board as of 2.6.39 or any of the 3.x

Signed-off-by: David Hooper <>
Signed-off-by: Alan Cox <>
Signed-off-by: Ingo Molnar <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agoACPI, x86: fix Dell M6600 ACPI reboot regression via DMI
Zhang Rui [Mon, 20 Feb 2012 06:20:06 +0000 (14:20 +0800)]
ACPI, x86: fix Dell M6600 ACPI reboot regression via DMI

commit 76eb9a30db4bc8fd172f9155247264b5f2686d7b upstream.

Dell Precision M6600 is known to require PCI reboot, so add it to
the reboot blacklist in pci_reboot_dmi_table[].

Signed-off-by: Zhang Rui <>
Signed-off-by: Len Brown <>
Signed-off-by: Ben Hutchings <>
7 years agox86/reboot: Remove VersaLogic Menlow reboot quirk
Michael D Labriola [Sun, 29 Jan 2012 19:21:17 +0000 (14:21 -0500)]
x86/reboot: Remove VersaLogic Menlow reboot quirk

commit e6d36a653becc7bbc643c399a77882e02bf552cb upstream.

This commit removes the reboot quirk originally added by commit
e19e074 ("x86: Fix reboot problem on VersaLogic Menlow boards").

Testing with a VersaLogic Ocelot (VL-EPMs-21a rev 1.00 w/ BIOS
6.5.102) revealed the following regarding the reboot hang

- v2.6.37 reboot=bios was needed.

- v2.6.38-rc1: behavior changed, reboot=acpi is needed,
  reboot=kbd and reboot=bios results in system hang.

- v2.6.38: VersaLogic patch (e19e074 "x86: Fix reboot problem on
  VersaLogic Menlow boards") was applied prior to v2.6.38-rc7.  This
  patch sets a quirk for VersaLogic Menlow boards that forces the use
  of reboot=bios, which doesn't work anymore.

- v3.2: It seems that commit 660e34c ("x86: Reorder reboot method
  preferences") changed the default reboot method to acpi prior to
  v3.0-rc1, which means the default behavior is appropriate for the
  Ocelot.  No VersaLogic quirk is required.

The Ocelot board used for testing can successfully reboot w/out
having to pass any reboot= arguments for all 3 current versions
of the BIOS.

Signed-off-by: Michael D Labriola <>
Cc: Matthew Garrett <>
Cc: Michael D Labriola <>
Cc: Kushal Koolwal <>
Cc: Linus Torvalds <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Ben Hutchings <>
7 years agoradeon: Do not directly dereference pointers to BIOS area.
David Miller [Thu, 19 Mar 2015 03:18:40 +0000 (23:18 -0400)]
radeon: Do not directly dereference pointers to BIOS area.

commit f2c9e560b406f2f6b14b345c7da33467dee9cdf2 upstream.

Use readb() and memcpy_fromio() accessors instead.

Reviewed-by: Christian König <>
Signed-off-by: David S. Miller <>
Signed-off-by: Alex Deucher <>
Signed-off-by: Ben Hutchings <>
7 years agoALSA: hda - Add one more node in the EAPD supporting candidate list
Hui Wang [Thu, 26 Mar 2015 09:14:55 +0000 (17:14 +0800)]
ALSA: hda - Add one more node in the EAPD supporting candidate list

commit af95b41426e0b58279f8ff0ebe420df49a4e96b8 upstream.

We have a HP machine which use the codec node 0x17 connecting the
internal speaker, and from the node capability, we saw the EAPD,
if we don't set the EAPD on for this node, the internal speaker
can't output any sound.

Signed-off-by: Hui Wang <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Ben Hutchings <>
7 years agohfsplus: fix B-tree corruption after insertion at position 0
Sergei Antonov [Wed, 25 Mar 2015 22:55:34 +0000 (15:55 -0700)]
hfsplus: fix B-tree corruption after insertion at position 0

commit 98cf21c61a7f5419d82f847c4d77bf6e96a76f5f upstream.

Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().  In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated.  This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.

The resulting first leaf node is correct:

  | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |

But the parent index key0 still contains the previous key CNID=123:

  | key0.CNID=123 | ... |

A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from

Along the way, I removed duplicate code with unification of the if
condition.  The resulting code is equivalent to the original code
because node is never 0.

Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value.  However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch.  brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.

Signed-off-by: Sergei Antonov <>
Cc: Joe Perches <>
Reviewed-by: Vyacheslav Dubeyko <>
Acked-by: Hin-Tak Leung <>
Cc: Anton Altaparmakov <>
Cc: Al Viro <>
Cc: Christoph Hellwig <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agomm: fix anon_vma->degree underflow in anon_vma endless growing prevention
Leon Yu [Wed, 25 Mar 2015 22:55:11 +0000 (15:55 -0700)]
mm: fix anon_vma->degree underflow in anon_vma endless growing prevention

commit 3fe89b3e2a7bbf3e97657104b9b33a9d81b950b3 upstream.

I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after
upgrading to 3.19 and had no luck with 4.0-rc1 neither.

So, after looking into new logic introduced by commit 7a3ef208e662 ("mm:
prevent endless growth of anon_vma hierarchy"), I found chances are that
unlink_anon_vmas() is called without incrementing dst->anon_vma->degree
in anon_vma_clone() due to allocation failure.  If dst->anon_vma is not
NULL in error path, its degree will be incorrectly decremented in
unlink_anon_vmas() and eventually underflow when exiting as a result of
another call to unlink_anon_vmas().  That's how "kernel BUG at
mm/rmap.c:399!" is triggered for me.

This patch fixes the underflow by dropping dst->anon_vma when allocation
fails.  It's safe to do so regardless of original value of dst->anon_vma
because dst->anon_vma doesn't have valid meaning if anon_vma_clone()
fails.  Besides, callers don't care dst->anon_vma in such case neither.

Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as
anon_vma_clone() now does the work.

[ tweak comment]
Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
Signed-off-by: Leon Yu <>
Signed-off-by: Konstantin Khlebnikov <>
Reviewed-by: Michal Hocko <>
Acked-by: Rik van Riel <>
Acked-by: David Rientjes <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agoselinux: fix sel_write_enforce broken return value
Joe Perches [Tue, 24 Mar 2015 01:01:35 +0000 (18:01 -0700)]
selinux: fix sel_write_enforce broken return value

commit 6436a123a147db51a0b06024a8350f4c230e73ff upstream.

Return a negative error value like the rest of the entries in this function.

Signed-off-by: Joe Perches <>
Acked-by: Stephen Smalley <>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <>
Signed-off-by: Ben Hutchings <>
7 years agoUSB: ftdi_sio: Use jtag quirk for SNAP Connect E10
Doug Goldstein [Tue, 24 Mar 2015 01:34:48 +0000 (20:34 -0500)]
USB: ftdi_sio: Use jtag quirk for SNAP Connect E10

commit b229a0f840f774d29d8fedbf5deb344ca36b7f1a upstream.

This patch uses the existing CALAO Systems ftdi_8u2232c_probe in order
to avoid attaching a TTY to the JTAG port as this board is based on the
CALAO Systems reference design and needs the same fix up.

Signed-off-by: Doug Goldstein <>
[johan: clean up probe logic ]
Signed-off-by: Johan Hovold <>
Signed-off-by: Ben Hutchings <>
7 years agonet: use for_each_netdev_safe() in rtnl_group_changelink()
WANG Cong [Mon, 23 Mar 2015 23:31:09 +0000 (16:31 -0700)]
net: use for_each_netdev_safe() in rtnl_group_changelink()

commit d079535d5e1bf5e2e7c856bae2483414ea21e137 upstream.

In case we move the whole dev group to another netns,
we should call for_each_netdev_safe(), otherwise we get
a soft lockup:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ip:798]
 irq event stamp: 255424
 hardirqs last  enabled at (255423): [<ffffffff81a2aa95>] restore_args+0x0/0x30
 hardirqs last disabled at (255424): [<ffffffff81a2ad5a>] apic_timer_interrupt+0x6a/0x80
 softirqs last  enabled at (255422): [<ffffffff81079ebc>] __do_softirq+0x2c1/0x3a9
 softirqs last disabled at (255417): [<ffffffff8107a190>] irq_exit+0x41/0x95
 CPU: 0 PID: 798 Comm: ip Not tainted 4.0.0-rc4+ #881
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff8800d1b88000 ti: ffff880119530000 task.ti: ffff880119530000
 RIP: 0010:[<ffffffff810cad11>]  [<ffffffff810cad11>] debug_lockdep_rcu_enabled+0x28/0x30
 RSP: 0018:ffff880119533778  EFLAGS: 00000246
 RAX: ffff8800d1b88000 RBX: 0000000000000002 RCX: 0000000000000038
 RDX: 0000000000000000 RSI: ffff8800d1b888c8 RDI: ffff8800d1b888c8
 RBP: ffff880119533778 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 000000000000b5c2 R12: 0000000000000246
 R13: ffff880119533708 R14: 00000000001d5a40 R15: ffff88011a7d5a40
 FS:  00007fc01315f740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00007f367a120988 CR3: 000000011849c000 CR4: 00000000000007f0
  ffff880119533798 ffffffff811ac868 ffffffff811ac831 ffffffff811ac828
  ffff8801195337c8 ffffffff811ac8c9 ffff8801195339b0 ffff8801197633e0
  0000000000000000 ffff8801195339b0 ffff8801195337d8 ffffffff811ad2d7
 Call Trace:
  [<ffffffff811ac868>] rcu_read_lock+0x37/0x6e
  [<ffffffff811ac831>] ? rcu_read_unlock+0x5f/0x5f
  [<ffffffff811ac828>] ? rcu_read_unlock+0x56/0x5f
  [<ffffffff811ac8c9>] __fget+0x2a/0x7a
  [<ffffffff811ad2d7>] fget+0x13/0x15
  [<ffffffff811be732>] proc_ns_fget+0xe/0x38
  [<ffffffff817c7714>] get_net_ns_by_fd+0x11/0x59
  [<ffffffff817df359>] rtnl_link_get_net+0x33/0x3e
  [<ffffffff817df3d7>] do_setlink+0x73/0x87b
  [<ffffffff810b28ce>] ? trace_hardirqs_off+0xd/0xf
  [<ffffffff81a2aa95>] ? retint_restore_args+0xe/0xe
  [<ffffffff817e0301>] rtnl_newlink+0x40c/0x699
  [<ffffffff817dffe0>] ? rtnl_newlink+0xeb/0x699
  [<ffffffff81a29246>] ? _raw_spin_unlock+0x28/0x33
  [<ffffffff8143ed1e>] ? security_capable+0x18/0x1a
  [<ffffffff8107da51>] ? ns_capable+0x4d/0x65
  [<ffffffff817de5ce>] rtnetlink_rcv_msg+0x181/0x194
  [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
  [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
  [<ffffffff817de44d>] ? __rtnl_unlock+0x17/0x17
  [<ffffffff818327c6>] netlink_rcv_skb+0x4d/0x93
  [<ffffffff817de42f>] rtnetlink_rcv+0x26/0x2d
  [<ffffffff81830f18>] netlink_unicast+0xcb/0x150
  [<ffffffff8183198e>] netlink_sendmsg+0x501/0x523
  [<ffffffff8115cba9>] ? might_fault+0x59/0xa9
  [<ffffffff817b5398>] ? copy_from_user+0x2a/0x2c
  [<ffffffff817b7b74>] sock_sendmsg+0x34/0x3c
  [<ffffffff817b7f6d>] ___sys_sendmsg+0x1b8/0x255
  [<ffffffff8115c5eb>] ? handle_pte_fault+0xbd5/0xd4a
  [<ffffffff8100a2b0>] ? native_sched_clock+0x35/0x37
  [<ffffffff8109e94b>] ? sched_clock_local+0x12/0x72
  [<ffffffff8109eb9c>] ? sched_clock_cpu+0x9e/0xb7
  [<ffffffff810cadbf>] ? rcu_read_lock_held+0x3b/0x3d
  [<ffffffff811ac1d8>] ? __fcheck_files+0x4c/0x58
  [<ffffffff811ac946>] ? __fget_light+0x2d/0x52
  [<ffffffff817b8adc>] __sys_sendmsg+0x42/0x60
  [<ffffffff817b8b0c>] SyS_sendmsg+0x12/0x1c
  [<ffffffff81a29e32>] system_call_fastpath+0x12/0x17

Fixes: e7ed828f10bd8 ("netlink: support setting devgroup parameters")
Signed-off-by: Cong Wang <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agousb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers
Lu Baolu [Mon, 23 Mar 2015 16:27:42 +0000 (18:27 +0200)]
usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers

commit 227a4fd801c8a9fa2c4700ab98ec1aec06e3b44d upstream.

When a device with an isochronous endpoint is plugged into the Intel
xHCI host controller, and the driver submits multiple frames per URB,
the xHCI driver will set the Block Event Interrupt (BEI) flag on all
but the last TD for the URB. This causes the host controller to place
an event on the event ring, but not send an interrupt. When the last
TD for the URB completes, BEI is cleared, and we get an interrupt for
the whole URB.

However, under Intel xHCI host controllers, if the event ring is full
of events from transfers with BEI set,  an "Event Ring is Full" event
will be posted to the last entry of the event ring,  but no interrupt
is generated. Host will cease all transfer and command executions and
wait until software completes handling the pending events in the event
ring.  That means xHC stops, but event of "event ring is full" is not
notified. As the result, the xHC looks like dead to user.

This patch is to apply XHCI_AVOID_BEI quirk to Intel xHC devices. And
it should be backported to kernels as old as 3.0, that contains the
commit 69e848c2090a ("Intel xhci: Support EHCI/xHCI port switching.").

Signed-off-by: Lu Baolu <>
Tested-by: Alistair Grant <>
Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agousb: xhci: handle Config Error Change (CEC) in xhci driver
Lu Baolu [Mon, 23 Mar 2015 16:27:41 +0000 (18:27 +0200)]
usb: xhci: handle Config Error Change (CEC) in xhci driver

commit 9425183d177aa4a2f09d01a74925124f0778b595 upstream.

Linux xHCI driver doesn't report and handle port cofig error change.
If Port Configure Error for root hub port occurs, CEC bit in PORTSC
would be set by xHC and remains 1. This happends when the root port
fails to configure its link partner, e.g. the port fails to exchange
port capabilities information using Port Capability LMPs.

Then the Port Status Change Events will be blocked until all status
change bits(CEC is one of the change bits) are cleared('0') (refer to
xHCI spec 4.19.2). Otherwise, the port status change event for this
root port will not be generated anymore, then root port would look
like dead for user and can't be recovered until a Host Controller

This patch is to check CEC bit in PORTSC in xhci_get_port_status()
and set a Config Error in the return status if CEC is set. This will
cause a ClearPortFeature request, where CEC bit is cleared in

[The commit log is based on initial Marvell patch posted at]

Reported-by: Gregory CLEMENT <>
Signed-off-by: Lu Baolu <>
Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2:
 - Fix indentation
 - s/raw_port_status/temp/]
Signed-off-by: Ben Hutchings <>
7 years agowriteback: fix possible underflow in write bandwidth calculation
Tejun Heo [Mon, 23 Mar 2015 04:18:48 +0000 (00:18 -0400)]
writeback: fix possible underflow in write bandwidth calculation

commit c72efb658f7c8b27ca3d0efb5cfd5ded9fcac89e upstream.

From 1ebf33901ecc75d9496862dceb1ef0377980587c Mon Sep 17 00:00:00 2001
From: Tejun Heo <>
Date: Mon, 23 Mar 2015 00:08:19 -0400

2f800fbd777b ("writeback: fix dirtied pages accounting on redirty")
introduced account_page_redirty() which reverts stat updates for a
redirtied page, making BDI_DIRTIED no longer monotonically increasing.

bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the
basis for bandwidth calculation.  While unlikely, since the above
patch, the newer value may be lower than the recorded past value and
underflow the bandwidth calculation leading to a wild result.

Fix it by subtracing min of the old and new values when calculating
delta.  AFAIK, there hasn't been any report of it happening but the
resulting erratic behavior would be non-critical and temporary, so
it's possible that the issue is happening without being reported.  The
risk of the fix is very low, so tagged for -stable.

Signed-off-by: Tejun Heo <>
Cc: Jens Axboe <>
Cc: Jan Kara <>
Cc: Wu Fengguang <>
Cc: Greg Thelen <>
Fixes: 2f800fbd777b ("writeback: fix dirtied pages accounting on redirty")
Signed-off-by: Jens Axboe <>
Signed-off-by: Ben Hutchings <>
7 years agosched: Fix RLIMIT_RTTIME when PI-boosting to RT
Brian Silverman [Thu, 19 Feb 2015 00:23:56 +0000 (16:23 -0800)]
sched: Fix RLIMIT_RTTIME when PI-boosting to RT

commit 746db9443ea57fd9c059f62c4bfbf41cf224fe13 upstream.

When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.

I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.

Signed-off-by: Brian Silverman <>
Signed-off-by: Peter Zijlstra (Intel) <>
Signed-off-by: Ingo Molnar <>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <>
7 years agoperf: Fix irq_work 'tail' recursion
Peter Zijlstra [Thu, 19 Feb 2015 17:03:11 +0000 (18:03 +0100)]
perf: Fix irq_work 'tail' recursion

commit d525211f9d1be8b523ec7633f080f2116f5ea536 upstream.

Vince reported a watchdog lockup like:

[<ffffffff8115e114>] perf_tp_event+0xc4/0x210
[<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160
[<ffffffff810b7f10>] lock_release+0x130/0x260
[<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40
[<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80
[<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0
[<ffffffff811f71ce>] send_sigio+0xae/0x100
[<ffffffff811f72b7>] kill_fasync+0x97/0xf0
[<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0
[<ffffffff8115d103>] perf_pending_event+0x33/0x60
[<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80
[<ffffffff8114e448>] irq_work_run+0x18/0x40
[<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0
[<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80

Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.

This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad

Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.

Reported-by: Vince Weaver <>
Tested-by: Vince Weaver <>
Signed-off-by: Peter Zijlstra (Intel) <>
Cc: Arnaldo Carvalho de Melo <>
Cc: Jiri Olsa <>
Cc: Paul Mackerras <>
Cc: Steven Rostedt <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Ben Hutchings <>
7 years agocifs: fix use-after-free bug in find_writable_file
David Disseldorp [Fri, 13 Mar 2015 13:20:29 +0000 (14:20 +0100)]
cifs: fix use-after-free bug in find_writable_file

commit e1e9bda22d7ddf88515e8fe401887e313922823e upstream.

Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:

Thread 1                                        Thread 2
========                                        ========

inv_file = NULL
refind = 0

// invalidHandle found on openFileList

inv_file = open_file
// inv_file->count currently 1

// inv_file->count = 2


cifs_reopen_file()                            cifs_close()
// fails (rc != 0)                            ->cifsFileInfo_put()
                                       // inv_file->count = 1



  // inv_file->count = 0
  // cleanup!!


// refind = 1
goto refind_writable;

At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.

Signed-off-by: David Disseldorp <>
Reviewed-by: Jeff Layton <>
Signed-off-by: Steve French <>
Signed-off-by: Ben Hutchings <>