pandora-kernel.git
6 years agousb: musb: drop __deprecated flag
Felipe Balbi [Wed, 18 Apr 2012 10:49:20 +0000 (13:49 +0300)]
usb: musb: drop __deprecated flag

Looks like we cannot live without that double_buffer_not_ok
flag due to many HW bugs this MUSB core has.

So, let's drop the __deprecated flag to avoid annoying
compile warnings.

Signed-off-by: Felipe Balbi <balbi@ti.com>
6 years agowatchdog: twl4030_wdt: disable autoload
Grazvydas Ignotas [Fri, 3 Jul 2015 21:38:29 +0000 (00:38 +0300)]
watchdog: twl4030_wdt: disable autoload

3.2 doesn't allow multiple watchdogs (fails to register misc device),
and we already always have OMAP watchdog loaded, so disable twl4030_wdt.
To use twl4030_wdt driver, the user needs these manual steps:
  echo omap_wdt > /sys/bus/platform/drivers/omap_wdt/unbind
  modprobe twl4030_wdt

6 years agoARM: let's tune for cortex-a8 when compiling for pandora
Grazvydas Ignotas [Tue, 30 Jun 2015 23:28:05 +0000 (02:28 +0300)]
ARM: let's tune for cortex-a8 when compiling for pandora

6 years agoARM: dma-mapping: avoid speculative prefetching fix
Grazvydas Ignotas [Tue, 30 Jun 2015 22:38:10 +0000 (01:38 +0300)]
ARM: dma-mapping: avoid speculative prefetching fix

This partially reverts 2ffe2da3e71652d4f4cae19539b5c78c2a239136.

Cortex-A8 doesn't seem to be doing any speculative data prefetching,
there is no mention of this in the manual (unlike as for A9), only
instruction prefetch is mentioned. For us this fix is only causing
useless performance penalty, avoid it. The 2.6.27 kernel we used before
didn't have this fix and all seemed to be fine.

6 years agomtd: omap2: tune dma parameters
Grazvydas Ignotas [Fri, 3 Jul 2015 21:21:57 +0000 (00:21 +0300)]
mtd: omap2: tune dma parameters

seems to give a small improvement

6 years agoomap_hsmmc: avoid requesting dma repeatedly
Grazvydas Ignotas [Thu, 2 Jul 2015 00:04:09 +0000 (03:04 +0300)]
omap_hsmmc: avoid requesting dma repeatedly

omap_request_dma() is rather expensive operation, so don't release the
channel after every transfer, release only on autosuspend instead.

6 years agoomap_hsmmc: tune dma parameters
Grazvydas Ignotas [Wed, 1 Jul 2015 23:01:00 +0000 (02:01 +0300)]
omap_hsmmc: tune dma parameters

seems to be helping wifi, SD card speeds are mostly unaffected

6 years agoomap_hsmmc: avoid useless dto set
Grazvydas Ignotas [Tue, 30 Jun 2015 00:20:56 +0000 (03:20 +0300)]
omap_hsmmc: avoid useless dto set

always setting the same value, which was already set by
omap_hsmmc_set_clock()

6 years agoomap_hsmmc: avoid useless work in pre_req
Grazvydas Ignotas [Tue, 30 Jun 2015 00:12:53 +0000 (03:12 +0300)]
omap_hsmmc: avoid useless work in pre_req

6 years agoARM: OMAP: hsmmc: Add support for non-OMAP pins
Thomas Weber [Thu, 17 Nov 2011 21:39:40 +0000 (22:39 +0100)]
ARM: OMAP: hsmmc: Add support for non-OMAP pins

The Devkit8000 uses a TWL4030 pin for card detection.
Thats why the error:
_omap_mux_init_gpio: Could not set gpio192
occurs.

This patch checks that the pin is on OMAP before
calling omap_mux_init_gpio.

Signed-off-by: Thomas Weber <weber@corscience.de>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
6 years agoUBI: Change the default percentage of reserved PEB
Richard Genoud [Fri, 29 Jun 2012 06:57:41 +0000 (08:57 +0200)]
UBI: Change the default percentage of reserved PEB

The actual value (1%) is too low for actual NAND devices, a huge
majority of device has 2% maximum bad blocks (SLC or MLC).
(Actually it's 20 blocks on a 1024 blocks device, 40/2048...)

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
6 years agoreadahead: fix sequential read cache miss detection
Damien Ramonda [Tue, 12 Nov 2013 23:08:16 +0000 (15:08 -0800)]
readahead: fix sequential read cache miss detection

The kernel's readahead algorithm sometimes interprets random read
accesses as sequential and triggers unnecessary data prefecthing from
storage device (impacting random read average latency).

In order to identify sequential cache read misses, the readahead
algorithm intends to check whether offset - previous offset == 1
(trivial sequential reads) or offset - previous offset == 0 (sequential
reads not aligned on page boundary):

  if (offset - (ra->prev_pos >> PAGE_CACHE_SHIFT) <= 1UL)

The current offset is stored in the "offset" variable of type "pgoff_t"
(unsigned long), while previous offset is stored in "ra->prev_pos" of
type "loff_t" (long long).  Therefore, operands of the if statement are
implicitly converted to type long long.  Consequently, when previous
offset > current offset (which happens on random pattern), the if
condition is true and access is wrongly interpeted as sequential.  An
unnecessary data prefetching is triggered, impacting the average random
read latency.

Storing the previous offset value in a "pgoff_t" variable (unsigned
long) fixes the sequential read detection logic.

Signed-off-by: Damien Ramonda <damien.ramonda@intel.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Pierre Tardy <pierre.tardy@intel.com>
Acked-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoARM: prefetch: remove redundant "cc" clobber
Will Deacon [Wed, 26 Jun 2013 15:47:35 +0000 (16:47 +0100)]
ARM: prefetch: remove redundant "cc" clobber

The pld instruction does not affect the condition flags, so don't bother
clobbering them.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
6 years agoslub: prefetch next freelist pointer in slab_alloc()
Eric Dumazet [Fri, 16 Dec 2011 15:25:34 +0000 (16:25 +0100)]
slub: prefetch next freelist pointer in slab_alloc()

Recycling a page is a problem, since freelist link chain is hot on
cpu(s) which freed objects, and possibly very cold on cpu currently
owning slab.

Adding a prefetch of cache line containing the pointer to next object in
slab_alloc() helps a lot in many workloads, in particular on assymetric
ones (allocations done on one cpu, frees on another cpus). Added cost is
three machine instructions only.

Examples on my dual socket quad core ht machine (Intel CPU E5540
@2.53GHz) (16 logical cpus, 2 memory nodes), 64bit kernel.

Before patch :

# perf stat -r 32 hackbench 50 process 4000 >/dev/null

 Performance counter stats for 'hackbench 50 process 4000' (32 runs):

     327577,471718 task-clock                #   15,821 CPUs utilized            ( +-  0,64% )
        28 866 491 context-switches          #    0,088 M/sec                    ( +-  1,80% )
         1 506 929 CPU-migrations            #    0,005 M/sec                    ( +-  3,24% )
           127 151 page-faults               #    0,000 M/sec                    ( +-  0,16% )
   829 399 813 448 cycles                    #    2,532 GHz                      ( +-  0,64% )
   580 664 691 740 stalled-cycles-frontend   #   70,01% frontend cycles idle     ( +-  0,71% )
   197 431 700 448 stalled-cycles-backend    #   23,80% backend  cycles idle     ( +-  1,03% )
   503 548 648 975 instructions              #    0,61  insns per cycle
                                             #    1,15  stalled cycles per insn  ( +-  0,46% )
    95 780 068 471 branches                  #  292,389 M/sec                    ( +-  0,48% )
     1 426 407 916 branch-misses             #    1,49% of all branches          ( +-  1,35% )

      20,705679994 seconds time elapsed                                          ( +-  0,64% )

After patch :

# perf stat -r 32 hackbench 50 process 4000 >/dev/null

 Performance counter stats for 'hackbench 50 process 4000' (32 runs):

     286236,542804 task-clock                #   15,786 CPUs utilized            ( +-  1,32% )
        19 703 372 context-switches          #    0,069 M/sec                    ( +-  4,99% )
         1 658 249 CPU-migrations            #    0,006 M/sec                    ( +-  6,62% )
           126 776 page-faults               #    0,000 M/sec                    ( +-  0,12% )
   724 636 593 213 cycles                    #    2,532 GHz                      ( +-  1,32% )
   499 320 714 837 stalled-cycles-frontend   #   68,91% frontend cycles idle     ( +-  1,47% )
   156 555 126 809 stalled-cycles-backend    #   21,60% backend  cycles idle     ( +-  2,22% )
   463 897 792 661 instructions              #    0,64  insns per cycle
                                             #    1,08  stalled cycles per insn  ( +-  0,94% )
    87 717 352 563 branches                  #  306,451 M/sec                    ( +-  0,99% )
       941 738 280 branch-misses             #    1,07% of all branches          ( +-  3,35% )

      18,132070670 seconds time elapsed                                          ( +-  1,30% )

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
CC: Matt Mackall <mpm@selenic.com>
CC: David Rientjes <rientjes@google.com>
CC: "Alex,Shi" <alex.shi@intel.com>
CC: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
6 years agoARM: 7983/1: atomics: implement a better __atomic_add_unless for v6+
Will Deacon [Fri, 21 Feb 2014 16:01:48 +0000 (17:01 +0100)]
ARM: 7983/1: atomics: implement a better __atomic_add_unless for v6+

Looking at perf profiles of multi-threaded hackbench runs, a significant
performance hit appears to manifest from the cmpxchg loop used to
implement the 32-bit atomic_add_unless function. This can be mitigated
by writing a direct implementation of __atomic_add_unless which doesn't
require iteration outside of the atomic operation.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Conflicts:
arch/arm/include/asm/atomic.h

6 years agoARM: fix warnings about atomic64_read
Russell King [Thu, 5 Jul 2012 12:06:32 +0000 (13:06 +0100)]
ARM: fix warnings about atomic64_read

Fix:
net/netfilter/xt_connbytes.c: In function 'connbytes_mt':
net/netfilter/xt_connbytes.c:43: warning: passing argument 1 of 'atomic64_read' discards qualifiers from pointer target type
...

by adding the missing const.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
6 years agoomap_hsmmc: minor tweaks
Grazvydas Ignotas [Sun, 7 Jun 2015 20:46:10 +0000 (23:46 +0300)]
omap_hsmmc: minor tweaks

- remove "Flush posted write" loop (done by the caller)
- mark error paths as unlikely

6 years agommc: block: don't start new request when the card is removed
Seungwon Jeon [Tue, 22 Jan 2013 10:48:07 +0000 (19:48 +0900)]
mmc: block: don't start new request when the card is removed

It's not necessary to start a new request while error handling if
the card was removed.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Tested-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
6 years agommc: block: replace __blk_end_request() with blk_end_request()
Subhash Jadavani [Thu, 7 Jun 2012 10:16:58 +0000 (15:46 +0530)]
mmc: block: replace __blk_end_request() with blk_end_request()

For completing any block request, MMC block driver is calling:
spin_lock_irq(queue)
__blk_end_request()
spin_unlock_irq(queue)

But if we analyze the sources of latency in kernel using ftrace,
__blk_end_request() function at times may take up to 6.5ms with
spinlock held and irq disabled.

__blk_end_request() calls couple of functions and ftrace output
shows that blk_update_bidi_request() function is almost taking 6ms.
There are 2 function to end the current request: ___blk_end_request()
and blk_end_request(). Both these functions do same thing except
that blk_end_request() function doesn't take up the spinlock
while calling the blk_update_bidi_request().

This patch replaces all __blk_end_request() calls with
blk_end_request() and __blk_end_request_all() calls with
blk_end_request_all().

Testing done: 20 process concurrent read/write on sd card
and eMMC. Ran this test for almost a day on multicore system
and no errors observed.

This change is not meant for improving MMC throughput; it's basically
about becoming fair to other threads/interrupts in the system. By
holding spin lock and interrupts disabled for longer duration, we
won't allow other threads/interrupts to run at all.  Actually slight
performance degradation at file system level can be expected as we
are not holding the spin lock during blk_update_bidi_request() which
means our mmcqd thread may get preempted for other high priority
thread or any interrupt in the system.

These are performance numbers (100MB file write) with eMMC running
in DDR mode:

Without this patch:
Name of the Test   Value   Unit
LMDD Read Test     53.79   MBPS
LMDD Write Test    18.86   MBPS
IOZONE  Read Test  51.65   MBPS
IOZONE  Write Test 24.36   MBPS

With this patch:
Name of the Test    Value  Unit
LMDD Read Test      52.94  MBPS
LMDD Write Test     16.70  MBPS
IOZONE  Read Test   52.08  MBPS
IOZONE  Write Test  23.29  MBPS

Read numbers are fine. Write numbers are bit down (especially LMDD
write), may be because write requests normally have large transfer
size and which means there are chances that while mmcq is executing
blk_update_bidi_request(), it may get interrupted by interrupts or
other high priority thread.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
6 years agommc: block: fix the data timeout issue with ACMD22
Subhash Jadavani [Wed, 13 Jun 2012 11:40:43 +0000 (17:10 +0530)]
mmc: block: fix the data timeout issue with ACMD22

If multi block write operation fails for SD card, during
error handling we send the SD_APP_SEND_NUM_WR_BLKS (ACMD22)
to know how many blocks were already programmed by card.

But mmc_sd_num_wr_blocks() function which sends the ACMD22
calculates the data timeout value from csd.tacc_ns and
csd.tacc_clks parameters which will be 0 for block addressed
(>2GB cards) SD card. This would result in timeout_ns and
timeout_clks being 0 in the mmc_request passed to host driver.
This means host controller would program its data timeout timer
value with 0 which could result in DATA TIMEOUT errors from
controller.

To fix this issue, mmc_sd_num_wr_blocks() should instead
just call the mmc_set_data_timeout() to calculate the
data timeout value. mmc_set_data_timeout() function
ensures that non zero timeout value is set even for
block addressed SD cards.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
6 years agommc: card: Avoid null pointer dereference
Philippe De Swert [Wed, 11 Apr 2012 20:31:45 +0000 (23:31 +0300)]
mmc: card: Avoid null pointer dereference

After the null check on md the code jumped to cmd_done, which then
will dereference md in mmc_blk_put. This patch avoids the possible
null pointer dereference in that case.

Signed-off-by: Philippe De Swert <philippedeswert@gmail.com>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
6 years agommc: card: Kill block requests if card is removed
Sujit Reddy Thumma [Thu, 8 Dec 2011 08:35:50 +0000 (14:05 +0530)]
mmc: card: Kill block requests if card is removed

Kill block requests when the host realizes that the card is
removed from the slot and is sure that subsequent requests
are bound to fail. Do this silently so that the block
layer doesn't output unnecessary error messages.

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
6 years agommc: allow upper layers to know immediately if card has been removed
Adrian Hunter [Mon, 28 Nov 2011 14:22:00 +0000 (16:22 +0200)]
mmc: allow upper layers to know immediately if card has been removed

Add a function mmc_detect_card_removed() which upper layers can use to
determine immediately if a card has been removed. This function should
be called after an I/O request fails so that all queued I/O requests
can be errored out immediately instead of waiting for the card device
to be removed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
6 years agommc: sdio: Fix to support any block size optimally
Stefan Nilsson XK [Wed, 26 Oct 2011 08:52:17 +0000 (10:52 +0200)]
mmc: sdio: Fix to support any block size optimally

This patch allows any block size to be set on the SDIO link,
and still have an arbitrary sized packet (adjusted in size by
using sdio_align_size) transferred in an optimal way
(preferably one transfer).

Previously if the block size was larger than the default of
512 bytes and the transfer size was exactly one block size
(possibly thanks to using sdio_align_size to get an optimal
transfer size), it was sent as a number of byte transfers instead
of one block transfer. Also if the number of blocks was
(max_blocks * N) + 1, the tranfer would be conducted with a number
of blocks and finished off with a number of byte transfers.

When doing this change it was also possible to break out the quirk
for broken byte mode in a much cleaner way, and collect the logic of
when to do byte or block transfer in one function instead of two.

Signed-off-by: Stefan Nilsson XK <stefan.xk.nilsson@stericsson.com>
Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
6 years agoARM: OMAP: dma: fix error return code in omap_system_dma_probe()
Wei Yongjun [Tue, 16 Jul 2013 12:10:46 +0000 (20:10 +0800)]
ARM: OMAP: dma: fix error return code in omap_system_dma_probe()

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Tony Lindgren <tony@atomide.com>
6 years agoARM: OMAP: dma: Remove the erroneous freeing of platform data
Rajendra Nayak [Thu, 13 Jun 2013 14:17:11 +0000 (19:47 +0530)]
ARM: OMAP: dma: Remove the erroneous freeing of platform data

Given p = pdev->dev.platform_data; and
      d = p->dma_attr;
the freeing of either one of these by the driver
seems just plain wrong.

Get rid of them in the .probe failure path as well as the
.remove.

Signed-off-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
6 years agoARM: OMAP: dma: Fix the dma_chan_link_map init order
R Sricharan [Thu, 13 Jun 2013 14:17:10 +0000 (19:47 +0530)]
ARM: OMAP: dma: Fix the dma_chan_link_map init order

Init dma_chan_link_map[lch] *after* its memset to 0.

Signed-off-by: R Sricharan <r.sricharan@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
6 years agoARM: OMAP: dma: Remove the wrong dev_id check
R Sricharan [Thu, 13 Jun 2013 14:17:09 +0000 (19:47 +0530)]
ARM: OMAP: dma: Remove the wrong dev_id check

Once a free channel is found, the check for dev_id == 0 does
not make any sense. Get rid of it.

Signed-off-by: R Sricharan <r.sricharan@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
6 years agoARM: OMAP: Fix the use of uninitialized dma_lch_count
Chen Gang [Fri, 11 Jan 2013 05:39:18 +0000 (13:39 +0800)]
ARM: OMAP: Fix the use of uninitialized dma_lch_count

  'omap_dma_reserve_channels' when used is suppose to be from command.
    so, it alreay has value before 1st call of omap_system_dma_probe.
    and it will never be changed again during running (not from ioctl).

  but 'dma_lch_count' is zero before 1st call of omap_system_dma_probe.
    so it will be failed for omap_dma_reserve_channels, when 1st call.

  so, need use 'd->lch_count' instead of 'dma_lch_count' for judging.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
6 years agowl1251: allow to disable dynamic ps
Grazvydas Ignotas [Sun, 7 Jun 2015 00:39:52 +0000 (03:39 +0300)]
wl1251: allow to disable dynamic ps

Some people with CC units reported they get better wifi without dynamic
PS enabled, so allow to disable it. I guess this could be related to
missing clock driver resistor tweak, which breaks chip's own PS.

6 years agopandora: defconfig: switch to minstrel default
Grazvydas Ignotas [Sun, 7 Jun 2015 00:17:11 +0000 (03:17 +0300)]
pandora: defconfig: switch to minstrel default

I don't know why we had pid there, most distributions seem to be
using minstrel, newer kernels even removed pid completely.

6 years agomac80211: cosmetics for minstrel_debugfs
Karl Beldan [Wed, 17 Apr 2013 12:08:26 +0000 (14:08 +0200)]
mac80211: cosmetics for minstrel_debugfs

This changes the minstrel stats ouput from:

rate     throughput  ewma prob   this prob  this succ/attempt   success    attempts
 BCD   6         0.0        0.0        0.0          0(  0)          0           0

to:

rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
 BCD   6         0.0        0.0        0.0             0(  0)         0           0

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agominstrel: update stats after processing status
Johannes Berg [Thu, 15 Nov 2012 17:27:56 +0000 (18:27 +0100)]
minstrel: update stats after processing status

Instead of updating stats before sending a packet,
update them after processing the packet's status.
This makes minstrel in line with minstrel_ht.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: correct size the argument to kzalloc in minstrel_ht
Thomas Huehn [Fri, 29 Jun 2012 13:26:27 +0000 (06:26 -0700)]
mac80211: correct size the argument to kzalloc in minstrel_ht

msp has type struct minstrel_ht_sta_priv not struct minstrel_ht_sta.

(This incorporates the fixup originally posted as "mac80211: fix kzalloc
memory corruption introduced in minstrel_ht". -- JWL)

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6 years agomac80211: Don't sample max throughput rate in minstrel_ht
Helmut Schaa [Wed, 14 Mar 2012 12:31:11 +0000 (13:31 +0100)]
mac80211: Don't sample max throughput rate in minstrel_ht

The current max throughput rate is known to be good as otherwise it
wouldn't be the max throughput rate. Since rate sampling can introduce
some overhead (by adding RTS for example or due to not aggregating the
frame) don't sample the max throughput rate.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6 years agomac80211: Disable MCS > 7 in minstrel_ht when STA uses static SMPS
Helmut Schaa [Fri, 9 Mar 2012 13:13:45 +0000 (14:13 +0100)]
mac80211: Disable MCS > 7 in minstrel_ht when STA uses static SMPS

Disable multi stream rates (MCS > 7) when a STA is in static SMPS mode
since it has only one active rx chain. Hence, it doesn't even make
sense to sample multi stream rates.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6 years agominstrel_ht: Remove unused function parameters
Patrick Kelle [Tue, 15 Nov 2011 15:44:48 +0000 (16:44 +0100)]
minstrel_ht: Remove unused function parameters

Remove unused function parameters in the following functions:
minstrel_calc_rate_ewma()
minstrel_ht_calc_tp()
minstrel_aggr_check()
minstrel_ht_set_rate()

Signed-off-by: Patrick Kelle <patrick.kelle81@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6 years agomac80211: Get rid of search loop for rate group index
Helmut Schaa [Mon, 14 Nov 2011 14:28:20 +0000 (15:28 +0100)]
mac80211: Get rid of search loop for rate group index

Finding the group index for a specific rate is done by looping through
all groups and returning if the correct one is found. This code is
called for each tx'ed frame and thus it makes sense to reduce its
runtime.

Do this by calculating the group index by this formula based on the SGI
and HT40 flags as well as the stream number:

idx = (HT40 * 2 * MINSTREL_MAX_STREAMS) +
      (SGI * MINSTREL_MAX_STREAMS) +
      (streams - 1)

Hence, the groups are ordered by th HT40 flag first, then by the SGI
flag and afterwards by the number of used streams.

This should reduce the runtime of minstrel_ht_get_group_idx
considerable.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6 years agomac80211: Check rate->idx before rate->count
Helmut Schaa [Mon, 14 Nov 2011 14:28:19 +0000 (15:28 +0100)]
mac80211: Check rate->idx before rate->count

The drivers are not required to fill in rate->count if rate->idx is set
to -1. Hence, we should first check rate->idx before accessing
rate->count.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6 years agominstrel: Remove unused function parameter in calc_rate_durations()
Patrick Kelle [Thu, 10 Nov 2011 14:13:11 +0000 (15:13 +0100)]
minstrel: Remove unused function parameter in calc_rate_durations()

Signed-off-by: Patrick Kelle <patrick.kelle81@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
6 years agoRevert "pagemap: do not leak physical addresses to non-privileged userspace"
Grazvydas Ignotas [Sun, 7 Jun 2015 00:18:58 +0000 (03:18 +0300)]
Revert "pagemap: do not leak physical addresses to non-privileged userspace"

This reverts commit 1ffc3cd9a36b504c20ce98fe5eeb5463f389e1ac.

Don't need it on pandora - even if rowhammer worked, pandora is almost
never a multiuser system, and cache invalidate is a privileged instruction
already on ARM.

pagemap may also be useful for use c64_tools and such.

6 years agoMerge branch 'stable-3.2' into pandora-3.2
Grazvydas Ignotas [Sun, 7 Jun 2015 00:18:36 +0000 (03:18 +0300)]
Merge branch 'stable-3.2' into pandora-3.2

6 years agoLinux 3.2.69 v3.2.69
Ben Hutchings [Sat, 9 May 2015 22:16:42 +0000 (23:16 +0100)]
Linux 3.2.69

6 years agoRevert "KVM: s390: flush CPU on load control"
Ben Hutchings [Mon, 4 May 2015 23:34:49 +0000 (00:34 +0100)]
Revert "KVM: s390: flush CPU on load control"

This reverts commit 823f14022fd2335affc8889a9c7e1b60258883a3, which was
commit 2dca485f8740208604543c3960be31a5dd3ea603 upstream.  It
depends on functionality that is not present in 3.2.y.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
6 years agoipvs: uninitialized data with IP_VS_IPV6
Dan Carpenter [Sat, 6 Dec 2014 13:49:24 +0000 (16:49 +0300)]
ipvs: uninitialized data with IP_VS_IPV6

commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream.

The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

The same issue is there in app_tcp_pkt_in().  Thanks to Julian Anastasov
for noticing that.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoipvs: rerouting to local clients is not needed anymore
Julian Anastasov [Thu, 18 Dec 2014 20:41:23 +0000 (22:41 +0200)]
ipvs: rerouting to local clients is not needed anymore

commit 579eb62ac35845686a7c4286c0a820b4eb1f96aa upstream.

commit f5a41847acc5 ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000c5 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoIB/core: Avoid leakage from kernel to user space
Eli Cohen [Sun, 14 Sep 2014 13:47:52 +0000 (16:47 +0300)]
IB/core: Avoid leakage from kernel to user space

commit 377b513485fd885dea1083a9a5430df65b35e048 upstream.

Clear the reserved field of struct ib_uverbs_async_event_desc which is
copied to user space.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Yann Droneaud <ydroneaud@opteya.com>
6 years agospi: spidev: fix possible arithmetic overflow for multi-transfer message
Ian Abbott [Mon, 23 Mar 2015 17:50:27 +0000 (17:50 +0000)]
spi: spidev: fix possible arithmetic overflow for multi-transfer message

commit f20fbaad7620af2df36a1f9d1c9ecf48ead5b747 upstream.

`spidev_message()` sums the lengths of the individual SPI transfers to
determine the overall SPI message length.  It restricts the total
length, returning an error if too long, but it does not check for
arithmetic overflow.  For example, if the SPI message consisted of two
transfers and the first has a length of 10 and the second has a length
of (__u32)(-1), the total length would be seen as 9, even though the
second transfer is actually very long.  If the second transfer specifies
a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
overrun the spidev's pre-allocated tx buffer before it reaches an
invalid user memory address.  Fix it by checking that neither the total
nor the individual transfer lengths exceed the maximum allowed value.

Thanks to Dan Carpenter for reporting the potential integer overflow.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
[Ian Abbott: Note: original commit compares the lengths to INT_MAX
 instead of bufsiz due to changes in earlier commits.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: make skb_gso_segment error handling more robust
Florian Westphal [Mon, 20 Oct 2014 11:49:17 +0000 (13:49 +0200)]
net: make skb_gso_segment error handling more robust

commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 upstream.

skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Brad Spengler: backported to 3.2]
Signed-off-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agotcp: avoid looping in tcp_send_fin()
Eric Dumazet [Thu, 23 Apr 2015 17:42:39 +0000 (10:42 -0700)]
tcp: avoid looping in tcp_send_fin()

[ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ]

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop inapplicable change to sk_forced_wmem_schedule()
 - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoip_forward: Drop frames with attached skb->sk
Sebastian Pöhn [Mon, 20 Apr 2015 07:19:20 +0000 (09:19 +0200)]
ip_forward: Drop frames with attached skb->sk

[ Upstream commit 2ab957492d13bb819400ac29ae55911d50a82a13 ]

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agogianfar: Carefully free skbs in functions called by netpoll.
Eric W. Biederman [Tue, 11 Mar 2014 21:20:26 +0000 (14:20 -0700)]
gianfar: Carefully free skbs in functions called by netpoll.

commit c9974ad4aeb36003860100221a594f3c0ccc3f78 upstream.

netpoll can call functions in hard irq context that are ordinarily
called in lesser contexts.  For those functions use dev_kfree_skb_any
and dev_consume_skb_any so skbs are freed safely from hard irq
context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: use only dev_kfree_skb() and not dev_consume_skb_any()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agobenet: Call dev_kfree_skby_any instead of kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:19:50 +0000 (14:19 -0700)]
benet: Call dev_kfree_skby_any instead of kfree_skb.

commit d8ec2c02caa3515f35d6c33eedf529394c419298 upstream.

Replace free_skb with dev_kfree_skb_any in be_tx_compl_process as
which can be called in hard irq by netpoll, softirq context
by normal napi polling, and in normal sleepable context
by the network device close method.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoixgb: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:18:42 +0000 (14:18 -0700)]
ixgb: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit f7e79913a1d6a6139211ead3b03579b317d25a1f upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agotg3: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:18:14 +0000 (14:18 -0700)]
tg3: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit 497a27b9e1bcf6dbaea7a466cfcd866927e1b431 upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agor8169: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:16:14 +0000 (14:16 -0700)]
r8169: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit 989c9ba104d9ce53c1ca918262f3fdfb33aca12a upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years ago8139too: Call dev_kfree_skby_any instead of dev_kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:15:36 +0000 (14:15 -0700)]
8139too: Call dev_kfree_skby_any instead of dev_kfree_skb.

commit a2ccd2e4bd70122523a7bf21cec4dd6e34427089 upstream.

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years ago8139cp: Call dev_kfree_skby_any instead of kfree_skb.
Eric W. Biederman [Tue, 11 Mar 2014 21:14:58 +0000 (14:14 -0700)]
8139cp: Call dev_kfree_skby_any instead of kfree_skb.

commit 508f81d517ed1f3f0197df63ea7ab5cd91b6f3b3 upstream.

Replace kfree_skb with dev_kfree_skb_any in cp_start_xmit
as it can be called in both hard irq and other contexts.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agotcp: make connect() mem charging friendly
Eric Dumazet [Tue, 18 Nov 2014 07:06:20 +0000 (23:06 -0800)]
tcp: make connect() mem charging friendly

[ Upstream commit 355a901e6cf1b2b763ec85caa2a9f04fbcc4ab4a ]

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop the Fast Open changes
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agorxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
Al Viro [Sat, 14 Mar 2015 05:34:56 +0000 (05:34 +0000)]
rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()

[ Upstream commit 7d985ed1dca5c90535d67ce92ef6ca520302340a ]

[I would really like an ACK on that one from dhowells; it appears to be
quite straightforward, but...]

MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of
fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
in there.  It gets passed via flags; in fact, another such check in the same
function is done correctly - as flags & MSG_PEEK.

It had been that way (effectively disabled) for 8 years, though, so the patch
needs beating up - that case had never been tested.  If it is correct, it's
-stable fodder.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agocaif: fix MSG_OOB test in caif_seqpkt_recvmsg()
Al Viro [Sat, 14 Mar 2015 05:22:21 +0000 (05:22 +0000)]
caif: fix MSG_OOB test in caif_seqpkt_recvmsg()

[ Upstream commit 3eeff778e00c956875c70b145c52638c313dfb23 ]

It should be checking flags, not msg->msg_flags.  It's ->sendmsg()
instances that need to look for that in ->msg_flags, ->recvmsg() ones
(including the other ->recvmsg() instance in that file, as well as
unix_dgram_recvmsg() this one claims to be imitating) check in flags.
Braino had been introduced in commit dcda13 ("caif: Bugfix - use MSG_TRUNC
in receive") back in 2010, so it goes quite a while back.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agords: avoid potential stack overflow
Arnd Bergmann [Wed, 11 Mar 2015 21:46:59 +0000 (22:46 +0100)]
rds: avoid potential stack overflow

[ Upstream commit f862e07cf95d5b62a5fc5e981dd7d0dbaf33a501 ]

The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit withint the 1024 byte stack size warning limit on x86, but just
exceed that limit on ARM, which gives us this warning:

net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: sysctl_net_core: check SNDBUF and RCVBUF for min length
Alexey Kodanev [Wed, 11 Mar 2015 11:29:17 +0000 (14:29 +0300)]
net: sysctl_net_core: check SNDBUF and RCVBUF for min length

[ Upstream commit b1cb59cf2efe7971d3d72a7b963d09a512d994c9 ]

sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be
set to incorrect values. Given that 'struct sk_buff' allocates from
rcvbuf, incorrectly set buffer length could result to memory
allocation failures. For example, set them as follows:

    # sysctl net.core.rmem_default=64
      net.core.wmem_default = 64
    # sysctl net.core.wmem_default=64
      net.core.wmem_default = 64
    # ping localhost -s 1024 -i 0 > /dev/null

This could result to the following failure:

skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
kernel BUG at net/core/skbuff.c:102!
invalid opcode: 0000 [#1] SMP
...
task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
Stack:
 ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
 ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
Call Trace:
 [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
 [<ffffffff81556f56>] sock_write_iter+0x146/0x160
 [<ffffffff811d9612>] new_sync_write+0x92/0xd0
 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
 [<ffffffff811da499>] SyS_write+0x59/0xd0
 [<ffffffff81651532>] system_call_fastpath+0x12/0x17
Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
      00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
      eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
RSP <ffff88003ae8bc68>
Kernel panic - not syncing: Fatal exception

Moreover, the possible minimum is 1, so we can get another kernel panic:
...
BUG: unable to handle kernel paging request at ffff88013caee5c0
IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
...

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: delete now-unused 'one' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: avoid to hang up on sending due to sysctl configuration overflow.
bingtian.ly@taobao.com [Wed, 23 Jan 2013 20:35:28 +0000 (20:35 +0000)]
net: avoid to hang up on sending due to sysctl configuration overflow.

commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream.

    I found if we write a larger than 4GB value to some sysctl
variables, the sending syscall will hang up forever, because these
variables are 32 bits, such large values make them overflow to 0 or
negative.

    This patch try to fix overflow or prevent from zero value setup
of below sysctl variables:

net.core.wmem_default
net.core.rmem_default

net.core.rmem_max
net.core.wmem_max

net.ipv4.udp_rmem_min
net.ipv4.udp_wmem_min

net.ipv4.tcp_wmem
net.ipv4.tcp_rmem

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Li Yu <raise.sail@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Adjust context
 - Delete now-unused 'zero' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: ping: Return EAFNOSUPPORT when appropriate.
Lorenzo Colitti [Tue, 3 Mar 2015 14:16:16 +0000 (23:16 +0900)]
net: ping: Return EAFNOSUPPORT when appropriate.

[ Upstream commit 9145736d4862145684009d6a72a6e61324a9439e ]

1. For an IPv4 ping socket, ping_check_bind_addr does not check
   the family of the socket address that's passed in. Instead,
   make it behave like inet_bind, which enforces either that the
   address family is AF_INET, or that the family is AF_UNSPEC and
   the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
   if the socket family is not AF_INET6. Return EAFNOSUPPORT
   instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
   instead of EINVAL if an incorrect socket address structure is
   passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
   IPv4, and it cannot easily be made to support IPv4 because
   the protocol numbers for ICMP and ICMPv6 are different. This
   makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
   of making the socket unusable.

Among other things, this fixes an oops that can be triggered by:

    int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    struct sockaddr_in6 sin6 = {
        .sin6_family = AF_INET6,
        .sin6_addr = in6addr_any,
    };
    bind(s, (struct sockaddr *) &sin6, sizeof(sin6));

Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop the IPv6 part
 - Adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoudp: only allow UFO for packets from SOCK_DGRAM sockets
Michal Kubeček [Mon, 2 Mar 2015 17:27:11 +0000 (18:27 +0100)]
udp: only allow UFO for packets from SOCK_DGRAM sockets

[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ]

If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).

Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agousb: plusb: Add support for National Instruments host-to-host cable
Ben Shelton [Mon, 16 Feb 2015 19:47:06 +0000 (13:47 -0600)]
usb: plusb: Add support for National Instruments host-to-host cable

[ Upstream commit 42c972a1f390e3bc51ca1e434b7e28764992067f ]

The National Instruments USB Host-to-Host Cable is based on the Prolific
PL-25A1 chipset.  Add its VID/PID so the plusb driver will recognize it.

Signed-off-by: Ben Shelton <ben.shelton@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomacvtap: make sure neighbour code can push ethernet header
Eric Dumazet [Sat, 28 Feb 2015 02:35:35 +0000 (18:35 -0800)]
macvtap: make sure neighbour code can push ethernet header

[ Upstream commit 2f1d8b9e8afa5a833d96afcd23abcb8cdf8d83ab ]

Brian reported crashes using IPv6 traffic with macvtap/veth combo.

I tracked the crashes in neigh_hh_output()

-> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);

Neighbour code assumes headroom to push Ethernet header is
at least 16 bytes.

It appears macvtap has only 14 bytes available on arches
where NET_IP_ALIGN is 0 (like x86)

Effect is a corruption of 2 bytes right before skb->head,
and possible crashes if accessing non existing memory.

This fix should also increase IPv4 performance, as paranoid code
in ip_finish_output2() wont have to call skb_realloc_headroom()

Reported-by: Brian Rak <brak@vultr.com>
Tested-by: Brian Rak <brak@vultr.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomacvtap: limit head length of skb allocated
Jason Wang [Wed, 13 Nov 2013 06:00:40 +0000 (14:00 +0800)]
macvtap: limit head length of skb allocated

commit 16a3fa28630331e28208872fa5341ce210b901c7 upstream.

We currently use hdr_len as a hint of head length which is advertised by
guest. But when guest advertise a very big value, it can lead to an 64K+
allocating of kmalloc() which has a very high possibility of failure when host
memory is fragmented or under heavy stress. The huge hdr_len also reduce the
effect of zerocopy or even disable if a gso skb is linearized in guest.

To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the
head, which guarantees an order 0 allocation each time.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: reject creation of netdev names with colons
Matthew Thode [Wed, 18 Feb 2015 00:31:57 +0000 (18:31 -0600)]
net: reject creation of netdev names with colons

[ Upstream commit a4176a9391868bfa87705bcd2e3b49e9b9dd2996 ]

colons are used as a separator in netdev device lookup in dev_ioctl.c

Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME

Signed-off-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoematch: Fix auto-loading of ematch modules.
Ignacy Gawędzki [Tue, 17 Feb 2015 19:15:20 +0000 (20:15 +0100)]
ematch: Fix auto-loading of ematch modules.

[ Upstream commit 34eea79e2664b314cab6a30fc582fdfa7a1bb1df ]

In tcf_em_validate(), after calling request_module() to load the
kind-specific module, set em->ops to NULL before returning -EAGAIN, so
that module_put() is not called again by tcf_em_tree_destroy().

Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoipv4: ip_check_defrag should not assume that skb_network_offset is zero
Alexander Drozdov [Thu, 5 Mar 2015 07:29:39 +0000 (10:29 +0300)]
ipv4: ip_check_defrag should not assume that skb_network_offset is zero

[ Upstream commit 3e32e733d1bbb3f227259dc782ef01d5706bdae0 ]

ip_check_defrag() may be used by af_packet to defragment outgoing packets.
skb_network_offset() of af_packet's outgoing packets is not zero.

Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agogen_stats.c: Duplicate xstats buffer for later use
Ignacy Gawędzki [Fri, 13 Feb 2015 22:47:05 +0000 (14:47 -0800)]
gen_stats.c: Duplicate xstats buffer for later use

[ Upstream commit 1c4cff0cf55011792125b6041bc4e9713e46240f ]

The gnet_stats_copy_app() function gets called, more often than not, with its
second argument a pointer to an automatic variable in the caller's stack.
Therefore, to avoid copying garbage afterwards when calling
gnet_stats_finish_copy(), this data is better copied to a dynamically allocated
memory that gets freed after use.

[xiyou.wangcong@gmail.com: remove a useless kfree()]

Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agortnetlink: call ->dellink on failure when ->newlink exists
WANG Cong [Fri, 13 Feb 2015 21:56:53 +0000 (13:56 -0800)]
rtnetlink: call ->dellink on failure when ->newlink exists

[ Upstream commit 7afb8886a05be68e376655539a064ec672de8a8e ]

Ignacy reported that when eth0 is down and add a vlan device
on top of it like:

  ip link add link eth0 name eth0.1 up type vlan id 1

We will get a refcount leak:

  unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2

The problem is when rtnl_configure_link() fails in rtnl_newlink(),
we simply call unregister_device(), but for stacked device like vlan,
we almost do nothing when we unregister the upper device, more work
is done when we unregister the lower device, so call its ->dellink().

Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoppp: deflate: never return len larger than output buffer
Florian Westphal [Wed, 28 Jan 2015 09:56:04 +0000 (10:56 +0100)]
ppp: deflate: never return len larger than output buffer

[ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ]

When we've run out of space in the output buffer to store more data, we
will call zlib_deflate with a NULL output buffer until we've consumed
remaining input.

When this happens, olen contains the size the output buffer would have
consumed iff we'd have had enough room.

This can later cause skb_over_panic when ppp_generic skb_put()s
the returned length.

Reported-by: Iain Douglas <centos@1n6.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoping: Fix race in free in receive path
subashab@codeaurora.org [Fri, 23 Jan 2015 22:26:02 +0000 (22:26 +0000)]
ping: Fix race in free in receive path

[ Upstream commit fc752f1f43c1c038a2c6ae58cc739ebb5953ccb0 ]

An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().

-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish

Fix this incorrect free by cloning this skb and processing this cloned
skb instead.

This patch was suggested by Eric Dumazet

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonetxen: fix netxen_nic_poll() logic
Eric Dumazet [Thu, 22 Jan 2015 15:56:18 +0000 (07:56 -0800)]
netxen: fix netxen_nic_poll() logic

[ Upstream commit 6088beef3f7517717bd21d90b379714dd0837079 ]

NAPI poll logic now enforces that a poller returns exactly the budget
when it wants to be called again.

If a driver limits TX completion, it has to return budget as well when
the limit is hit, not the number of received packets.

Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade567f ("net: less interrupt masking in NAPI")
Cc: Manish Chopra <manish.chopra@qlogic.com>
Acked-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoipv6: stop sending PTB packets for MTU < 1280
Hagen Paul Pfeifer [Thu, 15 Jan 2015 21:34:25 +0000 (22:34 +0100)]
ipv6: stop sending PTB packets for MTU < 1280

[ Upstream commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2 ]

Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.

See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.

[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00

Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: rps: fix cpu unplug
Eric Dumazet [Fri, 16 Jan 2015 01:04:22 +0000 (17:04 -0800)]
net: rps: fix cpu unplug

[ Upstream commit ac64da0b83d82abe62f78b3d0e21cca31aea24fa ]

softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.

A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.

Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.

This version is better because we do not slow down packet processing,
only make migration safer.

Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoip: zero sockaddr returned on error queue
Willem de Bruijn [Thu, 15 Jan 2015 18:18:40 +0000 (13:18 -0500)]
ip: zero sockaddr returned on error queue

[ Upstream commit f812116b174e59a350acc8e4856213a166a91222 ]

The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

    struct {
            struct sock_extended_err ee;
            struct sockaddr_in(6)    offender;
    } errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.

Signed-off-by: Willem de Bruijn <willemb@google.com>
----

Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agojfs: fix readdir regression
Dave Kleikamp [Mon, 23 Mar 2015 21:06:26 +0000 (16:06 -0500)]
jfs: fix readdir regression

Upstream commit 44512449, "jfs: fix readdir cookie incompatibility
with NFSv4", was backported incorrectly into the stable trees which
used the filldir callback (rather than dir_emit). The position is
being incorrectly passed to filldir for the . and .. entries.

The still-maintained stable trees that need to be fixed are 3.2.y,
3.4.y and 3.10.y.

https://bugzilla.kernel.org/show_bug.cgi?id=94741

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoNFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error
Trond Myklebust [Tue, 27 Mar 2012 22:31:25 +0000 (18:31 -0400)]
NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error

commit 14977489ffdb80d4caf5a184ba41b23b02fbacd9 upstream.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: This is not merely a cleanup but also fixes a regression introduced by
 commit 3114ea7a24d3 ("NFSv4: Return the delegation if the server returns
 NFS4ERR_OPENMODE"), backported in 3.2.14]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from...
Ani Sinha [Mon, 8 Sep 2014 21:49:59 +0000 (14:49 -0700)]
net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.

commit 6a2a2b3ae0759843b22c929881cc184b00cc63ff upstream.

Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage
value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will
break old binaries and any code for which there is no access to source code.
To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.

Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoipv4: Missing sk_nulls_node_init() in ping_unhash().
David S. Miller [Sat, 2 May 2015 02:02:47 +0000 (22:02 -0400)]
ipv4: Missing sk_nulls_node_init() in ping_unhash().

commit a134f083e79fb4c3d0a925691e732c56911b4326 upstream.

If we don't do that, then the poison value is left in the ->pprev
backlink.

This can cause crashes if we do a disconnect, followed by a connect().

Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agofs: take i_mutex during prepare_binprm for set[ug]id executables
Jann Horn [Sun, 19 Apr 2015 00:48:39 +0000 (02:48 +0200)]
fs: take i_mutex during prepare_binprm for set[ug]id executables

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Drop the task_no_new_privs() and user namespace checks
 - Open-code file_inode()
 - s/READ_ONCE/ACCESS_ONCE/
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoipv6: Don't reduce hop limit for an interface
D.S. Ljungmark [Wed, 25 Mar 2015 08:28:15 +0000 (09:28 +0100)]
ipv6: Don't reduce hop limit for an interface

commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a upstream.

A local route may have a lower hop_limit set than global routes do.

RFC 3756, Section 4.2.7, "Parameter Spoofing"

>   1.  The attacker includes a Current Hop Limit of one or another small
>       number which the attacker knows will cause legitimate packets to
>       be dropped before they reach their destination.

>   As an example, one possible approach to mitigate this threat is to
>   ignore very small hop limits.  The nodes could implement a
>   configurable minimum hop limit, and ignore attempts to set it below
>   said limit.

Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust ND_PRINTK() usage]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: rds: use correct size for max unacked packets and bytes
Sasha Levin [Tue, 3 Feb 2015 13:55:58 +0000 (08:55 -0500)]
net: rds: use correct size for max unacked packets and bytes

commit db27ebb111e9f69efece08e4cb6a34ff980f8896 upstream.

Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.

This means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonet: llc: use correct size for sysctl timeout entries
Sasha Levin [Sat, 24 Jan 2015 01:47:00 +0000 (20:47 -0500)]
net: llc: use correct size for sysctl timeout entries

commit 6b8d9117ccb4f81b1244aafa7bc70ef8fa45fc49 upstream.

The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agonetfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
Andrey Vagin [Fri, 28 Mar 2014 09:54:32 +0000 (13:54 +0400)]
netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len

commit 223b02d923ecd7c84cf9780bb3686f455d279279 upstream.

"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.

nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8

I have seen "len" up to 280 and my host has crashes w/o this patch.

The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.

Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support
Dmitry M. Fedin [Thu, 9 Apr 2015 14:37:03 +0000 (17:37 +0300)]
ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support

commit 3dc8523fa7412e731441c01fb33f003eb3cfece1 upstream.

Adds an entry for Creative USB X-Fi to the rc_config array in
mixer_quirks.c to allow use of volume knob on the device.
Adds support for newer X-Fi Pro card, known as "Model No. SB1095"
with USB ID "041e:3237"

Signed-off-by: Dmitry M. Fedin <dmitry.fedin@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoocfs2: _really_ sync the right range
Al Viro [Wed, 8 Apr 2015 21:00:32 +0000 (17:00 -0400)]
ocfs2: _really_ sync the right range

commit 64b4e2526d1cf6e6a4db6213d6e2b6e6ab59479a upstream.

"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.

Spotted by Joseph Qi <joseph.qi@huawei.com> back in January;
unfortunately, I'd missed his mail back then ;-/

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoDefer processing of REQ_PREEMPT requests for blocked devices
Bart Van Assche [Wed, 4 Mar 2015 09:31:47 +0000 (10:31 +0100)]
Defer processing of REQ_PREEMPT requests for blocked devices

commit bba0bdd7ad4713d82338bcd9b72d57e9335a664b upstream.

SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:

BUG: unable to handle kernel paging request at ffffc9001a6bc084
IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100)
Call Trace:
 [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
 [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
 [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
 [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
 [<ffffffff81223b37>] __blk_run_queue+0x27/0x30
 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
 [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
 [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
 [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
 [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
 [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
 [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
 [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
 [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
 [<ffffffff811589de>] vfs_write+0xce/0x140
 [<ffffffff81158b53>] sys_write+0x53/0xa0
 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b
 [<00007f611c9d9300>] 0x7f611c9d92ff

Reported-by: Max Gurtuvoy <maxg@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agobe2iscsi: Fix kernel panic when device initialization fails
John Soni Jose [Thu, 12 Feb 2015 01:15:47 +0000 (06:45 +0530)]
be2iscsi: Fix kernel panic when device initialization fails

commit 2e7cee027b26cbe7e6685a7a14bd2850bfe55d33 upstream.

Kernel panic was happening as iscsi_host_remove() was called on
a host which was not yet added.

Signed-off-by: John Soni Jose <sony.john-n@emulex.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoxen-netfront: transmit fully GSO-sized packets
Jonathan Davies [Tue, 31 Mar 2015 10:05:15 +0000 (11:05 +0100)]
xen-netfront: transmit fully GSO-sized packets

commit 0c36820e2ab7d943ab1188230fdf2149826d33c0 upstream.

xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.

Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
    XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.

The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
6c09fa09d) in determining when to split an skb into two is
    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.

So the maximum permitted size of an skb is calculated to be
    (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.

Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.

Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).

Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.

Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agoIB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
Shachar Raindel [Wed, 18 Mar 2015 17:39:08 +0000 (17:39 +0000)]
IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

commit 8494057ab5e40df590ef6ef7d66324d3ae33356b upstream.

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.

Addresses: CVE-2014-8159
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agomac80211: fix RX A-MPDU session reorder timer deletion
Johannes Berg [Wed, 1 Apr 2015 12:20:42 +0000 (14:20 +0200)]
mac80211: fix RX A-MPDU session reorder timer deletion

commit 788211d81bfdf9b6a547d0530f206ba6ee76b107 upstream.

There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:

 * tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
 * station is destroyed
 * reorder timer fires before ieee80211_free_tid_rx() runs,
   accessing the station, thus potentially crashing due to
   the use-after-free

The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.

Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agox86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
Stefan Lippers-Hollmann [Mon, 30 Mar 2015 20:44:27 +0000 (22:44 +0200)]
x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk

commit 80313b3078fcd2ca51970880d90757f05879a193 upstream.

The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.

The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.

Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.

( Searching the web seems to suggest that other Bay Trail-D mainboards
  might be affected as well. )
--
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agox86/reboot: Add reboot quirk for Certec BPC600
Christian Gmeiner [Wed, 7 May 2014 07:01:54 +0000 (09:01 +0200)]
x86/reboot: Add reboot quirk for Certec BPC600

commit aadca6fa4068ad1f92c492bc8507b7ed350825a2 upstream.

Certec BPC600 needs reboot=pci to actually reboot.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agox86/reboot: Add reboot quirk for Dell Latitude E5410
Ville Syrjälä [Fri, 4 Oct 2013 12:16:04 +0000 (15:16 +0300)]
x86/reboot: Add reboot quirk for Dell Latitude E5410

commit 8412da757776727796e9edd64ba94814cc08d536 upstream.

Dell Latitude E5410 needs reboot=pci to actually reboot.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://lkml.kernel.org/r/1380888964-14517-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
6 years agox86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
Masoud Sharbiani [Thu, 26 Sep 2013 17:30:43 +0000 (10:30 -0700)]
x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list

commit b5eafc6f07c95e9f3dd047e72737449cb03c9956 upstream.

Two entries for the same system type were added, with two different vendor
names: 'Dell' and 'Dell, Inc.'.

Since a prefix match is being used by the DMI parsing code, we can eliminate
the latter as redundant.

Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Cc: holt@sgi.com
Link: http://lkml.kernel.org/r/1380216643-4683-1-git-send-email-masoud.sharbiani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>