7 years agoring-buffer: Up rb_iter_peek() loop count to 3
Steven Rostedt (Red Hat) [Wed, 6 Aug 2014 19:36:31 +0000 (15:36 -0400)]
ring-buffer: Up rb_iter_peek() loop count to 3

commit 021de3d904b88b1771a3a2cfc5b75023c391e646 upstream.

After writting a test to try to trigger the bug that caused the
ring buffer iterator to become corrupted, I hit another bug:

 WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
 Modules linked in: ipt_MASQUERADE sunrpc [...]
 CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
  ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
  ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
 Call Trace:
  [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
  [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
  [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
  [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
  [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
  [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
  [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
  [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
  [<ffffffff8114cf94>] ? seq_read+0x148/0x361
  [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
  [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
  [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2

Debugging this bug, which triggers when the rb_iter_peek() loops too
many times (more than 2 times), I discovered there's a case that can
cause that function to legitimately loop 3 times!

rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
only deals with the reader page (it's for consuming reads). The
rb_iter_peek() is for traversing the buffer without consuming it, and as
such, it can loop for one more reason. That is, if we hit the end of
the reader page or any page, it will go to the next page and try again.

That is, we have this:

 1. iter->head > iter->head_page->page->commit
    (rb_inc_iter() which moves the iter to the next page)
    try again

 2. event = rb_iter_head_event()
    event->type_len == RINGBUF_TYPE_TIME_EXTEND
    try again

 3. read the event.

But we never get to 3, because the count is greater than 2 and we
cause the WARNING and return NULL.

Up the counter to 3.

Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together"
Signed-off-by: Steven Rostedt <>
[bwh: Backported to 3.2: drop inapplicable spelling correction]
Signed-off-by: Ben Hutchings <>
7 years agos390/locking: Reenable optimistic spinning
Christian Borntraeger [Tue, 5 Aug 2014 07:57:51 +0000 (09:57 +0200)]
s390/locking: Reenable optimistic spinning

commit 36e7fdaa1a04fcf65b864232e1af56a51c7814d6 upstream.

commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc (locking/mutex: Disable
optimistic spinning on some architectures) fenced spinning for
architectures without proper cmpxchg.
There is no need to disable mutex spinning on s390, though:
The instructions CS,CSG and friends provide the proper guarantees.
(We dont implement cmpxchg with locks).

Signed-off-by: Christian Borntraeger <>
Cc: Ingo Molnar <>
Cc: Peter Zijlstra <>
Signed-off-by: Heiko Carstens <>
Signed-off-by: Martin Schwidefsky <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (ads1015) Fix out-of-bounds array access
Axel Lin [Tue, 5 Aug 2014 01:59:49 +0000 (09:59 +0800)]
hwmon: (ads1015) Fix out-of-bounds array access

commit e981429557cbe10c780fab1c1a237cb832757652 upstream.

Current code uses data_rate as array index in ads1015_read_adc() and uses pga
as array index in ads1015_reg_to_mv, so we must make sure both data_rate and
pga settings are in valid value range.
Return -EINVAL if the setting is out-of-range.

Signed-off-by: Axel Lin <>
Signed-off-by: Guenter Roeck <>
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (lm92) Prevent overflow problem when writing large limits
Axel Lin [Tue, 5 Aug 2014 02:08:31 +0000 (10:08 +0800)]
hwmon: (lm92) Prevent overflow problem when writing large limits

commit 5b963089161b8fb244889c972edf553b9d737545 upstream.

On platforms with sizeof(int) < sizeof(long), writing a temperature
limit larger than MAXINT will result in unpredictable limit values
written to the chip. Avoid auto-conversion from long to int to fix
the problem.

The hysteresis temperature range depends on the value of
data->temp[attr->index], since val is subtracted from it.
Use a wider clamp, [-120000, 220000] should do to cover the
possible range. Also add missing TEMP_TO_REG() on writes into
cached hysteresis value.

Also uses clamp_val to simplify the code a bit.

Signed-off-by: Axel Lin <>
[Guenter Roeck: Fixed double TEMP_TO_REG on hysteresis updates]
Signed-off-by: Guenter Roeck <>
[bwh: Backported to 3.2:
 - s/temp\[attr->index\]/temp1_crit/
 - s/temp\[t_hyst\]/temp1_hyst/]
Signed-off-by: Ben Hutchings <>
7 years agoRDMA/iwcm: Use a default listen backlog if needed
Steve Wise [Fri, 25 Jul 2014 14:11:33 +0000 (09:11 -0500)]
RDMA/iwcm: Use a default listen backlog if needed

commit 2f0304d21867476394cd51a54e97f7273d112261 upstream.

If the user creates a listening cm_id with backlog of 0 the IWCM ends
up not allowing any connection requests at all.  The correct behavior
is for the IWCM to pick a default value if the user backlog parameter
is zero.

Lustre from version 1.8.8 onward uses a backlog of 0, which breaks
iwarp support without this fix.

Signed-off-by: Steve Wise <>
Signed-off-by: Roland Dreier <>
[bwh: Backported to 3.2: use register_net_sysctl_table()]
Signed-off-by: Ben Hutchings <>
7 years agodrm/radeon: load the lm63 driver for an lm64 thermal chip.
Alex Deucher [Mon, 28 Jul 2014 03:21:50 +0000 (23:21 -0400)]
drm/radeon: load the lm63 driver for an lm64 thermal chip.

commit 5dc355325b648dc9b4cf3bea4d968de46fd59215 upstream.

Looks like the lm63 driver supports the lm64 as well.

Signed-off-by: Alex Deucher <>
Signed-off-by: Ben Hutchings <>
7 years agopowerpc/mm/numa: Fix break placement
Andrey Utkin [Mon, 4 Aug 2014 20:13:10 +0000 (23:13 +0300)]
powerpc/mm/numa: Fix break placement

commit b00fc6ec1f24f9d7af9b8988b6a198186eb3408c upstream.

Reported-by: David Binderman <>
Signed-off-by: Andrey Utkin <>
Signed-off-by: Benjamin Herrenschmidt <>
Signed-off-by: Ben Hutchings <>
7 years agodrm/ttm: Fix possible stack overflow by recursive shrinker calls.
Tetsuo Handa [Sun, 3 Aug 2014 11:02:03 +0000 (20:02 +0900)]
drm/ttm: Fix possible stack overflow by recursive shrinker calls.

commit 71336e011d1d2312bcbcaa8fcec7365024f3a95d upstream.

While ttm_dma_pool_shrink_scan() tries to take mutex before doing GFP_KERNEL
allocation, ttm_pool_shrink_scan() does not do it. This can result in stack
overflow if kmalloc() in ttm_page_pool_free() triggered recursion due to
memory pressure.

  => ttm_pool_shrink_scan()
     => ttm_page_pool_free()
        => kmalloc(GFP_KERNEL)
           => shrink_slab()
              => ttm_pool_shrink_scan()
                 => ttm_page_pool_free()
                    => kmalloc(GFP_KERNEL)

Change ttm_pool_shrink_scan() to do like ttm_dma_pool_shrink_scan() does.

Signed-off-by: Tetsuo Handa <>
Signed-off-by: Dave Airlie <>
[bwh: Backported to 3.2:
 - Adjust context
 - Change return value in the contended case to follow the old shrinker
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (sis5595) Prevent overflow problem when writing large limits
Axel Lin [Thu, 31 Jul 2014 14:27:04 +0000 (22:27 +0800)]
hwmon: (sis5595) Prevent overflow problem when writing large limits

commit cc336546ddca8c22de83720632431c16a5f9fe9a upstream.

On platforms with sizeof(int) < sizeof(long), writing a temperature
limit larger than MAXINT will result in unpredictable limit values
written to the chip. Avoid auto-conversion from long to int to fix
the problem.

Signed-off-by: Axel Lin <>
Signed-off-by: Guenter Roeck <>
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (gpio-fan) Prevent overflow problem when writing large limits
Axel Lin [Sat, 2 Aug 2014 05:36:38 +0000 (13:36 +0800)]
hwmon: (gpio-fan) Prevent overflow problem when writing large limits

commit 2565fb05d1e9fc0831f7b1c083bcfcb1cba1f020 upstream.

On platforms with sizeof(int) < sizeof(unsigned long), writing a rpm value
larger than MAXINT will result in unpredictable limit values written to the
chip. Avoid auto-conversion from unsigned long to int to fix the problem.

Signed-off-by: Axel Lin <>
Signed-off-by: Guenter Roeck <>
Signed-off-by: Ben Hutchings <>
7 years agoALSA: virtuoso: add Xonar Essence STX II support
Clemens Ladisch [Mon, 4 Aug 2014 13:17:55 +0000 (15:17 +0200)]
ALSA: virtuoso: add Xonar Essence STX II support

commit f42bb22243d2ae264d721b055f836059fe35321f upstream.

Just add the PCI ID for the STX II.  It appears to work the same as the
STX, except for the addition of the not-yet-supported daughterboard.

Tested-by: Mario <>
Tested-by: corubba <>
Signed-off-by: Clemens Ladisch <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Ben Hutchings <>
7 years agoALSA: virtuoso: Xonar DSX support
Sergiu Giurgiu [Sun, 9 Sep 2012 09:14:15 +0000 (05:14 -0400)]
ALSA: virtuoso: Xonar DSX support

commit 4492363251235c4499a2d073c5f09121ea23d39d upstream.

This patch adds support for ASUS - Xonar DSX sound cards. Tested on
openSUSE 12.2 with kernel:
Linux 3.4.6-2.10-desktop #1 SMP PREEMPT Thu Jul 26 09:36:26 UTC 2012 (641c197) x86_64 x86_64 x86_64 GNU/Linux
 - play sounds
 - adjust volume on master channel.
 - mute .

Since Xonar DS uses the same chip, everything that works for DS should
work for DSX as well.

Signed-off-by: Sergiu Giurgiu <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Ben Hutchings <>
7 years agoUSB: serial: ftdi_sio: Add support for new Xsens devices
Patrick Riphagen [Thu, 24 Jul 2014 07:09:50 +0000 (09:09 +0200)]
USB: serial: ftdi_sio: Add support for new Xsens devices

commit 4bdcde358b4bda74e356841d351945ca3f2245dd upstream.

This adds support for new Xsens devices, using Xsens' own Vendor ID.

Signed-off-by: Patrick Riphagen <>
Signed-off-by: Frans Klaver <>
Cc: Johan Hovold <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
7 years agoUSB: serial: ftdi_sio: Annotate the current Xsens PID assignments
Patrick Riphagen [Thu, 24 Jul 2014 07:12:52 +0000 (09:12 +0200)]
USB: serial: ftdi_sio: Annotate the current Xsens PID assignments

commit 9273b8a270878906540349422ab24558b9d65716 upstream.

The converters are used in specific products. It can be useful to know
which they are exactly.

Signed-off-by: Patrick Riphagen <>
Signed-off-by: Frans Klaver <>
Cc: Johan Hovold <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
7 years agonetlabel: fix a problem when setting bits below the previously lowest bit
Paul Moore [Fri, 1 Aug 2014 15:17:03 +0000 (11:17 -0400)]
netlabel: fix a problem when setting bits below the previously lowest bit

commit 41c3bd2039e0d7b3dc32313141773f20716ec524 upstream.

The NetLabel category (catmap) functions have a problem in that they
assume categories will be set in an increasing manner, e.g. the next
category set will always be larger than the last.  Unfortunately, this
is not a valid assumption and could result in problems when attempting
to set categories less than the startbit in the lowest catmap node.
In some cases kernel panics and other nasties can result.

This patch corrects the problem by checking for this and allocating a
new catmap node instance and placing it at the front of the list.

Reported-by: Christian Evans <>
Signed-off-by: Paul Moore <>
Tested-by: Casey Schaufler <>
[bwh: Backported to 3.2: adjust filename for SMACK]
Signed-off-by: Ben Hutchings <>
7 years agonetlabel: use GFP flags from caller instead of GFP_ATOMIC
Dan Carpenter [Wed, 21 Mar 2012 20:41:01 +0000 (20:41 +0000)]
netlabel: use GFP flags from caller instead of GFP_ATOMIC

commit 64b5fad526f63e9b56752a7e8e153b99ec0ddecd upstream.

This function takes a GFP flags as a parameter, but they are never used.
We don't take a lock in this function so there is no reason to prefer
GFP_ATOMIC over the caller's GFP flags.

There is only one caller, cipso_v4_map_cat_rng_ntoh(), and it passes
GFP_ATOMIC as the GFP flags so this doesn't change how the code works.
It's just a cleanup.

Signed-off-by: Dan Carpenter <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.
Jeremy Vial [Thu, 31 Jul 2014 13:10:33 +0000 (15:10 +0200)]
ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.

commit 9b5f7428f8b16bd8980213f2b70baf1dd0b9e36c upstream.

According to the comment “restore_es3: applies to 34xx >= ES3.0" in
"arch/arm/mach-omap2/sleep34xx.S”, omap3_restore_es3 should be used
if the revision of an OMAP34xx is ES3.1.2.

Signed-off-by: Jeremy Vial <>
Signed-off-by: Tony Lindgren <>
Signed-off-by: Ben Hutchings <>
7 years agomnt: Change the default remount atime from relatime to the existing value
Eric W. Biederman [Tue, 29 Jul 2014 00:36:04 +0000 (17:36 -0700)]
mnt: Change the default remount atime from relatime to the existing value

commit ffbc6f0ead47fa5a1dc9642b0331cb75c20a640e upstream.

Since March 2009 the kernel has treated the state that if no
MS_..ATIME flags are passed then the kernel defaults to relatime.

Defaulting to relatime instead of the existing atime state during a
remount is silly, and causes problems in practice for people who don't
specify any MS_...ATIME flags and to get the default filesystem atime
setting.  Those users may encounter a permission error because the
default atime setting does not work.

A default that does not work and causes permission problems is
ridiculous, so preserve the existing value to have a default
atime setting that is always guaranteed to work.

Using the default atime setting in this way is particularly
interesting for applications built to run in restricted userspace
environments without /proc mounted, as the existing atime mount
options of a filesystem can not be read from /proc/mounts.

In practice this fixes user space that uses the default atime
setting on remount that are broken by the permission checks
keeping less privileged users from changing more privileged users
atime settings.

Acked-by: Serge E. Hallyn <>
Signed-off-by: "Eric W. Biederman" <>
[bwh: Backported to 3.2: add definition of MNT_ATIME_MASK, as we don't
 need the fix that introduced that definition upstream]
Signed-off-by: Ben Hutchings <>
7 years agocrypto: af_alg - properly label AF_ALG socket
Milan Broz [Tue, 29 Jul 2014 18:41:09 +0000 (18:41 +0000)]
crypto: af_alg - properly label AF_ALG socket

commit 4c63f83c2c2e16a13ce274ee678e28246bd33645 upstream.

Th AF_ALG socket was missing a security label (e.g. SELinux)
which means that socket was in "unlabeled" state.

This was recently demonstrated in the cryptsetup package
(cryptsetup v1.6.5 and later.)

This patch clones the sock's label from the parent sock
and resolves the issue (similar to AF_BLUETOOTH protocol family).

Signed-off-by: Milan Broz <>
Acked-by: Paul Moore <>
Signed-off-by: Herbert Xu <>
Signed-off-by: Ben Hutchings <>
7 years agoMIPS: GIC: Prevent array overrun
Jeffrey Deans [Thu, 17 Jul 2014 08:20:56 +0000 (09:20 +0100)]
MIPS: GIC: Prevent array overrun

commit ffc8415afab20bd97754efae6aad1f67b531132b upstream.

A GIC interrupt which is declared as having a GIC_MAP_TO_NMI_MSK
mapping causes the cpu parameter to gic_setup_intr() to be increased
to 32, causing memory corruption when pcpu_masks[] is written to again
later in the function.

Signed-off-by: Jeffrey Deans <>
Signed-off-by: Markos Chandras <>
Signed-off-by: Ralf Baechle <>
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (amc6821) Fix possible race condition bug
Axel Lin [Thu, 31 Jul 2014 01:43:19 +0000 (09:43 +0800)]
hwmon: (amc6821) Fix possible race condition bug

commit cf44819c98db11163f58f08b822d626c7a8f5188 upstream.

Ensure mutex lock protects the read-modify-write period to prevent possible
race condition bug.
In additional, update data->valid should also be protected by the mutex lock.

Signed-off-by: Axel Lin <>
Signed-off-by: Guenter Roeck <>
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (amc6821) Fix return value
Sachin Kamat [Wed, 11 Sep 2013 04:19:50 +0000 (09:49 +0530)]
hwmon: (amc6821) Fix return value

commit 3499e5b2e14b792fe411302fea3b6fcc4ba40ef2 upstream.

Propagate return value obtained from i2c_smbus_read_byte_data()
instead of hardcoding.

Signed-off-by: Sachin Kamat <>
Cc: T. Mertelj <>
Signed-off-by: Guenter Roeck <>
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (lm78) Fix overflow problems seen when writing large temperature limits
Guenter Roeck [Wed, 30 Jul 2014 03:48:59 +0000 (20:48 -0700)]
hwmon: (lm78) Fix overflow problems seen when writing large temperature limits

commit 1074d683a51f1aded3562add9ef313e75d557327 upstream.

On platforms with sizeof(int) < sizeof(long), writing a temperature
limit larger than MAXINT will result in unpredictable limit values
written to the chip. Avoid auto-conversion from long to int to fix
the problem.

Cc: Axel Lin <>
Reviewed-by: Axel Lin <>
Signed-off-by: Guenter Roeck <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (lm85) Fix various errors on attribute writes
Guenter Roeck [Wed, 30 Jul 2014 05:23:12 +0000 (22:23 -0700)]
hwmon: (lm85) Fix various errors on attribute writes

commit 3248c3b771ddd9d31695da17ba350eb6e1b80a53 upstream.

Temperature limit register writes did not account for negative numbers.
As a result, writing -127000 resulted in -126000 written into the
temperature limit register. This problem affected temp[1-3]_min,
temp[1-3]_max, temp[1-3]_auto_temp_crit, and temp[1-3]_auto_temp_min.

When writing pwm[1-3]_freq, a long variable was auto-converted into an int
without range check. Wiring values larger than MAXINT resulted in unexpected
register values.

When writing temp[1-3]_auto_temp_max, an unsigned long variable was
auto-converted into an int without range check. Writing values larger than
MAXINT resulted in unexpected register values.

vrm is an u8, so the written value needs to be limited to [0, 255].

Cc: Axel Lin <>
Reviewed-by: Axel Lin <>
Signed-off-by: Guenter Roeck <>
[bwh: Backported to 3.2:
 - Driver is not using clamp_val(); keep using SENSORS_LIMIT() for consistency
 - Driver is not using kstrtoul(); make the minimum change to store_vrm_reg()
   so we can validate the value before assigning]
Signed-off-by: Ben Hutchings <>
7 years agoext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
Theodore Ts'o [Thu, 31 Jul 2014 02:17:17 +0000 (22:17 -0400)]
ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct

commit 86f0afd463215fc3e58020493482faa4ac3a4d69 upstream.

If there is a failure while allocating the preallocation structure, a
number of blocks can end up getting marked in the in-memory buddy
bitmap, and then not getting released.  This can result in the
following corruption getting reported by the kernel:

EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
12793 clusters in bitmap, 12729 in gd

In that case, we need to release the blocks using mb_free_blocks().

Tested: fs smoke test; also demonstrated that with injected errors,
the file system is no longer getting corrupted

Google-Bug-Id: 16657874

Signed-off-by: "Theodore Ts'o" <>
Signed-off-by: Ben Hutchings <>
7 years agoext4: cleanup in ext4_discard_allocated_blocks()
Zheng Liu [Mon, 28 May 2012 21:53:53 +0000 (17:53 -0400)]
ext4: cleanup in ext4_discard_allocated_blocks()

commit 400db9d30146dc062aaba97a6301b425eb6015bc upstream.

remove 'len' variable in ext4_discard_allocated_blocks() because it is

Signed-off-by: Zheng Liu <>
Signed-off-by: "Theodore Ts'o" <>
Signed-off-by: Ben Hutchings <>
7 years agomd/raid1,raid10: always abort recover on write error.
NeilBrown [Thu, 31 Jul 2014 00:16:29 +0000 (10:16 +1000)]
md/raid1,raid10: always abort recover on write error.

commit 2446dba03f9dabe0b477a126cbeb377854785b47 upstream.

Currently we don't abort recovery on a write error if the write error
to the recovering device was triggerd by normal IO (as opposed to
recovery IO).

This means that for one bitmap region, the recovery might write to the
recovering device for a few sectors, then not bother for subsequent
sectors (as it never writes to failed devices).  In this case
the bitmap bit will be cleared, but it really shouldn't.

The result is that if the recovering device fails and is then re-added
(after fixing whatever hardware problem triggerred the failure),
the second recovery won't redo the region it was in the middle of,
so some of the device will not be recovered properly.

If we abort the recovery, the region being processes will be cancelled
(bit not cleared) and the whole region will be retried.

As the bug can result in data corruption the patch is suitable for
-stable.  For kernels prior to 3.11 there is a conflict in raid10.c
which will require care.

Original-from: jiao hui <>
Reported-and-tested-by: jiao hui <>
Signed-off-by: NeilBrown <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agomm, thp: do not allow thp faults to avoid cpuset restrictions
David Rientjes [Wed, 30 Jul 2014 23:08:24 +0000 (16:08 -0700)]
mm, thp: do not allow thp faults to avoid cpuset restrictions

commit b104a35d32025ca740539db2808aa3385d0f30eb upstream.

The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET
should be set in allocflags.  ALLOC_CPUSET controls if a page allocation
should be restricted only to the set of allowed cpuset mems.

Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent
the fault path from using memory compaction or direct reclaim.  Thus, it
is unfairly able to allocate outside of its cpuset mems restriction as a

This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is
truly GFP_ATOMIC by verifying it is also not a thp allocation.

Signed-off-by: David Rientjes <>
Reported-by: Alex Thorlton <>
Tested-by: Alex Thorlton <>
Cc: Bob Liu <>
Cc: Dave Hansen <>
Cc: Hedi Berriche <>
Cc: Hugh Dickins <>
Cc: Johannes Weiner <>
Cc: Kirill A. Shutemov <>
Cc: Mel Gorman <>
Cc: Rik van Riel <>
Cc: Srivatsa S. Bhat <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agoMIPS: Prevent user from setting FCSR cause bits
Paul Burton [Tue, 22 Jul 2014 13:21:21 +0000 (14:21 +0100)]
MIPS: Prevent user from setting FCSR cause bits

commit b1442d39fac2fcfbe6a4814979020e993ca59c9e upstream.

If one or more matching FCSR cause & enable bits are set in saved thread
context then when that context is restored the kernel will take an FP
exception. This is of course undesirable and considered an oops, leading
to the kernel writing a backtrace to the console and potentially
rebooting depending upon the configuration. Thus the kernel avoids this
situation by clearing the cause bits of the FCSR register when handling
FP exceptions and after emulating FP instructions.

However the kernel does not prevent userland from setting arbitrary FCSR
cause & enable bits via ptrace, using either the PTRACE_POKEUSR or
PTRACE_SETFPREGS requests. This means userland can trivially cause the
kernel to oops on any system with an FPU. Prevent this from happening
by clearing the cause bits when writing to the saved FCSR context via

This problem appears to exist at least back to the beginning of the git
era in the PTRACE_POKEUSR case.

Signed-off-by: Paul Burton <>
Cc: Paul Burton <>
Signed-off-by: Ralf Baechle <>
Signed-off-by: Ben Hutchings <>
7 years agoMIPS: tlbex: Fix a missing statement for HUGETLB
Huacai Chen [Tue, 29 Jul 2014 06:54:40 +0000 (14:54 +0800)]
MIPS: tlbex: Fix a missing statement for HUGETLB

commit 8393c524a25609a30129e4a8975cf3b91f6c16a5 upstream.

In commit 2c8c53e28f1 (MIPS: Optimize TLB handlers for Octeon CPUs)
build_r4000_tlb_refill_handler() is modified. But it doesn't compatible
with the original code in HUGETLB case. Because there is a copy & paste
error and one line of code is missing. It is very easy to produce a bug
with LTP's hugemmap05 test.

Signed-off-by: Huacai Chen <>
Signed-off-by: Binbin Zhou <>
Cc: John Crispin <>
Cc: Steven J. Hill <>
Cc: Fuxin Zhang <>
Cc: Zhangjin Wu <>
Signed-off-by: Ralf Baechle <>
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (ads1015) Fix off-by-one for valid channel index checking
Axel Lin [Wed, 30 Jul 2014 03:13:52 +0000 (11:13 +0800)]
hwmon: (ads1015) Fix off-by-one for valid channel index checking

commit 56de1377ad92f72ee4e5cb0faf7a9b6048fdf0bf upstream.

Current code uses channel as array index, so the valid channel value is
0 .. ADS1015_CHANNELS - 1.

Signed-off-by: Axel Lin <>
Signed-off-by: Guenter Roeck <>
Signed-off-by: Ben Hutchings <>
7 years agotpm: Provide a generic means to override the chip returned timeouts
Jason Gunthorpe [Thu, 22 May 2014 00:26:44 +0000 (18:26 -0600)]
tpm: Provide a generic means to override the chip returned timeouts

commit 8e54caf407b98efa05409e1fee0e5381abd2b088 upstream.

Some Atmel TPMs provide completely wrong timeouts from their
TPM_CAP_PROP_TIS_TIMEOUT query. This patch detects that and returns
new correct values via a DID/VID table in the TIS driver.

Tested on ARM using an AT97SC3204T FW version 37.16

[PHuewe: without this fix these 'broken' Atmel TPMs won't function on
older kernels]
Signed-off-by: "Berg, Christopher" <>
Signed-off-by: Jason Gunthorpe <>
Signed-off-by: Peter Huewe <>
[bwh: Backported to 3.2:
 - Adjust filename, context
 - s/chip->ops->/chip->vendor./]
Signed-off-by: Ben Hutchings <>
7 years agonet: sendmsg: fix NULL pointer dereference
Andrey Ryabinin [Sat, 26 Jul 2014 17:26:58 +0000 (21:26 +0400)]
net: sendmsg: fix NULL pointer dereference

commit 40eea803c6b2cfaab092f053248cbeab3f368412 upstream.

Sasha's report:
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel with the KASAN patchset, I've stumbled on the following spew:
> [ 4448.949424] ==================================================================
> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> [ 4448.952988] Read of size 2 by thread T19638:
> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> [ 4448.961266] Call Trace:
> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 4448.988929] ==================================================================

This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.

After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.

This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
 affect sendto as it would bail out earlier while trying to copy-in the
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.

This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.

Cc: Hannes Frederic Sowa <>
Cc: Eric Dumazet <>
Reported-by: Sasha Levin <>
Signed-off-by: Andrey Ryabinin <>
Acked-by: Hannes Frederic Sowa <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoiommu/vt-d: Exclude devices using RMRRs from IOMMU API domains
Alex Williamson [Thu, 3 Jul 2014 15:57:02 +0000 (09:57 -0600)]
iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains

commit c875d2c1b8083cd627ea0463e20bf22c2d7421ee upstream.

The user of the IOMMU API domain expects to have full control of
the IOVA space for the domain.  RMRRs are fundamentally incompatible
with that idea.  We can neither map the RMRR into the IOMMU API
domain, nor can we guarantee that the device won't continue DMA with
the area described by the RMRR as part of the new domain.  Therefore
we must prevent such devices from being used by the IOMMU API.

Signed-off-by: Alex Williamson <>
Cc: David Woodhouse <>
Signed-off-by: Joerg Roedel <>
[bwh: Backported to 3.2: driver only operates on PCI devices]
Signed-off-by: Ben Hutchings <>
7 years agoFix gcc-4.9.0 miscompilation of load_balance() in scheduler
Linus Torvalds [Sat, 26 Jul 2014 21:52:01 +0000 (14:52 -0700)]
Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

commit 2062afb4f804afef61cbe62a30cac9a46e58e067 upstream.

Michel Dänzer and a couple of other people reported inexplicable random
oopses in the scheduler, and the cause turns out to be gcc mis-compiling
the load_balance() function when debugging is enabled.  The gcc bug
apparently goes back to gcc-4.5, but slight optimization changes means
that it now showed up as a problem in 4.9.0 and 4.9.1.

The instruction scheduling problem causes gcc to schedule a spill
operation to before the stack frame has been created, which in turn can
corrupt the spilled value if an interrupt comes in.  There may be other
effects of this bug too, but that's the code generation problem seen in
Michel's case.

This is fixed in current gcc HEAD, but the workaround as suggested by
Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments
when compiling the kernel, which disables the gcc code that causes the
problem.  This can result in slightly worse debug information for
variable accesses, but that is infinitely preferable to actual code
generation problems.

Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows
non-debug builds to verify that the debug build would be identical: we
can do

    export GCC_COMPARE_DEBUG=1

to make gcc internally verify that the result of the build is
independent of the "-g" flag (it will make the compiler build everything
twice, toggling the debug flag, and compare the results).

Without the "-fno-var-tracking-assignments" option, the build would fail
(even with 4.8.3 that didn't show the actual stack frame bug) with a gcc
compare failure.

See also gcc bugzilla:

Reported-by: Michel Dänzer <>
Suggested-by: Markus Trippelsdorf <>
Cc: Jakub Jelinek <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agoDrivers: scsi: storvsc: Implement a eh_timed_out handler
K. Y. Srinivasan [Sat, 12 Jul 2014 16:48:30 +0000 (09:48 -0700)]
Drivers: scsi: storvsc: Implement a eh_timed_out handler

commit 56b26e69c8283121febedd12b3cc193384af46b9 upstream.

On Azure, we have seen instances of unbounded I/O latencies. To deal with
this issue, implement handler that can reset the timeout. Note that the
host gaurantees that it will respond to each command that has been issued.

Signed-off-by: K. Y. Srinivasan <>
Reviewed-by: Hannes Reinecke <>
[hch: added a better comment explaining the issue]
Signed-off-by: Christoph Hellwig <>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <>
7 years agohpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl
Stephen M. Cameron [Thu, 3 Jul 2014 15:18:03 +0000 (10:18 -0500)]
hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl

commit 0758f4f732b08b6ef07f2e5f735655cf69fea477 upstream.

When copy_from_user fails, return -EFAULT, not -ENOMEM

Signed-off-by: Stephen M. Cameron <>
Reported-by: Robert Elliott <>
Reviewed-by: Joe Handzik <>
Reviewed-by: Scott Teel <>
Reviewed by: Mike MIller <>
Signed-off-by: Christoph Hellwig <>
Signed-off-by: Ben Hutchings <>
7 years agobfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address
Ben Hutchings [Sun, 8 Jun 2014 22:33:25 +0000 (23:33 +0100)]
bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address

commit 03a6c3ff3282ee9fa893089304d951e0be93a144 upstream.

bfa_swap_words() shifts its argument (assumed to be 64-bit) by 32 bits
each way.  In two places the argument type is dma_addr_t, which may be
32-bit, in which case the effect of the bit shift is undefined:

drivers/scsi/bfa/bfa_fcpim.c: In function 'bfa_ioim_send_ioreq':
drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: left shift count >= width of type [enabled by default]
    addr = bfa_sgaddr_le(sg_dma_address(sg));
drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: right shift count >= width of type [enabled by default]
drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: left shift count >= width of type [enabled by default]
    addr = bfa_sgaddr_le(sg_dma_address(sg));
drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: right shift count >= width of type [enabled by default]

Avoid this by adding casts to u64 in bfa_swap_words().

Compile-tested only.

Signed-off-by: Ben Hutchings <>
Reviewed-by: Martin K. Petersen <>
Acked-by: Anil Gurumurthy <>
Fixes: f16a17507b09 ('[SCSI] bfa: remove all OS wrappers')
Signed-off-by: Christoph Hellwig <>
7 years agostaging: vt6655: Fix disassociated messages every 10 seconds
Malcolm Priestley [Wed, 23 Jul 2014 20:35:12 +0000 (21:35 +0100)]
staging: vt6655: Fix disassociated messages every 10 seconds

commit 4aa0abed3a2a11b7d71ad560c1a3e7631c5a31cd upstream.

byReAssocCount is incremented every second resulting in
disassociated message being send every 10 seconds whether
connection or not.

byReAssocCount should only advance while eCommandState

Change existing scope to if condition.

Signed-off-by: Malcolm Priestley <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <>
7 years agostaging: vt6655: Fix Warning on boot handle_irq_event_percpu.
Malcolm Priestley [Wed, 23 Jul 2014 20:35:11 +0000 (21:35 +0100)]
staging: vt6655: Fix Warning on boot handle_irq_event_percpu.

commit 6cff1f6ad4c615319c1a146b2aa0af1043c5e9f5 upstream.

WARNING: CPU: 0 PID: 929 at /home/apw/COD/linux/kernel/irq/handle.c:147 handle_irq_event_percpu+0x1d1/0x1e0()
irq 17 handler device_intr+0x0/0xa80 [vt6655_stage] enabled interrupts

Using spin_lock_irqsave appears to fix this.

Signed-off-by: Malcolm Priestley <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <>
7 years agohwmon: (smsc47m192) Fix temperature limit and vrm write operations
Guenter Roeck [Fri, 18 Jul 2014 14:31:18 +0000 (07:31 -0700)]
hwmon: (smsc47m192) Fix temperature limit and vrm write operations

commit 043572d5444116b9d9ad8ae763cf069e7accbc30 upstream.

Temperature limit clamps are applied after converting the temperature
from milli-degrees C to degrees C, so either the clamp limit needs
to be specified in degrees C, not milli-degrees C, or clamping must
happen before converting to degrees C. Use the latter method to avoid

vrm is an u8, so the written value needs to be limited to [0, 255].

Cc: Axel Lin <>
Signed-off-by: Guenter Roeck <>
Reviewed-by: Jean Delvare <>
[bwh: Backported to 3.2:
 - Driver is not using clamp_val(); keep using SENSORS_LIMIT() for consistency
 - Driver is not using kstrtoul(); make the minimum change to set_vrm() so
   we can validate the value before assigning]
Signed-off-by: Ben Hutchings <>
7 years agodrm/radeon: fix irq ring buffer overflow handling
Christian König [Wed, 23 Jul 2014 07:47:58 +0000 (09:47 +0200)]
drm/radeon: fix irq ring buffer overflow handling

commit e8c214d22e76dd0ead38f97f8d2dc09aac70d651 upstream.

We must mask out the overflow bit as well, otherwise
the wptr will never match the rptr again and the interrupt
handler will loop forever.

Signed-off-by: Christian König <>
Signed-off-by: Alex Deucher <>
Reviewed-by: Michel Dänzer <>
[bwh: Backported to 3.2: drop changes for unsupported GPUs]
Signed-off-by: Ben Hutchings <>
7 years agoUSB: Fix persist resume of some SS USB devices
Pratyush Anand [Fri, 18 Jul 2014 07:07:10 +0000 (12:37 +0530)]
USB: Fix persist resume of some SS USB devices

commit a40178b2fa6ad87670fb1e5fa4024db00c149629 upstream.

Problem Summary: Problem has been observed generally with PM states
where VBUS goes off during suspend. There are some SS USB devices which
take longer time for link training compared to many others.  Such
devices fail to reconnect with same old address which was associated
with it before suspend.

When system resumes, at some point of time (dpm_run_callback->
usb_port_resume) SW reads hub status. If device is present,
then it finishes port resume and re-enumerates device with same
address. If device is not present then, SW thinks that device was
removed during suspend and therefore does logical disconnection
and removes all the resource allocated for this device.

Now, if I put sufficient delay just before root hub status read in
usb_resume_device then, SW sees always that device is present. In normal
course(without any delay) SW sees that no device is present and then SW
removes all resource associated with the device at this port.  In the
latter case, after sometime, device says that hey I am here, now host
enumerates it, but with new address.

Problem had been reproduced when I connect verbatim USB3.0 hard disc
with my STiH407 XHCI host running with 3.10 kernel.

I see that similar problem has been reported here.
Reading above it seems that bug was not in 3.6.6 and was present in 3.8
and again it was not present for some in 3.12.6, while it was present
for few others. I tested with 3.13-FC19 running at i686 desktop, problem
was still there. However, I was failed to reproduce it with 3.16-RC4
running at same i686 machine. I would say it is just a random
observation. Problem for few devices is always there, as I am unable to
find a proper fix for the issue.

So, now question is what should be the amount of delay so that host is
always able to recognize suspended device after resume.

XHCI specs 4.19.4 says that when Link training is successful, port sets
CSC bit to 1. So if SW reads port status before successful link
training, then it will not find device to be present.  USB Analyzer log
with such buggy devices show that in some cases device switch on the
RX termination after long delay of host enabling the VBUS. In few other
cases it has been seen that device fails to negotiate link training in
first attempt. It has been reported till now that few devices take as
long as 2000 ms to train the link after host enabling its VBUS and
RX termination. This patch implements a 2000 ms timeout for CSC bit to set
ie for link training. If in a case link trains before timeout, loop will
exit earlier.

This patch implements above delay, but only for SS device and when
persist is enabled.

So, for the good device overhead is almost none. While for the bad
devices penalty could be the time which it take for link training.
But, If a device was connected before suspend, and was removed
while system was asleep, then the penalty would be the timeout ie
2000 ms.


Verbatim USB SS hard disk connected with STiH407 USB host running 3.10
Kernel resumes in 461 msecs without this patch, but hard disk is
assigned a new device address. Same system resumes in 790 msecs with
this patch, but with old device address.

Signed-off-by: Pratyush Anand <>
Acked-by: Alan Stern <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
7 years agousbcore: don't log on consecutive debounce failures of the same port
Oliver Neukum [Mon, 14 Jul 2014 13:39:49 +0000 (15:39 +0200)]
usbcore: don't log on consecutive debounce failures of the same port

commit 5ee0f803cc3a0738a63288e4a2f453c85889fbda upstream.

Some laptops have an internal port for a BT device which picks
up noise when the kill switch is used, but not enough to trigger
printk_rlimit(). So we shouldn't log consecutive faults of this kind.

Signed-off-by: Oliver Neukum <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2:
 - Adjust context
 - Error message already includes the port number]
Signed-off-by: Ben Hutchings <>
7 years agoahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)
Romain Degez [Fri, 11 Jul 2014 16:08:13 +0000 (18:08 +0200)]
ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)

commit b32bfc06aefab61acc872dec3222624e6cd867ed upstream.

Add support of the Promise FastTrak TX8660 SATA HBA in ahci mode by
registering the board in the ahci_pci_tbl[].

Note: this HBA also provide a hardware RAID mode when activated in
BIOS but specific drivers from the manufacturer are required in this

Signed-off-by: Romain Degez <>
Tested-by: Romain Degez <>
Signed-off-by: Tejun Heo <>
Signed-off-by: Ben Hutchings <>
7 years agoUSB: OHCI: don't lose track of EDs when a controller dies
Alan Stern [Thu, 17 Jul 2014 20:34:29 +0000 (16:34 -0400)]
USB: OHCI: don't lose track of EDs when a controller dies

commit 977dcfdc60311e7aa571cabf6f39c36dde13339e upstream.

This patch fixes a bug in ohci-hcd.  When an URB is unlinked, the
corresponding Endpoint Descriptor is added to the ed_rm_list and taken
off the hardware schedule.  Once the ED is no longer visible to the
hardware, finish_unlinks() handles the URBs that were unlinked or have
completed.  If any URBs remain attached to the ED, the ED is added
back to the hardware schedule -- but only if the controller is

This fails when a controller dies.  A non-empty ED does not get added
back to the hardware schedule and does not remain on the ed_rm_list;
ohci-hcd loses track of it.  The remaining URBs cannot be unlinked,
which causes the USB stack to hang.

The patch changes finish_unlinks() so that non-empty EDs remain on
the ed_rm_list if the controller isn't running.  This requires moving
some of the existing code around, to avoid modifying the ED's hardware
fields more than once.

Signed-off-by: Alan Stern <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: keep using HC_IS_RUNNING()]
Signed-off-by: Ben Hutchings <>
7 years agoscsi: handle flush errors properly
James Bottomley [Thu, 3 Jul 2014 17:17:34 +0000 (19:17 +0200)]
scsi: handle flush errors properly

commit 89fb4cd1f717a871ef79fa7debbe840e3225cd54 upstream.

Flush commands don't transfer data and thus need to be special cased
in the I/O completion handler so that we can propagate errors to
the block layer and filesystem.

Signed-off-by: James Bottomley <>
Reported-by: Steven Haber <>
Tested-by: Steven Haber <>
Reviewed-by: Martin K. Petersen <>
Signed-off-by: Christoph Hellwig <>
Signed-off-by: Ben Hutchings <>
7 years agoBluetooth: never linger on process exit
Vladimir Davydov [Tue, 15 Jul 2014 08:25:28 +0000 (12:25 +0400)]
Bluetooth: never linger on process exit

commit 093facf3634da1b0c2cc7ed106f1983da901bbab upstream.

If the current process is exiting, lingering on socket close will make
it unkillable, so we should avoid it.


  #include <sys/types.h>
  #include <sys/socket.h>

  #define BTPROTO_L2CAP   0
  #define BTPROTO_SCO     2
  #define BTPROTO_RFCOMM  3

  int main()
          int fd;
          struct linger ling;

          //or: fd = socket(PF_BLUETOOTH, SOCK_DGRAM, BTPROTO_L2CAP);
          //or: fd = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_SCO);

          ling.l_onoff = 1;
          ling.l_linger = 1000000000;
          setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));

          return 0;

Signed-off-by: Vladimir Davydov <>
Signed-off-by: Marcel Holtmann <>
Signed-off-by: Ben Hutchings <>
7 years agox86: don't exclude low BIOS area when allocating address space for non-PCI cards
Christoph Schulz [Wed, 16 Jul 2014 08:00:57 +0000 (10:00 +0200)]
x86: don't exclude low BIOS area when allocating address space for non-PCI cards

commit cbace46a9710a480cae51e4611697df5de41713e upstream.

Commit 30919b0bf356 ("x86: avoid low BIOS area when allocating address
space") moved the test for resource allocations that fall within the first
1MB of address space from the PCI-specific path to a generic path, such
that all resource allocations will avoid this area.  However, this breaks
ISA cards which need to allocate a memory region within the first 1MB.  An
example is the i82365 PCMCIA controller and derivatives like the Ricoh
RF5C296/396 which map part of the PCMCIA socket memory address space into
the first 1MB of system memory address space.  They do not work anymore as
no usable memory region exists due to this change:

  Intel ISA PCIC probe: Ricoh RF5C296/396 ISA-to-PCMCIA at port 0x3e0 ofs 0x00, 2 sockets
  host opts [0]: none
  host opts [1]: none
  ISA irqs (scanned) = 3,4,5,9,10 status change on irq 10
  pcmcia_socket pcmcia_socket1: pccard: PCMCIA card inserted into slot 1
  pcmcia_socket pcmcia_socket0: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
  pcmcia_socket pcmcia_socket0: cs: IO port probe 0xa00-0xaff: clean.
  pcmcia_socket pcmcia_socket0: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
  pcmcia_socket pcmcia_socket0: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
  pcmcia_socket pcmcia_socket0: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
  pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
  pcmcia_socket pcmcia_socket0: cs: memory probe 0x0d0000-0x0dffff: clean.
  pcmcia_socket pcmcia_socket0: cs: memory probe 0x0e0000-0x0effff: clean.
  pcmcia_socket pcmcia_socket0: cs: memory probe 0x60000000-0x60ffffff: clean.
  pcmcia_socket pcmcia_socket0: cs: memory probe 0xa0000000-0xa0ffffff: clean.
  pcmcia_socket pcmcia_socket1: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
  pcmcia_socket pcmcia_socket1: cs: IO port probe 0xa00-0xaff: clean.
  pcmcia_socket pcmcia_socket1: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
  pcmcia_socket pcmcia_socket1: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
  pcmcia_socket pcmcia_socket1: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
  pcmcia_socket pcmcia_socket1: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
  pcmcia_socket pcmcia_socket1: cs: memory probe 0x0d0000-0x0dffff: clean.
  pcmcia_socket pcmcia_socket1: cs: memory probe 0x0e0000-0x0effff: clean.
  pcmcia_socket pcmcia_socket1: cs: memory probe 0x60000000-0x60ffffff: clean.
  pcmcia_socket pcmcia_socket1: cs: memory probe 0xa0000000-0xa0ffffff: clean.
  pcmcia_socket pcmcia_socket1: cs: memory probe 0x0cc000-0x0effff: excluding 0xe0000-0xeffff
  pcmcia_socket pcmcia_socket1: cs: unable to map card memory!

If filtering out the first 1MB is reverted, everything works as expected.

Tested-by: Robert Resch <>
Signed-off-by: Christoph Schulz <>
Signed-off-by: Bjorn Helgaas <>
Signed-off-by: Ben Hutchings <>
7 years agomtd/ftl: fix the double free of the buffers allocated in build_maps()
Kevin Hao [Thu, 3 Jul 2014 02:35:26 +0000 (10:35 +0800)]
mtd/ftl: fix the double free of the buffers allocated in build_maps()

commit a152056c912db82860a8b4c23d0bd3a5aa89e363 upstream.

I got the following panic on my fsl p5020ds board.

  Unable to handle kernel paging request for data at address 0x7375627379737465
  Faulting instruction address: 0xc000000000100778
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=24 CoreNet Generic
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-next-20140613 #145
  task: c0000000fe080000 ti: c0000000fe088000 task.ti: c0000000fe088000
  NIP: c000000000100778 LR: c00000000010073c CTR: 0000000000000000
  REGS: c0000000fe08aa00 TRAP: 0300   Not tainted  (3.15.0-next-20140613)
  MSR: 0000000080029000 <CE,EE,ME>  CR: 24ad2e24  XER: 00000000
  DEAR: 7375627379737465 ESR: 0000000000000000 SOFTE: 1
  GPR00: c0000000000c99b0 c0000000fe08ac80 c0000000009598e0 c0000000fe001d80
  GPR04: 00000000000000d0 0000000000000913 c000000007902b20 0000000000000000
  GPR08: c0000000feaae888 0000000000000000 0000000007091000 0000000000200200
  GPR12: 0000000028ad2e28 c00000000fff4000 c0000000007abe08 0000000000000000
  GPR16: c0000000007ab160 c0000000007aaf98 c00000000060ba68 c0000000007abda8
  GPR20: c0000000007abde8 c0000000feaea6f8 c0000000feaea708 c0000000007abd10
  GPR24: c000000000989370 c0000000008c6228 00000000000041ed c0000000fe00a400
  GPR28: c00000000017c1cc 00000000000000d0 7375627379737465 c0000000fe001d80
  NIP [c000000000100778] .__kmalloc_track_caller+0x70/0x168
  LR [c00000000010073c] .__kmalloc_track_caller+0x34/0x168
  Call Trace:
  [c0000000fe08ac80] [c00000000087e6b8] uevent_sock_list+0x0/0x10 (unreliable)
  [c0000000fe08ad20] [c0000000000c99b0] .kstrdup+0x44/0x90
  [c0000000fe08adc0] [c00000000017c1cc] .__kernfs_new_node+0x4c/0x130
  [c0000000fe08ae70] [c00000000017d7e4] .kernfs_new_node+0x2c/0x64
  [c0000000fe08aef0] [c00000000017db00] .kernfs_create_dir_ns+0x34/0xc8
  [c0000000fe08af80] [c00000000018067c] .sysfs_create_dir_ns+0x58/0xcc
  [c0000000fe08b010] [c0000000002c711c] .kobject_add_internal+0xc8/0x384
  [c0000000fe08b0b0] [c0000000002c7644] .kobject_add+0x64/0xc8
  [c0000000fe08b140] [c000000000355ebc] .device_add+0x11c/0x654
  [c0000000fe08b200] [c0000000002b5988] .add_disk+0x20c/0x4b4
  [c0000000fe08b2c0] [c0000000003a21d4] .add_mtd_blktrans_dev+0x340/0x514
  [c0000000fe08b350] [c0000000003a3410] .mtdblock_add_mtd+0x74/0xb4
  [c0000000fe08b3e0] [c0000000003a32cc] .blktrans_notify_add+0x64/0x94
  [c0000000fe08b470] [c00000000039b5b4] .add_mtd_device+0x1d4/0x368
  [c0000000fe08b520] [c00000000039b830] .mtd_device_parse_register+0xe8/0x104
  [c0000000fe08b5c0] [c0000000003b8408] .of_flash_probe+0x72c/0x734
  [c0000000fe08b750] [c00000000035ba40] .platform_drv_probe+0x38/0x84
  [c0000000fe08b7d0] [c0000000003599a4] .really_probe+0xa4/0x29c
  [c0000000fe08b870] [c000000000359d3c] .__driver_attach+0x100/0x104
  [c0000000fe08b900] [c00000000035746c] .bus_for_each_dev+0x84/0xe4
  [c0000000fe08b9a0] [c0000000003593c0] .driver_attach+0x24/0x38
  [c0000000fe08ba10] [c000000000358f24] .bus_add_driver+0x1c8/0x2ac
  [c0000000fe08bab0] [c00000000035a3a4] .driver_register+0x8c/0x158
  [c0000000fe08bb30] [c00000000035b9f4] .__platform_driver_register+0x6c/0x80
  [c0000000fe08bba0] [c00000000084e080] .of_flash_driver_init+0x1c/0x30
  [c0000000fe08bc10] [c000000000001864] .do_one_initcall+0xbc/0x238
  [c0000000fe08bd00] [c00000000082cdc0] .kernel_init_freeable+0x188/0x268
  [c0000000fe08bdb0] [c0000000000020a0] .kernel_init+0x1c/0xf7c
  [c0000000fe08be30] [c000000000000884] .ret_from_kernel_thread+0x58/0xd4
  Instruction dump:
  41bd0010 480000c8 4bf04eb5 60000000 e94d0028 e93f0000 7cc95214 e8a60008
  7fc9502a 2fbe0000 419e00c8 e93f0022 <7f7e482a39200000 88ed06b2 992d06b2
  ---[ end trace b4c9a94804a42d40 ]---

It seems that the corrupted partition header on my mtd device triggers
a bug in the ftl. In function build_maps() it will allocate the buffers
needed by the mtd partition, but if something goes wrong such as kmalloc
failure, mtd read error or invalid partition header parameter, it will
free all allocated buffers and then return non-zero. In my case, it
seems that partition header parameter 'NumTransferUnits' is invalid.

And the ftl_freepart() is a function which free all the partition
buffers allocated by build_maps(). Given the build_maps() is a self
cleaning function, so there is no need to invoke this function even
if build_maps() return with error. Otherwise it will causes the
buffers to be freed twice and then weird things would happen.

Signed-off-by: Kevin Hao <>
Signed-off-by: Brian Norris <>
Signed-off-by: Ben Hutchings <>
7 years agogspca_pac7302: Add new usb-id for Genius i-Look 317
Hans de Goede [Wed, 9 Jul 2014 09:20:44 +0000 (06:20 -0300)]
gspca_pac7302: Add new usb-id for Genius i-Look 317

commit 242841d3d71191348f98310e2d2001e1001d8630 upstream.

Tested-and-reported-by: yullaw <>
Signed-off-by: Hans de Goede <>
Signed-off-by: Mauro Carvalho Chehab <>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <>
7 years agotda10071: force modulation to QPSK on DVB-S
Antti Palosaari [Fri, 4 Jul 2014 08:44:39 +0000 (05:44 -0300)]
tda10071: force modulation to QPSK on DVB-S

commit db4175ae2095634dbecd4c847da439f9c83e1b3b upstream.

Only supported modulation for DVB-S is QPSK. Modulation parameter
contains invalid value for DVB-S on some cases, which leads driver
refusing tuning attempt. Due to that, hard code modulation to QPSK
in case of DVB-S.

Signed-off-by: Antti Palosaari <>
Signed-off-by: Mauro Carvalho Chehab <>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <>
7 years agoserial: core: Preserve termios c_cflag for console resume
Peter Hurley [Wed, 9 Jul 2014 13:21:14 +0000 (09:21 -0400)]
serial: core: Preserve termios c_cflag for console resume

commit ae84db9661cafc63d179e1d985a2c5b841ff0ac4 upstream.

When a tty is opened for the serial console, the termios c_cflag
settings are inherited from the console line settings.
However, if the tty is subsequently closed, the termios settings
are lost. This results in a garbled console if the console is later
suspended and resumed.

Preserve the termios c_cflag for the serial console when the tty
is shutdown; this reflects the most recent line settings.

Fixes: Bugzilla #69751, 'serial console does not wake from S3'
Reported-by: Valerio Vanni <>
Acked-by: Alan Cox <>
Signed-off-by: Peter Hurley <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: tty_struct::termios is a pointer]
Signed-off-by: Ben Hutchings <>
7 years agodebugfs: Fix corrupted loop in debugfs_remove_recursive
Steven Rostedt [Mon, 9 Jun 2014 18:06:07 +0000 (14:06 -0400)]
debugfs: Fix corrupted loop in debugfs_remove_recursive

commit 485d44022a152c0254dd63445fdb81c4194cbf0e upstream.

[ I'm currently running my tests on it now, and so far, after a few
 hours it has yet to blow up. I'll run it for 24 hours which it never
 succeeded in the past. ]

The tracing code has a way to make directories within the debugfs file
system as well as deleting them using mkdir/rmdir in the instance
directory. This is very limited in functionality, such as there is
no renames, and the parent directory "instance" can not be modified.
The tracing code creates the instance directory from the debugfs code
and then replaces the dentry->d_inode->i_op with its own to allow
for mkdir/rmdir to work.

When these are called, the d_entry and inode locks need to be released
to call the instance creation and deletion code. That code has its own
accounting and locking to serialize everything to prevent multiple
users from causing harm. As the parent "instance" directory can not
be modified this simplifies things.

I created a stress test that creates several threads that randomly
creates and deletes directories thousands of times a second. The code
stood up to this test and I submitted it a while ago.

Recently I added a new test that adds readers to the mix. While the
instance directories were being added and deleted, readers would read
from these directories and even enable tracing within them. This test
was able to trigger a bug:

 general protection fault: 0000 [#1] PREEMPT SMP
 Modules linked in: ...
 CPU: 3 PID: 17789 Comm: rmdir Tainted: G        W     3.15.0-rc2-test+ #41
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
 task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000
 RIP: 0010:[<ffffffff811ed5eb>]  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
 RSP: 0018:ffff880077019df8  EFLAGS: 00010246
 RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000
 RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454
 RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000
 R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640
 R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7
 FS:  00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0
  ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640
  0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000
  00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28
 Call Trace:
  [<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de
  [<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3
  [<ffffffff81132f51>] ? do_rmdir+0xd1/0x139
  [<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
  [<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b
 Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08
 RIP  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
  RSP <ffff880077019df8>

It took a while, but every time it triggered, it was always in the
same place:

list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) {

Where the child->d_u.d_child seemed to be corrupted.  I added lots of
trace_printk()s to see what was wrong, and sure enough, it was always
the child's d_u.d_child field. I looked around to see what touches
it and noticed that in __dentry_kill() which calls dentry_free():

static void dentry_free(struct dentry *dentry)
/* if dentry was never visible to RCU, immediate free is OK */
if (!(dentry->d_flags & DCACHE_RCUACCESS))
call_rcu(&dentry->d_u.d_rcu, __d_free);

I also noticed that __dentry_kill() unlinks the child->d_u.child
under the parent->d_lock spin_lock.

Looking back at the loop in debugfs_remove_recursive() it never takes the
parent->d_lock to do the list walk. Adding more tracing, I was able to
prove this was the issue:

 ftrace-t-15385   1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600
    rmdir-15409   2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058

The dentry_kill freed ffff88006d573600 just as the remove recursive was walking

In order to fix this, the list walk needs to be modified a bit to take
the parent->d_lock. The safe version is no longer necessary, as every
time we remove a child, the parent->d_lock must be released and the
list walk must start over. Each time a child is removed, even though it
may still be on the list, it should be skipped by the first check
in the loop:

if (!debugfs_positive(child))

Signed-off-by: Steven Rostedt <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: deleted code is slightly different; we don't
 have list_next_entry()]
Signed-off-by: Ben Hutchings <>
7 years agostable_kernel_rules: Add pointer to netdev-FAQ for network patches
Dave Chiluk [Tue, 24 Jun 2014 15:11:26 +0000 (10:11 -0500)]
stable_kernel_rules: Add pointer to netdev-FAQ for network patches

commit b76fc285337b6b256e9ba20a40cfd043f70c27af upstream.

Stable_kernel_rules should point submitters of network stable patches to the
netdev_FAQ.txt as requests for stable network patches should go to netdev

Signed-off-by: Dave Chiluk <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
7 years agoblock: don't assume last put of shared tags is for the host
Christoph Hellwig [Tue, 8 Jul 2014 10:25:28 +0000 (12:25 +0200)]
block: don't assume last put of shared tags is for the host

commit d45b3279a5a2252cafcd665bbf2db8c9b31ef783 upstream.

There is no inherent reason why the last put of a tag structure must be
the one for the Scsi_Host, as device model objects can be held for
arbitrary periods.  Merge blk_free_tags and __blk_free_tags into a single
funtion that just release a references and get rid of the BUG() when the
host reference wasn't the last.

Signed-off-by: Christoph Hellwig <>
Signed-off-by: Jens Axboe <>
Signed-off-by: Ben Hutchings <>
7 years agoASoC: samsung: Correct I2S DAI suspend/resume ops
Sylwester Nawrocki [Fri, 4 Jul 2014 14:05:45 +0000 (16:05 +0200)]
ASoC: samsung: Correct I2S DAI suspend/resume ops

commit d3d4e5247b013008a39e4d5f69ce4c60ed57f997 upstream.

We should save/restore relevant I2S registers regardless of
the dai->active flag, otherwise some settings are being lost
after system suspend/resume cycle. E.g. I2S slave mode set only
during dai initialization is not preserved and the device ends
up in master mode after system resume.

Signed-off-by: Sylwester Nawrocki <>
Signed-off-by: Mark Brown <>
Signed-off-by: Ben Hutchings <>
7 years agoKVM: x86: Inter-privilege level ret emulation is not implemeneted
Nadav Amit [Sun, 15 Jun 2014 13:12:59 +0000 (16:12 +0300)]
KVM: x86: Inter-privilege level ret emulation is not implemeneted

commit 9e8919ae793f4edfaa29694a70f71a515ae9942a upstream.

Return unhandlable error on inter-privilege level ret instruction.  This is
since the current emulation does not check the privilege level correctly when
loading the CS, and does not pop RSP/SS as needed.

Signed-off-by: Nadav Amit <>
Signed-off-by: Paolo Bonzini <>
Signed-off-by: Ben Hutchings <>
7 years agoLinux 3.2.62
Ben Hutchings [Wed, 6 Aug 2014 17:07:43 +0000 (18:07 +0100)]
Linux 3.2.62

7 years agoiommu/vt-d: Disable translation if already enabled
Takao Indoh [Tue, 23 Apr 2013 08:35:03 +0000 (17:35 +0900)]
iommu/vt-d: Disable translation if already enabled

commit 3a93c841c2b3b14824f7728dd74bd00a1cedb806 upstream.

This patch disables translation(dma-remapping) before its initialization
if it is already enabled.

This is needed for kexec/kdump boot. If dma-remapping is enabled in the
first kernel, it need to be disabled before initializing its page table
during second kernel boot. Wei Hu also reported that this is needed
when second kernel boots with intel_iommu=off.

Basically iommu->gcmd is used to know whether translation is enabled or
disabled, but it is always zero at boot time even when translation is
enabled since iommu->gcmd is initialized without considering such a
case. Therefor this patch synchronizes iommu->gcmd value with global
command register when iommu structure is allocated.

Signed-off-by: Takao Indoh <>
Signed-off-by: Joerg Roedel <>
[wyj: Backported to 3.4: adjust context]
Signed-off-by: Yijing Wang <>
Signed-off-by: Ben Hutchings <>
7 years agox86_32, entry: Store badsys error code in %eax
Sven Wegener [Tue, 22 Jul 2014 08:26:06 +0000 (10:26 +0200)]
x86_32, entry: Store badsys error code in %eax

commit 8142b215501f8b291a108a202b3a053a265b03dd upstream.

Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.

The following code:

> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));

results in:

> result=666 errno=0 error=Success

Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:

> result=-1 errno=38 error=Function not implemented

The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.

Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.

Signed-off-by: Sven Wegener <>
Reviewed-and-tested-by: Andy Lutomirski <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Ben Hutchings <>
7 years agolibata: introduce ata_host->n_tags to avoid oops on SAS controllers
Tejun Heo [Wed, 23 Jul 2014 13:05:27 +0000 (09:05 -0400)]
libata: introduce ata_host->n_tags to avoid oops on SAS controllers

commit 1a112d10f03e83fb3a2fdc4c9165865dec8a3ca6 upstream.

1871ee134b73 ("libata: support the ata host which implements a queue
depth less than 32") directly used ata_port->scsi_host->can_queue from
ata_qc_new() to determine the number of tags supported by the host;
unfortunately, SAS controllers doing SATA don't initialize ->scsi_host
leading to the following oops.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
 IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
 PGD 0
 Oops: 0002 [#1] SMP
 Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
 CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
 Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
 task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
 RIP: 0010:[<ffffffff814e0618>]  [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
 RSP: 0018:ffff88061a003ae8  EFLAGS: 00010012
 RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
 RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
 RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
 R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
 FS:  00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
  ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
  ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
  ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
 Call Trace:
  [<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
  [<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
  [<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
  [<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
  [<ffffffff81317613>] __blk_run_queue+0x33/0x40
  [<ffffffff8131781a>] queue_unplugged+0x2a/0x90
  [<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
  [<ffffffff8131d274>] blk_finish_plug+0x14/0x50
  [<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
  [<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
  [<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
  [<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
  [<ffffffff81219897>] blkdev_read_iter+0x37/0x40
  [<ffffffff811e307e>] new_sync_read+0x7e/0xb0
  [<ffffffff811e3734>] vfs_read+0x94/0x170
  [<ffffffff811e43c6>] SyS_read+0x46/0xb0
  [<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
  [<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
 Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00

Fix it by introducing ata_host->n_tags which is initialized to
ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
scsi_host_template->can_queue in ata_host_register() for !SAS ones.
As SAS hosts are never registered, this will give them the same
ATA_MAX_QUEUE - 1 as before.  Note that we can't use
scsi_host->can_queue directly for SAS hosts anyway as they can go
higher than the libata maximum.

Signed-off-by: Tejun Heo <>
Reported-by: Mike Qiu <>
Reported-by: Jesse Brandeburg <>
Reported-by: Peter Hurley <>
Reported-by: Peter Zijlstra <>
Tested-by: Alexey Kardashevskiy <>
Fixes: 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
Cc: Kevin Hao <>
Cc: Dan Williams <>
Signed-off-by: Ben Hutchings <>
7 years agolibata: support the ata host which implements a queue depth less than 32
Kevin Hao [Sat, 12 Jul 2014 04:08:24 +0000 (12:08 +0800)]
libata: support the ata host which implements a queue depth less than 32

commit 1871ee134b73fb4cadab75752a7152ed2813c751 upstream.

The sata on fsl mpc8315e is broken after the commit 8a4aeec8d2d6
("libata/ahci: accommodate tag ordered controllers"). The reason is
that the ata controller on this SoC only implement a queue depth of
16. When issuing the commands in tag order, all the commands in tag
16 ~ 31 are mapped to tag 0 unconditionally and then causes the sata
malfunction. It makes no senses to use a 32 queue in software while
the hardware has less queue depth. So consider the queue depth
implemented by the hardware when requesting a command tag.

Fixes: 8a4aeec8d2d6 ("libata/ahci: accommodate tag ordered controllers")
Signed-off-by: Kevin Hao <>
Acked-by: Dan Williams <>
Signed-off-by: Tejun Heo <>
Signed-off-by: Ben Hutchings <>
7 years agomm: kmemleak: avoid false negatives on vmalloc'ed objects
Catalin Marinas [Tue, 12 Nov 2013 23:07:45 +0000 (15:07 -0800)]
mm: kmemleak: avoid false negatives on vmalloc'ed objects

commit 7f88f88f83ed609650a01b18572e605ea50cd163 upstream.

Commit 248ac0e1943a ("mm/vmalloc: remove guard page from between vmap
blocks") had the side effect of making vmap_area.va_end member point to
the next vmap_area.va_start.  This was creating an artificial reference
to vmalloc'ed objects and kmemleak was rarely reporting vmalloc() leaks.

This patch marks the vmap_area containing pointers explicitly and
reduces the min ref_count to 2 as vm_struct still contains a reference
to the vmalloc'ed object.  The kmemleak add_scan_area() function has
been improved to allow a SIZE_MAX argument covering the rest of the
object (for simpler calling sites).

Signed-off-by: Catalin Marinas <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agointroduce SIZE_MAX
Xi Wang [Thu, 31 May 2012 23:26:04 +0000 (16:26 -0700)]
introduce SIZE_MAX

commit a3860c1c5dd1137db23d7786d284939c5761d517 upstream.

ULONG_MAX is often used to check for integer overflow when calculating
allocation size.  While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.

This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.

Signed-off-by: Xi Wang <>
Acked-by: Alex Elder <>
Cc: David Airlie <>
Cc: Pekka Enberg <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agoceph: fix overflow check in build_snap_context()
Xi Wang [Thu, 16 Feb 2012 16:56:29 +0000 (11:56 -0500)]
ceph: fix overflow check in build_snap_context()

commit 80834312a4da1405a9bc788313c67643de6fcb4c upstream.

The overflow check for a + n * b should be (n > (ULONG_MAX - a) / b),
rather than (n > ULONG_MAX / b - a).

Signed-off-by: Xi Wang <>
Signed-off-by: Sage Weil <>
Signed-off-by: Ben Hutchings <>
7 years agoARM: 7670/1: fix the memset fix
Nicolas Pitre [Tue, 12 Mar 2013 12:00:42 +0000 (13:00 +0100)]
ARM: 7670/1: fix the memset fix

commit 418df63adac56841ef6b0f1fcf435bc64d4ed177 upstream.

Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
with the memset return value.  However the memset itself became broken
by that patch for misaligned pointers.

This fixes the above by branching over the entry code from the
misaligned fixup code to avoid reloading the original pointer.

Also, because the function entry alignment is wrong in the Thumb mode
compilation, that fixup code is moved to the end.

While at it, the entry instructions are slightly reworked to help dual
issue pipelines.

Signed-off-by: Nicolas Pitre <>
Tested-by: Alexander Holler <>
Signed-off-by: Russell King <>
Signed-off-by: Ben Hutchings <>
7 years agoARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations
Ivan Djelic [Wed, 6 Mar 2013 19:09:27 +0000 (20:09 +0100)]
ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations

commit 455bd4c430b0c0a361f38e8658a0d6cb469942b5 upstream.

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <>
Reviewed-by: Nicolas Pitre <>
Signed-off-by: Dirk Behme <>
Signed-off-by: Russell King <>
Signed-off-by: Ben Hutchings <>
7 years agomm: hugetlb: fix copy_hugetlb_page_range()
Naoya Horiguchi [Wed, 23 Jul 2014 21:00:19 +0000 (14:00 -0700)]
mm: hugetlb: fix copy_hugetlb_page_range()

commit 0253d634e0803a8376a0d88efee0bf523d8673f9 upstream.

Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.

The test program for the problem is shown below:

  $ cat heap.c
  #include <unistd.h>
  #include <stdlib.h>
  #include <string.h>

  #define HPS 0x200000

  int main() {
   int i;
   char *p = malloc(HPS);
   memset(p, '1', HPS);
   for (i = 0; i < 5; i++) {
   if (!fork()) {
   memset(p, '2', HPS);
   p = malloc(HPS);
   memset(p, '3', HPS);
   return 0;
   return 0;

  $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap

Fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so is applicable to -stable kernels which
include it.

Signed-off-by: Naoya Horiguchi <>
Reported-by: Guillaume Morin <>
Suggested-by: Guillaume Morin <>
Acked-by: Hugh Dickins <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agocrypto: testmgr - update LZO compression test vectors
Markus F.X.J. Oberhumer [Sun, 14 Oct 2012 13:39:04 +0000 (15:39 +0200)]
crypto: testmgr - update LZO compression test vectors

commit 0ec7382036922be063b515b2a3f1d6f7a607392c upstream.

Update the LZO compression test vectors according to the latest compressor

Signed-off-by: Markus F.X.J. Oberhumer <>
Signed-off-by: Ben Hutchings <>
7 years agoipvs: stop tot_stats estimator only under CONFIG_SYSCTL
Julian Anastasov [Wed, 16 Jul 2014 08:26:47 +0000 (10:26 +0200)]
ipvs: stop tot_stats estimator only under CONFIG_SYSCTL

[ Upstream commit 9802d21e7a0b0d2167ef745edc1f4ea7a0fc6ea3 ]

The tot_stats estimator is started only when CONFIG_SYSCTL
is defined. But it is stopped without checking CONFIG_SYSCTL.
Fix the crash by moving ip_vs_stop_estimator into

The change is needed after commit 14e405461e664b
("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39.

Reported-by: Jet Chen <>
Tested-by: Jet Chen <>
Signed-off-by: Julian Anastasov <>
Signed-off-by: Simon Horman <>
Signed-off-by: Ben Hutchings <>
7 years agox86, ioremap: Speed up check for RAM pages
Roland Dreier [Fri, 2 May 2014 18:18:41 +0000 (11:18 -0700)]
x86, ioremap: Speed up check for RAM pages

commit c81c8a1eeede61e92a15103748c23d100880cc8a upstream.

In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and checks each one individually with page_is_ram().
For large ioremaps, this can be very slow.  For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!

Internally, page_is_ram() calls walk_system_ram_range() on a single
page.  Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find.  For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.

With this change, ioremap on our 256 GiB BAR takes less than 1 second.

Signed-off-by: Roland Dreier <>
Signed-off-by: H. Peter Anvin <>
Signed-off-by: Ben Hutchings <>
7 years agosym53c8xx_2: Set DID_REQUEUE return code when aborting squeue
Mikulas Patocka [Wed, 9 Apr 2014 01:52:05 +0000 (21:52 -0400)]
sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue

commit fd1232b214af43a973443aec6a2808f16ee5bf70 upstream.

This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.

When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function

This function aborts them with DID_SOFT_ERROR.

If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.

The result is that disk returning QUEUE FULL causes request failures.

The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags.  The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags.  The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.

Signed-off-by: Mikulas Patocka <>
Cc: Matthew Wilcox <>
Cc: James Bottomley <>
Signed-off-by: Linus Torvalds <>
7 years agoapplicom: dereferencing NULL on error path
Dan Carpenter [Fri, 9 May 2014 11:59:16 +0000 (14:59 +0300)]
applicom: dereferencing NULL on error path

commit 8bab797c6e5724a43b7666ad70860712365cdb71 upstream.

This is a static checker fix.  The "dev" variable is always NULL after
the while statement so we would be dereferencing a NULL pointer here.

Fixes: 819a3eba4233 ('[PATCH] applicom: fix error handling')
Signed-off-by: Dan Carpenter <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
7 years agox86-32, espfix: Remove filter for espfix32 due to race
H. Peter Anvin [Wed, 30 Apr 2014 21:03:25 +0000 (14:03 -0700)]
x86-32, espfix: Remove filter for espfix32 due to race

commit 246f2d2ee1d715e1077fc47d61c394569c8ee692 upstream.

It is not safe to use LAR to filter when to go down the espfix path,
because the LDT is per-process (rather than per-thread) and another
thread might change the descriptors behind our back.  Fortunately it
is always *safe* (if a bit slow) to go down the espfix path, and a
32-bit LDT stack segment is extremely rare.

Signed-off-by: H. Peter Anvin <>
Signed-off-by: Ben Hutchings <>
7 years agoscore: normalize global variables exported by
Jiang Liu [Wed, 3 Jul 2013 22:03:37 +0000 (15:03 -0700)]
score: normalize global variables exported by

commit ae49b83dcacfb69e22092cab688c415c2f2d870c upstream.

Generate mandatory global variables _sdata in file

Signed-off-by: Jiang Liu <>
Cc: Chen Liqin <>
Cc: Lennox Wu <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agoalpha: add io{read,write}{16,32}be functions
Michael Cree [Wed, 30 Nov 2011 13:01:40 +0000 (08:01 -0500)]
alpha: add io{read,write}{16,32}be functions

commit 25534eb7707821b796fd84f7115367e02f36aa60 upstream.

These functions are used in some PCI drivers with big-endian
MMIO space.

Admittedly it is almost certain that no one this side of the
Moon would use such a card in an Alpha but it does get us
closer to being able to build allyesconfig or allmodconfig,
and it enables the Debian default generic config to build.

Tested-by: Raúl Porcel <>
Signed-off-by: Michael Cree <>
Signed-off-by: Matt Turner <>
Signed-off-by: Ben Hutchings <>
7 years agoscore: Add missing #include <linux/export.h>
Ben Hutchings [Sun, 3 Aug 2014 16:45:10 +0000 (17:45 +0100)]
score: Add missing #include <linux/export.h>

There is no upstream commit for this, as arch/score/kernel/init_task.c
has been replaced by generic code and <linux/export.h> is included
indirectly by arch/score/mm/init.c.

Signed-off-by: Ben Hutchings <>
7 years agoScore: The commit is for compiling successfully. The modifications include: 1. Kconfi...
Lennox Wu [Sat, 14 Sep 2013 05:48:37 +0000 (13:48 +0800)]
Score: The commit is for compiling successfully. The modifications include: 1. Kconfig of Score: we don't support ioremap 2. Missed headfile including 3. There are some errors in other people's commit not checked by us, we fix it now 3.1 arch/score/kernel/entry.S: wrong instructions 3.2 arch/score/kernel/process.c : just some typos

commit 5fbbf8a1a93452b26e7791cf32cefce62b0a480b upstream.

Signed-off-by: Lennox Wu <>
[bwh: Backported to 3.2:
 - Drop addition of 'select HAVE_GENERIC_HARDIRQS' which was not removed here
 - Drop inapplicale change to copy_thread()]
Signed-off-by: Ben Hutchings <>
7 years agounicore32: select generic atomic64_t support
Fengguang Wu [Fri, 5 Oct 2012 00:11:23 +0000 (17:11 -0700)]
unicore32: select generic atomic64_t support

commit 82e54a6aaf8aec971fb16afa3a4404e238a1b98b upstream.

It's required for the core fs/namespace.c and many other basic features.

Signed-off-by: Guan Xuetao <>
Signed-off-by: Fengguang Wu <>
Cc: "Eric W. Biederman" <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
7 years agounicore32: add ioremap_nocache definition
Guan Xuetao [Thu, 18 Aug 2011 07:38:05 +0000 (15:38 +0800)]
unicore32: add ioremap_nocache definition

commit a50e4213e71adc7dde0d514aabd8af7275fee39f upstream.

Bugfix for following error messages:
lib/iomap.c: In function 'pci_iomap':
lib/iomap.c:274: error: implicit declaration of function 'ioremap_nocache'
lib/iomap.c:274: warning: return makes pointer from integer without a cast

Also see commit <f1ecc69838a2d7c8a3e1909f637d4083c071777d>
  it will hide the ioremap_nocache function for systems with an MMU

Signed-off-by: Guan Xuetao <>
Cc: Arnd Bergmann <>
Cc: Jonas Bonn <>
Signed-off-by: Ben Hutchings <>
7 years agoshmem: fix splicing from a hole while it's punched
Hugh Dickins [Wed, 23 Jul 2014 21:00:13 +0000 (14:00 -0700)]
shmem: fix splicing from a hole while it's punched

commit b1a366500bd537b50c3aad26dc7df083ec03a448 upstream.

shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.

But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.

shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.

shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.

We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?

The origin of this part of the problem is my v3.1 commit d0823576bf4b
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c.  It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.

Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.

Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.

But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.

Signed-off-by: Hugh Dickins <>
Reported-by: Sasha Levin <>
Suggested-by: Vlastimil Babka <>
Cc: Konstantin Khlebnikov <>
Cc: Johannes Weiner <>
Cc: Lukas Czerner <>
Cc: Dave Jones <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
7 years agoshmem: fix faulting into a hole, not taking i_mutex
Hugh Dickins [Wed, 23 Jul 2014 21:00:10 +0000 (14:00 -0700)]
shmem: fix faulting into a hole, not taking i_mutex

commit 8e205f779d1443a94b5ae81aa359cb535dd3021e upstream.

Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).

We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.

So return to the original umbrella approach, but keep away from i_mutex
this time.  We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.

This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.

This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7d7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.

Signed-off-by: Hugh Dickins <>
Reported-by: Sasha Levin <>
Tested-by: Sasha Levin <>
Cc: Vlastimil Babka <>
Cc: Konstantin Khlebnikov <>
Cc: Johannes Weiner <>
Cc: Lukas Czerner <>
Cc: Dave Jones <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
7 years agoshmem: fix faulting into a hole while it's punched
Hugh Dickins [Mon, 23 Jun 2014 20:22:06 +0000 (13:22 -0700)]
shmem: fix faulting into a hole while it's punched

commit f00cdc6df7d7cfcabb5b740911e6788cb0802bdb upstream.

Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.

It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.

Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).

[ coding-style fixes]
Signed-off-by: Hugh Dickins <>
Reported-by: Sasha Levin <>
Tested-by: Sasha Levin <>
Cc: Dave Jones <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agoxfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
Dave Chinner [Wed, 11 Jul 2012 21:40:42 +0000 (07:40 +1000)]
xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near

commit e3a746f5aab71f2dd0a83116772922fb37ae29d6 upstream.

The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.

Signed-off-by: Dave Chinner <>
Tested-by: Mike Snitzer <>
Signed-off-by: Ben Myers <>
Signed-off-by: Ben Hutchings <>
7 years agoxfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
Dave Chinner [Tue, 12 Jun 2012 04:20:26 +0000 (14:20 +1000)]
xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near

commit 76d095388b040229ea1aad7dea45be0cfa20f589 upstream.

When we fail to find an matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.

Reported-by: Chris J Arges <>
Signed-off-by: Dave Chinner <>
Reviewed-by: Christoph Hellwig <>
Signed-off-by: Ben Myers <>
Signed-off-by: Ben Hutchings <>
7 years agonetfilter: ipt_ULOG: fix info leaks
Mathias Krause [Mon, 30 Sep 2013 20:05:08 +0000 (22:05 +0200)]
netfilter: ipt_ULOG: fix info leaks

commit 278f2b3e2af5f32ea1afe34fa12a2518153e6e49 upstream.

The ulog messages leak heap bytes by the means of padding bytes and
incompletely filled string arrays. Fix those by memset(0)'ing the
whole struct before filling it.

Signed-off-by: Mathias Krause <>
Signed-off-by: Pablo Neira Ayuso <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agos390/ptrace: fix PSW mask check
Martin Schwidefsky [Mon, 23 Jun 2014 12:43:06 +0000 (14:43 +0200)]
s390/ptrace: fix PSW mask check

commit dab6cf55f81a6e16b8147aed9a843e1691dcd318 upstream.

The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect.
For the default user_mode=home address space layout the psw_user_bits
variable has the home space address-space-control bits set. But the
PSW_MASK_USER contains PSW_MASK_ASC, the ptrace validity check for the
PSW mask will therefore always fail.

Fixes CVE-2014-3534

Signed-off-by: Martin Schwidefsky <>
Signed-off-by: Ben Hutchings <>
7 years agonohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off
Thomas Gleixner [Fri, 29 Nov 2013 11:18:13 +0000 (12:18 +0100)]
nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off

commit 0e576acbc1d9600cf2d9b4a141a2554639959d50 upstream.

If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.

If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
command line option "nohz=off" or not enabled due to missing hardware
support, then tick_nohz_get_sleep_length() returns 0. That happens
because ts->sleep_length is never set in that case.

Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive.

Reported-by: Michal Hocko <>
Reported-by: Borislav Petkov <>
Signed-off-by: Thomas Gleixner <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
7 years agortnetlink: fix userspace API breakage for iproute2 < v3.9.0
Michal Schmidt [Wed, 28 May 2014 12:15:19 +0000 (14:15 +0200)]
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0

commit e5eca6d41f53db48edd8cf88a3f59d2c30227f8e upstream.

When running RHEL6 userspace on a current upstream kernel, "ip link"
fails to show VF information.

The reason is a kernel<->userspace API change introduced by commit
88c5b5ce5cb57 ("rtnetlink: Call nlmsg_parse() with correct header length"),
after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
in the netlink request.

iproute2 adjusted for the API change in its commit 63338dca4513
("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").

The problem has been noticed before:
(Subject: Re: getting VF link info seems to be broken in 3.9-rc8)

We can do better than tell those with old userspace to upgrade. We can
recognize the old iproute2 in the kernel by checking the netlink message
length. Even when including the IFLA_EXT_MASK attribute, its netlink
message is shorter than struct ifinfomsg.

With this patch "ip link" shows VF information in both old and new
iproute2 versions.

Signed-off-by: Michal Schmidt <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoipv4: fix buffer overflow in ip_options_compile()
Eric Dumazet [Mon, 21 Jul 2014 05:17:42 +0000 (07:17 +0200)]
ipv4: fix buffer overflow in ip_options_compile()

[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ]

There is a benign buffer overflow in ip_options_compile spotted by
AddressSanitizer[1] :

Its benign because we always can access one extra byte in skb->head
(because header is followed by struct skb_shared_info), and in this case
this byte is not even used.

[28504.910798] ==================================================================
[28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
[28504.913170] Read of size 1 by thread T15843:
[28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
[28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
[28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.925106] Allocated by thread T15843:
[28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
[28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
[28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
[28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
[28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
[28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
[28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
[28504.934377]  of 40-byte region [ffff880026382800ffff880026382828)
[28504.937474] Memory state around the buggy address:
[28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
[28504.939884]  ffff880026382400ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.942504]  ffff880026382600ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.943483]  ffff880026382700ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.945573]                         ^
[28504.946277]  ffff880026382900ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.094949]  ffff880026382a00ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.096114]  ffff880026382b00ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.097116]  ffff880026382c00ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.098472]  ffff880026382d00ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.099804] Legend:
[28505.100269]  f - 8 freed bytes
[28505.100884]  r - 8 redzone bytes
[28505.101649]  . - 8 allocated bytes
[28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
[28505.103637] ==================================================================


Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agodns_resolver: Null-terminate the right string
Ben Hutchings [Sun, 20 Jul 2014 23:06:48 +0000 (00:06 +0100)]
dns_resolver: Null-terminate the right string

[ Upstream commit 640d7efe4c08f06c4ae5d31b79bd8740e7f6790a ]

*_result[len] is parsed as *(_result[len]) which is not at all what we
want to touch here.

Signed-off-by: Ben Hutchings <>
Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated")
Signed-off-by: David S. Miller <>
7 years agodns_resolver: assure that dns_query() result is null-terminated
Manuel Schölling [Sat, 7 Jun 2014 21:57:25 +0000 (23:57 +0200)]
dns_resolver: assure that dns_query() result is null-terminated

[ Upstream commit 84a7c0b1db1c17d5ded8d3800228a608e1070b40 ]

dns_query() credulously assumes that keys are null-terminated and
returns a copy of a memory block that is off by one.

Signed-off-by: Manuel Schölling <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agosunvnet: clean up objects created in vnet_new() on vnet_exit()
Sowmini Varadhan [Wed, 16 Jul 2014 14:02:26 +0000 (10:02 -0400)]
sunvnet: clean up objects created in vnet_new() on vnet_exit()

[ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ]

Nothing cleans up the objects created by
vnet_new(), they are completely leaked.

vnet_exit(), after doing the vio_unregister_driver() to clean
up ports, should call a helper function that iterates over vnet_list
and cleans up those objects. This includes unregister_netdevice()
as well as free_netdev().

Signed-off-by: Sowmini Varadhan <>
Acked-by: Dave Kleikamp <>
Reviewed-by: Karl Volz <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agonet: sctp: fix information leaks in ulpevent layer
Daniel Borkmann [Sat, 12 Jul 2014 18:30:35 +0000 (20:30 +0200)]
net: sctp: fix information leaks in ulpevent layer

[ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ]

While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

  The sctp_sndrcvinfo structure is defined below:

  struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    <-- 2 bytes hole  -->
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    sctp_assoc_t sinfo_assoc_id;


  A remote peer may send an Operation Error message to its peer.
  This message indicates a variety of error conditions on an
  association. The entire ERROR chunk as it appears on the wire
  is included in an SCTP_REMOTE_ERROR event. Please refer to the
  SCTP specification [RFC4960] and any extensions for a list of
  possible error formats. An SCTP error notification has the
  following format:

  struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    uint16_t sre_error;
    <-- 2 bytes hole  -->
    sctp_assoc_t sre_assoc_id;
    uint8_t  sre_data[];

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoappletalk: Fix socket referencing in skb
Andrey Utkin [Mon, 7 Jul 2014 20:22:50 +0000 (23:22 +0300)]
appletalk: Fix socket referencing in skb

[ Upstream commit 36beddc272c111689f3042bf3d10a64d8a805f93 ]

Setting just skb->sk without taking its reference and setting a
destructor is invalid. However, in the places where this was done, skb
is used in a way not requiring skb->sk setting. So dropping the setting
of skb->sk.
Thanks to Eric Dumazet <> for correct solution.

Reported-by: Ed Martin <>
Signed-off-by: Andrey Utkin <>
Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agoigmp: fix the problem when mc leave group
dingtianhong [Wed, 2 Jul 2014 05:50:48 +0000 (13:50 +0800)]
igmp: fix the problem when mc leave group

[ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ]

The problem was triggered by these steps:

1) create socket, bind and then setsockopt for add mc group.
   mreq.imr_multiaddr.s_addr = inet_addr("");
   mreq.imr_interface.s_addr = inet_addr("");
   setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

2) drop the mc group for this socket.
   mreq.imr_multiaddr.s_addr = inet_addr("");
   mreq.imr_interface.s_addr = inet_addr("");
   setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));

3) and then drop the socket, I found the mc group was still used by the dev:

   netstat -g

   Interface       RefCnt Group
   --------------- ------ ---------------------
   eth2    1

Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
to be released for the netdev when drop the socket, but this process was broken when
route default is NULL, the reason is that:

The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
when dropping the socket, the mc_list is NULL and the dev still keep this group.

v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
so I add the checking for the in_dev before polling the mc_list, make sure when
we remove the mc group, dec the refcnt to the real dev which was using the mc address.
The problem would never happened again.

Signed-off-by: Ding Tianhong <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years ago8021q: fix a potential memory leak
Li RongQing [Wed, 18 Jun 2014 05:46:02 +0000 (13:46 +0800)]
8021q: fix a potential memory leak

[ Upstream commit 916c1689a09bc1ca81f2d7a34876f8d35aadd11b ]

skb_cow called in vlan_reorder_header does not free the skb when it failed,
and vlan_reorder_header returns NULL to reset original skb when it is called
in vlan_untag, lead to a memory leak.

Signed-off-by: Li RongQing <>
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agotcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
Neal Cardwell [Thu, 19 Jun 2014 01:15:03 +0000 (21:15 -0400)]
tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb

[ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ]

If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <>
Signed-off-by: Neal Cardwell <>
Cc: Eric Dumazet <>
Cc: Yuchung Cheng <>
Cc: Ilpo Jarvinen <>
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
7 years agousb: Check if port status is equal to RxDetect
Gavin Guo [Thu, 17 Jul 2014 17:12:13 +0000 (01:12 +0800)]
usb: Check if port status is equal to RxDetect

commit bb86cf569bbd7ad4dce581a37c7fbd748057e9dc upstream.

When using USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller
[1022:7814], the second hotplugging will experience the USB 3.0 pen
drive is recognized as high-speed device. After bisecting the kernel,
I found the commit number 41e7e056cdc662f704fa9262e5c6e213b4ab45dd
(USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing
some experiments, the bug can be fixed by avoiding executing the function
hub_usb3_port_disable(). Because the port status with [AMD] FCH USB
XHCI Controlleris [1022:7814] is already in RxDetect
(I tried printing out the port status before setting to Disabled state),
it's reasonable to check the port status before really executing

Fixes: 41e7e056cdc6 (USB: Allow USB 3.0 ports to be disabled.)
Signed-off-by: Gavin Guo <>
Acked-by: Alan Stern <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: use hub device as context for dev_dbg(),
 as hub ports are not devices in their own right]
Signed-off-by: Ben Hutchings <>