4 years agoshm: add sealing API
David Herrmann [Fri, 8 Aug 2014 21:25:27 +0000 (14:25 -0700)]
shm: add sealing API

If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
  - one side cannot overwrite data while the other reads it
  - one side cannot shrink the buffer while the other accesses it
  - one side cannot grow the buffer beyond previously set boundaries

If there is a trust-relationship between both parties, there is no need
for policy enforcement.  However, if there's no trust relationship (eg.,
for general-purpose IPC) sharing memory-regions is highly fragile and
often not possible without local copies.  Look at the following two

  1) A graphics client wants to share its rendering-buffer with a
     graphics-server. The memory-region is allocated by the client for
     read/write access and a second FD is passed to the server. While
     scanning out from the memory region, the server has no guarantee that
     the client doesn't shrink the buffer at any time, requiring rather
     cumbersome SIGBUS handling.
  2) A process wants to perform an RPC on another process. To avoid huge
     bandwidth consumption, zero-copy is preferred. After a message is
     assembled in-memory and a FD is passed to the remote side, both sides
     want to be sure that neither modifies this shared copy, anymore. The
     source may have put sensible data into the message without a separate
     copy and the target may want to parse the message inline, to avoid a
     local copy.

While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is unproportionally ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.

This patch introduces the concept of SEALING.  If you seal a file, a
specific set of operations is blocked on that file forever.  Unlike locks,
seals can only be set, never removed.  Hence, once you verified a specific
set of seals is set, you're guaranteed that no-one can perform the blocked
operations on this file, anymore.

An initial set of SEALS is introduced by this patch:
  - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
            in size. This affects ftruncate() and open(O_TRUNC).
  - GROW: If SEAL_GROW is set, the file in question cannot be increased
          in size. This affects ftruncate(), fallocate() and write().
  - WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
           are possible. This affects fallocate(PUNCH_HOLE), mmap() and
  - SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
          This basically prevents the F_ADD_SEAL operation on a file and
          can be set to prevent others from adding further seals that you
          don't want.

The described use-cases can easily use these seals to provide safe use
without any trust-relationship:

  1) The graphics server can verify that a passed file-descriptor has
     SEAL_SHRINK set. This allows safe scanout, while the client is
     allowed to increase buffer size for window-resizing on-the-fly.
     Concurrent writes are explicitly allowed.
  2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
     SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
     process can modify the data while the other side parses it.
     Furthermore, it guarantees that even with writable FDs passed to the
     peer, it cannot increase the size to hit memory-limits of the source
     process (in case the file-storage is accounted to the source).

The new API is an extension to fcntl(), adding two new commands:
  F_GET_SEALS: Return a bitset describing the seals on the file. This
               can be called on any FD if the underlying file supports
  F_ADD_SEALS: Change the seals of a given file. This requires WRITE
               access to the file and F_SEAL_SEAL may not already be set.
               Furthermore, the underlying file must support sealing and
               there may not be any existing shared mapping of that file.
               Otherwise, EBADF/EPERM is returned.
               The given seals are _added_ to the existing set of seals
               on the file. You cannot remove seals again.

The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.

Signed-off-by: David Herrmann <>
Acked-by: Hugh Dickins <>
Cc: Michael Kerrisk <>
Cc: Ryan Lortie <>
Cc: Lennart Poettering <>
Cc: Daniel Mack <>
Cc: Andy Lutomirski <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
4 years agomm: allow drivers to prevent new writable mappings
David Herrmann [Fri, 8 Aug 2014 21:25:25 +0000 (14:25 -0700)]
mm: allow drivers to prevent new writable mappings

This patch (of 6):

The i_mmap_writable field counts existing writable mappings of an
address_space.  To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.

This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set.  In case there
exists a writable mapping, this operation will fail with EBUSY.

Note that we rely on the fact that iff you already own a writable mapping,
you can increase the counter without using the helpers.  This is the same
that we do for i_writecount.

Signed-off-by: David Herrmann <>
Acked-by: Hugh Dickins <>
Cc: Michael Kerrisk <>
Cc: Ryan Lortie <>
Cc: Lennart Poettering <>
Cc: Daniel Mack <>
Cc: Andy Lutomirski <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
4 years agomm: mmap_region: kill correct_wcount/inode, use allow_write_access()
Oleg Nesterov [Wed, 11 Sep 2013 21:20:20 +0000 (14:20 -0700)]
mm: mmap_region: kill correct_wcount/inode, use allow_write_access()

correct_wcount and inode in mmap_region() just complicate the code.  This
boolean was needed previously, when deny_write_access() was called before
vma_merge(), now we can simply check VM_DENYWRITE and do
allow_write_access() if it is set.

allow_write_access() checks file != NULL, so this is safe even if it was
possible to use VM_DENYWRITE && !file.  Just we need to ensure we use the
same file which was deny_write_access()'ed, so the patch also moves "file
= vma->vm_file" down after allow_write_access().

Signed-off-by: Oleg Nesterov <>
Cc: Hugh Dickins <>
Cc: Al Viro <>
Cc: Colin Cross <>
Cc: David Rientjes <>
Cc: KOSAKI Motohiro <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
4 years agomm: shift VM_GROWS* check from mmap_region() to do_mmap_pgoff()
Oleg Nesterov [Wed, 11 Sep 2013 21:20:18 +0000 (14:20 -0700)]
mm: shift VM_GROWS* check from mmap_region() to do_mmap_pgoff()

mmap() doesn't allow the non-anonymous mappings with VM_GROWS* bit set.
In particular this means that mmap_region()->vma_merge(file, vm_flags)
must always fail if "vm_flags & VM_GROWS" is set incorrectly.

So it does not make sense to check VM_GROWS* after we already allocated
the new vma, the only caller, do_mmap_pgoff(), which can pass this flag
can do the check itself.

And this looks a bit more correct, mmap_region() already unmapped the
old mapping at this stage. But if mmap() is going to fail, it should
avoid do_munmap() if possible.

Note: we check VM_GROWS at the end to ensure that do_mmap_pgoff() won't
return EINVAL in the case when it currently returns another error code.

Many thanks to Hugh who nacked the buggy v1.

Signed-off-by: Oleg Nesterov <>
Acked-by: Hugh Dickins <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
4 years agommap: EINVAL not ENOMEM when rejecting VM_GROWS
Hugh Dickins [Tue, 6 Mar 2012 20:28:52 +0000 (12:28 -0800)]
mmap: EINVAL not ENOMEM when rejecting VM_GROWS

Currently error is -ENOMEM when rejecting VM_GROWSDOWN|VM_GROWSUP
from shared anonymous: hoist the file case's -EINVAL up for both.

Signed-off-by: Hugh Dickins <>
Signed-off-by: Linus Torvalds <>
4 years agomm: add file_inode for easier backporting
Grazvydas Ignotas [Sat, 16 Sep 2017 21:28:35 +0000 (00:28 +0300)]
mm: add file_inode for easier backporting

4 years agomake get_file() return its argument
Al Viro [Mon, 27 Aug 2012 18:48:26 +0000 (14:48 -0400)]
make get_file() return its argument

simplifies a bunch of callers...

Signed-off-by: Al Viro <>
4 years agoautofs4: catatonic_mode vs. notify_daemon race
Al Viro [Wed, 11 Jan 2012 03:24:48 +0000 (22:24 -0500)]
autofs4: catatonic_mode vs. notify_daemon race

we need to hold ->wq_mutex while we are forming the packet to send,
lest we have autofs4_catatonic_mode() setting wq-> to NULL
just as autofs4_notify_daemon() decides to memcpy() from it...

We do have check for catatonic mode immediately after that (under
->wq_mutex, as it ought to be) and packet won't be actually sent,
but it'll be too late for us if we oops on that memcpy() from NULL...

Fix is obvious - just extend the area covered by ->wq_mutex over
that switch and check whether it's catatonic *before* doing anything

Acked-by: Ian Kent <>
Signed-off-by: Al Viro <>
4 years agoprocfs: introduce the /proc/<pid>/map_files/ directory
Pavel Emelyanov [Tue, 10 Jan 2012 23:11:23 +0000 (15:11 -0800)]
procfs: introduce the /proc/<pid>/map_files/ directory

This one behaves similarly to the /proc/<pid>/fd/ one - it contains
symlinks one for each mapping with file, the name of a symlink is
"vma->vm_start-vma->vm_end", the target is the file.  Opening a symlink
results in a file that point exactly to the same inode as them vma's one.

For example the ls -l of some arbitrary /proc/<pid>/map_files/

 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/

This *helps* checkpointing process in three ways:

1. When dumping a task mappings we do know exact file that is mapped
   by particular region.  We do this by opening
   /proc/$pid/map_files/$address symlink the way we do with file

2. This also helps in determining which anonymous shared mappings are
   shared with each other by comparing the inodes of them.

3. When restoring a set of processes in case two of them has a mapping
   shared, we map the memory by the 1st one and then open its
   /proc/$pid/map_files/$address file and map it by the 2nd task.

Using /proc/$pid/maps for this is quite inconvenient since it brings
repeatable re-reading and reparsing for this text file which slows down
restore procedure significantly.  Also as being pointed in (3) it is a way
easier to use top level shared mapping in children as
/proc/$pid/map_files/$address when needed.

[ coding-style fixes]
[ make map_files depend on CHECKPOINT_RESTORE]
Signed-off-by: Pavel Emelyanov <>
Signed-off-by: Cyrill Gorcunov <>
Reviewed-by: Vasiliy Kulikov <>
Reviewed-by: "Kirill A. Shutemov" <>
Cc: Tejun Heo <>
Cc: Alexey Dobriyan <>
Cc: Al Viro <>
Cc: Pavel Machek <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
4 years agoprocfs: make proc_get_link to use dentry instead of inode
Cyrill Gorcunov [Tue, 10 Jan 2012 23:11:20 +0000 (15:11 -0800)]
procfs: make proc_get_link to use dentry instead of inode

Prepare the ground for the next "map_files" patch which needs a name of a
link file to analyse.

Signed-off-by: Cyrill Gorcunov <>
Cc: Pavel Emelyanov <>
Cc: Tejun Heo <>
Cc: Vasiliy Kulikov <>
Cc: "Kirill A. Shutemov" <>
Cc: Alexey Dobriyan <>
Cc: Al Viro <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
4 years agoMerge branch 'stable-3.2' into pandora-3.2
Grazvydas Ignotas [Sat, 16 Sep 2017 19:23:33 +0000 (22:23 +0300)]
Merge branch 'stable-3.2' into pandora-3.2

4 years agoLinux 3.2.93 v3.2.93
Ben Hutchings [Fri, 15 Sep 2017 17:30:58 +0000 (18:30 +0100)]
Linux 3.2.93

4 years agonet: phy: marvell: Limit errata to 88m1101
Andrew Lunn [Tue, 23 May 2017 15:49:13 +0000 (17:49 +0200)]
net: phy: marvell: Limit errata to 88m1101

commit f2899788353c13891412b273fdff5f02d49aa40f upstream.

The 88m1101 has an errata when configuring autoneg. However, it was
being applied to many other Marvell PHYs as well. Limit its scope to
just the 88m1101.

Fixes: 76884679c644 ("phylib: Add support for Marvell 88e1111S and 88e1145")
Reported-by: Daniel Walker <>
Signed-off-by: Andrew Lunn <>
Acked-by: Harini Katakam <>
Reviewed-by: Florian Fainelli <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoSanitize 'move_pages()' permission checks
Linus Torvalds [Sun, 20 Aug 2017 20:26:27 +0000 (13:26 -0700)]
Sanitize 'move_pages()' permission checks

commit 197e7e521384a23b9e585178f3f11c9fa08274b9 upstream.

The 'move_paghes()' system call was introduced long long ago with the
same permission checks as for sending a signal (except using
CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).

That turns out to not be a great choice - while the system call really
only moves physical page allocations around (and you need other
capabilities to do a lot of it), you can check the return value to map
out some the virtual address choices and defeat ASLR of a binary that
still shares your uid.

So change the access checks to the more common 'ptrace_may_access()'
model instead.

This tightens the access checks for the uid, and also effectively
changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
anybody really _uses_ this legacy system call any more (we hav ebetter
NUMA placement models these days), so I expect nobody to notice.

Famous last words.

Reported-by: Otto Ebeling <>
Acked-by: Eric W. Biederman <>
Cc: Willy Tarreau <>
Signed-off-by: Linus Torvalds <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agomm: fix NULL ptr dereference in move_pages
Sasha Levin [Wed, 25 Apr 2012 23:01:53 +0000 (16:01 -0700)]
mm: fix NULL ptr dereference in move_pages

commit 6e8b09eaf268bceac0c62e389b4bc0cb83dfb8e5 upstream.

Commit 3268c63 ("mm: fix move/migrate_pages() race on task struct") has
added an odd construct where 'mm' is checked for being NULL, and if it is,
it would get dereferenced anyways by mput()ing it.

Signed-off-by: Sasha Levin <>
Cc: Dave Hansen <>
Cc: Mel Gorman <>
Cc: Johannes Weiner <>
Cc: KOSAKI Motohiro <>
Cc: KAMEZAWA Hiroyuki <>
Cc: Hugh Dickins <>
Acked-by: Christoph Lameter <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
4 years agomm: fix NULL ptr dereference in migrate_pages
Sasha Levin [Wed, 25 Apr 2012 23:01:52 +0000 (16:01 -0700)]
mm: fix NULL ptr dereference in migrate_pages

commit f2a9ef880763d7fbd657a3af646e132a90d70d34 upstream.

Commit 3268c63 ("mm: fix move/migrate_pages() race on task struct") has
added an odd construct where 'mm' is checked for being NULL, and if it is,
it would get dereferenced anyways by mput()ing it.

This would lead to the following NULL ptr deref and BUG() when calling
migrate_pages() with a pid that has no mm struct:

[25904.193704] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[25904.194235] IP: [<ffffffff810b0de7>] mmput+0x27/0xf0
[25904.194235] PGD 773e6067 PUD 77da0067 PMD 0
[25904.194235] Oops: 0002 [#1] PREEMPT SMP
[25904.194235] CPU 2
[25904.194235] Pid: 31608, comm: trinity Tainted: G        W    3.4.0-rc2-next-20120412-sasha #69
[25904.194235] RIP: 0010:[<ffffffff810b0de7>]  [<ffffffff810b0de7>] mmput+0x27/0xf0
[25904.194235] RSP: 0018:ffff880077d49e08  EFLAGS: 00010202
[25904.194235] RAX: 0000000000000286 RBX: 0000000000000000 RCX: 0000000000000000
[25904.194235] RDX: ffff880075ef8000 RSI: 000000000000023d RDI: 0000000000000286
[25904.194235] RBP: ffff880077d49e18 R08: 0000000000000001 R09: 0000000000000001
[25904.194235] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[25904.194235] R13: 00000000ffffffea R14: ffff880034287740 R15: ffff8800218d3010
[25904.194235] FS:  00007fc8b244c700(0000) GS:ffff880029800000(0000) knlGS:0000000000000000
[25904.194235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25904.194235] CR2: 0000000000000050 CR3: 00000000767c6000 CR4: 00000000000406e0
[25904.194235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25904.194235] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[25904.194235] Process trinity (pid: 31608, threadinfo ffff880077d48000, task ffff880075ef8000)
[25904.194235] Stack:
[25904.194235]  ffff8800342876c0 0000000000000000 ffff880077d49f78 ffffffff811b8020
[25904.194235]  ffffffff811b7d91 ffff880075ef8000 ffff88002256d200 0000000000000000
[25904.194235]  00000000000003ff 0000000000000000 0000000000000000 0000000000000000
[25904.194235] Call Trace:
[25904.194235]  [<ffffffff811b8020>] sys_migrate_pages+0x340/0x3a0
[25904.194235]  [<ffffffff811b7d91>] ? sys_migrate_pages+0xb1/0x3a0
[25904.194235]  [<ffffffff8266cbb9>] system_call_fastpath+0x16/0x1b
[25904.194235] Code: c9 c3 66 90 55 31 d2 48 89 e5 be 3d 02 00 00 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 89 fb 48 c7 c7 cf 0e e1 82 e8 69 18 03 00 <f0> ff 4b 50 0f 94 c0 84 c0 0f 84 aa 00 00 00 48 89 df e8 72 f1
[25904.194235] RIP  [<ffffffff810b0de7>] mmput+0x27/0xf0
[25904.194235]  RSP <ffff880077d49e08>
[25904.194235] CR2: 0000000000000050
[25904.348999] ---[ end trace a307b3ed40206b4b ]---

Signed-off-by: Sasha Levin <>
Cc: Dave Hansen <>
Cc: Mel Gorman <>
Cc: Johannes Weiner <>
Cc: KOSAKI Motohiro <>
Cc: KAMEZAWA Hiroyuki <>
Cc: Hugh Dickins <>
Cc: Christoph Lameter <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
4 years agomm: fix move/migrate_pages() race on task struct
Christoph Lameter [Wed, 21 Mar 2012 23:34:06 +0000 (16:34 -0700)]
mm: fix move/migrate_pages() race on task struct

commit 3268c63eded4612a3d07b56d1e02ce7731e6608e upstream.

Migration functions perform the rcu_read_unlock too early.  As a result
the task pointed to may change from under us.  This can result in an oops,
as reported by Dave Hansen in

The following patch extend the period of the rcu_read_lock until after the
permissions checks are done.  We also take a refcount so that the task
reference is stable when calling security check functions and performing
cpuset node validation (which takes a mutex).

The refcount is dropped before actual page migration occurs so there is no
change to the refcounts held during page migration.

Also move the determination of the mm of the task struct to immediately
before the do_migrate*() calls so that it is clear that we switch from
handling the task during permission checks to the mm for the actual
migration.  Since the determination is only done once and we then no
longer use the task_struct we can be sure that we operate on a specific
address space that will not change from under us.

[ checkpatch fixes]
Signed-off-by: Christoph Lameter <>
Cc: "Eric W. Biederman" <>
Reported-by: Dave Hansen <>
Cc: Mel Gorman <>
Cc: Johannes Weiner <>
Cc: KOSAKI Motohiro <>
Cc: KAMEZAWA Hiroyuki <>
Cc: Hugh Dickins <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
4 years agoptrace: use fsuid, fsgid, effective creds for fs access checks
Jann Horn [Wed, 20 Jan 2016 23:00:04 +0000 (15:00 -0800)]
ptrace: use fsuid, fsgid, effective creds for fs access checks

commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[ fix warning]
Signed-off-by: Jann Horn <>
Acked-by: Kees Cook <>
Cc: Casey Schaufler <>
Cc: Oleg Nesterov <>
Cc: Ingo Molnar <>
Cc: James Morris <>
Cc: "Serge E. Hallyn" <>
Cc: Andy Shevchenko <>
Cc: Andy Lutomirski <>
Cc: Al Viro <>
Cc: "Eric W. Biederman" <>
Cc: Willy Tarreau <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2:
 - Drop changes to kcmp, procfs map_files, procfs has_pid_permissions()
 - Keep using uid_t, gid_t and == operator for IDs
 - Adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoxen: fix bio vec merging
Roger Pau Monne [Tue, 18 Jul 2017 14:01:00 +0000 (15:01 +0100)]
xen: fix bio vec merging

commit 462cdace790ac2ed6aad1b19c9c0af0143b6aab0 upstream.

The current test for bio vec merging is not fully accurate and can be
tricked into merging bios when certain grant combinations are used.
The result of these malicious bio merges is a bio that extends past
the memory page used by any of the originating bios.

Take into account the following scenario, where a guest creates two
grant references that point to the same mfn, ie: grant 1 -> mfn A,
grant 2 -> mfn A.

These references are then used in a PV block request, and mapped by
the backend domain, thus obtaining two different pfns that point to
the same mfn, pfn B -> mfn A, pfn C -> mfn A.

If those grants happen to be used in two consecutive sectors of a disk
IO operation becoming two different bios in the backend domain, the
checks in xen_biovec_phys_mergeable will succeed, because bfn1 == bfn2
(they both point to the same mfn). However due to the bio merging,
the backend domain will end up with a bio that expands past mfn A into
mfn A + 1.

Fix this by making sure the check in xen_biovec_phys_mergeable takes
into account the offset and the length of the bio, this basically
replicates whats done in __BIOVEC_PHYS_MERGEABLE using mfns (bus
addresses). While there also remove the usage of
__BIOVEC_PHYS_MERGEABLE, since that's already checked by the callers
of xen_biovec_phys_mergeable.

Reported-by: "Jan H. Schönherr" <>
Signed-off-by: Roger Pau Monné <>
Reviewed-by: Juergen Gross <>
Signed-off-by: Konrad Rzeszutek Wilk <>
[bwh: Backported to 3.2:
 - s/bfn/mfn/g
 - Adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoxfrm: policy: check policy direction value
Vladis Dronov [Wed, 2 Aug 2017 17:50:14 +0000 (19:50 +0200)]
xfrm: policy: check policy direction value

commit 7bab09631c2a303f87a7eb7e3d69e888673b9b7e upstream.

The 'dir' parameter in xfrm_migrate() is a user-controlled byte which is used
as an array index. This can lead to an out-of-bound access, kernel lockup and
DoS. Add a check for the 'dir' value.

This fixes CVE-2017-11600.

Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Reported-by: "bo Zhang" <>
Signed-off-by: Vladis Dronov <>
Signed-off-by: Steffen Klassert <>
Signed-off-by: Ben Hutchings <>
4 years agotcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
Wei Wang [Thu, 18 May 2017 18:22:33 +0000 (11:22 -0700)]
tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

commit 499350a5a6e7512d9ed369ed63a4244b6536f4f8 upstream.

When tcp_disconnect() is called, inet_csk_delack_init() sets
icsk->icsk_ack.rcv_mss to 0.
This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() =>
__tcp_select_window() call path to have division by 0 issue.
So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.

Reported-by: Andrey Konovalov <>
Signed-off-by: Wei Wang <>
Signed-off-by: Eric Dumazet <>
Signed-off-by: Neal Cardwell <>
Signed-off-by: Yuchung Cheng <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
4 years agotracing/kprobes: Allow to create probe with a module name starting with a digit
Sabrina Dubroca [Thu, 22 Jun 2017 09:24:42 +0000 (11:24 +0200)]
tracing/kprobes: Allow to create probe with a module name starting with a digit

commit 9e52b32567126fe146f198971364f68d3bc5233f upstream.

Always try to parse an address, since kstrtoul() will safely fail when
given a symbol as input. If that fails (which will be the case for a
symbol), try to parse a symbol instead.

This allows creating a probe such as:

    p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0

Which is necessary for this command to work:

    perf probe -m 8021q -a vlan_gro_receive

Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer")
Acked-by: Masami Hiramatsu <>
Signed-off-by: Sabrina Dubroca <>
Signed-off-by: Steven Rostedt (VMware) <>
[bwh: Backported to 3.2: preserve the check that an addresses isn't used for
 a kretprobe]
Signed-off-by: Ben Hutchings <>
4 years agoMIPS: Fix IRQ tracing & lockdep when rescheduling
Paul Burton [Fri, 3 Mar 2017 23:26:05 +0000 (15:26 -0800)]
MIPS: Fix IRQ tracing & lockdep when rescheduling

commit d8550860d910c6b7b70f830f59003b33daaa52c9 upstream.

When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler
from arch/mips/kernel/entry.S we disable interrupts. This is true
regardless of whether we reach work_resched from syscall_exit_work,
resume_userspace or by looping after calling schedule(). Although we
disable interrupts in these paths we don't call trace_hardirqs_off()
before calling into C code which may acquire locks, and we therefore
leave lockdep with an inconsistent view of whether interrupts are
both enabled.

Without tracing this interrupt state lockdep will print warnings such
as the following once a task returns from a syscall via
syscall_exit_partial with TIF_NEED_RESCHED set:

[   49.927678] ------------[ cut here ]------------
[   49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
[   49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
[   49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
[   49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
[   49.974431]         0000000000000000 0000000000000000 0000000000000000 000000000000004a
[   49.985300]         ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
[   49.996194]         0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
[   50.007063]         000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
[   50.017945]         0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
[   50.028827]         0000000000000000 0000000000000001 0000000000000000 0000000000000000
[   50.039688]         0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
[   50.050575]         00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
[   50.061448]         0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
[   50.072327]         ...
[   50.076087] Call Trace:
[   50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
[   50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
[   50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
[   50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
[   50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
[   50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
[   50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
[   50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
[   50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
[   50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
[   50.148405] possible reason: unannotated irqs-off.
[   50.154600] irq event stamp: 400463
[   50.159566] hardirqs last  enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
[   50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
[   50.183897] softirqs last  enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
[   50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128

Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off()
when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking
schedule() following the work_resched label because:

 1) Interrupts are disabled regardless of the path we take to reach
    work_resched() & schedule().

 2) Performing the tracing here avoids the need to do it in paths which
    disable interrupts but don't call out to C code before hitting a
    path which uses the RESTORE_SOME macro that will call
    trace_hardirqs_on() or trace_hardirqs_off() as appropriate.

We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling
syscall_trace_leave() for similar reasons, ensuring that lockdep has a
consistent view of state after we re-enable interrupts.

Signed-off-by: Paul Burton <>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ralf Baechle <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agonet: prevent sign extension in dev_get_stats()
Eric Dumazet [Tue, 27 Jun 2017 14:02:20 +0000 (07:02 -0700)]
net: prevent sign extension in dev_get_stats()

commit 6f64ec74515925cced6df4571638b5a099a49aae upstream.

Similar to the fix provided by Dominik Heidler in commit
9b3dc0a17d73 ("l2tp: cast l2tp traffic counter to unsigned")
we need to take care of 32bit kernels in dev_get_stats().

When using atomic_long_read(), we add a 'long' to u64 and
might misinterpret high order bit, unless we cast to unsigned.

Fixes: caf586e5f23ce ("net: add a core netdev->rx_dropped counter")
Fixes: 015f0688f57ca ("net: net: add a core netdev->tx_dropped counter")
Fixes: 6e7333d315a76 ("net: add rx_nohandler stat counter")
Signed-off-by: Eric Dumazet <>
Cc: Jarod Wilson <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: only rx_dropped is updated here]
Signed-off-by: Ben Hutchings <>
4 years agolib/cmdline.c: fix get_options() overflow while parsing ranges
Ilya Matveychikov [Fri, 23 Jun 2017 22:08:49 +0000 (15:08 -0700)]
lib/cmdline.c: fix get_options() overflow while parsing ranges

commit a91e0f680bcd9e10c253ae8b62462a38bd48f09f upstream.

When using get_options() it's possible to specify a range of numbers,
like 1-100500.  The problem is that it doesn't track array size while
calling internally to get_range() which iterates over the range and
fills the memory with numbers.

Signed-off-by: Ilya V. Matveychikov <>
Cc: Jonathan Corbet <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
4 years agoautofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
NeilBrown [Fri, 23 Jun 2017 22:08:43 +0000 (15:08 -0700)]
autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL

commit 9fa4eb8e490a28de40964b1b0e583d8db4c7e57c upstream.

If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
autofs4_d_automount() will return


with that status to follow_automount(), which will then dereference an
invalid pointer.

So treat a positive status the same as zero, and map to ENOENT.

See comment in systemd src/core/automount.c::automount_send_ready().

Signed-off-by: NeilBrown <>
Cc: Ian Kent <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
Signed-off-by: Ben Hutchings <>
4 years agopowerpc/64: Initialise thread_info for emergency stacks
Nicholas Piggin [Wed, 21 Jun 2017 05:58:29 +0000 (15:58 +1000)]
powerpc/64: Initialise thread_info for emergency stacks

commit 34f19ff1b5a0d11e46df479623d6936460105c9f upstream.

Emergency stacks have their thread_info mostly uninitialised, which in
particular means garbage preempt_count values.

Emergency stack code runs with interrupts disabled entirely, and is
used very rarely, so this has been unnoticed so far. It was found by a
proposed new powerpc watchdog that takes a soft-NMI directly from the
masked_interrupt handler and using the emergency stack. That crashed
at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be

To fix this, zero the entire THREAD_SIZE allocation, and initialize
the thread_info.

Reported-by: Abdul Haleem <>
Signed-off-by: Nicholas Piggin <>
[mpe: Move it all into setup_64.c, use a function not a macro. Fix
      crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
Signed-off-by: Michael Ellerman <>
[bwh: Backported to 3.2:
 - There's only one emergency stack
 - No need to call klp_init_thread_info()
 - Add the ti variable in emergency_stack_init()]
Signed-off-by: Ben Hutchings <>
4 years agoipv6: avoid unregistering inet6_dev for loopback
WANG Cong [Wed, 21 Jun 2017 21:34:58 +0000 (14:34 -0700)]
ipv6: avoid unregistering inet6_dev for loopback

commit 60abc0be96e00ca71bac083215ac91ad2e575096 upstream.

The per netns loopback_dev->ip6_ptr is unregistered and set to
NULL when its mtu is set to smaller than IPV6_MIN_MTU, this
leads to that we could set rt->rt6i_idev NULL after a
rt6_uncached_list_flush_dev() and then crash after another

In this case we should just bring its inet6_dev down, rather
than unregistering it, at least prior to commit 176c39af29bc
("netns: fix addrconf_ifdown kernel panic") we always
override the case for loopback.

Thanks a lot to Andrey for finding a reliable reproducer.

Fixes: 176c39af29bc ("netns: fix addrconf_ifdown kernel panic")
Reported-by: Andrey Konovalov <>
Cc: Andrey Konovalov <>
Cc: Daniel Lezcano <>
Cc: David Ahern <>
Signed-off-by: Cong Wang <>
Acked-by: David Ahern <>
Tested-by: Andrey Konovalov <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: the NETDEV_CHANGEMTU case used to fall-through to the
 NETDEV_DOWN case here, so replace that with a separate call to addrconf_ifdown()]
Signed-off-by: Ben Hutchings <>
4 years agortnetlink: add IFLA_GROUP to ifla_policy
Serhey Popovych [Tue, 20 Jun 2017 11:35:23 +0000 (14:35 +0300)]
rtnetlink: add IFLA_GROUP to ifla_policy

commit db833d40ad3263b2ee3b59a1ba168bb3cfed8137 upstream.

Network interface groups support added while ago, however
there is no IFLA_GROUP attribute description in policy
and netlink message size calculations until now.

Add IFLA_GROUP attribute to the policy.

Fixes: cbda10fa97d7 ("net_device: add support for network device groups")
Signed-off-by: Serhey Popovych <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agodrm/radeon: add a quirk for Toshiba Satellite L20-183
Alex Deucher [Mon, 19 Jun 2017 19:59:58 +0000 (15:59 -0400)]
drm/radeon: add a quirk for Toshiba Satellite L20-183

commit acfd6ee4fa7ebeee75511825fe02be3f7ac1d668 upstream.

Fixes resume from suspend.

Reported-by: Przemek <>
Signed-off-by: Alex Deucher <>
Signed-off-by: Ben Hutchings <>
4 years agoInput: i8042 - add Fujitsu Lifebook AH544 to notimeout list
Daniel Drake [Tue, 20 Jun 2017 02:48:52 +0000 (19:48 -0700)]
Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list

commit 817ae460c784f32cd45e60b2b1b21378c3c6a847 upstream.

Without this quirk, the touchpad is not responsive on this product, with
the following message repeated in the logs:

 psmouse serio1: bad data from KBC - timeout

Add it to the notimeout list alongside other similar Fujitsu laptops.

Signed-off-by: Daniel Drake <>
Signed-off-by: Dmitry Torokhov <>
Signed-off-by: Ben Hutchings <>
4 years agosignal: Only reschedule timers on signals timers have sent
Eric W. Biederman [Tue, 13 Jun 2017 09:31:16 +0000 (04:31 -0500)]
signal: Only reschedule timers on signals timers have sent

commit 57db7e4a2d92c2d3dfbca4ef8057849b2682436b upstream.

Thomas Gleixner  wrote:
> The CRIU support added a 'feature' which allows a user space task to send
> arbitrary (kernel) signals to itself. The changelog says:
>   The kernel prevents sending of siginfo with positive si_code, because
>   these codes are reserved for kernel.  I think we can allow a task to
>   send such a siginfo to itself.  This operation should not be dangerous.
> Quite contrary to that claim, it turns out that it is outright dangerous
> for signals with info->si_code == SI_TIMER. The following code sequence in
> a user space task allows to crash the kernel:
>    id = timer_create(CLOCK_XXX, ..... signo = SIGX);
>    timer_set(id, ....);
>    info->si_signo = SIGX;
>    info->si_code = SI_TIMER:
>    info->_sifields._timer._tid = id;
>    info->_sifields._timer._sys_private = 2;
>    rt_[tg]sigqueueinfo(..., SIGX, info);
>    sigemptyset(&sigset);
>    sigaddset(&sigset, SIGX);
>    rt_sigtimedwait(sigset, info);
> results in a kernel crash because sigwait() dequeues the signal and the
> dequeue code observes:
>   info->si_code == SI_TIMER && info->_sifields._timer._sys_private != 0
> which triggers the following callchain:
>  do_schedule_next_timer() -> posix_cpu_timer_schedule() -> arm_timer()
> arm_timer() executes a list_add() on the timer, which is already armed via
> the timer_set() syscall. That's a double list add which corrupts the posix
> cpu timer list. As a consequence the kernel crashes on the next operation
> touching the posix cpu timer list.
> Posix clocks which are internally implemented based on hrtimers are not
> affected by this because hrtimer_start() can handle already armed timers
> nicely, but it's a reliable way to trigger the WARN_ON() in
> hrtimer_forward(), which complains about calling that function on an
> already armed timer.

This problem has existed since the posix timer code was merged into
2.5.63. A few releases earlier in 2.5.60 ptrace gained the ability to
inject not just a signal (which linux has supported since 1.0) but the
full siginfo of a signal.

The core problem is that the code will reschedule in response to
signals getting dequeued not just for signals the timers sent but
for other signals that happen to a si_code of SI_TIMER.

Avoid this confusion by testing to see if the queued signal was
preallocated as all timer signals are preallocated, and so far
only the timer code preallocates signals.

Move the check for if a timer needs to be rescheduled up into
collect_signal where the preallocation check must be performed,
and pass the result back to dequeue_signal where the code reschedules
timers.   This makes it clear why the code cares about preallocated

Reported-by: Thomas Gleixner <>
History Tree:
Reference: 66dd34ad31e5 ("signal: allow to send any siginfo to itself")
Reference: 1669ce53e2ff ("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO")
Fixes: db8b50ba75f2 ("[PATCH] POSIX clocks & timers")
Signed-off-by: "Eric W. Biederman" <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoswap: cond_resched in swap_cgroup_prepare()
Yu Zhao [Fri, 16 Jun 2017 21:02:31 +0000 (14:02 -0700)]
swap: cond_resched in swap_cgroup_prepare()

commit ef70762948dde012146926720b70e79736336764 upstream.

I saw need_resched() warnings when swapping on large swapfile (TBs)
because continuously allocating many pages in swap_cgroup_prepare() took
too long.

We already cond_resched when freeing page in swap_cgroup_swapoff().  Do
the same for the page allocation.

Signed-off-by: Yu Zhao <>
Acked-by: Michal Hocko <>
Acked-by: Vladimir Davydov <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <>
4 years agopowerpc/kprobes: Pause function_graph tracing during jprobes handling
Naveen N. Rao [Thu, 1 Jun 2017 10:48:15 +0000 (16:18 +0530)]
powerpc/kprobes: Pause function_graph tracing during jprobes handling

commit a9f8553e935f26cb5447f67e280946b0923cd2dc upstream.

This fixes a crash when function_graph and jprobes are used together.
This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix
conflict between jprobes and function graph tracing"), but for powerpc.

Jprobes breaks function_graph tracing since the jprobe hook needs to use
jprobe_return(), which never returns back to the hook, but instead to
the original jprobe'd function. The solution is to momentarily pause
function_graph tracing before invoking the jprobe hook and re-enable it
when returning back to the original jprobe'd function.

Fixes: 6794c78243bf ("powerpc64: port of the function graph tracer")
Signed-off-by: Naveen N. Rao <>
Acked-by: Masami Hiramatsu <>
Acked-by: Steven Rostedt (VMware) <>
Signed-off-by: Michael Ellerman <>
[bwh: Backported to 3.2: include <linux/ftrace.h>, which apparently gets
 included indirectly upstream]
Signed-off-by: Ben Hutchings <>
4 years agoxfrm: NULL dereference on allocation failure
Dan Carpenter [Wed, 14 Jun 2017 10:35:37 +0000 (13:35 +0300)]
xfrm: NULL dereference on allocation failure

commit e747f64336fc15e1c823344942923195b800aa1e upstream.

The default error code in pfkey_msg2xfrm_state() is -ENOBUFS.  We
added a new call to security_xfrm_state_alloc() which sets "err" to zero
so there several places where we can return ERR_PTR(0) if kmalloc()
fails.  The caller is expecting error pointers so it leads to a NULL

Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
Signed-off-by: Dan Carpenter <>
Signed-off-by: Steffen Klassert <>
Signed-off-by: Ben Hutchings <>
4 years agoxfrm: Oops on error in pfkey_msg2xfrm_state()
Dan Carpenter [Wed, 14 Jun 2017 10:34:05 +0000 (13:34 +0300)]
xfrm: Oops on error in pfkey_msg2xfrm_state()

commit 1e3d0c2c70cd3edb5deed186c5f5c75f2b84a633 upstream.

There are some missing error codes here so we accidentally return NULL
instead of an error pointer.  It results in a NULL pointer dereference.

Fixes: df71837d5024 ("[LSM-IPSec]: Security association restriction.")
Signed-off-by: Dan Carpenter <>
Signed-off-by: Steffen Klassert <>
Signed-off-by: Ben Hutchings <>
4 years agoselinux: fix double free in selinux_parse_opts_str()
Paul Moore [Wed, 7 Jun 2017 20:48:19 +0000 (16:48 -0400)]
selinux: fix double free in selinux_parse_opts_str()

commit 023f108dcc187e34ef864bf10ed966cf25e14e2a upstream.

This patch is based on a discussion generated by an earlier patch
from Tetsuo Handa:


The double free problem involves the mnt_opts field of the
security_mnt_opts struct, selinux_parse_opts_str() frees the memory
on error, but doesn't set the field to NULL so if the caller later
attempts to call security_free_mnt_opts() we trigger the problem.

In order to play it safe we change selinux_parse_opts_str() to call
security_free_mnt_opts() on error instead of free'ing the memory
directly.  This should ensure that everything is handled correctly,
regardless of what the caller may do.

Fixes: e0007529893c1c06 ("LSM/SELinux: Interfaces to allow FS to control mount options")
Cc: Tetsuo Handa <>
Reported-by: Dmitry Vyukov <>
Signed-off-by: Paul Moore <>
Signed-off-by: James Morris <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agousb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
Corentin Labbe [Fri, 9 Jun 2017 11:48:41 +0000 (14:48 +0300)]
usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk

commit d2f48f05cd2a2a0a708fbfa45f1a00a87660d937 upstream.

When plugging an USB webcam I see the following message:
[106385.615559] xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
[106390.583860] handle_tx_event: 913 callbacks suppressed

With this patch applied, I get no more printing of this message.

Signed-off-by: Corentin Labbe <>
Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoconfigfs: Fix race between create_link and configfs_rmdir
Nicholas Bellinger [Thu, 8 Jun 2017 04:51:54 +0000 (04:51 +0000)]
configfs: Fix race between create_link and configfs_rmdir

commit ba80aa909c99802c428682c352b0ee0baac0acd3 upstream.

This patch closes a long standing race in configfs between
the creation of a new symlink in create_link(), while the
symlink target's config_item is being concurrently removed
via configfs_rmdir().

This can happen because the symlink target's reference
is obtained by config_item_get() in create_link() before
the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep()
during configfs_rmdir() shutdown is actually checked..

This originally manifested itself on ppc64 on v4.8.y under
heavy load using ibmvscsi target ports with Novalink API:

[ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
[ 7879.893760] ------------[ cut here ]------------
[ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
[ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O 4.8.17-customv2.22 #12
[ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
[ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
[ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G O     (4.8.17-customv2.22)
[ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
[ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
                GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
                GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
                GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
                GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
                GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
                GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
                GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
                GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
[ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
[ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893842] Call Trace:
[ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
[ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
[ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
[ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
[ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
[ 7879.893856] Instruction dump:
[ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
[ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe000004bffffd0 60000000 60000000
[ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---

To close this race, go ahead and obtain the symlink's target
config_item reference only after the existing CONFIGFS_USET_DROPPING
check succeeds.

This way, if configfs_rmdir() wins create_link() will return -ENONET,
and if create_link() wins configfs_rmdir() will return -EBUSY.

Reported-by: Bryant G. Ly <>
Tested-by: Bryant G. Ly <>
Signed-off-by: Nicholas Bellinger <>
Signed-off-by: Christoph Hellwig <>
Signed-off-by: Ben Hutchings <>
4 years agoKVM: async_pf: avoid async pf injection when in guest mode
Wanpeng Li [Fri, 9 Jun 2017 03:13:40 +0000 (20:13 -0700)]
KVM: async_pf: avoid async pf injection when in guest mode

commit 9bc1f09f6fa76fdf31eb7d6a4a4df43574725f93 upstream.

 INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
       Not tainted 4.12.0-rc4+ #8
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gnome-terminal- D    0  1734   1015 0x00000000
 Call Trace:
  ? __vfs_read+0x37/0x150
  ? prepare_to_swait+0x22/0x70
  ? do_async_page_fault+0x77/0xb0

This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
and then gives stress to memory on L1, I can observed this hang on L1 when
at least ~70% swap area is occupied on L0.

This is due to async pf was injected to L2 which should be injected to L1,
L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.

This patch fixes the hang by doing async pf when executing L1 guest.

Cc: Paolo Bonzini <>
Cc: Radim Krčmář <>
Signed-off-by: Wanpeng Li <>
Signed-off-by: Paolo Bonzini <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoexcessive checks in ufs_write_failed() and ufs_evict_inode()
Al Viro [Fri, 9 Jun 2017 20:20:34 +0000 (16:20 -0400)]
excessive checks in ufs_write_failed() and ufs_evict_inode()

commit babef37dccbaa49249a22bae9150686815d7be71 upstream.

As it is, short copy in write() to append-only file will fail
to truncate the excessive allocated blocks.  As the matter of
fact, all checks in ufs_truncate_blocks() are either redundant
or wrong for that caller.  As for the only other caller
(ufs_evict_inode()), we only need the file type checks there.

Signed-off-by: Al Viro <>
[bwh: Backported to 3.2:
 - No functions need to be renamed
 - Adjust filenames, context]
Signed-off-by: Ben Hutchings <>
4 years agoufs: set correct ->s_maxsize
Al Viro [Fri, 9 Jun 2017 01:15:45 +0000 (21:15 -0400)]
ufs: set correct ->s_maxsize

commit 6b0d144fa758869bdd652c50aa41aaf601232550 upstream.

Signed-off-by: Al Viro <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agofix ufs_isblockset()
Al Viro [Thu, 8 Jun 2017 22:15:18 +0000 (18:15 -0400)]
fix ufs_isblockset()

commit 414cf7186dbec29bd946c138d6b5c09da5955a08 upstream.

Signed-off-by: Al Viro <>
Signed-off-by: Ben Hutchings <>
4 years agoKEYS: fix dereferencing NULL payload with nonzero length
Eric Biggers [Thu, 8 Jun 2017 13:48:40 +0000 (14:48 +0100)]
KEYS: fix dereferencing NULL payload with nonzero length

commit 5649645d725c73df4302428ee4e02c869248b4c5 upstream.

sys_add_key() and the KEYCTL_UPDATE operation of sys_keyctl() allowed a
NULL payload with nonzero length to be passed to the key type's
->preparse(), ->instantiate(), and/or ->update() methods.  Various key
types including asymmetric, cifs.idmap, cifs.spnego, and pkcs7_test did
not handle this case, allowing an unprivileged user to trivially cause a
NULL pointer dereference (kernel oops) if one of these key types was
present.  Fix it by doing the copy_from_user() when 'plen' is nonzero
rather than when '_payload' is non-NULL, causing the syscall to fail
with EFAULT as expected when an invalid buffer is specified.

Signed-off-by: Eric Biggers <>
Signed-off-by: David Howells <>
Signed-off-by: James Morris <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoMIPS: kprobes: flush_insn_slot should flush only if probe initialised
Marcin Nowakowski [Thu, 8 Jun 2017 13:20:32 +0000 (15:20 +0200)]
MIPS: kprobes: flush_insn_slot should flush only if probe initialised

commit 698b851073ddf5a894910d63ca04605e0473414e upstream.

When ftrace is used with kprobes, it is possible for a kprobe to contain
an invalid location (ie. only initialised to 0 and not to a specific
location in the code). Trying to perform a cache flush on such location
leads to a crash r4k_flush_icache_range().

Fixes: c1bf207d6ee1 ("MIPS: kprobe: Add support.")
Signed-off-by: Marcin Nowakowski <>
Signed-off-by: Ralf Baechle <>
Signed-off-by: Ben Hutchings <>
4 years agoKVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
Wanpeng Li [Thu, 8 Jun 2017 08:22:07 +0000 (01:22 -0700)]
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation

commit a3641631d14571242eec0d30c9faa786cbf52d44 upstream.

If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it
potentially can be exploited the vulnerability. this will out-of-bounds
read and write.  Luckily, the effect is small:

/* when no next entry is found, the current entry[i] is reselected */
for (j = i + 1; ; j = (j + 1) % nent) {
struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
if (ej->function == e->function) {

It reads ej->maxphyaddr, which is user controlled.  However...


After cpuid_entries there is

int maxphyaddr;
struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */

So we have:

- cpuid_entries at offset 1B50 (6992)
- maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
- padding at 27D4...27DF
- emulate_ctxt at 27E0

And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
would have been much worse.

This patch fixes it by modding the index to avoid the out-of-bounds
access. Worst case, i == j and ej->function == e->function,
the loop can bail out.

Reported-by: Moguofang <>
Cc: Paolo Bonzini <>
Cc: Radim Krčmář <>
Cc: Guofang Mo <>
Signed-off-by: Wanpeng Li <>
Signed-off-by: Paolo Bonzini <>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <>
4 years agoperf script python: Remove dups in documentation examples
SeongJae Park [Tue, 30 May 2017 11:18:25 +0000 (20:18 +0900)]
perf script python: Remove dups in documentation examples

commit 14fc42fa1b3e7ea5160c84d0e686a3a0c1ffe619 upstream.

Few shell command examples in perf-script-python.txt has few nitpicks

- tools/perf/scripts/python directory listing command is unnecessarily
- few examples contain additional information in command prompt
  unnecessarily and inconsistently.

This commit fixes them to enhance readability of the document.

Signed-off-by: SeongJae Park <>
Cc: Alexander Shishkin <>
Cc: Frederic Weisbecker <>
Cc: Peter Zijlstra <>
Cc: Tom Zanussi <>
Fixes: cff68e582237 ("perf/scripts: Add perf-trace-python Documentation")
Signed-off-by: Arnaldo Carvalho de Melo <>
Signed-off-by: Ben Hutchings <>
4 years agoperf script python: Updated trace_unhandled() signature
SeongJae Park [Tue, 30 May 2017 11:18:27 +0000 (20:18 +0900)]
perf script python: Updated trace_unhandled() signature

commit 1bf8d5a4a5da19b1f6e7958fe67db4118fa7a1c1 upstream.

Default function signature of trace_unhandled() got changed to include a
field dict, but its documentation, perf-script-python.txt has not been
updated.  Fix it.

Signed-off-by: SeongJae Park <>
Cc: Alexander Shishkin <>
Cc: Peter Zijlstra <>
Cc: Pierre Tardy <>
Fixes: c02514850d67 ("perf scripts python: Give field dict to unhandled callback")
Signed-off-by: Arnaldo Carvalho de Melo <>
Signed-off-by: Ben Hutchings <>
4 years agoperf script python: Fix wrong code snippets in documentation
SeongJae Park [Tue, 30 May 2017 11:18:26 +0000 (20:18 +0900)]
perf script python: Fix wrong code snippets in documentation

commit 26ddb8722df865aa67fbe459107d2f3f8e5c6829 upstream.

This commit fixes wrong code snippets for trace_begin() and trace_end()
function example definition.

Signed-off-by: SeongJae Park <>
Cc: Alexander Shishkin <>
Cc: Frederic Weisbecker <>
Cc: Peter Zijlstra <>
Cc: Tom Zanussi <>
Fixes: cff68e582237 ("perf/scripts: Add perf-trace-python Documentation")
Signed-off-by: Arnaldo Carvalho de Melo <>
Signed-off-by: Ben Hutchings <>
4 years agoperf script: Fix documentation errors
SeongJae Park [Tue, 30 May 2017 11:18:24 +0000 (20:18 +0900)]
perf script: Fix documentation errors

commit 34d4453dac257be53c21abf2f713c992fb692b5c upstream.

This commit fixes two errors in documents for perf-script-python and
perf-script-perl as below:

- /sys/kernel/debug/tracing events -> /sys/kernel/debug/tracing/events/
- trace_handled -> trace_unhandled

Signed-off-by: SeongJae Park <>
Cc: Alexander Shishkin <>
Cc: Frederic Weisbecker <>
Cc: Peter Zijlstra <>
Cc: Tom Zanussi <>
Fixes: cff68e582237 ("perf/scripts: Add perf-trace-python Documentation")
Signed-off-by: Arnaldo Carvalho de Melo <>
Signed-off-by: Ben Hutchings <>
4 years agoperf script: Fix outdated comment for perf-trace-python
SeongJae Park [Tue, 30 May 2017 11:18:23 +0000 (20:18 +0900)]
perf script: Fix outdated comment for perf-trace-python

commit c76132dc5182776b98e946d674cb41c421661ea9 upstream.

Script generated by the '--gen-script' option contains an outdated
comment. It mentions a 'perf-trace-python' document while it has been
renamed to 'perf-script-python'. Fix it.

Signed-off-by: SeongJae Park <>
Cc: Alexander Shishkin <>
Cc: Peter Zijlstra <>
Cc: Thomas Gleixner <>
Fixes: 133dc4c39c57 ("perf: Rename 'perf trace' to 'perf script'")
Signed-off-by: Arnaldo Carvalho de Melo <>
Signed-off-by: Ben Hutchings <>
4 years agoperf probe: Fix examples section of documentation
SeongJae Park [Sun, 7 May 2017 10:36:42 +0000 (19:36 +0900)]
perf probe: Fix examples section of documentation

commit d89269a89ebb6a74512f3f40e89cd12017f60a75 upstream.

An example in perf-probe documentation for pattern of function name
based probe addition is not providing example command for that case.

This commit fixes the example to give appropriate example command.

Signed-off-by: SeongJae Park <>
Acked-by: Masami Hiramatsu <>
Cc: Peter Zijlstra <>
Cc: Taeung Song <>
Fixes: ee391de876ae ("perf probe: Update perf probe document")
Signed-off-by: Arnaldo Carvalho de Melo <>
Signed-off-by: Ben Hutchings <>
4 years agodrm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
Dan Carpenter [Thu, 27 Apr 2017 09:12:08 +0000 (12:12 +0300)]
drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()

commit f0c62e9878024300319ba2438adc7b06c6b9c448 upstream.

If vmalloc() fails then we need to a bit of cleanup before returning.

Fixes: fb1d9738ca05 ("drm/vmwgfx: Add DRM driver for VMware Virtual GPU")
Signed-off-by: Dan Carpenter <>
Reviewed-by: Sinclair Yeh <>
Signed-off-by: Ben Hutchings <>
4 years agonet: ethoc: enable NAPI before poll may be scheduled
Max Filippov [Tue, 6 Jun 2017 01:31:16 +0000 (18:31 -0700)]
net: ethoc: enable NAPI before poll may be scheduled

commit d220b942a4b6a0640aee78841608f4aa5e8e185e upstream.

ethoc_reset enables device interrupts, ethoc_interrupt may schedule a
NAPI poll before NAPI is enabled in the ethoc_open, which results in
device being unable to send or receive anything until it's closed and
reopened. In case the device is flooded with ingress packets it may be
unable to recover at all.
Move napi_enable above ethoc_reset in the ethoc_open to fix that.

Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Max Filippov <>
Reviewed-by: Tobias Klauser <>
Reviewed-by: Florian Fainelli <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agovb2: Fix an off by one error in 'vb2_plane_vaddr'
Christophe JAILLET [Fri, 28 Apr 2017 04:51:40 +0000 (01:51 -0300)]
vb2: Fix an off by one error in 'vb2_plane_vaddr'

commit 5ebb6dd36c9f5fb37b1077b393c254d70a14cb46 upstream.

We should ensure that 'plane_no' is '< vb->num_planes' as done in
'vb2_plane_cookie' just a few lines below.

Fixes: e23ccc0ad925 ("[media] v4l: add videobuf2 Video for Linux 2 driver framework")

Signed-off-by: Christophe JAILLET <>
Reviewed-by: Sakari Ailus <>
Signed-off-by: Hans Verkuil <>
Signed-off-by: Mauro Carvalho Chehab <>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <>
4 years agovb2: fix plane index sanity check in vb2_plane_cookie()
Zhaowei Yuan [Fri, 22 Aug 2014 02:28:21 +0000 (23:28 -0300)]
vb2: fix plane index sanity check in vb2_plane_cookie()

commit a9ae4692eda4b99f85757b15d60971ff78a0a0e2 upstream.

It's also invalid when plane_no is equal to vb->num_planes

Signed-off-by: Zhaowei Yuan <>
Signed-off-by: Hans Verkuil <>
Signed-off-by: Mauro Carvalho Chehab <>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <>
4 years agonet: ping: do not abuse udp_poll()
Eric Dumazet [Sat, 3 Jun 2017 16:29:25 +0000 (09:29 -0700)]
net: ping: do not abuse udp_poll()

commit 77d4b1d36926a9b8387c6b53eeba42bcaaffcea3 upstream.

Alexander reported various KASAN messages triggered in recent kernels

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <>
Reported-by: Sasha Levin <>
Cc: Solar Designer <>
Cc: Vasiliy Kulikov <>
Cc: Lorenzo Colitti <>
Acked-By: Lorenzo Colitti <>
Tested-By: Lorenzo Colitti <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2:
 - Drop IPv6 bits
 - Adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoipv6: Fix leak in ipv6_gso_segment().
David S. Miller [Mon, 5 Jun 2017 01:41:10 +0000 (21:41 -0400)]
ipv6: Fix leak in ipv6_gso_segment().

commit e3e86b5119f81e5e2499bea7ea1ebe8ac6aab789 upstream.

If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <>
Signed-off-by: David S. Miller <>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <>
4 years agonet: add kfree_skb_list()
Ben Hutchings [Sun, 18 Jun 2017 01:36:32 +0000 (02:36 +0100)]
net: add kfree_skb_list()

Extracted from upstream commit bd8a7036c06c "gre: fix a possible skb leak".

This patch adds a kfree_skb_list() helper.

Signed-off-by: Ben Hutchings <>
4 years agorc-core: race condition during ir_raw_event_register()
Sean Young [Wed, 24 May 2017 09:24:51 +0000 (06:24 -0300)]
rc-core: race condition during ir_raw_event_register()

commit 963761a0b2e85663ee4a5630f72930885a06598a upstream.

A rc device can call ir_raw_event_handle() after rc_allocate_device(),
but before rc_register_device() has completed. This is racey because
rcdev->raw is set before rcdev->raw->thread has a valid value.

Reported-by: kbuild test robot <>
Signed-off-by: Sean Young <>
Signed-off-by: Mauro Carvalho Chehab <>
[bwh: Backported to 3.2: adjust filename, context, indentation]
Signed-off-by: Ben Hutchings <>
4 years agoalarmtimer: Rate limit periodic intervals
Thomas Gleixner [Tue, 30 May 2017 21:15:35 +0000 (23:15 +0200)]
alarmtimer: Rate limit periodic intervals

commit ff86bf0c65f14346bf2440534f9ba5ac232c39a0 upstream.

The alarmtimer code has another source of potentially rearming itself too
fast. Interval timers with a very samll interval have a similar CPU hog
effect as the previously fixed overflow issue.

The reason is that alarmtimers do not implement the normal protection
against this kind of problem which the other posix timer use:

  timer expires -> queue signal -> deliver signal -> rearm timer

This scheme brings the rearming under scheduler control and prevents
permanently firing timers which hog the CPU.

Bringing this scheme to the alarm timer code is a major overhaul because it
lacks all the necessary mechanisms completely.

So for a quick fix limit the interval to one jiffie. This is not
problematic in practice as alarmtimers are usually backed by an RTC for
suspend which have 1 second resolution. It could be therefor argued that
the resolution of this clock should be set to 1 second in general, but
that's outside the scope of this fix.

Signed-off-by: Thomas Gleixner <>
Cc: Peter Zijlstra <>
Cc: Kostya Serebryany <>
Cc: syzkaller <>
Cc: John Stultz <>
Cc: Dmitry Vyukov <>
[bwh: Backported to 3.2:
 - Use ktime_to_ns()/ktime_set() as ktime_t is not scalar
 - Adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoalarmtimer: Prevent overflow of relative timers
Thomas Gleixner [Tue, 30 May 2017 21:15:34 +0000 (23:15 +0200)]
alarmtimer: Prevent overflow of relative timers

commit f4781e76f90df7aec400635d73ea4c35ee1d4765 upstream.

Andrey reported a alartimer related RCU stall while fuzzing the kernel with

The reason for this is an overflow in ktime_add() which brings the
resulting time into negative space and causes immediate expiry of the
timer. The following rearm with a small interval does not bring the timer
back into positive space due to the same issue.

This results in a permanent firing alarmtimer which hogs the CPU.

Use ktime_add_safe() instead which detects the overflow and clamps the
result to KTIME_SEC_MAX.

Reported-by: Andrey Konovalov <>
Signed-off-by: Thomas Gleixner <>
Cc: Peter Zijlstra <>
Cc: Kostya Serebryany <>
Cc: syzkaller <>
Cc: John Stultz <>
Cc: Dmitry Vyukov <>
[bwh: Backported to 3.2: drop change in alarm_start_relative()]
Signed-off-by: Ben Hutchings <>
4 years agodrivers: char: mem: Fix wraparound check to allow mappings up to the end
Julius Werner [Fri, 2 Jun 2017 22:36:39 +0000 (15:36 -0700)]
drivers: char: mem: Fix wraparound check to allow mappings up to the end

commit 32829da54d9368103a2f03269a5120aa9ee4d5da upstream.

A recent fix to /dev/mem prevents mappings from wrapping around the end
of physical address space. However, the check was written in a way that
also prevents a mapping reaching just up to the end of physical address
space, which may be a valid use case (especially on 32-bit systems).
This patch fixes it by checking the last mapped address (instead of the
first address behind that) for overflow.

Fixes: b299cde245 ("drivers: char: mem: Check for address space wraparound with mmap()")
Reported-by: Nico Huber <>
Signed-off-by: Julius Werner <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
4 years agoipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
Ben Hutchings [Wed, 31 May 2017 12:15:41 +0000 (13:15 +0100)]
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()

commit 6e80ac5cc992ab6256c3dae87f7e57db15e1a58c upstream.

xfrm6_find_1stfragopt() may now return an error code and we must
not treat it as a length.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Signed-off-by: Ben Hutchings <>
Acked-by: Craig Gallek <>
Signed-off-by: David S. Miller <>
4 years agousb: gadget: f_mass_storage: Serialize wake and sleep execution
Thinh Nguyen [Fri, 12 May 2017 00:26:48 +0000 (17:26 -0700)]
usb: gadget: f_mass_storage: Serialize wake and sleep execution

commit dc9217b69dd6089dcfeb86ed4b3c671504326087 upstream.

f_mass_storage has a memorry barrier issue with the sleep and wake
functions that can cause a deadlock. This results in intermittent hangs
during MSC file transfer. The host will reset the device after receiving
no response to resume the transfer. This issue is seen when dwc3 is
processing 2 transfer-in-progress events at the same time, invoking
completion handlers for CSW and CBW. Also this issue occurs depending on
the system timing and latency.

To increase the chance to hit this issue, you can force dwc3 driver to
wait and process those 2 events at once by adding a small delay (~100us)
in dwc3_check_event_buf() whenever the request is for CSW and read the
event count again. Avoid debugging with printk and ftrace as extra
delays and memory barrier will mask this issue.

Scenario which can lead to failure:
1) The main thread sleeps and waits for the next command in
2) bulk_in_complete() wakes up main thread for CSW.
3) bulk_out_complete() tries to wake up the running main thread for CBW.
4) thread_wakeup_needed is not loaded with correct value in
5) Main thread goes to sleep again.

The pattern is shown below. Note the 2 critical variables.
 * common->thread_wakeup_needed
 * bh->state

CPU 0 (sleep_thread) CPU 1 (wakeup_thread)
==============================  ===============================

bh->state = BH_STATE_FULL;
thread_wakeup_needed = 0; thread_wakeup_needed = 1;
if (bh->state != BH_STATE_FULL)
sleep again ...

As pointed out by Alan Stern, this is an R-pattern issue. The issue can
be seen when there are two wakeups in quick succession. The
thread_wakeup_needed can be overwritten in sleep_thread, and the read of
the bh->state maybe reordered before the write to thread_wakeup_needed.

This patch applies full memory barrier smp_mb() in both sleep_thread()
and wakeup_thread() to ensure the order which the thread_wakeup_needed
and bh->state are written and loaded.

However, a better solution in the future would be to use wait_queue
method that takes care of managing memory barrier between waker and

Acked-by: Alan Stern <>
Signed-off-by: Thinh Nguyen <>
Signed-off-by: Felipe Balbi <>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <>
4 years agonet: phy: fix marvell phy status reading
Russell King [Tue, 30 May 2017 15:21:51 +0000 (16:21 +0100)]
net: phy: fix marvell phy status reading

commit 898805e0cdf7fd860ec21bf661d3a0285a3defbd upstream.

The Marvell driver incorrectly provides phydev->lp_advertising as the
logical and of the link partner's advert and our advert.  This is
incorrect - this field is supposed to store the link parter's unmodified

This allows ethtool to report the correct link partner auto-negotiation

Fixes: be937f1f89ca ("Marvell PHY m88e1111 driver fix")
Signed-off-by: Russell King <>
Reviewed-by: Andrew Lunn <>
Reviewed-by: Florian Fainelli <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
4 years agoext4: fix fdatasync(2) after extent manipulation operations
Jan Kara [Mon, 29 May 2017 17:24:55 +0000 (13:24 -0400)]
ext4: fix fdatasync(2) after extent manipulation operations

commit 67a7d5f561f469ad2fa5154d2888258ab8e6df7c upstream.

Currently, extent manipulation operations such as hole punch, range
zeroing, or extent shifting do not record the fact that file data has
changed and thus fdatasync(2) has a work to do. As a result if we crash
e.g. after a punch hole and fdatasync, user can still possibly see the
punched out data after journal replay. Test generic/392 fails due to
these problems.

Fix the problem by properly marking that file data has changed in these

Fixes: a4bb6b64e39abc0e41ca077725f2a72c868e7622
Signed-off-by: Jan Kara <>
Signed-off-by: Theodore Ts'o <>
[bwh: Backported to 3.2: Only the punch-hole operation is supported, and
 it's in extents.c.]
Signed-off-by: Ben Hutchings <>
4 years agoext4: fix data corruption for mmap writes
Jan Kara [Fri, 26 May 2017 21:45:45 +0000 (17:45 -0400)]
ext4: fix data corruption for mmap writes

commit a056bdaae7a181f7dcc876cfab2f94538e508709 upstream.

mpage_submit_page() can race with another process growing i_size and
writing data via mmap to the written-back page. As mpage_submit_page()
samples i_size too early, it may happen that ext4_bio_write_page()
zeroes out too large tail of the page and thus corrupts user data.

Fix the problem by sampling i_size only after the page has been
write-protected in page tables by clear_page_dirty_for_io() call.

Reported-by: Michael Zimmer <>
Fixes: cb20d5188366f04d96d2e07b1240cc92170ade40
Signed-off-by: Jan Kara <>
Signed-off-by: Theodore Ts'o <>
[bwh: Backported to 3.2: The writeback path is very different here and
 it needs to read i_size long before calling clear_page_dirty_for_io().
 So read it twice and skip the page if it changed.]
Signed-off-by: Ben Hutchings <>
4 years agonet: ethernet: ax88796: don't call free_irq without request_irq first
Uwe Kleine-König [Thu, 25 May 2017 20:54:53 +0000 (22:54 +0200)]
net: ethernet: ax88796: don't call free_irq without request_irq first

commit 82533ad9a1ce3a7a6863849a552c2cc041b55e0d upstream.

The function ax_init_dev (which is called only from the driver's .probe
function) calls free_irq in the error path without having requested the
irq in the first place. So drop the free_irq call in the error path.

Fixes: 825a2ff1896e ("AX88796 network driver")
Signed-off-by: Uwe Kleine-König <>
Signed-off-by: David S. Miller <>
Signed-off-by: Ben Hutchings <>
4 years agoscsi: qla2xxx: don't disable a not previously enabled PCI device
Johannes Thumshirn [Tue, 23 May 2017 14:50:47 +0000 (16:50 +0200)]
scsi: qla2xxx: don't disable a not previously enabled PCI device

commit ddff7ed45edce4a4c92949d3c61cd25d229c4a14 upstream.

When pci_enable_device() or pci_enable_device_mem() fail in
qla2x00_probe_one() we bail out but do a call to
pci_disable_device(). This causes the dev_WARN_ON() in
pci_disable_device() to trigger, as the device wasn't enabled

So instead of taking the 'probe_out' error path we can directly return
*iff* one of the pci_enable_device() calls fails.

Additionally rename the 'probe_out' goto label's name to the more
descriptive 'disable_device'.

Signed-off-by: Johannes Thumshirn <>
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Reviewed-by: Bart Van Assche <>
Reviewed-by: Giridhar Malavali <>
Signed-off-by: Martin K. Petersen <>
Signed-off-by: Ben Hutchings <>
4 years agoASoC: Fix use-after-free at card unregistration
Takashi Iwai [Wed, 24 May 2017 08:19:45 +0000 (10:19 +0200)]
ASoC: Fix use-after-free at card unregistration

commit 4efda5f2130da033aeedc5b3205569893b910de2 upstream.

soc_cleanup_card_resources() call snd_card_free() at the last of its
procedure.  This turned out to lead to a use-after-free.
PCM runtimes have been already removed via soc_remove_pcm_runtimes(),
while it's dereferenced later in soc_pcm_free() called via

The fix is simple: just move the snd_card_free() call to the beginning
of the whole procedure.  This also gives another benefit: it
guarantees that all operations have been shut down before actually
releasing the resources, which was racy until now.

Reported-and-tested-by: Robert Jarzmik <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Mark Brown <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agonetfilter: ctnetlink: fix incorrect nf_ct_put during hash resize
Liping Zhang [Sat, 20 May 2017 23:22:49 +0000 (07:22 +0800)]
netfilter: ctnetlink: fix incorrect nf_ct_put during hash resize

commit fefa92679dbe0c613e62b6c27235dcfbe9640ad1 upstream.

If nf_conntrack_htable_size was adjusted by the user during the ct
dump operation, we may invoke nf_ct_put twice for the same ct, i.e.
the "last" ct. This will cause the ct will be freed but still linked
in hash buckets.

It's very easy to reproduce the problem by the following commands:
  # while : ; do
  echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
  # while : ; do
  conntrack -L
  # iperf -s &
  # iperf -c -P 60 -t 36000

After a while, the system will hang like this:
  NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184]
  NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382]

So at last if we find cb->args[1] is equal to "last", this means hash
resize happened, then we can set cb->args[1] to 0 to fix the above

Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")
Signed-off-by: Liping Zhang <>
Signed-off-by: Pablo Neira Ayuso <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agodmaengine: ep93xx: Always start from BASE0
Alexander Sverdlin [Mon, 22 May 2017 14:05:22 +0000 (16:05 +0200)]
dmaengine: ep93xx: Always start from BASE0

commit 0037ae47812b1f431cc602100d1d51f37d77b61e upstream.

The current buffer is being reset to zero on device_free_chan_resources()
but not on device_terminate_all(). It could happen that HW is restarted and
expects BASE0 to be used, but the driver is not synchronized and will start
from BASE1. One solution is to reset the buffer explicitly in

Signed-off-by: Alexander Sverdlin <>
Signed-off-by: Vinod Koul <>
Signed-off-by: Ben Hutchings <>
4 years agodrm/gma500/psb: Actually use VBT mode when it is found
Patrik Jakobsson [Tue, 18 Apr 2017 11:43:32 +0000 (13:43 +0200)]
drm/gma500/psb: Actually use VBT mode when it is found

commit 82bc9a42cf854fdf63155759c0aa790bd1f361b0 upstream.

With LVDS we were incorrectly picking the pre-programmed mode instead of
the prefered mode provided by VBT. Make sure we pick the VBT mode if
one is provided. It is likely that the mode read-out code is still wrong
but this patch fixes the immediate problem on most machines.

Signed-off-by: Patrik Jakobsson <>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <>
4 years agolibceph: NULL deref on crush_decode() error path
Dan Carpenter [Tue, 23 May 2017 14:25:10 +0000 (17:25 +0300)]
libceph: NULL deref on crush_decode() error path

commit 293dffaad8d500e1a5336eeb90d544cf40d4fbd8 upstream.

If there is not enough space then ceph_decode_32_safe() does a goto bad.
We need to return an error code in that situation.  The current code
returns ERR_PTR(0) which is NULL.  The callers are not expecting that
and it results in a NULL dereference.

Fixes: f24e9980eb86 ("ceph: OSD client")
Signed-off-by: Dan Carpenter <>
Reviewed-by: Ilya Dryomov <>
Signed-off-by: Ilya Dryomov <>
Signed-off-by: Ben Hutchings <>
4 years agoblock: fix an error code in add_partition()
Dan Carpenter [Tue, 23 May 2017 14:28:36 +0000 (17:28 +0300)]
block: fix an error code in add_partition()

commit 7bd897cfce1eb373892d35d7f73201b0f9b221c4 upstream.

We don't set an error code on this path.  It means that we return NULL
instead of an error pointer and the caller does a NULL dereference.

Fixes: 6d1d8050b4bc ("block, partition: add partition_meta_info to hd_struct")
Signed-off-by: Dan Carpenter <>
Signed-off-by: Jens Axboe <>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <>
4 years agoALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
Alexander Tsoy [Mon, 22 May 2017 17:58:11 +0000 (20:58 +0300)]
ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430

commit 1fc2e41f7af4572b07190f9dec28396b418e9a36 upstream.

This model is actually called 92XXM2-8 in Windows driver. But since pin
configs for M22 and M28 are identical, just reuse M22 quirk.

Fixes external microphone (tested) and probably docking station ports
(not tested).

Signed-off-by: Alexander Tsoy <>
Signed-off-by: Takashi Iwai <>
Signed-off-by: Ben Hutchings <>
4 years agocrypto: gcm - wait for crypto op not signal safe
Gilad Ben-Yossef [Thu, 18 May 2017 13:29:25 +0000 (16:29 +0300)]
crypto: gcm - wait for crypto op not signal safe

commit f3ad587070d6bd961ab942b3fd7a85d00dfc934b upstream.

crypto_gcm_setkey() was using wait_for_completion_interruptible() to
wait for completion of async crypto op but if a signal occurs it
may return before DMA ops of HW crypto provider finish, thus
corrupting the data buffer that is kfree'ed in this case.

Resolve this by using wait_for_completion() instead.

Reported-by: Eric Biggers <>
Signed-off-by: Gilad Ben-Yossef <>
Signed-off-by: Herbert Xu <>
Signed-off-by: Ben Hutchings <>
4 years agoi2c: i2c-tiny-usb: fix buffer not being DMA capable
Sebastian Reichel [Fri, 5 May 2017 09:06:50 +0000 (11:06 +0200)]
i2c: i2c-tiny-usb: fix buffer not being DMA capable

commit 5165da5923d6c7df6f2927b0113b2e4d9288661e upstream.

Since v4.9 i2c-tiny-usb generates the below call trace
and longer works, since it can't communicate with the
USB device. The reason is, that since v4.9 the USB
stack checks, that the buffer it should transfer is DMA
capable. This was a requirement since v2.2 days, but it
usually worked nevertheless.

[   17.504959] ------------[ cut here ]------------
[   17.505488] WARNING: CPU: 0 PID: 93 at drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.506545] transfer buffer not dma capable
[   17.507022] Modules linked in:
[   17.507370] CPU: 0 PID: 93 Comm: i2cdetect Not tainted 4.11.0-rc8+ #10
[   17.508103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   17.509039] Call Trace:
[   17.509320]  ? dump_stack+0x5c/0x78
[   17.509714]  ? __warn+0xbe/0xe0
[   17.510073]  ? warn_slowpath_fmt+0x5a/0x80
[   17.510532]  ? nommu_map_sg+0xb0/0xb0
[   17.510949]  ? usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.511482]  ? usb_hcd_submit_urb+0x336/0xab0
[   17.511976]  ? wait_for_completion_timeout+0x12f/0x1a0
[   17.512549]  ? wait_for_completion_timeout+0x65/0x1a0
[   17.513125]  ? usb_start_wait_urb+0x65/0x160
[   17.513604]  ? usb_control_msg+0xdc/0x130
[   17.514061]  ? usb_xfer+0xa4/0x2a0
[   17.514445]  ? __i2c_transfer+0x108/0x3c0
[   17.514899]  ? i2c_transfer+0x57/0xb0
[   17.515310]  ? i2c_smbus_xfer_emulated+0x12f/0x590
[   17.515851]  ? _raw_spin_unlock_irqrestore+0x11/0x20
[   17.516408]  ? i2c_smbus_xfer+0x125/0x330
[   17.516876]  ? i2c_smbus_xfer+0x125/0x330
[   17.517329]  ? i2cdev_ioctl_smbus+0x1c1/0x2b0
[   17.517824]  ? i2cdev_ioctl+0x75/0x1c0
[   17.518248]  ? do_vfs_ioctl+0x9f/0x600
[   17.518671]  ? vfs_write+0x144/0x190
[   17.519078]  ? SyS_ioctl+0x74/0x80
[   17.519463]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
[   17.519959] ---[ end trace d047c04982f5ac50 ]---

Signed-off-by: Sebastian Reichel <>
Reviewed-by: Greg Kroah-Hartman <>
Acked-by: Till Harbaum <>
Signed-off-by: Wolfram Sang <>
Signed-off-by: Ben Hutchings <>
4 years agoext4: keep existing extra fields when inode expands
Konstantin Khlebnikov [Mon, 22 May 2017 02:36:23 +0000 (22:36 -0400)]
ext4: keep existing extra fields when inode expands

commit 887a9730614727c4fff7cb756711b190593fc1df upstream.

ext4_expand_extra_isize() should clear only space between old and new

Fixes: 6dd4ee7cab7e # v2.6.23
Signed-off-by: Konstantin Khlebnikov <>
Signed-off-by: Theodore Ts'o <>
Signed-off-by: Ben Hutchings <>
4 years agoosf_wait4(): fix infoleak
Al Viro [Mon, 15 May 2017 01:47:25 +0000 (21:47 -0400)]
osf_wait4(): fix infoleak

commit a8c39544a6eb2093c04afd5005b6192bd0e880c6 upstream.

failing sys_wait4() won't fill struct rusage...

Signed-off-by: Al Viro <>
Signed-off-by: Ben Hutchings <>
4 years agoKVM: x86: zero base3 of unusable segments
Radim Krčmář [Thu, 18 May 2017 17:37:30 +0000 (19:37 +0200)]
KVM: x86: zero base3 of unusable segments

commit f0367ee1d64d27fa08be2407df5c125442e885e3 upstream.

Static checker noticed that base3 could be used uninitialized if the
segment was not present (useable).  Random stack values probably would
not pass VMCS entry checks.

Reported-by: Dan Carpenter <>
Fixes: 1aa366163b8b ("KVM: x86 emulator: consolidate segment accessors")
Reviewed-by: Paolo Bonzini <>
Reviewed-by: David Hildenbrand <>
Signed-off-by: Radim Krčmář <>
Signed-off-by: Ben Hutchings <>
4 years agoKVM: x86: fix use of uninitialized memory as segment descriptor in emulator.
Gleb Natapov [Mon, 21 Jan 2013 13:36:48 +0000 (15:36 +0200)]
KVM: x86: fix use of uninitialized memory as segment descriptor in emulator.

commit 378a8b099fc207ddcb91b19a8c1457667e0af398 upstream.

If VMX reports segment as unusable, zero descriptor passed by the emulator
before returning. Such descriptor will be considered not present by the

Signed-off-by: Gleb Natapov <>
Signed-off-by: Marcelo Tosatti <>
Signed-off-by: Ben Hutchings <>
4 years agoKVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
Wanpeng Li [Fri, 19 May 2017 09:46:56 +0000 (02:46 -0700)]
KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation

commit cbfc6c9184ce71b52df4b1d82af5afc81a709178 upstream.

Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.

- "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
  a read should be ignored) in guest can get a random number.
- "rep insb" instruction to access PIT register port 0x43 can control memcpy()
  in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
  which will disclose the unimportant kernel memory in host but no crash.

The similar test program below can reproduce the read out-of-bounds vulnerability:

void hexdump(void *mem, unsigned int len)
        unsigned int i, j;

        for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
                /* print offset */
                if(i % HEXDUMP_COLS == 0)
                        printf("0x%06x: ", i);

                /* print hex data */
                if(i < len)
                        printf("%02x ", 0xFF & ((char*)mem)[i]);
                else /* end of block, just aligning for ASCII dump */
                        printf("   ");

                /* print ASCII dump */
                if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
                        for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
                                if(j >= len) /* end of block, not really printing */
                                        putchar(' ');
                                else if(isprint(((char*)mem)[j])) /* printable char */
                                        putchar(0xFF & ((char*)mem)[j]);
                                else /* other char */

int main(void)
int i;
if (iopl(3))
err(1, "set iopl unsuccessfully\n");
return -1;
static char buf[0x40];

/* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */

memset(buf, 0xab, sizeof(buf));

asm volatile("push %rdi;");
asm volatile("mov %0, %%rdi;"::"q"(buf));

asm volatile ("mov $0x40, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");

asm volatile ("mov $0x41, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");

asm volatile ("mov $0x42, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");

asm volatile ("mov $0x43, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");

asm volatile ("mov $0x44, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");

asm volatile ("mov $0x45, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");

asm volatile ("pop %rdi;");
hexdump(buf, 0x40);


/* ins port 0x40 */

memset(buf, 0xab, sizeof(buf));

asm volatile("push %rdi;");
asm volatile("mov %0, %%rdi;"::"q"(buf));

asm volatile ("mov $0x20, %rcx;");
asm volatile ("mov $0x40, %rdx;");
asm volatile ("rep insb;");

asm volatile ("pop %rdi;");
hexdump(buf, 0x40);


/* ins port 0x43 */

memset(buf, 0xab, sizeof(buf));

asm volatile("push %rdi;");
asm volatile("mov %0, %%rdi;"::"q"(buf));

asm volatile ("mov $0x20, %rcx;");
asm volatile ("mov $0x43, %rdx;");
asm volatile ("rep insb;");

asm volatile ("pop %rdi;");
hexdump(buf, 0x40);

return 0;

The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
w/o clear after using which results in some random datas are left over in
the buffer. Guest reads port 0x43 will be ignored since it is write only,
however, the function kernel_pio() can't distigush this ignore from successfully
reads data from device's ioport. There is no new data fill the buffer from
port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
the buffer to the guest unconditionally. This patch fixes it by clearing the
buffer before in instruction emulation to avoid to grant guest the stale data
in the buffer.

In addition, string I/O is not supported for in kernel device. So there is no
iteration to read ioport %RCX times for string I/O. The function kernel_pio()
just reads one round, and then copy the io size * %RCX to the guest unconditionally,
actually it copies the one round ioport data w/ other random datas which are left
over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
introducing the string I/O support for in kernel device in order to grant the right
ioport datas to the guest.

Before the patch:

0x000000: fe 38 93 93 ff ff ab ab .8......
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

After the patch:

0x000000: 1e 02 f8 00 ff ff ab ab ........
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: d2 e2 d2 df d2 db d2 d7 ........
0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: 00 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 00 00 00 00 ........
0x000018: 00 00 00 00 00 00 00 00 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

Reported-by: Moguofang <>
Cc: Paolo Bonzini <>
Cc: Radim Krčmář <>
Cc: Moguofang <>
Signed-off-by: Wanpeng Li <>
Signed-off-by: Radim Krčmář <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agopowerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
Michael Ellerman [Thu, 18 May 2017 10:37:31 +0000 (20:37 +1000)]
powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash

commit e41e53cd4fe331d0d1f06f8e4ed7e2cc63ee2c34 upstream.

virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on
an address. What this means in practice is that it should only return true for
addresses in the linear mapping which are backed by a valid PFN.

We are failing to properly check that the address is in the linear mapping,
because virt_to_pfn() will return a valid looking PFN for more or less any
address. That bug is actually caused by __pa(), used in virt_to_pfn().

eg: __pa(0xc000000000010000) = 0x10000  # Good
    __pa(0xd000000000010000) = 0x10000  # Bad!
    __pa(0x0000000000010000) = 0x10000  # Bad!

This started happening after commit bdbc29c19b26 ("powerpc: Work around gcc
miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition
of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from
the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give
you something bogus back.

Until we can verify if that GCC bug is no longer an issue, or come up with
another solution, this commit does the minimal fix to make virt_addr_valid()
work, by explicitly checking that the address is in the linear mapping region.

Fixes: bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit")
Signed-off-by: Michael Ellerman <>
Reviewed-by: Paul Mackerras <>
Reviewed-by: Balbir Singh <>
Tested-by: Breno Leitao <>
[bwh: Backported to 3.2: open-code virt_to_pfn()]
Signed-off-by: Ben Hutchings <>
4 years agowatchdog: pcwd_usb: fix NULL-deref at probe
Johan Hovold [Mon, 13 Mar 2017 12:49:45 +0000 (13:49 +0100)]
watchdog: pcwd_usb: fix NULL-deref at probe

commit 46c319b848268dab3f0e7c4a5b6e9146d3bca8a4 upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <>
Reviewed-by: Guenter Roeck <>
Signed-off-by: Guenter Roeck <>
Signed-off-by: Wim Van Sebroeck <>
Signed-off-by: Ben Hutchings <>
4 years agodrivers: char: mem: Check for address space wraparound with mmap()
Julius Werner [Fri, 12 May 2017 21:42:58 +0000 (14:42 -0700)]
drivers: char: mem: Check for address space wraparound with mmap()

commit b299cde245b0b76c977f4291162cf668e087b408 upstream.

/dev/mem currently allows mmap() mappings that wrap around the end of
the physical address space, which should probably be illegal. It
circumvents the existing STRICT_DEVMEM permission check because the loop
immediately terminates (as the start address is already higher than the
end address). On the x86_64 architecture it will then cause a panic
(from the BUG(start >= end) in arch/x86/mm/pat.c:reserve_memtype()).

This patch adds an explicit check to make sure offset + size will not
wrap around in the physical address type.

Signed-off-by: Julius Werner <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
4 years agousb: musb: tusb6010_omap: Do not reset the other direction's packet size
Peter Ujfalusi [Wed, 17 May 2017 16:23:11 +0000 (11:23 -0500)]
usb: musb: tusb6010_omap: Do not reset the other direction's packet size

commit 6df2b42f7c040d57d9ecb67244e04e905ab87ac6 upstream.

We have one register for each EP to set the maximum packet size for both
TX and RX.
If for example an RX programming would happen before the previous TX
transfer finishes we would reset the TX packet side.

To fix this issue, only modify the TX or RX part of the register.

Fixes: 550a7375fe72 ("USB: Add MUSB and TUSB support")
Signed-off-by: Peter Ujfalusi <>
Tested-by: Tony Lindgren <>
Signed-off-by: Bin Liu <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
4 years agoUSB: xhci: fix lock-inversion problem
Alan Stern [Wed, 17 May 2017 15:32:03 +0000 (18:32 +0300)]
USB: xhci: fix lock-inversion problem

commit 63aea0dbab90a2461faaae357cbc8cfd6c8de9fe upstream.

With threaded interrupts, bottom-half handlers are called with
interrupts enabled.  Therefore they can't safely use spin_lock(); they
have to use spin_lock_irqsave().  Lockdep warns about a violation
occurring in xhci_irq():

[ INFO: possible irq lock inversion dependency detected ]
4.11.0-rc8-dbg+ #1 Not tainted
swapper/7/0 just changed the state of lock:
 (&(&ehci->lock)->rlock){-.-...}, at: [<ffffffffa0130a69>]
ehci_hrtimer_func+0x29/0xc0 [ehci_hcd]
but this lock took another, HARDIRQ-unsafe lock in the past:

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
 *** DEADLOCK ***

no locks held by swapper/7/0.
the shortest dependencies between 2nd lock and 1st lock:
 -> (hcd_urb_list_lock){+.....} ops: 252 {
    HARDIRQ-ON-W at:
                      usb_hcd_unlink_urb_from_ep+0x1b/0x60 [usbcore]
                      xhci_giveback_urb_in_irq.isra.45+0x70/0x1b0 [xhci_hcd]
                      finish_td.constprop.60+0x1d8/0x2e0 [xhci_hcd]
                      xhci_irq+0xdd6/0x1fa0 [xhci_hcd]
                      usb_hcd_irq+0x26/0x40 [usbcore]

This patch fixes the problem.

Signed-off-by: Alan Stern <>
Reported-and-tested-by: Bart Van Assche <>
Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agousb: host: xhci: simplify irq handler return
Felipe Balbi [Mon, 23 Jan 2017 12:20:07 +0000 (14:20 +0200)]
usb: host: xhci: simplify irq handler return

commit 76a35293b901915c5dcb4a87a4a0da8d7caf39fe upstream.

Instead of having several return points, let's use a local variable and
a single place to return. This makes the code slightly easier to read.

[set ret = IRQ_HANDLED in default working case  -Mathias]
Signed-off-by: Felipe Balbi <>
Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agousb: host: xhci-mem: allocate zeroed Scratchpad Buffer
Peter Chen [Wed, 17 May 2017 15:32:01 +0000 (18:32 +0300)]
usb: host: xhci-mem: allocate zeroed Scratchpad Buffer

commit 7480d912d549f414e0ce39331870899e89a5598c upstream.

According to xHCI ch4.20 Scratchpad Buffers, the Scratchpad
Buffer needs to be zeroed.

The following operations take place to allocate
        Scratchpad Buffers to the xHC:
b. Software clears the Scratchpad Buffer to '0'

Signed-off-by: Peter Chen <>
Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: we only do one allocation for scratchpad buffers]
Signed-off-by: Ben Hutchings <>
4 years agoxhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
Mathias Nyman [Wed, 17 May 2017 15:32:00 +0000 (18:32 +0300)]
xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton

commit a0c16630d35a874e82bdf2088f58ecaca1024315 upstream.

Intel Denverton microserver is Atom based and need the PME and CAS quirks
as well.

Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
4 years agousb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Apollo Lake
Wan Ahmad Zainie [Tue, 3 Jan 2017 16:28:52 +0000 (18:28 +0200)]
usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Apollo Lake

commit 6c97cfc1a097b1e0786c836e92b7a72b4d031e25 upstream.

Intel Apollo Lake also requires XHCI_PME_STUCK_QUIRK.
Adding its PCI ID to quirk.

Signed-off-by: Wan Ahmad Zainie <>
Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
4 years agoxhci: workaround for hosts missing CAS bit
Mathias Nyman [Thu, 20 Oct 2016 15:09:19 +0000 (18:09 +0300)]
xhci: workaround for hosts missing CAS bit

commit 346e99736c3ce328fd42d678343b70243aca5f36 upstream.

If a device is unplugged and replugged during Sx system suspend
some  Intel xHC hosts will overwrite the CAS (Cold attach status) flag
and no device connection is noticed in resume.

A device in this state can be identified in resume if its link state
is in polling or compliance mode, and the current connect status is 0.
A device in this state needs to be warm reset.

Intel 100/c230 series PCH specification update Doc #332692-006 Errata #8

Observed on Cherryview and Apollolake as they go into compliance mode
if LFPS times out during polling, and re-plugged devices are not
discovered at resume.

Signed-off-by: Mathias Nyman <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agotracing/kprobes: Enforce kprobes teardown after testing
Thomas Gleixner [Wed, 17 May 2017 08:19:49 +0000 (10:19 +0200)]
tracing/kprobes: Enforce kprobes teardown after testing

commit 30e7d894c1478c88d50ce94ddcdbd7f9763d9cdd upstream.

Enabling the tracer selftest triggers occasionally the warning in
text_poke(), which warns when the to be modified page is not marked

The reason is that the tracer selftest installs kprobes on functions marked
__init for testing. These probes are removed after the tests, but that
removal schedules the delayed kprobes_optimizer work, which will do the
actual text poke. If the work is executed after the init text is freed,
then the warning triggers. The bug can be reproduced reliably when the work
delay is increased.

Flush the optimizer work and wait for the optimizing/unoptimizing lists to
become empty before returning from the kprobes tracer selftest. That
ensures that all operations which were queued due to the probes removal
have completed.

Signed-off-by: Thomas Gleixner <>
Acked-by: Masami Hiramatsu <>
Fixes: 6274de498 ("kprobes: Support delayed unoptimizing")
Signed-off-by: Steven Rostedt (VMware) <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agoof: fdt: add missing allocation-failure check
Johan Hovold [Wed, 17 May 2017 15:29:09 +0000 (17:29 +0200)]
of: fdt: add missing allocation-failure check

commit 49e67dd17649b60b4d54966e18ec9c80198227f0 upstream.

The memory allocator passed to __unflatten_device_tree() (e.g. a wrapped
kzalloc) can fail so add the missing sanity check to avoid dereferencing
a NULL pointer.

Fixes: fe14042358fa ("of/flattree: Refactor unflatten_device_tree and add fdt_unflatten_tree")
Signed-off-by: Johan Hovold <>
Signed-off-by: Rob Herring <>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <>
4 years agomac80211: strictly check mesh address extension mode
Rajkumar Manoharan [Mon, 15 May 2017 04:41:55 +0000 (21:41 -0700)]
mac80211: strictly check mesh address extension mode

commit 5667c86acf021e6dcf02584408b4484a273ac68f upstream.

Mesh forwarding path checks for address extension mode to fetch
appropriate proxied address and MPP address. Existing condition
that looks for 6 address format is not strict enough so that
frames with improper values are processed and invalid entries
are added into MPP table. Fix that by adding a stricter check before
processing the packet.

Per IEEE Std 802.11s-2011 spec. Table 7-6g1 lists address extension
mode 0x3 as reserved one. And also Table Table 9-13 does not specify
0x3 as valid address field.

Fixes: 9b395bc3be1c ("mac80211: verify that skb data is present")
Signed-off-by: Rajkumar Manoharan <>
Signed-off-by: Johannes Berg <>
[bwh: Backported to 3.2: add mesh_flags variable in ieee80211_data_to_8023(),
 added separately upstream]
Signed-off-by: Ben Hutchings <>
4 years agoUSB: hub: fix SS max number of ports
Johan Hovold [Wed, 10 May 2017 16:18:29 +0000 (18:18 +0200)]
USB: hub: fix SS max number of ports

commit 93491ced3c87c94b12220dbac0527e1356702179 upstream.

Add define for the maximum number of ports on a SuperSpeed hub as per
USB 3.1 spec Table 10-5, and use it when verifying the retrieved hub

This specifically avoids benign attempts to update the DeviceRemovable
mask for non-existing ports (should we get that far).

Fixes: dbe79bbe9dcb ("USB 3.0 Hub Changes")
Acked-by: Alan Stern <>
Signed-off-by: Johan Hovold <>
Signed-off-by: Greg Kroah-Hartman <>
[bwh: Backported to 3.2:
 - Add maxchild variable in hub_configure(), which was added separately upstream
 - Adjust filename]
Signed-off-by: Ben Hutchings <>
4 years agoUSB: hub: fix non-SS hub-descriptor handling
Johan Hovold [Wed, 10 May 2017 16:18:28 +0000 (18:18 +0200)]
USB: hub: fix non-SS hub-descriptor handling

commit bec444cd1c94c48df409a35ad4e5b143c245c3f7 upstream.

Add missing sanity check on the non-SuperSpeed hub-descriptor length in
order to avoid parsing and leaking two bytes of uninitialised slab data
through sysfs removable-attributes (or a compound-device debug

Note that we only make sure that the DeviceRemovable field is always
present (and specifically ignore the unused PortPwrCtrlMask field) in
order to continue support any hubs with non-compliant descriptors. As a
further safeguard, the descriptor buffer is also cleared.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <>
Acked-by: Alan Stern <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>
4 years agoUSB: hub: fix SS hub-descriptor handling
Johan Hovold [Wed, 10 May 2017 16:18:27 +0000 (18:18 +0200)]
USB: hub: fix SS hub-descriptor handling

commit 2c25a2c818023df64463aac3288a9f969491e507 upstream.

A SuperSpeed hub descriptor does not have any variable-length fields so
bail out when reading a short descriptor.

This avoids parsing and leaking two bytes of uninitialised slab data
through sysfs removable-attributes.

Fixes: dbe79bbe9dcb ("USB 3.0 Hub Changes")
Cc: John Youn <>
Acked-by: Alan Stern <>
Signed-off-by: Johan Hovold <>
Signed-off-by: Greg Kroah-Hartman <>
Signed-off-by: Ben Hutchings <>