Alan Stern [Wed, 18 Oct 2017 16:49:38 +0000 (12:49 -0400)]
 
USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
commit 
1c0edc3633b56000e18d82fc241e3995ca18a69e upstream.
Andrey used the syzkaller fuzzer to find an out-of-bounds memory
access in usb_get_bos_descriptor().  The code wasn't checking that the
next usb_dev_cap_header structure could fit into the remaining buffer
space.
This patch fixes the error and also reduces the bNumDeviceCaps field
in the header to match the actual number of capabilities found, in
cases where there are fewer than expected.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jaejoong Kim [Thu, 28 Sep 2017 10:16:30 +0000 (19:16 +0900)]
 
HID: usbhid: fix out-of-bounds bug
commit 
f043bfc98c193c284e2cd768fefabe18ac2fed9b upstream.
The hid descriptor identifies the length and type of subordinate
descriptors for a device. If the received hid descriptor is smaller than
the size of the struct hid_descriptor, it is possible to cause
out-of-bounds.
In addition, if bNumDescriptors of the hid descriptor have an incorrect
value, this can also cause out-of-bounds while approaching hdesc->desc[n].
So check the size of hid descriptor and bNumDescriptors.
	BUG: KASAN: slab-out-of-bounds in usbhid_parse+0x9b1/0xa20
	Read of size 1 at addr 
ffff88006c5f8edf by task kworker/1:2/1261
	CPU: 1 PID: 1261 Comm: kworker/1:2 Not tainted
	4.14.0-rc1-42251-gebb2c2437d80 #169
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
	Workqueue: usb_hub_wq hub_event
	Call Trace:
	__dump_stack lib/dump_stack.c:16
	dump_stack+0x292/0x395 lib/dump_stack.c:52
	print_address_description+0x78/0x280 mm/kasan/report.c:252
	kasan_report_error mm/kasan/report.c:351
	kasan_report+0x22f/0x340 mm/kasan/report.c:409
	__asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:427
	usbhid_parse+0x9b1/0xa20 drivers/hid/usbhid/hid-core.c:1004
	hid_add_device+0x16b/0xb30 drivers/hid/hid-core.c:2944
	usbhid_probe+0xc28/0x1100 drivers/hid/usbhid/hid-core.c:1369
	usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
	really_probe drivers/base/dd.c:413
	driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
	__device_attach_driver+0x230/0x290 drivers/base/dd.c:653
	bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
	__device_attach+0x26e/0x3d0 drivers/base/dd.c:710
	device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
	bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
	device_add+0xd0b/0x1660 drivers/base/core.c:1835
	usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
	generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
	usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
	really_probe drivers/base/dd.c:413
	driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
	__device_attach_driver+0x230/0x290 drivers/base/dd.c:653
	bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
	__device_attach+0x26e/0x3d0 drivers/base/dd.c:710
	device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
	bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
	device_add+0xd0b/0x1660 drivers/base/core.c:1835
	usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
	hub_port_connect drivers/usb/core/hub.c:4903
	hub_port_connect_change drivers/usb/core/hub.c:5009
	port_event drivers/usb/core/hub.c:5115
	hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
	process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
	worker_thread+0x221/0x1850 kernel/workqueue.c:2253
	kthread+0x3a1/0x470 kernel/kthread.c:231
	ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Jaejoong Kim <climbbb.kim@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Alan Stern [Fri, 29 Sep 2017 14:54:24 +0000 (10:54 -0400)]
 
usb: usbtest: fix NULL pointer dereference
commit 
7c80f9e4a588f1925b07134bb2e3689335f6c6d8 upstream.
If the usbtest driver encounters a device with an IN bulk endpoint but
no OUT bulk endpoint, it will try to dereference a NULL pointer
(out->desc.bEndpointAddress).  The problem can be solved by adding a
missing test.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Greg Kroah-Hartman [Tue, 19 Sep 2017 13:07:17 +0000 (15:07 +0200)]
 
USB: fix out-of-bounds in usb_set_configuration
commit 
bd7a3fe770ebd8391d1c7d072ff88e9e76d063eb upstream.
Andrey Konovalov reported a possible out-of-bounds problem for a USB interface
association descriptor.  He writes:
	It seems there's no proper size check of a USB_DT_INTERFACE_ASSOCIATION
	descriptor. It's only checked that the size is >= 2 in
	usb_parse_configuration(), so find_iad() might do out-of-bounds access
	to intf_assoc->bInterfaceCount.
And he's right, we don't check for crazy descriptors of this type very well, so
resolve this problem.  Yet another issue found by syzkaller...
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Takashi Iwai [Fri, 22 Sep 2017 14:18:53 +0000 (16:18 +0200)]
 
ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
commit 
bfc81a8bc18e3c4ba0cbaa7666ff76be2f998991 upstream.
When a USB-audio device receives a maliciously adjusted or corrupted
buffer descriptor, the USB-audio driver may access an out-of-bounce
value at its parser.  This was detected by syzkaller, something like:
  BUG: KASAN: slab-out-of-bounds in usb_audio_probe+0x27b2/0x2ab0
  Read of size 1 at addr 
ffff88006b83a9e8 by task kworker/0:1/24
  CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc1-42251-gebb2c2437d80 #224
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: usb_hub_wq hub_event
  Call Trace:
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x292/0x395 lib/dump_stack.c:52
   print_address_description+0x78/0x280 mm/kasan/report.c:252
   kasan_report_error mm/kasan/report.c:351
   kasan_report+0x22f/0x340 mm/kasan/report.c:409
   __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:427
   snd_usb_create_streams sound/usb/card.c:248
   usb_audio_probe+0x27b2/0x2ab0 sound/usb/card.c:605
   usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
   really_probe drivers/base/dd.c:413
   driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
   __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
   bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
   __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
   device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
   bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
   device_add+0xd0b/0x1660 drivers/base/core.c:1835
   usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
   generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
   usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
   really_probe drivers/base/dd.c:413
   driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
   __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
   bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
   __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
   device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
   bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
   device_add+0xd0b/0x1660 drivers/base/core.c:1835
   usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
   hub_port_connect drivers/usb/core/hub.c:4903
   hub_port_connect_change drivers/usb/core/hub.c:5009
   port_event drivers/usb/core/hub.c:5115
   hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
   process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
   worker_thread+0x221/0x1850 kernel/workqueue.c:2253
   kthread+0x3a1/0x470 kernel/kthread.c:231
   ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
This patch adds the checks of out-of-bounce accesses at appropriate
places and bails out when it goes out of the given buffer.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Takashi Iwai [Tue, 10 Oct 2017 12:10:32 +0000 (14:10 +0200)]
 
ALSA: usb-audio: Kill stray URB at exiting
commit 
124751d5e63c823092060074bd0abaae61aaa9c4 upstream.
USB-audio driver may leave a stray URB for the mixer interrupt when it
exits by some error during probe.  This leads to a use-after-free
error as spotted by syzkaller like:
  ==================================================================
  BUG: KASAN: use-after-free in snd_usb_mixer_interrupt+0x604/0x6f0
  Call Trace:
   <IRQ>
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x292/0x395 lib/dump_stack.c:52
   print_address_description+0x78/0x280 mm/kasan/report.c:252
   kasan_report_error mm/kasan/report.c:351
   kasan_report+0x23d/0x350 mm/kasan/report.c:409
   __asan_report_load8_noabort+0x19/0x20 mm/kasan/report.c:430
   snd_usb_mixer_interrupt+0x604/0x6f0 sound/usb/mixer.c:2490
   __usb_hcd_giveback_urb+0x2e0/0x650 drivers/usb/core/hcd.c:1779
   ....
  Allocated by task 1484:
   save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
   save_stack+0x43/0xd0 mm/kasan/kasan.c:447
   set_track mm/kasan/kasan.c:459
   kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
   kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772
   kmalloc ./include/linux/slab.h:493
   kzalloc ./include/linux/slab.h:666
   snd_usb_create_mixer+0x145/0x1010 sound/usb/mixer.c:2540
   create_standard_mixer_quirk+0x58/0x80 sound/usb/quirks.c:516
   snd_usb_create_quirk+0x92/0x100 sound/usb/quirks.c:560
   create_composite_quirk+0x1c4/0x3e0 sound/usb/quirks.c:59
   snd_usb_create_quirk+0x92/0x100 sound/usb/quirks.c:560
   usb_audio_probe+0x1040/0x2c10 sound/usb/card.c:618
   ....
  Freed by task 1484:
   save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
   save_stack+0x43/0xd0 mm/kasan/kasan.c:447
   set_track mm/kasan/kasan.c:459
   kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524
   slab_free_hook mm/slub.c:1390
   slab_free_freelist_hook mm/slub.c:1412
   slab_free mm/slub.c:2988
   kfree+0xf6/0x2f0 mm/slub.c:3919
   snd_usb_mixer_free+0x11a/0x160 sound/usb/mixer.c:2244
   snd_usb_mixer_dev_free+0x36/0x50 sound/usb/mixer.c:2250
   __snd_device_free+0x1ff/0x380 sound/core/device.c:91
   snd_device_free_all+0x8f/0xe0 sound/core/device.c:244
   snd_card_do_free sound/core/init.c:461
   release_card_device+0x47/0x170 sound/core/init.c:181
   device_release+0x13f/0x210 drivers/base/core.c:814
   ....
Actually such a URB is killed properly at disconnection when the
device gets probed successfully, and what we need is to apply it for
the error-path, too.
In this patch, we apply snd_usb_mixer_disconnect() at releasing.
Also introduce a new flag, disconnected, to struct usb_mixer_interface
for not performing the disconnection procedure twice.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: snd_usb_mixer_disconnect() takes a pointer to
 usb_mixer_interface::list, not to usb_mixer_interface itself]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Willem de Bruijn [Tue, 26 Sep 2017 16:19:37 +0000 (12:19 -0400)]
 
packet: in packet_do_bind, test fanout with bind_lock held
commit 
4971613c1639d8e5f102c4e797c3bf8f83a5a69e upstream.
Once a socket has po->fanout set, it remains a member of the group
until it is destroyed. The prot_hook must be constant and identical
across sockets in the group.
If fanout_add races with packet_do_bind between the test of po->fanout
and taking the lock, the bind call may make type or dev inconsistent
with that of the fanout group.
Hold po->bind_lock when testing po->fanout to avoid this race.
I had to introduce artificial delay (local_bh_enable) to actually
observe the race.
Fixes: 
dc99f600698d ("packet: Add fanout support.")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Willem de Bruijn [Thu, 14 Sep 2017 21:14:41 +0000 (17:14 -0400)]
 
packet: hold bind lock when rebinding to fanout hook
commit 
008ba2a13f2d04c947adc536d19debb8fe66f110 upstream.
Packet socket bind operations must hold the po->bind_lock. This keeps
po->running consistent with whether the socket is actually on a ptype
list to receive packets.
fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
binds the fanout object to receive through packet_rcv_fanout.
Make it hold the po->bind_lock when testing po->running and rebinding.
Else, it can race with other rebind operations, such as that in
packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
can result in a socket being added to a fanout group twice, causing
use-after-free KASAN bug reports, among others.
Reported independently by both trinity and syzkaller.
Verified that the syzkaller reproducer passes after this patch.
Fixes: 
dc99f600698d ("packet: Add fanout support.")
Reported-by: nixioaming <nixiaoming@huawei.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: use atomic_read() not refcount_read()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Francesco Ruggeri [Thu, 5 Nov 2015 16:16:14 +0000 (08:16 -0800)]
 
packet: race condition in packet_bind
commit 
30f7ea1c2b5f5fb7462c5ae44fe2e40cb2d6a474 upstream.
There is a race conditions between packet_notifier and packet_bind{_spkt}.
It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace this can result in the
netdevice to be moved to the new namespace while the packet_sock in the
old namespace still holds a reference on it. When the netdevice is later
deleted in the new namespace the deletion hangs since the packet_sock
is not found in the new namespace' &net->packet.sklist.
It can be reproduced with the script below.
This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More in general it also uses the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_dev.
import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
   vlanId=int(sys.argv[1])
dev='vlan%d' % vlanId
os.system('taskset -p 0x10 %d' % os.getpid())
s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
          (realDev, dev, vlanId))
os.system('ip netns add dummy')
pid=os.fork()
if pid == 0:
   # dev should be moved while packet_do_bind is in synchronize net
   os.system('taskset -p 0x20000 %d' % os.getpid())
   os.system('ip link set %s netns dummy' % dev)
   os.system('ip netns exec dummy ip link del %s' % dev)
   s.close()
   sys.exit(0)
time.sleep(.004)
try:
   s.bind(('%s' % dev, proto+1))
except:
   print 'Could not bind socket'
   s.close()
   os.system('ip netns del dummy')
   sys.exit(0)
os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Add the 'dev_curr' variable
 - Drop the packet_cached_dev changes
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
David Howells [Thu, 12 Oct 2017 15:00:41 +0000 (16:00 +0100)]
 
KEYS: don't let add_key() update an uninstantiated key
commit 
60ff5b2f547af3828aebafd54daded44cfb0807a upstream.
Currently, when passed a key that already exists, add_key() will call the
key's ->update() method if such exists.  But this is heavily broken in the
case where the key is uninstantiated because it doesn't call
__key_instantiate_and_link().  Consequently, it doesn't do most of the
things that are supposed to happen when the key is instantiated, such as
setting the instantiation state, clearing KEY_FLAG_USER_CONSTRUCT and
awakening tasks waiting on it, and incrementing key->user->nikeys.
It also never takes key_construction_mutex, which means that
->instantiate() can run concurrently with ->update() on the same key.  In
the case of the "user" and "logon" key types this causes a memory leak, at
best.  Maybe even worse, the ->update() methods of the "encrypted" and
"trusted" key types actually just dereference a NULL pointer when passed an
uninstantiated key.
Change key_create_or_update() to wait interruptibly for the key to finish
construction before continuing.
This patch only affects *uninstantiated* keys.  For now we still allow a
negatively instantiated key to be updated (thereby positively
instantiating it), although that's broken too (the next patch fixes it)
and I'm not sure that anyone actually uses that functionality either.
Here is a simple reproducer for the bug using the "encrypted" key type
(requires CONFIG_ENCRYPTED_KEYS=y), though as noted above the bug
pertained to more than just the "encrypted" key type:
    #include <stdlib.h>
    #include <unistd.h>
    #include <keyutils.h>
    int main(void)
    {
        int ringid = keyctl_join_session_keyring(NULL);
        if (fork()) {
            for (;;) {
                const char payload[] = "update user:foo 32";
                usleep(rand() % 10000);
                add_key("encrypted", "desc", payload, sizeof(payload), ringid);
                keyctl_clear(ringid);
            }
        } else {
            for (;;)
                request_key("encrypted", "desc", "callout_info", ringid);
        }
    }
It causes:
    BUG: unable to handle kernel NULL pointer dereference at 
0000000000000018
    IP: encrypted_update+0xb0/0x170
    PGD 
7a178067 P4D 
7a178067 PUD 
77269067 PMD 0
    PREEMPT SMP
    CPU: 0 PID: 340 Comm: reproduce Tainted: G      D         4.14.0-rc1-00025-g428490e38b2e #796
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: 
ffff8a467a39a340 task.stack: 
ffffb15c40770000
    RIP: 0010:encrypted_update+0xb0/0x170
    RSP: 0018:
ffffb15c40773de8 EFLAGS: 
00010246
    RAX: 
0000000000000000 RBX: 
ffff8a467a275b00 RCX: 
0000000000000000
    RDX: 
0000000000000005 RSI: 
ffff8a467a275b14 RDI: 
ffffffffb742f303
    RBP: 
ffffb15c40773e20 R08: 
0000000000000000 R09: 
ffff8a467a275b17
    R10: 
0000000000000020 R11: 
0000000000000000 R12: 
0000000000000000
    R13: 
0000000000000000 R14: 
ffff8a4677057180 R15: 
ffff8a467a275b0f
    FS:  
00007f5d7fb08700(0000) GS:
ffff8a467f200000(0000) knlGS:
0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
    CR2: 
0000000000000018 CR3: 
0000000077262005 CR4: 
00000000001606f0
    Call Trace:
     key_create_or_update+0x2bc/0x460
     SyS_add_key+0x10c/0x1d0
     entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x7f5d7f211259
    RSP: 002b:
00007ffed03904c8 EFLAGS: 
00000246 ORIG_RAX: 
00000000000000f8
    RAX: 
ffffffffffffffda RBX: 
000000003b2a7955 RCX: 
00007f5d7f211259
    RDX: 
00000000004009e4 RSI: 
00000000004009ff RDI: 
0000000000400a04
    RBP: 
0000000068db8bad R08: 
000000003b2a7955 R09: 
0000000000000004
    R10: 
000000000000001a R11: 
0000000000000246 R12: 
0000000000400868
    R13: 
00007ffed03905d0 R14: 
0000000000000000 R15: 
0000000000000000
    Code: 77 28 e8 64 34 1f 00 45 31 c0 31 c9 48 8d 55 c8 48 89 df 48 8d 75 d0 e8 ff f9 ff ff 85 c0 41 89 c4 0f 88 84 00 00 00 4c 8b 7d c8 <49> 8b 75 18 4c 89 ff e8 24 f8 ff ff 85 c0 41 89 c4 78 6d 49 8b
    RIP: encrypted_update+0xb0/0x170 RSP: 
ffffb15c40773de8
    CR2: 
0000000000000018
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Biggers <ebiggers@google.com>
[bwh: Backported to 3.2:
 - Use the 'error' label to return, not 'error_free_prep'
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Takashi Iwai [Mon, 9 Oct 2017 09:09:20 +0000 (11:09 +0200)]
 
ALSA: seq: Fix use-after-free at creating a port
commit 
71105998845fb012937332fe2e806d443c09e026 upstream.
There is a potential race window opened at creating and deleting a
port via ioctl, as spotted by fuzzing.  snd_seq_create_port() creates
a port object and returns its pointer, but it doesn't take the
refcount, thus it can be deleted immediately by another thread.
Meanwhile, snd_seq_ioctl_create_port() still calls the function
snd_seq_system_client_ev_port_start() with the created port object
that is being deleted, and this triggers use-after-free like:
 BUG: KASAN: use-after-free in snd_seq_ioctl_create_port+0x504/0x630 [snd_seq] at addr 
ffff8801f2241cb1
 =============================================================================
 BUG kmalloc-512 (Tainted: G    B          ): kasan: bad access detected
 -----------------------------------------------------------------------------
 INFO: Allocated in snd_seq_create_port+0x94/0x9b0 [snd_seq] age=1 cpu=3 pid=4511
 	___slab_alloc+0x425/0x460
 	__slab_alloc+0x20/0x40
  	kmem_cache_alloc_trace+0x150/0x190
	snd_seq_create_port+0x94/0x9b0 [snd_seq]
	snd_seq_ioctl_create_port+0xd1/0x630 [snd_seq]
 	snd_seq_do_ioctl+0x11c/0x190 [snd_seq]
 	snd_seq_ioctl+0x40/0x80 [snd_seq]
 	do_vfs_ioctl+0x54b/0xda0
 	SyS_ioctl+0x79/0x90
 	entry_SYSCALL_64_fastpath+0x16/0x75
 INFO: Freed in port_delete+0x136/0x1a0 [snd_seq] age=1 cpu=2 pid=4717
 	__slab_free+0x204/0x310
 	kfree+0x15f/0x180
 	port_delete+0x136/0x1a0 [snd_seq]
 	snd_seq_delete_port+0x235/0x350 [snd_seq]
 	snd_seq_ioctl_delete_port+0xc8/0x180 [snd_seq]
 	snd_seq_do_ioctl+0x11c/0x190 [snd_seq]
 	snd_seq_ioctl+0x40/0x80 [snd_seq]
 	do_vfs_ioctl+0x54b/0xda0
 	SyS_ioctl+0x79/0x90
 	entry_SYSCALL_64_fastpath+0x16/0x75
 Call Trace:
  [<
ffffffff81b03781>] dump_stack+0x63/0x82
  [<
ffffffff81531b3b>] print_trailer+0xfb/0x160
  [<
ffffffff81536db4>] object_err+0x34/0x40
  [<
ffffffff815392d3>] kasan_report.part.2+0x223/0x520
  [<
ffffffffa07aadf4>] ? snd_seq_ioctl_create_port+0x504/0x630 [snd_seq]
  [<
ffffffff815395fe>] __asan_report_load1_noabort+0x2e/0x30
  [<
ffffffffa07aadf4>] snd_seq_ioctl_create_port+0x504/0x630 [snd_seq]
  [<
ffffffffa07aa8f0>] ? snd_seq_ioctl_delete_port+0x180/0x180 [snd_seq]
  [<
ffffffff8136be50>] ? taskstats_exit+0xbc0/0xbc0
  [<
ffffffffa07abc5c>] snd_seq_do_ioctl+0x11c/0x190 [snd_seq]
  [<
ffffffffa07abd10>] snd_seq_ioctl+0x40/0x80 [snd_seq]
  [<
ffffffff8136d433>] ? acct_account_cputime+0x63/0x80
  [<
ffffffff815b515b>] do_vfs_ioctl+0x54b/0xda0
  .....
We may fix this in a few different ways, and in this patch, it's fixed
simply by taking the refcount properly at snd_seq_create_port() and
letting the caller unref the object after use.  Also, there is another
potential use-after-free by sprintf() call in snd_seq_create_port(),
and this is moved inside the lock.
This fix covers CVE-2017-15265.
Reported-and-tested-by: Michael23 Yu <ycqzsy@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Johannes Berg [Tue, 5 Sep 2017 12:54:54 +0000 (14:54 +0200)]
 
mac80211: accept key reinstall without changing anything
commit 
fdf7cb4185b60c68e1a75e61691c4afdc15dea0e upstream.
When a key is reinstalled we can reset the replay counters
etc. which can lead to nonce reuse and/or replay detection
being impossible, breaking security properties, as described
in the "KRACK attacks".
In particular, CVE-2017-13080 applies to GTK rekeying that
happened in firmware while the host is in D3, with the second
part of the attack being done after the host wakes up. In
this case, the wpa_supplicant mitigation isn't sufficient
since wpa_supplicant doesn't know the GTK material.
In case this happens, simply silently accept the new key
coming from userspace but don't take any action on it since
it's the same key; this keeps the PN replay counters intact.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2:
 - Use __ieee80211_key_free() instead of ieee80211_key_free_unused()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Vitaly Mayatskikh [Fri, 22 Sep 2017 05:18:39 +0000 (01:18 -0400)]
 
fix unbalanced page refcounting in bio_map_user_iov
commit 
95d78c28b5a85bacbc29b8dba7c04babb9b0d467 upstream.
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
IO vector has small consecutive buffers belonging to the same page.
bio_add_pc_page merges them into one, but the page reference is never
dropped.
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ronnie Sahlberg [Mon, 30 Oct 2017 02:28:03 +0000 (13:28 +1100)]
 
cifs: check MaxPathNameComponentLength != 0 before using it
commit 
f74bc7c6679200a4a83156bb89cbf6c229fe8ec0 upstream.
And fix tcon leak in error path.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: David Disseldorp <ddiss@samba.org>
[bwh: Backported to 3.2: cifs_tcon pointer is tcon, and there's no leak to fix]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Oleg Nesterov [Fri, 1 Sep 2017 16:55:33 +0000 (18:55 +0200)]
 
epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
commit 
138e4ad67afd5c6c318b056b4d17c17f2c0ca5c0 upstream.
The race was introduced by me in commit 
971316f0503a ("epoll:
ep_unregister_pollwait() can use the freed pwq->whead").  I did not
realize that nothing can protect eventpoll after ep_poll_callback() sets
->whead = NULL, only whead->lock can save us from the race with
ep_free() or ep_remove().
Move ->whead = NULL to the end of ep_poll_callback() and add the
necessary barriers.
TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even
before this patch.
Hopefully this explains use-after-free reported by syzcaller:
	BUG: KASAN: use-after-free in debug_spin_lock_before
	...
	 _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
	 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148
this is spin_lock(eventpoll->lock),
	...
	Freed by task 17774:
	...
	 kfree+0xe8/0x2c0 mm/slub.c:3883
	 ep_free+0x22c/0x2a0 fs/eventpoll.c:865
Fixes: 
971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
Reported-by: 范龙飞 <long7573@126.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Use smp_mb() and ACCESS_ONCE() instead of smp_{load_acquire,store_release}()
 - EPOLLEXCLUSIVE is not supported]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cong Wang [Thu, 31 Aug 2017 14:47:43 +0000 (16:47 +0200)]
 
wl1251: add a missing spin_lock_init()
commit 
f581a0dd744fe32b0a8805e279c59ec1ac676d60 upstream.
wl1251: add a missing spin_lock_init()
This fixes the following kernel warning:
 [ 5668.771453] BUG: spinlock bad magic on CPU#0, kworker/u2:3/9745
 [ 5668.771850]  lock: 0xce63ef20, .magic: 
00000000, .owner: <none>/-1,
 .owner_cpu: 0
 [ 5668.772277] CPU: 0 PID: 9745 Comm: kworker/u2:3 Tainted: G        W
 4.12.0-03002-gec979a4-dirty #40
 [ 5668.772796] Hardware name: Nokia RX-51 board
 [ 5668.773071] Workqueue: phy1 wl1251_irq_work
 [ 5668.773345] [<
c010c9e4>] (unwind_backtrace) from [<
c010a274>]
 (show_stack+0x10/0x14)
 [ 5668.773803] [<
c010a274>] (show_stack) from [<
c01545a4>]
 (do_raw_spin_lock+0x6c/0xa0)
 [ 5668.774230] [<
c01545a4>] (do_raw_spin_lock) from [<
c06ca578>]
 (_raw_spin_lock_irqsave+0x10/0x18)
 [ 5668.774658] [<
c06ca578>] (_raw_spin_lock_irqsave) from [<
c048c010>]
 (wl1251_op_tx+0x38/0x5c)
 [ 5668.775115] [<
c048c010>] (wl1251_op_tx) from [<
c06a12e8>]
 (ieee80211_tx_frags+0x188/0x1c0)
 [ 5668.775543] [<
c06a12e8>] (ieee80211_tx_frags) from [<
c06a138c>]
 (__ieee80211_tx+0x6c/0x130)
 [ 5668.775970] [<
c06a138c>] (__ieee80211_tx) from [<
c06a3dbc>]
 (ieee80211_tx+0xdc/0x104)
 [ 5668.776367] [<
c06a3dbc>] (ieee80211_tx) from [<
c06a4af0>]
 (__ieee80211_subif_start_xmit+0x454/0x8c8)
 [ 5668.776824] [<
c06a4af0>] (__ieee80211_subif_start_xmit) from
 [<
c06a4f94>] (ieee80211_subif_start_xmit+0x30/0x2fc)
 [ 5668.777343] [<
c06a4f94>] (ieee80211_subif_start_xmit) from
 [<
c0578848>] (dev_hard_start_xmit+0x80/0x118)
...
    by adding the missing spin_lock_init().
Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Nikolay Aleksandrov [Wed, 30 Aug 2017 09:49:05 +0000 (12:49 +0300)]
 
sch_tbf: fix two null pointer dereferences on init failure
commit 
c2d6511e6a4f1f3673d711569c00c3849549e9b0 upstream.
sch_tbf calls qdisc_watchdog_cancel() in both its ->reset and ->destroy
callbacks but it may fail before the timer is initialized due to missing
options (either not supplied by user-space or set as a default qdisc),
also q->qdisc is used by ->reset and ->destroy so we need it initialized.
Reproduce:
$ sysctl net.core.default_qdisc=tbf
$ ip l set ethX up
Crash log:
[  959.160172] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000018
[  959.160323] IP: qdisc_reset+0xa/0x5c
[  959.160400] PGD 
59cdb067
[  959.160401] P4D 
59cdb067
[  959.160466] PUD 
59ccb067
[  959.160532] PMD 0
[  959.160597]
[  959.160706] Oops: 0000 [#1] SMP
[  959.160778] Modules linked in: sch_tbf sch_sfb sch_prio sch_netem
[  959.160891] CPU: 2 PID: 1562 Comm: ip Not tainted 4.13.0-rc6+ #62
[  959.160998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[  959.161157] task: 
ffff880059c9a700 task.stack: 
ffff8800376d0000
[  959.161263] RIP: 0010:qdisc_reset+0xa/0x5c
[  959.161347] RSP: 0018:
ffff8800376d3610 EFLAGS: 
00010286
[  959.161531] RAX: 
ffffffffa001b1dd RBX: 
ffff8800373a2800 RCX: 
0000000000000000
[  959.161733] RDX: 
ffffffff8215f160 RSI: 
ffffffff8215f160 RDI: 
0000000000000000
[  959.161939] RBP: 
ffff8800376d3618 R08: 
00000000014080c0 R09: 
00000000ffffffff
[  959.162141] R10: 
ffff8800376d3578 R11: 
0000000000000020 R12: 
ffffffffa001d2c0
[  959.162343] R13: 
ffff880037538000 R14: 
00000000ffffffff R15: 
0000000000000001
[  959.162546] FS:  
00007fcc5126b740(0000) GS:
ffff88005d900000(0000) knlGS:
0000000000000000
[  959.162844] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[  959.163030] CR2: 
0000000000000018 CR3: 
000000005abc4000 CR4: 
00000000000406e0
[  959.163233] DR0: 
0000000000000000 DR1: 
0000000000000000 DR2: 
0000000000000000
[  959.163436] DR3: 
0000000000000000 DR6: 
00000000fffe0ff0 DR7: 
0000000000000400
[  959.163638] Call Trace:
[  959.163788]  tbf_reset+0x19/0x64 [sch_tbf]
[  959.163957]  qdisc_destroy+0x8b/0xe5
[  959.164119]  qdisc_create_dflt+0x86/0x94
[  959.164284]  ? dev_activate+0x129/0x129
[  959.164449]  attach_one_default_qdisc+0x36/0x63
[  959.164623]  netdev_for_each_tx_queue+0x3d/0x48
[  959.164795]  dev_activate+0x4b/0x129
[  959.164957]  __dev_open+0xe7/0x104
[  959.165118]  __dev_change_flags+0xc6/0x15c
[  959.165287]  dev_change_flags+0x25/0x59
[  959.165451]  do_setlink+0x30c/0xb3f
[  959.165613]  ? check_chain_key+0xb0/0xfd
[  959.165782]  rtnl_newlink+0x3a4/0x729
[  959.165947]  ? rtnl_newlink+0x117/0x729
[  959.166121]  ? ns_capable_common+0xd/0xb1
[  959.166288]  ? ns_capable+0x13/0x15
[  959.166450]  rtnetlink_rcv_msg+0x188/0x197
[  959.166617]  ? rcu_read_unlock+0x3e/0x5f
[  959.166783]  ? rtnl_newlink+0x729/0x729
[  959.166948]  netlink_rcv_skb+0x6c/0xce
[  959.167113]  rtnetlink_rcv+0x23/0x2a
[  959.167273]  netlink_unicast+0x103/0x181
[  959.167439]  netlink_sendmsg+0x326/0x337
[  959.167607]  sock_sendmsg_nosec+0x14/0x3f
[  959.167772]  sock_sendmsg+0x29/0x2e
[  959.167932]  ___sys_sendmsg+0x209/0x28b
[  959.168098]  ? do_raw_spin_unlock+0xcd/0xf8
[  959.168267]  ? _raw_spin_unlock+0x27/0x31
[  959.168432]  ? __handle_mm_fault+0x651/0xdb1
[  959.168602]  ? check_chain_key+0xb0/0xfd
[  959.168773]  __sys_sendmsg+0x45/0x63
[  959.168934]  ? __sys_sendmsg+0x45/0x63
[  959.169100]  SyS_sendmsg+0x19/0x1b
[  959.169260]  entry_SYSCALL_64_fastpath+0x23/0xc2
[  959.169432] RIP: 0033:0x7fcc5097e690
[  959.169592] RSP: 002b:
00007ffd0d5c7b48 EFLAGS: 
00000246 ORIG_RAX: 
000000000000002e
[  959.169887] RAX: 
ffffffffffffffda RBX: 
ffffffff810d278c RCX: 
00007fcc5097e690
[  959.170089] RDX: 
0000000000000000 RSI: 
00007ffd0d5c7b90 RDI: 
0000000000000003
[  959.170292] RBP: 
ffff8800376d3f98 R08: 
0000000000000001 R09: 
0000000000000003
[  959.170494] R10: 
00007ffd0d5c7910 R11: 
0000000000000246 R12: 
0000000000000006
[  959.170697] R13: 
000000000066f1a0 R14: 
00007ffd0d5cfc40 R15: 
0000000000000000
[  959.170900]  ? trace_hardirqs_off_caller+0xa7/0xcf
[  959.171076] Code: 00 41 c7 84 24 14 01 00 00 00 00 00 00 41 c7 84 24
98 00 00 00 00 00 00 00 41 5c 41 5d 41 5e 5d c3 66 66 66 66 90 55 48 89
e5 53 <48> 8b 47 18 48 89 fb 48 8b 40 48 48 85 c0 74 02 ff d0 48 8b bb
[  959.171637] RIP: qdisc_reset+0xa/0x5c RSP: 
ffff8800376d3610
[  959.171821] CR2: 
0000000000000018
Fixes: 
87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 
0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Nikolay Aleksandrov [Wed, 30 Aug 2017 09:49:03 +0000 (12:49 +0300)]
 
sch_netem: avoid null pointer deref on init failure
commit 
634576a1844dba15bc5e6fc61d72f37e13a21615 upstream.
netem can fail in ->init due to missing options (either not supplied by
user-space or used as a default qdisc) causing a timer->base null
pointer deref in its ->destroy() and ->reset() callbacks.
Reproduce:
$ sysctl net.core.default_qdisc=netem
$ ip l set ethX up
Crash log:
[ 1814.846943] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1814.847181] IP: hrtimer_active+0x17/0x8a
[ 1814.847270] PGD 
59c34067
[ 1814.847271] P4D 
59c34067
[ 1814.847337] PUD 
37374067
[ 1814.847403] PMD 0
[ 1814.847468]
[ 1814.847582] Oops: 0000 [#1] SMP
[ 1814.847655] Modules linked in: sch_netem(O) sch_fq_codel(O)
[ 1814.847761] CPU: 3 PID: 1573 Comm: ip Tainted: G           O 4.13.0-rc6+ #62
[ 1814.847884] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1814.848043] task: 
ffff88003723a700 task.stack: 
ffff88005adc8000
[ 1814.848235] RIP: 0010:hrtimer_active+0x17/0x8a
[ 1814.848407] RSP: 0018:
ffff88005adcb590 EFLAGS: 
00010246
[ 1814.848590] RAX: 
0000000000000000 RBX: 
ffff880058e359d8 RCX: 
0000000000000000
[ 1814.848793] RDX: 
0000000000000000 RSI: 
0000000000000000 RDI: 
ffff880058e359d8
[ 1814.848998] RBP: 
ffff88005adcb5b0 R08: 
00000000014080c0 R09: 
00000000ffffffff
[ 1814.849204] R10: 
ffff88005adcb660 R11: 
0000000000000020 R12: 
0000000000000000
[ 1814.849410] R13: 
ffff880058e359d8 R14: 
00000000ffffffff R15: 
0000000000000001
[ 1814.849616] FS:  
00007f733bbca740(0000) GS:
ffff88005d980000(0000) knlGS:
0000000000000000
[ 1814.849919] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[ 1814.850107] CR2: 
0000000000000000 CR3: 
0000000059f0d000 CR4: 
00000000000406e0
[ 1814.850313] DR0: 
0000000000000000 DR1: 
0000000000000000 DR2: 
0000000000000000
[ 1814.850518] DR3: 
0000000000000000 DR6: 
00000000fffe0ff0 DR7: 
0000000000000400
[ 1814.850723] Call Trace:
[ 1814.850875]  hrtimer_try_to_cancel+0x1a/0x93
[ 1814.851047]  hrtimer_cancel+0x15/0x20
[ 1814.851211]  qdisc_watchdog_cancel+0x12/0x14
[ 1814.851383]  netem_reset+0xe6/0xed [sch_netem]
[ 1814.851561]  qdisc_destroy+0x8b/0xe5
[ 1814.851723]  qdisc_create_dflt+0x86/0x94
[ 1814.851890]  ? dev_activate+0x129/0x129
[ 1814.852057]  attach_one_default_qdisc+0x36/0x63
[ 1814.852232]  netdev_for_each_tx_queue+0x3d/0x48
[ 1814.852406]  dev_activate+0x4b/0x129
[ 1814.852569]  __dev_open+0xe7/0x104
[ 1814.852730]  __dev_change_flags+0xc6/0x15c
[ 1814.852899]  dev_change_flags+0x25/0x59
[ 1814.853064]  do_setlink+0x30c/0xb3f
[ 1814.853228]  ? check_chain_key+0xb0/0xfd
[ 1814.853396]  ? check_chain_key+0xb0/0xfd
[ 1814.853565]  rtnl_newlink+0x3a4/0x729
[ 1814.853728]  ? rtnl_newlink+0x117/0x729
[ 1814.853905]  ? ns_capable_common+0xd/0xb1
[ 1814.854072]  ? ns_capable+0x13/0x15
[ 1814.854234]  rtnetlink_rcv_msg+0x188/0x197
[ 1814.854404]  ? rcu_read_unlock+0x3e/0x5f
[ 1814.854572]  ? rtnl_newlink+0x729/0x729
[ 1814.854737]  netlink_rcv_skb+0x6c/0xce
[ 1814.854902]  rtnetlink_rcv+0x23/0x2a
[ 1814.855064]  netlink_unicast+0x103/0x181
[ 1814.855230]  netlink_sendmsg+0x326/0x337
[ 1814.855398]  sock_sendmsg_nosec+0x14/0x3f
[ 1814.855584]  sock_sendmsg+0x29/0x2e
[ 1814.855747]  ___sys_sendmsg+0x209/0x28b
[ 1814.855912]  ? do_raw_spin_unlock+0xcd/0xf8
[ 1814.856082]  ? _raw_spin_unlock+0x27/0x31
[ 1814.856251]  ? __handle_mm_fault+0x651/0xdb1
[ 1814.856421]  ? check_chain_key+0xb0/0xfd
[ 1814.856592]  __sys_sendmsg+0x45/0x63
[ 1814.856755]  ? __sys_sendmsg+0x45/0x63
[ 1814.856923]  SyS_sendmsg+0x19/0x1b
[ 1814.857083]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 1814.857256] RIP: 0033:0x7f733b2dd690
[ 1814.857419] RSP: 002b:
00007ffe1d3387d8 EFLAGS: 
00000246 ORIG_RAX: 
000000000000002e
[ 1814.858238] RAX: 
ffffffffffffffda RBX: 
ffffffff810d278c RCX: 
00007f733b2dd690
[ 1814.858445] RDX: 
0000000000000000 RSI: 
00007ffe1d338820 RDI: 
0000000000000003
[ 1814.858651] RBP: 
ffff88005adcbf98 R08: 
0000000000000001 R09: 
0000000000000003
[ 1814.858856] R10: 
00007ffe1d3385a0 R11: 
0000000000000246 R12: 
0000000000000002
[ 1814.859060] R13: 
000000000066f1a0 R14: 
00007ffe1d3408d0 R15: 
0000000000000000
[ 1814.859267]  ? trace_hardirqs_off_caller+0xa7/0xcf
[ 1814.859446] Code: 10 55 48 89 c7 48 89 e5 e8 45 a1 fb ff 31 c0 5d c3
31 c0 c3 66 66 66 66 90 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 8b
45 30 <4c> 8b 20 41 8b 5c 24 38 31 c9 31 d2 48 c7 c7 50 8e 1d 82 41 89
[ 1814.860022] RIP: hrtimer_active+0x17/0x8a RSP: 
ffff88005adcb590
[ 1814.860214] CR2: 
0000000000000000
Fixes: 
87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 
0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Nikolay Aleksandrov [Wed, 30 Aug 2017 09:49:01 +0000 (12:49 +0300)]
 
sch_cbq: fix null pointer dereferences on init failure
commit 
3501d059921246ff617b43e86250a719c140bd97 upstream.
CBQ can fail on ->init by wrong nl attributes or simply for missing any,
f.e. if it's set as a default qdisc then TCA_OPTIONS (opt) will be NULL
when it is activated. The first thing init does is parse opt but it will
dereference a null pointer if used as a default qdisc, also since init
failure at default qdisc invokes ->reset() which cancels all timers then
we'll also dereference two more null pointers (timer->base) as they were
never initialized.
To reproduce:
$ sysctl net.core.default_qdisc=cbq
$ ip l set ethX up
Crash log of the first null ptr deref:
[44727.907454] BUG: unable to handle kernel NULL pointer dereference at (null)
[44727.907600] IP: cbq_init+0x27/0x205
[44727.907676] PGD 
59ff4067
[44727.907677] P4D 
59ff4067
[44727.907742] PUD 
59c70067
[44727.907807] PMD 0
[44727.907873]
[44727.907982] Oops: 0000 [#1] SMP
[44727.908054] Modules linked in:
[44727.908126] CPU: 1 PID: 21312 Comm: ip Not tainted 4.13.0-rc6+ #60
[44727.908235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[44727.908477] task: 
ffff88005ad42700 task.stack: 
ffff880037214000
[44727.908672] RIP: 0010:cbq_init+0x27/0x205
[44727.908838] RSP: 0018:
ffff8800372175f0 EFLAGS: 
00010286
[44727.909018] RAX: 
ffffffff816c3852 RBX: 
ffff880058c53800 RCX: 
0000000000000000
[44727.909222] RDX: 
0000000000000004 RSI: 
0000000000000000 RDI: 
ffff8800372175f8
[44727.909427] RBP: 
ffff880037217650 R08: 
ffffffff81b0f380 R09: 
0000000000000000
[44727.909631] R10: 
ffff880037217660 R11: 
0000000000000020 R12: 
ffffffff822a44c0
[44727.909835] R13: 
ffff880058b92000 R14: 
00000000ffffffff R15: 
0000000000000001
[44727.910040] FS:  
00007ff8bc583740(0000) GS:
ffff88005d880000(0000) knlGS:
0000000000000000
[44727.910339] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[44727.910525] CR2: 
0000000000000000 CR3: 
00000000371e5000 CR4: 
00000000000406e0
[44727.910731] DR0: 
0000000000000000 DR1: 
0000000000000000 DR2: 
0000000000000000
[44727.910936] DR3: 
0000000000000000 DR6: 
00000000fffe0ff0 DR7: 
0000000000000400
[44727.911141] Call Trace:
[44727.911291]  ? lockdep_init_map+0xb6/0x1ba
[44727.911461]  ? qdisc_alloc+0x14e/0x187
[44727.911626]  qdisc_create_dflt+0x7a/0x94
[44727.911794]  ? dev_activate+0x129/0x129
[44727.911959]  attach_one_default_qdisc+0x36/0x63
[44727.912132]  netdev_for_each_tx_queue+0x3d/0x48
[44727.912305]  dev_activate+0x4b/0x129
[44727.912468]  __dev_open+0xe7/0x104
[44727.912631]  __dev_change_flags+0xc6/0x15c
[44727.912799]  dev_change_flags+0x25/0x59
[44727.912966]  do_setlink+0x30c/0xb3f
[44727.913129]  ? check_chain_key+0xb0/0xfd
[44727.913294]  ? check_chain_key+0xb0/0xfd
[44727.913463]  rtnl_newlink+0x3a4/0x729
[44727.913626]  ? rtnl_newlink+0x117/0x729
[44727.913801]  ? ns_capable_common+0xd/0xb1
[44727.913968]  ? ns_capable+0x13/0x15
[44727.914131]  rtnetlink_rcv_msg+0x188/0x197
[44727.914300]  ? rcu_read_unlock+0x3e/0x5f
[44727.914465]  ? rtnl_newlink+0x729/0x729
[44727.914630]  netlink_rcv_skb+0x6c/0xce
[44727.914796]  rtnetlink_rcv+0x23/0x2a
[44727.914956]  netlink_unicast+0x103/0x181
[44727.915122]  netlink_sendmsg+0x326/0x337
[44727.915291]  sock_sendmsg_nosec+0x14/0x3f
[44727.915459]  sock_sendmsg+0x29/0x2e
[44727.915619]  ___sys_sendmsg+0x209/0x28b
[44727.915784]  ? do_raw_spin_unlock+0xcd/0xf8
[44727.915954]  ? _raw_spin_unlock+0x27/0x31
[44727.916121]  ? __handle_mm_fault+0x651/0xdb1
[44727.916290]  ? check_chain_key+0xb0/0xfd
[44727.916461]  __sys_sendmsg+0x45/0x63
[44727.916626]  ? __sys_sendmsg+0x45/0x63
[44727.916792]  SyS_sendmsg+0x19/0x1b
[44727.916950]  entry_SYSCALL_64_fastpath+0x23/0xc2
[44727.917125] RIP: 0033:0x7ff8bbc96690
[44727.917286] RSP: 002b:
00007ffc360991e8 EFLAGS: 
00000246 ORIG_RAX: 
000000000000002e
[44727.917579] RAX: 
ffffffffffffffda RBX: 
ffffffff810d278c RCX: 
00007ff8bbc96690
[44727.917783] RDX: 
0000000000000000 RSI: 
00007ffc36099230 RDI: 
0000000000000003
[44727.917987] RBP: 
ffff880037217f98 R08: 
0000000000000001 R09: 
0000000000000003
[44727.918190] R10: 
00007ffc36098fb0 R11: 
0000000000000246 R12: 
0000000000000006
[44727.918393] R13: 
000000000066f1a0 R14: 
00007ffc360a12e0 R15: 
0000000000000000
[44727.918597]  ? trace_hardirqs_off_caller+0xa7/0xcf
[44727.918774] Code: 41 5f 5d c3 66 66 66 66 90 55 48 8d 56 04 45 31 c9
49 c7 c0 80 f3 b0 81 48 89 e5 41 55 41 54 53 48 89 fb 48 8d 7d a8 48 83
ec 48 <0f> b7 0e be 07 00 00 00 83 e9 04 e8 e6 f7 d8 ff 85 c0 0f 88 bb
[44727.919332] RIP: cbq_init+0x27/0x205 RSP: 
ffff8800372175f0
[44727.919516] CR2: 
0000000000000000
Fixes: 
0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Keep using HRTIMER_MODE_ABS
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Nikolay Aleksandrov [Wed, 30 Aug 2017 09:49:00 +0000 (12:49 +0300)]
 
sch_hfsc: fix null pointer deref and double free on init failure
commit 
3bdac362a2f89ed3e148fa6f38c5f5d858f50b1a upstream.
Depending on where ->init fails we can get a null pointer deref due to
uninitialized hires timer (watchdog) or a double free of the qdisc hash
because it is already freed by ->destroy().
Fixes: 
8d5537387505 ("net/sched/hfsc: allocate tcf block for hfsc root class")
Fixes: 
87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: sch_hfsc doesn't use a tcf block]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Nikolay Aleksandrov [Wed, 30 Aug 2017 09:48:58 +0000 (12:48 +0300)]
 
sch_multiq: fix double free on init failure
commit 
e89d469e3be3ed3d7124a803211a463ff83d0964 upstream.
The below commit added a call to ->destroy() on init failure, but multiq
still frees ->queues on error in init, but ->queues is also freed by
->destroy() thus we get double free and corrupted memory.
Very easy to reproduce (eth0 not multiqueue):
$ tc qdisc add dev eth0 root multiq
RTNETLINK answers: Operation not supported
$ ip l add dumdum type dummy
(crash)
Trace log:
[ 3929.467747] general protection fault: 0000 [#1] SMP
[ 3929.468083] Modules linked in:
[ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ #56
[ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 3929.469124] task: 
ffff88003716a700 task.stack: 
ffff88005872c000
[ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be
[ 3929.469746] RSP: 0018:
ffff88005872f6a0 EFLAGS: 
00010246
[ 3929.470042] RAX: 
00000000000002de RBX: 
0000000058a59000 RCX: 
00000000000002df
[ 3929.470406] RDX: 
0000000000000000 RSI: 
0000000000000000 RDI: 
ffffffff821f7020
[ 3929.470770] RBP: 
ffff88005872f6e8 R08: 
000000000001f010 R09: 
0000000000000000
[ 3929.471133] R10: 
ffff88005872f730 R11: 
0000000000008cdd R12: 
ff006d75646d7564
[ 3929.471496] R13: 
00000000014000c0 R14: 
ffff88005b403c00 R15: 
ffff88005b403c00
[ 3929.471869] FS:  
00007f0b70480740(0000) GS:
ffff88005d980000(0000) knlGS:
0000000000000000
[ 3929.472286] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[ 3929.472677] CR2: 
00007ffcee4f3000 CR3: 
0000000059d45000 CR4: 
00000000000406e0
[ 3929.473209] DR0: 
0000000000000000 DR1: 
0000000000000000 DR2: 
0000000000000000
[ 3929.474109] DR3: 
0000000000000000 DR6: 
00000000fffe0ff0 DR7: 
0000000000000400
[ 3929.474873] Call Trace:
[ 3929.475337]  ? kstrdup_const+0x23/0x25
[ 3929.475863]  kstrdup+0x2e/0x4b
[ 3929.476338]  kstrdup_const+0x23/0x25
[ 3929.478084]  __kernfs_new_node+0x28/0xbc
[ 3929.478478]  kernfs_new_node+0x35/0x55
[ 3929.478929]  kernfs_create_link+0x23/0x76
[ 3929.479478]  sysfs_do_create_link_sd.isra.2+0x85/0xd7
[ 3929.480096]  sysfs_create_link+0x33/0x35
[ 3929.480649]  device_add+0x200/0x589
[ 3929.481184]  netdev_register_kobject+0x7c/0x12f
[ 3929.481711]  register_netdevice+0x373/0x471
[ 3929.482174]  rtnl_newlink+0x614/0x729
[ 3929.482610]  ? rtnl_newlink+0x17f/0x729
[ 3929.483080]  rtnetlink_rcv_msg+0x188/0x197
[ 3929.483533]  ? rcu_read_unlock+0x3e/0x5f
[ 3929.483984]  ? rtnl_newlink+0x729/0x729
[ 3929.484420]  netlink_rcv_skb+0x6c/0xce
[ 3929.484858]  rtnetlink_rcv+0x23/0x2a
[ 3929.485291]  netlink_unicast+0x103/0x181
[ 3929.485735]  netlink_sendmsg+0x326/0x337
[ 3929.486181]  sock_sendmsg_nosec+0x14/0x3f
[ 3929.486614]  sock_sendmsg+0x29/0x2e
[ 3929.486973]  ___sys_sendmsg+0x209/0x28b
[ 3929.487340]  ? do_raw_spin_unlock+0xcd/0xf8
[ 3929.487719]  ? _raw_spin_unlock+0x27/0x31
[ 3929.488092]  ? __handle_mm_fault+0x651/0xdb1
[ 3929.488471]  ? check_chain_key+0xb0/0xfd
[ 3929.488847]  __sys_sendmsg+0x45/0x63
[ 3929.489206]  ? __sys_sendmsg+0x45/0x63
[ 3929.489576]  SyS_sendmsg+0x19/0x1b
[ 3929.489901]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 3929.490172] RIP: 0033:0x7f0b6fb93690
[ 3929.490423] RSP: 002b:
00007ffcee4ed588 EFLAGS: 
00000246 ORIG_RAX: 
000000000000002e
[ 3929.490881] RAX: 
ffffffffffffffda RBX: 
ffffffff810d278c RCX: 
00007f0b6fb93690
[ 3929.491198] RDX: 
0000000000000000 RSI: 
00007ffcee4ed5d0 RDI: 
0000000000000003
[ 3929.491521] RBP: 
ffff88005872ff98 R08: 
0000000000000001 R09: 
0000000000000000
[ 3929.491801] R10: 
00007ffcee4ed350 R11: 
0000000000000246 R12: 
0000000000000002
[ 3929.492075] R13: 
000000000066f1a0 R14: 
00007ffcee4f5680 R15: 
0000000000000000
[ 3929.492352]  ? trace_hardirqs_off_caller+0xa7/0xcf
[ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44
89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d
8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01
[ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: 
ffff88005872f6a0
Fixes: 
87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 
f07d1501292b ("multiq: Further multiqueue cleanup")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: delete now-unused 'err' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Nikolay Aleksandrov [Wed, 30 Aug 2017 09:48:57 +0000 (12:48 +0300)]
 
sch_htb: fix crash on init failure
commit 
88c2ace69dbef696edba77712882af03879abc9c upstream.
The commit below added a call to the ->destroy() callback for all qdiscs
which failed in their ->init(), but some were not prepared for such
change and can't handle partially initialized qdisc. HTB is one of them
and if any error occurs before the qdisc watchdog timer and qdisc work are
initialized then we can hit either a null ptr deref (timer->base) when
canceling in ->destroy or lockdep error info about trying to register
a non-static key and a stack dump. So to fix these two move the watchdog
timer and workqueue init before anything that can err out.
To reproduce userspace needs to send broken htb qdisc create request,
tested with a modified tc (q_htb.c).
Trace log:
[ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2710.897977] IP: hrtimer_active+0x17/0x8a
[ 2710.898174] PGD 
58fab067
[ 2710.898175] P4D 
58fab067
[ 2710.898353] PUD 
586c0067
[ 2710.898531] PMD 0
[ 2710.898710]
[ 2710.899045] Oops: 0000 [#1] SMP
[ 2710.899232] Modules linked in:
[ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
[ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 2710.900035] task: 
ffff880059ed2700 task.stack: 
ffff88005ad4c000
[ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
[ 2710.900467] RSP: 0018:
ffff88005ad4f960 EFLAGS: 
00010246
[ 2710.900684] RAX: 
0000000000000000 RBX: 
ffff88003701e298 RCX: 
0000000000000000
[ 2710.900933] RDX: 
0000000000000000 RSI: 
0000000000000000 RDI: 
ffff88003701e298
[ 2710.901177] RBP: 
ffff88005ad4f980 R08: 
0000000000000001 R09: 
0000000000000001
[ 2710.901419] R10: 
ffff88005ad4f800 R11: 
0000000000000400 R12: 
0000000000000000
[ 2710.901663] R13: 
ffff88003701e298 R14: 
ffffffff822a4540 R15: 
ffff88005ad4fac0
[ 2710.901907] FS:  
00007f2f5e90f740(0000) GS:
ffff88005d880000(0000) knlGS:
0000000000000000
[ 2710.902277] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
[ 2710.902500] CR2: 
0000000000000000 CR3: 
0000000058ca3000 CR4: 
00000000000406e0
[ 2710.902744] DR0: 
0000000000000000 DR1: 
0000000000000000 DR2: 
0000000000000000
[ 2710.902977] DR3: 
0000000000000000 DR6: 
00000000fffe0ff0 DR7: 
0000000000000400
[ 2710.903180] Call Trace:
[ 2710.903332]  hrtimer_try_to_cancel+0x1a/0x93
[ 2710.903504]  hrtimer_cancel+0x15/0x20
[ 2710.903667]  qdisc_watchdog_cancel+0x12/0x14
[ 2710.903866]  htb_destroy+0x2e/0xf7
[ 2710.904097]  qdisc_create+0x377/0x3fd
[ 2710.904330]  tc_modify_qdisc+0x4d2/0x4fd
[ 2710.904511]  rtnetlink_rcv_msg+0x188/0x197
[ 2710.904682]  ? rcu_read_unlock+0x3e/0x5f
[ 2710.904849]  ? rtnl_newlink+0x729/0x729
[ 2710.905017]  netlink_rcv_skb+0x6c/0xce
[ 2710.905183]  rtnetlink_rcv+0x23/0x2a
[ 2710.905345]  netlink_unicast+0x103/0x181
[ 2710.905511]  netlink_sendmsg+0x326/0x337
[ 2710.905679]  sock_sendmsg_nosec+0x14/0x3f
[ 2710.905847]  sock_sendmsg+0x29/0x2e
[ 2710.906010]  ___sys_sendmsg+0x209/0x28b
[ 2710.906176]  ? do_raw_spin_unlock+0xcd/0xf8
[ 2710.906346]  ? _raw_spin_unlock+0x27/0x31
[ 2710.906514]  ? __handle_mm_fault+0x651/0xdb1
[ 2710.906685]  ? check_chain_key+0xb0/0xfd
[ 2710.906855]  __sys_sendmsg+0x45/0x63
[ 2710.907018]  ? __sys_sendmsg+0x45/0x63
[ 2710.907185]  SyS_sendmsg+0x19/0x1b
[ 2710.907344]  entry_SYSCALL_64_fastpath+0x23/0xc2
Note that probably this bug goes further back because the default qdisc
handling always calls ->destroy on init failure too.
Fixes: 
87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Fixes: 
0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Eric Dumazet [Fri, 10 Feb 2017 18:31:49 +0000 (10:31 -0800)]
 
net_sched: fix error recovery at qdisc creation
commit 
87b60cfacf9f17cf71933c6e33b66e68160af71d upstream.
Dmitry reported uses after free in qdisc code [1]
The problem here is that ops->init() can return an error.
qdisc_create_dflt() then call ops->destroy(),
while qdisc_create() does _not_ call it.
Four qdisc chose to call their own ops->destroy(), assuming their caller
would not.
This patch makes sure qdisc_create() calls ops->destroy()
and fixes the four qdisc to avoid double free.
[1]
BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr 
ffff8801d415d440
Read of size 8 by task syz-executor2/5030
CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 
0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400
 ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898
 ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0
Call Trace:
 [<
ffffffff81bbbed4>] __dump_stack lib/dump_stack.c:15 [inline]
 [<
ffffffff81bbbed4>] dump_stack+0x6c/0x98 lib/dump_stack.c:51
 [<
ffffffff816682b1>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
 [<
ffffffff81668524>] print_address_description mm/kasan/report.c:196 [inline]
 [<
ffffffff81668524>] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
 [<
ffffffff81668953>] kasan_report mm/kasan/report.c:305 [inline]
 [<
ffffffff81668953>] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326
 [<
ffffffff82527b02>] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
 [<
ffffffff82524bdd>] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
 [<
ffffffff82524e30>] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
 [<
ffffffff8252550d>] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline]
 [<
ffffffff8252550d>] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
 [<
ffffffff824b1db1>] __dev_open+0x221/0x320 net/core/dev.c:1403
 [<
ffffffff824b24ce>] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
 [<
ffffffff824b27de>] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
 [<
ffffffff824f5bf6>] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
 [<
ffffffff824f61fa>] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
 [<
ffffffff82430509>] sock_do_ioctl+0x99/0xb0 net/socket.c:879
 [<
ffffffff82430d30>] sock_ioctl+0x2a0/0x390 net/socket.c:958
 [<
ffffffff816f3b68>] vfs_ioctl fs/ioctl.c:44 [inline]
 [<
ffffffff816f3b68>] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
 [<
ffffffff816f41a4>] SYSC_ioctl fs/ioctl.c:626 [inline]
 [<
ffffffff816f41a4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
 [<
ffffffff8123e357>] entry_SYSCALL_64_fastpath+0x12/0x17
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop changes to sch_hhf (doesn't exist) and sch_sfq (doesn't have this bug)
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Steve French [Sun, 27 Aug 2017 21:56:08 +0000 (16:56 -0500)]
 
CIFS: remove endian related sparse warning
commit 
6e3c1529c39e92ed64ca41d53abadabbaa1d5393 upstream.
Recent patch had an endian warning ie
cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ben Hutchings [Thu, 1 Oct 2015 00:35:55 +0000 (01:35 +0100)]
 
alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
commit 
cec80d82142ab25c71eee24b529cfeaf17c43062 upstream.
This fixes compiler errors in perf such as:
tests/attr.c: In function 'store_event':
tests/attr.c:66:27: error: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__u64 {aka long unsigned int}' [-Werror=format=]
  snprintf(path, PATH_MAX, "%s/event-%d-%llu-%d", dir,
                           ^
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Tejun Heo [Mon, 28 Aug 2017 21:51:27 +0000 (14:51 -0700)]
 
cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
commit 
b339752d054fb32863418452dff350a1086885b1 upstream.
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
@node.  The assumption seems that if !NUMA, there shouldn't be more than
one node and thus reporting cpu_online_mask regardless of @node is
correct.  However, that assumption was broken years ago to support
DISCONTIGMEM and whether a system has multiple nodes or not is
separately controlled by NEED_MULTIPLE_NODES.
This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
cpumask_of_node() will report cpu_online_mask for all possible nodes,
indicating that the CPUs are associated with multiple nodes which is an
impossible configuration.
This bug has been around forever but doesn't look like it has caused any
noticeable symptoms.  However, it triggers a WARN recently added to
workqueue to verify NUMA affinity configuration.
Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Wei Wang [Fri, 25 Aug 2017 22:03:10 +0000 (15:03 -0700)]
 
ipv6: fix sparse warning on rt6i_node
commit 
4e587ea71bf924f7dac621f1351653bd41e446cb upstream.
Commit 
c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
  net/ipv6/route.c:1394:30: error: incompatible types in comparison
  expression (different address spaces)
  ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
  expression (different address spaces)
This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.
Fixes: 
c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - fib6_add_rt2node() has only one assignment to update
 - Drop changes in rt6_cache_allowed_for_pmtu()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Fri, 25 Aug 2017 14:51:46 +0000 (16:51 +0200)]
 
l2tp: hold tunnel used while creating sessions with netlink
commit 
e702c1204eb57788ef189c839c8c779368267d70 upstream.
Use l2tp_tunnel_get() to retrieve tunnel, so that it can't go away on
us. Otherwise l2tp_tunnel_destruct() might release the last reference
count concurrently, thus freeing the tunnel while we're using it.
Fixes: 
309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Tue, 11 Apr 2017 11:12:13 +0000 (13:12 +0200)]
 
l2tp: remove useless duplicate session detection in l2tp_netlink
commit 
af87ae465abdc070de0dc35d6c6a9e7a8cd82987 upstream.
There's no point in checking for duplicate sessions at the beginning of
l2tp_nl_cmd_session_create(); the ->session_create() callbacks already
return -EEXIST when the session already exists.
Furthermore, even if l2tp_session_find() returns NULL, a new session
might be created right after the test. So relying on ->session_create()
to avoid duplicate session is the only sane behaviour.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: also delete the now-unused local variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Fri, 25 Aug 2017 14:51:43 +0000 (16:51 +0200)]
 
l2tp: hold tunnel while handling genl TUNNEL_GET commands
commit 
4e4b21da3acc68a7ea55f850cacc13706b7480e9 upstream.
Use l2tp_tunnel_get() instead of l2tp_tunnel_find() so that we get
a reference on the tunnel, preventing l2tp_tunnel_destruct() from
freeing it from under us.
Also move l2tp_tunnel_get() below nlmsg_new() so that we only take
the reference when needed.
Fixes: 
309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Fri, 25 Aug 2017 14:51:42 +0000 (16:51 +0200)]
 
l2tp: hold tunnel while handling genl tunnel updates
commit 
8c0e421525c9eb50d68e8f633f703ca31680b746 upstream.
We need to make sure the tunnel is not going to be destroyed by
l2tp_tunnel_destruct() concurrently.
Fixes: 
309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Fri, 25 Aug 2017 14:51:42 +0000 (16:51 +0200)]
 
l2tp: hold tunnel while processing genl delete command
commit 
bb0a32ce4389e17e47e198d2cddaf141561581ad upstream.
l2tp_nl_cmd_tunnel_delete() needs to take a reference on the tunnel, to
prevent it from being concurrently freed by l2tp_tunnel_destruct().
Fixes: 
309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Fri, 25 Aug 2017 14:51:40 +0000 (16:51 +0200)]
 
l2tp: hold tunnel while looking up sessions in l2tp_netlink
commit 
54652eb12c1b72e9602d09cb2821d5760939190f upstream.
l2tp_tunnel_find() doesn't take a reference on the returned tunnel.
Therefore, it's unsafe to use it because the returned tunnel can go
away on us anytime.
Fix this by defining l2tp_tunnel_get(), which works like
l2tp_tunnel_find(), but takes a reference on the returned tunnel.
Caller then has to drop this reference using l2tp_tunnel_dec_refcount().
As l2tp_tunnel_dec_refcount() needs to be moved to l2tp_core.h, let's
simplify the patch and not move the L2TP_REFCNT_DEBUG part. This code
has been broken (not even compiling) in May 2012 by
commit 
a4ca44fa578c ("net: l2tp: Standardize logging styles")
and fixed more than two years later by
commit 
29abe2fda54f ("l2tp: fix missing line continuation"). So it
doesn't appear to be used by anyone.
Same thing for l2tp_tunnel_free(); instead of moving it to l2tp_core.h,
let's just simplify things and call kfree_rcu() directly in
l2tp_tunnel_dec_refcount(). Extra assertions and debugging code
provided by l2tp_tunnel_free() didn't help catching any of the
reference counting and socket handling issues found while working on
this series.
Fixes: 
309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: l2tp_tunnel_free() does more than just kfree_rcu(), so
 don't remove it]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Wed, 12 Apr 2017 08:05:29 +0000 (10:05 +0200)]
 
l2tp: define parameters of l2tp_session_get*() as "const"
commit 
9aaef50c44f132e040dcd7686c8e78a3390037c5 upstream.
Make l2tp_pernet()'s parameter constant, so that l2tp_session_get*() can
declare their "net" variable as "const".
Also constify "ifname" in l2tp_session_get_by_ifname().
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Guillaume Nault [Fri, 25 Aug 2017 14:22:17 +0000 (16:22 +0200)]
 
l2tp: initialise session's refcount before making it reachable
commit 
9ee369a405c57613d7c83a3967780c3e30c52ecc upstream.
Sessions must be fully initialised before calling
l2tp_session_add_to_tunnel(). Otherwise, there's a short time frame
where partially initialised sessions can be accessed by external users.
Fixes: 
dbdbc73b4478 ("l2tp: fix duplicate session creation")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: keep using l2tp_session_inc_refcount()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Bart Van Assche [Wed, 9 Aug 2017 18:32:11 +0000 (11:32 -0700)]
 
dm: fix printk() rate limiting code
commit 
604407890ecf624c2fb41013c82b22aade59b455 upstream.
Using the same rate limiting state for different kinds of messages
is wrong because this can cause a high frequency message to suppress
a report of a low frequency message. Hence use a unique rate limiting
state per message type.
Fixes: 
71a16736a15e ("dm: use local printk ratelimit")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Joe Perches [Thu, 20 Apr 2017 17:46:07 +0000 (10:46 -0700)]
 
dm: convert DM printk macros to pr_<level> macros
commit 
d2c3c8dcb5987b8352e82089c79a41b6e17e28d2 upstream.
Using pr_<level> is the more common logging style.
Standardize style and use new macro DM_FMT.
Use no_printk in DMDEBUG macros when CONFIG_DM_DEBUG is not #defined.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mathias Krause [Sat, 26 Aug 2017 15:09:00 +0000 (17:09 +0200)]
 
xfrm_user: fix info leak in build_aevent()
commit 
931e79d7a7ddee4709c56b39de169a36804589a1 upstream.
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the sa_id before filling it.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Fixes: 
d51d081d6504 ("[IPSEC]: Sync series - user")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mathias Krause [Sat, 26 Aug 2017 15:08:58 +0000 (17:08 +0200)]
 
xfrm_user: fix info leak in xfrm_notify_sa()
commit 
50329c8a340c9dea60d837645fcf13fc36bfb84d upstream.
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the whole struct before filling
it.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 
0603eac0d6b7 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Florian Fainelli [Fri, 25 Aug 2017 01:34:43 +0000 (18:34 -0700)]
 
r8169: Do not increment tx_dropped in TX ring cleaning
commit 
1089650d8837095f63e001bbf14d7b48043d67ad upstream.
rtl8169_tx_clear_range() is responsible for cleaning up the TX ring
during interface shutdown, incrementing tx_dropped for every SKB that we
left at the time in the ring is misleading.
Fixes: 
cac4b22f3d6a ("r8169: do not account fragments as packets")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Steffen Klassert [Fri, 25 Aug 2017 07:05:42 +0000 (09:05 +0200)]
 
ipv6: Fix may be used uninitialized warning in rt6_check
commit 
3614364527daa870264f6dde77f02853cdecd02c upstream.
rt_cookie might be used uninitialized, fix this by
initializing it.
Fixes: 
c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Wei Wang [Mon, 21 Aug 2017 16:47:10 +0000 (09:47 -0700)]
 
ipv6: add rcu grace period before freeing fib6_node
commit 
c5cff8561d2d0006e972bd114afd51f082fee77c upstream.
We currently keep rt->rt6i_node pointing to the fib6_node for the route.
And some functions make use of this pointer to dereference the fib6_node
from rt structure, e.g. rt6_check(). However, as there is neither
refcount nor rcu taken when dereferencing rt->rt6i_node, it could
potentially cause crashes as rt->rt6i_node could be set to NULL by other
CPUs when doing a route deletion.
This patch introduces an rcu grace period before freeing fib6_node and
makes sure the functions that dereference it takes rcu_read_lock().
Note: there is no "Fixes" tag because this bug was there in a very
early stage.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Martin KaFai Lau [Sat, 23 May 2015 03:56:01 +0000 (20:56 -0700)]
 
ipv6: Add rt6_get_cookie() function
commit 
b197df4f0f3782782e9ea8996e91b65ae33e8dd9 upstream.
Instead of doing the rt6->rt6i_node check whenever we need
to get the route's cookie.  Refactor it into rt6_get_cookie().
It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also
percpu rt6_info later.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop changes in inet6_sk_rx_dst_set(), sctp_v6_get_dst()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Chen Yu [Fri, 25 Aug 2017 22:55:30 +0000 (15:55 -0700)]
 
PM/hibernate: touch NMI watchdog when creating snapshot
commit 
556b969a1cfe2686aae149137fa1dfcac0eefe54 upstream.
There is a problem that when counting the pages for creating the
hibernation snapshot will take significant amount of time, especially on
system with large memory.  Since the counting job is performed with irq
disabled, this might lead to NMI lockup.  The following warning were
found on a system with 1.5TB DRAM:
  Freezing user space processes ... (elapsed 0.002 seconds) done.
  OOM killer disabled.
  PM: Preallocating image memory...
  NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
  CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1
  task: 
ffff9f01971ac000 task.stack: 
ffffb1a3f325c000
  RIP: 0010:memory_bm_find_bit+0xf4/0x100
  Call Trace:
   swsusp_set_page_free+0x2b/0x30
   mark_free_pages+0x147/0x1c0
   count_data_pages+0x41/0xa0
   hibernate_preallocate_memory+0x80/0x450
   hibernation_snapshot+0x58/0x410
   hibernate+0x17c/0x310
   state_store+0xdf/0xf0
   kobj_attr_store+0xf/0x20
   sysfs_kf_write+0x37/0x40
   kernfs_fop_write+0x11c/0x1a0
   __vfs_write+0x37/0x170
   vfs_write+0xb1/0x1a0
   SyS_write+0x55/0xc0
   entry_SYSCALL_64_fastpath+0x1a/0xa5
  ...
  done (allocated 6590003 pages)
  PM: Allocated 
26360012 kbytes in 19.89 seconds (1325.28 MB/s)
It has taken nearly 20 seconds(2.10GHz CPU) thus the NMI lockup was
triggered.  In case the timeout of the NMI watch dog has been set to 1
second, a safe interval should be 6590003/20 = 320k pages in theory.
However there might also be some platforms running at a lower frequency,
so feed the watchdog every 100k pages.
[yu.c.chen@intel.com: simplification]
Link: http://lkml.kernel.org/r/1503460079-29721-1-git-send-email-yu.c.chen@intel.com
[yu.c.chen@intel.com: use interval of 128k instead of 100k to avoid modulus]
Link: http://lkml.kernel.org/r/1503328098-5120-1-git-send-email-yu.c.chen@intel.com
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reported-by: Jan Filipcewicz <jan.filipcewicz@intel.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mark Rutland [Thu, 22 Jun 2017 14:41:38 +0000 (15:41 +0100)]
 
perf/core: Fix group {cpu,task} validation
commit 
64aee2a965cf2954a038b5522f11d2cd2f0f8f3e upstream.
Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully intialised) groups.
Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.
For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating thier (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.
This can be triggered with the below test case, resulting in warnings
from arch backends.
  #define _GNU_SOURCE
  #include <linux/hw_breakpoint.h>
  #include <linux/perf_event.h>
  #include <sched.h>
  #include <stdio.h>
  #include <sys/prctl.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			   int group_fd, unsigned long flags)
  {
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
  }
  char watched_char;
  struct perf_event_attr wp_attr = {
	.type = PERF_TYPE_BREAKPOINT,
	.bp_type = HW_BREAKPOINT_RW,
	.bp_addr = (unsigned long)&watched_char,
	.bp_len = 1,
	.size = sizeof(wp_attr),
  };
  int main(int argc, char *argv[])
  {
	int leader, ret;
	cpu_set_t cpus;
	/*
	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
	 */
	CPU_ZERO(&cpus);
	CPU_SET(0, &cpus);
	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
	if (ret) {
		printf("Unable to set cpu affinity\n");
		return 1;
	}
	/* open leader event, bound to this task, CPU0 only */
	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
	if (leader < 0) {
		printf("Couldn't open leader: %d\n", leader);
		return 1;
	}
	/*
	 * Open a follower event that is bound to the same task, but a
	 * different CPU. This means that the group should never be possible to
	 * schedule.
	 */
	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
	if (ret < 0) {
		printf("Couldn't open mismatched follower: %d\n", ret);
		return 1;
	} else {
		printf("Opened leader/follower with mismastched CPUs\n");
	}
	/*
	 * Open as many independent events as we can, all bound to the same
	 * task, CPU0 only.
	 */
	do {
		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
	} while (ret >= 0);
	/*
	 * Force enable/disble all events to trigger the erronoeous
	 * installation of the follower event.
	 */
	printf("Opened all events. Toggling..\n");
	for (;;) {
		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
	}
	return 0;
  }
Fix this by validating this requirement regardless of whether we're
moving events.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhou Chengming <zhouchengming1@huawei.com>
Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Peter Zijlstra [Fri, 23 Jan 2015 10:19:48 +0000 (11:19 +0100)]
 
perf: Tighten (and fix) the grouping condition
commit 
c3c87e770458aa004bd7ed3f29945ff436fd6511 upstream.
The fix from 
9fc81d87420d ("perf: Fix events installation during
moving group") was incomplete in that it failed to recognise that
creating a group with events for different CPUs is semantically
broken -- they cannot be co-scheduled.
Furthermore, it leads to real breakage where, when we create an event
for CPU Y and then migrate it to form a group on CPU X, the code gets
confused where the counter is programmed -- triggered in practice
as well by me via the perf fuzzer.
Fix this by tightening the rules for creating groups. Only allow
grouping of counters that can be co-scheduled in the same context.
This means for the same task and/or the same cpu.
Fixes: 
9fc81d87420d ("perf: Fix events installation during moving group")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Arnd Bergmann [Wed, 23 Aug 2017 13:59:49 +0000 (15:59 +0200)]
 
qlge: avoid memcpy buffer overflow
commit 
e58f95831e7468d25eb6e41f234842ecfe6f014f upstream.
gcc-8.0.0 (snapshot) points out that we copy a variable-length string
into a fixed length field using memcpy() with the destination length,
and that ends up copying whatever follows the string:
    inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
  memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
Changing it to use strncpy() will instead zero-pad the destination,
which seems to be the right thing to do here.
The bug is probably harmless, but it seems like a good idea to address
it in stable kernels as well, if only for the purpose of building with
gcc-8 without warnings.
Fixes: 
a61f80261306 ("qlge: Add ethtool register dump function.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ronnie Sahlberg [Wed, 23 Aug 2017 04:48:14 +0000 (14:48 +1000)]
 
cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
commit 
d3edede29f74d335f81d95a4588f5f136a9f7dcf upstream.
Add checking for the path component length and verify it is <= the maximum
that the server advertizes via FileFsAttributeInformation.
With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT
when users to access an overlong path.
To test this, try to cd into a (non-existing) directory on a CIFS share
that has a too long name:
cd /mnt/
aaaaaaaaaaaaaaa...
and it now should show a good error message from the shell:
bash: cd: /mnt/
aaaaaaaaaaaaaaaa...aaaaaa: File name too long
rh bz 1153996
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: name checks are done only in cifs_lookup()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Takashi Iwai [Wed, 23 Aug 2017 07:30:17 +0000 (09:30 +0200)]
 
ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
commit 
bbba6f9d3da357bbabc6fda81e99ff5584500e76 upstream.
Lenovo G50-70 (17aa:3978) with Conexant codec chip requires the
similar workaround for the inverted stereo dmic like other Lenovo
models.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1020657
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Stefano Brivio [Fri, 18 Aug 2017 12:40:53 +0000 (14:40 +0200)]
 
ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
commit 
3de33e1ba0506723ab25734e098cf280ecc34756 upstream.
A packet length of exactly IPV6_MAXPLEN is allowed, we should
refuse parsing options only if the size is 64KiB or more.
While at it, remove one extra variable and one assignment which
were also introduced by the commit that introduced the size
check. Checking the sum 'offset + len' and only later adding
'len' to 'offset' doesn't provide any advantage over directly
summing to 'offset' and checking it.
Fixes: 
6399f1fae4ec ("ipv6: avoid overflow of offset in ip6_find_1stfragopt")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Takashi Iwai [Tue, 22 Aug 2017 06:15:13 +0000 (08:15 +0200)]
 
ALSA: core: Fix unexpected error at replacing user TLV
commit 
88c54cdf61f508ebcf8da2d819f5dfc03e954d1d upstream.
When user tries to replace the user-defined control TLV, the kernel
checks the change of its content via memcmp().  The problem is that
the kernel passes the return value from memcmp() as is.  memcmp()
gives a non-zero negative value depending on the comparison result,
and this shall be recognized as an error code.
The patch covers that corner-case, return 1 properly for the changed
TLV.
Fixes: 
8aa9b586e420 ("[ALSA] Control API - more robust TLV implementation")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Aaron Ma [Fri, 18 Aug 2017 19:17:21 +0000 (12:17 -0700)]
 
Input: trackpoint - add new trackpoint firmware ID
commit 
ec667683c532c93fb41e100e5d61a518971060e2 upstream.
Synaptics add new TP firmware ID: 0x2 and 0x3, for now both lower 2 bits
are indicated as TP. Change the constant to bitwise values.
This makes trackpoint to be recognized on Lenovo Carbon X1 Gen5 instead
of it being identified as "PS/2 Generic Mouse".
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
zhong jiang [Fri, 18 Aug 2017 22:16:24 +0000 (15:16 -0700)]
 
mm/mempolicy: fix use after free when calling get_mempolicy
commit 
73223e4e2e3867ebf033a5a8eb2e5df0158ccc99 upstream.
I hit a use after free issue when executing trinity and repoduced it
with KASAN enabled.  The related call trace is as follows.
  BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr 
ffff8801f582d766
  Read of size 2 by task syz-executor1/798
  INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
     __slab_alloc+0x768/0x970
     kmem_cache_alloc+0x2e7/0x450
     mpol_new.part.2+0x74/0x160
     mpol_new+0x66/0x80
     SyS_mbind+0x267/0x9f0
     system_call_fastpath+0x16/0x1b
  INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
     __slab_free+0x495/0x8e0
     kmem_cache_free+0x2f3/0x4c0
     __mpol_put+0x2b/0x40
     SyS_mbind+0x383/0x9f0
     system_call_fastpath+0x16/0x1b
  INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
  INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600
  Bytes b4 
ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
  Object 
ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  Object 
ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
  Redzone 
ffff8801f582d778: bb bb bb bb bb bb bb bb                          ........
  Padding 
ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
  Memory state around the buggy address:
  
ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
  
ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  >
ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc
!shared memory policy is not protected against parallel removal by other
thread which is normally protected by the mmap_sem.  do_get_mempolicy,
however, drops the lock midway while we can still access it later.
Early premature up_read is a historical artifact from times when
put_user was called in this path see https://lwn.net/Articles/124754/
but that is gone since 
8bccd85ffbaf ("[PATCH] Implement sys_* do_*
layering in the memory policy layer.").  but when we have the the
current mempolicy ref count model.  The issue was introduced
accordingly.
Fix the issue by removing the premature release.
Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Takashi Iwai [Wed, 16 Aug 2017 12:18:37 +0000 (14:18 +0200)]
 
ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
commit 
0f174b3525a43bd51f9397394763925e0ebe7bc7 upstream.
C-Media devices (at least some models) mute the playback stream when
volumes are set to the minimum value.  But this isn't informed via TLV
and the user-space, typically PulseAudio, gets confused as if it's
still played in a low volume.
This patch adds the new flag, min_mute, to struct usb_mixer_elem_info
for indicating that the mixer element is with the minimum-mute volume.
This flag is set for known C-Media devices in
snd_usb_mixer_fu_apply_quirk() in turn.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196669
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Thomas Bogendoerfer [Sat, 12 Aug 2017 21:36:47 +0000 (23:36 +0200)]
 
parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
commit 
4098116039911e8870d84c975e2ec22dab65a909 upstream.
For 64bit kernels the lmmio_space_offset of the host bridge window
isn't set correctly on systems with dino/cujo PCI host bridges.
This leads to not assigned memory bars and failing drivers, which
need to use these bars.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jan Kara [Tue, 15 Aug 2017 11:00:36 +0000 (13:00 +0200)]
 
audit: Fix use after free in audit_remove_watch_rule()
commit 
d76036ab47eafa6ce52b69482e91ca3ba337d6d6 upstream.
audit_remove_watch_rule() drops watch's reference to parent but then
continues to work with it. That is not safe as parent can get freed once
we drop our reference. The following is a trivial reproducer:
mount -o loop image /mnt
touch /mnt/file
auditctl -w /mnt/file -p wax
umount /mnt
auditctl -D
<crash in fsnotify_destroy_mark()>
Grab our own reference in audit_remove_watch_rule() earlier to make sure
mark does not get freed under us.
Reported-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Eric Dumazet [Mon, 14 Aug 2017 17:16:45 +0000 (10:16 -0700)]
 
af_key: do not use GFP_KERNEL in atomic contexts
commit 
36f41f8fc6d8aa9f8c9072d66ff7cf9055f5e69b upstream.
pfkey_broadcast() might be called from non process contexts,
we can not use GFP_KERNEL in these cases [1].
This patch partially reverts commit 
ba51b6be38c1 ("net: Fix RCU splat in
af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
section.
[1] : syzkaller reported :
in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
3 locks held by syzkaller183439/2932:
 #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<
ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
 #1:  (&pfk->dump_lock){+.+.+.}, at: [<
ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
 #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<
ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
 #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<
ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
 __might_sleep+0x95/0x190 kernel/sched/core.c:5947
 slab_pre_alloc_hook mm/slab.h:416 [inline]
 slab_alloc mm/slab.c:3383 [inline]
 kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
 skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
 pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
 pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
 dump_sp+0x3d6/0x500 net/key/af_key.c:2685
 xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
 pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
 pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
 pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
 pfkey_process+0x606/0x710 net/key/af_key.c:2814
 pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 ___sys_sendmsg+0x755/0x890 net/socket.c:2035
 __sys_sendmsg+0xe5/0x210 net/socket.c:2069
 SYSC_sendmsg net/socket.c:2080 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2076
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x445d79
RSP: 002b:
00007f32447c1dc8 EFLAGS: 
00000202 ORIG_RAX: 
000000000000002e
RAX: 
ffffffffffffffda RBX: 
0000000000000000 RCX: 
0000000000445d79
RDX: 
0000000000000000 RSI: 
000000002023dfc8 RDI: 
0000000000000008
RBP: 
0000000000000086 R08: 
00007f32447c2700 R09: 
00007f32447c2700
R10: 
00007f32447c2700 R11: 
0000000000000202 R12: 
0000000000000000
R13: 
00007ffe33edec4f R14: 
00007f32447c29c0 R15: 
0000000000000000
Fixes: 
ba51b6be38c1 ("net: Fix RCU splat in af_key")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Omar Sandoval [Fri, 11 Aug 2017 16:00:06 +0000 (09:00 -0700)]
 
xfs: fix inobt inode allocation search optimization
commit 
c44245b3d5435f533ca8346ece65918f84c057f9 upstream.
When we try to allocate a free inode by searching the inobt, we try to
find the inode nearest the parent inode by searching chunks both left
and right of the chunk containing the parent. As an optimization, we
cache the leftmost and rightmost records that we previously searched; if
we do another allocation with the same parent inode, we'll pick up the
search where it last left off.
There's a bug in the case where we found a free inode to the left of the
parent's chunk: we need to update the cached left and right records, but
because we already reassigned the right record to point to the left, we
end up assigning the left record to both the cached left and right
records.
This isn't a correctness problem strictly, but it can result in the next
allocation rechecking chunks unnecessarily or allocating inodes further
away from the parent than it needs to. Fix it by swapping the record
pointer after we update the cached left and right records.
Fixes: 
bd169565993b ("xfs: speed up free inode search")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Yishai Hadas [Tue, 1 Aug 2017 06:41:36 +0000 (09:41 +0300)]
 
IB/uverbs: Fix device cleanup
commit 
efdd6f53b10aead0f5cf19a93dd3eb268ac0d991 upstream.
Uverbs device should be cleaned up only when there is no
potential usage of.
As part of ib_uverbs_remove_one which might be triggered upon reset flow
the device reference count is decreased as expected and leave the final
cleanup to the FDs that were opened.
Current code increases reference count upon opening a new command FD and
decreases it upon closing the file. The event FD is opened internally
and rely on the command FD by taking on it a reference count.
In case that the command FD was closed and just later the event FD we
may ensure that the device resources as of srcu are still alive as they
are still in use.
Fixing the above by moving the reference count decreasing to the place
where the command FD is really freed instead of doing that when it was
just closed.
fixes: 
036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Leon Romanovsky [Tue, 1 Aug 2017 06:41:35 +0000 (09:41 +0300)]
 
RDMA/uverbs: Prevent leak of reserved field
commit 
f7a6cb7b38c6845b26aaa8bbdf519ff6e3090831 upstream.
initialize to zero the response structure to prevent
the leakage of "resp.reserved" field.
drivers/infiniband/core/uverbs_cmd.c:1178 ib_uverbs_resize_cq() warn:
	check that 'resp.reserved' doesn't leak information
Fixes: 
33b9b3ee9709 ("IB: Add userspace support for resizing CQs")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jan Kara [Wed, 2 Aug 2017 20:32:30 +0000 (13:32 -0700)]
 
ocfs2: don't clear SGID when inheriting ACLs
commit 
19ec8e48582670c021e998b9deb88e39a842ff45 upstream.
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0').  However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.
Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
into ocfs2_iop_set_acl().  That way the function will not be called when
inheriting ACLs which is what we want as it prevents SGID bit clearing
and the mode has been properly set by posix_acl_create() anyway.  Also
posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
mode itself.
Fixes: 
073931017b4 ("posix_acl: Clear SGID bit when setting file permissions")
Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: Move the call to posix_acl_update_mode() into
 ocfs2_xattr_set_acl(). Pass NULL as the bh argument to
 ocfs2_acl_set_mode(). Reuse the existing cleanup label.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Inbar Karmy [Tue, 1 Aug 2017 13:43:43 +0000 (16:43 +0300)]
 
net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support
commit 
c994f778bb1cca8ebe7a4e528cefec233e93b5cc upstream.
Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of Wol support, so that the indication
remains "g" all the time if the NIC supports WoL.
Tested:
As accepted, when NIC supports WoL- ethtool reports:
	Supports Wake-on: g
	Wake-on: d
when NIC doesn't support WoL- ethtool reports:
        Supports Wake-on: d
        Wake-on: d
Fixes: 
14c07b1358ed ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Michał Mirosław [Tue, 18 Jul 2017 12:35:45 +0000 (14:35 +0200)]
 
gpio: tegra: fix unbalanced chained_irq_enter/exit
commit 
9e9509e38fbe034782339eb09c915f0b5765ff69 upstream.
When more than one GPIO IRQs are triggered simultaneously,
tegra_gpio_irq_handler() called chained_irq_exit() multiple
times for one chained_irq_enter().
Fixes: 
3c92db9ac0ca3eee8e46e2424b6c074e2e394ad9
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
[Also changed the variable to a bool]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Max Filippov [Tue, 1 Aug 2017 18:15:15 +0000 (11:15 -0700)]
 
xtensa: mm/cache: add missing EXPORT_SYMBOLs
commit 
bc652eb6a0d5cffaea7dc8e8ad488aab2a1bf1ed upstream.
Functions clear_user_highpage, copy_user_highpage, flush_dcache_page,
local_flush_cache_range and local_flush_cache_page may be used from
modules. Export them.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
[bwh: Backported to 3.2: drop exports of {clear,copy}_user_highpage()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Max Filippov [Tue, 1 Aug 2017 18:02:46 +0000 (11:02 -0700)]
 
xtensa: don't limit csum_partial export by CONFIG_NET
commit 
7f81e55c737a8fa82c71f290945d729a4902f8d2 upstream.
csum_partial and csum_partial_copy_generic are defined unconditionally
and are available even when CONFIG_NET is disabled. They are used not
only by the network drivers, but also by scsi and media.
Don't limit these functions export by CONFIG_NET.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Max Filippov [Mon, 17 Sep 2012 01:44:56 +0000 (05:44 +0400)]
 
xtensa: add missing symbol exports
commit 
d3738f407c8ced4fd17dccf6cce729023c735c73 upstream.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
[bwh: Backported to 3.2: drop exports of some functions that aren't defined here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Rafael J. Wysocki [Tue, 25 Jul 2017 21:58:50 +0000 (23:58 +0200)]
 
USB: hcd: Mark secondary HCD as dead if the primary one died
commit 
cd5a6a4fdaba150089af2afc220eae0fef74878a upstream.
Make usb_hc_died() clear the HCD_FLAG_RH_RUNNING flag for the shared
HCD and set HCD_FLAG_DEAD for it, in analogy with what is done for
the primary one.
Among other thigs, this prevents check_root_hub_suspended() from
returning -EBUSY for dead HCDs which helps to work around system
suspend issues in some situations.
This actually fixes occasional suspend failures on one of my test
machines.
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Max Filippov [Sat, 29 Jul 2017 00:42:59 +0000 (17:42 -0700)]
 
xtensa: fix cache aliasing handling code for WT cache
commit 
6d0f581d1768d3eaba15776e7dd1fdfec10cfe36 upstream.
Currently building kernel for xtensa core with aliasing WT cache fails
with the following messages:
  mm/memory.c:2152: undefined reference to `flush_dcache_page'
  mm/memory.c:2332: undefined reference to `local_flush_cache_page'
  mm/memory.c:1919: undefined reference to `local_flush_cache_range'
  mm/memory.c:4179: undefined reference to `copy_to_user_page'
  mm/memory.c:4183: undefined reference to `copy_from_user_page'
This happens because implementation of these functions is only compiled
when data cache is WB, which looks wrong: even when data cache doesn't
need flushing it still needs invalidation. The functions like
__flush_[invalidate_]dcache_* are correctly defined for both WB and WT
caches (and even if they weren't that'd still be ok, just slower).
Fix this by providing the same implementation of the above functions for
both WB and WT cache.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Arnd Bergmann [Wed, 19 Mar 2014 17:41:37 +0000 (18:41 +0100)]
 
ARM: pxa: select both FB and FB_W100 for eseries
commit 
1d20d8a9fce8f1e2ef00a0f3d068fa18d59ddf8f upstream.
We get a link error trying to access the w100fb_gpio_read/write
functions from the platform when the driver is a loadable module
or not built-in, so the platform already uses 'select' to hard-enable
the driver.
However, that fails if the framebuffer subsystem is disabled
altogether.
I've considered various ways to fix this properly, but they
all seem like too much work or too risky, so this simply
adds another 'select' to force the subsystem on as well.
Fixes: 
82427de2c7c3 ("ARM: pxa: PXA_ESERIES depends on FB_W100.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Xin Long [Wed, 26 Jul 2017 08:24:59 +0000 (16:24 +0800)]
 
sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
commit 
6b84202c946cd3da3a8daa92c682510e9ed80321 upstream.
Commit 
b1f5bfc27a19 ("sctp: don't dereference ptr before leaving
_sctp_walk_{params, errors}()") tried to fix the issue that it
may overstep the chunk end for _sctp_walk_{params, errors} with
'chunk_end > offset(length) + sizeof(length)'.
But it introduced a side effect: When processing INIT, it verifies
the chunks with 'param.v == chunk_end' after iterating all params
by sctp_walk_params(). With the check 'chunk_end > offset(length)
+ sizeof(length)', it would return when the last param is not yet
accessed. Because the last param usually is fwdtsn supported param
whose size is 4 and 'chunk_end == offset(length) + sizeof(length)'
This is a badly issue even causing sctp couldn't process 4-shakes.
Client would always get abort when connecting to server, due to
the failure of INIT chunk verification on server.
The patch is to use 'chunk_end <= offset(length) + sizeof(length)'
instead of 'chunk_end < offset(length) + sizeof(length)' for both
_sctp_walk_params and _sctp_walk_errors.
Fixes: 
b1f5bfc27a19 ("sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Alexander Potapenko [Fri, 14 Jul 2017 16:32:45 +0000 (18:32 +0200)]
 
sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
commit 
b1f5bfc27a19f214006b9b4db7b9126df2dfdf5a upstream.
If the length field of the iterator (|pos.p| or |err|) is past the end
of the chunk, we shouldn't access it.
This bug has been detected by KMSAN. For the following pair of system
calls:
  socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
  sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
         inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0,
         sin6_scope_id=0}, 28) = 1
the tool has reported a use of uninitialized memory:
  ==================================================================
  BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
  CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
  01/01/2011
  Call Trace:
   <IRQ>
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x172/0x1c0 lib/dump_stack.c:52
   kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
   __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
   __sctp_rcv_init_lookup net/sctp/input.c:1074
   __sctp_rcv_lookup_harder net/sctp/input.c:1233
   __sctp_rcv_lookup net/sctp/input.c:1255
   sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
   sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
   ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
   NF_HOOK ./include/linux/netfilter.h:257
   ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
   dst_input ./include/net/dst.h:492
   ip6_rcv_finish net/ipv6/ip6_input.c:69
   NF_HOOK ./include/linux/netfilter.h:257
   ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
   __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
   __netif_receive_skb net/core/dev.c:4246
   process_backlog+0x667/0xba0 net/core/dev.c:4866
   napi_poll net/core/dev.c:5268
   net_rx_action+0xc95/0x1590 net/core/dev.c:5333
   __do_softirq+0x485/0x942 kernel/softirq.c:284
   do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
   </IRQ>
   do_softirq kernel/softirq.c:328
   __local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
   local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
   rcu_read_unlock_bh ./include/linux/rcupdate.h:931
   ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
   ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
   NF_HOOK_COND ./include/linux/netfilter.h:246
   ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
   dst_output ./include/net/dst.h:486
   NF_HOOK ./include/linux/netfilter.h:257
   ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
   sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
   sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
   sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
   sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
   sctp_side_effects net/sctp/sm_sideeffect.c:1773
   sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
   sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
   sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
   inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
   sock_sendmsg_nosec net/socket.c:633
   sock_sendmsg net/socket.c:643
   SYSC_sendto+0x608/0x710 net/socket.c:1696
   SyS_sendto+0x8a/0xb0 net/socket.c:1664
   do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
   entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
  RIP: 0033:0x401133
  RSP: 002b:
00007fff6d99cd38 EFLAGS: 
00000246 ORIG_RAX: 
000000000000002c
  RAX: 
ffffffffffffffda RBX: 
00000000004002b0 RCX: 
0000000000401133
  RDX: 
0000000000000001 RSI: 
0000000000494088 RDI: 
0000000000000003
  RBP: 
00007fff6d99cd90 R08: 
00007fff6d99cd50 R09: 
000000000000001c
  R10: 
0000000000000001 R11: 
0000000000000246 R12: 
0000000000000000
  R13: 
00000000004063d0 R14: 
0000000000406460 R15: 
0000000000000000
  origin:
   save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
   kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
   kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
   kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
   slab_alloc_node mm/slub.c:2743
   __kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
   __kmalloc_reserve net/core/skbuff.c:138
   __alloc_skb+0x26b/0x840 net/core/skbuff.c:231
   alloc_skb ./include/linux/skbuff.h:933
   sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
   sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
   sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
   sctp_side_effects net/sctp/sm_sideeffect.c:1773
   sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
   sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
   sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
   inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
   sock_sendmsg_nosec net/socket.c:633
   sock_sendmsg net/socket.c:643
   SYSC_sendto+0x608/0x710 net/socket.c:1696
   SyS_sendto+0x8a/0xb0 net/socket.c:1664
   do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
   return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
  ==================================================================
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Leon Romanovsky [Sat, 15 Jul 2017 13:26:55 +0000 (16:26 +0300)]
 
IB/ipoib: Remove double pointer assigning
commit 
1b355094b308f3377c8f574ce86135ee159c6285 upstream.
There is no need to assign "p" pointer twice.
This patch fixes the following smatch warning:
drivers/infiniband/ulp/ipoib/ipoib_cm.c:517 ipoib_cm_rx_handler() warn:
	missing break? reassigning 'p->id'
Fixes: 
839fcaba355a ("IPoIB: Connected mode experimental support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Alex Vesker [Thu, 13 Jul 2017 08:27:12 +0000 (11:27 +0300)]
 
IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp
commit 
11f74b40359b19f760964e71d04882a6caf530cc upstream.
Don't allow negative values to max_nonsrq_conn_qp. There is no functional
impact on a negative value but it is logicically incorrect.
Fixes: 
68e995a29572 ("IPoIB/cm: Add connected mode support for devices without SRQs")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jiri Olsa [Thu, 20 Jul 2017 14:14:55 +0000 (16:14 +0200)]
 
perf/core: Fix locking for children siblings group read
commit 
2aeb1883547626d82c597cce2c99f0b9c62e2425 upstream.
We're missing ctx lock when iterating children siblings
within the perf_read path for group reading. Following
race and crash can happen:
User space doing read syscall on event group leader:
T1:
  perf_read
    lock event->ctx->mutex
    perf_read_group
      lock leader->child_mutex
      __perf_read_group_add(child)
        list_for_each_entry(sub, &leader->sibling_list, group_entry)
---->   sub might be invalid at this point, because it could
        get removed via perf_event_exit_task_context in T2
Child exiting and cleaning up its events:
T2:
  perf_event_exit_task_context
    lock ctx->mutex
    list_for_each_entry_safe(child_event, next, &child_ctx->event_list,...
      perf_event_exit_event(child)
        lock ctx->lock
        perf_group_detach(child)
        unlock ctx->lock
---->   child is removed from sibling_list without any sync
        with T1 path above
        ...
        free_event(child)
Before the child is removed from the leader's child_list,
(and thus is omitted from perf_read_group processing), we
need to ensure that perf_read_group touches child's
siblings under its ctx->lock.
Peter further notes:
| One additional note; this bug got exposed by commit:
|
|   
ba5213ae6b88 ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
|
| which made it possible to actually trigger this code-path.
Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 
ba5213ae6b88 ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
Link: http://lkml.kernel.org/r/20170720141455.2106-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Peter Zijlstra [Fri, 4 Sep 2015 03:07:49 +0000 (20:07 -0700)]
 
perf/core: Invert perf_read_group() loops
commit 
fa8c269353d560b7c28119ad7617029f92e40b15 upstream.
In order to enable the use of perf_event_read(.group = true), we need
to invert the sibling-child loop nesting of perf_read_group().
Currently we iterate the child list for each sibling, this precludes
using group reads. Flip things around so we iterate each group for
each child.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Made the patch compile and things. ]
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-7-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2 as a dependency of commit 
2aeb18835476 ("perf/core: Fix
 locking for children siblings group read"):
 - Keep the function name perf_event_read_group()
 - Keep using perf_event_read_value()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mahesh Bandewar [Wed, 19 Jul 2017 22:41:33 +0000 (15:41 -0700)]
 
ipv4: initialize fib_trie prior to register_netdev_notifier call.
commit 
8799a221f5944a7d74516ecf46d58c28ec1d1f75 upstream.
Net stack initialization currently initializes fib-trie after the
first call to netdevice_notifier() call. In fact fib_trie initialization
needs to happen before first rtnl_register(). It does not cause any problem
since there are no devices UP at this moment, but trying to bring 'lo'
UP at initialization would make this assumption wrong and exposes the issue.
Fixes following crash
 Call Trace:
  ? alternate_node_alloc+0x76/0xa0
  fib_table_insert+0x1b7/0x4b0
  fib_magic.isra.17+0xea/0x120
  fib_add_ifaddr+0x7b/0x190
  fib_netdev_event+0xc0/0x130
  register_netdevice_notifier+0x1c1/0x1d0
  ip_fib_init+0x72/0x85
  ip_rt_init+0x187/0x1e9
  ip_init+0xe/0x1a
  inet_init+0x171/0x26c
  ? ipv4_offload_init+0x66/0x66
  do_one_initcall+0x43/0x160
  kernel_init_freeable+0x191/0x219
  ? rest_init+0x80/0x80
  kernel_init+0xe/0x150
  ret_from_fork+0x22/0x30
 Code: f6 46 23 04 74 86 4c 89 f7 e8 ae 45 01 00 49 89 c7 4d 85 ff 0f 85 7b ff ff ff 31 db eb 08 4c 89 ff e8 16 47 01 00 48 8b 44 24 38 <45> 8b 6e 14 4d 63 76 74 48 89 04 24 0f 1f 44 00 00 48 83 c4 08
 RIP: kmem_cache_alloc+0xcf/0x1c0 RSP: 
ffff9b1500017c28
 CR2: 
0000000000000014
Fixes: 
7b1a74fdbb9e ("[NETNS]: Refactor fib initialization so it can handle multiple namespaces.")
Fixes: 
7f9b80529b8a ("[IPV4]: fib hash|trie initialization")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ismail, Mustafa [Fri, 14 Jul 2017 14:41:31 +0000 (09:41 -0500)]
 
RDMA/core: Initialize port_num in qp_attr
commit 
a62ab66b13a0f9bcb17b7b761f6670941ed5cd62 upstream.
Initialize the port_num for iWARP in rdma_init_qp_attr.
Fixes: 
5ecce4c9b17b("Check port number supplied by user verbs cmds")
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ismail, Mustafa [Fri, 14 Jul 2017 14:41:30 +0000 (09:41 -0500)]
 
RDMA/uverbs: Fix the check for port number
commit 
5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3 upstream.
The port number is only valid if IB_QP_PORT is set in the mask.
So only check port number if it is valid to prevent modify_qp from
failing due to an invalid port number.
Fixes: 
5ecce4c9b17b("Check port number supplied by user verbs cmds")
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: command structure is cmd not cmd->base]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Dan Carpenter [Thu, 13 Jul 2017 07:48:00 +0000 (10:48 +0300)]
 
IB/cxgb3: Fix error codes in iwch_alloc_mr()
commit 
9064d6055c14f700aa13f7c72fd3e63d12bee643 upstream.
We accidentally don't set the error code on some error paths.  It means
return ERR_PTR(0) which is NULL and results in a NULL dereference in the
caller.
Fixes: 
13a239330abd ("RDMA/cxgb3: Don't ignore insert_handle() failures")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: drop inapplicable hunk]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Dan Carpenter [Thu, 13 Jul 2017 07:47:40 +0000 (10:47 +0300)]
 
cxgb4: Fix error codes in c4iw_create_cq()
commit 
6ebedacbb44602d4dec3348dee5ec31dd9b09521 upstream.
If one of these kmalloc() calls fails then we return ERR_PTR(0) which is
NULL.  It results in a NULL dereference in the callers.
Fixes: 
cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Seunghun Han [Tue, 18 Jul 2017 11:03:51 +0000 (20:03 +0900)]
 
x86/acpi: Prevent out of bound access caused by broken ACPI tables
commit 
dad5ab0db8deac535d03e3fe3d8f2892173fa6a4 upstream.
The bus_irq argument of mp_override_legacy_irq() is used as the index into
the isa_irq_to_gsi[] array. The bus_irq argument originates from
ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI
tables, but is nowhere sanity checked.
That allows broken or malicious ACPI tables to overwrite memory, which
might cause malfunction, panic or arbitrary code execution.
Add a sanity check and emit a warning when that triggers.
[ tglx: Added warning and rewrote changelog ]
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Steve Dickson [Thu, 29 Jun 2017 15:48:26 +0000 (11:48 -0400)]
 
mount: copy the port field into the cloned nfs_server structure.
commit 
89a6814d9b665b196aa3a102f96b6dc7e8cb669e upstream.
Doing this copy eliminates the "port=0" entry in
the /proc/mounts entries
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=69241
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Dan Carpenter [Wed, 19 Jul 2017 10:06:41 +0000 (13:06 +0300)]
 
libata: array underflow in ata_find_dev()
commit 
59a5e266c3f5c1567508888dd61a45b86daed0fa upstream.
My static checker complains that "devno" can be negative, meaning that
we read before the start of the loop.  I've looked at the code, and I
think the warning is right.  This come from /proc so it's root only or
it would be quite a quite a serious bug.  The call tree looks like this:
proc_scsi_write() <- gets id and channel from simple_strtoul()
-> scsi_add_single_device() <- calls shost->transportt->user_scan()
   -> ata_scsi_user_scan()
      -> ata_find_dev()
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Yoshihiro Shimoda [Wed, 19 Jul 2017 07:16:54 +0000 (16:16 +0900)]
 
usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL
commit 
59a0879a0e17b2e43ecdc5e3299da85b8410d7ce upstream.
This patch fixes an issue that some registers may be not initialized
after resume if the USBHSF_RUNTIME_PWCTRL is not set. Otherwise,
if a cable is not connected, the driver will not enable INTENB0.VBSE
after resume. And then, the driver cannot detect the VBUS.
Fixes: 
ca8a282a5373 ("usb: gadget: renesas_usbhs: add suspend/resume support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Kuninori Morimoto [Mon, 6 Aug 2012 05:44:43 +0000 (22:44 -0700)]
 
usb: renesas_usbhs: fixup resume method for autonomy mode
commit 
5b50d3b52601651ef3183cfb33d03cf486180e48 upstream.
If renesas_usbhs is probed as autonomy mode,
phy reset should be called after power resumed,
and manual cold-plug should be called with slight delay.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Colin Ian King [Thu, 6 Jul 2017 15:06:32 +0000 (16:06 +0100)]
 
usb: storage: return on error to avoid a null pointer dereference
commit 
446230f52a5bef593554510302465eabab45a372 upstream.
When us->extra is null the driver is not initialized, however, a
later call to osd200_scsi_to_ata is made that dereferences
us->extra, causing a null pointer dereference.  The code
currently detects and reports that the driver is not initialized;
add a return to avoid the subsequent dereference issue in this
check.
Thanks to Alan Stern for pointing out that srb->result needs setting
to DID_ERROR << 16
Detected by CoverityScan, CID#100308 ("Dereference after null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Johan Hovold [Wed, 12 Jul 2017 13:08:39 +0000 (15:08 +0200)]
 
USB: cdc-acm: add device-id for quirky printer
commit 
fe855789d605590e57f9cd968d85ecce46f5c3fd upstream.
Add device-id entry for DATECS FP-2000 fiscal printer needing the
NO_UNION_NORMAL quirk.
Reported-by: Anton Avramov <lukav@lukav.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Stefan Triller [Fri, 30 Jun 2017 12:44:03 +0000 (14:44 +0200)]
 
USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
commit 
9585e340db9f6cc1c0928d82c3a23cc4460f0a3f upstream.
The German Telekom offers a ZigBee USB Stick under the brand name Qivicon
for their SmartHome Home Base in its 1. Generation. The productId is not
known by the according kernel module, this patch adds support for it.
Signed-off-by: Stefan Triller <github@stefantriller.de>
Reviewed-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Arnd Bergmann [Fri, 14 Jul 2017 09:31:03 +0000 (11:31 +0200)]
 
staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
commit 
105967ad68d2eb1a041bc041f9cf96af2a653b65 upstream.
gcc-7 points out an older regression:
drivers/staging/iio/resolver/ad2s1210.c: In function 'ad2s1210_read_raw':
drivers/staging/iio/resolver/ad2s1210.c:515:42: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context]
The original code had 'unsigned short' here, but incorrectly got
converted to 'bool'. This reverts the regression and uses a normal
type instead.
Fixes: 
29148543c521 ("staging:iio:resolver:ad2s1210 minimal chan spec conversion.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Akinobu Mita [Tue, 20 Jun 2017 16:46:37 +0000 (01:46 +0900)]
 
iio: light: tsl2563: use correct event code
commit 
a3507e48d3f99a93a3056a34a5365f310434570f upstream.
The TSL2563 driver provides three iio channels, two of which are raw ADC
channels (channel 0 and channel 1) in the device and the remaining one
is calculated by the two.  The ADC channel 0 only supports programmable
interrupt with threshold settings and this driver supports the event but
the generated event code does not contain the corresponding iio channel
type.
This is going to change userspace ABI.  Hopefully fixing this to be
what it should always have been won't break any userspace code.
Cc: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mateusz Jurczyk [Wed, 7 Jun 2017 10:26:49 +0000 (12:26 +0200)]
 
fuse: initialize the flock flag in fuse_file on allocation
commit 
68227c03cba84a24faf8a7277d2b1a03c8959c2c upstream.
Before the patch, the flock flag could remain uninitialized for the
lifespan of the fuse_file allocation. Unless set to true in
fuse_file_flock(), it would remain in an indeterminate state until read in
an if statement in fuse_release_common(). This could consequently lead to
taking an unexpected branch in the code.
The bug was discovered by a runtime instrumentation designed to detect use
of uninitialized memory in the kernel.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Fixes: 
37fb3a30b462 ("fuse: fix flock")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Ben Hutchings [Thu, 12 Oct 2017 14:27:23 +0000 (15:27 +0100)]
 
Linux 3.2.94
Zefan Li [Thu, 25 Sep 2014 01:41:02 +0000 (09:41 +0800)]
 
cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags
commit 
2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream.
When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.
Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.
Here's the full report:
https://lkml.org/lkml/2014/9/19/230
To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: 
950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[lizf: Backported to 3.4:
 - adjust context
 - check current->flags & PF_MEMPOLICY rather than current->mempolicy]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Zefan Li [Thu, 25 Sep 2014 01:40:40 +0000 (09:40 +0800)]
 
sched: add macros to define bitops for task atomic flags
commit 
e0e5070b20e01f0321f97db4e4e174f3f6b49e50 upstream.
This will simplify code when we add new flags.
v3:
- Kees pointed out that no_new_privs should never be cleared, so we
shouldn't define task_clear_no_new_privs(). we define 3 macros instead
of a single one.
v2:
- updated scripts/tags.sh, suggested by Peter
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[lizf: Backported to 3.4:
 - adjust context
 - remove no_new_priv code
 - add atomic_flags to struct task_struct]
[bwh: Backported to 3.2:
 - Drop changes in scripts/tags.sh
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Jamal Hadi Salim [Tue, 25 Oct 2016 00:18:27 +0000 (20:18 -0400)]
 
net sched filters: fix notification of filter delete with proper handle
[ Upstream commit 
9ee7837449b3d6f0fcf9132c6b5e5aaa58cc67d4 ]
Daniel says:
While trying out [1][2], I noticed that tc monitor doesn't show the
correct handle on delete:
$ tc monitor
qdisc clsact ffff: dev eno1 parent ffff:fff1
filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80
some context to explain the above:
The user identity of any tc filter is represented by a 32-bit
identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
above. A user wishing to delete, get or even modify a specific filter
uses this handle to reference it.
Every classifier is free to provide its own semantics for the 32 bit handle.
Example: classifiers like u32 use schemes like 800:1:801 to describe
the semantics of their filters represented as hash table, bucket and
node ids etc.
Classifiers also have internal per-filter representation which is different
from this externally visible identity. Most classifiers set this
internal representation to be a pointer address (which allows fast retrieval
of said filters in their implementations). This internal representation
is referenced with the "fh" variable in the kernel control code.
When a user successfuly deletes a specific filter, by specifying the correct
tcm->tcm_handle, an event is generated to user space which indicates
which specific filter was deleted.
Before this patch, the "fh" value was sent to user space as the identity.
As an example what is shown in the sample bpf filter delete event above
is 0xf3be0c80. This is infact a 32-bit truncation of 0xffff8807f3be0c80
which happens to be a 64-bit memory address of the internal filter
representation (address of the corresponding filter's struct cls_bpf_prog);
After this patch the appropriate user identifiable handle as encoded
in the originating request tcm->tcm_handle is generated in the event.
One of the cardinal rules of netlink rules is to be able to take an
event (such as a delete in this case) and reflect it back to the
kernel and successfully delete the filter. This patch achieves that.
Note, this issue has existed since the original TC action
infrastructure code patch back in 2004 as found in:
https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/
[1] http://patchwork.ozlabs.org/patch/682828/
[2] http://patchwork.ozlabs.org/patch/682829/
Fixes: 
4e54c4816bfe ("[NET]: Add tc extensions infrastructure.")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Sudip Mukherjee [Tue, 29 Dec 2015 22:54:19 +0000 (14:54 -0800)]
 
m32r: add io*_rep helpers
commit 
92a8ed4c7643809123ef0a65424569eaacc5c6b0 upstream.
m32r allmodconfig was failing with the error:
  error: implicit declaration of function 'read'
On checking io.h it turned out that 'read' is not defined but 'readb' is
defined and 'ioread8' will then obviously mean 'readb'.
At the same time some of the helper functions ioreadN_rep() and
iowriteN_rep() were missing which also led to the build failure.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Abhilash Kesavan [Fri, 6 Feb 2015 13:45:26 +0000 (19:15 +0530)]
 
m32r: add definition of ioremap_wc to io.h
commit 
71a49d16f06de2ccdf52ca247d496a2bb1ca36fe upstream.
Before adding a resource managed ioremap_wc function, we need
to have ioremap_wc defined for m32r to prevent build errors.
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Arun Sharma [Fri, 20 Apr 2012 22:41:35 +0000 (15:41 -0700)]
 
perf/x86: Check if user fp is valid
commit 
bc6ca7b342d5ae15c3ba3081fd40271b8039fb25 upstream.
Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-4-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: also add user_addr_max() macro]
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Steven J. Hill [Fri, 6 Jul 2012 19:56:01 +0000 (21:56 +0200)]
 
MIPS: Refactor 'clear_page' and 'copy_page' functions.
commit 
c022630633624a75b3b58f43dd3c6cc896a56cff upstream.
Remove usage of the '__attribute__((alias("...")))' hack that aliased
to integer arrays containing micro-assembled instructions. This hack
breaks when building a microMIPS kernel. It also makes the code much
easier to understand.
[ralf@linux-mips.org: Added back export of the clear_page and copy_page
symbols so certain modules will work again.  Also fixed build with
CONFIG_SIBYTE_DMA_PAGEOPS enabled.]
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3866/
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: adjust context]
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Andrey Vagin [Wed, 29 Jan 2014 18:34:14 +0000 (19:34 +0100)]
 
netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
commit 
c6825c0976fa7893692e0e43b09740b419b23c09 upstream.
Lets look at destroy_conntrack:
hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
The hash is protected by rcu, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but in this moment a few readers
still can use the conntrack. Then this conntrack is released and another
thread creates conntrack with the same address and the equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.
But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().
Florian Westphal suggested to check the confirm bit too. I think it's
right.
task 1			task 2			task 3
			nf_conntrack_find_get
			 ____nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
						__nf_conntrack_alloc
						 kmem_cache_alloc
						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
			 if (nf_ct_is_dying(ct))
			 if (!nf_ct_tuple_equal()
I'm not sure, that I have ever seen this race condition in a real life.
Currently we are investigating a bug, which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently,
we don't have any other explanation for this.
<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[<
ffffffffa01e00a4>]  [<
ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622]  [<
ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
<4>[46267.085697]  [<
ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770]  [<
ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843]  [<
ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919]  [<
ffffffff814841b9>] nf_iterate+0x69/0xb0
<4>[46267.085991]  [<
ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063]  [<
ffffffff81484374>] nf_hook_slow+0x74/0x110
<4>[46267.086133]  [<
ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207]  [<
ffffffff814b5890>] ? dst_output+0x0/0x20
<4>[46267.086277]  [<
ffffffff81495204>] ip_output+0xa4/0xc0
<4>[46267.086346]  [<
ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
<4>[46267.086419]  [<
ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
<4>[46267.086491]  [<
ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
<4>[46267.086562]  [<
ffffffff81444d67>] sock_sendmsg+0x117/0x140
<4>[46267.086638]  [<
ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712]  [<
ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785]  [<
ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858]  [<
ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
<4>[46267.086936]  [<
ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006]  [<
ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081]  [<
ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151]  [<
ffffffff81445599>] sys_sendto+0x139/0x190
<4>[46267.087229]  [<
ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303]  [<
ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378]  [<
ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454]  [<
ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531]  [<
ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607]  [<
ffffffff8104dea3>] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP  [<
ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>