pandora-kernel.git
9 years agohugetlbfs: add swap entry check in follow_hugetlb_page()
Naoya Horiguchi [Wed, 17 Apr 2013 22:58:30 +0000 (15:58 -0700)]
hugetlbfs: add swap entry check in follow_hugetlb_page()

commit 9cc3a5bd40067b9a0fbd49199d0780463fc2140f upstream.

With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access the
error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0) in
get_page().

The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.

In other words, physical address information is stored in different bit
layout between hugepage hwpoison entry and pmd entry, so
follow_hugetlb_page() which is called in get_dump_page() returns a wrong
page from a given address.

The expected behavior is like this:

  absent   is_swap_pte   FOLL_DUMP   Expected behavior
  -------------------------------------------------------------------
   true     false         false       hugetlb_fault
   false    true          false       hugetlb_fault
   false    false         false       return page
   true     false         true        skip page (to avoid allocation)
   false    true          true        hugetlb_fault
   false    false         true        return page

With this patch, we can call hugetlb_fault() and take proper actions (we
wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
hwpoisoned entries,) and as the result we can dump all hugepages except
for hwpoisoned ones.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoARM: 7698/1: perf: fix group validation when using enable_on_exec
Will Deacon [Fri, 12 Apr 2013 18:04:19 +0000 (19:04 +0100)]
ARM: 7698/1: perf: fix group validation when using enable_on_exec

commit cb2d8b342aa084d1f3ac29966245dec9163677fb upstream.

Events may be created with attr->disabled == 1 and attr->enable_on_exec
== 1, which confuses the group validation code because events with the
PERF_EVENT_STATE_OFF are not considered candidates for scheduling, which
may lead to failure at group scheduling time.

This patch fixes the validation check for ARM, so that events in the
OFF state are still considered when enable_on_exec is true.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Reported-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon
Illia Ragozin [Wed, 10 Apr 2013 18:43:34 +0000 (19:43 +0100)]
ARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon

commit cd272d1ea71583170e95dde02c76166c7f9017e6 upstream.

On Feroceon the L2 cache becomes non-coherent with the CPU
when the L1 caches are disabled. Thus the L2 needs to be invalidated
after both L1 caches are disabled.

On kexec before the starting the code for relocation the kernel,
the L1 caches are disabled in cpu_froc_fin (cpu_v7_proc_fin for Feroceon),
but after L2 cache is never invalidated, because inv_all is not set
in cache-feroceon-l2.c.
So kernel relocation and decompression may has (and usually has) errors.
Setting the function enables L2 invalidation and fixes the issue.

Signed-off-by: Illia Ragozin <illia.ragozin@grapecom.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoath9k_hw: change AR9580 initvals to fix a stability issue
Felix Fietkau [Wed, 10 Apr 2013 13:26:06 +0000 (15:26 +0200)]
ath9k_hw: change AR9580 initvals to fix a stability issue

commit f09a878511997c25a76bf111a32f6b8345a701a5 upstream.

The hardware parsing of Control Wrapper Frames needs to be disabled, as
it has been causing spurious decryption error reports. The initvals for
other chips have been updated to disable it, but AR9580 was left out for
some reason.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agocan: sja1000: fix handling on dt properties on little endian systems
Christoph Fritz [Thu, 11 Apr 2013 19:32:57 +0000 (21:32 +0200)]
can: sja1000: fix handling on dt properties on little endian systems

commit 0443de5fbf224abf41f688d8487b0c307dc5a4b4 upstream.

To get correct endianes on little endian cpus (like arm) while reading device
tree properties, this patch replaces of_get_property() with
of_property_read_u32(). While there use of_property_read_bool() for the
handling of the boolean "nxp,no-comparator-bypass" property.

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoof: introduce helper to manage boolean
Jean-Christophe PLAGNIOL-VILLARD [Tue, 7 Feb 2012 04:12:51 +0000 (12:12 +0800)]
of: introduce helper to manage boolean

commit fa4d34ccd0914ac87336ea2c17e9370dfecef286 upstream.

of_property_read_bool

Search for a property in a device node.
Returns true if the property exist false otherwise.

Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoath9k_htc: accept 1.x firmware newer than 1.3
Felix Fietkau [Sun, 7 Apr 2013 19:10:48 +0000 (21:10 +0200)]
ath9k_htc: accept 1.x firmware newer than 1.3

commit 319e7bd96aca64a478f3aad40711c928405b8b77 upstream.

Since the firmware has been open sourced, the minor version has been
bumped to 1.4 and the API/ABI will stay compatible across further 1.x
releases.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoARM: Do 15e0d9e37c (ARM: pm: let platforms select cpu_suspend support) properly
Russell King [Mon, 8 Apr 2013 10:44:57 +0000 (11:44 +0100)]
ARM: Do 15e0d9e37c (ARM: pm: let platforms select cpu_suspend support) properly

commit b6c7aabd923a17af993c5a5d5d7995f0b27c000a upstream.

Let's do the changes properly and fix the same problem everywhere, not
just for one case.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: mohawk doesn't support suspend at all]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agovfs: Revert spurious fix to spinning prevention in prune_icache_sb
Suleiman Souhlal [Sat, 13 Apr 2013 23:03:06 +0000 (16:03 -0700)]
vfs: Revert spurious fix to spinning prevention in prune_icache_sb

commit 5b55d708335a9e3e4f61f2dadf7511502205ccd1 upstream.

Revert commit 62a3ddef6181 ("vfs: fix spinning prevention in prune_icache_sb").

This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.

Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agokobject: fix kset_find_obj() race with concurrent last kobject_put()
Linus Torvalds [Sat, 13 Apr 2013 22:15:30 +0000 (15:15 -0700)]
kobject: fix kset_find_obj() race with concurrent last kobject_put()

commit a49b7e82cab0f9b41f483359be83f44fbb6b4979 upstream.

Anatol Pomozov identified a race condition that hits module unloading
and re-loading.  To quote Anatol:

 "This is a race codition that exists between kset_find_obj() and
  kobject_put().  kset_find_obj() might return kobject that has refcount
  equal to 0 if this kobject is freeing by kobject_put() in other
  thread.

  Here is timeline for the crash in case if kset_find_obj() searches for
  an object tht nobody holds and other thread is doing kobject_put() on
  the same kobject:

    THREAD A (calls kset_find_obj())     THREAD B (calls kobject_put())
    splin_lock()
                                         atomic_dec_return(kobj->kref), counter gets zero here
                                         ... starts kobject cleanup ....
                                         spin_lock() // WAIT thread A in kobj_kset_leave()
    iterate over kset->list
    atomic_inc(kobj->kref) (counter becomes 1)
    spin_unlock()
                                         spin_lock() // taken
                                         // it does not know that thread A increased counter so it
                                         remove obj from list
                                         spin_unlock()
                                         vfree(module) // frees module object with containing kobj

    // kobj points to freed memory area!!
    kobject_put(kobj) // OOPS!!!!

  The race above happens because module.c tries to use kset_find_obj()
  when somebody unloads module.  The module.c code was introduced in
  commit 6494a93d55fa"

Anatol supplied a patch specific for module.c that worked around the
problem by simply not using kset_find_obj() at all, but rather than make
a local band-aid, this just fixes kset_find_obj() to be thread-safe
using the proper model of refusing the get a new reference if the
refcount has already dropped to zero.

See examples of this proper refcount handling not only in the kref
documentation, but in various other equivalent uses of this pattern by
grepping for atomic_inc_not_zero().

[ Side note: the module race does indicate that module loading and
  unloading is not properly serialized wrt sysfs information using the
  module mutex.  That may require further thought, but this is the
  correct fix at the kobject layer regardless. ]

Reported-analyzed-and-tested-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agokref: Implement kref_get_unless_zero v3
Thomas Hellstrom [Tue, 6 Nov 2012 11:31:49 +0000 (11:31 +0000)]
kref: Implement kref_get_unless_zero v3

commit 4b20db3de8dab005b07c74161cb041db8c5ff3a7 upstream.

This function is intended to simplify locking around refcounting for
objects that can be looked up from a lookup structure, and which are
removed from that lookup structure in the object destructor.
Operations on such objects require at least a read lock around
lookup + kref_get, and a write lock around kref_put + remove from lookup
structure. Furthermore, RCU implementations become extremely tricky.
With a lookup followed by a kref_get_unless_zero *with return value check*
locking in the kref_put path can be deferred to the actual removal from
the lookup structure and RCU lookups become trivial.

v2: Formatting fixes.
v3: Invert the return value.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.2:
 - Adjust context
 - Add #include <linux/atomic.h>]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoBtrfs: make sure nbytes are right after log replay
Josef Bacik [Fri, 5 Apr 2013 20:50:09 +0000 (20:50 +0000)]
Btrfs: make sure nbytes are right after log replay

commit 4bc4bee4595662d8bff92180d5c32e3313a704b0 upstream.

While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file.  That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
nbytes are not necessarily updated properly when we log it.  So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent.  This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct.  With this
I'm no longer getting nbytes errors out of btrfsck.

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotracing: Fix possible NULL pointer dereferences
Namhyung Kim [Thu, 11 Apr 2013 06:55:01 +0000 (15:55 +0900)]
tracing: Fix possible NULL pointer dereferences

commit 6a76f8c0ab19f215af2a3442870eeb5f0e81998d upstream.

Currently set_ftrace_pid and set_graph_function files use seq_lseek
for their fops.  However seq_open() is called only for FMODE_READ in
the fops->open() so that if an user tries to seek one of those file
when she open it for writing, it sees NULL seq_file and then panic.

It can be easily reproduced with following command:

  $ cd /sys/kernel/debug/tracing
  $ echo 1234 | sudo tee -a set_ftrace_pid

In this example, GNU coreutils' tee opens the file with fopen(, "a")
and then the fopen() internally calls lseek().

Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: ftrace_regex_lseek() is static]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotarget: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs
Nicholas Bellinger [Wed, 10 Apr 2013 22:00:27 +0000 (15:00 -0700)]
target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs

commit 30f359a6f9da65a66de8cadf959f0f4a0d498bba upstream.

This patch fixes a bug where a handful of informational / control CDBs
that should be allowed during ALUA access state Standby/Offline/Transition
where incorrectly returning CHECK_CONDITION + ASCQ_04H_ALUA_TG_PT_*.

This includes INQUIRY + REPORT_LUNS, which would end up preventing LUN
registration when LUN scanning occured during these ALUA access states.

Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotarget: Fix MAINTENANCE_IN service action CDB checks to use lower 5 bits
Nicholas Bellinger [Thu, 17 May 2012 04:52:10 +0000 (21:52 -0700)]
target: Fix MAINTENANCE_IN service action CDB checks to use lower 5 bits

commit ba539743b70cd160c84bab1c82910d0789b820f8 upstream.

This patch fixes the MAINTENANCE_IN service action type checks to only
look at the proper lower 5 bits of cdb byte 1.  This addresses the case
where MI_REPORT_TARGET_PGS w/ extended header using the upper three bits of
cdb byte 1 was not processed correctly in transport_generic_cmd_sequencer,
as well as the three cases for standby, unavailable, and transition ALUA
primary access state checks.

Also add MAINTENANCE_IN to the excluded list in transport_generic_prepare_cdb()
to prevent the PARAMETER DATA FORMAT bits from being cleared.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Rob Evers <revers@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agox86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
Boris Ostrovsky [Sat, 23 Mar 2013 13:36:36 +0000 (09:36 -0400)]
x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal

commit 511ba86e1d386f671084b5d0e6f110bb30b8eeb2 upstream.

Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have performance impact.

Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
environment.

[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
  updates" may cause a minor performance regression on
  bare metal.  This patch resolves that performance regression.  It is
  somewhat unclear to me if this is a good -stable candidate. ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agox86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
Samu Kallio [Sat, 23 Mar 2013 13:36:35 +0000 (09:36 -0400)]
x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5 upstream.

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

The OOPs usually looks as so:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
 [<ffffffff81627759>] do_page_fault+0x399/0x4b0
 [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
 [<ffffffff81624065>] page_fault+0x25/0x30
 [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
 [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
 [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
 [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
 [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
 [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
 [<ffffffff81154962>] unmap_vmas+0x52/0xa0
 [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
 [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
 [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81059ce3>] mmput+0x83/0xf0
 [<ffffffff810624c4>] exit_mm+0x104/0x130
 [<ffffffff8106264a>] do_exit+0x15a/0x8c0
 [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81063177>] sys_exit_group+0x17/0x20
 [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.

RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotracing: Fix double free when function profile init failed
Namhyung Kim [Mon, 1 Apr 2013 12:46:23 +0000 (21:46 +0900)]
tracing: Fix double free when function profile init failed

commit 83e03b3fe4daffdebbb42151d5410d730ae50bd1 upstream.

On the failure path, stat->start and stat->pages will refer same page.
So it'll attempt to free the same page again and get kernel panic.

Link: http://lkml.kernel.org/r/1364820385-32027-1-git-send-email-namhyung@kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agospinlocks and preemption points need to be at least compiler barriers
Linus Torvalds [Tue, 9 Apr 2013 17:48:33 +0000 (10:48 -0700)]
spinlocks and preemption points need to be at least compiler barriers

commit 386afc91144b36b42117b0092893f15bc8798a80 upstream.

In UP and non-preempt respectively, the spinlocks and preemption
disable/enable points are stubbed out entirely, because there is no
regular code that can ever hit the kind of concurrency they are meant to
protect against.

However, while there is no regular code that can cause scheduling, we
_do_ end up having some exceptional (literally!) code that can do so,
and that we need to make sure does not ever get moved into the critical
region by the compiler.

In particular, get_user() and put_user() is generally implemented as
inline asm statements (even if the inline asm may then make a call
instruction to call out-of-line), and can obviously cause a page fault
and IO as a result.  If that inline asm has been scheduled into the
middle of a preemption-safe (or spinlock-protected) code region, we
obviously lose.

Now, admittedly this is *very* unlikely to actually ever happen, and
we've not seen examples of actual bugs related to this.  But partly
exactly because it's so hard to trigger and the resulting bug is so
subtle, we should be extra careful to get this right.

So make sure that even when preemption is disabled, and we don't have to
generate any actual *code* to explicitly tell the system that we are in
a preemption-disabled region, we need to at least tell the compiler not
to move things around the critical region.

This patch grew out of the same discussion that caused commits
79e5f05edcbf ("ARC: Add implicit compiler barrier to raw_local_irq*
functions") and 3e2e0d2c222b ("tile: comment assumption about
__insn_mtspr for <asm/irqflags.h>") to come about.

Note for stable: use discretion when/if applying this.  As mentioned,
this bug may never have actually bitten anybody, and gcc may never have
done the required code motion for it to possibly ever trigger in
practice.

Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: drop sched_preempt_enable_no_resched()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoASoC: wm8903: Fix the bypass to HP/LINEOUT when no DAC or ADC is running
Alban Bedel [Tue, 9 Apr 2013 15:13:59 +0000 (17:13 +0200)]
ASoC: wm8903: Fix the bypass to HP/LINEOUT when no DAC or ADC is running

commit f1ca493b0b5e8f42d3b2dc8877860db2983f47b6 upstream.

The Charge Pump needs the DSP clock to work properly, without it the
bypass to HP/LINEOUT is not working properly. This requirement is not
mentioned in the datasheet but has been confirmed by Mark Brown from
Wolfson.

Signed-off-by: Alban Bedel <alban.bedel@avionic-design.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agocan: gw: use kmem_cache_free() instead of kfree()
Wei Yongjun [Tue, 9 Apr 2013 06:16:04 +0000 (14:16 +0800)]
can: gw: use kmem_cache_free() instead of kfree()

commit 3480a2125923e4b7a56d79efc76743089bf273fc upstream.

Memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoPM / reboot: call syscore_shutdown() after disable_nonboot_cpus()
Huacai Chen [Sun, 7 Apr 2013 02:14:14 +0000 (02:14 +0000)]
PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()

commit 6f389a8f1dd22a24f3d9afc2812b30d639e94625 upstream.

As commit 40dc166c (PM / Core: Introduce struct syscore_ops for core
subsystems PM) say, syscore_ops operations should be carried with one
CPU on-line and interrupts disabled. However, after commit f96972f2d
(kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()),
syscore_shutdown() is called before disable_nonboot_cpus(), so break
the rules. We have a MIPS machine with a 8259A PIC, and there is an
external timer (HPET) linked at 8259A. Since 8259A has been shutdown
too early (by syscore_shutdown()), disable_nonboot_cpus() runs without
timer interrupt, so it hangs and reboot fails. This patch call
syscore_shutdown() a little later (after disable_nonboot_cpus()) to
avoid reboot failure, this is the same way as poweroff does.

For consistency, add disable_nonboot_cpus() to kernel_halt().

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoftrace: Consistently restore trace function on sysctl enabling
Jan Kiszka [Tue, 26 Mar 2013 16:53:03 +0000 (17:53 +0100)]
ftrace: Consistently restore trace function on sysctl enabling

commit 5000c418840b309251c5887f0b56503aae30f84c upstream.

If we reenable ftrace via syctl, we currently set ftrace_trace_function
based on the previous simplistic algorithm. This is inconsistent with
what update_ftrace_function does. So better call that helper instead.

Link: http://lkml.kernel.org/r/5151D26F.1070702@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agosched_clock: Prevent 64bit inatomicity on 32bit systems
Thomas Gleixner [Sat, 6 Apr 2013 08:10:27 +0000 (10:10 +0200)]
sched_clock: Prevent 64bit inatomicity on 32bit systems

commit a1cbcaa9ea87b87a96b9fc465951dcf36e459ca2 upstream.

The sched_clock_remote() implementation has the following inatomicity
problem on 32bit systems when accessing the remote scd->clock, which
is a 64bit value.

CPU0 CPU1

sched_clock_local() sched_clock_remote(CPU0)
...
remote_clock = scd[CPU0]->clock
    read_low32bit(scd[CPU0]->clock)
cmpxchg64(scd->clock,...)
    read_high32bit(scd[CPU0]->clock)

While the update of scd->clock is using an atomic64 mechanism, the
readout on the remote cpu is not, which can cause completely bogus
readouts.

It is a quite rare problem, because it requires the update to hit the
narrow race window between the low/high readout and the update must go
across the 32bit boundary.

The resulting misbehaviour is, that CPU1 will see the sched_clock on
CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value
to this bogus timestamp. This stays that way due to the clamping
implementation for about 4 seconds until the synchronization with
CLOCK_MONOTONIC undoes the problem.

The issue is hard to observe, because it might only result in a less
accurate SCHED_OTHER timeslicing behaviour. To create observable
damage on realtime scheduling classes, it is necessary that the bogus
update of CPU1 sched_clock happens in the context of an realtime
thread, which then gets charged 4 seconds of RT runtime, which results
in the RT throttler mechanism to trigger and prevent scheduling of RT
tasks for a little less than 4 seconds. So this is quite unlikely as
well.

The issue was quite hard to decode as the reproduction time is between
2 days and 3 weeks and intrusive tracing makes it less likely, but the
following trace recorded with trace_clock=global, which uses
sched_clock_local(), gave the final hint:

  <idle>-0   0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80
  <idle>-0   0d..30 400269.477151: hrtimer_start:  hrtimer=0xf7061e80 ...
irq/20-S-587 1d..32 400273.772118: sched_wakeup:   comm= ... target_cpu=0
  <idle>-0   0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80

What happens is that CPU0 goes idle and invokes
sched_clock_idle_sleep_event() which invokes sched_clock_local() and
CPU1 runs a remote wakeup for CPU0 at the same time, which invokes
sched_remote_clock(). The time jump gets propagated to CPU0 via
sched_remote_clock() and stays stale on both cores for ~4 seconds.

There are only two other possibilities, which could cause a stale
sched clock:

1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic
   wrong value.

2) sched_clock() which reads the TSC returns a sporadic wrong value.

#1 can be excluded because sched_clock would continue to increase for
   one jiffy and then go stale.

#2 can be excluded because it would not make the clock jump
   forward. It would just result in a stale sched_clock for one jiffy.

After quite some brain twisting and finding the same pattern on other
traces, sched_clock_remote() remained the only place which could cause
such a problem and as explained above it's indeed racy on 32bit
systems.

So while on 64bit systems the readout is atomic, we need to verify the
remote readout on 32bit machines. We need to protect the local->clock
readout in sched_clock_remote() on 32bit as well because an NMI could
hit between the low and the high readout, call sched_clock_local() and
modify local->clock.

Thanks to Siegfried Wulsch for bearing with my debug requests and
going through the tedious tasks of running a bunch of reproducer
systems to generate the debug information which let me decode the
issue.

Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agopowerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being performed before...
Michael Wolf [Fri, 5 Apr 2013 10:41:40 +0000 (10:41 +0000)]
powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being performed before the ANDCOND test

commit 9fb2640159f9d4f5a2a9d60e490482d4cbecafdb upstream.

Some versions of pHyp will perform the adjunct partition test before the
ANDCOND test.  The result of this is that H_RESOURCE can be returned and
cause the BUG_ON condition to occur. The HPTE is not removed.  So add a
check for H_RESOURCE, it is ok if this HPTE is not removed as
pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a
specific HPTE to remove.  So it is ok to just move on to the next slot
and try again.

Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoalpha: Add irongate_io to PCI bus resources
Jay Estabrook [Sun, 7 Apr 2013 09:36:09 +0000 (21:36 +1200)]
alpha: Add irongate_io to PCI bus resources

commit aa8b4be3ac049c8b1df2a87e4d1d902ccfc1f7a9 upstream.

Fixes a NULL pointer dereference at boot on UP1500.

Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jay Estabrook <jay.estabrook@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoALSA: usb-audio: fix endianness bug in snd_nativeinstruments_*
Eldad Zack [Fri, 5 Apr 2013 18:49:46 +0000 (20:49 +0200)]
ALSA: usb-audio: fix endianness bug in snd_nativeinstruments_*

commit 889d66848b12d891248b03abcb2a42047f8e172a upstream.

The usb_control_msg() function expects __u16 types and performs
the endianness conversions by itself.
However, in three places, a conversion is performed before it is
handed over to usb_control_msg(), which leads to a double conversion
(= no conversion):
* snd_usb_nativeinstruments_boot_quirk()
* snd_nativeinstruments_control_get()
* snd_nativeinstruments_control_put()

Caught by sparse:

sound/usb/mixer_quirks.c:512:38: warning: incorrect type in argument 6 (different base types)
sound/usb/mixer_quirks.c:512:38:    expected unsigned short [unsigned] [usertype] index
sound/usb/mixer_quirks.c:512:38:    got restricted __le16 [usertype] <noident>
sound/usb/mixer_quirks.c:543:35: warning: incorrect type in argument 5 (different base types)
sound/usb/mixer_quirks.c:543:35:    expected unsigned short [unsigned] [usertype] value
sound/usb/mixer_quirks.c:543:35:    got restricted __le16 [usertype] <noident>
sound/usb/mixer_quirks.c:543:56: warning: incorrect type in argument 6 (different base types)
sound/usb/mixer_quirks.c:543:56:    expected unsigned short [unsigned] [usertype] index
sound/usb/mixer_quirks.c:543:56:    got restricted __le16 [usertype] <noident>
sound/usb/quirks.c:502:35: warning: incorrect type in argument 5 (different base types)
sound/usb/quirks.c:502:35:    expected unsigned short [unsigned] [usertype] value
sound/usb/quirks.c:502:35:    got restricted __le16 [usertype] <noident>

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Acked-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agohwspinlock: fix __hwspin_lock_request error path
Li Fei [Fri, 5 Apr 2013 13:20:36 +0000 (21:20 +0800)]
hwspinlock: fix __hwspin_lock_request error path

commit c10b90d85a5126d25c89cbaa50dc9fdd1c4d001a upstream.

Even in failed case of pm_runtime_get_sync, the usage_count
is incremented. In order to keep the usage_count with correct
value and runtime power management to behave correctly, call
pm_runtime_put_noidle in such case.

In __hwspin_lock_request, module_put is also called before
return in pm_runtime_get_sync failed case.

Signed-off-by Liu Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Li Fei <fei.li@intel.com>
[edit commit log]
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoata_piix: Fix DVD not dectected at some Haswell platforms
Youquan Song [Wed, 6 Mar 2013 15:49:05 +0000 (10:49 -0500)]
ata_piix: Fix DVD not dectected at some Haswell platforms

commit b55f84e2d527182e7c611d466cd0bb6ddce201de upstream.

There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d
"ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
 chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode.

We've hit a problem with DVD not recognized on Haswell Desktop platform which
includes Lynx Point 2-port SATA controller.

This quirk patch disables 32bit PIO on this controller in IDE mode.

v2: Change spelling error in statememnt pointed by Sergei Shtylyov.
v3: Change comment statememnt and spliting line over 80 characters pointed by
    Libor Pechacek and also rebase the patch against 3.8-rc7 kernel.

Tested-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agolibata: Set max sector to 65535 for Slimtype DVD A DS8A8SH drive
Shan Hai [Mon, 18 Mar 2013 02:30:44 +0000 (10:30 +0800)]
libata: Set max sector to 65535 for Slimtype DVD A DS8A8SH drive

commit a32450e127fc6e5ca6d958ceb3cfea4d30a00846 upstream.

The Slimtype DVD A  DS8A8SH drive locks up when max sector is smaller than
65535, and the blow backtrace is observed on locking up:

INFO: task flush-8:32:1130 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:32      D ffffffff8180cf60     0  1130      2 0x00000000
 ffff880273aef618 0000000000000046 0000000000000005 ffff880273aee000
 ffff880273aee000 ffff880273aeffd8 ffff880273aee010 ffff880273aee000
 ffff880273aeffd8 ffff880273aee000 ffff88026e842ea0 ffff880274a10000
Call Trace:
 [<ffffffff8168fc2d>] schedule+0x5d/0x70
 [<ffffffff8168fccc>] io_schedule+0x8c/0xd0
 [<ffffffff81324461>] get_request+0x731/0x7d0
 [<ffffffff8133dc60>] ? cfq_allow_merge+0x50/0x90
 [<ffffffff81083aa0>] ? wake_up_bit+0x40/0x40
 [<ffffffff81320443>] ? bio_attempt_back_merge+0x33/0x110
 [<ffffffff813248ea>] blk_queue_bio+0x23a/0x3f0
 [<ffffffff81322176>] generic_make_request+0xc6/0x120
 [<ffffffff81322308>] submit_bio+0x138/0x160
 [<ffffffff811d7596>] ? bio_alloc_bioset+0x96/0x120
 [<ffffffff811d1f61>] submit_bh+0x1f1/0x220
 [<ffffffff811d48b8>] __block_write_full_page+0x228/0x340
 [<ffffffff811d3650>] ? attach_nobh_buffers+0xc0/0xc0
 [<ffffffff811d8960>] ? I_BDEV+0x10/0x10
 [<ffffffff811d8960>] ? I_BDEV+0x10/0x10
 [<ffffffff811d4ab6>] block_write_full_page_endio+0xe6/0x100
 [<ffffffff811d4ae5>] block_write_full_page+0x15/0x20
 [<ffffffff811d9268>] blkdev_writepage+0x18/0x20
 [<ffffffff81142527>] __writepage+0x17/0x40
 [<ffffffff811438ba>] write_cache_pages+0x34a/0x4a0
 [<ffffffff81142510>] ? set_page_dirty+0x70/0x70
 [<ffffffff81143a61>] generic_writepages+0x51/0x80
 [<ffffffff81143ab0>] do_writepages+0x20/0x50
 [<ffffffff811c9ed6>] __writeback_single_inode+0xa6/0x2b0
 [<ffffffff811ca861>] writeback_sb_inodes+0x311/0x4d0
 [<ffffffff811caaa6>] __writeback_inodes_wb+0x86/0xd0
 [<ffffffff811cad43>] wb_writeback+0x1a3/0x330
 [<ffffffff816916cf>] ? _raw_spin_lock_irqsave+0x3f/0x50
 [<ffffffff811b8362>] ? get_nr_inodes+0x52/0x70
 [<ffffffff811cb0ac>] wb_do_writeback+0x1dc/0x260
 [<ffffffff8168dd34>] ? schedule_timeout+0x204/0x240
 [<ffffffff811cb232>] bdi_writeback_thread+0x102/0x2b0
 [<ffffffff811cb130>] ? wb_do_writeback+0x260/0x260
 [<ffffffff81083550>] kthread+0xc0/0xd0
 [<ffffffff81083490>] ? kthread_worker_fn+0x1b0/0x1b0
 [<ffffffff8169a3ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81083490>] ? kthread_worker_fn+0x1b0/0x1b0

 The above trace was triggered by
   "dd if=/dev/zero of=/dev/sr0 bs=2048 count=32768"

 It was previously working by accident, since another bug introduced
 by 4dce8ba94c7 (libata: Use 'bool' return value for ata_id_XXX) caused
 all drives to use maxsect=65535.

Signed-off-by: Shan Hai <shan.hai@windriver.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agolibata: Use integer return value for atapi_command_packet_set
Shan Hai [Mon, 18 Mar 2013 02:30:43 +0000 (10:30 +0800)]
libata: Use integer return value for atapi_command_packet_set

commit d8668fcb0b257d9fdcfbe5c172a99b8d85e1cd82 upstream.

The function returns type of ATAPI drives so it should return integer value.
The commit 4dce8ba94c7 (libata: Use 'bool' return value for ata_id_XXX) since
v2.6.39 changed the type of return value from int to bool, the change would
cause all of the ATAPI class drives to be treated as TYPE_TAPE and the
max_sectors of the drives to be set to 65535 because of the commit
f8d8e5799b7(libata: increase 128 KB / cmd limit for ATAPI tape drives), for the
function would return true for all ATAPI class drives and the TYPE_TAPE is
defined as 0x01.

Signed-off-by: Shan Hai <shan.hai@windriver.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agort2x00: rt2x00pci_regbusy_read() - only print register access failure once
Tim Gardner [Mon, 18 Feb 2013 19:56:28 +0000 (12:56 -0700)]
rt2x00: rt2x00pci_regbusy_read() - only print register access failure once

commit 83589b30f1e1dc9898986293c9336b8ce1705dec upstream.

BugLink: http://bugs.launchpad.net/bugs/1128840
It appears that when this register read fails it never recovers, so
I think there is no need to repeat the same error message ad infinitum.

Cc: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Gertjan van Wingerde <gwingerde@gmail.com>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: linux-wireless@vger.kernel.org
Cc: users@rt2x00.serialmonkey.com
Cc: netdev@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agocrypto: gcm - fix assumption that assoc has one segment
Jussi Kivilinna [Thu, 28 Mar 2013 19:54:03 +0000 (21:54 +0200)]
crypto: gcm - fix assumption that assoc has one segment

commit d3dde52209ab571e4e2ec26c66f85ad1355f7475 upstream.

rfc4543(gcm(*)) code for GMAC assumes that assoc scatterlist always contains
only one segment and only makes use of this first segment. However ipsec passes
assoc with three segments when using 'extended sequence number' thus in this
case rfc4543(gcm(*)) fails to function correctly. Patch fixes this issue.

Reported-by: Chaoxing Lin <Chaoxing.Lin@ultra-3eti.com>
Tested-by: Chaoxing Lin <Chaoxing.Lin@ultra-3eti.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agohrtimer: Don't reinitialize a cpu_base lock on CPU_UP
Michael Bohan [Wed, 20 Mar 2013 02:19:25 +0000 (19:19 -0700)]
hrtimer: Don't reinitialize a cpu_base lock on CPU_UP

commit 84cc8fd2fe65866e49d70b38b3fdf7219dd92fe0 upstream.

The current code makes the assumption that a cpu_base lock won't be
held if the CPU corresponding to that cpu_base is offline, which isn't
always true.

If a hrtimer is not queued, then it will not be migrated by
migrate_hrtimers() when a CPU is offlined. Therefore, the hrtimer's
cpu_base may still point to a CPU which has subsequently gone offline
if the timer wasn't enqueued at the time the CPU went down.

Normally this wouldn't be a problem, but a cpu_base's lock is blindly
reinitialized each time a CPU is brought up. If a CPU is brought
online during the period that another thread is performing a hrtimer
operation on a stale hrtimer, then the lock will be reinitialized
under its feet, and a SPIN_BUG() like the following will be observed:

<0>[   28.082085] BUG: spinlock already unlocked on CPU#0, swapper/0/0
<0>[   28.087078]  lock: 0xc4780b40, value 0x0 .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
<4>[   42.451150] [<c0014398>] (unwind_backtrace+0x0/0x120) from [<c0269220>] (do_raw_spin_unlock+0x44/0xdc)
<4>[   42.460430] [<c0269220>] (do_raw_spin_unlock+0x44/0xdc) from [<c071b5bc>] (_raw_spin_unlock+0x8/0x30)
<4>[   42.469632] [<c071b5bc>] (_raw_spin_unlock+0x8/0x30) from [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8)
<4>[   42.479521] [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8) from [<c00aa014>] (hrtimer_start+0x20/0x28)
<4>[   42.489247] [<c00aa014>] (hrtimer_start+0x20/0x28) from [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320)
<4>[   42.498709] [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320) from [<c00e6440>] (rcu_idle_enter+0xa0/0xb8)
<4>[   42.508259] [<c00e6440>] (rcu_idle_enter+0xa0/0xb8) from [<c000f268>] (cpu_idle+0x24/0xf0)
<4>[   42.516503] [<c000f268>] (cpu_idle+0x24/0xf0) from [<c06ed3c0>] (rest_init+0x88/0xa0)
<4>[   42.524319] [<c06ed3c0>] (rest_init+0x88/0xa0) from [<c0c00978>] (start_kernel+0x3d0/0x434)

As an example, this particular crash occurred when hrtimer_start() was
executed on CPU #0. The code locked the hrtimer's current cpu_base
corresponding to CPU #1. CPU #0 then tried to switch the hrtimer's
cpu_base to an optimal CPU which was online. In this case, it selected
the cpu_base corresponding to CPU #3.

Before it could proceed, CPU #1 came online and reinitialized the
spinlock corresponding to its cpu_base. Thus now CPU #0 held a lock
which was reinitialized. When CPU #0 finally ended up unlocking the
old cpu_base corresponding to CPU #1 so that it could switch to CPU
#3, we hit this SPIN_BUG() above while in switch_hrtimer_base().

CPU #0                            CPU #1
----                              ----
...                               <offline>
hrtimer_start()
lock_hrtimer_base(base #1)
...                               init_hrtimers_cpu()
switch_hrtimer_base()             ...
...                               raw_spin_lock_init(&cpu_base->lock)
raw_spin_unlock(&cpu_base->lock)  ...
<spin_bug>

Solve this by statically initializing the lock.

Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
Link: http://lkml.kernel.org/r/1363745965-23475-1-git-send-email-mbohan@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: ti_usb_3410_5052: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:26 +0000 (09:21 +0100)]
USB: ti_usb_3410_5052: fix use-after-free in TIOCMIWAIT

commit fc98ab873aa3dbe783ce56a2ffdbbe7c7609521a upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: ssu100: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:25 +0000 (09:21 +0100)]
USB: ssu100: fix use-after-free in TIOCMIWAIT

commit 43a66b4c417ad15f6d2f632ce67ad195bdf999e8 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: spcp8x5: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:24 +0000 (09:21 +0100)]
USB: spcp8x5: fix use-after-free in TIOCMIWAIT

commit dbcea7615d8d7d58f6ff49d2c5568113f70effe9 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: pl2303: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:22 +0000 (09:21 +0100)]
USB: pl2303: fix use-after-free in TIOCMIWAIT

commit 40509ca982c00c4b70fc00be887509feca0bff15 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: oti6858: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:21 +0000 (09:21 +0100)]
USB: oti6858: fix use-after-free in TIOCMIWAIT

commit 8edfdab37157d2683e51b8be5d3d5697f66a9f7b upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: mos7840: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:20 +0000 (09:21 +0100)]
USB: mos7840: fix use-after-free in TIOCMIWAIT

commit a14430db686b8e459e1cf070a6ecf391515c9ab9 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: mos7840: fix broken TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:19 +0000 (09:21 +0100)]
USB: mos7840: fix broken TIOCMIWAIT

commit e670c6af12517d08a403487b1122eecf506021cf upstream.

Make sure waiting processes are woken on modem-status changes.

Currently processes are only woken on termios changes regardless of
whether the modem status has changed.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: mct_u232: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:18 +0000 (09:21 +0100)]
USB: mct_u232: fix use-after-free in TIOCMIWAIT

commit cf1d24443677a0758cfa88ca40f24858b89261c0 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: io_ti: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:17 +0000 (09:21 +0100)]
USB: io_ti: fix use-after-free in TIOCMIWAIT

commit 7b2459690584f239650a365f3411ba2ec1c6d1e0 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: io_edgeport: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:16 +0000 (09:21 +0100)]
USB: io_edgeport: fix use-after-free in TIOCMIWAIT

commit 333576255d4cfc53efd056aad438568184b36af6 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: ftdi_sio: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:15 +0000 (09:21 +0100)]
USB: ftdi_sio: fix use-after-free in TIOCMIWAIT

commit 71ccb9b01981fabae27d3c98260ea4613207618e upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

When switching to tty ports, some lifetime assumptions were changed.
Specifically, close can now be called before the final tty reference is
dropped as part of hangup at device disconnect. Even with the ftdi
private-data refcounting this means that the port private data can be
freed while a process is sleeping on modem-status changes and thus
cannot be relied on to detect disconnects when woken up.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: cypress_m8: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:13 +0000 (09:21 +0100)]
USB: cypress_m8: fix use-after-free in TIOCMIWAIT

commit 356050d8b1e526db093e9d2c78daf49d6bf418e3 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Also remove bogus test for private data pointer being NULL as it is
never assigned in the loop.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: ch341: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:12 +0000 (09:21 +0100)]
USB: ch341: fix use-after-free in TIOCMIWAIT

commit fa1e11d5231c001c80a479160b5832933c5d35fb upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: ark3116: fix use-after-free in TIOCMIWAIT
Johan Hovold [Tue, 19 Mar 2013 08:21:11 +0000 (09:21 +0100)]
USB: ark3116: fix use-after-free in TIOCMIWAIT

commit 5018860321dc7a9e50a75d5f319bc981298fb5b7 upstream.

Use the port wait queue and make sure to check the serial disconnected
flag before accessing private port data after waking up.

This is is needed as the private port data (including the wait queue
itself) can be gone when waking up after a disconnect.

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: serial: fix hang when opening port
Ming Lei [Tue, 26 Mar 2013 02:49:55 +0000 (10:49 +0800)]
USB: serial: fix hang when opening port

commit eba0e3c3a0ba7b96f01cbe997680f6a4401a0bfc upstream.

Johan's 'fix use-after-free in TIOCMIWAIT' patchset[1] introduces
one bug which can cause kernel hang when opening port.

This patch initialized the 'port->delta_msr_wait' waitqueue head
to fix the bug which is introduced in 3.9-rc4.

[1], http://marc.info/?l=linux-usb&m=136368139627876&w=2

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoUSB: serial: add modem-status-change wait queue
Johan Hovold [Tue, 19 Mar 2013 08:21:10 +0000 (09:21 +0100)]
USB: serial: add modem-status-change wait queue

commit e5b33dc9d16053c2ae4c2c669cf008829530364b upstream.

Add modem-status-change wait queue to struct usb_serial_port that
subdrivers can use to implement TIOCMIWAIT.

Currently subdrivers use a private wait queue which may have been
released when waking up after device disconnected.

Note that we're adding a new wait queue rather than reusing the tty-port
one as we do not want to get woken up at hangup (yet).

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoLinux 3.2.43 v3.2.43
Ben Hutchings [Wed, 10 Apr 2013 02:20:17 +0000 (03:20 +0100)]
Linux 3.2.43

9 years agoHID: microsoft: do not use compound literal - fix build
Jiri Slaby [Mon, 12 Nov 2012 09:16:09 +0000 (10:16 +0100)]
HID: microsoft: do not use compound literal - fix build

commit 6b90466cfec2a2fe027187d675d8d14217c12d82 upstream.

In patch "HID: microsoft: fix invalid rdesc for 3k kbd" I fixed
support for MS 3k keyboards. However the added check using memcmp and
a compound statement breaks build on architectures where memcmp is a
macro with parameters.

hid-microsoft.c:51:18: error: macro "memcmp" passed 6 arguments, but takes just 3

On x86_64, memcmp is a function, so I did not see the error.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agobonding: get netdev_rx_handler_unregister out of locks
Veaceslav Falico [Tue, 2 Apr 2013 05:15:16 +0000 (05:15 +0000)]
bonding: get netdev_rx_handler_unregister out of locks

commit fcd99434fb5c137274d2e15dd2a6a7455f0f29ff upstream.

Now that netdev_rx_handler_unregister contains synchronize_net(), we need
to call it outside of bond->lock, cause it might sleep. Also, remove the
already unneded synchronize_net().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agosmsc75xx: fix jumbo frame support
Steve Glendinning [Thu, 28 Mar 2013 02:34:41 +0000 (02:34 +0000)]
smsc75xx: fix jumbo frame support

[ Upstream commit 4c51e53689569398d656e631c17308d9b8e84650 ]

This patch enables RX of jumbo frames for LAN7500.

Previously the driver would transmit jumbo frames succesfully but
would drop received jumbo frames (incrementing the interface errors
count).

With this patch applied the device can succesfully receive jumbo
frames up to MTU 9000 (9014 bytes on the wire including ethernet
header).

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agopch_gbe: fix ip_summed checksum reporting on rx
Veaceslav Falico [Mon, 25 Mar 2013 22:26:21 +0000 (22:26 +0000)]
pch_gbe: fix ip_summed checksum reporting on rx

[ Upstream commit 76a0e68129d7d24eb995a6871ab47081bbfa0acc ]

skb->ip_summed should be CHECKSUM_UNNECESSARY when the driver reports that
checksums were correct and CHECKSUM_NONE in any other case. They're
currently placed vice versa, which breaks the forwarding scenario. Fix it
by placing them as described above.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agonet: add a synchronize_net() in netdev_rx_handler_unregister()
Eric Dumazet [Fri, 29 Mar 2013 03:01:22 +0000 (03:01 +0000)]
net: add a synchronize_net() in  netdev_rx_handler_unregister()

[ Upstream commit 00cfec37484761a44a3b6f4675a54caa618210ae ]

commit 35d48903e97819 (bonding: fix rx_handler locking) added a race
in bonding driver, reported by Steven Rostedt who did a very good
diagnosis :

<quoting Steven>

I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.

In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.

But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:

        slave = bond_slave_get_rcu(skb->dev);
        bond = slave->bond; <--- BUG

the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.

Looking at the code that unregisters the handler:

void netdev_rx_handler_unregister(struct net_device *dev)
{

        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}

Which is basically:
        dev->rx_handler = NULL;
        dev->rx_handler_data = NULL;

And looking at __netif_receive_skb() we have:

        rx_handler = rcu_dereference(skb->dev->rx_handler);
        if (rx_handler) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                switch (rx_handler(&skb)) {

My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading?  What happens if the interrupt
triggers and we have this:

        CPU0                    CPU1
        ----                    ----
  rx_handler = skb->dev->rx_handler

                        netdev_rx_handler_unregister() {
                           dev->rx_handler = NULL;
                           dev->rx_handler_data = NULL;

  rx_handler()
   bond_handle_frame() {
    slave = skb->dev->rx_handler;
    bond = slave->bond; <-- NULL pointer dereference!!!

What protection am I missing in the bond release handler that would
prevent the above from happening?

</quoting Steven>

We can fix bug this in two ways. First is adding a test in
bond_handle_frame() and others to check if rx_handler_data is NULL.

A second way is adding a synchronize_net() in
netdev_rx_handler_unregister() to make sure that a rcu protected reader
has the guarantee to see a non NULL rx_handler_data.

The second way is better as it avoids an extra test in fast path.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoks8851: Fix interpretation of rxlen field.
Max.Nekludov@us.elster.com [Fri, 29 Mar 2013 05:27:36 +0000 (05:27 +0000)]
ks8851: Fix interpretation of rxlen field.

[ Upstream commit 14bc435ea54cb888409efb54fc6b76c13ef530e9 ]

According to the Datasheet (page 52):
15-12 Reserved
11-0 RXBC Receive Byte Count
This field indicates the present received frame byte size.

The code has a bug:
                 rxh = ks8851_rdreg32(ks, KS_RXFHSR);
                 rxstat = rxh & 0xffff;
                 rxlen = rxh >> 16; // BUG!!! 0xFFF mask should be applied

Signed-off-by: Max Nekludov <Max.Nekludov@us.elster.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoipv6: don't accept node local multicast traffic from the wire
Hannes Frederic Sowa [Tue, 26 Mar 2013 08:13:34 +0000 (08:13 +0000)]
ipv6: don't accept node local multicast traffic from  the wire

[ Upstream commit 1c4a154e5253687c51123956dfcee9e9dfa8542d ]

Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been
verified: http://www.rfc-editor.org/errata_search.php?eid=3480

We have to check for pkt_type and loopback flag because either the
packets are allowed to travel over the loopback interface (in which case
pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel
over a non-loopback interface back to us (in which case PACKET_TYPE is
PACKET_LOOPBACK and IFF_LOOPBACK flag is not set).

Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoipv6: fix bad free of addrconf_init_net
Hong Zhiguo [Mon, 25 Mar 2013 17:52:45 +0000 (01:52 +0800)]
ipv6: fix bad free of addrconf_init_net

[ Upstream commit a79ca223e029aa4f09abb337accf1812c900a800 ]

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoipv6: don't accept multicast traffic with scope 0
Hannes Frederic Sowa [Sun, 10 Feb 2013 05:35:22 +0000 (05:35 +0000)]
ipv6: don't accept multicast traffic with scope 0

[ Upstream commit 20314092c1b41894d8c181bf9aa6f022be2416aa ]

v2:
a) moved before multicast source address check
b) changed comment to netdev style

Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoDM9000B: driver initialization upgrade
Joseph CHANG [Thu, 28 Mar 2013 23:13:42 +0000 (23:13 +0000)]
DM9000B: driver initialization upgrade

[ Upstream commit 6741f40d198c6a5feb23653a1efd4ca47f93d83d ]

Fix bug for DM9000 revision B which contain a DSP PHY

DM9000B use DSP PHY instead previouse DM9000 revisions' analog PHY,
So need extra change in initialization, For
explicity PHY Reset and PHY init parameter, and
first DM9000_NCR reset need NCR_MAC_LBK bit by dm9000_probe().

Following DM9000_NCR reset cause by dm9000_open() clear the
NCR_MAC_LBK bit.

Without this fix, Power-up FIFO pointers error happen around 2%
rate among Davicom's customers' boards. With this fix, All above
cases can be solved.

Signed-off-by: Joseph CHANG <josright123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoatl1e: drop pci-msi support because of packet corruption
Hannes Frederic Sowa [Thu, 28 Mar 2013 18:10:50 +0000 (18:10 +0000)]
atl1e: drop pci-msi support because of packet  corruption

[ Upstream commit 188ab1b105c96656f6bcfb49d0d8bb1b1936b632 ]

Usage of pci-msi results in corrupted dma packet transfers to the host.

Reported-by: rebelyouth <rebelyouth.hacklab@gmail.com>
Cc: Huang, Xiong <xiong@qca.qualcomm.com>
Tested-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoaoe: reserve enough headroom on skbs
Eric Dumazet [Wed, 27 Mar 2013 18:28:41 +0000 (18:28 +0000)]
aoe: reserve enough headroom on skbs

[ Upstream commit 91c5746425aed8f7188a351f1224a26aa232e4b3 ]

Some network drivers use a non default hard_header_len

Transmitted skb should take into account dev->hard_header_len, or risk
crashes or expensive reallocations.

In the case of aoe, lets reserve MAX_HEADER bytes.

David reported a crash in defxx driver, solved by this patch.

Reported-by: David Oostdyk <daveo@ll.mit.edu>
Tested-by: David Oostdyk <daveo@ll.mit.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agodrivers: net: ethernet: davinci_emac: use netif_wake_queue() while restarting tx...
Mugunthan V N [Wed, 27 Mar 2013 04:42:00 +0000 (04:42 +0000)]
drivers: net: ethernet: davinci_emac: use  netif_wake_queue() while restarting tx queue

commit 7e51cde276ca820d526c6c21cf8147df595a36bf upstream.

To restart tx queue use netif_wake_queue() intead of netif_start_queue()
so that net schedule will restart transmission immediately which will
increase network performance while doing huge data transfers.

Reported-by: Dan Franke <dan.franke@schneider-electric.com>
Suggested-by: Sriramakrishnan A G <srk@ti.com>
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agobonding: fix disabling of arp_interval and miimon
nikolay@redhat.com [Wed, 27 Mar 2013 03:32:41 +0000 (03:32 +0000)]
bonding: fix disabling of arp_interval and miimon

[ Upstream commit 1bc7db16782c2a581fb4d53ca853631050f31611 ]

Currently if either arp_interval or miimon is disabled, they both get
disabled, and upon disabling they get executed once more which is not
the proper behaviour. Also when doing a no-op and disabling an already
disabled one, the other again gets disabled.
Also fix the error messages with the proper valid ranges, and a small
typo fix in the up delay error message (outputting "down delay", instead
of "up delay").

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agobonding: remove already created master sysfs link on failure
Veaceslav Falico [Tue, 26 Mar 2013 16:43:28 +0000 (17:43 +0100)]
bonding: remove already created master sysfs link on  failure

[ Upstream commit 9fe16b78ee17579cb4f333534cf7043e94c67024 ]

If slave sysfs symlink failes to be created - we end up without removing
the master sysfs symlink. Remove it in case of failure.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agounix: fix a race condition in unix_release()
Paul Moore [Mon, 25 Mar 2013 03:18:33 +0000 (03:18 +0000)]
unix: fix a race condition in unix_release()

[ Upstream commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 ]

As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned.  This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned.  This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.

Dave, I think this one should go on the -stable pile.

Special thanks to Jan for coming up with a reproducer for this
problem.

Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agobonding: fix miimon and arp_interval delayed work race conditions
nikolay@redhat.com [Thu, 29 Nov 2012 01:31:31 +0000 (01:31 +0000)]
bonding: fix miimon and arp_interval delayed work race  conditions

[ Upstream commit fbb0c41b814d497c656fc7be9e35456f139cb2fb ]

First I would give three observations which will be used later.
Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq)
 This usage is wrong because the pending bit is cleared just before the
 work's fn is executed and if the function re-arms itself we might end up
 with the work still running. It's safe to call cancel_delayed_work_sync()
 even if the work is not queued at all.
Observation 2: Use of INIT_DELAYED_WORK()
 Work needs to be initialized only once prior to (de/en)queueing.
Observation 3: IFF_UP is set only after ndo_open is called

Related race conditions:
1. Race between bonding_store_miimon() and bonding_store_arp_interval()
 Because of Obs.1 we can end up having both works enqueued.
2. Multiple races with INIT_DELAYED_WORK()
 Since the works are not protected by anything between INIT_DELAYED_WORK()
 and calls to (en/de)queue it is possible for races between the following
 functions:
 (races are also possible between the calls to INIT_DELAYED_WORK()
  and workqueue code)
 bonding_store_miimon() - bonding_store_arp_interval(), bond_close(),
  bond_open(), enqueued functions
 bonding_store_arp_interval() - bonding_store_miimon(), bond_close(),
bond_open(), enqueued functions
3. By Obs.1 we need to change bond_cancel_all()

Bugs 1 and 2 are fixed by moving all work initializations in bond_open
which by Obs. 2 and Obs. 3 and the fact that we make sure that all works
are cancelled in bond_close(), is guaranteed not to have any work
enqueued.
Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so
they can't race with bond_close and bond_open. The opposing work is
cancelled only if the IFF_UP flag is set and it is cancelled
unconditionally. The opposing work is already cancelled if the interface
is down so no need to cancel it again. This way we don't need new
synchronizations for the bonding workqueue. These bugs (and fixes) are
tied together and belong in the same patch.
Note: I have left 1 line intentionally over 80 characters (84) because I
      didn't like how it looks broken down. If you'd prefer it otherwise,
      then simply break it.

 v2: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agothermal: shorten too long mcast group name
Masatake YAMATO [Mon, 1 Apr 2013 18:50:40 +0000 (14:50 -0400)]
thermal: shorten too long mcast group name

[ Upstream commits 73214f5d9f33b79918b1f7babddd5c8af28dd23d
  and f1e79e208076ffe7bad97158275f1c572c04f5c7, the latter
  adds an assertion to genetlink to prevent this from happening
  again in the future. ]

The original name is too long.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years ago8021q: fix a potential use-after-free
Cong Wang [Fri, 22 Mar 2013 19:14:07 +0000 (19:14 +0000)]
8021q: fix a potential use-after-free

[ Upstream commit 4a7df340ed1bac190c124c1601bfc10cde9fb4fb ]

vlan_vid_del() could possibly free ->vlan_info after a RCU grace
period, however, we may still refer to the freed memory area
by 'grp' pointer. Found by code inspection.

This patch moves vlan_vid_del() as behind as possible.

Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotcp: undo spurious timeout after SACK reneging
Yuchung Cheng [Sun, 24 Mar 2013 10:42:25 +0000 (10:42 +0000)]
tcp: undo spurious timeout after SACK reneging

[ Upstream commit 7ebe183c6d444ef5587d803b64a1f4734b18c564 ]

On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotcp: preserve ACK clocking in TSO
Eric Dumazet [Thu, 21 Mar 2013 17:36:09 +0000 (17:36 +0000)]
tcp: preserve ACK clocking in TSO

[ Upstream commit f4541d60a449afd40448b06496dcd510f505928e ]

A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agosky2: Threshold for Pause Packet is set wrong
Mirko Lindner [Tue, 26 Mar 2013 06:38:42 +0000 (06:38 +0000)]
sky2: Threshold for Pause Packet is set wrong

[ Upstream commit 74f9f42c1c1650e74fb464f76644c9041f996851 ]

The sky2 driver sets the Rx Upper Threshold for Pause Packet generation to a
wrong value which leads to only 2kB of RAM remaining space. This can lead to
Rx overflow errors even with activated flow-control.

Fix: We should increase the value to 8192/8

Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agosky2: Receive Overflows not counted
Mirko Lindner [Tue, 26 Mar 2013 06:38:35 +0000 (06:38 +0000)]
sky2: Receive Overflows not counted

[ Upstream commit 9cfe8b156c21cf340b3a10ecb3022fbbc1c39185 ]

The sky2 driver doesn't count the Receive Overflows because the MAC
interrupt for this event is not set in the MAC's interrupt mask.
The MAC's interrupt mask is set only for Transmit FIFO Underruns.

Fix: The correct setting should be (GM_IS_TX_FF_UR | GM_IS_RX_FF_OR)
Otherwise the Receive Overflow event will not generate any interrupt.
The  Receive Overflow interrupt is handled correctly

Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoloop: prevent bdev freeing while device in use
Anatol Pomozov [Mon, 1 Apr 2013 16:47:56 +0000 (09:47 -0700)]
loop: prevent bdev freeing while device in use

commit c1681bf8a7b1b98edee8b862a42c19c4e53205fd upstream.

struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reprodusible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 > /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device->lo_state is
  Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
  it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
  device we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device->lo_ctl_mutex. ]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoNFS: nfs_getaclargs.acl_len is a size_t
Chuck Lever [Wed, 11 Jul 2012 20:30:32 +0000 (16:30 -0400)]
NFS: nfs_getaclargs.acl_len is a size_t

commit 56d08fef2369d5ca9ad2e1fc697f5379fd8af751 upstream.

Squelch compiler warnings:

fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]

Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get
acl data", Dec 7, 2011.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoudf: Fix bitmap overflow on large filesystems with small block size
Jan Kara [Tue, 5 Feb 2013 12:59:56 +0000 (13:59 +0100)]
udf: Fix bitmap overflow on large filesystems with small block size

commit 89b1f39eb4189de745fae554b0d614d87c8d5c63 upstream.

For large UDF filesystems with 512-byte blocks the number of necessary
bitmap blocks is larger than 2^16 so s_nr_groups in udf_bitmap overflows
(the number will overflow for filesystems larger than 128 GB with
512-byte blocks). That results in ENOSPC errors despite the filesystem
has plenty of free space.

Fix the problem by changing s_nr_groups' type to 'int'. That is enough
even for filesystems 2^32 blocks (UDF maximum) and 512-byte blocksize.

Reported-and-tested-by: v10lator@myway.de
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agokey: Fix resource leak
Alan Cox [Fri, 28 Sep 2012 11:20:02 +0000 (12:20 +0100)]
key: Fix resource leak

commit a84a921978b7d56e0e4b87ffaca6367429b4d8ff upstream.

On an error iov may still have been reallocated and need freeing

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agort2x00: error in configurations with mesh support disabled
Felix Fietkau [Tue, 26 Feb 2013 15:09:55 +0000 (16:09 +0100)]
rt2x00: error in configurations with  mesh support disabled

commit 6ef9e2f6d12ce9e2120916804d2ddd46b954a70b upstream.

If CONFIG_MAC80211_MESH is not set, cfg80211 will now allow advertising
interface combinations with NL80211_IFTYPE_MESH_POINT present.
Add appropriate ifdefs to avoid running into errors.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[lxiang: Backported for 3.4-stable. Removed code of simultaneous AP and mesh
 mode added in 4a5fc6d 3.9-rc1.]
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoixgbe: fix registration order of driver and DCA nofitication
Jakub Kicinski [Wed, 3 Apr 2013 16:50:54 +0000 (16:50 +0000)]
ixgbe: fix registration order of driver and DCA nofitication

commit f01fc1a82c2ee68726b400fadb156bd623b5f2f1 upstream.

ixgbe_notify_dca cannot be called before driver registration
because it expects driver's klist_devices to be allocated and
initialized. While on it make sure debugfs files are removed
when registration fails.

Signed-off-by: Jakub Kicinski <jakub.kicinski@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: no debugfs support]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agomm: prevent mmap_cache race in find_vma()
Jan Stancek [Thu, 4 Apr 2013 18:35:10 +0000 (11:35 -0700)]
mm: prevent mmap_cache race in find_vma()

commit b6a9b7f6b1f21735a7456d534dc0e68e61359d2c upstream.

find_vma() can be called by multiple threads with read lock
held on mm->mmap_sem and any of them can update mm->mmap_cache.
Prevent compiler from re-fetching mm->mmap_cache, because other
readers could update it in the meantime:

               thread 1                             thread 2
                                        |
  find_vma()                            |  find_vma()
    struct vm_area_struct *vma = NULL;  |
    vma = mm->mmap_cache;               |
    if (!(vma && vma->vm_end > addr     |
        && vma->vm_start <= addr)) {    |
                                        |    mm->mmap_cache = vma;
    return vma;                         |
     ^^ compiler may optimize this      |
        local variable out and re-read  |
        mm->mmap_cache                  |

This issue can be reproduced with gcc-4.8.0-1 on s390x by running
mallocstress testcase from LTP, which triggers:

  kernel BUG at mm/rmap.c:1088!
    Call Trace:
     ([<000003d100c57000>] 0x3d100c57000)
      [<000000000023a1c0>] do_wp_page+0x2fc/0xa88
      [<000000000023baae>] handle_pte_fault+0x41a/0xac8
      [<000000000023d832>] handle_mm_fault+0x17a/0x268
      [<000000000060507a>] do_protection_exception+0x1e2/0x394
      [<0000000000603a04>] pgm_check_handler+0x138/0x13c
      [<000003fffcf1f07a>] 0x3fffcf1f07a
    Last Breaking-Event-Address:
      [<000000000024755e>] page_add_new_anon_rmap+0xc2/0x168

Thanks to Jakub Jelinek for his insight on gcc and helping to
track this down.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoEISA/PCI: Init EISA early, before PNP
Yinghai Lu [Thu, 28 Mar 2013 04:28:05 +0000 (21:28 -0700)]
EISA/PCI: Init EISA early, before PNP

commit c5fb301ae83bec6892e54984e6ec765c47df8e10 upstream.

Matthew reported kernels fail the pci_eisa probe and are later successful
with the virtual_eisa_root_init force probe without slot0.

The reason for that is: PNP probing is before pci_eisa_init gets called
as pci_eisa_init is called via pci_driver.

pnp 00:0f has 0xc80 - 0xc84 reserved.
[    9.700409] pnp 00:0f: [io  0x0c80-0x0c84]

so eisa_probe will fail from pci_eisa_init
==>eisa_root_register
==>eisa_probe path.
as force_probe is not set in pci_eisa_root, it will bail early when
slot0 is not probed and initialized.

Try to use subsys_initcall_sync instead, and will keep following sequence:
pci_subsys_init
pci_eisa_init_early
pnpacpi_init/isapnp_init

After this patch EISA can be initialized properly, and PNP overlapping
resource will not be reserved.
[   10.104434] system 00:0f: [io  0x0c80-0x0c84] could not be reserved

Reported-by: Matthew Whitehead <mwhitehe@redhat.com>
Tested-by: Matthew Whitehead <mwhitehe@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agospi/mpc512x-psc: optionally keep PSC SS asserted across xfer segmensts
Anatolij Gustschin [Wed, 13 Mar 2013 13:57:43 +0000 (14:57 +0100)]
spi/mpc512x-psc: optionally keep PSC SS asserted across xfer segmensts

commit 1ad849aee5f53353ed88d9cd3d68a51b03a7d44f upstream.

Some SPI slave devices require asserted chip select signal across
multiple transfer segments of an SPI message. Currently the driver
always de-asserts the internal SS signal for every single transfer
segment of the message and ignores the 'cs_change' flag of the
transfer description. Disable the internal chip select (SS) only
if this is needed and indicated by the 'cs_change' flag.

Without this change, each partial transfer of a surrounding
multi-part SPI transaction might erroneously change the SS
signal, which might prevent slaves from answering the request
that was sent in a previous transfer segment because the
transaction could be considered aborted (SS was de-asserted
before reading the response).

Reported-by: Gerhard Sittig <gerhard.sittig@ifm.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agovirtio: console: add locking around c_ovq operations
Amit Shah [Fri, 29 Mar 2013 11:00:08 +0000 (16:30 +0530)]
virtio: console: add locking around c_ovq operations

commit 9ba5c80b1aea8648a3efe5f22dc1f7cacdfbeeb8 upstream.

When multiple ovq operations are being performed (lots of open/close
operations on virtio_console fds), the __send_control_msg() function can
get confused without locking.

A simple recipe to cause badness is:
* create a QEMU VM with two virtio-serial ports
* in the guest, do
  while true;do echo abc >/dev/vport0p1;done
  while true;do echo edf >/dev/vport0p2;done

In one run, this caused a panic in __send_control_msg().  In another, I
got

   virtio_console virtio0: control-o:id 0 is not a head!

This also results repeated messages similar to these on the host:

  qemu-kvm: virtio-serial-bus: Unexpected port id 478762112 for device virtio-serial-bus.0
  qemu-kvm: virtio-serial-bus: Unexpected port id 478762368 for device virtio-serial-bus.0

Reported-by: FuXiangChun <xfu@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agovirtio: console: rename cvq_lock to c_ivq_lock
Amit Shah [Fri, 29 Mar 2013 11:00:07 +0000 (16:30 +0530)]
virtio: console: rename cvq_lock to c_ivq_lock

commit 165b1b8bbc17c9469b053bab78b11b7cbce6d161 upstream.

The cvq_lock was taken for the c_ivq.  Rename the lock to make that
obvious.

We'll also add a lock around the c_ovq in the next commit, so there's no
ambiguity.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
[bwh: Backported to 3.2:
 - Adjust context
 - Drop change to virtcons_restore()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotile: expect new initramfs name from hypervisor file system
Chris Metcalf [Fri, 29 Mar 2013 17:50:21 +0000 (13:50 -0400)]
tile: expect new initramfs name from hypervisor file system

commit ff7f3efb9abf986f4ecd8793a9593f7ca4d6431a upstream.

The current Tilera boot infrastructure now provides the initramfs
to Linux as a Tilera-hypervisor file named "initramfs", rather than
"initramfs.cpio.gz", as before.  (This makes it reasonable to use
other compression techniques than gzip on the file without having to
worry about the name causing confusion.)  Adapt to use the new name,
but also fall back to checking for the old name.

Cc'ing to stable so that older kernels will remain compatible with
newer Tilera boot infrastructure.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoreiserfs: Fix warning and inode leak when deleting inode with xattrs
Jan Kara [Fri, 29 Mar 2013 14:39:16 +0000 (15:39 +0100)]
reiserfs: Fix warning and inode leak when deleting inode with xattrs

commit 35e5cbc0af240778e61113286c019837e06aeec6 upstream.

After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs
started failing to delete xattrs from inode. This was due to a buggy
test for '.' and '..' in fill_with_dentries() which resulted in passing
'.' and '..' entries to lookup_one_len() in some cases. That returned
error and so we failed to iterate over all xattrs of and inode.

Fix the test in fill_with_dentries() along the lines of the one in
lookup_one_len().

Reported-by: Pawel Zawora <pzawora@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agousb: ftdi_sio: Add support for Mitsubishi FX-USB-AW/-BD
Konstantin Holoborodko [Thu, 28 Mar 2013 15:06:13 +0000 (00:06 +0900)]
usb: ftdi_sio: Add support for Mitsubishi FX-USB-AW/-BD

commit 482b0b5d82bd916cc0c55a2abf65bdc69023b843 upstream.

It enhances the driver for FTDI-based USB serial adapters
to recognize Mitsubishi Electric Corp. USB/RS422 Converters
as FT232BM chips and support them.
https://search.meau.com/?q=FX-USB-AW

Signed-off-by: Konstantin Holoborodko <klh.kernel@gmail.com>
Tested-by: Konstantin Holoborodko <klh.kernel@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoBtrfs: limit the global reserve to 512mb
Josef Bacik [Tue, 26 Mar 2013 19:31:45 +0000 (15:31 -0400)]
Btrfs: limit the global reserve to 512mb

commit fdf30d1c1b386e1b73116cc7e0fb14e962b763b0 upstream.

A user reported a problem where he was getting early ENOSPC with hundreds of
gigs of free data space and 6 gigs of free metadata space.  This is because the
global block reserve was taking up the entire free metadata space.  This is
ridiculous, we have infrastructure in place to throttle if we start using too
much of the global reserve, so instead of letting it get this huge just limit it
to 512mb so that users can still get work done.  This allowed the user to
complete his rsync without issues.  Thanks

Reported-and-tested-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agotg3: fix length overflow in VPD firmware parsing
Kees Cook [Wed, 27 Mar 2013 06:40:50 +0000 (06:40 +0000)]
tg3: fix length overflow in VPD firmware parsing

commit 715230a44310a8cf66fbfb5a46f9a62a9b2de424 upstream.

Commit 184b89044fb6e2a74611dafa69b1dce0d98612c6 ("tg3: Use VPD fw version
when present") introduced VPD parsing that contained a potential length
overflow.

Limit the hardware's reported firmware string length (max 255 bytes) to
stay inside the driver's firmware string length (32 bytes). On overflow,
truncate the formatted firmware string instead of potentially overwriting
portions of the tg3 struct.

http://cansecwest.com/slides/2013/PrivateCore%20CSW%202013.pdf

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Oded Horovitz <oded@privatecore.com>
Reported-by: Brad Spengler <spender@grsecurity.net>
Cc: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agob43: A fix for DMA transmission sequence errors
Iestyn C. Elfick [Wed, 20 Mar 2013 19:02:31 +0000 (14:02 -0500)]
b43: A fix for DMA transmission sequence errors

commit b251412db99ccd4495ce372fec7daee27bf06923 upstream.

Intermittently, b43 will report "Out of order TX status report on DMA ring".
When this happens, the driver must be reset before communication can resume.
The cause of the problem is believed to be an error in the closed-source
firmware; however, all versions of the firmware are affected.

This change uses the observation that the expected status is always 2 less
than the observed value, and supplies a fake status report to skip one
header/data pair.

Not all devices suffer from this problem, but it can occur several times
per second under heavy load. As each occurence kills the unmodified driver,
this patch makes if possible for the affected devices to function. The patch
logs only the first instance of the reset operation to prevent spamming
the logs.

Tested-by: Chris Vine <chris@cvine.freeserve.co.uk>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agocan: sja1000: fix define conflict on SH
Marc Kleine-Budde [Wed, 27 Mar 2013 10:36:42 +0000 (11:36 +0100)]
can: sja1000: fix define conflict on SH

commit f901b6bc404b67d96eca739857c097e022727b71 upstream.

Thias patch fixes a define conflict between the SH architecture and the sja1000
driver:

    drivers/net/can/sja1000/sja1000.h:59:0: warning:
        "REG_SR" redefined [enabled by default]
    arch/sh/include/asm/ptrace_32.h:25:0: note:
         this is the location of the previous definition

A SJA1000_ prefix is added to the offending sja1000 define only, to make a
minimal patch suited for stable. A later patch will add a SJA1000_ prefix to
all defines in sja1000.h.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoiommu/amd: Make sure dma_ops are set for hotplug devices
Joerg Roedel [Tue, 26 Mar 2013 21:48:23 +0000 (22:48 +0100)]
iommu/amd: Make sure dma_ops are set for hotplug devices

commit c2a2876e863356b092967ea62bebdb4dd663af80 upstream.

There is a bug introduced with commit 27c2127 that causes
devices which are hot unplugged and then hot-replugged to
not have per-device dma_ops set. This causes these devices
to not function correctly. Fixed with this patch.

Reported-by: Andreas Degert <andreas.degert@googlemail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agovt: synchronize_rcu() under spinlock is not nice...
Al Viro [Wed, 27 Mar 2013 00:30:17 +0000 (20:30 -0400)]
vt: synchronize_rcu() under spinlock is not nice...

commit e8cd81693bbbb15db57d3c9aa7dd90eda4842874 upstream.

vcs_poll_data_free() calls unregister_vt_notifier(), which calls
atomic_notifier_chain_unregister(), which calls synchronize_rcu().
Do it *after* we'd dropped ->f_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoNest rename_lock inside vfsmount_lock
Al Viro [Tue, 26 Mar 2013 22:25:57 +0000 (18:25 -0400)]
Nest rename_lock inside vfsmount_lock

commit 7ea600b5314529f9d1b9d6d3c41cb26fce6a7a4a upstream.

... lest we get livelocks between path_is_under() and d_path() and friends.

The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.

As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.

Spotted-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
 - Adjust context
 - s/&vfsmount_lock/vfsmount_lock/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agonfsd4: reject "negative" acl lengths
J. Bruce Fields [Tue, 26 Mar 2013 18:11:13 +0000 (14:11 -0400)]
nfsd4: reject "negative" acl lengths

commit 64a817cfbded8674f345d1117b117f942a351a69 upstream.

Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.

The symptom seen was a warning when we attempt to a kmalloc with an
excessive size.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agostaging: comedi: s626: fix continuous acquisition
Ian Abbott [Fri, 22 Mar 2013 15:16:29 +0000 (15:16 +0000)]
staging: comedi: s626: fix continuous acquisition

commit e4317ce877a31dbb9d96375391c1c4ad2210d637 upstream.

For the s626 driver, there is a bug in the handling of asynchronous
commands on the AI subdevice when the stop source is `TRIG_NONE`.  The
command should run continuously until cancelled, but the interrupt
handler stops the command running after the first scan.

The command set-up function `s626_ai_cmd()` contains this code:

switch (cmd->stop_src) {
case TRIG_COUNT:
/*  data arrives as one packet */
devpriv->ai_sample_count = cmd->stop_arg;
devpriv->ai_continous = 0;
break;
case TRIG_NONE:
/*  continous acquisition */
devpriv->ai_continous = 1;
devpriv->ai_sample_count = 0;
break;
}

The interrupt handler `s626_irq_handler()` contains this code:

if (!(devpriv->ai_continous))
devpriv->ai_sample_count--;
if (devpriv->ai_sample_count <= 0) {
devpriv->ai_cmd_running = 0;
/* ... */
}

So `devpriv->ai_sample_count` is only decremented for the `TRIG_COUNT`
case, but `devpriv->ai_cmd_running` is set to 0 (and the command
stopped) regardless.

Fix this in `s626_ai_cmd()` by setting `devpriv->ai_sample_count = 1`
for the `TRIG_NONE` case.  The interrupt handler will not decrement it
so it will remain greater than 0 and the check for stopping the
acquisition will fail.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agousb: xhci: Fix TRB transfer length macro used for Event TRB.
Vivek Gautam [Thu, 21 Mar 2013 06:36:48 +0000 (12:06 +0530)]
usb: xhci: Fix TRB transfer length macro used for Event TRB.

commit 1c11a172cb30492f5f6a82c6e118fdcd9946c34f upstream.

Use proper macro while extracting TRB transfer length from
Transfer event TRBs. Adding a macro EVENT_TRB_LEN (bits 0:23)
for the same, and use it instead of TRB_LEN (bits 0:16) in
case of event TRBs.

This patch should be backported to kernels as old as 2.6.31, that
contain the commit b10de142119a676552df3f0d2e3a9d647036c26a "USB: xhci:
Bulk transfer support".  This patch will have issues applying to older
kernels.

Signed-off-by: Vivek gautam <gautam.vivek@samsung.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoSUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
Trond Myklebust [Mon, 25 Mar 2013 15:23:40 +0000 (11:23 -0400)]
SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked

commit 1166fde6a923c30f4351515b6a9a1efc513e7d00 upstream.

We need to be careful when testing task->tk_waitqueue in
rpc_wake_up_task_queue_locked, because it can be changed while we
are holding the queue->lock.
By adding appropriate memory barriers, we can ensure that it is safe to
test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
9 years agoIPoIB: Fix send lockup due to missed TX completion
Mike Marciniszyn [Tue, 26 Feb 2013 15:46:27 +0000 (15:46 +0000)]
IPoIB: Fix send lockup due to missed TX completion

commit 1ee9e2aa7b31427303466776f455d43e5e3c9275 upstream.

Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>