pandora-kernel.git
11 years agomm: thp: set the accessed flag for old pages on access fault
Will Deacon [Wed, 12 Dec 2012 00:01:27 +0000 (16:01 -0800)]
mm: thp: set the accessed flag for old pages on access fault

On x86 memory accesses to pages without the ACCESSED flag set result in
the ACCESSED flag being set automatically.  With the ARM architecture a
page access fault is raised instead (and it will continue to be raised
until the ACCESSED flag is set for the appropriate PTE/PMD).

For normal memory pages, handle_pte_fault will call pte_mkyoung
(effectively setting the ACCESSED flag).  For transparent huge pages,
pmd_mkyoung will only be called for a write fault.

This patch ensures that faults on transparent hugepages which do not
result in a CoW update the access flags for the faulting pmd.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

mm/huge_memory.c

11 years agomm: thp: fix the update_mmu_cache() last argument passing in mm/huge_memory.c
Catalin Marinas [Mon, 8 Oct 2012 23:33:01 +0000 (16:33 -0700)]
mm: thp: fix the update_mmu_cache() last argument passing in mm/huge_memory.c

The update_mmu_cache() takes a pointer (to pte_t by default) as the last
argument but the huge_memory.c passes a pmd_t value.  The patch changes
the argument to the pmd_t * pointer.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

mm/huge_memory.c

11 years agomm: hugetlb: add arch hook for clearing page flags before entering pool
Will Deacon [Mon, 8 Oct 2012 23:29:32 +0000 (16:29 -0700)]
mm: hugetlb: add arch hook for clearing page flags before entering pool

The core page allocator ensures that page flags are zeroed when freeing
pages via free_pages_check.  A number of architectures (ARM, PPC, MIPS)
rely on this property to treat new pages as dirty with respect to the data
cache and perform the appropriate flushing before mapping the pages into
userspace.

This can lead to cache synchronisation problems when using hugepages,
since the allocator keeps its own pool of pages above the usual page
allocator and does not reset the page flags when freeing a page into the
pool.

This patch adds a new architecture hook, arch_clear_hugepage_flags, so
that architectures which rely on the page flags being in a particular
state for fresh allocations can adjust the flags accordingly when a page
is freed into the pool.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

arch/tile/include/asm/hugetlb.h

11 years agothp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE
Gerald Schaefer [Mon, 8 Oct 2012 23:30:04 +0000 (16:30 -0700)]
thp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE

Cleanup patch in preparation for transparent hugepage support on s390.
Adding new architectures to the TRANSPARENT_HUGEPAGE config option can
make the "depends" line rather ugly, like "depends on (X86 || (S390 &&
64BIT)) && MMU".

This patch adds a HAVE_ARCH_TRANSPARENT_HUGEPAGE instead.  x86 already has
MMU "def_bool y", so the MMU check is superfluous there and
HAVE_ARCH_TRANSPARENT_HUGEPAGE can be selected in arch/x86/Kconfig.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

arch/Kconfig
arch/x86/Kconfig

11 years agoARM: mm: Transparent huge page support for non-LPAE systems.
Steve Capper [Fri, 8 Feb 2013 15:01:23 +0000 (17:01 +0200)]
ARM: mm: Transparent huge page support for non-LPAE systems.

Much of the required code for THP has been implemented in the
earlier non-LPAE HugeTLB patch.

One more domain bit is used (to store whether or not the THP is
splitting).

Some THP helper functions are defined; and we have to re-define
pmd_page such that it distinguishes between page tables and
sections.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years agoARM: mm: Transparent huge page support for LPAE systems.
Catalin Marinas [Fri, 8 Feb 2013 15:01:22 +0000 (17:01 +0200)]
ARM: mm: Transparent huge page support for LPAE systems.

The patch adds support for THP (transparent huge pages) to LPAE
systems. When this feature is enabled, the kernel tries to map
anonymous pages as 2MB sections where possible.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[steve.capper@arm.com: symbolic constants used, value of PMD_SECT_SPLITTING
adjusted, tlbflush.h included in pgtable.h]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years agoARM: mm: HugeTLB support for non-LPAE systems.
Steve Capper [Fri, 8 Feb 2013 15:01:21 +0000 (17:01 +0200)]
ARM: mm: HugeTLB support for non-LPAE systems.

Based on Bill Carson's HugeTLB patch, with the big difference being
in the way PTEs are passed back to the memory manager. Rather than
store a "Linux Huge PTE" separately; we make one up on the fly in
huge_ptep_get. Also rather than consider 16M supersections, we focus
solely on 2x1M sections.

To construct a huge PTE on the fly we need additional information
(such as the accessed flag and dirty bit) which we choose to store
in the domain bits of the short section descriptor. In order to use
these domain bits for storage, we need to make ourselves a client
for all 16 domains and this is done in head.S.

Storing extra information in the domain bits also makes it a lot
easier to implement Transparent Huge Pages, and some of the code in
pgtable-2level.h is arranged to facilitate THP support in a later
patch.

Non-LPAE HugeTLB pages are incompatible with the huge page migration
code (enabled when CONFIG_MEMORY_FAILURE is selected) as that code
dereferences PTEs directly, rather than calling huge_ptep_get and
set_huge_pte_at.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years agoARM: mm: HugeTLB support for LPAE systems.
Catalin Marinas [Fri, 8 Feb 2013 15:01:20 +0000 (17:01 +0200)]
ARM: mm: HugeTLB support for LPAE systems.

This patch adds support for hugetlbfs based on the x86 implementation.
It allows mapping of 2MB sections (see Documentation/vm/hugetlbpage.txt
for usage). The 64K pages configuration is not supported (section size
is 512MB in this case).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[steve.capper@arm.com: symbolic constants replace numbers in places.
Split up into multiple files, to simplify future non-LPAE support,
removed huge_pmd_share code, as this is very rarely executed].
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years agoARM: mm: Add support for flushing HugeTLB pages.
Steve Capper [Fri, 8 Feb 2013 15:01:19 +0000 (17:01 +0200)]
ARM: mm: Add support for flushing HugeTLB pages.

On ARM we use the __flush_dcache_page function to flush the dcache
of pages when needed; usually when the PG_dcache_clean bit is unset
and we are setting a PTE.

A HugeTLB page is represented as a compound page consisting of an
array of pages. Thus to flush the dcache of a HugeTLB page, one must
flush more than a single page.

This patch modifies __flush_dcache_page such that all constituent
pages of a HugeTLB page are flushed.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years agoARM: mm: correct pte_same behaviour for LPAE.
Steve Capper [Fri, 8 Feb 2013 15:01:18 +0000 (17:01 +0200)]
ARM: mm: correct pte_same behaviour for LPAE.

For 3 levels of paging the PTE_EXT_NG bit will be set for user
address ptes that are written to a page table but not for ptes
created with mk_pte.

This can cause some comparison tests made by pte_same to fail
spuriously and lead to other problems.

To correct this behaviour, we mask off PTE_EXT_NG for any pte that
is present before running the comparison.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years agoARM: mm: introduce L_PTE_VALID for page table entries
Will Deacon [Thu, 19 Jul 2012 10:51:05 +0000 (11:51 +0100)]
ARM: mm: introduce L_PTE_VALID for page table entries

For long-descriptor translation table formats, the ARMv7 architecture
defines the last two bits of the second- and third-level descriptors to
be:

x0b - Invalid
01b - Block (second-level), Reserved (third-level)
11b - Table (second-level), Page (third-level)

This allows us to define L_PTE_PRESENT as (3 << 0) and use this value to
create ptes directly. However, when determining whether a given pte
value is present in the low-level page table accessors, we only need to
check the least significant bit of the descriptor, allowing us to write
faulting, present entries which are required for PROT_NONE mappings.

This patch introduces L_PTE_VALID, which can be used to test whether a
pte should fault, and updates the low-level page table accessors
accordingly.

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agomm: thp: fix the pmd_clear() arguments in pmdp_get_and_clear()
Catalin Marinas [Mon, 8 Oct 2012 23:32:59 +0000 (16:32 -0700)]
mm: thp: fix the pmd_clear() arguments in pmdp_get_and_clear()

The CONFIG_TRANSPARENT_HUGEPAGE implementation of pmdp_get_and_clear()
calls pmd_clear() with 3 arguments instead of 1.

This happens only for !__HAVE_ARCH_PMDP_GET_AND_CLEAR which doesn't seem
to happen because x86 defines this and it uses pmd_update.

[mhocko@suse.cz: changelog addition]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoARM: 7323/1: Do not allow ARM_LPAE on pre-ARMv7 architectures
Catalin Marinas [Tue, 14 Feb 2012 15:33:27 +0000 (16:33 +0100)]
ARM: 7323/1: Do not allow ARM_LPAE on pre-ARMv7 architectures

This patch expands the Kconfig dependencies for ARM_LPAE to not allow
enabling when architectures other than ARMv7 are built into the kernel.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years agoARM: 7275/1: LPAE: Check the CPU support for the long descriptor format
Catalin Marinas [Mon, 9 Jan 2012 11:24:47 +0000 (12:24 +0100)]
ARM: 7275/1: LPAE: Check the CPU support for the long descriptor format

This patch adds a check for the presence of the LPAE feature during the
CPU initialisation. If not present, it reports an error when
CONFIG_DEBUG_LL is enabled.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years agoARM: kexec: use soft_restart for branching to the reboot buffer
Will Deacon [Mon, 6 Jun 2011 11:35:46 +0000 (12:35 +0100)]
ARM: kexec: use soft_restart for branching to the reboot buffer

Now that there is a common way to reset the machine, let's use it
instead of reinventing the wheel in the kexec backend.

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: stop: execute platform callback from cpu_stop code
Will Deacon [Mon, 6 Jun 2011 14:49:23 +0000 (15:49 +0100)]
ARM: stop: execute platform callback from cpu_stop code

Sending IPI_CPU_STOP to a CPU causes it to execute a busy cpu_relax
loop forever. This makes it impossible to kexec successfully on an SMP
system since the secondary CPUs do not reset.

This patch adds a callback to platform_cpu_kill, defined when
CONFIG_HOTPLUG_CPU=y, from the ipi_cpu_stop handling code. This function
currently just returns 1 on all platforms that define it but allows them
to do something more sophisticated in the future.

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: reset: implement soft_restart for jumping to a physical address
Will Deacon [Mon, 6 Jun 2011 11:28:54 +0000 (12:28 +0100)]
ARM: reset: implement soft_restart for jumping to a physical address

Tools such as kexec and CPU hotplug require a way to reset the processor
and branch to some code in physical space. This requires various bits of
jiggery pokery with the caches and MMU which, when it goes wrong, tends
to lock up the system.

This patch fleshes out the soft_restart implementation so that it
branches to the reset code using the identity mapping. This requires us
to change to a temporary stack, held within the kernel image as a static
array, to avoid conflicting with the new view of memory.

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: lib: add call_with_stack function for safely changing stack
Will Deacon [Wed, 8 Jun 2011 14:29:00 +0000 (15:29 +0100)]
ARM: lib: add call_with_stack function for safely changing stack

When disabling the MMU, it is necessary to take out a 1:1 identity map
of the reset code so that it can safely be executed with and without
the MMU active. To avoid the situation where the physical address of the
reset code aliases with the virtual address of the active stack (which
cannot be included in the 1:1 mapping), it is desirable to change to a
new stack at a location which is less likely to alias.

This code adds a new lib function, call_with_stack:

void call_with_stack(void (*fn)(void *), void *arg, void *sp);

which changes the stack to point at the sp parameter, before invoking
fn(arg) with the new stack selected.

Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: restart: only perform setup for restart when soft-restarting
Russell King [Tue, 1 Nov 2011 13:16:26 +0000 (13:16 +0000)]
ARM: restart: only perform setup for restart when soft-restarting

We only need to set the system up for a soft-restart if we're going to
be doing a soft-restart.  Provide a new function (soft_restart()) which
does the setup and final call for this, and make platforms use it.
Eliminate the call to setup_restart() from the default handler.

This means that platforms arch_reset() function is no longer called with
the page tables prepared for a soft-restart, and caches will still be
enabled.

Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@st.com>
Acked-by: Krzysztof Ha■asa <khc@pm.waw.pl>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Acked-by: Wan ZongShun <mcuos.com@gmail.com>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years agoARM: restart: remove argument to setup_mm_for_reboot()
Russell King [Tue, 1 Nov 2011 10:15:27 +0000 (10:15 +0000)]
ARM: restart: remove argument to setup_mm_for_reboot()

setup_mm_for_reboot() doesn't make use of its argument, so remove it.

Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years agoARM: restart: move reboot failure handing into machine_restart()
Russell King [Mon, 31 Oct 2011 09:22:22 +0000 (09:22 +0000)]
ARM: restart: move reboot failure handing into machine_restart()

Move the failure to reboot into machine_restart() to always catch
this condition, even if a platform decides to hook the restarting
via arm_pm_restart().

Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Conflicts:

arch/arm/kernel/process.c

11 years agoARM: LPAE: Add the Kconfig entries
Catalin Marinas [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: Add the Kconfig entries

This patch adds the ARM_LPAE and ARCH_PHYS_ADDR_T_64BIT Kconfig entries
allowing LPAE support to be compiled into the kernel.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: mark memory banks with start > ULONG_MAX as highmem
Will Deacon [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: mark memory banks with start > ULONG_MAX as highmem

Memory banks living outside of the 32-bit physical address
space do not have a 1:1 pa <-> va mapping and therefore the
__va macro may wrap.

This patch ensures that such banks are marked as highmem so
that the Kernel doesn't try to split them up when it sees that
the wrapped virtual address overlaps the vmalloc space.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
11 years agoARM: LPAE: Add identity mapping support for the 3-level page table format
Catalin Marinas [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: Add identity mapping support for the 3-level page table format

With LPAE, the pgd is a separate page table with entries pointing to the
pmd. The identity_mapping_add() function needs to ensure that the pgd is
populated before populating the pmd level. The do..while blocks now loop
over the pmd in order to have the same implementation for the two page
table formats. The pmd_addr_end() definition has been removed and the
generic one used instead. The pmd clean-up is done in the pgd_free()
function.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: Add context switching support
Catalin Marinas [Tue, 22 Nov 2011 17:30:31 +0000 (17:30 +0000)]
ARM: LPAE: Add context switching support

With LPAE, TTBRx registers are 64-bit. The ASID is stored in TTBR0
rather than a separate Context ID register. This patch makes the
necessary changes to handle context switching on LPAE.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: Add fault handling support
Catalin Marinas [Tue, 22 Nov 2011 17:30:31 +0000 (17:30 +0000)]
ARM: LPAE: Add fault handling support

The DFSR and IFSR register format is different when LPAE is enabled. In
addition, DFSR and IFSR have similar definitions for the fault type.
This modifies the fault code to correctly handle the new format.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: Invalidate the TLB before freeing the PMD
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Invalidate the TLB before freeing the PMD

Similar to the PTE freeing, this patch introduced __pmd_free_tlb() which
invalidates the TLB before freeing a PMD page. This is needed because on
newer processors the entry in the upper page table may be cached by the
TLB and point to random data after the PMD has been freed.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: MMU setup for the 3-level page table format
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: MMU setup for the 3-level page table format

This patch adds the MMU initialisation for the LPAE page table format.
The swapper_pg_dir size with LPAE is 5 rather than 4 pages. A new
proc-v7-3level.S file contains the TTB initialisation, context switch
and PTE setting code with the LPAE. The TTBRx split is based on the
PAGE_OFFSET with TTBR1 used for the kernel mappings. The 36-bit mappings
(supersections) and a few other memory types in mmu.c are conditionally
compiled.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Conflicts:

arch/arm/kernel/head.S

11 years agoARM: LPAE: Page table maintenance for the 3-level format
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Page table maintenance for the 3-level format

This patch modifies the pgd/pmd/pte manipulation functions to support
the 3-level page table format. Since there is no need for an 'ext'
argument to cpu_set_pte_ext(), this patch conditionally defines a
different prototype for this function when CONFIG_ARM_LPAE.

The patch also introduces the L_PGD_SWAPPER flag to mark pgd entries
pointing to pmd tables pre-allocated in the swapper_pg_dir and avoid
trying to free them at run-time. This flag is 0 with the classic page
table format.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: Introduce the 3-level page table format definitions
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Introduce the 3-level page table format definitions

This patch introduces the pgtable-3level*.h files with definitions
specific to the LPAE page table format (3 levels of page tables).

Each table is 4KB and has 512 64-bit entries. An entry can point to a
40-bit physical address. The young, write and exec software bits share
the corresponding hardware bits (negated). Other software bits use spare
bits in the PTE.

The patch also changes some variable types from unsigned long or int to
pteval_t or pgprot_t.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: add ISBs around MMU enabling code
Will Deacon [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: add ISBs around MMU enabling code

Before we enable the MMU, we must ensure that the TTBR registers contain
sane values. After the MMU has been enabled, we jump to the *virtual*
address of the following function, so we also need to ensure that the
SCTLR write has taken effect.

This patch adds ISB instructions around the SCTLR write to ensure the
visibility of the above.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: Factor out classic-MMU specific code into proc-v7-2level.S
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Factor out classic-MMU specific code into proc-v7-2level.S

This patch modifies the proc-v7.S file so that it only contains code
shared between classic MMU and LPAE. The non-common code is factored out
into a separate file.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: Move the FSR definitions to separate files
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Move the FSR definitions to separate files

The FSR structure is different with LPAE and this patch moves the
classic MMU specific definition to a separate fsr-2level.c file that is
included in fault.c. It also moves the fsr_fs and FSR bits to the
fault.h file.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: LPAE: Move page table maintenance macros to pgtable-2level.h
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Move page table maintenance macros to pgtable-2level.h

The page table maintenance macros need to be duplicated between the
classic and the LPAE MMU so this patch moves those that are not common
to the pgtable-2level.h file.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: pgtable: switch to use pgtable-nopud.h
Russell King [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: pgtable: switch to use pgtable-nopud.h

Nick Piggin noted upon introducing 4level-fixup.h:

| Add a temporary "fallback" header so architectures can run with
| the 4level pagetables patch without modification. All architectures
| should be converted to use the folding headers (include/asm-generic/
| pgtable-nop?d.h) as soon as possible, and the fallback header removed.

This makes ARM compliant with this statement.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: pgtable: Fix compiler warning in ioremap.c introduced by nopud
Catalin Marinas [Tue, 22 Nov 2011 17:30:27 +0000 (17:30 +0000)]
ARM: pgtable: Fix compiler warning in ioremap.c introduced by nopud

With the arch/arm code conversion to pgtable-nopud.h, the section and
supersection (un|re)map code triggers compiler warnings on UP systems.
This is caused by pmd_offset() being given a pgd_t argument rather than
a pud_t one. This patch makes the necessary conversion with the
assumption that the pud is folded into the pgd. The page table setting
code only loops over the pmd which is enough with the classic page
tables. This code is not compiled when LPAE is enabled.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years agoARM: SMP: use idmap_pgd for mapping MMU enable during secondary booting
Will Deacon [Wed, 23 Nov 2011 12:26:25 +0000 (12:26 +0000)]
ARM: SMP: use idmap_pgd for mapping MMU enable during secondary booting

The ARM SMP booting code allocates a temporary set of page tables
containing an identity mapping of the kernel image and provides this
to secondary CPUs for initial booting.

In reality, we only need to include the __turn_mmu_on function in the
identity mapping since the rest of the kernel is executing from virtual
addresses after this point.

This patch adds __turn_mmu_on to the .idmap.text section, allowing the
SMP booting code to use the idmap_pgd directly and not have to populate
its own set of page table.

As a result of this patch, we can make the identity_mapping_add function
static (since it is only used within mm/idmap.c) and also remove the
identity_mapping_del function. The identity map population is moved to
an early initcall so that it is setup in time for secondary CPU bringup.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: head.S: only include __turn_mmu_on in the initial identity mapping
Will Deacon [Wed, 23 Nov 2011 12:03:27 +0000 (12:03 +0000)]
ARM: head.S: only include __turn_mmu_on in the initial identity mapping

__create_page_tables identity maps the region of memory from
__enable_mmu to the end of __turn_mmu_on.

In preparation for including __turn_mmu_on in the .idmap.text section,
this patch modifies the identity mapping so that it only includes the
__turn_mmu_on code.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: idmap: use idmap_pgd when setting up mm for reboot
Will Deacon [Wed, 8 Jun 2011 14:53:34 +0000 (15:53 +0100)]
ARM: idmap: use idmap_pgd when setting up mm for reboot

For soft-rebooting a system, it is necessary to map the MMU-off code
with an identity mapping so that execution can continue safely once the
MMU has been switched off.

Currently, switch_mm_for_reboot takes out a 1:1 mapping from 0x0 to
TASK_SIZE during reboot in the hope that the reset code lives at a
physical address corresponding to a userspace virtual address.

This patch modifies the code so that we switch to the idmap_pgd tables,
which contain a 1:1 mapping of the cpu_reset code. This has the
advantage of only remapping the code that we need and also means we
don't need to worry about allocating a pgd from an atomic context in the
case that the physical address of the cpu_reset code aliases with the
virtual space used by the kernel.

Acked-by: Dave Martin <dave.martin@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: proc-*.S: place cpu_reset functions into .idmap.text section
Will Deacon [Tue, 15 Nov 2011 13:25:04 +0000 (13:25 +0000)]
ARM: proc-*.S: place cpu_reset functions into .idmap.text section

The CPU reset functions disable the MMU and therefore must be executed
with an identity mapping in place.

This patch places the CPU reset functions into the .idmap.text section,
causing the idmap code to include them as part of the identity mapping.

Acked-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: suspend: use idmap_pgd instead of suspend_pgd
Will Deacon [Tue, 15 Nov 2011 11:11:19 +0000 (11:11 +0000)]
ARM: suspend: use idmap_pgd instead of suspend_pgd

The ARM CPU suspend code requires cpu_resume_mmu to be identity mapped
in order to re-enable the MMU when coming out of suspend. Currently,
this is accomplished by maintaining a suspend_pgd with the relevant
mapping put in place at init time.

This patch replaces the use of suspend_pgd with the new idmap_pgd.
cpu_resume_mmu is placed in the .idmap.text section so that it is
included in the identity map.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Dave Martin <dave.martin@linaro.org>
Tested-by: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: idmap: populate identity map pgd at init time using .init.text
Will Deacon [Fri, 30 Sep 2011 10:43:29 +0000 (11:43 +0100)]
ARM: idmap: populate identity map pgd at init time using .init.text

When disabling and re-enabling the MMU, it is necessary to take out an
identity mapping for the code that manipulates the SCTLR in order to
avoid it disappearing from under our feet. This is useful when soft
rebooting and returning from CPU suspend.

This patch allocates a set of page tables during boot and populates them
with an identity mapping for the .idmap.text section. This means that
users of the identity map do not need to manage their own pgd and can
instead annotate their functions with __idmap or, in the case of assembly
code, place them in the correct section.

Acked-by: Dave Martin <dave.martin@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoRevert "Add various hugetlb arm high level hooks"
Grazvydas Ignotas [Wed, 6 Feb 2013 16:04:05 +0000 (18:04 +0200)]
Revert "Add various hugetlb arm high level hooks"

This reverts commit 6e0faabf2ee89e41e65ce17e927763ef08bee903.

11 years agoRevert "Add various hugetlb page table fix"
Grazvydas Ignotas [Wed, 6 Feb 2013 16:03:59 +0000 (18:03 +0200)]
Revert "Add various hugetlb page table fix"

This reverts commit c6fd24f0d779a73f784e64eb36496da6e11833a0.

11 years agoRevert "Introduce set_hugepte_ext api to setup huge hardware pmds"
Grazvydas Ignotas [Wed, 6 Feb 2013 16:03:42 +0000 (18:03 +0200)]
Revert "Introduce set_hugepte_ext api to setup huge hardware pmds"

This reverts commit 66c4a0fc3ce10d65c881144bee306d348bbaddf4.

11 years agoRevert "Store huge page linux pte in mmu_context_t"
Grazvydas Ignotas [Wed, 6 Feb 2013 16:02:33 +0000 (18:02 +0200)]
Revert "Store huge page linux pte in mmu_context_t"

This reverts commit 307b4042077c3f718a94bc3ce6d0ffd72c1575d7.

11 years agoRevert "Using do_page_fault for section fault handling"
Grazvydas Ignotas [Wed, 6 Feb 2013 16:02:29 +0000 (18:02 +0200)]
Revert "Using do_page_fault for section fault handling"

This reverts commit ac24f0f22b6aa33f001f6fea8330f389846072ea.

11 years agoRevert "Add hugetlb Kconfig option"
Grazvydas Ignotas [Wed, 6 Feb 2013 16:02:14 +0000 (18:02 +0200)]
Revert "Add hugetlb Kconfig option"

This reverts commit e22f739031e5fa35eb374c1557d13fbcd22b950e.

11 years agoRevert "Minor compiling fix"
Grazvydas Ignotas [Wed, 6 Feb 2013 16:01:43 +0000 (18:01 +0200)]
Revert "Minor compiling fix"

This reverts commit 0f9047bf8247df3eb20b991c0f472558a85aff9e.

11 years agoMerge branch 'stable-3.2' into pandora-3.2
Grazvydas Ignotas [Wed, 20 Feb 2013 21:00:34 +0000 (23:00 +0200)]
Merge branch 'stable-3.2' into pandora-3.2

Conflicts:
drivers/mtd/ubi/attach.c

11 years agoLinux 3.2.39 v3.2.39
Ben Hutchings [Wed, 20 Feb 2013 03:15:40 +0000 (03:15 +0000)]
Linux 3.2.39

11 years agox86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.
Jan Beulich [Thu, 24 Jan 2013 13:11:10 +0000 (13:11 +0000)]
x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.

commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream.

This fixes CVE-2013-0228 / XSA-42

Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
in 32bit PV guest can use to crash the > guest with the panic like this:

-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]

Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G      D    ---------------
2.6.32-356.el6.i686 #1
Call Trace:
 [<c08476df>] ? panic+0x6e/0x122
 [<c084b63c>] ? oops_end+0xbc/0xd0
 [<c084b260>] ? do_general_protection+0x0/0x210
 [<c084a9b7>] ? error_code+0x73/
-------------

Petr says: "
 I've analysed the bug and I think that xen_iret() cannot cope with
 mangled DS, in this case zeroed out (null selector/descriptor) by either
 xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
 entry was invalidated by the reproducer. "

Jan took a look at the preliminary patch and came up a fix that solves
this problem:

"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."

The way to fix this is to realize that the we can only relay on the
registers that IRET restores. The two that are guaranteed are the
%cs and %ss as they are always fixed GDT selectors. Also they are
inaccessible from user mode - so they cannot be altered. This is
the approach taken in this patch.

Another alternative option suggested by Jan would be to relay on
the subtle realization that using the %ebp or %esp relative references uses
the %ss segment.  In which case we could switch from using %eax to %ebp and
would not need the %ss over-rides. That would also require one extra
instruction to compensate for the one place where the register is used
as scaled index. However Andrew pointed out that is too subtle and if
further work was to be done in this code-path it could escape folks attention
and lead to accidents.

Reviewed-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agotg3: Fix crc errors on jumbo frame receive
Nithin Nayak Sujir [Mon, 14 Jan 2013 17:11:00 +0000 (17:11 +0000)]
tg3: Fix crc errors on jumbo frame receive

[ Upstream commit daf3ec688e057f6060fb9bb0819feac7a8bbf45c ]

TG3_PHY_AUXCTL_SMDSP_ENABLE/DISABLE macros do a blind write to the phy
auxiliary control register and overwrite the EXT_PKT_LEN (bit 14) resulting
in intermittent crc errors on jumbo frames with some link partners. Change
the code to do a read/modify/write.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agotg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode
Nithin Nayak Sujir [Mon, 14 Jan 2013 17:10:59 +0000 (17:10 +0000)]
tg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode

[ Upstream commit 9c13cb8bb477a83b9a3c9e5a5478a4e21294a760 ]

When netconsole is enabled, logging messages generated during tg3_open
can result in a null pointer dereference for the uninitialized tg3
status block. Use the irq_sync flag to disable polling in the early
stages. irq_sync is cleared when the driver is enabling interrupts after
all initialization is completed.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agobridge: Pull ip header into skb->data before looking into ip header.
Sarveshwar Bandi [Wed, 10 Oct 2012 01:15:01 +0000 (01:15 +0000)]
bridge: Pull ip header into skb->data before looking into ip header.

[ Upstream commit 6caab7b0544e83e6c160b5e80f5a4a7dd69545c7 ]

If lower layer driver leaves the ip header in the skb fragment, it needs to
be first pulled into skb->data before inspecting ip header length or ip version
number.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agotcp: fix MSG_SENDPAGE_NOTLAST logic
Eric Dumazet [Sun, 6 Jan 2013 18:21:49 +0000 (18:21 +0000)]
tcp: fix MSG_SENDPAGE_NOTLAST logic

[ Upstream commit ae62ca7b03217be5e74759dc6d7698c95df498b3 ]

commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.

The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.

But some programs pass an arbitrary high value, and the test fails.

The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.

We should both test sd->total_len and fact that another fragment
is in the pipe (pipe->nrbufs > 1)

Many thanks to Willy for providing very clear bug report, bisection
and test programs.

Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agotcp: fix for zero packets_in_flight was too broad
Ilpo Järvinen [Mon, 4 Feb 2013 02:14:25 +0000 (02:14 +0000)]
tcp: fix for zero packets_in_flight was too broad

[ Upstream commit 6731d2095bd4aef18027c72ef845ab1087c3ba63 ]

There are transients during normal FRTO procedure during which
the packets_in_flight can go to zero between write_queue state
updates and firing the resulting segments out. As FRTO processing
occurs during that window the check must be more precise to
not match "spuriously" :-). More specificly, e.g., when
packets_in_flight is zero but FLAG_DATA_ACKED is true the problematic
branch that set cwnd into zero would not be taken and new segments
might be sent out later.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agotcp: frto should not set snd_cwnd to 0
Eric Dumazet [Sun, 3 Feb 2013 09:13:05 +0000 (09:13 +0000)]
tcp: frto should not set snd_cwnd to 0

[ Upstream commit 2e5f421211ff76c17130b4597bc06df4eeead24f ]

Commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start())
uncovered a bug in FRTO code :
tcp_process_frto() is setting snd_cwnd to 0 if the number
of in flight packets is 0.

As Neal pointed out, if no packet is in flight we lost our
chance to disambiguate whether a loss timeout was spurious.

We should assume it was a proper loss.

Reported-by: Pasi Kärkkäinen <pasik@iki.fi>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agonetback: correct netbk_tx_err to handle wrap around.
Ian Campbell [Wed, 6 Feb 2013 23:41:38 +0000 (23:41 +0000)]
netback: correct netbk_tx_err to handle wrap around.

[ Upstream commit b9149729ebdcfce63f853aa54a404c6a8f6ebbf3 ]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoxen/netback: free already allocated memory on failure in xen_netbk_get_requests
Ian Campbell [Wed, 6 Feb 2013 23:41:37 +0000 (23:41 +0000)]
xen/netback: free already allocated memory on failure in xen_netbk_get_requests

[ Upstream commit 4cc7c1cb7b11b6f3515bd9075527576a1eecc4aa ]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoxen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
Matthew Daley [Wed, 6 Feb 2013 23:41:36 +0000 (23:41 +0000)]
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.

[ Upstream commit 7d5145d8eb2b9791533ffe4dc003b129b9696c48 ]

Signed-off-by: Matthew Daley <mattjd@gmail.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoxen/netback: shutdown the ring if it contains garbage.
Ian Campbell [Wed, 6 Feb 2013 23:41:35 +0000 (23:41 +0000)]
xen/netback: shutdown the ring if it contains garbage.

[ Upstream commit 48856286b64e4b66ec62b94e504d0b29c1ade664 ]

A buggy or malicious frontend should not be able to confuse netback.
If we spot anything which is not as it should be then shutdown the
device and don't try to continue with the ring in a potentially
hostile state. Well behaved and non-hostile frontends will not be
penalised.

As well as making the existing checks for such errors fatal also add a
new check that ensures that there isn't an insane number of requests
on the ring (i.e. more than would fit in the ring). If the ring
contains garbage then previously is was possible to loop over this
insane number, getting an error each time and therefore not generating
any more pending requests and therefore not exiting the loop in
xen_netbk_tx_build_gops for an externded period.

Also turn various netdev_dbg calls which no precipitate a fatal error
into netdev_err, they are rate limited because the device is shutdown
afterwards.

This fixes at least one known DoS/softlockup of the backend domain.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agonet: sctp: sctp_endpoint_free: zero out secret key data
Daniel Borkmann [Fri, 8 Feb 2013 03:04:35 +0000 (03:04 +0000)]
net: sctp: sctp_endpoint_free: zero out secret key data

[ Upstream commit b5c37fe6e24eec194bb29d22fdd55d73bcc709bf ]

On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agonet: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
Daniel Borkmann [Fri, 8 Feb 2013 03:04:34 +0000 (03:04 +0000)]
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree

[ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ]

In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agosctp: refactor sctp_outq_teardown to insure proper re-initalization
Neil Horman [Thu, 17 Jan 2013 11:15:08 +0000 (11:15 +0000)]
sctp: refactor sctp_outq_teardown to insure proper re-initalization

[ Upstream commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86 ]

Jamie Parsons reported a problem recently, in which the re-initalization of an
association (The duplicate init case), resulted in a loss of receive window
space.  He tracked down the root cause to sctp_outq_teardown, which discarded
all the data on an outq during a re-initalization of the corresponding
association, but never reset the outq->outstanding_data field to zero.  I wrote,
and he tested this fix, which does a proper full re-initalization of the outq,
fixing this problem, and hopefully future proofing us from simmilar issues down
the road.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
Tested-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoatm/iphase: rename fregt_t -> ffreg_t
Heiko Carstens [Fri, 8 Feb 2013 00:19:11 +0000 (00:19 +0000)]
atm/iphase: rename fregt_t -> ffreg_t

[ Upstream commit ab54ee80aa7585f9666ff4dd665441d7ce41f1e8 ]

We have conflicting type qualifiers for "freg_t" in s390's ptrace.h and the
iphase atm device driver, which causes the compile error below.
Unfortunately the s390 typedef can't be renamed, since it's a user visible api,
nor can I change the include order in s390 code to avoid the conflict.

So simply rename the iphase typedef to a new name. Fixes this compile error:

In file included from drivers/atm/iphase.c:66:0:
drivers/atm/iphase.h:639:25: error: conflicting type qualifiers for 'freg_t'
In file included from next/arch/s390/include/asm/ptrace.h:9:0,
                 from next/arch/s390/include/asm/lowcore.h:12,
                 from next/arch/s390/include/asm/thread_info.h:30,
                 from include/linux/thread_info.h:54,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:29,
                 from include/linux/time.h:5,
                 from include/linux/stat.h:18,
                 from include/linux/module.h:10,
                 from drivers/atm/iphase.c:43:
next/arch/s390/include/uapi/asm/ptrace.h:197:3: note: previous declaration of 'freg_t' was here

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agopacket: fix leakage of tx_ring memory
Phil Sutter [Fri, 1 Feb 2013 07:21:41 +0000 (07:21 +0000)]
packet: fix leakage of tx_ring memory

[ Upstream commit 9665d5d62487e8e7b1f546c00e11107155384b9a ]

When releasing a packet socket, the routine packet_set_ring() is reused
to free rings instead of allocating them. But when calling it for the
first time, it fills req->tp_block_nr with the value of rb->pg_vec_len
which in the second invocation makes it bail out since req->tp_block_nr
is greater zero but req->tp_block_size is zero.

This patch solves the problem by passing a zeroed auto-variable to
packet_set_ring() upon each invocation from packet_release().

As far as I can tell, this issue exists even since 69e3c75 (net: TX_RING
and packet mmap), i.e. the original inclusion of TX ring support into
af_packet, but applies only to sockets with both RX and TX ring
allocated, which is probably why this was unnoticed all the time.

Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
Cc: Johann Baudy <johann.baudy@gnu-log.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoipv6: do not create neighbor entries for local delivery
Marcelo Ricardo Leitner [Tue, 29 Jan 2013 22:26:08 +0000 (22:26 +0000)]
ipv6: do not create neighbor entries for local delivery

[ Upstream commit bd30e947207e2ea0ff2c08f5b4a03025ddce48d3 ]

They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.

Note that IPv4 already handles it this way. No neighbor entries are
created for local input.

Tested by myself and customer.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agopktgen: correctly handle failures when adding a device
Cong Wang [Sun, 27 Jan 2013 21:14:08 +0000 (21:14 +0000)]
pktgen: correctly handle failures when adding a device

[ Upstream commit 604dfd6efc9b79bce432f2394791708d8e8f6efc ]

The return value of pktgen_add_device() is not checked, so
even if we fail to add some device, for example, non-exist one,
we still see "OK:...". This patch fixes it.

After this patch, I got:

# echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
-bash: echo: write error: No such device
# cat /proc/net/pktgen/kpktgend_0
Running:
Stopped:
Result: ERROR: can not add device non-exist
# echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
# cat /proc/net/pktgen/kpktgend_0
Running:
Stopped: eth0
Result: OK: add_device=eth0

(Candidate for -stable)

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agonet: loopback: fix a dst refcounting issue
Eric Dumazet [Fri, 25 Jan 2013 07:44:41 +0000 (07:44 +0000)]
net: loopback: fix a dst refcounting issue

[ Upstream commit 794ed393b707f01858f5ebe2ae5eabaf89d00022 ]

Ben Greear reported crashes in ip_rcv_finish() on a stress
test involving many macvlans.

We tracked the bug to a dst use after free. ip_rcv_finish()
was calling dst->input() and got garbage for dst->input value.

It appears the bug is in loopback driver, lacking
a skb_dst_force() before calling netif_rx().

As a result, a non refcounted dst, normally protected by a
RCU read_lock section, was escaping this section and could
be freed before the packet being processed.

  [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
  [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
  [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
  [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
  [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
  [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
  [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
  [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
  [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
  [<ffffffff814ae412>] ip_output+0x61/0x67
  [<ffffffff814ab904>] dst_output+0x17/0x1b
  [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
  [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
  [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
  [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
  [<ffffffff814c3571>] tcp_connect+0x53c/0x587
  [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
  [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
  [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
  [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
  [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
  [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
  [<ffffffff814632c6>] sys_connect+0x75/0x98

This bug was introduced in linux-2.6.35, in commit
7fee226ad2397b (net: add a noref bit on skb dst)

skb_dst_force() is enforced in dev_queue_xmit() for devices having a
qdisc.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agor8169: remove the obsolete and incorrect AMD workaround
Timo Teräs [Mon, 21 Jan 2013 22:30:35 +0000 (22:30 +0000)]
r8169: remove the obsolete and incorrect AMD workaround

[ Upstream commit 5d0feaff230c0abfe4a112e6f09f096ed99e0b2d ]

This was introduced in commit 6dccd16 "r8169: merge with version
6.001.00 of Realtek's r8169 driver". I did not find the version
6.001.00 online, but in 6.002.00 or any later r8169 from Realtek
this hunk is no longer present.

Also commit 05af214 "r8169: fix Ethernet Hangup for RTL8110SC
rev d" claims to have fixed this issue otherwise.

The magic compare mask of 0xfffe000 is dubious as it masks
parts of the Reserved part, and parts of the VLAN tag. But this
does not make much sense as the VLAN tag parts are perfectly
valid there. In matter of fact this seems to be triggered with
any VLAN tagged packet as RxVlanTag bit is matched. I would
suspect 0xfffe0000 was intended to test reserved part only.

Finally, this hunk is evil as it can cause more packets to be
handled than what was NAPI quota causing net/core/dev.c:
net_rx_action(): WARN_ON_ONCE(work > weight) to trigger, and
mess up the NAPI state causing device to hang.

As result, any system using VLANs and having high receive
traffic (so that NAPI poll budget limits rtl_rx) would result
in device hang.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agonetxen: fix off by one bug in netxen_release_tx_buffer()
Eric Dumazet [Tue, 22 Jan 2013 06:33:05 +0000 (06:33 +0000)]
netxen: fix off by one bug in netxen_release_tx_buffer()

[ Upstream commit a05948f296ce103989b28a2606e47d2e287c3c89 ]

Christoph Paasch found netxen could trigger a BUG in its dismantle
phase, in netxen_release_tx_buffer(), using full size TSO packets.

cmd_buf->frag_count includes the skb->data part, so the loop must
start at index 1 instead of 0, or else we can make an out
of bound access to cmd_buff->frag_array[MAX_SKB_FRAGS + 2]

Christoph provided the fixes in netxen_map_tx_skb() function.
In case of a dma mapping error, its better to clear the dma fields
so that we don't try to unmap them again in netxen_release_tx_buffer()

Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Sony Chacko <sony.chacko@qlogic.com>
Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoisdn/gigaset: fix zero size border case in debug dump
Tilman Schmidt [Mon, 21 Jan 2013 11:57:21 +0000 (11:57 +0000)]
isdn/gigaset: fix zero size border case in debug dump

[ Upstream commit d721a1752ba544df8d7d36959038b26bc92bdf80 ]

If subtracting 12 from l leaves zero we'd do a zero size allocation,
leading to an oops later when we try to set the NUL terminator.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoipv6: fix header length calculation in ip6_append_data()
Romain KUNTZ [Wed, 16 Jan 2013 12:47:40 +0000 (12:47 +0000)]
ipv6: fix header length calculation in ip6_append_data()

[ Upstream commit 7efdba5bd9a2f3e2059beeb45c9fa55eefe1bced ]

Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem)
has introduced a error in the header length calculation that
provokes corrupted packets when non-fragmentable extensions
headers (Destination Option or Routing Header Type 2) are used.

rt->rt6i_nfheader_len is the length of the non-fragmentable
extension header, and it should be substracted to
rt->dst.header_len, and not to exthdrlen, as it was done before
commit 299b0767.

This patch reverts to the original and correct behavior. It has
been successfully tested with and without IPsec on packets
that include non-fragmentable extensions headers.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoMAINTAINERS: Stephen Hemminger email change
Stephen Hemminger [Wed, 16 Jan 2013 17:55:57 +0000 (09:55 -0800)]
MAINTAINERS: Stephen Hemminger email change

[ Upstream commit adbbf69d1a54abf424e91875746a610dcc80017d ]

I changed my email because the vyatta.com mail server is now
redirected to brocade.com; and the Brocade mail system
is not friendly to Linux desktop users.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoipv6: fix the noflags test in addrconf_get_prefix_route
Romain Kuntz [Wed, 9 Jan 2013 14:02:26 +0000 (15:02 +0100)]
ipv6: fix the noflags test in addrconf_get_prefix_route

[ Upstream commit 85da53bf1c336bb07ac038fb951403ab0478d2c5 ]

The tests on the flags in addrconf_get_prefix_route() does no make
much sense: the 'noflags' parameter contains the set of flags that
must not match with the route flags, so the test must be done
against 'noflags', and not against 'flags'.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agonet: prevent setting ttl=0 via IP_TTL
Cong Wang [Mon, 7 Jan 2013 21:17:00 +0000 (21:17 +0000)]
net: prevent setting ttl=0 via IP_TTL

[ Upstream commit c9be4a5c49cf51cc70a993f004c5bb30067a65ce ]

A regression is introduced by the following commit:

commit 4d52cfbef6266092d535237ba5a4b981458ab171
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Tue Jun 2 00:42:16 2009 -0700

    net: ipv4/ip_sockglue.c cleanups

    Pure cleanups

but it is not a pure cleanup...

-               if (val != -1 && (val < 1 || val>255))
+               if (val != -1 && (val < 0 || val > 255))

Since there is no reason provided to allow ttl=0, change it back.

Reported-by: nitin padalia <padalia.nitin@gmail.com>
Cc: nitin padalia <padalia.nitin@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agokernel/resource.c: fix stack overflow in __reserve_region_with_split()
T Makphaibulchoke [Fri, 5 Oct 2012 00:16:55 +0000 (17:16 -0700)]
kernel/resource.c: fix stack overflow in __reserve_region_with_split()

commit 4965f5667f36a95b41cda6638875bc992bd7d18b upstream.

Using a recursive call add a non-conflicting region in
__reserve_region_with_split() could result in a stack overflow in the case
that the recursive calls are too deep.  Convert the recursive calls to an
iterative loop to avoid the problem.

Tested on a machine containing 135 regions.  The kernel no longer panicked
with stack overflow.

Also tested with code arbitrarily adding regions with no conflict,
embedding two consecutive conflicts and embedding two non-consecutive
conflicts.

Signed-off-by: T Makphaibulchoke <tmac@hp.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@gmail.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoHID: usbhid: quirk for Formosa IR receiver
Nicholas Santos [Sat, 29 Dec 2012 03:07:02 +0000 (22:07 -0500)]
HID: usbhid: quirk for Formosa IR receiver

commit 320cde19a4e8f122b19d2df7a5c00636e11ca3fb upstream.

Patch to add the Formosa Industrial Computing, Inc. Infrared Receiver
[IR605A/Q] to hid-ids.h and hid-quirks.c.  This IR receiver causes about a 10
second timeout when the usbhid driver attempts to initialze the device.  Adding
this device to the quirks list with HID_QUIRK_NO_INIT_REPORTS removes the
delay.

Signed-off-by: Nicholas Santos <nicholas.santos@gmail.com>
[jkosina@suse.cz: fix ordering]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoBluetooth: Fix sending HCI commands after reset
Szymon Janc [Tue, 11 Dec 2012 07:51:19 +0000 (08:51 +0100)]
Bluetooth: Fix sending HCI commands after reset

commit dbccd791a3fbbdac12c33834b73beff3984988e9 upstream.

After sending reset command wait for its command complete event before
sending next command. Some chips sends CC event for command received
before reset if reset was send before chip replied with CC.

This is also required by specification that host shall not send
additional HCI commands before receiving CC for reset.

< HCI Command: Reset (0x03|0x0003) plen 0                              [hci0] 18.404612
> HCI Event: Command Complete (0x0e) plen 4                            [hci0] 18.405850
      Write Extended Inquiry Response (0x03|0x0052) ncmd 1
        Status: Success (0x00)
< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0      [hci0] 18.406079
> HCI Event: Command Complete (0x0e) plen 4                            [hci0] 18.407864
      Reset (0x03|0x0003) ncmd 1
        Status: Success (0x00)
< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0      [hci0] 18.408062
> HCI Event: Command Complete (0x0e) plen 12                           [hci0] 18.408835

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agowake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task
Oleg Nesterov [Mon, 21 Jan 2013 19:48:17 +0000 (20:48 +0100)]
wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task

commit 9067ac85d533651b98c2ff903182a20cbb361fcb upstream.

wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().

TASK_ALL has no other users, probably can be killed.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
Oleg Nesterov [Mon, 21 Jan 2013 19:48:00 +0000 (20:48 +0100)]
ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL

commit 9899d11f654474d2d54ea52ceaa2a1f4db3abd68 upstream.

putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it.  Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()
Oleg Nesterov [Mon, 21 Jan 2013 19:47:41 +0000 (20:47 +0100)]
ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()

commit 910ffdb18a6408e14febbb6e4b6840fd2c928c82 upstream.

Cleanup and preparation for the next change.

signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.

Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.

This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoptrace/x86: Partly fix set_task_blockstep()->update_debugctlmsr() logic
Oleg Nesterov [Sat, 11 Aug 2012 16:06:42 +0000 (18:06 +0200)]
ptrace/x86: Partly fix set_task_blockstep()->update_debugctlmsr() logic

commit 95cf00fa5d5e2a200a2c044c84bde8389a237e02 upstream.

Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in
step.c was always very wrong.

1. update_debugctlmsr() was simply unneeded. The child sleeps
   TASK_TRACED, __switch_to_xtra(next_p => child) should notice
   TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if
   needed.

2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register
   should always match the state of current's TIF_BLOCKSTEP bit.

3. Even get_debugctlmsr() + update_debugctlmsr() itself does not
   look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR
   register or the caller can be preempted in between.

4. It is not safe to play with TIF_BLOCKSTEP if task != current.
   DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each
   other if the task is running. The tracee is stopped but it
   can be SIGKILL'ed right before set/clear_tsk_thread_flag().

However, now that uprobes uses user_enable_single_step(current)
we can't simply remove update_debugctlmsr(). So this patch adds
the additional "task == current" check and disables irqs to avoid
the race with interrupts/preemption.

Unfortunately this patch doesn't solve the last problem, we need
another fix. Probably we should teach ptrace_stop() to set/clear
single/block stepping after resume.

And afaics there is yet another problem: perf can play with
MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even
__switch_to_xtra() has problems.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoptrace/x86: Introduce set_task_blockstep() helper
Oleg Nesterov [Fri, 3 Aug 2012 15:31:46 +0000 (17:31 +0200)]
ptrace/x86: Introduce set_task_blockstep() helper

commit 848e8f5f0ad3169560c516fff6471be65f76e69f upstream.

No functional changes, preparation for the next fix and for uprobes
single-step fixes.

Move the code playing with TIF_BLOCKSTEP/DEBUGCTLMSR_BTF into the
new helper, set_task_blockstep().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoahci: Add support for Enmotus Bobcat device.
Hugh Daschbach [Fri, 4 Jan 2013 22:39:09 +0000 (14:39 -0800)]
ahci: Add support for Enmotus Bobcat device.

commit 7f9c9f8e24590e7dcd26ca408458c43df5b83e61 upstream.

Silicon does not support standard AHCI BAR assignment.  Add
vendor/device exception to force BAR 2.

Signed-off-by: Hugh Daschbach <hugh.daschbach@enmotus.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoahci: support the STA2X11 I/O Hub
Alessandro Rubini [Fri, 6 Jan 2012 12:33:39 +0000 (13:33 +0100)]
ahci: support the STA2X11 I/O Hub

commit 318893e1429a9d50569a0379d1e20b0ecc45c555 upstream.

The AHCI controller found in the STA2X11 chip uses BAR number 0
instead of 5. Also, the chip's fixup code sets a special DMA mask
for all of its PCI functions, and the mask must be preserved here.

Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agogspca_kinect: add Kinect for Windows USB id
Jacob Schloss [Sun, 9 Dec 2012 23:18:25 +0000 (20:18 -0300)]
gspca_kinect: add Kinect for Windows USB id

commit 98fd485795db064d0885150e2c0c7f296d8fe06e upstream.

Add the USB ID for the Kinect for Windows RGB camera so it can be used
with the gspca_kinect driver.

Signed-off-by: Jacob Schloss <jacob.schloss@unlimitedautomata.com>
Signed-off-by: Antonio Ospite <ospite@studenti.unina.it>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agort2800usb: Add support for 2001:3c1e (D-Link DWA-125 rev B1) USB Wi-Fi adapter
Maia Kozheva [Sun, 9 Dec 2012 09:07:40 +0000 (16:07 +0700)]
rt2800usb: Add support for 2001:3c1e (D-Link DWA-125 rev B1) USB Wi-Fi adapter

commit fd7b9270120ca7e53fbf0469febe0c68acf6a0a2 upstream.

D-Link DWA-125/B1 is a relatively new USB Wi-Fi adapter, using a
Ralink chipset supported by the rt2800usb driver. Currently, to work
around the problem (it's missing in all present kernel versions,
up to and including 3.7.x), I had to add this to /etc/rc.local:

echo 2001 3c1e >> /sys/bus/usb/drivers/rt2800usb/new_id

After that, the device works without problems. Been using it for over
a week with no bugs in sight.

The attached patch is trivial and simply adds the new USB ID to the
list of devices handled by rt2800usb.

Signed-off-by: Maia Kozheva <sikon@ubuntu.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoWireless: rt2x00: Add device id for Sweex LW323 to rt2800usb.c
Jaume Delclòs [Fri, 2 Nov 2012 22:35:20 +0000 (23:35 +0100)]
Wireless: rt2x00: Add device id for Sweex LW323 to rt2800usb.c

commit 36f318bb124b231c01db6965a009f46d5731f012 upstream.

This patch adds detection for the Sweex LW323 USB wireless network card
in the rt2x00 driver (just one line in rt2800usb.c).
It applies to linux-3.7-rc3.

Signed-off-by: Jaume Delclòs <jaume@delclos.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agowireless: rt2x00: rt{2500,73}usb.c put back duplicate id
Xose Vazquez Perez [Sat, 14 Apr 2012 21:00:01 +0000 (23:00 +0200)]
wireless: rt2x00: rt{2500,73}usb.c put back duplicate id

commit 8f35f787b75e9b6435ea37dabcae2d40dc72d31c upstream.

put back 0x050d,0x7050 to rt73usb, same usb_id for two chips:

K7SF5D7050A ver 2xxx is rt2500
K7SF5D7050B ver 3xxx is rt73

<http://en-us-support.belkin.com/app/answers/detail/a_id/297/kw/K7SF5D7050>

Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agovirtio_console: Don't access uninitialized data.
Sjur Brændeland [Mon, 21 Jan 2013 23:20:26 +0000 (09:50 +1030)]
virtio_console: Don't access uninitialized data.

commit aded024a12b32fc1ed9a80639681daae2d07ec25 upstream.

Don't access uninitialized work-queue when removing device.
The work queue is initialized only if the device multi-queue.
So don't call cancel_work unless this is a multi-queue device.

This fixes the following panic:

Kernel panic - not syncing: BUG!
Call Trace:
62031b28:  [<6026085d>] panic+0x16b/0x2d3
62031b30:  [<6004ef5e>] flush_work+0x0/0x1d7
62031b60:  [<602606f2>] panic+0x0/0x2d3
62031b68:  [<600333b0>] memcpy+0x0/0x140
62031b80:  [<6002d58a>] unblock_signals+0x0/0x84
62031ba0:  [<602609c5>] printk+0x0/0xa0
62031bd8:  [<60264e51>] __mutex_unlock_slowpath+0x13d/0x148
62031c10:  [<6004ef5e>] flush_work+0x0/0x1d7
62031c18:  [<60050234>] try_to_grab_pending+0x0/0x17e
62031c38:  [<6004e984>] get_work_gcwq+0x71/0x8f
62031c48:  [<60050539>] __cancel_work_timer+0x5b/0x115
62031c78:  [<628acc85>] unplug_port+0x0/0x191 [virtio_console]
62031c98:  [<6005061c>] cancel_work_sync+0x12/0x14
62031ca8:  [<628ace96>] virtcons_remove+0x80/0x15c [virtio_console]
62031ce8:  [<628191de>] virtio_dev_remove+0x1e/0x7e [virtio]
62031d08:  [<601cf242>] __device_release_driver+0x75/0xe4
62031d28:  [<601cf2dd>] device_release_driver+0x2c/0x40
62031d48:  [<601ce0dd>] driver_unbind+0x7d/0xc6
62031d88:  [<601cd5d9>] drv_attr_store+0x27/0x29
62031d98:  [<60115f61>] sysfs_write_file+0x100/0x14d
62031df8:  [<600b737d>] vfs_write+0xcb/0x184
62031e08:  [<600b58b8>] filp_close+0x88/0x94
62031e38:  [<600b7686>] sys_write+0x59/0x88
62031e88:  [<6001ced1>] handle_syscall+0x5d/0x80
62031ea8:  [<60030a74>] userspace+0x405/0x531
62031f08:  [<600d32cc>] sys_dup+0x0/0x5e
62031f28:  [<601b11d6>] strcpy+0x0/0x18
62031f38:  [<600be46c>] do_execve+0x10/0x12
62031f48:  [<600184c7>] run_init_process+0x43/0x45
62031fd8:  [<60019a91>] new_thread_handler+0xba/0xbc

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agodrivers/rtc/rtc-pl031.c: fix the missing operation on enable
Haojian Zhuang [Mon, 4 Feb 2013 22:28:54 +0000 (14:28 -0800)]
drivers/rtc/rtc-pl031.c: fix the missing operation on enable

commit e7e034e18a0ab6bafb2425c3242cac311164f4d6 upstream.

The RTC control register should be enabled in the process of
initializing.

Without this patch, I failed to enable RTC in Hisilicon Hi3620 SoC.  The
register mapping section in RTC is always read as zero.  So I doubt that
ST guys may already enable this register in bootloader.  So they won't
meet this issue.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agodrivers/rtc/rtc-isl1208.c: call rtc_update_irq() from the alarm irq handler
Jan Luebbe [Mon, 4 Feb 2013 22:28:53 +0000 (14:28 -0800)]
drivers/rtc/rtc-isl1208.c: call rtc_update_irq() from the alarm irq handler

commit 72fca4a4b32dc778b5b885c3498700e42b610d49 upstream.

Previously the alarm event was not propagated into the RTC subsystem.
By adding a call to rtc_update_irq, this fixes a timeout problem with
the hwclock utility.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agonilfs2: fix fix very long mount time issue
Vyacheslav Dubeyko [Mon, 4 Feb 2013 22:28:41 +0000 (14:28 -0800)]
nilfs2: fix fix very long mount time issue

commit a9bae189542e71f91e61a4428adf6e5a7dfe8063 upstream.

There exists a situation when GC can work in background alone without
any other filesystem activity during significant time.

The nilfs_clean_segments() method calls nilfs_segctor_construct() that
updates superblocks in the case of NILFS_SC_SUPER_ROOT and
THE_NILFS_DISCONTINUED flags are set.  But when GC is working alone the
nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag.
As a result, the update of superblocks doesn't occurred all this time
and in the case of SPOR superblocks keep very old values of last super
root placement.

SYMPTOMS:

Trying to mount a NILFS2 volume after SPOR in such environment ends with
very long mounting time (it can achieve about several hours in some
cases).

REPRODUCING PATH:

1. It needs to use external USB HDD, disable automount and doesn't
   make any additional filesystem activity on the NILFS2 volume.

2. Generate temporary file with size about 100 - 500 GB (for example,
   dd if=/dev/zero of=<file_name> bs=1073741824 count=200).  The size of
   file defines duration of GC working.

3. Then it needs to delete file.

4. Start GC manually by means of command "nilfs-clean -p 0".  When you
   start GC by means of such way then, at the end, superblocks is updated
   by once.  So, for simulation of SPOR, it needs to wait sometime (15 -
   40 minutes) and simply switch off USB HDD manually.

5. Switch on USB HDD again and try to mount NILFS2 volume.  As a
   result, NILFS2 volume will mount during very long time.

REPRODUCIBILITY: 100%

FIX:

This patch adds checking that superblocks need to update and set
THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call.

Reported-by: Sergey Alexandrov <splavgm@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoUSB: storage: optimize to match the Huawei USB storage devices and support new switch...
fangxiaozhi [Mon, 4 Feb 2013 07:16:34 +0000 (15:16 +0800)]
USB: storage: optimize to match the Huawei USB storage devices and support new switch command

commit 200e0d994d9d1919b28c87f1a5fb99a8e13b8a0f upstream.

1. Optimize the match rules with new macro for Huawei USB storage devices,
   to avoid to load USB storage driver for the modem interface
   with Huawei devices.
2. Add to support new switch command for new Huawei USB dongles.

Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoUSB: storage: Define a new macro for USB storage match rules
fangxiaozhi [Mon, 4 Feb 2013 07:14:46 +0000 (15:14 +0800)]
USB: storage: Define a new macro for USB storage match rules

commit 07c7be3d87e5cdaf5f94c271c516456364ef286c upstream.

1. Define a new macro for USB storage match rules:
    matching with Vendor ID and interface descriptors.

Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoUSB: ftdi_sio: add Zolix FTDI PID
Petr Kubánek [Fri, 1 Feb 2013 16:24:04 +0000 (17:24 +0100)]
USB: ftdi_sio: add Zolix FTDI PID

commit 0ba3b2ccc72b3df5c305d61f59d93ab0f0e87991 upstream.

Add support for Zolix Omni 1509 monochromator custom USB-RS232 converter.

Signed-off-by: Petr Kubánek <petr@kubanek.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoUSB: option: add Changhong CH690
Bjørn Mork [Fri, 1 Feb 2013 11:06:51 +0000 (12:06 +0100)]
USB: option: add Changhong CH690

commit d4fa681541aa7bf8570d03426dd7ba663a71c467 upstream.

New device with 3 serial interfaces:

 If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend) Sub=ff Prot=ff
 If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend) Sub=ff Prot=ff
 If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend) Sub=ff Prot=ff
 If#= 3 Alt= 0 #EPs= 2 Cls=08(stor) Sub=06 Prot=50

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
11 years agoUSB: ftdi_sio: add PID/VID entries for ELV WS 300 PC II
Sven Killig [Fri, 1 Feb 2013 22:43:06 +0000 (23:43 +0100)]
USB: ftdi_sio: add PID/VID entries for ELV WS 300 PC II

commit c249f911406efcc7456cb4af79396726bf7b8c57 upstream.

Add PID/VID entries for ELV WS 300 PC II weather station

Signed-off-by: Sven Killig <sven@killig.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>