pandora-kernel.git
7 years ago pandora: reserve CMA area for c64_tools [pandora-3.2-c64_tools]
Grazvydas Ignotas [Thu, 19 Sep 2013 21:28:17 +0000 (00:28 +0300)]
pandora: reserve CMA area for c64_tools

hopefully this doesn't have to be removed half a year later..

7 years ago pandora: defconfig: increase CMA allocation
Grazvydas Ignotas [Thu, 19 Sep 2013 21:04:48 +0000 (00:04 +0300)]
pandora: defconfig: increase CMA allocation

8 years ago pandora: defconfig: enable hfsplus [sz_155]
Grazvydas Ignotas [Thu, 4 Jul 2013 22:51:05 +0000 (01:51 +0300)]
pandora: defconfig: enable hfsplus

A few people appeared who want it.
Also enable BSD_PROCESS_ACCT_V3 as that's what modern distros use.

8 years ago twl4030_charger: reduce default charge_current for usb
Grazvydas Ignotas [Thu, 4 Jul 2013 22:41:15 +0000 (01:41 +0300)]
twl4030_charger: reduce default charge_current for usb

I've recently encountered a PC whose USB ports limit current
draw to ~600mA - if more is drawn it cuts VBUS out. Incidentally pandora
has charge_current set to ~600mA, so on that PC charging only lasted for
a few seconds and then cut out. Reducing the current to well below 600mA
makes charging reliable from that PC's USB ports, and ~7% slower charging
should not make much of a difference.

8 years ago ARM: 7208/1: Add condition code checking to SWP emulation handler.
Leif Lindholm [Mon, 12 Dec 2011 18:45:24 +0000 (19:45 +0100)]
ARM: 7208/1: Add condition code checking to SWP emulation handler.

This patch fixes two separate issues with the SWP emulation handler:
1: Certain processors implementing ARMv7-A can (legally) take an
   undef exception even when the condition code would have meant that
   the instruction should not have been executed.
2: Opcodes with all flags set (condition code = 0xf) have been reused
   in recent, and not-so-recent, versions of the ARM architecture to
   implement unconditional extensions to the instruction set. The
   existing code would still have processed any undefs triggered by
   executing an opcode with such a value.

This patch uses the new generic ARM instruction set condition code
checks to implement proper handling of these situations.
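
As a rough illustration of the check described above, here is a minimal
userspace sketch. It is not the kernel code: the enum, the macro names and
check_condition() are hypothetical, and only the architectural meaning of the
condition field (opcode bits 31:28) and the N/Z/C/V flags is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for the three outcomes described above. */
enum cond_result { COND_FAIL, COND_PASS, COND_UNCOND };

/* CPSR flag bits at their architectural positions. */
#define PSR_N (1u << 31)
#define PSR_Z (1u << 30)
#define PSR_C (1u << 29)
#define PSR_V (1u << 28)

/* Decide whether an ARM opcode would have executed under the given flags. */
static enum cond_result check_condition(uint32_t opcode, uint32_t psr)
{
	uint32_t cond = opcode >> 28;	/* 4-bit condition field */
	bool n = psr & PSR_N, z = psr & PSR_Z;
	bool c = psr & PSR_C, v = psr & PSR_V;
	bool pass;

	if (cond == 0xf)		/* unconditional instruction space */
		return COND_UNCOND;
	assert(cond <= 0xe);

	switch (cond >> 1) {
	case 0: pass = z; break;		/* EQ (NE inverted) */
	case 1: pass = c; break;		/* CS (CC inverted) */
	case 2: pass = n; break;		/* MI (PL inverted) */
	case 3: pass = v; break;		/* VS (VC inverted) */
	case 4: pass = c && !z; break;		/* HI (LS inverted) */
	case 5: pass = (n == v); break;		/* GE (LT inverted) */
	case 6: pass = !z && (n == v); break;	/* GT (LE inverted) */
	default: pass = true; break;		/* AL */
	}
	if (cond & 1)			/* odd codes are the inverted tests */
		pass = !pass;
	return pass ? COND_PASS : COND_FAIL;
}
```

An emulation handler built on such a helper would presumably emulate only on
COND_PASS, step over the instruction on COND_FAIL, and refuse to treat a
COND_UNCOND opcode as SWP at all.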

Signed-off-by: Leif Lindholm <leif.lindholm@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago ARM: 7206/1: Add generic ARM instruction set condition code checks.
Leif Lindholm [Mon, 12 Dec 2011 18:31:55 +0000 (19:31 +0100)]
ARM: 7206/1: Add generic ARM instruction set condition code checks.

This patch breaks the ARM condition checking code out of nwfpe/fpopcode.{ch}
into a standalone file for opcode operations. It also modifies the code
somewhat for coding style adherence, and adds some temporary variables for
increased readability.

Signed-off-by: Leif Lindholm <leif.lindholm@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago ARM: 7289/1: vmlinux.lds.S: do not hardcode cacheline size as 32 bytes
Will Deacon [Fri, 20 Jan 2012 10:55:54 +0000 (11:55 +0100)]
ARM: 7289/1: vmlinux.lds.S: do not hardcode cacheline size as 32 bytes

The linker script assumes a cacheline size of 32 bytes when aligning
the .data..cacheline_aligned and .data..percpu sections.

This patch updates the script to use L1_CACHE_BYTES, which should be set
to 64 on platforms that require it.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago ARM: perf: don't pretend to support counting of L1I writes
Will Deacon [Wed, 16 Jan 2013 12:01:59 +0000 (12:01 +0000)]
ARM: perf: don't pretend to support counting of L1I writes

ARM has a Harvard cache architecture and cannot write directly to the
I-side.

This patch removes the L1I write events from the cache map (which
previously returned *read* events in many cases).

Reported-by: Mike Williams <michael.williams@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Conflicts:

arch/arm/kernel/perf_event_v7.c

8 years ago ARM: 7303/1: perf: add empty NODE event definitions for Cortex-A5 and Cortex-A15
Will Deacon [Wed, 25 Jan 2012 18:36:28 +0000 (19:36 +0100)]
ARM: 7303/1: perf: add empty NODE event definitions for Cortex-A5 and Cortex-A15

Commit 89d6c0b5 ("perf, arch: Add generic NODE cache events") added
empty NODE event definitions for the ARM PMU implementations. This was
merged along with Cortex-A5 and Cortex-A15 PMU support, so they missed
out on the original patch.

This patch adds the empty definitions to Cortex-A5 and Cortex-A15.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago ARM: perf: add support for stalled cycle ABI events
Will Deacon [Thu, 29 Sep 2011 17:23:39 +0000 (18:23 +0100)]
ARM: perf: add support for stalled cycle ABI events

Commit 8f622422 ("perf events: Add generic front-end and back-end
stalled cycle event definitions") added two new ABI events for counting
stalled cycles.

This patch adds support for these new events to the ARM perf
implementation.

Cc: Jamie Iles <jamie@jamieiles.com>
Cc: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
8 years ago ARM: perf: clean and update ARMv7 event numbers
Will Deacon [Thu, 29 Sep 2011 14:29:02 +0000 (15:29 +0100)]
ARM: perf: clean and update ARMv7 event numbers

This patch updates the ARMv7 perf event numbers so that:

(1) A consistent naming scheme is used between different CPUs.

(2) Only events actually used by Linux are described.

(3) Where possible, architected events are used in preference to
    CPU-specific events.

This results in the removal of a load of unused, hardcoded data and
makes it more clear as to which events are supported on each PMU.

Cc: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
8 years ago ARM: 7765/1: perf: Record the user-mode PC in the call chain.
Jed Davis [Thu, 20 Jun 2013 09:16:29 +0000 (10:16 +0100)]
ARM: 7765/1: perf: Record the user-mode PC in the call chain.

With this change, we no longer lose the innermost entry in the user-mode
part of the call chain.  See also the x86 port, which includes the ip.

It's possible to partially work around this problem by post-processing
the data to use the PERF_SAMPLE_IP value, but this works only if the CPU
wasn't in the kernel when the sample was taken.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jed Davis <jld@mozilla.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago ARM: 7735/2: Preserve the user r/w register TPIDRURW on context switch and fork
André Hentschel [Tue, 18 Jun 2013 22:23:26 +0000 (23:23 +0100)]
ARM: 7735/2: Preserve the user r/w register TPIDRURW on context switch and fork

Since commit 6a1c53124aa1 the user writeable TLS register was zeroed to
prevent it from being used as a covert channel between two tasks.

More and more applications are coming to Windows RT. Wine could
support them, but most of them expect to find
the thread environment block (TEB) in TPIDRURW.

This patch preserves that register per thread instead of clearing it.
Unlike the TPIDRURO, which is already switched, the TPIDRURW
can be updated from userspace so needs careful treatment in the case that we
modify TPIDRURW and call fork(). To avoid this we must always read
TPIDRURW in copy_thread.

Signed-off-by: André Hentschel <nerv@dawncrow.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Conflicts:

arch/arm/kernel/process.c

8 years ago ARM: Don't unconditionally bloat thread_info
Russell King [Wed, 29 Aug 2012 10:16:59 +0000 (11:16 +0100)]
ARM: Don't unconditionally bloat thread_info

There is no point reserving space at the bottom of the kernel stack for
per-thread crunch state, and per-thread VFP state if these are not being
supported by the kernel being built.  Remove these members from the
thread union when these features are disabled.

Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago ARM: 7730/1: DMA-mapping: mark all !DMA_TO_DEVICE pages in unmapping as clean
Ming Lei [Sat, 18 May 2013 10:21:36 +0000 (11:21 +0100)]
ARM: 7730/1: DMA-mapping: mark all !DMA_TO_DEVICE pages in unmapping as clean

It is common for one sg to include many pages, so mark all these
pages as clean to avoid unnecessary flushing on them in
set_pte_at() or update_mmu_cache().

The patch might improve loading performance of application code a bit.

With the test code below, which reads a file (~1 GByte) from a USB mass
storage disk into a buffer created with mmap(PROT_READ | PROT_EXEC) on
Pandaboard, an average improvement of ~1% can be observed with the patch
over 10 test runs.

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

unsigned int sum = 0;

/* elapsed time between two timestamps, in microseconds */
static unsigned long tv_diff(struct timeval *tv1, struct timeval *tv2)
{
	return (tv2->tv_sec - tv1->tv_sec) * 1000000 + (tv2->tv_usec - tv1->tv_usec);
}

int main(int argc, char *argv[])
{
	char *mbuffer;
	int fd;
	unsigned long i;
	unsigned long page_size, size;
	struct stat stat;
	struct timeval t1, t2;

	assert(argc > 1);
	page_size = getpagesize();
	fd = open(argv[1], O_RDONLY);
	assert(fd >= 0);

	fstat(fd, &stat);
	size = stat.st_size;
	printf("%s: file %s, file size %lu, page size %lu\n", argv[0],
	       argv[1], size, page_size);

	gettimeofday(&t1, NULL);
	mbuffer = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
	assert(mbuffer != MAP_FAILED);
	/* touch one byte per page so that every page is faulted in */
	for (i = 0; i < size; i += page_size)
		sum += mbuffer[i];
	munmap(mbuffer, size);
	gettimeofday(&t2, NULL);
	printf("\tread mmaped time: %luus\n", tv_diff(&t1, &t2));

	close(fd);
	return 0;
}

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago ARM: 7746/1: mm: lazy cache flushing on non-mapped pages
Ming Lei [Wed, 5 Jun 2013 01:44:00 +0000 (02:44 +0100)]
ARM: 7746/1: mm: lazy cache flushing on non-mapped pages

Currently flush_dcache_page() treats pages as non-mapped if
mapping_mapped(mapping) returns false. This approach is very
coarse:
- mmap on part of a file may cause all pages backed by
the file to be treated as mmaped

- file-backed pages aren't actually mapped into user space
if the memory mmaped on the file isn't accessed

This patch uses page_mapped() to decide if the page has been
mapped.

With the attached test code, I see a large performance
improvement (>25%) when accessing page caches via read in this
situation, so memcpy benefits a lot from not flushing the cache
in this case.

No.   read time without the patch   read time with the patch
============================================================
0     22615636 us                   22014717 us
1      4387851 us                    3113184 us
2      4276535 us                    3005244 us
3      4259821 us                    3001565 us
4      4263811 us                    3002748 us
5      4258486 us                    3004104 us
6      4253009 us                    3002188 us
7      4262809 us                    2998196 us
8      4264525 us                    3007255 us
9      4267795 us                    3005094 us

1) Run No. 0 reads the file from the storage device; the other runs
basically read the file from the page cache.
2) The file size is 512M, and the file is on ext4 over usb mass storage.
3) The test is done on Pandaboard.

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

unsigned int  sum = 0;
unsigned long sum_val = 0;

/* elapsed time between two timestamps, in microseconds */
static unsigned long tv_diff(struct timeval *tv1, struct timeval *tv2)
{
	return (tv2->tv_sec - tv1->tv_sec) * 1000000 +
		(tv2->tv_usec - tv1->tv_usec);
}

int main(int argc, char *argv[])
{
	char *mbuf;
	int fd;
	unsigned long i;
	unsigned long page_size, size;
	struct stat stat;
	struct timeval t1, t2;
	unsigned char *rbuf;

	assert(argc > 1);
	/* page_size must be known before the read buffer is sized */
	page_size = getpagesize();
	rbuf = malloc(32 * page_size);
	if (!rbuf) {
		printf(" %s\n", "malloc failed");
		exit(-1);
	}

	fd = open(argv[1], O_RDWR);
	assert(fd >= 0);

	fstat(fd, &stat);
	size = stat.st_size;
	printf("%s: file %s, size %lu, page size %lu\n",
	       argv[0], argv[1], size, page_size);

	gettimeofday(&t1, NULL);
	/* the mapping is never touched; it only makes the file "mmaped" */
	mbuf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mbuf == MAP_FAILED) {
		printf(" %s\n", "mmap failed");
		exit(-1);
	}

	/* read the whole file through read() in 32-page chunks */
	for (i = 0; i < size; i += (page_size * 32)) {
		ssize_t rcnt;

		lseek(fd, i, SEEK_SET);
		rcnt = read(fd, rbuf, page_size * 32);
		if (rcnt != (ssize_t)(page_size * 32)) {
			printf("%s: read failed\n", __func__);
			exit(-1);
		}
	}
	free(rbuf);
	munmap(mbuf, size);
	gettimeofday(&t2, NULL);
	printf("\tread mmaped time: %luus\n", tv_diff(&t1, &t2));

	close(fd);
	return 0;
}

Cc: Michel Lespinasse <walken@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
8 years ago asix: Add support (usb-id) for HG20F9 (pseudo-backport)
Urja Rannikko [Mon, 17 Jun 2013 12:56:22 +0000 (13:56 +0100)]
asix: Add support (usb-id) for HG20F9 (pseudo-backport)

8 years ago pandora: defconfig: enable ath9k
Grazvydas Ignotas [Thu, 23 May 2013 22:22:43 +0000 (01:22 +0300)]
pandora: defconfig: enable ath9k

"now de-facto (almost all shops in Paris sell that kind of model:
TP-Link TL-WN721N) big size usb dongle", according to Linux-SWAT.

8 years ago ath9k_htc: init led work before registering it
Grazvydas Ignotas [Thu, 23 May 2013 22:17:23 +0000 (01:17 +0300)]
ath9k_htc: init led work before registering it

this tree has a patch that sets initial LED state on register,
so work has to be ready.

8 years ago omap_hsmmc: pandora wifi hack
Grazvydas Ignotas [Thu, 23 May 2013 00:58:49 +0000 (03:58 +0300)]
omap_hsmmc: pandora wifi hack

don't use irqs/dma to avoid cpu PM latency on tiny reads/writes
(wl1251 register IO), gives a bit more performance.

8 years ago wl1251: do tx in irq thread
Grazvydas Ignotas [Wed, 22 May 2013 21:50:06 +0000 (00:50 +0300)]
wl1251: do tx in irq thread

seems to give a bit more performance, presumably due to reduced work
rescheduling / context switching?

8 years ago wl1251: switch to threaded irq
Grazvydas Ignotas [Wed, 22 May 2013 20:39:12 +0000 (23:39 +0300)]
wl1251: switch to threaded irq

seems to improve performance slightly?
At least top output looks nicer with dedicated thread to wl1251.

8 years ago pandora: switch wl1251 sdio clock to 24MHz
Grazvydas Ignotas [Wed, 22 May 2013 20:36:34 +0000 (23:36 +0300)]
pandora: switch wl1251 sdio clock to 24MHz

seems to work fine on pandora, although performance impact is minimal
(if any).

8 years ago wl1251: experimental PS hacks
Grazvydas Ignotas [Wed, 22 May 2013 00:40:14 +0000 (03:40 +0300)]
wl1251: experimental PS hacks

8 years ago wl1251: wait for wakeup from idle
Grazvydas Ignotas [Wed, 22 May 2013 00:41:50 +0000 (03:41 +0300)]
wl1251: wait for wakeup from idle

Seems to solve occasional join timeouts.

8 years ago wl1251: start handling powersave report
Grazvydas Ignotas [Wed, 22 May 2013 00:36:21 +0000 (03:36 +0300)]
wl1251: start handling powersave report

and move beacon filter enable there, like wl1271+ driver has it.

8 years ago wl1251: disable BET code
Grazvydas Ignotas [Wed, 22 May 2013 00:29:09 +0000 (03:29 +0300)]
wl1251: disable BET code

It's known to be causing problems (very low performance) on at least one
router (even WL1251_ACX_BET_DISABLE call is doing bad things there).
I could not see measurable power usage improvement with this, so
disable those calls.

8 years ago wl1251: do quick poll for cmd complete interrupt
Grazvydas Ignotas [Wed, 22 May 2013 00:13:54 +0000 (03:13 +0300)]
wl1251: do quick poll for cmd complete interrupt

Most of commands are complete on second read from status register, so
it's unwise to sleep after first try, as sleep may take much longer than
desired 1ms (which is actually already too much waiting for most commands).
So use udelay() for first several tries, and only sleep if command takes
longer, like wl1271+ driver does. This shortens typical powersave setup
time from 50+ ms to <1 ms.
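
The polling strategy described above could look roughly like the userspace
sketch below. It is only an illustration under stated assumptions: the
callback type, FAST_POLLS, MAX_POLLS and the delay lengths are hypothetical,
and nanosleep() stands in for both udelay() and msleep(), which exist only in
the kernel.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define FAST_POLLS 5	/* hypothetical: tries that use the short delay */
#define MAX_POLLS  20	/* hypothetical: give up after this many reads */

/* Hypothetical callback standing in for a status register read. */
typedef bool (*cmd_done_fn)(void *ctx);

static void delay_ns(long ns)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = ns };

	nanosleep(&ts, NULL);
}

/* Returns the number of status reads used, or -1 on timeout. */
static int wait_cmd_complete(cmd_done_fn done, void *ctx)
{
	assert(done != NULL);
	for (int tries = 1; tries <= MAX_POLLS; tries++) {
		if (done(ctx))
			return tries;
		if (tries <= FAST_POLLS)
			delay_ns(10 * 1000);	/* short wait, like udelay(10) */
		else
			delay_ns(1000 * 1000);	/* long wait, like msleep(1) */
	}
	return -1;
}

/* Demo status source: reports completion on the N-th read (N in *ctx). */
static bool fake_status_read(void *ctx)
{
	int *remaining = ctx;

	return --*remaining <= 0;
}
```

With a command that completes on the second read, wait_cmd_complete() returns
after a single short delay instead of a full sleep.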

8 years ago wl1251: log command errors
Grazvydas Ignotas [Wed, 22 May 2013 00:12:09 +0000 (03:12 +0300)]
wl1251: log command errors

Commands can return errors, let's log them, like wl1271 does.

8 years ago OMAP: l3: don't bug out on app errors
Grazvydas Ignotas [Fri, 17 May 2013 22:46:23 +0000 (01:46 +0300)]
OMAP: l3: don't bug out on app errors

This "Functional Inband error" for "Agent: sDMA Rd IA" (bit19 of
L3_SI_FLAG_STATUS_0) happens randomly while both recording+playing is
active, perhaps when it's being paused/resumed?

Linux-SWAT reported that system keeps functioning if the error is
ignored, and Neelix reported recording breaks until reinitialized,
but this is still much better than bringing the system down.

8 years ago Revert "USB: ehci-omap: Fix autoloading of module"
Grazvydas Ignotas [Thu, 23 May 2013 21:50:56 +0000 (00:50 +0300)]
Revert "USB: ehci-omap: Fix autoloading of module"

This reverts commit 507c50ba088916e2eb3cc17c5aead3aa76ab968b.
We don't want that on pandora (yet?).

8 years ago Merge branch 'stable-3.2' into pandora-3.2
Grazvydas Ignotas [Thu, 23 May 2013 21:42:23 +0000 (00:42 +0300)]
Merge branch 'stable-3.2' into pandora-3.2

8 years ago Linux 3.2.45 [v3.2.45]
Ben Hutchings [Mon, 13 May 2013 14:02:45 +0000 (15:02 +0100)]
Linux 3.2.45

8 years ago x86/mm: account for PGDIR_SIZE alignment
jerry.hoemann@hp.com [Tue, 7 May 2013 16:14:54 +0000 (10:14 -0600)]
x86/mm: account for PGDIR_SIZE alignment

Patch for 3.0-stable.  Function find_early_table_space removed upstream.

Fixes panic in alloc_low_page due to pgt_buf overflow during
init_memory_mapping.

find_early_table_space sizes pgt_buf based upon the size of the
memory being mapped, but it does not take into account the alignment
of the memory.  When the region being mapped spans a 512GB (PGDIR_SIZE)
alignment, a panic from alloc_low_pages occurs.

kernel_physical_mapping_init takes into account PGDIR_SIZE alignment.
This causes an extra call to alloc_low_page to be made.  This extra call
isn't accounted for by find_early_table_space and causes a kernel panic.

Change is to take into account PGDIR_SIZE alignment in find_early_table_space.
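
The arithmetic behind the problem can be sketched as follows. This is not
the find_early_table_space code; both helper names are hypothetical, and
PGDIR_SHIFT is taken as 39 to match the 512GB PGDIR_SIZE mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)	/* 512 GB */

/* Size-only estimate: ignores where the region starts. */
static uint64_t pgdir_entries_by_size(uint64_t start, uint64_t end)
{
	assert(end > start);
	return (end - start + PGDIR_SIZE - 1) >> PGDIR_SHIFT;
}

/* Alignment-aware count: PGDIR-sized slots touched by [start, end). */
static uint64_t pgdir_entries_spanned(uint64_t start, uint64_t end)
{
	assert(end > start);
	return ((end - 1) >> PGDIR_SHIFT) - (start >> PGDIR_SHIFT) + 1;
}
```

For a 2GB region running from 511GB to 513GB, the size-only estimate yields
one entry while the region actually touches two, which is the kind of extra
allocation the size-based sizing does not account for.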

Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago r8169: fix vlan tag read ordering.
françois romieu [Thu, 24 Jan 2013 13:30:06 +0000 (13:30 +0000)]
r8169: fix vlan tag read ordering.

commit ce11ff5e5963e441feb591e76278528f876c332d upstream.

Control of receive descriptor must not be returned to ethernet chipset
before vlan tag processing is done.

VLAN tag receive word is now reset both in normal and error path.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Spotted-by: Timo Teras <timo.teras@iki.fi>
Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago powerpc: fix numa distance for form0 device tree
Vaidyanathan Srinivasan [Fri, 22 Mar 2013 05:49:35 +0000 (05:49 +0000)]
powerpc: fix numa distance for form0 device tree

commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream.

The following commit breaks numa distance setup for old powerpc
systems that use form0 encoding in device tree.

commit 41eab6f88f24124df89e38067b3766b7bef06ddb
powerpc/numa: Use form 1 affinity to setup node distance

Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.

All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes.  This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same.  (value of 'level' in sched_init_numa() will remain 0)

Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.
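
A small sketch of why the hard-coded distance matters. The two distance
functions mirror the behaviours described above; has_remote_level() is a
hypothetical helper, not scheduler code, and the values 10 and 20 are assumed
here for LOCAL_DISTANCE and REMOTE_DISTANCE.

```c
#include <assert.h>

#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/* Behaviour described as broken on form0 systems: always local. */
static int distance_hardcoded(int from, int to)
{
	(void)from;
	(void)to;
	return LOCAL_DISTANCE;
}

/* Behaviour quoted from before the regression. */
static int distance_form0(int from, int to)
{
	return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}

/* Hypothetical check: is there any node pair that is not local? */
static int has_remote_level(int (*dist)(int, int), int nr_nodes)
{
	assert(nr_nodes > 0);
	for (int i = 0; i < nr_nodes; i++)
		for (int j = 0; j < nr_nodes; j++)
			if (dist(i, j) != LOCAL_DISTANCE)
				return 1;
	return 0;
}
```

With the hard-coded variant no pair of nodes ever looks remote, so there is
nothing from which to build a separate NUMA level.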

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago kernel/audit_tree.c: tree will leak memory when failure occurs in audit_trim_trees()
Chen Gang [Mon, 29 Apr 2013 22:05:19 +0000 (15:05 -0700)]
kernel/audit_tree.c: tree will leak memory when failure occurs in audit_trim_trees()

commit 12b2f117f3bf738c1a00a6f64393f1953a740bd4 upstream.

audit_trim_trees() calls get_tree().  If a failure occurs we must call
put_tree().

[akpm@linux-foundation.org: run put_tree() before mutex_lock() for small scalability improvement]
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago ixgbe: add missing rtnl_lock in PM resume path
Benjamin Poirier [Fri, 6 Apr 2012 07:20:21 +0000 (07:20 +0000)]
ixgbe: add missing rtnl_lock in PM resume path

commit 34948a947d1a576c10afee6d14792fd237549577 upstream.

Upon resume from standby, ixgbe may trigger the ASSERT_RTNL() in
netif_set_real_num_tx_queues(). The call stack is:
netif_set_real_num_tx_queues
ixgbe_set_num_queues
ixgbe_init_interrupt_scheme
ixgbe_resume

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago drm/i915: Fix detection of base of stolen memory
Chris Wilson [Thu, 15 Nov 2012 11:32:18 +0000 (11:32 +0000)]
drm/i915: Fix detection of base of stolen memory

commit e12a2d53ae45a69aea499b64f75e7222cca0f12f upstream.

The routine to query the base of stolen memory was using the wrong
registers and the wrong encodings on virtually every platform.

It was not until the G33 refresh, that a PCI config register was
introduced that explicitly said where the stolen memory was. Prior to
865G there was not even a register that said where the end of usable
low memory was and where the stolen memory began (or ended depending
upon chipset). Before then, one has to look at the BIOS memory maps to
find the Top of Memory. Alas that is not exported by arch/x86 and so we
have to resort to disabling stolen memory on gen2 for the time being.

Then SandyBridge enlarged the PCI register to a full 32-bits and changed
the encoding of the address, so even though we happened to be querying
the right register, we read the wrong bits and ended up using address 0
for our stolen data, i.e. notably FBC.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago r8169: fix 8168evl frame padding.
Stefan Bader [Sat, 4 May 2013 10:22:26 +0000 (12:22 +0200)]
r8169: fix 8168evl frame padding.

commit e5195c1f31f399289347e043d6abf3ffa80f0005 upstream.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: hayeswang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago sparc64: Fix race in TLB batch processing.
David S. Miller [Fri, 19 Apr 2013 21:26:26 +0000 (17:26 -0400)]
sparc64: Fix race in TLB batch processing.

[ Commits f36391d2790d04993f48da6a45810033a2cdf847 and
  f0af97070acbad5d6a361f485828223a4faaa0ee upstream. ]

As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchronize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
   implementations.

2) Do TLB batch cross calls instead via:

	smp_call_function_many()
		tlb_pending_func()
			__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
           flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

a) Implement __flush_tlb_page and per-cpu variants, patch
   as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
           upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
   3 batch flushes have only a single entry in them.

   The batch flush waiting is very expensive, both because of the poll
   on sibling cpu completion, as well as because passing the tlb batch
   pointer to the sibling cpus invokes a shared memory dereference.

   Therefore, in flush_tlb_pending(), if there is only one entry in
   the batch perform a completely asynchronous global_flush_tlb_page()
   instead.
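
The batching rules from points 3) and 5) can be modelled in a few lines. This
is a simplified userspace sketch, not the sparc64 code: the struct layout,
TLB_BATCH_NR, the helper names and the counters that stand in for real
flushes are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_BATCH_NR 8	/* hypothetical batch capacity */

struct tlb_batch {
	bool active;		/* set only inside lazy MMU mode */
	size_t tlb_nr;
	unsigned long vaddrs[TLB_BATCH_NR];
};

/* Counters standing in for the real flush primitives. */
static unsigned int single_page_flushes, batch_flushes;

static void flush_one_page(unsigned long vaddr)
{
	(void)vaddr;
	single_page_flushes++;
}

static void flush_batch(const struct tlb_batch *tb)
{
	(void)tb;
	batch_flushes++;
}

static void flush_pending(struct tlb_batch *tb)
{
	if (tb->tlb_nr == 1)
		flush_one_page(tb->vaddrs[0]);	/* singleton: no batch call */
	else if (tb->tlb_nr > 1)
		flush_batch(tb);
	tb->tlb_nr = 0;
}

static void batch_add_one(struct tlb_batch *tb, unsigned long vaddr)
{
	if (!tb->active) {	/* not in lazy MMU mode: flush right away */
		flush_one_page(vaddr);
		return;
	}
	assert(tb->tlb_nr < TLB_BATCH_NR);
	tb->vaddrs[tb->tlb_nr++] = vaddr;
	if (tb->tlb_nr == TLB_BATCH_NR)
		flush_pending(tb);
}

static void enter_lazy_mmu(struct tlb_batch *tb)
{
	tb->active = true;
}

static void leave_lazy_mmu(struct tlb_batch *tb)
{
	flush_pending(tb);	/* run the batch, then stop batching */
	tb->active = false;
}
```

The model only tracks which path is taken; it says nothing about the cross
calls or the synchronization that the real fix is about.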

Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago net: drop dst before queueing fragments
Eric Dumazet [Tue, 16 Apr 2013 12:55:41 +0000 (12:55 +0000)]
net: drop dst before queueing fragments

[ Upstream commit 97599dc792b45b1669c3cdb9a4b365aad0232f65 ]

Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, as non refcounted
dst could escape an RCU protected section.

Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed
the case of timeouts, but not the general problem.

Tom Parkin noticed crashes in UDP stack and provided a patch,
but further analysis permitted us to pinpoint the root cause.

Before queueing a packet into a frag list, we must drop its dst,
as this dst has limited lifetime (RCU protected)

When/if a packet is finally reassembled, we use the dst of the very
last skb, still protected by RCU and valid, as the dst of the
reassembled packet.

Use same logic in IPv6, as there is no need to hold dst references.

Reported-by: Tom Parkin <tparkin@katalix.com>
Tested-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago netrom: fix invalid use of sizeof in nr_recvmsg()
Wei Yongjun [Tue, 9 Apr 2013 02:07:19 +0000 (10:07 +0800)]
netrom: fix invalid use of sizeof in nr_recvmsg()

[ Upstream commit c802d759623acbd6e1ee9fbdabae89159a513913 ]

sizeof() when applied to a pointer typed expression gives the size of the
pointer, not that of the pointed data.
Introduced by commit 3ce5ef(netrom: fix info leak via msg_name in nr_recvmsg)
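
The pitfall is easy to reproduce in isolation. In the sketch below the
struct and both helpers are hypothetical stand-ins, not the netrom code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket address structure. */
struct demo_sockaddr {
	unsigned short family;
	char call[16];
	int ndigis;
};

/* Wrong: sizeof applied to the pointer gives the pointer's own size. */
static size_t len_wrong(struct demo_sockaddr *sa)
{
	assert(sa != NULL);
	return sizeof(sa);
}

/* Right: sizeof applied to the pointed-to object gives the struct size. */
static size_t len_right(struct demo_sockaddr *sa)
{
	assert(sa != NULL);
	return sizeof(*sa);
}
```

A memset() sized with the first form would clear only the first few bytes of
the structure and leave the rest uninitialized.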

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago tipc: fix info leaks via msg_name in recv_msg/recv_stream
Mathias Krause [Sun, 7 Apr 2013 01:52:00 +0000 (01:52 +0000)]
tipc: fix info leaks via msg_name in  recv_msg/recv_stream

[ Upstream commit 60085c3d009b0df252547adb336d1ccca5ce52ec ]

The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.

In addition, both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago rose: fix info leak via msg_name in rose_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:59 +0000 (01:51 +0000)]
rose: fix info leak via msg_name in rose_recvmsg()

[ Upstream commit 4a184233f21645cf0b719366210ed445d1024d72 ]

The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).
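
A minimal sketch of the clear-then-fill pattern, using a hypothetical struct
rather than sockaddr_rose. It assumes the usual compiler behaviour that
storing to individual members leaves the padding bytes untouched; the C
standard itself does not guarantee that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address structure with padding on most ABIs. */
struct demo_addr {
	char tag;	/* typically followed by padding */
	int  value;
	char name[6];	/* typically followed by tail padding */
};

static void fill_addr(struct demo_addr *dst, char tag, int value)
{
	assert(dst != NULL);
	memset(dst, 0, sizeof(*dst));	/* clears members and padding */
	dst->tag = tag;
	dst->value = value;
	/* name is deliberately not assigned: it stays zeroed */
}

/* Returns 1 if every byte outside tag and value is zero. */
static int rest_is_zero(const struct demo_addr *a)
{
	const unsigned char *p = (const unsigned char *)a;
	size_t voff = offsetof(struct demo_addr, value);

	for (size_t i = 0; i < sizeof(*a); i++) {
		int in_tag = i < sizeof(a->tag);
		int in_value = i >= voff && i < voff + sizeof(a->value);

		if (!in_tag && !in_value && p[i] != 0)
			return 0;
	}
	return 1;
}
```

Without the memset(), the unassigned member and the padding would keep
whatever bytes were previously in that memory.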

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago netrom: fix info leak via msg_name in nr_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:57 +0000 (01:51 +0000)]
netrom: fix info leak via msg_name in nr_recvmsg()

[ Upstream commits 3ce5efad47b62c57a4f5c54248347085a750ce0e and
  c802d759623acbd6e1ee9fbdabae89159a513913 ]

In case msg_name is set the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of
struct sockaddr_ax25 inserted by the compiler for alignment. Also
the sax25_ndigis member does not get assigned, leaking four more
bytes.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago llc: Fix missing msg_namelen update in llc_ui_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:56 +0000 (01:51 +0000)]
llc: Fix missing msg_namelen update in  llc_ui_recvmsg()

[ Upstream commit c77a4b9cffb6215a15196ec499490d116dfad181 ]

For stream sockets the code fails to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.
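
The "set it to 0 early" idea can be sketched like this. The message header,
the address struct and demo_recvmsg() are hypothetical stand-ins for the
kernel types; the have_data flag models an early return path that reports
success without filling in a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_msghdr {
	void *msg_name;
	int   msg_namelen;
};

struct demo_addr {
	unsigned short family;
	unsigned char  bytes[6];
};

/* Hypothetical receive routine; returns 0 for "success" on every path. */
static int demo_recvmsg(struct demo_msghdr *msg, int have_data)
{
	assert(msg != NULL);
	msg->msg_namelen = 0;	/* set early so every return path is safe */

	if (!have_data)
		return 0;	/* early exit: the caller sees a length of 0 */

	if (msg->msg_name) {
		struct demo_addr *addr = msg->msg_name;

		memset(addr, 0, sizeof(*addr));
		addr->family = 1;
		msg->msg_namelen = sizeof(*addr);	/* updated once filled */
	}
	return 0;
}
```

Because the length is reset before any early return, a caller that copies
msg_namelen bytes back to userland never copies from a buffer that was not
filled in.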

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoiucv: Fix missing msg_namelen update in iucv_sock_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:54 +0000 (01:51 +0000)]
iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()

[ Upstream commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.

Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoirda: Fix missing msg_namelen update in irda_recvmsg_dgram()
Mathias Krause [Sun, 7 Apr 2013 01:51:53 +0000 (01:51 +0000)]
irda: Fix missing msg_namelen update in irda_recvmsg_dgram()

[ Upstream commit 5ae94c0d2f0bed41d6718be743985d61b7f5c47d ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.

Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agocaif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:52 +0000 (01:51 +0000)]
caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()

[ Upstream commit 2d6fbfe733f35c6b355c216644e08e149c61b271 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about caif_seqpkt_recvmsg() not filling the msg_name in case it was
set.

Cc: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoBluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:50 +0000 (01:51 +0000)]
Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()

[ Upstream commit e11e0455c0d7d3d62276a0c55d9dfbc16779d691 ]

If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoBluetooth: fix possible info leak in bt_sock_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:49 +0000 (01:51 +0000)]
Bluetooth: fix possible info leak in bt_sock_recvmsg()

[ Upstream commit 4683f42fde3977bdb4e8a09622788cc8b5313778 ]

In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoax25: fix info leak via msg_name in ax25_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:48 +0000 (01:51 +0000)]
ax25: fix info leak via msg_name in ax25_recvmsg()

[ Upstream commit ef3313e84acbf349caecae942ab3ab731471f1a1 ]

When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoatm: update msg_namelen in vcc_recvmsg()
Mathias Krause [Sun, 7 Apr 2013 01:51:47 +0000 (01:51 +0000)]
atm: update msg_namelen in vcc_recvmsg()

[ Upstream commit 9b3e617f3df53822345a8573b6d358f6b9e5ed87 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agotcp: call tcp_replace_ts_recent() from tcp_ack()
Eric Dumazet [Fri, 19 Apr 2013 07:19:48 +0000 (07:19 +0000)]
tcp: call tcp_replace_ts_recent() from tcp_ack()

[ Upstream commit 12fb3dd9dc3c64ba7d64cec977cca9b5fb7b1d4e ]

commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
from tcp_validate_incoming()) introduced a TS ecr bug in slow path
processing.

1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>

(ecr 200 should be ecr 300 in packets 3 & 4)

Problem is tcp_ack() can trigger send of new packets (retransmits),
reflecting the prior TSval, instead of the TSval contained in the
currently processed incoming packet.

Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
checks, but before the actions.

Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agonet: sctp: sctp_auth_key_put: use kzfree instead of kfree
Daniel Borkmann [Thu, 7 Feb 2013 00:55:37 +0000 (00:55 +0000)]
net: sctp: sctp_auth_key_put: use kzfree instead of kfree

[ Upstream commit 586c31f3bf04c290dc0a0de7fc91d20aa9a5ee53 ]

For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.
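
As a rough user-space analogue of the wipe-before-free idea (not the kernel's kzfree() implementation), the following sketch uses the hypothetical helpers demo_wipe() and demo_zfree(). The volatile pointer is an assumption made here to keep the compiler from discarding the stores as dead writes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Clear 'len' bytes through a volatile pointer so the stores are kept. */
static void demo_wipe(void *p, size_t len)
{
    volatile unsigned char *v = p;

    assert(p != NULL || len == 0);
    while (len--)
        *v++ = 0;
}

/* Hypothetical analogue of kzfree(): wipe the buffer, then free it. */
static void demo_zfree(void *p, size_t len)
{
    if (!p)
        return;
    demo_wipe(p, len);
    free(p);
}
```

Unlike kzfree(), this sketch needs the caller to pass the buffer length, because user-space free() offers no portable way to query it.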

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoesp4: fix error return code in esp_output()
Wei Yongjun [Sat, 13 Apr 2013 15:49:03 +0000 (15:49 +0000)]
esp4: fix error return code in esp_output()

[ Upstream commit 06848c10f720cbc20e3b784c0df24930b7304b93 ]

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agotcp: incoming connections might use wrong route under synflood
Dmitry Popov [Thu, 11 Apr 2013 08:55:07 +0000 (08:55 +0000)]
tcp: incoming connections might use wrong route under synflood

[ Upstream commit d66954a066158781ccf9c13c91d0316970fe57b6 ]

There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
   inet_sk_flowi_flags(sk),
   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
   ireq->loc_addr, th->source, th->dest);

Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be
taken. This dst_entry is used by new socket (get_cookie_sock ->
tcp_v4_syn_recv_sock), so its packets may take the wrong path.

Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agortnetlink: Call nlmsg_parse() with correct header length
Michael Riesch [Mon, 8 Apr 2013 05:45:26 +0000 (05:45 +0000)]
rtnetlink: Call nlmsg_parse() with correct header length

[ Upstream commit 88c5b5ce5cb57af6ca2a7cf4d5715fa320448ff9 ]

Signed-off-by: Michael Riesch <michael.riesch@omicron.at>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-kernel@vger.kernel.org
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agonetfilter: don't reset nf_trace in nf_reset()
Patrick McHardy [Fri, 5 Apr 2013 18:42:05 +0000 (20:42 +0200)]
netfilter: don't reset nf_trace in nf_reset()

[ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ]

Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up to the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. It's not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only case
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoaf_unix: If we don't care about credentials coallesce all messages
Eric W. Biederman [Wed, 3 Apr 2013 16:14:47 +0000 (16:14 +0000)]
af_unix: If we don't care about credentials coallesce all messages

[ Upstream commit 0e82e7f6dfeec1013339612f74abc2cdd29d43d2 ]

It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coalescing unix stream messages when the application was
expecting us to.

The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.

The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coalescing and then the coalescing failed because
the second message had no credentials.

Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coalesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.

I have tested this and verified that in the LSB test case mentioned
above the messages do coalesce now, while they were failing to
coalesce without this change.

Reported-by: Karel Srot <ksrot@redhat.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agobonding: IFF_BONDING is not stripped on enslave failure
nikolay@redhat.com [Thu, 11 Apr 2013 09:18:56 +0000 (09:18 +0000)]
bonding: IFF_BONDING is not stripped on enslave failure

[ Upstream commit b6a5a7b9a528a8b4c8bec940b607c5dd9102b8cc ]

While enslaving a new device and after IFF_BONDING flag is set, in case
of failure it is not stripped from the device's priv_flags while
cleaning up, which could lead to other problems.
Cleaning at err_close because the flag is set after dev_open().

v2: no change

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agobonding: fix bonding_masters race condition in bond unloading
nikolay@redhat.com [Sat, 6 Apr 2013 00:54:38 +0000 (00:54 +0000)]
bonding: fix bonding_masters race condition in bond unloading

[ Upstream commit 69b0216ac255f523556fa3d4ff030d857eaaa37f ]

While the bonding module is unloading, it is considered that after
rtnl_link_unregister all bond devices are destroyed but since no
synchronization mechanism exists, a new bond device can be created
via bonding_masters before unregister_pernet_subsys which would
lead to multiple problems (e.g. NULL pointer dereference, wrong RIP,
list corruption).

This patch fixes the issue by removing any bond devices left in the
netns after bonding_masters is removed from sysfs.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoatl1e: limit gso segment size to prevent generation of wrong ip length fields
Hannes Frederic Sowa [Tue, 2 Apr 2013 14:36:46 +0000 (14:36 +0000)]
atl1e: limit gso segment size to prevent generation of wrong ip length fields

[ Upstream commit 31d1670e73f4911fe401273a8f576edc9c2b5fea ]

The limit of 0x3c00 is taken from the Windows driver.

Suggested-by: Huang, Xiong <xiong@qca.qualcomm.com>
Cc: Huang, Xiong <xiong@qca.qualcomm.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agonet: count hw_addr syncs so that unsync works properly.
Vlad Yasevich [Tue, 2 Apr 2013 21:10:07 +0000 (17:10 -0400)]
net: count hw_addr syncs so that unsync works properly.

[ Upstream commit 4543fbefe6e06a9e40d9f2b28d688393a299f079 ]

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.
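
The difference between a boolean "synced" flag and a count can be sketched as follows. struct demo_hw_addr, demo_sync() and demo_unsync() are hypothetical and greatly simplified compared with the kernel's address-list code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified address entry. */
struct demo_hw_addr {
    int refcount;  /* references held on this address */
    int sync_cnt;  /* number of lower devices it has been synced to */
};

/* Sync the address down to one more lower device. */
static void demo_sync(struct demo_hw_addr *ha)
{
    assert(ha != NULL);
    ha->sync_cnt++;
    ha->refcount++;
}

/* Unsync from one lower device; returns 0 on success, -1 if nothing to undo. */
static int demo_unsync(struct demo_hw_addr *ha)
{
    assert(ha != NULL);
    if (ha->sync_cnt == 0)
        return -1;
    ha->sync_cnt--;
    ha->refcount--;
    return 0;
}
```

With a boolean, the second demo_unsync() call would find the flag already cleared and leave one reference behind; with a count, each sync is matched by exactly one unsync.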

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agonet IPv6 : Fix broken IPv6 routing table after loopback down-up
Balakumaran Kannan [Tue, 2 Apr 2013 10:45:05 +0000 (16:15 +0530)]
net IPv6 : Fix broken IPv6 routing table after loopback down-up

[ Upstream commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f ]

IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces become unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.

This patch fixes this issue by reading all interfaces' IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agocbq: incorrect processing of high limits
Vasily Averin [Mon, 1 Apr 2013 03:01:32 +0000 (03:01 +0000)]
cbq: incorrect processing of high limits

[ Upstream commit f0f6ee1f70c4eaab9d52cf7d255df4bd89f8d1c2 ]

Currently cbq works incorrectly for limits > 10% of real link bandwidth,
and practically does not work for limits > 50% of real link bandwidth.
Below are the results of experiments run on a 1 Gbit link:

 In shaper | Actual Result
-----------+---------------
  100M     | 108 Mbps
  200M     | 244 Mbps
  300M     | 412 Mbps
  500M     | 893 Mbps

This happens because q->now changes incorrectly in cbq_dequeue():
when it is called before the real end of packet transmission,
L2T is greater than the real time delay, so q->now gets an extra boost
that is never compensated for.

To fix this problem we prevent q->now from changing until it is
synchronized with real time.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agovm: convert HPET mmap to vm_iomap_memory() helper
Linus Torvalds [Fri, 19 Apr 2013 16:46:39 +0000 (09:46 -0700)]
vm: convert HPET mmap to vm_iomap_memory() helper

commit 2323036dfec8ce3ce6e1c86a49a31b039f3300d1 upstream.

This is my example conversion of a few existing mmap users.  The HPET
case is simple, widely available, and easy to test (Clemens Ladisch sent
a trivial test-program for it).

Test-program-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agovm: convert fb_mmap to vm_iomap_memory() helper
Linus Torvalds [Fri, 19 Apr 2013 16:57:35 +0000 (09:57 -0700)]
vm: convert fb_mmap to vm_iomap_memory() helper

commit fc9bbca8f650e5f738af8806317c0a041a48ae4a upstream.

This is my example conversion of a few existing mmap users.  The
fb_mmap() case is a good example because it is a bit more complicated
than some: fb_mmap() mmaps one of two different memory areas depending
on the page offset of the mmap (but happily there is never any mixing of
the two, so the helper function still works).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: fold in the relevant part of commit 314e51b9851b
 'mm: kill vma flag VM_RESERVED and mm->reserved_vm counter']
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agovm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helper
Linus Torvalds [Fri, 19 Apr 2013 17:01:04 +0000 (10:01 -0700)]
vm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helper

commit 0fe09a45c4848b5b5607b968d959fdc1821c161d upstream.

This is my example conversion of a few existing mmap users.  The pcm
mmap case is one of the more straightforward ones.

Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agovm: add vm_iomap_memory() helper function
Linus Torvalds [Tue, 16 Apr 2013 20:45:37 +0000 (13:45 -0700)]
vm: add vm_iomap_memory() helper function

commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e upstream.

Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.

Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size.  But it just means
that drivers end up doing that shifting instead at the interface level.

It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.

So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.
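
The pfn arithmetic that such a helper takes off the driver's hands can be modelled in user space. This is a simplified sketch, not the kernel function itself: struct demo_vma, demo_iomap_range() and the 4 KiB page size are assumptions made for illustration, and the real helper goes on to call the remapping function rather than returning the computed values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u                        /* assumes 4 KiB pages */
#define DEMO_PAGE_SIZE  ((uint64_t)1 << DEMO_PAGE_SHIFT)

/* Hypothetical subset of the vma fields a driver would otherwise touch. */
struct demo_vma {
    uint64_t vm_start;  /* user virtual start */
    uint64_t vm_end;    /* user virtual end (exclusive) */
    uint64_t vm_pgoff;  /* offset into the region, in pages */
};

/*
 * Work out which pfn and how many bytes to map. Returns 0 and fills
 * *pfn and *map_len on success, -1 if the request does not fit.
 */
static int demo_iomap_range(const struct demo_vma *vma, uint64_t start,
                            uint64_t len, uint64_t *pfn, uint64_t *map_len)
{
    uint64_t pages, vm_len;

    assert(vma != NULL && pfn != NULL && map_len != NULL);
    if (start + len < start)                     /* range wraps around */
        return -1;

    len += start & (DEMO_PAGE_SIZE - 1);         /* cover leading partial page */
    *pfn = start >> DEMO_PAGE_SHIFT;             /* phys address -> pfn */
    pages = (len + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;

    if (vma->vm_pgoff > pages)                   /* offset past the region */
        return -1;
    *pfn += vma->vm_pgoff;
    pages -= vma->vm_pgoff;

    vm_len = vma->vm_end - vma->vm_start;
    if ((vm_len >> DEMO_PAGE_SHIFT) > pages)     /* mapping larger than region */
        return -1;

    *map_len = vm_len;
    return 0;
}
```

A driver using the real helper would pass only the physical start address and length of its buffer and leave all of this shifting and bounds checking to the core VM.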

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agos390: move dummy io_remap_pfn_range() to asm/pgtable.h
Linus Torvalds [Wed, 17 Apr 2013 15:46:19 +0000 (08:46 -0700)]
s390: move dummy io_remap_pfn_range() to asm/pgtable.h

commit 4f2e29031e6c67802e7370292dd050fd62f337ee upstream.

Commit b4cbb197c7e7 ("vm: add vm_iomap_memory() helper function") added
a helper function wrapper around io_remap_pfn_range(), and every other
architecture defined it in <asm/pgtable.h>.

The s390 choice of <asm/io.h> may make sense, but is not very convenient
for this case, and gratuitous differences like that cause unexpected
errors like this:

   mm/memory.c: In function 'vm_iomap_memory':
   mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration]

Glory be the kbuild test robot who noticed this, bisected it, and
reported it to the guilty parties (ie me).

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: the macro was not defined, so this is an addition
 and not a move]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoperf/x86: Fix offcore_rsp valid mask for SNB/IVB
Stephane Eranian [Tue, 16 Apr 2013 11:51:43 +0000 (13:51 +0200)]
perf/x86: Fix offcore_rsp valid mask for SNB/IVB

commit f1923820c447e986a9da0fc6bf60c1dccdf0408e upstream.

The valid mask for both offcore_response_0 and
offcore_response_1 was wrong for SNB/SNB-EP,
IVB/IVB-EP. It was possible to write to
a reserved bit and cause a GP fault crashing
the kernel.

This patch fixes the problem by correctly marking the
reserved bits in the valid mask for all the processors
mentioned above.

A distinction between desktop and server parts is introduced
because bits 24-30 are only available on the server parts.

This version of the patch is just a rebase to perf/urgent tree
and should apply to older kernels as well.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: security@kernel.org
Cc: ak@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context; drop the IVB case]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoperf: Treat attr.config as u64 in perf_swevent_init()
Tommi Rantala [Sat, 13 Apr 2013 19:49:14 +0000 (22:49 +0300)]
perf: Treat attr.config as u64 in perf_swevent_init()

commit 8176cced706b5e5d15887584150764894e94e02f upstream.

Trinity discovered that we fail to check all 64 bits of
attr.config passed by user space, resulting in out-of-bounds
access of the perf_swevent_enabled array in
sw_perf_event_destroy().
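
Why checking only part of a 64-bit value is dangerous can be shown with a small sketch. DEMO_SW_MAX and both functions are hypothetical. The narrowing conversion to int is implementation-defined in C; the comments assume the usual behaviour of GCC and Clang, which keep the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SW_MAX 9   /* hypothetical number of valid software event ids */

/* Buggy check: the 64-bit config is narrowed to int before comparing. */
static int demo_check_int(uint64_t config)
{
    int event_id = (int)config;     /* high 32 bits are lost (assumed ABI) */

    return event_id < DEMO_SW_MAX;  /* 1 means "accepted" */
}

/* Fixed check: compare all 64 bits. */
static int demo_check_u64(uint64_t config)
{
    uint64_t event_id = config;

    return event_id < DEMO_SW_MAX;
}
```

A config such as 0x100000001 passes the narrowed check as id 1, yet indexes far outside the array when another code path later uses the full 64-bit value.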

Introduced in commit b0a873ebb ("perf: Register PMU
implementations").

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1365882554-30259-1-git-send-email-tt.rantala@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoperf: Fix error return code
Wei Yongjun [Fri, 12 Apr 2013 03:05:54 +0000 (11:05 +0800)]
perf: Fix error return code

commit c481420248c6730246d2a1b1773d5d7007ae0835 upstream.

Fix to return -ENOMEM in the allocation error case instead of 0
(if pmu_bus_running == 1), as done elsewhere in this function.
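
The class of bug fixed here, jumping to a shared exit label while the return variable still holds 0 from an earlier successful step, can be sketched like this. demo_register() and its fail_alloc parameter are hypothetical and exist only to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical two-step setup. 'fail_alloc' forces the allocation step
 * to fail so that the error path can be exercised.
 */
static int demo_register(int fail_alloc)
{
    int ret;
    void *ctx;

    ret = 0;                 /* an earlier step succeeded, so ret is 0 */

    ctx = fail_alloc ? NULL : malloc(16);
    if (!ctx) {
        ret = -ENOMEM;       /* the fix: set ret before bailing out */
        goto out;
    }

    free(ctx);
out:
    return ret;
}
```

Without the assignment in the failure branch, the caller would see 0 and treat a failed registration as a success.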

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Link: http://lkml.kernel.org/r/CAPgLHd8j_fWcgqe%3DKLWjpBj%2B%3Do0Pw6Z-SEq%3DNTPU08c2w1tngQ@mail.gmail.com
[ Tweaked the error code setting placement and the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agotty: fix up atime/mtime mess, take three
Linus Torvalds [Wed, 1 May 2013 14:32:21 +0000 (07:32 -0700)]
tty: fix up atime/mtime mess, take three

commit b0b885657b6c8ef63a46bc9299b2a7715d19acde upstream.

We first tried to avoid updating atime/mtime entirely (commit
b0de59b5733d: "TTY: do not update atime/mtime on read/write"), and then
limited it to only update it occasionally (commit 37b7f3c76595: "TTY:
fix atime/mtime regression"), but it turns out that this was both
insufficient and overkill.

It was insufficient because we let people attach to the shared ptmx node
to see activity without even reading atime/mtime, and it was overkill
because the "only once a minute" means that you can't really tell an
idle person from an active one with 'w'.

So this tries to fix the problem properly.  It marks the shared ptmx
node as un-notifiable, and it lowers the "only once a minute" to a few
seconds instead - still long enough that you can't time individual
keystrokes, but short enough that you can tell whether somebody is
active or not.
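
A coarse timestamp update of this kind can be sketched as below. demo_update_time() is a hypothetical user-space helper, and the 8-second granularity is an assumption chosen to stand in for "a few seconds".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Update *stamp only when it differs from 'now' by 8 seconds or more.
 * The 8-second granularity is an assumption made for this sketch.
 */
static void demo_update_time(int64_t *stamp, int64_t now)
{
    int64_t diff;

    assert(stamp != NULL);
    diff = now > *stamp ? now - *stamp : *stamp - now;
    if (diff & ~(int64_t)7)      /* non-zero exactly when diff >= 8 */
        *stamp = now;
}
```

Observers polling the timestamp then see it move at most once per window, which hides individual keystroke timing while still showing recent activity.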

Reported-by: Simon Kirby <sim@hostway.ca>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoTTY: fix atime/mtime regression
Jiri Slaby [Fri, 26 Apr 2013 11:48:53 +0000 (13:48 +0200)]
TTY: fix atime/mtime regression

commit 37b7f3c76595e23257f61bd80b223de8658617ee upstream.

In commit b0de59b5733d ("TTY: do not update atime/mtime on read/write")
we removed timestamps from tty inodes to fix a security issue and waited
to see if anything broke.  Well, 'w', the utility that shows logged-in
users and their inactivity time, broke.  It shows that users have been
inactive since the time they logged in.

To revert to the old behaviour while still preventing attackers from
guessing the password length, this patch updates the timestamps in
one-minute intervals.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: For 3.2, use Greg's backported version]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years agoTTY: do not update atime/mtime on read/write
Jiri Slaby [Fri, 15 Feb 2013 14:25:05 +0000 (15:25 +0100)]
TTY: do not update atime/mtime on read/write

commit b0de59b5733d18b0d1974a060860a8b5c1b36a2e upstream.

On http://vladz.devzero.fr/013_ptmx-timing.php, we can see how to find
out the length of a password using timestamps of /dev/ptmx. It is
documented in "Timing Analysis of Keystrokes and Timing Attacks on
SSH". To avoid that problem, do not update the time when reading
from/writing to a TTY.

I am afraid of regressions as this is a behavior we have had since 0.97
and apps may expect the time to be current, e.g. for monitoring
whether there was a change on the TTY. Now, there is no change. So
this would better have a lot of testing before it goes upstream.

References: CVE-2013-0160

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  drm/radeon: fix handling of v6 power tables
Alex Deucher [Wed, 1 May 2013 18:34:54 +0000 (14:34 -0400)]
drm/radeon: fix handling of v6 power tables

commit 441e76ca83ac604eaf0f046def96d8e3a27eea28 upstream.

The code was mis-handling variable sized arrays.

Reported-by: Sylvain BERTRAND <sylware@legeek.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  drm/radeon: fix possible segfault when parsing pm tables
Alex Deucher [Thu, 25 Apr 2013 13:29:17 +0000 (09:29 -0400)]
drm/radeon: fix possible segfault when parsing pm tables

commit f8e6bfc2ce162855fa4f9822a45659f4b542c960 upstream.

If we have an empty power table, bail early and allocate
the default power state.

Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=63865

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  drm/radeon: fix endian bugs in atom_allocate_fb_scratch()
Alex Deucher [Wed, 24 Apr 2013 18:39:31 +0000 (14:39 -0400)]
drm/radeon: fix endian bugs in atom_allocate_fb_scratch()

commit beb71fc61c2cad64e347f164991b8ef476529e64 upstream.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  ipc: sysv shared memory limited to 8TiB
Robin Holt [Wed, 1 May 2013 02:15:54 +0000 (19:15 -0700)]
ipc: sysv shared memory limited to 8TiB

commit d69f3bad4675ac519d41ca2b11e1c00ca115cecd upstream.

While running an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB.  Setting
kernel.shmall greater than 8EiB-8TiB made the job work.

In the newseg() function, ns->shm_tot, at 8TiB, is INT_MAX.

ipc/shm.c:
  static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
  {
  ...
          int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
  ...
          if (ns->shm_tot + numpages > ns->shm_ctlall)
                  return -ENOSPC;
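
The arithmetic behind the 8TiB limit can be checked with a small userspace
sketch.  It assumes 4 KiB pages and a 32-bit int; the sketch_* names are
hypothetical and only mirror the page-count expression quoted above, they
are not part of ipc/shm.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Page count for a segment of `size` bytes, kept in a 64-bit type. */
uint64_t sketch_numpages(uint64_t size)
{
	/* guard the round-up addition against wrapping */
	assert(size <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));
	return (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Nonzero when that page count can no longer be represented in an int. */
int sketch_overflows_int(uint64_t size)
{
	return sketch_numpages(size) > (uint64_t)INT_MAX;
}
```

Under these assumptions an 8TiB segment is 2^31 pages, one more than
INT_MAX, which is why a page counter held in an int stops working at
exactly that size.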

[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt <holt@sgi.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  fs/dcache.c: add cond_resched() to shrink_dcache_parent()
Greg Thelen [Tue, 30 Apr 2013 22:26:48 +0000 (15:26 -0700)]
fs/dcache.c: add cond_resched() to shrink_dcache_parent()

commit 421348f1ca0bf17769dee0aed4d991845ae0536d upstream.

Call cond_resched() in shrink_dcache_parent() to maintain interactivity.

Before this patch:

void shrink_dcache_parent(struct dentry * parent)
{
        while ((found = select_parent(parent, &dispose)) != 0)
                shrink_dentry_list(&dispose);
}

select_parent() populates the dispose list with dentries which
shrink_dentry_list() then deletes.  select_parent() carefully uses
need_resched() to avoid doing too much work at once.  But neither
shrink_dcache_parent() nor the functions it calls call cond_resched().  So
once need_resched() is set, select_parent() will return a single-dentry
dispose list which is then deleted by shrink_dentry_list().  This is
inefficient when there are a lot of dentries to process.  It can cause
soft lockups and hurts interactivity on non-preemptible kernels.

This change adds cond_resched() in shrink_dcache_parent().  The benefit
of this is that need_resched() is quickly cleared so that future calls
to select_parent() are able to efficiently return a big batch of dentries.

These additional cond_resched() calls do not seem to impact performance,
at least for the workload below.

Here is a program which can cause soft lockup if other system activity
sets need_resched().

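#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)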
{
        struct rlimit rlim;
        int i;
        int f[100000];
        char buf[20];
        struct timeval t1, t2;
        double diff;

        /* cleanup past run */
        system("rm -rf x");

        /* boost nfile rlimit */
        rlim.rlim_cur = 200000;
        rlim.rlim_max = 200000;
        if (setrlimit(RLIMIT_NOFILE, &rlim))
                err(1, "setrlimit");

        /* make directory for files */
        if (mkdir("x", 0700))
                err(1, "mkdir");

        if (gettimeofday(&t1, NULL))
                err(1, "gettimeofday");

        /* populate directory with open files */
        for (i = 0; i < 100000; i++) {
                snprintf(buf, sizeof(buf), "x/%d", i);
                /* O_CREAT requires an explicit mode argument */
                f[i] = open(buf, O_CREAT, 0600);
                if (f[i] == -1)
                        err(1, "open");
        }

        /* close some of the files */
        for (i = 0; i < 85000; i++)
                close(f[i]);

        /* unlink all files, even open ones */
        system("rm -rf x");

        if (gettimeofday(&t2, NULL))
                err(1, "gettimeofday");

        diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) -
                ((double)t1.tv_sec * 1000000 + t1.tv_usec));

        printf("done: %g elapsed\n", diff/1e6);
        return 0;
}

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  inotify: invalid mask should return a error number but not set it
Zhao Hongjiang [Tue, 30 Apr 2013 22:26:46 +0000 (15:26 -0700)]
inotify: invalid mask should return a error number but not set it

commit 04df32fa10ab9a6f0643db2949d42efc966bc844 upstream.

When we run the crackerjack testsuite, the inotify_add_watch test
stalls.

This is caused by the invalid mask 0 - the task is waiting for the event
but it never comes.  inotify_add_watch() should return -EINVAL as it did
before commit 676a0675cf92 ("inotify: remove broken mask checks causing
unmount to be EINVAL").  That commit removes the invalid mask check, but
that check is needed.

Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call.
If none are set, just return -EINVAL.

Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger
the problem that the above commit fixed.
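
A minimal userspace sketch of the check described above follows.  It is
an illustration only: SKETCH_VALID_BITS is a hypothetical stand-in for
the kernel-internal ALL_INOTIFY_BITS, built from flags exported by
<sys/inotify.h>, and the kernel's actual set may differ.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/inotify.h>

/*
 * Assumption: stand-in for the kernel's ALL_INOTIFY_BITS, using only
 * flags that userspace can see.  Not the kernel's definition.
 */
#define SKETCH_VALID_BITS					\
	(IN_ALL_EVENTS | IN_UNMOUNT | IN_Q_OVERFLOW |		\
	 IN_IGNORED | IN_ONLYDIR | IN_DONT_FOLLOW |		\
	 IN_MASK_ADD | IN_ISDIR | IN_ONESHOT)

/* Reject a mask that selects none of the known bits, accept the rest. */
int sketch_check_mask(uint32_t mask)
{
	if (!(mask & SKETCH_VALID_BITS))
		return -EINVAL;
	return 0;
}
```

A mask of 0 is rejected, while a mask containing only IN_UNMOUNT still
passes, which matches the reasoning in the paragraph above.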

[akpm@linux-foundation.org: fix build]
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Jim Somerville <Jim.Somerville@windriver.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  md: bad block list should default to disabled.
NeilBrown [Wed, 24 Apr 2013 01:42:44 +0000 (11:42 +1000)]
md: bad block list should default to disabled.

commit 486adf72ccc0c235754923d47a2270c5dcb0c98b upstream.

Maintenance of a bad-block-list currently defaults to 'enabled'
and is then disabled when it cannot be supported.
This is backwards and causes problems for dm-raid, which didn't know
to disable it.

So fix the defaults, and only enable it for v1.x metadata which
explicitly has bad blocks enabled.

The problem with dm-raid has been present since badblock support was
added in v3.1, so this patch is suitable for any -stable from 3.1
onwards.

Reported-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  drivers/rtc/rtc-cmos.c: don't disable hpet emulation on suspend
Derek Basehore [Mon, 29 Apr 2013 23:20:23 +0000 (16:20 -0700)]
drivers/rtc/rtc-cmos.c: don't disable hpet emulation on suspend

commit e005715efaf674660ae59af83b13822567e3a758 upstream.

There's a bug where rtc alarms are ignored after the rtc cmos suspends
but before the system finishes suspend.  Since hpet emulation is
disabled while the hpet still handles the interrupts, the wake event,
which is registered from the rtc layer, is never registered.

This patch reverts commit d1b2efa83fbf ("rtc: disable hpet emulation on
suspend") which disabled hpet emulation.  To fix the problem mentioned
in that commit, hpet_rtc_timer_init() is called directly on resume.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  fs/fscache/stats.c: fix memory leak
Anurup m [Mon, 29 Apr 2013 22:05:52 +0000 (15:05 -0700)]
fs/fscache/stats.c: fix memory leak

commit ec686c9239b4d472052a271c505d04dae84214cc upstream.

There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.

The reason is that fscache_stats_open() calls single_open(), but the
matching release function is not called on release.  Hence fix this by
using the correct release function, single_release().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101

Signed-off-by: Anurup m <anurup.m@huawei.com>
Cc: shyju pv <shyju.pv@huawei.com>
Cc: Sanil kumar <sanil.kumar@huawei.com>
Cc: Nataraj m <nataraj.m@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  ARM: u300: fix ages old copy/paste bug
Linus Walleij [Fri, 26 Apr 2013 13:29:55 +0000 (15:29 +0200)]
ARM: u300: fix ages old copy/paste bug

commit 0259d9eb30d003af305626db2d8332805696e60d upstream.

The UART1 is on the fast AHB bridge, not on the slow bus.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  mwifiex: Call pci_release_region after calling pci_disable_device
Yogesh Ashok Powar [Tue, 23 Apr 2013 23:49:48 +0000 (16:49 -0700)]
mwifiex: Call pci_release_region after calling pci_disable_device

commit 5b0d9b218b74042ff72bf4bfda6eeb2e4bf98397 upstream.

"drivers should call pci_release_region() AFTER
calling pci_disable_device()"

Please refer to section 3.2, "Request MMIO/IOP resources",
in Documentation/PCI/pci.txt

Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Yogesh Ashok Powar <yogeshp@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  mwifiex: Use pci_release_region() instead of a pci_release_regions()
Yogesh Ashok Powar [Tue, 23 Apr 2013 23:49:47 +0000 (16:49 -0700)]
mwifiex: Use pci_release_region() instead of a pci_release_regions()

commit c380aafb77b7435d010698fe3ca6d3e1cd745fde upstream.

PCI regions are associated with the device using the
pci_request_region() call. Hence use pci_release_region()
instead of pci_release_regions().

Signed-off-by: Yogesh Ashok Powar <yogeshp@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  s390/memory hotplug: prevent offline of active memory increments
Heiko Carstens [Thu, 25 Apr 2013 08:03:15 +0000 (10:03 +0200)]
s390/memory hotplug: prevent offline of active memory increments

commit 94c163663fc1dcfc067a5fb3cc1446b9469975ce upstream.

In case a machine supports memory hotplug all active memory increments
present at IPL time have been initialized with a "usecount" of 1.
This is wrong if the memory increment size is larger than the memory
section size of the memory hotplug code. If that is the case the
usecount must be initialized with the number of memory sections that
fit into one memory increment.
Otherwise it is possible to put a memory increment into standby state
even if there are still active sections.
Afterwards addressing exceptions might happen which cause the kernel
to panic.
Even worse, if a memory increment was put into standby state
and afterwards into active state again, its contents would have been
zeroed, leading to memory corruption.
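
The initial usecount rule described above can be sketched as follows.
This is a hedged illustration, not the sclp code: the function name and
both size parameters are hypothetical, and sizes are taken in bytes.

```c
#include <assert.h>

/*
 * Hypothetical helper: initial usecount of one memory increment.
 * If an increment spans several hotplug memory sections, it must start
 * with one reference per section; otherwise a single reference is enough.
 */
unsigned long sketch_initial_usecount(unsigned long long increment_size,
				      unsigned long long section_size)
{
	assert(section_size > 0);
	if (increment_size > section_size)
		return (unsigned long)(increment_size / section_size);
	return 1;
}
```

For example, with an assumed 256 MiB section and a 1 GiB increment the
usecount starts at 4, so the increment cannot go to standby until all
four sections have been set offline.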

This was only an issue for machines that support standby memory and
have at least 256GB memory.

This has been broken since commit fdb1bb15 "[S390] sclp/memory hotplug: fix
initial usecount of increments".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  powerpc: Add isync to copy_and_flush
Michael Neuling [Wed, 24 Apr 2013 00:30:09 +0000 (00:30 +0000)]
powerpc: Add isync to copy_and_flush

commit 29ce3c5073057991217916abc25628e906911757 upstream.

In __after_prom_start we copy the kernel down to zero in two calls to
copy_and_flush.  After the first call (copy from 0 to copy_to_here:)
we jump to the newly copied code soon after.

Unfortunately there's no isync between the copy of this code and the
jump to it.  Hence it's possible that stale instructions could still be
in the icache or pipeline before we branch to it.

We've seen this on real machines and it results in no console output
after:
  calling quiesce...
  returning from prom_init

The below adds an isync to ensure that the copy and flushing has
completed before any branching to the new instructions occurs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  ixgbe: fix EICR write in ixgbe_msix_other
Jacob Keller [Sat, 2 Mar 2013 07:51:42 +0000 (07:51 +0000)]
ixgbe: fix EICR write in ixgbe_msix_other

commit d87d830720a1446403ed38bfc2da268be0d356d1 upstream.

Previously, ixgbe_msix_other was writing the full 32 bits of the set
interrupts, instead of only the ones which ixgbe_msix_other is
handling. This resulted in a loss of performance when the X540's PPS feature is
enabled, because queue interrupts were sometimes cleared and the driver did
not get the interrupt for cleaning the q_vector rings often enough. The fix
is to simply mask the lower 16 bits off so that this handler does not write them
in the EICR, which causes them to remain high and be properly handled by the
clean_rings interrupt routine as normal.
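
The masking step can be sketched in plain C as below.  The names are
hypothetical and the register access itself is left out; the only thing
taken from the description above is that the lower 16 bits must not be
written back by this handler.

```c
#include <stdint.h>

/* Assumption for this sketch: the low 16 bits carry the queue causes. */
#define SKETCH_QUEUE_BITS UINT32_C(0x0000FFFF)

/*
 * Hypothetical helper: given the interrupt causes that were read, return
 * the value this handler may write back, with the queue bits left alone.
 */
uint32_t sketch_other_causes(uint32_t eicr)
{
	return eicr & ~SKETCH_QUEUE_BITS;
}
```

Any queue cause that was set stays pending because it is never part of
the written value, so the ring-cleaning handler still sees it.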

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  clockevents: Set dummy handler on CPU_DEAD shutdown
Thomas Gleixner [Thu, 25 Apr 2013 09:45:53 +0000 (11:45 +0200)]
clockevents: Set dummy handler on CPU_DEAD shutdown

commit 6f7a05d7018de222e40ca003721037a530979974 upstream.

Vitaliy reported that a per cpu HPET timer interrupt crashes the
system during hibernation. What happens is that the per cpu HPET timer
gets shut down when the nonboot cpus are stopped. When the nonboot
cpus are onlined again the HPET code sets up the MSI interrupt which
fires before the clock event device is registered. The event handler
is still set to hrtimer_interrupt, which then crashes the machine due
to highres mode not being active.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333

There is no really good way to avoid that in the HPET code. The HPET
code already has a mechanism to detect spurious interrupts when event
handler == NULL for a similar reason.

We can handle that in the clockevent/tick layer and replace the
previous functional handler with a dummy handler like we do in
tick_setup_new_device().

The original clockevents code did this in clockevents_exchange_device(),
but that got removed by commit 7c1e76897 (clockevents: prevent
clockevent event_handler ending up handler_noop) which forgot to fix
it up in tick_shutdown(). Same issue with the broadcast device.

Reported-by: Vitaliy Fillipov <vitalif@yourcmc.ru>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: 700333@bugs.debian.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  ALSA: usb-audio: Fix autopm error during probing
Takashi Iwai [Thu, 25 Apr 2013 05:38:15 +0000 (07:38 +0200)]
ALSA: usb-audio: Fix autopm error during probing

commit 60af3d037eb8c670dcce31401501d1271e7c5d95 upstream.

We've got strange errors in get_ctl_value() in mixer.c during
probing, e.g. on Hercules RMX2 DJ Controller:

  ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x201, wIndex = 0xa00, type = 4
  ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x200, wIndex = 0xa00, type = 4
  ....

It turned out that the culprit is autopm: snd_usb_autoresume() returns
-ENODEV when called during card->probing = 1.

Since the call itself during card->probing = 1 is valid, let's fix the
return value of snd_usb_autoresume() to report success in that case.

Reported-and-tested-by: Daniel Schürmann <daschuer@mixxx.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  gianfar: do not advertise any alarm capability.
Richard Cochran [Mon, 22 Apr 2013 19:42:16 +0000 (19:42 +0000)]
gianfar: do not advertise any alarm capability.

commit cd4baaaa04b4aaa3b0ec4d13a6f3d203b92eadbd upstream.

An early draft of the PHC patch series included an alarm in the
gianfar driver. During the review process, the alarm code was dropped,
but the capability removal was overlooked. This patch fixes the issue
by advertising zero alarms.

This patch should be applied to every 3.x stable kernel.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Chris LaRocque <clarocq@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  ALSA: snd-usb: try harder to find USB_DT_CS_ENDPOINT
Daniel Mack [Wed, 24 Apr 2013 17:38:42 +0000 (19:38 +0200)]
ALSA: snd-usb: try harder to find USB_DT_CS_ENDPOINT

commit ebfc594c02148b6a85c2f178cf167a44a3c3ce10 upstream.

The USB_DT_CS_ENDPOINT class-specific endpoint descriptor is usually
stuffed directly after the standard USB endpoint descriptor, and this is
where the driver currently expects it to be.

There are, however, devices in the wild that have it the other way
around in their descriptor sets, so the USB_DT_CS_ENDPOINT comes
*before* the standard endpoint. Devices known to implement it that way
are "Sennheiser BTD-500" and Plantronics USB headsets.

When the driver can't find the USB_DT_CS_ENDPOINT, it won't be able to
change sample rates, as the bitmask for the validity of this command is
stored in bmAttributes of that descriptor.

Fix this by searching the entire interface instead of just the extra
bytes of the first endpoint, in case the latter fails.
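
The fallback search can be sketched in userspace C as follows.  This is
an illustration under assumptions, not the driver's code: the sketch_*
names are hypothetical, and the "extra" bytes of the endpoint and of the
interface are modelled as flat buffers of concatenated descriptors, each
starting with bLength and bDescriptorType.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DT_CS_ENDPOINT 0x25	/* class-specific endpoint descriptor */

/* Walk a descriptor buffer and return the first descriptor of `type`. */
const uint8_t *sketch_find_desc(const uint8_t *buf, size_t len, uint8_t type)
{
	size_t off = 0;

	while (len - off >= 2) {
		uint8_t dlen = buf[off];	/* bLength */

		if (dlen < 2 || dlen > len - off)
			break;			/* malformed descriptor */
		if (buf[off + 1] == type)	/* bDescriptorType */
			return buf + off;
		off += dlen;
	}
	return NULL;
}

/* Try the endpoint's extra bytes first, then the whole interface. */
const uint8_t *sketch_find_cs_endpoint(const uint8_t *ep_extra, size_t ep_len,
				       const uint8_t *if_extra, size_t if_len)
{
	const uint8_t *d;

	d = sketch_find_desc(ep_extra, ep_len, SKETCH_DT_CS_ENDPOINT);
	if (!d)
		d = sketch_find_desc(if_extra, if_len, SKETCH_DT_CS_ENDPOINT);
	return d;
}
```

The second lookup only runs when the first one fails, so devices that
place the descriptor in the usual position behave as before.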

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-and-tested-by: Torstein Hegge <hegge@resisty.net>
Reported-and-tested-by: Yves G <alsa-user@vivigatt.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  USB: ftdi_sio: enable two UART ports on ST Microconnect Lite
Adrian Thomasset [Wed, 24 Apr 2013 10:37:35 +0000 (11:37 +0100)]
USB: ftdi_sio: enable two UART ports on ST Microconnect Lite

commit 71d9a2b95fc9c9474d46d764336efd7a5a805555 upstream.

The FT4232H used in the ST Micro Connect Lite has four hi-speed UART ports.
The first two ports are reserved for the JTAG interface.

By default we enable ports 2 and 3 as UARTs (port 2 is a
conventional RS-232 UART).

Signed-off-by: Adrian Thomasset <adrian.thomasset@st.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  powerpc/spufs: Initialise inode->i_ino in spufs_new_inode()
Michael Ellerman [Tue, 23 Apr 2013 15:13:14 +0000 (15:13 +0000)]
powerpc/spufs: Initialise inode->i_ino in spufs_new_inode()

commit 6747e83235caecd30b186d1282e4eba7679f81b7 upstream.

In commit 85fe402 (fs: do not assign default i_ino in new_inode), the
initialisation of i_ino was removed from new_inode() and pushed down
into the callers. However spufs_new_inode() was not updated.

This manifests as no files appearing in /spu, because all our dirents
have a zero inode, which readdir() seems to dislike.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
8 years ago  fbcon: when font is freed, clear also vc_font.data
Mika Kuoppala [Mon, 22 Apr 2013 11:19:26 +0000 (14:19 +0300)]
fbcon: when font is freed, clear also vc_font.data

commit e6637d5427d2af9f3f33b95447bfc5347e5ccd85 upstream.

commit ae1287865f5361fa138d4d3b1b6277908b54eac9
Author: Dave Airlie <airlied@redhat.com>
Date:   Thu Jan 24 16:12:41 2013 +1000

    fbcon: don't lose the console font across generic->chip driver switch

uses a pointer in vc->vc_font.data to load the font into the new driver.
However, if the font is actually freed, we need to clear the data
so that we don't reload the font from a dangling pointer.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=892340
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>