pandora-kernel.git
11 years agoKVM: s390: add capability indicating COW support
Christian Borntraeger [Tue, 15 May 2012 12:15:25 +0000 (14:15 +0200)]
KVM: s390: add capability indicating COW support

Currently qemu/kvm on s390 uses a guest mapping that does not
allow the guest backing page table to be write-protected to
support older systems. On those older systems a host write
protection fault will be delivered to the guest.

Newer systems allow to write-protect the guest backing memory
and let the fault be delivered to the host, thus allowing COW.

Use a capability bit to tell qemu if that is possible.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: Fix mmu_reload() clash with nested vmx event injection
Avi Kivity [Mon, 14 May 2012 15:07:56 +0000 (18:07 +0300)]
KVM: Fix mmu_reload() clash with nested vmx event injection

Currently the inject_pending_event() call during guest entry happens after
kvm_mmu_reload().  This is for historical reasons - we used to
inject_pending_event() in atomic context, while kvm_mmu_reload() needs task
context.

A problem is that nested vmx can cause the mmu context to be reset, if event
injection is intercepted and causes a #VMEXIT instead (the #VMEXIT resets
CR0/CR3/CR4).  If this happens, we end up with invalid root_hpa, and since
kvm_mmu_reload() has already run, no one will fix it and we end up entering
the guest this way.

Fix by reordering event injection to be before kvm_mmu_reload().  Use
->cancel_injection() to undo if kvm_mmu_reload() fails.

https://bugzilla.kernel.org/show_bug.cgi?id=42980

Reported-by: Luke-Jr <luke-jr+linuxbugs@utopios.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: MMU: Don't use RCU for lockless shadow walking
Avi Kivity [Mon, 14 May 2012 12:44:06 +0000 (15:44 +0300)]
KVM: MMU: Don't use RCU for lockless shadow walking

Using RCU for lockless shadow walking can increase the amount of memory
in use by the system, since RCU grace periods are unpredictable.  We also
have an unconditional write to a shared variable (reader_counter), which
isn't good for scaling.

Replace that with a scheme similar to x86's get_user_pages_fast(): disable
interrupts during lockless shadow walk to force the freer
(kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
processor with interrupts enabled.

We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
kvm_flush_remote_tlbs() from avoiding the IPI.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: VMX: Optimize %ds, %es reload
Avi Kivity [Sun, 13 May 2012 16:53:24 +0000 (19:53 +0300)]
KVM: VMX: Optimize %ds, %es reload

On x86_64, we can defer %ds and %es reload to the heavyweight context switch,
since nothing in the lightweight paths uses the host %ds or %es (they are
ignored by the processor).  Furthermore we can avoid the load if the segments
are null, by letting the hardware load the null segments for us.  This is the
expected case.

On i386, we could avoid the reload entirely, since the entry.S paths take care
of reload, except for the SYSEXIT path which leaves %ds and %es set to __USER_DS.
So we set them to the same values as well.

Saves about 70 cycles out of 1600 (around 4%; noisy measurements).

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: VMX: Fix %ds/%es clobber
Avi Kivity [Sun, 13 May 2012 16:53:23 +0000 (19:53 +0300)]
KVM: VMX: Fix %ds/%es clobber

The vmx exit code unconditionally restores %ds and %es to __USER_DS.  This
can override the user's values, since %ds and %es are not saved and restored
in x86_64 syscalls.  In practice, this isn't dangerous since nobody uses
segment registers in long mode, least of all programs that use KVM.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()
Joerg Roedel [Mon, 7 May 2012 10:12:25 +0000 (12:12 +0200)]
KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()

The instruction emulation for bsrw is broken in KVM because
the code always uses bsr with 32 or 64 bit operand size for
emulation. Fix that by using emulate_2op_SrcV_nobyte() macro
to use guest operand size for emulation.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
11 years agoKVM: VMX: unlike vmcs on fail path
Xiao Guangrong [Mon, 14 May 2012 06:58:58 +0000 (14:58 +0800)]
KVM: VMX: unlike vmcs on fail path

fix:

[ 1529.577273] Call Trace:
[ 1529.577289]  [<ffffffffa060d58f>] kvm_arch_hardware_disable+0x13/0x30 [kvm]
[ 1529.577302]  [<ffffffffa05fa2d4>] hardware_disable_nolock+0x35/0x39 [kvm]
[ 1529.577311]  [<ffffffffa05fa29f>] ? cpumask_clear_cpu.constprop.31+0x13/0x13 [kvm]
[ 1529.577315]  [<ffffffff81096ba8>] on_each_cpu+0x44/0x84
[ 1529.577326]  [<ffffffffa05f98b5>] hardware_disable_all_nolock+0x34/0x36 [kvm]
[ 1529.577335]  [<ffffffffa05f98e2>] hardware_disable_all+0x2b/0x39 [kvm]
[ 1529.577349]  [<ffffffffa05fafe5>] kvm_put_kvm+0xed/0x10f [kvm]
[ 1529.577358]  [<ffffffffa05fb3d7>] kvm_vm_release+0x22/0x28 [kvm]

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
11 years agoMerge branch 'for-upstream' of git://github.com/agraf/linux-2.6 into next
Avi Kivity [Tue, 8 May 2012 14:00:13 +0000 (17:00 +0300)]
Merge branch 'for-upstream' of git://github.com/agraf/linux-2.6 into next

PPC updates from Alex.

* 'for-upstream' of git://github.com/agraf/linux-2.6:
  KVM: PPC: Emulator: clean up SPR reads and writes
  KVM: PPC: Emulator: clean up instruction parsing
  kvm/powerpc: Add new ioctl to retreive server MMU infos
  kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
  KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
  KVM: PPC: Book3S: Enable IRQs during exit handling
  KVM: PPC: Fix PR KVM on POWER7 bare metal
  KVM: PPC: Fix stbux emulation
  KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
  KVM: PPC: Book3S: PR: No isync in slbie path
  KVM: PPC: Book3S: PR: Optimize entry path
  KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs.
  KVM: PPC: Restrict PPC_[L|ST]D macro to asm code
  KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their 64 bit copies.
  KVM: PPC: Use clockevent multiplier and shifter for decrementer
  KVM: Use minimum and maximum address mapped by TLB1

Signed-off-by: Avi Kivity <avi@redhat.com>
11 years agoKVM: PPC: Emulator: clean up SPR reads and writes
Alexander Graf [Fri, 4 May 2012 12:55:12 +0000 (14:55 +0200)]
KVM: PPC: Emulator: clean up SPR reads and writes

When reading and writing SPRs, every SPR emulation piece had to read
or write the respective GPR the value was read from or stored in itself.

This approach is pretty prone to failure. What if we accidentally
implement mfspr emulation where we just do "break" and nothing else?
Suddenly we would get a random value in the return register - which is
always a bad idea.

So let's consolidate the generic code paths and only give the core
specific SPR handling code readily made variables to read/write from/to.

Functionally, this patch doesn't change anything, but it increases the
readability of the code and makes is less prone to bugs.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Emulator: clean up instruction parsing
Alexander Graf [Fri, 4 May 2012 12:01:33 +0000 (14:01 +0200)]
KVM: PPC: Emulator: clean up instruction parsing

Instructions on PPC are pretty similarly encoded. So instead of
every instruction emulation code decoding the instruction fields
itself, we can move that code to more generic places and rely on
the compiler to optimize the unused bits away.

This has 2 advantages. It makes the code smaller and it makes the
code less error prone, as the instruction fields are always
available, so accidental misusage is reduced.

Functionally, this patch doesn't change anything.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/powerpc: Add new ioctl to retreive server MMU infos
Benjamin Herrenschmidt [Thu, 26 Apr 2012 19:43:42 +0000 (19:43 +0000)]
kvm/powerpc: Add new ioctl to retreive server MMU infos

This is necessary for qemu to be able to pass the right information
to the guest, such as the supported page sizes and corresponding
encodings in the SLB and hash table, which can vary depending
on the processor type, the type of KVM used (PR vs HV) and the
version of KVM

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix compilation on hv, adjust for newer ioctl numbers]
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agokvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
Benjamin Herrenschmidt [Thu, 15 Mar 2012 21:58:34 +0000 (21:58 +0000)]
kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM

There is nothing in the code for emulating TCE tables in the kernel
that prevents it from working on "PR" KVM... other than ifdef's and
location of the code.

This and moves the bulk of the code there to a new file called
book3s_64_vio.c.

This speeds things up a bit on my G5.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix for hv kvm, 32bit, whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
Mihai Caraman [Mon, 16 Apr 2012 04:08:53 +0000 (04:08 +0000)]
KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler

Guest r8 register is held in the scratch register and stored correctly,
so remove the instruction that clobbers it. Guest r13 was missing from vcpu,
store it there.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S: Enable IRQs during exit handling
Alexander Graf [Mon, 30 Apr 2012 08:56:12 +0000 (10:56 +0200)]
KVM: PPC: Book3S: Enable IRQs during exit handling

While handling an exit, we should listen for interrupts and make sure to
receive them when they arrive, to keep our latencies low.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Fix PR KVM on POWER7 bare metal
Alexander Graf [Fri, 27 Apr 2012 14:33:35 +0000 (16:33 +0200)]
KVM: PPC: Fix PR KVM on POWER7 bare metal

When running on a system that is HV capable, some interrupts use HSRR
SPRs instead of the normal SRR SPRs. These are also used in the Linux
handlers to jump back to code after an interrupt got processed.

Unfortunately, in our "jump back to the real host handler after we've
done the context switch" code, we were only setting the SRR SPRs,
rendering Linux to jump back to some invalid IP after it's processed
the interrupt.

This fixes random crashes on p7 opal mode with PR KVM for me.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Fix stbux emulation
Alexander Graf [Thu, 26 Apr 2012 23:00:17 +0000 (01:00 +0200)]
KVM: PPC: Fix stbux emulation

Stbux writes the address it's operating on to the register specified in ra,
not into the data source register.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
Mihai Caraman [Mon, 16 Apr 2012 04:08:54 +0000 (04:08 +0000)]
KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields

Interrupt code used PPC_LL/PPC_STL macros to load/store some of u32 fields
which led to memory overflow on 64-bit. Use lwz/stw instead.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S: PR: No isync in slbie path
Alexander Graf [Wed, 25 Apr 2012 12:29:57 +0000 (14:29 +0200)]
KVM: PPC: Book3S: PR: No isync in slbie path

While messing around with the SLBs we're running in real mode. The
entry to guest space goes through rfid, which is context synchronizing,
so there's no need to manually synchronize anything through isync.

With this patch and a simple priviledged SPR access loop guest, I get
a speed bump from 2035607 to 2181301 exits per second.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Book3S: PR: Optimize entry path
Alexander Graf [Wed, 25 Apr 2012 12:28:23 +0000 (14:28 +0200)]
KVM: PPC: Book3S: PR: Optimize entry path

By shuffling a few instructions around we can execute more memory
loads in parallel, giving us a small performance boost.

With this patch and a simple priviledged SPR access loop guest, I get
a speed bump from 2013052 to 2035607 exits per second.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs.
Varun Sethi [Wed, 25 Apr 2012 01:27:34 +0000 (01:27 +0000)]
KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs.

For Guest accessible SPRGs 4-7, save/restore must be handled differently for 64bit and
non-64 bit case. Use the PPC_STD/PPC_LD macros for saving/restoring to/from these registers.

Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Restrict PPC_[L|ST]D macro to asm code
Alexander Graf [Wed, 25 Apr 2012 11:48:54 +0000 (13:48 +0200)]
KVM: PPC: Restrict PPC_[L|ST]D macro to asm code

We only want asm code macros to be accessible from asm code, so #ifdef it
depending on it.

Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their...
Varun Sethi [Wed, 25 Apr 2012 01:26:43 +0000 (01:26 +0000)]
KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their 64 bit copies.

Introduced PPC_STD/PPC_LD macros for saving/restoring guest registers to/from their 64 bit copies.

Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: PPC: Use clockevent multiplier and shifter for decrementer
Bharat Bhushan [Wed, 18 Apr 2012 06:01:19 +0000 (06:01 +0000)]
KVM: PPC: Use clockevent multiplier and shifter for decrementer

Time for which the hrtimer is started for decrementer emulation is calculated
using tb_ticks_per_usec. While hrtimer uses the clockevent for DEC
reprogramming (if needed) and which calculate timebase ticks using the
multiplier and shifter mechanism implemented within clockevent layer.

It was observed that this conversion (timebase->time->timebase) are not
correct because the mechanism are not consistent.
In our setup it adds 2% jitter.

With this patch clockevent multiplier and shifter mechanism are used when
starting hrtimer for decrementer emulation. Now the jitter is < 0.5%.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: Use minimum and maximum address mapped by TLB1
Bharat Bhushan [Thu, 22 Mar 2012 18:39:11 +0000 (18:39 +0000)]
KVM: Use minimum and maximum address mapped by TLB1

Keep track of minimum and maximum address mapped by tlb1.
This helps in TLBMISS handling in KVM to quick check whether the address lies in mapped range.
If address does not lies in this range then no need to look in each tlb1 entry of tlb1 array.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
11 years agoKVM: x86 emulator: Avoid pushing back ModRM byte fetched for group decoding
Takuya Yoshikawa [Mon, 30 Apr 2012 08:48:25 +0000 (17:48 +0900)]
KVM: x86 emulator: Avoid pushing back ModRM byte fetched for group decoding

Although ModRM byte is fetched for group decoding, it is soon pushed
back to make decode_modrm() fetch it later again.

Now that ModRM flag can be found in the top level opcode tables, fetch
ModRM byte before group decoding to make the code simpler.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
11 years agoKVM: x86 emulator: Move ModRM flags for groups to top level opcode tables
Takuya Yoshikawa [Mon, 30 Apr 2012 08:46:31 +0000 (17:46 +0900)]
KVM: x86 emulator: Move ModRM flags for groups to top level opcode tables

Needed for the following patch which simplifies ModRM fetching code.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
11 years agoKVM guest: make kvm_para_available() check hypervisor bit reading cpuid leaf
Gleb Natapov [Mon, 30 Apr 2012 11:45:49 +0000 (14:45 +0300)]
KVM guest: make kvm_para_available() check hypervisor bit reading cpuid leaf

This cpuid range does not exist on real HW and Intel spec says that
"Information returned for highest basic information leaf" will be
returned. Not very well defined.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
11 years agoKVM: fix cpuid eax for KVM leaf
Michael S. Tsirkin [Wed, 2 May 2012 14:55:56 +0000 (17:55 +0300)]
KVM: fix cpuid eax for KVM leaf

cpuid eax should return the max leaf so that
guests can find out the valid range.
This matches Xen et al.
Update documentation to match.

Tested with -cpu host.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
11 years agoKVM: s390: implement KVM_CAP_NR/MAX_VCPUS
Christian Borntraeger [Wed, 2 May 2012 08:50:38 +0000 (10:50 +0200)]
KVM: s390: implement KVM_CAP_NR/MAX_VCPUS

Let userspace know the number of max and supported cpus for kvm on s390.
Return KVM_MAX_VCPUS (currently 64) for both values.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: s390: Handle sckpf instruction
Cornelia Huck [Tue, 24 Apr 2012 07:24:44 +0000 (09:24 +0200)]
KVM: s390: Handle sckpf instruction

Handle the mandatory intercept SET CLOCK PROGRAMMABLE FIELD
instruction.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: s390: use kvm_vcpu_on_spin for diag 0x44
Christian Borntraeger [Wed, 25 Apr 2012 13:30:39 +0000 (15:30 +0200)]
KVM: s390: use kvm_vcpu_on_spin for diag 0x44

Lets replace the old open coded version of diag 0x44 (which relied on
compat_sched_yield) with kvm_vcpu_on_spin.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
11 years agoKVM: s390: Implement the directed yield (diag 9c) hypervisor call for KVM
Konstantin Weitz [Wed, 25 Apr 2012 13:30:38 +0000 (15:30 +0200)]
KVM: s390: Implement the directed yield (diag 9c) hypervisor call for KVM

This patch implements the directed yield hypercall found on other
System z hypervisors. It delegates execution time to the virtual cpu
specified in the instruction's parameter.

Useful to avoid long spinlock waits in the guest.

Christian Borntraeger: moved common code in virt/kvm/

Signed-off-by: Konstantin Weitz <WEITZKON@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86: Run PIT work in own kthread
Jan Kiszka [Tue, 24 Apr 2012 14:40:17 +0000 (16:40 +0200)]
KVM: x86: Run PIT work in own kthread

We can't run PIT IRQ injection work in the interrupt context of the host
timer. This would allow the user to influence the handler complexity by
asking for a broadcast to a large number of VCPUs. Therefore, this work
was pushed into workqueue context in 9d244caf2e. However, this prevents
prioritizing the PIT injection over other task as workqueues share
kernel threads.

This replaces the workqueue with a kthread worker and gives that thread
a name in the format "kvm-pit/<owner-process-pid>". That allows to
identify and adjust the kthread priority according to the VM process
parameters.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86: Document in-kernel PIT API
Jan Kiszka [Tue, 24 Apr 2012 14:40:16 +0000 (16:40 +0200)]
KVM: x86: Document in-kernel PIT API

Add descriptions for KVM_CREATE_PIT2 and KVM_GET/SET_PIT2.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: Improve readability of KVM API doc
Jan Kiszka [Tue, 24 Apr 2012 14:40:15 +0000 (16:40 +0200)]
KVM: Improve readability of KVM API doc

This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86 emulator: fix asm constraint in flush_pending_x87_faults
Avi Kivity [Sun, 22 Apr 2012 12:12:50 +0000 (15:12 +0300)]
KVM: x86 emulator: fix asm constraint in flush_pending_x87_faults

'bool' wants 8-bit registers.

Reported-by: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
12 years agoKVM: Introduce bitmask for apic attention reasons
Gleb Natapov [Thu, 19 Apr 2012 11:06:29 +0000 (14:06 +0300)]
KVM: Introduce bitmask for apic attention reasons

The patch introduces a bitmap that will hold reasons apic should be
checked during vmexit. This is in a preparation for vp eoi patch
that will add one more check on vmexit. With the bitmap we can do
if(apic_attention) to check everything simultaneously which will
add zero overhead on the fast path.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
12 years agoKVM: Introduce direct MSI message injection for in-kernel irqchips
Jan Kiszka [Thu, 29 Mar 2012 19:14:12 +0000 (21:14 +0200)]
KVM: Introduce direct MSI message injection for in-kernel irqchips

Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unhandy if the MSI messages are generated "on the fly" by user space,
IRQ routes are a limited resource that user space has to manage
carefully.

By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
12 years agoKVM: add kvm_arch_para_features stub to asm-generic/kvm_para.h
Marcelo Tosatti [Fri, 20 Apr 2012 21:21:46 +0000 (18:21 -0300)]
KVM: add kvm_arch_para_features stub to asm-generic/kvm_para.h

Needed by kvm_para_has_feature().

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: ia64: fix build due to typo
Avi Kivity [Wed, 18 Apr 2012 16:23:50 +0000 (19:23 +0300)]
KVM: ia64: fix build due to typo

s/kcm/kvm/.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: Fix page-crossing MMIO
Avi Kivity [Wed, 18 Apr 2012 16:22:47 +0000 (19:22 +0300)]
KVM: Fix page-crossing MMIO

MMIO that are split across a page boundary are currently broken - the
code does not expect to be aborted by the exit to userspace for the
first MMIO fragment.

This patch fixes the problem by generalizing the current code for handling
16-byte MMIOs to handle a number of "fragments", and changes the MMIO
code to create those fragments.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoMerge branch 'linus' into queue
Marcelo Tosatti [Thu, 19 Apr 2012 20:06:26 +0000 (17:06 -0300)]
Merge branch 'linus' into queue

Merge reason: development work has dependency on kvm patches merged
upstream.

Conflicts:
Documentation/feature-removal-schedule.txt

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Thu, 19 Apr 2012 19:08:11 +0000 (12:08 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "It's like a grab bag of one liners:

  - core: fix page flip error path, reorder object teardown.
  - usb: fix the drm_usb module license.
  - i915: VT switch on SNB with non-native modes fix, and a regression
    fix from 3.3.
  - radeon: missing unreserve on SI, AGP/VRAM setup fix (fixes radeon on
    IA64, but its a generic bug), an rn50 regression from 3.3, turn off
    MSIs on rv515 (it loses rearms every so often)."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  nouveau: Set special lane map for the right chipset
  drm/radeon: fix load detect on rn50 with hardcoded EDIDs.
  drm: Releasing FBs before releasing GEM objects during drm_release
  drm/nouveau/pm: don't read/write beyond end of stack buffer
  drivers: gpu: drm: gma500: mdfld_dsi_output.h: Remove not unneeded include of version.h
  radeon: fix r600/agp when vram is after AGP (v3)
  drm: fix page_flip error handling
  drm/radeon/kms: fix the regression of DVI connector check
  drm/usb: fix module license on drm/usb layer.
  drm/i915: Do not set "Enable Panel Fitter" on SNB pageflips
  drm/i915: Hold mode_config lock whilst changing mode for lastclose()
  drm/radeon/si: add missing radeon_bo_unreserve in si_rlc_init() v2
  drm/radeon: disable MSI on RV515
  drm/i915: don't clobber the special upscaling lvds timings

12 years agoMerge git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 19 Apr 2012 17:28:59 +0000 (10:28 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: lock slots_lock around device assignment
  KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context
  KVM: unmap pages from the iommu when slots are removed
  KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on reset

12 years agonouveau: Set special lane map for the right chipset
Henrik Rydberg [Thu, 12 Apr 2012 22:37:00 +0000 (00:37 +0200)]
nouveau: Set special lane map for the right chipset

The refactoring of the nv50 logic, introduced in 8663bc7c, modified the
test for the special lane map used on some Apple computers with Nvidia
chipsets. The tested MBA3,1 would still boot, but resume from suspend
stopped working. This patch restores the old test, which fixes the problem.

Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrm/radeon: fix load detect on rn50 with hardcoded EDIDs.
Dave Airlie [Thu, 19 Apr 2012 14:42:58 +0000 (15:42 +0100)]
drm/radeon: fix load detect on rn50 with hardcoded EDIDs.

When the force changes went in back in 3.3.0, we ended up returning
disconnected in the !force case, and the connected in when forced,
as it hit the hardcoded check.

Fix it so all exits go via the hardcoded check and stop spurious
modesets on platforms with hardcoded EDIDs.

Reported-by: Evan McNabb (Red Hat)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrm: Releasing FBs before releasing GEM objects during drm_release
Prathyush [Sat, 14 Apr 2012 11:52:13 +0000 (17:22 +0530)]
drm: Releasing FBs before releasing GEM objects during drm_release

During DRM release, all the FBs and gem objects are released. If
a gem object is being used as a FB and set to a crtc, it must not
be freed before releasing the framebuffer first.

If FBs are released first, the crtc using the FB is disabled first
so now the GEM object can be freed safely. The CRTC will be enabled
again when the driver restores fbdev mode.

Signed-off-by: Prathyush K <prathyush.k@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrm/nouveau/pm: don't read/write beyond end of stack buffer
Jim Meyering [Tue, 17 Apr 2012 19:27:54 +0000 (21:27 +0200)]
drm/nouveau/pm: don't read/write beyond end of stack buffer

NUL-terminate after strncpy.

If the parameter "profile" has length 16 or more, then strncpy
leaves "string" with no NUL terminator, so the following search
for '\n' may read beyond the end of that 16-byte buffer.
If it finds a newline there, then it will also write beyond the
end of that stack buffer.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrivers: gpu: drm: gma500: mdfld_dsi_output.h: Remove not unneeded include of version.h
Marcos Paulo de Souza [Wed, 18 Apr 2012 04:30:02 +0000 (01:30 -0300)]
drivers: gpu: drm: gma500: mdfld_dsi_output.h: Remove not unneeded include of version.h

The output of "make versioncheck" points a incorrect include of
version.h in the drivers/gpu/drm/gma500/mdfld_dsi_output.h:

drivers/gpu/drm/gma500/mdfld_dsi_output.h: 32 linux/version.h not needed.

If we take a look in the file, we can agree to remove it.

Cc: David Airlie <airlied@linux.ie>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agoradeon: fix r600/agp when vram is after AGP (v3)
Jerome Glisse [Tue, 17 Apr 2012 20:51:38 +0000 (16:51 -0400)]
radeon: fix r600/agp when vram is after AGP (v3)

If AGP is placed in the middle, the size_af is off-by-one, it results
in VRAM being placed at 0x7fffffff instead of 0x8000000.

v2: fix the vram_start setup.
v3: also fix r7xx & newer ASIC

Reported-by: russiane39 on #radeon
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrm: fix page_flip error handling
Joonyoung Shim [Wed, 18 Apr 2012 04:47:02 +0000 (13:47 +0900)]
drm: fix page_flip error handling

Free event and restore event_space only when page_flip->flags has
DRM_MODE_PAGE_FLIP_EVENT if page_flip() is failed.

Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agoMerge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel...
Dave Airlie [Thu, 19 Apr 2012 13:13:52 +0000 (14:13 +0100)]
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Do not set "Enable Panel Fitter" on SNB pageflips
  drm/i915: Hold mode_config lock whilst changing mode for lastclose()
  drm/i915: don't clobber the special upscaling lvds timings

12 years agodrm/radeon/kms: fix the regression of DVI connector check
Takashi Iwai [Wed, 18 Apr 2012 13:21:07 +0000 (15:21 +0200)]
drm/radeon/kms: fix the regression of DVI connector check

The check of the encoder type in the commit [e00e8b5e: drm/radeon/kms:
fix analog load detection on DVI-I connectors] is obviously wrong, and
it's the culprit of the regression on my workstation with DVI-analog
connection resulting in the blank output.

Fixed the typo now.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrm/usb: fix module license on drm/usb layer.
Dave Airlie [Thu, 19 Apr 2012 08:33:32 +0000 (09:33 +0100)]
drm/usb: fix module license on drm/usb layer.

Allows this module to load correctly with certain debugging options on.

Reported on irc by scientes
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agomemcg: fix Bad page state after replace_page_cache
Hugh Dickins [Thu, 19 Apr 2012 06:34:46 +0000 (23:34 -0700)]
memcg: fix Bad page state after replace_page_cache

My 9ce70c0240d0 "memcg: fix deadlock by inverting lrucare nesting" put a
nasty little bug into v3.3's version of mem_cgroup_replace_page_cache(),
sometimes used for FUSE.  Replacing __mem_cgroup_commit_charge_lrucare()
by __mem_cgroup_commit_charge(), I used the "pc" pointer set up earlier:
but it's for oldpage, and needs now to be for newpage.  Once oldpage was
freed, its PageCgroupUsed bit (cleared above but set again here) caused
"Bad page state" messages - and perhaps worse, being missed from newpage.
(I didn't find this by using FUSE, but in reusing the function for tmpfs.)

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [v3.3 only]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Thu, 19 Apr 2012 03:16:02 +0000 (20:16 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security: fix compile error in commoncap.c

12 years agoKVM: lock slots_lock around device assignment
Alex Williamson [Wed, 18 Apr 2012 03:46:44 +0000 (21:46 -0600)]
KVM: lock slots_lock around device assignment

As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu.  This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots.  Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings.  Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agosecurity: fix compile error in commoncap.c
Jonghwan Choi [Wed, 18 Apr 2012 21:23:04 +0000 (17:23 -0400)]
security: fix compile error in commoncap.c

Add missing "personality.h"
security/commoncap.c: In function 'cap_bprm_set_creds':
security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function)
security/commoncap.c:510: error: (Each undeclared identifier is reported only once
security/commoncap.c:510: error: for each function it appears in.)

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
12 years agoKVM: VMX: Fix kvm_set_shared_msr() called in preemptible context
Avi Kivity [Wed, 18 Apr 2012 12:03:04 +0000 (15:03 +0300)]
KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context

kvm_set_shared_msr() may not be called in preemptible context,
but vmx_set_msr() does so:

  BUG: using smp_processor_id() in preemptible [00000000] code: qemu-kvm/22713
  caller is kvm_set_shared_msr+0x32/0xa0 [kvm]
  Pid: 22713, comm: qemu-kvm Not tainted 3.4.0-rc3+ #39
  Call Trace:
   [<ffffffff8131fa82>] debug_smp_processor_id+0xe2/0x100
   [<ffffffffa0328ae2>] kvm_set_shared_msr+0x32/0xa0 [kvm]
   [<ffffffffa03a103b>] vmx_set_msr+0x28b/0x2d0 [kvm_intel]
   ...

Making kvm_set_shared_msr() work in preemptible is cleaner, but
it's used in the fast path.  Making two variants is overkill, so
this patch just disables preemption around the call.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: MMU: use page table level macro
Davidlohr Bueso [Wed, 18 Apr 2012 10:24:29 +0000 (12:24 +0200)]
KVM: MMU: use page table level macro

Its much cleaner to use PT_PAGE_TABLE_LEVEL than its numeric value.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi...
Linus Torvalds [Thu, 19 Apr 2012 00:29:05 +0000 (17:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: use flexible array in fuse.h
  fuse: allow nanosecond granularity
  fuse: O_DIRECT support for files
  fuse: fix nlink after unlink

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 19 Apr 2012 00:27:50 +0000 (17:27 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 updates from Martin Schwidefsky:
 "A couple of bug fixes, one of them is a TLB flush fix.  Included as
  well is one small coding style patch and a patch to update the default
  configuration."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  [S390] Fix compile error in swab.h
  [S390] Fix stfle() lowcore protection problem
  [S390] cpum_cf: get rid of compile warnings
  [S390] irq: simple coding style change
  [S390] update default configuration
  [S390] fix tlb flushing for page table pages
  [S390] kernel: Use local_irq_save() for memcpy_real()
  [S390] s390/char/vmur.c: fix memory leak
  [S390] drivers/s390/block/dasd_eckd.c: add missing dasd_sfree_request

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Wed, 18 Apr 2012 20:23:44 +0000 (13:23 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull security subsystem fixes from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  fcaps: clear the same personality flags as suid when fcaps are used
  mpi: Avoid using freed pointer in mpi_lshift_limbs()
  Smack: move label list initialization

12 years agoxz: Enable BCJ filters on SPARC and 32-bit x86
Lasse Collin [Wed, 18 Apr 2012 16:55:44 +0000 (19:55 +0300)]
xz: Enable BCJ filters on SPARC and 32-bit x86

The BCJ filters were meant to be enabled already on these
archs, but the xz_wrap.sh script was buggy. Enabling the
filters should give smaller kernel images.

xz_wrap.sh will now use $SRCARCH instead of $ARCH to detect
the architecture. That way it doesn't need to care about the
subarchs (like i386 vs. x86_64) since the BCJ filters don't
care either.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Wed, 18 Apr 2012 19:58:29 +0000 (12:58 -0700)]
Merge tag 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

Pull libara fixes from Jeff Garzik:

 - Notable regression fix.  Forbid dynamic runtime power management by
   default, due to issues with suspend/resume and hotplug.

   To re-enable, use sysfs.

 - make ata_print_id atomic, due to ref from multiple contexts

 - sata_mv warning fix

 - ata_piix new PCI ID

* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: forbid port runtime pm by default, fixing regression
  libata: make ata_print_id atomic
  sata_mv: silence an uninitialized variable warning
  ata_piix: IDE-mode SATA patch for Intel DH89xxCC DeviceIDs

12 years agolibata: forbid port runtime pm by default, fixing regression
Lin Ming [Wed, 18 Apr 2012 01:29:47 +0000 (09:29 +0800)]
libata: forbid port runtime pm by default, fixing regression

Forbid port runtime pm by default because it has known hotplug issue.
User can allow it by, for example

echo auto > /sys/devices/pci0000:00/0000:00:1f.2/ata2/power/control

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
12 years agoRevert "ACPI: Make ACPI interrupt threaded"
Thomas Gleixner [Wed, 18 Apr 2012 10:29:32 +0000 (12:29 +0200)]
Revert "ACPI: Make ACPI interrupt threaded"

This reverts commit 6fe0d0628245fdcd6fad8b837c81e8f7ebc3364d.

Paul bisected this regression.

The conversion was done blindly and is wrong, as it does not provide a
primary handler to disable the level type irq on the device level.
Neither does it set the IRQF_ONESHOT flag which handles that at the irq
line level.  This can't be done as the interrupt might be shared, though
we might extend the core to force it.

So an interrupt on this line will wake up the thread, but immediately
unmask the irq after that.  Due to the interrupt being level type the
hardware interrupt is raised over and over and prevents the irq thread
from handling it.  Fail.

request_irq() unfortunately does not refuse such a request and the patch
was obviously never tested with real interrupts.

Bisected-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agodrm/i915: Do not set "Enable Panel Fitter" on SNB pageflips
Chris Wilson [Tue, 17 Apr 2012 19:37:00 +0000 (20:37 +0100)]
drm/i915: Do not set "Enable Panel Fitter" on SNB pageflips

Not only do the pageflip work without it at non-native modes (i.e. with
the panel fitter enabled), it also causes normal (non-pageflipped)
modesets to fail.

Reported-by: Adam Jackson <ajax@redhat.com>
Tested-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Wanted-by-for-fixes: Dave Airlie <airlied@gmail.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
12 years agofcaps: clear the same personality flags as suid when fcaps are used
Eric Paris [Tue, 17 Apr 2012 20:26:54 +0000 (16:26 -0400)]
fcaps: clear the same personality flags as suid when fcaps are used

If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given priviledge with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.

Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
12 years agompi: Avoid using freed pointer in mpi_lshift_limbs()
Jesper Juhl [Mon, 6 Feb 2012 09:07:04 +0000 (20:07 +1100)]
mpi: Avoid using freed pointer in mpi_lshift_limbs()

At the start of the function we assign 'a->d' to 'ap'. Then we use the
RESIZE_IF_NEEDED macro on 'a' - this may free 'a->d' and replace it
with newly allocaetd storage. In that case, we'll be operating on
freed memory further down in the function when we index into 'ap[]'.
Since we don't actually need 'ap' until after the use of the
RESIZE_IF_NEEDED macro we can just delay the assignment to it until
after we've potentially resized, thus avoiding the issue.

While I was there anyway I also changed the integer variable 'n' to be
const. It might as well be since we only assign to it once and use it
as a constant, and then the compiler will tell us if we ever assign to
it in the future.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
12 years agoSmack: move label list initialization
Casey Schaufler [Wed, 18 Apr 2012 01:55:46 +0000 (18:55 -0700)]
Smack: move label list initialization

A kernel with Smack enabled will fail if tmpfs has xattr support.

Move the initialization of predefined Smack label
list entries to the LSM initialization from the
smackfs setup. This became an issue when tmpfs
acquired xattr support, but was never correct.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
12 years agoMerge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Tue, 17 Apr 2012 20:30:34 +0000 (13:30 -0700)]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 regression fixes from Ted Ts'o:
 "This fixes a scalability problem reported by Andi Kleen and Tim Chen;
  they were quite secretive about the precise nature of their workload,
  but they later admitted that it only showed up when they were using a
  large sparse file, so the amount of data I/O that was needed was close
  to zero.

  I'm not sure how realistic this is and it's only a regression if you
  consider changes made since 2.6.39 to be a "regression" vis-a-vis the
  policy regarding post-merge window bug fixes, but Linus agreed it was
  worth fixing, so I'm including it in this pull request.

  This also fixes the journalled quota mount options, which I
  accidentally broke while I was cleaning up the mount option handling."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix handling of journalled quota options
  ext4: address scalability issue by removing extent cache statistics

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Tue, 17 Apr 2012 20:21:50 +0000 (13:21 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "A bunch of endianness fixes and a couple of nfsd error value fixes.

  Speaking of endianness stuff, I'm rather tempted to slap

ccflags-y += -D__CHECK_ENDIAN__

  in fs/Makefile, if not making it default for the entire tree; nfsd
  regressions I've caught make one hell of a pile and we'd obviously
  benefit from having that kind of stuff caught earlier..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  lockd: fix the endianness bug
  ocfs2: ->e_leaf_clusters endianness breakage
  ocfs2: ->rl_count endianness breakage
  ocfs: ->rl_used breakage on big-endian
  ocfs2: ->l_next_free_req breakage on big-endian
  btrfs: btrfs_root_readonly() broken on big-endian
  ext4: fix endianness breakage in ext4_split_extent_at()
  nfsd: fix compose_entry_fh() failure exits
  nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid()
  nfsd: fix endianness breakage in TEST_STATEID handling
  nfsd: fix error values returned by nfsd4_lockt() when nfsd_open() fails
  nfsd: fix b0rken error value for setattr on read-only mount

12 years agoMerge git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Tue, 17 Apr 2012 16:19:29 +0000 (09:19 -0700)]
Merge git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  Fix number parsing in cifs_parse_mount_options
  Cleanup handling of NULL value passed for a mount option

12 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 17 Apr 2012 01:35:21 +0000 (18:35 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Handle failures of parsing immediate operands in the instruction decoder
  perf archive: Correct cutting of symbolic link
  perf tools: Ignore auto-generated bison/flex files
  perf tools: Fix parsers' rules to dependencies
  perf tools: fix NO_GTK2 Makefile config error
  perf session: Skip event correctly for unknown id/machine

12 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Tue, 17 Apr 2012 01:34:12 +0000 (18:34 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio fixes from Michael S. Tsirkin:
 "Here are some virtio fixes for 3.4: a test build fix, a patch by Ren
  fixing naming for systems with a massive number of virtio blk devices,
  and balloon fixes for powerpc by David Gibson.

  There was some discussion about Ren's patch for virtio disc naming:
  some people wanted to move the legacy name mangling function to the
  block core.  But there's no concensus on that yet, and we can always
  deduplicate later.  Added comments in the hope that this will stop
  people from copying this legacy naming scheme into future drivers."

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: fix handling of PAGE_SIZE != 4k
  virtio_balloon: Fix endian bug
  virtio_blk: helper function to format disk names
  tools/virtio: fix up vhost/test module build

12 years agoPCI: Retry BARs restoration for Type 0 headers only
Rafael J. Wysocki [Mon, 16 Apr 2012 21:07:50 +0000 (23:07 +0200)]
PCI: Retry BARs restoration for Type 0 headers only

Some shortcomings introduced into pci_restore_state() by commit
26f41062f28d ("PCI: check for pci bar restore completion and retry")
have been fixed by recent commit ebfc5b802fa76 ("PCI: Fix regression in
pci_restore_state(), v3"), but that commit treats all PCI devices as
those with Type 0 configuration headers.

That is not entirely correct, because Type 1 and Type 2 headers have
different layouts.  In particular, the area occupied by BARs in Type 0
config headers contains the secondary status register in Type 1 ones and
it doesn't make sense to retry the restoration of that register even if
the value read back from it after a write is not the same as the written
one (it very well may be different).

For this reason, make pci_restore_state() only retry the restoration
of BARs for Type 0 config headers.  This effectively makes it behave
as before commit 26f41062f28d for all header types except for Type 0.

Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoDocumentation: maintainer change
Randy Dunlap [Tue, 17 Apr 2012 02:21:39 +0000 (19:21 -0700)]
Documentation: maintainer change

I'm dropping off as Documentation/ maintainer.
Rob Landley has agreed to take it over.  Thanks, Rob.

I'll still be around reviewing patches and testing.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoia64: fix futex_atomic_cmpxchg_inatomic()
Luck, Tony [Mon, 16 Apr 2012 23:28:01 +0000 (16:28 -0700)]
ia64: fix futex_atomic_cmpxchg_inatomic()

Michel Lespinasse cleaned up the futex calling conventions in commit
37a9d912b24f ("futex: Sanitize cmpxchg_futex_value_locked API").

But the ia64 implementation was subtly broken.  Gcc does not know that
register "r8" will be updated by the fault handler if the cmpxchg
instruction takes an exception.  So it feels safe in letting the
initialization of r8 slide to after the cmpxchg.  Result: we always
return 0 whether the user address faulted or not.

Fix by moving the initialization of r8 into the __asm__ code so gcc
won't move it.

Reported-by: <emeric.maschino@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42757
Tested-by: <emeric.maschino@gmail.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: stable@vger.kernel.org # v2.6.39+
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoKVM: dont clear TMR on EOI
Michael S. Tsirkin [Wed, 11 Apr 2012 15:49:55 +0000 (18:49 +0300)]
KVM: dont clear TMR on EOI

Intel spec says that TMR needs to be set/cleared
when IRR is set, but kvm also clears it on  EOI.

I did some tests on a real (AMD based) system,
and I see same TMR values both before
and after EOI, so I think it's a minor bug in kvm.

This patch fixes TMR to be set/cleared on IRR set
only as per spec.

And now that we don't clear TMR, we can save
an atomic read of TMR on EOI that's not propagated
to ioapic, by checking whether ioapic needs
a specific vector first and calculating
the mode afterwards.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86 emulator: implement MMX MOVQ (opcodes 0f 6f, 0f 7f)
Avi Kivity [Mon, 9 Apr 2012 15:40:03 +0000 (18:40 +0300)]
KVM: x86 emulator: implement MMX MOVQ (opcodes 0f 6f, 0f 7f)

Needed by some framebuffer drivers.  See

https://bugzilla.kernel.org/show_bug.cgi?id=42779

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86 emulator: MMX support
Avi Kivity [Mon, 9 Apr 2012 15:40:02 +0000 (18:40 +0300)]
KVM: x86 emulator: MMX support

General support for the MMX instruction set.  Special care is taken
to trap pending x87 exceptions so that they are properly reflected
to the guest.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86 emulator: implement movntps
Avi Kivity [Mon, 9 Apr 2012 15:40:01 +0000 (18:40 +0300)]
KVM: x86 emulator: implement movntps

Used to write to framebuffers (by at least Icaros).

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86: emulate movdqa
Stefan Hajnoczi [Mon, 9 Apr 2012 15:40:00 +0000 (18:40 +0300)]
KVM: x86: emulate movdqa

An Ubuntu 9.10 Karmic Koala guest is unable to boot or install due to
missing movdqa emulation:

kvm_exit: reason EXCEPTION_NMI rip 0x7fef3e025a7b info 7fef3e799000 80000b0e
kvm_page_fault: address 7fef3e799000 error_code f
kvm_emulate_insn: 0:7fef3e025a7b: 66 0f 7f 07 (prot64)

movdqa %xmm0,(%rdi)

[avi: mark it explicitly aligned]

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: x86 emulator: add support for vector alignment
Avi Kivity [Mon, 9 Apr 2012 15:39:59 +0000 (18:39 +0300)]
KVM: x86 emulator: add support for vector alignment

x86 defines three classes of vector instructions: explicitly
aligned (#GP(0) if unaligned, explicitly unaligned, and default
(which depends on the encoding: AVX is unaligned, SSE is aligned).

Add support for marking an instruction as explicitly aligned or
unaligned, and mark MOVDQU as unaligned.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoKVM: SVM: Auto-load on CPUs with SVM
Josh Triplett [Wed, 28 Mar 2012 18:32:28 +0000 (11:32 -0700)]
KVM: SVM: Auto-load on CPUs with SVM

Enable x86 feature-based autoloading for the kvm-amd module on CPUs
with X86_FEATURE_SVM.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
12 years agoext4: fix handling of journalled quota options
Theodore Ts'o [Mon, 16 Apr 2012 22:55:26 +0000 (18:55 -0400)]
ext4: fix handling of journalled quota options

Commit 26092bf5 broke handling of journalled quota mount options by
trying to parse argument of every mount option as a number.  Fix this
by dealing with the quota options before we call match_int().

Thanks to Jan Kara for discovering this regression.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
12 years agocheckpatch: revert --strict test for net/ and drivers/net block comment style
Joe Perches [Mon, 16 Apr 2012 19:35:11 +0000 (13:35 -0600)]
checkpatch: revert --strict test for net/ and drivers/net block comment style

Revert the --strict test for the old preferred block
comment style in drivers/net and net/

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agodrm/i915: Hold mode_config lock whilst changing mode for lastclose()
Chris Wilson [Mon, 16 Apr 2012 14:16:42 +0000 (15:16 +0100)]
drm/i915: Hold mode_config lock whilst changing mode for lastclose()

Upon lastclose(), we switch back to the fbcon configuration. This
requires taking the mode_config lock in order to serialise the change
with output probing elsewhere.

Reported-by: Oleksij Rempel <bug-track@fisher-privat.net>
References: https://bugs.freedesktop.org/show_bug.cgi?id=48652
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
12 years agoext4: address scalability issue by removing extent cache statistics
Theodore Ts'o [Mon, 16 Apr 2012 16:16:20 +0000 (12:16 -0400)]
ext4: address scalability issue by removing extent cache statistics

Andi Kleen and Tim Chen have reported that under certain circumstances
the extent cache statistics are causing scalability problems due to
cache line bounces.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
12 years agodrm/radeon/si: add missing radeon_bo_unreserve in si_rlc_init() v2
Alex Deucher [Fri, 13 Apr 2012 14:26:36 +0000 (10:26 -0400)]
drm/radeon/si: add missing radeon_bo_unreserve in si_rlc_init() v2

Forget to unreserve after pinning.  This can lead to problems in
soft reset and resume.

v2: rework patch as per Michel's suggestion.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrm/radeon: disable MSI on RV515
Dave Airlie [Fri, 13 Apr 2012 10:14:50 +0000 (11:14 +0100)]
drm/radeon: disable MSI on RV515

My rv515 card is very flaky with msi enabled. Every so often it loses a rearm
and never comes back, manually banging the rearm brings it back.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agodrm/i915: don't clobber the special upscaling lvds timings
Daniel Vetter [Sun, 15 Apr 2012 17:53:19 +0000 (19:53 +0200)]
drm/i915: don't clobber the special upscaling lvds timings

This regression has been introduced in

commit ca9bfa7eed20ea34e862804e62aae10eb159edbb
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Jan 28 14:49:20 2012 +0100

    drm/i915: fixup interlaced vertical timings confusion, part 1

Unfortunately that commit failed to take into account that the lvds
code does some special adjustements to the crtc timings for upscaling
an centering.

Fix this by explicitly computing crtc timings in the lvds mode fixup
function and setting a special flag in mode->private_flags if the crtc
timings have been adjusted.

v2: Add a comment to explain the new mode driver private flag,
suggested by Eugeni Dodonov.

v3: Kill the confusing and now redundant set_crtcinfo call in
intel_fixed_panel_mode, noticed by Chris Wilson.

Reported-and-Tested-by: Hans de Bruin <jmdebruin@xmsnet.nl>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43071
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
12 years agox86: Handle failures of parsing immediate operands in the instruction decoder
Masami Hiramatsu [Fri, 13 Apr 2012 03:24:27 +0000 (12:24 +0900)]
x86: Handle failures of parsing immediate operands in the instruction decoder

This can happen if the instruction is much longer than the maximum length,
or if insn->opnd_bytes is manually changed.

This patch also fixes warnings from -Wswitch-default flag.

Reported-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120413032427.32577.42602.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
12 years agoLinux 3.4-rc3 v3.4-rc3
Linus Torvalds [Mon, 16 Apr 2012 01:28:29 +0000 (18:28 -0700)]
Linux 3.4-rc3

12 years agoMerge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Linus Torvalds [Mon, 16 Apr 2012 00:35:19 +0000 (17:35 -0700)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
 "Nothing too disasterous, the biggest thing being the removal of the
  regulator support for vcore in the AMBA driver; only one SoC was using
  this and it got broken during the last merge window, which then
  started causing problems for other people.  Mutual agreement was
  reached for it to be removed."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7386/1: jump_label: fixup for rename to static_key
  ARM: 7384/1: ThumbEE: Disable userspace TEEHBR access for !CONFIG_ARM_THUMBEE
  ARM: 7382/1: mm: truncate memory banks to fit in 4GB space for classic MMU
  ARM: 7359/2: smp_twd: Only wait for reprogramming on active cpus
  ARM: 7383/1: nommu: populate vectors page from paging_init
  ARM: 7381/1: nommu: fix typo in mm/Kconfig
  ARM: 7380/1: DT: do not add a zero-sized memory property
  ARM: 7379/1: DT: fix atags_to_fdt() second call site
  ARM: 7366/3: amba: Remove AMBA level regulator support
  ARM: 7377/1: vic: re-read status register before dispatching each IRQ handler
  ARM: 7368/1: fault.c: correct how the tsk->[maj|min]_flt gets incremented

12 years agox86-32: fix up strncpy_from_user() sign error
Linus Torvalds [Mon, 16 Apr 2012 00:23:00 +0000 (17:23 -0700)]
x86-32: fix up strncpy_from_user() sign error

The 'max' range needs to be unsigned, since the size of the user address
space is bigger than 2GB.

We know that 'count' is positive in 'long' (that is checked in the
caller), so we will truncate 'max' down to something that fits in a
signed long, but before we actually do that, that comparison needs to be
done in unsigned.

Bug introduced in commit 92ae03f2ef99 ("x86: merge 32/64-bit versions of
'strncpy_from_user()' and speed it up").  On x86-64 you can't trigger
this, since the user address space is much smaller than 63 bits, and on
x86-32 it works in practice, since you would seldom hit the strncpy
limits anyway.

I had actually tested the corner-cases, I had only tested them on
x86-64.  Besides, I had only worried about the case of a pointer *close*
to the end of the address space, rather than really far away from it ;)

This also changes the "we hit the user-specified maximum" to return
'res', for the trivial reason that gcc seems to generate better code
that way.  'res' and 'count' are the same in that case, so it really
doesn't matter which one we return.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoARM: 7386/1: jump_label: fixup for rename to static_key
Rabin Vincent [Sat, 14 Apr 2012 20:51:32 +0000 (21:51 +0100)]
ARM: 7386/1: jump_label: fixup for rename to static_key

c5905afb0 ("static keys: Introduce 'struct static_key'...") renamed
struct jump_label_key to struct static_key.  Fixup ARM for this to
eliminate these build warnings:

  include/linux/jump_label.h:113:2:
  warning: passing argument 1 of 'arch_static_branch' from incompatible pointer type
  include/asm/jump_label.h:17:82:
  note: expected 'struct jump_label_key *' but argument is of type 'struct static_key *'

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
12 years agoARM: 7384/1: ThumbEE: Disable userspace TEEHBR access for !CONFIG_ARM_THUMBEE
Jonathan Austin [Thu, 12 Apr 2012 16:45:25 +0000 (17:45 +0100)]
ARM: 7384/1: ThumbEE: Disable userspace TEEHBR access for !CONFIG_ARM_THUMBEE

Currently when ThumbEE is not enabled (!CONFIG_ARM_THUMBEE) the ThumbEE
register states are not saved/restored at context switch. The default state
of the ThumbEE Ctrl register (TEECR) allows userspace accesses to the
ThumbEE Base Handler register (TEEHBR). This can cause unexpected behaviour
when people use ThumbEE on !CONFIG_ARM_THUMBEE kernels, as well as allowing
covert communication - eg between userspace tasks running inside chroot
jails.

This patch sets up TEECR in order to prevent user-space access to TEEHBR
when !CONFIG_ARM_THUMBEE. In this case, tasks are sent SIGILL if they try to
access TEEHBR.

Cc: stable@vger.kernel.org
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
12 years agoARM: 7382/1: mm: truncate memory banks to fit in 4GB space for classic MMU
Will Deacon [Thu, 12 Apr 2012 16:15:08 +0000 (17:15 +0100)]
ARM: 7382/1: mm: truncate memory banks to fit in 4GB space for classic MMU

If a bank of memory spanning the 4GB boundary is added on a !CONFIG_LPAE
kernel then we will hang early during boot since the memory bank will
have wrapped around to zero.

This patch truncates memory banks for !LPAE configurations when the end
address is not representable in 32 bits.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>