pandora-kernel.git
13 years agoKVM: MMU: calculate correct gfn for small host pages backing large guest pages
Lai Jiangshan [Wed, 26 May 2010 08:48:19 +0000 (16:48 +0800)]
KVM: MMU: calculate correct gfn for small host pages backing large guest pages

In Documentation/kvm/mmu.txt:
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

But in function FNAME(fetch)(), sp->gfn is incorrect when one of following
situations occurred:

 1) guest is 32bit paging and the guest PDE maps a 4-MByte page
    (backed by 4k host pages), FNAME(fetch)() miss handling the quadrant.

    And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);"
    is incorrect.

 2) guest is long mode paging and the guest PDPTE maps a 1-GByte page
    (backed by 4k or 2M host pages).

So we fix it to suit to the document and suit to the code which
requires sp->gfn correct when sp->role.direct=1.

We use the goal mapping gfn(gw->gfn) to calculate the base page frame
for linear translations, it is simple and easy to be understood.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Calculate correct base gfn for direct non-DIR level
Lai Jiangshan [Wed, 26 May 2010 08:48:25 +0000 (16:48 +0800)]
KVM: MMU: Calculate correct base gfn for direct non-DIR level

In Document/kvm/mmu.txt:
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

But in __direct_map(), the base gfn calculation is incorrect,
it does not calculate correctly when level=3 or 4.

Fix by using PT64_LVL_ADDR_MASK() which accounts for all levels correctly.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Don't allocate gfns page for direct mmu pages
Lai Jiangshan [Wed, 26 May 2010 08:49:59 +0000 (16:49 +0800)]
KVM: MMU: Don't allocate gfns page for direct mmu pages

When sp->role.direct is set, sp->gfns does not contain any essential
information, leaf sptes reachable from this sp are for a continuous
guest physical memory range (a linear range).
So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL)
Obviously, it is not essential information, we can calculate it when need.

It means we don't need sp->gfns when sp->role.direct=1,
Thus we can save one page usage for every kvm_mmu_page.

Note:
  Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn()
  or kvm_mmu_page_set_gfn().
  It is only exposed in FNAME(sync_page).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Add constant for invalid guest state exit reason
Mohammed Gamal [Sun, 23 May 2010 22:01:04 +0000 (01:01 +0300)]
KVM: VMX: Add constant for invalid guest state exit reason

For the sake of completeness, this patch adds a symbolic
constant for VMX exit reason 0x21 (invalid guest state).

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: allow more page become unsync at getting sp time
Xiao Guangrong [Mon, 24 May 2010 07:41:33 +0000 (15:41 +0800)]
KVM: MMU: allow more page become unsync at getting sp time

Allow more page become asynchronous at getting sp time, if need create new
shadow page for gfn but it not allow unsync(level > 1), we should unsync all
gfn's unsync page

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: allow more page become unsync at gfn mapping time
Xiao Guangrong [Mon, 24 May 2010 07:40:07 +0000 (15:40 +0800)]
KVM: MMU: allow more page become unsync at gfn mapping time

In current code, shadow page can become asynchronous only if one
shadow page for a gfn, this rule is too strict, in fact, we can
let all last mapping page(i.e, it's the pte page) become unsync,
and sync them at invlpg or flush tlb time.

This patch allow more page become asynchronous at gfn mapping time

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Update Red Hat copyrights
Avi Kivity [Sun, 23 May 2010 15:37:00 +0000 (18:37 +0300)]
KVM: Update Red Hat copyrights

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: correctly trace irq injection
Gleb Natapov [Sun, 23 May 2010 11:28:26 +0000 (14:28 +0300)]
KVM: SVM: correctly trace irq injection

On SVM interrupts are injected by svm_set_irq() not svm_inject_irq().
The later is used only to wait for irq window.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: only update unsync page in invlpg path
Xiao Guangrong [Sat, 15 May 2010 10:53:35 +0000 (18:53 +0800)]
KVM: MMU: only update unsync page in invlpg path

Only unsync pages need updated at invlpg time since other shadow
pages are write-protected

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: don't write-protect if have new mapping to unsync page
Xiao Guangrong [Sat, 15 May 2010 10:52:34 +0000 (18:52 +0800)]
KVM: MMU: don't write-protect if have new mapping to unsync page

Two cases maybe happen in kvm_mmu_get_page() function:

- one case is, the goal sp is already in cache, if the sp is unsync,
  we only need update it to assure this mapping is valid, but not
  mark it sync and not write-protect sp->gfn since it not broke unsync
  rule(one shadow page for a gfn)

- another case is, the goal sp not existed, we need create a new sp
  for gfn, i.e, gfn (may)has another shadow page, to keep unsync rule,
  we should sync(mark sync and write-protect) gfn's unsync shadow page.
  After enabling multiple unsync shadows, we sync those shadow pages
  only when the new sp not allow to become unsync(also for the unsyc
  rule, the new rule is: allow all pte page become unsync)

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: split kvm_sync_page() function
Xiao Guangrong [Sat, 15 May 2010 10:51:24 +0000 (18:51 +0800)]
KVM: MMU: split kvm_sync_page() function

Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient()
to clarify the code address Avi's suggestion

kvm_sync_page_transient() function only update shadow page but not mark
it sync and not write protect sp->gfn. it will be used by later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Use FPU API
Sheng Yang [Mon, 17 May 2010 09:08:28 +0000 (17:08 +0800)]
KVM: x86: Use FPU API

Convert KVM to use generic FPU API.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Use unlazy_fpu() for host FPU
Sheng Yang [Mon, 17 May 2010 09:08:27 +0000 (17:08 +0800)]
KVM: x86: Use unlazy_fpu() for host FPU

We can avoid unnecessary fpu load when userspace process
didn't use FPU frequently.

Derived from Avi's idea.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agox86: Export FPU API for KVM use
Sheng Yang [Mon, 17 May 2010 09:22:23 +0000 (17:22 +0800)]
x86: Export FPU API for KVM use

Also add some constants.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Consolidate arch specific vcpu ioctl locking
Avi Kivity [Thu, 13 May 2010 09:35:17 +0000 (12:35 +0300)]
KVM: Consolidate arch specific vcpu ioctl locking

Now that all arch specific ioctls have centralized locking, it is easy to
move it to the central dispatcher.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: PPC: Centralize locking of arch specific vcpu ioctls
Avi Kivity [Thu, 13 May 2010 09:30:43 +0000 (12:30 +0300)]
KVM: PPC: Centralize locking of arch specific vcpu ioctls

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: s390: Centrally lock arch specific vcpu ioctls
Avi Kivity [Thu, 13 May 2010 09:21:46 +0000 (12:21 +0300)]
KVM: s390: Centrally lock arch specific vcpu ioctls

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: Lock arch specific vcpu ioctls centrally
Avi Kivity [Thu, 13 May 2010 08:53:06 +0000 (11:53 +0300)]
KVM: x86: Lock arch specific vcpu ioctls centrally

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: move vcpu locking to dispatcher for generic vcpu ioctls
Avi Kivity [Thu, 13 May 2010 08:25:04 +0000 (11:25 +0300)]
KVM: move vcpu locking to dispatcher for generic vcpu ioctls

All vcpu ioctls need to be locked, so instead of locking each one specifically
we lock at the generic dispatcher.

This patch only updates generic ioctls and leaves arch specific ioctls alone.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: cleanup unused local variable
Xiao Guangrong [Thu, 13 May 2010 02:09:57 +0000 (10:09 +0800)]
KVM: x86: cleanup unused local variable

fix:
 arch/x86/kvm/x86.c: In function ‘handle_emulation_failure’:
 arch/x86/kvm/x86.c:3844: warning: unused variable ‘ctxt’

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page
Xiao Guangrong [Thu, 13 May 2010 02:08:08 +0000 (10:08 +0800)]
KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page

sp->gfns[] contain unaliased gfns, but gpte might contain pointer
to aliased region.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: remove rmap before clear spte
Xiao Guangrong [Thu, 13 May 2010 02:07:00 +0000 (10:07 +0800)]
KVM: MMU: remove rmap before clear spte

Remove rmap before clear spte otherwise it will trigger BUG_ON() in
some functions such as rmap_write_protect().

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: use proper cache object freeing function
Xiao Guangrong [Thu, 13 May 2010 02:06:02 +0000 (10:06 +0800)]
KVM: MMU: use proper cache object freeing function

Use kmem_cache_free to free objects allocated by kmem_cache_alloc.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq
Alex Williamson [Wed, 12 May 2010 13:46:31 +0000 (09:46 -0400)]
KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq

Remove this check in an effort to allow kvm guests to run without
root privileges.  This capability check doesn't seem to add any
security since the device needs to have already been added via the
assign device ioctl and the io actually occurs through the pci
sysfs interface.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Only reset MMU when necessary
Sheng Yang [Wed, 12 May 2010 08:40:42 +0000 (16:40 +0800)]
KVM: VMX: Only reset MMU when necessary

Only modifying some bits of CR0/CR4 needs paging mode switch.

Modify EFER.NXE bit would result in reserved bit updates.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86: Clean up duplicate assignment
Sheng Yang [Wed, 12 May 2010 08:40:41 +0000 (16:40 +0800)]
KVM: x86: Clean up duplicate assignment

mmu.free() already set root_hpa to INVALID_PAGE, no need to do it again in the
destory_kvm_mmu().

kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to
vcpu->arch.cr4/efer, no need to do it again later.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: Add missing decoder flags for xor instructions
Mohammed Gamal [Tue, 11 May 2010 22:39:22 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for xor instructions

This adds missing decoder flags for xor instructions (opcodes 0x34 - 0x35)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: Add missing decoder flags for sub instruction
Mohammed Gamal [Tue, 11 May 2010 22:39:21 +0000 (01:39 +0300)]
KVM: x86 emulator: Add missing decoder flags for sub instruction

This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d)

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)
Mohammed Gamal [Tue, 11 May 2010 19:22:40 +0000 (22:22 +0300)]
KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)

This adds test acc, imm instruction to the x86 emulator

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: pass correct parameter to kvm_mmu_free_some_pages
Marcelo Tosatti [Thu, 13 May 2010 00:00:35 +0000 (21:00 -0300)]
KVM: pass correct parameter to kvm_mmu_free_some_pages

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: VMXON/VMXOFF usage changes
Dongxiao Xu [Tue, 11 May 2010 10:29:48 +0000 (18:29 +0800)]
KVM: VMX: VMXON/VMXOFF usage changes

SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.

Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: VMCLEAR/VMPTRLD usage changes
Dongxiao Xu [Tue, 11 May 2010 10:29:45 +0000 (18:29 +0800)]
KVM: VMX: VMCLEAR/VMPTRLD usage changes

Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistance, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Some minor changes to code structure
Dongxiao Xu [Tue, 11 May 2010 10:29:42 +0000 (18:29 +0800)]
KVM: VMX: Some minor changes to code structure

Do some preparations for vmm coexistence support.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: VMX: Define new functions to wrapper direct call of asm code
Dongxiao Xu [Tue, 11 May 2010 10:29:38 +0000 (18:29 +0800)]
KVM: VMX: Define new functions to wrapper direct call of asm code

Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: update mmu documetation for role.nxe
Gui Jianfeng [Tue, 11 May 2010 06:36:58 +0000 (14:36 +0800)]
KVM: update mmu documetation for role.nxe

There's no member "cr4_nxe" in struct kvm_mmu_page_role, it names "nxe" now.
Update mmu document.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: MMU: Fix free memory accounting race in mmu_alloc_roots()
Avi Kivity [Mon, 10 May 2010 09:09:56 +0000 (12:09 +0300)]
KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()

We drop the mmu lock between freeing memory and allocating the roots; this
allows some other vcpu to sneak in and allocate memory.

While the race is benign (resulting only in temporary overallocation, not oom)
it is simple and easy to fix by moving the freeing close to the allocation.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: inject #UD if instruction emulation fails and exit to userspace
Gleb Natapov [Mon, 10 May 2010 08:16:56 +0000 (11:16 +0300)]
KVM: inject #UD if instruction emulation fails and exit to userspace

Do not kill VM when instruction emulation fails. Inject #UD and report
failure to userspace instead. Userspace may choose to reenter guest if
vcpu is in userspace (cpl == 3) in which case guest OS will kill
offending process and continue running.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
13 years agoKVM: Document KVM_SET_BOOT_CPU_ID
Avi Kivity [Thu, 29 Apr 2010 09:12:57 +0000 (12:12 +0300)]
KVM: Document KVM_SET_BOOT_CPU_ID

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Document KVM_SET_IDENTITY_MAP ioctl
Avi Kivity [Thu, 29 Apr 2010 09:08:56 +0000 (12:08 +0300)]
KVM: Document KVM_SET_IDENTITY_MAP ioctl

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed
Gui Jianfeng [Wed, 5 May 2010 01:03:49 +0000 (09:03 +0800)]
KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed

Currently, kvm_mmu_zap_page() returning the number of freed children sp.
This might confuse the caller, because caller don't know the actual freed
number. Let's make kvm_mmu_zap_page() return the number of pages it actually
freed.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Fix debug output error in walk_addr()
Gui Jianfeng [Wed, 5 May 2010 01:58:33 +0000 (09:58 +0800)]
KVM: MMU: Fix debug output error in walk_addr()

Fix a debug output error in walk_addr

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: mark page table dirty when a pte is actually modified
Gui Jianfeng [Wed, 5 May 2010 01:09:21 +0000 (09:09 +0800)]
KVM: MMU: mark page table dirty when a pte is actually modified

Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark
page table page as dirty.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Allow EFER.LMSLE to be set with nested svm
Joerg Roedel [Wed, 5 May 2010 14:04:44 +0000 (16:04 +0200)]
KVM: SVM: Allow EFER.LMSLE to be set with nested svm

This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: SVM: Dump vmcb contents on failed vmrun
Joerg Roedel [Wed, 5 May 2010 14:04:42 +0000 (16:04 +0200)]
KVM: SVM: Dump vmcb contents on failed vmrun

This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Get rid of KVM_REQ_KICK
Avi Kivity [Mon, 3 May 2010 13:54:48 +0000 (16:54 +0300)]
KVM: Get rid of KVM_REQ_KICK

KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation.  This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.

Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: do not inject exception directly into vcpu
Gleb Natapov [Wed, 28 Apr 2010 16:15:44 +0000 (19:15 +0300)]
KVM: x86 emulator: do not inject exception directly into vcpu

Return exception as a result of instruction emulation and handle
injection in KVM code.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: move interruptibility state tracking out of emulator
Gleb Natapov [Wed, 28 Apr 2010 16:15:43 +0000 (19:15 +0300)]
KVM: x86 emulator: move interruptibility state tracking out of emulator

Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: handle shadowed registers outside emulator
Gleb Natapov [Wed, 28 Apr 2010 16:15:42 +0000 (19:15 +0300)]
KVM: x86 emulator: handle shadowed registers outside emulator

Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: use shadowed register in emulate_sysexit()
Gleb Natapov [Wed, 28 Apr 2010 16:15:41 +0000 (19:15 +0300)]
KVM: x86 emulator: use shadowed register in emulate_sysexit()

emulate_sysexit() should use shadowed registers copy instead of
looking into vcpu state directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: set RFLAGS outside x86 emulator code
Gleb Natapov [Wed, 28 Apr 2010 16:15:40 +0000 (19:15 +0300)]
KVM: x86 emulator: set RFLAGS outside x86 emulator code

Removes the need for set_flags() callback.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: advance RIP outside x86 emulator code
Gleb Natapov [Wed, 28 Apr 2010 16:15:39 +0000 (19:15 +0300)]
KVM: x86 emulator: advance RIP outside x86 emulator code

Return new RIP as part of instruction emulation result instead of
updating KVM's RIP from x86 emulator code.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: handle emulation failure case first
Gleb Natapov [Wed, 28 Apr 2010 16:15:38 +0000 (19:15 +0300)]
KVM: handle emulation failure case first

If emulation failed return immediately.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: do not inject #PF in (read|write)_emulated() callbacks
Gleb Natapov [Wed, 28 Apr 2010 16:15:37 +0000 (19:15 +0300)]
KVM: do not inject #PF in (read|write)_emulated() callbacks

Return error to x86 emulator instead of injection exception behind its back.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: remove export of emulator_write_emulated()
Gleb Natapov [Wed, 28 Apr 2010 16:15:36 +0000 (19:15 +0300)]
KVM: remove export of emulator_write_emulated()

It is not called directly outside of the file it's defined in anymore.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure
Gleb Natapov [Wed, 28 Apr 2010 16:15:35 +0000 (19:15 +0300)]
KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure

Currently emulator returns -1 when emulation failed or IO is needed.
Caller tries to guess whether emulation failed by looking at other
variables. Make it easier for caller to recognise error condition by
always returning -1 in case of failure. For this new emulator
internal return value X86EMUL_IO_NEEDED is introduced. It is used to
distinguish between error condition (which returns X86EMUL_UNHANDLEABLE)
and condition that requires IO exit to userspace to continue emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: fill in run->mmio details in (read|write)_emulated function
Gleb Natapov [Wed, 28 Apr 2010 16:15:34 +0000 (19:15 +0300)]
KVM: fill in run->mmio details in (read|write)_emulated function

Fill in run->mmio details in (read|write)_emulated function just like
pio does. There is no point in filling only vcpu fields there just to
copy them into vcpu->run a little bit later.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values
Gleb Natapov [Wed, 28 Apr 2010 16:15:33 +0000 (19:15 +0300)]
KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values

Currently X86EMUL_PROPAGATE_FAULT, X86EMUL_RETRY_INSTR and
X86EMUL_CMPXCHG_FAILED have the same value so caller cannot
distinguish why function such as emulator_cmpxchg_emulated()
(which can return both X86EMUL_PROPAGATE_FAULT and
X86EMUL_CMPXCHG_FAILED) failed.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: make (get|set)_dr() callback return error if it fails
Gleb Natapov [Wed, 28 Apr 2010 16:15:32 +0000 (19:15 +0300)]
KVM: x86 emulator: make (get|set)_dr() callback return error if it fails

Make (get|set)_dr() callback return error if it fails instead of
injecting exception behind emulator's back.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: make set_cr() callback return error if it fails
Gleb Natapov [Wed, 28 Apr 2010 16:15:31 +0000 (19:15 +0300)]
KVM: x86 emulator: make set_cr() callback return error if it fails

Make set_cr() callback return error if it fails instead of injecting #GP
behind emulator's back.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
Gleb Natapov [Wed, 28 Apr 2010 16:15:30 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks

Use callbacks from x86_emulate_ops to access segments instead of calling
into kvm directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops
Gleb Natapov [Wed, 28 Apr 2010 16:15:29 +0000 (19:15 +0300)]
KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops

On VMX it is expensive to call get_cached_descriptor() just to get segment
base since multiple vmcs_reads are done instead of only one. Introduce
new call back get_cached_segment_base() for efficiency.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops
Gleb Natapov [Wed, 28 Apr 2010 16:15:28 +0000 (19:15 +0300)]
KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops

Add (set|get)_msr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops
Gleb Natapov [Wed, 28 Apr 2010 16:15:27 +0000 (19:15 +0300)]
KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops

Add (set|get)_dr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: handle "far address" source operand
Gleb Natapov [Wed, 28 Apr 2010 16:15:26 +0000 (19:15 +0300)]
KVM: x86 emulator: handle "far address" source operand

ljmp/lcall instruction operand contains address and segment.
It can be 10 bytes long. Currently we decode it as two different
operands. Fix it by introducing new kind of operand that can hold
entire far address.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: cleanup nop emulation
Gleb Natapov [Wed, 28 Apr 2010 16:15:25 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup nop emulation

Make it more explicit what we are checking for.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: cleanup xchg emulation
Gleb Natapov [Wed, 28 Apr 2010 16:15:24 +0000 (19:15 +0300)]
KVM: x86 emulator: cleanup xchg emulation

Dst operand is already initialized during decoding stage. No need to
reinitialize.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: fix Move r/m16 to segment register decoding
Gleb Natapov [Wed, 28 Apr 2010 16:15:23 +0000 (19:15 +0300)]
KVM: x86 emulator: fix Move r/m16 to segment register decoding

This instruction does not need generic decoding for its dst operand.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86 emulator: introduce read cache
Gleb Natapov [Wed, 28 Apr 2010 16:15:22 +0000 (19:15 +0300)]
KVM: x86 emulator: introduce read cache

Introduce read cache which is needed for instruction that require more
then one exit to userspace. After returning from userspace the instruction
will be re-executed with cached read value.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Avoid writing HOST_CR0 every entry
Avi Kivity [Mon, 3 May 2010 13:05:44 +0000 (16:05 +0300)]
KVM: VMX: Avoid writing HOST_CR0 every entry

cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each
entry.  That is slow, so instead, set HOST_CR0 to have TS set unconditionally
(which is a safe value), and issue a clts() just before exiting vcpu context
if the task indeed owns the fpu.

Saves ~50 cycles/exit.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: kvm_pdptr_read() may sleep
Avi Kivity [Tue, 4 May 2010 10:00:55 +0000 (13:00 +0300)]
KVM: kvm_pdptr_read() may sleep

Annotate it thusly.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: x86: avoid unnecessary bitmap allocation when memslot is clean
Takuya Yoshikawa [Wed, 28 Apr 2010 09:50:36 +0000 (18:50 +0900)]
KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
it is only used as a zero-source of copy_to_user() and freed right after
that when memslot is clean. This patch uses clear_user() instead of doing
this unnecessary zero-source allocation.

Performance improvement: as we can expect easily, the time needed to
allocate a bitmap is completely reduced. In my test, the improved ioctl
was about 4 to 10 times faster than the original one for clean slots.
Furthermore, reducing memory allocations and copies will produce good
effects to caches too.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: Simplify vmx_get_nmi_mask()
Avi Kivity [Tue, 4 May 2010 09:24:12 +0000 (12:24 +0300)]
KVM: VMX: Simplify vmx_get_nmi_mask()

!! is not needed due to the cast to bool.

Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: Avoid killing userspace through guest SRAO MCE on unmapped pages
Huang Ying [Mon, 31 May 2010 06:28:19 +0000 (14:28 +0800)]
KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages

In common cases, guest SRAO MCE will cause corresponding poisoned page
be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay
the MCE to guest OS.

But it is reported that if the poisoned page is accessed in guest
after unmapping and before MCE is relayed to guest OS, userspace will
be killed.

The reason is as follows. Because poisoned page has been un-mapped,
guest access will cause guest exit and kvm_mmu_page_fault will be
called. kvm_mmu_page_fault can not get the poisoned page for fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, poisoned page is accessed again, then userspace
is killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM
and do not try kernel and user space MMIO processing for poisoned
page.

[xiao: fix warning introduced by avi]

Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel...
Linus Torvalds [Thu, 29 Jul 2010 03:01:26 +0000 (20:01 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jwessel/linux-2.6-kgdb

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  x86,kgdb: Fix hw breakpoint regression

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
Linus Torvalds [Thu, 29 Jul 2010 03:00:42 +0000 (20:00 -0700)]
Merge git://git./linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] ibmvscsi: Fix oops when an interrupt is pending during probe
  [SCSI] zfcp: Update status read mempool
  [SCSI] zfcp: Do not wait for SBALs on stopped queue
  [SCSI] zfcp: Fix check whether unchained ct_els is possible
  [SCSI] ipr: fix resource path display and formatting

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
Linus Torvalds [Thu, 29 Jul 2010 02:59:55 +0000 (19:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/lrg/voltage-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  davinci: da850/omap-l138 evm: account for DEFDCDC{2,3} being tied high
  regulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register
  wm8350-regulator: fix wm8350_register_regulator error handling
  ab3100: fix off-by-one value range checking for voltage selector

13 years agoecryptfs: Bugfix for error related to ecryptfs_hash_buckets
Andre Osterhues [Tue, 13 Jul 2010 20:59:17 +0000 (15:59 -0500)]
ecryptfs: Bugfix for error related to ecryptfs_hash_buckets

The function ecryptfs_uid_hash wrongly assumes that the
second parameter to hash_long() is the number of hash
buckets instead of the number of hash bits.
This patch fixes that and renames the variable
ecryptfs_hash_buckets to ecryptfs_hash_bits to make it
clearer.

Fixes: CVE-2010-2492

Signed-off-by: Andre Osterhues <aosterhues@escrypt.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agox86,kgdb: Fix hw breakpoint regression
Jason Wessel [Thu, 29 Jul 2010 00:10:30 +0000 (19:10 -0500)]
x86,kgdb: Fix hw breakpoint regression

HW breakpoints events stopped working correctly with kgdb
as a result of commit: 018cbffe6819f6f8db20a0a3acd9bab9bfd667e4
(Merge commit 'v2.6.33' into perf/core).

The regression occurred because the behavior changed for setting
NOTIFY_STOP as the return value to the die notifier if the breakpoint
was known to the HW breakpoint API.  Because kgdb is using the HW
breakpoint API to register HW breakpoints slots, it must also now
implement the overflow_handler call back else kgdb does not get to see
the events from the die notifier.

The kgdb_ll_trap function will be changed to be general purpose code
which can allow an easy way to implement the hw_breakpoint API
overflow call back.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph...
Linus Torvalds [Wed, 28 Jul 2010 18:10:53 +0000 (11:10 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: use complete_all and wake_up_all
  ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES"
  ceph: fix dentry lease release
  ceph: fix leak of dentry in ceph_init_dentry() error path
  ceph: fix pg_mapping leak on pg_temp updates
  ceph: fix d_release dop for snapdir, snapped dentries
  ceph: avoid dcache readdir for snapdir

13 years agoGFS2: Use kmalloc when possible for ->readdir()
Steven Whitehouse [Wed, 28 Jul 2010 16:56:23 +0000 (17:56 +0100)]
GFS2: Use kmalloc when possible for ->readdir()

If we don't need a huge amount of memory in ->readdir() then
we can use kmalloc rather than vmalloc to allocate it. This
should cut down on the greater overheads associated with
vmalloc for smaller directories.

We may be able to eliminate vmalloc entirely at some stage,
but this is easy to do right away.

Also using GFP_NOFS to avoid any issues wrt to deleting inodes
while under a glock, and suggestion from Linus to factor out
the alloc/dealloc.

I've given this a test with a variety of different sized
directories and it seems to work ok.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodavinci: da850/omap-l138 evm: account for DEFDCDC{2,3} being tied high
Sekhar Nori [Mon, 12 Jul 2010 12:26:21 +0000 (17:56 +0530)]
davinci: da850/omap-l138 evm: account for DEFDCDC{2,3} being tied high

Per the da850/omap-l138 Beta EVM SOM schematic, the DEFDCDC2 and
DEFDCDC3 lines are tied high. This leads to a 3.3V IO and 1.2V CVDD
voltage.

Pass the right platform data to the TPS6507x driver so it can operate
on the DEFDCDC{2,3}_HIGH register to read and change voltage levels.

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
13 years agoregulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register
Anuj Aggarwal [Mon, 12 Jul 2010 12:24:06 +0000 (17:54 +0530)]
regulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register

Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
In TPS6507x, depending on the status of DEFDCDC{2,3} pin either
DEFDCDC{2,3}_LOW or DEFDCDC{2,3}_HIGH register needs to be read or
programmed to change the output voltage.

The current driver assumes DEFDCDC{2,3} pins are always tied low
and thus operates only on DEFDCDC{2,3}_LOW register. This need
not always be the case (as is found on OMAP-L138 EVM).

Unfortunately, software cannot read the status of DEFDCDC{2,3} pins.
So, this information is passed through platform data depending on
how the board is wired.

Signed-off-by: Anuj Aggarwal <anuj.aggarwal@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh...
Linus Torvalds [Tue, 27 Jul 2010 21:32:59 +0000 (14:32 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: Pass the correct end of buffer to p9stat_read

13 years agogpio: fix spurious printk when freeing a gpio
Jon Povey [Tue, 27 Jul 2010 20:18:06 +0000 (13:18 -0700)]
gpio: fix spurious printk when freeing a gpio

When freeing a gpio that has not been exported, gpio_unexport() prints a
debug message when it should just fall through silently.

Example spurious message:

gpio_unexport: gpio0 status -22

Signed-off-by: Jon Povey <jon.povey@racelogic.co.uk>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Uwe Kleine-K?nig <u.kleine-koenig@pengutronix.de>
Cc: Gregory Bean <gbean@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoedac: mpc85xx: fix coldplug/hotplug module autoloading
Anton Vorontsov [Tue, 27 Jul 2010 20:18:05 +0000 (13:18 -0700)]
edac: mpc85xx: fix coldplug/hotplug module autoloading

The MPC85xx EDAC driver is missing module device aliases, so the driver
won't load automatically on boot.  This patch fixes the issue by adding
proper MODULE_DEVICE_TABLE() macros.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-rx8581.c: fix setdatetime
Rudolf Marek [Tue, 27 Jul 2010 20:18:02 +0000 (13:18 -0700)]
drivers/rtc/rtc-rx8581.c: fix setdatetime

Fix the logic while writing new date/time to the chip.  The driver
incorrectly wrote back register values to different registers and even
with wrong mask.  The patch adds clearing of the VLF register, which
should be cleared if all date/time values are set.

Signed-off-by: Rudolf Marek <rudolf.marek@sysgo.com>
Acked-by: Wan ZongShun <mcuos.com@gmail.com>
Cc: Martyn Welch <martyn.welch@gefanuc.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodynamic debug: move ddebug_remove_module() down into free_module()
Jason Baron [Tue, 27 Jul 2010 20:18:01 +0000 (13:18 -0700)]
dynamic debug: move ddebug_remove_module() down into free_module()

The command

echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control

causes an oops.

Move the call to ddebug_remove_module() down into free_module().  In this
way it should be called from all error paths.  Currently, we are missing
the remove if the module init routine fails.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Reported-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Cc: <stable@kernel.org> [2.6.32+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoceph: use complete_all and wake_up_all
Yehuda Sadeh [Tue, 27 Jul 2010 20:11:08 +0000 (13:11 -0700)]
ceph: use complete_all and wake_up_all

This fixes an issue triggered by running concurrent syncs. One of the syncs
would go through while the other would just hang indefinitely. In any case, we
never actually want to wake a single waiter, so the *_all functions should
be used.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago9p: Pass the correct end of buffer to p9stat_read
Latchesar Ionkov [Mon, 19 Jul 2010 20:40:03 +0000 (15:40 -0500)]
9p: Pass the correct end of buffer to p9stat_read

Pass the correct end of the buffer to p9stat_read.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
13 years ago[SCSI] ibmvscsi: Fix oops when an interrupt is pending during probe
Anton Blanchard [Tue, 13 Jul 2010 04:59:29 +0000 (14:59 +1000)]
[SCSI] ibmvscsi: Fix oops when an interrupt is pending during probe

A driver needs to be ready to take an interrupt as soon as it registers
an interrupt handler. I noticed the following oops when testing kdump:

ipr: IBM Power RAID SCSI Device Driver version: 2.5.0 (February 11, 2010)
ibmvscsi 30000002: SRP_VERSION: 16.a
ibmvscsi 30000002: SRP_VERSION: 16.a
Unable to handle kernel paging request for data at address 0x00000000
...
pc: c000000004085e34: .tasklet_action+0xf4/0x1dc
...
c000000004086fe4 .__do_softirq+0x16c/0x2c0
c00000000403138c .call_do_softirq+0x14/0x24
c00000000400ee14 .do_softirq+0xa0/0x104
c00000000408690c .irq_exit+0x70/0xd0
c00000000400f190 .do_IRQ+0x214/0x2a8
c000000004004804 hardware_interrupt_entry+0x1c/0x98
--- Exception: 501 (Hardware Interrupt) at c00000000400c544 .raw_local_irq_restore+0x48/0x54
c00000000465d2a8 ._raw_spin_unlock_irqrestore+0x74/0xa0
c0000000040e7f00 .__setup_irq+0x2ec/0x3f0
c0000000040e8198 .request_threaded_irq+0x194/0x22c
c00000000446d854 .rpavscsi_init_crq_queue+0x284/0x3f0
c00000000446c764 .ibmvscsi_probe+0x688/0x710
c00000000402903c .vio_bus_probe+0x37c/0x3e4
c000000004403f10 .driver_probe_device+0xec/0x1b8
c000000004404088 .__driver_attach+0xac/0xf4
c000000004403184 .bus_for_each_dev+0x98/0x104
c000000004403c98 .driver_attach+0x40/0x60
c0000000044026f0 .bus_add_driver+0x154/0x324
c0000000044045d0 .driver_register+0xe8/0x1ac
c00000000402b2a8 .vio_register_driver+0x54/0x74
c000000004933ea4 .ibmvscsi_module_init+0x80/0xc0
c000000004009834 .do_one_initcall+0x98/0x1d8
c0000000049005b4 .kernel_init+0x27c/0x33c
c000000004031550 .kernel_thread+0x54/0x70

srp_task needs to be setup before request_irq. The patch below fixes the oops.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
13 years agoMerge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perf
Linus Torvalds [Tue, 27 Jul 2010 16:23:39 +0000 (09:23 -0700)]
Merge branch 'urgent' of git://git./linux/kernel/git/paulus/perf

* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perf:
  perf, powerpc: Use perf_sample_data_init() for the FSL code

13 years agoMerge git://git.infradead.org/users/cbou/battery-2.6.35
Linus Torvalds [Tue, 27 Jul 2010 16:22:55 +0000 (09:22 -0700)]
Merge git://git.infradead.org/users/cbou/battery-2.6.35

* git://git.infradead.org/users/cbou/battery-2.6.35:
  ds2782_battery: Rename get_current to fix build failure / name conflict

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Tue, 27 Jul 2010 16:21:00 +0000 (09:21 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  s2io: fixing DBG_PRINT() macro
  ath9k: fix dma direction for map/unmap in ath_rx_tasklet
  net: dev_forward_skb should call nf_reset
  net sched: fix race in mirred device removal
  tun: avoid BUG, dump packet on GSO errors
  bonding: set device in RLB ARP packet handler
  wimax/i2400m: Add PID & VID for Intel WiMAX 6250
  ipv6: Don't add routes to ipv6 disabled interfaces.
  net: Fix skb_copy_expand() handling of ->csum_start
  net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.c
  macvtap: Limit packet queue length
  ixgbe/igb: catch invalid VF settings
  bnx2x: Advance a module version
  bnx2x: Protect statistics ramrod and sequence number
  bnx2x: Protect a SM state change
  wireless: use netif_rx_ni in ieee80211_send_layer2_update

13 years agoperf, powerpc: Use perf_sample_data_init() for the FSL code
Peter Zijlstra [Fri, 9 Jul 2010 08:21:21 +0000 (10:21 +0200)]
perf, powerpc: Use perf_sample_data_init() for the FSL code

We should use perf_sample_data_init() to initialize struct
perf_sample_data.  As explained in the description of commit dc1d628a
("perf: Provide generic perf_sample_data initialization"), it is
possible for userspace to get the kernel to dereference data.raw,
so if it is not initialized, that means that unprivileged userspace
can possibly oops the kernel.  Using perf_sample_data_init makes sure
it gets initialized to NULL.

This conversion should have been included in commit dc1d628a, but it
got missed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
13 years agowm8350-regulator: fix wm8350_register_regulator error handling
Axel Lin [Mon, 26 Jul 2010 02:41:58 +0000 (10:41 +0800)]
wm8350-regulator: fix wm8350_register_regulator error handling

In the case of platform_device_add() fail, we should call
platform_device_put() instead of platform_device_del()

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
13 years agoab3100: fix off-by-one value range checking for voltage selector
Axel Lin [Mon, 26 Jul 2010 07:34:14 +0000 (15:34 +0800)]
ab3100: fix off-by-one value range checking for voltage selector

We use voltage selector as an array index for typ_voltages.
Thus the valid range for voltage selector should be 0..voltages_len-1.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
13 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 26 Jul 2010 23:02:07 +0000 (16:02 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Do not try to disable hpet if it hasn't been initialized before
  x86, i8259: Only register sysdev if we have a real 8259 PIC

13 years agos2io: fixing DBG_PRINT() macro
Breno Leitao [Mon, 26 Jul 2010 22:37:30 +0000 (15:37 -0700)]
s2io: fixing DBG_PRINT() macro

Patch 9e39f7c5b311a306977c5471f9e2ce4c456aa038 changed the
DBG_PRINT() macro and the if clause was wrongly changed. It means
that currently all the DBG_PRINT are being printed, flooding the
kernel log buffer with things like:

s2io: eth6: Next block at: c0000000b9c90000
s2io: eth6: In Neterion Tx routine

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Acked-by: Sreenivasa Honnur <Sreenivasa.Honnur@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
Linus Torvalds [Mon, 26 Jul 2010 22:35:53 +0000 (15:35 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/davej/cpufreq

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] powernow-k8: Limit Pstate transition latency check
  [CPUFREQ] Fix PCC driver error path
  [CPUFREQ] fix double freeing in error path of pcc-cpufreq
  [CPUFREQ] pcc driver should check for pcch method before calling _OSC
  [CPUFREQ] fix memory leak in cpufreq_add_dev
  [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"

13 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Mon, 26 Jul 2010 22:35:04 +0000 (15:35 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/upstream-linus

* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Set io_map_base for several PCI bridges lacking it
  MIPS: Alchemy: Define eth platform devices in the correct order
  MIPS: BCM63xx: Prevent second enet registration on BCM6338
  MIPS: Quit using undefined behavior of ADDU in 64-bit atomic operations.
  MIPS: N32: Define getdents64.
  MIPS: MTX-1: Fix PCI on the MeshCube and related boards
  MIPS: Make init_vdso a subsys_initcall.
  MIPS: "Fix" useless 'init_vdso successfully' message.
  MIPS: PowerTV: Move register setup to before reading registers.
  SOUND: Au1000: Fix section mismatch
  VIDEO: Au1100fb: Fix section mismatch
  VIDEO: PMAGB-B: Fix section mismatch
  VIDEO: PMAG-BA: Fix section mismatch
  NET: declance: Fix section mismatches
  VIDEO. gbefb: Fix section mismatches.