pandora-kernel.git
Mark D Rustad [Wed, 30 Jul 2014 21:19:26 +0000 (14:19 -0700)]
x86/kvm: Resolve shadow warnings in macro expansion

Resolve shadow warnings that appear in W=2 builds. Instead of
using ret to hold the return pointer, save the length in a new
variable saved_len and compute the pointer on exit. This also
resolves a very technical error, in that ret was declared as
a const char *, when it really was a char * const.
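
As a rough illustration only (this is not the macro touched by the patch; all names below are hypothetical), the two ideas can be sketched in plain C: compute the returned pointer from a saved length, and keep "const char *" (read-only characters) apart from "char * const" (fixed pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst and return a pointer just past
 * the copied bytes. The pointer is computed on exit from a saved length,
 * so no separate "ret" variable is needed that could shadow a caller's.
 */
char *copy_and_advance(char *dst, const char *src)
{
	assert(dst && src);
	const size_t saved_len = strlen(src);  /* length saved up front */

	memcpy(dst, src, saved_len);
	return dst + saved_len;                /* pointer computed on exit */
}

/* const char *p: the characters are read-only, the pointer may move. */
size_t count_chars(const char *p)
{
	size_t n = 0;

	while (*p++)
		n++;
	return n;
}

/* char *const p: the pointer is fixed, the characters may be modified. */
void first_to_upper(char *const p)
{
	if (p[0] >= 'a' && p[0] <= 'z')
		p[0] = (char)(p[0] - 'a' + 'A');
}
```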

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 31 Jul 2014 14:31:49 +0000 (16:31 +0200)]
Merge tag 'kvm-s390-20140730' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

Two fixes for recently introduced regressions
- a memory leak on busy SIGP
- potentially lost SIGP stop in rare situations (shutdown loops)

The first issue is not part of a released kernel. The 2nd issue is
present in all KVM versions, but did not trigger before commit
7dfc63cf977447e09b1072911c2 (KVM: s390: allow only one SIGP STOP
(AND STORE STATUS) at a time) with Linux as a guest.
So there is no need to cc stable.

David Hildenbrand [Mon, 28 Jul 2014 12:05:41 +0000 (14:05 +0200)]
KVM: s390: rework broken SIGP STOP interrupt handling

A VCPU might never stop if it intercepts (for whatever reason) between
"fake interrupt delivery" and execution of the stop function.

The heart of the problem is that SIGP STOP is an interrupt that has to be
processed on every SIE entry until the VCPU finally executes the stop
function.

This problem was made apparent by commit 7dfc63cf977447e09b1072911c2
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time).
With the old code, the guest could (incorrectly) inject SIGP STOPs
multiple times. The bug of losing a sigp stop exists in KVM before
7dfc63cf97, but it was hidden by Linux guests doing a sigp stop loop.
The new code (rightfully) returns CC=2 and does not queue a new
interrupt.

This patch is a simple fix of the problem. In the long term we are going to
rework that code, e.g. get rid of the action bits and so on.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[some additional patch description]

Paolo Bonzini [Wed, 30 Jul 2014 16:07:24 +0000 (18:07 +0200)]
KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table

Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked.  However, this can cause a bug that manifests
as an interrupt storm inside the guest.  Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)

The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.

As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).

Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then work.  What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
Bit 59 is not set there because the RTE was masked, so the IOAPIC
never sees the EOI and the interrupt continues to fire in the guest.

My guess was that the guest is masking the interrupt in the redirection
table in the interrupt routine, i.e. while the interrupt is set in a
LAPIC's ISR.  The simplest fix is to ignore the masking state: we would
rather have an unnecessary exit than a missed IRQ ACK, and anyway
IOAPIC interrupts are not as performance-sensitive as, for example, MSIs.
Alex tested this patch and it fixed his bug.
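
A minimal sketch of the idea, assuming a simplified redirection table (the struct, the pin count and the function names are hypothetical, not KVM's): every vector listed in the table gets its bit set in the bitmap, whether or not the entry is masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_PINS    24   /* assumed number of IOAPIC pins */
#define NR_VECTORS 256

struct rte {            /* simplified redirection table entry */
	uint8_t vector;
	bool masked;
};

/* Set one bit per vector that appears in the table, masked or not. */
void build_eoi_exit_bitmap(const struct rte *table,
			   uint64_t bitmap[NR_VECTORS / 64])
{
	assert(table && bitmap);
	memset(bitmap, 0, (NR_VECTORS / 64) * sizeof(bitmap[0]));
	for (int pin = 0; pin < NR_PINS; pin++) {
		uint8_t v = table[pin].vector;

		/* Sketch assumption: vectors 0-15 mean "not programmed". */
		if (v < 16)
			continue;
		/* Deliberately no "if (table[pin].masked) continue;". */
		bitmap[v / 64] |= UINT64_C(1) << (v % 64);
	}
}

/* True if an EOI for this vector should cause an exit. */
bool eoi_exits(const uint64_t *bitmap, uint8_t vector)
{
	return (bitmap[vector / 64] >> (vector % 64)) & 1;
}
```

With a masked entry for vector 59, eoi_exits() still reports an exit for that vector, which is the behavior the patch is after.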

[Thanks to Alex for his precise description of the problem
 and initial debugging effort.  A lot of the text above is
 based on emails exchanged with him.]

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Chris J Arges [Tue, 29 Jul 2014 21:14:10 +0000 (16:14 -0500)]
KVM: vmx: remove duplicate vmx_mpx_supported() prototype

Remove a prototype which was added by both 93c4adc7afe and 36be0b9deb2.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Christian Borntraeger [Mon, 28 Jul 2014 09:52:02 +0000 (11:52 +0200)]
KVM: s390: Fix memory leak on busy SIGP stop

commit 7dfc63cf977447e09b1072911c22564f900fc578
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time)
introduced a memory leak if a sigp stop is already pending. Free
the allocated inti structure.
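
The shape of the fix can be sketched as follows. The types and names are stand-ins, not the s390 code; the only facts taken from the log are that a pending stop yields CC=2 and that the allocated structure must be freed on that path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct interrupt_info { int type; };   /* stand-in for the "inti" structure */

struct vcpu_state {                    /* hypothetical per-vcpu state */
	bool stop_pending;
	struct interrupt_info *queued;
};

enum { SIGP_CC_ACCEPTED = 0, SIGP_CC_BUSY = 2 };

/* Queue a stop interrupt, or report busy if one is already pending. */
int inject_stop(struct vcpu_state *vcpu)
{
	assert(vcpu);
	struct interrupt_info *inti = calloc(1, sizeof(*inti));

	if (!inti)
		return -1;
	if (vcpu->stop_pending) {
		free(inti);            /* the fix: no leak on the busy path */
		return SIGP_CC_BUSY;
	}
	vcpu->stop_pending = true;
	vcpu->queued = inti;           /* ownership moves to the vcpu */
	return SIGP_CC_ACCEPTED;
}
```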

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Mark Rustad [Fri, 25 Jul 2014 13:27:05 +0000 (06:27 -0700)]
x86/kvm: Resolve shadow warning from min macro

Resolve a shadow warning generated in W=2 builds by the nested
use of the min macro by instead using the min3 macro for the
minimum of 3 values.
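
To see where the warning comes from, here is a sketch using GNU C statement expressions. The macros are simplified look-alikes written for this example, not the kernel's definitions. Nesting the two-argument macro declares its temporaries twice in nested scopes, which -Wshadow reports; a three-argument macro declares each temporary once.

```c
#include <assert.h>

/* Simplified two-argument minimum with local temporaries (GNU C). */
#define min_sketch(x, y) ({                 \
	__typeof__(x) _min1 = (x);          \
	__typeof__(y) _min2 = (y);          \
	_min1 < _min2 ? _min1 : _min2; })

/* Three-argument variant: one scope, three distinct temporaries. */
#define min3_sketch(x, y, z) ({             \
	__typeof__(x) _m1 = (x);            \
	__typeof__(y) _m2 = (y);            \
	__typeof__(z) _m3 = (z);            \
	_m1 < _m2 ? (_m1 < _m3 ? _m1 : _m3) \
		  : (_m2 < _m3 ? _m2 : _m3); })

/* Nested use: the inner _min1/_min2 shadow the outer ones. */
int smallest_nested(int a, int b, int c)
{
	return min_sketch(a, min_sketch(b, c));
}

/* Flat use: same result, nothing shadowed. */
int smallest_flat(int a, int b, int c)
{
	return min3_sketch(a, b, c);
}

/* Both forms agree; only the diagnostics differ. */
void check_min_sketches(void)
{
	assert(smallest_nested(3, 1, 2) == smallest_flat(3, 1, 2));
}
```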

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mark Rustad [Fri, 25 Jul 2014 13:27:03 +0000 (06:27 -0700)]
kvm: Resolve missing-field-initializers warnings

Resolve missing-field-initializers warnings seen in W=2 kernel
builds by having macros generate more elaborate initializers.
That is enough to silence the warnings.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 24 Jul 2014 12:21:57 +0000 (14:21 +0200)]
Replace NR_VMX_MSR with its definition

Using ARRAY_SIZE directly makes it easier to read the code.  While touching
the code, replace the division by a multiplication in the recently added
BUILD_BUG_ON.
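
A small sketch of the pattern, with made-up array contents and sizes: ARRAY_SIZE is used directly, and the compile-time check multiplies the element count by the element size instead of dividing the capacity. C11 static_assert stands in for the kernel's BUILD_BUG_ON here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct msr_entry { uint32_t index; uint64_t data; };  /* hypothetical layout */

static const uint32_t msr_index[] = { 0x174, 0x175, 0x176 }; /* example values */
static struct msr_entry guest_msrs[4];                        /* assumed capacity */

/* Multiplication form: the bytes needed must fit in the bytes available. */
static_assert(ARRAY_SIZE(msr_index) * sizeof(struct msr_entry)
	      <= sizeof(guest_msrs), "guest_msrs too small for msr_index");

size_t nr_msrs(void)
{
	return ARRAY_SIZE(msr_index);
}

size_t msr_capacity(void)
{
	return ARRAY_SIZE(guest_msrs);
}

/* Bounds-checked lookup into the index list. */
uint32_t msr_at(size_t i)
{
	assert(i < ARRAY_SIZE(msr_index));
	return msr_index[i];
}
```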

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Thu, 24 Jul 2014 12:06:56 +0000 (15:06 +0300)]
KVM: x86: Assertions to check no overrun in MSR lists

Currently there is no check whether the shared MSRs list overruns the allocated size,
which can result in bugs. In addition, there is no check that vmx->guest_msrs
has sufficient space to accommodate all the VMX MSRs.  This patch adds the
assertions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Thu, 24 Jul 2014 11:51:24 +0000 (14:51 +0300)]
KVM: x86: set rflags.rf during fault injection

x86 does not automatically set rflags.rf during event injection. This patch
does a partial job, setting rflags.rf upon fault injection.  It does not handle
the setting of RF upon interrupt injection on a rep-string instruction.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Thu, 24 Jul 2014 11:51:23 +0000 (14:51 +0300)]
KVM: x86: Setting rflags.rf during rep-string emulation

This patch updates RF for rep-string emulation.  The flag is set upon the first
iteration, and cleared after the last (if emulated). It is intended to make
sure that if a trap (in future data/io #DB emulation) or interrupt is delivered
to the guest during the rep-string instruction, RF will be set correctly. RF
affects whether an instruction breakpoint in the guest is masked.
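
The rule can be modeled in a few lines. This is a toy model with hypothetical names, not the emulator's code; the only architectural fact used is that RF is bit 16 of RFLAGS.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_RF (UINT64_C(1) << 16)   /* resume flag, bit 16 of RFLAGS */

/*
 * One emulated rep-string iteration: RF stays set while iterations
 * remain and is cleared once the last one has completed. Returns the
 * updated flags and decrements *count.
 */
uint64_t rep_iteration(uint64_t rflags, uint64_t *count)
{
	assert(count && *count > 0);
	(*count)--;                         /* one element processed */
	if (*count > 0)
		return rflags | EFLAGS_RF;  /* mid-string: RF set */
	return rflags & ~EFLAGS_RF;         /* instruction done: RF cleared */
}
```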

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 22 Jul 2014 08:22:53 +0000 (10:22 +0200)]
Merge tag 'kvm-s390-20140721' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

Bugfixes
--------
- add IPTE to trace event decoder
- document and advertise KVM_CAP_S390_IRQCHIP

Cleanups
--------
- Reuse kvm_vcpu_block for s390
- Get rid of tasklet for wakeup processing

Nadav Amit [Tue, 15 Jul 2014 14:37:46 +0000 (17:37 +0300)]
KVM: x86: DR6/7.RTM cannot be written

Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.
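
A hedged sketch of the bit handling follows. The bit positions (DR6.RTM at bit 16, DR7.RTM at bit 11) and the fixed-bit masks are assumptions taken from the Intel SDM register layout as understood here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DR6_RTM      (UINT64_C(1) << 16)    /* assumed bit position */
#define DR7_RTM      (UINT64_C(1) << 11)    /* assumed bit position */
#define DR6_FIXED_1  UINT64_C(0xfffe0ff0)   /* always-one bits, bit 16 excluded */
#define DR7_FIXED_1  UINT64_C(0x00000400)   /* bit 10 always reads as one */

/* Value DR6 holds after a write; the upper 32 bits are not modeled. */
uint64_t sanitize_dr6(uint64_t val, bool has_rtm)
{
	assert((val >> 32) == 0);
	val |= DR6_FIXED_1;
	if (!has_rtm)
		val |= DR6_RTM;     /* without RTM, bit 16 is fixed to 1 */
	return val;
}

/* Value DR7 holds after a write; the upper 32 bits are not modeled. */
uint64_t sanitize_dr7(uint64_t val, bool has_rtm)
{
	assert((val >> 32) == 0);
	val |= DR7_FIXED_1;
	if (!has_rtm)
		val &= ~DR7_RTM;    /* without RTM, bit 11 is fixed to 0 */
	return val;
}
```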

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 17 Jul 2014 09:55:46 +0000 (11:55 +0200)]
KVM: nVMX: clean up nested_release_vmcs12 and code around it

Make nested_release_vmcs12 idempotent.

Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 17 Jul 2014 10:25:16 +0000 (12:25 +0200)]
KVM: nVMX: fix lifetime issues for vmcs02

free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in
order to detach it from the shadow vmcs.  However, this is not
available anymore after commit 26a865f4aa8e (KVM: VMX: fix use after
free of vmx->loaded_vmcs, 2014-01-03).

Revert that patch, and fix its problem by forcing a vmcs01 as the
active VMCS before freeing all the nested VMX state.

Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Mon, 21 Jul 2014 11:37:24 +0000 (14:37 +0300)]
KVM: x86: Defining missing x86 vectors

Defining XE, XM and VE vector numbers.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Wed, 16 Jul 2014 22:19:31 +0000 (01:19 +0300)]
KVM: x86: emulator injects #DB when RFLAGS.RF is set

If RFLAGS.RF is set, then no #DB should occur on instruction breakpoints.
However, the KVM emulator injects #DB regardless of RFLAGS.RF. This patch fixes
this behavior. KVM, however, still appears not to update RFLAGS.RF correctly,
regardless of this patch.
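
In sketch form (hypothetical function and parameters; RF is bit 16 of RFLAGS and #DB is vector 1), the corrected check looks at RF before matching any instruction breakpoint:

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_RF (UINT64_C(1) << 16)   /* resume flag */
#define DB_VECTOR 1                     /* debug exception */

/* Returns DB_VECTOR if a #DB should be injected, or -1 otherwise. */
int check_instruction_breakpoint(uint64_t rflags, uint64_t rip,
				 const uint64_t *bp_addrs, int nr_bps)
{
	assert(nr_bps >= 0);
	if (rflags & EFLAGS_RF)
		return -1;              /* RF masks instruction breakpoints */
	for (int i = 0; i < nr_bps; i++) {
		if (bp_addrs[i] == rip)
			return DB_VECTOR;
	}
	return -1;
}
```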

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Mon, 21 Jul 2014 11:37:30 +0000 (14:37 +0300)]
KVM: x86: Cleanup of rflags.rf cleaning

RFLAGS.RF was cleared in several functions (e.g., syscall) in the x86 emulator.
Now that we clear it before the execution of an instruction in the emulator, we
can remove the specific clearing of RFLAGS.RF.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Mon, 21 Jul 2014 11:37:29 +0000 (14:37 +0300)]
KVM: x86: Clear rflags.rf on emulated instructions

When an instruction is emulated RFLAGS.RF should be cleared. KVM previously did
not do so. This patch clears RFLAGS.RF after interception is done.  If a fault
occurs during the instruction, RFLAGS.RF will be set by a previous patch.  This
patch does not handle the case of traps/interrupts during rep-strings. Traps
are only expected to occur on debug watchpoints, and those are anyhow not
handled by the emulator.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Mon, 21 Jul 2014 11:37:28 +0000 (14:37 +0300)]
KVM: x86: popf emulation should not change RF

RFLAGS.RF is always zero after popf. Therefore, popf should not update RF, since
emulating popf, just like any other instruction, should clear RFLAGS.RF anyway.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Mon, 21 Jul 2014 11:37:26 +0000 (14:37 +0300)]
KVM: x86: Clearing rflags.rf upon skipped emulated instruction

When skipping an emulated instruction, rflags.rf should be cleared as it would
be on a real x86 CPU.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 21 Jul 2014 11:35:43 +0000 (13:35 +0200)]
Merge tag 'kvm-s390-20140715' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

This series enables the "KVM_(S|G)ET_MP_STATE" ioctls on s390 to make
the cpu state settable by user space.

This is necessary to avoid races in s390 SIGP/reset handling which
happen because some SIGPs are handled in QEMU, while others are
handled in the kernel. Together with the busy conditions as return
value of SIGP, races happen especially in areas like starting and
stopping of CPUs. (For example, there is a program 'cpuplugd' that
runs on several s390 distros and does automatic onlining and
offlining of cpus.)

As soon as the MPSTATE interface is used, user space takes complete
control of the cpu states. Otherwise the kernel will use the old way.

Therefore, the new kernel continues to work fine with old QEMUs.

Christian Borntraeger [Wed, 16 Jul 2014 11:58:19 +0000 (13:58 +0200)]
KVM: s390: add ipte to trace event decoding

IPTE intercept can happen, let's decode that.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cornelia Huck [Tue, 15 Jul 2014 07:54:39 +0000 (09:54 +0200)]
KVM: s390: advertise KVM_CAP_S390_IRQCHIP

We should advertise all capabilities, including those that can
be enabled.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cornelia Huck [Fri, 27 Jun 2014 09:06:25 +0000 (11:06 +0200)]
KVM: s390: document KVM_CAP_S390_IRQCHIP

Let's document that this is a capability that may be enabled per-vm.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cornelia Huck [Fri, 27 Jun 2014 07:29:26 +0000 (09:29 +0200)]
KVM: document target of capability enablement

Capabilities can be enabled on a vcpu or (since recently) on a vm. Document
this and note for the existing capabilities whether they are per-vcpu or
per-vm.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Fri, 16 May 2014 10:08:29 +0000 (12:08 +0200)]
KVM: s390: remove the tasklet used by the hrtimer

We can get rid of the tasklet used for waking up a VCPU in the hrtimer
code and wake up the VCPU directly instead.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Fri, 16 May 2014 09:59:46 +0000 (11:59 +0200)]
KVM: s390: move vcpu wakeup code to a central point

Let's move the vcpu wakeup code to a central point.

We should set the vcpu->preempted flag only if the target is actually sleeping
and before the real wakeup happens. Otherwise the preempted flag might be set,
when not necessary. This may result in immediate reschedules after schedule()
in some scenarios.

The wakeup code doesn't require the local_int.lock to be held.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Tue, 6 May 2014 14:11:14 +0000 (16:11 +0200)]
KVM: s390: remove _bh locking from start_stop_lock

The start_stop_lock is no longer acquired when in atomic context, therefore we
can convert it into an ordinary spin_lock.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Fri, 16 May 2014 08:23:53 +0000 (10:23 +0200)]
KVM: s390: remove _bh locking from local_int.lock

local_int.lock is not used in a bottom-half handler anymore, therefore we can
turn it into an ordinary spin_lock at all occurrences.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand [Tue, 13 May 2014 14:54:32 +0000 (16:54 +0200)]
KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block

This patch cleans up the code in handle_wait by reusing the common code
function kvm_vcpu_block.

signal_pending(), kvm_cpu_has_pending_timer() and kvm_arch_vcpu_runnable() are
sufficient for checking if we need to wake-up that VCPU. kvm_vcpu_block
uses these functions, so no checks are lost.

The flag "timer_due" can be removed - kvm_cpu_has_pending_timer() tests whether
the timer is pending, thus the vcpu is correctly woken up.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Wanpeng Li [Thu, 17 Jul 2014 11:03:00 +0000 (19:03 +0800)]
KVM: nVMX: Fix virtual interrupt delivery injection

This patch fixes the bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331.
After the patch http://www.spinics.net/lists/kvm/msg105230.html was applied, there was
some progress and L2 could boot up, however slowly. The original idea for this
vid injection fix is from "Zhang, Yang Z" <yang.z.zhang@intel.com>.

An interrupt delivered by vid should be injected into L1 by L0 if the vcpu is currently
in L1, or injected into L2 by L0 through the old injection path if L1 has not
set the External-interrupt exiting bit. The current logic doesn't consider these
cases. This patch fixes that by delivering the vid interrupt to L1 if the vcpu is in
L1, or to L2 through the old injection path if L1 doesn't have the
External-interrupt exiting bit set.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Wed, 18 Jun 2014 14:19:35 +0000 (17:19 +0300)]
KVM: x86: Emulator support for #UD on CPL>0

Certain instructions (e.g., mwait and monitor) cause a #UD exception when they
are executed in user mode. This is in contrast to the regular privileged
instructions which cause #GP. In order not to mess with SVM interception of
mwait and monitor which assumes privilege level assertions take place before
interception, a flag has been added.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Wed, 18 Jun 2014 14:19:34 +0000 (17:19 +0300)]
KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode

Certain instructions, such as monitor and xsave, do not support big real mode
and cause a #GP exception if the effective address of any accessed byte is
not within [0, 0xffff].  This patch introduces a flag to mark these
instructions, including the necessary checks.
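
The range check itself is simple to sketch. The function below is a hypothetical stand-in for the emulator's check; #GP is vector 13.

```c
#include <assert.h>
#include <stdint.h>

#define GP_VECTOR 13   /* general protection fault */

/*
 * Returns 0 if every byte of an access of `size` bytes starting at
 * effective address `ea` lies within [0, 0xffff], else GP_VECTOR.
 */
int check_real_mode_16bit(uint64_t ea, unsigned int size)
{
	assert(size > 0);
	/* Written to avoid overflow: compare the last offset with the room left. */
	if (ea > 0xffff || size - 1 > 0xffff - ea)
		return GP_VECTOR;
	return 0;
}
```

Checking the last byte rather than only the first matters for accesses that start just below the 64 KiB limit.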

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 13 May 2014 12:02:13 +0000 (14:02 +0200)]
KVM: x86: use kvm_read_guest_page for emulator accesses

Emulator accesses are always done a page at a time, either by the emulator
itself (for fetches) or because we need to query the MMU for address
translations.  Speed up these accesses by using kvm_read_guest_page
and, in the case of fetches, by inlining kvm_read_guest_virt_helper and
dropping the loop around kvm_read_guest_page.

This final tweak saves 30-100 more clock cycles (4-10%), bringing the
count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on
a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series
and 925-1700 after the first two low-hanging fruit changes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 19 Jun 2014 09:37:06 +0000 (11:37 +0200)]
KVM: x86: ensure emulator fetches do not span multiple pages

When the CS base is not page-aligned, the linear address of the code could
get close to the page boundary (e.g. 0x...ffe) even if the EIP value is
not.  So we need to first linearize the address, and only then compute
the number of valid bytes that can be fetched.
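
A sketch of the ordering, assuming 4 KiB pages and the 15-byte x86 instruction length limit (the function name and the simplified 32-bit arithmetic are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define MAX_INSN_LEN     15u

/* Bytes that can be fetched without crossing a page boundary. */
unsigned int fetchable_bytes(uint32_t cs_base, uint32_t eip)
{
	uint32_t linear = cs_base + eip;   /* linearize first */
	unsigned int to_page_end =
		SKETCH_PAGE_SIZE - (linear & (SKETCH_PAGE_SIZE - 1));

	return to_page_end < MAX_INSN_LEN ? to_page_end : MAX_INSN_LEN;
}
```

With cs_base = 0x10 and eip = 0xfee, EIP alone is 18 bytes from its page end, but the linear address 0xffe leaves only 2 bytes, which is the case the commit describes.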

This happens relatively often when executing real mode code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 6 May 2014 14:33:01 +0000 (16:33 +0200)]
KVM: emulate: put pointers in the fetch_cache

This simplifies the code a bit, especially the overflow checks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 6 May 2014 11:05:25 +0000 (13:05 +0200)]
KVM: emulate: avoid per-byte copying in instruction fetches

We do not need a memory copying loop anymore in insn_fetch; we
can use a byte-aligned pointer to access instruction fields directly
from the fetch_cache.  This eliminates 50-150 cycles (corresponding to
a 5-10% improvement in performance) from each instruction.
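
As an illustration of reading a field straight out of a fetch cache, here is a sketch with a hypothetical struct layout and helper. A single fixed-size memcpy replaces the per-byte loop, and compilers typically turn it into one load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fetch_cache {          /* hypothetical layout */
	uint8_t data[15];
	const uint8_t *ptr;   /* next unread byte */
	const uint8_t *end;   /* one past the last valid byte */
};

/* Read a 32-bit field from the cache; false if too few bytes remain. */
bool fetch_u32(struct fetch_cache *fc, uint32_t *out)
{
	assert(fc->ptr <= fc->end);
	if ((size_t)(fc->end - fc->ptr) < sizeof(*out))
		return false;
	memcpy(out, fc->ptr, sizeof(*out));  /* one fixed-size copy, no loop */
	fc->ptr += sizeof(*out);
	return true;
}
```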

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 6 May 2014 11:05:25 +0000 (13:05 +0200)]
KVM: emulate: avoid repeated calls to do_insn_fetch_bytes

do_insn_fetch_bytes will only be called once in a given insn_fetch and
insn_fetch_arr, because in fact it will only be called at most twice
for any instruction and the first call is explicit in x86_decode_insn.
This observation lets us hoist the call out of the memory copying loop.
It does not buy performance, because most fetches are one byte long
anyway, but it prepares for the next patch.

The overflow check is tricky, but correct.  Because do_insn_fetch_bytes
has already been called once, we know that fc->end is at least 15.  So
it is okay to subtract the number of bytes we want to read.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 6 May 2014 10:24:32 +0000 (12:24 +0200)]
KVM: emulate: speed up do_insn_fetch

Hoist the common case up from do_insn_fetch_byte to do_insn_fetch,
and prime the fetch_cache in x86_decode_insn.  This helps the compiler
and the branch predictor a bit, but above all it lays the
ground for further changes in the next few patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bandan Das [Wed, 16 Apr 2014 16:46:14 +0000 (12:46 -0400)]
KVM: emulate: do not initialize memopp

rip_relative is only set if decode_modrm runs, and if you have ModRM
you will also have a memopp.  We can then access memopp unconditionally.
Note that rip_relative cannot be hoisted up to decode_modrm, or you
break "mov $0, xyz(%rip)".

Also, move typecast on "out of range value" of mem.ea to decode_modrm.

Together, all these optimizations save about 50 cycles on each emulated
instruction (4-6%).

Signed-off-by: Bandan Das <bsd@redhat.com>
[Fix immediate operands with rip-relative addressing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bandan Das [Wed, 16 Apr 2014 16:46:13 +0000 (12:46 -0400)]
KVM: emulate: rework seg_override

x86_decode_insn already sets a default for seg_override,
so remove it from the zeroed area. Also replace set/get functions
with direct access to the field.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bandan Das [Wed, 16 Apr 2014 16:46:12 +0000 (12:46 -0400)]
KVM: emulate: clean up initializations in init_decode_cache

A lot of initializations are unnecessary as they get set to
appropriate values before actually being used. Optimize
placement of fields in x86_emulate_ctxt

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bandan Das [Wed, 16 Apr 2014 16:46:11 +0000 (12:46 -0400)]
KVM: emulate: cleanup decode_modrm

Remove the if conditional; that will help us avoid
an "else initialize to 0".  Also, rearrange operators
for slightly better code.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bandan Das [Wed, 16 Apr 2014 16:46:10 +0000 (12:46 -0400)]
KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks

The same information can be gleaned from ctxt->d and avoids having
to zero/NULL initialize intercept and check_perm

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bandan Das [Wed, 16 Apr 2014 16:46:09 +0000 (12:46 -0400)]
KVM: emulate: move init_decode_cache to emulate.c

Core emulator functions all belong in emulate.c;
x86 should have no knowledge of emulator internals.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 1 Apr 2014 11:23:24 +0000 (13:23 +0200)]
KVM: emulate: simplify writeback

The "if/return" checks are useless, because we return X86EMUL_CONTINUE
anyway if we do not return.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 27 Mar 2014 10:36:25 +0000 (11:36 +0100)]
KVM: emulate: speed up emulated moves

We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.

Avoiding a call to memcpy (the compiler optimizes it out) gains about
200 cycles on kvm-unit-tests for register-to-register moves, and makes
them about as fast as arithmetic instructions.
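
A sketch of the idea with simplified, hypothetical types: the move copies the full 16-byte value with a compile-time constant size (which compilers can inline), and the register write-back keeps only the bytes the operand size calls for. The write-back rules follow x86-64 conventions: 8- and 16-bit writes preserve the upper bits, 32-bit writes zero-extend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct operand {                   /* simplified stand-in */
	unsigned int bytes;        /* operand size: 1, 2, 4 or 8 */
	union {
		uint64_t val;
		uint8_t valptr[16];  /* full-width storage */
	};
};

/* Blindly copy all 16 bytes; the size is a constant. */
void em_mov_sketch(struct operand *dst, const struct operand *src)
{
	memcpy(dst->valptr, src->valptr, sizeof(dst->valptr));
}

/* Write back only the low bytes selected by the operand size. */
void write_register_operand_sketch(uint64_t *reg, const struct operand *op)
{
	switch (op->bytes) {
	case 1:
		*reg = (*reg & ~UINT64_C(0xff)) | (op->val & 0xff);
		break;
	case 2:
		*reg = (*reg & ~UINT64_C(0xffff)) | (op->val & 0xffff);
		break;
	case 4:
		*reg = (uint32_t)op->val;   /* zero-extends */
		break;
	default:
		assert(op->bytes == 8);
		*reg = op->val;
		break;
	}
}
```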

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 27 Mar 2014 10:58:02 +0000 (11:58 +0100)]
KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())"

There are several checks for "peculiar" aspects of instructions in both
x86_decode_insn and x86_emulate_insn.  Group them together, and guard
them with a single "if" that lets the processor quickly skip them all.
Make this more effective by adding two more flag bits that say whether the
.intercept and .check_perm fields are valid.  We will reuse these
flags later to avoid initializing fields of the emulate_ctxt struct.

This shaves about 30 cycles off each emulated instruction, which is
approximately a 3% improvement.
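
The grouping pattern can be sketched like this. The flag names and values are invented for the example (the emulator's real flags differ), and unlikely() is defined here through the GNU C __builtin_expect builtin.

```c
#include <assert.h>

#define unlikely(x) __builtin_expect(!!(x), 0)   /* GNU C builtin */

/* Hypothetical flag bits marking "peculiar" instruction properties. */
#define F_LOCK       (1u << 0)
#define F_PRIV       (1u << 1)
#define F_INTERCEPT  (1u << 2)   /* .intercept field is valid */
#define F_CHECK_PERM (1u << 3)   /* .check_perm field is valid */
#define F_PECULIAR   (F_LOCK | F_PRIV | F_INTERCEPT | F_CHECK_PERM)

/* Returns how many individual checks ran for decode flags d. */
int run_checks(unsigned int d)
{
	int ran = 0;

	/* One cheap test lets the common case skip every check below. */
	if (unlikely(d & F_PECULIAR)) {
		if (d & F_LOCK)
			ran++;
		if (d & F_PRIV)
			ran++;
		if (d & F_INTERCEPT)
			ran++;
		if (d & F_CHECK_PERM)
			ran++;
	}
	return ran;
}
```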

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 27 Mar 2014 11:00:57 +0000 (12:00 +0100)]
KVM: emulate: move around some checks

The only purpose of this patch is to make the next patch simpler
to review.  No semantic change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 27 Mar 2014 10:29:28 +0000 (11:29 +0100)]
KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation

Despite the provisions to emulate up to 130 consecutive instructions, in
practice KVM will emulate just one before exiting handle_invalid_guest_state,
because x86_emulate_instruction always sets KVM_REQ_EVENT.

However, we only need to do this if an interrupt could be injected,
which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
away; b) if the interrupt flag has just been set (other instructions
than STI can set it without enabling an interrupt shadow).
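
The two conditions can be written as a small predicate. This is a sketch, not the KVM code: the shadow bit values and the function name are hypothetical, while IF being bit 9 of RFLAGS is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF     (UINT64_C(1) << 9)   /* interrupt enable flag */
#define SHADOW_STI    1u                   /* hypothetical bit values */
#define SHADOW_MOV_SS 2u

/* True if emulating the instruction may have made injection possible. */
bool need_event_request(uint32_t shadow_before, uint32_t shadow_after,
			uint64_t rflags_before, uint64_t rflags_after)
{
	/* a) an interrupt shadow bit has gone away */
	bool shadow_cleared = (shadow_before & ~shadow_after) != 0;
	/* b) the interrupt flag has just been set */
	bool if_just_set = !(rflags_before & EFLAGS_IF) &&
			   (rflags_after & EFLAGS_IF);

	return shadow_cleared || if_just_set;
}
```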

This cuts another 700-900 cycles from the cost of emulating an
instruction (measured on a Sandy Bridge Xeon: 1650-2600 cycles
before the patch on kvm-unit-tests, 925-1700 afterwards).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 20 May 2014 12:29:47 +0000 (14:29 +0200)]
KVM: x86: return all bits from get_interrupt_shadow

For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.

However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.

Clean the callback up, and modify toggle_interruptibility to
match the comment above the call.  As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 27 Mar 2014 08:51:52 +0000 (09:51 +0100)]
KVM: vmx: speed up emulation of invalid guest state

About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction.  However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).

In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment).  So there is no need to
call guest_state_valid in the emulation loop.

Emulation performance test results indicate 1650-2600 cycles
for common instructions, versus 2300-3200 before this patch on
a Sandy Bridge Xeon.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: svm: writes to MSR_K7_HWCR generates GPE in guest
Matthias Lange [Thu, 26 Jun 2014 11:50:15 +0000 (13:50 +0200)]
KVM: svm: writes to MSR_K7_HWCR generates GPE in guest

Since commit 575203 the MCE subsystem in the Linux kernel for AMD sets bit 18
in MSR_K7_HWCR. Running such a kernel as a guest in KVM on an AMD host results
in a #GP being injected into the guest because kvm_set_msr_common returns 1. This
patch fixes this by masking bit 18 from the MSR value desired by the guest.
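
A rough sketch of the masking, outside of the kernel. The macro and function
names are made up for the example, and the rule that any remaining non-zero
bit is rejected is an assumption of the sketch, not something stated above.

```c
#include <stdint.h>

/* Bit 18 of MSR_K7_HWCR; the macro name is hypothetical. */
#define HWCR_BIT18 (1ULL << 18)

/*
 * Hypothetical write handler: returns 0 if the write is accepted and 1 if
 * it should result in a #GP.  Bit 18 is ignored instead of being rejected.
 */
static int hwcr_write_sketch(uint64_t data)
{
    data &= ~HWCR_BIT18;        /* mask the bit the guest MCE code sets */

    return data != 0 ? 1 : 0;   /* assumed: other bits are still rejected */
}
```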

Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Pending interrupt may be delivered after INIT
Nadav Amit [Mon, 30 Jun 2014 09:03:02 +0000 (12:03 +0300)]
KVM: x86: Pending interrupt may be delivered after INIT

We encountered a scenario in which after an INIT is delivered, a pending
interrupt is delivered, although it was sent before the INIT.  As the SDM
states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT as
KVM does.  This also means that pending interrupts should be cleared.  This
patch clears the pending interrupts upon reset (and INIT); at the same time
it clears the pending exceptions, since they may cause a similar issue.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Synthesize G bit for all segments.
Jim Mattson [Tue, 8 Jul 2014 04:17:41 +0000 (09:47 +0530)]
KVM: Synthesize G bit for all segments.

We have noticed that qemu-kvm hangs early in the BIOS when running nested
under some versions of VMware ESXi.

We believe the problem is that KVM assumes that the platform preserves
the 'G' bit for any segment register. The SVM specification itemizes the
segment attribute bits that are observed by the CPU, but the (G)ranularity bit
is not one of the bits itemized, for any segment. Though current AMD CPUs keep
track of the (G)ranularity bit for all segment registers other than CS, the
specification does not require it. VMware's virtual CPU may not track the
(G)ranularity bit for any segment register.

Since kvm already synthesizes the (G)ranularity bit for the CS segment, it
should do so for all segments. The patch below does that, and helps get rid of
the hangs. Patch applies on top of Linus' tree.

Signed-off-by: Jim Mattson <jmattson@vmware.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control
David Hildenbrand [Thu, 10 Apr 2014 15:35:00 +0000 (17:35 +0200)]
KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control

This patch
- adds s390 specific MP states to linux headers and documents them
- implements the KVM_{SET,GET}_MP_STATE ioctls
- enables KVM_CAP_MP_STATE
- allows user space to control the VCPU state on s390.

If user space sets the VCPU state using the ioctl KVM_SET_MP_STATE, we can disable
manual changing of the VCPU state and trust user space to do the right thing.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: prepare for KVM_(S|G)ET_MP_STATE on other architectures
David Hildenbrand [Mon, 12 May 2014 14:05:13 +0000 (16:05 +0200)]
KVM: prepare for KVM_(S|G)ET_MP_STATE on other architectures

Highlight the aspects of the ioctls that are actually specific to x86
and ia64. As defined restrictions (irqchip) and mp states may not apply
to other architectures, these parts are flagged to belong to x86 and ia64.

In preparation for the use of KVM_(S|G)ET_MP_STATE by s390.
Fix a spelling error (KVM_SET_MP_STATE vs. KVM_SET_MPSTATE) on the way.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: remove __cpu_is_stopped and expose is_vcpu_stopped
David Hildenbrand [Mon, 5 May 2014 14:26:19 +0000 (16:26 +0200)]
KVM: s390: remove __cpu_is_stopped and expose is_vcpu_stopped

The function "__cpu_is_stopped" is not used any more. Let's remove it and
expose the function "is_vcpu_stopped" instead, which is actually what we want.

This patch also converts an open coded check for CPUSTAT_STOPPED to
is_vcpu_stopped().

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: move finalization of SIGP STOP orders to kvm_s390_vcpu_stop
David Hildenbrand [Mon, 14 Apr 2014 10:40:03 +0000 (12:40 +0200)]
KVM: s390: move finalization of SIGP STOP orders to kvm_s390_vcpu_stop

Let's move the finalization of SIGP STOP and SIGP STOP AND STORE STATUS orders to
the point where the VCPU is actually stopped.

This change is needed to prepare for a user space driven VCPU state change. The
action_bits may only be cleared when setting the cpu state to STOPPED while
holding the local irq lock.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time
David Hildenbrand [Thu, 15 May 2014 12:25:25 +0000 (14:25 +0200)]
KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time

A SIGP STOP (AND STORE STATUS) order is complete as soon as the VCPU has been
stopped. This patch makes sure that only one SIGP STOP (AND STORE STATUS) may
be pending at a time (as defined by the architecture). If the action_bits are
still set, a SIGP STOP has been issued but not completed yet. The VCPU is busy
for further SIGP STOP orders.

Also set the CPUSTAT_STOP_INT after the action_bits variable has been modified
(the same order that is used when injecting a KVM_S390_SIGP_STOP from
userspace).

Both changes are needed in preparation for a user space driven VCPU state change
(to avoid race conditions).

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
9 years agoKVM: MIPS: Document MIPS specifics of KVM API.
James Hogan [Fri, 4 Jul 2014 14:11:35 +0000 (15:11 +0100)]
KVM: MIPS: Document MIPS specifics of KVM API.

Document the MIPS specific parts of the KVM API, including:
 - The layout of the kvm_regs structure.
 - The interrupt number passed to KVM_INTERRUPT.
 - The registers supported by the KVM_{GET,SET}_ONE_REG interface, and
   the encoding of those register ids.
 - That KVM_INTERRUPT and KVM_GET_REG_LIST are supported on MIPS.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Reformat KVM_SET_ONE_REG register documentation
James Hogan [Fri, 4 Jul 2014 14:11:34 +0000 (15:11 +0100)]
KVM: Reformat KVM_SET_ONE_REG register documentation

Some of the MIPS registers that can be accessed with the
KVM_{GET,SET}_ONE_REG interface have fairly long names, so widen the
Register column of the table in the KVM_SET_ONE_REG documentation to
allow them to fit.

Tabs in the table are replaced with spaces at the same time for
consistency.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: Document KVM_SET_SIGNAL_MASK as universal
James Hogan [Fri, 4 Jul 2014 14:11:33 +0000 (15:11 +0100)]
KVM: Document KVM_SET_SIGNAL_MASK as universal

KVM_SET_SIGNAL_MASK is implemented in generic code and isn't x86
specific, so document it as being applicable for all architectures.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Fix lapic.c debug prints
Nadav Amit [Sun, 29 Jun 2014 09:28:51 +0000 (12:28 +0300)]
KVM: x86: Fix lapic.c debug prints

In two cases lapic.c does not use the apic_debug macro correctly. This patch
fixes them.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: fix TSC matching
Tomasz Grabiec [Tue, 24 Jun 2014 07:42:43 +0000 (09:42 +0200)]
KVM: x86: fix TSC matching

I've observed kvmclock being marked as unstable on a modern
single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.

The culprit was failure in TSC matching because of overflow of
kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
in a single synchronization cycle.

It turns out that qemu does multiple TSC writes during init; below is the
evidence of that (qemu-2.0.0):

The first one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]

The second one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
 #4  cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
 #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
 #6  main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390

The third one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]

 #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
 #1  kvm_put_msrs  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
 #2  kvm_arch_put_registers  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
 #3  kvm_cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
 #4  cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
 #5  cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
 #6  main  at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482

The fix is to count each vCPU only once when matched, so that
nr_vcpus_matched_tsc holds the size of the matched set. This is
achieved by reusing generation counters. Every vCPU with
this_tsc_generation == cur_tsc_generation is in the matched set. The
match set is cleared by setting cur_tsc_generation to a value which no
other vCPU is set to (by incrementing it).

I needed to bump up the counter size from u8 to u64 to ensure it never
overflows. Otherwise, in cases where the TSC is not written the same number of
times on each vCPU, the counter could overflow and incorrectly indicate
some vCPUs as being in the matched set. This scenario seems unlikely
but I'm not sure if it can be disregarded.
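
The counting scheme can be sketched in user-space C like this. It is a
simplified model, not the kernel code: the struct layout, NR_VCPUS and the
function name are assumptions, and the vCPU that opens a generation is not
counted in this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS 4  /* illustrative size only */

struct vm_tsc_state {
    uint64_t cur_tsc_generation;            /* u64, so it does not wrap in practice */
    unsigned int nr_vcpus_matched_tsc;      /* size of the matched set */
    uint64_t this_tsc_generation[NR_VCPUS]; /* per-vCPU generation */
};

/*
 * Record one TSC write by 'vcpu'.  'matched' says whether the written value
 * matches the current synchronization cycle.
 */
static void tsc_write(struct vm_tsc_state *s, unsigned int vcpu, bool matched)
{
    if (!matched) {
        /* A new generation clears the matched set. */
        s->cur_tsc_generation++;
        s->nr_vcpus_matched_tsc = 0;
    } else if (s->this_tsc_generation[vcpu] != s->cur_tsc_generation) {
        /* First matching write of this vCPU in this generation. */
        s->nr_vcpus_matched_tsc++;
    }
    /* Repeated matching writes by the same vCPU fall through uncounted. */
    s->this_tsc_generation[vcpu] = s->cur_tsc_generation;
}
```

With this scheme, three matching writes from the same vCPU increase the
counter once, which is the behavior the multiple qemu writes shown above
need.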

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nSVM: Set correct port for IOIO interception evaluation
Jan Kiszka [Mon, 30 Jun 2014 10:52:55 +0000 (12:52 +0200)]
KVM: nSVM: Set correct port for IOIO interception evaluation

Obtaining the port number from DX is bogus as a) there are immediate
port accesses and b) user space may have changed the register content
while processing the PIO access. Forward the correct value from the
instruction emulator instead.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nSVM: Fix IOIO size reported on emulation
Jan Kiszka [Mon, 30 Jun 2014 09:07:05 +0000 (11:07 +0200)]
KVM: nSVM: Fix IOIO size reported on emulation

The access size of an in/ins is reported in dst_bytes, and that of
out/outs in src_bytes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nSVM: Fix IOIO bitmap evaluation
Jan Kiszka [Mon, 30 Jun 2014 08:54:17 +0000 (10:54 +0200)]
KVM: nSVM: Fix IOIO bitmap evaluation

First, kvm_read_guest returns 0 on success. Second, we need to take the
access size into account when testing the bitmap: intercept if any of
the bits corresponding to the access is set.
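
A sketch of the size-aware bitmap test, operating on an in-memory copy of
the bitmap instead of guest memory. The function name, the parameters and
the choice to intercept when the bitmap is too short are assumptions of the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One bit per port; an access of 'size' bytes at 'port' covers the bits
 * port .. port + size - 1.  Returns true if any of them is set.
 */
static bool ioio_intercepted(const uint8_t *iopm, size_t iopm_len,
                             uint16_t port, unsigned int size)
{
    assert(size == 1 || size == 2 || size == 4);

    size_t byte = port / 8;
    unsigned int start_bit = port % 8;
    uint16_t mask = (uint16_t)((0xfu >> (4 - size)) << start_bit);
    uint16_t val;

    if (byte >= iopm_len)
        return true;            /* assumed: be conservative on short maps */
    val = iopm[byte];

    /* The access may straddle a byte boundary of the bitmap. */
    if (start_bit + size > 8) {
        if (byte + 1 >= iopm_len)
            return true;
        val |= (uint16_t)(iopm[byte + 1] << 8);
    }

    return (val & mask) != 0;
}
```

For example, with only the bit for port 8 set, a one-byte access to port 7
is not intercepted, but a two-byte access to port 7 is.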

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1
Jan Kiszka [Sun, 29 Jun 2014 19:55:53 +0000 (21:55 +0200)]
KVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1

CLTS only changes TS which is not monitored by selected CR0
interception. So skip any attempt to translate WRITE_CR0 to
CR0_SEL_WRITE for this instruction.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoarch: x86: kvm: x86.c: Cleaning up variable is set more than once
Rickard Strandqvist [Wed, 25 Jun 2014 12:25:58 +0000 (14:25 +0200)]
arch: x86: kvm: x86.c: Cleaning up variable is set more than once

A struct member variable is set to the same value more than once.

This was found using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMIPS: KVM: Remove dead code of TLB index error in kvm_mips_emul_tlbwr()
Deng-Cheng Zhu [Thu, 26 Jun 2014 19:11:40 +0000 (12:11 -0700)]
MIPS: KVM: Remove dead code of TLB index error in kvm_mips_emul_tlbwr()

It's impossible to fall into the error handling of the TLB index after
the index has been masked by (KVM_MIPS_GUEST_TLB_SIZE - 1). Remove the dead code.
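
Why the check is dead can be seen in a small stand-alone sketch. The value
64 for the TLB size is an assumption of the example (any power of two works
the same way), and the function name is hypothetical.

```c
#include <stdint.h>

/* Stand-in for KVM_MIPS_GUEST_TLB_SIZE; assumed to be a power of two. */
#define GUEST_TLB_SIZE 64u

/* After masking, the index is always in the range 0 .. GUEST_TLB_SIZE - 1. */
static unsigned int masked_tlb_index(uint32_t raw)
{
    return raw & (GUEST_TLB_SIZE - 1);
}
```

A later test of the form "index < 0 || index >= GUEST_TLB_SIZE" can
therefore never be true.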

Reported-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMIPS: KVM: Skip memory cleaning in kvm_mips_commpage_init()
Deng-Cheng Zhu [Thu, 26 Jun 2014 19:11:39 +0000 (12:11 -0700)]
MIPS: KVM: Skip memory cleaning in kvm_mips_commpage_init()

The commpage is allocated using kzalloc(), so there's no need of cleaning
the memory of the kvm_mips_commpage struct and its internal mips_coproc.

Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMIPS: KVM: Rename files to remove the prefix "kvm_" and "kvm_mips_"
Deng-Cheng Zhu [Thu, 26 Jun 2014 19:11:38 +0000 (12:11 -0700)]
MIPS: KVM: Rename files to remove the prefix "kvm_" and "kvm_mips_"

Since all the files are in arch/mips/kvm/, there's no need of the prefixes
"kvm_" and "kvm_mips_".

Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMIPS: KVM: Remove unneeded volatile
Deng-Cheng Zhu [Thu, 26 Jun 2014 19:11:37 +0000 (12:11 -0700)]
MIPS: KVM: Remove unneeded volatile

The keyword volatile for idx in the TLB functions is unnecessary.

Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMIPS: KVM: Simplify functions by removing redundancy
Deng-Cheng Zhu [Thu, 26 Jun 2014 19:11:36 +0000 (12:11 -0700)]
MIPS: KVM: Simplify functions by removing redundancy

No logic changes inside.

Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMIPS: KVM: Use KVM internal logger
Deng-Cheng Zhu [Thu, 26 Jun 2014 19:11:35 +0000 (12:11 -0700)]
MIPS: KVM: Use KVM internal logger

Replace printks with kvm_[err|info|debug].

Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMIPS: KVM: Reformat code and comments
Deng-Cheng Zhu [Thu, 26 Jun 2014 19:11:34 +0000 (12:11 -0700)]
MIPS: KVM: Reformat code and comments

No logic changes inside.

Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoMerge tag 'kvms390-20140626' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390...
Paolo Bonzini [Mon, 30 Jun 2014 14:51:47 +0000 (16:51 +0200)]
Merge tag 'kvms390-20140626' of git://git./linux/kernel/git/kvms390/linux into HEAD

Fix sie.h header related problems introduced during the 3.16 development
cycle.

9 years agoMerge commit '33b458d276bb' into kvm-next
Paolo Bonzini [Mon, 30 Jun 2014 14:51:07 +0000 (16:51 +0200)]
Merge commit '33b458d276bb' into kvm-next

Fix bad x86 regression introduced during merge window.

9 years agoKVM: SVM: Fix CPL export via SS.DPL
Jan Kiszka [Sun, 29 Jun 2014 15:12:43 +0000 (17:12 +0200)]
KVM: SVM: Fix CPL export via SS.DPL

We import the CPL via SS.DPL since ae9fedc793. However, we fail to
export it this way so far. This caused spurious guest crashes, e.g. of
Linux when accessing the vmport from guest user space which triggered
register saving/restoring to/from host user space.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: s390: add sie.h uapi header file to Kbuild and remove header dependency
Heiko Carstens [Thu, 5 Jun 2014 11:22:49 +0000 (13:22 +0200)]
KVM: s390: add sie.h uapi header file to Kbuild and remove header dependency

sie.h was missing in arch/s390/include/uapi/asm/Kbuild and therefore missed
the "make headers_check" target.
Adding it reveals that arch/s390/include/asm/sigp.h would also become uapi.
This is something we certainly do not want. So remove that dependency as well.

The header file was merged with ceae283bb2e0176c "KVM: s390: add sie exit
reasons tables", therefore we never had a kernel release with this commit and
can still change anything.

Acked-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
9 years agoKVM: vmx: vmx instructions handling does not consider cs.l
Nadav Amit [Wed, 18 Jun 2014 14:19:26 +0000 (17:19 +0300)]
KVM: vmx: vmx instructions handling does not consider cs.l

VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in
64-bit mode.  The current implementation is broken since it does not use the
register operands correctly, and always uses 64-bit for reads and writes.
Moreover, write to memory in vmwrite only considers long-mode, so it ignores
cs.l. This patch fixes this behavior.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: vmx: handle_cr ignores 32/64-bit mode
Nadav Amit [Wed, 18 Jun 2014 14:19:25 +0000 (17:19 +0300)]
KVM: vmx: handle_cr ignores 32/64-bit mode

In 32-bit mode only bits [31:0] of the CR should be used for setting the CR
value.  Otherwise, the host may incorrectly assume the value is invalid if bits
[63:32] are not zero.  Moreover, the CR is currently being read twice when CR8
is used.  Last, nested mov-cr exiting is modified to handle the CR value
correctly as well.
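
The truncation can be sketched as below. The function name and the boolean
mode parameter are assumptions for illustration; the kernel uses its own
register accessors.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Value to use for a mov-to-CR: the full register in 64-bit mode,
 * only bits [31:0] otherwise.
 */
static uint64_t cr_write_value(uint64_t reg, bool is_64_bit_mode)
{
    return is_64_bit_mode ? reg : (uint64_t)(uint32_t)reg;
}
```

Stale upper bits left in the register by earlier 64-bit code then no longer
make a valid 32-bit CR write look invalid.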

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Hypercall handling does not considers opsize correctly
Nadav Amit [Wed, 18 Jun 2014 14:19:24 +0000 (17:19 +0300)]
KVM: x86: Hypercall handling does not considers opsize correctly

Currently, the hypercall handling routine only considers LME as an indication
of whether the guest uses 32/64-bit mode. This is inconsistent with hyperv
hypercall handling and goes against the common sense of considering cs.l as well.
This patch uses is_64_bit_mode instead of is_long_mode for that matter. In
addition, the result is masked with respect to the guest execution mode. Last, it
changes kvm_hv_hypercall to use is_64_bit_mode as well to simplify the code.
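
The difference between the two checks, and the masking of the result, can
be sketched as follows. The struct and the function names are hypothetical
and only model the two pieces of state involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the relevant guest state. */
struct guest_mode {
    bool long_mode;  /* EFER says the guest is in long mode */
    bool cs_l;       /* CS.L: the current code segment is 64-bit */
};

/* Long mode alone also covers 32-bit compatibility-mode code. */
static bool in_long_mode(const struct guest_mode *m)
{
    return m->long_mode;
}

/* 64-bit mode additionally requires CS.L. */
static bool in_64_bit_mode(const struct guest_mode *m)
{
    return m->long_mode && m->cs_l;
}

/* The hypercall result is truncated to 32 bits outside 64-bit mode. */
static uint64_t hypercall_result(const struct guest_mode *m, uint64_t ret)
{
    return in_64_bit_mode(m) ? ret : (uint64_t)(uint32_t)ret;
}
```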

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: check DR6/7 high-bits are clear only on long-mode
Nadav Amit [Wed, 18 Jun 2014 14:19:23 +0000 (17:19 +0300)]
KVM: x86: check DR6/7 high-bits are clear only on long-mode

When the guest sets DR6 and DR7, KVM checks that the high 32 bits are clear, and
otherwise injects a #GP exception. This exception should be injected only
if running in long-mode.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: Fix returned value of MSR_IA32_VMX_VMCS_ENUM
Jan Kiszka [Mon, 16 Jun 2014 11:59:44 +0000 (13:59 +0200)]
KVM: nVMX: Fix returned value of MSR_IA32_VMX_VMCS_ENUM

Many real CPUs get this wrong as well, but ours is totally off: bits 9:1
define the highest index value.
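
The encoding can be illustrated with two small helpers. The function names
are made up for the example; only the bit positions come from the text
above.

```c
#include <stdint.h>

/* Place the highest index value into bits 9:1 of the MSR value. */
static uint64_t vmcs_enum_encode(unsigned int highest_index)
{
    return (uint64_t)(highest_index & 0x1ffu) << 1;
}

/* Extract the highest index value from bits 9:1 of the MSR value. */
static unsigned int vmcs_enum_decode(uint64_t msr)
{
    return (unsigned int)((msr >> 1) & 0x1ffu);
}
```

For instance, a highest index of 0x17 is reported as the MSR value 0x2e.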

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: Allow to disable VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS
Jan Kiszka [Mon, 16 Jun 2014 11:59:43 +0000 (13:59 +0200)]
KVM: nVMX: Allow to disable VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS

Allow L1 to "leak" its debug controls into L2, i.e. permit cleared
VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS. This requires manually
transferring the state of DR7 and IA32_DEBUGCTLMSR from L1 into L2, as both
run on different VMCS.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLS
Jan Kiszka [Mon, 16 Jun 2014 11:59:42 +0000 (13:59 +0200)]
KVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLS

SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set.
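
As a quick cross-check, the mask made of exactly these bits can be computed
as below. The helper names are hypothetical.

```c
#include <stdint.h>

/* Build a mask with bits lo..hi (inclusive) set. */
static uint32_t bit_range(unsigned int lo, unsigned int hi)
{
    uint32_t mask = 0;

    for (unsigned int b = lo; b <= hi; b++)
        mask |= UINT32_C(1) << b;
    return mask;
}

/* Bits 1, 4-6, 8, 13-16 and 26. */
static uint32_t procbased_always_set(void)
{
    return bit_range(1, 1) | bit_range(4, 6) | bit_range(8, 8) |
           bit_range(13, 16) | bit_range(26, 26);
}
```

The resulting value is 0x0401e172.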

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: Allow to disable CR3 access interception
Jan Kiszka [Mon, 16 Jun 2014 11:59:41 +0000 (13:59 +0200)]
KVM: nVMX: Allow to disable CR3 access interception

We already have this control enabled by exposing a broken
MSR_IA32_VMX_PROCBASED_CTLS value. This will properly advertise our
capability once the value is fixed by clearing the right bits in
MSR_IA32_VMX_TRUE_PROCBASED_CTLS. We also have to make sure to test the
right value on L2 entry.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLS
Jan Kiszka [Mon, 16 Jun 2014 11:59:40 +0000 (13:59 +0200)]
KVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLS

We already implemented them but failed to advertise them. Currently they
all return the identical values to the capability MSRs they are
augmenting. So there is no change in exposed features yet.

While at it, drop the related comments, which are partially incorrect and
redundant anyway.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Fix constant value of VM_{EXIT_SAVE,ENTRY_LOAD}_DEBUG_CONTROLS
Jan Kiszka [Thu, 12 Jun 2014 17:40:32 +0000 (19:40 +0200)]
KVM: x86: Fix constant value of VM_{EXIT_SAVE,ENTRY_LOAD}_DEBUG_CONTROLS

The spec says those controls are at bit position 2 - makes 4 as value.

The impact of this mistake is effectively zero as we only use them to
ensure that these features are set at position 2 (or, previously, 1) in
MSR_IA32_VMX_{EXIT,ENTRY}_CTLS - which is and will be always true
according to the spec.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX
Nadav Amit [Sun, 15 Jun 2014 13:13:01 +0000 (16:13 +0300)]
KVM: x86: NOP emulation clears (incorrectly) the high 32-bits of RAX

In long mode the current NOP (0x90) emulation still writes back to RAX.  As a
result, EAX is zero-extended and the high 32 bits of RAX are cleared.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: emulation of dword cmov on long-mode should clear [63:32]
Nadav Amit [Sun, 15 Jun 2014 13:13:00 +0000 (16:13 +0300)]
KVM: x86: emulation of dword cmov on long-mode should clear [63:32]

Even if the condition of cmov is not satisfied, bits [63:32] should be cleared.
This is clearly stated in Intel's CMOVcc documentation.  The solution is to
reassign the destination onto itself if the condition is unsatisfied.  For that
purpose the original destination value needs to be read.
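
The intended semantics of a dword cmov in long mode can be modelled like
this. It is a stand-alone sketch with a hypothetical function name, not the
emulator code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 32-bit cmov on a 64-bit register: the destination is always written, so
 * bits [63:32] are cleared whether or not the condition holds.
 */
static uint64_t cmov32(uint64_t dst, uint64_t src, bool cond)
{
    uint32_t result = cond ? (uint32_t)src : (uint32_t)dst;

    return (uint64_t)result;    /* a 32-bit write zero-extends */
}
```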

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Inter-privilege level ret emulation is not implemeneted
Nadav Amit [Sun, 15 Jun 2014 13:12:59 +0000 (16:12 +0300)]
KVM: x86: Inter-privilege level ret emulation is not implemeneted

Return an unhandleable error on an inter-privilege level ret instruction.  This is
because the current emulation does not check the privilege level correctly when
loading the CS, and does not pop RSP/SS as needed.

Cc: stable@vger.kernel.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: Wrong emulation on 'xadd X, X'
Nadav Amit [Sun, 15 Jun 2014 13:12:58 +0000 (16:12 +0300)]
KVM: x86: Wrong emulation on 'xadd X, X'

The emulator does not emulate the xadd instruction correctly if the two
operands are the same.  In this (unlikely) situation the result should be the
sum of X and X (2X), whereas it is currently X.  The solution is to first perform
writeback to the source, before writing to the destination.  The only
instruction which should be affected is xadd, as the other instructions that
perform writeback to the source use the extended accumulator (e.g., RAX:RDX).
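
The effect of the writeback order can be shown on a toy register file. The
array-based model and the function name are assumptions of the sketch.

```c
#include <stdint.h>

/*
 * xadd dst, src: src receives the old dst, dst receives the sum.
 * Writing the source first means the destination write wins when both
 * operands are the same register.
 */
static void xadd_regs(uint64_t *regs, unsigned int dst, unsigned int src)
{
    uint64_t old_dst = regs[dst];
    uint64_t sum = old_dst + regs[src];

    regs[src] = old_dst;    /* source writeback first */
    regs[dst] = sum;        /* destination writeback last */
}
```

With the two writes swapped, xadd on a single register holding X would
leave X instead of 2X.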

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: x86: bit-ops emulation ignores offset on 64-bit
Nadav Amit [Sun, 15 Jun 2014 13:12:57 +0000 (16:12 +0300)]
KVM: x86: bit-ops emulation ignores offset on 64-bit

The current emulation of bit operations ignores the offset from the destination
on 64-bit target memory operands. This patch fixes this behavior.
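
How a register bit offset is split into an address adjustment and a bit
position within the operand can be sketched as follows. This is a
simplified model with hypothetical names; it takes the offset as a signed
64-bit value and the operand size in bytes (2, 4 or 8).

```c
#include <stdint.h>

struct bit_operand {
    uint64_t ea;        /* adjusted effective address */
    unsigned int bit;   /* bit position inside the operand */
};

static struct bit_operand split_bit_offset(uint64_t base, int64_t bit_offset,
                                           unsigned int op_bytes)
{
    int64_t op_bits = (int64_t)op_bytes * 8;
    int64_t whole = bit_offset & ~(op_bits - 1);    /* multiple of op_bits, keeps sign */
    struct bit_operand r;

    r.ea = base + (uint64_t)(whole / 8);            /* byte displacement */
    r.bit = (unsigned int)(bit_offset & (op_bits - 1));
    return r;
}
```

For a 64-bit operand at 0x1000, bit offset 130 selects bit 2 of the quadword
at 0x1010, and bit offset -1 selects bit 63 of the quadword at 0xff8.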

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoarch/x86/kvm/vmx.c: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE
Fabian Frederick [Sat, 14 Jun 2014 21:44:29 +0000 (23:44 +0200)]
arch/x86/kvm/vmx.c: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE

Use the mm.h definition.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9 years agoKVM: emulate: fix harmless typo in MMX decoding
Paolo Bonzini [Tue, 6 May 2014 12:03:29 +0000 (14:03 +0200)]
KVM: emulate: fix harmless typo in MMX decoding

It was using the wrong member of the union.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>