cpuops: Use cmpxchg for xchg to avoid lock semantics
author     Christoph Lameter <cl@linux.com>  Tue, 14 Dec 2010 16:28:47 +0000 (10:28 -0600)
committer  Tejun Heo <tj@kernel.org>  Sat, 18 Dec 2010 14:54:04 +0000 (15:54 +0100)
commit     8270137a0d50507a5b40f880db636527045b8466
tree       3490a31fcbea09ab5fffb6b2f4330dc92896f413
parent     7296e08abac0a22a2534a4f6e493c764f2c77583

Use cmpxchg instead of xchg to implement this_cpu_xchg.

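On x86, xchg with a memory operand always implies LOCK, with or without
an explicit lock prefix, so it pays the full cost of a locked operation.
cmpxchg only locks when a lock prefix is given, and this_cpu operations
do not need one because they only have to be atomic with respect to the
local CPU.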
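A minimal user-space sketch of the idea, assuming x86 (32- or 64-bit) and GCC/Clang extended inline asm: load the current value into rAX, then run an unlocked cmpxchg to store the new value, looping if the location changed in between. The name `xchg_via_cmpxchg` is hypothetical, and this is not the actual macro in arch/x86/include/asm/percpu.h; the kernel's per-cpu version also carries a segment prefix for per-cpu addressing, which this sketch omits.

```c
/*
 * Hypothetical helper: exchange *p with nval using an unlocked cmpxchg
 * loop instead of xchg (which would always imply LOCK).
 * Sketch only: no per-cpu segment prefix, x86 + GCC/Clang asm assumed.
 */
static inline unsigned long xchg_via_cmpxchg(unsigned long *p,
                                             unsigned long nval)
{
    unsigned long old;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "mov %1, %0\n"            /* old = *p (old lives in rAX) */
        "1:\tcmpxchg %2, %1\n\t"  /* if (*p == old) *p = nval; else old = *p */
        "jnz 1b"                  /* ZF clear: *p changed, retry with new old */
        : "=&a"(old), "+m"(*p)
        : "r"(nval)
        : "memory", "cc");
#else
    /* Non-x86 fallback so the sketch still compiles: a plain load/store,
     * which is NOT atomic with respect to interrupts. */
    old = *p;
    *p = nval;
#endif
    return old;
}
```

Each compare-and-store is a single instruction, so an interrupt on the same CPU can only arrive between instructions; if it modifies the value after the initial load, cmpxchg fails and the loop retries. Without the lock prefix, however, this is not safe for memory written concurrently by other CPUs, which is why the trick only fits per-cpu data.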

Baselines:

xchg()          = 18 cycles (no segment prefix, LOCK semantics)
__this_cpu_xchg =  1 cycle

(simulated using this_cpu_read/this_cpu_write, i.e. two segment-prefixed
instructions. The CPU appears to use loop optimization to get rid of most
of the overhead.)

Cycles before:

this_cpu_xchg = 37 cycles (segment prefix plus the LOCK implied by xchg)

After:

this_cpu_xchg = 11 cycles (using cmpxchg without lock semantics)

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
arch/x86/include/asm/percpu.h