cxl: Use call_rcu to reduce latency when releasing the afu fd
author Ian Munsie <imunsie@au1.ibm.com>
Fri, 8 May 2015 12:55:18 +0000 (22:55 +1000)
committer Michael Ellerman <mpe@ellerman.id.au>
Wed, 3 Jun 2015 03:27:15 +0000 (13:27 +1000)
commit 8ac75b96be71e20ec1785ca18170890c4dfffe87
tree 8b06520ebf5859bf62fee63e53b2a0581728ec6e
parent e36f6fe1f7aa4238478d4b253aac7d3fcfff6ee0

The afu fd release path was identified as a significant bottleneck in
the overall performance of cxl. While an optimal AFU design would
minimise the need to close & reopen the AFU fd, it is not always
practical to avoid.

The bottleneck seems to be down to the call to synchronize_rcu(), which
blocks until a full grace period has elapsed, i.e. until every
pre-existing RCU read-side critical section has completed. Replace it
with call_rcu() to free the context structures later so we can return
to the application sooner.
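
The change in approach can be illustrated with a small userspace mock.
This is a sketch only, not the code from this patch and not kernel code:
struct rcu_head_mock, call_rcu_mock(), grace_period_elapsed_mock(),
struct demo_ctx, reclaim_ctx() and release_ctx() are all hypothetical
names invented for illustration, and the "grace period" is simulated by
an explicit function call. It shows the release path queuing a callback
and returning at once, with the free happening later.

```c
/* Userspace mock of the deferred-free pattern. NOT kernel code:
 * every *_mock name and struct demo_ctx are hypothetical. */
#include <stddef.h>
#include <stdlib.h>

struct rcu_head_mock {
	struct rcu_head_mock *next;
	void (*func)(struct rcu_head_mock *head);
};

/* Callbacks queued here run only after the simulated grace period. */
static struct rcu_head_mock *pending;
static int freed_count;

/* Queue a callback and return immediately (stands in for call_rcu()). */
static void call_rcu_mock(struct rcu_head_mock *head,
			  void (*func)(struct rcu_head_mock *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stands in for the end of a grace period: run every queued callback. */
static void grace_period_elapsed_mock(void)
{
	while (pending) {
		struct rcu_head_mock *head = pending;

		pending = head->next;
		head->func(head);
	}
}

struct demo_ctx {
	int id;
	struct rcu_head_mock rcu;	/* embedded, so no extra allocation */
};

/* Recover the enclosing context from its embedded head, then free it. */
static void reclaim_ctx(struct rcu_head_mock *head)
{
	struct demo_ctx *ctx = (struct demo_ctx *)
		((char *)head - offsetof(struct demo_ctx, rcu));

	free(ctx);
	freed_count++;
}

/* Release path: defer the free instead of blocking on readers. */
static void release_ctx(struct demo_ctx *ctx)
{
	call_rcu_mock(&ctx->rcu, reclaim_ctx);
}
```

In this mock, release_ctx() returns with the context still allocated;
the memory is reclaimed only once grace_period_elapsed_mock() runs. The
trade-off is that the structure lives a little longer, so the callback
must not rely on state that is torn down right after release.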

This reduces the time spent in the fd release path from 13356 usec to
13.3 usec - about a 1000x speed up.

Reported-by: Fei K Chen <uchen@cn.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
drivers/misc/cxl/context.c
drivers/misc/cxl/cxl.h