Taking the spinlock for every iteration is very expensive; instead,
batch iterations up into 4K blocks, releasing and reacquiring the
spinlock between each block.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Catalin Marinas <catalin.marinas@arm.com>