On a dual issue processor GCC generates code that saves a couple of
clock cycles per loop if we rearrange things slightly. Checking for
p != end saves a SLTU per loop, moving the increment to the middle can
let it dual issue on multi-issue processors.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4249/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>