Secondary CPUs already take care of the D-cache bits through the common
cache initialization path, so the only thing necessary after twiddling
stack_start is to ensure that the new contents are visible to the I-cache
(particularly since the I-cache tends to be the only part lacking
coherency).
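A userspace sketch of the ordering described above may help. It assumes a hypothetical boot record (`struct boot_record`, `stack_start_demo`, and `publish_boot_record()` are illustrative names, not the kernel's) and uses the GCC/Clang builtin `__builtin___clear_cache()` in place of a kernel-side helper such as `flush_icache_range()`. It shows only the sequence: write the fields, make that range visible to instruction fetch, then order the stores. It is not the actual implementation.

```c
/* Hypothetical boot record, loosely modeled on the kind of data a
 * secondary CPU picks up from stack_start; field names are illustrative. */
struct boot_record {
	unsigned long sp;        /* initial stack pointer for the new CPU */
	void *thread_info;       /* per-thread data for the new CPU */
	void (*entry)(void);     /* function the new CPU should jump to */
};

static struct boot_record stack_start_demo;

/* Stand-in for the secondary CPU's start routine. */
static void secondary_entry(void) { }

/* Fill in the record, then make that address range coherent with the
 * I-cache. The D-cache side is assumed to be handled elsewhere, as the
 * text above says. On architectures with coherent I/D caches (e.g. x86)
 * GCC emits nothing for __builtin___clear_cache. */
static void publish_boot_record(unsigned long sp, void *ti, void (*fn)(void))
{
	stack_start_demo.sp = sp;
	stack_start_demo.thread_info = ti;
	stack_start_demo.entry = fn;

	__builtin___clear_cache((char *)&stack_start_demo,
				(char *)&stack_start_demo + sizeof(stack_start_demo));

	/* Order the stores before signalling the secondary CPU (wmb()-like). */
	__atomic_thread_fence(__ATOMIC_RELEASE);
}
```

Only the record's own address range is flushed rather than the whole cache, which keeps the maintenance proportional to what was actually written.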