x86, mce: Fix mce_start_timer semantics
author Borislav Petkov <bp@suse.de>
Mon, 23 Dec 2013 17:05:02 +0000 (18:05 +0100)
committer Borislav Petkov <bp@suse.de>
Sun, 12 Jan 2014 14:22:25 +0000 (15:22 +0100)

mce_start_timer() takes a 'cpu' argument which names the cpu the timer
is supposed to be started on. However, the code currently starts the
timer on the *current* cpu, i.e. whichever cpu the function happens to
run on, and that causes the sanity check in mce_timer_fn() to fire:

WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn

because it is running on the wrong cpu.

This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining
all the cpus in succession.
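
The mismatch can be illustrated with a small userspace model. This is
only a sketch and not kernel code: current_cpu, timer_runs_on[] and the
start_timer_old()/start_timer_new() helpers are hypothetical stand-ins
for smp_processor_id(), the per-cpu timer and the two variants of
mce_start_timer(), and timer_on_owner() merely assumes that the sanity
check compares the cpu the timer fires on with the cpu it belongs to.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for illustration only, not kernel APIs. */
int current_cpu;             /* plays the role of smp_processor_id() */
int timer_runs_on[NR_CPUS];  /* cpu on which each cpu's timer was queued */

/* Pre-patch behaviour: the 'cpu' argument is ignored when queueing. */
void start_timer_old(int cpu)
{
	timer_runs_on[cpu] = current_cpu;
}

/* Post-patch behaviour: the timer is queued on the cpu it belongs to. */
void start_timer_new(int cpu)
{
	timer_runs_on[cpu] = cpu;
}

/* Assumed shape of the sanity check: 1 if cpu's timer fires on cpu itself. */
int timer_on_owner(int cpu)
{
	return timer_runs_on[cpu] == cpu;
}

/* Scenario: code running on cpu 0 starts the timer that belongs to cpu 2. */
void model_selfcheck(void)
{
	current_cpu = 0;

	start_timer_old(2);
	assert(!timer_on_owner(2));  /* queued on cpu 0: the check would warn */

	start_timer_new(2);
	assert(timer_on_owner(2));   /* queued on cpu 2: the check passes */
}
```

Calling model_selfcheck() from a main() exercises both variants; the
old one only looks correct when the caller already runs on the target
cpu.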

Also, we were fiddling with the CMCI storm settings (via
mce_adjust_timer()) when starting the timer, but there's no need for
that: if there's a storm happening on this newly restarted cpu, we start
out in normal CMCI mode and, once the CMCI interrupt starts firing, we
switch to polling mode with the timer soon enough.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
arch/x86/kernel/cpu/mcheck/mce.c

index a389c1d..4d5419b 100644
@@ -1638,15 +1638,15 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 
 static void mce_start_timer(unsigned int cpu, struct timer_list *t)
 {
-       unsigned long iv = mce_adjust_timer(check_interval * HZ);
-
-       __this_cpu_write(mce_next_interval, iv);
+       unsigned long iv = check_interval * HZ;
 
        if (mca_cfg.ignore_ce || !iv)
                return;
 
+       per_cpu(mce_next_interval, cpu) = iv;
+
        t->expires = round_jiffies(jiffies + iv);
-       add_timer_on(t, smp_processor_id());
+       add_timer_on(t, cpu);
 }
 
 static void __mcheck_cpu_init_timer(void)