summary |
shortlog |
log |
commit | commitdiff |
tree
raw |
patch |
inline | side by side (from parent 1:
4fce5e6)
commit
2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream.
When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.
Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.
Here's the full report:
https://lkml.org/lkml/2014/9/19/230
To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes:
950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[lizf: Backported to 3.4:
- adjust context
- check current->flags & PF_MEMPOLICY rather than current->mempolicy]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The implementation is simple.
Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
The implementation is simple.
Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
-PF_SPREAD_PAGE for each task that is in that cpuset or subsequently
+PFA_SPREAD_PAGE for each task that is in that cpuset or subsequently
joins that cpuset. The page allocation calls for the page cache
joins that cpuset. The page allocation calls for the page cache
-is modified to perform an inline check for this PF_SPREAD_PAGE task
+is modified to perform an inline check for this PFA_SPREAD_PAGE task
flag, and if set, a call to a new routine cpuset_mem_spread_node()
returns the node to prefer for the allocation.
Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
flag, and if set, a call to a new routine cpuset_mem_spread_node()
returns the node to prefer for the allocation.
Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
-PF_SPREAD_SLAB, and appropriately marked slab caches will allocate
+PFA_SPREAD_SLAB, and appropriately marked slab caches will allocate
pages from the node returned by cpuset_mem_spread_node().
The cpuset_mem_spread_node() routine is also simple. It uses the
pages from the node returned by cpuset_mem_spread_node().
The cpuset_mem_spread_node() routine is also simple. It uses the
static inline int cpuset_do_page_mem_spread(void)
{
static inline int cpuset_do_page_mem_spread(void)
{
- return current->flags & PF_SPREAD_PAGE;
+ return task_spread_page(current);
}
static inline int cpuset_do_slab_mem_spread(void)
{
}
static inline int cpuset_do_slab_mem_spread(void)
{
- return current->flags & PF_SPREAD_SLAB;
+ return task_spread_slab(current);
}
extern int current_cpuset_is_being_rebound(void);
}
extern int current_cpuset_is_being_rebound(void);
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
-#define PF_SPREAD_PAGE 0x01000000 /* Spread page cache over cpuset */
-#define PF_SPREAD_SLAB 0x02000000 /* Spread some slab caches over cpuset */
#define PF_THREAD_BOUND 0x04000000 /* Thread bound to specific cpu */
#define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define PF_THREAD_BOUND 0x04000000 /* Thread bound to specific cpu */
#define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define used_math() tsk_used_math(current)
/* Per-process atomic flags. */
#define used_math() tsk_used_math(current)
/* Per-process atomic flags. */
+#define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */
+#define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */
#define TASK_PFA_TEST(name, func) \
static inline bool task_##func(struct task_struct *p) \
#define TASK_PFA_TEST(name, func) \
static inline bool task_##func(struct task_struct *p) \
+TASK_PFA_TEST(SPREAD_PAGE, spread_page)
+TASK_PFA_SET(SPREAD_PAGE, spread_page)
+TASK_PFA_CLEAR(SPREAD_PAGE, spread_page)
+
+TASK_PFA_TEST(SPREAD_SLAB, spread_slab)
+TASK_PFA_SET(SPREAD_SLAB, spread_slab)
+TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
+
/*
* Do not use outside of architecture code which knows its limitations.
*
/*
* Do not use outside of architecture code which knows its limitations.
*
struct task_struct *tsk)
{
if (is_spread_page(cs))
struct task_struct *tsk)
{
if (is_spread_page(cs))
- tsk->flags |= PF_SPREAD_PAGE;
+ task_set_spread_page(tsk);
- tsk->flags &= ~PF_SPREAD_PAGE;
+ task_clear_spread_page(tsk);
+
- tsk->flags |= PF_SPREAD_SLAB;
+ task_set_spread_slab(tsk);
- tsk->flags &= ~PF_SPREAD_SLAB;
+ task_clear_spread_slab(tsk);
- * Try allocating on another node if PF_SPREAD_SLAB|PF_MEMPOLICY.
+ * Try allocating on another node if PFA_SPREAD_SLAB|PF_MEMPOLICY.
*
* If we are in_interrupt, then process context, including cpusets and
* mempolicy, may not apply and should not be used for allocation policy.
*
* If we are in_interrupt, then process context, including cpusets and
* mempolicy, may not apply and should not be used for allocation policy.
- if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
+ if (unlikely((current->flags & PF_MEMPOLICY) || cpuset_do_slab_mem_spread())) {
objp = alternate_node_alloc(cache, flags);
if (objp)
goto out;
objp = alternate_node_alloc(cache, flags);
if (objp)
goto out;