From f75ca962037ffd639a44fd88933cd9b84c4c4411 Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Tue, 10 Aug 2010 18:03:02 -0700 Subject: [PATCH] memcg: avoid css_get() Now, memory cgroup increments css(cgroup subsys state)'s reference count per a charged page. And the reference count is kept until the page is uncharged. But this has 2 bad effect. 1. Because css_get/put calls atomic_inc()/dec, heavy call of them on large smp will not scale well. 2. Because css's refcnt cannot be in a state as "ready-to-release", cgroup's notify_on_release handler can't work with memcg. 3. css's refcnt is atomic_t, it means smaller than 32bit. Maybe too small. This has been a problem since the 1st merge of memcg. This is a trial to remove css's refcnt per a page. Even if we remove refcnt, pre_destroy() does enough synchronization as - check res->usage == 0. - check no pages on LRU. This patch removes css's refcnt per page. Even after this patch, at the 1st look, it seems css_get() is still called in try_charge(). But the logic is. - If a memcg of mm->owner is cached one, consume_stock() will work. At success, return immediately. - If consume_stock returns false, css_get() is called and go to slow path which may be blocked. At the end of slow path, css_put() is called and restart from the start if necessary. So, in the fast path, we don't call css_get() and can avoid access to shared counter. This patch can make the most possible case fast. Here is a result of multi-threaded page fault benchmark. [Before] 25.32% multi-fault-all [kernel.kallsyms] [k] clear_page_c 9.30% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave 8.02% multi-fault-all [kernel.kallsyms] [k] try_get_mem_cgroup_from_mm <=====(*) 7.83% multi-fault-all [kernel.kallsyms] [k] down_read_trylock 5.38% multi-fault-all [kernel.kallsyms] [k] __css_put 5.29% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask 4.92% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq 4.24% multi-fault-all [kernel.kallsyms] [k] up_read 3.53% multi-fault-all [kernel.kallsyms] [k] css_put 2.11% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault 1.76% multi-fault-all [kernel.kallsyms] [k] __rmqueue 1.64% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge [After] 28.41% multi-fault-all [kernel.kallsyms] [k] clear_page_c 10.08% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irq 9.58% multi-fault-all [kernel.kallsyms] [k] down_read_trylock 9.38% multi-fault-all [kernel.kallsyms] [k] _raw_spin_lock_irqsave 5.86% multi-fault-all [kernel.kallsyms] [k] __alloc_pages_nodemask 5.65% multi-fault-all [kernel.kallsyms] [k] up_read 2.82% multi-fault-all [kernel.kallsyms] [k] handle_mm_fault 2.64% multi-fault-all [kernel.kallsyms] [k] mem_cgroup_add_lru_list 2.48% multi-fault-all [kernel.kallsyms] [k] __mem_cgroup_commit_charge Then, 8.02% of try_get_mem_cgroup_from_mm() disappears because this patch removes css_tryget() in it. (But yes, this is an extreme case.) Signed-off-by: Daisuke Nishimura Signed-off-by: KAMEZAWA Hiroyuki Cc: Daisuke Nishimura Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Reading git-format-patch failed