From: Peter Zijlstra Date: Wed, 8 Aug 2012 19:46:40 +0000 (+0200) Subject: sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies X-Git-Tag: v3.6-rc3~27^2~4 X-Git-Url: http://git.openpandora.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=a35b6466aabb051568b844e8c63f87a356d3d129;p=pandora-kernel.git sched, cgroup: Reduce rq->lock hold times for large cgroup hierarchies Peter Portante reported that for large cgroup hierarchies (and or on large CPU counts) we get immense lock contention on rq->lock and stuff stops working properly. His workload was a ton of processes, each in their own cgroup, everybody idling except for a sporadic wakeup once every so often. It was found that: schedule() idle_balance() load_balance() local_irq_save() double_rq_lock() update_h_load() walk_tg_tree(tg_load_down) tg_load_down() Results in an entire cgroup hierarchy walk under rq->lock for every new-idle balance and since new-idle balance isn't throttled this results in a lot of work while holding the rq->lock. This patch does two things, it removes the work from under rq->lock based on the good principle of race and pray which is widely employed in the load-balancer as a whole. And secondly it throttles the update_h_load() calculation to max once per jiffy. I considered excluding update_h_load() for new-idle balance all-together, but purely relying on regular balance passes to update this data might not work out under some rare circumstances where the new-idle busiest isn't the regular busiest for a while (unlikely, but a nightmare to debug if someone hits it and suffers). Cc: pjt@google.com Cc: Larry Woodman Cc: Mike Galbraith Reported-by: Peter Portante Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.org Signed-off-by: Thomas Gleixner --- Reading git-diff-tree failed