diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ad2bb3b..1f5a924 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -8,6 +8,7 @@ Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
  Modified by Paul Jackson <pj@sgi.com>
  Modified by Christoph Lameter <clameter@sgi.com>
  Modified by Paul Menage <menage@google.com>
+Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
  
  CONTENTS:
  =========
@@ -20,7 +21,8 @@ CONTENTS:
    1.5 What is memory_pressure ?
    1.6 What is memory spread ?
    1.7 What is sched_load_balance ?
-  1.8 How do I use cpusets ?
+  1.8 What is sched_relax_domain_level ?
+  1.9 How do I use cpusets ?
  2. Usage Examples and Syntax
    2.1 Basic Usage
    2.2 Adding/removing cpus
@@ -152,13 +154,15 @@ browsing and modifying the cpusets presently known to the kernel.  No
  new system calls are added for cpusets - all support for querying and
  modifying cpusets is via this cpuset file system.
  
-The /proc/<pid>/status file for each task has two added lines,
+The /proc/<pid>/status file for each task has four added lines,
  displaying the tasks cpus_allowed (on which CPUs it may be scheduled)
  and mems_allowed (on which Memory Nodes it may obtain memory),
-in the format seen in the following example:
+in the two formats seen in the following example:
  
    Cpus_allowed:   ffffffff,ffffffff,ffffffff,ffffffff
+  Cpus_allowed_list:      0-127
    Mems_allowed:   ffffffff,ffffffff
+  Mems_allowed_list:      0-63
  
  Each cpuset is represented by a directory in the cgroup file system
  containing (on top of the standard cgroup files) the following
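
Note on the hunk above: the two formats shown for cpus_allowed and mems_allowed carry the same information. The following is a minimal sketch of how the mask form maps to the list form, assuming the mask is a comma-separated sequence of 32-bit hexadecimal words with the most significant word first. The function name mask_to_list is hypothetical and not part of the kernel or of this patch.

```python
def mask_to_list(mask: str) -> str:
    """Convert a mask such as 'ffffffff,ffffffff' into a list such as '0-63'.

    Assumption: each comma-separated field is one 32-bit hex word,
    most significant word first.
    """
    value = 0
    for word in mask.strip().split(","):
        value = (value << 32) | int(word, 16)

    ranges = []      # collected (first, last) pairs of consecutive set bits
    start = None     # first bit of the run currently being scanned
    bit = 0
    while value:
        if value & 1:
            if start is None:
                start = bit
        elif start is not None:
            ranges.append((start, bit - 1))
            start = None
        value >>= 1
        bit += 1
    if start is not None:
        ranges.append((start, bit - 1))

    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    print(mask_to_list("ffffffff,ffffffff,ffffffff,ffffffff"))
    print(mask_to_list("ffffffff,ffffffff"))
```

With the example values from the hunk, the first call yields 0-127 and the second yields 0-63, matching the Cpus_allowed_list and Mems_allowed_list lines.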
@@ -169,6 +173,7 @@ files describing that cpuset:
   - memory_migrate flag: if set, move pages to cpusets nodes
   - cpu_exclusive flag: is cpu placement exclusive?
   - mem_exclusive flag: is memory placement exclusive?
+ - mem_hardwall flag:  is memory allocation hardwalled?
   - memory_pressure: measure of how much paging pressure in cpuset
  
  In addition, the root cpuset only has the following file:
@@ -196,7 +201,7 @@ using the sched_setaffinity, mbind and set_mempolicy system calls.
  The following rules apply to each cpuset:
  
   - Its CPUs and Memory Nodes must be a subset of its parents.
- - It can only be marked exclusive if its parent is.
+ - It can't be marked exclusive unless its parent is.
   - If its cpu or memory is exclusive, they may not overlap any sibling.
  
  These rules, and the natural hierarchy of cpusets, enable efficient
@@ -220,17 +225,18 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than
  a direct ancestor or descendent, may share any of the same CPUs or
  Memory Nodes.
  
-A cpuset that is mem_exclusive restricts kernel allocations for
-page, buffer and other data commonly shared by the kernel across
-multiple users.  All cpusets, whether mem_exclusive or not, restrict
-allocations of memory for user space.  This enables configuring a
-system so that several independent jobs can share common kernel data,
-such as file system pages, while isolating each jobs user allocation in
-its own cpuset.  To do this, construct a large mem_exclusive cpuset to
-hold all the jobs, and construct child, non-mem_exclusive cpusets for
-each individual job.  Only a small amount of typical kernel memory,
-such as requests from interrupt handlers, is allowed to be taken
-outside even a mem_exclusive cpuset.
+A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
+i.e. it restricts kernel allocations for page, buffer and other data
+commonly shared by the kernel across multiple users.  All cpusets,
+whether hardwalled or not, restrict allocations of memory for user
+space.  This enables configuring a system so that several independent
+jobs can share common kernel data, such as file system pages, while
+isolating each job's user allocation in its own cpuset.  To do this,
+construct a large mem_exclusive cpuset to hold all the jobs, and
+construct child, non-mem_exclusive cpusets for each individual job.
+Only a small amount of typical kernel memory, such as requests from
+interrupt handlers, is allowed to be taken outside even a
+mem_exclusive cpuset.
  
  
  1.5 What is memory_pressure ?
@@ -341,7 +347,7 @@ is modified to perform an inline check for this PF_SPREAD_PAGE task
  flag, and if set, a call to a new routine cpuset_mem_spread_node()
  returns the node to prefer for the allocation.
  
-Similarly, setting 'memory_spread_cache' turns on the flag
+Similarly, setting 'memory_spread_slab' turns on the flag
  PF_SPREAD_SLAB, and appropriately marked slab caches will allocate
  pages from the node returned by cpuset_mem_spread_node().
  
@@ -497,7 +503,76 @@ the cpuset code to update these sched domains, it compares the new
  partition requested with the current, and updates its sched domains,
  removing the old and adding the new, for each change.
  
-1.8 How do I use cpusets ?
+
+1.8 What is sched_relax_domain_level ?
+--------------------------------------
+
+Within a sched domain, the scheduler migrates tasks in two ways:
+periodic load balancing on tick, and at the time of some schedule events.
+
+When a task is woken up, the scheduler tries to move it to an idle CPU.
+For example, if a task A running on CPU X activates another task B
+on the same CPU X, and if CPU Y is X's sibling and is idle,
+then the scheduler migrates task B to CPU Y so that task B can start
+on CPU Y without waiting for task A on CPU X.
+
+And if a CPU runs out of tasks in its runqueue, the CPU tries to pull
+extra tasks from other busy CPUs to help them before it goes
+idle.
+
+Of course, finding movable tasks and/or idle CPUs has a searching
+cost, so the scheduler might not search all CPUs in the domain
+every time.  In fact, on some architectures, the searching range on
+events is limited to the socket or node where the CPU is located,
+while the load balance on tick searches all.
+
+For example, assume CPU Z is relatively far from CPU X.  Even if CPU Z
+is idle while CPU X and its siblings are busy, the scheduler can't
+migrate woken task B from X to Z since Z is out of its searching range.
+As a result, task B on CPU X needs to wait for task A or for the load
+balance on the next tick.  For some applications in special situations,
+waiting 1 tick may be too long.
+
+The 'sched_relax_domain_level' file allows you to request changing
+this searching range as you like.  This file takes an int value which
+ideally indicates the size of the searching range in levels, as follows;
+otherwise it holds the initial value -1, meaning the cpuset has no request.
+
+  -1  : no request. use system default or follow request of others.
+   0  : no search.
+   1  : search siblings (hyperthreads in a core).
+   2  : search cores in a package.
+   3  : search cpus in a node [= system wide on non-NUMA system]
+ ( 4  : search nodes in a chunk of node [on NUMA system] )
+ ( 5  : search system wide [on NUMA system] )
+
+The system default is architecture dependent.  The system default
+can be changed using the relax_domain_level= boot parameter.
+
+This file is per-cpuset and affects the sched domain that the cpuset
+belongs to.  Therefore if the flag 'sched_load_balance' of a cpuset
+is disabled, then 'sched_relax_domain_level' has no effect since
+there is no sched domain belonging to the cpuset.
+
+If multiple cpusets are overlapping and hence they form a single sched
+domain, the largest value among those is used.  Be careful: if one
+requests 0 and the others are -1, then 0 is used.
+
+Note that modifying this file will have both good and bad effects,
+and whether it is acceptable or not will depend on your situation.
+Don't modify this file if you are not sure.
+
+If your situation is:
+ - The migration costs between CPUs can be assumed to be considerably
+   small (for you) due to your special application's behavior or
+   special hardware support for CPU cache etc.
+ - The searching cost has no impact (for you), or you can make
+   the searching cost small enough by keeping the cpuset compact etc.
+ - Low latency is required even if it sacrifices cache hit rate etc.
+then increasing 'sched_relax_domain_level' would benefit you.
+
+
+1.9 How do I use cpusets ?
  --------------------------
  
  In order to minimize the impact of cpusets on critical kernel
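
Note on the hunk above: the rule that new section 1.8 gives for overlapping cpusets (the largest requested value wins, and -1 means no request) can be illustrated with a small sketch. This models only the rule as the text states it and is not the kernel's implementation. The names effective_relax_level and system_default are hypothetical, and the fallback to a system default when every cpuset holds -1 is an assumption drawn from the "-1" row of the table.

```python
NO_REQUEST = -1  # initial value of sched_relax_domain_level per the text


def effective_relax_level(requests, system_default):
    """Return the level used for a sched domain formed by overlapping cpusets.

    requests:       the sched_relax_domain_level value of each cpuset
    system_default: the architecture-dependent default (hypothetical input)
    """
    # The largest value among the overlapping cpusets is used.
    level = max(requests, default=NO_REQUEST)
    # If nobody made a request, fall back to the system default.
    return system_default if level == NO_REQUEST else level


if __name__ == "__main__":
    # One cpuset requests 0 and the others make no request: 0 is used,
    # which disables searching even though the default may be higher.
    print(effective_relax_level([-1, 0, -1], system_default=2))
    print(effective_relax_level([-1, -1], system_default=2))
    print(effective_relax_level([1, 3, -1], system_default=2))
```

The first call shows the case the text warns about: a single request of 0 overrides the system default for every cpuset sharing that sched domain.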
@@ -639,7 +714,10 @@ Now you want to do something with this cpuset.
  
  In this directory you can find several files:
  # ls
-cpus  cpu_exclusive  mems  mem_exclusive  tasks
+cpu_exclusive  memory_migrate      mems                      tasks
+cpus           memory_pressure     notify_on_release
+mem_exclusive  memory_spread_page  sched_load_balance
+mem_hardwall   memory_spread_slab  sched_relax_domain_level
  
  Reading them will give you information about the state of this cpuset:
  the CPUs and Memory Nodes it can use, the processes that are using