Documentation/trace/events-kmem.txt

                        Subsystem Trace Points: kmem

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

  o Slab allocation of small objects of unknown type (kmalloc)
  o Slab allocation of small objects of known type
  o Page allocation
  o Per-CPU Allocator Activity
  o External Fragmentation

This document describes what each tracepoint is and why it might be
useful.

1. Slab allocation of small objects of unknown type
===================================================
kmalloc         call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
kmalloc_node    call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
kfree           call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are becoming significantly
internally fragmented as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and the
call sites that allocated the leaked memory.
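The correlation described above can be sketched in a few lines of Python.
This is only an illustration: it assumes each trace line carries the event
name followed by a colon and the key=value fields from the format strings
above, and the sample lines and addresses are invented.

```python
import re
from collections import defaultdict

# Assumed line layout: "<event>: key=value key=value ...".
EVENT_RE = re.compile(r"\b(kmalloc|kmalloc_node|kfree):\s+(.*)")


def parse_fields(text):
    """Split 'a=1 b=2' into {'a': '1', 'b': '2'}."""
    return dict(tok.split("=", 1) for tok in text.split() if "=" in tok)


def summarize(lines):
    """Return (live, wasted).

    live:   ptr -> call_site for allocations with no matching kfree
    wasted: call_site -> sum of (bytes_alloc - bytes_req)
    """
    live = {}
    wasted = defaultdict(int)
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        event, fields = match.group(1), parse_fields(match.group(2))
        if event == "kfree":
            live.pop(fields["ptr"], None)
        else:
            site = fields["call_site"]
            live[fields["ptr"]] = site
            wasted[site] += int(fields["bytes_alloc"]) - int(fields["bytes_req"])
    return live, dict(wasted)


# Invented sample data, for illustration only.
SAMPLE = [
    "kmalloc: call_site=c0ffee01 ptr=0xa000 bytes_req=100 bytes_alloc=128 gfp_flags=GFP_KERNEL",
    "kmalloc: call_site=c0ffee02 ptr=0xb000 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL",
    "kfree: call_site=c0ffee03 ptr=0xa000",
]

live, wasted = summarize(SAMPLE)
print(live)    # pointers never freed within the trace window
print(wasted)  # internal fragmentation per call site, in bytes
```

A pointer left in "live" is only a leak candidate: the object may simply
have been freed after the trace window ended.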


2. Slab allocation of small objects of known type
=================================================
kmem_cache_alloc        call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
kmem_cache_alloc_node   call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
kmem_cache_free         call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events, except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on which slab is being allocated
from, but the call_site can usually be used to infer that information.
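One way to go from a call_site to a likely cache is to resolve the address
to the enclosing function and read the cache off the function name. The
sketch below assumes a symbol table in the "address type name" layout used
by /proc/kallsyms; the addresses are invented and the helper names are
hypothetical.

```python
import bisect

# Invented symbol table in the "address type name" layout of /proc/kallsyms.
KALLSYMS = """\
c0100000 T alloc_inode
c0100400 T get_empty_filp
c0100900 T d_alloc
"""


def load_symbols(text):
    """Parse the table into a list of (address, name) sorted by address."""
    syms = []
    for row in text.splitlines():
        addr, _type, name = row.split()[:3]
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def resolve(syms, call_site):
    """Return 'symbol+0xoffset' for the nearest symbol at or below call_site."""
    addrs = [addr for addr, _name in syms]
    index = bisect.bisect_right(addrs, call_site) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    addr, name = syms[index]
    return f"{name}+0x{call_site - addr:x}"


SYMS = load_symbols(KALLSYMS)
print(resolve(SYMS, 0xC0100420))
```

The function name is only a hint. Confirming which cache is involved still
requires reading the code at that call site.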

3. Page allocation
==================
mm_page_alloc             page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
mm_page_free_direct       page=%p pfn=%lu order=%d
mm_pagevec_free           page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important because
a high event rate implies heavy activity on the zone->lock. Taking this lock
impairs performance by disabling interrupts, dirtying cache lines between
CPUs and serialising many CPUs.
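A rough way to gauge how much allocation traffic reaches the zone->lock is
to compare the number of mm_page_alloc_zone_locked events with the number
of mm_page_alloc events. The sketch below assumes each trace line carries
the event name followed by a colon; the sample lines are invented, and the
ratio is only a coarse indicator because per-CPU refills also raise the
locked count.

```python
from collections import Counter

# The longer name is listed first for readability; the trailing colon in the
# match below already keeps the two names from being confused.
EVENTS = ("mm_page_alloc_zone_locked", "mm_page_alloc")


def count_events(lines):
    """Count occurrences of each event of interest."""
    counts = Counter()
    for line in lines:
        for name in EVENTS:
            if f"{name}:" in line:
                counts[name] += 1
                break
    return counts


def locked_per_alloc(counts):
    """Zone-locked events per mm_page_alloc event (0.0 if none were seen)."""
    total = counts["mm_page_alloc"]
    return counts["mm_page_alloc_zone_locked"] / total if total else 0.0


# Invented sample data, for illustration only.
SAMPLE = [
    "mm_page_alloc: page=p1 pfn=1 order=0 migratetype=0 gfp_flags=GFP_KERNEL",
    "mm_page_alloc_zone_locked: page=p2 pfn=2 order=2 migratetype=0 cpu=0 percpu_refill=0",
    "mm_page_alloc: page=p2 pfn=2 order=2 migratetype=0 gfp_flags=GFP_KERNEL",
    "mm_page_alloc: page=p3 pfn=3 order=0 migratetype=0 gfp_flags=GFP_KERNEL",
    "mm_page_alloc: page=p4 pfn=4 order=0 migratetype=0 gfp_flags=GFP_KERNEL",
]

COUNTS = count_events(SAMPLE)
print(COUNTS["mm_page_alloc_zone_locked"], COUNTS["mm_page_alloc"])
print(locked_per_alloc(COUNTS))
```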

When a page is freed directly by the caller, the mm_page_free_direct event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed using a pagevec, the mm_pagevec_free event is
triggered. Broadly speaking, pages are taken off the LRU in bulk under the
zone->lru_lock and freed in batch with a pagevec. Significant amounts of
activity here could indicate that the system is under memory pressure and
can also indicate contention on the zone->lru_lock.

4. Per-CPU Allocator Activity
=============================
mm_page_alloc_zone_locked       page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
mm_page_pcpu_drain              page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing to struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock is taken once and the per-CPU list is refilled. An
mm_page_alloc_zone_locked event is triggered for each page allocated, and
its percpu_refill field indicates whether the page was part of a per-CPU
refill.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers an mm_page_pcpu_drain event.

Events are emitted per page so that pages can be tracked between
allocation and freeing. A run of drain or refill events that occur
consecutively implies that the zone->lock was taken once. Large numbers of
per-CPU refills and drains could imply an imbalance between CPUs where too
much work is being concentrated in one place. It could also indicate that
the per-CPU lists should be larger. Finally, large numbers of refills on one
CPU and drains on another could contribute to heavy cache line bouncing due
to writes between CPUs, and it is worth investigating whether pages can be
allocated and freed on the same CPU through some algorithm change.
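Since a run of consecutive refill or drain events implies a single
acquisition of the zone->lock, the number of runs gives an estimate of how
often the lock was taken on behalf of each CPU. The sketch below assumes
each trace line carries the event name followed by a colon and the
key=value fields listed above; the sample lines are invented. Two batches
that follow each other with no other refill or drain event in between are
merged, so the result is a lower bound.

```python
import re
from itertools import groupby

FIELD_RE = re.compile(r"(\w+)=(\S+)")


def classify(line):
    """Return ('drain' | 'refill', cpu), or None for unrelated lines."""
    fields = dict(FIELD_RE.findall(line))
    if "mm_page_pcpu_drain:" in line:
        return "drain", int(fields["cpu"])
    if "mm_page_alloc_zone_locked:" in line and fields.get("percpu_refill") == "1":
        return "refill", int(fields["cpu"])
    return None


def count_batches(lines):
    """Map (kind, cpu) -> (number of runs, total pages in those runs)."""
    keys = [key for key in map(classify, lines) if key is not None]
    batches = {}
    for key, run in groupby(keys):
        pages = len(list(run))
        runs_so_far, pages_so_far = batches.get(key, (0, 0))
        batches[key] = (runs_so_far + 1, pages_so_far + pages)
    return batches


# Invented sample data, for illustration only.
SAMPLE = [
    "mm_page_alloc_zone_locked: page=p1 pfn=1 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked: page=p2 pfn=2 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked: page=p3 pfn=3 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_pcpu_drain: page=p4 pfn=4 order=0 cpu=1 migratetype=0",
    "mm_page_pcpu_drain: page=p5 pfn=5 order=0 cpu=1 migratetype=0",
    "mm_page_alloc_zone_locked: page=p6 pfn=6 order=0 migratetype=0 cpu=0 percpu_refill=1",
]

BATCHES = count_batches(SAMPLE)
for (kind, cpu), (runs, pages) in sorted(BATCHES.items()):
    print(f"cpu {cpu}: {runs} {kind} run(s), {pages} page(s)")
```

Many refill runs on one CPU paired with many drain runs on another is the
pattern described above as a possible source of cache line bouncing.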

5. External Fragmentation
=========================
mm_page_alloc_extfrag           page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important, although
high-order allocations are avoided where possible. If the system is using
huge pages and needs to be able to resize the pool over the lifetime of the
system, this event is important.

A large number of these events implies that memory is fragmenting and that
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where
pageblock_size is usually the default hugepage size.
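As a worked example of the increment above, the sketch below computes one
step and a few candidate settings. The 2048 kB pageblock size, the two
online nodes and the starting value are assumptions chosen for
illustration; real values depend on the architecture and machine.

```python
def min_free_kbytes_step(pageblock_kbytes, nr_online_nodes):
    """One increment: 3 * pageblock_size * nr_online_nodes, in kilobytes."""
    return 3 * pageblock_kbytes * nr_online_nodes


def candidate_settings(current_kbytes, step_kbytes, steps):
    """Values reached by raising min_free_kbytes one increment at a time."""
    return [current_kbytes + step_kbytes * n for n in range(1, steps + 1)]


# Assumed values: 2 MiB pageblocks (2048 kB), two online NUMA nodes and a
# current min_free_kbytes of 45056.
STEP = min_free_kbytes_step(2048, 2)
CANDIDATES = candidate_settings(45056, STEP, 3)

print(STEP)
print(CANDIDATES)
```

Each candidate would be written to /proc/sys/vm/min_free_kbytes in turn
while watching whether the rate of mm_page_alloc_extfrag events drops.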