pandora-kernel.git
11 years ago OMAPFB: use dma_alloc_attrs to allocate memory
Tomi Valkeinen [Fri, 9 Nov 2012 13:52:20 +0000 (15:52 +0200)]
OMAPFB: use dma_alloc_attrs to allocate memory

Use dma_alloc_attrs to allocate memory instead of the omap specific vram
allocator. After this we can remove the omap vram allocator.

There are some downsides to this change (a sketch of the new allocation
call follows this list):

1) dma_alloc_attrs doesn't let us allocate at a certain physical address.
However, this should not be a problem as this feature of vram allocator
is only used when reserving the framebuffer that was initialized by the
bootloader, and we don't currently support "passing" a framebuffer from
the bootloader to the kernel anyway.

2) dma_alloc_attrs, as of now, always ioremaps the allocated area, and
we don't need the ioremap when using VRFB. This patch uses
DMA_ATTR_NO_KERNEL_MAPPING for the allocation, but the flag is currently
not operational.

3) The OMAPFB_GET_VRAM_INFO ioctl cannot return real values anymore. I
changed the ioctl to return 64M for all the values, which, I hope,
applications will interpret as "there's enough vram".

4) "vram" kernel parameter to define how much ram to reserve for video
use no longer works. The user needs to enable CMA and use "cma"
parameter.
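
As an illustration, the new allocation path would look roughly like
this (a minimal sketch; names like fbdev->dev are illustrative and the
omapfb bookkeeping around it is omitted):

DEFINE_DMA_ATTRS(attrs);
dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &attrs);

token = dma_alloc_attrs(fbdev->dev, size, &dma_handle,
                        GFP_KERNEL, &attrs);
if (!token)
    return -ENOMEM;
/* with DMA_ATTR_NO_KERNEL_MAPPING in effect, "token" is an opaque
 * cookie for dma_mmap_attrs()/dma_free_attrs(), not a CPU address */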

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Conflicts:

drivers/video/omap2/omapfb/omapfb-main.c

11 years ago common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
Marek Szyprowski [Wed, 16 May 2012 13:20:37 +0000 (15:20 +0200)]
common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute

This patch adds the DMA_ATTR_NO_KERNEL_MAPPING attribute, which lets the
platform avoid creating a kernel virtual mapping for the allocated
buffer. On some architectures creating such a mapping is a non-trivial
task and consumes very limited resources (like kernel virtual address
space or dma consistent address space). Buffers allocated with this
attribute can only be passed to user space by calling dma_mmap_attrs().
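
A sketch of how such a buffer can still reach user space (the snippet
is illustrative, not part of this patch):

/* cpu_addr is the opaque cookie returned by dma_alloc_attrs() with
 * DMA_ATTR_NO_KERNEL_MAPPING set; it must not be dereferenced */
ret = dma_mmap_attrs(dev, vma, cpu_addr, dma_handle,
                     vma->vm_end - vma->vm_start, &attrs);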

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
11 years ago omap2+: add drm device
Andy Gross [Mon, 25 Feb 2013 18:43:41 +0000 (20:43 +0200)]
omap2+: add drm device

Register OMAP DRM/KMS platform device.  DMM is split into a
separate device using hwmod.

Signed-off-by: Andy Gross <andy.gross@ti.com>
Signed-off-by: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Conflicts:

arch/arm/mach-omap2/Makefile
drivers/staging/omapdrm/omap_drv.h
include/linux/platform_data/omap_drm.h

11 years ago staging: add omapdrm DRM/KMS driver for TI OMAP platforms
Rob Clark [Sat, 12 Nov 2011 18:09:40 +0000 (12:09 -0600)]
staging: add omapdrm DRM/KMS driver for TI OMAP platforms

A DRM display driver for the TI OMAP platform.  Like the omapfb (fbdev)
and omap_vout (v4l2 display) drivers before it, this driver uses the
DSS2 driver to access the display hardware, including support for
HDMI, DVI, and various types of LCD panels.  It also implements GEM
support for buffer allocation (for KMS as well as offscreen buffers
used by the xf86-video-omap userspace xorg driver).

The driver maps CRTCs to overlays, encoders to overlay-managers, and
connectors to dssdev's.  Note that this arrangement might change slightly
when support for drm_plane overlays is added.

For GEM support, non-scanout buffers use the shmem-backed pages
provided by the GEM core (in drm_gem_object_init()).  Scanout buffers,
which need to be physically contiguous, are allocated with CMA and use
drm_gem_private_object_init().
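
A rough sketch of that buffer-type decision (simplified; OMAP_BO_SCANOUT
stands in for however the driver flags scanout buffers):

if (flags & OMAP_BO_SCANOUT) {
    /* physically contiguous scanout buffer: CMA-backed, no shmem */
    ret = drm_gem_private_object_init(dev, obj, size);
} else {
    /* ordinary buffer: shmem-backed pages from the GEM core */
    ret = drm_gem_object_init(dev, obj, size);
}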

See userspace xorg driver:
git://github.com/robclark/xf86-video-omap.git

Refer to this link for CMA (Contiguous Memory Allocator):
http://lkml.org/lkml/2011/8/19/302

Links to previous versions of the patch:
v1: http://lwn.net/Articles/458137/
v2: http://patches.linaro.org/4156/
v3: http://patches.linaro.org/4688/
v4: http://patches.linaro.org/4791/

History:

v5: move headers from include/drm at Greg KH's request, minor rebasing
    on 3.2-rc1, pull in private copies of drm_gem_{get,put}_pages()
    because "drm/gem: add functions to get/put pages" patch is not
    merged yet
v4: bit of rework of encoder/connector _dpms() code, modeset_init()
    rework to not use nested functions, update TODO.txt
v3: minor cleanups, improved error handling for dev_load(), some minor
    API changes that will be needed later for tiled buffer support
v2: replace omap_vram with CMA for scanout buffer allocation, remove
    unneeded functions, use dma_addr_t for physical addresses, error
    handling cleanup, refactor attach/detach pages into common drm
    functions, split non-userspace-facing API into omap_priv.h, remove
    plugin API

v1: original

Signed-off-by: Rob Clark <rob@ti.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Conflicts:

drivers/staging/Kconfig
drivers/staging/Makefile

11 years ago ARM: relax conditions required for enabling Contiguous Memory Allocator
Marek Szyprowski [Mon, 20 Aug 2012 12:39:39 +0000 (14:39 +0200)]
ARM: relax conditions required for enabling Contiguous Memory Allocator

The Contiguous Memory Allocator requires only paging and the MMU to be
enabled, not a particular CPU architecture, so there is no need for a
strict dependency on CPU type. This enables the use of CMA on some older
ARMv5 systems which might also need large contiguous blocks for
multimedia processing hw modules.

Reported-by: Prabhakar Lad <prabhakar.lad@ti.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Prabhakar Lad <prabhakar.lad@ti.com>
11 years ago mm: cma: fix alignment requirements for contiguous regions
Marek Szyprowski [Mon, 27 Aug 2012 18:27:19 +0000 (20:27 +0200)]
mm: cma: fix alignment requirements for contiguous regions

The Contiguous Memory Allocator requires each of its regions to be
aligned in such a way that it is possible to change the migration type
for all pageblocks holding it and then isolate a page of the largest
possible order from the buddy allocator (which is MAX_ORDER-1). This
patch relaxes the alignment requirement by one order, because MAX_ORDER
alignment is not really needed.
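
For reference, the relaxed requirement boils down to one line in the
region setup (a sketch of the resulting computation):

/* align to the largest order the buddy allocator can hand out */
alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
base = ALIGN(base, alignment);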

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
11 years ago ARM: dma-mapping: fix incorrect freeing of atomic allocations
Aaro Koskinen [Tue, 7 Aug 2012 12:44:05 +0000 (14:44 +0200)]
ARM: dma-mapping: fix incorrect freeing of atomic allocations

Commit e9da6e9905e639b0f842a244bc770b48ad0523e9 (ARM: dma-mapping:
remove custom consistent dma region) changed the way atomic allocations
are handled. However, arm_dma_free() was not modified accordingly, and
as a result freeing of atomic allocations does not work correctly when
CMA is disabled. Memory is leaked and the following WARNINGs are seen:

[   57.698911] ------------[ cut here ]------------
[   57.753518] WARNING: at arch/arm/mm/dma-mapping.c:263 arm_dma_free+0x88/0xe4()
[   57.811473] trying to free invalid coherent area: e0848000
[   57.867398] Modules linked in: sata_mv(-)
[   57.921373] [<c000d270>] (unwind_backtrace+0x0/0xf0) from [<c0015430>] (warn_slowpath_common+0x50/0x68)
[   58.033924] [<c0015430>] (warn_slowpath_common+0x50/0x68) from [<c00154dc>] (warn_slowpath_fmt+0x30/0x40)
[   58.152024] [<c00154dc>] (warn_slowpath_fmt+0x30/0x40) from [<c000dc18>] (arm_dma_free+0x88/0xe4)
[   58.219592] [<c000dc18>] (arm_dma_free+0x88/0xe4) from [<c008fa30>] (dma_pool_destroy+0x100/0x148)
[   58.345526] [<c008fa30>] (dma_pool_destroy+0x100/0x148) from [<c019a64c>] (release_nodes+0x144/0x218)
[   58.475782] [<c019a64c>] (release_nodes+0x144/0x218) from [<c0197e10>] (__device_release_driver+0x60/0xb8)
[   58.614260] [<c0197e10>] (__device_release_driver+0x60/0xb8) from [<c0198608>] (driver_detach+0xd8/0xec)
[   58.756527] [<c0198608>] (driver_detach+0xd8/0xec) from [<c0197c54>] (bus_remove_driver+0x7c/0xc4)
[   58.901648] [<c0197c54>] (bus_remove_driver+0x7c/0xc4) from [<c004bfac>] (sys_delete_module+0x19c/0x220)
[   59.051447] [<c004bfac>] (sys_delete_module+0x19c/0x220) from [<c0009140>] (ret_fast_syscall+0x0/0x2c)
[   59.207996] ---[ end trace 0745420412c0325a ]---
[   59.287110] ------------[ cut here ]------------
[   59.366324] WARNING: at arch/arm/mm/dma-mapping.c:263 arm_dma_free+0x88/0xe4()
[   59.450511] trying to free invalid coherent area: e0847000
[   59.534357] Modules linked in: sata_mv(-)
[   59.616785] [<c000d270>] (unwind_backtrace+0x0/0xf0) from [<c0015430>] (warn_slowpath_common+0x50/0x68)
[   59.790030] [<c0015430>] (warn_slowpath_common+0x50/0x68) from [<c00154dc>] (warn_slowpath_fmt+0x30/0x40)
[   59.972322] [<c00154dc>] (warn_slowpath_fmt+0x30/0x40) from [<c000dc18>] (arm_dma_free+0x88/0xe4)
[   60.070701] [<c000dc18>] (arm_dma_free+0x88/0xe4) from [<c008fa30>] (dma_pool_destroy+0x100/0x148)
[   60.256817] [<c008fa30>] (dma_pool_destroy+0x100/0x148) from [<c019a64c>] (release_nodes+0x144/0x218)
[   60.445201] [<c019a64c>] (release_nodes+0x144/0x218) from [<c0197e10>] (__device_release_driver+0x60/0xb8)
[   60.634148] [<c0197e10>] (__device_release_driver+0x60/0xb8) from [<c0198608>] (driver_detach+0xd8/0xec)
[   60.823623] [<c0198608>] (driver_detach+0xd8/0xec) from [<c0197c54>] (bus_remove_driver+0x7c/0xc4)
[   61.013268] [<c0197c54>] (bus_remove_driver+0x7c/0xc4) from [<c004bfac>] (sys_delete_module+0x19c/0x220)
[   61.203472] [<c004bfac>] (sys_delete_module+0x19c/0x220) from [<c0009140>] (ret_fast_syscall+0x0/0x2c)
[   61.393390] ---[ end trace 0745420412c0325b ]---

The patch fixes this.
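
Conceptually, the fix makes the free path check the atomic pool first
(a sketch of the resulting logic, not the literal diff):

if (__free_from_pool(cpu_addr, size))
    return;    /* atomic allocation: nothing else to undo */

/* otherwise fall through to the usual remap/CMA free paths */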

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: dma-mapping: fix error path for memory allocation failure
Marek Szyprowski [Fri, 27 Jul 2012 15:12:50 +0000 (17:12 +0200)]
ARM: dma-mapping: fix error path for memory allocation failure

This patch fixes an incorrect check in the error path. When the
allocation of the first page fails, the kernel oopses due to accessing
the -1 element of the pages array.
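
The classic shape of such a bug, for illustration (not the literal code
from dma-mapping.c):

while (--i)    /* broken: if the first page fails, i == 0, so --i
                * yields -1 and pages[-1] is accessed */
    __free_pages(pages[i], 0);

while (i--)    /* fixed: frees pages[i-1] .. pages[0] and stops */
    __free_pages(pages[i], 0);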

Reported-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: dma-mapping: modify condition check while freeing pages
Prathyush K [Mon, 16 Jul 2012 06:59:55 +0000 (08:59 +0200)]
ARM: dma-mapping: modify condition check while freeing pages

WARNING: at mm/vmalloc.c:1471 __iommu_free_buffer+0xcc/0xd0()
Trying to vfree() nonexistent vm area (ef095000)
Modules linked in:
[<c0015a18>] (unwind_backtrace+0x0/0xfc) from [<c0025a94>] (warn_slowpath_common+0x54/0x64)
[<c0025a94>] (warn_slowpath_common+0x54/0x64) from [<c0025b38>] (warn_slowpath_fmt+0x30/0x40)
[<c0025b38>] (warn_slowpath_fmt+0x30/0x40) from [<c0016de0>] (__iommu_free_buffer+0xcc/0xd0)
[<c0016de0>] (__iommu_free_buffer+0xcc/0xd0) from [<c0229a5c>] (exynos_drm_free_buf+0xe4/0x138)
[<c0229a5c>] (exynos_drm_free_buf+0xe4/0x138) from [<c022b358>] (exynos_drm_gem_destroy+0x80/0xfc)
[<c022b358>] (exynos_drm_gem_destroy+0x80/0xfc) from [<c0211230>] (drm_gem_object_free+0x28/0x34)
[<c0211230>] (drm_gem_object_free+0x28/0x34) from [<c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8)
[<c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8) from [<c01abe10>] (idr_for_each+0x74/0xb8)
[<c01abe10>] (idr_for_each+0x74/0xb8) from [<c02114e4>] (drm_gem_release+0x1c/0x30)
[<c02114e4>] (drm_gem_release+0x1c/0x30) from [<c0210ae8>] (drm_release+0x608/0x694)
[<c0210ae8>] (drm_release+0x608/0x694) from [<c00b75a0>] (fput+0xb8/0x228)
[<c00b75a0>] (fput+0xb8/0x228) from [<c00b40c4>] (filp_close+0x64/0x84)
[<c00b40c4>] (filp_close+0x64/0x84) from [<c0029d54>] (put_files_struct+0xe8/0x104)
[<c0029d54>] (put_files_struct+0xe8/0x104) from [<c002b930>] (do_exit+0x608/0x774)
[<c002b930>] (do_exit+0x608/0x774) from [<c002bae4>] (do_group_exit+0x48/0xb4)
[<c002bae4>] (do_group_exit+0x48/0xb4) from [<c002bb60>] (sys_exit_group+0x10/0x18)
[<c002bb60>] (sys_exit_group+0x10/0x18) from [<c000ee80>] (ret_fast_syscall+0x0/0x30)

This patch modifies the condition used while freeing to match the
condition used while allocating. This fixes the above warning, which
arises when the array size is equal to PAGE_SIZE: allocation is done
using kzalloc, but freeing is done using vfree.
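
The allocation and free sides must agree on the boundary condition
(a sketch of the matching pair):

if (array_size <= PAGE_SIZE)
    pages = kzalloc(array_size, gfp);    /* small array */
else
    pages = vzalloc(array_size);         /* large array */
...
if (array_size <= PAGE_SIZE)
    kfree(pages);
else
    vfree(pages);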

Signed-off-by: Prathyush K <prathyush.k@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago mm: cma: fix condition check when setting global cma area
Marek Szyprowski [Fri, 6 Jul 2012 10:02:04 +0000 (12:02 +0200)]
mm: cma: fix condition check when setting global cma area

dev_set_cma_area() incorrectly assigned the cma area to the global area
on the first call due to an incorrect check. This patch fixes the issue.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago mm: cma: don't replace lowmem pages with highmem
Rabin Vincent [Thu, 5 Jul 2012 10:22:23 +0000 (15:52 +0530)]
mm: cma: don't replace lowmem pages with highmem

The filesystem layer expects pages in the block device's mapping to not
be in highmem (the mapping's gfp mask is set in bdget()), but CMA can
currently replace lowmem pages with highmem pages, leading to crashes in
filesystem code such as the one below:

  Unable to handle kernel NULL pointer dereference at virtual address 00000400
  pgd = c0c98000
  [00000400] *pgd=00c91831, *pte=00000000, *ppte=00000000
  Internal error: Oops: 817 [#1] PREEMPT SMP ARM
  CPU: 0    Not tainted  (3.5.0-rc5+ #80)
  PC is at __memzero+0x24/0x80
  ...
  Process fsstress (pid: 323, stack limit = 0xc0cbc2f0)
  Backtrace:
  [<c010e3f0>] (ext4_getblk+0x0/0x180) from [<c010e58c>] (ext4_bread+0x1c/0x98)
  [<c010e570>] (ext4_bread+0x0/0x98) from [<c0117944>] (ext4_mkdir+0x160/0x3bc)
   r4:c15337f0
  [<c01177e4>] (ext4_mkdir+0x0/0x3bc) from [<c00c29e0>] (vfs_mkdir+0x8c/0x98)
  [<c00c2954>] (vfs_mkdir+0x0/0x98) from [<c00c2a60>] (sys_mkdirat+0x74/0xac)
   r6:00000000 r5:c152eb40 r4:000001ff r3:c14b43f0
  [<c00c29ec>] (sys_mkdirat+0x0/0xac) from [<c00c2ab8>] (sys_mkdir+0x20/0x24)
   r6:beccdcf0 r5:00074000 r4:beccdbbc
  [<c00c2a98>] (sys_mkdir+0x0/0x24) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)

Fix this by replacing only highmem pages with highmem.
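
In essence, the migration target allocator now preserves the zone of
the source page (a sketch; the function name here is illustrative):

static struct page *cma_migrate_alloc(struct page *page,
                                      unsigned long private, int **x)
{
    gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;

    /* only hand out a highmem replacement for a highmem page */
    if (PageHighMem(page))
        gfp_mask |= __GFP_HIGHMEM;

    return alloc_page(gfp_mask);
}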

Reported-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: fix warning caused by wrongly typed arm_dma_limit
Russell King [Thu, 5 Jul 2012 12:11:31 +0000 (13:11 +0100)]
ARM: fix warning caused by wrongly typed arm_dma_limit

arch/arm/mm/init.c: In function 'arm_memblock_init':
arch/arm/mm/init.c:380: warning: comparison of distinct pointer types lacks a cast

by fixing the typecast in its definition when DMA_ZONE is disabled.
This was missed in 4986e5c7c (ARM: mm: fix type of the arm_dma_limit
global variable).
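
The fix is essentially the definition used when the DMA zone is not
configured (sketch):

/* was: #define arm_dma_limit ((u32)~0) */
#define arm_dma_limit ((phys_addr_t)~0)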

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years ago ARM: dma-mapping: fix buffer chunk allocation order
Marek Szyprowski [Thu, 21 Jun 2012 09:48:11 +0000 (11:48 +0200)]
ARM: dma-mapping: fix buffer chunk allocation order

The IOMMU-aware dma_alloc_attrs() implementation allocates buffers in
power-of-two chunks to improve performance and take advantage of large
page mappings provided by some IOMMU hardware. However, due to a subtle
bug, the current code allocated those chunks in smallest-to-largest
order, which completely killed all the advantages of using chunks
larger than a page. If a 4KiB chunk was mapped as the first chunk, the
consecutive chunks were not aligned to the power-of-two matching their
size, and IOMMU drivers were not able to use internal mappings of any
size other than 4KiB (the largest common denominator of alignment and
chunk size).

This patch fixes this issue by changing to the correct largest-to-smallest
chunk size allocation sequence.
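
The intended behaviour, sketched (largest fitting power-of-two chunk
first, falling back to smaller orders only on failure; the real code
also caps the order by the remaining page count):

unsigned int order = get_order(size);    /* start big */

while (count) {
    struct page *p = NULL;

    for (; order > 0; order--) {
        p = alloc_pages(gfp | __GFP_NOWARN, order);
        if (p)
            break;    /* chunk is naturally aligned to its size */
    }
    if (!p)
        p = alloc_pages(gfp, 0);    /* last resort: single pages */
    if (!p)
        goto error;
    count -= 1 << order;
}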

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: dma-mapping: fix debug messages in dmabounce code
Marek Szyprowski [Wed, 13 Jun 2012 12:04:58 +0000 (14:04 +0200)]
ARM: dma-mapping: fix debug messages in dmabounce code

This patch fixes the usage of uninitialized variables in the dmabounce
code introduced by commit a227fb92 ('ARM: dma-mapping: remove offset
parameter to prepare for generic dma_ops'):
arch/arm/common/dmabounce.c: In function ‘dmabounce_sync_for_device’:
arch/arm/common/dmabounce.c:409: warning: ‘off’ may be used uninitialized in this function
arch/arm/common/dmabounce.c:407: note: ‘off’ was declared here
arch/arm/common/dmabounce.c: In function ‘dmabounce_sync_for_cpu’:
arch/arm/common/dmabounce.c:369: warning: ‘off’ may be used uninitialized in this function
arch/arm/common/dmabounce.c:367: note: ‘off’ was declared here

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: mm: fix type of the arm_dma_limit global variable
Marek Szyprowski [Wed, 6 Jun 2012 10:05:01 +0000 (12:05 +0200)]
ARM: mm: fix type of the arm_dma_limit global variable

arm_dma_limit stores the physical address of the maximal address
accessible by DMA, so the phys_addr_t type makes much more sense for it
than u32. This patch fixes the following build warning:

arch/arm/mm/init.c:380: warning: comparison of distinct pointer types lacks a cast

Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: dma-mapping: Add missing static storage class specifier
Sachin Kamat [Wed, 6 Jun 2012 06:03:35 +0000 (08:03 +0200)]
ARM: dma-mapping: Add missing static storage class specifier

Fixes the following sparse warnings:
arch/arm/mm/dma-mapping.c:231:15: warning: symbol 'consistent_base' was not
declared. Should it be static?
arch/arm/mm/dma-mapping.c:326:8: warning: symbol 'coherent_pool_size' was not
declared. Should it be static?

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: dma-mapping: remove unconditional dependency on CMA
Marek Szyprowski [Wed, 30 May 2012 08:48:29 +0000 (10:48 +0200)]
ARM: dma-mapping: remove unconditional dependency on CMA

CMA has been enabled unconditionally on all ARMv6+ systems to solve the
long standing issue of double kernel mappings for all dma coherent
buffers. This, however, created a dependency on CONFIG_EXPERIMENTAL for
the whole ARM architecture, which should really be avoided. This patch
removes this dependency and lets one use the old, well-tested
dma-mapping implementation on ARMv6+ systems without the need to enable
EXPERIMENTAL stuff.

Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Conflicts:

arch/arm/mm/dma-mapping.c

11 years ago ARM: dma-mapping: use PMD size for section unmap
Vitaly Andrianov [Mon, 14 May 2012 17:49:56 +0000 (13:49 -0400)]
ARM: dma-mapping: use PMD size for section unmap

The dma_contiguous_remap() function clears existing section maps using
the wrong size (PGDIR_SIZE instead of PMD_SIZE).  This is a bug which
does not affect non-LPAE systems, where PGDIR_SIZE and PMD_SIZE are the same.
On LPAE systems, however, this bug causes the kernel to hang at this point.

This fix has been tested on both LPAE and non-LPAE kernel builds.
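
Illustratively, the loop clearing the existing section mappings must
step by PMD_SIZE; on LPAE, PGDIR_SIZE is much larger, so stepping by it
left stale sections behind:

for (addr = __phys_to_virt(start); addr < end; addr += PMD_SIZE)
    pmd_clear(pmd_off_k(addr));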

Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago cma: fix migration mode
Minchan Kim [Fri, 11 May 2012 07:37:13 +0000 (09:37 +0200)]
cma: fix migration mode

__alloc_contig_migrate_range calls migrate_pages with the wrong
argument for migrate_mode. Fix it.

Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago ARM: dma-mapping: fix some CMA related mismerge
Grazvydas Ignotas [Sat, 2 Mar 2013 19:22:22 +0000 (21:22 +0200)]
ARM: dma-mapping: fix some CMA related mismerge

no idea what happened :/

11 years ago ARM: integrate CMA with DMA-mapping subsystem
Marek Szyprowski [Thu, 29 Dec 2011 12:09:51 +0000 (13:09 +0100)]
ARM: integrate CMA with DMA-mapping subsystem

This patch adds support for CMA to dma-mapping subsystem for ARM
architecture. By default a global CMA area is used, but specific devices
are allowed to have their private memory areas if required (they can be
created with dma_declare_contiguous() function during board
initialisation).

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, a low memory kernel mapping
is updated to match the requested memory access type.

GFP_ATOMIC allocations are performed from a special pool which is
created early during boot. This way, remapping page attributes is not
needed at allocation time.

CMA has been enabled unconditionally for ARMv6+ systems.
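
A board file would reserve a private area roughly like this (the values
and device name are illustrative):

/* in the machine's .reserve() callback, early during boot */
dma_declare_contiguous(&my_device.dev, 16 * SZ_1M, 0, 0);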

[notasas@gmail.com: backport to 3.2]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
Conflicts:

arch/arm/Kconfig
arch/arm/kernel/setup.c
arch/arm/mm/dma-mapping.c
arch/arm/mm/init.c

11 years ago drivers: add Contiguous Memory Allocator
Marek Szyprowski [Thu, 29 Dec 2011 12:09:51 +0000 (13:09 +0100)]
drivers: add Contiguous Memory Allocator

The Contiguous Memory Allocator is a set of helper functions for DMA
mapping framework that improves allocations of contiguous memory chunks.

CMA grabs memory on system boot, marks it with the MIGRATE_CMA migrate
type and gives it back to the system. The kernel is allowed to allocate
only movable pages within CMA's managed memory, so that it can be used,
for example, for page cache when the DMA mapping does not use it. On a
dma_alloc_from_contiguous() request, such pages are migrated out of the
CMA area to free the required contiguous block and fulfill the request.
This allows allocation of large contiguous chunks of memory at any time,
assuming that there is enough free memory available in the system.
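
The driver-facing entry points look roughly like this (a sketch; error
handling elided):

struct page *page;

page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
                                 get_order(size));
if (!page)
    return NULL;
/* ... use the physically contiguous range ... */
dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);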

This code is heavily based on earlier works by Michal Nazarewicz.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
Marek Szyprowski [Wed, 25 Jan 2012 11:49:24 +0000 (12:49 +0100)]
mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks

alloc_contig_range() performs memory allocation, so it should also keep
the memory watermarks at the correct level. This commit adds a call to
*_slowpath style reclaim to grab enough pages to make sure that the
final collection of contiguous pages from the freelists will not starve
the system.

[notasas@gmail.com: update out_of_memory args]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: extract reclaim code from __alloc_pages_direct_reclaim()
Marek Szyprowski [Wed, 25 Jan 2012 11:09:52 +0000 (12:09 +0100)]
mm: extract reclaim code from __alloc_pages_direct_reclaim()

This patch extracts common reclaim code from __alloc_pages_direct_reclaim()
function to separate function: __perform_reclaim() which can be later used
by alloc_contig_range().
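
The extracted helper keeps the shape of the code it came from (a sketch
of the resulting signature and its use):

static int __perform_reclaim(gfp_t gfp_mask, unsigned int order,
                             struct zonelist *zonelist,
                             nodemask_t *nodemask);

/* alloc_contig_range() can then reclaim pages up front: */
progress = __perform_reclaim(gfp_mask, order, zonelist, NULL);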

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: Serialize access to min_free_kbytes
Mel Gorman [Mon, 25 Apr 2011 21:36:42 +0000 (21:36 +0000)]
mm: Serialize access to min_free_kbytes

There is a race between the min_free_kbytes sysctl, memory hotplug
and transparent hugepage support enablement.  Memory hotplug uses a
zonelists_mutex to avoid a race when building zonelists. Reuse it to
serialise watermark updates.
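
The serialised update path then looks roughly like this:

void setup_per_zone_wmarks(void)
{
    mutex_lock(&zonelists_mutex);    /* vs. hotplug and THP enablement */
    __setup_per_zone_wmarks();
    mutex_unlock(&zonelists_mutex);
}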

[a.p.zijlstra@chello.nl: Older patch fixed the race with spinlock]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: page_isolation: MIGRATE_CMA isolation functions added
Michal Nazarewicz [Tue, 3 Apr 2012 13:06:15 +0000 (15:06 +0200)]
mm: page_isolation: MIGRATE_CMA isolation functions added

This commit changes various functions that switch the migrate type of
pages and pageblocks between MIGRATE_ISOLATE and MIGRATE_MOVABLE in
such a way that they can also work with the MIGRATE_CMA migrate type.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: mmzone: MIGRATE_CMA migration type added
Michal Nazarewicz [Thu, 29 Dec 2011 12:09:50 +0000 (13:09 +0100)]
mm: mmzone: MIGRATE_CMA migration type added

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees (to some degree) that a page in a MIGRATE_CMA
pageblock can always be migrated somewhere else (unless there's no
memory left in the system).

It is designed to be used for allocating big chunks (eg. 10MiB)
of physically contiguous memory.  Once a driver requests
contiguous memory, pages from MIGRATE_CMA pageblocks may be
migrated away to create a contiguous block.

To minimise the number of migrations, the MIGRATE_CMA migration
type is the last type tried when the page allocator falls back to
other migration types.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: page_alloc: change fallbacks array handling
Michal Nazarewicz [Wed, 11 Jan 2012 14:31:33 +0000 (15:31 +0100)]
mm: page_alloc: change fallbacks array handling

This commit adds a row for the MIGRATE_ISOLATE type to the fallbacks
array, which was missing from it.  It also changes the array traversal
logic a little, making MIGRATE_RESERVE an end marker.  The latter change
removes the implicit MIGRATE_UNMOVABLE from the end of each row, which
was read by the __rmqueue_fallback() function.
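
The reworked array then looks roughly like this, with MIGRATE_RESERVE
terminating each row:

static int fallbacks[MIGRATE_TYPES][4] = {
    [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
    [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
    [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
    [MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* never used */
    [MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* never used */
};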

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: page_alloc: introduce alloc_contig_range()
Michal Nazarewicz [Thu, 29 Dec 2011 12:09:50 +0000 (13:09 +0100)]
mm: page_alloc: introduce alloc_contig_range()

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It tries to migrate all
already-allocated pages that fall in the range, thus freeing them.
Once all pages in the range are freed, they are removed from the
buddy system and thus allocated for the caller to use.
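
Its interface is small (a sketch; the migratetype argument names what
the range's pageblocks are marked as):

int alloc_contig_range(unsigned long start, unsigned long end,
                       unsigned migratetype);
void free_contig_range(unsigned long pfn, unsigned nr_pages);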

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: compaction: export some of the functions
Michal Nazarewicz [Thu, 29 Dec 2011 12:09:50 +0000 (13:09 +0100)]
mm: compaction: export some of the functions

This commit exports some of the functions from the compaction.c file,
adding their declarations to the internal.h header file so that other
mm related code can use them.

This forces compaction.c to always be compiled (as opposed to being
compiled only if CONFIG_COMPACTION is defined), but to avoid introducing
code that the user did not ask for, part of compaction.c is now wrapped
in an #ifdef.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: compaction: introduce isolate_freepages_range()
Michal Nazarewicz [Mon, 30 Jan 2012 12:24:03 +0000 (13:24 +0100)]
mm: compaction: introduce isolate_freepages_range()

This commit introduces isolate_freepages_range() function which
generalises isolate_freepages_block() so that it can be used on
arbitrary PFN ranges.

isolate_freepages_block() is left with only minor changes.
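
Sketch of the new entry point:

/* returns the number of isolated free pages, 0 on failure */
unsigned long isolate_freepages_range(unsigned long start_pfn,
                                      unsigned long end_pfn);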

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: compaction: introduce map_pages()
Michal Nazarewicz [Mon, 30 Jan 2012 12:23:47 +0000 (13:23 +0100)]
mm: compaction: introduce map_pages()

This commit creates a map_pages() function which maps pages freed
using split_free_page().  This merely moves some code from
isolate_freepages() so that it can be reused in other places.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: compaction: introduce isolate_migratepages_range()
Michal Nazarewicz [Mon, 30 Jan 2012 12:16:26 +0000 (13:16 +0100)]
mm: compaction: introduce isolate_migratepages_range()

This commit introduces the isolate_migratepages_range() function, which
extracts functionality from isolate_migratepages() so that it can be
used on arbitrary PFN ranges.

isolate_migratepages() is now implemented as a simple wrapper
around isolate_migratepages_range().
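
Sketch of the split:

/* returns the PFN the scan stopped at */
unsigned long isolate_migratepages_range(struct zone *zone,
                                         struct compact_control *cc,
                                         unsigned long low_pfn,
                                         unsigned long end_pfn);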

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Rob Clark <rob.clark@linaro.org>
Tested-by: Ohad Ben-Cohen <ohad@wizery.com>
Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Tested-by: Robert Nelson <robertcnelson@gmail.com>
Tested-by: Barry Song <Baohua.Song@csr.com>
11 years ago mm: page_alloc: remove trailing whitespace
Michal Nazarewicz [Wed, 11 Jan 2012 14:16:11 +0000 (15:16 +0100)]
mm: page_alloc: remove trailing whitespace

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
11 years ago ARM: dma-mapping: add support for IOMMU mapper
Marek Szyprowski [Wed, 16 May 2012 13:48:21 +0000 (15:48 +0200)]
ARM: dma-mapping: add support for IOMMU mapper

This patch adds a complete implementation of the DMA-mapping API for
devices which have IOMMU support.

This implementation tries to optimize dma address space usage by remapping
all possible physical memory chunks into a single dma address space chunk.

DMA address space is managed on top of the bitmap stored in the
dma_iommu_mapping structure stored in device->archdata. Platform setup
code has to initialize parameters of the dma address space (base address,
size, allocation precision order) with the arm_iommu_create_mapping()
function.
To reduce the size of the bitmap, all allocations are aligned to the
specified order of base 4 KiB pages.

dma_alloc_* functions allocate physical memory in chunks, each with the
alloc_pages() function, to avoid failing if the physical memory gets
fragmented. In the worst case the allocated buffer is composed of 4 KiB
page chunks.

The dma_map_sg() function minimizes the total number of dma address
space chunks by merging physical memory chunks into one larger dma
address space chunk. If the requested chunk (scatter list entry)
boundaries match physical page boundaries, most dma_map_sg() requests
will result in creating only one chunk in the dma address space.

dma_map_page() simply creates a mapping for the given page(s) in the dma
address space.

All dma functions also perform the required cache operations, like
their counterparts for the arm linear physical memory mapping.
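
Platform setup, as described above, would look roughly like this (the
base, size and order values are illustrative):

struct dma_iommu_mapping *mapping;

mapping = arm_iommu_create_mapping(&platform_bus_type,
                                   0x80000000, SZ_256M, 4);
if (IS_ERR(mapping))
    return PTR_ERR(mapping);
err = arm_iommu_attach_device(dev, mapping);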

This patch contains code and fixes kindly provided by:
- Krishna Reddy <vdumpa@nvidia.com>,
- Andrzej Pietrasiewicz <andrzej.p@samsung.com>,
- Hiroshi DOYU <hdoyu@nvidia.com>

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: use alloc, mmap, free from dma_ops
Marek Szyprowski [Wed, 16 May 2012 16:31:23 +0000 (18:31 +0200)]
ARM: dma-mapping: use alloc, mmap, free from dma_ops

This patch converts the dma_alloc/free/mmap_{coherent,writecombine}
functions to use the generic alloc/free/mmap methods from the
dma_map_ops structure. A new DMA_ATTR_WRITE_COMBINE DMA attribute has
been introduced to implement the writecombine methods.
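
After the conversion, dma_alloc_writecombine() can be expressed in terms
of the generic call (a sketch of the resulting wrapper):

static inline void *dma_alloc_writecombine(struct device *dev, size_t size,
                                           dma_addr_t *handle, gfp_t gfp)
{
    DEFINE_DMA_ATTRS(attrs);
    dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
    return dma_alloc_attrs(dev, size, handle, gfp, &attrs);
}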

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: remove redundant code and do the cleanup
Marek Szyprowski [Fri, 10 Feb 2012 18:55:20 +0000 (19:55 +0100)]
ARM: dma-mapping: remove redundant code and do the cleanup

This patch just performs a global cleanup of the DMA mapping
implementation for the ARM architecture. Some of the tiny helper
functions have been moved to the caller code, some have been merged
together.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: move all dma bounce code to separate dma ops structure
Marek Szyprowski [Fri, 10 Feb 2012 18:55:20 +0000 (19:55 +0100)]
ARM: dma-mapping: move all dma bounce code to separate dma ops structure

This patch removes dma bounce hooks from the common dma mapping
implementation on ARM architecture and creates a separate set of
dma_map_ops for dma bounce devices.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: implement dma sg methods on top of any generic dma ops
Marek Szyprowski [Fri, 10 Feb 2012 18:55:20 +0000 (19:55 +0100)]
ARM: dma-mapping: implement dma sg methods on top of any generic dma ops

This patch converts all dma_sg methods to be generic (independent of the
current DMA mapping implementation for the ARM architecture). All dma sg
operations are now implemented on top of the respective
dma_map_page/dma_sync_single_for* operations from the dma_map_ops
structure.

Before this patch there were custom methods for all scatter/gather
related operations. They iterated over the whole scatter list and called
cache related operations directly (which in turn checked whether we use
the dma bounce code or not and called the respective version). This
patch changes them not to use such a shortcut. Instead, it provides a
similar loop over the scatter list and calls methods from the device's
dma_map_ops structure. This enables us to use device-dependent
implementations of cache related operations (direct linear or dma
bounce) depending on the provided dma_map_ops structure.
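
The generic loop described above, sketched (error handling elided):

int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
                   enum dma_data_direction dir, struct dma_attrs *attrs)
{
    struct dma_map_ops *ops = get_dma_ops(dev);
    struct scatterlist *s;
    int i;

    for_each_sg(sg, s, nents, i)
        s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
                                       s->length, dir, attrs);
    return nents;
}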

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: use asm-generic/dma-mapping-common.h
Marek Szyprowski [Fri, 10 Feb 2012 18:55:20 +0000 (19:55 +0100)]
ARM: dma-mapping: use asm-generic/dma-mapping-common.h

This patch modifies dma-mapping implementation on ARM architecture to
use common dma_map_ops structure and asm-generic/dma-mapping-common.h
helpers.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
Marek Szyprowski [Fri, 10 Feb 2012 18:55:20 +0000 (19:55 +0100)]
ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops

This patch removes the need for the offset parameter in dma bounce
functions. This is required to let dma-mapping framework on ARM
architecture to use common, generic dma_map_ops based dma-mapping
helpers.

Background and more detailed explanation:

dma_*_range_* functions are available from the early days of the dma
mapping api. They are the correct way of doing partial syncs on the
buffer (usually used by the network device drivers). This patch changes
only the internal implementation of the dma bounce functions to let
them tunnel through the dma_map_ops structure. The driver api stays
unchanged, so drivers are obliged to call dma_*_range_* functions to
keep the code clean and easy to understand.

The only drawback from this patch is reduced detection of the dma api
abuse. Let us consider the following code:

dma_addr = dma_map_single(dev, ptr, 64, DMA_TO_DEVICE);
dma_sync_single_range_for_cpu(dev, dma_addr+16, 0, 32, DMA_TO_DEVICE);

Without the patch such code fails, because dma bounce code is unable
to find the bounce buffer for the given dma_address. After the patch
the above sync call will be equivalent to:

dma_sync_single_range_for_cpu(dev, dma_addr, 16, 32, DMA_TO_DEVICE);

which succeeds.

I don't consider this a real problem, because DMA API abuse should be
caught by the debug_dma_* function family. This patch lets us simplify
the internal low-level implementation without changing the
driver-visible API.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: introduce DMA_ERROR_CODE constant
Marek Szyprowski [Wed, 29 Feb 2012 13:45:28 +0000 (14:45 +0100)]
ARM: dma-mapping: introduce DMA_ERROR_CODE constant

Replace all uses of ~0 with DMA_ERROR_CODE, which should make the code
easier to read.
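
For reference, the constant itself is simply:

#define DMA_ERROR_CODE    (~0)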

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: use pr_* instead of printk
Marek Szyprowski [Tue, 28 Feb 2012 09:19:14 +0000 (10:19 +0100)]
ARM: dma-mapping: use pr_* instead of printk

Replace all calls to printk with the pr_* function family.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago ARM: dma-mapping: use dma_mmap_from_coherent()
Marek Szyprowski [Tue, 15 May 2012 17:04:13 +0000 (19:04 +0200)]
ARM: dma-mapping: use dma_mmap_from_coherent()

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
11 years ago common: add dma_mmap_from_coherent() function
Marek Szyprowski [Fri, 23 Mar 2012 12:05:14 +0000 (13:05 +0100)]
common: add dma_mmap_from_coherent() function

Add a common helper for dma-mapping core for mapping a coherent buffer
to userspace.
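
The helper returns non-zero when the buffer came from the device's
coherent area and fills *ret with the mmap result (a sketch of its
shape and typical use):

int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
                           void *vaddr, size_t size, int *ret);

if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
    return ret;    /* handled by the per-device coherent area */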

Reported-by: Subash Patel <subashrp@gmail.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
11 years ago common: DMA-mapping: add NON-CONSISTENT attribute
Marek Szyprowski [Wed, 28 Mar 2012 05:55:56 +0000 (07:55 +0200)]
common: DMA-mapping: add NON-CONSISTENT attribute

DMA_ATTR_NON_CONSISTENT lets the platform choose to return either
consistent or non-consistent memory as it sees fit.  By using this API,
you are guaranteeing to the platform that you have all the correct and
necessary sync points for this memory in the driver.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
11 years ago common: DMA-mapping: add WRITE_COMBINE attribute
Marek Szyprowski [Fri, 23 Dec 2011 08:30:47 +0000 (09:30 +0100)]
common: DMA-mapping: add WRITE_COMBINE attribute

DMA_ATTR_WRITE_COMBINE specifies that writes to the mapping may be
buffered to improve performance. It will be used by the replacement for
the ARM/AVR32 specific dma_alloc_writecombine() function.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
11 years ago common: dma-mapping: introduce mmap method
Marek Szyprowski [Wed, 21 Dec 2011 15:55:33 +0000 (16:55 +0100)]
common: dma-mapping: introduce mmap method

Introduce new generic mmap method with attributes argument.

This method lets drivers create a userspace mapping for a DMA buffer
in a generic, architecture-independent way.
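
The new member's shape (sketch):

int (*mmap)(struct device *dev, struct vm_area_struct *vma,
            void *cpu_addr, dma_addr_t dma_addr, size_t size,
            struct dma_attrs *attrs);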

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
11 years ago common: dma-mapping: remove old alloc_coherent and free_coherent methods
Marek Szyprowski [Wed, 21 Dec 2011 15:55:57 +0000 (16:55 +0100)]
common: dma-mapping: remove old alloc_coherent and free_coherent methods

Remove old, unused alloc_coherent and free_coherent methods from
dma_map_ops structure.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
11 years ago common: dma-mapping: introduce generic alloc() and free() methods
Marek Szyprowski [Wed, 28 Mar 2012 14:36:27 +0000 (16:36 +0200)]
common: dma-mapping: introduce generic alloc() and free() methods

Introduce new generic alloc and free methods with attributes argument.

The existing alloc_coherent and free_coherent can be implemented on top
of the new calls with a NULL attributes argument. Later, also
dma_alloc_non_coherent can be implemented using the
DMA_ATTR_NON_CONSISTENT attribute, as well as dma_alloc_writecombine
with the separate DMA_ATTR_WRITE_COMBINE attribute.

This way the drivers will get a more generic, platform-independent way
of allocating dma buffers with specific parameters.
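
Sketch of the new methods' shape in dma_map_ops:

void *(*alloc)(struct device *dev, size_t size, dma_addr_t *dma_handle,
               gfp_t gfp, struct dma_attrs *attrs);
void (*free)(struct device *dev, size_t size, void *cpu_addr,
             dma_addr_t dma_handle, struct dma_attrs *attrs);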

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: David Gibson <david@gibson.dropbear.ud.au>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
11 years ago ARM: dma-mapping: convert ARCH_HAS_DMA_SET_COHERENT_MASK to kconfig symbol
Rob Herring [Tue, 20 Mar 2012 19:33:01 +0000 (14:33 -0500)]
ARM: dma-mapping: convert ARCH_HAS_DMA_SET_COHERENT_MASK to kconfig symbol

The only users of ARCH_HAS_DMA_SET_COHERENT_MASK are 2 ARM platforms:
ixp4xx and pxa cm_x2xx. We've been getting lucky that the define is
implicitly included before dma-mapping.h, but the removal of io.h broke
things (c334bc1 ARM: make mach/io.h include optional). Since memory.h
is the correct place, but no longer exists, convert the define to a
kconfig entry.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Imre Kaloz <kaloz@openwrt.org>
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Cc: Eric Miao <eric.y.miao@gmail.com>
Acked-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
11 years ago ARM: move initialization of the high_memory variable earlier
Nicolas Pitre [Mon, 19 Sep 2011 02:40:00 +0000 (22:40 -0400)]
ARM: move initialization of the high_memory variable earlier

Some upcoming changes must know the VMALLOC_START value, which is based
on high_memory, before bootmem_init() is called.

The best location to set it is in sanity_check_meminfo() where the needed
computation is already done, and in the non MMU case it is trivial to do
now that the meminfo array is already sorted at that point.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
11 years ago ARM: sort the meminfo array earlier
Nicolas Pitre [Thu, 25 Aug 2011 23:10:29 +0000 (19:10 -0400)]
ARM: sort the meminfo array earlier

The meminfo array has to be sorted before sanity_check_meminfo() in
arch/arm/mm/mmu.c is called for it to work properly.  This also allows
for a simpler find_limits() in arch/arm/mm/init.c.

The sort is moved to arch/arm/kernel/setup.c because that's where the
meminfo array is populated.  Eventually this should be improved upon
to make the memory bank parser a bit more robust against problems
such as overlapping memory ranges.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
11 years ago ARM: add dma coherent region reporting via procfs
Russell King [Thu, 12 Jan 2012 23:08:07 +0000 (23:08 +0000)]
ARM: add dma coherent region reporting via procfs

Add a new seqfile for reporting coherent DMA allocations.  This contains
the address range, size and the function which was used to allocate
each region, allowing these allocations to be viewed in much the same
way as /proc/vmallocinfo.

The DMA coherent region has limited space, so this allows allocation
failures to be viewed, as well as finding out how much space is being
used.

Make sure this file is only readable by root - same as vmallocinfo - to
prevent information leakage.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years ago mm: fix off-by-two in __zone_watermark_ok()
Michal Hocko [Tue, 10 Jan 2012 23:08:02 +0000 (15:08 -0800)]
mm: fix off-by-two in __zone_watermark_ok()

Commit 88f5acf88ae6 ("mm: page allocator: adjust the per-cpu counter
threshold when memory is low") changed the way free_pages is
calculated, but it forgot that we used to do free_pages - ((1 << order)
- 1), so we ended up with an off-by-two when calculating free_pages.
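
The restored computation, in essence:

free_pages = zone_page_state(z, NR_FREE_PAGES);
free_pages -= (1 << order) - 1;    /* the term 88f5acf88ae6 dropped */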

Reported-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm: compaction: push isolate search base of compact control one pfn ahead
Hillf Danton [Tue, 10 Jan 2012 23:07:59 +0000 (15:07 -0800)]
mm: compaction: push isolate search base of compact control one pfn ahead

Once isolated, the current pfn will not need to be scanned and isolated
again if a further round is necessary, so push the isolate_migratepages
search base of the given compact_control one step ahead.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm: exclude reserved pages from dirtyable memory
Johannes Weiner [Tue, 10 Jan 2012 23:07:42 +0000 (15:07 -0800)]
mm: exclude reserved pages from dirtyable memory

Per-zone dirty limits try to distribute page cache pages allocated for
writing across zones in proportion to the individual zone sizes, to reduce
the likelihood of reclaim having to write back individual pages from the
LRU lists in order to make progress.

This patch:

The amount of dirtyable pages should not include the full number of free
pages: there is a number of reserved pages that the page allocator and
kswapd always try to keep free.

The closer (reclaimable pages - dirty pages) is to the number of reserved
pages, the more likely it becomes for reclaim to run into dirty pages:

       +----------+ ---
       |   anon   |  |
       +----------+  |
       |          |  |
       |          |  -- dirty limit new    -- flusher new
       |   file   |  |                     |
       |          |  |                     |
       |          |  -- dirty limit old    -- flusher old
       |          |                        |
       +----------+                       --- reclaim
       | reserved |
       +----------+
       |  kernel  |
       +----------+

This patch introduces a per-zone dirty reserve that takes both the lowmem
reserve as well as the high watermark of the zone into account, and a
global sum of those per-zone values that is subtracted from the global
amount of dirtyable pages.  The lowmem reserve is unavailable to page
cache allocations and kswapd tries to keep the high watermark free.  We
don't want to end up in a situation where reclaim has to clean pages in
order to balance zones.
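
Sketched accounting (max_lowmem_reserve() is shorthand here, not a real
helper):

/* per zone: pages the allocator and kswapd try to keep free */
reserve_pages += min(zone->present_pages,
                     max_lowmem_reserve(zone) + high_wmark_pages(zone));
...
/* globally: these pages are not dirtyable */
dirtyable -= dirty_balance_reserve;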

Not treating reserved pages as dirtyable on a global level is only a
conceptual fix.  In reality, dirty pages are not distributed equally
across zones and reclaim runs into dirty pages on a regular basis.

But it is important to get this right before tackling the problem on a
per-zone level, where the distance between reclaim and the dirty pages is
mostly much smaller in absolute numbers.

[akpm@linux-foundation.org: fix highmem build]
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm/page-writeback.c: make determine_dirtyable_memory static again
Johannes Weiner [Tue, 10 Jan 2012 23:06:57 +0000 (15:06 -0800)]
mm/page-writeback.c: make determine_dirtyable_memory static again

The tracing ring-buffer used this function briefly, but not anymore.
Make it local to the writeback code again.

Also, move the function so that no forward declaration needs to be
reintroduced.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm: more intensive memory corruption debugging
Stanislaw Gruszka [Tue, 10 Jan 2012 23:07:28 +0000 (15:07 -0800)]
mm: more intensive memory corruption debugging

With CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception
on access (read,write) to an unallocated page, which permits us to catch
code which corrupts memory.  However the kernel is trying to maximise
memory usage, hence there are usually few free pages in the system and
buggy code usually corrupts some crucial data.

This patch changes the buddy allocator to keep more free/protected pages
and to interlace free/protected and allocated pages to increase the
probability of catching corruption.

When the kernel is compiled with CONFIG_DEBUG_PAGEALLOC,
debug_guardpage_minorder defines the minimum order used by the page
allocator to grant a request.  The requested size will be returned with
the remaining pages used as guard pages.

The default value of debug_guardpage_minorder is zero: no change from
current behaviour.
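
For example, booting a 4KiB-page system with:

    debug_guardpage_minorder=1

makes each order-0 request be served from an order-1 block: one page is
returned and its buddy is kept as an inaccessible guard page.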

[akpm@linux-foundation.org: tweak documentation, s/flg/flag/]
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm: avoid livelock on !__GFP_FS allocations
Mel Gorman [Tue, 10 Jan 2012 23:07:15 +0000 (15:07 -0800)]
mm: avoid livelock on !__GFP_FS allocations

Colin Cross reported;

  Under the following conditions, __alloc_pages_slowpath can loop forever:
  gfp_mask & __GFP_WAIT is true
  gfp_mask & __GFP_FS is false
  reclaim and compaction make no progress
  order <= PAGE_ALLOC_COSTLY_ORDER

  These conditions happen very often during suspend and resume,
  when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL
  allocations into __GFP_WAIT.

  The oom killer is not run because gfp_mask & __GFP_FS is false,
  but should_alloc_retry will always return true when order is less
  than PAGE_ALLOC_COSTLY_ORDER.

In his fix, he avoided retrying the allocation if reclaim made no progress
and __GFP_FS was not set.  The problem is that this would result in
GFP_NOIO allocations that previously succeeded now failing, which would
be very unfortunate.

The big difference between GFP_NOIO and suspend converting GFP_KERNEL to
behave like GFP_NOIO is that normally flushers will be cleaning pages and
kswapd reclaims pages allowing GFP_NOIO to succeed after a short delay.
The same does not necessarily apply during suspend as the storage device
may be suspended.

This patch special cases the suspend case to fail the page allocation if
reclaim cannot make progress and adds some documentation on how
gfp_allowed_mask is currently used.  Failing allocations like this may
cause suspend to abort but that is better than a livelock.
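
The added check, roughly as it lands in should_alloc_retry() (sketch):

/*
 * Suspend converts GFP_KERNEL to __GFP_WAIT, which can prevent reclaim
 * from making forward progress without invoking OOM. Suspend also
 * disables storage devices, so kswapd will not help.
 */
if (!did_some_progress && pm_suspended_storage())
    return 0;    /* bail instead of looping forever */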

[mgorman@suse.de: Rework fix to be suspend specific]
[rientjes@google.com: Move suspended device check to should_alloc_retry]
Reported-by: Colin Cross <ccross@android.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm-tracepoint: rename page-free events
Konstantin Khlebnikov [Tue, 10 Jan 2012 23:07:09 +0000 (15:07 -0800)]
mm-tracepoint: rename page-free events

Rename mm_page_free_direct into mm_page_free and mm_pagevec_free into
mm_page_free_batched.

Since v2.6.33-5426-gc475dab the kernel triggers mm_page_free_direct for
all freed pages, not only for directly freed ones.  So, let's name it
properly.  For pages freed via page-list we also trigger the
mm_page_free_batched event.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm: remove unused pagevec_free
Konstantin Khlebnikov [Tue, 10 Jan 2012 23:07:06 +0000 (15:07 -0800)]
mm: remove unused pagevec_free

It is not exported and now nobody uses it.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago mm: add free_hot_cold_page_list() helper
Konstantin Khlebnikov [Tue, 10 Jan 2012 23:07:04 +0000 (15:07 -0800)]
mm: add free_hot_cold_page_list() helper

This patch adds helper free_hot_cold_page_list() to free list of 0-order
pages.  It frees pages directly from list without temporary page-vector.
It also calls trace_mm_pagevec_free() to simulate pagevec_free()
behaviour.

bloat-o-meter:

add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
function                                     old     new   delta
free_hot_cold_page_list                        -     264    +264
get_page_from_freelist                      2129    2132      +3
__pagevec_free                               243     239      -4
split_free_page                              380     373      -7
release_pages                                606     510     -96
free_page_list                               188       -    -188
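
A minimal sketch of the helper described above (assuming the existing
free_hot_cold_page() and the pagevec-free tracepoint):

  void free_hot_cold_page_list(struct list_head *list, int cold)
  {
          struct page *page, *next;

          /* Free each 0-order page straight off the list, with no
           * intermediate pagevec. */
          list_for_each_entry_safe(page, next, list, lru) {
                  trace_mm_pagevec_free(page, cold);
                  free_hot_cold_page(page, cold);
          }
  }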

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago switch debugfs to umode_t
Al Viro [Sun, 24 Jul 2011 08:33:43 +0000 (04:33 -0400)]
switch debugfs to umode_t

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago convert 'memory' sysdev_class to a regular subsystem
Kay Sievers [Wed, 21 Dec 2011 22:48:43 +0000 (14:48 -0800)]
convert 'memory' sysdev_class to a regular subsystem

This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
11 years ago mm: thp: set the accessed flag for old pages on access fault
Will Deacon [Wed, 12 Dec 2012 00:01:27 +0000 (16:01 -0800)]
mm: thp: set the accessed flag for old pages on access fault

On x86 memory accesses to pages without the ACCESSED flag set result in
the ACCESSED flag being set automatically.  With the ARM architecture a
page access fault is raised instead (and it will continue to be raised
until the ACCESSED flag is set for the appropriate PTE/PMD).

For normal memory pages, handle_pte_fault will call pte_mkyoung
(effectively setting the ACCESSED flag).  For transparent huge pages,
pmd_mkyoung will only be called for a write fault.

This patch ensures that faults on transparent hugepages which do not
result in a CoW update the access flags for the faulting pmd.
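
A hedged sketch of the fix's shape (the helper name
huge_pmd_set_accessed follows the upstream patch; the details are
illustrative):

  void huge_pmd_set_accessed(struct mm_struct *mm,
                             struct vm_area_struct *vma,
                             unsigned long address,
                             pmd_t *pmd, pmd_t orig_pmd,
                             int dirty)
  {
          pmd_t entry;
          unsigned long haddr;

          spin_lock(&mm->page_table_lock);
          if (unlikely(!pmd_same(*pmd, orig_pmd)))
                  goto unlock;

          /* Mark the pmd young so the access fault stops recurring. */
          entry = pmd_mkyoung(orig_pmd);
          haddr = address & HPAGE_PMD_MASK;
          if (pmdp_set_access_flags(vma, haddr, pmd, entry, dirty))
                  update_mmu_cache_pmd(vma, address, pmd);

  unlock:
          spin_unlock(&mm->page_table_lock);
  }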

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ni zhan Chen <nizhan.chen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

mm/huge_memory.c

11 years ago mm: thp: fix the update_mmu_cache() last argument passing in mm/huge_memory.c
Catalin Marinas [Mon, 8 Oct 2012 23:33:01 +0000 (16:33 -0700)]
mm: thp: fix the update_mmu_cache() last argument passing in mm/huge_memory.c

update_mmu_cache() takes a pointer (to pte_t by default) as its last
argument, but huge_memory.c passes a pmd_t value.  This patch changes the
argument to a pmd_t * pointer.
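
In other words (an illustrative before/after; 'entry' here is a pmd_t
value built in the THP fault path):

  /* before: passes the pmd value itself as the last argument */
  update_mmu_cache(vma, address, entry);

  /* after: passes a pointer, matching the pte_t * convention */
  update_mmu_cache(vma, address, pmd);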

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

mm/huge_memory.c

11 years ago mm: hugetlb: add arch hook for clearing page flags before entering pool
Will Deacon [Mon, 8 Oct 2012 23:29:32 +0000 (16:29 -0700)]
mm: hugetlb: add arch hook for clearing page flags before entering pool

The core page allocator ensures that page flags are zeroed when freeing
pages via free_pages_check.  A number of architectures (ARM, PPC, MIPS)
rely on this property to treat new pages as dirty with respect to the data
cache and perform the appropriate flushing before mapping the pages into
userspace.

This can lead to cache synchronisation problems when using hugepages,
since the allocator keeps its own pool of pages above the usual page
allocator and does not reset the page flags when freeing a page into the
pool.

This patch adds a new architecture hook, arch_clear_hugepage_flags, so
that architectures which rely on the page flags being in a particular
state for fresh allocations can adjust the flags accordingly when a page
is freed into the pool.
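
As a hedged sketch, an architecture that tracks cache state in
PG_dcache_clean can implement the hook like this (a no-op default
covers everyone else):

  /* Make a page entering the hugepage pool look dirty to the dcache
   * again, so it is flushed before being mapped into userspace. */
  static inline void arch_clear_hugepage_flags(struct page *page)
  {
          clear_bit(PG_dcache_clean, &page->flags);
  }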

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

arch/tile/include/asm/hugetlb.h

11 years ago thp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE
Gerald Schaefer [Mon, 8 Oct 2012 23:30:04 +0000 (16:30 -0700)]
thp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE

Cleanup patch in preparation for transparent hugepage support on s390.
Adding new architectures to the TRANSPARENT_HUGEPAGE config option can
make the "depends" line rather ugly, like "depends on (X86 || (S390 &&
64BIT)) && MMU".

This patch adds a HAVE_ARCH_TRANSPARENT_HUGEPAGE instead.  x86 already has
MMU "def_bool y", so the MMU check is superfluous there and
HAVE_ARCH_TRANSPARENT_HUGEPAGE can be selected in arch/x86/Kconfig.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:

arch/Kconfig
arch/x86/Kconfig

11 years ago ARM: mm: Transparent huge page support for non-LPAE systems.
Steve Capper [Fri, 8 Feb 2013 15:01:23 +0000 (17:01 +0200)]
ARM: mm: Transparent huge page support for non-LPAE systems.

Much of the required code for THP has been implemented in the
earlier non-LPAE HugeTLB patch.

One more domain bit is used (to store whether or not the THP is
splitting).

Some THP helper functions are defined; and we have to re-define
pmd_page such that it distinguishes between page tables and
sections.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years ago ARM: mm: Transparent huge page support for LPAE systems.
Catalin Marinas [Fri, 8 Feb 2013 15:01:22 +0000 (17:01 +0200)]
ARM: mm: Transparent huge page support for LPAE systems.

The patch adds support for THP (transparent huge pages) to LPAE
systems. When this feature is enabled, the kernel tries to map
anonymous pages as 2MB sections where possible.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[steve.capper@arm.com: symbolic constants used, value of PMD_SECT_SPLITTING
adjusted, tlbflush.h included in pgtable.h]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years ago ARM: mm: HugeTLB support for non-LPAE systems.
Steve Capper [Fri, 8 Feb 2013 15:01:21 +0000 (17:01 +0200)]
ARM: mm: HugeTLB support for non-LPAE systems.

Based on Bill Carson's HugeTLB patch, with the big difference being
in the way PTEs are passed back to the memory manager.  Rather than
storing a "Linux Huge PTE" separately, we make one up on the fly in
huge_ptep_get.  Also, rather than considering 16M supersections, we
focus solely on 2x1M sections.

To construct a huge PTE on the fly we need additional information
(such as the accessed flag and dirty bit) which we choose to store
in the domain bits of the short section descriptor. In order to use
these domain bits for storage, we need to make ourselves a client
for all 16 domains and this is done in head.S.

Storing extra information in the domain bits also makes it a lot
easier to implement Transparent Huge Pages, and some of the code in
pgtable-2level.h is arranged to facilitate THP support in a later
patch.

Non-LPAE HugeTLB pages are incompatible with the huge page migration
code (enabled when CONFIG_MEMORY_FAILURE is selected) as that code
dereferences PTEs directly, rather than calling huge_ptep_get and
set_huge_pte_at.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years ago ARM: mm: HugeTLB support for LPAE systems.
Catalin Marinas [Fri, 8 Feb 2013 15:01:20 +0000 (17:01 +0200)]
ARM: mm: HugeTLB support for LPAE systems.

This patch adds support for hugetlbfs based on the x86 implementation.
It allows mapping of 2MB sections (see Documentation/vm/hugetlbpage.txt
for usage). The 64K pages configuration is not supported (section size
is 512MB in this case).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[steve.capper@arm.com: symbolic constants replace numbers in places.
Split up into multiple files, to simplify future non-LPAE support,
removed huge_pmd_share code, as this is very rarely executed].
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years ago ARM: mm: Add support for flushing HugeTLB pages.
Steve Capper [Fri, 8 Feb 2013 15:01:19 +0000 (17:01 +0200)]
ARM: mm: Add support for flushing HugeTLB pages.

On ARM we use the __flush_dcache_page function to flush the dcache
of pages when needed; usually when the PG_dcache_clean bit is unset
and we are setting a PTE.

A HugeTLB page is represented as a compound page consisting of an
array of pages. Thus to flush the dcache of a HugeTLB page, one must
flush more than a single page.

This patch modifies __flush_dcache_page such that all constituent
pages of a HugeTLB page are flushed.
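
The essence of the change is to size the flush by the compound order
rather than assuming a single page (hedged sketch):

  /* before: only the head page was flushed */
  __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);

  /* after: flush every constituent page of the compound page */
  __cpuc_flush_dcache_area(page_address(page),
                           PAGE_SIZE << compound_order(page));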

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years ago ARM: mm: correct pte_same behaviour for LPAE.
Steve Capper [Fri, 8 Feb 2013 15:01:18 +0000 (17:01 +0200)]
ARM: mm: correct pte_same behaviour for LPAE.

For 3 levels of paging the PTE_EXT_NG bit will be set for user
address ptes that are written to a page table but not for ptes
created with mk_pte.

This can cause some comparison tests made by pte_same to fail
spuriously and lead to other problems.

To correct this behaviour, we mask off PTE_EXT_NG for any pte that
is present before running the comparison.
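
A hedged sketch of the masked comparison (the macro shape is
illustrative):

  #define __HAVE_ARCH_PTE_SAME
  #define pte_same(pte_a, pte_b)                                        \
          ((pte_present(pte_a) ? pte_val(pte_a) & ~PTE_EXT_NG           \
                               : pte_val(pte_a)) ==                     \
           (pte_present(pte_b) ? pte_val(pte_b) & ~PTE_EXT_NG           \
                               : pte_val(pte_b)))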

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
11 years ago ARM: mm: introduce L_PTE_VALID for page table entries
Will Deacon [Thu, 19 Jul 2012 10:51:05 +0000 (11:51 +0100)]
ARM: mm: introduce L_PTE_VALID for page table entries

For long-descriptor translation table formats, the ARMv7 architecture
defines the last two bits of the second- and third-level descriptors to
be:

x0b - Invalid
01b - Block (second-level), Reserved (third-level)
11b - Table (second-level), Page (third-level)

This allows us to define L_PTE_PRESENT as (3 << 0) and use this value to
create ptes directly. However, when determining whether a given pte
value is present in the low-level page table accessors, we only need to
check the least significant bit of the descriptor, allowing us to write
faulting, present entries which are required for PROT_NONE mappings.

This patch introduces L_PTE_VALID, which can be used to test whether a
pte should fault, and updates the low-level page table accessors
accordingly.
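
Concretely, for LPAE the definitions can look like this (a sketch
following the bit assignments described above):

  #define L_PTE_VALID     (_AT(pteval_t, 1) << 0)         /* Valid */
  #define L_PTE_PRESENT   (_AT(pteval_t, 3) << 0)         /* Present */

A PROT_NONE entry can then clear L_PTE_VALID, so the hardware faults on
any access, while the remaining bits still mark the pte as present to
the software accessors.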

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years ago mm: thp: fix the pmd_clear() arguments in pmdp_get_and_clear()
Catalin Marinas [Mon, 8 Oct 2012 23:32:59 +0000 (16:32 -0700)]
mm: thp: fix the pmd_clear() arguments in pmdp_get_and_clear()

The CONFIG_TRANSPARENT_HUGEPAGE implementation of pmdp_get_and_clear()
calls pmd_clear() with 3 arguments instead of 1.

This only happens in the !__HAVE_ARCH_PMDP_GET_AND_CLEAR case, which x86
never hits because it defines its own version and uses pmd_update.
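
The fix itself is tiny (an illustrative sketch of the generic helper):

  static inline pmd_t pmdp_get_and_clear(struct mm_struct *mm,
                                         unsigned long address,
                                         pmd_t *pmdp)
  {
          pmd_t pmd = *pmdp;

          pmd_clear(pmdp);        /* was: pmd_clear(mm, address, pmdp) */
          return pmd;
  }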

[mhocko@suse.cz: changelog addition]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years ago ARM: 7323/1: Do not allow ARM_LPAE on pre-ARMv7 architectures
Catalin Marinas [Tue, 14 Feb 2012 15:33:27 +0000 (16:33 +0100)]
ARM: 7323/1: Do not allow ARM_LPAE on pre-ARMv7 architectures

This patch expands the Kconfig dependencies for ARM_LPAE so that it
cannot be enabled when architectures other than ARMv7 are built into the
kernel.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years ago ARM: 7275/1: LPAE: Check the CPU support for the long descriptor format
Catalin Marinas [Mon, 9 Jan 2012 11:24:47 +0000 (12:24 +0100)]
ARM: 7275/1: LPAE: Check the CPU support for the long descriptor format

This patch adds a check for the presence of the LPAE feature during the
CPU initialisation. If not present, it reports an error when
CONFIG_DEBUG_LL is enabled.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years ago ARM: kexec: use soft_restart for branching to the reboot buffer
Will Deacon [Mon, 6 Jun 2011 11:35:46 +0000 (12:35 +0100)]
ARM: kexec: use soft_restart for branching to the reboot buffer

Now that there is a common way to reset the machine, let's use it
instead of reinventing the wheel in the kexec backend.

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years ago ARM: stop: execute platform callback from cpu_stop code
Will Deacon [Mon, 6 Jun 2011 14:49:23 +0000 (15:49 +0100)]
ARM: stop: execute platform callback from cpu_stop code

Sending IPI_CPU_STOP to a CPU causes it to execute a busy cpu_relax
loop forever. This makes it impossible to kexec successfully on an SMP
system since the secondary CPUs do not reset.

This patch adds a call to platform_cpu_kill, defined when
CONFIG_HOTPLUG_CPU=y, from the ipi_cpu_stop handling code.  This function
currently just returns 1 on all platforms that define it, but allows them
to do something more sophisticated in the future.
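
A hedged sketch of where the call lands (platform_cpu_kill() is the
existing hotplug hook; the surrounding code is illustrative):

  static void ipi_cpu_stop(unsigned int cpu)
  {
          set_cpu_online(cpu, false);
          local_irq_disable();

  #ifdef CONFIG_HOTPLUG_CPU
          /* Let the platform actually kill the core so that, e.g.,
           * kexec can reset the secondaries. */
          platform_cpu_kill(cpu);
  #endif

          while (1)
                  cpu_relax();
  }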

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years ago ARM: reset: implement soft_restart for jumping to a physical address
Will Deacon [Mon, 6 Jun 2011 11:28:54 +0000 (12:28 +0100)]
ARM: reset: implement soft_restart for jumping to a physical address

Tools such as kexec and CPU hotplug require a way to reset the processor
and branch to some code in physical space. This requires various bits of
jiggery pokery with the caches and MMU which, when it goes wrong, tends
to lock up the system.

This patch fleshes out the soft_restart implementation so that it
branches to the reset code using the identity mapping. This requires us
to change to a temporary stack, held within the kernel image as a static
array, to avoid conflicting with the new view of memory.
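
A hedged sketch of the resulting flow (__soft_restart stands in for the
cache/MMU teardown code, the stack size is illustrative, and the stack
switch uses the call_with_stack() helper introduced in a separate patch
below):

  static u64 soft_restart_stack[16];

  void soft_restart(unsigned long addr)
  {
          u64 *stack = soft_restart_stack + ARRAY_SIZE(soft_restart_stack);

          /* Run the reset sequence on a stack that is guaranteed to
           * live inside the kernel image, and hence inside the
           * identity mapping. */
          local_irq_disable();
          call_with_stack(__soft_restart, (void *)addr, (void *)stack);
  }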

Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years ago ARM: lib: add call_with_stack function for safely changing stack
Will Deacon [Wed, 8 Jun 2011 14:29:00 +0000 (15:29 +0100)]
ARM: lib: add call_with_stack function for safely changing stack

When disabling the MMU, it is necessary to take out a 1:1 identity map
of the reset code so that it can safely be executed with and without
the MMU active. To avoid the situation where the physical address of the
reset code aliases with the virtual address of the active stack (which
cannot be included in the 1:1 mapping), it is desirable to change to a
new stack at a location which is less likely to alias.

This code adds a new lib function, call_with_stack:

void call_with_stack(void (*fn)(void *), void *arg, void *sp);

which changes the stack to point at the sp parameter, before invoking
fn(arg) with the new stack selected.
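
For example (an illustrative caller; my_reset_fn and reset_stack are
made-up names for the sketch):

  static void my_reset_fn(void *arg)
  {
          /* runs with sp pointing into reset_stack */
  }

  static u8 reset_stack[1024] __aligned(8);

  /* ARM stacks grow down, so pass the top of the array as sp. */
  call_with_stack(my_reset_fn, NULL, reset_stack + sizeof(reset_stack));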

Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years ago ARM: restart: only perform setup for restart when soft-restarting
Russell King [Tue, 1 Nov 2011 13:16:26 +0000 (13:16 +0000)]
ARM: restart: only perform setup for restart when soft-restarting

We only need to set the system up for a soft-restart if we're going to
be doing a soft-restart.  Provide a new function (soft_restart()) which
does the setup and final call for this, and make platforms use it.
Eliminate the call to setup_restart() from the default handler.

This means that a platform's arch_reset() function is no longer called
with the page tables prepared for a soft-restart, and caches will still
be enabled.

Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@st.com>
Acked-by: Krzysztof Hałasa <khc@pm.waw.pl>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Acked-by: Wan ZongShun <mcuos.com@gmail.com>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years ago ARM: restart: remove argument to setup_mm_for_reboot()
Russell King [Tue, 1 Nov 2011 10:15:27 +0000 (10:15 +0000)]
ARM: restart: remove argument to setup_mm_for_reboot()

setup_mm_for_reboot() doesn't make use of its argument, so remove it.

Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
11 years ago ARM: restart: move reboot failure handling into machine_restart()
Russell King [Mon, 31 Oct 2011 09:22:22 +0000 (09:22 +0000)]
ARM: restart: move reboot failure handling into machine_restart()

Move the handling of reboot failure into machine_restart() so that this
condition is always caught, even if a platform decides to hook the
restart via arm_pm_restart().

Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Conflicts:

arch/arm/kernel/process.c

11 years ago ARM: LPAE: Add the Kconfig entries
Catalin Marinas [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: Add the Kconfig entries

This patch adds the ARM_LPAE and ARCH_PHYS_ADDR_T_64BIT Kconfig entries
allowing LPAE support to be compiled into the kernel.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: mark memory banks with start > ULONG_MAX as highmem
Will Deacon [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: mark memory banks with start > ULONG_MAX as highmem

Memory banks living outside of the 32-bit physical address
space do not have a 1:1 pa <-> va mapping and therefore the
__va macro may wrap.

This patch ensures that such banks are marked as highmem so
that the kernel doesn't try to split them up when it sees that
the wrapped virtual address overlaps the vmalloc space.
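
The check itself is short; a hedged sketch of the sanity_check_meminfo()
logic:

  #ifdef CONFIG_ARM_LPAE
          /* __va() would wrap for banks above 4GB: treat as highmem. */
          if (bank->start > ULONG_MAX)
                  highmem = 1;
  #endif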

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
11 years ago ARM: LPAE: Add identity mapping support for the 3-level page table format
Catalin Marinas [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: Add identity mapping support for the 3-level page table format

With LPAE, the pgd is a separate page table with entries pointing to the
pmd. The identity_mapping_add() function needs to ensure that the pgd is
populated before populating the pmd level. The do..while blocks now loop
over the pmd in order to have the same implementation for the two page
table formats. The pmd_addr_end() definition has been removed and the
generic one used instead. The pmd clean-up is done in the pgd_free()
function.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: Add context switching support
Catalin Marinas [Tue, 22 Nov 2011 17:30:31 +0000 (17:30 +0000)]
ARM: LPAE: Add context switching support

With LPAE, TTBRx registers are 64-bit. The ASID is stored in TTBR0
rather than a separate Context ID register. This patch makes the
necessary changes to handle context switching on LPAE.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: Add fault handling support
Catalin Marinas [Tue, 22 Nov 2011 17:30:31 +0000 (17:30 +0000)]
ARM: LPAE: Add fault handling support

The DFSR and IFSR register format is different when LPAE is enabled. In
addition, DFSR and IFSR have similar definitions for the fault type.
This modifies the fault code to correctly handle the new format.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: Invalidate the TLB before freeing the PMD
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Invalidate the TLB before freeing the PMD

Similar to the PTE freeing, this patch introduces __pmd_free_tlb(), which
invalidates the TLB before freeing a PMD page.  This is needed because on
newer processors the entry in the upper page table may be cached by the
TLB and point to random data after the PMD has been freed.
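
A hedged sketch, mirroring the existing PTE variant (tlb_add_flush() is
the ARM mmu_gather helper):

  static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
                                    unsigned long addr)
  {
  #ifdef CONFIG_ARM_LPAE
          /* Invalidate the TLB entry covering the table before the
           * page can be reallocated and refilled with random data. */
          tlb_add_flush(tlb, addr);
          tlb_remove_page(tlb, virt_to_page(pmdp));
  #endif
  }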

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: MMU setup for the 3-level page table format
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: MMU setup for the 3-level page table format

This patch adds the MMU initialisation for the LPAE page table format.
The swapper_pg_dir size with LPAE is 5 rather than 4 pages. A new
proc-v7-3level.S file contains the TTB initialisation, context switch
and PTE setting code with the LPAE. The TTBRx split is based on the
PAGE_OFFSET with TTBR1 used for the kernel mappings. The 36-bit mappings
(supersections) and a few other memory types in mmu.c are conditionally
compiled.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Conflicts:

arch/arm/kernel/head.S

11 years ago ARM: LPAE: Page table maintenance for the 3-level format
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Page table maintenance for the 3-level format

This patch modifies the pgd/pmd/pte manipulation functions to support
the 3-level page table format. Since there is no need for an 'ext'
argument to cpu_set_pte_ext(), this patch conditionally defines a
different prototype for this function when CONFIG_ARM_LPAE is enabled.

The patch also introduces the L_PGD_SWAPPER flag to mark pgd entries
pointing to pmd tables pre-allocated in the swapper_pg_dir and avoid
trying to free them at run-time. This flag is 0 with the classic page
table format.
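
Concretely, the prototype difference looks like this (sketch):

  #ifdef CONFIG_ARM_LPAE
  void cpu_set_pte_ext(pte_t *ptep, pte_t pte);
  #else
  void cpu_set_pte_ext(pte_t *ptep, pte_t pte, unsigned int ext);
  #endif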

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: Introduce the 3-level page table format definitions
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Introduce the 3-level page table format definitions

This patch introduces the pgtable-3level*.h files with definitions
specific to the LPAE page table format (3 levels of page tables).

Each table is 4KB and has 512 64-bit entries. An entry can point to a
40-bit physical address. The young, write and exec software bits share
the corresponding hardware bits (negated). Other software bits use spare
bits in the PTE.

The patch also changes some variable types from unsigned long or int to
pteval_t or pgprot_t.
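
The resulting geometry, as a sketch (macro names follow the usual ARM
header conventions; the 4-entry pgd is the standard LPAE arrangement for
a 32-bit input address space):

  /* Each lower-level table is one 4KB page of 512 64-bit entries;
   * the top-level pgd needs only 4 entries with a 32-bit VA space. */
  #define PTRS_PER_PTE    512
  #define PTRS_PER_PMD    512
  #define PTRS_PER_PGD    4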

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: add ISBs around MMU enabling code
Will Deacon [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: add ISBs around MMU enabling code

Before we enable the MMU, we must ensure that the TTBR registers contain
sane values. After the MMU has been enabled, we jump to the *virtual*
address of the following function, so we also need to ensure that the
SCTLR write has taken effect.

This patch adds ISB instructions around the SCTLR write to ensure the
visibility of the above.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: Factor out classic-MMU specific code into proc-v7-2level.S
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Factor out classic-MMU specific code into proc-v7-2level.S

This patch modifies the proc-v7.S file so that it only contains code
shared between classic MMU and LPAE. The non-common code is factored out
into a separate file.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: Move the FSR definitions to separate files
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Move the FSR definitions to separate files

The FSR structure is different with LPAE and this patch moves the
classic MMU specific definition to a separate fsr-2level.c file that is
included in fault.c. It also moves the fsr_fs and FSR bits to the
fault.h file.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: LPAE: Move page table maintenance macros to pgtable-2level.h
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Move page table maintenance macros to pgtable-2level.h

The page table maintenance macros need to be duplicated between the
classic and the LPAE MMU so this patch moves those that are not common
to the pgtable-2level.h file.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
11 years ago ARM: pgtable: switch to use pgtable-nopud.h
Russell King [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: pgtable: switch to use pgtable-nopud.h

Nick Piggin noted upon introducing 4level-fixup.h:

| Add a temporary "fallback" header so architectures can run with
| the 4level pagetables patch without modification. All architectures
| should be converted to use the folding headers (include/asm-generic/
| pgtable-nop?d.h) as soon as possible, and the fallback header removed.

This makes ARM compliant with this statement.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>