pandora-kernel.git
16 years agox86: move pci fixup to pci-dma.c
Glauber Costa [Tue, 8 Apr 2008 16:20:53 +0000 (13:20 -0300)]
x86: move pci fixup to pci-dma.c

via_no_dac provides a fixup that is the same for both
architectures. Move it to pci-dma.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move x86_64-specific to common code.
Glauber Costa [Tue, 8 Apr 2008 16:20:54 +0000 (13:20 -0300)]
x86: move x86_64-specific to common code.

This patch moves the bootmem functions, that are largely
x86_64-specific into pci-dma.c. The code goes inside an ifdef.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move initialization functions to pci-dma.c
Glauber Costa [Tue, 8 Apr 2008 16:20:51 +0000 (13:20 -0300)]
x86: move initialization functions to pci-dma.c

initcalls that triggers the various possibiities for
dma subsys are moved to pci-dma.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: unify pci-nommu
Glauber Costa [Tue, 8 Apr 2008 16:20:52 +0000 (13:20 -0300)]
x86: unify pci-nommu

merge pci-base_32.c and pci-nommu_64.c into pci-nommu.c
Their code were made the same, so now they can be merged.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move definition to pci-dma.c
Glauber Costa [Tue, 8 Apr 2008 16:20:50 +0000 (13:20 -0300)]
x86: move definition to pci-dma.c

Move dma_ops structure definition to pci-dma.c, where it
belongs.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: use dma_length in i386
Glauber Costa [Tue, 8 Apr 2008 16:20:48 +0000 (13:20 -0300)]
x86: use dma_length in i386

This is done to get the code closer to x86_64.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: use WARN_ON in mapping functions
Glauber Costa [Tue, 8 Apr 2008 16:20:49 +0000 (13:20 -0300)]
x86: use WARN_ON in mapping functions

In the very same way i386 do, we use WARN_ON functions
in map_simple and map_sg.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: use sg_phys in x86_64
Glauber Costa [Tue, 8 Apr 2008 16:20:47 +0000 (13:20 -0300)]
x86: use sg_phys in x86_64

To make the code usable in i386, where we have high memory mappings,
we drop te virt_to_bus(sg_virt()) construction in favour of sg_phys.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: Add flush_write_buffers in nommu functions
Glauber Costa [Tue, 8 Apr 2008 16:20:46 +0000 (13:20 -0300)]
x86: Add flush_write_buffers in nommu functions

This patch adds flush_write_buffers() in some functions of pci-nommu_64.c
They are added anywhere i386 would also have it. This is not a problem
for x86_64, since flush_rite_buffers() an nop for it.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: implement mapping_error in pci-nommu_64.c
Glauber Costa [Tue, 8 Apr 2008 16:20:45 +0000 (13:20 -0300)]
x86: implement mapping_error in pci-nommu_64.c

This patch implements mapping_error for pci-nommu_64.c.
It takes care to keep the same compatible behaviour it already
had. Although this file is not (yet) used for i386, we introduce
the i386 version here. Again, care is taken, even at the expense of
an ifdef, to keep the same behaviour inconditionally.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: delete empty functions from pci-nommu_64.c
Glauber Costa [Tue, 8 Apr 2008 16:20:44 +0000 (13:20 -0300)]
x86: delete empty functions from pci-nommu_64.c

This functions are now called conditionally on their
existence in the struct. So just delete them, instead
of keeping an empty implementation.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: introduce pci-dma.c
Glauber Costa [Tue, 8 Apr 2008 16:20:43 +0000 (13:20 -0300)]
x86: introduce pci-dma.c

This patch introduces pci-dma.c, a common file for pci dma
between i386 and x86_64. As a start, dma_set_mask() is the same
between architectures, and is placed there.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_supported and dma_set_mask to pci-dma_32.c, fix
Mark McLoughlin [Thu, 27 Mar 2008 11:03:15 +0000 (11:03 +0000)]
x86: move dma_supported and dma_set_mask to pci-dma_32.c, fix

ERROR: "dma_supported" [drivers/ssb/ssb.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic7xxx.ko] undefined!
ERROR: "dma_set_mask" [drivers/scsi/aic7xxx/aic79xx.ko] undefined!
ERROR: "dma_supported" [drivers/net/pcnet32.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/saa7134/saa7134.ko] undefined!
ERROR: "dma_set_mask" [drivers/media/video/meye.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8802.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx8800.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx88/cx88-alsa.ko] undefined!
ERROR: "dma_supported" [drivers/media/video/cx23885/cx23885.ko] undefined!

They just need to be exported like on x86_64.

dma_supported() and dma_set_mask() were previously inlined,
but are now moved to pci-dma_32.c.

Since they're used by various drivers, they need to be
exported.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: delete the arch-specific dma-mapping headers.
Glauber Costa [Tue, 25 Mar 2008 21:36:39 +0000 (18:36 -0300)]
x86: delete the arch-specific dma-mapping headers.

all the code that is left is ready to be merged as-is
in dma-mapping.h.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY to dma-mapping.h
Glauber Costa [Tue, 25 Mar 2008 21:36:38 +0000 (18:36 -0300)]
x86: move ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY to dma-mapping.h

define it conditionally to i386.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: unify dma_mapping_error
Glauber Costa [Tue, 25 Mar 2008 21:36:37 +0000 (18:36 -0300)]
x86: unify dma_mapping_error

We provide a map_error function in pci-base_32.c to make
sure i386 keeps with the same behaviour it used to.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: provide a bad_dma_address symbol for i386
Glauber Costa [Tue, 25 Mar 2008 21:36:36 +0000 (18:36 -0300)]
x86: provide a bad_dma_address symbol for i386

It's initially 0, since we don't expect any DMA there.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: align to clflush size
Glauber Costa [Tue, 25 Mar 2008 21:36:35 +0000 (18:36 -0300)]
x86: align to clflush size

Do it instead of using the conservative approach we're currently
doing. This is the way x86_64 does, and this patch makes this piece
of code the same between them, ready to be integrated.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_supported and dma_set_mask to pci-dma_32.c
Glauber Costa [Tue, 25 Mar 2008 21:36:34 +0000 (18:36 -0300)]
x86: move dma_supported and dma_set_mask to pci-dma_32.c

This is the way x86_64 does, so this make them equal. They have
to be extern now in the header, and the extern definition is moved to
the common dma-mapping.h header.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_cache_sync to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:33 +0000 (18:36 -0300)]
x86: move dma_cache_sync to common header

they are the same in both architectures.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: dma-ops on highmem fix
Ingo Molnar [Sat, 19 Apr 2008 17:19:56 +0000 (19:19 +0200)]
x86: dma-ops on highmem fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_map_page and dma_unmap_page to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:32 +0000 (18:36 -0300)]
x86: move dma_map_page and dma_unmap_page to common header

They are similar enough to do this move.
the macro version is ugly, and we use inline functions instead.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move alloc and free coherent to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:31 +0000 (18:36 -0300)]
x86: move alloc and free coherent to common header

they are the same between architectures. (except for the fact
that x86_64 has duplicate code)

move them to dma-mapping.h

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_sync_sg_for_device to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:30 +0000 (18:36 -0300)]
x86: move dma_sync_sg_for_device to common header

i386 gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_sync_sg_for_cpu to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:29 +0000 (18:36 -0300)]
x86: move dma_sync_sg_for_cpu to common header

i386 gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_sync_single_range_for_device to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:28 +0000 (18:36 -0300)]
x86: move dma_sync_single_range_for_device to common header

i386 gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_sync_single_range_for_cpu to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:27 +0000 (18:36 -0300)]
x86: move dma_sync_single_range_for_cpu to common header

i386 gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_sync_single_for_device to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:26 +0000 (18:36 -0300)]
x86: move dma_sync_single_for_device to common header

i386 gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_sync_single_for_cpu to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:25 +0000 (18:36 -0300)]
x86: move dma_sync_single_for_cpu to common header

i386 gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_unmap_sg to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:24 +0000 (18:36 -0300)]
x86: move dma_unmap_sg to common header

i386 gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_map_sg to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:23 +0000 (18:36 -0300)]
x86: move dma_map_sg to common header

the old i386 implementation is moved to pci-base_32.c

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_unmap_single to common header
Glauber Costa [Tue, 25 Mar 2008 21:36:22 +0000 (18:36 -0300)]
x86: move dma_unmap_single to common header

i386 base does not need it, so it gets an empty function.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: implement dma_map_single through dma_ops
Glauber Costa [Tue, 25 Mar 2008 21:36:21 +0000 (18:36 -0300)]
x86: implement dma_map_single through dma_ops

That's already the name of the game for x86_64. For i386,
we add a pci-base_32.c, that will hold the default operations.
The function call itself goes through dma-mapping.h , the common
header

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: move dma_ops struct definition to dma-mapping.h
Glauber Costa [Tue, 25 Mar 2008 21:36:20 +0000 (18:36 -0300)]
x86: move dma_ops struct definition to dma-mapping.h

take it off the x86_64 specific header

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: reserve dma32 early for gart
Yinghai Lu [Fri, 7 Mar 2008 23:02:50 +0000 (15:02 -0800)]
x86: reserve dma32 early for gart

a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:

Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33

Call Trace:
 [<ffffffff84037c62>] panic+0xb2/0x190
 [<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
 [<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
 [<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
 [<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
 [<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
 [<ffffffff84506a2f>] ? _etext+0x0/0x1
 [<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
 [<ffffffff847ac795>] mem_init+0x45/0x1a0
 [<ffffffff8479ff35>] start_kernel+0x295/0x380
 [<ffffffff8479f1c2>] _sinittext+0x1c2/0x230

the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.

solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.

the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP

will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agosrat, x86: add support for nodes spanning other nodes
Suresh Siddha [Tue, 25 Mar 2008 17:14:35 +0000 (10:14 -0700)]
srat, x86: add support for nodes spanning other nodes

For example, If the physical address layout on a two node system with 8 GB
memory is something like:
node 0: 0-2GB, 4-6GB
node 1: 2-4GB, 6-8GB

Current kernels fail to boot/detect this NUMA topology.

ACPI SRAT tables can expose such a topology which needs to be supported.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86 vDSO: compile with -g, 64-bit
Roland McGrath [Mon, 14 Apr 2008 19:19:30 +0000 (12:19 -0700)]
x86 vDSO: compile with -g, 64-bit

The 64-bit vDSO's sources are compiled with -g0 for no good reason.
Using -g when enabled lets their separate debug files be used at
runtime via build ID matching, same as we can see 32-bit vDSO's
assembly sources.

Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: fpu xstate split fix
Suresh Siddha [Wed, 16 Apr 2008 08:25:35 +0000 (10:25 +0200)]
x86: fpu xstate split fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: fpu xstate split cleanup
Suresh Siddha [Wed, 16 Apr 2008 08:27:53 +0000 (10:27 +0200)]
x86: fpu xstate split cleanup

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86, fpu: lazy allocation of FPU area - v5
Suresh Siddha [Mon, 10 Mar 2008 22:28:05 +0000 (15:28 -0700)]
x86, fpu: lazy allocation of FPU area - v5

Only allocate the FPU area when the application actually uses FPU, i.e., in the
first lazy FPU trap. This could save memory for non-fpu using apps.

for example: on my system after boot, there are around 300 processes, with
only 17 using FPU.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86, fpu: split FPU state from task struct - v5
Suresh Siddha [Mon, 10 Mar 2008 22:28:04 +0000 (15:28 -0700)]
x86, fpu: split FPU state from task struct - v5

Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two optimizations:

1) only allocate when the application actually uses FPU, so in the first
lazy FPU trap. This could save memory for non-fpu using apps. Next patch
does this lazy allocation.

2) allocate the right size for the actual cpu rather than 512 bytes always.
Patches enabling xsave/xrstor support (coming shortly) will take advantage
of this.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: rename find_max_pfn() to propagate_e820_map()
Ingo Molnar [Wed, 16 Apr 2008 00:29:42 +0000 (02:29 +0200)]
x86: rename find_max_pfn() to propagate_e820_map()

this function doesnt just 'find' the max_pfn - it also has
other side-effects such as registering sparse memory maps.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: tsc prevent time going backwards
Thomas Gleixner [Tue, 1 Apr 2008 17:45:18 +0000 (19:45 +0200)]
x86: tsc prevent time going backwards

We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code forever. This can cause
time jumps in the range of hours.

This was reported in:
     http://lkml.org/lkml/2007/8/23/96
and
     http://lkml.org/lkml/2008/3/31/23

I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremly small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.

CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.

The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.

The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.

To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.

I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agogeneric, x86: add tests for prctl PR_GET_TSC and PR_SET_TSC
Erik Bosman [Fri, 11 Apr 2008 16:57:22 +0000 (18:57 +0200)]
generic, x86: add tests for prctl PR_GET_TSC and PR_SET_TSC

This patch adds three tests that test whether the PR_GET_TSC and
PR_SET_TSC commands have the desirable effect.

The tests check whether the control register is updated correctly
at context switches and try to discover bugs while enabling/disabling
the timestamp counter.

Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: implement prctl PR_GET_TSC and PR_SET_TSC
Erik Bosman [Sun, 13 Apr 2008 22:24:18 +0000 (00:24 +0200)]
x86: implement prctl PR_GET_TSC and PR_SET_TSC

This patch implements the PR_GET_TSC and PR_SET_TSC prctl()
commands on the x86 platform (both 32 and 64 bit.) These
commands control the ability to read the timestamp counter
from userspace (the RDTSC instruction.)

While the RDTSC instuction is a useful profiling tool,
it is also the source of some non-determinism in ring-3.
For deterministic replay applications it is useful to be
able to trap and emulate (and record the outcome of) this
instruction.

This patch uses code earlier used to disable the timestamp
counter for the SECCOMP framework. A side-effect of this
patch is that the SECCOMP environment will now also disable
the timestamp counter on x86_64 due to the addition of the
TIF_NOTSC define on this platform.

The code which enables/disables the RDTSC instruction during
context switches is in the __switch_to_xtra function, which
already handles other unusual conditions, so normal
performance should not have to suffer from this change.

Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agogeneric, x86: add prctl commands PR_GET_TSC and PR_SET_TSC
Erik Bosman [Fri, 11 Apr 2008 16:54:17 +0000 (18:54 +0200)]
generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC

This patch adds prctl commands that make it possible
to deny the execution of timestamp counters in userspace.
If this is not implemented on a specific architecture,
prctl will return -EINVAL.

ned-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoftrace: add notrace annotations for NMI routines
Steven Rostedt [Sat, 19 Apr 2008 17:19:55 +0000 (19:19 +0200)]
ftrace: add notrace annotations for NMI routines

This annotates NMI functions with notrace. Some tracers may be able
to live with this, but some cannot. The safest is to turn it off,
it's not particularly interesting anyway.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: clean up =0 initializations in arch/x86/kernel/tsc_32.c
Pavel Machek [Tue, 19 Feb 2008 10:02:30 +0000 (11:02 +0100)]
x86: clean up =0 initializations in arch/x86/kernel/tsc_32.c

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: fix exec mappings comments
Jiri Slaby [Sat, 12 Apr 2008 08:28:25 +0000 (10:28 +0200)]
x86: fix exec mappings comments

- noexec32 is on by default for years already
- add noexec32 to kernel-parameters and fix noexec typo in there

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: cleanup: change _end to end_before_pgt
Yinghai Lu [Thu, 10 Apr 2008 22:06:38 +0000 (15:06 -0700)]
x86: cleanup: change _end to end_before_pgt

cleanup: change the _end in compressed vmlinux_64.lds.

also change _heap to _ebss that is not needed.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: cleanup boot-heap usage
Alexander van Heukelum [Tue, 8 Apr 2008 10:54:30 +0000 (12:54 +0200)]
x86: cleanup boot-heap usage

The kernel decompressor wrapper uses memory located beyond the
end of the image. This might lead to hard to debug problems,
but even if it can be proven to be safe, it is at the very
least unclean. I don't see any advantages either, unless you
count it not being zeroed out as an advantage. This patch
moves the boot-heap area to the bss segment.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: fix arch/x86/mm/ioremap.c warning
Randy Dunlap [Thu, 10 Apr 2008 22:09:50 +0000 (15:09 -0700)]
x86: fix arch/x86/mm/ioremap.c warning

Fix printk formats in x86/mm/ioremap.c:

next-20080410/arch/x86/mm/ioremap.c:137: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t'
next-20080410/arch/x86/mm/ioremap.c:188: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'resource_size_t'
next-20080410/arch/x86/mm/ioremap.c:188: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'long unsigned int'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: e820_64, fix section mismatch warning
Jacek Luczak [Thu, 10 Apr 2008 11:40:57 +0000 (13:40 +0200)]
x86: e820_64, fix section mismatch warning

fix section mismatch warnings which occurs on my x86_64 box while compiling
linux-next-20080410:

Warning messages:

WARNING: arch/x86/kernel/built-in.o(.text+0x7bc2): Section mismatch in reference from the function bad_addr() to the
variable .init.data:early_res
The function bad_addr() references
the variable __initdata early_res.
This is often because bad_addr lacks a __initdata
annotation or the annotation of early_res is wrong.

WARNING: arch/x86/kernel/built-in.o(.text+0x7c3b): Section mismatch in reference from the function bad_addr_size() to
the variable .init.data:early_res
The function bad_addr_size() references
the variable __initdata early_res.
This is often because bad_addr_size lacks a __initdata
annotation or the annotation of early_res is wrong.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: remove vm86.h inclusion from process_32.c
Jacek Luczak [Wed, 9 Apr 2008 20:53:50 +0000 (22:53 +0200)]
x86: remove vm86.h inclusion from process_32.c

I've made a small investigation about vm86.h inclusion rules and it
looks like everything is more or less ok.

Files that rely on asm/vm86.h symbols are:

  - kprobes.c
  - process_32.c
  - signal_32.c
  - traps_32.c
  - vm86_32.c

File process_32.c includes vm86.h explicitly. We can remove that
include and it won't break anything.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: remove pointless comments
WANG Cong [Sat, 8 Mar 2008 10:15:06 +0000 (18:15 +0800)]
x86: remove pointless comments

Remove old comments that include the old arch/i386 directory.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: pageattr.c fix shadowed variable warning
Harvey Harrison [Wed, 5 Mar 2008 06:05:27 +0000 (22:05 -0800)]
x86: pageattr.c fix shadowed variable warning

irqs_disabled() uses flags internally, use _flags to avoid shadowing
code calling into this macro.

Introduced between 2.6.25-rc3 and -rc4

Fixes the sparse warning:
arch/x86/mm/pageattr.c:383:21: warning: symbol 'flags' shadows an earlier one
arch/x86/mm/pageattr.c:369:16: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: clean up cpu capabilities accesses, p4-clockmod.c
Ingo Molnar [Tue, 26 Feb 2008 07:52:16 +0000 (08:52 +0100)]
x86: clean up cpu capabilities accesses, p4-clockmod.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86_64: do not reserve ramdisk two times
Yinghai Lu [Tue, 18 Mar 2008 19:51:22 +0000 (12:51 -0700)]
x86_64: do not reserve ramdisk two times

ramdisk is reserved via reserve_early in x86_64_start_kernel,
later early_res_to_bootmem() will convert to reservation in bootmem.

so don't need to reserve that again.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: EFI_PAGE_SHIFT fix
Huang, Ying [Mon, 25 Feb 2008 07:18:37 +0000 (15:18 +0800)]
x86: EFI_PAGE_SHIFT fix

Make x86 EFI code works when EFI_PAGE_SHIFT != PAGE_SHIFT. The
memrage_efi_to_native() provided in this patch can be used on other
EFI platform such as IA64 too.

This patch has been tested on Intel x86_64 platform with EFI 64/32
firmware.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: KGDB build fix
Ingo Molnar [Sat, 19 Apr 2008 17:19:54 +0000 (19:19 +0200)]
x86: KGDB build fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Sat, 19 Apr 2008 01:18:30 +0000 (18:18 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  security: fix up documentation for security_module_enable
  Security: Introduce security= boot parameter
  Audit: Final renamings and cleanup
  SELinux: use new audit hooks, remove redundant exports
  Audit: internally use the new LSM audit hooks
  LSM/Audit: Introduce generic Audit LSM hooks
  SELinux: remove redundant exports
  Netlink: Use generic LSM hook
  Audit: use new LSM hooks instead of SELinux exports
  SELinux: setup new inode/ipc getsecid hooks
  LSM: Introduce inode_getsecid and ipc_getsecid hooks

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26
Linus Torvalds [Sat, 19 Apr 2008 01:02:35 +0000 (18:02 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6.26

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
  [NET]: Fix and allocate less memory for ->priv'less netdevices
  [IPV6]: Fix dangling references on error in fib6_add().
  [NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
  [PKT_SCHED]: Fix datalen check in tcf_simp_init().
  [INET]: Uninline the __inet_inherit_port call.
  [INET]: Drop the inet_inherit_port() call.
  SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
  [netdrvr] forcedeth: internal simplifications; changelog removal
  phylib: factor out get_phy_id from within get_phy_device
  PHY: add BCM5464 support to broadcom PHY driver
  cxgb3: Fix __must_check warning with dev_dbg.
  tc35815: Statistics cleanup
  natsemi: fix MMIO for PPC 44x platforms
  [TIPC]: Cleanup of TIPC reference table code
  [TIPC]: Optimized initialization of TIPC reference table
  [TIPC]: Remove inlining of reference table locking routines
  e1000: convert uint16_t style integers to u16
  ixgb: convert uint16_t style integers to u16
  sb1000.c: make const arrays static
  sb1000.c: stop inlining largish static functions
  ...

16 years agosecurity: fix up documentation for security_module_enable
James Morris [Fri, 7 Mar 2008 01:23:49 +0000 (12:23 +1100)]
security: fix up documentation for security_module_enable

security_module_enable() can only be called during kernel init.

Signed-off-by: James Morris <jmorris@namei.org>
16 years agoSecurity: Introduce security= boot parameter
Ahmed S. Darwish [Thu, 6 Mar 2008 16:09:10 +0000 (18:09 +0200)]
Security: Introduce security= boot parameter

Add the security= boot parameter. This is done to avoid LSM
registration clashes in case of more than one bult-in module.

User can choose a security module to enable at boot. If no
security= boot parameter is specified, only the first LSM
asking for registration will be loaded. An invalid security
module name will be treated as if no module has been chosen.

LSM modules must check now if they are allowed to register
by calling security_module_enable(ops) first. Modify SELinux
and SMACK to do so.

Do not let SMACK register smackfs if it was not chosen on
boot. Smackfs assumes that smack hooks are registered and
the initial task security setup (swapper->security) is done.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoAudit: Final renamings and cleanup
Ahmed S. Darwish [Fri, 18 Apr 2008 23:59:43 +0000 (09:59 +1000)]
Audit: Final renamings and cleanup

Rename the se_str and se_rule audit fields elements to
lsm_str and lsm_rule to avoid confusion.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoSELinux: use new audit hooks, remove redundant exports
Ahmed S. Darwish [Sat, 1 Mar 2008 20:03:14 +0000 (22:03 +0200)]
SELinux: use new audit hooks, remove redundant exports

Setup the new Audit LSM hooks for SELinux.
Remove the now redundant exported SELinux Audit interface.

Audit: Export 'audit_krule' and 'audit_field' to the public
since their internals are needed by the implementation of the
new LSM hook 'audit_rule_known'.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoAudit: internally use the new LSM audit hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 20:01:11 +0000 (22:01 +0200)]
Audit: internally use the new LSM audit hooks

Convert Audit to use the new LSM Audit hooks instead of
the exported SELinux interface.

Basically, use:
security_audit_rule_init
secuirty_audit_rule_free
security_audit_rule_known
security_audit_rule_match

instad of (respectively) :
selinux_audit_rule_init
selinux_audit_rule_free
audit_rule_has_selinux
selinux_audit_rule_match

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
16 years agoLSM/Audit: Introduce generic Audit LSM hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 20:00:05 +0000 (22:00 +0200)]
LSM/Audit: Introduce generic Audit LSM hooks

Introduce a generic Audit interface for security modules
by adding the following new LSM hooks:

audit_rule_init(field, op, rulestr, lsmrule)
audit_rule_known(krule)
audit_rule_match(secid, field, op, rule, actx)
audit_rule_free(rule)

Those hooks are only available if CONFIG_AUDIT is enabled.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoSELinux: remove redundant exports
Ahmed S. Darwish [Sat, 1 Mar 2008 19:58:32 +0000 (21:58 +0200)]
SELinux: remove redundant exports

Remove the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)

They can be substitued with the following generic equivalents
respectively:
new LSM hook, inode_getsecid(inode, secid)
new LSM hook, ipc_getsecid*(ipcp, secid)
LSM hook, task_getsecid(tsk, secid)
LSM hook, sid_to_secctx(sid, ctx, len)

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoNetlink: Use generic LSM hook
Ahmed S. Darwish [Sat, 1 Mar 2008 19:56:22 +0000 (21:56 +0200)]
Netlink: Use generic LSM hook

Don't use SELinux exported selinux_get_task_sid symbol.
Use the generic LSM equivalent instead.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoAudit: use new LSM hooks instead of SELinux exports
Ahmed S. Darwish [Sat, 1 Mar 2008 19:54:38 +0000 (21:54 +0200)]
Audit: use new LSM hooks instead of SELinux exports

Stop using the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)
kfree(ctx)

and use following generic LSM equivalents respectively:
security_inode_getsecid(inode, secid)
security_ipc_getsecid*(ipcp, secid)
security_task_getsecid(tsk, secid)
security_sid_to_secctx(sid, ctx, len)
security_release_secctx(ctx, len)

Call security_release_secctx only if security_secid_to_secctx
succeeded.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoSELinux: setup new inode/ipc getsecid hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 19:52:30 +0000 (21:52 +0200)]
SELinux: setup new inode/ipc getsecid hooks

Setup the new inode_getsecid and ipc_getsecid() LSM hooks
for SELinux.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years agoLSM: Introduce inode_getsecid and ipc_getsecid hooks
Ahmed S. Darwish [Sat, 1 Mar 2008 19:51:09 +0000 (21:51 +0200)]
LSM: Introduce inode_getsecid and ipc_getsecid hooks

Introduce inode_getsecid(inode, secid) and ipc_getsecid(ipcp, secid)
LSM hooks. These hooks will be used instead of similar exported
SELinux interfaces.

Let {inode,ipc,task}_getsecid hooks set the secid to 0 by default
if CONFIG_SECURITY is not defined or if the hook is set to
NULL (dummy). This is done to notify the caller that no valid
secid exists.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
16 years ago[NET]: Fix and allocate less memory for ->priv'less netdevices
Alexey Dobriyan [Fri, 18 Apr 2008 22:43:32 +0000 (15:43 -0700)]
[NET]: Fix and allocate less memory for ->priv'less netdevices

This patch effectively reverts commit d0498d9ae1a5cebac363e38907266d5cd2eedf89
aka "[NET]: Do not allocate unneeded memory for dev->priv alignment."
It was found to be buggy because of final unconditional += NETDEV_ALIGN_CONST
removal.

For example, for sizeof(struct net_device) being 2048 bytes, "alloc_size"
was also 2048 bytes, but allocator with debugging options turned on started
giving out !32-byte aligned memory resulting in redzones overwrites.

Patch does small optimization in ->priv'less case: bumping size to next
32-byte boundary was always done to ensure ->priv will also be aligned.
But, no ->priv, no need to do that.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agox86 PAT: fix mmap() of holes
Ingo Molnar [Fri, 18 Apr 2008 19:32:22 +0000 (21:32 +0200)]
x86 PAT: fix mmap() of holes

do not return a -EINVAL when mmap()-ing PCI holes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
Linus Torvalds [Fri, 18 Apr 2008 18:25:31 +0000 (11:25 -0700)]
Merge git://git./linux/kernel/git/jejb/scsi-misc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
  [SCSI] iscsi: bidi support for iscsi_tcp
  [SCSI] iscsi: bidi support at the generic libiscsi level
  [SCSI] iscsi: extended cdb support
  [SCSI] zfcp: Fix error handling for blocked unit for send FCP command
  [SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
  [SCSI] zfcp: fix 31 bit compile warnings
  [SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
  [SCSI] bsg: remove minor in struct bsg_device
  [SCSI] bsg: use better helper list functions
  [SCSI] bsg: replace kobject_get with blk_get_queue
  [SCSI] bsg: takes a ref to struct device in fops->open
  [SCSI] qla1280: remove version check
  [SCSI] libsas: fix endianness bug in sas_ata
  [SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
  [SCSI] aacraid: Do not describe check_reset parameter with its value
  [SCSI] aacraid: Fix down_interruptible() to check the return value
  [SCSI] sun3_scsi_vme: add MODULE_LICENSE
  [SCSI] st: rename flush_write_buffer()
  [SCSI] tgt: use KMEM_CACHE macro
  [SCSI] initio: fix big endian problems for auto request sense
  ...

16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Fri, 18 Apr 2008 18:24:29 +0000 (11:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (43 commits)
  firewire: cleanups
  firewire: fix synchronization of gap counts
  firewire: wait until PHY configuration packet was transmitted (fix bus reset loop)
  firewire: remove unused struct member
  firewire: use bitwise and to get reg in handle_registers
  firewire: replace more hex values with defined csr constants
  firewire: reread config ROM when device reset the bus
  firewire: replace static ROM cache by allocated cache
  firewire: fw-ohci: work around generation bug in TI controllers (fix AV/C and more)
  firewire: fw-ohci: extend logging of bus generations and node ID
  firewire: fw-ohci: conditionally log busReset interrupts
  firewire: fw-ohci: don't append to AT context when it's not active
  firewire: fw-ohci: log regAccessFail events
  firewire: fw-ohci: make sure HCControl register LPS bit is set
  firewire: fw-ohci: missing PPC PMac feature calls in failure path
  firewire: fw-ohci: untangle a mixed unsigned/signed expression
  firewire: debug interrupt events
  firewire: fw-ohci: catch self_id_count == 0
  firewire: fw-ohci: add self ID error check
  firewire: fw-ohci: refactor probe, remove, suspend, resume
  ...

16 years agolibata: fix boot panic with SATAPI devices on non-SFF HBAs
James Bottomley [Fri, 18 Apr 2008 18:18:48 +0000 (13:18 -0500)]
libata: fix boot panic with SATAPI devices on non-SFF HBAs

The kernel now panics reliably on boot if you have a SATAPI device
connected.

The problem was introduced by the libata merge trying to pull out all
the SFF code into a separate module.  Unfortunately, if you're a satapi
device you usually need to call atapi_request_sense, which has a bare
invocation of a SFF callback which is NULL on non-SFF HBAs.  Fix this by
making the call conditional.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Fri, 18 Apr 2008 17:15:22 +0000 (10:15 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (64 commits)
  ocfs2/net: Add debug interface to o2net
  ocfs2: Only build ocfs2/dlm with the o2cb stack module
  ocfs2/cluster: Get rid of arguments to the timeout routines
  ocfs2: Put tree in MAINTAINERS
  ocfs2: Use BUG_ON
  ocfs2: Convert ocfs2 over to unlocked_ioctl
  ocfs2: Improve rename locking
  fs/ocfs2/aops.c: test for IS_ERR rather than 0
  ocfs2: Add inode stealing for ocfs2_reserve_new_inode
  ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
  ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
  ocfs2: Enable cross extent block merge.
  ocfs2: Add support for cross extent block
  ocfs2: Move /sys/o2cb to /sys/fs/o2cb
  sysfs: Allow removal of symlinks in the sysfs root
  ocfs2:  Reconnect after idle time out.
  ocfs2/dlm: Cleanup lockres print
  ocfs2/dlm: Fix lockname in lockres print function
  ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
  ocfs2/dlm: Dumps the purgelist into a debugfs file
  ...

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
Linus Torvalds [Fri, 18 Apr 2008 17:02:46 +0000 (10:02 -0700)]
Merge git://git./linux/kernel/git/steve/gfs2-2.6-nmw

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (49 commits)
  [GFS2] fix assertion in log_refund()
  [GFS2] fix GFP_KERNEL misuses
  [GFS2] test for IS_ERR rather than 0
  [GFS2] Invalidate cache at correct point
  [GFS2] fs/gfs2/recovery.c: suppress warnings
  [GFS2] Faster gfs2_bitfit algorithm
  [GFS2] Streamline quota lock/check for no-quota case
  [GFS2] Remove drop of module ref where not needed
  [GFS2] gfs2_adjust_quota has broken unstuffing code
  [GFS2] possible null pointer dereference fixup
  [GFS2] Need to ensure that sector_t is 64bits for GFS2
  [GFS2] re-support special inode
  [GFS2] remove gfs2_dev_iops
  [GFS2] fix file_system_type leak on gfs2meta mount
  [GFS2] Allow bmap to allocate extents
  [GFS2] Fix a page lock / glock deadlock
  [GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_ops
  [GFS2] gfs2/ops_file.c should #include "ops_inode.h"
  [GFS2] be*_add_cpu conversion
  [GFS2] Fix bug where we called drop_bh incorrectly
  ...

16 years agox86: kgdb build fix
Harvey Harrison [Fri, 18 Apr 2008 16:54:38 +0000 (09:54 -0700)]
x86: kgdb build fix

TF_MASK is no longer defined, use X86_EFLAGS_TF.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago[SCSI] iscsi: bidi support for iscsi_tcp
Boaz Harrosh [Fri, 18 Apr 2008 15:11:53 +0000 (10:11 -0500)]
[SCSI] iscsi: bidi support for iscsi_tcp

access the right scsi_in() and/or scsi_out() side of things.
also for resid

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Reviewed-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] iscsi: bidi support at the generic libiscsi level
Boaz Harrosh [Fri, 18 Apr 2008 15:11:52 +0000 (10:11 -0500)]
[SCSI] iscsi: bidi support at the generic libiscsi level

- prepare the additional bidi_read rlength header.
- access the right scsi_in() and/or scsi_out() side of things.
  also for resid.
- Handle BIDI underflow overflow from target

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Reviewed-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] iscsi: extended cdb support
Boaz Harrosh [Fri, 18 Apr 2008 15:11:51 +0000 (10:11 -0500)]
[SCSI] iscsi: extended cdb support

Support for extended CDBs in iscsi.
All we need is to check if command spills over 16 bytes then allocate
an iscsi-extended-header for the leftovers.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Reviewed-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] zfcp: Fix error handling for blocked unit for send FCP command
Christof Schmitt [Fri, 18 Apr 2008 10:51:57 +0000 (12:51 +0200)]
[SCSI] zfcp: Fix error handling for blocked unit for send FCP command

In the case the unit is blocked, zfcp_unit_get has not been called
yet, so the error handling path should not call zfcp_unit_put.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
Christof Schmitt [Fri, 18 Apr 2008 10:51:56 +0000 (12:51 +0200)]
[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock

The testcase
# chchp -v 0 0.da && sleep 59 && chchp -v 1 0.da
results in this deadlock situation:

STACK TRACE FOR TASK: 0x7e9a2048 (zfcperp0.0.c613)
0 schedule+816 [0x356b3c]
1 schedule_timeout+172 [0x357340]
2 wait_for_common+192 [0x3565fc]
3 flush_cpu_workqueue+116 [0x52af0]
4 flush_workqueue+116 [0x533b8]
5 fc_remote_port_add+64 [0x1c83ec]
6 zfcp_erp_thread+4534 [0x26585a]
7 kernel_thread_starter+6 [0x195d2]

STACK TRACE FOR TASK: 0x7f8ec048 (fc_wq_0)
0 schedule+816 [0x356b3c]
1 zfcp_erp_wait+104 [0x264568]
2 zfcp_scsi_slave_destroy+64 [0x261b24]
3 __scsi_remove_device+154 [0x1c24ba]
4 scsi_remove_device+62 [0x1c2512]
5 __scsi_remove_target+198 [0x1c25ea]
6 __remove_child+58 [0x1c26d6]
7 device_for_each_child+66 [0x1ab566]
8 scsi_remove_target+98 [0x1c268a]
9 run_workqueue+200 [0x5272c]
10 worker_thread+146 [0x52882]
11 kthread+140 [0x58360]
12 kernel_thread_starter+6 [0x195d2]

Remove the zfcp_erp_wait call that is not required here to prevent the
deadlock situation.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] zfcp: fix 31 bit compile warnings
Martin Peschke [Fri, 18 Apr 2008 10:51:55 +0000 (12:51 +0200)]
[SCSI] zfcp: fix 31 bit compile warnings

drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_rscn’:
drivers/s390/scsi/zfcp_aux.c:1379: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_plogi’:
drivers/s390/scsi/zfcp_aux.c:1432: warning: cast from pointer to integer of
different size
drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_fsf_incoming_els_logo’:
drivers/s390/scsi/zfcp_aux.c:1457: warning: cast from pointer to integer of
different size
..

Just passing pointers rids us of these warnings and improves readability.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:42 +0000 (10:03 +0900)]
[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands

Before bsg_complete_all_commands is called, BSG_F_BLOCK bit is always
set.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: remove minor in struct bsg_device
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:41 +0000 (10:03 +0900)]
[SCSI] bsg: remove minor in struct bsg_device

minor in struct bsg_device is used as identifier to find the
corresponding struct bsg_device_class. However, request_queuse can be
used as identifier for that and the minor in struct bsg_device is
unnecessary.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: use better helper list functions
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:40 +0000 (10:03 +0900)]
[SCSI] bsg: use better helper list functions

This replace hlist_for_each and list_entry with hlist_for_each_entry
and list_first_entry respectively.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: replace kobject_get with blk_get_queue
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:39 +0000 (10:03 +0900)]
[SCSI] bsg: replace kobject_get with blk_get_queue

Both takes a ref to a queue. But blk_get_queue checks QUEUE_FLAG_DEAD
and is more appropriate interface here.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years ago[SCSI] bsg: takes a ref to struct device in fops->open
FUJITA Tomonori [Mon, 31 Mar 2008 01:03:38 +0000 (10:03 +0900)]
[SCSI] bsg: takes a ref to struct device in fops->open

bsg_register_queue() takes a ref to struct device that a caller
passes. For example, bsg takes a ref to the sdev_gendev for scsi
devices. However, bsg doesn't inrease the refcount in fops->open. So
while an application opens a bsg device, the scsi device that the bsg
device holds can go away (bsg also takes a ref to a queue, but it
doesn't prevent the device from going away).

With this patch, bsg increases the refcount of struct device in
fops->open and decreases it in fops->release.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
16 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Fri, 18 Apr 2008 16:44:55 +0000 (09:44 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (27 commits)
  [IA64] kdump: Add crash_save_vmcoreinfo for INIT
  [IA64] Fix NUMA configuration issue
  [IA64] Itanium Spec updates
  [IA64] Untangle sync_icache_dcache() page size determination
  [IA64] arch/ia64/kernel/: use time_* macros
  [IA64] remove redundant display of free swap space in show_mem()
  [IA64] make IOMMU respect the segment boundary limits
  [IA64] kprobes: kprobe-booster for ia64
  [IA64] fix getpid and set_tid_address fast system calls for pid namespaces
  [IA64] Replace explicit jiffies tests with time_* macros.
  [IA64] use goto to jump out do/while_each_thread
  [IA64] Fix unlock ordering in smp_callin
  [IA64] pgd_offset() constfication.
  [IA64] kdump: crash.c coding style fix
  [IA64] kdump: add kdump_on_fatal_mca
  [IA64] Minimize per_cpu reservations.
  [IA64] Correct pernodesize calculation.
  [IA64] Kernel parameter for max number of concurrent global TLB purges
  [IA64] Multiple outstanding ptc.g instruction support
  [IA64] Implement smp_call_function_mask for ia64
  ...

16 years agoocfs2/net: Add debug interface to o2net
Sunil Mushran [Mon, 14 Apr 2008 17:46:19 +0000 (10:46 -0700)]
ocfs2/net: Add debug interface to o2net

This patch exposes o2net information via debugfs. The information includes
the list of sockets (sock_containers) as well as the list of outstanding
messages (send_tracking). Useful for o2dlm debugging.

(This patch is derived from an earlier one written by Zach Brown that
exposed the same information via /proc.)

[Mark: checkpatch fixes]

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Only build ocfs2/dlm with the o2cb stack module
Mark Fasheh [Fri, 4 Apr 2008 19:45:55 +0000 (12:45 -0700)]
ocfs2: Only build ocfs2/dlm with the o2cb stack module

fs/ocfs2/dlm/ocfs2_dlm.ko and fs/ocfs2/dlm/ocfs2_dlmfs.ko get built if
CONFIG_FS_OCFS2 is specified. This isn't quite how it should happen any more
- the "o2cb" dlm modules should only be built if CONFIG_FS_OCFS2_O2CB is
set, so update the dlm Makefile accordingly.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
16 years agoocfs2/cluster: Get rid of arguments to the timeout routines
Jeff Mahoney [Fri, 28 Mar 2008 23:44:13 +0000 (16:44 -0700)]
ocfs2/cluster: Get rid of arguments to the timeout routines

We keep seeing bug reports related to NULL pointer derefs in
o2net_set_nn_state(). When I originally wrote up the configurable timeout
patch, I had tried to plan for multiple clusters. This was silly.

The timeout routines all use o2nm_single_cluster so there's no point in
passing an argument at all. This patch removes the arguments and kills those
bugs dead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Put tree in MAINTAINERS
Joel Becker [Sun, 23 Mar 2008 05:08:07 +0000 (22:08 -0700)]
ocfs2: Put tree in MAINTAINERS

The ocfs2 MAINTAINERS entry should have the git tree URL.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Use BUG_ON
Julia Lawall [Tue, 4 Mar 2008 23:21:05 +0000 (15:21 -0800)]
ocfs2: Use BUG_ON

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Convert ocfs2 over to unlocked_ioctl
Andi Kleen [Sun, 27 Jan 2008 02:17:17 +0000 (03:17 +0100)]
ocfs2: Convert ocfs2 over to unlocked_ioctl

As far as I can see there is nothing in ocfs2_ioctl that requires the BKL,
so use unlocked_ioctl

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
16 years agoocfs2: Improve rename locking
Jan Kara [Thu, 21 Feb 2008 17:00:00 +0000 (18:00 +0100)]
ocfs2: Improve rename locking

ocfs2_rename() was being too aggressive with the rename lock - we only need
it for certain forms of directory rename.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>