pandora-kernel.git
Milton Miller [Tue, 10 May 2011 19:29:53 +0000 (19:29 +0000)]
powerpc: Radix trees are available before init_IRQ

Since the generic irq code uses a radix tree for sparse interrupts,
the initcall ordering has been changed to initialize radix trees before
irqs.   We no longer need to defer creating revmap radix trees to the
arch_initcall irq_late_init.

Also, the kmem caches are allocated so we don't need to use
zalloc_maybe_bootmem.

Signed-off-by: Milton Miller <miltonm@bga.com>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:49 +0000 (19:29 +0000)]
powerpc/xics: Cleanup xics_host_map and ipi

Since we already have a special case in map to set the ipi handler, use
the desired flow.

If we don't find an ics to handle the interrupt, complain instead of
returning 0 without having set a chip or handler.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:46 +0000 (19:29 +0000)]
powerpc: Use bytes instead of bitops in smp ipi multiplexing

Since there are only 4 messages, we can replace the atomic bit set
(which uses atomic load reserve and store conditional sequence) with
byte stores to separate bytes.  We still have to perform a load
reserve and store conditional sequence to avoid losing messages on
reception but we can do that with a single call to xchg.

The do {} while and __BIG_ENDIAN specific mask testing was chosen by
looking at the generated asm code.  On gcc-4.4, the bit masking becomes
a simple bit mask and test of the register returned from xchg without
storing and loading the value to the stack like attempts with a union
of bytes and an int (or worse, loading single bit constants from the
constant pool into non-volatile registers that had to be preserved on
the stack).  The do {} while avoids an unconditional branch to the
end of the loop to test the entry / repeat condition of a while loop
and instead optimises for the expected single iteration of the loop.

We have a full mb() at the beginning to cover ordering between send,
ipi, and receive so we can use xchg_local and forgo the further
acquire and release barriers of xchg.
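
As a rough userspace illustration of the byte-store and xchg scheme
described above (not the kernel code itself): each of the 4 messages
owns one byte of a 32-bit word, the sender does a plain byte store, and
the receiver swaps the whole word with 0 and tests per-byte masks.  The
names ipi_word, post_message, byte_mask and take_messages are
hypothetical, the GCC/Clang __atomic_exchange_n builtin stands in for
xchg_local, and the mb() and the do {} while retry are left out.

```c
#include <stdint.h>

#define NUM_MSGS 4	/* one byte per message, as in the commit text */

/* Hypothetical stand-in for the per-cpu message word. */
union ipi_word {
	uint32_t all;
	uint8_t  byte[NUM_MSGS];
};

/* Sender side: a plain byte store, no load-reserve/store-conditional. */
static void post_message(union ipi_word *w, int msg)
{
	w->byte[msg] = 1;
}

/* Mask that selects message msg's byte on the host's endianness. */
static uint32_t byte_mask(int msg)
{
	union ipi_word m = { .all = 0 };

	m.byte[msg] = 0xff;
	return m.all;
}

/*
 * Receiver side: one exchange fetches and clears all pending messages.
 * Returns a bitmap with bit N set if message N was pending.
 */
static unsigned int take_messages(union ipi_word *w)
{
	uint32_t all = __atomic_exchange_n(&w->all, 0, __ATOMIC_RELAXED);
	unsigned int pending = 0;
	int msg;

	for (msg = 0; msg < NUM_MSGS; msg++)
		if (all & byte_mask(msg))
			pending |= 1u << msg;
	return pending;
}
```

Computing the mask through the union keeps the sketch endian-neutral;
the commit instead hardcodes the __BIG_ENDIAN shifts so the compiler can
fold them into constants.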

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:42 +0000 (19:29 +0000)]
powerpc: Add kconfig for muxed smp ipi support

Compile the new smp ipi mux and demux code only if a platform
will make use of it.  The new config is selected as required.

The new cause_ipi smp op is only available conditionally to point out
configs where the select is required; this makes setting the op an
immediate fail instead of a deferred unresolved symbol at link.

This also creates a new config for power surge powermac upgrade support
that can be disabled in expert mode but is default on.

I also removed the depends / default y on CONFIG_XICS since it is selected
by PSERIES.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:39 +0000 (19:29 +0000)]
powerpc: Consolidate ipi message mux and demux

Consolidate the mux and demux of ipi messages into smp.c and call
a new smp_ops callback to actually trigger the ipi.

The powerpc architecture code is optimised for having 4 distinct
ipi triggers, which are mapped to 4 distinct messages (ipi many, ipi
single, scheduler ipi, and enter debugger).  However, several interrupt
controllers only provide a single software triggered interrupt that
can be delivered to each cpu.  To resolve this limitation, each smp_ops
implementation created a per-cpu variable that is manipulated with atomic
bitops.  Since these lines will be contended they are optimally marked as
shared_aligned and take a full cache line for each cpu.  Distro kernels
may have 2 or 3 of these in their config, each taking per-cpu space
even though at most one will be in use.

This consolidation removes smp_message_recv and replaces the single call
actions cases with direct calls from the common message recognition loop.
The complicated debugger ipi case with its muxed crash handling code is
moved to debug_ipi_action which is now called from the demux code (instead
of the multi-message action calling smp_message_recv).

I put a call to reschedule_action to increase the likelihood of correctly
merging the anticipated scheduler_ipi() hook coming from the scheduler
tree; that single required call can be inlined later.

The actual message decode is a copy of the old pseries xics code with its
memory barriers and cache line spacing, augmented with a per-cpu unsigned
long based on the book-e doorbell code.  The optional data is set via a
callback from the implementation and is passed to the new cause-ipi hook
along with the logical cpu number.  While currently only the doorbell
implementation uses this data it should be almost zero cost to retrieve and
pass it -- it adds a single register load for the argument from the same
cache line to which we just completed a store and the register is dead
on return from the call.  I extended the data element from unsigned int
to unsigned long in case some other code wanted to associate a pointer.
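
The split between the common mux code and the platform hook might look
roughly like the following sketch.  All names here (ipi_state,
cause_ipi_hook, set_ipi_data, muxed_message_pass, record_ipi) are
hypothetical stand-ins rather than the identifiers used in smp.c, and
the per-cpu machinery and barriers are omitted.

```c
#define NR_CPUS_SKETCH 4

/* Hypothetical per-cpu state: pending messages plus optional data. */
struct cpu_messages_sketch {
	unsigned char pending[4];	/* one flag per message */
	unsigned long data;		/* handed to the hook, e.g. a doorbell tag */
};

static struct cpu_messages_sketch ipi_state[NR_CPUS_SKETCH];

/* Hypothetical platform hook: trigger the single hardware ipi. */
typedef void (*cause_ipi_fn)(int cpu, unsigned long data);
static cause_ipi_fn cause_ipi_hook;

/* Called by the platform to set the optional per-cpu data. */
static void set_ipi_data(int cpu, unsigned long data)
{
	ipi_state[cpu].data = data;
}

/* Common mux: record the message, then ask the platform to interrupt. */
static void muxed_message_pass(int cpu, int msg)
{
	ipi_state[cpu].pending[msg] = 1;
	if (cause_ipi_hook)
		cause_ipi_hook(cpu, ipi_state[cpu].data);
}

/* Example hook that only records its last arguments. */
static int last_cpu = -1;
static unsigned long last_data;

static void record_ipi(int cpu, unsigned long data)
{
	last_cpu = cpu;
	last_data = data;
}
```

Because the data lives next to the pending flags, passing it to the
hook costs one extra load from a line the sender has just written.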

The doorbell check_self is replaced by a call to smp_muxed_ipi_resend,
conditioned on the CPU_DBELL feature.  The ifdef guard could be relaxed
to CONFIG_SMP but I left it with BOOKE for now.

Also, the doorbell interrupt vector for book-e was not calling irq_enter
and irq_exit, which throws off cpu accounting and causes code to not
realize it is running in interrupt context.  Add the missing calls.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:35 +0000 (19:29 +0000)]
powerpc: Move smp_ops_t from machdep.h to smp.h

I can't see any reason these functions are needed by machdep.h
and they are all hidden by CONFIG_SMP with no UP alternative.

Also move the declarations for the fallback timebase ops, which
are used to fill in the smp ops.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:28 +0000 (19:29 +0000)]
powerpc: Remove stubbed beat smp support

I have no idea if the beat hypervisor supports multiple cpus in
a partition, but the code has not been touched since these stubs
were added in February of 2007 except to move them in April of 2008.
These are stubs: start_cpu always returns fail (which is dropped),
the message passing and receiving are empty functions, and the top
of file comment says "Incomplete".

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:24 +0000 (19:29 +0000)]
powerpc: Remove alloc_maybe_bootmem for zalloc version

Replace all remaining callers of alloc_maybe_bootmem with
zalloc_maybe_bootmem.   The callsite in pci_dn is followed with a
memset to clear the memory, and not zeroing at the other callsites
in the celleb fake pci code could lead to following uninitialized
memory as pointers or even freeing said pointers on error paths.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:20 +0000 (19:29 +0000)]
powerpc: Remove powermac/pic.h

It's unused, and of the three declarations, one is duplicated in pmac.h,
the second is static and the third is renamed and static.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:17 +0000 (19:29 +0000)]
powerpc/mpic: Simplify ipi cpu mask handling

Now that MSG_ALL and MSG_ALL_BUT_SELF have been eliminated,
smp_mpic_message_pass no longer needs to look up the cpumask just to
have mpic_send_ipi extract part of it and recode it in a NR_CPUS loop
by mpic_physmask.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:10 +0000 (19:29 +0000)]
powerpc: Remove checks for MSG_ALL and MSG_ALL_BUT_SELF

Now that smp_ops->smp_message_pass is always called with an (online) cpu
number for the target remove the checks for MSG_ALL and MSG_ALL_BUT_SELF.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:06 +0000 (19:29 +0000)]
powerpc: Remove call sites of MSG_ALL_BUT_SELF

The only user of MSG_ALL_BUT_SELF in the whole kernel tree is powerpc,
and it only uses it to start the debugger. Both debuggers always call
smp_send_debugger_break with MSG_ALL_BUT_SELF, and only mpic can do
anything more optimal than a loop over all online cpus, but all message
passing implementations have to code for this special delivery target.

Convert smp_send_debugger_break to take void and loop calling the smp_ops
message_pass function for each of the other cpus in the online cpumask.
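
A minimal sketch of that loop, assuming at most 32 cpus and using a
plain bitmask in place of cpu_online_mask.  The names online_mask,
sent_mask, message_pass and send_debugger_break are hypothetical, and
message_pass only records its target instead of raising an ipi.

```c
#define MAX_CPUS 32

/* Hypothetical stand-ins for cpu_online_mask and the message_pass op. */
static unsigned int online_mask;
static unsigned int sent_mask;	/* records which cpus were messaged */

static void message_pass(int cpu)
{
	sent_mask |= 1u << cpu;
}

/* Send the debugger break to every online cpu except the caller. */
static void send_debugger_break(int self)
{
	int cpu;

	for (cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (!(online_mask & (1u << cpu)))
			continue;	/* not online */
		if (cpu == self)
			continue;	/* never signal ourselves */
		message_pass(cpu);
	}
}
```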

Use raw_smp_processor_id() because we are either entering the debugger
or trying to start kdump and the additional warning is not useful were
it to trigger.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:02 +0000 (19:29 +0000)]
powerpc/mpic: Break cpumask abstraction earlier

mpic_set_affinity is allocating and freeing a cpumask var even though
it was breaking the cpumask abstraction when passing the mask to
mpic_physmask.  It also didn't have any check for allocation failure.

Break the cpumask abstraction earlier and use simple bitwise and of the
bits from the mask with the bits of cpu_online_mask.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:59 +0000 (19:28 +0000)]
powerpc/mpic: Limit NR_CPUS loop to 32 bit

mpic_physmask was looping NR_CPUS times over a mask that was passed as
a u32. Since mpic is architecturally limited to 32 physical cpus, clamp
the logical cpus to 32 when compiling (we could also clamp at runtime
to nr_cpu_ids).
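
The compile-time clamp can be sketched as follows.  This is an
illustration, not the mpic.c source: hard_cpu_id is a hypothetical
table standing in for the logical-to-physical cpu id lookup, NR_CPUS is
given an example value of 64, and physical ids are assumed to be below
32.

```c
#include <stdint.h>

#define NR_CPUS 64	/* example build-time value, larger than mpic can address */
#define MPIC_MAX_CPUS 32
#define LOOP_CPUS (NR_CPUS < MPIC_MAX_CPUS ? NR_CPUS : MPIC_MAX_CPUS)

/* Hypothetical logical-to-physical cpu id table. */
static int hard_cpu_id[NR_CPUS];

/* Translate a mask of logical cpus into a mask of physical cpus. */
static uint32_t physmask(uint32_t cpumask)
{
	uint32_t mask = 0;
	int i;

	/* The mask is only 32 bits wide, so never loop further than that. */
	for (i = 0; i < LOOP_CPUS; i++, cpumask >>= 1)
		if (cpumask & 1)
			mask |= (uint32_t)1 << hard_cpu_id[i];
	return mask;
}
```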

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:55 +0000 (19:28 +0000)]
powerpc: Call no-longer static setup_nr_cpu_ids instead of replicating it

c1854e00727f50f7ac99e98d26ece04c087ef785 (powerpc: Set nr_cpu_ids early
and use it to free PACAs) copied the formerly static setup_nr_cpu_ids
from init/main.c but 34db18a054c600b6f81787165669dc572fe4de25 (smp:
move smp setup functions to kernel/smp.c) moved it to kernel/smp.c
with a declaration in include/linux/smp.h, so we can call it instead of
replicating it.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:52 +0000 (19:28 +0000)]
powerpc: Use nr_cpu_ids in initial paca allocation

Now that we never set a cpu above nr_cpu_ids possible we can
limit our initial paca allocation to nr_cpu_ids.  We can then
clamp the number of cpus in platforms/iseries/setup.c.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:48 +0000 (19:28 +0000)]
powerpc: Respect nr_cpu_ids when calling set_cpu_possible and set_cpu_present

We should not set cpus above nr_cpu_ids to possible.  While we
will trigger a warning with CONFIG_CPUMASK_DEBUG, even then the mask
initializers will set the bits beyond what the iterators check and cause
nr_cpu_ids to increase.

Respecting nr_cpu_ids during setup will allow us to use it in our initial
paca allocation.  It can be reduced from NR_CPUS by the existing early param
nr_cpus=, which was added in 2b633e3fac5efada088b57d31e65401f22bcc18f (smp:
Use nr_cpus= to set nr_cpu_ids early).  We already call parse_early_param
between finding the command line and allocating the pacas.
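
In outline, the setup loop only has to stop at nr_cpu_ids.  The sketch
uses plain bitmasks and a hypothetical mark_discovered_cpus helper in
place of set_cpu_possible and set_cpu_present, with nr_cpu_ids preset
as if nr_cpus=4 had been given.

```c
/* Hypothetical stand-ins for nr_cpu_ids and the possible/present masks. */
static int nr_cpu_ids = 4;	/* e.g. reduced by nr_cpus=4 */
static unsigned long possible_mask;
static unsigned long present_mask;

/* Mark cpus found during setup, ignoring ids the kernel will not use. */
static int mark_discovered_cpus(int found)
{
	int cpu, marked = 0;

	for (cpu = 0; cpu < found && cpu < nr_cpu_ids; cpu++) {
		possible_mask |= 1ul << cpu;
		present_mask |= 1ul << cpu;
		marked++;
	}
	return marked;
}
```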

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:44 +0000 (19:28 +0000)]
powerpc/iseries: Cleanup and fix secondary startup

9cb82f2f4692293a27c578c3038518ce4477de72 (Make iSeries spin on
__secondary_hold_spinloop, like pSeries) added a load of current_set
but this load was repeated later and we don't even have the paca yet.
It also checked __secondary_hold_spinloop with a 32 bit compare instead
of a 64 bit compare.

b6f6b98a4e91fcf31db7de54c3aa86252fc6fb5f (Don't spin on sync instruction
at boot time) missed the copy of the startup code in iseries.

1426d5a3bd07589534286375998c0c8c6fdc5260 (Dynamically allocate pacas)
doesn't allow for pacas to be less than lppacas and recalculated the paca
location from the cpu id in r0 every time through the secondary loop.

Various revisions over time made the comments on conditional branches
confusing with respect to being a hold loop or forward progress.

Mostly in-order description of the changes:

Replicate the few lines of code saved by the ugly scoped ifdef CONFIG_SMP
in the secondary loop between yielding on UP and marking time with the
hypervisor on SMP.  Always compile the iseries_secondary_yield loop and
use it if the cpu id is above nr_cpu_ids.  Change all forward progress
paths to be forward branches to the next numerical label.  Assign a
label to all loops.  Move all sync instructions from the loops to the
forward progress path.  Wait to load current_set until paca is set to go.
Move the iseries_secondary_smp_loop label to cover the whole spin loop.
Add HMT_MEDIUM when we make forward progress.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:41 +0000 (19:28 +0000)]
powerpc/kdump64: Don't reference freed memory as pacas

Starting with 1426d5a3bd07589534286375998c0c8c6fdc5260 (powerpc:
Dynamically allocate pacas) the space for pacas beyond cpu_possible
is freed, but we failed to update the loop in crash.c.

Since c1854e00727f50f7ac99e98d26ece04c087ef785 (powerpc: Set nr_cpu_ids
early and use it to free PACAs) the number of pacas allocated is
always nr_cpu_ids.

Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <stable@kernel.org> # .34.x
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:37 +0000 (19:28 +0000)]
powerpc: Don't search for paca in freed memory

Starting with 1426d5a3bd07589534286375998c0c8c6fdc5260 (powerpc:
Dynamically allocate pacas) we free the memory for pacas beyond
cpu_possible, but we failed to update the loop the secondary cpus use
to find their paca.  If the system has running cpu threads for which
the kernel did not allocate a paca, they will search the memory that
was freed.  For instance this could happen when the device tree for
a kdump kernel was not updated after a cpu hotplug, or the kernel is
running with more cpus than the kernel was configured for.

Since c1854e00727f50f7ac99e98d26ece04c087ef785 (powerpc: Set nr_cpu_ids
early and use it to free PACAs) we set nr_cpu_ids before telling the
cpus to advance, so use that to limit the search.

We can't reference nr_cpu_ids without CONFIG_SMP because it is defined
as 1 instead of a memory location, but any extra threads should be sent
to kexec_wait in that case anyways, so make that explicit and remove
the search loop for UP.
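
The real search is done in early assembly; in C the bounded version
would look something like the sketch here.  paca_sketch and find_paca
are hypothetical names, and a NULL return corresponds to sending the
thread to kexec_wait.

```c
#include <stddef.h>

struct paca_sketch {
	int hw_cpu_id;
};

/*
 * Look for the paca whose hardware id matches, scanning only the
 * nr_ids entries that were actually allocated.  NULL means "no paca":
 * the caller should park the thread instead of reading freed memory.
 */
static struct paca_sketch *find_paca(struct paca_sketch *pacas, int nr_ids,
				     int hw_id)
{
	int i;

	for (i = 0; i < nr_ids; i++)
		if (pacas[i].hw_cpu_id == hw_id)
			return &pacas[i];
	return NULL;
}
```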

Note to stable: The fix also requires
c1854e00727f50f7ac99e98d26ece04c087ef785 (powerpc: Set
nr_cpu_ids early and use it to free PACAs) to function.  Also
9d07bc841c9779b4d7902e417f4e509996ce805d (Properly handshake CPUs going
out of boot spin loop) affects the second chunk, specifically the branch
target was 3b before and is 4b after that patch, and there was a blank
line before the #ifdef CONFIG_SMP that was removed.

Cc: <stable@kernel.org> # .34.x: c1854e0072 powerpc: Set nr_cpu_ids early
Cc: <stable@kernel.org> # .34.x
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:28:33 +0000 (19:28 +0000)]
powerpc/kexec: Fix memory corruption from unallocated slaves

Commit 1fc711f7ffb01089efc58042cfdbac8573d1b59a (powerpc/kexec: Fix race
in kexec shutdown) moved the write to signal the cpu had exited the kernel
from before the transition to real mode in kexec_smp_wait to kexec_wait.

Unfortunately it missed that kexec_wait is used both by cpus leaving the
kernel and by secondary slave cpus that were not allocated a paca for
whatever reason -- they could be beyond nr_cpus or not described in
the current device tree (for example, kexec-load was not refreshed
after a cpu hotplug operation).  Cpus coming through that path will
write to paca[NR_CPUS], which is beyond the space allocated for the
paca data, and overwrite memory not allocated to pacas (but very
likely still real mode accessible).

Move the write back to kexec_smp_wait, which is used only by cpus that
found their paca, but after the transition to real mode.

Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <stable@kernel.org> # (1fc711f was backported to 2.6.32)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 10 May 2011 13:34:03 +0000 (13:34 +0000)]
powerpc/pseries: Print corrupt r3 in FWNMI code

I have a report of an FWNMI with an r3 value that we think is
corrupt, but since we don't print r3 we have no idea what was
wrong with it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Mon, 9 May 2011 12:58:03 +0000 (12:58 +0000)]
pseries/iommu: Restore iommu table pointer when restoring iommu ops

When we switch to direct dma ops, we set the dma data union to have the
dma offset.  When we switch back to iommu table ops because of a later
dma_set_mask, we need to restore the iommu table pointer. Without this
change, crashes have been observed on kexec where (for reasons still
being investigated) we fall back to a 32-bit dma mask on a particular
device and then panic because the table pointer is not valid.

The easiest way to find this value is to call
pci_dma_dev_setup_pSeriesLP which will search up the pci tree until it
finds the node with the table.
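
The underlying hazard is that one union slot serves both modes, so
whichever mode was set last owns the bits.  A small sketch, with
hypothetical names (dev_dma_sketch, switch_to_direct, switch_to_iommu)
and the table passed in rather than found by walking the pci tree:

```c
#include <stdint.h>

/* Hypothetical per-device DMA state: one slot shared by both modes. */
struct dev_dma_sketch {
	int use_iommu;
	union {
		uint64_t dma_offset;	/* valid for direct dma ops */
		void *iommu_table;	/* valid for iommu table ops */
	} dma_data;
};

static void switch_to_direct(struct dev_dma_sketch *d, uint64_t offset)
{
	d->use_iommu = 0;
	d->dma_data.dma_offset = offset;	/* clobbers the table pointer */
}

/*
 * Switching back must rewrite the union; 'table' stands for whatever a
 * lookup such as pci_dma_dev_setup_pSeriesLP would find.
 */
static void switch_to_iommu(struct dev_dma_sketch *d, void *table)
{
	d->use_iommu = 1;
	d->dma_data.iommu_table = table;
}
```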

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 21:43:47 +0000 (21:43 +0000)]
powerpc: Remove ioremap_flags

We have a confusing number of ioremap functions. Make things just a
bit simpler by merging ioremap_flags and ioremap_prot.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 21:41:59 +0000 (21:41 +0000)]
powerpc: Add ioremap_wc

Add ioremap_wc so drivers can request write combining on kernel
mappings.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 21:36:44 +0000 (21:36 +0000)]
powerpc: Improve scheduling of system call entry instructions

After looking at our system call path, Mary Brown suggested that we
should put all mfspr SRR* instructions before any mtspr SRR*.

To test this I used a very simple null syscall (actually getppid)
testcase at http://ozlabs.org/~anton/junkcode/null_syscall.c

I tested with the following changes against the pseries_defconfig:

CONFIG_VIRT_CPU_ACCOUNTING=n
CONFIG_AUDIT=n

to remove the overhead of virtual CPU accounting and syscall
auditing.

POWER6:
baseline:       mean = 757.2 cycles       sd = 2.108
modified:       mean = 759.1 cycles       sd = 2.020

POWER7:
baseline:       mean = 411.4 cycles       sd = 0.138
modified:       mean = 404.1 cycles       sd = 0.109

So we have 1.77% improvement on POWER7 which looks significant. The
POWER6 results suggest a 0.25% slowdown, but the difference is within 1
standard deviation and may be in the noise.
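
The percentages follow directly from the means quoted above; a
throwaway helper (percent_improvement is a name made up for this
sketch) reproduces them:

```c
/* Percentage change from baseline to modified; positive means fewer cycles. */
static double percent_improvement(double baseline, double modified)
{
	return (baseline - modified) / baseline * 100.0;
}
```

For POWER7, (411.4 - 404.1) / 411.4 is about 1.77%.  For POWER6,
(757.2 - 759.1) / 757.2 is about -0.25%, and the 1.9 cycle gap is
smaller than either reported standard deviation.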

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 21:20:19 +0000 (21:20 +0000)]
powerpc: Remove static branch hint in giveup_altivec

A static branch hint will override dynamic branch prediction on
recent POWER CPUs. Since we are about to use more altivec in the
kernel remove the static hint in giveup_altivec that assumes
a userspace task is using altivec.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 21:18:38 +0000 (21:18 +0000)]
powerpc: Simplify 4k/64k copy_page logic

To make it easier to add optimised versions of copy_page, remove
the 4kB loop for 64kB pages and just do all the work in copy_page.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 13:19:30 +0000 (13:19 +0000)]
powerpc/pseries: Enable iSCSI support for a number of cards

Enable iSCSI support for a number of cards. We had the base
networking devices enabled but forgot to enable iSCSI.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 8 May 2011 13:18:27 +0000 (13:18 +0000)]
powerpc/pseries: Enable Emulex and Qlogic 10Gbit cards

Enable the Qlogic and Emulex 10Gbit adapters.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stratos Psomadakis [Sat, 7 May 2011 04:11:31 +0000 (04:11 +0000)]
powerpc/mm: Fix compiler warning in pgtable-ppc64.h [-Wunused-but-set-variable]

The variable 'old' is set but not used in the wrprotect functions in
arch/powerpc/include/asm/pgtable-ppc64.h, which can trigger a compiler warning.

Remove the variable, since it's not used anyway.

Signed-off-by: Stratos Psomadakis <psomas@ece.ntua.gr>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Wed, 4 May 2011 12:54:16 +0000 (12:54 +0000)]
powerpc: Ensure dtl buffers do not cross 4k boundary

Future releases of firmware will enforce a requirement that DTL buffers
do not cross a 4k boundary. Commit
127493d5dc73589cbe00ea5ec8357cc2a4c0d82a satisfies this requirement for
CONFIG_VIRT_CPU_ACCOUNTING=y kernels, but if !CONFIG_VIRT_CPU_ACCOUNTING
&& CONFIG_DTL=y, the current code will fail at dtl registration time.
Fix this by making the kmem cache from
127493d5dc73589cbe00ea5ec8357cc2a4c0d82a visible outside of setup.c and
using the same cache in both dtl.c and setup.c. This requires a bit of
reorganization to ensure ordering of the kmem cache and buffer
allocations.
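
The firmware requirement itself is easy to state in code.  The check
below is only an illustration (crosses_4k and DTL_BOUNDARY are
hypothetical names, and size is assumed to be nonzero); the patch
relies on the kmem cache to hand out suitable buffers rather than
testing each one.

```c
#include <stdint.h>

#define DTL_BOUNDARY 4096u

/* Nonzero if [addr, addr + size) spans more than one 4k-aligned block. */
static int crosses_4k(uintptr_t addr, uint32_t size)
{
	return (addr / DTL_BOUNDARY) != ((addr + size - 1) / DTL_BOUNDARY);
}
```

A buffer of at most 4k whose start is aligned to its own power-of-two
size can never fail this check, which is one way an allocator can
guarantee the property.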

Note: Since firmware now limits the size of the buffer, I made
dtl_buf_entries read-only in debugfs.

Tested with upcoming firmware with the 4 combinations of
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_DTL.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Fri, 6 May 2011 13:27:30 +0000 (13:27 +0000)]
powerpc: Fix kexec with dynamic dma windows

When we kexec we look for a particular property added by the first
kernel, "linux,direct64-ddr-window-info", per-device where we already
have set up dynamic dma windows. The current code, though, wasn't
initializing the size of this property and thus when we kexec'd, we
would find the property but read uninitialized memory resulting in
garbage ddw values for the kexec'd kernel and panics. Fix this by
setting the size at enable_ddw() time and ensuring that the size of the
found property is valid at dupe_ddw_if_kexec() time.
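
The validation half of the fix amounts to refusing a property whose
length does not match the expected structure.  In this sketch the
struct layout and the names ddw_prop_sketch and checked_ddw_prop are
assumptions made for illustration, not the definitions in iommu.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the per-device window property. */
struct ddw_prop_sketch {
	uint32_t liobn;
	uint64_t dma_base;
	uint32_t tce_shift;
	uint32_t window_shift;
};

/* Accept a property found at kexec time only if its length is exact. */
static const struct ddw_prop_sketch *checked_ddw_prop(const void *value,
						      size_t len)
{
	if (value == NULL || len != sizeof(struct ddw_prop_sketch))
		return NULL;	/* missing or wrongly sized: do not trust it */
	return (const struct ddw_prop_sketch *)value;
}
```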

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michal Marek [Thu, 5 May 2011 05:22:55 +0000 (05:22 +0000)]
powerpc: Use the deterministic mode of ar

Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Justin Mattock [Tue, 5 Apr 2011 06:58:22 +0000 (06:58 +0000)]
powerpc: Remove unused config in the Makefile

The patch below removes an unused config variable found by using a kernel
cleanup script.
Note: I did try to cross compile these but hit errors while doing so
(gcc is not set up to cross compile) and am unsure if any more needs to be done.
Please have a look if/when anybody has free time.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michal Marek [Tue, 5 Apr 2011 04:58:50 +0000 (04:58 +0000)]
powerpc: Call gzip with -n

The timestamps recorded in the .gz files add no value.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jack Miller [Thu, 14 Apr 2011 22:32:08 +0000 (22:32 +0000)]
powerpc: Add early debug for WSP platforms

Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
David Gibson [Thu, 14 Apr 2011 22:32:06 +0000 (22:32 +0000)]
powerpc: Add WSP platform

Add a platform for the Wire Speed Processor, based on the PPC A2.

This includes code for the ICS & OPB interrupt controllers, as well
as a SCOM backend, and SCOM based cpu bringup.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Richard A Lary [Wed, 4 May 2011 12:57:18 +0000 (12:57 +0000)]
powerpc/eeh: Display eeh error location for bus and device

  For adapters which have devices under a PCIe switch/bridge it is informative
  to display information for both the PCIe switch/bridge and the device on
  which the bus error was detected.

  rebased to powerpc-next

Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 3 May 2011 14:07:01 +0000 (14:07 +0000)]
powerpc: Rename slb0_limit() to safe_stack_limit() and add Book3E support

slb0_limit() wasn't a very descriptive name. This changes it along with
a comment explaining what it's used for, and provides a 64-bit BookE
implementation.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tseng-Hui (Frank) Lin [Thu, 5 May 2011 12:32:48 +0000 (12:32 +0000)]
powerpc/pseries: Add support for IO event interrupts

This patch adds support for handling IO Event interrupts which come
through at the /event-sources/ibm,io-events device tree node.

The interrupts that come through the ibm,io-events device tree node are
generated by the firmware to report IO events. The firmware uses the same
interrupt to report multiple types of events for multiple devices. Each
device may have its own event handler. This patch implements a platform
interrupt handler that is triggered by the IO event interrupts that come
through the ibm,io-events device tree node, pulls in the IO events from
RTAS and calls the device event handlers registered in the notifier list.

Device event handlers are expected to use atomic_notifier_chain_register()
and atomic_notifier_chain_unregister() to register/unregister their
event handler in the pseries_ioei_notifier_list list for IO event
interrupts. Device event handlers are responsible for identifying whether
the event belongs to them. The device event handler should return
NOTIFY_OK after the event is handled if the event belongs to it,
or NOTIFY_DONE otherwise.
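
The handler contract can be mocked up in userspace as follows.  This is
not the kernel notifier API: handler_sketch, io_event_sketch,
my_device_event and call_chain are hypothetical, the chain walk is a
simplification that ORs the return values together, and only the
NOTIFY_OK / NOTIFY_DONE values mirror the kernel's.

```c
#include <stddef.h>

#define NOTIFY_DONE 0x0000	/* not our event */
#define NOTIFY_OK   0x0001	/* event handled */

/* Hypothetical event record; the real one comes from RTAS. */
struct io_event_sketch {
	int device_id;
};

struct handler_sketch {
	int (*call)(struct handler_sketch *self, struct io_event_sketch *ev);
	int device_id;
	int handled;
	struct handler_sketch *next;
};

/* A device handler: claim only events addressed to our device. */
static int my_device_event(struct handler_sketch *self,
			   struct io_event_sketch *ev)
{
	if (ev->device_id != self->device_id)
		return NOTIFY_DONE;
	self->handled++;
	return NOTIFY_OK;
}

/* Minimal chain walk: offer the event to every registered handler. */
static int call_chain(struct handler_sketch *head, struct io_event_sketch *ev)
{
	int ret = NOTIFY_DONE;

	for (; head; head = head->next)
		ret |= head->call(head, ev);
	return ret;
}
```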

Signed-off-by: Tseng-Hui (Frank) Lin <thlin@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tseng-Hui (Frank) Lin [Tue, 3 May 2011 18:28:43 +0000 (18:28 +0000)]
powerpc/pseries: Add RTAS event log v6 definition

This patch adds the non-IBM-specific v6 extended log
definitions to rtas.h.

Signed-off-by: Tseng-Hui (Frank) Lin <tsenglin@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Stephen Rothwell [Fri, 6 May 2011 00:39:08 +0000 (10:39 +1000)]
powerpc: Fix compile with icwsx support

Due to a collision between NO_CONTEXT->MMU_NO_CONTEXT change and
Anton's patch.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Robert P. J. Day [Thu, 21 Apr 2011 10:00:18 +0000 (10:00 +0000)]
powerpc/pseries/bsr: Remove redundant initialization of bsr dev_t declaration.

Remove the unnecessary initialization of "dev_t bsr_dev" since it's
subsequently used in an "alloc_chrdev_region()" call which uses that
variable in an output-only fashion.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/pseries/eeh: Handle functional reset on non-PCIe device
Richard A Lary [Fri, 22 Apr 2011 10:00:05 +0000 (10:00 +0000)]
powerpc/pseries/eeh: Handle functional reset on non-PCIe device

  Fundamental reset is an optional reset type supported only by PCIe adapters.
  Handle the unexpected case where a non-PCIe device has requested a
  fundamental reset. Try hot-reset as a fallback to handle this case.

Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/pseries/eeh: Propagate needs_freset flag to device at PE
Richard A Lary [Fri, 22 Apr 2011 09:59:47 +0000 (09:59 +0000)]
powerpc/pseries/eeh: Propagate needs_freset flag to device at PE

  For multifunction adapters with a PCI bridge or switch as the device
  at the Partitionable Endpoint (PE), if one or more devices below the PE
  set dev->needs_freset, that value will be set for the PE device.

  In other words, if any device below PE requires a fundamental reset
  the PE will request a fundamental reset.
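
A minimal sketch of that propagation rule, as a user-space model. The
struct and function names are hypothetical; only the needs_freset flag and
the "any device below the PE" rule come from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device below the PE. */
struct dev_model {
    bool needs_freset;
};

/* The PE needs a fundamental reset if any device below it asks for one. */
static bool pe_needs_freset(const struct dev_model *children, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (children[i].needs_freset)
            return true;
    return false;
}
```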

Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/pseries: Add page coalescing support
Brian King [Wed, 4 May 2011 06:01:20 +0000 (16:01 +1000)]
powerpc/pseries: Add page coalescing support

Adds support for page coalescing, a feature on IBM Power servers that
allows identical pages to be coalesced between logical partitions.
Text pages are hinted as coalesce candidates, since they are the pages
most likely to be coalescable between partitions. This patch also
exports some page coalescing statistics available from firmware via
lparcfg.

[BenH: Moved a couple of things around to fix compile problems]

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/kexec: Fix build failure on 32-bit SMP
Ben Hutchings [Sun, 24 Apr 2011 15:04:31 +0000 (15:04 +0000)]
powerpc/kexec: Fix build failure on 32-bit SMP

Commit b987812b3fcaf70fdf0037589e5d2f5f2453e6ce left
crash_kexec_wait_realmode() undefined for UP.

Commit 7c7a81b53e581d727d069cc45df5510516faac31 defined it for UP but
left it undefined for 32-bit SMP.

Seems like people are getting confused by nested #ifdef's, so move the
definitions of crash_kexec_wait_realmode() after the #ifdef CONFIG_SMP
section.

Compile-tested with 32-bit UP, 32-bit SMP and 64-bit SMP configurations.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Convert old cpumask API into new one
KOSAKI Motohiro [Thu, 28 Apr 2011 05:07:23 +0000 (05:07 +0000)]
powerpc: Convert old cpumask API into new one

Adapt to the new API.

Almost all of the changes are trivial. The most important change is to the
line below, because we plan to change the task->cpus_allowed implementation.

-       ctx->cpus_allowed = current->cpus_allowed;

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Save Come-From Address Register (CFAR) in exception frame
Paul Mackerras [Sun, 1 May 2011 19:48:20 +0000 (19:48 +0000)]
powerpc: Save Come-From Address Register (CFAR) in exception frame

Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
Address Register" (CFAR), that records the address of the most recent
branch or rfid (return from interrupt) instruction for debugging purposes.

This saves the value of the CFAR in the exception entry code and stores
it in the exception frame.  We also make xmon print the CFAR value in
its register dump code.

Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
field, which is only used for system calls, and use it for the CFAR value
for all exceptions/interrupts other than system calls.  This means we
don't save the CFAR on system calls, which is not a great problem since
system calls tend not to happen unexpectedly, and also avoids adding the
overhead of reading the CFAR to the system call entry path.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Save register r9-r13 values accurately on interrupt with bad stack
Paul Mackerras [Sun, 1 May 2011 19:46:44 +0000 (19:46 +0000)]
powerpc: Save register r9-r13 values accurately on interrupt with bad stack

When we take an interrupt or exception from kernel mode and the stack
pointer is obviously not a kernel address (i.e. the top bit is 0), we
switch to an emergency stack, save register values and panic.  However,
on 64-bit server machines, we don't actually save the values of r9 - r13
at the time of the interrupt, but rather values corrupted by the
exception entry code for r12-r13, and nothing at all for r9-r11.

This fixes it by passing a pointer to the register save area in the paca
through to the bad_stack code in r3.  The register values are saved in
one of the paca register save areas (depending on which exception this
is).  Using the pointer in r3, the bad_stack code now retrieves the
saved values of r9 - r13 and stores them in the exception frame on the
emergency stack.  This also stores the normal exception frame marker
("regshere") in the exception frame.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Add Initiate Coprocessor Store Word (icswx) support
Tseng-Hui (Frank) Lin [Mon, 2 May 2011 20:43:04 +0000 (20:43 +0000)]
powerpc: Add Initiate Coprocessor Store Word (icswx) support

Icswx is a PowerPC instruction to send data to a co-processor. On Book-S
processors the LPAR_ID and process ID (PID) of the owning process are
registered in the window context of the co-processor at initialization
time. When the icswx instruction is executed the L2 generates a cop-reg
transaction on PowerBus. The transaction has no address and the
processor does not perform an MMU access to authenticate the transaction.
The co-processor compares the LPAR_ID and the PID included in the
transaction and the LPAR_ID and PID held in the window context to
determine if the process is authorized to generate the transaction.

The OS needs to assign a 16-bit PID for the process. This cop-PID needs
to be updated during context switch. The cop-PID needs to be destroyed
when the context is destroyed.

Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com>
Signed-off-by: Tseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Use new CPU feature bit to select 2.06 tlbie
Michael Neuling [Wed, 6 Apr 2011 18:23:29 +0000 (18:23 +0000)]
powerpc: Use new CPU feature bit to select 2.06 tlbie

This removes MMU_FTR_TLBIE_206 as we can now use CPU_FTR_HVMODE_206.  It
also changes the logic to select which tlbie to use to be based on this
new CPU feature bit.

This also duplicates the ASM_FTR_IF/SET/CLR defines for CPU features
(copied from MMU features).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/irq: Stop exporting irq_map
Grant Likely [Wed, 4 May 2011 05:02:15 +0000 (15:02 +1000)]
powerpc/irq: Stop exporting irq_map

First step in eliminating irq_map[] table entirely

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/eeh: Add support for ibm,configure-pe RTAS call
Richard A. Lary [Wed, 6 Apr 2011 12:50:45 +0000 (12:50 +0000)]
powerpc/eeh: Add support for ibm,configure-pe RTAS call

Added support for ibm,configure-pe RTAS call introduced with
PAPR 2.2.

Signed-off-by: Richard A. Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Free up some CPU feature bits by moving out MMU-related features
Matt Evans [Wed, 6 Apr 2011 19:48:50 +0000 (19:48 +0000)]
powerpc: Free up some CPU feature bits by moving out MMU-related features

Some of the 64bit PPC CPU features are MMU-related, so this patch moves
them to MMU_FTR_ bits.  All cpu_has_feature()-style tests are moved to
mmu_has_feature(), and seven feature bits are freed as a result.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/rtas: Only sleep in rtas_busy_delay if we have useful work to do
Anton Blanchard [Thu, 7 Apr 2011 01:54:07 +0000 (01:54 +0000)]
powerpc/rtas: Only sleep in rtas_busy_delay if we have useful work to do

RTAS returns extended error codes as a hint of how long the
OS might want to wait before retrying a call. If we have nothing
else useful to do we may as well call back straight away.

This was found when testing the new dynamic dma window feature.
Firmware split the zeroing of the TCE table into 32k chunks but
returned 9901 (which is a suggested wait of 10ms). All up this took
about 10 minutes to complete since msleep is jiffies based and will
round 10ms up to 20ms.

With the patch below we take 3 seconds to complete the same test.
The hint firmware is returning in the RTAS call should definitely
be decreased, but even if we slept 1ms each iteration this would
take 32s.
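
The figures above can be cross-checked with a little arithmetic. The call
count is not stated in the message; the value of roughly 32000 used in the
note below is inferred from "1ms each iteration would take 32s", so treat
it as an assumption. The helper name is hypothetical.

```c
/* Total whole seconds spent sleeping for `calls` retries of `sleep_ms` each. */
static unsigned long total_sleep_seconds(unsigned long calls,
                                         unsigned long sleep_ms)
{
    return calls * sleep_ms / 1000;
}
```

With 32000 calls, a 20ms effective sleep gives 640 seconds, which matches
the "about 10 minutes" observed, and a 1ms sleep gives the quoted 32 seconds.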

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/book3e: Fix extlb size
Michael Ellerman [Thu, 7 Apr 2011 21:22:23 +0000 (21:22 +0000)]
powerpc/book3e: Fix extlb size

The calculation of the size for the exception save area of the TLB
miss handler is wrong; luckily it's too big, not too small.

Rework it to make it a bit clearer, and also correct. We want 3 save
areas, each EX_TLB_SIZE _bytes_.
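
A sketch of the kind of sizing mistake this describes, assuming the old
declaration used a byte count as an element count for an array of 64-bit
words. The diff is not shown here, so both declarations and the value of
EX_TLB_SIZE below are illustrative assumptions.

```c
#include <stdint.h>

#define EX_TLB_SIZE 96   /* hypothetical size in bytes, for illustration */

/* Oversized: EX_TLB_SIZE counts bytes, but each element is 8 bytes. */
struct save_area_too_big {
    uint64_t extlb[EX_TLB_SIZE * 3];
};

/* Intended: 3 save areas of EX_TLB_SIZE bytes each. */
struct save_area_intended {
    uint64_t extlb[3][EX_TLB_SIZE / sizeof(uint64_t)];
};
```

Under these assumptions the first form is eight times larger than needed,
which wastes space but never overflows.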

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Use MSR_64BIT in sstep.c, fix kprobes on BOOK3E
Michael Ellerman [Thu, 7 Apr 2011 21:56:04 +0000 (21:56 +0000)]
powerpc: Use MSR_64BIT in sstep.c, fix kprobes on BOOK3E

We check MSR_SF a lot in sstep.c, to decide if we need to emulate the
truncation of values when running in 32-bit mode. Factor out that code
into a helper, and convert it and the other uses to use MSR_64BIT.

This fixes a bug on BOOK3E where kprobes would end up returning to a
32-bit address, because regs->nip was truncated, because (msr & MSR_SF)
was false.
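
A user-space sketch of such a helper. The bit positions and the
MODEL_BOOK3E switch are assumptions made for illustration; in the kernel
the mask is selected by the MSR_64BIT define for the platform being built.

```c
#include <stdint.h>

/* Assumed bit positions, selected by a hypothetical build switch. */
#ifdef MODEL_BOOK3E
#define MSR_64BIT (1ULL << 31)   /* assumed Book3E 64-bit mode bit */
#else
#define MSR_64BIT (1ULL << 63)   /* assumed server 64-bit mode bit (MSR_SF) */
#endif

/* Keep only the low 32 bits of val when the MSR says we are in 32-bit mode. */
static uint64_t truncate_if_32bit(uint64_t msr, uint64_t val)
{
    if ((msr & MSR_64BIT) == 0)
        val &= 0xffffffffULL;
    return val;
}
```

Testing the wrong bit makes a 64-bit context look like a 32-bit one, so
addresses such as regs->nip are truncated when they should not be.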

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Use MSR_64BIT in places
Michael Ellerman [Thu, 7 Apr 2011 21:56:03 +0000 (21:56 +0000)]
powerpc: Use MSR_64BIT in places

Use the new MSR_64BIT in a few places. Some of these are already ifdef'ed
for BOOKE vs BOOKS, but it's still clearer: MSR_SF does not immediately
parse as "MSR bit for 64bit".

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Add MSR_64BIT
Michael Ellerman [Thu, 7 Apr 2011 21:56:02 +0000 (21:56 +0000)]
powerpc: Add MSR_64BIT

The MSR bit which indicates 64-bit-ness is different between server and
booke, so add a #define which gives you the right mask regardless.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Fix build warning of the defconfigs
Wanlong Gao [Sat, 9 Apr 2011 08:09:46 +0000 (08:09 +0000)]
powerpc: Fix build warning of the defconfigs

BT_L2CAP and BT_SCO have been changed to bool, so the value 'm' is
no longer valid for them.

Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/ps3: Update debug message for irq_set_chip_data()
Geert Uytterhoeven [Sat, 9 Apr 2011 22:59:07 +0000 (22:59 +0000)]
powerpc/ps3: Update debug message for irq_set_chip_data()

commit ec775d0e70eb6b7116406b3441cb8501c2849dd2 ("powerpc: Convert to new irq_*
function names") changed a call from set_irq_chip_data() to
irq_set_chip_data(), but forgot to update the corresponding debug message

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/irq: Dump chip data pointer in virq_mapping
Michael Ellerman [Sun, 10 Apr 2011 20:26:15 +0000 (20:26 +0000)]
powerpc/irq: Dump chip data pointer in virq_mapping

This can be useful for differentiating interrupts on the same host
but with different chip data.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/numa: Look for ibm, associativity-reference-points at the root
Michael Ellerman [Sun, 10 Apr 2011 20:42:05 +0000 (20:42 +0000)]
powerpc/numa: Look for ibm, associativity-reference-points at the root

If we don't find ibm,associativity-reference-points as a child of
/rtas, look for it at the root of the tree instead. We use this on
Book3E where we have no RTAS but still use the sPAPR conventions
for NUMA.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/pci: Properly initialize IO workaround "private"
Michael Ellerman [Mon, 11 Apr 2011 21:25:02 +0000 (21:25 +0000)]
powerpc/pci: Properly initialize IO workaround "private"

Even when no initfunc is provided.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/pci: Make IO workarounds init implicit when first bus is registered
Michael Ellerman [Mon, 11 Apr 2011 21:25:02 +0000 (21:25 +0000)]
powerpc/pci: Make IO workarounds init implicit when first bus is registered

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/pci: Move IO workarounds to the common kernel dir
Michael Ellerman [Mon, 11 Apr 2011 21:25:01 +0000 (21:25 +0000)]
powerpc/pci: Move IO workarounds to the common kernel dir

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/pci: Split IO vs MMIO indirect access hooks
Michael Ellerman [Mon, 11 Apr 2011 21:25:01 +0000 (21:25 +0000)]
powerpc/pci: Split IO vs MMIO indirect access hooks

The goal is to avoid adding overhead to MMIO when only PIO is needed

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agocxgb4: use pgprot_writecombine() on powerpc
Nishanth Aravamudan [Mon, 14 Mar 2011 10:36:11 +0000 (10:36 +0000)]
cxgb4: use pgprot_writecombine() on powerpc

Commit fe3cc0d99de6a9bf99b6c279a8afb5833888c1f7 ("powerpc: Add
pgprot_writecombine") in benh's tree exposes the pgprot_writecombine()
API to drivers on powerpc. cxgb4 has an open-coded version of the same,
so use the common API now that it's available.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Per process DSCR + some fixes (try#4)
Alexey Kardashevskiy [Wed, 2 Mar 2011 15:18:48 +0000 (15:18 +0000)]
powerpc: Per process DSCR + some fixes (try#4)

The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allows some control over the prefetch
of data streams.

This patch allows the value to be specified per thread by emulating
the corresponding mfspr and mtspr instructions. Children of such
threads inherit the value. Other threads use a default value that
can be specified in sysfs - /sys/devices/system/cpu/dscr_default.

If a thread starts while the sysfs entry holds a non-default value,
all of its child threads inherit this non-default value even if
the sysfs value is changed later.

Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/book3e: Flush IPROT protected TLB entries leftover by firmware
Jack Miller [Thu, 14 Apr 2011 22:32:05 +0000 (22:32 +0000)]
powerpc/book3e: Flush IPROT protected TLB entries leftover by firmware

When we set up the TLB for ourselves on Book3E, we need to flush out any
old mappings established by the firmware or bootloader.  At present we
attempt this with a tlbilx to flush everything, but this will leave behind
any entries with the IPROT bit set.

There are several good reasons firmware might establish mappings with IPROT,
and in fact ePAPR compliant firmwares are required to establish their
initial mapped area with IPROT.

This patch therefore adds more complex code to scan through the TLB upon
entry and flush away any entries that are not our own.

Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/book3e: Use way 3 for linear mapping bolted entry
Benjamin Herrenschmidt [Thu, 14 Apr 2011 22:32:04 +0000 (22:32 +0000)]
powerpc/book3e: Use way 3 for linear mapping bolted entry

An erratum on A2 can lead to the bolted entry we insert for the linear
mapping being evicted. To avoid that, write the bolted entry to way 3.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Index crit/dbg/mcheck stacks using cpu number on 64bit
Michael Ellerman [Thu, 14 Apr 2011 22:32:04 +0000 (22:32 +0000)]
powerpc: Index crit/dbg/mcheck stacks using cpu number on 64bit

In exc_lvl_ctx_init() we index into the crit/dbg/mcheck stacks using
the hard cpu id, but that assumes the hard cpu id is zero based and
contiguous. That is not the case on A2.

The root of the problem is that the 32bit code has no equivalent of the
paca to allow it to do the hard->soft mapping in assembler. Until the
32bit code is updated to handle that, index the stacks using the soft
cpu ids on 64bit and hard on 32 bit.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Add TLB size detection for TYPE_3E MMUs
Benjamin Herrenschmidt [Thu, 14 Apr 2011 22:32:02 +0000 (22:32 +0000)]
powerpc: Add TLB size detection for TYPE_3E MMUs

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Add A2 cpu support
Benjamin Herrenschmidt [Thu, 14 Apr 2011 22:32:01 +0000 (22:32 +0000)]
powerpc: Add A2 cpu support

Add the cputable entry, regs and setup & restore entries for
the PowerPC A2 core.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/nvram: Search for nvram using compatible
Benjamin Herrenschmidt [Thu, 14 Apr 2011 22:32:00 +0000 (22:32 +0000)]
powerpc/nvram: Search for nvram using compatible

As well as searching for nodes with type = "nvram", search for nodes
that have compatible = "nvram". This can't be converted into a single
call to of_find_compatible_node() with a non-NULL type, because that
searches for a node that has _both_ type & compatible = "nvram".
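
The difference between the two matching rules can be shown with a small
user-space model. This is not the device tree API: the node struct and
helper names are hypothetical, and real "compatible" properties are string
lists rather than a single string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device tree node. */
struct node_model {
    const char *type;         /* NULL if the property is absent */
    const char *compatible;   /* NULL if the property is absent */
};

static int str_eq(const char *a, const char *b)
{
    return a != NULL && strcmp(a, b) == 0;
}

/* One combined lookup: both properties must match. */
static int matches_both(const struct node_model *n)
{
    return str_eq(n->type, "nvram") && str_eq(n->compatible, "nvram");
}

/* Two separate lookups: either property is enough. */
static int matches_either(const struct node_model *n)
{
    return str_eq(n->type, "nvram") || str_eq(n->compatible, "nvram");
}
```

A node that only carries compatible = "nvram" is found by the second rule
but missed by the first, which is why a single combined call does not work.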

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/xics: Move irq_host matching into the ics backend
Michael Ellerman [Thu, 14 Apr 2011 22:31:59 +0000 (22:31 +0000)]
powerpc/xics: Move irq_host matching into the ics backend

An upcoming new ics backend will need to implement different matching
semantics to the current ones, which are essentially the RTAS ics
backends. So move the current match into the RTAS backend, and allow
other ics backends to override.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Add SCOM infrastructure
Benjamin Herrenschmidt [Thu, 14 Apr 2011 22:31:58 +0000 (22:31 +0000)]
powerpc: Add SCOM infrastructure

SCOM is a side-band configuration bus implemented on some processors.
This code provides a way for code to map and operate on devices via
SCOM, while the details of how that is implemented are left up to a
SCOM "controller" in the platform code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/xics: xics.h relies on linux/interrupt.h
Michael Ellerman [Thu, 14 Apr 2011 22:31:58 +0000 (22:31 +0000)]
powerpc/xics: xics.h relies on linux/interrupt.h

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agoof: Export of_irq_find_parent()
Michael Ellerman [Thu, 14 Apr 2011 22:31:57 +0000 (22:31 +0000)]
of: Export of_irq_find_parent()

We have platform code that needs to find a node's interrupt parent, so
export of_irq_find_parent() so we can use it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/a2: Add some #defines for A2 specific instructions
Benjamin Herrenschmidt [Thu, 14 Apr 2011 22:31:56 +0000 (22:31 +0000)]
powerpc/a2: Add some #defines for A2 specific instructions

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Replace open coded instruction patching with patch_instruction/patch_branch
Anton Blanchard [Mon, 4 Apr 2011 23:56:18 +0000 (23:56 +0000)]
powerpc: Replace open coded instruction patching with patch_instruction/patch_branch

There are a few places we patch instructions without using
patch_instruction and patch_branch, probably because they
predated it. Fix it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/nohash: Allocate stale_map[cpu] on CPU_UP_PREPARE not CPU_ONLINE
Michael Ellerman [Mon, 4 Apr 2011 20:57:27 +0000 (20:57 +0000)]
powerpc/nohash: Allocate stale_map[cpu] on CPU_UP_PREPARE not CPU_ONLINE

Currently we allocate the stale_map for a cpu when it comes online.
This leaves open a small window where a process can be scheduled
on the cpu before the stale_map is allocated. Instead, allocate
the stale_map at CPU_UP_PREPARE time, so that it is always
available before tasks start running.

It is possible the cpu fails to come up, in which case we should free
the stale_map, so add a CPU_UP_CANCELED case to do that.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/smp: smp_ops->kick_cpu() should be able to fail
Michael Ellerman [Mon, 11 Apr 2011 21:46:19 +0000 (21:46 +0000)]
powerpc/smp: smp_ops->kick_cpu() should be able to fail

When we start a cpu we use smp_ops->kick_cpu(), which currently
returns void, but it should be able to fail. Convert it to return
int, and update all uses.

Convert all the current error cases to return -ENOENT, which is
what would eventually be returned by __cpu_up() currently when
it doesn't detect the cpu as coming up in time.
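
A user-space sketch of the interface change. Only the int return type and
the -ENOENT error value come from the text; the ops struct, the CPU count
and the function bodies are hypothetical.

```c
#include <errno.h>

#define MODEL_NR_CPUS 4   /* hypothetical number of startable cpus */

/* Hypothetical stand-in for the ops table; kick_cpu used to return void. */
struct smp_ops_model {
    int (*kick_cpu)(int nr);
};

/* Report failure right away when the cpu cannot be started. */
static int model_kick_cpu(int nr)
{
    if (nr < 0 || nr >= MODEL_NR_CPUS)
        return -ENOENT;
    return 0;
}

/* The caller can now bail out early instead of waiting for a timeout. */
static int model_cpu_up(const struct smp_ops_model *ops, int cpu)
{
    int rc = ops->kick_cpu(cpu);

    if (rc)
        return rc;
    return 0;
}
```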

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/boot: Add an ePAPR compliant boot wrapper
David Gibson [Thu, 14 Apr 2011 18:29:16 +0000 (18:29 +0000)]
powerpc/boot: Add an ePAPR compliant boot wrapper

This is a first cut at making bootwrapper code which will
produce a zImage compliant with the requirements set down
by ePAPR.

This is a very simple bootwrapper, taking the device tree
blob supplied by the ePAPR boot program and passing it on
to the kernel. It builds on the earlier patch to build a
relocatable ET_DYN zImage to meet the other ePAPR image
requirements.

For good measure we have some paranoid checks which will
generate warnings if some of the ePAPR entry condition
guarantees are not met.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/boot: Allow building the zImage wrapper as a relocatable ET_DYN
Michael Ellerman [Tue, 12 Apr 2011 20:38:55 +0000 (20:38 +0000)]
powerpc/boot: Allow building the zImage wrapper as a relocatable ET_DYN

This patch adds code, linker script and makefile support to allow
building the zImage wrapper around the kernel as a position independent
executable.  This results in an ET_DYN instead of an ET_EXEC ELF output
file, which can be loaded at any location by the firmware and will
process its own relocations to work correctly at the loaded address.

This is of interest particularly since the standard ePAPR image format
must be an ET_DYN (although this patch alone is not sufficient to
produce a fully ePAPR compliant boot image).

Note for now we don't enable building with -pie for anything.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/mm: Fix slice state initialization for Book3E
Michael Ellerman [Tue, 12 Apr 2011 19:00:05 +0000 (19:00 +0000)]
powerpc/mm: Fix slice state initialization for Book3E

On Book3E, MMU_NO_CONTEXT != 0, but the slice_mm_new_context()
macro assumes that it is.  This means that the map of the
page sizes for each slice is always initialized to zeroes
(which happens to be 4k pages), rather than to the correct
default base page size value - which might be 64k.
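
A simplified model of the faulty check, assuming the macro compared the
context id against a literal 0. The nonzero MMU_NO_CONTEXT value, the
struct and the function names below are illustrative assumptions.

```c
#include <stdbool.h>

#define MODEL_MMU_NO_CONTEXT (~0u)   /* assumed nonzero "no context" value */

/* Hypothetical stand-in for the mm context. */
struct mm_model {
    unsigned int context_id;
};

/* Buggy check: assumes "no context yet" is encoded as 0. */
static bool new_context_assuming_zero(const struct mm_model *mm)
{
    return mm->context_id == 0;
}

/* Fixed check: compares against the real "no context" marker. */
static bool new_context_fixed(const struct mm_model *mm)
{
    return mm->context_id == MODEL_MMU_NO_CONTEXT;
}
```

With the buggy check a fresh mm is never recognised as new, so the slice
page size map keeps its zero-filled contents and the default is never set.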

This patch corrects the problem.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc/mm: Standardise on MMU_NO_CONTEXT
Michael Ellerman [Tue, 12 Apr 2011 19:00:04 +0000 (19:00 +0000)]
powerpc/mm: Standardise on MMU_NO_CONTEXT

Use MMU_NO_CONTEXT as the initialiser for mm_context.id on
nohash and hash64.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Improve prom_printf()
Benjamin Herrenschmidt [Wed, 6 Apr 2011 00:51:17 +0000 (10:51 +1000)]
powerpc: Improve prom_printf()

Adds the ability to print decimal numbers and adds some more
format string variants.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Perform an isync to synchronize CPUs coming out of secondary_hold
Benjamin Herrenschmidt [Tue, 5 Apr 2011 04:34:58 +0000 (14:34 +1000)]
powerpc: Perform an isync to synchronize CPUs coming out of secondary_hold

We need to do that to guarantee they see any code change done by
dynamic patching during boot.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Add NAP mode support on Power7 in HV mode
Benjamin Herrenschmidt [Mon, 24 Jan 2011 07:42:41 +0000 (18:42 +1100)]
powerpc: Add NAP mode support on Power7 in HV mode

Wakeup comes from the system reset handler with a potential loss of
the non-hypervisor CPU state. We save the non-volatile state on the
stack and a pointer to it in the PACA, which the system reset handler
uses to restore things.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Properly handshake CPUs going out of boot spin loop
Benjamin Herrenschmidt [Wed, 16 Mar 2011 03:54:35 +0000 (14:54 +1100)]
powerpc: Properly handshake CPUs going out of boot spin loop

We need to wait a bit for them to have done their CPU setup,
or we might end up with translation and EE on with different
LPCR values between threads.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Call CPU ->restore callback earlier on secondary CPUs
Benjamin Herrenschmidt [Tue, 1 Feb 2011 01:13:09 +0000 (12:13 +1100)]
powerpc: Call CPU ->restore callback earlier on secondary CPUs

We do it before we loop on the PACA start flag. This way, we get a
chance to set critical SPRs on all CPUs before Linux tries to start
them up, which avoids problems when changing some bits such as LPCR
bits that need to be identical on all threads of a core, or similar
things. Ideally, some of that should also be done before the MMU is
enabled, but that's a separate issue which would require moving some
of the SMP startup code earlier. Let's not go there for now; it works
with this change alone.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Initialize TLB and LPID register on HV mode Power7
Benjamin Herrenschmidt [Tue, 1 Mar 2011 04:46:09 +0000 (15:46 +1100)]
powerpc: Initialize TLB and LPID register on HV mode Power7

In case entry from the bootloader isn't "clean"

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Initialize LPCR:DPFD on power7 to a sane default
Benjamin Herrenschmidt [Mon, 24 Jan 2011 02:25:55 +0000 (13:25 +1100)]
powerpc: Initialize LPCR:DPFD on power7 to a sane default

This sets the default data stream prefetch size for operating
systems that don't set their own value in DSCR. We use 4 which
is "medium".

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Always use SPRN_SPRG_HSCRATCH0 when running in HV mode
Paul Mackerras [Tue, 5 Apr 2011 03:59:58 +0000 (13:59 +1000)]
powerpc: Always use SPRN_SPRG_HSCRATCH0 when running in HV mode

This uses feature sections to arrange that we always use HSPRG1
as the scratch register in the interrupt entry code rather than
SPRG2 when we're running in hypervisor mode on POWER7.  This will
ensure that we don't trash the guest's SPRG2 when we are running
KVM guests.  To simplify the code, we define GET_SCRATCH0() and
SET_SCRATCH0() macros like the GET_PACA/SET_PACA macros.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: More work to support HV exceptions
Benjamin Herrenschmidt [Tue, 5 Apr 2011 04:27:11 +0000 (14:27 +1000)]
powerpc: More work to support HV exceptions

Rework exception macros a bit to split offset from vector and add
some basic support for HDEC, HDSI, HISI and a few more.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: Base support for exceptions using HSRR0/1
Benjamin Herrenschmidt [Tue, 5 Apr 2011 04:20:31 +0000 (14:20 +1000)]
powerpc: Base support for exceptions using HSRR0/1

Pass the register type to the prolog, provide an alternate "HV"
version of the hardware interrupt (0x500), and adjust LPES accordingly.

We tag those interrupts by setting bit 0x2 in the trap number.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agopowerpc: In HV mode, use HSPRG0 for PACA
Benjamin Herrenschmidt [Thu, 20 Jan 2011 06:50:21 +0000 (17:50 +1100)]
powerpc: In HV mode, use HSPRG0 for PACA

When running in Hypervisor mode (arch 2.06 or later), we store the PACA
in HSPRG0 instead of SPRG1. The architecture specifies that SPRGs may be
lost during a "nap" power management operation (though they aren't
currently on POWER7) and this enables use of SPRG1 by KVM guests.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>