pandora-kernel.git
16 years agox86: merge arch/x86/kernel/ldt_32/64.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:14 +0000 (13:30 +0100)]
x86: merge arch/x86/kernel/ldt_32/64.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: prepare arch/x86/kernel/ldt_32/64.c for merging
Thomas Gleixner [Wed, 30 Jan 2008 12:30:13 +0000 (13:30 +0100)]
x86: prepare arch/x86/kernel/ldt_32/64.c for merging

White space and coding style cleanups.

Change unsigned to int. There is no win when we compare mincount against pc->size,
which is an int as well. Casting pc->size to unsigned just might hide real problems.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: introduce ldt_write accessor
Thomas Gleixner [Wed, 30 Jan 2008 12:30:13 +0000 (13:30 +0100)]
x86: introduce ldt_write accessor

Create a ldt write accessor like the 32 bit one.

Preparatory patch for merging ldt.c and anyway necessary for
64bit paravirt ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up include/asm-x86/desc_64.h
Thomas Gleixner [Wed, 30 Jan 2008 12:30:13 +0000 (13:30 +0100)]
x86: clean up include/asm-x86/desc_64.h

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/kernel/ldt_32/64.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:13 +0000 (13:30 +0100)]
x86: clean up arch/x86/kernel/ldt_32/64.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/kernel/e820_64.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:12 +0000 (13:30 +0100)]
x86: clean up arch/x86/kernel/e820_64.c

White space and coding style cleanup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: code cleanups in arch/x86/kernel/pci-gart_64.c
Ingo Molnar [Wed, 30 Jan 2008 12:30:12 +0000 (13:30 +0100)]
x86: code cleanups in arch/x86/kernel/pci-gart_64.c

code cleanups:

                                       errors   lines of code   errors/KLOC
 arch/x86/kernel/pci-gart_64.c            183             748         244.6
 arch/x86/kernel/pci-gart_64.c              0             790             0

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: lindent arch/i386/math-emu, cleanup
Ingo Molnar [Wed, 30 Jan 2008 12:30:12 +0000 (13:30 +0100)]
x86: lindent arch/i386/math-emu, cleanup

manually clean up some of the damage that lindent caused.
(this is a separate commit so that in the unlikely case of
a typo we can bisect it down to the manual edits.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: lindent arch/i386/math-emu
Ingo Molnar [Wed, 30 Jan 2008 12:30:11 +0000 (13:30 +0100)]
x86: lindent arch/i386/math-emu

lindent these files:
                                       errors   lines of code   errors/KLOC
 arch/x86/math-emu/                      2236            9424         237.2
 arch/x86/math-emu/                       128            8706          14.7

no other changes. No code changed:

   text    data     bss     dec     hex filename
   5589802  612739 3833856 10036397         9924ad vmlinux.before
   5589802  612739 3833856 10036397         9924ad vmlinux.after

the intent of this patch is to ease the automated tracking of kernel
code quality - it's just much easier for us to maintain it if every file
in arch/x86 is supposed to be clean.

NOTE: it is a known problem of lindent that it causes some style damage
of its own, but it's a safe tool (well, except for the gcc array range
initializers extension), so we did the bulk of the changes via lindent,
and did the manual fixups in a followup patch.

the resulting math-emu code has been tested by Thomas Gleixner on a real
386 DX CPU as well, and it works fine.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: mach-voyager, lindent
Ingo Molnar [Wed, 30 Jan 2008 12:30:10 +0000 (13:30 +0100)]
x86: mach-voyager, lindent

lindent the mach-voyager files to get rid of more than 300 style errors:

                                       errors   lines of code   errors/KLOC
 arch/x86/mach-voyager/   [old]           409            3729         109.6
 arch/x86/mach-voyager/   [new]            71            3678          19.3

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: clean up arch/x86/kernel/aperture_64.c printk()s
Ingo Molnar [Wed, 30 Jan 2008 12:30:10 +0000 (13:30 +0100)]
x86: clean up arch/x86/kernel/aperture_64.c printk()s

clean up arch/x86/kernel/aperture_64.c printk()s.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: clean up arch/x86/kernel/aperture_64.c
Ingo Molnar [Wed, 30 Jan 2008 12:30:09 +0000 (13:30 +0100)]
x86: clean up arch/x86/kernel/aperture_64.c

whitespace cleanup. No code changed:

   text    data     bss     dec     hex filename
   2080      76       4    2160     870 aperture_64.o.before
   2080      76       4    2160     870 aperture_64.o.after

                                       errors   lines of code   errors/KLOC
 arch/x86/kernel/aperture_64.c            114             299         381.2
 arch/x86/kernel/aperture_64.c              0             315             0

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: clean up arch/x86/ia32/mmap32.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:09 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/mmap32.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/ia32/syscall32.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:08 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/syscall32.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/ia32/sys_ia32.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:08 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/sys_ia32.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/ia32/ptrace32.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:08 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/ptrace32.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/ia32/ipc32.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:08 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/ipc32.c

White space and coding style cleanup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/ia32/ia32_signal.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:07 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/ia32_signal.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/ia32/aout32.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:07 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/aout32.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: clean up arch/x86/ia32/fpu32.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:07 +0000 (13:30 +0100)]
x86: clean up arch/x86/ia32/fpu32.c

White space and coding style clenaup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: unify asm/cpufeature.h
H. Peter Anvin [Wed, 30 Jan 2008 12:30:07 +0000 (13:30 +0100)]
x86: unify asm/cpufeature.h

asm/cpufeature.h was already almost unified; this completes the job.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: add <asm/asm.h>
H. Peter Anvin [Wed, 30 Jan 2008 12:30:06 +0000 (13:30 +0100)]
x86: add <asm/asm.h>

Create <asm/asm.h>, with common definitions suitable for assembly
unification.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: enable irq in default_idle on 64-bit
Hiroshi Shimamoto [Wed, 30 Jan 2008 12:30:06 +0000 (13:30 +0100)]
x86: enable irq in default_idle on 64-bit

local_irq_enable() is missing after sched_clock_idle_wakeup_event().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: idle wakeup event in the HLT loop
Ingo Molnar [Wed, 30 Jan 2008 12:30:06 +0000 (13:30 +0100)]
x86: idle wakeup event in the HLT loop

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

(the ACPI idle code already does this.)

[ update the 64-bit side too, as noticed by Jiri Slaby. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: scale cyc_2_nsec according to CPU frequency
Guillaume Chazarain [Wed, 30 Jan 2008 12:30:06 +0000 (13:30 +0100)]
x86: scale cyc_2_nsec according to CPU frequency

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ mingo@elte.hu: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: protect against sigaltstack wraparound
Roland McGrath [Wed, 30 Jan 2008 12:30:06 +0000 (13:30 +0100)]
x86: protect against sigaltstack wraparound

cf http://lkml.org/lkml/2007/10/3/41

To summarize: on Linux, SA_ONSTACK decides whether you are already on the
signal stack based on the value of the SP at the time of a signal.  If
you are not already inside the range, you are not "on the signal stack"
and so the new signal handler frame starts over at the base of the signal
stack.

sigaltstack (and sigstack before it) was invented in BSD.  There, the
SA_ONSTACK behavior has always been different.  It uses a kernel state
flag to decide, rather than the SP value.  When you first take an
SA_ONSTACK signal and switch to the alternate signal stack, it sets the
SS_ONSTACK flag in the thread's sigaltstack state in the kernel.
Thereafter you are "on the signal stack" and don't switch SP before
pushing a handler frame no matter what the SP value is.  Only when you
sigreturn from the original handler context do you clear the SS_ONSTACK
flag so that a new handler frame will start over at the base of the
alternate signal stack.

The undesireable effect of the Linux behavior is that an overflow of the
alternate signal stack can not only go undetected, but lead to a ring
buffer effect of clobbering the original handler frame at the base of the
signal stack for each successive signal that comes just after the
overflow.  This is what Shi Weihua's test case demonstrates.  Normally
this does not come up because of the signal mask, but the test case uses
SA_NODEFER for its SIGSEGV handler.

The other subtle part of the existing Linux semantics is that a simple
longjmp out of a signal handler serves to take you off the signal stack
in a safe and reliable fashion without having used sigreturn (nor having
just returned from the handler normally, which means the same).  After
the longjmp (or even informal stack switching not via any proper libc or
kernel interface), the alternate signal stack stands ready to be used
again.

A paranoid program would allocate a PROT_NONE red zone around its
alternate signal stack.  Then a small overflow would trigger a SIGSEGV in
handler setup, and be fatal (core dump) whether or not SIGSEGV is
blocked.  As with thread stack red zones, that cannot catch all overflows
(or underflows).  e.g., a local array as large as page size allocated in
a function called from a handler, but not actually touched before more
calls push more stack, could cause an overflow that silently pushes into
some unrelated allocated pages.

The BSD behavior does not do anything in particular about overflow.  But
it does at least avoid the wraparound or "ring buffer effect", so you'll
just get a straightforward all-out overflow down your address space past
the low end of the alternate signal stack.  I don't know what the BSD
behavior is for longjmp out of an SA_ONSTACK handler.

The POSIX wording relating to sigaltstack is pretty minimal.  I don't
think it speaks to this issue one way or another.  (The program that
overflows its stack is clearly in undefined behavior territory of one
sort or another anyhow.)

Given the longjmp issue and the potential for highly subtle complications
in existing programs relying on this in arcane ways deep in their code, I
am very dubious about changing the behavior to the BSD style persistent
flag.  I think Shi Weihua's patches have a similar effect by tracking the
SP used in the last handler setup.

I think it would be sensible for the signal handler setup code to detect
when it would itself be causing a stack overflow.  Maybe something like
the following patch (untested).  This issue exists in the same way on all
machines, so ideally they would all do a similar check.

When it's the handler function itself or its callees that cause the
overflow, rather than the signal handler frame setup alone crossing the
boundary, this still won't help.  But I don't see any way to distinguish
that from the valid longjmp case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: add DMI quirk for io-delay hangs on Compaq Presario V6000 laptops
Ingo Molnar [Wed, 30 Jan 2008 12:30:05 +0000 (13:30 +0100)]
x86: add DMI quirk for io-delay hangs on Compaq Presario V6000 laptops

add the DMI strings provided by Islam Amer <pharon@gmail.com>, for
the Compaq Presario V6000 (Quanta/30B7).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: make io_delay=0xed the default
Ingo Molnar [Wed, 30 Jan 2008 12:30:05 +0000 (13:30 +0100)]
x86: make io_delay=0xed the default

make io_delay=0xed the default. This frees up port 0x80 which is
a debug port on some machines and locks up certain laptops.

Testing only for now. Try the io_delay=0x80 boot option if this does not
work for you.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: various changes and cleanups to in_p/out_p delay details
Ingo Molnar [Wed, 30 Jan 2008 12:30:05 +0000 (13:30 +0100)]
x86: various changes and cleanups to in_p/out_p delay details

various changes to the in_p/out_p delay details:

- add the io_delay=none method
- make each method selectable from the kernel config
- simplify the delay code a bit by getting rid of an indirect function call
- add the /proc/sys/kernel/io_delay_type sysctl
- change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed
- make the io delay config not depend on CONFIG_DEBUG_KERNEL

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "David P. Reed" <dpreed@reed.com>
16 years agox86: provide a DMI based port 0x80 I/O delay override.
Rene Herman [Wed, 30 Jan 2008 12:30:05 +0000 (13:30 +0100)]
x86: provide a DMI based port 0x80 I/O delay override.

x86: provide a DMI based port 0x80 I/O delay override.

Certain (HP) laptops experience trouble from our port 0x80 I/O delay
writes. This patch provides for a DMI based switch to the "alternate
diagnostic port" 0xed (as used by some BIOSes as well) for these.

David P. Reed confirmed that port 0xed works for him and provides a
proper delay. The symptoms of _not_ working are a hanging machine,
with "hwclock" use being a direct trigger.

Earlier versions of this attempted to simply use udelay(2), with the
2 being a value tested to be a nicely conservative upper-bound with
help from many on the linux-kernel mailinglist but that approach has
two problems.

First, pre-loops_per_jiffy calibration (which is post PIT init while
some implementations of the PIT are actually one of the historically
problematic devices that need the delay) udelay() isn't particularly
well-defined. We could initialise loops_per_jiffy conservatively (and
based on CPU family so as to not unduly delay old machines) which
would sort of work, but...

Second, delaying isn't the only effect that a write to port 0x80 has.
It's also a PCI posting barrier which some devices may be explicitly
or implicitly relying on. Alan Cox did a survey and found evidence
that additionally some drivers may be racy on SMP without the bus
locking outb.

Switching to an inb() makes the timing too unpredictable and as such,
this DMI based switch should be the safest approach for now. Any more
invasive changes should get more rigid testing first. It's moreover
only very few machines with the problem and a DMI based hack seems
to fit that situation.

This also introduces a command-line parameter "io_delay" to override
the DMI based choice again:

io_delay=<standard|alternate>

where "standard" means using the standard port 0x80 and "alternate"
port 0xed.

This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and
command-line ("io_delay=udelay") choice for testing purposes as well.

This does not change the io_delay() in the boot code which is using
the same port 0x80 I/O delay but those do not appear to be a problem
as David P. Reed reported the problem was already gone after using the
udelay version. He moreover reported that booting with "acpi=off" also
fixed things and seeing as how ACPI isn't touched until after this DMI
based I/O port switch I believe it's safe to leave the ones in the boot
code be.

The DMI strings from David's HP Pavilion dv9000z are in there already
and we need to get/verify the DMI info from other machines with the
problem, notably the HP Pavilion dv6000z.

This patch is partly based on earlier patches from Pavel Machek and
David P. Reed.

Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: fix: s2ram + P4 + tsc = annoyance
Mike Galbraith [Wed, 30 Jan 2008 12:30:04 +0000 (13:30 +0100)]
x86: fix: s2ram + P4 + tsc = annoyance

s2ram recently became useful here, except for the kernel's annoying
habit of disabling my P4's perfectly good TSC.

[  107.894470] CPU 1 is now offline
[  107.894474] SMP alternatives: switching to UP code
[  107.895832] CPU0 attaching sched-domain:
[  107.895836]  domain 0: span 1
[  107.895838]   groups: 1
[  107.896097] CPU1 is down
[    3.726156] Intel machine check architecture supported.
[    3.726165] Intel machine check reporting enabled on CPU#0.
[    3.726167] CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
[    3.726170] CPU0: Thermal monitoring enabled
[    3.726175] Back to C!
[    3.726708] Force enabled HPET at resume
[    3.726775] Enabling non-boot CPUs ...
[    3.727049] CPU0 attaching NULL sched-domain.
[    3.727165] SMP alternatives: switching to SMP code
[    3.727858] Booting processor 1/1 eip 3000
[    3.727862] CPU 1 irqstacks, hard=b042f000 soft=b042d000
[    3.738173] Initializing CPU#1
[    3.798912] Calibrating delay using timer specific routine.. 5986.12 BogoMIPS (lpj=2993061)
[    3.798920] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000
[    3.798931] CPU: Trace cache: 12K uops, L1 D cache: 8K
[    3.798934] CPU: L2 cache: 512K
[    3.798936] CPU: Physical Processor ID: 0
[    3.798938] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000
[    3.798946] Intel machine check architecture supported.
[    3.798952] Intel machine check reporting enabled on CPU#1.
[    3.798955] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
[    3.798959] CPU1: Thermal monitoring enabled
[    3.799161] CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
[    3.799187] checking TSC synchronization [CPU#0 -> CPU#1]:
[    3.819181] Measured 63588552840 cycles TSC warp between CPUs, turning off TSC clock.
[    3.819184] Marking TSC unstable due to: check_tsc_sync_source failed.

If check_tsc_warp() is called after initial boot, and the TSC has in the
meantime been set (BIOS, user, silicon, elves) to a value lower than the
last stored/stale value, we blame the TSC.  Reset to pristine condition
after every test.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: hibernation: document __save_processor_state() on x86
Rafael J. Wysocki [Wed, 30 Jan 2008 12:30:04 +0000 (13:30 +0100)]
x86: hibernation: document __save_processor_state() on x86

Document the fact that __save_processor_state() has to save all CPU
registers referred to by the kernel in case a different kernel is
used to load and restore a hibernation image containing it.

Sigend-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: fix make mrproper
Sam Ravnborg [Wed, 30 Jan 2008 12:30:04 +0000 (13:30 +0100)]
x86: fix make mrproper

Michael Opdenacker reported:

For backward compatibility with earlier (< 2.6.24) kernels,
arch/i386/boot/bzImage or arch/x86_64/boot/bzImage symbolic links to
arch/x86/boot/bzImage are created when you build an x86 kernel. The
arch/i386 or arch/x86_64 directories are then created for this only
purpose.

Issue: these generated directories and symbolic links are *not cleaned
up* when you run "make mrproper" (and thus "make distclean"). This
disturbs the production of patches, because the source tree is left with
generated files and directories.

Sam has an alternative fix:

The directory is killed during make clean as opposed to make mrproper.

Reported-by: Michael Opdenacker <michael-lists@free-electrons.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agotime: track accurate idle time with tick_sched.idle_sleeptime
Venki Pallipadi [Wed, 30 Jan 2008 12:30:04 +0000 (13:30 +0100)]
time: track accurate idle time with tick_sched.idle_sleeptime

Current idle time in kstat is based on jiffies and is coarse grained.
tick_sched.idle_sleeptime is making some attempt to keep track of idle time
in a fine grained manner.  But, it is not handling the time spent in
interrupts fully.

Make tick_sched.idle_sleeptime accurate with respect to time spent on
handling interrupts and also add tick_sched.idle_lastupdate, which keeps
track of last time when idle_sleeptime was updated.

This statistics will be crucial for cpufreq-ondemand governor, which can
shed some conservative gaurd band that is uses today while setting the
frequency.  The ondemand changes that uses the exact idle time is coming
soon.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoNTP: correct inconsistent ntp interval/tick_length usage
john stultz [Wed, 30 Jan 2008 12:30:03 +0000 (13:30 +0100)]
NTP: correct inconsistent ntp interval/tick_length usage

I recently noticed on one of my boxes that when synched with an NTP
server, the drift value reported for the system was ~283ppm. While in
some cases, clock hardware can be that bad, it struck me as unusual as
the system was using the acpi_pm clocksource, which is one of the more
trustworthy and accurate clocksources on x86 hardware.

I brought up another system and let it sync to the same NTP server, and
I noticed a similar 280some ppm drift.

In looking at the code, I found that the acpi_pm's constant frequency
was being computed correctly at boot-up, however once the system was up,
even without the ntp daemon running, the clocksource's frequency was
being modified by the clocksource_adjust() function.

Digging deeper, I realized that in the code that keeps track of how much
the clocksource is skewing from the ntp desired time, we were using
different lengths to establish how long an time interval was.

The clocksource was being setup with the following interval:
NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ

While the ntp code was using the tick_length_base value:
tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ)
/NTP_INTERVAL_FREQ

The subtle difference is:
(tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC

This difference in calculation was causing the clocksource correction
code to apply a correction factor to the clocksource so the two
intervals were the same, however this results in the actual frequency of
the clocksource to be made incorrect. I believe this difference would
affect all clocksources, although to differing degrees depending on the
clocksource resolution.

The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1,
so my apologies for the mistake, and for not noticing it until now.

The following patch, corrects the clocksource's initialization code so
it uses the same interval length as the code in ntp.c. After applying
this patch, the drift value for the same system went from ~283ppm to
only 2.635ppm.

I believe this patch to be good, however it does affect all arches and
I've only tested on x86, so some caution is advised. I do think it would
be a likely candidate for a stable 2.6.24.x release.

Any thoughts or feedback would be appreciated.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: assign IRQs to HPET timers, fix
Balaji Rao [Wed, 30 Jan 2008 12:30:03 +0000 (13:30 +0100)]
x86: assign IRQs to HPET timers, fix

Looks like IRQ 31 is assigned to timer 3, even without the patch!
I wonder who wrote the number 31. But the manual says that it is
zero by default.

I think we should check whether the timer has been allocated an IRQ before
proceeding to assign one to it.  Here is a patch that does this.

Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: assign IRQs to HPET timers
Balaji Rao [Wed, 30 Jan 2008 12:30:03 +0000 (13:30 +0100)]
x86: assign IRQs to HPET timers

The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
device. This patch fixes it by allocating IRQs to timer blocks in the HPET.

arch/x86/kernel/hpet.c |   13 +++++--------
drivers/char/hpet.c    |   45 ++++++++++++++++++++++++++++++++++++++-------
include/linux/hpet.h   |    2 +-
3 files changed, 44 insertions(+), 16 deletions(-)

Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: make clockevents more robust
Ingo Molnar [Wed, 30 Jan 2008 12:30:03 +0000 (13:30 +0100)]
x86: make clockevents more robust

detect zero event-device multiplicators - they then cause
division-by-zero crashes if a clockevent has been initialized
incorrectly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: unregister PIT clocksource when PIT is disabled
Thomas Gleixner [Wed, 30 Jan 2008 12:30:03 +0000 (13:30 +0100)]
x86: unregister PIT clocksource when PIT is disabled

The following scenario might leave PIT as a disfunctional clock source:

    PIT is registered as clocksource
    PM_TIMER is registered as clocksource and enables highres/dyntick mode
    PIT is switched to oneshot mode
    -> now the readout of PIT is bogus, but the user might select PIT
    via the sysfs override, which would break the box as the time
    readout is unusable.

Unregister the PIT clocksource when the PIT clock event device is switched
into shutdown / oneshot mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoclocksource: add unregister function to disable unusable clocksources
Thomas Gleixner [Wed, 30 Jan 2008 12:30:02 +0000 (13:30 +0100)]
clocksource: add unregister function to disable unusable clocksources

On x86 the PIT might become an unusable clocksource. Add an unregister
function to provide a possibilty to remove the PIT from the list of
available clock sources.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: restrict PIT clocksource usage
Thomas Gleixner [Wed, 30 Jan 2008 12:30:02 +0000 (13:30 +0100)]
x86: restrict PIT clocksource usage

PIT clocksource is registered unconditionally even when HPET is enabled
or when PIT is replaced by the local APIC timer. In both cases PIT can
not be used as it is stopped and the readout would be stale.

Prevent registering PIT in those cases.

patch depends on:

  x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too
Ingo Molnar [Wed, 30 Jan 2008 12:30:02 +0000 (13:30 +0100)]
x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too

offer is_hpet_enabled() on !CONFIG_HPET_TIMER too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoclocksource: make clocksource watchdog cycle through online CPUs
Andi Kleen [Wed, 30 Jan 2008 12:30:02 +0000 (13:30 +0100)]
clocksource: make clocksource watchdog cycle through online CPUs

This way it checks if the clocks are synchronized between CPUs too.
This might be able to detect slowly drifting TSCs which only
go wrong over longer time.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoclocksource.c: use init_timer_deferrable for clocksource_watchdog
Parag Warudkar [Wed, 30 Jan 2008 12:30:01 +0000 (13:30 +0100)]
clocksource.c: use init_timer_deferrable for clocksource_watchdog

clocksource_watchdog can use a deferrable timer - reduces wakeups from
idle per second.

Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agotime: fold __get_realtime_clock_ts() into getnstimeofday()
Geert Uytterhoeven [Wed, 30 Jan 2008 12:30:01 +0000 (13:30 +0100)]
time: fold __get_realtime_clock_ts() into getnstimeofday()

  - getnstimeofday() was just a wrapper around __get_realtime_clock_ts()
  - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday()
  - Fix bogus reference to get_realtime_clock_ts(), which never existed

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoclocksource: make CLOCKSOURCE_MASK bullet-proof
Atsushi Nemoto [Wed, 30 Jan 2008 12:30:01 +0000 (13:30 +0100)]
clocksource: make CLOCKSOURCE_MASK bullet-proof

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agotimer: clean up tick-broadcast.c
Thomas Gleixner [Wed, 30 Jan 2008 12:30:01 +0000 (13:30 +0100)]
timer: clean up tick-broadcast.c

clean up tick-broadcast.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agotime: more timer related cleanups
Pavel Machek [Wed, 30 Jan 2008 12:30:00 +0000 (13:30 +0100)]
time: more timer related cleanups

I was confused by FSEC = 10^15 NSEC statement, plus small whitespace
fixes. When there's copyright, there should be GPL.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agotime: timer cleanups
Pavel Machek [Wed, 30 Jan 2008 12:30:00 +0000 (13:30 +0100)]
time: timer cleanups

Small cleanups to tick-related code. Wrong preempt count is followed
by BUG(), so it is hardly KERN_WARNING.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agotime: clean hungarian notation from timers
Pavel Machek [Wed, 30 Jan 2008 12:30:00 +0000 (13:30 +0100)]
time: clean hungarian notation from timers

Clean up hungarian notation from timer code.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agokobj: fix threshold_init_device/kobject_uevent_env oops
Greg KH [Wed, 30 Jan 2008 12:29:58 +0000 (13:29 +0100)]
kobj: fix threshold_init_device/kobject_uevent_env oops

the logic in this function is just crazy.  It's recursive, but we
can circumvent the creation for the kobject and whole creation of the
threshold_block if some conditions are met.  That's why we see the
allocate_threshold_blocks so many times in the callstack, yet only a few
kobjects created.

Then we blow up in kobject_uevent_env() on the first debug printk.
Which means that we are just passing in garbage.

Man, this is one time that comments in code would have been very nice to
have, and why forward goto's into major code blocks are just evil...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoMerge git://git.linux-nfs.org/pub/linux/nfs-2.6
Linus Torvalds [Wed, 30 Jan 2008 08:54:24 +0000 (19:54 +1100)]
Merge git://git.linux-nfs.org/pub/linux/nfs-2.6

* git://git.linux-nfs.org/pub/linux/nfs-2.6: (118 commits)
  NFSv4: Iterate through all nfs_clients when the server recalls a delegation
  NFSv4: Deal more correctly with duplicate delegations
  NFS: Fix a potential race between umount and nfs_access_cache_shrinker()
  NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode
  nfs: convert NFS_*(inode) helpers to static inline
  nfs: obliterate NFS_FLAGS macro
  NFS: Address memory leaks in the NFS client mount option parser
  nfs4: allow nfsv4 acls on non-regular-files
  NFS: Optimise away the sigmask code in aio/dio reads and writes
  SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
  SUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create()
  SUNRPC: Clean up block comment preceding rpcb_getport_sync()
  SUNRPC: Use appropriate argument types in rpcb client
  SUNRPC: rpcb_getport_sync() should use built-in hostname generator
  SUNRPC: Clean up functions that free address_strings array
  NFS: NFS version number is unsigned
  NLM: Fix a bogus 'return' in nlmclnt_rpc_release
  NLM: Introduce an arguments structure for nlmclnt_init()
  NLM/NFS: Use cached nlm_host when calling nlmclnt_proc()
  NFS: Invoke nlmclnt_init during NFS mount processing
  ...

16 years agoas-iosched: fix double locking bug in as_merged_requests()
Jens Axboe [Tue, 29 Jan 2008 21:25:18 +0000 (22:25 +0100)]
as-iosched: fix double locking bug in as_merged_requests()

If the two requests belong to the same io context, we will attempt
to lock the same lock twice. But swapping contexts is pointless in
that case, so just check for rioc == nioc before doing the double
lock and copy.

Tested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years agoNFSv4: Iterate through all nfs_clients when the server recalls a delegation
Trond Myklebust [Sat, 26 Jan 2008 06:06:40 +0000 (01:06 -0500)]
NFSv4: Iterate through all nfs_clients when the server recalls a delegation

The same delegation may have been handed out to more than one nfs_client.
Ensure that if a recall occurs, we return all instances.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFSv4: Deal more correctly with duplicate delegations
Trond Myklebust [Fri, 25 Jan 2008 21:38:18 +0000 (16:38 -0500)]
NFSv4: Deal more correctly with duplicate delegations

If a (broken?) server hands out two different delegations for the same
file, then we should return one of them.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Fix a potential race between umount and nfs_access_cache_shrinker()
Trond Myklebust [Fri, 25 Jan 2008 21:38:17 +0000 (16:38 -0500)]
NFS: Fix a potential race between umount and nfs_access_cache_shrinker()

Thanks to Yawei Niu for spotting the race.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode
Trond Myklebust [Thu, 24 Jan 2008 23:14:34 +0000 (18:14 -0500)]
NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode

Otherwise, there is a potential deadlock if the last dput() from an NFSv4
close() or other asynchronous operation leads to nfs_clear_inode calling
the synchronous delegreturn.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agonfs: convert NFS_*(inode) helpers to static inline
Benny Halevy [Wed, 23 Jan 2008 06:59:08 +0000 (08:59 +0200)]
nfs: convert NFS_*(inode) helpers to static inline

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agonfs: obliterate NFS_FLAGS macro
Benny Halevy [Wed, 23 Jan 2008 06:58:59 +0000 (08:58 +0200)]
nfs: obliterate NFS_FLAGS macro

use NFS_I(inode)->flags instead

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Address memory leaks in the NFS client mount option parser
Chuck Lever [Wed, 16 Jan 2008 21:38:10 +0000 (16:38 -0500)]
NFS: Address memory leaks in the NFS client mount option parser

David Howells noticed that repeating the same mount option twice during an
NFS mount request can result in orphaned memory in certain cases.

Only the client_address and mount_server.hostname strings are initialized
in the mount parsing loop, so those appear to be the only two pointers that
might be written over by repeating a mount option.  The strings in the
nfs_server section of the nfs_parsed_mount_data structure are set only once
after the options are parsed, thus these are not susceptible to being
overwritten.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agonfs4: allow nfsv4 acls on non-regular-files
J. Bruce Fields [Tue, 15 Jan 2008 21:43:19 +0000 (16:43 -0500)]
nfs4: allow nfsv4 acls on non-regular-files

The rfc doesn't give any reason it shouldn't be possible to set an
attribute on a non-regular file.  And if the server supports it, then it
shouldn't be up to us to prevent it.

Thanks to Erez for the report and Trond for further analysis.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Optimise away the sigmask code in aio/dio reads and writes
Trond Myklebust [Tue, 15 Jan 2008 19:17:12 +0000 (14:17 -0500)]
NFS: Optimise away the sigmask code in aio/dio reads and writes

There are no interruptible waits for asynchronous RPC tasks, so we don't
need to wrap calls to rpc_run_task() with an
rpc_clnt_sigmask/rpc_clnt_unsigmask pair.

Instead we can wrap the wait_for_completion_interruptible() in
nfs_direct_wait(). This means that we completely optimise away sigmask
setting for the case of non-blocking aio/dio.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
Trond Myklebust [Tue, 15 Jan 2008 19:17:11 +0000 (14:17 -0500)]
SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls

The caller will never sleep in rpc_execute, so don't bother setting the
sigmask.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create()
Chuck Lever [Mon, 14 Jan 2008 20:12:08 +0000 (15:12 -0500)]
SUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create()

The variable "sin" is a pointer, so sizeof(sin) is the size of a pointer,
not the size of thing that sin points to.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Clean up block comment preceding rpcb_getport_sync()
Chuck Lever [Mon, 14 Jan 2008 20:12:01 +0000 (15:12 -0500)]
SUNRPC: Clean up block comment preceding rpcb_getport_sync()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Use appropriate argument types in rpcb client
Chuck Lever [Mon, 14 Jan 2008 20:11:53 +0000 (15:11 -0500)]
SUNRPC: Use appropriate argument types in rpcb client

Clean up: Follow recommendations of Chapter 5 of Documentation/CodingStyle
and use "u32" instead of "__u32" for types in definitions that are not
shared with user space.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: rpcb_getport_sync() should use built-in hostname generator
Chuck Lever [Mon, 14 Jan 2008 20:11:46 +0000 (15:11 -0500)]
SUNRPC: rpcb_getport_sync() should use built-in hostname generator

rpc_create() can already fill in the hostname with a string representation
of the server's IP address, so remove redundant logic in in
rpcb_getport_sync() that does that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Clean up functions that free address_strings array
Chuck Lever [Mon, 14 Jan 2008 17:32:20 +0000 (12:32 -0500)]
SUNRPC: Clean up functions that free address_strings array

Clean up: document the rule (kfree) and the exceptions
(RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in
a transport's address_strings array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: NFS version number is unsigned
Chuck Lever [Mon, 14 Jan 2008 17:32:05 +0000 (12:32 -0500)]
NFS: NFS version number is unsigned

RPC protocol version numbers are unsigned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNLM: Fix a bogus 'return' in nlmclnt_rpc_release
Trond Myklebust [Fri, 11 Jan 2008 22:41:29 +0000 (17:41 -0500)]
NLM: Fix a bogus 'return' in nlmclnt_rpc_release

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNLM: Introduce an arguments structure for nlmclnt_init()
Chuck Lever [Tue, 15 Jan 2008 21:04:20 +0000 (16:04 -0500)]
NLM: Introduce an arguments structure for nlmclnt_init()

Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the
new nfs_client_initdata structure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
16 years agoNLM/NFS: Use cached nlm_host when calling nlmclnt_proc()
Chuck Lever [Fri, 11 Jan 2008 22:09:59 +0000 (17:09 -0500)]
NLM/NFS: Use cached nlm_host when calling nlmclnt_proc()

Now that each NFS mount point caches its own nlm_host structure, it can be
passed to nlmclnt_proc() for each lock request.  By pinning an nlm_host for
each mount point, we trade the overhead of looking up or creating a fresh
nlm_host struct during every NLM procedure call for a little extra memory.

We also restrict the nlmclnt_proc symbol to limit the use of this call to
in-tree modules.

Note that nlm_lookup_host() (just removed from the client's per-request
NLM processing) could also trigger an nlm_host garbage collection.  Now
client-side nlm_host garbage collection occurs only during NFS mount
processing.  Since the NFS client now holds a reference on these nlm_host
structures, they wouldn't have been affected by garbage collection
anyway.

Given that nlm_lookup_host() reorders the global nlm_host chain after
every successful lookup, and that a garbage collection could be triggered
during the call, we've removed a significant amount of per-NLM-request
CPU processing overhead.

Sidebar: there are only a few remaining references to the internals of
NFS inodes in the client-side NLM code.  The only references I found are
related to extracting or comparing the inode's file handle via NFS_FH().
One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Invoke nlmclnt_init during NFS mount processing
Chuck Lever [Fri, 11 Jan 2008 22:09:52 +0000 (17:09 -0500)]
NFS: Invoke nlmclnt_init during NFS mount processing

Cache an appropriate nlm_host structure in the NFS client's mount point
metadata for later use.

Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if
nfs_start_lockd() returns a non-zero value, its callers ensure that the
mount request fails outright.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNLM: Introduce external nlm_host set-up and tear-down functions
Chuck Lever [Fri, 11 Jan 2008 22:09:44 +0000 (17:09 -0500)]
NLM: Introduce external nlm_host set-up and tear-down functions

We would like to remove the per-lock-operation nlm_lookup_host() call from
nlmclnt_proc().

The new architecture pins an nlm_host structure to each NFS client
superblock that has the "lock" mount option set.  The NFS client passes
in the pinned nlm_host structure during each call to nlmclnt_proc().  NFS
client unmount processing "puts" the nlm_host so it can be garbage-
collected later.

This patch introduces externally callable NLM functions that handle
mount-time nlm_host set up and tear-down.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Fix up constant string declarations in struct rpcbind_args
Trond Myklebust [Tue, 8 Jan 2008 02:16:56 +0000 (21:16 -0500)]
SUNRPC: Fix up constant string declarations in struct rpcbind_args

...and eliminate an unnecessary cast.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: fewer conditionals in the format_ip_address routines
Chuck Lever [Mon, 7 Jan 2008 23:34:48 +0000 (18:34 -0500)]
SUNRPC: fewer conditionals in the format_ip_address routines

Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID.  This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.

Tighten up type checking on the address_strings array while we're at it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agolockd: Eliminate harmless mixed sign comparison in nlmdbg_cookie2a()
Chuck Lever [Thu, 20 Dec 2007 19:55:11 +0000 (14:55 -0500)]
lockd: Eliminate harmless mixed sign comparison in nlmdbg_cookie2a()

The cookie->len field is unsigned, so the loop index variable in
nlmdbg_cookie2a() should also be unsigned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: nfs_write_end clean up
Chuck Lever [Thu, 20 Dec 2007 19:55:04 +0000 (14:55 -0500)]
NFS: nfs_write_end clean up

Clean up: commit 4899f9c8 added nfs_write_end(), which introduces a
conditional expression that returns an unsigned integer in one arm and
a signed integer in the other.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Fix minor mixed sign comparison in NFS client's write logic
Chuck Lever [Thu, 20 Dec 2007 19:54:57 +0000 (14:54 -0500)]
NFS: Fix minor mixed sign comparison in NFS client's write logic

Clean up: PAGE_CACHE_SIZE is unsigned, and nfs_pageio_init() takes a size_t.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Use size_t for storing name lengths
Chuck Lever [Thu, 20 Dec 2007 19:54:49 +0000 (14:54 -0500)]
NFS: Use size_t for storing name lengths

Clean up: always use the same type when handling buffer lengths.  As a
bonus, this prevents a mixed sign comparison in idmap_lookup_name.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Fix use of copy_to_user() in idmap_pipe_upcall
Chuck Lever [Thu, 20 Dec 2007 19:54:42 +0000 (14:54 -0500)]
NFS: Fix use of copy_to_user() in idmap_pipe_upcall

The idmap_pipe_upcall() function expects the copy_to_user() function to
return a negative error value if the call fails, but copy_to_user()
returns an unsigned long number of bytes that couldn't be copied.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Clean up fs/nfs/idmap.c
Chuck Lever [Thu, 20 Dec 2007 19:54:35 +0000 (14:54 -0500)]
NFS: Clean up fs/nfs/idmap.c

Clean up white space damage and use standard kernel coding conventions for
return statements.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Fix use of copy_to_user() in gss_pipe_upcall()
Chuck Lever [Thu, 20 Dec 2007 19:54:27 +0000 (14:54 -0500)]
SUNRPC: Fix use of copy_to_user() in gss_pipe_upcall()

The gss_pipe_upcall() function expects the copy_to_user() function to
return a negative error value if the call fails, but copy_to_user()
returns an unsigned long number of bytes that couldn't be copied.

Can rpc_pipefs actually retry a partially completed upcall read?  If
not, then gss_pipe_upcall() should punt any partial read, just like the
upcall logic in net/sunrpc/cache.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Fix the 'proto=' mount option
Trond Myklebust [Thu, 3 Jan 2008 21:29:06 +0000 (16:29 -0500)]
NFS: Fix the 'proto=' mount option

Currently, if you have a server mounted using networking protocol, you
cannot specify a different value using the 'proto=' option on another
mountpoint.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Support per-mountpoint timeout parameters.
Trond Myklebust [Thu, 20 Dec 2007 21:03:59 +0000 (16:03 -0500)]
NFS: Support per-mountpoint timeout parameters.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Ensure that we respect NFS_MAX_TCP_TIMEOUT
Trond Myklebust [Thu, 20 Dec 2007 21:03:57 +0000 (16:03 -0500)]
NFS: Ensure that we respect NFS_MAX_TCP_TIMEOUT

It isn't sufficient just to limit timeout->to_initval, we also need to
limit to_maxval.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Add support for per-client timeout values
Trond Myklebust [Thu, 20 Dec 2007 21:03:55 +0000 (16:03 -0500)]
SUNRPC: Add support for per-client timeout values

In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: Clean up the transport timeout initialisation
Trond Myklebust [Thu, 20 Dec 2007 21:03:54 +0000 (16:03 -0500)]
SUNRPC: Clean up the transport timeout initialisation

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: cleanup for rpc_new_client()
Trond Myklebust [Thu, 20 Dec 2007 21:03:53 +0000 (16:03 -0500)]
SUNRPC: cleanup for rpc_new_client()

There is no reason why we shouldn't just pass the rpc_create_args.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFSv4: Add socket proto argument to setclientid
Trond Myklebust [Fri, 14 Dec 2007 19:56:07 +0000 (14:56 -0500)]
NFSv4: Add socket proto argument to setclientid

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Pull covers off IPv6 address parsing
Chuck Lever [Mon, 10 Dec 2007 19:59:35 +0000 (14:59 -0500)]
NFS: Pull covers off IPv6 address parsing

Now that the needed IPv6 infrastructure is in place, allow the NFS client's
IP address parser to generate AF_INET6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Support non-IPv4 addresses in nfs_parsed_mount_data
Chuck Lever [Mon, 10 Dec 2007 19:59:28 +0000 (14:59 -0500)]
NFS: Support non-IPv4 addresses in nfs_parsed_mount_data

Replace the nfs_server and mount_server address fields in the
nfs_parsed_mount_data structure with a "struct sockaddr_storage"
instead of a "struct sockaddr_in".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Refactor mount option address parsing into separate function
Chuck Lever [Mon, 10 Dec 2007 19:59:21 +0000 (14:59 -0500)]
NFS: Refactor mount option address parsing into separate function

Refactor the logic to parse incoming text-based IP addresses.  Use the
in4_pton() function instead of the older in_aton(), following the lead
of the in-kernel CIFS client.

Later we'll add IPv6 address parsing using the matching in6_pton()
function.  For now we can't allow IPv6 address parsing: we must expand
the size of the address storage fields in the nfs_parsed_mount_options
struct before we can parse and store IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Remove the NIPQUAD from nfs_try_mount
Chuck Lever [Mon, 10 Dec 2007 19:59:13 +0000 (14:59 -0500)]
NFS: Remove the NIPQUAD from nfs_try_mount

In the name of address family compatibility, we can't have the NIP_FMT and
NIPQUAD macros in nfs_try_mount().  Instead, we can make use of an unused
mount option to display the mount server's hostname.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Adjust nfs_clone_mount structure to store "struct sockaddr *"
Chuck Lever [Mon, 10 Dec 2007 19:59:06 +0000 (14:59 -0500)]
NFS: Adjust nfs_clone_mount structure to store "struct sockaddr *"

Change the addr field in the nfs_clone_mount structure to store a "struct
sockaddr *" to support non-IPv4 addresses in the NFS client.

Note this is mostly a cosmetic change, and does not actually allow
referrals using IPv6 addresses.  The existing referral code assumes that
the server returns a string that represents an IPv4 address.  This code
needs to support hostnames and IPv6 addresses as well as IPv4 addresses,
thus it will need to be reorganized completely (to handle DNS resolution
in user space).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Change nfs4_set_client() to accept struct sockaddr *
Chuck Lever [Mon, 10 Dec 2007 19:58:59 +0000 (14:58 -0500)]
NFS: Change nfs4_set_client() to accept struct sockaddr *

Adjust the arguments and callers of nfs4_set_client() to pass a "struct
sockaddr *" instead of a "struct sockaddr_in *" to support non-IPv4
addresses in the NFS client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Change nfs_get_client() to take sockaddr *
Chuck Lever [Mon, 10 Dec 2007 19:58:51 +0000 (14:58 -0500)]
NFS: Change nfs_get_client() to take sockaddr *

Adjust arguments and callers of nfs_get_client() to pass a
"struct sockaddr *" instead of "struct sockaddr_in *" to support
non-IPv4 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Change nfs_find_client() to take "struct sockaddr *"
Chuck Lever [Mon, 10 Dec 2007 19:58:44 +0000 (14:58 -0500)]
NFS: Change nfs_find_client() to take "struct sockaddr *"

Adjust arguments and callers of nfs_find_client() to pass a
"struct sockaddr *" instead of "struct sockaddr_in *" to support non-IPv4
addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Trond: Also fix up protocol version number argument in nfs_find_client() to
use the correct u32 type.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Change cb_recallargs to pass "struct sockaddr *" instead of sockaddr_in
Chuck Lever [Mon, 10 Dec 2007 19:58:29 +0000 (14:58 -0500)]
NFS: Change cb_recallargs to pass "struct sockaddr *" instead of sockaddr_in

Change the addr field in the cb_recallargs struct to a "struct sockaddr *"
to support non-IPv4 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: Change cb_getattrargs to pass "struct sockaddr *" instead of sockaddr_in
Chuck Lever [Mon, 10 Dec 2007 19:58:22 +0000 (14:58 -0500)]
NFS: Change cb_getattrargs to pass "struct sockaddr *" instead of sockaddr_in

Change the addr field in the cb_getattrargs struct to a "struct sockaddr *"
to support non-IPv4 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>