pandora-kernel.git
16 years agox86: use defconfigs from x86/configs/*
Sam Ravnborg [Tue, 29 Apr 2008 10:48:15 +0000 (12:48 +0200)]
x86: use defconfigs from x86/configs/*

Daniel Drake <dsd@gentoo.org> reported:

In 2.6.23, if you unpacked a kernel source tarball and then
ran "make menuconfig" you'd be presented with this message:
    # using defaults found in arch/i386/defconfig

and the default options would be set.

The same thing in 2.6.24 does not give you any "using defaults" message, and
the default config options within menuconfig are rather blank (e.g. no PCI
support). You can work around this by explicitly running "make defconfig"
before menuconfig, but it would be nice to have the behaviour the way it was
for 2.6.23 (and the way it still is for other archs).

Fixed by adding an x86-specific defconfig list to Kconfig.

Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10470
Tested-by: dsd@gentoo.org
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agotoshiba: use ioremap_cached
Alan Cox [Tue, 29 Apr 2008 13:20:23 +0000 (14:20 +0100)]
toshiba: use ioremap_cached

The switch of ioremap to default to uncached doesn't break this driver
but it does needlessly slow it down, as BIOS space is cacheable and this
driver is quite happy scanning cached ROM space.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agorevert: "x86: ioremap(), extend check to all RAM pages"
Ingo Molnar [Tue, 29 Apr 2008 10:04:51 +0000 (12:04 +0200)]
revert: "x86: ioremap(), extend check to all RAM pages"

Vegard Nossum reported a large (150-second) delay during bootup,
and bisected it to "x86: ioremap(), extend check to all RAM pages"
(commit bdd3cee2e4b). Revert this commit for now.

Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: don't bother printing compat vdso address
Jeremy Fitzhardinge [Mon, 28 Apr 2008 18:05:07 +0000 (11:05 -0700)]
x86: don't bother printing compat vdso address

The kernel prints the compat vdso address regardless of whether compat
vdso mode is enabled or not, which is confusing.  Given that this
isn't very interesting information anyway, just remove the printk.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Gerhard Mack <gmack@innerfire.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agofix: x86: support for new UV apic
Andi Kleen [Fri, 25 Apr 2008 09:45:26 +0000 (11:45 +0200)]
fix: x86: support for new UV apic

Don't warn in read_apic_id() when preemptible but only one CPU online.
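The condition described above can be sketched in plain C. This is only a user-space model, assuming the warning exists because a preemptible task could migrate to another CPU in the middle of the read; demo_should_warn() and its parameters are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check: reading a per-CPU ID while preemptible
 * is only suspicious if there is a second CPU the task could migrate to.
 */
static bool demo_should_warn(bool preemptible, unsigned int online_cpus)
{
    assert(online_cpus >= 1);   /* at least the boot CPU is online */
    return preemptible && online_cpus > 1;
}
```

With a single CPU online the predicate is false whether or not the caller is preemptible, which is the case this fix silences.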

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: fix early-BUG message
Vegard Nossum [Fri, 25 Apr 2008 19:02:34 +0000 (21:02 +0200)]
x86: fix early-BUG message

The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.

Replace .asciz with multiple .ascii directives and terminate with .asciz.
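The effect can be modelled in C, where adjacent string literals are pasted together but an embedded NUL ends the string for any NUL-terminated consumer. This is an illustrative sketch only; the message text and the demo_* names are made up, and the real message lives in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Emulates `.asciz "first line\n", "second line\n"`: every string gets its
 * own terminating NUL, so a NUL sits between the two lines.
 */
static const char demo_asciz_only[] = "first line\n\0second line\n";

/*
 * Emulates `.ascii "first line\n"` followed by `.asciz "second line\n"`:
 * only the last string is terminated, which matches C string pasting.
 */
static const char demo_ascii_then_asciz[] = "first line\n" "second line\n";

/* Number of bytes a NUL-terminated consumer sees before it stops. */
static size_t demo_visible_length(const char *msg)
{
    assert(msg != NULL);
    return strlen(msg);
}
```

In the first layout only the 11 bytes of the first line are visible; in the second layout all 23 bytes of both lines are.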

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: iommu_sac_force can become static
Dmitri Vorobiev [Sun, 27 Apr 2008 23:15:58 +0000 (03:15 +0400)]
x86: iommu_sac_force can become static

The iommu_sac_force variable is needlessly defined global,
and this patch makes it static. Additionally, this variable
need not be explicitly initialized.
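A minimal sketch of why the initializer is redundant: C guarantees that objects with static storage duration start out as zero. The demo_sac_force variable and its accessors are hypothetical stand-ins, not the code touched by this patch.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the variable in the patch: file-local (static)
 * and with no explicit initializer. C zero-initializes it before main().
 */
static int demo_sac_force;

int demo_get_sac_force(void)
{
    return demo_sac_force;
}

void demo_set_sac_force(int value)
{
    assert(value == 0 || value == 1);   /* modelled as a boolean flag */
    demo_sac_force = value;
}
```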

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: add proper header for reboot_force
Dmitri Vorobiev [Sun, 27 Apr 2008 23:15:59 +0000 (03:15 +0400)]
x86: add proper header for reboot_force

This patch fixes one sparse warning by including the appropriate
header for the reboot_force symbol.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86 VISWS: build fix
Ingo Molnar [Mon, 28 Apr 2008 08:46:58 +0000 (10:46 +0200)]
x86 VISWS: build fix

the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.

also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86, voyager: fix ioremap_nocache()
Ingo Molnar [Sun, 27 Apr 2008 21:21:03 +0000 (23:21 +0200)]
x86, voyager: fix ioremap_nocache()

James Bottomley reported that the following commit:

| commit 6371b495991debfd1417b17c2bc4f7d7bae05739
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Wed Jan 30 13:33:40 2008 +0100
|
|     x86: change ioremap() to default to uncached

broke Voyager.

James says:

" it broke a class of voyager machines: those which
  rely on the quad interrupt controller (QIC).  The precis of why they
  broke is because the QIC does IPIs (or CPIs in its terminology) via
  cache line interference: you interrupt a processor by moving a
  designated memory area to write exclusive in the cache (by simply
  writing to the line) and the CPU acks the interrupt by moving it back to
  read shared (by reading from it).  That area, is, of course, mapped by
  ioremap, so reversing the ioremap semantics and adding the uncached bit
  completely breaks the QIC. "

Sorry about that!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agohpet: fix
Ingo Molnar [Sun, 27 Apr 2008 12:04:14 +0000 (14:04 +0200)]
hpet: fix

Al Viro pointed out that there's a missing readl() of timer->hpet_config,
found by Sparse.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: unexport kmap_atomic_to_page
Adrian Bunk [Mon, 21 Apr 2008 08:51:44 +0000 (11:51 +0300)]
x86: unexport kmap_atomic_to_page

This patch removes the no longer used export of kmap_atomic_to_page.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agox86: remove Xgt_desc_struct
Adrian Bunk [Mon, 21 Apr 2008 08:47:46 +0000 (11:47 +0300)]
x86: remove Xgt_desc_struct

The comment says it should have been removed in 2.6.25.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
16 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux...
Linus Torvalds [Wed, 30 Apr 2008 18:52:52 +0000 (11:52 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (179 commits)
  ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
  acpi: fix section mismatch warning in pnpacpi
  intel_menlo: fix build warning
  ACPI: Cleanup: Remove unneeded, multiple local dummy variables
  ACPI: video - fix permissions on some proc entries
  ACPI: video - properly handle errors when registering proc elements
  ACPI: video - do not store invalid entries in attached_array list
  ACPI: re-name acpi_pm_ops to acpi_suspend_ops
  ACER_WMI/ASUS_LAPTOP: fix build bug
  thinkpad_acpi: fix possible NULL pointer dereference if kstrdup failed
  ACPI: check a return value correctly in acpi_power_get_context()
  #if 0 acpi/bay.c:eject_removable_drive()
  eeepc-laptop: add hwmon fan control
  eeepc-laptop: add backlight
  eeepc-laptop: add base driver
  ACPI: thinkpad-acpi: bump up version to 0.20
  ACPI: thinkpad-acpi: fix selects in Kconfig
  ACPI: thinkpad-acpi: use a private workqueue
  ACPI: thinkpad-acpi: fluff really minor fix
  ACPI: thinkpad-acpi: use uppercase for "LED" on user documentation
  ...

Fixed conflicts in drivers/acpi/video.c and drivers/misc/intel_menlow.c
manually.

16 years agoMerge branch 'pnp' into release
Len Brown [Wed, 30 Apr 2008 17:59:05 +0000 (13:59 -0400)]
Merge branch 'pnp' into release

16 years agoMerge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla...
Len Brown [Wed, 30 Apr 2008 17:58:00 +0000 (13:58 -0400)]
Merge branches 'release', 'acpica', 'bugzilla-10224', 'bugzilla-9772', 'bugzilla-9916', 'ec', 'eeepc', 'idle', 'misc', 'pm-legacy', 'sysfs-links-2.6.26', 'thermal', 'thinkpad' and 'video' into release

16 years agoACPI: Fix acpi_processor_idle and idle= boot parameters interaction
Venkatesh Pallipadi [Wed, 30 Apr 2008 17:57:15 +0000 (13:57 -0400)]
ACPI: Fix acpi_processor_idle and idle= boot parameters interaction

acpi_processor_idle and "idle=" boot parameter interaction is broken.
The problem is that, at boot time acpi driver is checking for "idle=" boot
option and not registering the acpi idle handler. But, when there is a CST
changed callback (typically when switching AC <-> battery or suspend-resume)
there are no checks for boot_option_idle_override and acpi idle handler tries
to get installed with nasty side effects.

With CPU_IDLE configured this issue results in a nasty oops on CST
change callback; without CPU_IDLE there is no oops, but the boot option
"idle=" gets ignored and the acpi idle handler gets installed.

Change the behavior to not do anything in acpi idle handler when there is a
"idle=" boot option.
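The shape of the fix can be sketched as a user-space model: the override check sits in the one routine that installs the handler, so both the boot path and the CST-changed path honour it. All demo_* names are hypothetical; only boot_option_idle_override is a name taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Models boot_option_idle_override: true when "idle=" was given. */
static bool demo_idle_override;
/* True once the (modelled) acpi idle handler is installed. */
static bool demo_acpi_idle_installed;

/* Single install path: does nothing when "idle=" is in effect. */
static void demo_install_acpi_idle(void)
{
    if (demo_idle_override)
        return;
    demo_acpi_idle_installed = true;
}

void demo_boot(bool idle_option_given)
{
    demo_idle_override = idle_option_given;
    demo_acpi_idle_installed = false;
    demo_install_acpi_idle();
}

/* Models the callback run on AC <-> battery switches or resume. */
void demo_cst_changed(void)
{
    demo_install_acpi_idle();
    /* Invariant the fix restores: never installed while overridden. */
    assert(!(demo_idle_override && demo_acpi_idle_installed));
}

bool demo_acpi_idle_active(void)
{
    return demo_acpi_idle_installed;
}
```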

Note that the problem is only there when "idle=" boot option is used.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
16 years agoacpi: fix section mismatch warning in pnpacpi
Sam Ravnborg [Tue, 29 Apr 2008 20:52:01 +0000 (22:52 +0200)]
acpi: fix section mismatch warning in pnpacpi

Fix following section mismatch warning:
WARNING: vmlinux.o(.text+0x153d69): Section mismatch in reference from the function is_exclusive_device() to the variable .init.data:excluded_id_list

is_exclusive_device is only used from __init context so document
this with the __init annotation and get rid of the warning.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Len Brown <len.brown@intel.com>
16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Wed, 30 Apr 2008 16:22:27 +0000 (09:22 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  [ALSA] soc - neo1973_wm8753.c add suspend and shutdown hooks for lm4857 chip
  [ALSA] soc - neo1973_wm8753.c change maintainer contact info
  [ALSA] soc - neo1973_wm8753.c cleanup checkpatch issues
  [ALSA] soc - ln2440sbc_alc650 - Fix checkpatch warnings
  [ALSA] soc - s3c24xx-pcm - Fix checkpatch warnings
  [ALSA] soc - s3c2443-ac97 - Fix checkpatch warnings
  [ALSA] soc - wm8753 - Clean up checkpatch warnings

16 years ago[ALSA] soc - neo1973_wm8753.c add suspend and shutdown hooks for lm4857 chip
Graeme Gregory [Wed, 30 Apr 2008 18:26:45 +0000 (20:26 +0200)]
[ALSA] soc - neo1973_wm8753.c add suspend and shutdown hooks for lm4857 chip

Patch taken from the openmoko bugtracker
http://bugzilla.openmoko.org/cgi-bin/bugzilla/show_bug.cgi?id=781

This patch adds Suspend/Resume and Shutdown support for the lm4857 to
the driver.

Signed-off-by: Graeme Gregory <graeme@openmoko.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
16 years ago[ALSA] soc - neo1973_wm8753.c change maintainer contact info
Graeme Gregory [Wed, 30 Apr 2008 18:25:23 +0000 (20:25 +0200)]
[ALSA] soc - neo1973_wm8753.c change maintainer contact info

I have moved workplaces since I originally wrote this driver so update
the contact info for new employers.

Signed-off-by: Graeme Gregory <graeme@openmoko.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
16 years ago[ALSA] soc - neo1973_wm8753.c cleanup checkpatch issues
Graeme Gregory [Wed, 30 Apr 2008 18:24:54 +0000 (20:24 +0200)]
[ALSA] soc - neo1973_wm8753.c cleanup checkpatch issues

Clean up a few issues with the file that checkpatch noted, no functionality
changes.

Signed-off-by: Graeme Gregory <graeme@openmoko.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
16 years ago[ALSA] soc - ln2440sbc_alc650 - Fix checkpatch warnings
Mark Brown [Wed, 30 Apr 2008 15:19:57 +0000 (17:19 +0200)]
[ALSA] soc - ln2440sbc_alc650 - Fix checkpatch warnings

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
16 years ago[ALSA] soc - s3c24xx-pcm - Fix checkpatch warnings
Mark Brown [Wed, 30 Apr 2008 15:19:32 +0000 (17:19 +0200)]
[ALSA] soc - s3c24xx-pcm - Fix checkpatch warnings

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
16 years ago[ALSA] soc - s3c2443-ac97 - Fix checkpatch warnings
Mark Brown [Wed, 30 Apr 2008 15:19:07 +0000 (17:19 +0200)]
[ALSA] soc - s3c2443-ac97 - Fix checkpatch warnings

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
16 years ago[ALSA] soc - wm8753 - Clean up checkpatch warnings
Mark Brown [Wed, 30 Apr 2008 15:18:43 +0000 (17:18 +0200)]
[ALSA] soc - wm8753 - Clean up checkpatch warnings

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Wed, 30 Apr 2008 15:46:16 +0000 (08:46 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: remove duplicated include
  sparc: Add kgdb support.
  kgdbts: Sparc needs sstep emulation.
  sparc32: Kill smp_message_pass() and related code.
  sparc64: Kill PIL_RESERVED, unused.
  sparc64: Split entry.S up into seperate files.

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Wed, 30 Apr 2008 15:45:48 +0000 (08:45 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
  tcp: Overflow bug in Vegas
  [IPv4] UFO: prevent generation of chained skb destined to UFO device
  iwlwifi: move the selects to the tristate drivers
  ipv4: annotate a few functions __init in ipconfig.c
  atm: ambassador: vcc_sf semaphore to mutex
  MAINTAINERS: The socketcan-core list is subscribers-only.
  netfilter: nf_conntrack: padding breaks conntrack hash on ARM
  ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
  sch_sfq: use del_timer_sync() in sfq_destroy()
  net: Add compat support for getsockopt (MCAST_MSFILTER)
  net: Several cleanups for the setsockopt compat support.
  ipvs: fix oops in backup for fwmark conn templates
  bridge: kernel panic when unloading bridge module
  bridge: fix error handling in br_add_if()
  netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets
  netfilter: x_tables: fix net namespace leak when reading /proc/net/xxx_tables_names
  netfilter: xt_TCPOPTSTRIP: signed tcphoff for ipv6_skip_exthdr() retval
  tcp: Limit cwnd growth when deferring for GSO
  tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled
  [netdrvr] gianfar: Determine TBIPA value dynamically
  ...

16 years agoinlining: do not allow gcc below version 4 to optimize inlining
Ingo Molnar [Tue, 29 Apr 2008 22:15:31 +0000 (00:15 +0200)]
inlining: do not allow gcc below version 4 to optimize inlining

fix the condition to match intention: always use the old inlining
behavior on all gcc versions below 4.
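A rough sketch of such a condition, assuming a GCC-compatible compiler. DEMO_OPTIMIZE_INLINING is a made-up stand-in for the relevant config option and demo_inline for the kernel's inline macro; the real header is not reproduced here.

```c
#include <assert.h>

/* Assumption: 1 means "let the compiler decide whether to inline". */
#define DEMO_OPTIMIZE_INLINING 1

#if defined(__GNUC__) && (!DEMO_OPTIMIZE_INLINING || __GNUC__ < 4)
/* Old behavior: force inlining, always used for gcc older than 4. */
# define demo_inline inline __attribute__((always_inline))
#else
/* New behavior: a plain hint the compiler is free to ignore. */
# define demo_inline inline
#endif

static demo_inline int demo_add(int a, int b)
{
    return a + b;
}

int demo_sum3(int a, int b, int c)
{
    return demo_add(demo_add(a, b), c);
}
```

Either branch produces the same results; only the compiler's freedom to leave demo_add() out of line differs.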

this should solve the UML build problem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoUpdate .mailmap
S.Çağlar Onur [Wed, 30 Apr 2008 12:29:02 +0000 (15:29 +0300)]
Update .mailmap

I realize that some maintainers' email clients and/or scripts cannot
handle UTF-8 encoded names properly; as a result, your ChangeLogs
display me as two different people :).

The following patch adds the correctly encoded form of my name to .mailmap,
so that it is no longer split up or badly displayed.

Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
Linus Torvalds [Wed, 30 Apr 2008 15:38:30 +0000 (08:38 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/linux-2.6

* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] Update default configuration.
  [S390] use generic sys_ptrace
  [S390] Remove self ptrace IEEE_IP hack.
  [S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP
  [S390] System z large page support.
  [S390] Convert machine feature detection code to C.
  [S390] vmemmap: use clear_table to initialise page tables.
  [S390] Move stfl to system.h and delete duplicated version.
  [S390] uaccess_mvcos: #ifdef config dependent code.
  [S390] cpu topology: Fix possible deadlock.
  [S390] Add topology_core_siblings to topology.h
  [S390] cio: Make isc handling more robust.
  [S390] remove -traditional
  [S390] Automatically detect added cpus.
  [S390] smp: Fix locking order.
  [S390] Add missing ifndef/define to include/asm-s390/sysinfo.h.
  [S390] Move show_regs to traps.c.
  [S390] cio: Use strict_strtoul() for attributes.

16 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
Linus Torvalds [Wed, 30 Apr 2008 15:37:40 +0000 (08:37 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/paulus/powerpc

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Fix crashkernel= handling when no crashkernel= specified
  [POWERPC] Make emergency stack safe for current_thread_info() use
  [POWERPC] spufs: add .gitignore for spu_save_dump.h & spu_restore_dump.h
  [POWERPC] spufs: trace spu_acquire_saved events
  [POWERPC] spufs: fix marker name for find_victim
  [POWERPC] spufs: add marker for destroy_spu_context
  [POWERPC] spufs: add sputrace marker parameter names
  [POWERPC] spufs: add context switch notification log
  [POWERPC] mpc5200: defconfigs for CM5200, Lite5200B, Motion-PRO and TQM5200
  [POWERPC] mpc5200: Switch mpc5200 dts files to dts-v1 format
  [POWERPC] mpc5200: Fix FEC error handling on FIFO errors
  [POWERPC] mpc5200: add Phytec pcm030 board support
  [POWERPC] mpc5200: add gpiolib support for mpc5200
  [POWERPC] mpc5200: add interrupt type function
  [POWERPC] mpc5200: Fix unterminated of_device_id table

16 years agofix drivers/media/common/tuners/ build bug
Ingo Molnar [Wed, 30 Apr 2008 09:50:11 +0000 (11:50 +0200)]
fix drivers/media/common/tuners/ build bug

x86.git randconfig testing found a build failure on latest -git:

 drivers/built-in.o: In function `set_type':
 tuner-core.c:(.text+0x2a9a26): undefined reference to `tea5761_attach'
 tuner-core.c:(.text+0x2a9d05): undefined reference to `tda9887_attach'
 tuner-core.c:(.text+0x2a9d51): undefined reference to `xc2028_attach'
 tuner-core.c:(.text+0x2a9e22): undefined reference to `tda829x_attach'
 tuner-core.c:(.text+0x2a9e3f): undefined reference to `microtune_attach'
 drivers/built-in.o: In function `tuner_probe':
 tuner-core.c:(.text+0x2aa18a): undefined reference to `tda829x_probe'
 tuner-core.c:(.text+0x2aa302): undefined reference to `tea5761_autodetection'

with the following config:

 http://redhat.com/~mingo/misc/config-Wed_Apr_30_10_21_40_CEST_2008.bad

the problem is caused by the drivers/media/common/tuners/ subdirectory
not being part of the kbuild hierarchy anymore, due to commit
7c91f0624 ("V4L/DVB(7767): Move tuners to common/tuners").

this seems similar to the problem also reported by Mike Galbraith.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agorevert "memory hotplug: allocate usemap on the section with pgdat"
Andrew Morton [Wed, 30 Apr 2008 07:55:17 +0000 (00:55 -0700)]
revert "memory hotplug: allocate usemap on the section with pgdat"

This:

commit 86f6dae1377523689bd8468fed2f2dd180fc0560
Author: Yasunori Goto <y-goto@jp.fujitsu.com>
Date:   Mon Apr 28 02:13:33 2008 -0700

    memory hotplug: allocate usemap on the section with pgdat

    Usemaps are allocated on the section which has pgdat by this.

    Because usemap size is very small, many other sections usemaps are allocated
    on only one page.  If a section has usemap, it can't be removed until removing
    other sections.  This dependency is not desirable for memory removing.

    Pgdat has similar feature.  When a section has pgdat area, it must be the last
    section for removing on the node.  So, if section A has pgdat and section B
    has usemap for section A, Both sections can't be removed due to dependency
    each other.

    To solve this issue, this patch collects usemap on same section with pgdat.
    If other sections doesn't have any dependency, this section will be able to be
    removed finally.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
broke davem's sparc64 bootup.  Revert it while we work out what went wrong.

Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: fix warning on memory offline
Nick Piggin [Wed, 30 Apr 2008 07:55:16 +0000 (00:55 -0700)]
mm: fix warning on memory offline

KAMEZAWA Hiroyuki found a warning message in the buffer dirtying code that
is coming from page migration caller.

WARNING: at fs/buffer.c:720 __set_page_dirty+0x330/0x360()
Call Trace:
 [<a000000100015220>] show_stack+0x80/0xa0
 [<a000000100015270>] dump_stack+0x30/0x60
 [<a000000100089ed0>] warn_on_slowpath+0x90/0xe0
 [<a0000001001f8b10>] __set_page_dirty+0x330/0x360
 [<a0000001001ffb90>] __set_page_dirty_buffers+0xd0/0x280
 [<a00000010012fec0>] set_page_dirty+0xc0/0x260
 [<a000000100195670>] migrate_page_copy+0x5d0/0x5e0
 [<a000000100197840>] buffer_migrate_page+0x2e0/0x3c0
 [<a000000100195eb0>] migrate_pages+0x770/0xe00

What was happening is that migrate_page_copy wants to transfer the PG_dirty
bit from old page to new page, so what it would do is set_page_dirty(newpage).
However set_page_dirty() is used to set the entire page dirty, whereas in
this case, only part of the page was dirty, and it also was not uptodate.

Marking the whole page dirty with set_page_dirty would lead to corruption or
unresolvable conditions -- a dirty && !uptodate page and dirty && !uptodate
buffers.

Possibly we could just ClearPageDirty(oldpage); SetPageDirty(newpage);
however in the interests of keeping the change minimal...

Signed-off-by: Nick Piggin <npiggin@suse.de>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoDrop the exporting of empty <linux/byteorder/generic.h>
Robert P. J. Day [Wed, 30 Apr 2008 07:55:14 +0000 (00:55 -0700)]
Drop the exporting of empty <linux/byteorder/generic.h>

Fix up the contents of <linux/byteorder/> so that it doesn't export a
content-free generic.h to user space.  This involves:

* Removing the __KERNEL__ tests from generic.h and dropping it from
  Kbuild.
* Wrapping the inclusions of generic.h in both big_endian.h and
  little_endian.h in __KERNEL__ tests.
* Shifting big_endian.h and little_endian.h from header-y to
  unifdef-y in Kbuild.
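The wrapping described in the second bullet can be modelled as follows. This is a sketch only: DEMO_PULLS_IN_GENERIC is a made-up macro standing in for the real #include of generic.h, and the example assumes it is compiled as ordinary user-space code, without __KERNEL__ defined.

```c
#include <assert.h>

/*
 * Model of big_endian.h / little_endian.h after the change: the
 * kernel-only part is wrapped, so user space never needs generic.h.
 */
#ifdef __KERNEL__
# define DEMO_PULLS_IN_GENERIC 1   /* stands in for the generic.h include */
#else
# define DEMO_PULLS_IN_GENERIC 0   /* user space: nothing is pulled in */
#endif

int demo_pulls_in_generic(void)
{
    return DEMO_PULLS_IN_GENERIC;
}
```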

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoremove __KERNEL__ tests of unexported headers under asm-generic/
Robert P. J. Day [Wed, 30 Apr 2008 07:55:13 +0000 (00:55 -0700)]
remove __KERNEL__ tests of unexported headers under asm-generic/

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoRemove "#ifdef __KERNEL__" checks from unexported headers
Robert P. J. Day [Wed, 30 Apr 2008 07:55:12 +0000 (00:55 -0700)]
Remove "#ifdef __KERNEL__" checks from unexported headers

Remove the "#ifdef __KERNEL__" tests from unexported header files in
linux/include whose entire contents are wrapped in that preprocessor
test.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoserial: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:55:10 +0000 (00:55 -0700)]
serial: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__
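For reference, __func__ is the C99 predefined identifier and behaves like a string holding the enclosing function's name. A small sketch; demo_format_prefix() is a hypothetical helper, not code from the converted files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Writes "<function name>: " into buf, the way a debug message prefix is
 * often built. Returns the number of characters snprintf() would write.
 */
static int demo_format_prefix(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s: ", __func__);
}
```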

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agodrivers/char: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:55:10 +0000 (00:55 -0700)]
drivers/char: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofs: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:55:09 +0000 (00:55 -0700)]
fs: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoafs: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:55:09 +0000 (00:55 -0700)]
afs: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agolib: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:55:08 +0000 (00:55 -0700)]
lib: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agokernel: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:55:08 +0000 (00:55 -0700)]
kernel: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: remove remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:55:07 +0000 (00:55 -0700)]
mm: remove remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agobrd: modify ramdisk device to be able to manage partitions
Laurent Vivier [Wed, 30 Apr 2008 07:55:06 +0000 (00:55 -0700)]
brd: modify ramdisk device to be able to manage partitions

This patch adds partition management for Block RAM Device (BRD).

This patch is done to keep in sync BRD and loop device drivers.

This patch adds a parameter to the module, max_part, to specify
the maximum number of partitions per RAM device.

Example:

# modprobe brd max_part=63
# ls -l /dev/ram*
brw-rw---- 1 root disk 1,   0 2008-04-03 13:39 /dev/ram0
brw-rw---- 1 root disk 1,  64 2008-04-03 13:39 /dev/ram1
brw-rw---- 1 root disk 1, 640 2008-04-03 13:39 /dev/ram10
brw-rw---- 1 root disk 1, 704 2008-04-03 13:39 /dev/ram11
brw-rw---- 1 root disk 1, 768 2008-04-03 13:39 /dev/ram12
brw-rw---- 1 root disk 1, 832 2008-04-03 13:39 /dev/ram13
brw-rw---- 1 root disk 1, 896 2008-04-03 13:39 /dev/ram14
brw-rw---- 1 root disk 1, 960 2008-04-03 13:39 /dev/ram15
brw-rw---- 1 root disk 1, 128 2008-04-03 13:39 /dev/ram2
brw-rw---- 1 root disk 1, 192 2008-04-03 13:39 /dev/ram3
brw-rw---- 1 root disk 1, 256 2008-04-03 13:39 /dev/ram4
brw-rw---- 1 root disk 1, 320 2008-04-03 13:39 /dev/ram5
brw-rw---- 1 root disk 1, 384 2008-04-03 13:39 /dev/ram6
brw-rw---- 1 root disk 1, 448 2008-04-03 13:39 /dev/ram7
brw-rw---- 1 root disk 1, 512 2008-04-03 13:39 /dev/ram8
brw-rw---- 1 root disk 1, 576 2008-04-03 13:39 /dev/ram9
# fdisk /dev/ram0
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): o
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-2, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-2, default 2): 2

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
# ls -l /dev/ram0*
brw-rw---- 1 root disk 1, 0 2008-04-03 13:40 /dev/ram0
brw-rw---- 1 root disk 1, 1 2008-04-03 13:40 /dev/ram0p1
# mkfs /dev/ram0p1
mke2fs 1.40-WIP (14-Nov-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
4016 inodes, 16032 blocks
801 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=16515072
2 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
8193

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
# mount /dev/ram0p1 /mnt
# df /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/ram0p1              15521       138     14582   1% /mnt
# ls -l /mnt
total 12
drwx------ 2 root root 12288 2008-04-03 13:41 lost+found
# umount /mnt
# rmmod brd
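The minor numbers in the listing above (0, 64, 128, ... for ram0, ram1, ram2, and 1 for ram0p1) are consistent with each device reserving a power-of-two block of minors large enough for max_part partitions. The sketch below reproduces that arithmetic; the demo_* helpers are hypothetical and the rounding rule is inferred from the listing, not copied from the driver.

```c
#include <assert.h>

/* Smallest shift such that (1 << shift) > max_part; 0 when max_part is 0. */
static unsigned int demo_part_shift(unsigned int max_part)
{
    unsigned int shift = 0;

    while (shift < 31 && (1u << shift) <= max_part)
        shift++;
    return shift;
}

/* Minor number of partition `partition` (0 = whole disk) on ram<device>. */
unsigned int demo_minor(unsigned int device, unsigned int partition,
                        unsigned int max_part)
{
    unsigned int shift = demo_part_shift(max_part);

    assert(partition <= max_part);
    return (device << shift) + partition;
}
```

With max_part=63 the shift is 6, so each RAM device owns 64 consecutive minors.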

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoadd hrtimer specific debugobjects code
Thomas Gleixner [Wed, 30 Apr 2008 07:55:04 +0000 (00:55 -0700)]
add hrtimer specific debugobjects code

hrtimers now have dynamic users in the network code.  Put them under
debugobjects surveillance as well.

Add calls to the generic object debugging infrastructure and provide fixup
functions which allow the system to be kept alive when recoverable problems
have been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agodebugobjects: add timer specific object debugging code
Thomas Gleixner [Wed, 30 Apr 2008 07:55:03 +0000 (00:55 -0700)]
debugobjects: add timer specific object debugging code

Add calls to the generic object debugging infrastructure and provide fixup
functions which allow the system to be kept alive when recoverable problems
have been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agodebugobjects: add documentation
Thomas Gleixner [Wed, 30 Apr 2008 07:55:02 +0000 (00:55 -0700)]
debugobjects: add documentation

Add a DocBook for debugobjects.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoinfrastructure to debug (dynamic) objects
Thomas Gleixner [Wed, 30 Apr 2008 07:55:01 +0000 (00:55 -0700)]
infrastructure to debug (dynamic) objects

We can see an ever repeating problem pattern with objects of any kind in the
kernel:

1) freeing of active objects
2) reinitialization of active objects

Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore.  One problem spot is
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.

While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause.  This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.

The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.

Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.

The problems described above are not restricted to timers, but timers tend to
expose them, usually as a full system crash.  Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.

Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.

The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.

The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list

Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which makes it possible
to prevent the damage the operation would cause.  When the sanity check
triggers, a warning message and a stack trace are printed.

The list of operations can be extended if the need arises.  For now it's
limited to the requirements of the first user (timers).

The core code enqueues the objects into hash buckets.  The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree.  Each bucket has its own spinlock to avoid contention on a
global lock.
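
As an illustration of the bucket scheme described above, here is a minimal
userspace sketch.  It is not the kernel implementation: the names
(track_init, track_activate, track_deactivate, track_free), the table size,
the static pool and the atomic_bool spin flag standing in for a spinlock are
all assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS    64   /* hypothetical hash table size */
#define MAX_TRACKED 128  /* hypothetical static pool of tracking entries */

enum obj_state { OBJ_NONE, OBJ_INIT, OBJ_ACTIVE };

struct tracked {
    const void *addr;        /* address of the tracked object */
    enum obj_state state;
    struct tracked *next;    /* next entry in the same bucket */
};

struct bucket {
    atomic_bool locked;      /* per-bucket spin flag, no global lock */
    struct tracked *head;
};

static struct bucket table[NBUCKETS];
static struct tracked pool[MAX_TRACKED];
static atomic_size_t pool_used;

/* The bucket index is derived from the object address itself. */
static struct bucket *lock_bucket(const void *addr)
{
    struct bucket *b = &table[((uintptr_t)addr >> 4) % NBUCKETS];

    while (atomic_exchange_explicit(&b->locked, true, memory_order_acquire))
        ;   /* spin until the bucket is free */
    return b;
}

static void unlock_bucket(struct bucket *b)
{
    atomic_store_explicit(&b->locked, false, memory_order_release);
}

static struct tracked *find(struct bucket *b, const void *addr)
{
    for (struct tracked *t = b->head; t; t = t->next)
        if (t->addr == addr)
            return t;
    return NULL;
}

/* Returns 0, -1 on reinitialization of an active object, -2 if the pool is full. */
int track_init(const void *addr)
{
    struct bucket *b = lock_bucket(addr);
    struct tracked *t = find(b, addr);
    int ret = 0;

    if (t && t->state == OBJ_ACTIVE) {
        ret = -1;
    } else if (t) {
        t->state = OBJ_INIT;
    } else {
        size_t i = atomic_fetch_add(&pool_used, 1);

        if (i < MAX_TRACKED) {
            t = &pool[i];
            t->addr = addr;
            t->state = OBJ_INIT;
            t->next = b->head;
            b->head = t;
        } else {
            ret = -2;
        }
    }
    unlock_bucket(b);
    return ret;
}

/* Moves a tracked object from one state to another; -1 if it is not in 'from'. */
static int transition(const void *addr, enum obj_state from, enum obj_state to)
{
    struct bucket *b = lock_bucket(addr);
    struct tracked *t = find(b, addr);
    int ret = (t && t->state == from) ? 0 : -1;

    if (ret == 0)
        t->state = to;
    unlock_bucket(b);
    return ret;
}

int track_activate(const void *addr)   { return transition(addr, OBJ_INIT, OBJ_ACTIVE); }
int track_deactivate(const void *addr) { return transition(addr, OBJ_ACTIVE, OBJ_INIT); }

/* Returns -1 when an active (or unknown) object is about to be freed. */
int track_free(const void *addr)       { return transition(addr, OBJ_INIT, OBJ_NONE); }
```

Because the index comes from the address, a check at free time only has to
look at one bucket; a caller would treat a -1 from track_init() or
track_free() as the point to print a warning and run a fixup.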

The debug code can be compiled in without being active.  The runtime overhead
is minimal and could be optimized by asm alternatives.  A kernel command line
option enables the debugging code.

Thanks to Ingo Molnar for review, suggestions and cleanup patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoslab: add a flag to prevent debug_free checks on a kmem_cache
Thomas Gleixner [Wed, 30 Apr 2008 07:54:59 +0000 (00:54 -0700)]
slab: add a flag to prevent debug_free checks on a kmem_cache

This is a preparatory patch for the debugobjects infrastructure.  The flag
prevents debug_free checks on kmem_caches.  This is necessary to avoid
recursive calls into a debug mechanism which uses a kmem_cache itself.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agodrivers: replace remaining __FUNCTION__ occurrences
Harvey Harrison [Wed, 30 Apr 2008 07:54:57 +0000 (00:54 -0700)]
drivers: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__
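
A small sketch of the replacement, assuming a hypothetical helper named
probe_device() that is not taken from any driver: __func__ is the
identifier standardized in C99 and expands to the name of the enclosing
function.

```c
#include <stdio.h>

/* Hypothetical helper: builds a log prefix from the C99 __func__ identifier. */
const char *probe_device(char *buf, size_t len)
{
    snprintf(buf, len, "%s: probing", __func__);
    return buf;
}
```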

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoAdd macros similar to min/max/min_t/max_t
Harvey Harrison [Wed, 30 Apr 2008 07:54:55 +0000 (00:54 -0700)]
Add macros similar to min/max/min_t/max_t

Also, change the variable names used in the min/max macros to avoid shadowed
variable warnings when min/max min_t/max_t are nested.

Small formatting changes to make all the macros have a similar form.
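
The shadowing issue can be sketched as follows.  The macros below are
simplified stand-ins rather than the kernel definitions, the temporary
names are made up for the example, and clamp_int() is a hypothetical
caller that nests one macro inside the other.  They rely on the GNU C
statement-expression and __typeof__ extensions.

```c
/*
 * Each macro uses its own temporary names.  If min() and max() both used
 * the same names, expanding max() inside an argument of min() would
 * declare an inner variable that shadows the outer one and trigger a
 * shadowed variable warning.
 */
#define min(x, y) ({                    \
    __typeof__(x) min_a_ = (x);         \
    __typeof__(y) min_b_ = (y);         \
    min_a_ < min_b_ ? min_a_ : min_b_; })

#define max(x, y) ({                    \
    __typeof__(x) max_a_ = (x);         \
    __typeof__(y) max_b_ = (y);         \
    max_a_ > max_b_ ? max_a_ : max_b_; })

/* Hypothetical nested use: limit val to the range [lo, hi]. */
static inline int clamp_int(int val, int lo, int hi)
{
    return min(max(val, lo), hi);
}
```

Nesting the same macro inside itself, as in min(min(a, b), c), would still
reuse one set of names, so distinct names only help for mixed nesting.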

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix v4l build]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoalloc_uid: cleanup
Andrew Morton [Wed, 30 Apr 2008 07:54:54 +0000 (00:54 -0700)]
alloc_uid: cleanup

Use kmem_cache_zalloc(), remove large amounts of initialisation code and
ifdeffery.

Note: this assumes that memset(*atomic_t, 0) correctly initialises the
atomic_t.  This is true for all present architectures and if it becomes false
for a future architecture then we'll need to make large changes all over the
place anyway.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agohfsplus: fix warning with 64k PAGE_SIZE
Andrew Morton [Wed, 30 Apr 2008 07:54:54 +0000 (00:54 -0700)]
hfsplus: fix warning with 64k PAGE_SIZE

fs/hfsplus/btree.c: In function 'hfsplus_bmap_alloc':
fs/hfsplus/btree.c:239: warning: comparison is always false due to limited range of data type

But this might hide a real bug?

Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agohfs: fix warning with 64k PAGE_SIZE
Andrew Morton [Wed, 30 Apr 2008 07:54:53 +0000 (00:54 -0700)]
hfs: fix warning with 64k PAGE_SIZE

fs/hfs/btree.c: In function 'hfs_bmap_alloc':
fs/hfs/btree.c:263: warning: comparison is always false due to limited range of data type

The patch makes the warning go away, but the code might actually be buggy?

Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoprintk: don't read beyond string arguments' terminating zero
Markus Armbruster [Wed, 30 Apr 2008 07:54:52 +0000 (00:54 -0700)]
printk: don't read beyond string arguments' terminating zero

Fix update_console_cmdline() not to read beyond the terminating zero of its
name argument.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoBasic braille screen reader support
Samuel Thibault [Wed, 30 Apr 2008 07:54:51 +0000 (00:54 -0700)]
Basic braille screen reader support

This adds minimalistic braille screen reader support.  This is meant to
be used by blind people, e.g. on boot failures or when / cannot be mounted,
where the userland screen readers cannot work.

[akpm@linux-foundation.org: fix exports]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jiri Kosina <jikos@jikos.cz>
Cc: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoasm-*/futex.h should include linux/uaccess.h
Jeff Dike [Wed, 30 Apr 2008 07:54:49 +0000 (00:54 -0700)]
asm-*/futex.h should include linux/uaccess.h

Lots of asm-*/futex.h call pagefault_enable and pagefault_disable, which
are declared in linux/uaccess.h, without including linux/uaccess.h.

They all include asm/uaccess.h, so this patch replaces asm/uaccess.h
with linux/uaccess.h.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agosysv: [bl]e*_add_cpu conversion
Marcin Slusarz [Wed, 30 Apr 2008 07:54:49 +0000 (00:54 -0700)]
sysv: [bl]e*_add_cpu conversion

replace all:
big/little_endian_variable = cpu_to_[bl]eX([bl]eX_to_cpu(big/little_endian_variable) +
expression_in_cpu_byteorder);
with:
[bl]eX_add_cpu(&big/little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch
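
The equivalence behind this conversion can be sketched in userspace C.
The le32 typedef and the byte-wise conversion helpers below are stand-ins
written for this example, not the kernel's byteorder implementation; only
the 32-bit little-endian case is shown.

```c
#include <stdint.h>

typedef uint32_t le32;   /* stand-in for a little-endian on-disk field */

/* Decode a little-endian value regardless of host byte order. */
static uint32_t le32_to_cpu(le32 v)
{
    const unsigned char *p = (const unsigned char *)&v;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a host-order value as little-endian. */
static le32 cpu_to_le32(uint32_t v)
{
    le32 out;
    unsigned char *p = (unsigned char *)&out;

    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
    return out;
}

/* Add a host-order value to a little-endian variable in place. */
static void le32_add_cpu(le32 *var, uint32_t val)
{
    *var = cpu_to_le32(le32_to_cpu(*var) + val);
}
```

With these helpers, le32_add_cpu(&x, 1) leaves x in the same state as the
open-coded x = cpu_to_le32(le32_to_cpu(x) + 1), while naming the variable
only once.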

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoquota: le*_add_cpu conversion
Marcin Slusarz [Wed, 30 Apr 2008 07:54:48 +0000 (00:54 -0700)]
quota: le*_add_cpu conversion

replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
expression_in_cpu_byteorder);
with:
leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agohfs/hfsplus: be*_add_cpu conversion
Marcin Slusarz [Wed, 30 Apr 2008 07:54:47 +0000 (00:54 -0700)]
hfs/hfsplus: be*_add_cpu conversion

replace all:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
expression_in_cpu_byteorder);
with:
beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoaffs: be*_add_cpu conversion
Marcin Slusarz [Wed, 30 Apr 2008 07:54:47 +0000 (00:54 -0700)]
affs: be*_add_cpu conversion

replace all:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
expression_in_cpu_byteorder);
with:
beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoreiserfs: use open_bdev_excl
Christoph Hellwig [Wed, 30 Apr 2008 07:54:46 +0000 (00:54 -0700)]
reiserfs: use open_bdev_excl

Use the proper helper to open a blockdevice by name for filesystem use,
this makes sure it's properly claimed (also added for open-by-number) and
gets rid of the struct file abuse.

Tested by mounting a reiserfs filesystem with external journal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: fix sparse warnings
Miklos Szeredi [Wed, 30 Apr 2008 07:54:45 +0000 (00:54 -0700)]
fuse: fix sparse warnings

fs/fuse/dev.c:306:2: warning: context imbalance in 'wait_answer_interruptible' - unexpected unlock
fs/fuse/dev.c:361:2: warning: context imbalance in 'request_wait_answer' - unexpected unlock
fs/fuse/dev.c:1002:4: warning: context imbalance in 'end_io_requests' - unexpected unlock

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: fix race in llseek
Miklos Szeredi [Wed, 30 Apr 2008 07:54:45 +0000 (00:54 -0700)]
fuse: fix race in llseek

Fuse doesn't use i_mutex to protect setting i_size, and so
generic_file_llseek() can be racy: it doesn't use i_size_read().

So do a fuse specific llseek method, which does use i_size_read().

[akpm@linux-foundation.org: make `retval' loff_t]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: fix node ID type
Miklos Szeredi [Wed, 30 Apr 2008 07:54:44 +0000 (00:54 -0700)]
fuse: fix node ID type

Node ID is 64bit but it is passed as unsigned long to some functions.  This
breakage wasn't noticed, because libfuse uses unsigned long too.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: fix max i/o size calculation
Miklos Szeredi [Wed, 30 Apr 2008 07:54:44 +0000 (00:54 -0700)]
fuse: fix max i/o size calculation

Fix a bug that Werner Baumann reported: fuse can send a bigger write request
than the maximum specified.  This only affected direct_io operation.

In addition set a sane minimum for the max_read and max_write tunables, so I/O
always makes some progress.
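
A rough sketch of the two ideas, not the fuse code itself: every request
is capped at the configured maximum, and the maximum is raised to a floor
so that a tiny or zero tunable cannot stall I/O.  The 4096 byte floor and
all function names are assumptions for the example; the text above only
says "a sane minimum".

```c
#include <stddef.h>

#define MIN_IO_SIZE 4096u   /* assumed floor for the max_read/max_write tunables */

/* Raise a too-small tunable to the floor. */
static unsigned sanitize_max_io(unsigned requested)
{
    return requested < MIN_IO_SIZE ? MIN_IO_SIZE : requested;
}

/* Size of the next request: never larger than max_io. */
static size_t next_chunk(size_t remaining, unsigned max_io)
{
    return remaining < max_io ? remaining : max_io;
}

/* Number of requests needed to transfer 'total' bytes. */
static size_t count_requests(size_t total, unsigned max_io)
{
    size_t n = 0;

    max_io = sanitize_max_io(max_io);
    while (total > 0) {
        total -= next_chunk(total, max_io);   /* always makes progress */
        n++;
    }
    return n;
}
```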

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: update file size on short read
Miklos Szeredi [Wed, 30 Apr 2008 07:54:43 +0000 (00:54 -0700)]
fuse: update file size on short read

If the READ request returned a short count, then either

  - cached size is incorrect
  - filesystem is buggy, as short reads are only allowed on EOF

So assume that the size is wrong and refresh it, so that cached read() doesn't
zero fill the missing chunk.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: implement perform_write
Nick Piggin [Wed, 30 Apr 2008 07:54:42 +0000 (00:54 -0700)]
fuse: implement perform_write

Introduce fuse_perform_write.  With fusexmp (a passthrough filesystem), large
(1MB) writes into a backing tmpfs filesystem are sped up by almost 4 times
(256MB/s vs 71MB/s).

[mszeredi@suse.cz]:

 - split into smaller functions
 - testing
 - duplicate generic_file_aio_write(), so that there's no need to add a
   new ->perform_write() a_op.  Comment from hch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: clean up setting i_size in write
Miklos Szeredi [Wed, 30 Apr 2008 07:54:41 +0000 (00:54 -0700)]
fuse: clean up setting i_size in write

Extract common code for setting i_size in write functions into a common
helper.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofuse: support writable mmap
Miklos Szeredi [Wed, 30 Apr 2008 07:54:41 +0000 (00:54 -0700)]
fuse: support writable mmap

Quoting Linus (3 years ago, FUSE inclusion discussions):

  "User-space filesystems are hard to get right. I'd claim that they
   are almost impossible, unless you limit them somehow (shared
   writable mappings are the nastiest part - if you don't have those,
   you can reasonably limit your problems by limiting the number of
   dirty pages you accept through normal "write()" calls)."

Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others).  This nicely solved the biggest problem: limiting the number of pages
used for write caching.

Some small details remained, however, which this largish patch attempts to
address.  It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks.  Performance may not be very
good for certain usage patterns, but generally it should be acceptable.

It has been tested extensively with fsx-linux and bash-shared-mapping.

Fuse page writeback design
--------------------------

fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.

The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.

For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented.  The per-bdi writeback count is not decremented until the actual
write completes.

On dirtying the page, fuse waits for a previous write to finish before
proceeding.  This makes sure there can only be one temporary page used at a
time for one cached page.

This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?

The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write.  It may be buggy or even
malicious, and fail to complete WRITE requests.  We don't want unrelated parts
of the system to grind to a halt in such cases.

Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request.  There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.

Currently there are several cases where the kernel can block on page
writeback:

  - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
  - page migration
  - throttle_vm_writeout (through NR_WRITEBACK)
  - sync(2)

Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.

As an extra safety measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default.  This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.

With appropriate privileges, this limit can be raised through
'/sys/class/bdi/<bdi>/max_ratio'.
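
The writeback design described above can be modelled in a few lines of
userspace C.  This is a single-threaded toy model under stated
assumptions, not the fuse code: struct cached_page, writepage(),
write_done() and dirty_page() are made-up names, malloc() stands in for
the page allocator, and where the real code waits for a previous write,
the model simply reports that the caller has to wait.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096

struct cached_page {
    unsigned char data[PAGE_SZ];
    bool dirty;
    unsigned char *temp;   /* non-NULL while a WRITE request is in flight */
};

static unsigned long nr_writeback_temp;   /* models the NR_WRITEBACK_TEMP counter */

/*
 * Start writeback: copy the page into a temporary buffer and mark the
 * cached page clean right away.  Returns the buffer that would be handed
 * to the userspace filesystem, or NULL if there is nothing to write.
 */
static unsigned char *writepage(struct cached_page *pg)
{
    unsigned char *tmp;

    if (!pg->dirty || pg->temp)
        return NULL;
    tmp = malloc(PAGE_SZ);
    if (!tmp)
        return NULL;
    memcpy(tmp, pg->data, PAGE_SZ);
    pg->temp = tmp;
    pg->dirty = false;     /* finished from the MM's point of view */
    nr_writeback_temp++;
    return tmp;
}

/* The userspace filesystem has completed the WRITE request. */
static void write_done(struct cached_page *pg)
{
    if (!pg->temp)
        return;
    free(pg->temp);
    pg->temp = NULL;
    nr_writeback_temp--;
}

/* Dirty the page; returns false if a previous write is still in flight. */
static bool dirty_page(struct cached_page *pg, unsigned char value)
{
    if (pg->temp)
        return false;      /* at most one temporary page per cached page */
    memset(pg->data, value, PAGE_SZ);
    pg->dirty = true;
    return true;
}
```

The point of the copy is that the cached page no longer depends on the
userspace filesystem once writepage() returns; only the temporary buffer
stays pinned until the request completes.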

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: document missing fields for /proc/meminfo
Miklos Szeredi [Wed, 30 Apr 2008 07:54:39 +0000 (00:54 -0700)]
mm: document missing fields for /proc/meminfo

A few fields in /proc/meminfo were not documented.  Fix.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: Add NR_WRITEBACK_TEMP counter
Miklos Szeredi [Wed, 30 Apr 2008 07:54:38 +0000 (00:54 -0700)]
mm: Add NR_WRITEBACK_TEMP counter

Fuse will use temporary buffers to write back dirty data from memory mappings
(normal writes are done synchronously).  This is needed, because there cannot
be any guarantee about the time in which a write will complete.

By using temporary buffers, from the MM's point of view the page is written
back immediately.  If the writeout was due to memory pressure, this
effectively migrates data from a full zone to a less full zone.

This patch adds a new counter (NR_WRITEBACK_TEMP) for the number of pages used
as temporary buffers.

[Lee.Schermerhorn@hp.com: add vmstat_text for NR_WRITEBACK_TEMP]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: export bdi_writeout_inc()
Miklos Szeredi [Wed, 30 Apr 2008 07:54:37 +0000 (00:54 -0700)]
mm: bdi: export bdi_writeout_inc()

Fuse needs this for writable mmap support.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: add separate writeback accounting capability
Miklos Szeredi [Wed, 30 Apr 2008 07:54:37 +0000 (00:54 -0700)]
mm: bdi: add separate writeback accounting capability

Add a new BDI capability flag: BDI_CAP_NO_ACCT_WB.  If this flag is
set, then don't update the per-bdi writeback stats from
test_set_page_writeback() and test_clear_page_writeback().

Misc cleanups:

 - convert bdi_cap_writeback_dirty() and friends to static inline functions
 - create a flag that includes all three dirty/writeback related flags,
   since almost all users will want to have them together

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: move statistics to debugfs
Miklos Szeredi [Wed, 30 Apr 2008 07:54:36 +0000 (00:54 -0700)]
mm: bdi: move statistics to debugfs

Move BDI statistics to debugfs:

   /sys/kernel/debug/bdi/<bdi>/stats

Use postcore_initcall() to initialize the sysfs class and debugfs,
because debugfs is initialized in core_initcall().

Update descriptions in ABI documentation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: allow setting a maximum for the bdi dirty limit
Peter Zijlstra [Wed, 30 Apr 2008 07:54:36 +0000 (00:54 -0700)]
mm: bdi: allow setting a maximum for the bdi dirty limit

Add "max_ratio" to /sys/class/bdi.  This indicates the maximum percentage of
the global dirty threshold allocated to this bdi.

[mszeredi@suse.cz]

 - fix parsing in max_ratio_store().
 - export bdi_set_max_ratio() to modules
 - limit bdi_dirty with bdi->max_ratio
 - document new sysfs attribute

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: allow setting a minimum for the bdi dirty limit
Peter Zijlstra [Wed, 30 Apr 2008 07:54:35 +0000 (00:54 -0700)]
mm: bdi: allow setting a minimum for the bdi dirty limit

Under normal circumstances each device is given a part of the total write-back
cache that relates to its current avg writeout speed in relation to the other
devices.

min_ratio - allows one to assign a minimum portion of the write-back cache to
a particular device.  This is useful in situations where you might want to
provide a minimum QoS.  (One request for this feature came from flash based
storage people who wanted to avoid writing out at all costs - they of course
needed some pdflush hacks as well)

max_ratio - allows one to assign a maximum portion of the dirty limit to a
particular device.  This is useful in situations where you want to avoid one
device taking all or most of the write-back cache.  Eg.  an NFS mount that is
prone to get stuck, or a FUSE mount which you don't trust to play fair.

Add "min_ratio" to /sys/class/bdi.  This indicates the minimum percentage of
the global dirty threshold allocated to this bdi.
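
How the two ratios could bound a device's share of the dirty limit is
sketched below.  The formula is an assumption made for illustration
(proportional share of the unreserved part, plus the device's own
minimum, capped at its maximum) and is not copied from the kernel;
bdi_limit() and its parameters are hypothetical names, and overflow of
the intermediate products is ignored.

```c
#include <stdint.h>

/*
 * global          global dirty threshold, in pages
 * num / den       this device's fraction of recent writeout
 * min_ratio       minimum percentage reserved for this device
 * max_ratio       maximum percentage this device may use
 * total_min_ratio sum of min_ratio over all devices
 */
static uint64_t bdi_limit(uint64_t global, uint64_t num, uint64_t den,
                          unsigned min_ratio, unsigned max_ratio,
                          unsigned total_min_ratio)
{
    /* Only the part not reserved by minimums is shared proportionally. */
    uint64_t limit = global * (100 - total_min_ratio) / 100;
    uint64_t cap = global * max_ratio / 100;

    limit = den ? limit * num / den : 0;
    limit += global * min_ratio / 100;      /* guaranteed minimum */
    return limit > cap ? cap : limit;       /* enforced maximum */
}
```

For example, with a global threshold of 1000 pages, a device doing half
of the writeout gets 500 pages with the default ratios, but only 10 pages
when its max_ratio is 1.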

[mszeredi@suse.cz]

 - fix parsing in min_ratio_store()
 - document new sysfs attribute

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: expose the BDI object in sysfs for FUSE
Miklos Szeredi [Wed, 30 Apr 2008 07:54:34 +0000 (00:54 -0700)]
mm: bdi: expose the BDI object in sysfs for FUSE

Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR"

Make the fuse control filesystem use s_dev instead of a fuse specific ID.
This makes it easier to match directories under /sys/fs/fuse/connections/ with
directories under /sys/class/bdi, and with actual mounts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: expose the BDI object in sysfs for NFS
Miklos Szeredi [Wed, 30 Apr 2008 07:54:33 +0000 (00:54 -0700)]
mm: bdi: expose the BDI object in sysfs for NFS

Register NFS' backing_dev_info under sysfs with the name "nfs-MAJOR:MINOR"

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomm: bdi: export BDI attributes in sysfs
Peter Zijlstra [Wed, 30 Apr 2008 07:54:32 +0000 (00:54 -0700)]
mm: bdi: export BDI attributes in sysfs

Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.

In particular this properly exposes the read-ahead window for all relevant
users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.

With patient help from Kay Sievers and Greg KH

[mszeredi@suse.cz]

 - split off NFS and FUSE changes into separate patches
 - document new sysfs attributes under Documentation/ABI
 - do bdi_class_init as a core_initcall, otherwise the "default" BDI
   won't be initialized
 - remove bdi_init_fmt macro, it's not used very much

[akpm@linux-foundation.org: fix ia64 warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agopidns: make pid->level and pid_ns->level unsigned
Pavel Emelyanov [Wed, 30 Apr 2008 07:54:31 +0000 (00:54 -0700)]
pidns: make pid->level and pid_ns->level unsigned

These values represent the nesting level of a namespace and pids living in it,
and it's always non-negative.

Turning this from int to unsigned int saves some space in pid.c (11 bytes on
x86 and 64 on ia64) by letting the compiler optimize the pid_nr_ns a bit.
E.g.  on ia64 this removes the sign extension calls, which the compiler adds
to optimize access to pid->numbers[ns->level].

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomake marker_debug static
Adrian Bunk [Wed, 30 Apr 2008 07:54:30 +0000 (00:54 -0700)]
make marker_debug static

With the needlessly global marker_debug being static gcc can optimize the
unused code away.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomxser: convert large macros to functions
Christoph Hellwig [Wed, 30 Apr 2008 07:54:29 +0000 (00:54 -0700)]
mxser: convert large macros to functions

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agopids: sys_getpgid: fix unsafe *pid usage, s/tasklist/rcu/
Oleg Nesterov [Wed, 30 Apr 2008 07:54:29 +0000 (00:54 -0700)]
pids: sys_getpgid: fix unsafe *pid usage, s/tasklist/rcu/

1. sys_getpgid() needs rcu_read_lock() to derive the pgrp _nr, even if
   the task is current, otherwise we can race with another thread which
   does sys_setpgid().

2. Use rcu_read_lock() instead of tasklist_lock when pid != 0, make sure
   that we don't use the NULL pid if the task exits right after successful
   find_task_by_vpid().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agopids: sys_getsid: fix unsafe *pid usage, fix possible 0 instead of -ESRCH
Oleg Nesterov [Wed, 30 Apr 2008 07:54:28 +0000 (00:54 -0700)]
pids: sys_getsid: fix unsafe *pid usage, fix possible 0 instead of -ESRCH

1. sys_getsid() needs rcu_read_lock() to derive the session _nr, even if
   the task is current, otherwise we can race with another thread which
   does sys_setsid().

2. The task can exit between find_task_by_vpid() and task_session_vnr(),
   in that unlikely case sys_getsid() returns 0 instead of -ESRCH.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
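
The -ESRCH behaviour described above is visible from userspace as getsid()
returning -1 with errno set to ESRCH. A minimal sketch follows, assuming a
POSIX system; sid_or_errno() is a hypothetical helper name, not a kernel or
libc function, and it only folds the libc convention back into the kernel's
"negative errno" convention.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the session ID of @pid, or a negative
 * errno value (for example -ESRCH when no such process exists).
 * pid == 0 means "the calling process", as for getsid() itself.
 */
static long sid_or_errno(pid_t pid)
{
	pid_t sid;

	assert(pid >= 0);
	sid = getsid(pid);
	if (sid == (pid_t)-1)
		return -(long)errno;
	return (long)sid;
}
```

A caller would compare the result against -ESRCH to detect a task that has
already gone away.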
16 years agopids: __set_special_pids: use change_pid() helper
Oleg Nesterov [Wed, 30 Apr 2008 07:54:27 +0000 (00:54 -0700)]
pids: __set_special_pids: use change_pid() helper

Use change_pid() instead of detach_pid() + attach_pid() in
__set_special_pids().

This way task_session() is not NULL in between.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agopids: sys_setpgid: use change_pid() helper
Oleg Nesterov [Wed, 30 Apr 2008 07:54:27 +0000 (00:54 -0700)]
pids: sys_setpgid: use change_pid() helper

Use change_pid() instead of detach_pid() + attach_pid() in sys_setpgid().

This way task_pgrp() is not NULL in between.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agopids: introduce change_pid() helper
Oleg Nesterov [Wed, 30 Apr 2008 07:54:26 +0000 (00:54 -0700)]
pids: introduce change_pid() helper

Based on Eric W. Biederman's idea.

Without tasklist_lock held task_session()/task_pgrp() can return NULL if the
caller races with setpgrp()/setsid() which does detach_pid() + attach_pid().
This can happen even if task == current.

Introduce the new helper, change_pid(), which should be used instead.  This way
the caller always sees the special pid != NULL, either old or new.

Also change the prototype of attach_pid(): it always returns 0 and nobody
checks the returned value.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
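
The difference between detach_pid() + attach_pid() and a single change_pid()
step can be illustrated with a toy userspace model. This is only a sketch:
the toy_* types and functions are hypothetical stand-ins, C11 atomics replace
the kernel's own locking and RCU, and nothing here is the kernel's actual
implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct pid / struct task_struct. */
struct toy_pid { int nr; };
struct toy_task { _Atomic(struct toy_pid *) pgrp; };

/* Old pattern, step 1: the special pid becomes NULL for a while. */
static void toy_detach_pid(struct toy_task *t)
{
	atomic_store(&t->pgrp, (struct toy_pid *)NULL);
}

/* Old pattern, step 2: the new pid becomes visible. */
static void toy_attach_pid(struct toy_task *t, struct toy_pid *pid)
{
	atomic_store(&t->pgrp, pid);
}

/* New pattern: one step, so a reader sees either the old or the new pid. */
static struct toy_pid *toy_change_pid(struct toy_task *t, struct toy_pid *pid)
{
	assert(pid != NULL);
	return atomic_exchange(&t->pgrp, pid);
}

/* Lockless reader: returns the pgrp number, or -1 if it observed NULL. */
static int toy_pgrp_nr(struct toy_task *t)
{
	struct toy_pid *pid = atomic_load(&t->pgrp);

	return pid ? pid->nr : -1;
}
```

In this model a reader running between toy_detach_pid() and toy_attach_pid()
gets -1, while toy_change_pid() leaves no such window.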
16 years agopids: de_thread: don't clear session/pgrp pids for the old leader
Oleg Nesterov [Wed, 30 Apr 2008 07:54:25 +0000 (00:54 -0700)]
pids: de_thread: don't clear session/pgrp pids for the old leader

Based on Eric W. Biederman's idea.

Unless task == current, without tasklist_lock held task_session()/task_pgrp()
can return NULL if the caller races with de_thread() which switches the group
leader.

Change transfer_pid() to not clear old->pids[type].pid for the old leader.
This means that its .pid can point to "nowhere", but this is already true for
sub-threads, and the old leader is not group_leader() any longer.  IOW, with
or without this change we can't trust task's special pids unless it is the
group leader.

With this change the following code

rcu_read_lock();
task = find_task_by_xxx();
do_something(task_pgrp(task), task_session(task));
rcu_read_unlock();

can't race with exec and hit the NULL pid.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoDeprecate find_task_by_pid()
Pavel Emelyanov [Wed, 30 Apr 2008 07:54:24 +0000 (00:54 -0700)]
Deprecate find_task_by_pid()

There are some places that are known to operate on tasks'
global pids only:

* the rest_init() call (called on boot)
* the kgdb's getthread
* the create_kthread() (since the kthread is run in init ns)

So use the find_task_by_pid_ns(..., &init_pid_ns) there
and schedule the find_task_by_pid for removal.

[sukadev@us.ibm.com: Fix warning in kernel/pid.c]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoUse find_task_by_vpid in taskstats
Pavel Emelyanov [Wed, 30 Apr 2008 07:54:23 +0000 (00:54 -0700)]
Use find_task_by_vpid in taskstats

The pid used to look up a task is passed to the taskstats code via a
genetlink message.

Since netlink packets are now processed in the context of the sending task,
it is correct to look up the task with find_task_by_vpid() here.

Besides, I fix the call to fill_pid() from taskstats_exit(), since the
tsk->pid is not required in fill_pid() in this case, and the pid field on
task_struct is going to be deprecated as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Jonathan Lim <jlim@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofree_pidmap: turn it into free_pidmap(struct upid *)
Oleg Nesterov [Wed, 30 Apr 2008 07:54:22 +0000 (00:54 -0700)]
free_pidmap: turn it into free_pidmap(struct upid *)

The callers of free_pidmap() pass 2 members of "struct upid", we can just
pass "struct upid *" instead.  Shaves off 10 bytes from pid.o.

Also, simplify the alloc_pid's "out_free:" error path a little bit.  This
makes it clearer which subset of pid->numbers[] we are freeing.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
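
The interface change above, passing one "struct upid *" instead of two of its
members, can be sketched with a toy model. The toy_* names and the fixed-size
boolean map are assumptions made for illustration; the kernel's pidmap is a
different data structure.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PIDS_PER_NS 16

/* Toy stand-ins, NOT the kernel's struct pid_namespace / struct upid. */
struct toy_pid_ns { bool in_use[TOY_PIDS_PER_NS]; };
struct toy_upid {
	int nr;                  /* pid number as seen in @ns */
	struct toy_pid_ns *ns;   /* namespace the number belongs to */
};

/*
 * Release the number held by @upid. The caller passes the pair as one
 * pointer instead of passing upid->ns and upid->nr separately.
 */
static void toy_free_pidmap(struct toy_upid *upid)
{
	assert(upid && upid->ns);
	assert(upid->nr >= 0 && upid->nr < TOY_PIDS_PER_NS);
	upid->ns->in_use[upid->nr] = false;
}
```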
16 years agodevpts: factor out PTY index allocation
Sukadev Bhattiprolu [Wed, 30 Apr 2008 07:54:21 +0000 (00:54 -0700)]
devpts: factor out PTY index allocation

Factor out the code used to allocate/free a pts index into new interfaces,
devpts_new_index() and devpts_kill_index().  This localizes the external data
structures used in managing the pts indices.

[akpm@linux-foundation.org: undo accidental mutex2sem conversion]
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
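
The idea of hiding index bookkeeping behind an allocate/free pair can be
sketched as follows. This is a toy userspace model, not the devpts code: the
toy_* names, the fixed-size table, and the -ENOSPC return value are all
assumptions, and the locking a real implementation needs is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_PTYS 8

/* Private to this file: callers only see the two functions below. */
static bool toy_allocated[TOY_MAX_PTYS];

/* Return the lowest free index, or -ENOSPC if the table is full. */
static int toy_new_index(void)
{
	for (int i = 0; i < TOY_MAX_PTYS; i++) {
		if (!toy_allocated[i]) {
			toy_allocated[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/* Give back an index previously returned by toy_new_index(). */
static void toy_kill_index(int idx)
{
	assert(idx >= 0 && idx < TOY_MAX_PTYS && toy_allocated[idx]);
	toy_allocated[idx] = false;
}
```

Because the table is static, the representation can later change without
touching any caller.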
16 years agodevpts: propagate error code from devpts_pty_new
Sukadev Bhattiprolu [Wed, 30 Apr 2008 07:54:20 +0000 (00:54 -0700)]
devpts: propagate error code from devpts_pty_new

Have ptmx_open() propagate any error code returned by devpts_pty_new()
(which returns either 0 or -ENOMEM anyway).

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agotty: fix routine name in ptmx_open()
Hiroshi Shimamoto [Wed, 30 Apr 2008 07:54:20 +0000 (00:54 -0700)]
tty: fix routine name in ptmx_open()

At ptmx_open(), the 2nd parameter for check_tty_count() should
be "ptmx_open".

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agochar serial: switch drivers to ioremap_nocache
Alan Cox [Wed, 30 Apr 2008 07:54:19 +0000 (00:54 -0700)]
char serial: switch drivers to ioremap_nocache

Simple search/replace except for synclink.c where I noticed a real bug and
fixed it too.  If the remap failed, it computed NULL + offset and only then
checked for NULL.

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
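
The class of bug mentioned for synclink.c can be sketched in plain C. The
names below (toy_map_regs(), toy_remap_ok(), toy_remap_fail()) are
hypothetical and stand in for the driver code and for ioremap_nocache(); the
sketch shows only the corrected ordering, where the mapping result is checked
before any offset is added.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a remap call that may fail and return NULL. */
typedef unsigned char *(*toy_remap_fn)(size_t len);

/*
 * Map @len bytes and return a pointer @offset bytes into the mapping.
 * The NULL check must come BEFORE the offset is applied: once the
 * offset has been added, a failed mapping no longer compares equal
 * to NULL and the error goes unnoticed.
 */
static unsigned char *toy_map_regs(toy_remap_fn remap, size_t len,
				   size_t offset)
{
	unsigned char *base;

	assert(offset < len);
	base = remap(len);
	if (!base)
		return NULL;
	return base + offset;
}

/* Two fake remap implementations for demonstration purposes. */
static unsigned char toy_window[64];

static unsigned char *toy_remap_ok(size_t len)
{
	return len <= sizeof(toy_window) ? toy_window : NULL;
}

static unsigned char *toy_remap_fail(size_t len)
{
	(void)len;
	return NULL;
}
```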
16 years agoip2: switch remaining direct call of ops->flush_buffer
Alan Cox [Wed, 30 Apr 2008 07:54:18 +0000 (00:54 -0700)]
ip2: switch remaining direct call of ops->flush_buffer

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agotty: add throttle/unthrottle helpers
Alan Cox [Wed, 30 Apr 2008 07:54:18 +0000 (00:54 -0700)]
tty: add throttle/unthrottle helpers

Something Arjan suggested which allows us to clean up the code nicely.

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
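
A helper of this kind typically centralizes two things that callers would
otherwise repeat: tracking the throttled state and checking whether the
driver provides the optional callback at all. The following is a toy sketch
under that assumption; the toy_* types and functions are hypothetical and do
not reproduce the tty layer's real structures, flags, or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_tty;

/* Driver operations; either callback may be NULL. */
struct toy_tty_ops {
	void (*throttle)(struct toy_tty *tty);
	void (*unthrottle)(struct toy_tty *tty);
};

struct toy_tty {
	const struct toy_tty_ops *ops;
	bool throttled;
	int calls;   /* bookkeeping for the demo callback below */
};

/* Mark the tty throttled once and call the driver hook if it exists. */
static void toy_tty_throttle(struct toy_tty *tty)
{
	assert(tty && tty->ops);
	if (!tty->throttled) {
		tty->throttled = true;
		if (tty->ops->throttle)
			tty->ops->throttle(tty);
	}
}

/* Clear the throttled state once and call the driver hook if it exists. */
static void toy_tty_unthrottle(struct toy_tty *tty)
{
	assert(tty && tty->ops);
	if (tty->throttled) {
		tty->throttled = false;
		if (tty->ops->unthrottle)
			tty->ops->unthrottle(tty);
	}
}

/* Demo callback: counts how often the driver hook was invoked. */
static void toy_count_call(struct toy_tty *tty)
{
	tty->calls++;
}
```

Callers then invoke toy_tty_throttle() unconditionally instead of testing the
function pointer themselves.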