10 years agoMerge git://
Linus Torvalds [Tue, 17 May 2011 01:38:08 +0000 (18:38 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://
  net: Change netdev_fix_features messages loglevel
  vmxnet3: Fix inconsistent LRO state after initialization
  sfc: Fix oops in register dump after mapping change
  IPVS: fix netns if reading ip_vs_* procfs entries
  bridge: fix forwarding of IPv6

10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Tue, 17 May 2011 01:36:47 +0000 (18:36 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/cjb/mmc

* 'for-linus' of git://
  Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"

10 years agomm: fix kernel-doc warning in page_alloc.c
Randy Dunlap [Mon, 16 May 2011 20:16:54 +0000 (13:16 -0700)]
mm: fix kernel-doc warning in page_alloc.c

Fix new kernel-doc warning in mm/page_alloc.c:

  Warning(mm/page_alloc.c:2370): No description found for parameter 'nid'

Signed-off-by: Randy Dunlap <>
Signed-off-by: Linus Torvalds <>
10 years agoPCI: Clear bridge resource flags if requested size is 0
Yinghai Lu [Sat, 14 May 2011 01:06:17 +0000 (18:06 -0700)]
PCI: Clear bridge resource flags if requested size is 0

Found during pci remove/rescan testing:

  pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
  pci 0000:c0:03.0:   bridge window [io  0x1000-0x0fff]
  pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
  pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
  pci 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
  pci 0000:c0:03.0: Error enabling bridge (-22), continuing
  pci 0000:c0:03.0: enabling bus mastering
  pci 0000:c0:03.0: setting latency timer to 64
  pcieport 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
  pcieport: probe of 0000:c0:03.0 failed with error -22

This bug was caused by commit c8adf9a3e873 ("PCI: pre-allocate
additional resources to devices only after successful allocation of
essential resources.")

After that commit, pci_hotplug_io_size is changed to additional_io_size
from minimum size.  So it will not go through the resource_size(res) != 0
path, and will not be reset.

The root cause is: pci_bridge_check_ranges will set the RESOURCE_IO flag for
the pci bridge, and later, if the children do not need IO resources, those
bridge resources will not need to be allocated.  But the flags are still
there, and that will confuse pci_enable_bridges later.

related code:

   static void assign_requested_resources_sorted(struct resource_list *head,
                                    struct resource_list_x *fail_head)
           struct resource *res;
           struct resource_list *list;
           int idx;

           for (list = head->next; list; list = list->next) {
                   res = list->res;
                   idx = res - &list->dev->resource[0];
                   if (resource_size(res) && pci_assign_resource(list->dev, idx)) {

Finally, we have to clear the flags in pbus_size_mem/io when requested
size == 0 and !add_head, because in this case it will not go through

Just make size1 = size0 when !add_head. it will make flags get cleared.

At the same time, when requested size == 0 and add_size != 0, the resource
will still be in head and add_list, because we do not clear the flags for it.
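
The empty IO window in the log above ([io  0x1000-0x0fff]) has size zero under
the usual end - start + 1 arithmetic. A minimal userspace sketch of that
arithmetic, and of dropping the flags of an empty window, could look like the
following. The toy_* names and the flag value are assumptions made for
illustration; this is not the code changed by the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct resource and its IO flag. */
#define TOY_IORESOURCE_IO 0x100u

struct toy_resource {
    uint64_t start;
    uint64_t end;
    unsigned int flags;
};

/* Same arithmetic as the kernel's resource_size(): end - start + 1. */
uint64_t toy_resource_size(const struct toy_resource *res)
{
    return res->end - res->start + 1;
}

/*
 * Sketch of the idea only: when nothing was requested for a bridge window,
 * drop its type flags so later code does not try to reserve an empty range.
 */
void toy_clear_if_empty(struct toy_resource *res)
{
    if (toy_resource_size(res) == 0)
        res->flags = 0;
}
```

With start = 0x1000 and end = 0x0fff the size is 0, so the flags are cleared;
a populated window keeps its flags.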

After this, we will get the right result:

  pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
  pci 0000:c0:03.0:   bridge window [io  disabled]
  pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
  pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
  pci 0000:c0:03.0: enabling bus mastering
  pci 0000:c0:03.0: setting latency timer to 64
  pcieport 0000:c0:03.0: setting latency timer to 64
  pcieport 0000:c0:03.0: irq 160 for MSI/MSI-X
  pcieport 0000:c0:03.0: Signaling PME through PCIe PME interrupt
  pci 0000:c4:00.0: Signaling PME through PCIe PME interrupt
  pcie_pme 0000:c0:03.0:pcie01: service driver pcie_pme loaded
  aer 0000:c0:03.0:pcie02: service driver aer loaded
  pciehp 0000:c0:03.0:pcie04: Hotplug Controller:

v3: simpler fix; also fix one typo in pbus_size_mem

Signed-off-by: Yinghai Lu <>
Reviewed-by: Ram Pai <>
Cc: Jesse Barnes <>
Cc: Bjorn Helgaas <>
Signed-off-by: Linus Torvalds <>
10 years agonet: Change netdev_fix_features messages loglevel
Michał Mirosław [Mon, 16 May 2011 19:14:21 +0000 (15:14 -0400)]
net: Change netdev_fix_features messages loglevel

The messages reduced to DEBUG can possibly be triggered by unprivileged
processes and are nothing exceptional. Illegal checksum combinations can only
be caused by a driver bug, so promote those messages to WARN.

Since GSO without SG will now only cause a DEBUG message from
netdev_fix_features(), remove the workaround from register_netdevice().

Signed-off-by: Michał Mirosław <>
Signed-off-by: David S. Miller <>
10 years agovmxnet3: Fix inconsistent LRO state after initialization
Thomas Jarosch [Mon, 16 May 2011 06:28:15 +0000 (06:28 +0000)]
vmxnet3: Fix inconsistent LRO state after initialization

During initialization of vmxnet3, the state of LRO
gets out of sync with netdev->features.

This leads to very poor TCP performance in an IP forwarding
setup and is hitting many VMware users.

Simplified call sequence:
1. vmxnet3_declare_features() initializes "adapter->lro" to true.

2. The kernel automatically disables LRO if IP forwarding is enabled,
so vmxnet3_set_flags() gets called. This also updates netdev->features.

3. Now vmxnet3_setup_driver_shared() is called. "adapter->lro" is still
set to true and LRO gets enabled again, even though
netdev->features shows it's disabled.

Fix it by updating "adapter->lro", too.
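
The call sequence above can be modelled in a few lines of userspace C. This is
only a sketch under assumptions: the toy_* structures, the feature bit, and
the function names are hypothetical and do not reflect the driver's real
layout.

```c
#include <stdbool.h>

/* Hypothetical feature bit and structures; not the driver's real layout. */
#define TOY_NETIF_F_LRO 0x1u

struct toy_netdev  { unsigned int features; };
struct toy_adapter { struct toy_netdev *netdev; bool lro; };

/* Step 2 of the sequence, with the fix: keep the private flag in sync. */
void toy_set_flags(struct toy_adapter *adapter, unsigned int features)
{
    adapter->netdev->features = features;
    adapter->lro = (features & TOY_NETIF_F_LRO) != 0;
}

/* Step 3: device setup trusts adapter->lro when programming the hardware. */
bool toy_setup_enables_lro(const struct toy_adapter *adapter)
{
    return adapter->lro;
}
```

Without the assignment to adapter->lro in step 2, step 3 would still see the
initial "true" and re-enable LRO.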

The private vmxnet3 adapter flags are scheduled for removal
in net-next, see commit a0d2730c9571aeba793cb5d3009094ee1d8fda35
"net: vmxnet3: convert to hw_features".

Patch applies to 2.6.37 / 2.6.38 and 2.6.39-rc6.

Please CC: comments.

Signed-off-by: Thomas Jarosch <>
Acked-by: Stephen Hemminger <>
Signed-off-by: David S. Miller <>
10 years agosfc: Fix oops in register dump after mapping change
Ben Hutchings [Mon, 16 May 2011 06:13:49 +0000 (06:13 +0000)]
sfc: Fix oops in register dump after mapping change

Commit 747df2258b1b9a2e25929ef496262c339c380009 ('sfc: Always map MCDI
shared memory as uncacheable') introduced a separate mapping for the
MCDI shared memory (MC_TREG_SMEM).  This means we can no longer easily
include it in the register dump.  Since it is not particularly useful
in debugging, substitute a recognisable dummy value.

Signed-off-by: Ben Hutchings <>
Signed-off-by: David S. Miller <>
10 years agoMerge branch 'omap-fixes-for-linus' of git://
Linus Torvalds [Mon, 16 May 2011 15:55:49 +0000 (08:55 -0700)]
Merge branch 'omap-fixes-for-linus' of git://git./linux/kernel/git/tmlind/linux-omap-2.6

* 'omap-fixes-for-linus' of git://
  OMAP3: set the core dpll clk rate in its set_rate function
  omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured

10 years agoMerge branch 'drm-fixes' of git://
Linus Torvalds [Mon, 16 May 2011 15:47:31 +0000 (08:47 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://
  drm: Take lock around probes for drm_fb_helper_hotplug_event
  drm/i915: Revert i915.semaphore=1 default from 47ae63e0
  vga_switcheroo: don't toggle-switch devices
  drm/radeon/kms: add some evergreen/ni safe regs
  drm/radeon/kms: fix extended lvds info parsing
  drm/radeon/kms: fix tiling reg on fusion

10 years agoRevert "mmc: fix a race between card-detect rescan and clock-gate work instances"
Chris Ball [Mon, 16 May 2011 15:32:26 +0000 (11:32 -0400)]
Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"

This reverts commit 26fc8775b51484d8c0a671198639c6d5ae60533e, which has
been reported to cause boot/resume-time crashes for some users:

Signed-off-by: Chris Ball <>
Cc: <>
10 years agodrm: Take lock around probes for drm_fb_helper_hotplug_event
Chris Wilson [Fri, 22 Apr 2011 10:03:57 +0000 (11:03 +0100)]
drm: Take lock around probes for drm_fb_helper_hotplug_event

We need to hold the dev->mode_config.mutex whilst detecting the output
status. But we also need to drop it for the call into
drm_fb_helper_single_fb_probe(), which indirectly acquires the lock when
attaching the fbcon.

Failure to do so exposes a race with normal output probing. Detected by
adding some warnings that the mutex is held to the backend detect routines:

[   17.772456] WARNING: at drivers/gpu/drm/i915/intel_crt.c:471 intel_crt_detect+0x3e/0x373 [i915]()
[   17.772458] Hardware name: Latitude E6400
[   17.772460] Modules linked in: ....
[   17.772582] Pid: 11, comm: kworker/0:1 Tainted: G        W #8
[   17.772584] Call Trace:
[   17.772591]  [<ffffffff81046af5>] ? warn_slowpath_common+0x78/0x8c
[   17.772603]  [<ffffffffa03f3e5c>] ? intel_crt_detect+0x3e/0x373 [i915]
[   17.772612]  [<ffffffffa0355d49>] ?  drm_helper_probe_single_connector_modes+0xbf/0x2af [drm_kms_helper]
[   17.772619]  [<ffffffffa03534d5>] ?  drm_fb_helper_probe_connector_modes+0x39/0x4d [drm_kms_helper]
[   17.772625]  [<ffffffffa0354760>] ?  drm_fb_helper_hotplug_event+0xa5/0xc3 [drm_kms_helper]
[   17.772633]  [<ffffffffa035577f>] ? output_poll_execute+0x146/0x17c [drm_kms_helper]
[   17.772638]  [<ffffffff81193c01>] ? cfq_init_queue+0x247/0x345
[   17.772644]  [<ffffffffa0355639>] ? output_poll_execute+0x0/0x17c [drm_kms_helper]
[   17.772648]  [<ffffffff8105b540>] ? process_one_work+0x193/0x28e
[   17.772652]  [<ffffffff8105c6bc>] ? worker_thread+0xef/0x172
[   17.772655]  [<ffffffff8105c5cd>] ? worker_thread+0x0/0x172
[   17.772658]  [<ffffffff8105c5cd>] ? worker_thread+0x0/0x172
[   17.772663]  [<ffffffff8105f767>] ? kthread+0x7a/0x82
[   17.772668]  [<ffffffff8100a724>] ? kernel_thread_helper+0x4/0x10
[   17.772671]  [<ffffffff8105f6ed>] ? kthread+0x0/0x82
[   17.772674]  [<ffffffff8100a720>] ? kernel_thread_helper+0x0/0x10

Reported-by: Frederik Himpe <>
Signed-off-by: Chris Wilson <>
Signed-off-by: Dave Airlie <>
10 years agodrm/i915: Revert i915.semaphore=1 default from 47ae63e0
Andy Lutomirski [Fri, 13 May 2011 16:14:54 +0000 (12:14 -0400)]
drm/i915: Revert i915.semaphore=1 default from 47ae63e0

My Q67 / i7-2600 box has rev09 Sandy Bridge graphics.  It hangs
instantly when GNOME loads and it hangs so hard the reset button
doesn't work.  Setting i915.semaphore=0 fixes it.

Semaphores were disabled in a1656b9090f7008d2941c314f5a64724bea2ae37
in 2.6.38 and were re-enabled by

commit 47ae63e0c2e5fdb582d471dc906eb29be94c732f
Merge: c59a333 467cffb
Author: Chris Wilson <>
Date:   Mon Mar 7 12:32:44 2011 +0000

    Merge branch 'drm-intel-fixes' into drm-intel-next

    Apply the trivial conflicting regression fixes, but keep GPU semaphores


(It's worth noting that the offending change is i915_drv.c,
 which is not a conflict.)

Signed-off-by: Andy Lutomirski <>
Acked-by: Keith Packard <>
Signed-off-by: Dave Airlie <>
10 years agovga_switcheroo: don't toggle-switch devices
Florian Mickler [Sun, 15 May 2011 14:32:50 +0000 (16:32 +0200)]
vga_switcheroo: don't toggle-switch devices

If the requested device is already active, ignore the request.

This restores the original behaviour of the interface. The change was
probably an unintended side effect of

commit 66b37c6777c4 vga_switcheroo: split switching into two stages

which did not take into account to duplicate the !active check in the split-off

Fix this by factoring that check out of stage1 into the debugfs_write routine.
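
A rough sketch of that factoring, with the "already active" check done once in
the write handler before either stage runs. All toy_* names are hypothetical;
the real stages and the debugfs handler take different arguments.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_client { bool active; int switches; };

/* Both stages assume the caller already rejected an active target. */
void toy_switch_stage1(struct toy_client *c) { c->switches++; }
void toy_switch_stage2(struct toy_client *c) { c->active = true; }

/* Hypothetical write handler: the !active check lives here, once. */
int toy_debugfs_write(struct toy_client *target)
{
    if (target == NULL)
        return -1;
    if (target->active)
        return 0;               /* already active: ignore the request */
    toy_switch_stage1(target);
    toy_switch_stage2(target);
    return 1;
}
```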

Reported-by: Igor Murzov <>
Tested-by: Igor Murzov <>
Signed-off-by: Florian Mickler <>
Signed-off-by: Dave Airlie <>
10 years agoMerge branch 'pablo/nf-2.6-updates' of git://
David S. Miller [Sun, 15 May 2011 22:14:02 +0000 (18:14 -0400)]
Merge branch 'pablo/nf-2.6-updates' of git://

10 years agoMerge git://
Linus Torvalds [Sun, 15 May 2011 17:22:10 +0000 (10:22 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable

* git://
  Btrfs: fix FS_IOC_SETFLAGS ioctl
  Btrfs: fix FS_IOC_GETFLAGS ioctl
  fs: remove FS_COW_FL
  Btrfs: fix easily get into ENOSPC in mixed case
  Prevent oopsing in posix_acl_valid()

10 years agoIPVS: fix netns if reading ip_vs_* procfs entries
Hans Schillstrom [Sun, 15 May 2011 15:20:29 +0000 (17:20 +0200)]
IPVS: fix netns if reading ip_vs_* procfs entries

Without this patch every access to ip_vs in procfs will increase
the netns count i.e. an unbalanced get_net()/put_net().
(ipvsadm commands also use procfs.)
The result is you can't exit a netns if reading ip_vs_* procfs entries.
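
The unbalanced reference count can be illustrated with a toy counter. The
names below are made up for this sketch and only imitate the get_net() /
put_net() pairing; they are not the IPVS procfs code.

```c
/* Hypothetical reference-counted namespace; names are illustrative. */
struct toy_net { int count; };

struct toy_net *toy_get_net(struct toy_net *net) { net->count++; return net; }
void toy_put_net(struct toy_net *net) { net->count--; }

/* Buggy pattern: the reference taken for the read is never dropped. */
void toy_proc_read_leaky(struct toy_net *net)
{
    toy_get_net(net);
    /* ... produce the procfs output ... */
}

/* Balanced pattern: every get is paired with a put. */
void toy_proc_read_balanced(struct toy_net *net)
{
    toy_get_net(net);
    /* ... produce the procfs output ... */
    toy_put_net(net);
}
```

After any number of balanced reads the count returns to its starting value,
so the namespace can be torn down; each leaky read leaves it one higher.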

Signed-off-by: Hans Schillstrom <>
Signed-off-by: Pablo Neira Ayuso <>
10 years agobridge: fix forwarding of IPv6
Stephen Hemminger [Fri, 13 May 2011 20:03:24 +0000 (16:03 -0400)]
bridge: fix forwarding of IPv6

The commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
    bridge: Reset IPCB when entering IP stack on NF_FORWARD
broke forwarding of IPv6 packets in bridge because it would
call br_parse_ip_options with an IPv6 packet.
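
The general shape of such a fix is to guard the IPv4-only step by protocol.
The sketch below assumes a simplified packet structure and hypothetical
function names; only the two EtherType values are real.

```c
#include <stdint.h>

#define TOY_ETH_P_IP   0x0800u  /* IPv4 EtherType */
#define TOY_ETH_P_IPV6 0x86DDu  /* IPv6 EtherType */

struct toy_skb { uint16_t protocol; int options_parsed; };

/* Stand-in for an IPv4-only option parser. */
void toy_parse_ipv4_options(struct toy_skb *skb) { skb->options_parsed = 1; }

/* Guard the IPv4-only step so IPv6 frames are forwarded untouched. */
void toy_forward_prepare(struct toy_skb *skb)
{
    if (skb->protocol == TOY_ETH_P_IP)
        toy_parse_ipv4_options(skb);
}
```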

Reported-by: Noah Meyerhans <>
Signed-off-by: Stephen Hemminger <>
Reviewed-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
Signed-off-by: Pablo Neira Ayuso <>
10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Sat, 14 May 2011 22:41:10 +0000 (15:41 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

* 'for-linus' of git://
  rbd: fix split bio handling
  rbd: fix leak of ops struct

10 years agoBtrfs: fix FS_IOC_SETFLAGS ioctl
Li Zefan [Fri, 15 Apr 2011 03:03:17 +0000 (03:03 +0000)]
Btrfs: fix FS_IOC_SETFLAGS ioctl

Steps to reproduce the bug:

  - Call FS_IOC_SETFLAGS ioctl with flags=FS_COMPR_FL
  - Call FS_IOC_SETFLAGS ioctl with flags=0
  - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set!
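
One plausible shape of a bug with these symptoms is a setter that only ever
turns bits on. The model below is an assumption made for illustration, not
the btrfs code; only the value of the compression flag matches FS_COMPR_FL in
<linux/fs.h>.

```c
/* Value matches FS_COMPR_FL in <linux/fs.h>; the rest is a toy model. */
#define TOY_FS_COMPR_FL 0x00000004u

struct toy_inode { unsigned int flags; };

/* Buggy setter: only ever turns bits on, so flags=0 clears nothing. */
void toy_setflags_buggy(struct toy_inode *inode, unsigned int flags)
{
    inode->flags |= flags;
}

/* Fixed setter: bits absent from the request are cleared as well. */
void toy_setflags_fixed(struct toy_inode *inode, unsigned int flags)
{
    inode->flags = flags;
}

unsigned int toy_getflags(const struct toy_inode *inode)
{
    return inode->flags;
}
```

Replaying the three steps against the buggy setter leaves the compression
flag set; against the fixed setter it reads back as 0.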

Signed-off-by: Li Zefan <>
Signed-off-by: Chris Mason <>
10 years agoBtrfs: fix FS_IOC_GETFLAGS ioctl
Li Zefan [Fri, 15 Apr 2011 03:03:06 +0000 (03:03 +0000)]
Btrfs: fix FS_IOC_GETFLAGS ioctl

As we've added per file compression/cow support.

Signed-off-by: Li Zefan <>
Signed-off-by: Chris Mason <>
10 years agofs: remove FS_COW_FL
Li Zefan [Fri, 15 Apr 2011 03:02:49 +0000 (03:02 +0000)]
fs: remove FS_COW_FL

FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file
COW in btrfs, but FS_NOCOW_FL is sufficient.

The fact is we don't have corresponding BTRFS_INODE_COW flag.

COW is default, and FS_NOCOW_FL can be used to switch off COW for
a single file.

If we mount btrfs with nodatacow, a newly created file will be set with
the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the

Signed-off-by: Li Zefan <>
Signed-off-by: Chris Mason <>
10 years agoBtrfs: fix easily get into ENOSPC in mixed case
liubo [Fri, 8 Apr 2011 08:44:37 +0000 (08:44 +0000)]
Btrfs: fix easily get into ENOSPC in mixed case

When a btrfs disk is created with the mixed data & metadata option, it will
have no pure data or pure metadata space info.

In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920
(Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at
the very beginning.  The problem is that this initialization does not take the
mixed case into account, which causes btrfs to easily get into ENOSPC in the
mixed case.
Signed-off-by: Liu Bo <>
Signed-off-by: Chris Mason <>
10 years agoPrevent oopsing in posix_acl_valid()
Daniel J Blueman [Tue, 3 May 2011 16:44:13 +0000 (16:44 +0000)]
Prevent oopsing in posix_acl_valid()

If posix_acl_from_xattr() returns an error code, a negative address is
dereferenced causing an oops; fix by checking for error code first.
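
The pattern can be sketched in userspace with imitations of the kernel's
error-pointer helpers. Everything prefixed toy_ is an assumption for
illustration, including the validator and the return values chosen here.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define TOY_MAX_ERRNO 4095

void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_acl { int count; };

/* Hypothetical validator that dereferences its argument. */
int toy_acl_valid(const struct toy_acl *acl)
{
    return acl->count > 0 ? 0 : -22;
}

/* Check for an encoded error before handing the pointer to the validator. */
long toy_check_acl(struct toy_acl *acl)
{
    if (toy_is_err(acl))
        return toy_ptr_err(acl);
    if (acl == NULL)
        return 0;
    return toy_acl_valid(acl);
}
```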

Signed-off-by: Daniel J Blueman <>
Reviewed-by: Josef Bacik <>
Signed-off-by: Chris Mason <>
10 years agoMerge branch 'upstream-linus' of git://
Linus Torvalds [Sat, 14 May 2011 19:19:18 +0000 (12:19 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://
  libata: fix oops when LPM is used with PMP

10 years agotmpfs: fix race between swapoff and writepage
Hugh Dickins [Sat, 14 May 2011 19:06:42 +0000 (12:06 -0700)]
tmpfs: fix race between swapoff and writepage

Shame on me!  Commit b1dea800ac39 "tmpfs: fix race between umount and
writepage" fixed the advertized race, but introduced another: as even
its comment makes clear, we cannot safely rely on a peek at list_empty()
while holding no lock - until info->swapped is set, shmem_unuse_inode()
may delete any formerly-swapped inode from the shmem_swaplist, which
in this case would leave a swap area impossible to swapoff.

Although I don't relish taking the mutex every time, I don't care much
for the alternatives either; and at least the peek at list_empty() in
shmem_evict_inode() (a hotter path since most inodes would never have
been swapped) remains safe, because we already truncated the whole file.

Signed-off-by: Hugh Dickins <>
Signed-off-by: Linus Torvalds <>
10 years agolibata: fix oops when LPM is used with PMP
Tejun Heo [Mon, 9 May 2011 14:04:11 +0000 (16:04 +0200)]
libata: fix oops when LPM is used with PMP

ae01b2493c (libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65)
added ATA_FLAG_NO_DIPM and made ata_eh_set_lpm() check the flag.
However, @ap is NULL if @link points to a PMP link and thus the
unconditional @ap->flags dereference leads to the following oops.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
  IP: [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  Pid: 295, comm: scsi_eh_4 Tainted: P   #1 System76, Inc. Serval Professional/Serval Professional
  RIP: 0010:[<ffffffff813f98e1>]  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  RSP: 0018:ffff880132defbf0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880132f40000 RCX: 0000000000000000
  RDX: ffff88013377c000 RSI: ffff880132f40000 RDI: 0000000000000000
  RBP: ffff880132defce0 R08: ffff88013377dc58 R09: ffff880132defd98
  R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
  R13: 0000000000000000 R14: ffff88013377c000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000018 CR3: 0000000001a03000 CR4: 00000000000406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process scsi_eh_4 (pid: 295, threadinfo ffff880132dee000, task ffff880133b416c0)
   0000000000000000 ffff880132defcc0 0000000000000000 ffff880132f42738
   ffffffff813ee8f0 ffffffff813eefe0 ffff880132defd98 ffff88013377f190
   ffffffffa00b3e30 ffffffff813ef030 0000000032defc60 ffff880100000000
  Call Trace:
   [<ffffffff81400867>] sata_pmp_error_handler+0x607/0xc30
   [<ffffffffa00b273f>] ahci_error_handler+0x1f/0x70 [libahci]
   [<ffffffff813faade>] ata_scsi_error+0x5be/0x900
   [<ffffffff813cf724>] scsi_error_handler+0x124/0x650
   [<ffffffff810834b6>] kthread+0x96/0xa0
   [<ffffffff8100cd64>] kernel_thread_helper+0x4/0x10
  Code: 8b 95 70 ff ff ff b8 00 00 00 00 48 3b 9a 10 2e 00 00 48 0f 44 c2 48 89 85 70 ff ff ff 48 8b 8d 70 ff ff ff f6 83 69 02 00 00 01 <48> 8b 41 18 0f 85 48 01 00 00 48 85 c9 74 12 48 8b 51 08 48 83
  RIP  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
   RSP <ffff880132defbf0>
  CR2: 0000000000000018

Fix it by testing @link->ap->flags instead.

stable: ATA_FLAG_NO_DIPM was added during 2.6.39 cycle but was
        backported to 2.6.37 and 38.  This is a fix for that and thus
        also applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <>
Reported-by: "Nathan A. Mourey II" <>
LKML-Reference: <1304555277.2059.2.camel@localhost.localdomain>
Cc: Connor H <>
Signed-off-by: Jeff Garzik <>
10 years agoMerge branch 'fbmem'
Linus Torvalds [Sat, 14 May 2011 18:24:32 +0000 (11:24 -0700)]
Merge branch 'fbmem'

* fbmem:
  Further fbcon sanity checking
  fbmem: fix remove_conflicting_framebuffers races

10 years agoRevert "libata: ahci_start_engine compliant to AHCI spec"
Tejun Heo [Sat, 14 May 2011 10:28:04 +0000 (12:28 +0200)]
Revert "libata: ahci_start_engine compliant to AHCI spec"

This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.

The commit causes command timeouts on AC plug/unplug.  It isn't yet
clear why.  As the commit was for a single rather obscure controller,
revert the change for now.

The problem was reported and bisected by Gu Rui in bug#34692.

Also, reported by Rafael and Michael in the following thread.

Signed-off-by: Tejun Heo <>
Reported-by: Gu Rui <>
Reported-by: Rafael J. Wysocki <>
Reported-by: Michael Leun <>
Cc: Jian Peng <>
Cc: Jeff Garzik <>
Signed-off-by: Linus Torvalds <>
10 years agoFurther fbcon sanity checking
Bruno Prémont [Sat, 14 May 2011 10:24:15 +0000 (12:24 +0200)]
Further fbcon sanity checking

This moves the

    if (num_registered_fb == FB_MAX)
            return -ENXIO;

check _AFTER_ the call to do_remove_conflicting_framebuffers() as this
would (now in a safe way) allow a native driver to replace the
conflicting one even if all slots in registered_fb[] are taken.

This also prevents unregistering a framebuffer that is no longer
registered (vga16f will unregister at module unload time even if the
frame buffer had been unregistered earlier due to being found

Signed-off-by: Bruno Prémont <>
Signed-off-by: Linus Torvalds <>
10 years agofbmem: fix remove_conflicting_framebuffers races
Linus Torvalds [Fri, 13 May 2011 23:16:41 +0000 (16:16 -0700)]
fbmem: fix remove_conflicting_framebuffers races

When a register_framebuffer() call results in us removing old
conflicting framebuffers, the new registration_lock doesn't protect that
situation.  And we can't just add the same locking to the function,
because these functions call each other: register_framebuffer() calls
remove_conflicting_framebuffers, which in turn calls
unregister_framebuffer for any conflicting entry.

In order to fix it, this just creates wrapper functions around all three
functions and makes the versions that actually do the work be called
"do_xxx()", leaving just the wrapper that gets the lock and calls the
worker function.

So the rule becomes simply that "do_xxx()" has to be called with the
lock held, and now do_register_framebuffer() can just call
do_remove_conflicting_framebuffers(), and that in turn can call
do_unregister_framebuffer(), and there is no deadlock, and we can hold
the registration lock over the whole sequence, fixing the races.
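
The wrapper/worker split can be sketched with a toy non-recursive lock. This
is a simplified model under assumptions: the toy_* names, the counter that
stands in for the registered framebuffers, and the assert-based lock are all
invented for illustration.

```c
#include <assert.h>

/* Toy non-recursive lock; taking it twice would deadlock a real mutex. */
static int toy_lock_held;

static void toy_lock(void)   { assert(!toy_lock_held); toy_lock_held = 1; }
static void toy_unlock(void) { assert(toy_lock_held);  toy_lock_held = 0; }

static int toy_registered;

/* do_*() workers: must be called with the lock held, never take it. */
static void do_toy_unregister(void)
{
    assert(toy_lock_held);
    toy_registered--;
}

static void do_toy_remove_conflicting(void)
{
    assert(toy_lock_held);
    while (toy_registered > 0)
        do_toy_unregister();
}

static void do_toy_register(void)
{
    assert(toy_lock_held);
    do_toy_remove_conflicting();
    toy_registered++;
}

/* Public wrappers: take the lock, call the worker, drop the lock. */
void toy_register(void)           { toy_lock(); do_toy_register();           toy_unlock(); }
void toy_remove_conflicting(void) { toy_lock(); do_toy_remove_conflicting(); toy_unlock(); }
void toy_unregister(void)         { toy_lock(); do_toy_unregister();         toy_unlock(); }
int toy_count(void)               { return toy_registered; }
```

Because only the wrappers take the lock, the workers can call each other
freely while the lock stays held across the whole register sequence.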

It also makes error cases simpler, and fixes one situation where we
would return from unregister_framebuffer() without releasing the lock,
pointed out by Bruno Prémont.

Tested-by: Bruno Prémont <>
Tested-by: Anca Emanuel <>
Signed-off-by: Linus Torvalds <>
10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Sat, 14 May 2011 00:29:03 +0000 (17:29 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mattst88/alpha-2.6

* 'for-linus' of git://
  alpha: Wire up syscalls new to 2.6.39
  alpha: convert to clocksource_register_hz

10 years agoalpha: Wire up syscalls new to 2.6.39
Michael Cree [Wed, 4 May 2011 08:14:50 +0000 (08:14 +0000)]
alpha: Wire up syscalls new to 2.6.39

Wire up the syscalls:
and adjust some whitespace in the neighbourhood to align comments.

Signed-off-by: Michael Cree <>
Signed-off-by: Matt Turner <>
10 years agoalpha: convert to clocksource_register_hz
John Stultz [Wed, 16 Feb 2011 06:34:49 +0000 (22:34 -0800)]
alpha: convert to clocksource_register_hz

Converts alpha to use clocksource_register_hz.

Signed-off-by: John Stultz <>
CC: Richard Henderson <>
CC: Ivan Kokshaysky <>
CC: Thomas Gleixner <>
Signed-off-by: Matt Turner <>
10 years agoMerge git://
Linus Torvalds [Fri, 13 May 2011 22:20:51 +0000 (15:20 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://
  bridge: fix forwarding of IPv6
  bonding,llc: Fix structure sizeof incompatibility for some PDUs
  ipv6: restore correct ECN handling on TCP xmit
  ne-h8300: Fix regression caused during net_device_ops conversion
  hydra: Fix regression caused during net_device_ops conversion
  zorro8390: Fix regression caused during net_device_ops conversion
  sfc: Always map MCDI shared memory as uncacheable
  ehea: Fix memory hotplug oops
  libertas: fix cmdpendingq locking
  iwlegacy: fix IBSS mode crashes
  ath9k: Fix a warning due to a queued work during S3 state
  mac80211: don't start the dynamic ps timer if not associated

10 years agoMerge branch 'bugfixes' of git://
Linus Torvalds [Fri, 13 May 2011 22:19:39 +0000 (15:19 -0700)]
Merge branch 'bugfixes' of git://

* 'bugfixes' of git://
  NFSv4.1: Ensure that layoutget uses the correct gfp modes
  NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list

10 years agorbd: fix split bio handling
Yehuda Sadeh [Fri, 13 May 2011 20:52:56 +0000 (13:52 -0700)]
rbd: fix split bio handling

The rbd driver currently splits bios when they span an object boundary.
However, the blk_end_request expects the completions to roll up the results
in block device order, and the split rbd/ceph ops can complete in any
order.  This patch adds a struct rbd_req_coll to track completion of split
requests and ensures that the results are passed back up to the block layer
in order.
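
The in-order roll-up can be sketched as follows: pieces may complete in any
order, but only the contiguous run of finished pieces is reported upward. The
structure and function names are hypothetical and much simpler than the
driver's struct rbd_req_coll.

```c
#include <stdbool.h>

#define TOY_MAX_PARTS 8

/* Hypothetical collection tracking the pieces of one split request. */
struct toy_req_coll {
    int total;                     /* number of pieces */
    int num_done;                  /* pieces already reported upward */
    bool done[TOY_MAX_PARTS];      /* completion seen for each piece */
};

/*
 * Record completion of piece `index` and return how many pieces can now be
 * reported to the block layer: only the contiguous run starting at num_done.
 */
int toy_coll_complete(struct toy_req_coll *coll, int index)
{
    int reported = 0;

    if (index < 0 || index >= coll->total)
        return 0;
    coll->done[index] = true;
    while (coll->num_done < coll->total && coll->done[coll->num_done]) {
        coll->num_done++;
        reported++;
    }
    return reported;
}
```

For a request split in three, completing piece 2 first reports nothing;
completing piece 0 reports one piece, and completing piece 1 then reports the
remaining two.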

This fixes errors where the file system gets completion of a read operation
that spans an object boundary before the data has actually arrived.  The
bug is easily reproduced with iozone with a working set larger than
available RAM.

Reported-by: Fyodor Ustinov <>
Signed-off-by: Yehuda Sadeh <>
Signed-off-by: Sage Weil <>
10 years agobridge: fix forwarding of IPv6
Stephen Hemminger [Fri, 13 May 2011 20:03:24 +0000 (16:03 -0400)]
bridge: fix forwarding of IPv6

The commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
    bridge: Reset IPCB when entering IP stack on NF_FORWARD
broke forwarding of IPv6 packets in bridge because it would
call br_parse_ip_options with an IPv6 packet.

Reported-by: Noah Meyerhans <>
Signed-off-by: Stephen Hemminger <>
Reviewed-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
10 years agodrm/i915: Revert i915.semaphore=1 default from i915 merge
Andy Lutomirski [Fri, 13 May 2011 16:14:54 +0000 (12:14 -0400)]
drm/i915: Revert i915.semaphore=1 default from i915 merge

My Q67 / i7-2600 box has rev09 Sandy Bridge graphics.  It hangs
instantly when GNOME loads and it hangs so hard the reset button
doesn't work.  Setting i915.semaphore=0 fixes it.

Semaphores were disabled in a1656b9090f7 ("drm/i915: Disable GPU
semaphores by default") in 2.6.38 but were then re-enabled (by mistake?)
by the merge 47ae63e0c2e5 ("Merge branch 'drm-intel-fixes' into

(It's worth noting that the offending change is i915_drv.c, which was
not marked as a conflict - although a 'git show --cc' on the merge does
show that neither parent had it set to 1)

Signed-off-by: Andy Lutomirski <>
Signed-off-by: Linus Torvalds <>
10 years agobonding,llc: Fix structure sizeof incompatibility for some PDUs
Vitalii Demianets [Thu, 12 May 2011 23:04:29 +0000 (23:04 +0000)]
bonding,llc: Fix structure sizeof incompatibility for some PDUs

With some combinations of arch/compiler (e.g. arm-linux-gcc) the sizeof
operator on a structure returns a value greater than expected. In cases where
the structure is used for mapping PDU fields, it may lead to unexpected results
(such as holes and alignment problems in skb data). __packed prevents this
undesired behavior.
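
A small sketch of the effect, assuming a GCC- or Clang-compatible compiler
(the packed attribute is an extension) and an invented three-byte header. The
size of the plain structure depends on the ABI, which is exactly the problem.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative 3-byte header; field names are hypothetical. */
struct toy_pdu_plain {
    uint8_t dsap;
    uint8_t ssap;
    uint8_t ctrl;
};

/* Same fields, but the compiler may not pad or round the size up. */
struct toy_pdu_packed {
    uint8_t dsap;
    uint8_t ssap;
    uint8_t ctrl;
} __attribute__((packed));

size_t toy_plain_size(void)  { return sizeof(struct toy_pdu_plain); }
size_t toy_packed_size(void) { return sizeof(struct toy_pdu_packed); }
```

The packed variant is 3 bytes everywhere, so it can be overlaid on packet
data; the plain variant is at least 3 bytes and may be larger on ABIs that
round structure sizes up.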

Signed-off-by: Vitalii Demianets <>
Signed-off-by: David S. Miller <>
10 years agovfs: micro-optimize acl_permission_check()
Linus Torvalds [Fri, 13 May 2011 18:51:01 +0000 (11:51 -0700)]
vfs: micro-optimize acl_permission_check()

It's a hot function, and we're better off not mixing types in the mask
calculations.  The compiler just ends up mixing 16-bit and 32-bit
operations, for no good reason.

So do everything in 'unsigned int' rather than mixing 'unsigned int'
masking with a 'umode_t' (16-bit) mode variable.
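
A simplified version of the mask calculation, done entirely in unsigned int,
might look like this. It is a sketch only: ACLs, capabilities and user
namespaces are left out, the function name and parameters are invented, and
-13 stands for -EACCES.

```c
#define TOY_MAY_EXEC  1u
#define TOY_MAY_WRITE 2u
#define TOY_MAY_READ  4u

/*
 * Simplified permission check done entirely in unsigned int.
 * `mode` holds the usual rwxrwxrwx bits; returns 0 if allowed, -13 (EACCES)
 * otherwise.
 */
int toy_permission_check(unsigned int mode, unsigned int mask,
                         int is_owner, int in_group)
{
    mask &= TOY_MAY_READ | TOY_MAY_WRITE | TOY_MAY_EXEC;

    if (is_owner)
        mode >>= 6;             /* use the owner's rwx triplet */
    else if (in_group)
        mode >>= 3;             /* use the group's rwx triplet */

    return (mask & ~mode) == 0 ? 0 : -13;
}
```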

This, together with the parent commit (47a150edc2ae: "Cache user_ns in
struct cred") makes acl_permission_check() much nicer.

Signed-off-by: Linus Torvalds <>
10 years agoCache user_ns in struct cred
Serge E. Hallyn [Fri, 13 May 2011 03:27:54 +0000 (04:27 +0100)]
Cache user_ns in struct cred

If !CONFIG_USERNS, have current_user_ns() defined to (&init_user_ns).

Get rid of _current_user_ns.  This requires nsown_capable() to be
defined in capability.c rather than as static inline in capability.h,
so do that.

Request_key needs init_user_ns defined at current_user_ns if
!CONFIG_USERNS, so forward-declare that in cred.h if !CONFIG_USERNS
at current_user_ns() define.

Compile-tested with and without CONFIG_USERNS.

Signed-off-by: Serge E. Hallyn <>
[ This makes a huge performance difference for acl_permission_check(),
  up to 30%.  And that is one of the hottest kernel functions for loads
  that are pathname-lookup heavy.  ]
Signed-off-by: Linus Torvalds <>
10 years agoOMAP3: set the core dpll clk rate in its set_rate function
Avinash H.M [Mon, 9 May 2011 12:29:40 +0000 (12:29 +0000)]
OMAP3: set the core dpll clk rate in its set_rate function

The debug l3_ick/rate is not displaying the actual rate of the clock in
hardware. This is because the core dpll set_rate function doesn't update
clk.rate. After the fix, l3_ick/rate displays proper values.

Signed-off-by: Shweta Gulati <>
Signed-off-by: Avinash.H.M <>
Cc: Rajendra Nayak <>
Cc: Paul Walmsley <>
Acked-by: Paul Walmsley <>
Signed-off-by: Tony Lindgren <>
10 years agodrm/radeon/kms: add some evergreen/ni safe regs
Alex Deucher [Fri, 13 May 2011 01:15:15 +0000 (21:15 -0400)]
drm/radeon/kms: add some evergreen/ni safe regs

These need to be programmed from the userspace drivers.

Signed-off-by: Alex Deucher <>
Signed-off-by: Dave Airlie <>
10 years agodrm/radeon/kms: fix extended lvds info parsing
Alex Deucher [Wed, 11 May 2011 18:02:07 +0000 (14:02 -0400)]
drm/radeon/kms: fix extended lvds info parsing

On rev <= 1.1 tables, the offset is absolute,
on newer tables, it's relative.


Signed-off-by: Alex Deucher <>
Reviewed-by: Jerome Glisse <>
Signed-off-by: Dave Airlie <>
10 years agodrm/radeon/kms: fix tiling reg on fusion
Alex Deucher [Wed, 11 May 2011 07:15:24 +0000 (03:15 -0400)]
drm/radeon/kms: fix tiling reg on fusion

The location of MC_ARB_RAMCFG changed on fusion.
I've diffed all the other regs in evergreend.h and this
is the only other reg that changed.

Signed-off-by: Alex Deucher <>
Signed-off-by: Dave Airlie <>
10 years agorbd: fix leak of ops struct
Sage Weil [Thu, 12 May 2011 23:13:54 +0000 (16:13 -0700)]
rbd: fix leak of ops struct

The ops vector must be freed by the rbd_do_request caller.

Signed-off-by: Sage Weil <>
10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Fri, 13 May 2011 01:16:13 +0000 (18:16 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://
  SELinux: delete debugging printks from filename_trans rule processing

10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Fri, 13 May 2011 01:00:09 +0000 (18:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs

* 'for-linus' of git://
  net/9p/protocol.c: Fix a memory leak

10 years agoMerge branch 'for-2639-rc7/i2c-fixes' of git://
Linus Torvalds [Thu, 12 May 2011 23:55:33 +0000 (16:55 -0700)]
Merge branch 'for-2639-rc7/i2c-fixes' of git://

* 'for-2639-rc7/i2c-fixes' of git://
  i2c: pnx: Fix crash due to wrong init of timer->data

10 years agoMerge branch 'for-linus' of git:// into for...
James Morris [Thu, 12 May 2011 23:52:16 +0000 (09:52 +1000)]
Merge branch 'for-linus' of git:// into for-linus

10 years agoi2c: pnx: Fix crash due to wrong init of timer->data
Wolfram Sang [Fri, 29 Apr 2011 13:30:02 +0000 (15:30 +0200)]
i2c: pnx: Fix crash due to wrong init of timer->data

alg_data is already a pointer which must be passed directly.
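
The mistake is easy to make with the old-style timer callback, which receives
an integer-sized cookie. A sketch with invented names (not the driver's
structures) follows; it assumes unsigned long is wide enough to hold a
pointer, as it is on Linux.

```c
#include <stdint.h>

struct toy_alg_data { int timeouts; };

/* Timer callback in the old style: receives an integer-sized cookie. */
void toy_timeout(unsigned long data)
{
    struct toy_alg_data *alg = (struct toy_alg_data *)(uintptr_t)data;

    alg->timeouts++;
}

/* Correct setup: store the pointer value itself, not its address. */
unsigned long toy_timer_cookie(struct toy_alg_data *alg_data)
{
    return (unsigned long)(uintptr_t)alg_data;   /* not &alg_data */
}
```

Storing the address of the pointer variable instead would make the callback
treat unrelated memory as the structure.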

Reported-by: Dieter Ripp <>
Signed-off-by: Wolfram Sang <>
Cc: Russell King <>
Cc: Ben Dooks <>
Signed-off-by: Ben Dooks <>
10 years agoipv6: restore correct ECN handling on TCP xmit
Steinar H. Gunderson [Fri, 6 May 2011 23:44:46 +0000 (23:44 +0000)]
ipv6: restore correct ECN handling on TCP xmit

Since commit e9df2e8fd8fbc9 (Use appropriate sock tclass setting for
routing lookup) we lost ability to properly add ECN codemarks to ipv6
TCP frames.

It seems like TCP_ECN_send() calls INET_ECN_xmit(), which only sets the
ECN bit in the IPv4 ToS field (inet_sk(sk)->tos), but after the patch,
what's checked is inet6_sk(sk)->tclass, which is a completely different

Close bug

[Eric Dumazet] : added the INET_ECN_dontxmit() fix and replace macros
by inline functions for clarity.
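
The fix described above can be sketched as a small Python model rather
than kernel C. The Sock class and its fields are hypothetical stand-ins
for inet_sk(sk)->tos and inet6_sk(sk)->tclass; the two constants are the
ECN codepoint values from RFC 3168. This is an illustration of the idea,
not the patch itself.

```python
from dataclasses import dataclass

# ECN codepoints occupy the low two bits of the IPv4 ToS / IPv6 traffic class.
INET_ECN_ECT_0 = 0x2
INET_ECN_MASK = 0x3


@dataclass
class Sock:
    """Hypothetical stand-in for a socket carrying IPv4 and IPv6 state."""
    tos: int = 0           # models inet_sk(sk)->tos
    tclass: int = 0        # models inet6_sk(sk)->tclass
    is_ipv6: bool = False


def ecn_xmit(sk: Sock) -> None:
    """Mark the socket ECN-capable in every field the xmit path may read."""
    sk.tos |= INET_ECN_ECT_0
    if sk.is_ipv6:
        sk.tclass |= INET_ECN_ECT_0


def ecn_dontxmit(sk: Sock) -> None:
    """Clear the ECN bits again, in both fields."""
    sk.tos &= ~INET_ECN_MASK
    if sk.is_ipv6:
        sk.tclass &= ~INET_ECN_MASK
```

Keeping both fields in step means the routing lookup sees the same ECN
bits whichever field it consults.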

Signed-off-by: Steinar H. Gunderson <>
Signed-off-by: Eric Dumazet <>
Cc: YOSHIFUJI Hideaki <>
Cc: Andrew Morton <>
Signed-off-by: David S. Miller <>
10 years agovsprintf: Turn kptr_restrict off by default
Ingo Molnar [Thu, 12 May 2011 21:00:28 +0000 (23:00 +0200)]
vsprintf: Turn kptr_restrict off by default

kptr_restrict has been triggering bugs in apps such as perf, and it also makes
the system less useful by default, so turn it off by default.

This is how we generally handle security features that remove functionality,
such as firewall code or SELinux - they have to be configured and activated
from user-space.

Distributions can turn kptr_restrict on again via this line in

kernel.kptr_restrict = 1

( Also mark the variable __read_mostly while at it, as it's typically modified
  only once per bootup, or not at all. )
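
To check the current setting from user space, a minimal Python sketch
follows. It assumes the sysctl is exposed at
/proc/sys/kernel/kptr_restrict, the usual procfs location for kernel.*
sysctls; the helper names are made up for illustration.

```python
from pathlib import Path

# Assumed procfs location of the kernel.kptr_restrict sysctl.
KPTR_RESTRICT = Path("/proc/sys/kernel/kptr_restrict")


def parse_kptr_restrict(text: str) -> int:
    """Parse the sysctl file contents into an integer level."""
    return int(text.strip())


def kptr_restrict_enabled(path: Path = KPTR_RESTRICT) -> bool:
    """Return True when pointer hiding is switched on (level >= 1)."""
    return parse_kptr_restrict(path.read_text()) >= 1
```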

Signed-off-by: Ingo Molnar <>
Acked-by: David S. Miller <>
Signed-off-by: Linus Torvalds <>
10 years agonet/9p/protocol.c: Fix a memory leak
Pedro Scarapicchia Junior [Mon, 9 May 2011 14:10:49 +0000 (14:10 +0000)]
net/9p/protocol.c: Fix a memory leak

When p9pdu_readf() is called with "s" attribute, it allocates a pointer that
will store a string. In p9dirent_read(), this pointer is not being released,
leading to out of memory errors.
This patch releases this pointer after the string is copied to dirent->d_name.

Signed-off-by: Pedro Scarapicchia Junior <>
Signed-off-by: Eric Van Hensbergen <>
10 years agone-h8300: Fix regression caused during net_device_ops conversion
Geert Uytterhoeven [Thu, 12 May 2011 09:11:40 +0000 (09:11 +0000)]
ne-h8300: Fix regression caused during net_device_ops conversion

Changeset dcd39c90290297f6e6ed8a04bb20da7ac2b043c5 ("ne-h8300: convert to
net_device_ops") broke ne-h8300 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in ne-h8300.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.

Fix based on commits 217cbfa856dc1cbc2890781626c4032d9e3ec59f ("mac8390:
fix regression caused during net_device_ops conversion") and
4e0168fa4842e27795a75b205a510f25b62181d9 ("mac8390: fix build with

Signed-off-by: Geert Uytterhoeven <>
Signed-off-by: David S. Miller <>
10 years agohydra: Fix regression caused during net_device_ops conversion
Geert Uytterhoeven [Thu, 12 May 2011 09:11:39 +0000 (09:11 +0000)]
hydra: Fix regression caused during net_device_ops conversion

Changeset 5618f0d1193d6b051da9b59b0e32ad24397f06a4 ("hydra: convert to
net_device_ops") broke hydra by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in hydra.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.

Fix based on commits 217cbfa856dc1cbc2890781626c4032d9e3ec59f ("mac8390:
fix regression caused during net_device_ops conversion") and
4e0168fa4842e27795a75b205a510f25b62181d9 ("mac8390: fix build with

Signed-off-by: Geert Uytterhoeven <>
Signed-off-by: David S. Miller <>
10 years agozorro8390: Fix regression caused during net_device_ops conversion
Geert Uytterhoeven [Thu, 12 May 2011 09:11:38 +0000 (09:11 +0000)]
zorro8390: Fix regression caused during net_device_ops conversion

Changeset b6114794a1c394534659f4a17420e48cf23aa922 ("zorro8390: convert to
net_device_ops") broke zorro8390 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in zorro8390.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.

Fix based on commits 217cbfa856dc1cbc2890781626c4032d9e3ec59f ("mac8390:
fix regression caused during net_device_ops conversion") and
4e0168fa4842e27795a75b205a510f25b62181d9 ("mac8390: fix build with

Reported-by: Christian T. Steigies <>
Suggested-by: Finn Thain <>
Signed-off-by: Geert Uytterhoeven <>
Tested-by: Christian T. Steigies <>
Signed-off-by: David S. Miller <>
10 years agoMerge branch 'sfc-2.6.39' of git://
David S. Miller [Thu, 12 May 2011 20:26:45 +0000 (16:26 -0400)]
Merge branch 'sfc-2.6.39' of git://git./linux/kernel/git/bwh/sfc-2.6

10 years agoSELinux: delete debugging printks from filename_trans rule processing
Eric Paris [Thu, 7 Apr 2011 18:46:59 +0000 (14:46 -0400)]
SELinux: delete debugging printks from filename_trans rule processing

The filename_trans rule processing has some printk(KERN_ERR ) messages
which were intended as debug aids in creating the code but weren't removed
before it was submitted.  Remove them.

Reported-by: Paul Bolle <>
Signed-off-by: Eric Paris <>
10 years agoMerge branch 'fix/asoc' of git://
Linus Torvalds [Thu, 12 May 2011 19:41:30 +0000 (12:41 -0700)]
Merge branch 'fix/asoc' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'fix/asoc' of git://
  ASoC: WM8903: Fix Digital Capture Volume range
  ASoC: UDA134x: Remove POWER_OFF_ON_STANDBY define.
  ASoC: SSM2602: Fix reg_cache_size
  ASoC: SSM2602: Fix 'Mic Boost2' control
  ASoC: SSM2602: Properly annotate i2c probe and remove functions
  ASoC: sst_platform: add hw_free callback to fix resource leak
  ASoC: Don't crash on PM operations
  ASoC: JZ4740: Fix i2s shutdown

10 years agoMerge branch 'stable/bug-fixes-for-rc7' of git://
Linus Torvalds [Thu, 12 May 2011 19:21:51 +0000 (12:21 -0700)]
Merge branch 'stable/bug-fixes-for-rc7' of git://git./linux/kernel/git/konrad/xen

* 'stable/bug-fixes-for-rc7' of git://
  x86/mm: Fix section mismatch derived from native_pagetable_reserve()
  x86,xen: introduce x86_init.mapping.pagetable_reserve
  Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""

10 years agoRevert "drm/i915: Only enable the plane after setting the fb base (pre-ILK)"
Linus Torvalds [Thu, 12 May 2011 19:19:43 +0000 (12:19 -0700)]
Revert "drm/i915: Only enable the plane after setting the fb base (pre-ILK)"

This reverts commit 49183b2818de6899383bb82bc032f9344d6791ff.

Quoth Franz Melchior:

  "This patch introduces a bug on my infamous "Acer Travelmate
   5735Z-452G32Mnss": when KMS takes over, the frame buffer contents get
   completely garbled up on screen, with colored stripes and unreadable
   text (photo on request).  Only when X11 is started, the screen gets
   restored again.  Closing and re-opening the lid partly cures the
   mess, too: it makes the font readable, though horizontally stretched."

Acked-by: Keith Packard <>
Cc: Chris Wilson <>
Cc: Daniel Vetter <>
Cc: Jesse Barnes <>
Signed-off-by: Linus Torvalds <>
10 years agoMerge branch 'fbmem'
Linus Torvalds [Thu, 12 May 2011 17:42:36 +0000 (10:42 -0700)]
Merge branch 'fbmem'

* fbmem:
  fbmem: make read/write/ioctl use the frame buffer at open time
  fbcon: add lifetime refcount to opened frame buffers

10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 12 May 2011 17:41:31 +0000 (10:41 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://
  Input: ads7846 - remove unused variable from struct ads7845_ser_req
  Input: ads7846 - make transfer buffers DMA safe

10 years agox86/mm: Fix section mismatch derived from native_pagetable_reserve()
Sedat Dilek [Sun, 17 Apr 2011 14:17:34 +0000 (16:17 +0200)]
x86/mm: Fix section mismatch derived from native_pagetable_reserve()

With CONFIG_DEBUG_SECTION_MISMATCH=y I see these warnings in next-20110415:

  LD      vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x1ba48): Section mismatch in reference from the function native_pagetable_reserve() to the function .init.text:memblock_x86_reserve_range()
The function native_pagetable_reserve() references
the function __init memblock_x86_reserve_range().
This is often because native_pagetable_reserve lacks a __init
annotation or the annotation of memblock_x86_reserve_range is wrong.

This patch fixes the issue.
Thanks to pipacs from PaX project for help on IRC.

Acked-by: "H. Peter Anvin" <>
Signed-off-by: Sedat Dilek <>
Signed-off-by: Konrad Rzeszutek Wilk <>
10 years agox86,xen: introduce x86_init.mapping.pagetable_reserve
Stefano Stabellini [Thu, 14 Apr 2011 14:49:41 +0000 (15:49 +0100)]
x86,xen: introduce x86_init.mapping.pagetable_reserve

Introduce a new x86_init hook called pagetable_reserve that at the end
of init_memory_mapping is used to reserve a range of memory addresses for
the kernel pagetable pages we used and free the other ones.

On native it just calls memblock_x86_reserve_range while on xen it also
takes care of setting the spare memory previously allocated
for kernel pagetable pages from RO to RW, so that it can be used for
other purposes.

A detailed explanation of the reason why this hook is needed follows.

As a consequence of the commit:

commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e
Author: Yinghai Lu <>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, Xen will find that the guest already has
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason we need a hook to reserve the kernel pagetable pages we
used and free the other ones so that they can be reused for other purposes.
On native it just means calling memblock_x86_reserve_range, on Xen it
also means marking RW the pagetable pages that we allocated before but
that haven't been used before.

Another way to fix this without using the hook is by adding an 'if
(xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
counterpart, but that is just nasty.
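
The shape of the hook can be sketched as a toy Python model. Everything
here is a simplification for illustration: the Memory class is invented,
and pgt_buf_top is passed explicitly so the example is self-contained,
which is an assumption and not necessarily how the kernel hook receives
it.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open [start, end) physical range


class Memory:
    """Toy model recording which ranges were reserved or remapped RW."""

    def __init__(self) -> None:
        self.reserved: List[Range] = []
        self.set_rw: List[Range] = []


def native_pagetable_reserve(mem: Memory, start: int, end: int, top: int) -> None:
    # Native: reserve only the pagetable pages that were actually used.
    mem.reserved.append((start, end))


def xen_pagetable_reserve(mem: Memory, start: int, end: int, top: int) -> None:
    # Xen: make the same reservation, then flip the unused tail
    # [end, top) back from RO to RW so it can be reused.
    native_pagetable_reserve(mem, start, end, top)
    if end < top:
        mem.set_rw.append((end, top))


Hook = Callable[[Memory, int, int, int], None]


def init_memory_mapping(mem: Memory, hook: Hook, pgt_buf_start: int,
                        pgt_buf_end: int, pgt_buf_top: int) -> None:
    # The generic code calls the hook once, with no Xen-specific branch.
    hook(mem, pgt_buf_start, pgt_buf_end, pgt_buf_top)
```

The point of the indirection is that init_memory_mapping stays free of
any platform test; each platform supplies its own function.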

Signed-off-by: Stefano Stabellini <>
Acked-by: Yinghai Lu <>
Acked-by: H. Peter Anvin <>
Cc: Ingo Molnar <>
Signed-off-by: Konrad Rzeszutek Wilk <>
10 years agoRevert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""
Konrad Rzeszutek Wilk [Thu, 5 May 2011 17:50:43 +0000 (13:50 -0400)]
Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""

This reverts commit a38647837a411f7df79623128421eef2118b5884.

It does not work with certain AMD machines.

last_pfn = 0x100000 max_arch_pfn = 0x400000000
initial memory mapped : 0 - 02c3a000
Base memory trampoline at [ffff88000009b000] 9b000 size 20480
init_memory_mapping: 0000000000000000-0000000100000000
 0000000000 - 0100000000 page 4k
kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
init_memory_mapping: 0000000100000000-00000001e0800000
 0100000000 - 01e0800000 page 4k
kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
xen: setting RW the range fffdc000 - 100000000
RAMDISK: 0203b000 - 02c3a000
No NUMA configuration found
Faking a node at 0000000000000000-00000001e0800000
NUMA: Using 63 for the hash shift.
Initmem setup node 0 0000000000000000-00000001e0800000
  NODE_DATA [00000001dfffb000 - 00000001dfffffff]
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
Oops: 0003 [#1] SMP
last sysfs file:
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
 0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
 ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
 ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
Call Trace:
 [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6f67>] numa_init+0x6c/0x70
 [<ffffffff81cf7057>] initmem_init+0x39/0x3b
 [<ffffffff81ce5865>] setup_arch+0x64e/0x769
 [<ffffffff815e43c1>] ? printk+0x51/0x53
 [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
 [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
 [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
 RSP <ffffffff81c01e38>
CR2: 0000000000000000
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1

Reported-by: Stefan Bader <>
Signed-off-by: Konrad Rzeszutek Wilk <>
10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 12 May 2011 15:06:53 +0000 (08:06 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://
  fuse: fix oops in revalidate when called with NULL nameidata

10 years agoMerge git://
Linus Torvalds [Thu, 12 May 2011 14:53:34 +0000 (07:53 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6

* git://
  sparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic
  sparc32: fix sparcstation 5 boot
  sparc32: fix section mismatch warnings in apc, pmc and time_32

10 years agoMerge branch 'fixes' of
Linus Torvalds [Thu, 12 May 2011 14:53:06 +0000 (07:53 -0700)]
Merge branch 'fixes' of /home/rmk/linux-2.6-arm

* 'fixes' of
  ARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses
  ARM: 6892/1: handle ptrace requests to change PC during interrupted system calls
  ARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM
  ARM: zImage: the page table memory must be considered before relocation
  ARM: zImage: make sure not to relocate on top of the relocation code
  ARM: zImage: Fix bad SP address after relocating kernel
  ARM: zImage: make sure the stack is 64-bit aligned
  ARM: RiscPC: acornfb: fix section mismatches
  ARM: RiscPC: etherh: fix section mismatches

10 years agofbmem: make read/write/ioctl use the frame buffer at open time
Linus Torvalds [Wed, 11 May 2011 21:58:34 +0000 (14:58 -0700)]
fbmem: make read/write/ioctl use the frame buffer at open time

read/write/ioctl on a fbcon file descriptor has traditionally used the
fbcon not when it was opened, but as it was at the time of the call.
That makes no sense, but the lack of sense is much more obvious now that
we properly ref-count the usage - it means that the ref-counting doesn't
actually protect operations we do on the frame buffer.

This changes it to look at the fb_info that we got at open time, but in
order to avoid using a frame buffer long after it has been unregistered,
we do verify that it is still current, and return -ENODEV if not.
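
A minimal Python sketch of that check, under invented names: the
registry dict, FbFile and fb_read are illustrative only, and the model
ignores locking entirely.

```python
import errno
from typing import Dict, Optional


class FbInfo:
    def __init__(self, name: str) -> None:
        self.name = name


# Toy registry: minor number -> currently registered frame buffer.
registered_fb: Dict[int, FbInfo] = {}


class FbFile:
    """Models an open file: remembers the fb_info seen at open time."""

    def __init__(self, minor: int) -> None:
        self.minor = minor
        self.info = registered_fb[minor]


def file_fb_info(f: FbFile) -> Optional[FbInfo]:
    """Return the open-time fb_info only if it is still the registered one."""
    if registered_fb.get(f.minor) is not f.info:
        return None
    return f.info


def fb_read(f: FbFile) -> str:
    info = file_fb_info(f)
    if info is None:
        raise OSError(errno.ENODEV, "frame buffer no longer registered")
    return f"read from {info.name}"
```

An operation on a stale descriptor fails cleanly instead of silently
acting on whichever frame buffer replaced the original.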

Acked-by: Tim Gardner <>
Tested-by: Daniel J Blueman <>
Tested-by: Anca Emanuel <>
Cc: Bruno Prémont <>
Cc: Alan Cox <>
Cc: Paul Mundt <>
Cc: Dave Airlie <>
Cc: Andy Whitcroft <>
Signed-off-by: Linus Torvalds <>
10 years agofbcon: add lifetime refcount to opened frame buffers
Linus Torvalds [Wed, 11 May 2011 21:49:36 +0000 (14:49 -0700)]
fbcon: add lifetime refcount to opened frame buffers

This just adds the refcount and the new registration lock logic.  It
does not (for example) actually change the read/write/ioctl routines to
actually use the frame buffer that was opened: those functions still end
up always using whatever the current frame buffer is at the time of the

Without this, if something holds the frame buffer open over a
framebuffer switch, the close() operation after the switch will access a
fb_info that has been free'd by the unregistering of the old frame

(The read/write/ioctl operations will normally not cause problems,
because they will - illogically - pick up the new fbcon instead.  But a
switch that happens just as one of those is going on might see problems
too, the window is just much smaller: one individual op rather than the
whole open-close sequence.)

This use-after-free is apparently fairly easily triggered by the Ubuntu
11.04 boot sequence.
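
The lifetime rule can be sketched in a few lines of Python. The class
and function names are illustrative, and the "freed" flag stands in for
the real memory release; no locking is modelled.

```python
class FbInfo:
    """Toy frame buffer object with a lifetime reference count."""

    def __init__(self) -> None:
        self.count = 1        # reference held by the registration itself
        self.freed = False


def get_fb_info(info: FbInfo) -> FbInfo:
    """Take a reference, e.g. on open()."""
    assert not info.freed, "use after free"
    info.count += 1
    return info


def put_fb_info(info: FbInfo) -> None:
    """Drop a reference; the last one out frees the object."""
    assert info.count > 0
    info.count -= 1
    if info.count == 0:
        info.freed = True
```

With this in place, unregistering while a descriptor is open only drops
the registration's reference; the object survives until close().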

Acked-by: Tim Gardner <>
Tested-by: Daniel J Blueman <>
Tested-by: Anca Emanuel <>
Cc: Bruno Prémont <>
Cc: Alan Cox <>
Cc: Paul Mundt <>
Cc: Dave Airlie <>
Cc: Andy Whitcroft <>
Signed-off-by: Linus Torvalds <>
10 years agosfc: Always map MCDI shared memory as uncacheable
Ben Hutchings [Wed, 11 May 2011 16:41:18 +0000 (17:41 +0100)]
sfc: Always map MCDI shared memory as uncacheable

We enabled write-combining for memory-mapped registers in commit
65f0b417dee94f779ce9b77102b7d73c93723b39, but inhibited it for the
MCDI shared memory where this is not supported.  However,
write-combining mappings also allow read-reordering, which may also
be a problem.

I found that when an SFC9000-family controller is connected to an
Intel 3000 chipset, and write-combining is enabled, the controller
stops responding to PCIe read requests during driver initialisation
while the driver is polling for completion of an MCDI command.  This
results in an NMI and system hang.  Adding read memory barriers
between all reads to the shared memory area appears to reduce but not
eliminate the probability of this.

We have not yet established whether this is a bug in our BIU or in the
PCIe bridge.  For now, work around by mapping the shared memory area

Signed-off-by: Ben Hutchings <>
10 years agoARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses
Catalin Marinas [Wed, 6 Apr 2011 15:18:47 +0000 (16:18 +0100)]
ARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses

Since mandatory barriers may be used (explicitly or implicitly via readl
etc.) to ensure the ordering between Device and Normal memory accesses,
a DMB is not enough. This patch converts it to a DSB.

Cc: Colin Cross <>
Signed-off-by: Catalin Marinas <>
Signed-off-by: Russell King <>
10 years agoARM: 6892/1: handle ptrace requests to change PC during interrupted system calls
Arnd Bergmann [Tue, 3 May 2011 17:32:55 +0000 (18:32 +0100)]
ARM: 6892/1: handle ptrace requests to change PC during interrupted system calls

GDB's interrupt.exp test cases currently fail on ARM.  The problem is how do_signal
handled restarting interrupted system calls:

The entry.S assembler code determines that we come from a system call; and that
information is passed as "syscall" parameter to do_signal.  That routine then
calls get_signal_to_deliver [*] and if a signal is to be delivered, calls into
handle_signal.  If a system call is to be restarted either after the signal
handler returns, or if no handler is to be called in the first place, the PC
is updated after the get_signal_to_deliver call, either in handle_signal (if
we have a handler) or at the end of do_signal (otherwise).

Now the problem is that during [*], the call to get_signal_to_deliver, a ptrace
intercept may happen.  During this intercept, the debugger may change registers,
including the PC.  This is done by GDB if it wants to execute an "inferior call",
i.e. the execution of some code in the debugged program triggered by GDB.

To this purpose, GDB will save all registers, allocate a stack frame, set up
PC and arguments as appropriate for the call, and point the link register to
a dummy breakpoint instruction.  Once the process is restarted, it will execute
the call and then trap back to the debugger, at which point GDB will restore
all registers and continue original execution.

This generally works fine.  However, now consider what happens when GDB attempts
to do exactly that while the process was interrupted during execution of a to-be-
restarted system call:  do_signal is called with the syscall flag set; it calls
get_signal_to_deliver, at which point the debugger takes over and changes the PC
to point to a completely different place.  Now get_signal_to_deliver returns
without a signal to deliver; but now do_signal decides it should be restarting
a system call, and decrements the PC by 2 or 4 -- so it now points to 2 or 4
bytes before the function GDB wants to call -- which leads to a subsequent crash.

To fix this problem, two things need to be supported:
- do_signal must be able to recognize that get_signal_to_deliver changed the PC
  to a different location, and skip the restart-syscall sequence
- once the debugger has restored all registers at the end of the inferior call
  sequence, do_signal must recognize that *now* it needs to restart the pending
  system call, even though it was now entered from a breakpoint instead of an
  actual svc instruction

This set of issues is solved on other platforms, usually by one of two

- The status information "do_signal is handling a system call that may need
  restarting" is itself carried in some register that can be accessed via
  ptrace.  This is e.g. on Intel the "orig_eax" register; on Sparc the kernel
  defines a magic extra bit in the flags register for this purpose.
  This allows GDB to manage that state: reset it when doing an inferior call,
  and restore it after the call is finished.

- On s390, do_signal transparently handles this problem without requiring
  GDB interaction, by performing system call restarting in the following
  way: first, adjust the PC as necessary for restarting the call.  Then,
  call get_signal_to_deliver; and finally just continue execution at the
  PC.  This way, if GDB does not change the PC, everything is as before.
  If GDB *does* change the PC, execution will simply continue there --
  and once GDB restores the PC it saved at that point, it will automatically
  point to the *restarted* system call.  (There is the minor twist how to
  handle system calls that do *not* need restarting -- do_signal will undo
  the PC change in this case, after get_signal_to_deliver has returned, and
  only if ptrace did not change the PC during that call.)

Because there does not appear to be any obvious register to carry the
syscall-restart information on ARM, we'd either have to introduce a new
artificial ptrace register just for that purpose, or else handle the issue
transparently like on s390.  The patch below implements the second option;
using this patch makes the interrupt.exp test cases pass on ARM, with no
regression in the GDB test suite otherwise.
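
The s390-style ordering can be sketched as a Python model. This is a
simplification under stated assumptions: the Regs class, the
wants_restart flag and the fixed INSN_SIZE of 4 (ARM mode; Thumb would
use 2) are all illustrative, and the real decision about restarting
depends on the syscall return value and the handler's flags.

```python
from typing import Callable, Optional

INSN_SIZE = 4  # assumed size of the svc instruction in ARM mode


class Regs:
    def __init__(self, pc: int) -> None:
        self.pc = pc


def do_signal(regs: Regs, syscall: bool, wants_restart: bool,
              get_signal_to_deliver: Callable[[Regs], Optional[int]]
              ) -> Optional[int]:
    continue_addr = regs.pc
    restart_addr = continue_addr - INSN_SIZE

    if syscall:
        # Step 1: rewind the PC first, before any ptrace stop can happen.
        regs.pc = restart_addr

    # Step 2: a debugger may intercept here and change regs.pc.
    signr = get_signal_to_deliver(regs)

    # Step 3: undo the rewind only if no restart is wanted and the
    # debugger left the PC alone.
    if syscall and not wants_restart and regs.pc == restart_addr:
        regs.pc = continue_addr
    return signr
```

If the debugger saves the PC during step 2, it saves the already
rewound value, so restoring it later lands on the restarted system call
without any further bookkeeping.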

Signed-off-by: Ulrich Weigand <>
Signed-off-by: Arnd Bergmann <>
Signed-off-by: Russell King <>
10 years agoARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM
Will Deacon [Thu, 28 Apr 2011 17:44:31 +0000 (18:44 +0100)]
ARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM

The SPARSEMEM code allocates memmap entries only for sections which are
present (i.e. those which contain some valid memory). The membank checks
in free_unused_memmap do not take this into account and can incorrectly
attempt to free memory which is not allocated, resulting in a BUG() in
the bootmem code.

However, if memory is configured as follows:

    | bank 0 | unused |              | bank 1 | unused |

where a bank only occupies part of a section, the memmap allocated for
the remainder of the section *can* be freed.

This patch modifies the checks in free_unused_memmap so that only valid
memmap entries are considered for removal.
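
As an illustration of the rule (not the actual free_unused_memmap code),
the following Python sketch computes which parts of the gaps between
banks have a memmap that can be freed. It assumes sorted, non-empty,
non-overlapping banks given as page frame ranges, and only looks at gaps
between banks, not at the tail after the last one.

```python
from typing import List, Tuple

PfnRange = Tuple[int, int]  # half-open [start_pfn, end_pfn)


def freeable_memmap(banks: List[PfnRange],
                    pages_per_section: int) -> List[PfnRange]:
    """Return the gaps between banks whose memmap was actually allocated."""
    banks = sorted(banks)

    # A section has a memmap only if some bank overlaps it.
    present = set()
    for start, end in banks:
        present.update(range(start // pages_per_section,
                             (end - 1) // pages_per_section + 1))

    result = []
    for (_, prev_end), (next_start, _) in zip(banks, banks[1:]):
        first = prev_end // pages_per_section
        last = (next_start - 1) // pages_per_section
        for sec in range(first, last + 1):
            if sec not in present:
                continue  # no memmap was allocated here, nothing to free
            lo = max(prev_end, sec * pages_per_section)
            hi = min(next_start, (sec + 1) * pages_per_section)
            if lo < hi:
                result.append((lo, hi))
    return result
```

For two banks in sections 0 and 4 with 256 pages per section, only the
unused remainder of section 0 is returned; sections 1 to 3 are skipped
because they never had a memmap.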

Acked-by: Catalin Marinas <>
Signed-off-by: Will Deacon <>
Signed-off-by: Russell King <>
10 years agosparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic
Tkhai Kirill [Tue, 10 May 2011 02:31:41 +0000 (02:31 +0000)]
sparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic

When we are in the label cc_dword_align, registers %o0 and %o1 have the same last 2 bits,
but it's not guaranteed one of them is zero. So we can get unaligned memory access
in label ccte. Example of parameters which lead to this:
%o0=0x7ff183e9, %o1=0x8e709e7d, %g1=3

With these parameters I saw memory corruption: an additional 5 bytes were overwritten.
This patch corrects the error.

One comment to the patch. We don't care about the third bit in %o1, because cc_end_cruft
stores word or less.
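
The example register values can be checked with a short Python snippet.
The helper names are invented; the snippet only shows that the two
addresses share their low two bits while neither is word-aligned, which
is the situation described above.

```python
def low_bits_match(a: int, b: int) -> bool:
    """True when both addresses share the same offset within a 4-byte word."""
    return (a & 3) == (b & 3)


def is_word_aligned(addr: int) -> bool:
    """True when the address is a multiple of 4."""
    return (addr & 3) == 0


# Register values quoted in the commit message.
o0, o1 = 0x7FF183E9, 0x8E709E7D
```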

Signed-off-by: Tkhai Kirill <>
Signed-off-by: David S. Miller <>
10 years agoehea: Fix memory hotplug oops
Anton Blanchard [Tue, 10 May 2011 16:17:10 +0000 (16:17 +0000)]
ehea: Fix memory hotplug oops

The ehea driver oopses during memory hotplug if the ports are not
up. A simple testcase:

# ifconfig ethX down
# echo offline > /sys/devices/system/memory/memory32/state

Oops: Kernel access of bad area, sig: 11 [#1]
last sysfs file: /sys/devices/system/memory/memory32/state
REGS: c000000709393110 TRAP: 0300   Not tainted  (2.6.39-rc2-01385-g7ef73bc-dirty)
DAR: 0000000000000000, DSISR: 40000000
NIP [c000000000067c98] .__wake_up_common+0x48/0xf0
LR [c00000000006d034] .__wake_up+0x54/0x90
Call Trace:
[c00000000006d034] .__wake_up+0x54/0x90
[d000000006bb6270] .ehea_rereg_mrs+0x140/0x730 [ehea]
[d000000006bb69c4] .ehea_mem_notifier+0x164/0x170 [ehea]
[c0000000006fc8a8] .notifier_call_chain+0x78/0xf0
[c0000000000b3d70] .__blocking_notifier_call_chain+0x70/0xb0
[c000000000458d78] .memory_notify+0x28/0x40
[c0000000001871d8] .remove_memory+0x208/0x6d0
[c000000000458264] .memory_section_action+0x94/0x140
[c0000000004583ec] .memory_block_change_state+0xdc/0x1d0
[c0000000004585cc] .store_mem_state+0xec/0x160
[c00000000044768c] .sysdev_store+0x3c/0x50
[c00000000020b48c] .sysfs_write_file+0xec/0x1f0
[c00000000018f86c] .vfs_write+0xec/0x1e0
[c00000000018fa88] .SyS_write+0x58/0xd0

To fix this, initialise the waitqueues during port probe instead
of port open.
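
A toy Python model of the ordering problem, with invented names: a list
stands in for the wait queue, None for an uninitialised one, and
clear() for wake_up().

```python
from typing import List, Optional


class Port:
    def __init__(self, init_at_probe: bool) -> None:
        # Probe time: with the fix, the wait queue exists from here on.
        self.waitq: Optional[List[str]] = [] if init_at_probe else None
        self.is_open = False

    def open(self) -> None:
        if self.waitq is None:   # pre-fix behaviour: initialise on open only
            self.waitq = []
        self.is_open = True


def mem_notifier(port: Port) -> None:
    """Runs on memory hotplug regardless of whether the port is up."""
    if port.waitq is None:
        raise RuntimeError("wake_up on uninitialised wait queue")
    port.waitq.clear()  # stands in for wake_up()
```

The notifier can fire for a port that was probed but never opened, so
anything it touches has to be valid from probe time onward.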

Signed-off-by: Anton Blanchard <>
Acked-by: Breno Leitao <>
Signed-off-by: David S. Miller <>
10 years agoNFSv4.1: Ensure that layoutget uses the correct gfp modes
Trond Myklebust [Wed, 11 May 2011 22:00:51 +0000 (18:00 -0400)]
NFSv4.1: Ensure that layoutget uses the correct gfp modes

Currently, writebacks may end up recursing back into the filesystem due to
GFP_KERNEL direct reclaims in the pnfs subsystem.

Signed-off-by: Trond Myklebust <>
10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 12 May 2011 02:13:34 +0000 (19:13 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

* 'for-linus' of git://
  ceph: do not use i_wrbuffer_ref as refcount for Fb cap
  ceph: fix list_add in ceph_put_snap_realm
  ceph: print debug message before put mds session

10 years agoMerge branch 'drm-fixes' of git://
Linus Torvalds [Thu, 12 May 2011 02:13:16 +0000 (19:13 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://
  drm/radeon/nouveau: fix build regression on alpha due to Xen changes.
  drm/radeon/kms: fix cayman acceleration
  drm/radeon: fix cayman struct accessors.

10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 12 May 2011 02:00:15 +0000 (19:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sameo/mfd-2.6

* 'for-linus' of git://
  mfd: Fix for the TWL4030 PM sleep/wakeup sequence
  mfd: Fix asic3 build error
  mfd: Fixed gpio polarity of omap-usb gpio USB-phy reset

10 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 12 May 2011 01:59:45 +0000 (18:59 -0700)]
Merge branch 'for-linus' of git://

* 'for-linus' of git://
  [S390] fix alloc_pgste check in init_new_context
  [S390] oprofile: fix min/max interval query checks
  [S390] replace diag10() with diag10_range() function
  [S390] disassembler: handle b280/spp instruction
  [S390] kernel: Initialize register 14 when starting new CPU
  [S390] dasd: prevent IO error during reserve/release loop
  [S390] sclp/memory hotplug: fix initial usecount of increments

10 years agoRevert "Bluetooth: fix shutdown on SCO sockets"
Linus Torvalds [Thu, 12 May 2011 01:58:16 +0000 (18:58 -0700)]
Revert "Bluetooth: fix shutdown on SCO sockets"

This reverts commit f21ca5fff6e548833fa5ee8867239a8378623150.

Quoth Gustavo F. Padovan:
  "Commit f21ca5fff6e548833fa5ee8867239a8378623150 can cause a NULL
   dereference if we call shutdown in a bluetooth SCO socket and doesn't
   wait the shutdown completion to call close().  Please revert it.  I
   may have a fix for it soon, but we don't have time anymore, so revert
   is the way to go.  ;)"

Requested-by: Gustavo F. Padovan <>
Signed-off-by: Linus Torvalds <>
10 years agoMerge branch 'pm-fixes' of git://
Linus Torvalds [Thu, 12 May 2011 01:57:05 +0000 (18:57 -0700)]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://
  PM / Hibernate: Fix ioctl SNAPSHOT_S2RAM
  PM / Hibernate: Make snapshot_release() restore GFP mask
  PM: Fix warning in pm_restrict_gfp_mask() during SNAPSHOT_S2RAM ioctl

10 years agomm: tracing: add missing GFP flags to tracing
Mel Gorman [Wed, 11 May 2011 22:13:39 +0000 (15:13 -0700)]
mm: tracing: add missing GFP flags to tracing

include/linux/gfp.h and include/trace/events/gfpflags.h are out of sync.
When tracing is enabled, certain flags are not recognised and the text
output is less useful as a result.  Add the missing flags.

Signed-off-by: Mel Gorman <>
Cc: Andrea Arcangeli <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years agotmpfs: fix spurious ENOSPC when racing with unswap
Hugh Dickins [Wed, 11 May 2011 22:13:38 +0000 (15:13 -0700)]
tmpfs: fix spurious ENOSPC when racing with unswap

Testing the shmem_swaplist replacements for igrab() revealed another bug:
writes to /dev/loop0 on a tmpfs file which fills its filesystem were
sometimes failing with "Buffer I/O error"s.

These came from ENOSPC failures of shmem_getpage(), when racing with
swapoff: the same could happen when racing with another shmem_getpage(),
pulling the page in from swap in between our find_lock_page() and our
taking the info->lock (though not in the single-threaded loop case).

This is unacceptable, and it is surprising that I've not noticed it before:
it dates back many years, but (presumably) was made a lot easier to
reproduce in 2.6.36, which sited a page preallocation in the race window.

Fix it by rechecking the page cache before settling on an ENOSPC error.

Signed-off-by: Hugh Dickins <>
Cc: Konstantin Khlebnikov <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: tmpfs: fix race between umount and swapoff
Hugh Dickins [Wed, 11 May 2011 22:13:37 +0000 (15:13 -0700)]
tmpfs: fix race between umount and swapoff

The use of igrab() in swapoff's shmem_unuse_inode() is just as vulnerable
to umount as that in shmem_writepage().

Fix this instance by extending the protection of shmem_swaplist_mutex
right across shmem_unuse_inode(): while it's on the list, the inode cannot
be evicted (and the filesystem cannot be unmounted) without
shmem_evict_inode() taking that mutex to remove it from the list.

But since shmem_writepage() might take that mutex, we should avoid making
memory allocations or memcg charges while holding it: prepare them at the
outer level in shmem_unuse().  When mem_cgroup_cache_charge() was
originally placed, we didn't know until that point that the page from swap
was actually a shmem page; but nowadays it's noted in the swap_map, so
we're safe to charge upfront.  For the radix_tree, do as is done in
shmem_getpage(): preload upfront, but don't pin to the cpu; so we make a
habit of refreshing the node pool, but might dip into GFP_NOWAIT reserves
on occasion if subsequently preempted.

With the allocation and charge moved out from shmem_unuse_inode(),
we can also hold index map and info->lock over from finding the entry.

Signed-off-by: Hugh Dickins <>
Cc: Konstantin Khlebnikov <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: tmpfs: fix race between umount and writepage
Hugh Dickins [Wed, 11 May 2011 22:13:36 +0000 (15:13 -0700)]
tmpfs: fix race between umount and writepage

Konstantin Khlebnikov reports that a dangerous race between umount and
shmem_writepage can be reproduced by this script:

  for i in {1..300} ; do
    mkdir $i
    while true ; do
      mount -t tmpfs none $i
      dd if=/dev/zero of=$i/test bs=1M count=$(($RANDOM % 100))
      umount $i
    done &
  done

on a 6xCPU node with 8Gb RAM: kernel very unstable after this accident. =)

Kernel log:

  VFS: Busy inodes after unmount of tmpfs.
                 Self-destruct in 5 seconds.  Have a nice day...

  WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
  list_del corruption. prev->next should be ffff880222fdaac8, but was (null)
  Pid: 11222, comm: mount.tmpfs Not tainted 2.6.39-rc2+ #4
  Call Trace:
  BUG: unable to handle kernel paging request at ffffffffffffffff
  IP: shmem_free_blocks+0x18/0x4c
  Pid: 10422, comm: dd Tainted: G        W   2.6.39-rc2+ #4
  Call Trace:

shmem_writepage() calls igrab() on the inode for the page which came from
page reclaim, to add it later into shmem_swaplist for swapoff operation.

This igrab() can race with super-block deactivating process:

  shrink_inactive_list()          deactivate_super()
  pageout()                       tmpfs_fs_type->kill_sb()
  shmem_writepage()               kill_litter_super()
                                   if (!list_empty(&sb->s_inodes))
                                          printk("VFS: Busy inodes after...

This igrab-iput pair was added in commit 1b1b32f2c6f6 "tmpfs: fix
shmem_swaplist races" based on incorrect assumptions: igrab() protects the
inode from concurrent eviction by deletion, but it does nothing to protect
it from concurrent unmounting, which goes ahead despite the raised
i_count.

So this use of igrab() was wrong all along, but the race was made much worse
in 2.6.37 when commit 63997e98a3be "split invalidate_inodes()" replaced
two attempts at invalidate_inodes() by a single evict_inodes().

Konstantin posted a plausible patch, raising sb->s_active too: I'm unsure
whether it was correct or not; but burnt once by igrab(), I am sure that
we don't want to rely more deeply upon externals here.

Fix it by adding the inode to shmem_swaplist earlier, while the page lock
on page in page cache still secures the inode against eviction, without
artificially raising i_count.  It was originally added later because
shmem_unuse_inode() is liable to remove an inode from the list while it's
unswapped; but we can guard against that by taking spinlock before
dropping mutex.

Reported-by: Konstantin Khlebnikov <>
Signed-off-by: Hugh Dickins <>
Tested-by: Konstantin Khlebnikov <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: memcg: allocate memory cgroup structures in local nodes
Andi Kleen [Wed, 11 May 2011 22:13:35 +0000 (15:13 -0700)]
memcg: allocate memory cgroup structures in local nodes

Commit dde79e005a769 ("page_cgroup: reduce allocation overhead for
page_cgroup array for CONFIG_SPARSEMEM") added a regression that the
memory cgroup data structures all end up in node 0 because the first
attempt at allocating them would not pass in a node hint.  Since the
initialization runs on CPU #0 it would all end up on node 0.  This is a
problem on large memory systems, where node 0 would lose a lot of
memory.

Change the alloc_pages_exact() to alloc_pages_exact_nid().  This will
still fall back to other nodes if not enough memory is available.
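
The intended behaviour (prefer the hinted node, fall back to other nodes only
when it is short of memory) can be sketched with a toy model. Everything
below is hypothetical user-space code: the struct, the node count and the
round-robin fallback order are assumptions for illustration and do not
reflect the real page allocator's zonelist logic.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NODES 4

/* Free page count per NUMA node; a toy model, not a kernel structure. */
struct demo_numa {
	unsigned long free_pages[DEMO_NODES];
};

/*
 * Try the hinted node nid first, then the remaining nodes in order.
 * Returns the node that satisfied the request, or -1 if none could.
 */
static int demo_alloc_on_node(struct demo_numa *m, int nid,
			      unsigned long pages)
{
	assert(m != NULL && nid >= 0 && nid < DEMO_NODES);
	for (int i = 0; i < DEMO_NODES; i++) {
		int node = (nid + i) % DEMO_NODES;

		if (m->free_pages[node] >= pages) {
			m->free_pages[node] -= pages;
			return node;
		}
	}
	return -1;
}
```

Without a node hint, every request would start at the node the caller runs
on, which is how all the structures ended up on one node.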

 [ RED-PEN: right now it would fall back first before trying
   vmalloc_node.  Probably not the best strategy ...  But I left it like
   that for now. ]

Signed-off-by: Andi Kleen <>
Reported-by: Doug Nelson
Cc: David Rientjes <>
Reviewed-by: Michal Hocko <>
Cc: Dave Hansen <>
Acked-by: Balbir Singh <>
Acked-by: Johannes Weiner <>
Reviewed-by: KOSAKI Motohiro <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: mm: add alloc_pages_exact_nid()
Andi Kleen [Wed, 11 May 2011 22:13:34 +0000 (15:13 -0700)]
mm: add alloc_pages_exact_nid()

Add an alloc_pages_exact_nid() that allocates on a specific node.

The naming is quite broken, but fixing that would need a larger renaming.

[ coding-style fixes]
[ tweak comment]
Signed-off-by: Andi Kleen <>
Cc: Michal Hocko <>
Cc: Balbir Singh <>
Cc: KOSAKI Motohiro <>
Cc: Dave Hansen <>
Cc: David Rientjes <>
Acked-by: Johannes Weiner <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: MAINTAINERS: fix sorting
Harry Wei [Wed, 11 May 2011 22:13:33 +0000 (15:13 -0700)]
MAINTAINERS: fix sorting

Sort the entries in the MAINTAINERS file alphabetically.

Signed-off-by: Harry Wei <>
Cc: Joe Perches <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: mm: use alloc_bootmem_node_nopanic() on really needed path
Yinghai Lu [Wed, 11 May 2011 22:13:32 +0000 (15:13 -0700)]
mm: use alloc_bootmem_node_nopanic() on really needed path

Stefan found nobootmem does not work on his system that has only 8M of
RAM.  This causes an early panic:

  BIOS-provided physical RAM map:
   BIOS-88: 0000000000000000 - 000000000009f000 (usable)
   BIOS-88: 0000000000100000 - 0000000000840000 (usable)
  bootconsole [earlyser0] enabled
  Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS!
  DMI not present or invalid.
  last_pfn = 0x840 max_arch_pfn = 0x100000
  init_memory_mapping: 0000000000000000-0000000000840000
  8MB LOWMEM available.
    mapped low ram: 0 - 00840000
    low ram: 0 - 00840000
  Zone PFN ranges:
    DMA      0x00000001 -> 0x00001000
    Normal   empty
  Movable zone start PFN for each node
  early_node_map[2] active PFN ranges
      0: 0x00000001 -> 0x0000009f
      0: 0x00000100 -> 0x00000840
  BUG: Int 6: CR2 (null)
       EDI c034663c  ESI (null)  EBP c0329f38  ESP c0329ef4
       EBX c0346380  EDX 00000006  ECX ffffffff  EAX fffffff4
       err (null)  EIP c0353191   CS c0320060  flg 00010082
  Stack: (null) c030c533 000007cd (null) c030c533 00000001 (null) (null)
         00000003 0000083f 00000018 00000002 00000002 c0329f6c c03534d6 (null)
         (null) 00000100 00000840 (null) c0329f64 00000001 00001000 (null)
  Pid: 0, comm: swapper Not tainted 2.6.36 #5
  Call Trace:
   [<c02e3707>] ? 0xc02e3707
   [<c035e6e5>] 0xc035e6e5
   [<c0353191>] ? 0xc0353191
   [<c03534d6>] 0xc03534d6
   [<c034f1cd>] 0xc034f1cd
   [<c034a824>] 0xc034a824
   [<c03513cb>] ? 0xc03513cb
   [<c0349432>] 0xc0349432
   [<c0349066>] 0xc0349066

It turns out that we should ignore the low limit of 16M.

Use alloc_bootmem_node_nopanic() in this case.

[ less mess]
Signed-off-by: Yinghai LU <>
Reported-by: Stefan Hellermann <>
Tested-by: Stefan Hellermann <>
Cc: Ingo Molnar <>
Cc: "H. Peter Anvin" <>
Cc: Thomas Gleixner <>
Cc: <> [2.6.34+]
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: mm: check PageUnevictable in lru_deactivate_fn()
Minchan Kim [Wed, 11 May 2011 22:13:30 +0000 (15:13 -0700)]
mm: check PageUnevictable in lru_deactivate_fn()

lru_deactivate_fn() should not move a page which is on the unevictable LRU
onto the inactive list.  Otherwise, we can hit a BUG when we use
isolate_lru_pages(), as __isolate_lru_page() could return -EINVAL.

Reported-by: Ying Han <>
Tested-by: Ying Han <>
Signed-off-by: Minchan Kim <>
Reviewed-by: KOSAKI Motohiro <>
Reviewed-by: Rik van Riel <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: drivers/rtc/rtc-s3c.c: fixup wake support for rtc
Ben Dooks [Wed, 11 May 2011 22:13:28 +0000 (15:13 -0700)]
drivers/rtc/rtc-s3c.c: fixup wake support for rtc

The driver is not balancing set_irq and disable_irq_wake() calls, so
ensure that it keeps track of whether the wake is enabled.

This fixes the following error on S3C6410 devices:

  WARNING: at kernel/irq/manage.c:382 set_irq_wake+0x84/0xec()
  Unbalanced IRQ 92 wake disable
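
The idea of tracking the wake state can be sketched as follows. This is a
simplified user-space model, not the driver code: the demo_* types and
functions are hypothetical stand-ins for the IRQ core's wake bookkeeping and
the driver's suspend/resume hooks, and the flag name wake_en is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the IRQ core's wake bookkeeping; hypothetical. */
struct demo_irq {
	int wake_depth;	/* outstanding wake enables */
	int warnings;	/* unbalanced disables seen */
};

static void demo_enable_wake(struct demo_irq *irq)
{
	irq->wake_depth++;
}

static void demo_disable_wake(struct demo_irq *irq)
{
	if (irq->wake_depth == 0) {
		irq->warnings++;	/* models the "Unbalanced IRQ" warning */
		return;
	}
	irq->wake_depth--;
}

struct demo_rtc {
	struct demo_irq *irq;
	bool wake_en;	/* true only while this driver holds a wake enable */
};

static void demo_rtc_suspend(struct demo_rtc *rtc, bool may_wakeup)
{
	assert(rtc != NULL && rtc->irq != NULL);
	if (may_wakeup && !rtc->wake_en) {
		demo_enable_wake(rtc->irq);
		rtc->wake_en = true;
	}
}

static void demo_rtc_resume(struct demo_rtc *rtc)
{
	assert(rtc != NULL && rtc->irq != NULL);
	if (rtc->wake_en) {
		demo_disable_wake(rtc->irq);
		rtc->wake_en = false;
	}
}
```

Because the disable is issued only when the flag records a matching enable,
a resume without a prior wake-enabled suspend no longer trips the warning.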

Signed-off-by: Ben Dooks <>
Signed-off-by: Mark Brown <>
Cc: Alessandro Zummo <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
10 years ago: Merge branch 'master' of git://
David S. Miller [Wed, 11 May 2011 23:13:08 +0000 (19:13 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6

10 years ago: PM / Hibernate: Fix ioctl SNAPSHOT_S2RAM
Rafael J. Wysocki [Tue, 10 May 2011 19:10:13 +0000 (21:10 +0200)]
PM / Hibernate: Fix ioctl SNAPSHOT_S2RAM

The SNAPSHOT_S2RAM ioctl used for implementing the feature allowing
one to suspend to RAM after creating a hibernation image is currently
broken, because it doesn't clear the "ready" flag in the struct
snapshot_data object handled by it.  As a result, the
SNAPSHOT_UNFREEZE doesn't work correctly after SNAPSHOT_S2RAM has
returned and the user space hibernate task cannot thaw the other
processes as appropriate.  Make SNAPSHOT_S2RAM clear data->ready
to fix this problem.
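
A toy model of the state handling described above, under the assumption
(inferred from this message, not copied from the kernel source) that the
unfreeze step refuses to run while the "ready" flag is still set. The
demo_* names and the clear_ready parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the snapshot state; fields are illustrative. */
struct demo_snapshot {
	bool frozen;	/* tasks are frozen */
	bool ready;	/* a hibernation image has been created */
};

/* Returns true if tasks were thawed. Assumed guard: not while ready. */
static bool demo_unfreeze(struct demo_snapshot *d)
{
	assert(d != NULL);
	if (!d->frozen || d->ready)
		return false;
	d->frozen = false;
	return true;
}

/* Models the suspend-to-RAM step; clear_ready selects the fixed behaviour. */
static void demo_s2ram(struct demo_snapshot *d, bool clear_ready)
{
	assert(d != NULL);
	/* ... suspend to RAM and resume would happen here ... */
	if (clear_ready)
		d->ready = false;
}
```

In this model, calling demo_s2ram() without clearing the flag leaves
demo_unfreeze() unable to thaw anything, which mirrors the reported symptom.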

Tested-by: Alexandre Felipe Muller de Souza <>
Signed-off-by: Rafael J. Wysocki <>
10 years ago: PM / Hibernate: Make snapshot_release() restore GFP mask
Rafael J. Wysocki [Tue, 10 May 2011 19:10:01 +0000 (21:10 +0200)]
PM / Hibernate: Make snapshot_release() restore GFP mask

If the process using the hibernate user space interface closes
/dev/snapshot after creating a hibernation image without thawing
tasks, snapshot_release() should call pm_restore_gfp_mask() to
restore the GFP mask used before the creation of the image.  Make
that happen.

Tested-by: Alexandre Felipe Muller de Souza <>
Signed-off-by: Rafael J. Wysocki <>
10 years ago: PM: Fix warning in pm_restrict_gfp_mask() during SNAPSHOT_S2RAM ioctl
Rafael J. Wysocki [Tue, 10 May 2011 19:09:53 +0000 (21:09 +0200)]
PM: Fix warning in pm_restrict_gfp_mask() during SNAPSHOT_S2RAM ioctl

A warning is printed by pm_restrict_gfp_mask() while the
SNAPSHOT_S2RAM ioctl is being executed after creating a hibernation
image, because pm_restrict_gfp_mask() has been called once already
before the image creation and suspend_devices_and_enter() calls it
once again.  This happens after commit 452aa6999e6703ffbddd7f6ea124d3
(mm/pm: force GFP_NOIO during suspend/hibernation and resume).

To avoid this issue, move pm_restrict_gfp_mask() and
pm_restore_gfp_mask() from suspend_devices_and_enter() to its caller
in kernel/power/suspend.c.
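
The double call and the effect of moving it to the caller can be modelled
roughly as below. This is a hedged user-space sketch: the demo_* functions
are hypothetical, and the warning condition (restricting while already
restricted) is an assumption based on the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the GFP mask state; not the kernel's representation. */
struct demo_gfp {
	bool restricted;
	int warnings;	/* counts restrict calls made while already restricted */
};

static void demo_restrict(struct demo_gfp *g)
{
	assert(g != NULL);
	if (g->restricted)
		g->warnings++;
	g->restricted = true;
}

static void demo_restore(struct demo_gfp *g)
{
	assert(g != NULL);
	g->restricted = false;
}

/* Old layout: the inner helper restricts the mask unconditionally. */
static void demo_enter_old(struct demo_gfp *g)
{
	demo_restrict(g);
	/* ... suspend devices and enter the sleep state ... */
	demo_restore(g);
}

/* New layout: the inner helper leaves the mask alone. */
static void demo_enter_new(struct demo_gfp *g)
{
	(void)g;
	/* ... suspend devices and enter the sleep state ... */
}

/* New layout: only the plain suspend path wraps the helper. */
static void demo_suspend_path(struct demo_gfp *g)
{
	demo_restrict(g);
	demo_enter_new(g);
	demo_restore(g);
}
```

When the mask has already been restricted for image creation, the old
helper warns, while the new helper can be called directly without touching
the mask.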

Reported-by: Alexandre Felipe Muller de Souza <>
Signed-off-by: Rafael J. Wysocki <>
10 years ago: NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list
Andy Adamson [Wed, 11 May 2011 05:19:58 +0000 (01:19 -0400)]
NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list

Prevents an infinite loop, as the list was never emptied.
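
The general pattern is a "while the list is not empty" loop, which only
terminates if each iteration unlinks the entry it handled. A minimal sketch,
using a hypothetical demo_list type modelled loosely on the kernel's
list_head rather than the actual pNFS code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list; a stand-in, not the kernel's API. */
struct demo_list {
	struct demo_list *prev, *next;
};

static void demo_list_init(struct demo_list *h)
{
	h->prev = h;
	h->next = h;
}

static bool demo_list_empty(const struct demo_list *h)
{
	return h->next == h;
}

static void demo_list_add_tail(struct demo_list *n, struct demo_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void demo_list_del_init(struct demo_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	demo_list_init(n);
}

/* Handle every entry on head; returns how many entries were handled. */
static int demo_drain(struct demo_list *head)
{
	int handled = 0;

	assert(head != NULL);
	while (!demo_list_empty(head)) {
		struct demo_list *first = head->next;

		/* Without this unlink the loop would see "first" forever. */
		demo_list_del_init(first);
		handled++;
	}
	return handled;
}
```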

Signed-off-by: Andy Adamson <>
Signed-off-by: Trond Myklebust <>