KAMEZAWA Hiroyuki [Wed, 18 Feb 2009 22:48:32 +0000 (14:48 -0800)]
 
mm: clean up for early_pfn_to_nid()
commit 
f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 upstream.
What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:
	BUG_ON(page_zone(start_page) != page_zone(end_page));
Once I knew this is what was happening, I added some annotations:
	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...
And here's what I got:
	move_freepages: Bogus zones: start_page[
2207d0000] end_page[
2207dffc0] zone[
fffff8103effcb00]
	move_freepages: start_zone[
fffff8103effcb00] end_zone[
fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]
My memory layout on this box is:
[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.
This patch:
Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.
This patch moves all declaration to include/linux/mm.h
After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Mon, 16 Feb 2009 20:29:31 +0000 (21:29 +0100)]
 
JFFS2: fix mount crash caused by removed nodes
commit 
4c41bd0ec953954158f92bed5d3062645062b98e upstream.
At scan time we observed following scenario:
   node A inserted
   node B inserted
   node C inserted -> sets overlapped flag on node B
   node A is removed due to CRC failure -> overlapped flag on node B remains
   while (tn->overlapped)
   	 tn = tn_prev(tn);
   ==> crash, when tn_prev(B) is referenced.
When the ultimate node is removed at scan time and the overlapped flag
is set on the penultimate node, then nothing updates the overlapped
flag of that node. The overlapped iterators blindly expect that the
ultimate node does not have the overlapped flag set, which causes the
scan code to crash.
It would be a huge overhead to go through the node chain on node
removal and fix up the overlapped flags, so detecting such a case on
the fly in the overlapped iterators is a simpler and reliable
solution.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Steve French [Tue, 17 Feb 2009 01:29:40 +0000 (01:29 +0000)]
 
Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS
commit 
69765529d701c838df19ea1f5ad2f33a528261ae upstream.
Fixes kernel bug #10451 http://bugzilla.kernel.org/show_bug.cgi?id=10451
Certain NAS appliances do not set the operating system or network operating system
fields in the session setup response on the wire.  cifs was oopsing on the unexpected
zero length response fields (when trying to null terminate a zero length field).
This fixes the oops.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ed Cashin [Wed, 18 Feb 2009 22:48:13 +0000 (14:48 -0800)]
 
aoe: ignore vendor extension AoE responses
commit 
b6d6c5175809934e04a606d9193ef04924a7a7d9 upstream.
The Welland ME-747K-SI AoE target generates unsolicited AoE responses that
are marked as vendor extensions.  Instead of ignoring these packets, the
aoe driver was generating kernel messages for each unrecognized response
received.  This patch corrects the behavior.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Reported-by: <karaluh@karaluh.pl>
Tested-by: <karaluh@karaluh.pl>
Cc: Alex Buell <alex.buell@munted.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Bill Nottingham [Wed, 18 Feb 2009 22:48:39 +0000 (14:48 -0800)]
 
vt: Declare PIO_CMAP/GIO_CMAP as compatbile ioctls.
commit 
2db69a9340da12a4db44edb7506dd68799aeff55 upstream.
Otherwise, these don't work when called from 32-bit userspace on 64-bit
kernels.
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric Biederman [Wed, 18 Feb 2009 22:48:16 +0000 (14:48 -0800)]
 
seq_file: properly cope with pread
commit 
8f19d472935c83d823fa4cf02bcc0a7b9952db30 upstream.
Currently seq_read assumes that the offset passed to it is always the
offset it passed to user space.  In the case pread this assumption is
broken and we do the wrong thing when presented with pread.
To solve this I introduce an offset cache inside of struct seq_file so we
know where our logical file position is.  Then in seq_read if we try to
read from another offset we reset our data structures and attempt to go to
the offset user space wanted.
[akpm@linux-foundation.org: restore FMODE_PWRITE]
[pjt@google.com: seq_open needs its fmode opened up to take advantage of this]
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Turner <pjt@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Turner [Wed, 18 Feb 2009 22:48:15 +0000 (14:48 -0800)]
 
vfs: separate FMODE_PREAD/FMODE_PWRITE into separate flags
commit 
55ec82176eca52e4e0530a82a0eb59160a1a95a1 upstream.
Separate FMODE_PREAD and FMODE_PWRITE into separate flags to reflect the
reality that the read and write paths may have independent restrictions.
A git grep verifies that these flags are always cleared together so this
new behavior will only apply to interfaces that change to clear flags
individually.
This is required for "seq_file: properly cope with pread", a post-2.6.25
regression fix.
[akpm@linux-foundation.org: add comment]
Signed-off-by: Paul Turner <pjt@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc:  Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Christoph Hellwig [Wed, 5 Nov 2008 13:58:46 +0000 (14:58 +0100)]
 
documnt FMODE_ constants
commit 
fc9161e54d0dbf799beff9692ea1cc6237162b85 upstream.
Make sure all FMODE_ constants are documents, and ensure a coherent
style for the already existing comments.
[This is needed for the next patch in the .27 kernel which
 changes fs.h.  This patch makes it easier to handle. - gkh]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Tue, 20 Jan 2009 07:24:42 +0000 (23:24 -0800)]
 
sparc: We need to implement arch_ptrace_stop().
[ Upstream commit 
878a5535957b563c447d32866a9e606c55fef091 ]
In order to always provide fully synchronized state to the debugger,
we might need to do a synchronize_user_stack().
A pair of hooks, arch_ptrace_stop_needed() and arch_ptrace_stop(),
exist to handle this kind of situation.  It was created for
the sake of IA64.
Use them, to flush the kernel side cached register windows
to the user stack, when necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Tue, 20 Jan 2009 06:56:51 +0000 (22:56 -0800)]
 
sparc64: Fix DAX handling via userspace access from kernel.
[ Upstream commit 
fcd26f7ae2ea5889134e8b3d60a42ce8b993c95f ]
If we do a userspace access from kernel mode, and get a
data access exception, we need to check the exception
table just like a normal fault does.
The spitfire DAX handler was doing this, but such logic
was missing from the sun4v DAX code.
Reported-by: Dennis Gilmore <dgilmore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Thu, 26 Feb 2009 07:09:34 +0000 (23:09 -0800)]
 
net: Kill skb_truesize_check(), it only catches false-positives.
[ Upstream commit 
92a0acce186cde8ead56c6915d9479773673ea1a ]
A long time ago we had bugs, primarily in TCP, where we would modify
skb->truesize (for TSO queue collapsing) in ways which would corrupt
the socket memory accounting.
skb_truesize_check() was added in order to try and catch this error
more systematically.
However this debugging check has morphed into a Frankenstein of sorts
and these days it does nothing other than catch false-positives.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eugene Teo [Mon, 23 Feb 2009 23:38:41 +0000 (15:38 -0800)]
 
net: amend the fix for SO_BSDCOMPAT gsopt infoleak
[ Upstream commit 
50fee1dec5d71b8a14c1b82f2f42e16adc227f8b ]
The fix for CVE-2009-0676 (upstream commit 
df0bca04) is incomplete. Note
that the same problem of leaking kernel memory will reappear if someone
on some architecture uses struct timeval with some internal padding (for
example tv_sec 64-bit and tv_usec 32-bit) --- then, you are going to
leak the padded bytes to userspace.
Signed-off-by: Eugene Teo <eugeneteo@kernel.sg>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Tue, 25 Nov 2008 00:06:50 +0000 (16:06 -0800)]
 
bridge: netfilter: fix update_pmtu crash with GRE
[ Upstream commit 
631339f1e544a4d39a63cfe6708c5bddcd5a2c48 ]
As GRE tries to call the update_pmtu function on skb->dst and
bridge supplies an skb->dst that has a NULL ops field, all is
not well.
This patch fixes this by giving the bridge device an ops field
with an update_pmtu function.  For the moment I've left all
other fields blank but we can fill them in later should the
need arise.
Based on report and patch by Philip Craig.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jason Cooper [Tue, 11 Nov 2008 18:02:53 +0000 (13:02 -0500)]
 
USB: net: asix: add support for Cables-to-Go USB Ethernet adapter
commit 
ccf95402d0ae6f433f29ce88cfd589cec8fc81ad upstream.
Add support to drivers/net/usb/asix.c for the Cables-to-Go "USB 2.0 to
10/100 Ethernet Adapter". USB id 0b95:772a.
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Fri, 20 Feb 2009 22:39:34 +0000 (14:39 -0800)]
 
Linux 2.6.27.19
Theodore Ts'o [Tue, 17 Feb 2009 15:58:44 +0000 (10:58 -0500)]
 
ext4: Initialize the new group descriptor when resizing the filesystem
(cherry picked from commit 
fdff73f094e7220602cc3f8959c7230517976412)
Make sure all of the fields of the group descriptor are properly
initialized.  Previously, we allowed bg_flags field to be contain
random garbage, which could trigger non-deterministic behavior,
including a kernel OOPS.
http://bugzilla.kernel.org/show_bug.cgi?id=12433
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Theodore Ts'o [Tue, 17 Feb 2009 15:58:43 +0000 (10:58 -0500)]
 
jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs"
(cherry picked from commit 
08ec8c3878cea0bf91f2ba3c0badf44b383752d0)
Otherwise it can be very confusing to find a "EXT3-fs: " failure in
the middle of EXT4-fs failures, and it makes it harder to track the
source of the failure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Theodore Ts'o [Tue, 17 Feb 2009 15:58:42 +0000 (10:58 -0500)]
 
ext4: Add sanity check to make_indexed_dir
(cherry picked from commit 
e6b8bc09ba2075cd91fbffefcd2778b1a00bd76f)
Make sure the rec_len field in the '..' entry is sane, lest we overrun
the directory block and cause a kernel oops on a purposefully
corrupted filesystem.
Thanks to Sami Liedes for reporting this bug.
http://bugzilla.kernel.org/show_bug.cgi?id=12430
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Theodore Ts'o [Tue, 17 Feb 2009 15:58:41 +0000 (10:58 -0500)]
 
ext4: only use i_size_high for regular files
(cherry picked from commit 
06a279d636734da32bb62dd2f7b0ade666f65d7c)
Directories are not allowed to be bigger than 2GB, so don't use
i_size_high for anything other than regular files.  E2fsck should
complain about these inodes, but the simplest thing to do for the
kernel is to only use i_size_high for regular files.
This prevents an intentially corrupted filesystem from causing the
kernel to burn a huge amount of CPU and issuing error messages such
as:
EXT4-fs warning (device loop0): ext4_block_to_path: block 
135090028 > max
Thanks to David Maciejak from Fortinet's FortiGuard Global Security
Research Team for reporting this issue.
http://bugzilla.kernel.org/show_bug.cgi?id=12375
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Theodore Ts'o [Tue, 17 Feb 2009 15:58:40 +0000 (10:58 -0500)]
 
ext4: Add sanity checks for the superblock before mounting the filesystem
(cherry picked from commit 
4ec110281379826c5cf6ed14735e47027c3c5765)
This avoids insane superblock configurations that could lead to kernel
oops due to null pointer derefences.
http://bugzilla.kernel.org/show_bug.cgi?id=12371
Thanks to David Maciejak at Fortinet's FortiGuard Global Security
Research Team who discovered this bug independently (but at
approximately the same time) as Thiemo Nagel, who submitted the patch.
Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:39 +0000 (10:58 -0500)]
 
ext4: Init the complete page while building buddy cache
(cherry picked from commit 
29eaf024980e07cc01f31ae4ea5d68c917f4b7da)
We need to init the complete page during buddy cache init
by setting the contents to '1'.  Otherwise we can see the
following errors after doing an online resize of the
filesystem:
EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used:
	Allocating block 1040385 in system zone of 127 group
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:38 +0000 (10:58 -0500)]
 
ext4: Don't allow new groups to be added during block allocation
(cherry picked from commit 
8556e8f3b6c4c11601ce1e9ea8090a6d8bd5daae)
After we mark the blocks in the buddy cache as allocated,
we need to ensure that we don't reinit the buddy cache until
the block bitmap is updated.  This commit achieves this by holding
the group_info alloc_semaphore till ext4_mb_release_context
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:37 +0000 (10:58 -0500)]
 
ext4: mark the blocks/inode bitmap beyond end of group as used
(cherry picked from commit 
648f5879f5892dddd3ba71cd0d285599f40f2512)
We need to mark the block/inode bitmap beyond the end of the group
with '1'.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:36 +0000 (10:58 -0500)]
 
ext4: Use new buffer_head flag to check uninit group bitmaps initialization
(cherry picked from commit 
2ccb5fb9f113dae969d1ae9b6c10e80fa34f8cd3)
For uninit block group, the ondisk bitmap is not initialized. That implies
we cannot depend on the uptodate flag on the bitmap buffer_head to
find bitmap validity. Use a new buffer_head flag which would be set after
we properly initialize the bitmap. This also prevent the initializing
the uninit group bitmap initialization every time we do a
ext4_read_block_bitmap.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mark Fasheh [Tue, 17 Feb 2009 15:58:35 +0000 (10:58 -0500)]
 
jbd2: Add BH_JBDPrivateStart
(cherry picked from commit 
e97fcd95a4778a8caf1980c6c72fdf68185a0838)
Add this so that file systems using JBD2 can safely allocate unused b_state
bits.
In this case, we add it so that Ocfs2 can define a single bit for tracking
the validation state of a buffer.
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:34 +0000 (10:58 -0500)]
 
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
(cherry picked from commit 
393418676a7602e1d7d3f6e560159c65c8cbd50e)
We need to make sure we update the inode bitmap and clear
EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
whether to initialize the inode bitmap each time it is called.
(introduced by commit 
c806e68f.)
ext4_read_inode_bitmap does:
spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
	ext4_init_inode_bitmap(sb, bh, block_group, desc);
and ext4_new_inode does
if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                   ino, inode_bitmap_bh->b_data))
		   ......
		   ...
spin_lock(sb_bgl_lock(sbi, group));
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
i.e., on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
parallel ext4_read_inode_bitmap can zero out the bitmap in between
the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)
The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:33 +0000 (10:58 -0500)]
 
ext4: Fix race between read_block_bitmap() and mark_diskspace_used()
(cherry picked from commit 
e8134b27e351e813414da3b95aa8eac6d3908088)
We need to make sure we update the block bitmap and clear
EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
whether to initialize the block bitmap each time it is called
(introduced by commit 
c806e68f), and this can race with block
allocations in ext4_mb_mark_diskspace_used().
ext4_read_block_bitmap does:
spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
	ext4_init_block_bitmap(sb, bh, block_group, desc);
Now on the block allocation side we do
mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
....
spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
ie on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
parallel ext4_read_block_bitmap can zero out the bitmap in between
the above mb_set_bits and spin_lock(sb_bg_lock..)
The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:32 +0000 (10:58 -0500)]
 
ext4: don't use blocks freed but not yet committed in buddy cache init
(cherry picked from commit 
7a2fcbf7f85737735fd44eb34b62315bccf6d6e4)
When we generate buddy cache (especially during resize) we need to
make sure we don't use the blocks freed but not yet comitted.  This
makes sure we have the right value of free blocks count in the group
info and also in the bitmap.  This also ensures the ordered mode
consistency
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:31 +0000 (10:58 -0500)]
 
ext4: Use an rbtree for tracking blocks freed during transaction.
(cherry picked from commit 
c894058d66637c7720569fbe12957f4de64d9991 to allow
commit 
e21675d4 to be included in 2.6.27.y)
With this patch we track the block freed during a transaction using
red-black tree.  We also make sure contiguous blocks freed are collected
in one node in the tree.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:30 +0000 (10:58 -0500)]
 
ext4: cleanup mballoc header files
(cherry picked from commit 
c3a326a657562dab81acf05aee106dc1fe345eb4)
Move some of the forward declaration of the static functions
to mballoc.c where they are used. This enables us to include
mballoc.h in other .c files. Also correct the buddy cache
documentation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:29 +0000 (10:58 -0500)]
 
ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize
(cherry picked from commit 
920313a726e04fef0f2c0bcb04ad8229c0e700d8)
The new groups added during resize are flagged as
need_init group. Make sure we properly initialize these
groups. When we have block size < page size and we are adding
new groups the page may still be marked uptodate even though
we haven't initialized the group. While forcing the init
of buddy cache we need to make sure other groups part of the
same page of buddy cache is not using the cache.
group_info->alloc_sem is added to ensure the same.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:28 +0000 (10:58 -0500)]
 
ext4: Add blocks added during resize to bitmap
(cherry picked from commit 
e21675d4b63975d09eb75c443c48ebe663d23e18)
With this change new blocks added during resize
are marked as free in the block bitmap and the
group is flagged with EXT4_GROUP_INFO_NEED_INIT_BIT
flag. This make sure when mballoc tries to allocate
blocks from the new group we would reload the
buddy information using the bitmap present in the disk.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:27 +0000 (10:58 -0500)]
 
ext4: Don't overwrite allocation_context ac_status
(cherry picked from commit 
032115fcef837a00336ddf7bda584e89789ea498)
We can call ext4_mb_check_limits even after successfully allocating
the requested blocks.  In that case, make sure we don't overwrite
ac_status if it already has the status AC_STATUS_FOUND.  This fixes
the lockdep warning:
=============================================
[ INFO: possible recursive locking detected ]
2.6.28-rc6-autokern1 #1
---------------------------------------------
fsstress/11948 is trying to acquire lock:
 (&meta_group_info[i]->alloc_sem){----}, at: [<
c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
.....
stack backtrace:
.....
 [<
c04db974>] ext4_mb_regular_allocator+0xbb5/0xd44
.....
but task is already holding lock:
 (&meta_group_info[i]->alloc_sem){----}, at: [<
c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Theodore Ts'o [Tue, 17 Feb 2009 15:58:26 +0000 (10:58 -0500)]
 
jbd2: Add barrier not supported test to journal_wait_on_commit_record
(cherry picked from commit 
fd98496f467b3d26d05ab1498f41718b5ef13de5)
Xen doesn't report that barriers are not supported until buffer I/O is
reported as completed, instead of when the buffer I/O is submitted.
Add a check and a fallback codepath to journal_wait_on_commit_record()
to detect this case, so that attempts to mount ext4 filesystems on
LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
journal on device XXX"; "Remounting filesystem read-only" error.
Thanks to Andreas Sundstrom for reporting this issue.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Yasunori Goto [Tue, 17 Feb 2009 15:58:25 +0000 (10:58 -0500)]
 
ext4: Widen type of ext4_sb_info.s_mb_maxs[]
(cherry picked from commit 
ff7ef329b268b603ea4a2303241ef1c3829fd574)
I chased the cause of following ext4 oops report which is tested on
ia64 box.
http://bugzilla.kernel.org/show_bug.cgi?id=12018
The cause is the size of s_mb_maxs array that is defined as "unsigned
short" in ext4_sb_info structure.  If the file system's block size is
8k or greater, an unsigned short is not wide enough to contain the
value fs->blocksize << 3.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:24 +0000 (10:58 -0500)]
 
ext4: avoid ext4_error when mounting a fs with a single bg
(cherry picked from commit 
565a9617b2151e21b22700e97a8b04e70e103153)
Remove some completely unneeded code which which caused an ext4_error
to be generated when mounting a file system with only a single block
group.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Aneesh Kumar K.V [Tue, 17 Feb 2009 15:58:23 +0000 (10:58 -0500)]
 
ext4: Fix the delalloc writepages to allocate blocks at the right offset.
(cherry picked from commit 
791b7f08954869d7b8ff438f3dac3cfb39778297)
When iterating through the pages which have mapped buffer_heads, we
failed to update the b_state value. This results in allocating blocks
at logical offset 0.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Theodore Ts'o [Tue, 17 Feb 2009 15:58:22 +0000 (10:58 -0500)]
 
ext4: tone down ext4_da_writepages warnings
(cherry picked from commit 
2a21e37e48b94388f2cc8c0392f104f5443d4bb8)
If the filesystem has errors, ext4_da_writepages() will return a *lot*
of errors, including lots and lots of stack dumps.  While it's true
that we are dropping user data on the floor, which is unfortunate, the
stack dumps aren't helpful, and they tend to obscure the true original
root cause of the problem.  So in the case where the filesystem has
aborted, return an EROFS right away.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Theodore Ts'o [Tue, 17 Feb 2009 15:58:21 +0000 (10:58 -0500)]
 
ext4: Add support for non-native signed/unsigned htree hash algorithms
(cherry picked from commit 
f99b25897a86fcfff9140396a97261ae65fed872)
The original ext3 hash algorithms assumed that variables of type char
were signed, as God and K&R intended.  Unfortunately, this assumption
is not true on some architectures.  Userspace support for marking
filesystems with non-native signed/unsigned chars was added two years
ago, but the kernel-side support was never added (until now).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeremy Fitzhardinge [Wed, 11 Feb 2009 17:32:19 +0000 (09:32 -0800)]
 
x86/cpa: make sure cpa is safe to call in lazy mmu mode
commit 
4f06b0436b2ddbd3b67b10e77098a6862787b3eb upstream.
Impact: fix race leading to crash under KVM and Xen
The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update.  In this
case, the in-memory pagetable state may be out of date due to pending
queued updates.  We need to flush any pending updates before inspecting
the page table.  Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mike Christie [Fri, 16 Jan 2009 18:36:51 +0000 (12:36 -0600)]
 
SCSI: libiscsi: fix iscsi pool leak
commit 
2f5899a39dcffb404c9a3d06ad438aff3e03bf04 upstream.
I am not sure what happened. It looks like we have always leaked
the q->queue that is allocated from the kfifo_init call. nab finally
noticed that we were leaking and this patch fixes it by adding a
kfree call to iscsi_pool_free. kfifo_free is not used per kfifo_init's
instructions to use kfree.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Carsten Otte [Wed, 11 Feb 2009 21:04:37 +0000 (13:04 -0800)]
 
ext2/xip: refuse to change xip flag during remount with busy inodes
commit 
0e4a9b59282914fe057ab17027f55123964bc2e2 upstream.
For a reason that I was unable to understand in three months of debugging,
mount ext2 -o remount stopped working properly when remounting from
regular operation to xip, or the other way around.  According to a git
bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
rework in the vm:
commit 
70688e4dd1647f0ceb502bbd5964fa344c5eb411
Author: Nick Piggin <npiggin@suse.de>
Date:   Mon Apr 28 02:13:02 2008 -0700
    xip: support non-struct page backed memory
In the failing scenario, the filesystem is mounted read only via root=
kernel parameter on s390x.  During remount (in rc.sysinit), the inodes of
the bash binary and its libraries are busy and cannot be invalidated (the
bash which is running rc.sysinit resides on subject filesystem).
Afterwards, another bash process (running ifup-eth) recurses into a
subshell, runs dup_mm (via fork).  Some of the mappings in this bash
process were created from inodes that could not be invalidated during
remount.
Both parent and child process crash some time later due to inconsistencies
in their address spaces.  The issue seems to be timing sensitive, various
attempts to recreate it have failed.
This patch refuses to change the xip flag during remount in case some
inodes cannot be invalidated.  This patch keeps users from running into
that issue.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sergio Luis [Mon, 27 Oct 2008 06:08:48 +0000 (23:08 -0700)]
 
btsdio: free sk_buff with kfree_skb
commit 
cbfd24a75f98fe731547d3bc995f3a1f1fed6b20 upstream.
free sk_buff with kfree_skb, instead of kree
Signed-off-by: Sergio Luis <sergio@larces.uece.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tomas Winkler [Sun, 30 Nov 2008 11:17:18 +0000 (12:17 +0100)]
 
Bluetooth: Fix TX error path in btsdio driver
commit 
7644d63d1348ec044ccd8f775fefe5eb7cbcac69 upstream.
This patch fixes accumulating of the header in case packet was requeued
in the error path.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Zlatko Calusic [Wed, 18 Feb 2009 00:33:34 +0000 (01:33 +0100)]
 
Add support for VT6415 PCIE PATA IDE Host Controller
commit 
5955c7a2cfb6a35429adea5dc480002b15ca8cfc upstream.
Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Roel Kluin [Fri, 13 Feb 2009 00:52:31 +0000 (16:52 -0800)]
 
3c505: do not set pcb->data.raw beyond its size
commit 
501aa061bd68169a5b54c123641f8dfa9ad31545 upstream.
Ensure that we do not set pcb->data.raw beyond its size, print an error message
and return false if we attempt to. A timout message was printed one too early.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Thu, 12 Feb 2009 01:34:32 +0000 (10:34 +0900)]
 
sata_nv: give up hardreset on nf2
commit 
7dac745b8e367c99175b8f0d014d996f0e5ed9e5 upstream.
Kernel bz#12176 reports that nf2 hardreset simply doesn't work.  Give
up.  Argh...
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Robert Hancock <hancockr@shaw.ca>
Reported-by: Saro <saro_v@hotmail.it>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michael Neuling [Thu, 12 Feb 2009 19:08:58 +0000 (19:08 +0000)]
 
powerpc/vsx: Fix VSX alignment handler for regs 32-63
commit 
26456dcfb8d8e43b1b64b2a14710694cf7a72f05 upstream.
Fix the VSX alignment handler for VSX registers > 32.  32-63 are stored
in the VMX part of the thread_struct not the FPR part.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Woodhouse [Fri, 13 Feb 2009 23:18:03 +0000 (23:18 +0000)]
 
Fix Intel IOMMU write-buffer flushing
commit 
ca77fde8e62cecb2c0769052228d15b901367af8 upstream.
This is the cause of the DMA faults and disk corruption that people have
been seeing. Some chipsets neglect to report the RWBF "capability" --
the flag which says that we need to flush the chipset write-buffer when
changing the DMA page tables, to ensure that the change is visible to
the IOMMU.
Override that bit on the affected chipsets, and everything is happy
again.
Thanks to Chris and Bhavesh and others for helping to debug.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Chris Wright <chrisw@sous-sol.org>
Reviewed-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sukadev Bhattiprolu [Thu, 8 Jan 2009 02:08:50 +0000 (18:08 -0800)]
 
mqueue: fix si_pid value in mqueue do_notify()
commit 
a6684999f7c6bddd75cf9755ad7ff44435f72fff upstream.
If a process registers for asynchronous notification on a POSIX message
queue, it gets a signal and a siginfo_t structure when a message arrives
on the message queue.  The si_pid in the siginfo_t structure is set to the
PID of the process that sent the message to the message queue.
The principle is the following:
. when mq_notify(SIGEV_SIGNAL) is called, the caller registers for
  notification when a msg arrives. The associated pid structure is stroed into
  inode_info->notify_owner. Let's call this process P1.
. when mq_send() is called by say P2, P2 sends a signal to P1 to notify
  him about msg arrival.
The way .si_pid is set today is not correct, since it doesn't take into account
the fact that the process that is sending the message might not be in the
same namespace as the notified one.
This patch proposes to set si_pid to the sender's pid into the notify_owner
namespace.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric W. Biederman [Thu, 8 Jan 2009 02:08:46 +0000 (18:08 -0800)]
 
pid: implement ns_of_pid
commit 
f9fb860f67b9542cd78d1558dec7058092b57d8e upstream.
A current problem with the pid namespace is that it is easy to do pid
related work after exit_task_namespaces which drops the nsproxy pointer.
However if we are doing pid namespace related work we are always operating
on some struct pid which retains the pid_namespace pointer of the pid
namespace it was allocated in.
So provide ns_of_pid which allows us to find the pid namespace a pid was
allocated in.
Using this we have the needed infrastructure to do pid namespace related
work at anytime we have a struct pid, removing the chance of accidentally
having a NULL pointer dereference when accessing current->nsproxy.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Tue, 17 Feb 2009 17:47:50 +0000 (09:47 -0800)]
 
Linux 2.6.27.18
Jarek Poplawski [Tue, 20 Jan 2009 01:03:56 +0000 (17:03 -0800)]
 
net: Fix data corruption when splicing from sockets.
[ Upstream commit 
8b9d3728977760f6bd1317c4420890f73695354e ]
The trick in socket splicing where we try to convert the skb->data
into a page based reference using virt_to_page() does not work so
well.
The idea is to pass the virt_to_page() reference via the pipe
buffer, and refcount the buffer using a SKB reference.
But if we are splicing from a socket to a socket (via sendpage)
this doesn't work.
The from side processing will grab the page (and SKB) references.
The sendpage() calls will grab page references only, return, and
then the from side processing completes and drops the SKB ref.
The page based reference to skb->data is not enough to keep the
kmalloc() buffer backing it from being reused.  Yet, that is
all that the socket send side has at this point.
This leads to data corruption if the skb->data buffer is reused
by SLAB before the send side socket actually gets the TX packet
out to the device.
The fix employed here is to simply allocate a page and copy the
skb->data bytes into that page.
This will hurt performance, but there is no clear way to fix this
properly without a copy at the present time, and it is important
to get rid of the data corruption.
With fixes from Herbert Xu.
Tested-by: Willy Tarreau <w@1wt.eu>
Foreseen-by: Changli Gao <xiaosuo@gmail.com>
Diagnosed-by: Willy Tarreau <w@1wt.eu>
Reported-by: Willy Tarreau <w@1wt.eu>
Fixed-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Takashi Iwai [Wed, 11 Feb 2009 23:06:42 +0000 (00:06 +0100)]
 
ALSA: mtpav - Fix initial value for input hwport
commit 
32cf9a16f4af01573ddec1eb073111fc20a9d7d4 upstream.
Fix the initial value for input hwport.  The old value (-1) may cause
Oops when an realtime MIDI byte is received before the input port is
explicitly given.
Instead, now it's set to the broadcasting as default.
Tested-by: Holger Dehnhardt <dehnhardt@ahdehnhardt.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jianjun Kong [Tue, 11 Nov 2008 05:37:39 +0000 (21:37 -0800)]
 
mac80211: fix a buffer overrun in station debug code
commit 
013cd397532e5803a1625954a884d021653da720 upstream.
net/mac80211/debugfs_sta.c
The trailing zero was written to state[4], it's out of bounds.
Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andreas Herrmann [Tue, 25 Nov 2008 16:18:03 +0000 (17:18 +0100)]
 
x86: fixup config space size of CPU functions for AMD family 11h
commit 
ffd565a8b817d1eb4b25184e8418e8d96c3f56f6 upstream.
Impact: extend allowed configuration space access on 11h CPUs from 256 to 4K
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sergei Shtylyov [Sun, 1 Feb 2009 16:46:39 +0000 (20:46 +0400)]
 
ide/libata: fix ata_id_is_cfa() (take 4)
commit 
2999b58b795ad81f10e34bdbbfd2742172f247e4 upstream.
When checking for the CFA feature set support, ata_id_is_cfa() tests bit 2 in
word 82 of the identify data instead the word 83;  it also checks the ATA/PI
version support in the word 80 (which the CompactFlash specifications have as
reserved), this having no slightest chance to work on the modern CF cards that
don't have 0x848A in the word 0...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Thu, 29 Jan 2009 11:31:29 +0000 (20:31 +0900)]
 
libata: fix EH device failure handling
commit 
d89293abd95bfd7dd9229087d6c30c1464c5ac83 upstream.
The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
speed down warning messages but it accidentally disabled SATA link spd
down during configuration phase after reset where PIO mode is always
zero.
This patch fixes the problem by moving the test where it belongs.
This makes libata probing sequence behave better when the connection
is flaky at higher link speeds which isn't too uncommon for eSATA
devices.
[cebbert@redhat.com: trivial backport to 2.6.27]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jiri Kosina [Tue, 10 Feb 2009 22:00:34 +0000 (17:00 -0500)]
 
HID: adjust report descriptor fixup for MS 1028 receiver
commit 
0fb21de0799a985d2da3da14ae5625d724256638 upstream
HID: adjust report descriptor fixup for MS 1028 receiver
[Backport to 2.6.27: cebbert@redhat.com]
Report descriptor fixup for MS 1028 receiver changes also values for
Keyboard and Consumer, which incorrectly trims the range, causing correct
events being thrown away before passing to userspace.
We need to keep the GenDesk usage fixup though, as it reports totally bogus
values about axis.
Reported-by: Lucas Gadani <lgadani@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Torsten Rausche [Thu, 12 Feb 2009 01:32:44 +0000 (02:32 +0100)]
 
bluetooth hid: enable quirk handling for Apple Wireless Keyboards in 2.6.27
This patch is basically a backport of
commit 
ee8a1a0a1a5817accd03ced7e7ffde3a4430f485 upstream
which was made after the big HID overhaul in 2.6.28.
Kernel 2.6.27 fails to handle quirks for the aluminum Apple Wireless
Keyboard because it is handled as USB device and not as Bluetooth
device. This patch expands 'hidp_blacklist' to make the kernel handle
the keyboard in the same way as the Apple wireless Mighty Mouse (also a
Bluetooth device).
Signed-off-by: Torsten Rausche <torsten@rausche.net>
Cc: Jan Scholz <Scholz@fias.uni-frankfurt.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Qu Haoran [Thu, 12 Feb 2009 07:07:38 +0000 (08:07 +0100)]
 
netfilter: xt_sctp: sctp chunk mapping doesn't work
netfilter: xt_sctp: sctp chunk mapping doesn't work
Upstream commit: 
d4e2675a
When user tries to map all chunks given in argument, kernel
works on a copy of the chunkmap, but at the end it doesn't
check the copy, but the orginal one.
Signed-off-by: Qu Haoran <haoran.qu@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric Leblond [Thu, 12 Feb 2009 07:07:37 +0000 (08:07 +0100)]
 
netfilter: fix tuple inversion for Node information request
netfilter: fix tuple inversion for Node information request
Upstream commit: 
a51f42f3c
The patch fixes a typo in the inverse mapping of Node Information
request. Following draft-ietf-ipngwg-icmp-name-lookups-09, "Querier"
sends a type 139 (ICMPV6_NI_QUERY) packet to "Responder" which answer
with a type 140 (ICMPV6_NI_REPLY) packet.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Fri, 13 Feb 2009 09:09:19 +0000 (01:09 -0800)]
 
sparc64: Annotate sparc64 specific syscalls with SYSCALL_DEFINEx()
[ Upstream commit 
e42650196df34789c825fa83f8bb37a5d5e52c14 ]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Christian Borntraeger [Fri, 13 Feb 2009 09:08:47 +0000 (01:08 -0800)]
 
sparc: Enable syscall wrappers for 64-bit (CVE-2009-0029)
[ Upstream commit 
67605d6812691bbd2158d2f60259e0407611bc1b ]
sparc64 needs sign-extended function parameters. We have to enable
the system call wrappers.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dimitris Michailidis [Tue, 27 Jan 2009 06:15:31 +0000 (22:15 -0800)]
 
tcp: Fix length tcp_splice_data_recv passes to skb_splice_bits.
[ Upstream commit 
9fa5fdf291c9b58b1cb8b4bb2a0ee57efa21d635 ]
tcp_splice_data_recv has two lengths to consider: the len parameter it
gets from tcp_read_sock, which specifies the amount of data in the skb,
and rd_desc->count, which is the amount of data the splice caller still
wants.  Currently it passes just the latter to skb_splice_bits, which then
splices min(rd_desc->count, skb->len - offset) bytes.
Most of the time this is fine, except when the skb contains urgent data.
In that case len goes only up to the urgent byte and is less than
skb->len - offset.  By ignoring len tcp_splice_data_recv may a) splice
data tcp_read_sock told it not to, b) return to tcp_read_sock a value > len.
Now, tcp_read_sock doesn't handle used > len and leaves the socket in a
bad state (both sk_receive_queue and copied_seq are bad at that point)
resulting in duplicated data and corruption.
Fix by passing min(rd_desc->count, len) to skb_splice_bits.
Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Willy Tarreau [Wed, 14 Jan 2009 00:04:36 +0000 (16:04 -0800)]
 
tcp: splice as many packets as possible at once
[ Upstream commit 
33966dd0e2f68f26943cd9ee93ec6abbc6547a8e ]
As spotted by Willy Tarreau, current splice() from tcp socket to pipe is not
optimal. It processes at most one segment per call.
This results in low performance and very high overhead due to syscall rate
when splicing from interfaces which do not support LRO.
Willy provided a patch inside tcp_splice_read(), but a better fix
is to let tcp_read_sock() process as many segments as possible, so
that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less
often.
With this change, splice() behaves like tcp_recvmsg(), being able
to consume many skbs in one system call. With typical 1460 bytes
of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return
16*1460 = 23360 bytes.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Fri, 30 Jan 2009 22:12:06 +0000 (14:12 -0800)]
 
packet: Avoid lock_sock in mmap handler
[ Upstream commit 
905db44087855e3c1709f538ecdc22fd149cadd8 ]
As the mmap handler gets called under mmap_sem, and we may grab
mmap_sem elsewhere under the socket lock to access user data, we
should avoid grabbing the socket lock in the mmap handler.
Since the only thing we care about in the mmap handler is for
pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
can achieve this by simply using a new mutex.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Martin MOKREJŠ <mmokrejs@ribosome.natur.cuni.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Shyam Iyer [Fri, 30 Jan 2009 00:12:42 +0000 (16:12 -0800)]
 
net: Fix OOPS in skb_seq_read().
[ Upstream commit 
71b3346d182355f19509fadb8fe45114a35cc499 ]
It oopsd for me in skb_seq_read. addr2line said it was
linux-2.6/net/core/skbuff.c:2228, which is this line:
	while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {
I added some printks in there and it looks like we hit this:
        } else if (st->root_skb == st->cur_skb &&
                   skb_shinfo(st->root_skb)->frag_list) {
                 st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                 st->frag_idx = 0;
                 goto next_skb;
        }
Actually I did some testing and added a few printks and found that the
st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
This caused the kernel panic.
 	if (abs_offset < block_limit) {
-		*data = st->cur_skb->data + abs_offset;
+		*data = st->cur_skb->data + (abs_offset - st->stepped_offset);
I enabled the debug_tcp and with a few printks found that the code did
not go to the next_skb label and could find that the sequence being
followed was this -
It hit this if condition -
        if (st->cur_skb->next) {
                st->cur_skb = st->cur_skb->next;
                st->frag_idx = 0;
                goto next_skb;
And so, now the st pointer is shifted to the next skb whereas actually
it should have hit the second else if first since the data is in the
frag_list.
        else if (st->root_skb == st->cur_skb &&
                 skb_shinfo(st->root_skb)->frag_list) {
                st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                goto next_skb;
        }
Reversing the two conditions the attached patch fixes the issue for me
on top of Herbert's patches.
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Fri, 30 Jan 2009 00:07:52 +0000 (16:07 -0800)]
 
net: Fix frag_list handling in skb_seq_read
[ Upstream commit 
95e3b24cfb4ec0479d2c42f7a1780d68063a542a ]
The frag_list handling was broken in skb_seq_read:
1) We didn't add the stepped offset when looking at the head
are of fragments other than the first.
2) We didn't take the stepped offset away when setting the data
pointer in the head area.
3) The frag index wasn't reset.
This patch fixes both issues.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alex Williamson [Fri, 13 Feb 2009 08:06:29 +0000 (00:06 -0800)]
 
virtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs
[ Upstream commit 
e918085aaff34086e265f825dd469926b1aec4a4 ]
802.1Q expanded the maximum ethernet frame size by 4 bytes for the
VLAN tag.  We're not taking this into account in virtio_net, which
means the buffers we provide to the backend in the virtqueue RX ring
aren't big enough to hold a full MTU VLAN packet.  For QEMU/KVM,
this results in the backend exiting with a packet truncation error.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric Dumazet [Mon, 2 Feb 2009 21:41:57 +0000 (13:41 -0800)]
 
udp: increments sk_drops in __udp_queue_rcv_skb()
[ Upstream commit 
e408b8dcb5ce42243a902205005208e590f28454 ]
Commit 
93821778def10ec1e69aa3ac10adee975dad4ff3 (udp: Fix rcv socket
locking) accidentally removed sk_drops increments for UDP IPV4
sockets.
This field can be used to detect incorrect sizing of socket receive
buffers.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jesper Dangaard Brouer [Thu, 5 Feb 2009 23:05:45 +0000 (15:05 -0800)]
 
udp: Fix UDP short packet false positive
[ Upstream commit 
7b5e56f9d635643ad54f2f42e69ad16b80a2cff1 ]
The UDP header pointer assignment must happen after calling
pskb_may_pull().  As pskb_may_pull() can potentially alter the SKB
buffer.
This was exposted by running multicast traffic through the NIU driver,
as it won't prepull the protocol headers into the linear area on
receive.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alex Williamson [Mon, 9 Feb 2009 01:49:17 +0000 (17:49 -0800)]
 
tun: Fix unicast filter overflow
[ Upstream commit 
cfbf84fcbcda98bb91ada683a8dc8e6901a83ebd ]
Tap devices can make use of a small MAC filter set via the
TUNSETTXFILTER ioctl.  The filter has a set of exact matches
plus a hash for imperfect filtering of additional multicast
addresses.  The current code is unbalanced, adding unicast
addresses to the multicast hash, but only checking the hash
against multicast addresses.  This results in the filter
dropping unicast addresses that overflow the exact filter.
The fix is simply to disable the filter by leaving count set
to zero if we find non-multicast addresses after the exact
match table is filled.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Fri, 30 Jan 2009 00:53:35 +0000 (16:53 -0800)]
 
tun: Add some missing TUN compat ioctl translations.
[ Upstream commit 
df1c46b2b6876d0a1b1b4740f009fa69d95ebbc9 ]
Based upon a report from Michael Tokarev <mjt@tls.msk.ru>:
	Just saw in dmesg:
	ioctl32(kvm:4408): Unknown cmd fd(9) cmd(
800454cf){t:'T';sz:4} arg(
ffc668e4) on /dev/net/tun
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ilkka Virta [Sat, 7 Feb 2009 06:00:36 +0000 (22:00 -0800)]
 
sungem: Soft lockup in sungem on Netra AC200 when switching interface up
[ Upstream commit 
71822faa3bc0af5dbf5e333a2d085f1ed7cd809f ]
From: Ilkka Virta <itvirta@iki.fi>
In the lockup situation the driver seems to go off in an eternal storm
of interrupts right after calling request_irq(). It doesn't actually
do anything interesting in the interrupt handler. Since connecting the link
afterwards works, something later in initialization must fix this.
Looking at gem_do_start() and gem_open(), it seems that the only thing
done while opening the device after the request_irq(), is a call to
napi_enable().
I don't know what the ordering requirements are for the
initialization, but I boldly tried to move the napi_enable() call
inside gem_do_start() before the link state is checked and interrupts
subsequently enabled, and it seems to work for me. Doesn't even break
anything too obvious...
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alexey Dobriyan [Fri, 30 Jan 2009 21:45:31 +0000 (13:45 -0800)]
 
sky2: fix hard hang with netconsoling and iface going up
[ Upstream commit 
a11da890e4c9850411303efcf6514f048ca880ee ]
Printing anything over netconsole before hw is up and running is,
of course, not going to work.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sebastiano Di Paola [Fri, 30 Jan 2009 23:37:17 +0000 (23:37 +0000)]
 
net: packet socket packet_lookup_frame fix
[ Upstream commit 
f9e6934502e46c363100245f137ddf0f4b1cb574 ]
packet_lookup_frames() fails to get user frame if current frame header
status contains extra flags.
This is due to the wrong assumption on the operators precedence during
frame status tests.
Fixed by forcing the right operators precedence order with explicit brackets.
Signed-off-by: Paolo Abeni <paolo.abeni@gmail.com>
Signed-off-by: Sebastiano Di Paola <sebastiano.dipaola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Clément Lecigne [Fri, 13 Feb 2009 00:59:09 +0000 (16:59 -0800)]
 
net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2
[ Upstream commit 
df0bca049d01c0ee94afb7cd5dfd959541e6c8da ]
In function sock_getsockopt() located in net/core/sock.c, optval v.val
is not correctly initialized and directly returned in userland in case
we have SO_BSDCOMPAT option set.
This dummy code should trigger the bug:
int main(void)
{
	unsigned char buf[4] = { 0, 0, 0, 0 };
	int len;
	int sock;
	sock = socket(33, 2, 2);
	getsockopt(sock, 1, SO_BSDCOMPAT, &buf, &len);
	printf("%x%x%x%x\n", buf[0], buf[1], buf[2], buf[3]);
	close(sock);
}
Here is a patch that fix this bug by initalizing v.val just after its
declaration.
Signed-off-by: Clément Lecigne <clement.lecigne@netasq.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Thu, 5 Feb 2009 23:15:50 +0000 (15:15 -0800)]
 
ipv6: Copy cork options in ip6_append_data
[ Upstream commit 
0178b695fd6b40a62a215cbeb03dd51ada3bb5e0 ]
As the options passed to ip6_append_data may be ephemeral, we need
to duplicate it for corking.  This patch applies the simplest fix
which is to memdup all the relevant bits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Fri, 6 Feb 2009 08:49:55 +0000 (00:49 -0800)]
 
ipv6: Disallow rediculious flowlabel option sizes.
[ Upstream commit 
684de409acff8b1fe8bf188d75ff2f99c624387d ]
Just like PKTINFO, limit the options area to 64K.
Based upon report by Eric Sesterhenn and analysis by
Roland Dreier.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Benjamin Zores [Fri, 30 Jan 2009 00:19:13 +0000 (16:19 -0800)]
 
ipv4: fix infinite retry loop in IP-Config
[ Upstream commit 
9d8dba6c979fa99c96938c869611b9a23b73efa9 ]
Signed-off-by: Benjamin Zores <benjamin.zores@alcatel-lucent.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Roel Kluin [Fri, 30 Jan 2009 01:32:20 +0000 (17:32 -0800)]
 
drivers/net/skfp: if !capable(CAP_NET_ADMIN): inverted logic
[ Upstream commit 
c25b9abbc2c2c0da88e180c3933d6e773245815a ]
Fix inverted logic
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Vlad Yasevich [Thu, 22 Jan 2009 22:53:01 +0000 (14:53 -0800)]
 
sctp: Properly timestamp outgoing data chunks for rtx purposes
[ Upstream commit 
759af00ebef858015eb68876ac1f383bcb6a1774 ]
Recent changes to the retransmit code exposed a long standing
bug where it was possible for a chunk to be time stamped
after the retransmit timer was reset.  This caused a rare
situation where the retrnamist timer has expired, but
nothing was marked for retrnasmission because all of
timesamps on data were less then 1 rto ago.  As result,
the timer was never restarted since nothing was retransmitted,
and this resulted in a hung association that did couldn't
complete the data transfer.  The solution is to timestamp
the chunk when it's added to the packet for transmission
purposes.  After the packet is trsnmitted the rtx timer
is restarted.  This guarantees that when the timer expires,
there will be data to retransmit.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Vlad Yasevich [Thu, 22 Jan 2009 22:52:43 +0000 (14:52 -0800)]
 
sctp: Correctly start rtx timer on new packet transmissions.
[ Upstream commit 
6574df9a89f9f7da3a4e5cee7633d430319d3350 ]
Commit 
62aeaff5ccd96462b7077046357a6d7886175a57
(sctp: Start T3-RTX timer when fast retransmitting lowest TSN)
introduced a regression where it was possible to forcibly
restart the sctp retransmit timer at the transmission of any
new chunk.  This resulted in much longer timeout times and
sometimes hung sctp connections.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Vlad Yasevich [Thu, 22 Jan 2009 22:52:23 +0000 (14:52 -0800)]
 
sctp: Fix crc32c calculations on big-endian arhes.
[ Upstream commit 
9c5ff5f75d0d0a1c7928ecfae3f38418b51a88e3 ]
crc32c algorithm provides a byteswaped result.  On little-endian
arches, the result ends up in big-endian/network byte order.
On big-endinan arches, the result ends up in little-endian
order and needs to be byte swapped again.  Thus calling cpu_to_le32
gives the right output.
Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Hin-Tak Leung [Wed, 4 Feb 2009 23:40:43 +0000 (23:40 +0000)]
 
zd1211rw: treat MAXIM_NEW_RF(0x08) as UW2453_RF(0x09) for TP-Link WN322/422G
commit 
efb43f4b2ccf8066abc3920a0e6858e4350a65c7 upstream.
Three people (Petr Mensik <pihhan@cipis.net>
["si" should be U+0161 U+00ED], Stephen Ho <stephenhoinhk@gmail.com>
on zd1211-devs and Ismael Ojeda Perez <iojedaperez@gmail.com>
on linux-wireless) reported success in getting TP-Link WN322G/WN422G
working by treating MAXIM_NEW_RF(0x08) as UW2453_RF(0x09) for rf
chip hardware initialization.
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Tested-by: Petr Mensik <pihhan@cipis.net>
Tested-by: Stephen Ho <stephenhoinhk@gmail.com>
Tested-by: Ismael Ojeda Perez <iojedaperez@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Hin-Tak Leung [Sun, 8 Feb 2009 02:13:56 +0000 (02:13 +0000)]
 
zd1211rw: adding 0ace:0xa211 as a ZD1211 device
commit 
14990c69b5f51dd57b4e0e2373de50239ac861e2 upstream.
Christoph Biedl <sourceforge.bnwi@manchmal.in-ulm.de> reported success
in the sourceforge zd1211 mailing list on this addition. This product ID
was supported by the vendor driver ZD1211LnxDrv 2.22.0.0 (and possibly
earlier) and it probably should have been added earlier.
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Tested-by: Christoph Biedl <sourceforge.bnwi@manchmal.in-ulm.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alok Kataria [Fri, 6 Feb 2009 18:29:35 +0000 (10:29 -0800)]
 
x86, vmi: put a missing paravirt_release_pmd in pgd_dtor
commit 
55a8ba4b7f76bebd7e8ce3f74c04b140627a1bad upstream.
Commit 
6194ba6ff6ccf8d5c54c857600843c67aa82c407 ("x86: don't special-case
pmd allocations as much") made changes to the way we handle pmd allocations,
and while doing that it dropped a call to  paravirt_release_pd on the
pgd page from the pgd_dtor code path.
As a result of this missing release, the hypervisor is now unaware of the
pgd page being freed, and as a result it ends up tracking this page as a
page table page.
After this the guest may start using the same page for other purposes, and
depending on what use the page is put to, it may result in various performance
and/or functional issues ( hangs, reboots).
Since this release is only required for VMI, I now release the pgd page from
the (vmi)_pgd_free hook.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Federico Cuello [Wed, 11 Feb 2009 21:04:39 +0000 (13:04 -0800)]
 
writeback: fix break condition
commit 
89e1219004b3657cc014521663eeef0744f1c99d upstream.
Commit 
dcf6a79dda5cc2a2bec183e50d829030c0972aaa ("write-back: fix
nr_to_write counter") fixed nr_to_write counter, but didn't set the break
condition properly.
If nr_to_write == 0 after being decremented it will loop one more time
before setting done = 1 and breaking the loop.
[akpm@linux-foundation.org: coding-style fixes]
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Artem Bityutskiy [Mon, 2 Feb 2009 16:33:49 +0000 (18:33 +0200)]
 
write-back: fix nr_to_write counter
commit 
dcf6a79dda5cc2a2bec183e50d829030c0972aaa upstream.
Commit 
05fe478dd04e02fa230c305ab9b5616669821dd3 introduced some
@wbc->nr_to_write breakage.
It made the following changes:
 1. Decrement wbc->nr_to_write instead of nr_to_write
 2. Decrement wbc->nr_to_write _only_ if wbc->sync_mode == WB_SYNC_NONE
 3. If synced nr_to_write pages, stop only if if wbc->sync_mode ==
    WB_SYNC_NONE, otherwise keep going.
However, according to the commit message, the intention was to only make
change 3.  Change 1 is a bug.  Change 2 does not seem to be necessary,
and it breaks UBIFS expectations, so if needed, it should be done
separately later.  And change 2 does not seem to be documented in the
commit message.
This patch does the following:
 1. Undo changes 1 and 2
 2. Add a comment explaining change 3 (it very useful to have comments
    in _code_, not only in the commit).
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ian Dall [Wed, 11 Feb 2009 21:04:46 +0000 (13:04 -0800)]
 
w1: w1 temp calculation overflow fix
commit 
507e2fbaaacb6f164b4125b87c5002f95143174b upstream.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12646
When the temperature exceeds 32767 milli-degrees the temperature overflows
to -32768 millidegrees.  These are bothe well within the -55 - +125 degree
range for the sensor.
Fix overflow in left-shift of a u8.
Signed-off-by: Ian Dall <ian@beware.dropbear.id.au>
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Heiko Carstens [Wed, 11 Feb 2009 21:04:38 +0000 (13:04 -0800)]
 
syscall define: fix uml compile bug
commit 
6c5979631b4b03c9288776562c18036765e398c1 upstream.
With the new system call defines we get this on uml:
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x308): undefined reference to `sys_sigprocmask'
Reason for this is that uml passes the preprocessor option
-Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel.
This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to
SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system
call named sys_kernel_sigprocmask.  However sys_sigprocmask is missing
because of this.
To avoid macro expansion for the system call name just concatenate the
name at first define instead of carrying it through severel levels.
This was pointed out by Al Viro.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kumar Gala [Tue, 10 Feb 2009 03:08:07 +0000 (21:08 -0600)]
 
powerpc/fsl-booke: Fix mapping functions to use phys_addr_t
commit 
6c24b17453c8dc444a746e45b8a404498fc9fcf7 upstream.
Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t
instead of unsigned long.  In 36-bit physical mode we really need these
functions to deal with phys_addr_t when trying to match a physical
address or when returning one.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michael Neuling [Thu, 23 Oct 2008 00:42:36 +0000 (00:42 +0000)]
 
powerpc: Fix swapcontext system for VSX + old ucontext size
commit 
16c29d180becc5bdf92fd0fc7314a44a671b5f4e upstream.
Since VSX support was added, we now have two sizes of ucontext_t;
the older, smaller size without the extra VSX state, and the new
larger size with the extra VSX state.  A program using the
sys_swapcontext system call and supplying smaller ucontext_t
structures will currently get an EINVAL error if the task has
used VSX (e.g. because of calling library code that uses VSX) and
the old_ctx argument is non-NULL (i.e. the program is asking for
its current context to be saved).  Thus the program will start
getting EINVAL errors on calls that previously worked.
This commit changes this behaviour so that we don't send an EINVAL in
this case.  It will now return the smaller context but the VSX MSR bit
will always be cleared to indicate that the ucontext_t doesn't include
the extra VSX state, even if the task has executed VSX instructions.
Both 32 and 64 bit cases are updated.
[paulus@samba.org - also fix some access_ok() and get_user() calls]
Thanks to Ben Herrenschmidt for noticing this problem.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jiri Slaby [Wed, 11 Feb 2009 21:04:40 +0000 (13:04 -0800)]
 
parport: parport_serial, don't bind netmos ibm 0299
commit 
3abdbf90a3ffb006108c831c56b092e35483b6ec upstream.
Since netmos 9835 with subids 0x1014(IBM):0x0299 is now bound with
serial/8250_pci, because it has no parallel ports and subdevice id isn't
in the expected form, return -ENODEV from probe function.
This is performed in netmos preinit_hook.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Clements [Wed, 11 Feb 2009 21:04:45 +0000 (13:04 -0800)]
 
nbd: fix I/O hang on disconnected nbds
commit 
4d48a542b42747c36a5937447d9c3de7c897ea50 upstream.
Fix a problem that causes I/O to a disconnected (or partially initialized)
nbd device to hang indefinitely.  To reproduce:
# ioctl NBD_SET_SIZE_BLOCKS /dev/nbd23 514048
# dd if=/dev/nbd23 of=/dev/null bs=4096 count=1
...hangs...
This can also occur when an nbd device loses its nbd-client/server
connection.  Although we clear the queue of any outstanding I/Os after the
client/server connection fails, any additional I/Os that get queued later
will hang.
This bug may also be the problem reported in this bug report:
http://bugzilla.kernel.org/show_bug.cgi?id=12277
Testing would need to be performed to determine if the two issues are the
same.
This problem was introduced by the new request handling thread code ("NBD:
allow nbd to be used locally", 3/2008), which entered into mainline around
2.6.25.
The fix, which is fairly simple, is to restore the check for lo->sock
being NULL in do_nbd_request.  This causes I/O to an uninitialized nbd to
immediately fail with an I/O error, as it did prior to the introduction of
this bug.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Reported-by: Jon Nelson <jnelson-kernel-bugzilla@jamponi.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
J. Bruce Fields [Wed, 4 Feb 2009 22:35:38 +0000 (17:35 -0500)]
 
lockd: fix regression in lockd's handling of blocked locks
commit 
9d9b87c1218be78ddecbc85ec3bb91c79c1d56ab upstream.
If a client requests a blocking lock, is denied, then requests it again,
then here in nlmsvc_lock() we will call vfs_lock_file() without FL_SLEEP
set, because we've already queued a block and don't need the locks code
to do it again.
But that means vfs_lock_file() will return -EAGAIN instead of
FILE_LOCK_DENIED.  So we still need to translate that -EAGAIN return
into a nlm_lck_blocked error in this case, and put ourselves back on
lockd's block list.
The bug was introduced by 
bde74e4bc64415b1 "locks: add special return
value for asynchronous locks".
Thanks to Frank van Maarseveen for the report; his original test
case was essentially
	for i in `seq 30`; do flock /nfsmount/foo sleep 10 & done
Tested-by: Frank van Maarseveen <frankvm@frankvm.com>
Reported-by: Frank van Maarseveen <frankvm@frankvm.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Randy Dunlap [Wed, 11 Feb 2009 21:04:33 +0000 (13:04 -0800)]
 
kernel-doc: fix syscall wrapper processing
commit 
b4870bc5ee8c7a37541a3eb1208b5c76c13a078a upstream.
Fix kernel-doc processing of SYSCALL wrappers.
The SYSCALL wrapper patches played havoc with kernel-doc for
syscalls.  Syscalls that were scanned for DocBook processing
reported warnings like this one, for sys_tgkill:
Warning(kernel/signal.c:2285): No description found for parameter 'tgkill'
Warning(kernel/signal.c:2285): No description found for parameter 'pid_t'
Warning(kernel/signal.c:2285): No description found for parameter 'int'
because the macro parameters all "look like" function parameters,
although they are not:
/**
 *  sys_tgkill - send signal to one specific thread
 *  @tgid: the thread group ID of the thread
 *  @pid: the PID of the thread
 *  @sig: signal to be sent
 *
 *  This syscall also checks the @tgid and returns -ESRCH even if the PID
 *  exists but it's not belonging to the target process anymore. This
 *  method solves the problem of threads exiting and PIDs getting reused.
 */
SYSCALL_DEFINE3(tgkill, pid_t, tgid, pid_t, pid, int, sig)
{
...
This patch special-cases the handling SYSCALL_DEFINE* function
prototypes by expanding them to
	long sys_foobar(type1 arg1, type1 arg2, ...)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tomas Winkler [Mon, 6 Oct 2008 08:05:29 +0000 (16:05 +0800)]
 
iwlwifi: scan correct setting of valid rx_chains
commit 
d588be6bae40f7965f1b681a4dbc3254411787b9 upstream.
This patch sets rx_chain bitmap correctly according hw configuration.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nick Piggin [Thu, 12 Feb 2009 03:34:23 +0000 (04:34 +0100)]
 
Fix page writeback thinko, causing Berkeley DB slowdown
commit 
3a4c6800f31ea8395628af5e7e490270ee5d0585 upstream.
A bug was introduced into write_cache_pages cyclic writeout by commit
31a12666d8f0c22235297e1c1575f82061480029 ("mm: write_cache_pages cyclic
fix").  The intention (and comments) is that we should cycle back and
look for more dirty pages at the beginning of the file if there is no
more work to be done.
But the !done condition was dropped from the test.  This means that any
time the page writeout loop breaks (eg.  due to nr_to_write == 0), we
will set index to 0, then goto again.  This will set done_index to
index, then find done is set, so will proceed to the end of the
function.  When updating mapping->writeback_index for cyclic writeout,
we now use done_index == 0, so we're always cycling back to 0.
This seemed to be causing random mmap writes (slapadd and iozone) to
start writing more pages from the LRU and writeout would slowdown, and
caused bugzilla entry
	http://bugzilla.kernel.org/show_bug.cgi?id=12604
about Berkeley DB slowing down dramatically.
With this patch, iozone random write performance is increased nearly
5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reported-and-tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>