pandora-kernel.git
Linus Torvalds [Thu, 13 Aug 2009 19:09:16 +0000 (12:09 -0700)]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Fix handling of bad requeue syscall pairing
  futex: Fix compat_futex to be same as futex for REQUEUE_PI
  locking, sched: Give waitqueue spinlocks their own lockdep classes
  futex: Update futex_q lock_ptr on requeue proxy lock

Linus Torvalds [Thu, 13 Aug 2009 19:08:44 +0000 (12:08 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix oops in identify_cpu() on CPUs without CPUID
  x86: Clear incorrectly forced X86_FEATURE_LAHF_LM flag
  x86, mce: therm_throt - change when we print messages
  x86: Add reboot quirk for every 5 series MacBook/Pro

Linus Torvalds [Thu, 13 Aug 2009 18:17:40 +0000 (11:17 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits)
  ocfs2: Fix possible deadlock when extending quota file
  ocfs2: keep index within status_map[]
  ocfs2: Initialize the cluster we're writing to in a non-sparse extend
  ocfs2: Remove redundant BUG_ON in __dlm_queue_ast()
  ocfs2/quota: Release lock for error in ocfs2_quota_write.
  ocfs2: Define credit counts for quota operations
  ocfs2: Remove syncjiff field from quota info
  ocfs2: Fix initialization of blockcheck stats
  ocfs2: Zero out padding of on disk dquot structure
  ocfs2: Initialize blocks allocated to local quota file
  ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq()
  ocfs2: Make global quota files blocksize aligned
  ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records.
  ocfs2: Fix deadlock on umount
  ocfs2: Add extra credits and access the modified bh in update_edge_lengths.
  ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation
  ocfs2: Fix error return in ocfs2_write_cluster()
  ocfs2: Fix compilation warning for fs/ocfs2/xattr.c
  ocfs2: Initialize count in aio_write before generic_write_checks
  ocfs2: log the actual return value of ocfs2_file_aio_write()
  ...

Linus Torvalds [Thu, 13 Aug 2009 17:59:29 +0000 (10:59 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: allow upper limit for resync/reshape to be set when array is read-only
  md/raid5: Properly remove excess drives after shrinking a raid5/6
  md/raid5: make sure a reshape restarts at the correct address.
  md/raid5: allow new reshape modes to be restarted in the middle.
  md: never advance 'events' counter by more than 1.
  Remove deadlock potential in md_open

Linus Torvalds [Thu, 13 Aug 2009 17:57:53 +0000 (10:57 -0700)]
Merge branch 'sh/for-2.6.31' of git://git./linux/kernel/git/lethal/sh-2.6

* 'sh/for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: fix i2c init order on ap325rxa V2
  sh: fix i2c init order on Migo-R V2
  sh: convert processor device setup functions to arch_initcall()

Linus Torvalds [Thu, 13 Aug 2009 15:28:36 +0000 (08:28 -0700)]
Make sock_sendpage() use kernel_sendpage()

kernel_sendpage() does the proper default case handling for when the
socket doesn't have a native sendpage implementation.

Now, arguably this might be something that we could instead solve by
just specifying that all protocols should do it themselves at the
protocol level, but we really only care about the common protocols.
Does anybody really care about sendpage on something like Appletalk? Not
likely.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Julien TINNES <julien@cr0.org>
Acked-by: Tavis Ormandy <taviso@sdf.lonestar.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
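The default handling described above comes down to one rule: call the protocol's sendpage hook if it has one, and otherwise use a generic fallback. Below is a minimal userspace sketch of that dispatch shape. It is not kernel code. The sketch_* types, functions and call counters are hypothetical stand-ins, and the page/offset/flags arguments of the real interface are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-ins for struct socket and struct proto_ops. */
struct sketch_socket;

struct sketch_proto_ops {
	/* Optional hook: protocols without a native sendpage leave it NULL. */
	ssize_t (*sendpage)(struct sketch_socket *sock, size_t size);
};

struct sketch_socket {
	const struct sketch_proto_ops *ops;
};

static int native_calls, fallback_calls;

/* Example protocol that has its own sendpage implementation. */
static ssize_t sketch_native_sendpage(struct sketch_socket *sock, size_t size)
{
	(void)sock;
	native_calls++;
	return (ssize_t)size;
}

/* Generic fallback, playing the role of sock_no_sendpage(). */
static ssize_t sketch_no_sendpage(struct sketch_socket *sock, size_t size)
{
	(void)sock;
	fallback_calls++;
	return (ssize_t)size;
}

/* Same shape as kernel_sendpage(): use the hook if present, else fall back. */
static ssize_t sketch_kernel_sendpage(struct sketch_socket *sock, size_t size)
{
	if (sock->ops->sendpage)
		return sock->ops->sendpage(sock, size);
	return sketch_no_sendpage(sock, size);
}
```

Every caller goes through sketch_kernel_sendpage(), so a protocol that leaves the hook NULL is never reached through a NULL function pointer.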
Magnus Damm [Fri, 7 Aug 2009 03:52:18 +0000 (03:52 +0000)]
sh: fix i2c init order on ap325rxa V2

Convert the AP325RXA board code to register devices at
arch_initcall() time instead of device_initcall(). This
fix unbreaks pcf8563 RTC driver support.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Magnus Damm [Thu, 13 Aug 2009 02:39:02 +0000 (11:39 +0900)]
sh: fix i2c init order on Migo-R V2

Convert the Migo-R board code to register devices at
arch_initcall() time instead of __initcall(). This fix
unbreaks migor_ts touch screen driver support.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Magnus Damm [Wed, 22 Jul 2009 15:14:29 +0000 (15:14 +0000)]
sh: convert processor device setup functions to arch_initcall()

Convert the processor platform device setup
functions from __initcall() and sometimes
device_initcall() to arch_initcall().

This makes sure that the platform devices are
registered a bit earlier so the devices are
available when drivers register using initcall
levels earlier than device_initcall().

A good example is platform devices needed by
i2c-sh_mobile.c which registers a bit earlier
using subsys_initcall().

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
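The ordering argument can be modelled with the initcall levels themselves. This is a hypothetical sketch, not kernel code. The enum follows the order in which the kernel runs its initcall levels. device_registered_first() is an illustrative helper that compares two levels and treats initcalls at the same level as unordered, because their relative order depends on link order.

```c
#include <assert.h>

/* Initcall levels in the order the kernel runs them (earlier level first). */
enum initcall_level {
	LEVEL_PURE,
	LEVEL_CORE,
	LEVEL_POSTCORE,
	LEVEL_ARCH,     /* arch_initcall() */
	LEVEL_SUBSYS,   /* subsys_initcall(), as used by i2c-sh_mobile.c */
	LEVEL_FS,
	LEVEL_DEVICE,   /* device_initcall(), which __initcall() maps to */
	LEVEL_LATE
};

/*
 * Hypothetical helper: is a platform device registered at level `dev`
 * guaranteed to be available when a driver registers at level `drv`?
 * Within a single level the order depends on link order, so ties count as "no".
 */
static int device_registered_first(enum initcall_level dev, enum initcall_level drv)
{
	assert(dev <= LEVEL_LATE && drv <= LEVEL_LATE);
	return dev < drv;
}
```

With the old device_initcall()/__initcall() setup, the device is registered after a subsys_initcall() driver. Moving the setup to arch_initcall() reverses that order.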
NeilBrown [Thu, 13 Aug 2009 00:41:50 +0000 (10:41 +1000)]
md: allow upper limit for resync/reshape to be set when array is read-only

Normally we only allow the upper limit for a reshape to be decreased
when the array is not performing a sync/recovery/reshape; otherwise there
could be races.  But if an array is part-way through a reshape when it
is assembled, the reshape is started immediately, leaving no window
to set an upper bound.

If the array is started read-only, the reshape will be suspended until
the array becomes writable, so that provides a window during which it
is perfectly safe to reduce the upper limit of a reshape.

So: allow the upper limit (sync_max) to be reduced even if the reshape
thread is running, as long as the array is still read-only.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 13 Aug 2009 00:41:49 +0000 (10:41 +1000)]
md/raid5: Properly remove excess drives after shrinking a raid5/6

We were removing the drives from the array, but not
removing symlinks from /sys/.... and not marking the device
as having been removed.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 13 Aug 2009 00:13:00 +0000 (10:13 +1000)]
md/raid5: make sure a reshape restarts at the correct address.

This "if" doesn't allow for the possibility that the number of devices
doesn't change, and so sector_nr isn't set correctly in that case.
So change '>' to '>='.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 13 Aug 2009 00:06:24 +0000 (10:06 +1000)]
md/raid5: allow new reshape modes to be restarted in the middle.

md/raid5 doesn't allow a reshape to restart if it involves writing
over the same part of disk that it would be reading from.
This happens at the beginning of a reshape that increases the number
of devices, at the end of a reshape that decreases the number of
devices, and continuously for a reshape that does not change the
number of devices.

The current code is correct for the "increase number of devices"
case as the critical section at the start is handled by userspace
performing a backup.

It does not work for reducing the number of devices, or the
no-change case.
For 'reducing', we need to invert the test.  For no-change we cannot
really be sure things will be safe, so simply require the array
to be read-only, which is how the user-space code which carefully
starts such arrays works.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 12 Aug 2009 23:54:02 +0000 (09:54 +1000)]
md: never advance 'events' counter by more than 1.

When assembling arrays, md allows two devices to have different event
counts as long as the difference is only '1'.  This is to cope with
a system failure between updating the metadata on two different
devices.

However there are currently times when we update the event count by
2.  This was done to keep the event count even when the array is clean
and odd when it is dirty, which allows us to avoid writing common
updates to spare devices and so allow those spares to go to sleep.

This is bad for the above reason.  So change it to never increase by
two.  This means that the alignment between 'odd/even' and
'clean/dirty' might take a little longer to attain, but that is only a
small cost.  The spares will get a few more updates but that will
still be spared (;-) most updates and can still go to sleep.

Prior to this patch there was a small chance that after a crash an
array would fail to assemble due to the overly large event count
mismatch.

Signed-off-by: NeilBrown <neilb@suse.de>
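Below is a small userspace sketch of why a +2 step defeats the assembly rule. The function names are hypothetical. The parity logic reconstructs the scheme described in the message and is not md's actual code. events_close_enough() encodes the "differ by at most 1" rule. bump_events_old() jumps by 2 whenever a single step would give the wrong clean/dirty parity. bump_events_new() always adds one.

```c
#include <assert.h>
#include <stdint.h>

/* Assembly rule from the text: member event counts may differ by at most 1. */
static int events_close_enough(uint64_t a, uint64_t b)
{
	uint64_t diff = a > b ? a - b : b - a;
	return diff <= 1;
}

/*
 * Old scheme (sketch): keep events even when clean and odd when dirty,
 * jumping by 2 whenever a single step would give the wrong parity.
 */
static uint64_t bump_events_old(uint64_t events, int going_clean)
{
	int is_even = (events % 2 == 0);

	/* events + 1 has the opposite parity; skip one more if that is wrong. */
	if (is_even == (going_clean != 0))
		return events + 2;
	return events + 1;
}

/* New scheme: always advance by exactly one. */
static uint64_t bump_events_new(uint64_t events)
{
	return events + 1;
}
```

Suppose the machine crashes after a count produced by bump_events_old() has reached one device but not the other. The two counts then differ by 2 and assembly is refused. With a single step they stay within the tolerated difference.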
Linus Torvalds [Wed, 12 Aug 2009 16:55:46 +0000 (09:55 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Remove double removal of blktrace directory

Alan D. Brunelle [Fri, 7 Aug 2009 16:01:08 +0000 (12:01 -0400)]
Remove double removal of blktrace directory

commit fd51d251e4cdb21f68e9dbc4336514d64a105a79
Author: Stefan Raspl <raspl@linux.vnet.ibm.com>
Date:   Tue May 19 09:59:08 2009 +0200

    blktrace: remove debugfs entries on bad path

added an explicit invocation of debugfs_remove for bt->dir, but
blk_remove_buf_file_callback also removes the directory. On
occasion I am seeing memory corruption that I have bisected down to
this commit. [The testing involves a (long) series of I/O benchmarks
with blktrace invoked around the actual runs.] I believe that this
committed patch is correct, but the problem actually lies in the code
in blk_remove_buf_file_callback.

With this patch I am able to consistently get complete runs whereas
previously I could not get a single run to complete.

The first part of the patch simply moves the debugfs_remove below the
relay_close: the relay_close call will remove files under bt->dir, and
so we should not remove the directory until all the files we created
have been removed. (Note: This is not sufficient to fix the problem;
the file system code has ref counts on the directory, so our invocation
does not cause the directory to actually be removed. Nonetheless, we
should not rely upon that feature.)

Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Wed, 12 Aug 2009 15:49:35 +0000 (08:49 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix spin_is_locked assert on uni-processor builds
  xfs: check for dinode realtime flag corruption
  use XFS_CORRUPTION_ERROR in xfs_btree_check_sblock
  xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_get
  xfs: switch to NOFS allocation under i_lock in xfs_readlink_bmap
  xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_set
  xfs: switch to NOFS allocation under i_lock in xfs_buf_associate_memory
  xfs: switch to NOFS allocation under i_lock in xfs_dir_cilookup_result
  xfs: switch to NOFS allocation under i_lock in xfs_da_buf_make
  xfs: switch to NOFS allocation under i_lock in xfs_da_state_alloc
  xfs: switch to NOFS allocation under i_lock in xfs_getbmap
  xfs: avoid memory allocation under m_peraglock in growfs code

Linus Torvalds [Wed, 12 Aug 2009 15:32:47 +0000 (08:32 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Don't override ADC definitions for ALC codecs
  ALSA: hda - Add missing vmaster initialization for ALC269
  ASoC: Add missing DRV_NAME definitions for fsl/* drivers

Linus Torvalds [Wed, 12 Aug 2009 15:29:32 +0000 (08:29 -0700)]
Merge branch 'zerolen' of git://git./linux/kernel/git/jgarzik/misc-2.6

* 'zerolen' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
  Remove zero-length file drivers/mtd/maps/sbc8240.c

Linus Torvalds [Wed, 12 Aug 2009 15:24:17 +0000 (08:24 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  ahci: add workaround for on-board 5723s on some gigabyte boards
  ahci: Soften up the dmesg on SB600 PMP softreset failure recovery
  Documentation/kernel-parameters.txt: document libata's ignore_hpa option
  sata_nv: MSI support, disabled by default
  libata: OCZ Vertex can't do HPA
  pata_atiixp: fix second channel support
  pata_at91: fix resource release

Trond Myklebust [Wed, 12 Aug 2009 13:12:30 +0000 (09:12 -0400)]
NFS: Fix an O_DIRECT Oops...

We can't call nfs_readdata_release()/nfs_writedata_release() without
first initialising and referencing args.context. Doing so inside
nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment()
causes an Oops.

We should rather be calling nfs_readdata_free()/nfs_writedata_free() in
those cases.

Looking at the O_DIRECT code, the "struct nfs_direct_req" is already
referencing the nfs_open_context for us. Since the readdata and writedata
structures carry a reference to that, we can simplify things by getting rid
of the extra nfs_open_context references, so that we can replace all
instances of nfs_readdata_release()/nfs_writedata_release().

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
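The difference between a release helper and a free helper can be sketched in userspace C. The types and function names below are hypothetical. They only mirror the roles of nfs_readdata_release() and nfs_readdata_free() as described above. The release variant assumes the context reference was set up. The free variant is safe on a partially initialised structure, which is what an early error path has to deal with.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: an open context with a refcount, and per-I/O data. */
struct sketch_ctx {
	int refcount;
};

struct sketch_iodata {
	struct sketch_ctx *context; /* NULL until the request is fully set up */
};

/* Like a *_release() helper: drops the context reference, then frees. */
static void sketch_iodata_release(struct sketch_iodata *d)
{
	d->context->refcount--; /* NULL dereference if context was never set */
	free(d);
}

/* Like a *_free() helper: frees the structure and touches nothing else. */
static void sketch_iodata_free(struct sketch_iodata *d)
{
	free(d);
}
```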
Jeff Garzik [Wed, 12 Aug 2009 10:29:57 +0000 (06:29 -0400)]
Remove zero-length file drivers/mtd/maps/sbc8240.c

It was "deleted" in commit 2bf961b7ccd69e108ac435c67e2b0522b403c578

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Tue, 4 Aug 2009 05:30:08 +0000 (14:30 +0900)]
ahci: add workaround for on-board 5723s on some gigabyte boards

Some Gigabyte boards have on-board SIMG5723s connected to JMB AHCI
controllers.  These are used to implement hardware RAID.  Unfortunately,
some firmware revisions on these 5723s don't bring the link down when
all the downstream ports are unoccupied, yet they don't respond to the
reset protocol either.  This makes libata think that a device is attached
to the port but not responding, so it retries.  The result is a painfully
long boot-time detection for these ports when they're empty.

This patch quirks those boards such that ahci gives up after the
initial timeout.  Combined with parallel probing, this gives quick
enough probing and also is safe because SIMG5723 will respond to the
first try if any of the downstream ports is occupied.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Marc Bowes <marcbowes@gmail.com>
Reported-by: Nicolas Mailhot <Nicolas.Mailhot@LaPoste.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Shane Huang [Wed, 5 Aug 2009 02:10:41 +0000 (10:10 +0800)]
ahci: Soften up the dmesg on SB600 PMP softreset failure recovery

Overly strong wording led to spurious bug reports: Novell bugzilla #527748,
RedHat bugzilla #468800. This patch softens the dmesg output on
SB600 PMP softreset failure recovery, to remove the scariness and
concern from the community.

Reported-by: pgnet Dev <pgnet.dev@gmail.com>
Signed-off-by: Shane Huang <shane.huang@amd.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Michael Prokop [Wed, 5 Aug 2009 22:14:10 +0000 (00:14 +0200)]
Documentation/kernel-parameters.txt: document libata's ignore_hpa option

By default the kernel honors the HPA (host protected area) of hard
drives.  Using libata's ignore_hpa module option it's possible to
change this behaviour.

Document usage and options of libata.ignore_hpa in
Documentation/kernel-parameters.txt.

Signed-off-by: Michael Prokop <mika@grml.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tony Vroon [Wed, 5 Aug 2009 23:50:09 +0000 (00:50 +0100)]
sata_nv: MSI support, disabled by default

At least the nVidia MCP55 controller quite happily supports MSI.
This adds an option to use it. It is disabled by default.
As per feedback from Robert Hancock, it will honour the user
request, as the kernel will not enable MSI where the controller
or the specific system configuration does not support it.

Signed-off-by: Tony Vroon <tony@linx.net>
Cc: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Thu, 6 Aug 2009 16:59:15 +0000 (01:59 +0900)]
libata: OCZ Vertex can't do HPA

OCZ Vertex SSD can't do HPA, and it fails in an unusual way.  It reports HPA,
allows unlocking but then fails all IOs which fall in the unlocked
area.  Quirk it so that HPA unlocking is not used for the device.

Reported by Daniel Perup in bnc#522414.

 https://bugzilla.novell.com/show_bug.cgi?id=522414

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Perup <probe@spray.se>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Bartlomiej Zolnierkiewicz [Thu, 6 Aug 2009 15:47:05 +0000 (17:47 +0200)]
pata_atiixp: fix second channel support

PIO and MWDMA timings are never programmed for the second channel
because timing registers are treated as 16-bit long ones.

The bug is an atiixp -> pata_atiixp regression and goes back to:

commit 669a5db411d85a14f86cd92bc16bf7ab5b8aa235
Author: Jeff Garzik <jeff@garzik.org>
Date:   Tue Aug 29 18:12:40 2006 -0400

    [libata] Add a bunch of PATA drivers.

Cc: Krystian Juskowiak <jusko@tlen.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bbpetkov@yahoo.de>
Cc: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
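Below is a userspace sketch of how handling a 32-bit timing register as 16 bits loses the second channel. The register layout (one timing byte per device, with the second channel in bits 16-31) and all function names are assumptions for illustration. They are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: one timing byte per device, second channel in bits 16-31. */
static unsigned sketch_timing_shift(unsigned port_no, unsigned devno)
{
	return 16 * port_no + 8 * (devno ^ 1);
}

/* Buggy variant: the register is handled as 16 bits wide. */
static uint16_t sketch_set_timing16(uint16_t reg, unsigned shift, uint8_t value)
{
	uint32_t wide = reg;

	wide &= ~(UINT32_C(0xFF) << shift);
	wide |= (uint32_t)value << shift;
	return (uint16_t)wide; /* truncation drops every byte at shift >= 16 */
}

/* Fixed variant: the register is handled as a full 32-bit dword. */
static uint32_t sketch_set_timing32(uint32_t reg, unsigned shift, uint8_t value)
{
	reg &= ~(UINT32_C(0xFF) << shift);
	reg |= (uint32_t)value << shift;
	return reg;
}
```

For a device on the second channel the shift is 16 or 24. The 16-bit variant therefore writes nothing, which matches the "never programmed for the second channel" symptom.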
Tejun Heo [Fri, 7 Aug 2009 02:15:20 +0000 (11:15 +0900)]
pata_at91: fix resource release

Julia Lawall discovered that pata_at91 wasn't freeing a memory region
allocated with kzalloc() on init failure paths.  Upon review,
pata_at91 also seems to be doing unnecessary explicit releases
of managed resources.  Convert the memory allocation to a
managed one and drop the unnecessary explicit resource releases.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Julia Lawall <julia@diku.dk>
Cc: Sergey Matyukevich <geomatsi@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Ondrej Zary [Tue, 11 Aug 2009 18:00:11 +0000 (20:00 +0200)]
x86: Fix oops in identify_cpu() on CPUs without CPUID

The kernel has been broken on x86 CPUs without CPUID since 2.6.28. It
crashes with a NULL pointer dereference in identify_cpu():

        generic_identify(c);

        if (this_cpu->c_identify)       /* <-- crashes here */
                this_cpu->c_identify(c);

this_cpu is NULL because it is only initialized in
get_cpu_vendor(), which is not called if the CPU has
no CPUID instruction.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
LKML-Reference: <200908112000.15993.linux@rainbow-software.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
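The message describes the bug, not the fix. A common defensive pattern for this kind of oops is to start the per-CPU descriptor pointer at a fallback descriptor instead of NULL. Code paths that skip the vendor lookup then still find a valid object. The sketch below shows that pattern in userspace C. It is hypothetical and is not the actual patch, and all sketch_* names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for cpuinfo and cpu_dev. */
struct sketch_cpuinfo {
	int vendor_hook_ran;
};

struct sketch_cpu_dev {
	const char *vendor;
	void (*c_identify)(struct sketch_cpuinfo *c);
};

/* Fallback descriptor: no vendor-specific hooks at all. */
static const struct sketch_cpu_dev sketch_default_cpu = { "Unknown", NULL };

static void sketch_vendor_identify(struct sketch_cpuinfo *c)
{
	c->vendor_hook_ran = 1;
}

static const struct sketch_cpu_dev sketch_vendor_cpu = { "ExampleVendor", sketch_vendor_identify };

/*
 * Start from the fallback instead of NULL, so a path that never performs
 * the vendor lookup (no CPUID) still dereferences a valid descriptor.
 */
static const struct sketch_cpu_dev *sketch_this_cpu = &sketch_default_cpu;

static void sketch_identify_cpu(struct sketch_cpuinfo *c)
{
	if (sketch_this_cpu->c_identify)
		sketch_this_cpu->c_identify(c);
}
```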
Christoph Hellwig [Mon, 10 Aug 2009 14:32:44 +0000 (11:32 -0300)]
xfs: fix spin_is_locked assert on uni-processor builds

Without SMP or preemption spin_is_locked always returns false,
so we can't do an assert with it.  Instead use assert_spin_locked,
which does the right thing on all builds.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reported-by: Johannes Engel <jcnengel@googlemail.com>
Tested-by: Johannes Engel <jcnengel@googlemail.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
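A userspace model of a uniprocessor, non-preemptible build, where spinlocks carry no state, shows why an assertion built on spin_is_locked() misfires. The sketch_up_* names are hypothetical. The modelled behaviour is the one the message describes: the query always reports "unlocked", and the assertion helper does nothing.

```c
#include <assert.h>

/* Hypothetical model of a spinlock on a UP, non-preemptible build: no state. */
typedef struct {
	int unused;
} sketch_up_spinlock_t;

static void sketch_up_spin_lock(sketch_up_spinlock_t *lock)
{
	(void)lock; /* nothing to do: no other CPU can contend */
}

static void sketch_up_spin_unlock(sketch_up_spinlock_t *lock)
{
	(void)lock;
}

/* There is no lock word to inspect, so the query always answers "unlocked". */
static int sketch_up_spin_is_locked(const sketch_up_spinlock_t *lock)
{
	(void)lock;
	return 0;
}

/* assert_spin_locked()-style helper: checks nothing when it cannot know. */
static void sketch_up_assert_spin_locked(const sketch_up_spinlock_t *lock)
{
	(void)lock;
}
```

An ASSERT(spin_is_locked(...)) made while holding the lock would fire on such a build. The assert_spin_locked()-style helper stays silent.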
Christoph Hellwig [Mon, 10 Aug 2009 14:32:18 +0000 (11:32 -0300)]
xfs: check for dinode realtime flag corruption

Ramon tested XFS with a modified version of fsfuzzer and hit a NULL
pointer dereference in __xfs_get_blocks due to the RT device target
pointer being NULL.

To fix this, reject inodes with the realtime bit set on a filesystem
without an RT subvolume during inode read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Reported-by: Ramon de Carvalho Valle <ramon@risesecurity.org>
Tested-by: Ramon de Carvalho Valle <ramon@risesecurity.org>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
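The fix amounts to a consistency test at inode read time. Below is a minimal sketch that assumes simplified, hypothetical structures. SKETCH_DIFLAG_REALTIME is an illustrative flag value, not the XFS on-disk constant, and the real code reports corruption through XFS's own error paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value only; XFS has its own on-disk constant. */
#define SKETCH_DIFLAG_REALTIME 0x1u

/* Hypothetical, heavily simplified mount and on-disk inode. */
struct sketch_mount {
	bool has_rt_subvolume;
};

struct sketch_dinode {
	unsigned int di_flags;
};

/*
 * Reject an inode that claims realtime data on a filesystem without an RT
 * subvolume, instead of letting a later block-mapping call follow a NULL
 * RT device target.
 */
static bool sketch_dinode_ok(const struct sketch_mount *mp,
			     const struct sketch_dinode *dip)
{
	if ((dip->di_flags & SKETCH_DIFLAG_REALTIME) && !mp->has_rt_subvolume)
		return false; /* caller would report corruption */
	return true;
}
```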
Eric Sandeen [Mon, 20 Jul 2009 15:52:15 +0000 (10:52 -0500)]
use XFS_CORRUPTION_ERROR in xfs_btree_check_sblock

In Red Hat Bug 512552
 - Can't write to XFS mount during raid5 resync

a user ran into corruption while resyncing a raid, and we failed
a consistency test, but didn't get much more info; it'd be nice
to call XFS_CORRUPTION_ERROR here so we can see the buffer
contents.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:15:01 +0000 (18:15 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_get

xfs_attr_rmtval_get is always called with i_lock held, but i_lock is taken
in reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
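The reasoning shared by this series of NOFS patches can be modelled in userspace. In this hypothetical sketch the allocator always behaves as if under memory pressure. It calls a filesystem reclaim hook unless the caller passes a NOFS-style flag. That hook needs the same i_lock, so allocating with the permissive flag while holding i_lock is flagged as a self-deadlock. The enum values loosely model GFP_KERNEL and GFP_NOFS; nothing here is an XFS or kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical allocation modes, loosely modelled on GFP_KERNEL and GFP_NOFS. */
enum sketch_gfp { SKETCH_GFP_KERNEL, SKETCH_GFP_NOFS };

static bool sketch_ilock_held;       /* toy stand-in for an inode's i_lock */
static bool sketch_reclaim_deadlock; /* set if reclaim would need i_lock again */

/* Filesystem reclaim hook: like inode reclaim, it needs i_lock. */
static void sketch_fs_reclaim(void)
{
	if (sketch_ilock_held)
		sketch_reclaim_deadlock = true; /* a real lock would hang here */
}

/* Toy allocator that always acts as if under memory pressure. */
static void *sketch_alloc(size_t size, enum sketch_gfp gfp)
{
	if (gfp != SKETCH_GFP_NOFS)
		sketch_fs_reclaim(); /* only NOFS keeps reclaim out of the filesystem */
	return malloc(size);
}
```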
Christoph Hellwig [Sat, 18 Jul 2009 22:15:00 +0000 (18:15 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_readlink_bmap

xfs_readlink_bmap is called with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:59 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_set

xfs_attr_rmtval_set is always called with i_lock held, and i_lock is taken
in reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:58 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_buf_associate_memory

xfs_buf_associate_memory is used for setting up the spare buffer for the
log wrap case in xlog_sync which can happen under i_lock when called from
xfs_fsync. The i_lock mutex is taken in reclaim context so all allocations
under it must avoid recursions into the filesystem.  There are a couple
more uses of xfs_buf_associate_memory in the log recovery code that are
also affected by this, but I'd rather keep the code simple than pass on
a gfp_mask argument.  Longer term we should just stop requiring the memory
allocation in xlog_sync by some smaller rework of the buffer layer.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:57 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_dir_cilookup_result

xfs_dir_cilookup_result is always called with i_lock held, but i_lock is taken
in reclaim context so all allocations under it must avoid recursions into the
filesystem.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:56 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_da_buf_make

i_lock is taken in the reclaim context so all allocations under it
must avoid recursions into the filesystem.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:55 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_da_state_alloc

xfs_da_state_alloc is always called with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into the
filesystem.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:54 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_getbmap

xfs_getbmap allocates memory with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into
the filesystem.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:53 +0000 (18:14 -0400)]
xfs: avoid memory allocation under m_peraglock in growfs code

Allocate the memory for the larger m_perag array before taking the
per-AG lock as the per-AG lock can be taken under the i_lock which
can be taken from reclaim context.

Reported by the new reclaim context tracing in lockdep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
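The allocate-first, swap-under-lock pattern can be sketched as follows. The structure and function names are hypothetical stand-ins for the mount's per-AG array and m_peraglock. The lock is a toy flag so the example stays single-threaded. The point is only that calloc() runs before the lock is taken, and that just the copy and the pointer swap happen while it is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the mount's per-AG array and m_peraglock. */
struct sketch_mount {
	bool peraglock_held; /* toy lock: we only track whether it is held */
	int *perag;
	size_t agcount;
};

static void sketch_lock(struct sketch_mount *mp)
{
	assert(!mp->peraglock_held);
	mp->peraglock_held = true;
}

static void sketch_unlock(struct sketch_mount *mp)
{
	assert(mp->peraglock_held);
	mp->peraglock_held = false;
}

/* Grow the array: allocate first, then only copy and swap under the lock. */
static int sketch_grow_perag(struct sketch_mount *mp, size_t new_count)
{
	int *new_tab = calloc(new_count, sizeof(*new_tab)); /* lock not held here */
	if (!new_tab)
		return -1;

	sketch_lock(mp);
	if (new_count < mp->agcount) { /* growfs only ever grows */
		sketch_unlock(mp);
		free(new_tab);
		return -1;
	}
	if (mp->agcount)
		memcpy(new_tab, mp->perag, mp->agcount * sizeof(*new_tab));
	int *old = mp->perag;
	mp->perag = new_tab;
	mp->agcount = new_count;
	sketch_unlock(mp);

	free(old); /* freeing also happens outside the lock */
	return 0;
}
```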
Takashi Iwai [Wed, 12 Aug 2009 06:05:20 +0000 (08:05 +0200)]
Merge branch 'fix/hda' into for-linus

* fix/hda:
  ALSA: hda - Don't override ADC definitions for ALC codecs
  ALSA: hda - Add missing vmaster initialization for ALC269

Takashi Iwai [Wed, 12 Aug 2009 06:05:19 +0000 (08:05 +0200)]
Merge branch 'fix/asoc' into for-linus

* fix/asoc:
  ASoC: Add missing DRV_NAME definitions for fsl/* drivers

Linus Torvalds [Wed, 12 Aug 2009 00:06:16 +0000 (17:06 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/fyu/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/fyu/linux-2.6:
  arch/ia64/kernel/iosapic: missing test after ioremap()
  ia64/topology.c: exit cache_add_dev when kobject_init_and_add fails
  arch/ia64/Makefile: Remove -mtune=merced in IA64 kernel build
  IA64: includecheck fix: ia64, pgtable.h
  IA64: includecheck fix: ia64, ia64_ksyms.c
  ia64: boolean __test_and_clear_bit
  Bug Fix arch/ia64/kernel/pci-dma.c: fix recursive dma_supported() call in iommu_dma_supported()

Roel Kluin [Tue, 11 Aug 2009 21:52:11 +0000 (14:52 -0700)]
arch/ia64/kernel/iosapic: missing test after ioremap()

Add the missing check of the return value of ioremap().

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Fenghua Yu [Tue, 11 Aug 2009 21:52:11 +0000 (14:52 -0700)]
ia64/topology.c: exit cache_add_dev when kobject_init_and_add fails

Make cache_add_dev exit sysfs when kobject_init_and_add returns an error.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Fenghua Yu [Tue, 11 Aug 2009 21:52:11 +0000 (14:52 -0700)]
arch/ia64/Makefile: Remove -mtune=merced in IA64 kernel build

GCC implements -mtune=merced in versions 3.4.0 through 4.3.3 (inclusive).
Starting from 4.4.0, -mtune=merced is deprecated.

Even in the versions that implement it, the -mtune=merced feature has been
broken in some releases. For example, GCC 4.1.2 reports internal tuning
function errors during kernel builds with -mtune=merced, and GCC Bugzilla
16130 reports another -mtune=merced issue on GCC 3.4.1.

So remove -mtune=merced from the IA64 kernel build. Without this option,
the kernel on Merced will remain the same except for losing an unstable and
out-of-date performance tuning feature.

Since GCC version 3.4.0, -mtune=mckinley has been implemented. The
-mtune=mckinley option functions the same as -mtune=itanium2, and
-mtune=itanium2 is the default. So we don't need to add -mtune=mckinley
either, since it has been the default in every GCC version that implements it.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Jaswinder Singh Rajput [Tue, 11 Aug 2009 21:52:11 +0000 (14:52 -0700)]
IA64: includecheck fix: ia64, pgtable.h

fix the following 'make includecheck' warning:

  arch/ia64/include/asm/pgtable.h: asm/processor.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
13 years agoIA64: includecheck fix: ia64, ia64_ksyms.c
Jaswinder Singh Rajput [Tue, 11 Aug 2009 21:52:10 +0000 (14:52 -0700)]
IA64: includecheck fix: ia64, ia64_ksyms.c

Fix the following 'make includecheck' warning:

  arch/ia64/kernel/ia64_ksyms.c: asm/page.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
13 years agoia64: boolean __test_and_clear_bit
Johannes Weiner [Tue, 11 Aug 2009 21:52:10 +0000 (14:52 -0700)]
ia64: boolean __test_and_clear_bit

__test_and_clear_bit() returns a bitfield with the tested-for bit set.
Make it consistent with the other bitops - of ia64 but also every
other architecture - and return a boolean value.
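A minimal userspace sketch of the difference, assuming a plain unsigned long bitmap rather than the ia64 header's actual word layout; all function names here are hypothetical. Both variants clear the bit and differ only in what they return, which matters to a caller that compares the result against 1.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of bits in one bitmap word. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Old-style return (sketch): the masked word, e.g. 0x8 when bit 3 was set. */
unsigned long test_and_clear_bit_masked(unsigned int nr, unsigned long *addr)
{
	unsigned long *p = addr + nr / WORD_BITS;
	unsigned long mask = 1UL << (nr % WORD_BITS);
	unsigned long old = *p;

	*p = old & ~mask;
	return old & mask;
}

/* Boolean return (sketch): same side effect, but the result is 0 or 1. */
int test_and_clear_bit_bool(unsigned int nr, unsigned long *addr)
{
	unsigned long *p = addr + nr / WORD_BITS;
	unsigned long mask = 1UL << (nr % WORD_BITS);
	unsigned long old = *p;

	*p = old & ~mask;
	return (old & mask) != 0;
}

/* Usage: "== 1" only works with the boolean form. */
void check_test_and_clear_bit(void)
{
	unsigned long word = 1UL << 3;

	assert(test_and_clear_bit_masked(3, &word) == 0x8UL);	/* not 1 */
	assert(word == 0);

	word = 1UL << 3;
	assert(test_and_clear_bit_bool(3, &word) == 1);
	assert(word == 0);
	assert(test_and_clear_bit_bool(3, &word) == 0);		/* already clear */
}
```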

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
13 years agoBug Fix arch/ia64/kernel/pci-dma.c: fix recursive dma_supported() call in iommu_dma_s...
Fenghua Yu [Tue, 11 Aug 2009 21:52:10 +0000 (14:52 -0700)]
Bug Fix arch/ia64/kernel/pci-dma.c: fix recursive dma_supported() call in iommu_dma_supported()

Commit 160c1d8e40866edfeae7d68816b7005d70acf391 sets
dma_ops->dma_supported = iommu_dma_supported;

dma_ops->dma_supported is first called from platform_dma_init() during kernel
boot. iommu_dma_supported() then calls dma_ops->dma_supported, i.e. itself,
recursively.

The kernel cannot boot, because it never gets out of iommu_dma_supported()
until it runs out of stack memory.
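The recursion can be illustrated with a stripped-down ops table. This is a hypothetical sketch of the general shape of the bug and of one way to break the cycle (calling a lower-level check directly instead of going back through the table); struct dma_ops_sketch, hw_dma_supported() and the 32-bit threshold are assumptions, not the contents of pci-dma.c or of the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for the kernel's DMA ops table. */
struct dma_ops_sketch {
	bool (*dma_supported)(uint64_t mask);
};

static struct dma_ops_sketch *dma_ops;

/*
 * Buggy shape (do not call): after
 *     dma_ops->dma_supported = iommu_dma_supported;
 * a body such as
 *     return dma_ops->dma_supported(mask);
 * re-enters itself until the stack is exhausted.
 */

/* Hypothetical low-level check; here: require at least 32-bit addressing. */
static bool hw_dma_supported(uint64_t mask)
{
	return mask >= 0xffffffffULL;
}

/* Non-recursive version: call the low-level check, not the ops table. */
static bool iommu_dma_supported(uint64_t mask)
{
	return hw_dma_supported(mask);
}

static struct dma_ops_sketch iommu_ops = {
	.dma_supported = iommu_dma_supported,
};

void platform_dma_init_sketch(void)
{
	dma_ops = &iommu_ops;
	/* First call through the table, as at boot: returns instead of recursing. */
	assert(dma_ops->dma_supported(0xffffffffULL));
}
```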

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
13 years agox86: Clear incorrectly forced X86_FEATURE_LAHF_LM flag
Kevin Winchester [Mon, 10 Aug 2009 22:56:45 +0000 (19:56 -0300)]
x86: Clear incorrectly forced X86_FEATURE_LAHF_LM flag

Due to an erratum with certain AMD Athlon 64 processors, the
BIOS may need to force enable the LAHF_LM capability.
Unfortunately, in at least one case, the BIOS does this even
for processors that do not support the functionality.

Add a specific check that will clear the feature bit for
processors known not to support the LAHF/SAHF instructions.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Acked-by: Borislav Petkov <petkovbb@googlemail.com>
LKML-Reference: <4A80A5AD.2000209@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agox86, mce: therm_throt - change when we print messages
Dmitry Torokhov [Mon, 10 Aug 2009 04:44:49 +0000 (21:44 -0700)]
x86, mce: therm_throt - change when we print messages

My Latitude D630 seems to handle thermal events in SMI by
lowering the max frequency of the CPU until it cools down, but
it still leaks the "everything is normal" events.

This spams the console with high-priority printks.

Adjust the therm_throt driver to only print messages about the fact
that the temperature has returned to normal when leaving the
throttling state.

Also lower the severity of "back to normal" message from
KERN_CRIT to KERN_INFO.
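The print policy amounts to a small state machine. A sketch, assuming a single boolean tracks the throttled state; the enum and function names are hypothetical, the printk levels appear only as comments, and any rate limiting of the "throttled" message is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Which message, if any, an event should produce (hypothetical names). */
enum therm_msg { THERM_MSG_NONE, THERM_MSG_THROTTLED, THERM_MSG_NORMAL };

struct therm_state_sketch {
	bool throttled;		/* state as of the previous event */
};

/*
 * Report "back to normal" only on the hot -> normal transition, so a
 * stream of spurious "normal" events from firmware stays silent.
 */
enum therm_msg therm_event(struct therm_state_sketch *s, bool hot)
{
	enum therm_msg msg = THERM_MSG_NONE;

	if (hot && !s->throttled)
		msg = THERM_MSG_THROTTLED;	/* would be KERN_CRIT */
	else if (!hot && s->throttled)
		msg = THERM_MSG_NORMAL;		/* would be KERN_INFO */

	s->throttled = hot;
	return msg;
}

/* Usage: "normal" events while already cool produce no output. */
void check_therm_event(void)
{
	struct therm_state_sketch s = { false };

	assert(therm_event(&s, false) == THERM_MSG_NONE);
	assert(therm_event(&s, true) == THERM_MSG_THROTTLED);
	assert(therm_event(&s, true) == THERM_MSG_NONE);
	assert(therm_event(&s, false) == THERM_MSG_NORMAL);
	assert(therm_event(&s, false) == THERM_MSG_NONE);
}
```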

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Acked-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20090810051513.0558F526EC9@mailhub.coreip.homeip.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoALSA: hda - Don't override ADC definitions for ALC codecs
Takashi Iwai [Tue, 11 Aug 2009 06:45:11 +0000 (08:45 +0200)]
ALSA: hda - Don't override ADC definitions for ALC codecs

ALC269 and ALC861-VD parsers override the ADC definitions
unconditionally without checking the spec definition.  This causes
problems when an inconsistent ADC is set up in the device quirk
(like ALC272 with digital-mic).

This patch avoids the overriding by adding the proper checks.
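A sketch of the "only fill in defaults" pattern, with a hypothetical structure, typedef and widget IDs (the real parsers work on struct alc_spec and hda_nid_t arrays, and the IDs below are illustrative, not taken from a datasheet):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short nid_t;	/* hypothetical stand-in for hda_nid_t */

struct alc_spec_sketch {
	const nid_t *adc_nids;	/* may already be set by a model quirk */
	int num_adc_nids;
};

/* Illustrative default ADC widget IDs. */
static const nid_t default_adc_nids[] = { 0x08, 0x07 };

/* Fill in defaults only when the quirk left the ADC list empty. */
void alc_setup_adcs(struct alc_spec_sketch *spec)
{
	if (spec->adc_nids)
		return;		/* respect the quirk's definition */
	spec->adc_nids = default_adc_nids;
	spec->num_adc_nids = (int)(sizeof(default_adc_nids) /
				   sizeof(default_adc_nids[0]));
}

void check_alc_setup_adcs(void)
{
	static const nid_t quirk_adcs[] = { 0x09 };	/* e.g. a digital-mic ADC */
	struct alc_spec_sketch quirk = { quirk_adcs, 1 };
	struct alc_spec_sketch plain = { NULL, 0 };

	alc_setup_adcs(&quirk);
	assert(quirk.adc_nids == quirk_adcs && quirk.num_adc_nids == 1);

	alc_setup_adcs(&plain);
	assert(plain.adc_nids == default_adc_nids && plain.num_adc_nids == 2);
}
```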

Reference: Novell bnc#529467
https://bugzilla.novell.com/show_bug.cgi?id=529467

Signed-off-by: Takashi Iwai <tiwai@suse.de>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Tue, 11 Aug 2009 02:25:00 +0000 (19:25 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  SELinux: fix memory leakage in /security/selinux/hooks.c

13 years agoSELinux: fix memory leakage in /security/selinux/hooks.c
James Morris [Mon, 10 Aug 2009 12:00:13 +0000 (22:00 +1000)]
SELinux: fix memory leakage in /security/selinux/hooks.c

Fix memory leakage in /security/selinux/hooks.c

The buffer always needs to be freed here; we either error
out or allocate more memory.

Reported-by: iceberg <strakh@ispras.ru>
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
13 years agoPM / Driver Core: Kill dev_pm_ops platform warning for now
Magnus Damm [Mon, 10 Aug 2009 21:41:18 +0000 (23:41 +0200)]
PM / Driver Core: Kill dev_pm_ops platform warning for now

Commit 783ea7d4eeefe895f2731fe73ac951e94418927b
(Driver Core: Rework platform suspend/resume, print warning)
added a warning message printed for platform drivers that use the
legacy PM callbacks rather than struct dev_pm_ops.  Unfortunately,
this resulted in some confusion and made some people try to convert
drivers by replacing the old callbacks with struct dev_pm_ops in an
automatic way, which generally is not a good idea.

Remove the platform device runtime dev_pm_ops warning for now,
because it's annoying to users and it's not really necessary right
now.

[rjw: Modified the changelog to be more informative.]

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
13 years agopty: fix data loss when stopped (^S/^Q)
Linus Torvalds [Mon, 10 Aug 2009 20:21:19 +0000 (13:21 -0700)]
pty: fix data loss when stopped (^S/^Q)

Commit d945cb9cc ("pty: Rework the pty layer to use the normal buffering
logic") dropped the test for 'tty->stopped' in pty_write_room(), which
then causes the n_tty line discipline thing to not throttle the data
properly when the tty is stopped.

So instead of pausing the write due to the tty being stopped, the ldisc
layer would go ahead and push it down to the pty.  The pty write()
routine would then refuse to take the data (because it _did_ check
'stopped'), and the data wouldn't actually be written.

This whole stopped test should eventually be moved into the tty ldisc
layer rather than have low-level tty drivers care about these things,
but right now the fix is to just re-instate the missing pty 'stopped'
handling.
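The contract between the line discipline and the driver can be modelled in a few lines. This is an assumed, simplified model (hypothetical struct and helpers, no locking or real buffering), meant only to show why write_room() must report 0 while the tty is stopped: the ldisc then keeps the data queued instead of pushing it into a write() that will refuse it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, minimal model of the pty state the ldisc consults. */
struct pty_sketch {
	bool stopped;		/* set by ^S, cleared by ^Q */
	int room;		/* free space on the other side */
};

/* write_room() must report 0 while stopped, so the ldisc holds the data. */
int pty_write_room_sketch(const struct pty_sketch *tty)
{
	return tty->stopped ? 0 : tty->room;
}

/* ldisc side: push at most write_room() bytes; the rest stays queued. */
int ldisc_push_sketch(struct pty_sketch *tty, int len)
{
	int room = pty_write_room_sketch(tty);
	int n = len < room ? len : room;

	tty->room -= n;
	return n;
}

void check_pty_stopped(void)
{
	struct pty_sketch tty = { .stopped = true, .room = 4096 };

	assert(ldisc_push_sketch(&tty, 100) == 0);	/* held, not lost */
	tty.stopped = false;				/* ^Q */
	assert(ldisc_push_sketch(&tty, 100) == 100);
	assert(tty.room == 3996);
}
```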

Reported-and-tested-by: Artur Skawina <art.08.09@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoocfs2: Fix possible deadlock when extending quota file
Jan Kara [Thu, 6 Aug 2009 21:29:34 +0000 (23:29 +0200)]
ocfs2: Fix possible deadlock when extending quota file

In OCFS2, allocator locks rank above transaction start. Thus we
cannot extend the quota file from inside a transaction, lest we
deadlock.

We solve the problem by starting the transaction not in
ocfs2_acquire_dquot() but only later, in ocfs2_local_read_dquot() and
ocfs2_global_read_dquot(), and we allocate blocks to the quota files before
starting the transaction.  If we crash, the quota files will just have a few
extra blocks, but that's no problem since we just use them the next time we
extend the quota file.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
13 years agox86: Add reboot quirk for every 5 series MacBook/Pro
Shunichi Fuji [Mon, 10 Aug 2009 18:34:40 +0000 (03:34 +0900)]
x86: Add reboot quirk for every 5 series MacBook/Pro

Reboot does not work on my MacBook Pro 13 inch (MacBookPro5,5)
either. It seems that all unibody MacBook and MacBookPro models
require PCI reboot handling.

The following model/machine ID list shows that unibody MacBook/Pro
models have model numbers in the 5 series:

   http://www.everymac.com/systems/by_capability/macs-by-machine-model-machine-id.html

Signed-off-by: Shunichi Fuji <palglowr@gmail.com>
Cc: Ozan Çağlayan <ozan@pardus.org.tr>
LKML-Reference: <30046e3b0908101134p6487ddbftd8776e4ddef204be@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Mon, 10 Aug 2009 18:48:51 +0000 (11:48 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
  perf_counter: Zero dead bytes from ftrace raw samples size alignment
  perf_counter: Subtract the buffer size field from the event record size
  perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
  perf_counter: Correct PERF_SAMPLE_RAW output
  perf tools: callchain: Fix bad rounding of minimum rate
  perf_counter tools: Fix libbfd detection for systems with libz dependency
  perf: "Longum est iter per praecepta, breve et efficax per exempla"
  perf_counter: Fix a race on perf_counter_ctx
  perf_counter: Fix tracepoint sampling to be part of generic sampling
  perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
  perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
  perf tools: callchain: Fix 'perf report' display to be callchain by default
  perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
  perf record: Fix the -A UI for empty or non-existent perf.data
  perf util: Fix do_read() to fail on EOF instead of busy-looping
  perf list: Fix the output to not include tracepoints without an id
  perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
  perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
  perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
  perf report: Fix per task mult-counter stat reporting
  ...

13 years agofutex: Fix handling of bad requeue syscall pairing
Darren Hart [Fri, 7 Aug 2009 22:20:48 +0000 (15:20 -0700)]
futex: Fix handling of bad requeue syscall pairing

If futex_requeue(requeue_pi=1) finds a futex_q that was created by a call
other than futex_wait_requeue_pi(), the q.rt_waiter may be NULL.  If so,
this will result in an oops from the following call graph:

futex_requeue()
  rt_mutex_start_proxy_lock()
    task_blocks_on_rt_mutex()
      waiter->task dereference
        OOPS

We currently WARN_ON() if this is detected; clearly this is inadequate.
If we detect a mispairing in futex_requeue(), bail out, sending -EINVAL to
user-space.
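A sketch of the kind of per-waiter check this describes, using hypothetical stand-in types for struct futex_q and struct rt_mutex_waiter. It also rejects the reverse mismatch (a requeue-PI waiter met by a plain requeue); the message above only describes the first case, so treat that half as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct rt_waiter_sketch { int prio; };	/* hypothetical */

struct futex_q_sketch {
	/* Non-NULL only for waiters queued by futex_wait_requeue_pi(). */
	struct rt_waiter_sketch *rt_waiter;
};

/*
 * Per-waiter check in a requeue loop (sketch). A requeue_pi operation
 * must only meet requeue-PI waiters; bail out with -EINVAL instead of
 * dereferencing a NULL rt_waiter further down the call chain.
 */
int requeue_check_sketch(const struct futex_q_sketch *q, int requeue_pi)
{
	if ((requeue_pi && !q->rt_waiter) || (!requeue_pi && q->rt_waiter))
		return -EINVAL;
	return 0;
}

void check_requeue(void)
{
	struct rt_waiter_sketch w = { 0 };
	struct futex_q_sketch plain = { NULL };	/* from a plain futex wait */
	struct futex_q_sketch pi = { &w };	/* from futex_wait_requeue_pi() */

	assert(requeue_check_sketch(&plain, 1) == -EINVAL);
	assert(requeue_check_sketch(&pi, 1) == 0);
	assert(requeue_check_sketch(&plain, 0) == 0);
}
```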

V2: Fix parenthesis warnings.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
LKML-Reference: <4A7CA8C0.7010809@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 10 Aug 2009 18:21:13 +0000 (11:21 -0700)]
Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/irq: Fix move_irq_desc() for nodes without ram

13 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 10 Aug 2009 18:11:40 +0000 (11:11 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix serialization in pit_expect_msb()

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes...
Linus Torvalds [Mon, 10 Aug 2009 18:00:37 +0000 (11:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI hotplug: SGI hotplug: do not use hotplug_slot_attr
  PCI hotplug: SGI hotplug: fix build failure

13 years agox86: Fix serialization in pit_expect_msb()
Linus Torvalds [Fri, 31 Jul 2009 19:45:41 +0000 (12:45 -0700)]
x86: Fix serialization in pit_expect_msb()

Wei Chong Tan reported a fast-PIT-calibration corner-case:

| pit_expect_msb() is vulnerable to SMI disturbance corner case
| in some platforms which causes /proc/cpuinfo to show wrong
| CPU MHz value when quick_pit_calibrate() jumps to success
| section.

I think that the real issue isn't even an SMI - but the fact
that in the very last iteration of the loop, there's no
serializing instruction _after_ the last 'rdtsc'. So even in
the absense of SMI's, we do have a situation where the cycle
counter was read without proper serialization.

The last check should be done outside the outer loop, since
_inside_ the outer loop, we'll be testing that the PIT has
the right MSB value has the right value in the next iteration.

So only the _last_ iteration is special, because that's the one
that will not check the PIT MSB value any more, and because the
final 'get_cycles()' isn't serialized.

In other words:

 - I'd like to move the PIT MSB check to after the last
   iteration, rather than in every iteration

 - I think we should comment on the fact that it's also a
   serializing instruction and so 'fences in' the TSC read.

Here's a suggested replacement.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: "Tan, Wei Chong" <wei.chong.tan@intel.com>
Tested-by: "Tan, Wei Chong" <wei.chong.tan@intel.com>
LKML-Reference: <B28277FD4E0F9247A3D55704C440A140D5D683F3@pgsmsx504.gar.corp.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Mon, 10 Aug 2009 16:00:47 +0000 (09:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  mm_for_maps: take ->cred_guard_mutex to fix the race with exec
  mm_for_maps: shift down_read(mmap_sem) to the caller
  mm_for_maps: simplify, use ptrace_may_access()

13 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Linus Torvalds [Mon, 10 Aug 2009 15:59:56 +0000 (08:59 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM

13 years agoMN10300: includecheck fix: mn10300, pci.h
Jaswinder Singh Rajput [Mon, 10 Aug 2009 15:45:42 +0000 (16:45 +0100)]
MN10300: includecheck fix: mn10300, pci.h

Fix the following 'make includecheck' warning:

  arch/mn10300/include/asm/pci.h: linux/mm.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomempool.c: clean up type-casting
Figo.zhang [Sat, 8 Aug 2009 13:01:22 +0000 (21:01 +0800)]
mempool.c: clean up type-casting

Clean up the double type cast.  "size_t" is typedef'd as "unsigned long" on
64-bit systems and "unsigned int" on 32-bit systems; either way it is as wide
as a pointer, so the intermediate cast to 'long' is pointless.
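A small sketch of the idiom in question, assuming a pool that stores its element size in the opaque pool_data pointer; the helper names are hypothetical, malloc() stands in for kmalloc(), and the "(size_t)(long)" form in the comment is an assumed before-state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stash an allocation size in an opaque pool_data pointer (sketch). */
void *size_to_pool_data(size_t size)
{
	return (void *)size;
}

/*
 * Recover the size with a single cast. size_t is pointer-sized on both
 * 32-bit and 64-bit Linux ABIs, so an intermediate (long), as in
 * (size_t)(long)pool_data, adds nothing.
 */
void *alloc_from_pool_data(void *pool_data)
{
	size_t size = (size_t)pool_data;

	return malloc(size);
}

void check_pool_data(void)
{
	void *pool_data = size_to_pool_data(64);
	void *p = alloc_from_pool_data(pool_data);

	assert((size_t)pool_data == 64);
	assert(p != NULL);
	free(p);
}
```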

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodocumentation: register ioctl entry of nilfs2
Ryusuke Konishi [Sat, 8 Aug 2009 08:52:50 +0000 (17:52 +0900)]
documentation: register ioctl entry of nilfs2

This will register the ioctl range used by nilfs2 file system to the
table listed in Documentation/ioctl/ioctl-number.txt.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoperf_counter: Zero dead bytes from ftrace raw samples size alignment
Frederic Weisbecker [Mon, 10 Aug 2009 14:38:36 +0000 (16:38 +0200)]
perf_counter: Zero dead bytes from ftrace raw samples size alignment

After aligning the ftrace raw samples, there are dead bytes storing
random data from the stack. We don't want to leak these to userspace,
so zero them out.

Before:

0x2de88 [0x50]: event: 9
.
. ... raw event: size 80 bytes
.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
.  0010:  68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00  h...h...,......
.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
.  0040:  68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f  h...@.F........
                                                      ^  ^  ^  ^
                                                         Leak

After:

0x2d318 [0x50]: event: 9
.
. ... raw event: size 80 bytes
.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
.  0010:  68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00  h...h...h......
.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
.  0040:  68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00  h.....F........
                                                      ^  ^  ^  ^
 Fixed
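The fix can be pictured as "zero the padding after copying". A generic sketch, assuming a caller-provided buffer large enough for the u64-aligned size; ALIGN_UP and copy_raw_sample() are hypothetical names, and the kernel code may zero the tail differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * Copy len bytes of event data into buf (which must hold the aligned
 * size) and zero the alignment padding, so whatever buf held before,
 * e.g. stale stack contents, is not handed to user space.
 */
size_t copy_raw_sample(void *buf, const void *data, size_t len)
{
	size_t aligned = ALIGN_UP(len, sizeof(uint64_t));

	memcpy(buf, data, len);
	memset((unsigned char *)buf + len, 0, aligned - len);
	return aligned;
}

void check_copy_raw_sample(void)
{
	unsigned char buf[16];
	const unsigned char data[5] = { 1, 2, 3, 4, 5 };

	memset(buf, 0xaa, sizeof(buf));		/* simulate stack garbage */
	assert(copy_raw_sample(buf, data, sizeof(data)) == 8);
	assert(buf[4] == 5);
	assert(buf[5] == 0 && buf[6] == 0 && buf[7] == 0);	/* padding zeroed */
	assert(buf[8] == 0xaa);			/* beyond the sample: untouched */
}
```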

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf_counter: Subtract the buffer size field from the event record size
Frederic Weisbecker [Mon, 10 Aug 2009 14:11:32 +0000 (16:11 +0200)]
perf_counter: Subtract the buffer size field from the event record size

We compute the perf raw sample size by aligning the raw ftrace
event size plus the buffer size field itself. We do that
instead of aligning only the perf raw sample size, so that we
may save some space in some cases.

But this buffer size field is not stored in the perf raw
sample, so we must subtract its size from the buffer once we
have computed the alignment; otherwise we may get a useless u32
field in the buffer.
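The size arithmetic, as a sketch under the assumption that the record is a u32 size field followed by the payload and must end on a u64 boundary (ALIGN_UP and raw_payload_size() are hypothetical names). The 44-byte case appears to match the raw size field (0x2c) in the dumps of the previous commit, where size field plus payload end exactly at byte 0x50.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * The raw sample is a u32 size field followed by the payload, and the
 * pair must end on a u64 boundary. Align payload + u32 together, then
 * subtract the u32 again: the size field is written separately and is
 * not part of the payload buffer.
 */
size_t raw_payload_size(size_t event_size)
{
	size_t total = ALIGN_UP(event_size + sizeof(uint32_t), sizeof(uint64_t));

	return total - sizeof(uint32_t);
}

void check_raw_payload_size(void)
{
	assert(raw_payload_size(44) == 44);	/* 44 + 4 = 48, already aligned */
	assert(raw_payload_size(40) == 44);	/* 40 + 4 = 44 -> 48 */
	assert(raw_payload_size(45) == 52);	/* 45 + 4 = 49 -> 56 */
	/* size field + payload always ends on a u64 boundary */
	assert((raw_payload_size(45) + sizeof(uint32_t)) % sizeof(uint64_t) == 0);
}
```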

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090810141129.GA5124@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agofutex: Fix compat_futex to be same as futex for REQUEUE_PI
Dinakar Guniguntala [Mon, 10 Aug 2009 13:01:42 +0000 (18:31 +0530)]
futex: Fix compat_futex to be same as futex for REQUEUE_PI

Add the REQUEUE_PI checks to the compat_sys_futex API
as well, to ensure that 32-bit requeues work on a 64-bit
system. Patch is against latest tip.

Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
LKML-Reference: <20090810130142.GA23619@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agolocking, sched: Give waitqueue spinlocks their own lockdep classes
Peter Zijlstra [Mon, 10 Aug 2009 11:33:05 +0000 (12:33 +0100)]
locking, sched: Give waitqueue spinlocks their own lockdep classes

Give waitqueue spinlocks their own lockdep classes when they
are initialised from init_waitqueue_head().  This means that
struct wait_queue::func functions can operate on other waitqueues.

This is used by CacheFiles to catch the page from a backing fs
being unlocked and to wake up another thread to take a copy of
it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: linux-cachefs@redhat.com
Cc: torvalds@osdl.org
Cc: akpm@linux-foundation.org
LKML-Reference: <20090810113305.17284.81508.stgit@warthog.procyon.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agomm_for_maps: take ->cred_guard_mutex to fix the race with exec
Oleg Nesterov [Fri, 10 Jul 2009 01:27:40 +0000 (03:27 +0200)]
mm_for_maps: take ->cred_guard_mutex to fix the race with exec

The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.

Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.

Strictly speaking, this also fixes another very minor problem. Unless
the security check fails or the task exits, mm_for_maps() should never
return NULL; the caller should get either the old or the new ->mm.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
13 years agomm_for_maps: shift down_read(mmap_sem) to the caller
Oleg Nesterov [Fri, 10 Jul 2009 01:27:38 +0000 (03:27 +0200)]
mm_for_maps: shift down_read(mmap_sem) to the caller

mm_for_maps() takes ->mmap_sem after security checks; this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
13 years agomm_for_maps: simplify, use ptrace_may_access()
Oleg Nesterov [Tue, 23 Jun 2009 19:25:32 +0000 (21:25 +0200)]
mm_for_maps: simplify, use ptrace_may_access()

It would be nice to kill __ptrace_may_access(). It requires task_lock(),
but this lock is only needed to read mm->flags in the middle.

Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
the code a little bit.

Also, we do not need to take ->mmap_sem in advance. In fact I think
mm_for_maps() should not play with ->mmap_sem at all; the caller should
take this lock.

With or without this patch, without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
13 years agoALSA: hda - Add missing vmaster initialization for ALC269
Takashi Iwai [Mon, 10 Aug 2009 09:55:51 +0000 (11:55 +0200)]
ALSA: hda - Add missing vmaster initialization for ALC269

Without the initialization of vmaster NID, the dB information got
confused for ALC269 codec.

Reference: Novell bnc#527361
https://bugzilla.novell.com/show_bug.cgi?id=527361

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>
13 years agoperf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
Peter Zijlstra [Mon, 10 Aug 2009 09:20:12 +0000 (11:20 +0200)]
perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data

Raw tracepoint data contains various kernel internals and
data from other users, so restrict this to CAP_SYS_ADMIN.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249896452.17467.75.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf_counter: Correct PERF_SAMPLE_RAW output
Peter Zijlstra [Mon, 10 Aug 2009 09:16:52 +0000 (11:16 +0200)]
perf_counter: Correct PERF_SAMPLE_RAW output

PERF_SAMPLE_* output switches should unconditionally output the
correct format, as they are the only way to unambiguously parse
the PERF_EVENT_SAMPLE data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249896447.17467.74.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agofutex: Update futex_q lock_ptr on requeue proxy lock
Darren Hart [Sun, 9 Aug 2009 22:34:39 +0000 (15:34 -0700)]
futex: Update futex_q lock_ptr on requeue proxy lock

futex_requeue() can acquire the lock on behalf of a waiter
early on or during the requeue loop if it is uncontended or in
the event of a lock steal or owner died. On wakeup, the waiter
(in futex_wait_requeue_pi()) cleans up the pi_state owner using
the lock_ptr to protect against concurrent access to the
pi_state. The pi_state is hung off futex_q's on the requeue
target futex hash bucket so the lock_ptr needs to be updated
accordingly.

The problem manifested itself by triggering the WARN_ON in
lookup_pi_state() about the pid != pi_state->owner->pid.  With
this patch, the pi_state is properly guarded against concurrent
access via the requeue target hb lock.

The astute reviewer may notice that there is a window of time
between when futex_requeue() unlocks the hb locks and when
futex_wait_requeue_pi() will acquire hb2->lock.  During this
time the pi_state and uval are not in sync with the underlying
rtmutex owner (but the uval does indicate there are waiters, so
no atomic changes will occur in userspace).  However, this is
not a problem. Should a contending thread enter
lookup_pi_state() and acquire hb2->lock before the ownership is
fixed up, it will find the pi_state hung off a waiter's
(possibly the pending owner's) futex_q and block on the
rtmutex.  Once futex_wait_requeue_pi() fixes up the owner, it
will also move the pi_state from the old owner's
task->pi_state_list to its own.

v3: Fix plist lock name for application to mainline (rather
    than -rt) Compile tested against tip/v2.6.31-rc5.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
LKML-Reference: <4A7F4EFF.6090903@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agopowerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM
Benjamin Herrenschmidt [Mon, 10 Aug 2009 06:36:38 +0000 (16:36 +1000)]
powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM

On an iMac G5, the b43 driver fails to initialise because trying to
set the DMA mask to 30 bits fails, even though there's only 512MiB of
RAM in the machine anyway:
https://bugzilla.redhat.com/show_bug.cgi?id=514787

We should probably let it succeed if the available RAM in the system
doesn't exceed the requested limit.
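The relaxed rule can be sketched as a single comparison. This assumes the end of RAM is known (ram_end below is a hypothetical parameter, the first address past the top of RAM) and ignores IOMMU and bounce-buffer cases; it is not the powerpc implementation itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: accept a DMA mask when it covers every RAM address, even if
 * the platform could not otherwise honour such a narrow mask.
 * ram_end (hypothetical) is the first address past the top of RAM, > 0.
 */
bool dma_mask_covers_ram(uint64_t mask, uint64_t ram_end)
{
	return mask >= ram_end - 1;
}

void check_dma_mask(void)
{
	const uint64_t mask30 = (1ULL << 30) - 1;	/* 30-bit mask */

	assert(dma_mask_covers_ram(mask30, 512ULL << 20));	/* 512MiB: OK */
	assert(!dma_mask_covers_ram(mask30, 2ULL << 30));	/* 2GiB: too small */
}
```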

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
13 years agoRemove deadlock potential in md_open
NeilBrown [Mon, 10 Aug 2009 02:50:52 +0000 (12:50 +1000)]
Remove deadlock potential in md_open

A recent commit:
  commit 449aad3e25358812c43afc60918c5ad3819488e7

introduced the possibility of an A-B/B-A deadlock between
bd_mutex and reconfig_mutex.

__blkdev_get holds bd_mutex while calling md_open which takes
   reconfig_mutex,
do_md_run is always called with reconfig_mutex held, and it now
   takes bd_mutex in the call to revalidate_disk.

This potential deadlock was not caught by lockdep due to the
use of mutex_lock_interruptible_nested, which was introduced
by
   commit d63a5a74dee87883fda6b7d170244acaac5b05e8
to avoid a warning of an impossible deadlock.

It is quite possible to split reconfig_mutex into two locks.
One protects the array data structures while they are being
reconfigured, the other ensures that an array is never even partially
open while it is being deactivated.
In particular, the second lock prevents an open from completing
between the time when do_md_stop checks if there are any active opens,
and the time when the array is either set read-only, or when ->pers is
set to NULL.  So we can be certain that no IO is in flight as the
array is being destroyed.

So create a new lock, open_mutex, just to ensure exclusion between
'open' and 'stop'.

This avoids the deadlock and also avoids the lockdep warning mentioned
in commit d63a5a74d.

Reported-by: "Mike Snitzer" <snitzer@gmail.com>
Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMerge branch 'for-linus' of git://git.infradead.org/ubi-2.6
Linus Torvalds [Sun, 9 Aug 2009 21:58:34 +0000 (14:58 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6

* 'for-linus' of git://git.infradead.org/ubi-2.6:
  UBI: compatible fallback in absense of sequence numbers
  UBI: fix double free on error path

13 years agoMerge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 9 Aug 2009 21:58:21 +0000 (14:58 -0700)]
Merge branch 'kvm-updates/2.6.31' of git://git./virt/kvm/kvm

* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Avoid redelivery of edge interrupt before next edge
  KVM: MMU: limit rmap chain length
  KVM: ia64: fix build failures due to ia64/unsigned long mismatches
  KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
  KVM: fix ack not being delivered when msi present
  KVM: s390: fix wait_queue handling
  KVM: VMX: Fix locking imbalance on emulation failure
  KVM: VMX: Fix locking order in handle_invalid_guest_state
  KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages
  KVM: SVM: force new asid on vcpu migration
  KVM: x86: verify MTRR/PAT validity
  KVM: PIT: fix kpit_elapsed division by zero
  KVM: Fix KVM_GET_MSR_INDEX_LIST

13 years agoMerge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Sun, 9 Aug 2009 21:58:09 +0000 (14:58 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/i915: silence vblank warnings
  drm: silence pointless vblank warning.
  drm: When adding probed modes, preserve duplicate mode types

13 years agoMerge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 9 Aug 2009 21:57:41 +0000 (14:57 -0700)]
Merge branch 'timers-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()

13 years agoMerge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 9 Aug 2009 21:57:26 +0000 (14:57 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Fix/complete ftrace event records sampling
  perf_counter, ftrace: Fix perf_counter integration
  tracing/filters: Always free pred on filter_add_subsystem_pred() failure
  tracing/filters: Don't use pred on alloc failure
  ring-buffer: Fix memleak in ring_buffer_free()
  tracing: Fix recordmcount.pl to handle sections with only weak functions
  ring-buffer: Fix advance of reader in rb_buffer_peek()
  tracing: do not use functions starting with .L in recordmcount.pl
  ring-buffer: do not disable ring buffer on oops_in_progress
  ring-buffer: fix check of try_to_discard result

13 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 9 Aug 2009 21:57:09 +0000 (14:57 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix buffer overflow in efi_init()
  x86: Add quirk to make Apple MacBookPro5,1 use reboot=pci
  x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus
  x86: Fix VMI && stack protector

Linus Torvalds [Sun, 9 Aug 2009 21:56:51 +0000 (14:56 -0700)]
Merge branch 'core-fixes-for-linus-2' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: Fix typos in documentation
  lockdep: Fix file mode of lock_stat
  rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()

Frederic Weisbecker [Sun, 9 Aug 2009 02:19:15 +0000 (04:19 +0200)]
perf tools: callchain: Fix bad rounding of minimum rate

Sometimes we get callchain branches that have a rate under the
limit given by the user.

Say you launched:

 perf record -f -g -a ./hackbench 10
 perf report -g fractal,10.0

And you got:

2.33%       hackbench  [kernel]                  [k] _spin_lock_irqsave
                |
                |--78.57%-- remove_wait_queue
                |          poll_freewait
                |          do_sys_poll
                |          sys_poll
                |          sysenter_dispatch
                |          0xf7ffa430
                |          0x1ffadea3c
                |
                |--7.14%-- __up_read
                |          up_read
                |          do_page_fault
                |          page_fault
                |          0xf7ffa430
                |          0xa0df710000000a
                ...

It is abnormal to get a 7.14% branch when we passed a 10%
filter.

The problem is that we round the minimum threshold down. This
happens mostly when we have a very low number of events. If the
total amount of your branch is 4 and you have a subbranch of 3
events, filtering to 90% will be computed as follows:

  limit = 4 * 0.9;

The result is 3.6, but the cast to integer rounds it down to 3.
This means our filter is effectively 75%, so the 3-event
subbranch (exactly 75%) passes a 90% filter.

We must then explicitly round up the minimum threshold.
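A minimal C sketch of the difference between truncating and rounding up the threshold. The helper names are hypothetical and do not come from the perf callchain code; the rounding is written out by hand so that no libm linkage is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the buggy computation:
 * the cast truncates toward zero, so 4 * 90% = 3.6 becomes 3. */
static unsigned long min_hits_truncated(unsigned long total, double min_percent)
{
	assert(min_percent >= 0.0 && min_percent <= 100.0);
	return (unsigned long)(total * min_percent / 100.0);
}

/* Hypothetical helper mirroring the fix: round the threshold up,
 * so 3.6 becomes 4. Equivalent to ceil() for non-negative values,
 * written by hand to avoid depending on libm. */
static unsigned long min_hits_rounded_up(unsigned long total, double min_percent)
{
	double limit;
	unsigned long hits;

	assert(min_percent >= 0.0 && min_percent <= 100.0);
	limit = total * min_percent / 100.0;
	hits = (unsigned long)limit;	/* truncates toward zero */
	if ((double)hits < limit)	/* a fractional part was dropped */
		hits++;
	return hits;
}

/* A subbranch is displayed only if its hit count reaches the threshold. */
static bool branch_shown(unsigned long branch_hits, unsigned long min_hits)
{
	return branch_hits >= min_hits;
}
```

With the numbers above, 4 hits at 90% give a threshold of 3 when truncated and 4 when rounded up, so only the rounded-up threshold hides the 3-hit subbranch. The percentages in the example report (78.57% and 7.14%) are consistent with a parent of 14 hits; a 10% filter then gives 1.4, which truncation turns into 1 (the 1-hit branch is shown) and rounding up turns into 2 (it is hidden).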

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: acme@redhat.com
Cc: peterz@infradead.org
Cc: efault@gmx.de
LKML-Reference: <20090809024235.GA10146@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Sat, 8 Aug 2009 12:14:15 +0000 (14:14 +0200)]
perf_counter tools: Fix libbfd detection for systems with libz dependency

Because some distributions' binutils packages depend on libz,
C++ demangling support isn't compiled in even though the
necessary libraries are available.

Fix this by adding a -lz link test to the dependency detection
rules.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1249733655.6929.5.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Carlos R. Mafra [Wed, 5 Aug 2009 18:53:34 +0000 (20:53 +0200)]
perf: "Longum est iter per praecepta, breve et efficax per exempla"

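A few examples of how 'perf' can be used, from an e-mail by
Ingo Molnar: http://lkml.org/lkml/2009/8/4/346. (The Latin
subject means "The road through precepts is long; through
examples it is short and effective.")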

Signed-off-by: Carlos R. Mafra <crmafra2@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Valdis.Kletnieks@vt.edu
LKML-Reference: <20090805185334.GA4535@Pilar.aei.mpg.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 7 Aug 2009 17:49:01 +0000 (19:49 +0200)]
perf_counter: Fix a race on perf_counter_ctx

While extending perfcounters with BTS hw-tracing, Markus
Metzger managed to trigger this warning:

   [  995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()

This triggers because commit
9f498cc5be7e013d8d6e4c616980ed0ffc8680d2 (perf_counter: Full
task tracing) moved the clearing of tsk->perf_counter_ctxp out
from under ctx->lock, which introduced a race (against
perf_lock_task_context).

Move it back and deal with the exit notification by explicitly
passing along the former task context.
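A simplified sketch of the locking rule behind the fix, assuming a lookup that locks the context and then re-checks the task's pointer, in the spirit of perf_lock_task_context(). All types and functions here are hypothetical: a C11 atomic_flag spinlock stands in for ctx->lock, and the RCU protection that keeps the context's memory valid in the kernel is omitted because nothing in this sketch frees a context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the context and task structures. */
struct ctx {
	atomic_flag lock;		/* plays the role of ctx->lock */
};

struct task {
	struct ctx *_Atomic ctxp;	/* plays the role of tsk->perf_counter_ctxp */
};

static void ctx_lock(struct ctx *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void ctx_unlock(struct ctx *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Reader side: lock the context, then re-check that the task still points
 * at it. The re-check only means something if every writer changes
 * t->ctxp while holding that same lock. */
static struct ctx *lock_task_ctx(struct task *t)
{
	for (;;) {
		struct ctx *c = atomic_load(&t->ctxp);

		if (c == NULL)
			return NULL;
		ctx_lock(c);
		if (atomic_load(&t->ctxp) == c)
			return c;	/* still attached; caller must unlock */
		ctx_unlock(c);		/* detached meanwhile; try again */
	}
}

/* Exit side: clear the pointer while holding the lock, and hand the former
 * context back so the caller can pass it along explicitly. */
static struct ctx *detach_task_ctx(struct task *t)
{
	struct ctx *c = lock_task_ctx(t);

	if (c == NULL)
		return NULL;
	assert(atomic_load(&t->ctxp) == c);	/* guaranteed by lock_task_ctx() */
	atomic_store(&t->ctxp, NULL);
	ctx_unlock(c);
	return c;
}
```

If detach_task_ctx() cleared the pointer without taking the lock, a reader could pass its re-check just before the store and keep using a context that is already being torn down; clearing under the lock closes that window.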

Reported-by: Markus T Metzger <markus.t.metzger@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249667341.17467.5.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 02:26:37 +0000 (04:26 +0200)]
perf_counter: Fix tracepoint sampling to be part of generic sampling

Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:

- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record

We want the mechanism that transports raw tracepoint sample
events into the perf ring buffer to be generalized and usable by
any type of counter.
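A hedged sketch of what "generic" buys here: a size-prefixed opaque payload that any counter type could hand to the output path. struct raw_record only mirrors the assumed shape of perf_raw_record (a size plus a data pointer); SAMPLE_RAW and output_raw() are hypothetical and reproduce neither the kernel's bit value nor its ring-buffer layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; the real PERF_SAMPLE_RAW value is not assumed. */
#define SAMPLE_RAW (1U << 0)

/* Assumed shape: an opaque payload and its size. Nothing in it is
 * tracepoint-specific, so any counter type could fill one in. */
struct raw_record {
	uint32_t size;
	const void *data;
};

/* Append a size-prefixed raw payload to buf if the counter requested raw
 * samples. Returns the bytes written, or 0 if nothing was written. */
static size_t output_raw(unsigned char *buf, size_t cap, uint64_t sample_type,
			 const struct raw_record *raw)
{
	size_t need;

	if (!(sample_type & SAMPLE_RAW) || raw == NULL)
		return 0;
	need = sizeof(raw->size) + raw->size;
	if (need > cap)
		return 0;
	memcpy(buf, &raw->size, sizeof(raw->size));
	memcpy(buf + sizeof(raw->size), raw->data, raw->size);
	return need;
}
```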

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 02:26:35 +0000 (04:26 +0200)]
perf_counter: Work around gcc warning by initializing tracepoint record unconditionally

Although the tracepoint record is always present when the
PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning because
it thinks the record might be uninitialized:

  kernel/perf_counter.c: In function ‘perf_counter_output’:
  kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function

So initialize it to NULL and always check that it is not NULL
before dereferencing it.
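A minimal sketch of the pattern, assuming an output routine that tests the same flag twice: once to pick up the record and once to use it. gcc cannot always prove that the two tests agree, hence the warning; initializing the pointer unconditionally and checking it before use removes the warning without changing behavior. SAMPLE_TP_RECORD, struct tp_record and tp_bytes() are hypothetical names, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit and record type, named after PERF_SAMPLE_TP_RECORD. */
#define SAMPLE_TP_RECORD (1U << 0)

struct tp_record {
	uint32_t size;
	const void *data;
};

/* Returns the payload bytes that would be written for this sample. */
static uint32_t tp_bytes(uint64_t sample_type, const struct tp_record *src)
{
	const struct tp_record *tp = NULL;	/* unconditional init: no "may be used uninitialized" */
	uint32_t total = 0;

	if (sample_type & SAMPLE_TP_RECORD)
		tp = src;			/* first pass: pick up the record */

	/* ... other sample fields would be handled here ... */

	if ((sample_type & SAMPLE_TP_RECORD) && tp != NULL)
		total += tp->size;		/* second pass: never dereference NULL */

	return total;
}
```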

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 00:16:25 +0000 (02:16 +0200)]
perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode

When we filter out the callchains below a given percentage, we
ignore them, and the end result only shows entries whose
percentage is higher than the filter threshold.

To users, it then looks as if the percentages are unbalanced, as
if the sum inside a profiled branch doesn't reach 100%.

Since in the past there have been real perf report bugs that
showed the same symptom, it would be nice to assure the user
that the data is correct and trustworthy and that it all sums up
to 100.00%.

So fix this by displaying, in each branch, the amount of hits
that have been filtered out, without any further detail. Example
while filtering below 50%:

7.73%  [k] delay_tsc
                |
                |--98.22%-- __const_udelay
                |          |
                |          |--86.37%-- ath5k_hw_register_timeout
                |          |          ath5k_hw_noise_floor_calibration
                |          |          ath5k_hw_reset
                |          |          ath5k_reset
                |          |          ath5k_config
                |          |          ieee80211_hw_config
                |          |          |
                |          |          |--88.53%-- ieee80211_scan_work
                |          |          |          worker_thread
                |          |          |          kthread
                |          |          |          child_rip
                |          |           --11.47%-- [...]
                |           --13.63%-- [...]
                 --1.78%-- [...]
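A hedged sketch of the bookkeeping behind the "[...]" lines: the hits of the children that fall under the threshold are summed into one remainder, so the displayed percentages plus the remainder reach 100%. The function names are hypothetical, and the sketch assumes that a parent's total equals the sum of its children's hits, which may not match how perf accounts for hits that end at the parent itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the part of a parent's hits that belongs to children
 * below the display threshold, shown as a single "[...]" line. */
static unsigned long filtered_remainder(unsigned long parent_hits,
					const unsigned long *child_hits,
					size_t nr_children,
					unsigned long min_hits)
{
	unsigned long shown = 0;
	size_t i;

	for (i = 0; i < nr_children; i++)
		if (child_hits[i] >= min_hits)
			shown += child_hits[i];

	assert(shown <= parent_hits);	/* assumes parent total >= shown children */
	return parent_hits - shown;
}

/* Percentage of the parent represented by the filtered remainder. */
static double remainder_percent(unsigned long parent_hits, unsigned long remainder)
{
	return parent_hits ? 100.0 * remainder / parent_hits : 0.0;
}
```

For instance, a parent with 100 hits and children of 86, 10 and 4 hits, filtered at 50 hits, shows the 86-hit child plus one "[...]" line worth 14%, similar to the 86.37% / 13.63% split above.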

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker [Sat, 8 Aug 2009 00:16:24 +0000 (02:16 +0200)]
perf tools: callchain: Fix 'perf report' display to be callchain by default

If the callchain was recorded with the -g option, we currently
require a -g option to perf report as well, and people reported
this as an unnecessary complication: the user already specified
-g once, so there is no need to require it a second time.

So if the recording includes call-chains, display the callchain by
default in perf report.

( The user can override this default with the "-g none" option of
  perf report. )
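A sketch of the defaulting rule described above. The enum and function are hypothetical, and CC_DEFAULT is an assumed default mode, not a value taken from perf.

```c
#include <stdbool.h>

/* Hypothetical display modes; "-g none" maps to CC_NONE. */
enum cc_mode {
	CC_UNSET,	/* no -g option given to perf report */
	CC_NONE,
	CC_GRAPH,
	CC_FRACTAL,
};

/* Assumed default when callchains were recorded; not taken from perf. */
#define CC_DEFAULT CC_FRACTAL

static enum cc_mode effective_cc_mode(bool recorded_callchains, enum cc_mode user_mode)
{
	if (!recorded_callchains)
		return CC_NONE;		/* nothing to display; a real tool may warn instead */
	if (user_mode != CC_UNSET)
		return user_mode;	/* an explicit -g, including "-g none", wins */
	return CC_DEFAULT;		/* recorded with -g: show callchains by default */
}
```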

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>