Pandora Sourcecodes - pandora-kernel.git/log

bridge: Handle IFLA_ADDRESS correctly when creating bridge device

When bridge device is created with IFLA_ADDRESS, we are not calling
br_stp_change_bridge_id(), which leads to incorrect local fdb
management and bridge id calculation, and prevents us from receiving
frames on the bridge device.

Reported-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Update Kconfig to include Chelsio T5 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: fix Timesync Tx interrupt handler code

This patch fixes the PTP Tx timestamp interrupt handler. The original
code misinterpreted the interrupt handler design. We were clearing the
ena_mask bit for the Timesync interrupts. This is done to indicate that
the interrupt will be handled in a scheduled work item (instead of
immediately) and that work item is responsible for re-enabling the
interrupts. However, the Tx timestamp was being handled immediately and
nothing was ever re-enabling it. This resulted in a single interrupt
working for the life of the driver.

This patch fixes the issue by instead clearing the bit from icr0 which
is used to indicate that the interrupt was immediately handled and can
be re-enabled right away. This patch also clears up a related issue due
to writing the PRTTSYN_STAT_0 register, which was unintentionally
clearing the cause bits for Timesync interrupts.

Change-ID: I057bd70d53c302f60fab78246989cbdfa469d83b
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-3.15-20140424' of git://gitorious.org/linux-can/linux-can

Marc Kleine-Budde says:

====================
this is a pull request for net/master, for the v3.15 release cycle, consisting
of 26 patches.

Thomas Gleixner contributes 21 patches for the c_can driver, which address
several shortcomings in the driver like hardware initialisation, concurrency,
message ordering and poor performance. Two patches Oliver Hartkopp, one adds a
missing lock to the sja1000_isa driver, the other one fixes the return value in
the generic bit time configuration function. And finally a patch by Alexander
Stein, that fixes the slcan driver to use the correct spinlock variant.

To make it 26 patches, Wolfgang Grandegger patch for the c_can_pci
driver, which enables the bus master only for MSI and a patch by
Wolfram Sang, which converts the 'instance' in the c_can driver to the
proper type.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'altera_tse'

Vince Bridgers says:

====================
This series of patches addresses a handful of issues found in testing
and reported by users of the Altera Triple Speed Ethernet soft IP.

The patches address the following issues (in summary)

1) The SGDMA soft IP was found to incorrectly process receive packets
   when the target physical address of the receive buffer was on
   a boundary that's not 32-bit aligned. One of the patches addresses
   this issue.
2) The pause quanta was not being set by the driver, one patch of this
   series sets the pause quanta to the IEEE defined default value
   since the hardware reset value is 0.
3) An issue in a error recovery path of the probe routine caused a
   kernel panic in the event a phy was probed and could not be found.
   A patch addresses this issue.
4) A change was made to the driver name for Ethtool support, and
   comments added to support an addition to Ethtool to support
   the Altera Triple Speed Ethernet controller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Altera TSE: Change driver name used by Ethtool

This patch changes the name used by Ethtool to something more
conventional in preparation for TSE Ethtool register dump
support to be added in the near future.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Altera TSE: Fix Panic in probe routine when phy probe fails

This patch addresses a fault in the error recovery path of the probe
routine where the netdev structure was not being unregistered properly
leading to a panic only when the phy probe failed.

Abbreviated panic stack seen is as follows:

(free_netdev+0xXX) from (altera_tse_probe+0xXX)
(altera_tse_probe+0xXX) from (platform_drv_probe+0xXX)
(platform_drv_probe+0xXX) from (driver_probe_device+0xXX)
(driver_probe_device+0xXX) from (__driver_attach+0xXX)
(__driver_attach+0xXX) from (bus_for_each_dev+0xXX)
(bus_for_each_dev+0xXX) from (driver_attach+0xXX)
(driver_attach+0xXX) from (bus_add_driver+0xXX)
(bus_add_driver+0xXX) from (driver_register+0xXX)
(driver_register+0xXX) from (__platform_driver_register+0xXX)
(__platform_driver_register+0xXX) from (altera_tse_driver_init+0xXX)
(altera_tse_driver_init+0xXX) from (do_one_initcall+0xXX)
(do_one_initcall+0xXX) from (kernel_init_freeable+0xXX)
(kernel_init_freeable+0xXX) from (kernel_init+0xXX)
(kernel_init+0xXX) from (ret_from_fork+0xXX)

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Altera TSE: Set the Pause Quanta value to the IEEE default value

This patch initializes the pause quanta set for transmitted pause frames
to the IEEE specified default of 0xffff.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Altera TSE: Work around unaligned DMA receive packet issue with Altera SGDMA

This patch works around a recently discovered unaligned receive dma problem
with the Altera SGMDA. The Altera SGDMA component cannot be configured to
DMA data to unaligned addresses for receive packet operations from the
Triple Speed Ethernet component because of a potential data transfer
corruption that can occur. This patch addresses this issue by
utilizing the shift 16 bits feature of the Altera Triple Speed Ethernet
component and modifying the receive buffer physical addresses accordingly
such that the target receive DMA address is always aligned on a 32-bit
boundary.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Tested-by: Matthew Gerlach <mgerlach@altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnx2x-net'

Yuval Mintz says:

====================
bnx2x: SRIOV bug fixes

This series contains 3 SRIOV bug fixes, 2 of which are regressions starting
with commit 2dc33bbc "bnx2x: Remove the sriov VFOP mechanism".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix failure to configure VF multicast filters

Commit 2dc33bbc "bnx2x: Remove the sriov VFOP mechanism" caused a regression,
preventing VFs from configuring multicast filters.

Signed-off-by: Naredner Kumar <narender.kumar@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix vlan credit issues for VFs

Starting with commit 2dc33bbc "bnx2x: Remove the sriov VFOP mechanism",
the bnx2x started enforcing vlan credits for all vlan configurations.
This exposed 2 issues:
  - Vlan credits are not returned once a VF is removed; this causes a leak
    of credits, and eventually will lead to VFs with no vlan credits.
  - A vlan credit must be set aside for the Hypervisor to use, and should
    not be visible to the VF.

Although linux VFs at the moment do not support vlan configuration [from the
VF side] which causes them to be resilient to this sort of issue, Windows VF
over linux hypervisors might fail to load as the vlan credits become depleted.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Memory leak during VF removal

When removing a VF interface, the driver fails to release that VF's mailbox
and bulletin board allocated memory.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fib: fix fib dump restart

When the ipv6 fib changes during a table dump, the walk is
restarted and the number of nodes dumped are skipped. But the existing
code doesn't advance to the next node after a node is skipped. This can
cause the dump to loop or produce lots of duplicates when the fib
is modified during the dump.

This change advances the walk to the next node if the current node is
skipped after a restart.

Signed-off-by: Kumar Sundararajan <kumar@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: slcan: Fix spinlock variant

slc_xmit is called within softirq context and locks sl->lock, but
slcan_write_wakeup is not softirq context, so we need to use
spin_[un]lock_bh!
Detected using kernel lock debugging mechanism.

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: fix return value from can_get_bittiming()

When trying to set a data bitrate on non CAN FD devices the 'ip' tool
answers with:

RTNETLINK answers: Unknown error 524

Rename '-ENOTSUPP' to '-EOPNOTSUPP' so that 'ip' answers correctly:

RTNETLINK answers: Operation not supported

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000_isa: add locking for indirect register access mode

When accessing the SJA1000 controller registers in the indirect access mode,
writing the register number and reading/writing the data has to be an atomic
attempt.

As the sja1000_isa driver is an old style driver with a fixed number of
instances the locking variable depends on the same index like all the other
configuration elements given on the module command line.

As a positive side effect dev->dev_id is populated by the instance index,
which was missing in 3e66d0138c05d9 ("can: populate netdev::dev_id for udev
discrimination").

Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can_pci: enable PCI bus master only for MSI

Coverity complains that c_can_pci_probe() calls pci_enable_msi() without
checking the result:

CID 712278 (#1 of 1): Unchecked return value (CHECKED_RETURN) 3. check_return:
Calling pci_enable_msi_block without checking return value (as is done
elsewhere 88 out of 105 times).
88 pci_enable_msi(pdev);

This is CID 712278.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: use proper type for 'instance'

Commit 6439fbce1075 (can: c_can: fix error checking of priv->instance in
probe()) found the warning but applied a suboptimal solution. Since, both
pdev->id and of_alias_get_id() return integers, it makes sense to convert the
variable to an integer and avoid the cast.

Signed-off-by: Wolfram Sang <wsa@sang-engineering.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Speed up tx buffer invalidation

It's suffcient to kill the TXIE bit in the message control register
even if the documentation of C and D CAN says that it's not allowed to
do that while MSGVAL is set. Reality tells a different story and this
change gives us another 2% of CPU back for not waiting on I/O.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Remove tx locking

Mark suggested to use one IF for the softirq and the other for the
xmit function to avoid the xmit lock.

That requires to write the frame into the interface first, then handle
the echo skb and store the dlc before committing the TX request to the
message ram.

We use an atomic to handle the active buffers instead of reading the
MSGVAL register as thats way faster especially on PCH/x86.

Suggested-by: Mark <mark5@del-llc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Use proper u32 variables in c_can_write_msg_object()

Instead of obfuscating the code by artificial 16 bit splits use the
proper 32 bit assignments and split the result when writing to the
interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Cleanup c_can_write_msg_object()

Remove the MASK from the TX transfer side.

Make the code readable and get rid of the annoying IFX_WRITE_XXX_16BIT
macros which are just obfuscating the code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Cleanup c_can_msg_obj_put/get()

Sigh!

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Cleanup c_can_inval_msg_object()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Cleanup setup of receive buffers

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Cleanup c_can_read_msg_object()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Cleanup irq enable/disable

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Work around C_CAN RX wreckage

Alexander reported that the new optimized handling of the RX fifo
causes random packet loss on Intel PCH C_CAN hardware.

After a few fruitless debugging sessions I got hold of a PCH (eg20t)
afflicted system. That machine does not have the CAN interface wired
up, but it was possible to reproduce the issue with the HW loopback
mode.

As Alexander observed correctly, clearing the NewDat flag along with
reading out the message buffer causes that issue on C_CAN, while D_CAN
handles that correctly.

Instead of restoring the original message buffer handling horror the
following workaround solves the issue:

    transfer buffer to IF without clearing the NewDat
    handle the message
    clear NewDat bit

That's similar to the original code but conditional for C_CAN.

I really wonder why all user manuals (C_CAN, Intel PCH and some more)
recommend to clear the NewDat bit right away. The knows it all Oracle
operated by Gurgle does not unearth any useful information either. I
simply cannot believe that we are the first to uncover that HW issue.

Reported-and-tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Disable rx split as workaround

The RX buffer split causes packet loss in the hardware:

What happens is:

RX Packet 1 --> message buffer 1 (newdat bit is not cleared)
RX Packet 2 --> message buffer 2 (newdat bit is not cleared)
RX Packet 3 --> message buffer 3 (newdat bit is not cleared)
RX Packet 4 --> message buffer 4 (newdat bit is not cleared)
RX Packet 5 --> message buffer 5 (newdat bit is not cleared)
RX Packet 6 --> message buffer 6 (newdat bit is not cleared)
RX Packet 7 --> message buffer 7 (newdat bit is not cleared)
RX Packet 8 --> message buffer 8 (newdat bit is not cleared)

Clear newdat bit in message buffer 1
Clear newdat bit in message buffer 2
Clear newdat bit in message buffer 3
Clear newdat bit in message buffer 4
Clear newdat bit in message buffer 5
Clear newdat bit in message buffer 6
Clear newdat bit in message buffer 7
Clear newdat bit in message buffer 8

Now if during that clearing of newdat bits, a new message comes in,
the HW gets confused and drops it.

It does not matter how many of them you clear. I put a delay between
clear of buffer 1 and buffer 2 which was long enough that the message
should have been queued either in buffer 1 or buffer 9. But it did not
show up anywhere. The next message ended up in buffer 1. So the
hardware lost a packet of course without telling it via one of the
error handlers.

That does not happen on all clear newdat bit events. I see one of 10k
packets dropped in the scenario which allows us to reproduce. But the
trace looks always the same.

Not splitting the RX Buffer avoids the packet loss but can cause
reordering. It's hard to trigger, but it CAN happen.

With that mode we use the HW as it was probably designed for. We read
from the buffer 1 upwards and clear the buffer as we get the
message. That's how all microcontrollers use it. So I assume that the
way we handle the buffers was never really tested. According to the
public documentation it should just work :)

Let the user decide which evil is the lesser one.

[ Oliver Hartkopp: Provided a sane config option and help text and
made me switch to favour potential and unlikely reordering over
packet loss ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Get rid of pointless interrupts

The driver handles pointlessly TWO interrupts per packet. The reason
is that it enables the status interrupt which fires for each rx and tx
packet and it enables the per message object interrupts as well.

The status interrupt merily acks or in case of D_CAN ignores the TX/RX
state and then the message object interrupt fires.

The message objects interrupts are only useful if all message objects
have hardware filters activated.

But we don't have that and its not simple to implement in that driver
without rewriting it completely.

So we can ditch the message object interrupts and handle the RX/TX
right away from the status interrupt. Instead of TWO we handle ONE.

Note: We must keep the TXIE/RXIE bits in the message buffers because
the status interrupt alone is not reliable enough in corner cases.

If we ever have the need for HW filtering, then this code needs a
complete overhaul and we can think about it then. For now we prefer a
lower interrupt load.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Avoid status register update for D_CAN

On D_CAN the RXOK, TXOK and LEC bits are cleared/set on read of the
status register. No need to update them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Simplify buffer reenabling

Instead of writing to the message object we can simply clear the
NewDat bit with the get method.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Always update error stats

If the allocation of the error skb fails, we still want to see the
error statistics.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Fix berr reporting

Reading the LEC type with

return (mode & ENABLED) && (status & LEC_MASK);

is not guaranteed to return (status & LEC_MASK) if the enabled bit in
mode is set. It's guaranteed to return 0 or !=0.

Remove the inline function and call unconditionally into the
berr_handling code and return early when the reporting is disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Handle state change correctly

If the allocation of an error skb fails, the state change handling
returns w/o doing any work. That leaves the interface in a wreckaged
state as the internal status is wrong.

Split the interface handling and the skb handling.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Do not access skb after net_receive_skb()

There is no guarantee that the skb is in the same state after calling
net_receive_skb(). It might be freed or reused. Not really harmful as
its a read access, except you turn on the proper debugging options
which catch a use after free.

The whole can subsystem is full of this. Copy and paste ....

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Make bus off interrupt disable logic work

The state change handler is called with device interrupts disabled
already. So no point in disabling them again when we enter bus off
state.

But what's worse is that we reenable the interrupts at the end of NAPI
poll unconditionally. So c_can_start() which is called from the
restart timer can trigger interrupts which confuse the hell out of the
half reinitialized driver/hw.

Remove the pointless device interrupt disable in the BUS_OFF handler
and prevent reenabling the device interrupts at the end of the poll
routine when the current state is BUS_OFF.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Fix startup logic

c_can_start() enables interrupts way too early. The first enabling
happens when setting the control mode in c_can_chip_config() and then
again at the end of the function.

But that happens before napi_enable() and that means that an interrupt
which comes in will disable interrupts again and call napi_schedule,
which ignores the request and the later napi_enable() is not making
thinks work either. So the interface is up with all device interrupts
disabled.

Move the device interrupt after napi_enable() and add it to the other
callsites of c_can_start() in c_can_set_mode() and c_can_power_up()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can_pci: Set the type of the IP core

All type checks in c_can.c are != BOSCH_D_CAN so nobody noticed so far
that the pci code does not update the type information.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'rtnetlink_vf_ports'

David Gibson says:

====================
Fix problems with with IFLA_VF_PORTS (v2)

I've had a customer encounter a problem with getifaddrs(3) freezing up
on a system with a Cisco enic device.

I've discovered that the problem is caused by an enic device with a
large number of SR-IOV virtual functions overflowing the normal sized
packet buffer for netlink, leading to interfaces not being reported
from an RTM_GETLINK request.

The first patch here just makes the problem easier to locate if it
occurs again in a different way, by adding a WARN_ON() when we run out
of room in a netlink packet in this manner.

The second patch actually fixes the problem, by only reporting
IFLA_VF_PORTS information when the RTEXT_FILTER_VF flag is specified.

v2: Corrected some CodingStyle problems
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set

Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with
buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST
attribute if they were solicited by a GETLINK message containing an
IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag.

That was done because some user programs broke when they received more data
than expected - because IFLA_VFINFO_LIST contains information for each VF
it can become large if there are many VFs.

However, the IFLA_VF_PORTS attribute, supplied for devices which implement
ndo_get_vf_port (currently the 'enic' driver only), has the same problem.
It supplies per-VF information and can therefore become large, but it is
not currently conditional on the IFLA_EXT_MASK value.

Worse, it interacts badly with the existing EXT_MASK handling.  When
IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at
NLMSG_GOODSIZE.  If the information for IFLA_VF_PORTS exceeds this, then
rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet.
netlink_dump() will misinterpret this as having finished the listing and
omit data for this interface and all subsequent ones.  That can cause
getifaddrs(3) to enter an infinite loop.

This patch addresses the problem by only supplying IFLA_VF_PORTS when
IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Warn when interface's information won't fit in our packet

Without IFLA_EXT_MASK specified, the information reported for a single
interface in response to RTM_GETLINK is expected to fit within a netlink
packet of NLMSG_GOODSIZE.

If it doesn't, however, things will go badly wrong, When listing all
interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
message in a packet as the end of the listing and omit information for
that interface and all subsequent ones. This can cause getifaddrs(3) to
enter an infinite loop.

This patch won't fix the problem, but it will WARN_ON() making it easier to
track down what's going wrong.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: Fix warning in nfnetlink_receive().

net/netfilter/nfnetlink.c: In function ‘nfnetlink_rcv’:
net/netfilter/nfnetlink.c:371:14: warning: unused variable ‘net’ [-Wunused-variable]

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netlink-caps'

Eric W. Biederman says:

====================
netlink: Preventing abuse when passing file descriptors.

Andy Lutomirski when looking at the networking stack noticed that it is
possible to trick privilged processes into calling write on a netlink
socket and send netlink messages they did not intend.

In particular from time to time there are suid applications that will
write to stdout or stderr without checking exactly what kind of file
descriptors those are and can be tricked into acting as a limited form
of suid cat. In other conversations the magic string CVE-2014-0181 has
been used to talk about this issue.

This patchset cleans things up a bit, adds some clean abstractions that
when used prevent this kind of problem and then finally changes all of
the handlers of netlink messages that I could find that call capable to
use netlink_ns_capable or an appropriate wrapper.

The abstraction netlink_ns_capable verifies that the original creator of
the netlink socket a message is sent from had the necessary capabilities
as well as verifying that the current sender of a netlink packet has the
necessary capabilities.

The idea is to prevent file descriptor passing of any form from
resulting in a file descriptor that can do more than it can for the
creator of the file descriptor.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use netlink_ns_capable to verify the permisions of netlink messages

It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add variants of capable for use on netlink messages

netlink_net_capable - The common case use, for operations that are safe on a network namespace
netlink_capable - For operations that are only known to be safe for the global root
netlink_ns_capable - The general case of capable used to handle special cases

__netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
the skbuff of a netlink message.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add variants of capable for use on on sockets

sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump

The permission check in sock_diag_put_filterinfo is wrong, and it is so removed
from it's sources it is not clear why it is wrong. Move the computation
into packet_diag_dump and pass a bool of the result into sock_diag_filterinfo.

This does not yet correct the capability check but instead simply moves it to make
it clear what is going on.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Rename netlink_capable netlink_allowed

netlink_capable is a static internal function in af_netlink.c and we
have better uses for the name netlink_capable.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Check if phydev present on ethtool -A

This fixes a seg fault on 'ethtool -A' entry if the
interface is down. Obviously we need to have the
phy device initialized / "connected" (see of_phy_connect())
to be able to advertise pause frame capabilities.

Fixes: 23402bddf9e56eecb27bbd1e5467b3b79b3dbe58
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qlcnic-net'

Shahed Shaikh says:

====================
qlcnic: Bug fixes

This patch series contains following fixes -

* Fix memory leak caused because of issuing mailbox
command which can not wait for its completion.
* Reset firmware API lock which might be in inconsistent state.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix memory leak.

o In case QLC_83XX_MBX_CMD_NO_WAIT command type the calling
  function does not free the memory as it does not wait for
  response. So free it when get a response from adapter after
  sending the command.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Reset firmware API lock at driver load time

Some firmware versions fails to reset the lock during
initialization. Force reset firmware API lock during driver
probe to ensure lock availability.

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: forbid incorrect fall-through in notifier

There are two breaks missing there. The result is that userspace
receives multiple messages which might be confusing.

Introduced-by: 3d249d4c "net: introduce ethernet teaming device"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cadence: Fix architecture dependencies

I was told that the Cadence macb driver is also useful on Microblaze.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mark Brown <broonie@kernel.org>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc91x: improve definition of debug macros

Redefine some macros that were conditioned upon SMC_DEBUG level.

By allowing compiler to verify parameters used by these macros
unconditionally, we can flag compilation failures.

Compiler will still optimize out the unused code path depending on
SMC_DEBUG, so this is a net gain.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: initialize A and X registers

exisiting BPF verifier allows uninitialized access to registers,
'ret A' is considered to be a valid filter.
So initialize A and X to zero to prevent leaking kernel memory
In the future BPF verifier will be rejecting such filters

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Update my email address

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: ensure to advertise the right fdb remote

The goal of this patch is to fix rtnelink notification. The main problem was
about notification for fdb entry with more than one remote. Before the patch,
when a remote was added to an existing fdb entry, the kernel advertised the
first remote instead of the added one. Also when a remote was removed from a fdb
entry with several remotes, the deleted remote was not advertised.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/phy: micrel: fix bugged test on device tree loading for ksz9021

In ksz9021_load_values_from_of() val2 to val4 aren't tested against their
initialization value.
This causes the test to always succeed, and this value to be used as if it
was loaded from the devicetree instead of being ignored, in case of a
missing/invalid property in the ethernet OF device node.
As a result, the value "0" is written to the relevant registers.

Change the conditions to test against the right initialization value.

Signed-off-by: Hubert Chaumette <hchaumette@adeneo-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hisax/icc: add missing semicolon after label

A label just before a brace needs a following semicolon (empty statement).

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc91x: fix compile error when SMC_DEBUG >= 2

When SMC_DEBUG >= 2, we hit the following compilation error:

drivers/net/ethernet/smsc/smc91x.c:85:0:
drivers/net/ethernet/smsc/smc91x.c: In function ‘smc_findirq’:
drivers/net/ethernet/smsc/smc91x.c:1784:9: error: ‘dev’ undeclared (first use in this function)
DBG(2, dev, "%s: %s\n", CARDNAME, __func__);
^
Fix it by passing in the appropriate netdev pointer.

Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sxgbe: Added phy_found error path

This patch adds phy_found error path when there is no phy device
and changes bus_name.

Signed-off-by: Byungho An <bh74.an@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sxgbe: rearrange dma descriptor

This patch moves cksum_ctl to tx_rd_des23 from cksum_pktlen for correct checksum
offloading and modifies size for Tx/Rx descriptor.

Signed-off-by: Byungho An <bh74.an@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: zero is an invald queue_pairs number

Execute "ethtool -L eth0 combined 0" in guest, if multiqueue
is enabled, virtnet_send_command() will return -EINVAL error,
there is a validation in QEMU.

But if multiqueue is disabled, virtnet_set_queues() will just
return zero (success). We should return error for this situation.

Signed-off-by: Amos Kong <akong@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arc_emac: write initial MAC address from devicetree to hw

The MAC address retrieved from dt was not actually written to the
hardware. This meant proper communication was only possible after
changing the MAC address.

Fix that by always writing the mac address during probing.

Signed-off-by: Max Schwarz <max.schwarz@online.de>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix ns_capable check in sock_diag_put_filterinfo

The caller needs capabilities on the namespace being queried, not on
their own namespace. This is a security bug, although it likely has
only a minor impact.

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: SXGBE authors update

The mail address for Siva Reddy Kallam is bouncing, remove the email
address from the MAINTAINERS entry for Samsung's SXGBE driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to e1000e, igb, ixgbe and i40e.

Most notably are Jakub's patches to clean up the Rx time stamping
code for ixgbe and the fix up of debug messages with proper termination.

Jesse's i40e patch fixes an issue reported by Eric Dumazet that the
i40e driver was allowing the hardware to replicate the PSH flag on
all segments of a TSO operation. With this fix, we are now configuring
the CWR bit to only be set in the first packet of a TSO and we
enable TSO_ECN in order to advertise to the stack that we do the right
thing on the wire.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
"This fixes the preemption-count imbalance crash reported by Owen
Kibel"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix CMCI preemption bugs

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Two fixes:

   - a SCHED_DEADLINE task selection fix
   - a sched/numa related lockdep splat fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Check for stop task appearance when balancing happens
  sched/numa: Fix task_numa_free() lockdep splat

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Two kernel side fixes:

   - an Intel uncore PMU driver potential crash fix
   - a kprobes/perf-call-graph interaction fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
  kprobes/x86: Fix page-fault handling logic

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Unfortunately this contains no easter eggs, its a bit larger than I'd
  like, but I included a patch that just moves code from one file to
  another and I'd like to avoid merge conflicts with that later, so it
  makes it seem worse than it is,

  Otherwise:
   - radeon: fixes to use new microcode to stabilise some cards, use
     some common displayport code, some runtime pm fixes, pll regression
     fixes
   - i915: fix for some context oopses, a warn in a used path, backlight
     fixes
   - nouveau: regression fix
   - omap: a bunch of fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (51 commits)
  drm: bochs: drop unused struct fields
  drm: bochs: add power management support
  drm: cirrus: add power management support
  drm: Split out drm_probe_helper.c from drm_crtc_helper.c
  drm/plane-helper: Don't fake-implement primary plane disabling
  drm/ast: fix value check in cbr_scan2
  drm/nouveau/bios: fix a bit shift error introduced by 457e77b
  drm/radeon/ci: make sure mc ucode is loaded before checking the size
  drm/radeon/si: make sure mc ucode is loaded before checking the size
  drm/radeon: improve PLL params if we don't match exactly v2
  drm/radeon: memory leak on bo reservation failure. v2
  drm/radeon: fix VCE fence command
  drm/radeon: re-enable mclk dpm on R7 260X asics
  drm/radeon: add support for newer mc ucode on CI (v2)
  drm/radeon: add support for newer mc ucode on SI (v2)
  drm/radeon: apply more strict limits for PLL params v2
  drm/radeon: update CI DPM powertune settings
  drm/radeon: fix runpm handling on APUs (v4)
  drm/radeon: disable mclk dpm on R7 260X
  drm/tegra: Remove gratuitous pad field
  ...

e1000e/igb/ixgbe/i40e: fix message terminations

Add \n at the end of messages where missing, remove all \r.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: clean up Rx time stamping code

Time stamping resources are per-interface so there is no need
to keep separate last_rx_timestamp for each Rx ring, move
last_rx_timestamp to the adapter structure.

With last_rx_timestamp inside adapter, ixgbe_ptp_rx_hwtstamp()
inline function is reduced to a single if statement so it is
no longer necessary. If statement is placed directly in
ixgbe_process_skb_fields() fixing likely/unlikely marking.

Checks for q_vector or adapter to be NULL are superfluous.

Comment about taking I/O hit is a leftover from previous design.

Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: fix stats for i210 rx_fifo_errors

RQDPC on i210/i211 is R/W not ReadClear. Clear after reading.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux into drm-next

Some i2c fixes over DisplayPort.

* 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux:
  drm/radeon: Improve vramlimit module param documentation
  drm/radeon: fix audio pin counts for DCE6+ (v2)
  drm/radeon/dp: switch to the common i2c over aux code
  drm/dp/i2c: Update comments about common i2c over dp assumptions (v3)
  drm/dp/i2c: send bare addresses to properly reset i2c connections (v4)
  drm/radeon/dp: handle zero sized i2c over aux transactions (v2)
  drm/i915: support address only i2c-over-aux transactions
  drm/tegra: dp: Support address-only I2C-over-AUX transactions

e1000e: Enclose e1000e_pm_thaw() with CONFIG_PM_SLEEP

Fix following compilation warning:
drivers/net/ethernet/intel/e1000e/netdev.c:6238:12: warning
‘e1000e_pm_thaw’ defined but not used [-Wunused-function]
static int e1000e_pm_thaw(struct device *dev)
^
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Correctly include VLAN_HLEN when changing interface MTU

When changing the interface mtu, the driver starts with a value
that doesn't include VLAN_HLEN. Later tests in the driver
set the rx_buffer_len based on the mtu. As a result, when
the user increases the mtu to 1504 (to support 802.1AD for example),
the driver rx_buffer_len does not change and frames longer
the 1522 bytes are rejected as too long.

Include VLAN_HLEN from the start so that an user mtu greater then
1500 bytes is correctly reflected in the driver rx_buffer_len.

CC: e1000-devel@lists.sourceforge.net
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge git://git./linux/kernel/git/davem/net

Pull more networking fixes from David Miller:

1) Fix mlx4_en_netpoll implementation, it needs to schedule a NAPI
    context, not synchronize it.  From Chris Mason.

2) Ipv4 flow input interface should never be zero, it should be
    LOOPBACK_IFINDEX instead.  From Cong Wang and Julian Anastasov.

3) Properly configure MAC to PHY connection in mvneta devices, from
    Thomas Petazzoni.

4) sys_recv should use SYSCALL_DEFINE.  From Jan Glauber.

5) Tunnel driver ioctls do not use the correct namespace, fix from
    Nicolas Dichtel.

6) Fix memory leak on seccomp filter attach, from Kees Cook.

7) Fix lockdep warning for nested vlans, from Ding Tianhong.

8) Crashes can happen in SCTP due to how the auth_enable value is
    managed, fix from Vlad Yasevich.

9) Wireless fixes from John W Linville and co.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
  net: sctp: cache auth_enable per endpoint
  tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled
  vlan: Fix lockdep warning when vlan dev handle notification
  seccomp: fix memory leak on filter attach
  isdn: icn: buffer overflow in icn_command()
  ip6_tunnel: use the right netns in ioctl handler
  sit: use the right netns in ioctl handler
  ip_tunnel: use the right netns in ioctl handler
  net: use SYSCALL_DEFINEx for sys_recv
  net: mdio-gpio: Add support for separate MDI and MDO gpio pins
  net: mdio-gpio: Add support for active low gpio pins
  net: mdio-gpio: Use devm_ functions where possible
  ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
  ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
  mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
  net: mvneta: properly configure the MAC <-> PHY connection in all situations
  net: phy: add minimal support for QSGMII PHY
  sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
  mwifiex: fix hung task on command timeout
  mwifiex: process event before command response
  ...

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"A set of 5 small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cif: fix dead code
  cifs: fix error handling cifs_user_readv
  fs: cifs: remove unused variable.
  Return correct error on query of xattr on file with empty xattrs
  cifs: Wait for writebacks to complete before attempting write.

i40e: fix TCP flag replication for hardware offload

As reported by Eric Dumazet, the i40e driver was allowing the hardware
to replicate the PSH flag on all segments of a TSO operation.

This patch fixes the first/middle/last TCP flags settings which
makes the TSO operations work correctly.

With this change we are now configuring the CWR bit to only be set
in the first packet of a TSO, so this patch also enables TSO_ECN,
in order to advertise to the stack that we do the right thing
on the wire.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: remove open-coded skb_cow_head

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge tag 'char-misc-3.15-rc2' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a few driver fixes for char/misc drivers that resolve
  reported issues.

  All have been in linux-next successfully for a few days"

* tag 'char-misc-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts
  Tools: hv: Handle the case when the target file exists correctly
  vme_tsi148: Utilize to_pci_dev() macro
  vme_tsi148: Fix PCI address mapping assumption
  vme_tsi148: Fix typo in tsi148_slave_get()
  w1: avoid recursive device_add
  w1: fix netlink refcnt leak on error path
  misc: Grammar s/addition/additional/
  drivers: mcb: fix memory leak in chameleon_parse_cells() error path
  mei: ignore client writing state during cb completion
  mei: me: do not load the driver if the FW doesn't support MEI interface
  GenWQE: Increase driver version number
  GenWQE: Fix multithreading problems
  GenWQE: Ensure rc is not returning an uninitialized value
  GenWQE: Add wmb before DDCB is started
  GenWQE: Enable access to VPD flash area

Merge tag 'driver-core-3.15-rc2' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are some driver core fixes for 3.15-rc2.  Also in here are some
  documentation updates, as well as an API removal that had to wait for
  after -rc1 due to the cleanups coming into you from multiple developer
  trees (this one and the PPC tree.)

  All have been in linux next successfully"

* tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers/base/dd.c incorrect pr_debug() parameters
  Documentation: Update stable address in Chinese and Japanese translations
  topology: Fix compilation warning when not in SMP
  Chinese: add translation of io_ordering.txt
  stable_kernel_rules: spelling/word usage
  sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
  kernfs: protect lazy kernfs_iattrs allocation with mutex
  fs: Don't return 0 from get_anon_bdev

Merge tag 'staging-3.15-rc2' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are a few staging driver fixes for issues that have been reported
  for 3.15-rc2.

  Also dominating the diffstat for the pull request is the removal of
  the rtl8187se driver.  It's no longer needed in staging as a "real"
  driver for this hardware is now merged in the tree in the "correct"
  location in drivers/net/

  All of these patches have been tested in linux-next"

* tag 'staging-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8188eu: Fix case where ethtype was never obtained and always be checked against 0
  staging: r8712u: Fix case where ethtype was never obtained and always be checked against 0
  staging: r8188eu: Calling rtw_get_stainfo() with a NULL sta_addr will return NULL
  staging: comedi: fix circular locking dependency in comedi_mmap()
  staging: r8723au: Add missing initialization of change_inx in sort algorithm
  Staging: unisys: use after free in list_for_each()
  staging: unisys: use after free in error messages
  staging: speakup: fix misuse of kstrtol() in handle_goto()
  staging: goldfish: Call free_irq in error path
  staging: delete rtl8187se wireless driver
  staging: rtl8723au: Fix buffer overflow in rtw_get_wfd_ie()
  staging: gs_fpgaboot: remove __TIMESTAMP__ macro
  staging: vme: fix memory leak in vme_user_probe()
  staging: fpgaboot: clean up Makefile
  staging/usbip: fix store_attach() sscanf return value check
  staging/usbip: userspace - fix usbipd SIGSEGV from refresh_exported_devices()
  staging: rtl8188eu: remove spaces, correct counts to unbreak P2P ioctls
  staging/rtl8821ae: Fix OOM handling in _rtl_init_deferred_work()

Merge tag 'tty-3.15-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
"Here are a number of small tty/serial driver fixes for 3.15-rc2.  Also
  in here are some Documentation file removals for drivers that we
  removed a long time ago, no need to keep it around any longer.

  All of these have been in linux-next for a bit"

* tag 'tty-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: 8250, disable "too much work" messages"
  serial: amba-pl011: fix regression, causing an Oops on rmmod
  tty: Fix help text of SYNCLINK_CS
  tty: fix memleak in alloc_pid
  ttyprintk: Allow built as a module
  ttyprintk: Fix wrong tty_unregister_driver() call in the error path
  serial: 8250, disable "too much work" messages
  Documentation/serial: Delete obsolete driver documentation
  serial: omap: Fix missing pm_runtime_resume handling by simplifying code
  serial_core: Fix pm imbalance on unbind
  serial: pl011: change Rx burst size to half of trigger level
  serial: timberdale: Depend on X86_32
  serial: st-asc: Fix SysRq char handling
  Revert "serial: clps711x: Give a chance to perform useful tasks during wait loop"
  serial_core: Fix conditional start_tx on ring buffer not empty
  serial: efm32: use $vendor,$device scheme for compatible string
  serial: omap: free the wakeup settings in remove

Merge tag 'usb-3.15-rc2' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of tiny USB fixes and new device ids for 3.15-rc2.
  Nothing major, just issues some people have reported.

  All of these have been in linux-next"

* tag 'usb-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  uas: fix deadlocky memory allocations
  uas: fix error handling during scsi_scan()
  uas: fix GFP_NOIO under spinlock
  uwb: adds missing error handling
  USB: cdc-acm: Remove Motorola/Telit H24 serial interfaces from ACM driver
  USB: ohci-jz4740: FEAT_POWER is a port feature, not a hub feature
  USB: ohci-jz4740: Fix uninitialized variable warning
  USB: EHCI: tegra: set txfill_tuning
  usb: ehci-platform: Return immediately from suspend if ehci_suspend fails
  usb: ehci-exynos: Return immediately from suspend if ehci_suspend fails
  USB: fix crash during hotplug of PCI USB controller card
  USB: cdc-acm: fix double usb_autopm_put_interface() in acm_port_activate()
  usb: usb-common: fix typo for usb_state_string
  USB: usb_wwan: fix handling of missing bulk endpoints
  USB: pl2303: add ids for Hewlett-Packard HP POS pole displays
  USB: cp210x: Add 8281 (Nanotec Plug & Drive)
  usb: option driver, add support for Telit UE910v2
  Revert "USB: serial: add usbid for dell wwan card to sierra.c"
  USB: serial: ftdi_sio: add id for Brainboxes serial cards

Merge branch 'akpm' (incoming from Andrew)

Merge misc fixes from Andrew Morton:
"13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  thp: close race between split and zap huge pages
  mm: fix new kernel-doc warning in filemap.c
  mm: fix CONFIG_DEBUG_VM_RB description
  mm: use paravirt friendly ops for NUMA hinting ptes
  mips: export flush_icache_range
  mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()
  wait: explain the shadowing and type inconsistencies
  Shiraz has moved
  Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt
  powerpc/mm: fix ".__node_distance" undefined
  kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()
  init/Kconfig: move the trusted keyring config option to general setup
  vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()

thp: close race between split and zap huge pages

Sasha Levin has reported two THP BUGs[1][2].  I believe both of them
have the same root cause.  Let's look to them one by one.

The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".  It's
BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().  From my
testing I see that page_mapcount() is higher than mapcount here.

I think it happens due to race between zap_huge_pmd() and
page_check_address_pmd().  page_check_address_pmd() misses PMD which is
under zap:

CPU0 CPU1
zap_huge_pmd()
  pmdp_get_and_clear()
__split_huge_page()
  anon_vma_interval_tree_foreach()
    __split_huge_page_splitting()
      page_check_address_pmd()
        mm_find_pmd()
  /*
   * We check if PMD present without taking ptl: no
   * serialization against zap_huge_pmd(). We miss this PMD,
   * it's not accounted to 'mapcount' in __split_huge_page().
   */
  pmd_present(pmd) == 0

  BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!

  page_remove_rmap(page)
    atomic_add_negative(-1, &page->_mapcount)

The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().

This happens in similar way:

CPU0 CPU1
zap_huge_pmd()
  pmdp_get_and_clear()
  page_remove_rmap(page)
    atomic_add_negative(-1, &page->_mapcount)
__split_huge_page()
  anon_vma_interval_tree_foreach()
    __split_huge_page_splitting()
      page_check_address_pmd()
        mm_find_pmd()
  pmd_present(pmd) == 0 /* The same comment as above */
  /*
   * No crash this time since we already decremented page->_mapcount in
   * zap_huge_pmd().
   */
  BUG_ON(mapcount != page_mapcount(page))

  /*
   * We split the compound page here into small pages without
   * serialization against zap_huge_pmd()
   */
  __split_huge_page_refcount()
VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!

So my understanding the problem is pmd_present() check in mm_find_pmd()
without taking page table lock.

The bug was introduced by me commit with commit 117b0791ac42. Sorry for
that. :(

Let's open code mm_find_pmd() in page_check_address_pmd() and do the
check under page table lock.

Note that __page_check_address() does the same for PTE entires
if sync != 0.

I've stress tested split and zap code paths for 36+ hours by now and
don't see crashes with the patch applied. Before it took <20 min to
trigger the first bug and few hours for second one (if we ignore
first).

[1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
[2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix new kernel-doc warning in filemap.c

Fix new kernel-doc warning in mm/filemap.c:

Warning(mm/filemap.c:2600): Excess function parameter 'ppos' description in '__generic_file_aio_write'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix CONFIG_DEBUG_VM_RB description

This appears to be a copy/paste error. Update the description to
reflect extra rbtree debug and checks for the config option instead of
duplicating CONFIG_DEBUG_VM.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: use paravirt friendly ops for NUMA hinting ptes

David Vrabel identified a regression when using automatic NUMA balancing
under Xen whereby page table entries were getting corrupted due to the
use of native PTE operations.  Quoting him

Xen PV guest page tables require that their entries use machine
addresses if the preset bit (_PAGE_PRESENT) is set, and (for
successful migration) non-present PTEs must use pseudo-physical
addresses.  This is because on migration MFNs in present PTEs are
translated to PFNs (canonicalised) so they may be translated back
to the new MFN in the destination domain (uncanonicalised).

pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
set and clear the _PAGE_PRESENT bit using pte_set_flags(),
pte_clear_flags(), etc.

In a Xen PV guest, these functions must translate MFNs to PFNs
when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
_PAGE_PRESENT.

His suggested fix converted p[te|md]_[set|clear]_flags to using
paravirt-friendly ops but this is overkill.  He suggested an alternative
of using p[te|md]_modify in the NUMA page table operations but this is
does more work than necessary and would require looking up a VMA for
protections.

This patch modifies the NUMA page table operations to use paravirt
friendly operations to set/clear the flags of interest.  Unfortunately
this will take a performance hit when updating the PTEs on
CONFIG_PARAVIRT but I do not see a way around it that does not break
Xen.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mips: export flush_icache_range

The lkdtm module performs tests against executable memory ranges, so it
needs to flush the icache for proper behaviors. Other architectures
already export this, so do the same for MIPS.

[akpm@linux-foundation.org: relocate export sites]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()

soft lockup in freeing gigantic hugepage fixed in commit 55f67141a892 "mm:
hugetlb: fix softlockup when a large number of hugepages are freed." can
happen in return_unused_surplus_pages(), so let's fix it.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

wait: explain the shadowing and type inconsistencies

Stick in a comment before someone else tries to fix the sparse warning
this generates.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Shiraz has moved

shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
company. Replace ST's id with shiraz.linux.kernel@gmail.com.

It also updates .mailmap file to fix address for 'git shortlog'.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt

In document numa_memory_policy.txt, the following examples for flag
MPOL_F_RELATIVE_NODES are incorrect.

For example, consider a task that is attached to a cpuset with
mems 2-5 that sets an Interleave policy over the same set with
MPOL_F_RELATIVE_NODES.  If the cpuset's mems change to 3-7, the
interleave now occurs over nodes 3,5-6.  If the cpuset's mems
then change to 0,2-3,5, then the interleave occurs over nodes
0,3,5.

According to the comment of the patch adding flag MPOL_F_RELATIVE_NODES,
the nodemasks the user specifies should be considered relative to the
current task's mems_allowed.

(https://lkml.org/lkml/2008/2/29/428)

And according to numa_memory_policy.txt, if the user's nodemask includes
nodes that are outside the range of the new set of allowed nodes, then
the remap wraps around to the beginning of the nodemask and, if not
already set, sets the node in the mempolicy nodemask.

So in the example, if the user specifies 2-5, for a task whose
mems_allowed is 3-7, the nodemasks should be remapped the third, fourth,
fifth, sixth node in mems_allowed.  like the following:

mems_allowed:       3  4  5  6  7

relative index:     0  1  2  3  4
                    5

So the nodemasks should be remapped to 3,5-7, but not 3,5-6.

And for a task whose mems_allowed is 0,2-3,5, the nodemasks should be
remapped to 0,2-3,5, but not 0,3,5.

mems_allowed:       0  2  3  5

        relative index:     0  1  2  3
                            4  5

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc/mm: fix ".__node_distance" undefined

  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  ...
  Building modules, stage 2.
WARNING: 1 bad relocations
c0000000013d6a30 R_PPC64_ADDR64    uprobes_fetch_type_table
  WRAP    arch/powerpc/boot/zImage.pseries
  WRAP    arch/powerpc/boot/zImage.epapr
  MODPOST 1849 modules
ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs....

The reason is symbol "__node_distance" not been exported in powerpc.

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>