pandora-kernel.git
10 years agoi40e: shorten wordy fields
Mitch Williams [Wed, 18 Dec 2013 13:45:59 +0000 (13:45 +0000)]
i40e: shorten wordy fields

The alloc_rx_buff_failed and alloc_rx_page_failed variables
are both part of an rx specific structure so just remove
the _rx part of the name.  No functional changes.

Change-ID: Icffa2f5d13c6f2b1e09cf45b9472b83c9dae8fc6
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: accept pf to pf adminq messages
Shannon Nelson [Wed, 18 Dec 2013 13:45:58 +0000 (13:45 +0000)]
i40e: accept pf to pf adminq messages

The admin queue can be used to send messages among the physical
function interfaces.  This adds the code to handle that case.

Change-ID: I0700fcc47e41433131a381f0eb72fc7b01b6bd87
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: remove interrupt on AQ error
Shannon Nelson [Wed, 18 Dec 2013 13:45:57 +0000 (13:45 +0000)]
i40e: remove interrupt on AQ error

Since nearly everything we do is synchronous, using the interrupt-on-error bit
is unnecessary and causing unneeded interrupts.  If anyone wants to use the bit
they can turn it on in individual AQ requests using the cmd_details parameter.

Change-ID: I4690a9c561d3e0836aeadb4f88f8a8702b1d1366
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: release NVM resource reservation on startup
Shannon Nelson [Wed, 18 Dec 2013 13:45:56 +0000 (13:45 +0000)]
i40e: release NVM resource reservation on startup

Call AQ to release any reservation held by this PF on the NVM resource
lock on startup, in order to clear anything that might have been left over from
a previous run.  The lock is only cleared by the requestor calling for it to be
cleared, on power-on, or firmware reset.  This should help limit the need for
rebooting a customer machine if something goes wrong on a firmware update or
some other action.

Change-ID: I8c8473e601d4ef512dda7baa77a6e75f2e5fea49
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Cleanup reconfig rss path
Anjali Singhai Jain [Wed, 18 Dec 2013 13:45:55 +0000 (13:45 +0000)]
i40e: Cleanup reconfig rss path

RSS initialization was doing some extra work, remove the extra
work and any bugs it created when managing number of queues.

Change-ID: Iea75b04a70d73ce76947b6a177ce89ab4899d4c6
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: disable packet split
Jesse Brandeburg [Wed, 18 Dec 2013 13:45:54 +0000 (13:45 +0000)]
i40e: disable packet split

The driver and hardware support splitting packets on headers
but with the use of GRO we don't need the extra bus
overhead, so make this driver more like igb and ixgbe and
disable packet split.

Change-ID: Id42f2c3736baa9d5bdfe1f72d64226e7d8ebd737
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: add a comment on barrier and fix panic on reset
Greg Rose [Wed, 18 Dec 2013 13:45:53 +0000 (13:45 +0000)]
i40e: add a comment on barrier and fix panic on reset

The memory barrier used in maybe_stop_tx can use a comment.

Also add checks to VSI->rx_rings to ensure a kernel panic is not induced.

Change-ID: I48cc1bf1d6cf301818155b737edeef77c0d790c7
Change-ID: I1363a8445fbf521a26267849966296ed55f43ad8
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Fix MAC format in Write MAC address AQ cmd
Kamil Krawczyk [Wed, 18 Dec 2013 13:45:52 +0000 (13:45 +0000)]
i40e: Fix MAC format in Write MAC address AQ cmd

The MAC address format expected by the hardware is in a very specific
format, and the driver was filling in the data incorrectly.

Change-ID: I7bc66505ef459ee347dd3bda68051004c141c689
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Fix GPL header
Greg Rose [Wed, 18 Dec 2013 13:45:51 +0000 (13:45 +0000)]
i40e: Fix GPL header

The GPL header included in each file in the i40e driver doesn't
need to include the "this program" text since this driver
is already part of the larger kernel.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: use kernel specific defines
Jesse Brandeburg [Wed, 18 Dec 2013 13:45:50 +0000 (13:45 +0000)]
i40e: use kernel specific defines

Replace uses of I40E_LENGTH_OF_ADDRESS with ETH_ALEN.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Re-enable interrupt on ICR0
Anjali Singhai Jain [Wed, 18 Dec 2013 13:45:49 +0000 (13:45 +0000)]
i40e: Re-enable interrupt on ICR0

The hardware can occasionally give an interrupt on the misc
queue for which there is no driver work to do.  In that case
the driver was not re-enabling interrupts even though they
were auto masked by hardware.  This left interrupts disabled
on this queue.

Re-enable the interrupt whenever leaving this function.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Wed, 8 Jan 2014 20:04:56 +0000 (15:04 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains three Netfilter updates, they are:

* Fix wrong usage of skb_header_pointer in the DCCP protocol helper that
  has been there for quite some time. It was resulting in copying the dccp
  header to a pointer allocated in the stack. Fortunately, this pointer
  provides room for the dccp header is 4 bytes long, so no crashes have been
  reported so far. From Daniel Borkmann.

* Use format string to print in the invocation of nf_log_packet(), again
  in the DCCP helper. Also from Daniel Borkmann.

* Revert "netfilter: avoid get_random_bytes call" as prandom32 does not
  guarantee enough entropy when being calling this at boot time, that may
  happen when reloading the rule.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobatman-adv: set the isolation mark in the skb if needed
Antonio Quartulli [Sat, 16 Nov 2013 11:03:52 +0000 (12:03 +0100)]
batman-adv: set the isolation mark in the skb if needed

If a broadcast packet is coming from a client marked as
isolated, then mark the skb using the isolation mark so
that netfilter (or any other application) can recognise
them.

The mark is written in the skb based on the mask value:
only bits set in the mask are substitued by those in the
mark value

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: create helper function to get AP isolation status
Antonio Quartulli [Sat, 16 Nov 2013 11:03:51 +0000 (12:03 +0100)]
batman-adv: create helper function to get AP isolation status

The AP isolation status may be evaluated in different spots.
Create an helper function to avoid code duplication.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: extend the ap_isolation mechanism
Antonio Quartulli [Sat, 16 Nov 2013 11:03:50 +0000 (12:03 +0100)]
batman-adv: extend the ap_isolation mechanism

Change the AP isolation mechanism to not only "isolate" WIFI
clients but also all those marked with the more generic
"isolation flag" (BATADV_TT_CLIENT_ISOLA).

The result is that when AP isolation is on any unicast
packet originated by an "isolated" client and directed to
another "isolated" client is dropped at the source node.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: print the new BATADV_TT_CLIENT_ISOLA flag
Antonio Quartulli [Sat, 16 Nov 2013 11:03:49 +0000 (12:03 +0100)]
batman-adv: print the new BATADV_TT_CLIENT_ISOLA flag

Print the new BATADV_TT_CLIENT_ISOLA flag properly in the
Local and Global Translation Table output.

The character 'I' is used in the flags column to indicate
that the entry is marked as isolated.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: mark a local client as isolated when needed
Antonio Quartulli [Sat, 16 Nov 2013 11:03:48 +0000 (12:03 +0100)]
batman-adv: mark a local client as isolated when needed

A client sending packets which mark matches the value
configured via sysfs has to be identified as isolated using
the TT_CLIENT_ISOLA flag.

The match is mask based, meaning that only bits set in the
mask are compared with those in the mark value.

If the configured mask is equal to 0 no operation is
performed.

Such flag is then advertised within the classic client
announcement mechanism.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: add isolation_mark sysfs attribute
Antonio Quartulli [Sat, 16 Nov 2013 11:03:47 +0000 (12:03 +0100)]
batman-adv: add isolation_mark sysfs attribute

This attribute can be used to set and read the value and the
mask of the skb mark which will be used to classify the
source non-mesh client as ISOLATED. In this way a client can
be advertised as such and the mark can potentially be
restored at the receiving node before delivering the skb.

This can be helpful for creating network wide netfilter
policies.

This sysfs file expects a string of the shape "$mark/$mask".
Where $mark has to be a 32-bit number in any base, while
$mask must be a 32bit mask expressed in hex base. Only bits
in $mark covered by the bitmask are really stored.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: send every DHCP packet as bat-unicast
Antonio Quartulli [Tue, 5 Nov 2013 18:31:08 +0000 (19:31 +0100)]
batman-adv: send every DHCP packet as bat-unicast

In different situations it is possible that the DHCP server
or client uses broadcast Ethernet frames to send messages
to each other. The GW component in batman-adv takes care of
using bat-unicast packets to bring broadcast DHCP
Discover/Requests to the "best" server.

On the way back the DHCP server usually sends unicasts,
but upon client request it may decide to use broadcasts as
well.

This patch improves the GW component so that it now snoops
and sends as unicast all the DHCP packets, no matter if they
were generated by a DHCP server or client.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: remove parenthesis from return statements
Antonio Quartulli [Thu, 7 Nov 2013 07:02:07 +0000 (08:02 +0100)]
batman-adv: remove parenthesis from return statements

Remove parenthesis around return expression as suggested by
checkpatch.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: rename gw_deselect() to gw_reselect()
Antonio Quartulli [Mon, 4 Nov 2013 19:59:41 +0000 (20:59 +0100)]
batman-adv: rename gw_deselect() to gw_reselect()

The function batadv_gw_deselect() is actually not deselecting
anything. It is just informing the GW code to perform a
re-election procedure when possible.
The current gateway is not being touched at all and therefore
the name of this function is rather misleading.

Rename it to batadv_gw_reselect() to batadv_gw_reselect()
to make its behaviour easier to grasp.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: deselect current GW on client mode switch off
Antonio Quartulli [Mon, 4 Nov 2013 19:59:40 +0000 (20:59 +0100)]
batman-adv: deselect current GW on client mode switch off

When switching from gw_mode client to either off or server
the current selected gateway has to be deselected.
In this way when client mode is enabled again a gateway
re-election is forced and a GW_ADD event is consequently
sent.

The current behaviour instead is to keep the current gateway
leading to no GW_ADD event when gw_mode client is selected
for a second time

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
10 years agobatman-adv: remove FSF address from GPL disclaimer
Antonio Quartulli [Sun, 3 Nov 2013 19:40:48 +0000 (20:40 +0100)]
batman-adv: remove FSF address from GPL disclaimer

As suggested by checkpatch, remove all the references to the
FSF address since the kernel already has one reference in
its documentation.

In this way it is easier to update it in case of future
changes.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agobatman-adv: don't switch byte order too often if not needed
Antonio Quartulli [Sun, 13 Oct 2013 00:50:17 +0000 (02:50 +0200)]
batman-adv: don't switch byte order too often if not needed

If possible, operations like ntohs/ntohl should not be
performed too often. Use a variable to locally store the
converted value and then use it.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
10 years agobatman-adv: properly rename define in distributed arp table header file
Antonio Quartulli [Sun, 13 Oct 2013 21:02:45 +0000 (23:02 +0200)]
batman-adv: properly rename define in distributed arp table header file

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
10 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
John W. Linville [Wed, 8 Jan 2014 18:44:29 +0000 (13:44 -0500)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

10 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Wed, 8 Jan 2014 06:11:19 +0000 (01:11 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e only.

Anjali adds more functionality to debugfs to assist development and
testing of admin queue commands.

Greg makes sure broadcast promiscuous is disabled by default, otherwise
VLAN tagged packets out of the assigned VLAN domain are received.  Also
provides a fix when the 8021q driver is loaded, so that VLAN 0 tagged
packets are accepted so that upper layers can interpret the priority
bits. Then provides a fix to let the VF to request the PF to set its
already assigned MAC address without generating an error.  Greg also
adds helper functions to enable or disable internal switch loopback
when VFs are created or destroyed via the sysfs interface.

Shannon provides most of the changes, where he adds code to ensure
that the hardware waits to make sure that the firmware is ready as well
after reset.  Also updates the code to use the new features in the
firmware.  Provides a fix while in MFP mode where resources are
reduced, so use a smaller range of test registers than when in SFP mode.
Moves the PF ID initialization code to earlier in the driver
initialization function since a few operations need the information
before the first PF reset is called.  Shannon adds a check for MAC
type before reading anything from the registers to ensure we dealing
with the correct MAC type.  Then reworks Shadow RAM read word/buffer
functions as to not block whole NVM resources for SR read operations.

Mitch lastly provides a fix to correctly setup ARQ descriptors in
the cleanup code.

Catherine bumps the driver version due to all the recent changes.

v2:
 - Added blank lines after local variable declarations to patch 01
   as suggested by David Miller
 - Used the suggested memset() line in patch 15 as suggested by David
   Miller
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoi40e: correctly setup ARQ descriptors
Mitch Williams [Wed, 18 Dec 2013 13:45:48 +0000 (13:45 +0000)]
i40e: correctly setup ARQ descriptors

When cleaning descriptors, we must set up ALL fields, not just the DMA
address. The initial setup does this correctly, but not the cleanup
code, so the firmware would process the ring exactly once and then fail.

Change-ID: I2930b83c76194b3016a8ac0fa693f9a573995640
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: remove redundant AQ enable
Kamil Krawczyk [Wed, 18 Dec 2013 13:45:46 +0000 (13:45 +0000)]
i40e: remove redundant AQ enable

The admin queue length register is updated in
config_a<sq|rq>_regs functions.  We should not update it again,
as we will trigger firmware to init the AQ again. In this case
firmware will lose the information about the AQ Rx tail position
and will see Rx queue as full (no free desc for FW to use).

Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Enable/Disable PF switch LB on SR-IOV configure changes
Greg Rose [Fri, 13 Dec 2013 08:38:38 +0000 (08:38 +0000)]
i40e: Enable/Disable PF switch LB on SR-IOV configure changes

The PF VSI was never updated to enable or disable internal switch loopback
when VFs were created or destroyed via the sysfs interface.  Add some
helper functions to take care of that.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: whitespace paren and comment tweaks
Shannon Nelson [Wed, 18 Dec 2013 05:29:17 +0000 (05:29 +0000)]
i40e: whitespace paren and comment tweaks

Addresses a few code format issues that have crept in over time.

Change-Id: I1a62cbd16b29a218a933b0f7176abe748f9615e8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: rework shadow ram read functions
Shannon Nelson [Wed, 11 Dec 2013 08:17:15 +0000 (08:17 +0000)]
i40e: rework shadow ram read functions

Rework Shadow RAM read word/buffer functions to not use AQ Request
Resource commands.  Requesting resource is not needed for SR read
operations which are done through SRCTL register.  Access to SR through
register is controlled through DONE bit within SRCTL.  With this change
we do not block whole NVM resource for SR read operations.

Change-Id: I73e96cdea39a45ee7b5bdf038e527308de2d9efe
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: check MAC type before any REG access
Shannon Nelson [Wed, 11 Dec 2013 08:17:14 +0000 (08:17 +0000)]
i40e: check MAC type before any REG access

We need to check if we are dealing with the correct MAC type before we
try to read anything from the registers.

Change-Id: I3989516999d06c3009e87d6a2eafc20af305c5c2
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: move PF ID init from PF reset to SC init
Shannon Nelson [Wed, 11 Dec 2013 08:17:13 +0000 (08:17 +0000)]
i40e: move PF ID init from PF reset to SC init

Move PF ID initialization code from PF reset routine to earlier driver
initialization function.  There are a few operations which need the
PF ID before the first PF reset is called.

Change-Id: I7e971f7556b46a837149850ec05ce115c35db575
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Reduce range of interrupt reg in reg test
Shannon Nelson [Wed, 11 Dec 2013 08:17:12 +0000 (08:17 +0000)]
i40e: Reduce range of interrupt reg in reg test

Use a smaller range of test registers in MFP mode as there are fewer
resources than when in SFP mode.

Change-Id: I08424890c3f57b5dde5ee99e99724ce252e0875a
Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: update firmware api to 1.1
Shannon Nelson [Wed, 11 Dec 2013 08:17:11 +0000 (08:17 +0000)]
i40e: update firmware api to 1.1

The firmware's AdminQ interface has matured a little, so update the
code to use the new fields and values.

Change-Id: I8fcd7b443f268dcf9346bd6a9e940fe9c2958891
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Add code to wait for FW to complete in reset path
Shannon Nelson [Wed, 11 Dec 2013 08:17:10 +0000 (08:17 +0000)]
i40e: Add code to wait for FW to complete in reset path

The RSTAT comes back with DEVICE READY to indicate that the HW is ready,
but we still have to wait for the FW to indicate that the Core and Global
modules are back up again.  This needs a read of the NVM_ULD register.

Change-Id: I88276165f9cd446d2f166fb4b8cff00521af4bec
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Bump version
Catherine Sullivan [Thu, 28 Nov 2013 06:42:42 +0000 (06:42 +0000)]
i40e: Bump version

Version update to 0.3.25-k

Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Allow VF to set already assigned MAC address
Greg Rose [Sat, 7 Dec 2013 10:36:54 +0000 (10:36 +0000)]
i40e: Allow VF to set already assigned MAC address

The VF is allowed to request the PF to set its already assigned
MAC address without generating an error.

Change-Id: I8dfdf353396995dbbb26cafab4e42b451911da3d
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Stop accepting any VLAN tag on VLAN 0 filter set
Greg Rose [Thu, 28 Nov 2013 06:42:40 +0000 (06:42 +0000)]
i40e: Stop accepting any VLAN tag on VLAN 0 filter set

When the 8021q driver is loaded it sets VLAN 0 filters on all devices.
This does not mean that any VLAN tagged packet should be accepted.
Instead accept only VLAN 0 tagged packets so that upper layers can
interpret the priority bits.

Change-Id: I17274a540b613749612ffe23a3aef2b8ee6ff6a4
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Do not enable broadcast promiscuous by default
Greg Rose [Thu, 28 Nov 2013 06:42:39 +0000 (06:42 +0000)]
i40e: Do not enable broadcast promiscuous by default

Broadcast promiscuous should only be turned on when general
promiscuous mode is turned on, otherwise VLAN tagged packets out of
the assigned VLAN domain are received.

Add a broadcast MAC filter in order to continue to receive
broadcast traffic on VLANs, MAIN or VMDQ VSI.

Change-Id: I99d8e382a082ee51201228f1226af3b46452ac55
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agoi40e: Expose AQ debugfs hooks
Anjali Singhai Jain [Thu, 28 Nov 2013 06:42:38 +0000 (06:42 +0000)]
i40e: Expose AQ debugfs hooks

Add more functionality to debugfs to assist development
and testing of AQ commands.

adds:
send aq_cmd
send indirect aq_cmd

Change-Id: I01710cddd33110a6c1e1aabf84cb6e93cda4c42a
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
10 years agonet: xfrm: xfrm_policy: silence compiler warning
Ying Xue [Wed, 8 Jan 2014 02:26:39 +0000 (10:26 +0800)]
net: xfrm: xfrm_policy: silence compiler warning

Fix below compiler warning:

net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function]

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'tipc'
David S. Miller [Tue, 7 Jan 2014 23:44:35 +0000 (18:44 -0500)]
Merge branch 'tipc'

Jon Maloy says:

====================
tipc: link setup and failover improvements

This series consists of four unrelated commits with different purposes.

- Commit #1 is purely cosmetic and pedagogic, hopefully making the
  failover/tunneling logics slightly easier to understand.
- Commit #2 fixes a bug that has always been in the code, but was not
  discovered until very recently.
- Commit #3 fixes a non-fatal race issue in the neighbour discovery
  code.
- Commit #4 removes an unnecessary indirection step during link
  startup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: make link start event synchronous
Jon Paul Maloy [Tue, 7 Jan 2014 22:02:44 +0000 (17:02 -0500)]
tipc: make link start event synchronous

When a link is created we delay the start event by launching it
to be executed later in a tasklet. As we hold all the
necessary locks at the moment of creation, and there is no risk
of deadlock or contention, this delay serves no purpose in the
current code.

We remove this obsolete indirection step, and the associated function
link_start(). At the same time, we rename the function tipc_link_stop()
to the more appropriate tipc_link_purge_queues().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: introduce new spinlock to protect struct link_req
Ying Xue [Tue, 7 Jan 2014 22:02:43 +0000 (17:02 -0500)]
tipc: introduce new spinlock to protect struct link_req

Currently, only 'bearer_lock' is used to protect struct link_req in
the function disc_timeout(). This is unsafe, since the member fields
'num_nodes' and 'timer_intv' might be accessed by below three different
threads simultaneously, none of them grabbing bearer_lock in the
critical region:

link_activate()
  tipc_bearer_add_dest()
    tipc_disc_add_dest()
      req->num_nodes++;

tipc_link_reset()
  tipc_bearer_remove_dest()
    tipc_disc_remove_dest()
      req->num_nodes--
      disc_update()
        read req->num_nodes
write req->timer_intv

disc_timeout()
  read req->num_nodes
  read/write req->timer_intv

Without lock protection, the only symptom of a race is that discovery
messages occasionally may not be sent out. This is not fatal, since such
messages are best-effort anyway. On the other hand, since discovery
messages are not time critical, adding a protecting lock brings no
serious overhead either. So we add a new, dedicated spinlock in
order to guarantee absolute data consistency in link_req objects.
This also helps reduce the overall role of the bearer_lock, which
we want to remove completely in a later commit series.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: remove 'has_redundant_link' flag from STATE link protocol messages
Jon Paul Maloy [Tue, 7 Jan 2014 22:02:42 +0000 (17:02 -0500)]
tipc: remove 'has_redundant_link' flag from STATE link protocol messages

The flag 'has_redundant_link' is defined only in RESET and ACTIVATE
protocol messages. Due to an ambiguity in the protocol specification it
is currently also transferred in STATE messages. Its value is used to
initialize a link state variable, 'permit_changeover', which is used
to inhibit futile link failover attempts when it is known that the
peer node has no working links at the moment, although the local node
may still think it has one.

The fact that 'has_redundant_link' incorrectly is read from STATE
messages has the effect that 'permit_changeover' sometimes gets a wrong
value, and permanently blocks any links from being re-established. Such
failures can only occur in in dual-link systems, and are extremely rare.
This bug seems to have always been present in the code.

Furthermore, since commit b4b5610223f17790419b03eaa962b0e3ecf930d7
("tipc: Ensure both nodes recognize loss of contact between them"),
the 'permit_changeover' field serves no purpose any more. The task of
enforcing 'lost contact' cycles at both peer endpoints is now taken
by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN
in struct tipc_node to abort unnecessary failover attempts.

We therefore remove the 'has_redundant_link' flag from STATE messages,
as well as the now redundant 'permit_changeover' variable.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotipc: rename functions related to link failover and improve comments
Jon Paul Maloy [Tue, 7 Jan 2014 22:02:41 +0000 (17:02 -0500)]
tipc: rename functions related to link failover and improve comments

The functionality related to link addition and failover is unnecessarily
hard to understand and maintain. We try to improve this by renaming
some of the functions, at the same time adding or improving the
explanatory comments around them. Names such as "tipc_rcv()" etc. also
align better with what is used in other networking components.

The changes in this commit are purely cosmetic, no functional changes
are made.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: skbuff: const-ify casts in skb_queue_* functions
Daniel Borkmann [Tue, 7 Jan 2014 22:23:44 +0000 (23:23 +0100)]
net: skbuff: const-ify casts in skb_queue_* functions

We should const-ify comparisons on skb_queue_* inline helper
functions as their parameters are const as well, so lets not
drop that.

Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: xfrm: xfrm_policy: fix inline not at beginning of declaration
Daniel Borkmann [Tue, 7 Jan 2014 22:20:27 +0000 (23:20 +0100)]
net: xfrm: xfrm_policy: fix inline not at beginning of declaration

Fix three warnings related to:

  net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
  net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
  net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]

Just removing the inline keyword is sufficient as the compiler will
decide on its own about inlining or not.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonetfilter: nft_ct: load both IPv4 and IPv6 conntrack modules for NFPROTO_INET
Patrick McHardy [Mon, 6 Jan 2014 18:09:49 +0000 (18:09 +0000)]
netfilter: nft_ct: load both IPv4 and IPv6 conntrack modules for NFPROTO_INET

The ct expression can currently not be used in the inet family since
we don't have a conntrack module for NFPROTO_INET, so
nf_ct_l3proto_try_module_get() fails. Add some manual handling to
load the modules for both NFPROTO_IPV4 and NFPROTO_IPV6 if the
ct expression is used in the inet family.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 years agonetfilter: nft_meta: add l4proto support
Patrick McHardy [Fri, 3 Jan 2014 12:16:18 +0000 (12:16 +0000)]
netfilter: nft_meta: add l4proto support

For L3-proto independant rules we need to get at the L4 protocol value
directly. Add it to the nft_pktinfo struct and use the meta expression
to retrieve it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 years agonetfilter: nf_tables: add nfproto support to meta expression
Patrick McHardy [Fri, 3 Jan 2014 12:16:17 +0000 (12:16 +0000)]
netfilter: nf_tables: add nfproto support to meta expression

Needed by multi-family tables to distinguish IPv4 and IPv6 packets.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 years agonetfilter: nf_tables: add "inet" table for IPv4/IPv6
Patrick McHardy [Fri, 3 Jan 2014 12:16:16 +0000 (12:16 +0000)]
netfilter: nf_tables: add "inet" table for IPv4/IPv6

This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 years agonetfilter: nf_tables: add support for multi family tables
Patrick McHardy [Fri, 3 Jan 2014 12:16:15 +0000 (12:16 +0000)]
netfilter: nf_tables: add support for multi family tables

Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
10 years agonetfilter: nf_tables: add hook ops to struct nft_pktinfo
Patrick McHardy [Fri, 3 Jan 2014 12:16:14 +0000 (12:16 +0000)]
netfilter: nf_tables: add hook ops to struct nft_pktinfo

Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 years agonetfilter: nf_tables: make chain types override the default AF functions
Patrick McHardy [Fri, 3 Jan 2014 12:16:13 +0000 (12:16 +0000)]
netfilter: nf_tables: make chain types override the default AF functions

Currently the AF-specific hook functions override the chain-type specific
hook functions. That doesn't make too much sense since the chain types
are a special case of the AF-specific hooks.

Make the AF-specific hook functions the default and make the optional
chain type hooks override them.

As a side effect, the necessary code restructuring reduces the code size,
f.i. in case of nf_tables_ipv4.o:

  nf_tables_ipv4_init_net   |  -24
  nft_do_chain_ipv4         | -113
 2 functions changed, 137 bytes removed, diff: -137

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 years agonetfilter: nft_reject: fix compilation warning if NF_TABLES_IPV6 is disabled
Pablo Neira Ayuso [Mon, 6 Jan 2014 20:06:30 +0000 (21:06 +0100)]
netfilter: nft_reject: fix compilation warning if NF_TABLES_IPV6 is disabled

net/netfilter/nft_reject.c: In function 'nft_reject_eval':
net/netfilter/nft_reject.c:37:14: warning: unused variable 'net' [-Wunused-variable]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 years agomlx4_en: Select PTP_1588_CLOCK
Shawn Bohrer [Tue, 7 Jan 2014 18:49:17 +0000 (12:49 -0600)]
mlx4_en: Select PTP_1588_CLOCK

Now that mlx4_en includes a PHC driver it must select PTP_1588_CLOCK.

   drivers/built-in.o: In function `mlx4_en_get_ts_info':
>> en_ethtool.c:(.text+0x391a11): undefined reference to `ptp_clock_index'
   drivers/built-in.o: In function `mlx4_en_remove_timestamp':
>> (.text+0x397913): undefined reference to `ptp_clock_unregister'
   drivers/built-in.o: In function `mlx4_en_init_timestamp':
>> (.text+0x397b20): undefined reference to `ptp_clock_register'

Fixes: ad7d4eaed995d ("mlx4_en: Add PTP hardware clock")
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet-gre-gro: Add GRE support to the GRO stack
Jerry Chu [Tue, 7 Jan 2014 18:23:19 +0000 (10:23 -0800)]
net-gre-gro: Add GRE support to the GRO stack

This patch built on top of Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36
("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
stack. It also serves as an example for supporting other encapsulation
protocols in the GRO stack in the future.

The patch supports version 0 and all the flags (key, csum, seq#) but
will flush any pkt with the S (seq#) flag. This is because the S flag
is not support by GSO, and a GRO pkt may end up in the forwarding path,
thus requiring GSO support to break it up correctly.

Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
be easily added, so is the support for GRE variations like NVGRE.

The patch also support csum offload. Specifically if the csum flag is on
and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
the code will take advantage of the csum computed by the h/w when
validating the GRE csum.

Note that commit 60769a5dcd8755715c7143b4571d5c44f01796f1 "ipv4: gre:
add GRO capability" already introduces GRO capability to IPv4 GRE
tunnels, using the gro_cells infrastructure. But GRO is done after
GRE hdr has been removed (i.e., decapped). The following patch applies
GRO when pkts first come in (before hitting the GRE tunnel code). There
is some performance advantage for applying GRO as early as possible.
Also this approach is transparent to other subsystem like Open vSwitch
where GRE decap is handled outside of the IP stack hence making it
harder for the gro_cells stuff to apply. On the other hand, some NICs
are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
In that case the GRO processing of pkts from the same remote host will
all happen on the same CPU and the performance may be suboptimal.

I'm including some rough preliminary performance numbers below. Note
that the performance will be highly dependent on traffic load, mix as
usual. Moreover it also depends on NIC offload features hence the
following is by no means a comprehesive study. Local testing and tuning
will be needed to decide the best setting.

All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
(super_netperf 50 -H 192.168.1.18 -l 30)

An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
is configured.

The GRO support for pkts AFTER decap are controlled through the device
feature of the GRE device (e.g., ethtool -K gre1 gro on/off).

1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
thruput: 9.16Gbps
CPU utilization: 19%

1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
thruput: 5.9Gbps
CPU utilization: 15%

1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 12-13%

1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 10%

The following tests were performed on a different NIC that is capable of
csum offload. I.e., the h/w is capable of computing IP payload csum
(CHECKSUM_COMPLETE).

2.1 ethtool -K gre1 gro on (hence will use gro_cells)

2.1.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 8.53Gbps
CPU utilization: 9%

2.1.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 8.97Gbps
CPU utilization: 7-8%

2.1.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 8.83Gbps
CPU utilization: 5-6%

2.1.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.98Gbps
CPU utilization: 5%

2.2 ethtool -K gre1 gro off

2.2.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 5.93Gbps
CPU utilization: 9%

2.2.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 5.62Gbps
CPU utilization: 8%

2.2.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 7.69Gbps
CPU utilization: 8%

2.2.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.96Gbps
CPU utilization: 5-6%

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Do not enable tx-nocache-copy by default
Benjamin Poirier [Tue, 7 Jan 2014 15:11:10 +0000 (10:11 -0500)]
net: Do not enable tx-nocache-copy by default

There are many cases where this feature does not improve performance or even
reduces it.

For example, here are the results from tests that I've run using 3.12.6 on one
Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
from the Xeon, but they're similar on the i7. All numbers report the
mean±stddev over 10 runs of 10s.

1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
copy from user on transmit"
There is no statistically significant difference between tx-nocache-copy
on/off.
nic irqs spread out (one queue per cpu)

200x netperf -r 1400,1
tx-nocache-copy off
        692000±1000 tps
        50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
tx-nocache-copy on
        693000±1000 tps
        50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7

200x netperf -r 14000,14000
tx-nocache-copy off
        86450±80 tps
        50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
tx-nocache-copy on
        86110±60 tps
        50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20

2) single stream throughput tests
tx-nocache-copy leads to higher service demand

                        throughput  cpu0        cpu1        demand
                        (Gb/s)      (Gcycle)    (Gcycle)    (cycle/B)

nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)

tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004

nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)

tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1     0.923±0.007
tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009 0.958±0.002

As a second example, here are some results from Eric Dumazet with latest
net-next.
tx-nocache-copy also leads to higher service demand

(cpu is Intel(R) Xeon(R) CPU X5660  @ 2.80GHz)

lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       4282.648396 task-clock                #    0.423 CPUs utilized
             9,348 context-switches          #    0.002 M/sec
                88 CPU-migrations            #    0.021 K/sec
               355 page-faults               #    0.083 K/sec
    11,812,797,651 cycles                    #    2.758 GHz                     [82.79%]
     9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle    [82.54%]
     4,579,889,681 stalled-cycles-backend    #   38.77% backend  cycles idle    [67.33%]
     6,053,172,792 instructions              #    0.51  insns per cycle
                                             #    1.49  stalled cycles per insn [83.64%]
       597,275,583 branches                  #  139.464 M/sec                   [83.70%]
         8,960,541 branch-misses             #    1.50% of all branches         [83.65%]

      10.128990264 seconds time elapsed

lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       2847.375441 task-clock                #    0.281 CPUs utilized
            11,632 context-switches          #    0.004 M/sec
                49 CPU-migrations            #    0.017 K/sec
               354 page-faults               #    0.124 K/sec
     7,646,889,749 cycles                    #    2.686 GHz                     [83.34%]
     6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle    [83.31%]
     1,726,460,071 stalled-cycles-backend    #   22.58% backend  cycles idle    [66.55%]
     2,079,702,453 instructions              #    0.27  insns per cycle
                                             #    2.94  stalled cycles per insn [83.22%]
       363,773,213 branches                  #  127.757 M/sec                   [83.29%]
         4,242,732 branch-misses             #    1.17% of all branches         [83.51%]

      10.128449949 seconds time elapsed

CC: Tom Herbert <therbert@google.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv4: loopback device: ignore value changes after device is upped
Jiri Pirko [Tue, 7 Jan 2014 14:55:45 +0000 (15:55 +0100)]
ipv4: loopback device: ignore value changes after device is upped

When lo is brought up, new ifa is created. Then, devconf and neigh values
bitfield should be set so later changes of default values would not
affect lo values.

Note that the same behaviour is in ipv6. Also note that this is likely
not an issue in many distros (for example Fedora 19) because userspace
sets address to lo manually before bringing it up.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoIPv6: add the option to use anycast addresses as source addresses in echo reply
FX Le Bail [Tue, 7 Jan 2014 13:57:27 +0000 (14:57 +0100)]
IPv6: add the option to use anycast addresses as source addresses in echo reply

This change allows to follow a recommandation of RFC4942.

- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
  as source addresses for ICMPv6 echo reply. This sysctl is false by default
  to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
   (http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security

   [...]
   To avoid exposing knowledge about the internal structure of the
   network, it is recommended that anycast servers now take advantage of
   the ability to return responses with the anycast address as the
   source address if possible.

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_en: fix error return code in mlx4_en_get_qp()
Wei Yongjun [Tue, 7 Jan 2014 08:56:07 +0000 (16:56 +0800)]
net/mlx4_en: fix error return code in mlx4_en_get_qp()

Fix to return a negative error code from the error handling
case instead of 0.

Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoBluetooth: Fix 6loWPAN peer lookup
Claudio Takahasi [Tue, 7 Jan 2014 12:07:48 +0000 (09:07 -0300)]
Bluetooth: Fix 6loWPAN peer lookup

This patch fixes peer address lookup for 6loWPAN over Bluetooth Low
Energy links.

ADDR_LE_DEV_PUBLIC, and ADDR_LE_DEV_RANDOM are the values allowed for
"dst_type" field in the hci_conn struct for LE links.

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
10 years agoBluetooth: Fix setting Universal/Local bit
Claudio Takahasi [Tue, 7 Jan 2014 12:07:47 +0000 (09:07 -0300)]
Bluetooth: Fix setting Universal/Local bit

This patch fixes the Bluetooth Low Energy Address type checking when
setting Universal/Local bit for the 6loWPAN network device or for the
peer device connection.

ADDR_LE_DEV_PUBLIC or ADDR_LE_DEV_RANDOM are the values allowed for
"src_type" and "dst_type" in the hci_conn struct. The Bluetooth link
type can be obtainned reading the "type" field in the same struct.

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
10 years agor8152: correct some messages
Hayes Wang [Tue, 7 Jan 2014 03:18:22 +0000 (11:18 +0800)]
r8152: correct some messages

 - Replace pr_warn_ratelimited() with net_ratelimit() and netdev_warn().
 - Adjust the algnment of some messages.
 - Remove the peroid.
 - Fix some messages don't have terminating newline.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobna: Fix build due to missing use of dma_unmap_len_set()
David S. Miller [Tue, 7 Jan 2014 01:37:41 +0000 (20:37 -0500)]
bna: Fix build due to missing use of dma_unmap_len_set()

> as reported for linux-next of Dec.20, 2013
> when CONFIG_NEED_DMA_MAP_STATE is not enabled:
>
> drivers/net/ethernet/brocade/bna/bnad.c: In function 'bnad_start_xmit':
> drivers/net/ethernet/brocade/bna/bnad.c:3074:26: error: 'struct bnad_tx_vector' has no member named 'dma_len'

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agogre_offload: statically build GRE offloading support
Eric Dumazet [Mon, 6 Jan 2014 22:03:07 +0000 (14:03 -0800)]
gre_offload: statically build GRE offloading support

GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.

This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobgmac: fix typos
Hauke Mehrtens [Mon, 6 Jan 2014 22:24:29 +0000 (23:24 +0100)]
bgmac: fix typos

This fixes some typos found by Sergei.

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
David S. Miller [Tue, 7 Jan 2014 00:48:38 +0000 (19:48 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jesse/openvswitch

Jesse Gross says:

====================
[GIT net-next] Open vSwitch

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoovs: make functions local
Stephen Hemminger [Tue, 17 Dec 2013 19:22:48 +0000 (19:22 +0000)]
ovs: make functions local

Several functions and datastructures could be local
Found with 'make namespacecheck'

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Compute checksum in skb_gso_segment() if needed
Thomas Graf [Fri, 13 Dec 2013 14:22:22 +0000 (15:22 +0100)]
openvswitch: Compute checksum in skb_gso_segment() if needed

The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Use skb_zerocopy() for upcall
Thomas Graf [Fri, 13 Dec 2013 14:22:21 +0000 (15:22 +0100)]
openvswitch: Use skb_zerocopy() for upcall

Use of skb_zerocopy() can avoid the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if not already done in
GSO segmentation.

Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.

Cost of upcall is significantly reduced from:
+   7.48%       vhost-8471  [k] memcpy
+   5.57%     ovs-vswitchd  [k] memcpy
+   2.81%       vhost-8471  [k] csum_partial_copy_generic

to:
+   5.72%     ovs-vswitchd  [k] memcpy
+   3.32%       vhost-5153  [k] memcpy
+   0.68%       vhost-5153  [k] skb_zerocopy

(megaflows disabled)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Pass datapath into userspace queue functions
Thomas Graf [Fri, 13 Dec 2013 14:22:20 +0000 (15:22 +0100)]
openvswitch: Pass datapath into userspace queue functions

Allows removing the net and dp_ifindex argument and simplify the
code.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Drop user features if old user space attempted to create datapath
Thomas Graf [Fri, 13 Dec 2013 14:22:19 +0000 (15:22 +0100)]
openvswitch: Drop user features if old user space attempted to create datapath

Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Allow user space to announce ability to accept unaligned Netlink messages
Thomas Graf [Fri, 13 Dec 2013 14:22:18 +0000 (15:22 +0100)]
openvswitch: Allow user space to announce ability to accept unaligned Netlink messages

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agonet: Export skb_zerocopy() to zerocopy from one skb to another
Thomas Graf [Fri, 13 Dec 2013 14:22:17 +0000 (15:22 +0100)]
net: Export skb_zerocopy() to zerocopy from one skb to another

Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: remove duplicated include from flow_table.c
Wei Yongjun [Mon, 16 Dec 2013 06:06:15 +0000 (14:06 +0800)]
openvswitch: remove duplicated include from flow_table.c

Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agonet: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback}
Daniel Borkmann [Tue, 10 Dec 2013 11:02:03 +0000 (12:02 +0100)]
net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback}

As we're only doing a kfree() anyway in the RCU callback, we can
simply use kfree_rcu, which does the same job, and remove the
function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Per cpu flow stats.
Pravin B Shelar [Wed, 30 Oct 2013 00:22:21 +0000 (17:22 -0700)]
openvswitch: Per cpu flow stats.

With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Enable memory mapped Netlink i/o
Thomas Graf [Sat, 30 Nov 2013 12:21:32 +0000 (13:21 +0100)]
openvswitch: Enable memory mapped Netlink i/o

Use memory mapped Netlink i/o for all unicast openvswitch
communication if a ring has been set up.

Benchmark
  * pktgen -> ovs internal port
  * 5M pkts, 5M flows
  * 4 threads, 8 cores

Before:
Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
  74163pps 5339Mb/sec (5339736000bps) errors: 0
+   2.98%     ovs-vswitchd  [k] copy_user_generic_string
+   2.49%     ovs-vswitchd  [k] memcpy
+   1.84%       kpktgend_2  [k] memcpy
+   1.81%       kpktgend_1  [k] memcpy
+   1.81%       kpktgend_3  [k] memcpy
+   1.78%       kpktgend_0  [k] memcpy

After:
Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
  206358pps 14857Mb/sec (14857776000bps) errors: 0
+   2.80%     ovs-vswitchd  [k] memcpy
+   1.31%       kpktgend_2  [k] memcpy
+   1.23%       kpktgend_0  [k] memcpy
+   1.09%       kpktgend_1  [k] memcpy
+   1.04%       kpktgend_3  [k] memcpy
+   0.96%     ovs-vswitchd  [k] copy_user_generic_string

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agonetlink: Avoid netlink mmap alloc if msg size exceeds frame size
Thomas Graf [Sat, 30 Nov 2013 12:21:31 +0000 (13:21 +0100)]
netlink: Avoid netlink mmap alloc if msg size exceeds frame size

An insufficent ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with lock to be safe.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agogenl: Add genlmsg_new_unicast() for unicast message allocation
Thomas Graf [Sat, 30 Nov 2013 12:21:30 +0000 (13:21 +0100)]
genl: Add genlmsg_new_unicast() for unicast message allocation

Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Silence RCU lockdep checks from flow lookup.
Jesse Gross [Tue, 3 Dec 2013 18:58:53 +0000 (10:58 -0800)]
openvswitch: Silence RCU lockdep checks from flow lookup.

Flow lookup can happen either in packet processing context or userspace
context but it was annotated as requiring RCU read lock to be held. This
also allows OVS mutex to be held without causing warnings.

Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
10 years agoopenvswitch: Change ovs_flow_tbl_lookup_xx() APIs
Andy Zhou [Mon, 25 Nov 2013 18:42:46 +0000 (10:42 -0800)]
openvswitch: Change ovs_flow_tbl_lookup_xx() APIs

API changes only for code readability. No functional chnages.

This patch removes the underscored version. Added a new API
ovs_flow_tbl_lookup_stats() that returns the n_mask_hits.

Reported by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).
Ben Pfaff [Mon, 25 Nov 2013 18:41:28 +0000 (10:41 -0800)]
openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).

We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.

This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems.  On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.

Compile tested only.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoopenvswitch: Correct comment.
Ben Pfaff [Mon, 25 Nov 2013 18:40:51 +0000 (10:40 -0800)]
openvswitch: Correct comment.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
10 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Mon, 6 Jan 2014 22:37:45 +0000 (17:37 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoBluetooth: Remove rfcomm_carrier_raised()
Gianluca Anzolin [Mon, 6 Jan 2014 20:23:53 +0000 (21:23 +0100)]
Bluetooth: Remove rfcomm_carrier_raised()

Remove the rfcomm_carrier_raised() definition as that function isn't
used anymore.

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
10 years agoBluetooth: Always wait for a connection on RFCOMM open()
Gianluca Anzolin [Mon, 6 Jan 2014 20:23:52 +0000 (21:23 +0100)]
Bluetooth: Always wait for a connection on RFCOMM open()

This patch fixes two regressions introduced with the recent rfcomm tty
rework.

The current code uses the carrier_raised() method to wait for the
bluetooth connection when a process opens the tty.

However processes may open the port with the O_NONBLOCK flag or set the
CLOCAL termios flag: in these cases the open() syscall returns
immediately without waiting for the bluetooth connection to
complete.

This behaviour confuses userspace which expects an established bluetooth
connection.

The patch restores the old behaviour by waiting for the connection in
rfcomm_dev_activate() and removes carrier_raised() from the tty_port ops.

As a side effect the new code also fixes the case in which the rfcomm
tty device is created with the flag RFCOMM_REUSE_DLC: the old code
didn't call device_move() and ModemManager skipped the detection
probe. Now device_move() is always called inside rfcomm_dev_activate().

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Reported-by: Beson Chow <blc+bluez@mail.vanade.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
10 years agoBluetooth: Move rfcomm_get_device() before rfcomm_dev_activate()
Gianluca Anzolin [Mon, 6 Jan 2014 20:23:51 +0000 (21:23 +0100)]
Bluetooth: Move rfcomm_get_device() before rfcomm_dev_activate()

This is a preparatory patch which moves the rfcomm_get_device()
definition before rfcomm_dev_activate() where it will be used.

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
10 years agoBluetooth: Release RFCOMM port when the last user closes the TTY
Gianluca Anzolin [Mon, 6 Jan 2014 20:23:50 +0000 (21:23 +0100)]
Bluetooth: Release RFCOMM port when the last user closes the TTY

This patch fixes a userspace regression introduced by the commit
29cd718b.

If the rfcomm device was created with the flag RFCOMM_RELEASE_ONHUP the
user space expects that the tty_port is released as soon as the last
process closes the tty.

The current code attempts to release the port in the function
rfcomm_dev_state_change(). However it won't get a reference to the
relevant tty to send a HUP: at that point the tty is already destroyed
and therefore NULL.

This patch fixes the regression by taking over the tty refcount in the
tty install method(). This way the tty_port is automatically released as
soon as the tty is destroyed.

As a consequence the check for RFCOMM_RELEASE_ONHUP flag in the hangup()
method is now redundant. Instead we have to be careful with the reference
counting in the rfcomm_release_dev() function.

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Reported-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
10 years agonet_sched: act: action flushing missaccounting
Jamal Hadi Salim [Mon, 23 Dec 2013 13:02:13 +0000 (08:02 -0500)]
net_sched: act: action flushing missaccounting

action flushing missaccounting
Account only for deleted actions

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: Remove unnecessary checks for act->ops
Jamal Hadi Salim [Mon, 23 Dec 2013 13:02:12 +0000 (08:02 -0500)]
net_sched: Remove unnecessary checks for act->ops

Remove unnecessary checks for act->ops
(suggested by Eric Dumazet).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobridge: use DEVICE_ATTR_xx macros
sfeldma@cumulusnetworks.com [Mon, 6 Jan 2014 19:00:44 +0000 (11:00 -0800)]
bridge: use DEVICE_ATTR_xx macros

Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobridge: use spin_lock_bh() in br_multicast_set_hash_max
Curt Brune [Mon, 6 Jan 2014 19:00:32 +0000 (11:00 -0800)]
bridge: use spin_lock_bh() in br_multicast_set_hash_max

br_multicast_set_hash_max() is called from process context in
net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.

br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
which can deadlock the CPU if a softirq that also tries to take the
same lock interrupts br_multicast_set_hash_max() while the lock is
held .  This can happen quite easily when any of the bridge multicast
timers expire, which try to take the same lock.

The fix here is to use spin_lock_bh(), preventing other softirqs from
executing on this CPU.

Steps to reproduce:

1. Create a bridge with several interfaces (I used 4).
2. Set the "multicast query interval" to a low number, like 2.
3. Enable the bridge as a multicast querier.
4. Repeatedly set the bridge hash_max parameter via sysfs.

  # brctl addbr br0
  # brctl addif br0 eth1 eth2 eth3 eth4
  # brctl setmcqi br0 2
  # brctl setmcquerier br0 1

  # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done

Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agovxlan: keep original skb ownership
Eric Dumazet [Mon, 6 Jan 2014 17:54:31 +0000 (09:54 -0800)]
vxlan: keep original skb ownership

Sathya Perla posted a patch trying to address following problem :

<quote>
 The vxlan driver sets itself as the socket owner for all the TX flows
 it encapsulates (using vxlan_set_owner()) and assigns it's own skb
 destructor. This causes all tunneled traffic to land up on only one TXQ
 as all encapsulated skbs refer to the vxlan socket and not the original
 socket.  Also, the vxlan skb destructor breaks some functionality for
 tunneled traffic like wmem accounting and as TCP small queues and
 FQ/pacing packet scheduler.
</quote>

I reworked Sathya patch and added some explanations.

vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair
and gain better drop monitor accuracy, by calling kfree_skb() when
appropriate.

The UDP socket used by vxlan to perform encapsulation of xmit packets
do not need to be alive while packets leave vxlan code. Its better
to keep original socket ownership to get proper feedback from qdisc and
NIC layers.

We use skb->sk to

A) control amount of bytes/packets queued on behalf of a socket, but
prior vxlan code did the skb->sk transfert without any limit/control
on vxlan socket sk_sndbuf.

B) security purposes (as selinux) or netfilter uses, and I do not think
anything is prepared to handle vxlan stacked case in this area.

By not changing ownership, vxlan tunnels behave like other tunnels.
As Stephen mentioned, we might do the same change in L2TP.

Reported-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotcp: out_of_order_queue do not use its lock
Eric Dumazet [Mon, 6 Jan 2014 17:36:12 +0000 (09:36 -0800)]
tcp: out_of_order_queue do not use its lock

TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (as __skb_queue_head())

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: don't install anycast address for /128 addresses on routers
Hannes Frederic Sowa [Mon, 6 Jan 2014 16:53:14 +0000 (17:53 +0100)]
ipv6: don't install anycast address for /128 addresses on routers

It does not make sense to create an anycast address for an /128-prefix.
Suppress it.

As 32019e651c6fce ("ipv6: Do not leave router anycast address for /127
prefixes.") shows we also may not leave them, because we could accidentally
remove an anycast address the user has allocated or got added via another
prefix.

Cc: François-Xavier Le Bail <fx.lebail@yahoo.com>
Cc: Thomas Haller <thaller@redhat.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>