pandora-kernel.git
9 years agotcp: syncookies: mark cookie_secret read_mostly
Florian Westphal [Tue, 26 Aug 2014 10:55:53 +0000 (12:55 +0200)]
tcp: syncookies: mark cookie_secret read_mostly

only written once.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Fix static checker warning regarding `txdata_ptr'
Yuval Mintz [Tue, 26 Aug 2014 07:24:41 +0000 (10:24 +0300)]
bnx2x: Fix static checker warning regarding `txdata_ptr'

Incorrect checking of array instead of array contents in panic_dump
flow - results of commit e261199872a2 ("bnx2x: Safe bnx2x_panic_dump()").

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: replace strncpy with strlcpy
hayeswang [Tue, 26 Aug 2014 02:08:23 +0000 (10:08 +0800)]
r8152: replace strncpy with strlcpy

Replace the strncpy with strlcpy, and use sizeof to determine the
length.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Update sk_buff flag bit availability comment.
David S. Miller [Wed, 27 Aug 2014 21:39:04 +0000 (14:39 -0700)]
net: Update sk_buff flag bit availability comment.

We lost one when xmit_more was added.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoneigh: document gc_thresh2
stephen hemminger [Mon, 25 Aug 2014 22:05:30 +0000 (15:05 -0700)]
neigh: document gc_thresh2

Missing documentation for gc_thresh2 sysctl.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bnx2x: fix build error with ptp
Randy Dunlap [Mon, 25 Aug 2014 20:11:48 +0000 (13:11 -0700)]
net: bnx2x: fix build error with ptp

bnx2x uses ptp functions, so it should select the provider of
those functions (PTP_1588_CLOCK).  Fixes these build errors:

drivers/built-in.o: In function `__bnx2x_remove':
/home/jim/linux/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:13409:
undefined reference to `ptp_clock_unregister'
drivers/built-in.o: In function `bnx2x_register_phc':
/home/jim/linux/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:13202:
undefined reference to `ptp_clock_register'
drivers/built-in.o: In function `bnx2x_get_ts_info':
/home/jim/linux/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c:3498:
undefined reference to `ptp_clock_index'

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoteam: set IFF_TEAM_PORT priv_flag after rx_handler is registered
Jiri Pirko [Mon, 25 Aug 2014 19:38:27 +0000 (21:38 +0200)]
team: set IFF_TEAM_PORT priv_flag after rx_handler is registered

When one tries to add eth as a port into team and that eth is already in
use by other rx_handler device (macvlan, bond, bridge, ...) a bug in
team_port_add() causes that IFF_TEAM_PORT flag is set before rx_handler
is registered. In between, netdev nofifier is called and
team_device_event() sees IFF_TEAM_PORT and thinks that rx_handler_data
pointer is set to team_port. But it isn't.

Fix this by reordering rx_handler register and IFF_TEAM_PORT priv flag
set so it is very similar to how bonding does this.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Fixes: 3d249d4ca7 "net: introduce ethernet teaming device"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobpf: x86: add missing 'shift by register' instructions to x64 eBPF JIT
Alexei Starovoitov [Mon, 25 Aug 2014 19:27:02 +0000 (12:27 -0700)]
bpf: x86: add missing 'shift by register' instructions to x64 eBPF JIT

'shift by register' operations are supported by eBPF interpreter, but were
accidently left out of x64 JIT compiler. Fix it and add a testcase.

Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bnx2x-next'
David S. Miller [Tue, 26 Aug 2014 00:30:27 +0000 (17:30 -0700)]
Merge branch 'bnx2x-next'

Yuval Mintz says:

====================
bnx2x: `fixes' patch-series

This series contains mostly bug fixes, but never the less is intended
for `net-next' and not `net', as:
  - Some of the fixes are quite insignificant [`VF clean statistics',
    `ethtool -d might cause timeout in log'].
  - Some only recently were submitted to `net-next' [`Fix timesync endianity'].
  - Some are not usually compiled as part of the kernel [`Fix stop-on-error'].

Dave - please consider applying this series to `net-next'; If you prefer,
I can break this series into 2 parts [one for `net' and the other for
`net-next'] - but personally I don't see much benefit in it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Fix timesync endianity
Michal Kalderon [Mon, 25 Aug 2014 14:48:33 +0000 (17:48 +0300)]
bnx2x: Fix timesync endianity

Commit eeed018cbfa30 ("bnx2x: Add timestamping and PTP hardware clock support")
has a missing conversion to LE32, which will prevent the feature from working
on big endian machines.

Signed-off-by: Michal Kalderon <Michal.Kalderon@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Be more forgiving toward SW GRO
Dmitry Kravkov [Mon, 25 Aug 2014 14:48:32 +0000 (17:48 +0300)]
bnx2x: Be more forgiving toward SW GRO

This introduces 2 new relaxations in the bnx2x driver regarding GRO:
  1. Don't prevent SW GRO if HW GRO is disabled.
  2. If all aggregations are disabled, when GRO configuration changes
     there's no need to perform an inner-reload [since it will have no
     actual effect].

Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: VF clean statistics
Yuval Mintz [Mon, 25 Aug 2014 14:48:31 +0000 (17:48 +0300)]
bnx2x: VF clean statistics

During statistics initialization of a VF we need to clean its statistics.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Fix stop-on-error
Yuval Mintz [Mon, 25 Aug 2014 14:48:30 +0000 (17:48 +0300)]
bnx2x: Fix stop-on-error

When STOP_ON_ERROR is set driver will not compile. Even if it did,
traffic will not pass without this patch as several fields which are
verified by FW/HW on the Tx path are not properly set.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: ethtool -d might cause timeout in log
Dmitry Kravkov [Mon, 25 Aug 2014 14:48:29 +0000 (17:48 +0300)]
bnx2x: ethtool -d might cause timeout in log

This changes slightly the set of registers read during `ethtool -d'.
Without this change, it's possible the HW will generate a grc Attention which
will be logged into system logs as `grc timeout'.

Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: make skb an optional parameter for__skb_flow_dissect()
WANG Cong [Tue, 26 Aug 2014 00:03:47 +0000 (17:03 -0700)]
net: make skb an optional parameter for__skb_flow_dissect()

Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: fix comments for __skb_flow_get_ports()
WANG Cong [Tue, 26 Aug 2014 00:03:46 +0000 (17:03 -0700)]
net: fix comments for __skb_flow_get_ports()

Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoixgbe: support skb->xmit_more in netdev_ops->ndo_start_xmit()
Daniel Borkmann [Sun, 24 Aug 2014 13:42:16 +0000 (15:42 +0200)]
ixgbe: support skb->xmit_more in netdev_ops->ndo_start_xmit()

This implements the deferred tail pointer flush API for the ixgbe
driver. Similar version also proposed longer time ago by Alexander Duyck.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Remove ndo_xmit_flush netdev operation, use signalling instead.
David S. Miller [Mon, 25 Aug 2014 22:51:53 +0000 (15:51 -0700)]
net: Remove ndo_xmit_flush netdev operation, use signalling instead.

As reported by Jesper Dangaard Brouer, for high packet rates the
overhead of having another indirect call in the TX path is
non-trivial.

There is the indirect call itself, and then there is all of the
reloading of the state to refetch the tail pointer value and
then write the device register.

Move to a more passive scheme, which requires very light modifications
to the device drivers.

The signal is a new skb->xmit_more value, if it is non-zero it means
that more SKBs are pending to be transmitted on the same queue as the
current SKB.  And therefore, the driver may elide the tail pointer
update.

Right now skb->xmit_more is always zero.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'is_kdump_kernel'
David S. Miller [Mon, 25 Aug 2014 22:42:25 +0000 (15:42 -0700)]
Merge branch 'is_kdump_kernel'

Amir Vadai says:

====================
Make is_kdump_kernel() accessible from modules

I'm re-spinning this patchset. At the begining it was suggested to use a
different name for the parameter, but at the end [3] the resolution was to
leave it as it is in this patch.

Drivers need to know if running from kdump kernel in order to change their
memory profile - since kdump environment is limited by available memory.
Currently there are drivers that are using reset_devices as suggested in [2].
In [2] it was suggested to use reset_devices, but the context was, to enable
driver to know when the hardware device is needed to be reset, and not if this
is a kdump environment. We think that is_kdump_kernel() is better suited to
select between different memory profiles.

The first patch in this patchset exports a needed symbol in order to make
is_kdump_kernel() accessible from the drivers. The rest of the patches change
from reset_devices to is_kdump_kernel() in 2 networking drivers.

The idea of this patchset was suggested by Vivek Goyal.

Tested (only build) and applied on top of commit 8fc54f6: ("net: use
reciprocal_scale() helper")

[1] - ea1c1af: ("net/mlx4_en: Reduce memory consumption on kdump kernel")
[2] - https://lkml.org/lkml/2011/1/27/341
[3] - http://www.spinics.net/lists/netdev/msg291492.html
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/bnx2x: Use is_kdump_kernel() to detect kdump kernel
Amir Vadai [Mon, 25 Aug 2014 13:06:54 +0000 (16:06 +0300)]
net/bnx2x: Use is_kdump_kernel() to detect kdump kernel

Use is_kdump_kernel() to detect kdump kernel, instead of
reset_devices.

CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/mlx4: Use is_kdump_kernel() to detect kdump kernel
Amir Vadai [Mon, 25 Aug 2014 13:06:53 +0000 (16:06 +0300)]
net/mlx4: Use is_kdump_kernel() to detect kdump kernel

Use is_kdump_kernel() to detect kdump kernel, instead of reset_devices.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocrash_dump: Make is_kdump_kernel() accessible from modules
Amir Vadai [Mon, 25 Aug 2014 13:06:52 +0000 (16:06 +0300)]
crash_dump: Make is_kdump_kernel() accessible from modules

In order to make is_kdump_kernel() accessible from modules, need to
make elfcorehdr_addr exported.
This was rejected in the past [1] because reset_devices was prefered in
that context (reseting the device in kdump kernel), but now there are
some network drivers that need to reduce memory usage when loaded from
a kdump kernel.  And in that context, is_kdump_kernel() suits better.

[1] - https://lkml.org/lkml/2011/1/27/341

CC: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agostmmac: simple cleanups
Pavel Machek [Mon, 25 Aug 2014 11:31:16 +0000 (13:31 +0200)]
stmmac: simple cleanups

This adds simple cleanups for stmmac, removing test we know is always
true, fixing whitespace, and moving code out of if().

Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: check code with checkpatch.pl
hayeswang [Mon, 25 Aug 2014 07:53:00 +0000 (15:53 +0800)]
r8152: check code with checkpatch.pl

626: CHECK: Alignment should match open parenthesis
 646: CHECK: Alignment should match open parenthesis
 655: CHECK: Alignment should match open parenthesis
 695: CHECK: Alignment should match open parenthesis
 729: CHECK: Alignment should match open parenthesis
 739: CHECK: Alignment should match open parenthesis
 976: WARNING: externs should be avoided in .c files
 1314: CHECK: Alignment should match open parenthesis
 1358: WARNING: networking block comments don't use an empty /* line, use /* Comment...
 1402: WARNING: networking block comments don't use an empty /* line, use /* Comment...
 1521: CHECK: multiple assignments should be avoided
 1775: CHECK: Alignment should match open parenthesis
 1838: CHECK: multiple assignments should be avoided
 1843: CHECK: multiple assignments should be avoided
 1847: CHECK: multiple assignments should be avoided
 1850: WARNING: Missing a blank line after declarations
 1864: CHECK: Alignment should match open parenthesis
 1872: CHECK: braces {} should be used on all arms of this statement
 1906: CHECK: usleep_range is preferred over udelay
 2865: WARNING: networking block comments don't use an empty /* line, use /* Comment...
 3088: CHECK: Alignment should match open parenthesis
 total: 0 errors, 5 warnings, 16 checks, 3567 lines checked

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'ndo_xmit_flush'
David S. Miller [Mon, 25 Aug 2014 06:02:53 +0000 (23:02 -0700)]
Merge branch 'ndo_xmit_flush'

Basic deferred TX queue flushing infrastructure.

Over time, and specifically and more recently at the Networking
Workshop during Kernel SUmmit in Chicago, we have discussed the idea
of having some way to optimize transmits of multiple TX packets at
a time.

There are several areas of overhead that could be amortized with such
schemes.  One has to do with locking and transactional overhead, the
other has to do with device specific costs.

This patch set here is more aimed at device specific costs.

Typically a device queues up a packet in the TX queue and then has to
do something to have the device start processing that new entry.
Sometimes this is composed of doing an MMIO write to a "tail"
register, and in other cases it can involve something as expensive as
a hypervisor call.

The basic setup defined here is that when the driver supports deferred
TX queue flushing, ndo_start_xmit should no longer perform that
operation.  Instead a new operation, ndo_xmit_flush, should do it.

I have converted IGB and virtio_net as example initial users.  The IGB
conversion is tested, virtio_net is not but it does compile :-)

All ndo_start_xmit call sites have been abstracted behind a new helper
called netdev_start_xmit().

This just adds the infrastructure, it does not actually add any
instances of actually doing multiple ndo_start_xmit calls per
ndo_xmit_flush invocation.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agovirtio_net: Support netdev_ops->ndo_xmit_flush()
David S. Miller [Sat, 23 Aug 2014 20:18:10 +0000 (13:18 -0700)]
virtio_net: Support netdev_ops->ndo_xmit_flush()

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoigb: Support netdev_ops->ndo_xmit_flush()
David S. Miller [Sat, 23 Aug 2014 00:24:49 +0000 (17:24 -0700)]
igb: Support netdev_ops->ndo_xmit_flush()

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Add ops->ndo_xmit_flush()
David S. Miller [Fri, 22 Aug 2014 23:21:53 +0000 (16:21 -0700)]
net: Add ops->ndo_xmit_flush()

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: White-space cleansing : gaps between function and symbol export
Ian Morris [Sun, 24 Aug 2014 20:53:12 +0000 (21:53 +0100)]
ipv6: White-space cleansing : gaps between function and symbol export

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

This patch removes some blank lines between the end of a function
definition and the EXPORT_SYMBOL_GPL macro in order to prevent
checkpatch warning that EXPORT_SYMBOL must immediately follow
a function.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: White-space cleansing : Structure layouts
Ian Morris [Sun, 24 Aug 2014 20:53:11 +0000 (21:53 +0100)]
ipv6: White-space cleansing : Structure layouts

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

This patch addresses structure definitions, specifically it cleanses the brace
placement and replaces spaces with tabs in a few places.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: White-space cleansing : Line Layouts
Ian Morris [Sun, 24 Aug 2014 20:53:10 +0000 (21:53 +0100)]
ipv6: White-space cleansing : Line Layouts

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ec_bhf: remove excessive debug messages
Darek Marcinkiewicz [Sun, 24 Aug 2014 18:40:16 +0000 (20:40 +0200)]
net: ec_bhf: remove excessive debug messages

This cuts down the number of debug information spit out by
the driver.

Signed-off-by: Dariusz Marcinkiewicz <reksio@newterm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorandom32: improvements to prandom_bytes
Daniel Borkmann [Sat, 23 Aug 2014 15:03:28 +0000 (17:03 +0200)]
random32: improvements to prandom_bytes

This patch addresses a couple of minor items, mostly addesssing
prandom_bytes(): 1) prandom_bytes{,_state}() should use size_t
for length arguments, 2) We can use put_unaligned() when filling
the array instead of open coding it [ perhaps some archs will
further benefit from their own arch specific implementation when
GCC cannot make up for it ], 3) Fix a typo, 4) Better use unsigned
int as type for getting the arch seed, 5) Make use of
prandom_u32_max() for timer slack.

Regarding the change to put_unaligned(), callers of prandom_bytes()
which internally invoke prandom_bytes_state(), don't bother as
they expect the array to be filled randomly and don't have any
control of the internal state what-so-ever (that's also why we
have periodic reseeding there, etc), so they really don't care.

Now for the direct callers of prandom_bytes_state(), which
are solely located in test cases for MTD devices, that is,
drivers/mtd/tests/{oobtest.c,pagetest.c,subpagetest.c}:

These tests basically fill a test write-vector through
prandom_bytes_state() with an a-priori defined seed each time
and write that to a MTD device. Later on, they set up a read-vector
and read back that blocks from the device. So in the verification
phase, the write-vector is being re-setup [ so same seed and
prandom_bytes_state() called ], and then memcmp()'ed against the
read-vector to check if the data is the same.

Akinobu, Lothar and I also tested this patch and it runs through
the 3 relevant MTD test cases w/o any errors on the nandsim device
(simulator for MTD devs) for x86_64, ppc64, ARM (i.MX28, i.MX53
and i.MX6):

  # modprobe nandsim first_id_byte=0x20 second_id_byte=0xac \
                     third_id_byte=0x00 fourth_id_byte=0x15
  # modprobe mtd_oobtest dev=0
  # modprobe mtd_pagetest dev=0
  # modprobe mtd_subpagetest dev=0

We also don't have any users depending directly on a particular
result of the PRNG (except the PRNG self-test itself), and that's
just fine as it e.g. allowed us easily to do things like upgrading
from taus88 to taus113.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Lothar Waßmann <LW@KARO-electronics.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'csums-next'
David S. Miller [Mon, 25 Aug 2014 01:09:58 +0000 (18:09 -0700)]
Merge branch 'csums-next'

Tom Herbert says:

====================
net: Checksum offload changes - Part V

I am working on overhauling RX checksum offload. Goals of this effort
are:

- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simplify code

What is in this fifth patch set:

- Added GRO checksum validation functions
- Call the GRO validations functions from TCP and GRE gro_receive
- Perform checksum verification in the UDP gro_receive path using
  GRO functions and add support for gro_receive in UDP6

Changes in V2:

- Change ip_summed to CHECKSUM_UNNECESSARY instead of moving it
  to CHECKSUM_COMPLETE from GRO checksum validation. This avoids
  performance penalty in checksumming bytes which are before the header
  GRO is at.

Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)

----

Test results with this patch set are below. I did not notice any
performace regression.

Tests run:
   TCP_STREAM: super_netperf with 200 streams
   TCP_RR: super_netperf with 200 streams and -r 1,1

Device bnx2x (10Gbps):
   No GRE RSS hash (RX interrupts occur on one core)
   UDP RSS port hashing enabled.

* GRE with checksum with IPv4 encapsulated packets
  With fix:
    TCP_STREAM
        9.91% CPU utilization
        5163.78 Mbps
    TCP_RR
        50.64% CPU utilization
        219/347/502 90/95/99% latencies
        834103 tps
  Without fix:
    TCP_STREAM
        10.05% CPU utilization
        5186.22 tps
    TCP_RR
        49.70% CPU utilization
        227/338/486 90/95/99% latencies
        813450 tps

* GRE without checksum with IPv4 encapsulated packets
  With fix:
    TCP_STREAM
        10.18% CPU utilization
        5159 Mbps
    TCP_RR
        51.86% CPU utilization
        214/325/471 90/95/99% latencies
        865943 tps
  Without fix:
    TCP_STREAM
        10.26% CPU utilization
        5307.87 Mbps
    TCP_RR
        50.59% CPU utilization
        224/325/476 90/95/99% latencies
        846429 tps

*** Simulate device returns CHECKSUM_COMPLETE

* VXLAN with checksum
  With fix:
    TCP_STREAM
        13.03% CPU utilization
        9093.9 Mbps
    TCP_RR
        95.96% CPU utilization
        161/259/474 90/95/99% latencies
        1.14806e+06 tps
  Without fix:
    TCP_STREAM
        13.59% CPU utilization
        9093.97 Mbps
    TCP_RR
        93.95% CPU utilization
        160/259/484 90/95/99% latencies
        1.10262e+06 tps

* VXLAN without checksum
  With fix:
    TCP_STREAM
        13.28% CPU utilization
        9093.87 Mbps
    TCP_RR
        95.04% CPU utilization
        155/246/439 90/95/99% latencies
        1.15e+06 tps
  Without fix:
    TCP_STREAM
        13.37% CPU utilization
        9178.45 Mbps
    TCP_RR
        93.74% CPU utilization
        161/257/469 90/95/99% latencies
        1.1068e+06 Mbps
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agogre: When GRE csum is present count as encap layer wrt csum
Tom Herbert [Fri, 22 Aug 2014 20:34:52 +0000 (13:34 -0700)]
gre: When GRE csum is present count as encap layer wrt csum

In GRE demux if the GRE checksum pop rcv encapsulation so that any
encapsulated checksums are treated as tunnel checksums.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoudp: additional GRO support
Tom Herbert [Fri, 22 Aug 2014 20:34:44 +0000 (13:34 -0700)]
udp: additional GRO support

Implement GRO for UDPv6. Add UDP checksum verification in gro_receive
for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: Call skb_gro_checksum_validate
Tom Herbert [Fri, 22 Aug 2014 20:34:30 +0000 (13:34 -0700)]
tcp: Call skb_gro_checksum_validate

In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP
checksum in the gro context.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agogre: call skb_gro_checksum_simple_validate
Tom Herbert [Fri, 22 Aug 2014 20:34:22 +0000 (13:34 -0700)]
gre: call skb_gro_checksum_simple_validate

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: add gro_compute_pseudo functions
Tom Herbert [Fri, 22 Aug 2014 20:34:04 +0000 (13:34 -0700)]
net: add gro_compute_pseudo functions

Add inet_gro_compute_pseudo and ip6_gro_compute_pseudo. These are
the logical equivalents of inet_compute_pseudo and ip6_compute_pseudo
for GRO path. The IP header is taken from skb_gro_network_header.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: skb_gro_checksum_* functions
Tom Herbert [Fri, 22 Aug 2014 20:33:47 +0000 (13:33 -0700)]
net: skb_gro_checksum_* functions

Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check,
and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete.
These are the cognates of the normal checksum functions but are used
in the gro_receive path and operate on GRO related fields in sk_buffs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: use reciprocal_scale() helper
Daniel Borkmann [Sat, 23 Aug 2014 18:58:54 +0000 (20:58 +0200)]
net: use reciprocal_scale() helper

Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Allow raw buffers to be passed into the flow dissector.
David S. Miller [Sat, 23 Aug 2014 19:13:41 +0000 (12:13 -0700)]
net: Allow raw buffers to be passed into the flow dissector.

Drivers, and perhaps other entities we have not yet considered,
sometimes want to know how deep the protocol headers go before
deciding how large of an SKB to allocate and how much of the packet to
place into the linear SKB area.

For example, consider a driver which has a device which DMAs into
pools of pages and then tells the driver where the data went in the
DMA descriptor(s).  The driver can then build an SKB and reference
most of the data via SKB fragments (which are page/offset/length
triplets).

However at least some of the front of the packet should be placed into
the linear SKB area, which comes before the fragments, so that packet
processing can get at the headers efficiently.  The first thing each
protocol layer is going to do is a "pskb_may_pull()" so we might as
well aggregate as much of this as possible while we're building the
SKB in the driver.

Part of supporting this is that we don't have an SKB yet, so we want
to be able to let the flow dissector operate on a raw buffer in order
to compute the offset of the end of the headers.

So now we have a __skb_flow_dissect() which takes an explicit data
pointer and length.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bcm7xxx_apd_eee'
David S. Miller [Sat, 23 Aug 2014 18:39:24 +0000 (11:39 -0700)]
Merge branch 'bcm7xxx_apd_eee'

Florian Fainelli says:

====================
net: phy: bcm7xxx: APD and EEE support

This patch series enables Auto-power down and EEE for the BCM7xxx integrated
Gigabit PHYs.

I also put a fix for the fixed PHY that would allow clause 45 over clause 22
reads/writes but would return bogus data by using e.g: ethtool --show-eee
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: bcm7xxx: enable EEE at the PHY level
Florian Fainelli [Sat, 23 Aug 2014 01:55:45 +0000 (18:55 -0700)]
net: phy: bcm7xxx: enable EEE at the PHY level

The 28nm Gigabit PHY on BCM7xxx chips comes out of reset with absolutely
no EEE capabilities, such that we would actually return that we do not
support EEE when accessing 3.20 (MDIO_PCS_EEE_ABLE) registers.

Poke through the vendor-specific C45 register to enable EEE globally at
the PHY level, and advertise supported EEE modes.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: allow phy_init_eee() to work with internal PHYs
Florian Fainelli [Sat, 23 Aug 2014 01:55:44 +0000 (18:55 -0700)]
net: phy: allow phy_init_eee() to work with internal PHYs

Internal PHYs do not have any specific phy_interface_t defined because
they are within an Ethernet MAC or a larger IC, they will fail the early
check in phy_init_eee(). Allow these PHYs to proceed with EEE
initialization and report error/success by checking the standard C45
EEE-related registers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: export phy_{read,write}_mmd_indirect
Florian Fainelli [Sat, 23 Aug 2014 01:55:43 +0000 (18:55 -0700)]
net: phy: export phy_{read,write}_mmd_indirect

Some PHY drivers might need to access Clause 45 registers in Clause 22
compatibility mode to e.g: properly advertise EEE support when disabled
by default.

Export these two helper functions: phy_read_mmd_indirect() and
phy_write_mmd_indirect() for drivers to use them.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: fixed: return an error for Clause 45 over 22 reads
Florian Fainelli [Sat, 23 Aug 2014 01:55:42 +0000 (18:55 -0700)]
net: phy: fixed: return an error for Clause 45 over 22 reads

The fixed PHY driver does not properly emulate Clause 45 over Clause 22
MDIO reads, and as such, will return bogus values when we access such
registers.

Return an error when accessing these registers in order to prevent
advertising bogus capabilities such as EEE support and such.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: bcm7xxx: enable auto power down
Florian Fainelli [Sat, 23 Aug 2014 01:55:41 +0000 (18:55 -0700)]
net: phy: bcm7xxx: enable auto power down

The 28nm process BCM7xxx internal Gigabit PHYs all support automatic
power down, turn on that feature as part of the configuration
initialization callback.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: broadcom: move shadow 0x1C register accessors to brcmphy.h
Florian Fainelli [Sat, 23 Aug 2014 01:55:40 +0000 (18:55 -0700)]
net: phy: broadcom: move shadow 0x1C register accessors to brcmphy.h

The shadow register 0x1C is used both by the BCM54xxx PHYs and the
BCM7xxx internal PHYs, move the accessors to a common location so both
drivers can use them.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: broadcom: extract all registers to brcmphy.h
Florian Fainelli [Sat, 23 Aug 2014 01:55:39 +0000 (18:55 -0700)]
net: phy: broadcom: extract all registers to brcmphy.h

Commit 439d39a9ac8fbbba9c04581361188f33f21ced50 ("net: phy: broadcom:
extract register definitions") added a bunch of registers to brcmphy.h
but left some to broadcom.c, move all of them to the header file since
the BCM54xx and BCM7xxx PHY drivers do share all of these registers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tipc-next'
David S. Miller [Sat, 23 Aug 2014 18:18:41 +0000 (11:18 -0700)]
Merge branch 'tipc-next'

Jon Maloy says:

====================
tipc: Merge port and socket layer code

After the removal of the TIPC native interface, there is no reason to
keep a distinction between a "generic" port layer and a "specific"
socket layer in the code. Throughout the last months, we have posted
several series that aimed at facilitating removal of the port layer,
and in particular the port_lock spinlock, which in reality duplicates
the role normally kept by lock_sock()/bh_lock_sock().

In this series, we finalize this work, by making a significant number of
changes to the link, node, port and socket code, all with the aim of
reducing dependencies between the layers. In the final commits, we then
remove the port spinlock, port.c and port.h altogether.

After this series, we have a socket layer that has only few dependencies
to the rest of the stack, so that it should be possible to continue
cleanups of its code without significantly affecting other code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: merge struct tipc_port into struct tipc_sock
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:20 +0000 (18:09 -0400)]
tipc: merge struct tipc_port into struct tipc_sock

We complete the merging of the port and socket layer by aggregating
the fields of struct tipc_port directly into struct tipc_sock, and
moving the combined structure into socket.c.

We also move all functions and macros that are not any longer
exposed to the rest of the stack into socket.c, and rename them
accordingly.

Despite the size of this commit, there are no functional changes.
We have only made such changes that are necessary due of the removal
of struct tipc_port.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove files ref.h and ref.c
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:19 +0000 (18:09 -0400)]
tipc: remove files ref.h and ref.c

The reference table is now 'socket aware' instead of being generic,
and has in reality become a socket internal table. In order to be
able to minimize the API exposed by the socket layer towards the rest
of the stack, we now move the reference table definitions and functions
into the file socket.c, and rename the functions accordingly.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove include file port.h
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:18 +0000 (18:09 -0400)]
tipc: remove include file port.h

We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.

We move struct tipc_port and some macros to socket.h.

Finally, we remove the file port.h.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove source file port.c
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:17 +0000 (18:09 -0400)]
tipc: remove source file port.c

In this commit, we move the remaining functions in port.c to
socket.c, and give them new names that correspond to their new
location. We then remove the file port.c.

There are only cosmetic changes to the moved functions.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: remove port_lock
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:16 +0000 (18:09 -0400)]
tipc: remove port_lock

In previous commits we have reduced usage of port_lock to a minimum,
and complemented it with usage of bh_lock_sock() at the remaining
locations. The purpose has been to remove this lock altogether, since
it largely duplicates the role of bh_lock_sock. We are now ready to do
this.

However, we still need to protect the BH callers from inadvertent
release of the socket while they hold a reference to it. We do this by
replacing port_lock by a combination of a rw-lock protecting the
reference table as such, and updating the socket reference counter while
the socket is referenced from BH. This technique is more standard and
comprehensible than the previous approach, and turns out to have a
positive effect on overall performance.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: replace port pointer with socket pointer in registry
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:15 +0000 (18:09 -0400)]
tipc: replace port pointer with socket pointer in registry

In order to make tipc_sock the only entity referencable from other
parts of the stack, we add a tipc_sock pointer instead of a tipc_port
pointer to the registry. As a consequence, we also let the function
tipc_port_lock() return a pointer to a tipc_sock instead  of a tipc_port.
We keep the function's name for now, since the lock still is owned by
the port.

This is another step in the direction of eliminating port_lock, replacing
its usage with lock_sock() and bh_lock_sock().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: use registry when scanning sockets
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:14 +0000 (18:09 -0400)]
tipc: use registry when scanning sockets

The functions tipc_port_get_ports() and tipc_port_reinit() scan over
all sockets/ports to access each of them. This is done by using a
dedicated linked list, 'tipc_socks' where all sockets are members. The
list is in turn protected by a spinlock, 'port_list_lock', while each
socket is locked by using port_lock at the moment of access.

In order to reduce complexity and risk of deadlock, we want to get
rid of the linked list and the accompanying spinlock.

This is what we do in this commit. Instead of the linked list, we use
the port registry to scan across the sockets. We also add usage of
bh_lock_sock() inside the scope of port_lock in both functions, as a
preparation for the complete removal of port_lock.

Finally, we move the functions from port.c to socket.c, and rename them
to tipc_sk_sock_show() and tipc_sk_reinit() repectively.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: eliminate functions tipc_port_init and tipc_port_destroy
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:13 +0000 (18:09 -0400)]
tipc: eliminate functions tipc_port_init and tipc_port_destroy

After the latest changes to the socket/port layer the existence of
the functions tipc_port_init() and tipc_port_destroy() cannot be
justified. They are both called only once, from tipc_sk_create() and
tipc_sk_delete() respectively, and their functionality can better be
merged into the latter two functions.

This also entails that all remaining references to port_lock now are
made from inside socket.c, something that will make it easier to remove
this lock.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: redefine message acknowledge function
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:12 +0000 (18:09 -0400)]
tipc: redefine message acknowledge function

The function tipc_acknowledge() is a remnant from the obsolete native
API. Currently, it grabs port_lock, before building an acknowledge
message and sending it to the peer.

Since all access to socket members now is protected by the socket lock,
it has become unnecessary to grab port_lock here.

In this commit, we remove the usage of port_lock, simplify the
function, and move it to socket.c, renaming it to tipc_sk_send_ack().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: eliminate port_connect()/port_disconnect() functions
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:11 +0000 (18:09 -0400)]
tipc: eliminate port_connect()/port_disconnect() functions

tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete
native API. Their only task is to grab port_lock and call the functions
__tipc_port_connect()/__tipc_port_disconnect() respectively, which will
perform the actual state change.

Since socket/port exection now is single-threaded the use of port_lock
is not needed any more, so we can safely replace the two functions with
their lock-free counterparts.

In this commit, we remove the two functions. Furthermore, the contents
of __tipc_port_disconnect() is so trivial that we choose to eliminate
that function too, expanding its functionality into tipc_shutdown().
__tipc_port_connect() is simplified, moved to socket.c, and given the
more correct name tipc_sk_finish_conn(). Finally, we eliminate the
function auto_connect(), and expand its contents into filter_connect().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: eliminate function tipc_port_shutdown()
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:10 +0000 (18:09 -0400)]
tipc: eliminate function tipc_port_shutdown()

tipc_port_shutdown() is a remnant from the now obsolete native
interface. As such it grabs port_lock in order to protect itself
from concurrent BH processing.

However, after the recent changes to the port/socket upcalls, sockets
are now basically single-threaded, and all execution, except the read-only
tipc_sk_timer(), is executing within the protection of lock_sock(). So
the use of port_lock is not needed here.

In this commit we eliminate the whole function, and merge it into its
only caller, tipc_shutdown().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: clean up socket timer function
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:09 +0000 (18:09 -0400)]
tipc: clean up socket timer function

The last remaining BH upcall to the socket, apart for the message
reception function tipc_sk_rcv(), is the timer function.

We prefer to let this function continue executing in BH, since it only
does read-acces to semi-permanent data, but we make three changes to it:

1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope
   of port_lock.  This is a preparation for replacing port_lock with
   bh_lock_sock() at the locations where it is still used.

2) We move the function from port.c to socket.c, as a further step
   of eliminating the port code level altogether.

3) We let it make use of the newly introduced tipc_msg_create()
   function. This enables us to get rid of three context specific
   functions (port_create_self_abort_msg() etc.) in port.c

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: use message to abort connections when losing contact to node
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:08 +0000 (18:09 -0400)]
tipc: use message to abort connections when losing contact to node

In the current implementation, each 'struct tipc_node' instance keeps
a linked list of those ports/sockets that are connected to the node
represented by that struct. The purpose of this is to let the node
object know which sockets to alert when it loses contact with its peer
node, i.e., which sockets need to have their connections aborted.

This entails an unwanted direct reference from the node structure
back to the port/socket structure, and a need to grab port_lock
when we have to make an upcall to the port. We want to get rid of
this unecessary BH entry point into the socket, and also eliminate
its use of port_lock.

In this commit, we instead let the node struct keep list of "connected
socket" structs, which each represents a connected socket, but is
allocated independently by the node at the moment of connection. If
the node loses contact with its peer node, the list is traversed, and
a "connection abort" message is created for each entry in the list. The
message is sent to it respective connected socket using the ordinary
data path, and the receiving socket aborts its connections upon reception
of the message.

This enables us to get rid of the direct reference from 'struct node' to
´struct port', and another unwanted BH access point to the latter.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: use pseudo message to wake up sockets after link congestion
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:07 +0000 (18:09 -0400)]
tipc: use pseudo message to wake up sockets after link congestion

The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.

This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.

In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.

This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: introduce new function tipc_msg_create()
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:06 +0000 (18:09 -0400)]
tipc: introduce new function tipc_msg_create()

The function tipc_msg_init() has turned out to be of limited value
in many cases. It take too few parameters to be usable for creating
a complete message, it makes too many assumptions about what the
message should be used for, and it does not allocate any buffer to
be returned to the caller.

Therefore, we now introduce the new function tipc_msg_create(), which
takes all the parameters needed to create a full message, and returns
a buffer of the requested size. The new function will be very useful
for the changes we will be doing in later commits in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sat, 23 Aug 2014 18:12:08 +0000 (11:12 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pulling to get some TIPC fixes that a net-next series depends
upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: improve undo on timeout
Yuchung Cheng [Fri, 22 Aug 2014 21:15:22 +0000 (14:15 -0700)]
tcp: improve undo on timeout

Upon timeout, undo (via both timestamps/Eifel and DSACKs) was
disabled if any retransmits were still in flight.  The concern was
perhaps that spurious retransmission sent in a previous recovery
episode may trigger DSACKs to falsely undo the current recovery.

However, this inadvertently misses undo opportunities (using either
TCP timestamps or DSACKs) when timeout occurs during a loss episode,
i.e.  recurring timeouts or timeout during fast recovery. In these
cases some retransmissions will be in flight but we should allow
undo. Furthermore, we should only reset undo_marker and undo_retrans
upon timeout if we are starting a new recovery episode. Finally,
when we do reset our undo state, we now do so in a manner similar
to tcp_enter_recovery(), so that we require a DSACK for each of
the outstsanding retransmissions. This will achieve the original
goal by requiring that we receive the same number of DSACKs as
retransmissions.

This patch increases the undo events by 50% on Google servers.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agophylib: use MDIO_DEVS[12]
Sergei Shtylyov [Fri, 22 Aug 2014 19:56:47 +0000 (23:56 +0400)]
phylib: use MDIO_DEVS[12]

The bare register numbers are used despite <uapi/linux/mdio.h> has MDIO_DEVS[12]
#define'd for those.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: remove dead code after sk_data_ready change
Eric Dumazet [Sat, 23 Aug 2014 03:30:12 +0000 (20:30 -0700)]
net: remove dead code after sk_data_ready change

As a followup to commit 676d23690fb ("net: Fix use after free by
removing length arg from sk_data_ready callbacks"), we can remove
some useless code in sock_queue_rcv_skb() and rxrpc_queue_rcv_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: use ktime_get_ns() and ktime_get_real_ns() helpers
Eric Dumazet [Sat, 23 Aug 2014 01:32:09 +0000 (18:32 -0700)]
net: use ktime_get_ns() and ktime_get_real_ns() helpers

ktime_get_ns() replaces ktime_to_ns(ktime_get())

ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real())

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agovxlan: fix incorrect initializer in union vxlan_addr
Gerhard Stenzel [Fri, 22 Aug 2014 19:34:16 +0000 (21:34 +0200)]
vxlan: fix incorrect initializer in union vxlan_addr

The first initializer in the following

        union vxlan_addr ipa = {
            .sin.sin_addr.s_addr = tip,
            .sa.sa_family = AF_INET,
        };

is optimised away by the compiler, due to the second initializer,
therefore initialising .sin.sin_addr.s_addr always to 0.
This results in netlink messages indicating a L3 miss never contain the
missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions.
The problem affects user space programs relying on an IP address being
sent as part of a netlink message indicating a L3 miss.

Changing
            .sa.sa_family = AF_INET,
to
            .sin.sin_family = AF_INET,
fixes the problem.

Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'linux-can-next-for-3.18-20140820' of git://gitorious.org/linux-can/linux...
David S. Miller [Sat, 23 Aug 2014 02:42:25 +0000 (19:42 -0700)]
Merge tag 'linux-can-next-for-3.18-20140820' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2014-08-20

this is a pull request of 10 patches for net-next/master.

There is one patch by Wolfram Sang to clean up the build system.
Two patches by Stefan Agner that add vf610 support to the flexcan
driver. Dong Aisheng add support for bosch's m_can core, which is found
in the new freescale ARM SoCs. Sergei Shtylyov improves the rcar_can
driver by supporting all input clocks and adding device tree support.
The next patch is a small cleanup for the bit rate calculation function
by Lad, Prabhakar. And finally a patch by Himangi Saraogi, which
converts the mcp251x driver to use dmam_alloc_coherent.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry...
Linus Torvalds [Fri, 22 Aug 2014 21:50:21 +0000 (14:50 -0700)]
Merge tag 'pwm/for-3.17-rc2' of git://git./linux/kernel/git/thierry.reding/linux-pwm

Pull pwm fix from Thierry Reding:
 "Just one bugfix for the PWM lookup table code that would cause a PWM
  channel to be set to the wrong period and polarity for non-perfect
  matches"

* tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: Fix period and polarity in pwm_get() for non-perfect matches

9 years agomac80211: fix channel switch for chanctx-based drivers
Michal Kazior [Mon, 18 Aug 2014 11:19:09 +0000 (13:19 +0200)]
mac80211: fix channel switch for chanctx-based drivers

The new_ctx pointer is set only for non-chanctx drivers.  This yielded a
crash for chanctx-based drivers during channel switch finalization:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]

Use an adequate chanctx pointer to fix this.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 22 Aug 2014 21:33:18 +0000 (14:33 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "Here are some bug fixes that have piled up during ksummit/linuxcon.

   1) Fix endian problems in ibmveth, from Anton Blanchard.

   2) IPV6 routing code does GFP_KERNEL allocation in atomic, fix from
      Benjamin Block.

   3) SCTP association fixes from Daniel Borkmann.

   4) When multiple VLAN headers are present we have to make sure the
      second and subsequent ones are pullable in the SKB otherwise we
      blindly dereference garbage.  From Jiri Benc.

   5) The argument adjustment of the signature of hlist_add_after*()
      introduced a regression in the batman-adv code, fix from Sven
      Eckelmann.

   6) Fix TX hang handling to avoid a panic in i40e, from Anjali Singhai
      Jain.

   7) PTP flag test is inverted in i40e driver, from Jesse Brandeburg.

   8) ATM LEC driver needs to hold RTNL mutex over MTU changes, from
      Chas Williams.

   9) Truncate packets larger then the TPACKET_V3 format configured
      buffers, otherwise we overwrite past the end of said buffers.
      From Eric Dumazet.

  10) Fix endianness bugs in qlcnic firmware handling, from Rajesh
      Borundia and Shahed Shaikh.

  11) CXGB4 sometimes doesn't get all of the TX completion events it
      should resulting in SKBs getting stuck in the TX queue, from
      Hariprasad Shenai.

  12) When the FEC chip's PTP clock is disabled, you can't access the
      register.  Add necessary checks to avoid the resulting hang, from
      Fugang Duan"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
  drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef
  net: sctp: fix suboptimal edge-case on non-active active/retrans path selection
  net: sctp: spare unnecessary comparison in sctp_trans_elect_best
  net: ethernet: broadcom: bnx2x: Remove redundant #ifdef
  ibmveth: Fix endian issues with rx_no_buffer statistic
  net: xgene: fix possible NULL dereference in xgene_enet_free_desc_rings()
  openvswitch: fix panic with multiple vlan headers
  net: ipv6: fib: don't sleep inside atomic lock
  net: fec: ptp: avoid register access when ipg clock is disabled
  cxgb4: Free completed tx skbs promptly
  cxgb4: Fix race condition in cleanup
  sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe
  bnx2x: Revert UNDI flushing mechanism
  qlcnic: Fix endianess issue in firmware load from file operation
  qlcnic: Fix endianess issue in FW dump template header
  qlcnic: Fix flash access interface to application
  MAINTAINERS: Add section for MRF24J40 IEEE 802.15.4 radio driver
  macvlan: Allow setting multicast filter on all macvlan types
  packet: handle too big packets for PACKET_V3
  MAINTAINERS: add entry for ec_bhf driver
  ...

9 years agodp83640: Fix length check for event timestamp status messages
Christian Riesch [Thu, 21 Aug 2014 13:17:04 +0000 (15:17 +0200)]
dp83640: Fix length check for event timestamp status messages

Event timestamp status messages have a variable length, ranging from
1 to 5 words (16 bit words). The current code however requires
a minimum message length of sizeof(*phy_txts). In most cases this
condition is fulfilled due to padding bytes. However, if several events
are signaled in a single message, padding bytes may not be present.
For short event timestamp status messages, the length check will fail,
and the event timestamp will be dropped.

Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: stmmac: add fix_mac_speed support for socfpga
Ley Foon Tan [Wed, 20 Aug 2014 06:33:33 +0000 (14:33 +0800)]
net: stmmac: add fix_mac_speed support for socfpga

This patch adds fix_mac_speed() support for
Altera socfpga Ethernet controller. Emac splitter is a
soft IP core in FPGA system that converts GMII interface from
Synopsys mac to RGMII/SGMII interface. This splitter core is
an optional IP if user would like to use RGMII/SGMII
interface in their system. Software needs to update a register
in splitter core when there is speed change.

Signed-off-by: Ley Foon Tan <lftan@altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8169:add support for RTL8168H and RTL8107E
Chun-Hao Lin [Tue, 19 Aug 2014 17:54:04 +0000 (01:54 +0800)]
r8169:add support for RTL8168H and RTL8107E

RTL8168H is Realtek PCIe Gigabit Ethernet controller.
RTL8107E is Realtek PCIe Fast Ethernet controller.

This patch add support for these two chips.

Signed-off-by: Chun-Hao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobonding: create netlink event when bonding option is changed
Jiri Pirko [Tue, 19 Aug 2014 14:02:12 +0000 (16:02 +0200)]
bonding: create netlink event when bonding option is changed

Userspace needs to be notified if one changes some option.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bnx2x-next'
David S. Miller [Fri, 22 Aug 2014 19:31:24 +0000 (12:31 -0700)]
Merge branch 'bnx2x-next'

Yuval Mintz says:

====================
bnx2x: Start utilizing 7.10.51

This series will enable bnx2x to start utlizing its 7.10.51 FW.
In addition, it will also add timestamping support, as well as a couple
of routine semantic cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: FW assertion changes
Ariel Elior [Sun, 17 Aug 2014 13:47:51 +0000 (16:47 +0300)]
bnx2x: FW assertion changes

This is mostly a semantic change which modifies the code parsing and printing
of FW asserts.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Make BP_VF more robust
Yuval Mintz [Sun, 17 Aug 2014 13:47:50 +0000 (16:47 +0300)]
bnx2x: Make BP_VF more robust

Prevent dereference of pointer in case it's NULL.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Prevent pci_disable_sriov with assigned VFs
Yuval Mintz [Sun, 17 Aug 2014 13:47:49 +0000 (16:47 +0300)]
bnx2x: Prevent pci_disable_sriov with assigned VFs

Trying to disable sriov when VFs are assigned may lead to all kinds of problems.
This patch unifies the call in the driver to pci_disable_sriov() and prevents
them if some of the PF's child VFs are marked as assigned.

[Notice this is a bad scenario either way; User should not reach a point where
the OS tries to disable SRIOV when a VF is assigned - but currently there's no
way of preventing the user from doing so, and the ill-effect for the driver is
smaller this way]

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Prevent IOV if no entries in CAM
Yuval Mintz [Sun, 17 Aug 2014 13:47:48 +0000 (16:47 +0300)]
bnx2x: Prevent IOV if no entries in CAM

It's possible there's a bad chip configuration which will result with
PCIe IOV capabilities, but with no available interrupts for VFs.

In such case, we want to gracefully prevent the PF from initializing its
IOV capabilities rather than encounter difficulties further along the way.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Safe bnx2x_panic_dump()
Yuval Mintz [Sun, 17 Aug 2014 13:47:47 +0000 (16:47 +0300)]
bnx2x: Safe bnx2x_panic_dump()

The bnx2x panic dump spills a lot of information from the driver's
fastpath, but may be called while some of the fastpath is uninitialized.

This patch verifies that pointers are already allocated before dereferencing
them to prevent possible kernel panics.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Update driver version to 1.710.51
Yuval Mintz [Sun, 17 Aug 2014 13:47:46 +0000 (16:47 +0300)]
bnx2x: Update driver version to 1.710.51

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Code cleanup
Yuval Mintz [Sun, 17 Aug 2014 13:47:45 +0000 (16:47 +0300)]
bnx2x: Code cleanup

This patch does several semantic things:
  - Fixing typos.
  - Removing unnecessary prints.
  - Removing unused functions and definitions.
  - Change 'strange' usage of boolean variables.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Add timestamping and PTP hardware clock support
Michal Kalderon [Sun, 17 Aug 2014 13:47:44 +0000 (16:47 +0300)]
bnx2x: Add timestamping and PTP hardware clock support

This adds a PHC to the bnx2x driver. Driver supports timestamping send/receive
PTP packets, as well as adjusting the on-chip clock.

The driver has been tested with linuxptp project.

Signed-off-by: Michal Kalderon <Michal.Kalderon@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobnx2x: Utilize FW 7.10.51
Dmitry Kravkov [Sun, 17 Aug 2014 13:47:43 +0000 (16:47 +0300)]
bnx2x: Utilize FW 7.10.51

 - (L2) In some multi-function configurations, inter-PF and inter-VF
   Tx switching is incorrectly enabled.

 - (L2) Wrong assert code in FLR final cleanup in case it is sent not
   after FLR.

 - (L2) Chip may stall in very rare cases under heavy traffic with FW GRO
   enabled.

 - (L2) VF malicious notification error fixes.

 - (L2) Default gre tunnel to IPGRE which allows proper RSS for IPGRE packets,
   L2GRE traffic will reach single queue.

 - (FCoE) Fix data being placed in wrong buffer when corrupt FCoE frame is
   received.

 - (FCoE) Burst of FIP packets with destination MAC of ALL-FCF_MACs
   causes FCoE traffic to stop.

Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoaf_decnet: Use time_after_eq
Himangi Saraogi [Wed, 20 Aug 2014 17:54:40 +0000 (23:24 +0530)]
af_decnet: Use time_after_eq

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2,E3;
@@
- jiffies - E1 >= (E2*E3)
+ time_after_eq(jiffies, E1+E2*E3)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodecnet: Use time_after_eq
Himangi Saraogi [Wed, 20 Aug 2014 17:50:09 +0000 (23:20 +0530)]
decnet: Use time_after_eq

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@
- (jiffies - E1) >= E2
+ time_after_eq(jiffies, E1+E2)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipconfig: Use time_before
Himangi Saraogi [Wed, 20 Aug 2014 17:44:10 +0000 (23:14 +0530)]
ipconfig: Use time_before

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@
- jiffies - E1 < E2
+ time_before(jiffies, E1+E2)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodn_dev: Use time_before
Himangi Saraogi [Wed, 20 Aug 2014 17:43:07 +0000 (23:13 +0530)]
dn_dev: Use time_before

The functions time_before, time_before_eq, time_after, and time_after_eq
are more robust for comparing jiffies against other values.

A simplified version of the Coccinelle semantic patch making this change
is as follows:

@change@
expression E1,E2;
@@

(
- (jiffies - E1) < E2
+ time_before(jiffies, E1+E2)
)

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobr_multicast: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
Andreea-Cristina Bernat [Fri, 22 Aug 2014 13:06:09 +0000 (16:06 +0300)]
br_multicast: Replace rcu_assign_pointer() with RCU_INIT_POINTER()

The use of "rcu_assign_pointer()" is NULLing out the pointer.
According to RCU_INIT_POINTER()'s block comment:
"1.   This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.

The following Coccinelle semantic patch was used:
@@
@@

- rcu_assign_pointer
+ RCU_INIT_POINTER
  (..., NULL)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer()
Andreea-Cristina Bernat [Sun, 17 Aug 2014 13:29:43 +0000 (16:29 +0300)]
net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer()

The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet/ipv4/igmp.c: Replace rcu_dereference() with rcu_access_pointer()
Andreea-Cristina Bernat [Sun, 17 Aug 2014 12:49:41 +0000 (15:49 +0300)]
net/ipv4/igmp.c: Replace rcu_dereference() with rcu_access_pointer()

The "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobonding: Replace rcu_dereference() with rcu_access_pointer()
Andreea-Cristina Bernat [Sun, 17 Aug 2014 10:21:45 +0000 (13:21 +0300)]
bonding: Replace rcu_dereference() with rcu_access_pointer()

This "rcu_dereference()" call is used directly in a condition.
Since its return value is never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes this replacement.

The following Coccinelle semantic patch was used for solving it:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
 ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocnic: Replace rcu_dereference() with rcu_access_pointer()
Andreea-Cristina Bernat [Sun, 17 Aug 2014 10:12:09 +0000 (13:12 +0300)]
cnic: Replace rcu_dereference() with rcu_access_pointer()

The "rcu_dereference()" calls are used directly in conditions.
Since their return values are never dereferenced it is recommended to use
"rcu_access_pointer()" instead of "rcu_dereference()".
Therefore, this patch makes the replacements.

The following Coccinelle semantic patch was used:
@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
  (...)
  ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: Restore accept_local behaviour in fib_validate_source()
Sébastien Barré [Sun, 17 Aug 2014 07:19:54 +0000 (09:19 +0200)]
ipv4: Restore accept_local behaviour in fib_validate_source()

Commit 7a9bc9b81a5b ("ipv4: Elide fib_validate_source() completely when possible.")
introduced a short-circuit to avoid calling fib_validate_source when not
needed. That change took rp_filter into account, but not accept_local.
This resulted in a change of behaviour: with rp_filter and accept_local
off, incoming packets with a local address in the source field should be
dropped.

Here is how to reproduce the change pre/post 7a9bc9b81a5b commit:
-configure the same IPv4 address on hosts A and B.
-try to send an ARP request from B to A.
-The ARP request will be dropped before that commit, but accepted and answered
after that commit.

This adds a check for ACCEPT_LOCAL, to maintain full
fib validation in case it is 0. We also leave __fib_validate_source() earlier
when possible, based on the same check as fib_validate_source(), once the
accept_local stuff is verified.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>