pandora-kernel.git
Al Viro [Fri, 1 Dec 2006 03:28:08 +0000 (19:28 -0800)]
[EBTABLES]: Move calls of ebt_verify_pointers() upstream.

... and pass just repl->name to translate_table()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:27:48 +0000 (19:27 -0800)]
[EBTABLES]: ebt_check_entry() doesn't need valid_hooks

We can check newinfo->hook_entry[...] instead.
Kill unused argument.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:27:32 +0000 (19:27 -0800)]
[EBTABLES]: Clean ebt_get_udc_positions() up.

Check for valid_hooks is redundant (newinfo->hook_entry[i] will
be NULL if bit i is not set).  Kill it, kill unused arguments.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:27:13 +0000 (19:27 -0800)]
[EBTABLES]: Switch ebt_check_entry_size_and_hooks() to use of newinfo->hook_entry[]

Kill unused arguments.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:26:53 +0000 (19:26 -0800)]
[EBTABLES]: translate_table(): switch direct uses of repl->hook_info to newinfo

Since newinfo->hook_entry[] has already been set up, we can switch to using
it instead of repl->{hook_info,valid_hooks}.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:26:35 +0000 (19:26 -0800)]
[EBTABLES]: Move more stuff into ebt_verify_pointers().

Take initialization of ->hook_entry[...], ->entries_size and ->nentries
over there, pull the check for empty chains into the end of that sucker.

Now it's self-contained, so we can move it to the very beginning of
translate_table() *and* we can rely on ->hook_entry[] being properly
transliterated after it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:26:14 +0000 (19:26 -0800)]
[EBTABLES]: Pull the loop doing __ebt_verify_pointers() into a separate function.

It's easier to expand the iterator here *and* we'll be able to move all
uses of ebt_replace from translate_table() into this one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:25:51 +0000 (19:25 -0800)]
[EBTABLES]: Split ebt_check_entry_size_and_hooks

Split ebt_check_entry_size_and_hooks() in two parts: one that does
sanity checks on pointers (basically, checks that we can safely
use the iterator from now on) and the rest of it (looking into details
of the entry).

The loop applying ebt_check_entry_size_and_hooks() is split in two.

Populating newinfo->hook_entry[] is done in the first part.

Unused arguments killed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:25:21 +0000 (19:25 -0800)]
[EBTABLES]: Prevent wraparounds in checks for entry components' sizes.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:24:49 +0000 (19:24 -0800)]
[EBTABLES]: Deal with the worst-case behaviour in loop checks.

No need to revisit a chain we'd already finished with during
the check for the current hook.  It's either an instant loop (which
we'd just detected) or duplicate work.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
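The duplicate-work fix above can be illustrated with a small depth-first search. This is a sketch in C, not the ebtables code: chains are modelled as nodes in a hypothetical adjacency matrix `jumps` (true when chain i jumps to chain j), and `visit()` and `hook_is_loop_free()` are invented names. A chain still on the current path means a loop; a chain already marked finished during this hook's check is skipped instead of being walked again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHAINS 8

/* Hypothetical model: jumps[i][j] is true when chain i contains a jump to chain j. */
enum chain_state { UNVISITED, IN_PROGRESS, FINISHED };

/* Depth-first walk from chain c; returns false as soon as a loop is found. */
static bool visit(size_t n, bool jumps[][MAX_CHAINS],
                  enum chain_state state[], size_t c)
{
    state[c] = IN_PROGRESS;
    for (size_t next = 0; next < n; next++) {
        if (!jumps[c][next])
            continue;
        if (state[next] == IN_PROGRESS)
            return false;          /* jumped back onto the current path: a loop */
        if (state[next] == FINISHED)
            continue;              /* already fully checked for this hook: skip */
        if (!visit(n, jumps, state, next))
            return false;
    }
    state[c] = FINISHED;
    return true;
}

/* Check one hook: state is reset per hook, so each chain is expanded at most once per hook. */
static bool hook_is_loop_free(size_t n, bool jumps[][MAX_CHAINS], size_t entry)
{
    enum chain_state state[MAX_CHAINS] = { UNVISITED };

    assert(n <= MAX_CHAINS && entry < n);
    return visit(n, jumps, state, entry);
}
```

Without the FINISHED check, a chain reachable along many paths would be re-walked once per path, which is the kind of worst case the commit addresses.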
Al Viro [Fri, 1 Dec 2006 03:24:12 +0000 (19:24 -0800)]
[EBTABLES]: Verify that ebt_entries have zero ->distinguisher.

We need that for the iterator to work; the existing check had been too weak.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 1 Dec 2006 03:22:42 +0000 (19:22 -0800)]
[EBTABLES]: Fix wraparounds in ebt_entries verification.

We need to verify that
a) we are not too close to the end of the buffer to dereference
b) the next entry we'll be checking won't be _before_ ours

While we are at it, don't subtract unrelated pointers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
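The two verification rules listed above, together with the zero-distinguisher requirement from the commit listed just above this one, can be sketched as a walk over the table blob that compares only offsets and never subtracts unrelated pointers. The layouts `struct chain_hdr` and `struct rule`, the flag `RULE_FLAG` and the function `verify_blob()` are hypothetical simplifications, not the real ebtables structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts (not the real ebtables structs).
 * Both start with a 32-bit word: 0 marks a chain header, otherwise bit 0 must be set. */
struct chain_hdr { uint32_t distinguisher; uint32_t nentries; };
struct rule      { uint32_t bitmask; uint32_t next_offset; };

#define RULE_FLAG 0x01u

/* Walk a blob of `size` bytes; return true only if every element is in bounds
 * and each step moves strictly forward. Works purely on offsets. */
static bool verify_blob(const unsigned char *blob, size_t size)
{
    size_t off = 0;

    while (off < size) {
        uint32_t first;

        if (size - off < sizeof(first))
            return false;              /* (a) too close to the end to read the tag */
        memcpy(&first, blob + off, sizeof(first));
        if (first == 0) {              /* chain header: distinguisher must be exactly 0 */
            if (size - off < sizeof(struct chain_hdr))
                return false;
            off += sizeof(struct chain_hdr);
        } else if (first & RULE_FLAG) {
            struct rule r;

            if (size - off < sizeof(r))
                return false;          /* (a) again, for the full rule header */
            memcpy(&r, blob + off, sizeof(r));
            /* (b) the next element must lie after this one and inside the blob */
            if (r.next_offset < sizeof(r) || r.next_offset > size - off)
                return false;
            off += r.next_offset;
        } else {
            return false;              /* neither a valid header nor a valid rule */
        }
    }
    return true;
}
```

Because `off < size` holds inside the loop, `size - off` cannot wrap, and comparing `next_offset` against the remaining length avoids forming an out-of-range pointer first.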
Andrew Morton [Fri, 1 Dec 2006 03:16:28 +0000 (19:16 -0800)]
[TCP]: Fix warnings with TCP_MD5SIG disabled.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adrian Bunk [Fri, 1 Dec 2006 01:22:29 +0000 (17:22 -0800)]
[NET]: Possible cleanups.

This patch contains the following possible cleanups:
- make the following needlessly global functions static:
  - ipv4/tcp.c: __tcp_alloc_md5sig_pool()
  - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup()
  - ipv4/udplite.c: udplite_rcv()
  - ipv4/udplite.c: udplite_err()
- make the following needlessly global structs static:
  - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops
  - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific
  - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops
- net/ipv{4,6}/udplite.c: remove inline's from static functions
                          (gcc should know best when to inline them)

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miika Komu [Fri, 1 Dec 2006 00:41:50 +0000 (16:41 -0800)]
[IPSEC]: Add AF_KEY interface for encapsulation family.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Miika Komu [Fri, 1 Dec 2006 00:40:51 +0000 (16:40 -0800)]
[IPSEC]: Add netlink interface for the encapsulation family.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miika Komu [Fri, 1 Dec 2006 00:40:43 +0000 (16:40 -0800)]
[IPSEC]: Add encapsulation family.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Dec 2006 00:35:01 +0000 (16:35 -0800)]
[TCP] MD5SIG: Kill CONFIG_TCP_MD5SIG_DEBUG.

It just obfuscates the code and adds limited value.  And as Adrian
Bunk noticed, it lacked Kconfig help text too, so just kill it.

Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:37:42 +0000 (17:37 -0800)]
[NET_SCHED]: Fix endless loops (part 5): netem/tbf/hfsc ->requeue failures

When peeking at the next packet in a child qdisc by calling dequeue/requeue,
the upper qdisc qlen counter may get out of sync in case the requeue fails.
The qdisc and the child qdisc both have their counter decremented, but since
no packet is given to the upper qdisc it won't decrement its counter itself.

requeue should not fail, so this is mostly for "correctness".

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:37:05 +0000 (17:37 -0800)]
[NET_SCHED]: Fix endless loops (part 4): HTB

Convert HTB to use qdisc_tree_decrease_qlen() and add a callback
for deactivating a class when its child queue becomes empty.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:36:43 +0000 (17:36 -0800)]
[NET_SCHED]: Fix endless loops (part 3): HFSC

Convert HFSC to use qdisc_tree_decrease_qlen() and add a callback
for deactivating a class when its child queue becomes empty.

All queue purging goes through hfsc_purge_queue(), which is used in
three cases: grafting, class creation (when a leaf class is turned
into an intermediate class by attaching a new class) and class
deletion. In all cases qdisc_tree_decrease_qlen() is needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:36:20 +0000 (17:36 -0800)]
[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs

Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where
necessary:

- all graft operations
- destruction of old child qdiscs in prio, red and tbf change operation
- purging of queue in sfq change operation

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:35:48 +0000 (17:35 -0800)]
[NET_SCHED]: Fix endless loops caused by inaccurate qlen counters (part 1)

There are multiple problems related to qlen adjustment that can lead
to an upper qdisc getting out of sync with the real number of packets
queued, leading to endless dequeueing attempts by the upper layer code.

All qdiscs must maintain an accurate q.qlen counter. There are basically
two groups of operations affecting the qlen: operations that propagate
down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the
root qdisc and operations only affecting a subtree or single qdisc
(change, graft, delete class). Since qlen changes during operations from
the second group don't propagate to ancestor qdiscs, their qlen values
become desynchronized.

This patch adds a function to propagate qlen changes up the qdisc tree,
optionally calling a callback function to perform qdisc-internal
maintenance when the child qdisc becomes empty. The follow-up patches
will convert all qdiscs to use this function where necessary.

Noticed by Timo Steinbach <tsteinbach@astaro.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
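A minimal sketch of the upward propagation described above, assuming a hypothetical qdisc model in which every node holds a direct `parent` pointer (the real code looks ancestors up by parent classid, which is what the "Set parent classid" commit below prepares). `tree_decrease_qlen()`, `child_empty` and `count_deactivation()` are illustrative names; the callback stands in for the qdisc-internal maintenance hook and is invoked here only once the child is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal qdisc model: each node points directly at its parent. */
struct qdisc {
    unsigned int qlen;
    struct qdisc *parent;
    /* optional: invoked on an ancestor when its child has just become empty */
    void (*child_empty)(struct qdisc *self, struct qdisc *child);
    unsigned int deactivations;   /* bookkeeping for the example callback */
};

/* Propagate the removal of n packets from q's subtree to all ancestors of q.
 * q itself is assumed to have adjusted its own qlen already. */
static void tree_decrease_qlen(struct qdisc *q, unsigned int n)
{
    struct qdisc *child = q;

    for (struct qdisc *p = q->parent; p != NULL; child = p, p = p->parent) {
        if (p->child_empty && child->qlen == 0)
            p->child_empty(p, child);   /* e.g. deactivate the class */
        assert(p->qlen >= n);
        p->qlen -= n;
    }
}

/* Example callback: just count how often a class would be deactivated. */
static void count_deactivation(struct qdisc *self, struct qdisc *child)
{
    (void)child;
    self->deactivations++;
}
```

For instance, a graft that drops the 3 packets queued in a leaf sets the leaf's qlen to 0 and then calls `tree_decrease_qlen(leaf, 3)`, so every ancestor's counter stays equal to the packets really queued below it.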
Patrick McHardy [Thu, 30 Nov 2006 01:35:18 +0000 (17:35 -0800)]
[NET_SCHED]: Set parent classid in default qdiscs

Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 30 Nov 2006 01:34:50 +0000 (17:34 -0800)]
[NET_SCHED]: sch_htb: perform qlen adjustment immediately in ->delete

qlen adjustment should happen immediately in ->delete and not in the
class destroy function because the reference count will not hit zero in
->delete (sch_api holds a reference) but in ->put. Since the qdisc
lock is released between deletion of the class and final destruction
this creates an externally visible error in the qlen counter.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Morris [Wed, 29 Nov 2006 21:50:27 +0000 (16:50 -0500)]
Rename class_destroy to avoid namespace conflicts.

We're seeing increasing namespace conflicts between the global
class_destroy() function declared in linux/device.h, and the private
function in the SELinux core code.  This patch renames the SELinux
function to cls_destroy() to avoid this conflict.

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Paul Moore [Wed, 29 Nov 2006 18:18:20 +0000 (13:18 -0500)]
NetLabel: add the ranged tag to the CIPSOv4 protocol

Add support for the ranged tag (tag type #5) to the CIPSOv4 protocol.

The ranged tag allows seven category ranges (or eight, if zero is the lowest
category) to be specified in a CIPSO option.  Each range is specified by two
unsigned 16-bit fields, each with a maximum value of 65534.  The two values
specify the start and end of the category range; if the start of the category
range is zero, it is omitted.

See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Paul Moore [Wed, 29 Nov 2006 18:18:19 +0000 (13:18 -0500)]
NetLabel: add the enumerated tag to the CIPSOv4 protocol

Add support for the enumerated tag (tag type #2) to the CIPSOv4 protocol.

The enumerated tag allows up to 15 categories to be specified in a CIPSO option,
where each category is an unsigned 16-bit field with a maximum value of 65534.

See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
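The limits quoted in the two NetLabel tag commits above (15 enumerated categories; seven, or eight, ranges) and the 240-bit limit of the restricted bitmap tag mentioned in the bitmap commit below all follow from the same byte budget. The sketch redoes that arithmetic; the header sizes are assumptions stated in the comments (40 bytes of IPv4 options, a 6-byte CIPSO option header, a 4-byte tag header), not values taken from these commits, and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed sizes: IPv4 options are limited to 40 bytes, the CIPSO option
 * header (type, length, 4-byte DOI) takes 6, and the restricted bitmap,
 * enumerated and ranged tags each start with a 4-byte header (type,
 * length, alignment octet, sensitivity level). */
enum {
    IPV4_MAX_OPT_LEN  = 40,
    CIPSO_HDR_LEN     = 6,
    CIPSO_TAG_HDR_LEN = 4,
    CAT_FIELD_LEN     = 2,   /* one unsigned 16-bit category value */
};

/* Bytes left for category data after all headers: 30. */
static int tag_payload_len(void)
{
    return IPV4_MAX_OPT_LEN - CIPSO_HDR_LEN - CIPSO_TAG_HDR_LEN;
}

/* Tag type 1 (restricted bitmap): one bit per category. */
static int rbm_max_categories(void) { return tag_payload_len() * 8; }

/* Tag type 2 (enumerated): one 16-bit field per category. */
static int enum_max_categories(void) { return tag_payload_len() / CAT_FIELD_LEN; }

/* Tag type 5 (ranged): two 16-bit fields per range, but a range starting at
 * zero omits its start, so the spare 16-bit field buys one more range. */
static int rng_max_ranges(int lowest_is_zero)
{
    int full  = tag_payload_len() / (2 * CAT_FIELD_LEN);
    int spare = tag_payload_len() - full * 2 * CAT_FIELD_LEN;

    return full + ((lowest_is_zero && spare >= CAT_FIELD_LEN) ? 1 : 0);
}
```

Under these assumptions the results are 240 bits, 15 categories, and 7 or 8 ranges, matching the numbers in the commit messages.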
Paul Moore [Wed, 29 Nov 2006 18:18:18 +0000 (13:18 -0500)]
NetLabel: convert to an extensible/sparse category bitmap

The original NetLabel category bitmap was a straight char bitmap which worked
fine for the initial release as it only supported 240 bits due to limitations
in the CIPSO restricted bitmap tag (tag type 0x01).  This patch converts that
straight char bitmap into an extensible/sparse bitmap in order to lay the
foundation for other CIPSO tag types and protocols.

This patch also has a nice side effect in that all of the security attributes
passed by NetLabel into the LSM are now in a format which is in the host's
native byte/bit ordering which makes the LSM specific code much simpler; look
at the changes in security/selinux/ss/ebitmap.c as an example.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
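One way to picture an extensible/sparse category bitmap is a sorted list of fixed-size chunks that are allocated only where bits are set, with each chunk stored as host-native words. This is a hypothetical sketch (`struct catmap`, `CHUNK_BITS` and the `catmap_*` functions are invented names), not the NetLabel data structure itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse bitmap: a sorted singly linked list of fixed-size
 * chunks, each covering CHUNK_BITS consecutive categories. */
#define CHUNK_BITS 256u
#define WORD_BITS  64u

struct catmap {
    uint32_t startbit;                       /* multiple of CHUNK_BITS */
    uint64_t words[CHUNK_BITS / WORD_BITS];  /* host-native bit order */
    struct catmap *next;
};

/* Set bit `cat`, allocating a chunk in sorted position if needed.
 * Returns false on allocation failure. */
static bool catmap_set(struct catmap **head, uint32_t cat)
{
    uint32_t start = cat - cat % CHUNK_BITS;
    struct catmap **link = head;

    while (*link && (*link)->startbit < start)
        link = &(*link)->next;
    if (!*link || (*link)->startbit != start) {
        struct catmap *node = calloc(1, sizeof(*node));

        if (!node)
            return false;
        node->startbit = start;
        node->next = *link;
        *link = node;
    }
    uint32_t off = cat - start;
    (*link)->words[off / WORD_BITS] |= UINT64_C(1) << (off % WORD_BITS);
    return true;
}

/* Test bit `cat` (16-bit category values, so startbit + CHUNK_BITS cannot wrap). */
static bool catmap_test(const struct catmap *map, uint32_t cat)
{
    for (; map; map = map->next) {
        if (cat < map->startbit)
            return false;                    /* sorted list: no later chunk can match */
        if (cat < map->startbit + CHUNK_BITS) {
            uint32_t off = cat - map->startbit;
            return (map->words[off / WORD_BITS] >> (off % WORD_BITS)) & 1u;
        }
    }
    return false;
}

static void catmap_free(struct catmap *map)
{
    while (map) {
        struct catmap *next = map->next;
        free(map);
        map = next;
    }
}
```

Setting categories 3, 300 and 65534 allocates only three small chunks instead of an 8 KiB flat bitmap.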
Pablo Neira Ayuso [Wed, 29 Nov 2006 01:35:43 +0000 (02:35 +0100)]
[NETFILTER]: remove the reference to ipchains from Kconfig

It is time to move on :-)

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:42 +0000 (02:35 +0100)]
[NETFILTER]: Fix PROC_FS=n warnings

Fix some unused function/variable warnings.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:41 +0000 (02:35 +0100)]
[NETFILTER]: remove remaining ASSERT_{READ,WRITE}_LOCK

Signed-off-by: Patrick McHardy <kaber@trash.net>
Bart De Schuymer [Wed, 29 Nov 2006 01:35:40 +0000 (02:35 +0100)]
[NETFILTER]: ebtables: add --snat-arp option

The attached patch adds --snat-arp support, which makes it possible to
change the source mac address in both the mac header and the arp header
with one rule.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:38 +0000 (02:35 +0100)]
[NETFILTER]: x_tables: add NFLOG target

Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6.
Currently we have two (unsupported by userspace) hacks in the LOG and ULOG
targets that optionally call into the nflog API. They lack a few features:
the IPv4 and IPv6 LOG targets cannot specify a number of arguments
related to nfnetlink_log, while the ULOG target is only available for IPv4.
Remove those hacks and add a clean way to use nfnetlink_log.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:36 +0000 (02:35 +0100)]
[NETFILTER]: x_tables: add port of hashlimit match for IPv4 and IPv6

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:34 +0000 (02:35 +0100)]
[NETFILTER]: nfnetlink_log: remove useless prefix length limitation

There is no reason for limiting netlink attributes in size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Eric Leblond [Wed, 29 Nov 2006 01:35:33 +0000 (02:35 +0100)]
[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Pablo Neira Ayuso [Wed, 29 Nov 2006 01:35:32 +0000 (02:35 +0100)]
[NETFILTER]: ctnetlink: rework conntrack fields dumping logic on events

               |   NEW   | UPDATE  | DESTROY |
     ----------------------------------------|
     tuples    |    Y    |    Y    |    Y    |
     status    |    Y    |    Y    |    N    |
     timeout   |    Y    |    Y    |    N    |
     protoinfo |    S    |    S    |    N    |
     helper    |    S    |    S    |    N    |
     mark      |    S    |    S    |    N    |
     counters  |    F    |    F    |    Y    |

 Legend:
         Y: yes
         N: no
         S: iff the field is set
         F: iff overflow

This patch also replaces IPCT_HELPINFO by IPCT_HELPER since we want to
track the helper assignment process, not the changes in the private
information held by the helper.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Pablo Neira Ayuso [Wed, 29 Nov 2006 01:35:31 +0000 (02:35 +0100)]
[NETFILTER]: ctnetlink: check for status attribute existence on conntrack creation

Check that status flags are available in the netlink message received
to create a new conntrack.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:30 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: better NAT handling

The NAT handling of the SIP helper has a few problems:

- Request headers are only mangled in the reply direction, From/To headers
  not at all, which can lead to authentication failures with DNAT in case
  the authentication domain is the IP address

- Contact headers in responses are only mangled for REGISTER responses

- Headers may be mangled even though they contain addresses not
  participating in the connection, like alternative addresses

- Packets are dropped when domain names are used where the helper expects
  IP addresses

This patch takes a different approach: instead of fixed rules about which
field to mangle and with what content, it adds symmetric mapping of
From/To/Via/Contact headers, which allows proper handling of echoed
addresses in responses and of foreign addresses not belonging to the
connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:28 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: make header shortcuts optional

Not every header has a shortcut, so make them optional instead
of searching for the same string twice.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:27 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: do case insensitive SIP header search

SIP headers are generally case-insensitive, only SDP headers are
case sensitive.

Signed-off-by: Patrick McHardy <kaber@trash.net>
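The case-insensitive search and the optional shortcuts described in the two commits above can be sketched as a header-name matcher. The descriptor `struct sip_hdr_desc` and the functions below are hypothetical; the compact forms used in the usage example ("f" for From) are the ones defined by RFC 3261, and the matcher accepts optional whitespace before the colon, as SIP allows.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header descriptor: full name plus an optional compact form
 * (NULL for headers that have no shortcut, so no second search is done). */
struct sip_hdr_desc {
    const char *name;
    const char *compact;
};

/* ASCII case-insensitive comparison of the first n bytes. */
static bool ascii_ieq(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

/* True if `line` (len bytes) starts with header name `name`, followed by
 * optional spaces/tabs and a colon. */
static bool name_matches(const char *line, size_t len, const char *name)
{
    size_t n = strlen(name);

    if (len <= n || !ascii_ieq(line, name, n))
        return false;
    for (size_t i = n; i < len; i++) {
        if (line[i] == ':')
            return true;
        if (line[i] != ' ' && line[i] != '\t')
            return false;
    }
    return false;
}

/* Case-insensitive match of a header line against one descriptor. */
static bool sip_hdr_matches(const char *line, size_t len, const struct sip_hdr_desc *d)
{
    assert(d != NULL && d->name != NULL);
    return name_matches(line, len, d->name) ||
           (d->compact != NULL && name_matches(line, len, d->compact));
}
```

The explicit terminator check keeps a short form such as "f" from matching the start of an unrelated header like "Foo:".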
Patrick McHardy [Wed, 29 Nov 2006 01:35:26 +0000 (02:35 +0100)]
[NETFILTER]: sip conntrack: minor cleanup

- Use enum for header field enumeration
- Use numerical value instead of pointer to header info structure to
  identify headers, unexport ct_sip_hdrs
- group SIP and SDP entries in header info structure
- remove double forward declaration of ct_sip_get_info

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:25 +0000 (02:35 +0100)]
[NETFILTER]: ip_conntrack: fix NAT helper unload races

The NAT helper hooks are protected by RCU, but all of the
conntrack helpers test and use the global pointers instead
of copying them first using rcu_dereference().

Also replace synchronize_net() by synchronize_rcu() for clarity,
since synchronizing only with packet receive processing is
insufficient to prevent races.

Signed-off-by: Patrick McHardy <kaber@trash.net>
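The reader-side pattern this fix is about can be shown outside the kernel. This sketch swaps RCU for a C11 atomic load, which is only an analogy: the hypothetical global `nat_hook` is read exactly once into a local, and only the copy is tested and called. In the kernel, rcu_dereference() does that read, and the unload path must additionally wait for readers with synchronize_rcu() before the helper goes away; that side is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*nat_hook_fn)(int pkt);

/* Global hook that a module may set or clear at any time (hypothetical). */
static _Atomic(nat_hook_fn) nat_hook = NULL;

/* Racy pattern (do not use): the global is read twice, so it can become
 * NULL between the test and the call if the helper is unloaded:
 *
 *     if (nat_hook) return nat_hook(pkt);
 */

/* Fixed pattern: read the pointer once, then test and use only the copy. */
static int run_hook(int pkt)
{
    nat_hook_fn hook = atomic_load_explicit(&nat_hook, memory_order_acquire);

    if (hook != NULL)
        return hook(pkt);
    return pkt;            /* no NAT helper loaded: pass the packet unchanged */
}

/* Stand-in for a loaded helper. */
static int add_one(int pkt) { return pkt + 1; }
```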
Yasuyuki Kozakai [Wed, 29 Nov 2006 01:35:23 +0000 (02:35 +0100)]
[NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_find

We usually use 'xxx_find_get' for functions which increment the
reference count.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:22 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: /proc compatibility with old connection tracking

This patch adds /proc/net/ip_conntrack, /proc/net/ip_conntrack_expect and
/proc/net/stat/ip_conntrack files to keep old programs using them working.

The /proc/net/ip_conntrack and /proc/net/ip_conntrack_expect files show only
IPv4 entries, the /proc/net/stat/ip_conntrack shows global statistics.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:20 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: sysctl compatibility with old connection tracking

This patch adds an option to keep the connection tracking sysctls visible
under their old names.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:18 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modules

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:17 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: automatic sysctl registration for conntrack protocols

Add helper functions for sysctl registration with optional instantiation
of common path elements (like net/netfilter), and use them to support
automatic registration of conntrack protocol sysctls.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:15 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: move extern declaration to header files

Using extern in a C file is a bad idea because the compiler can't
catch type errors.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Wed, 29 Nov 2006 01:35:14 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack_ftp: fix missing helper mask initialization

Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:12 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: reduce timer updates in __nf_ct_refresh_acct()

Only update the conntrack timer if there have been at least HZ jiffies
since the last update. This reduces the number of del_timer/add_timer
cycles from one per packet to one per connection per second (plus one
for each state change of a connection).

Should handle timer wraparounds and connection timeout changes.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
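The throttling rule can be sketched with plain unsigned arithmetic. `struct fake_timer` and `refresh_timeout()` are hypothetical, re-arming is reduced to updating `expires`, and `HZ` is fixed at an assumed 100 ticks per second. The unsigned subtraction is what keeps the check correct across jiffies wraparound, and a shortened timeout produces a wrapped, very large difference, so it is applied immediately.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 100UL   /* assumed tick rate for the example */

struct fake_timer {
    unsigned long expires;
    unsigned int rearms;
};

/* Re-arm only when the new expiry is at least HZ ticks later than the
 * current one; returns true if the timer was updated. */
static bool refresh_timeout(struct fake_timer *t, unsigned long now,
                            unsigned long extra)
{
    unsigned long newtime = now + extra;

    if (newtime - t->expires >= HZ) {
        t->expires = newtime;   /* stands in for del_timer() + add_timer() */
        t->rearms++;
        return true;
    }
    return false;
}
```

A busy connection refreshed on every packet therefore touches its timer at most about once per second, while a switch to a shorter timeout is never delayed.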
Martin Josefsson [Wed, 29 Nov 2006 01:35:11 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: remove unused struct list_head from protocols

Remove unused struct list_head from struct nf_conntrack_l3proto and
nf_conntrack_l4proto as all protocols are kept in arrays, not linked
lists.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:10 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: minor __nf_ct_refresh_acct() whitespace cleanup

Minor whitespace cleanup.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:09 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: remove ASSERT_{READ,WRITE}_LOCK

Remove the usage of ASSERT_READ_LOCK/ASSERT_WRITE_LOCK in nf_conntrack:
it didn't do anything (it was just an empty define) and it uglified the code.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:08 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: more sanity checks in protocol registration/unregistration

Add some more sanity checks when registering/unregistering l3/l4 protocols.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:06 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol

Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in
order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets
rather confusing with 'nf_conntrack_protocol'.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:04 +0000 (02:35 +0100)]
[NETFILTER]: More __read_mostly annotations

Place rarely written variables in the read-mostly section by using
__read_mostly

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:03 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: split out protocol handling

This patch splits out L3/L4 protocol handling into its own file
nf_conntrack_proto.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:35:01 +0000 (02:35 +0100)]
[NETFILTER]: nf_conntrack: split out the event cache

This patch splits out the event cache into its own file
nf_conntrack_ecache.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:34:59 +0000 (02:34 +0100)]
[NETFILTER]: nf_conntrack: split out helper handling

This patch splits out handling of helpers into its own file
nf_conntrack_helper.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Martin Josefsson [Wed, 29 Nov 2006 01:34:58 +0000 (02:34 +0100)]
[NETFILTER]: nf_conntrack: split out expectation handling

This patch splits out expectation handling into its own file
nf_conntrack_expect.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
David S. Miller [Tue, 28 Nov 2006 22:37:38 +0000 (14:37 -0800)]
[TCP] Vegas: Increase default alpha to 2 and beta to 4.

This helps Vegas cope better with delayed ACKs, see
analysis at:

http://www.cs.caltech.edu/%7Eweixl/technical/ns2linux/known_linux/index.html#vegas

Signed-off-by: David S. Miller <davem@davemloft.net>
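For context, a textbook formulation of the Vegas adjustment shows what alpha and beta control: the estimated number of packets the flow keeps queued in the network, `cwnd * (rtt - base_rtt) / rtt`, is kept between the two thresholds once per RTT. This is a sketch, not the tcp_vegas.c code, whose arithmetic differs in detail; the function name and the microsecond units are assumptions.

```c
#include <assert.h>

static const unsigned int vegas_alpha = 2;  /* default after this change */
static const unsigned int vegas_beta  = 4;  /* default after this change */

/* One Vegas step: returns the new congestion window (in packets). */
static unsigned int vegas_update(unsigned int cwnd, unsigned int base_rtt_us,
                                 unsigned int rtt_us)
{
    assert(rtt_us > 0 && rtt_us >= base_rtt_us);

    /* diff = (expected - actual) * base_rtt = cwnd * (rtt - base_rtt) / rtt,
     * i.e. roughly how many of our packets sit in queues. Since diff < cwnd,
     * the decrease branch only runs when cwnd > beta. */
    unsigned long long diff =
        (unsigned long long)cwnd * (rtt_us - base_rtt_us) / rtt_us;

    if (diff < vegas_alpha)
        return cwnd + 1;        /* too little queued: grow linearly */
    if (diff > vegas_beta)
        return cwnd - 1;        /* too much queued: back off linearly */
    return cwnd;                /* within [alpha, beta]: hold */
}
```

A wider [alpha, beta] band leaves more room for RTT inflation (such as that caused by delayed ACKs) before the window is reduced.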
Gerrit Renker [Tue, 28 Nov 2006 21:55:06 +0000 (19:55 -0200)]
[DCCP]: Use `unsigned' for packet lengths

This patch implements a suggestion by Ian McDonald and

 1) Avoids tests against negative packet lengths by using unsigned int
    for packet payload lengths in the CCID send_packet()/packet_sent() routines

 2) As a consequence, it removes a now-unnecessary test with regard to `len > 0'
    in ccid3_hc_tx_packet_sent: that condition is always true, since
      * negative packet lengths are avoided
      * ccid3_hc_tx_send_packet flags an error whenever the payload length is 0.
        Hence ccid3_hc_tx_packet_sent is never called for a zero-length payload,
        as all errors returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit

 3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter),
    since it is currently always set to skb->len. The code is updated with regard
    to this parameter change.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Gerrit Renker [Tue, 28 Nov 2006 21:51:42 +0000 (19:51 -0200)]
[DCCP] ccid3: Larger initial windows

This implements the larger-initial-windows feature for CCID 3, as described in
section 5 of RFC 4342. When the first feedback packet arrives, the sender can
send up to 2..4 packets per RTT, instead of just one.

The patch further
 * reduces the number of timestamping calls by passing the timestamp value
   (which is computed in one of the calling functions anyway) as argument

 * renames one constant with a very long name into one which is shorter and
   resembles the one in RFC 3448 (t_mbi)

 * simplifies some of the min_t/max_t cases where both `x', `y' have the same
   type

Committer note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
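The larger initial window can be illustrated with the formula RFC 4342 adopts from RFC 3390, W_init = min(4*s, max(2*s, 4380)) bytes, which is where the "2..4 packets per RTT" above comes from: small packets get 4 per RTT, packets of 2190 bytes or more get 2. This is a sketch, not the kernel's code; the function names are hypothetical, and expressing the rate in bytes per second with R in microseconds is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* W_init = min(4*s, max(2*s, 4380)) bytes, s = packet size in bytes. */
static uint32_t initial_window_bytes(uint32_t s)
{
	uint32_t two_s = 2 * s, four_s = 4 * s;
	uint32_t lower = two_s > 4380 ? two_s : 4380;

	return four_s < lower ? four_s : lower;
}

/* Initial rate X = W_init / R in bytes per second, R in microseconds
 * (taken from the first feedback packet). */
static uint64_t initial_rate_bps(uint32_t s, uint32_t rtt_us)
{
	assert(rtt_us > 0);
	return (uint64_t)initial_window_bytes(s) * 1000000u / rtt_us;
}
```

For example, with s = 1460 bytes the window is 4380 bytes (three packets), and with R = 100 ms the initial rate is 43800 bytes/s.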
17 years ago[DCCP]: Make {set,get}sockopt(DCCP_SOCKOPT_PACKET_SIZE) return 0
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 21:42:03 +0000 (19:42 -0200)]
[DCCP]: Make {set,get}sockopt(DCCP_SOCKOPT_PACKET_SIZE) return 0

To reflect the fact that this option now has no effect: applications do not
stop working, they just get a warning in the system log.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP]: Tidy up unused structures
Gerrit Renker [Tue, 28 Nov 2006 21:33:36 +0000 (19:33 -0200)]
[DCCP]: Tidy up unused structures

This removes and cleans up unused variables and structures which have become
unnecessary following the introduction of the EWMA patch to automatically track
the CCID 3 receiver/sender packet sizes `s'.

It deprecates the PACKET_SIZE socket option by returning an error code and
printing a deprecation warning if an application tries to read or write this
socket option.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
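The messages name an EWMA for the packet size `s' but not its parameters. A minimal sketch of such a moving average, assuming the weight is expressed in tenths and the average is seeded with the first sample; the function name and the weight of 9 used in the test are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* avg' = (weight * avg + (10 - weight) * sample) / 10,
 * seeded with the first sample while avg is still 0. */
static uint32_t ewma(uint32_t avg, uint32_t sample, uint8_t weight)
{
	assert(weight <= 10);
	return avg ? (weight * avg + (10 - weight) * sample) / 10 : sample;
}
```

A high weight keeps `s' stable when an occasional short packet is sent, while still following a lasting change in packet size.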
17 years ago[DCCP] ccid3: Track RX/TX packet size `s' using moving-average
Gerrit Renker [Tue, 28 Nov 2006 21:22:33 +0000 (19:22 -0200)]
[DCCP] ccid3: Track RX/TX packet size `s' using moving-average

Problem:

17 years ago[DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448
Gerrit Renker [Tue, 28 Nov 2006 20:34:34 +0000 (18:34 -0200)]
[DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448

This corrects the setting of the nofeedback timer with regard to RFC
3448 - previously it was not set to max(4*R, 2*s/X) as specified. Using
a maximum of 1 second as the upper bound (as was done before) can have
detrimental effects, especially if R is small.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
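The corrected timeout max(4*R, 2*s/X) can be computed as in the sketch below. The function name and the units (R in microseconds, s in bytes, X in bytes per second) are assumptions for illustration. With R = 1 ms, s = 1460 bytes and X = 1460000 bytes/s the result is 4 ms, far below 1 second, which shows why a fixed 1-second bound fits poorly when R is small.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ull

/* RFC 3448 nofeedback timeout in microseconds: max(4*R, 2*s/X). */
static uint64_t nofeedback_timeout_us(uint64_t r_us, uint32_t s, uint64_t x_bps)
{
	assert(x_bps > 0);
	uint64_t four_r = 4 * r_us;
	uint64_t two_s_over_x = 2 * (uint64_t)s * USEC_PER_SEC / x_bps;

	return four_r > two_s_over_x ? four_r : two_s_over_x;
}
```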
17 years ago[DCCP]: Remove allocation of sysctl numbers
Gerrit Renker [Tue, 28 Nov 2006 20:14:10 +0000 (18:14 -0200)]
[DCCP]: Remove allocation of sysctl numbers

This is in response to a request sent earlier by Eric W. Biederman
and replaces all sysctl numbers for net.dccp.default with CTL_UNNUMBERED.

It has been tested to compile and to work.

Committer note: I've removed the use of CTL_UNNUMBERED: not setting .ctl_name
                sets it to 0, which is what CTL_UNNUMBERED is; the reason is
                to avoid needless source code clutter.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
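The committer note relies on standard C semantics: members left out of a designated initializer are zero-initialized. A sketch with a hypothetical, cut-down table entry type (not the kernel's struct ctl_table); the "seq_window" entry name is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysctl table entry. */
struct ctl_entry_sketch {
	int ctl_name;            /* binary sysctl number; 0 means unnumbered */
	const char *procname;
	int maxlen;
};

/* .ctl_name is omitted, so C zero-initializes it. */
static const struct ctl_entry_sketch table[] = {
	{ .procname = "seq_window", .maxlen = sizeof(int) },
	{ 0 }                    /* all-zero terminator */
};

/* Return 1 if every entry before the terminator is unnumbered. */
static int all_unnumbered(const struct ctl_entry_sketch *t)
{
	assert(t != NULL);
	for (; t->procname != NULL; t++)
		if (t->ctl_name != 0)
			return 0;
	return 1;
}
```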
17 years ago[INET]: Change protocol field in struct inet_protosw to u16
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 05:11:33 +0000 (03:11 -0200)]
[INET]: Change protocol field in struct inet_protosw to u16

[acme@newtoy net-2.6.20]$ pahole /tmp/tcp_ipv6.o inet_protosw
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/protocol.h:69 */
struct inet_protosw {
        struct list_head           list;                 /*     0     8 */
        short unsigned int         type;                 /*     8     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        protocol;             /*    12     4 */
        struct proto *             prot;                 /*    16     4 */
        const struct proto_ops  *  ops;                  /*    20     4 */
        int                        capability;           /*    24     4 */
        char                       no_check;             /*    28     1 */
        unsigned char              flags;                /*    29     1 */
}; /* size: 32, sum members: 28, holes: 1, sum holes: 2, padding: 2 */

Turning protocol into a u16 kills that hole; protocol can only go all the way up to 255 (RAW).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
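The effect of narrowing `protocol' can be reproduced with cut-down layouts. This sketch leaves out the pointer members of the real struct so the result does not depend on pointer size, and assumes an ABI where int is 4-byte aligned (true for common 32- and 64-bit targets); the struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct protosw_before {
	uint16_t      type;
	/* 2-byte hole: int is 4-byte aligned */
	int           protocol;
	int           capability;
	char          no_check;
	unsigned char flags;
};

struct protosw_after {
	uint16_t      type;
	uint16_t      protocol;     /* values up to 255 fit; fills the old hole */
	int           capability;
	char          no_check;
	unsigned char flags;
};

static_assert(offsetof(struct protosw_after, protocol) == 2,
	      "protocol now sits right after type");
static_assert(sizeof(struct protosw_after) < sizeof(struct protosw_before),
	      "narrowing protocol shrinks the struct");
```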
17 years ago[TCP]: Remove the __ prefix on the struct tcp_sock members
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 03:12:38 +0000 (01:12 -0200)]
[TCP]: Remove the __ prefix on the struct tcp_sock members

This struct is not visible to userland at all.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[TCP]: Change tcp_header_len member in tcp_sock to u16
Arnaldo Carvalho de Melo [Tue, 28 Nov 2006 02:48:32 +0000 (00:48 -0200)]
[TCP]: Change tcp_header_len member in tcp_sock to u16

With this we eliminate the last hole in struct tcp_sock.

End result:

[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
  struct tcp_sock |   -4
    tcp_header_len;
     from: int                   /*  1000(0)     4(0) */
     to:   u16                   /*  1000(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

Now sizeof(tcp_sock) is just...

[acme@newtoy net-2.6.20]$ pahole --sizes ../OUTPUT/qemu/net-2.6.20/net/ipv4/tcp.o | grep -w tcp_sock
struct tcp_sock: 1500 0

1500 bytes ;-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ccid3: Consolidate handling of t_RTO
Gerrit Renker [Mon, 27 Nov 2006 22:32:37 +0000 (20:32 -0200)]
[DCCP] ccid3: Consolidate handling of t_RTO

This patch
 * removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is
   undefined until feedback has been received);

 * makes some trivial changes (updates of comments);

 * performs a small optimisation by exploiting that the feedback timeout
   uses the value of t_ipi. The way it is done is safe, because the timeouts
   appear after the changes to t_ipi, ensuring that up-to-date values are used;

 * in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation
   of the next_tmout. This makes the code clearer to read and is also safe, since
   t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not
   read by the functions called via ccid_wait_for_ccid();

 * removes a `max' statement in sk_reset_timer; it is not needed since the timeout
   value is always greater than 1E6 microseconds.

 * adds `XXX'es to highlight that currently the nofeedback timer is set
   in a non-standard way

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta
Gerrit Renker [Mon, 27 Nov 2006 22:31:33 +0000 (20:31 -0200)]
[DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta

This patch:

 * consolidates updating of parameters (t_nom, t_ipi, t_delta) which
   need to be updated at the same time, since they are inter-dependent

 * removes two inline functions which are no longer needed as a result of
   the above consolidation

 * resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback
   timer, in the state where no feedback has previously been received

 * ties updating these parameters to updating the sending rate X, exploiting
   that all three parameters in turn depend on X; and using a small optimisation
   which can reduce the number of required instructions: only update the three
   parameters when X really changes

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
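The "only update when X really changes" idea can be sketched as follows, using the RFC 3448, 4.6 relations t_ipi = s/X and t_delta = min(t_ipi/2, t_gran/2). The struct, the function name and the half-granularity of 500 us (a 1 ms timer tick) are assumptions; the nominal send time t_nom, which the sender advances by t_ipi per packet, is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC      1000000ull
#define HALF_TIME_GRAN_US 500u   /* assumed t_gran/2 for a 1 ms tick */

struct tfrc_tx_sketch {
	uint64_t x;              /* sending rate, bytes per second */
	uint32_t s;              /* packet size, bytes */
	uint64_t t_ipi;          /* inter-packet interval, microseconds */
	uint64_t t_delta;        /* allowed send-time slack, microseconds */
	unsigned int updates;    /* number of recomputations, for illustration */
};

/* Set a new rate; recompute the dependent values only if X changed.
 * Returns 1 if t_ipi/t_delta were recomputed, 0 otherwise. */
static int tx_set_rate(struct tfrc_tx_sketch *tx, uint64_t new_x)
{
	assert(new_x > 0);
	if (new_x == tx->x)
		return 0;                /* derived values are still valid */

	tx->x = new_x;
	tx->t_ipi = (uint64_t)tx->s * USEC_PER_SEC / tx->x;
	tx->t_delta = tx->t_ipi / 2;
	if (tx->t_delta > HALF_TIME_GRAN_US)
		tx->t_delta = HALF_TIME_GRAN_US;
	tx->updates++;
	return 1;
}
```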
17 years ago[DCCP] ccid3: Consolidate timer resets
Gerrit Renker [Mon, 27 Nov 2006 22:29:27 +0000 (20:29 -0200)]
[DCCP] ccid3: Consolidate timer resets

This patch concerns updating the value of the nofeedback timer when no feedback
has been received so far.

Since in this case the value of R is still undefined according to [RFC 3448,
4.2], we cannot perform step (3) of [RFC 3448, 4.3].  A clarification is
provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback
timer (still) expires "after two seconds".

Many thanks to Ian McDonald for pointing this out and providing the
clarification.

The patch
  * implements [RFC 4342, sec. 5] with regard to the above case
  * consolidates handling timer restart by
      - adding an appropriate jump label and
      - initialising the timeout value

Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
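The consolidated restart path can be sketched as a handler that initialises the timeout to the two-second value of [RFC 4342, sec. 5] and jumps straight to a single restart label when no feedback has been received yet. Everything here (types, field and function names, microsecond units) is hypothetical, and the rate reductions the real timer performs are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ull

enum tx_state_sketch { NO_FBACK, FBACK };

struct nfb_sketch {
	enum tx_state_sketch state;
	uint64_t rtt_us;          /* R; undefined while state == NO_FBACK */
	uint32_t s;               /* packet size, bytes */
	uint64_t x_bps;           /* sending rate, bytes per second */
	uint64_t next_tmout_us;   /* value the timer is re-armed with */
};

static uint64_t nofeedback_timer_expired(struct nfb_sketch *tx)
{
	/* Without feedback R is undefined: expire after two seconds. */
	uint64_t t_nfb = 2 * USEC_PER_SEC;

	if (tx->state == NO_FBACK)
		goto restart_timer;

	/* With feedback: max(4*R, 2*s/X) as per RFC 3448. */
	assert(tx->x_bps > 0);
	t_nfb = 4 * tx->rtt_us;
	{
		uint64_t two_s_over_x = 2 * (uint64_t)tx->s * USEC_PER_SEC / tx->x_bps;

		if (two_s_over_x > t_nfb)
			t_nfb = two_s_over_x;
	}

restart_timer:
	tx->next_tmout_us = t_nfb;   /* single place where the timer is re-armed */
	return t_nfb;
}
```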
17 years ago[XFRM]: Convert a few __u8 to proper u8
Jamal Hadi Salim [Mon, 27 Nov 2006 20:59:30 +0000 (12:59 -0800)]
[XFRM]: Convert a few __u8 to proper u8

Caught by the EyeBalls(tm) of Thomas Graf

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[XFRM]: Make flush notifier prettier when subpolicy used
Jamal Hadi Salim [Mon, 27 Nov 2006 20:58:20 +0000 (12:58 -0800)]
[XFRM]: Make flush notifier prettier when subpolicy used

Might as well make flush notifier prettier when subpolicy used

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[XFRM]: Pack struct xfrm_policy
Arnaldo Carvalho de Melo [Mon, 27 Nov 2006 19:58:59 +0000 (17:58 -0200)]
[XFRM]: Pack struct xfrm_policy

[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
        struct xfrm_policy *       next;                 /*     0     4 */
        struct hlist_node          bydst;                /*     4     8 */
        struct hlist_node          byidx;                /*    12     8 */
        rwlock_t                   lock;                 /*    20    36 */
        atomic_t                   refcnt;               /*    56     4 */
        struct timer_list          timer;                /*    60    24 */
        u8                         type;                 /*    84     1 */

        /* XXX 3 bytes hole, try to pack */

        u32                        priority;             /*    88     4 */
        u32                        index;                /*    92     4 */
        struct xfrm_selector       selector;             /*    96    56 */
        struct xfrm_lifetime_cfg   lft;                  /*   152    64 */
        struct xfrm_lifetime_cur   curlft;               /*   216    32 */
        struct dst_entry *         bundles;              /*   248     4 */
        __u16                      family;               /*   252     2 */
        __u8                       action;               /*   254     1 */
        __u8                       flags;                /*   255     1 */
        __u8                       dead;                 /*   256     1 */
        __u8                       xfrm_nr;              /*   257     1 */

        /* XXX 2 bytes hole, try to pack */

        struct xfrm_sec_ctx *      security;             /*   260     4 */
        struct xfrm_tmpl           xfrm_vec[6];          /*   264   360 */
}; /* size: 624, sum members: 619, holes: 2, sum holes: 5 */

So let's have just one hole instead of two, by moving 'type' to just before 'action',
end result:

[acme@newtoy net-2.6.20]$ codiff -s /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
  struct xfrm_policy |   -4
 1 struct changed
[acme@newtoy net-2.6.20]$

[acme@newtoy net-2.6.20]$ pahole -c 64 net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
        struct xfrm_policy *       next;                 /*     0     4 */
        struct hlist_node          bydst;                /*     4     8 */
        struct hlist_node          byidx;                /*    12     8 */
        rwlock_t                   lock;                 /*    20    36 */
        atomic_t                   refcnt;               /*    56     4 */
        struct timer_list          timer;                /*    60    24 */
        u32                        priority;             /*    84     4 */
        u32                        index;                /*    88     4 */
        struct xfrm_selector       selector;             /*    92    56 */
        struct xfrm_lifetime_cfg   lft;                  /*   148    64 */
        struct xfrm_lifetime_cur   curlft;               /*   212    32 */
        struct dst_entry *         bundles;              /*   244     4 */
        u16                        family;               /*   248     2 */
        u8                         type;                 /*   250     1 */
        u8                         action;               /*   251     1 */
        u8                         flags;                /*   252     1 */
        u8                         dead;                 /*   253     1 */
        u8                         xfrm_nr;              /*   254     1 */

        /* XXX 1 byte hole, try to pack */

        struct xfrm_sec_ctx *      security;             /*   256     4 */
        struct xfrm_tmpl           xfrm_vec[6];          /*   260   360 */
}; /* size: 620, sum members: 619, holes: 1, sum holes: 1 */

Are there any fugly data dependencies here? None that I know.

In the process I removed the __-prefixed types, which are only meant for
userspace-visible headers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[NET]: Pack struct hh_cache
Arnaldo Carvalho de Melo [Mon, 27 Nov 2006 19:58:02 +0000 (17:58 -0200)]
[NET]: Pack struct hh_cache

[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o hh_cache
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/netdevice.h:190 */
struct hh_cache {
        struct hh_cache *          hh_next;              /*     0     4 */
        atomic_t                   hh_refcnt;            /*     4     4 */
        __be16                     hh_type;              /*     8     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        hh_len;               /*    12     4 */
        int                        (*hh_output)();       /*    16     4 */
        rwlock_t                   hh_lock;              /*    20    36 */
        long unsigned int          hh_data[24];          /*    56    96 */
}; /* size: 152, sum members: 150, holes: 1, sum holes: 2 */

[acme@newtoy net-2.6.20]$ find net -name "*.[ch]" | xargs grep 'hh_len.\+=' | sort -u
net/atm/br2684.c:               hh->hh_len = PADLEN + ETH_HLEN;
net/ethernet/eth.c:     hh->hh_len = ETH_HLEN;
net/ipv4/ipconfig.c:    int hh_len = LL_RESERVED_SPACE(dev);
net/ipv4/ip_output.c:   hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv4/ip_output.c:   int hh_len = LL_RESERVED_SPACE(dev);
net/ipv4/netfilter.c:   hh_len = (*pskb)->dst->dev->hard_header_len;
net/ipv4/raw.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv6/ip6_output.c:  hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv6/netfilter/ip6t_REJECT.c:       hh_len = (dst->dev->hard_header_len + 15)&~15;
net/ipv6/raw.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
[acme@newtoy net-2.6.20]$

[acme@newtoy net-2.6.20]$ find include -name "*.h" | xargs grep 'define ETH_HLEN'
include/linux/if_ether.h:#define ETH_HLEN       14              /* Total octets in header.       */

        (((dev)->hard_header_len&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)

[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o net_device | grep hard_header_len
        short unsigned int         hard_header_len;      /*   106     2 */
[acme@newtoy net-2.6.20]$

So I think we're safe in turning hh_len into a u16, end result:

[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
  struct hh_cache |   -4
    hh_len;
     from: int                   /*    12(0)     4(0) */
     to:   u16                   /*    10(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
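The rounding in the expression quoted above can be checked with a small sketch, assuming HH_DATA_MOD is 16 (an assumption here; only its name appears in the message). Note that in the grep output the LL_RESERVED_SPACE results go into local hh_len variables, while the struct hh_cache member is only assigned ETH_HLEN-based values. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment unit of the hh_data buffer; must be a power of two. */
#define HH_DATA_MOD 16u

/* Round hard_header_len down to a multiple of HH_DATA_MOD, then add one
 * more HH_DATA_MOD: the result is aligned and strictly larger than the
 * input. */
static unsigned int ll_reserved_space(uint16_t hard_header_len)
{
	assert((HH_DATA_MOD & (HH_DATA_MOD - 1)) == 0);   /* mask needs 2^k */
	return (hard_header_len & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD;
}
```

With ETH_HLEN = 14 this gives 16; a 16-byte header already needs 32.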
17 years ago[INET_CONNECTION_SOCK]: Pack struct inet_connection_sock_af_ops
Arnaldo Carvalho de Melo [Mon, 27 Nov 2006 19:56:43 +0000 (17:56 -0200)]
[INET_CONNECTION_SOCK]: Pack struct inet_connection_sock_af_ops

We have a hole in:

[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
        int                        (*queue_xmit)();      /*     0     4 */
        void                       (*send_check)();      /*     4     4 */
        int                        (*rebuild_header)();  /*     8     4 */
        int                        (*conn_request)();    /*    12     4 */
        struct sock *              (*syn_recv_sock)();   /*    16     4 */
        int                        (*remember_stamp)();  /*    20     4 */
        __u16                      net_header_len;       /*    24     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        (*setsockopt)();      /*    28     4 */
        int                        (*getsockopt)();      /*    32     4 */
        int                        (*compat_setsockopt)(); /*    36     4 */
        int                        (*compat_getsockopt)(); /*    40     4 */
        void                       (*addr2sockaddr)();   /*    44     4 */
        int                        sockaddr_len;         /*    48     4 */
}; /* size: 52, sum members: 50, holes: 1, sum holes: 2 */

But we don't need sockaddr_len to be an int:

[acme@newtoy net-2.6.20]$ find net -name "*.[ch]" | xargs grep '\.sockaddr_len.\+=' | sort -u
net/dccp/ipv4.c:        .sockaddr_len      = sizeof(struct sockaddr_in),
net/dccp/ipv6.c:        .sockaddr_len      = sizeof(struct sockaddr_in6),
net/ipv4/tcp_ipv4.c:    .sockaddr_len      = sizeof(struct sockaddr_in),
net/ipv6/tcp_ipv6.c:    .sockaddr_len      = sizeof(struct sockaddr_in6),
net/sctp/ipv6.c:        .sockaddr_len      = sizeof(struct sockaddr_in6),
net/sctp/protocol.c:    .sockaddr_len      = sizeof(struct sockaddr_in),

[acme@newtoy net-2.6.20]$ pahole --sizes net/ipv6/tcp_ipv6.o | grep sockaddr_in
struct sockaddr_in: 16 0
struct sockaddr_in6: 28 0
[acme@newtoy net-2.6.20]$

So I turned sockaddr_len into a 'u16', and now:

[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
        int            (*queue_xmit)();        /*     0   4 */
        void           (*send_check)();        /*     4   4 */
        int            (*rebuild_header)();    /*     8   4 */
        int            (*conn_request)();      /*    12   4 */
        struct sock *  (*syn_recv_sock)();     /*    16   4 */
        int            (*remember_stamp)();    /*    20   4 */
        u16            net_header_len;         /*    24   2 */
        u16            sockaddr_len;           /*    26   2 */
        int            (*setsockopt)();        /*    28   4 */
        int            (*getsockopt)();        /*    32   4 */
        int            (*compat_setsockopt)(); /*    36   4 */
        int            (*compat_getsockopt)(); /*    40   4 */
        void           (*addr2sockaddr)();     /*    44   4 */
}; /* size: 48 */

So we've saved 4 bytes:

[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp_ipv6.o.before net/ipv6/tcp_ipv6.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/tcp_ipv6.c:
  struct inet_connection_sock_af_ops |   -4
    net_header_len;
     from: __u16                 /*    24(0)     2(0) */
     to:   u16                   /*    24(0)     2(0) */
    sockaddr_len;
     from: int                   /*    48(0)     4(0) */
     to:   u16                   /*    26(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
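The argument that sockaddr_len fits in 16 bits can be turned into compile-time checks. This sketch uses the userspace POSIX definitions from <netinet/in.h> rather than the kernel's, and a hypothetical two-field struct to show the adjacent u16 members packing together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>   /* struct sockaddr_in, struct sockaddr_in6 */

/* sockaddr_len only ever holds sizeof() values like these. */
static_assert(sizeof(struct sockaddr_in) <= UINT16_MAX, "fits in u16");
static_assert(sizeof(struct sockaddr_in6) <= UINT16_MAX, "fits in u16");

/* Hypothetical cut-down struct: the two 16-bit fields share 4 bytes. */
struct af_ops_sketch {
	uint16_t net_header_len;
	uint16_t sockaddr_len;
};

static_assert(offsetof(struct af_ops_sketch, sockaddr_len) == 2,
	      "sockaddr_len directly follows net_header_len");

static const struct af_ops_sketch v6_ops_sketch = {
	.net_header_len = 40,                  /* fixed IPv6 header size */
	.sockaddr_len   = sizeof(struct sockaddr_in6),
};
```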
17 years ago[UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code
Gerrit Renker [Mon, 27 Nov 2006 17:29:59 +0000 (09:29 -0800)]
[UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code

This patch consolidates set/getsockopt code between UDP(-Lite) v4 and v6. The
justification is that UDP(-Lite) is a transport-layer protocol and therefore
the socket option code (at least in theory) should be AF-independent.

Furthermore, there is the following code duplication:
 * do_udp{,v6}_getsockopt is 100% identical between v4 and v6
 * do_udp{,v6}_setsockopt is identical up to the following difference
       --v4, in contrast to v6, additionally allows the experimental encapsulation
         types UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE
       --the remainder is identical between v4 and v6
   I believe that this difference is of little relevance.

The advantage is not keeping two copies of almost completely identical code.

The patch further simplifies the interface of udp{,v6}_push_pending_frames,
since for the second argument (struct udp_sock *up) it always holds that
up = udp_sk(sk); where sk is the first function argument.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code
Thomas Graf [Mon, 27 Nov 2006 17:27:07 +0000 (09:27 -0800)]
[RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code

IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similar
way, therefore rtnl_put_cacheinfo() is added to reuse code.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETLINK]: Remove unused dst_pid field in netlink_skb_parms
Thomas Graf [Mon, 27 Nov 2006 17:25:58 +0000 (09:25 -0800)]
[NETLINK]: Remove unused dst_pid field in netlink_skb_parms

The destination PID is passed directly as an argument to
netlink_unicast() or netlink_multicast(), respectively.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Add documentation for TFRC structures
Gerrit Renker [Mon, 27 Nov 2006 14:31:45 +0000 (12:31 -0200)]
[NET]: Add documentation for TFRC structures

This adds documentation for the TFRC structure fields.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ccid3: Resolve small FIXME
Gerrit Renker [Mon, 27 Nov 2006 14:28:48 +0000 (12:28 -0200)]
[DCCP] ccid3: Resolve small FIXME

This considers the case of an ACK received while no packet has been sent
so far. It is resolved by printing a (rate-limited) warning message.

It further removes an unnecessary BUG_ON in ccid3_hc_tx_packet_recv;
feedback received on a terminating connection is simply ignored.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent
Gerrit Renker [Mon, 27 Nov 2006 14:27:55 +0000 (12:27 -0200)]
[DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent

This patch removes a switch statement which is redundant since
 * nothing is done in states TFRC_SSTATE_NO_SENT/TFRC_SSTATE_NO_FBACK
 * it is impossible that the function is called in the state TFRC_SSTATE_TERM, since
       --the function is called, in dccp_write_xmit, after ccid3_hc_tx_send_packet
       --if ccid3_hc_tx_send_packet is called in state TFRC_SSTATE_TERM, it returns
         -EINVAL, which means that ccid3_hc_tx_packet_sent will not be called
         (compare dccp_write_xmit)
       --> therefore, this case is logically impossible
 * the remaining state is TFRC_SSTATE_FBACK which conditionally updates t_ipi, t_nom,
   and t_delta. This is a no-op, since
       --t_ipi only changes when feedback is received
       --however, when feedback arrives via ccid3_hc_tx_packet_recv, there is an identical
         code block which performs the same set of operations
       --performing the same set of operations again in ccid3_hc_tx_packet_sent therefore
         does not change anything, since the value of t_ipi has not changed since the
         last feedback was received (and t_ipi, t_nom, and t_delta were updated)
       --since t_ipi has not changed, the values of t_delta and t_nom also do not change,
         as they depend fully on t_ipi

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ccid3: Avoid congestion control on zero-sized data packets
Gerrit Renker [Mon, 27 Nov 2006 14:26:57 +0000 (12:26 -0200)]
[DCCP] ccid3: Avoid congestion control on zero-sized data packets

This resolves an `XXX' in ccid3_hc_tx_send_packet().

The function is only called on Data and DataAck packets and returns a negative
result on zero-sized messages. This is a reasonable policy since CCID 3 is a
congestion-control module and congestion control on zero-sized Data(Ack)
packets is in a way pathological.

The patch uses a more suitable error code for this case: it returns the POSIX.1
code `EBADMSG' ("Not a data message") instead of `ENOTCONN'.

As a result of ignoring zero-sized packets, the condition for a warning
"First packet is data" in ccid3_hc_tx_packet_sent is always satisfied; this
message has been removed since it would always be printed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet
Gerrit Renker [Mon, 27 Nov 2006 14:26:03 +0000 (12:26 -0200)]
[DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet

This makes some logically equivalent simplifications, by replacing
rc values plus gotos with direct return statements.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission
Gerrit Renker [Mon, 27 Nov 2006 14:25:10 +0000 (12:25 -0200)]
[DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission

Problem:

17 years ago[DCCP] ccid3: Simplify control flow in the calculation of t_ipi
Gerrit Renker [Mon, 27 Nov 2006 14:22:48 +0000 (12:22 -0200)]
[DCCP] ccid3: Simplify control flow in the calculation of t_ipi

This patch performs a simplifying (performance) optimisation:

 In each call of the inline function ccid3_calc_new_t_ipi(), the state is
 tested against TFRC_SSTATE_NO_FBACK. This is expensive when the function
 is called very often. A simpler solution, implemented by this patch, is
 to adapt the control flow.

Background:

17 years ago[DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi
Gerrit Renker [Mon, 27 Nov 2006 14:13:38 +0000 (12:13 -0200)]
[DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi

Problem:

17 years ago[DCCP] ccid2: Allow window to grow larger
Andrea Bittau [Sun, 26 Nov 2006 03:07:50 +0000 (01:07 -0200)]
[DCCP] ccid2: Allow window to grow larger

Now that we can stuff bigger ack vectors into options.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ackvec: Split long ack vectors across multiple options
Andrea Bittau [Sun, 26 Nov 2006 03:04:40 +0000 (01:04 -0200)]
[DCCP] ackvec: Split long ack vectors across multiple options

Ack vectors grow proportionally to the window size.  If an ack vector does not fit
into a single option, it must be spread across multiple options.  This patch
allows windows to grow larger.

Committer note: Simplified the patch a bit, original algorithm kept.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
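Why splitting is needed follows from the DCCP option format: the one-byte option length covers the type and length bytes as well as the data (RFC 4340, section 5.8), so a single option carries at most 253 bytes of ack vector. A sketch of the resulting arithmetic, assuming the vector is cut into full options first with the remainder in the last one; the patch's exact split order is not described here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 255 (max option length) minus the type and length bytes. */
#define SINGLE_OPT_MAXLEN 253u

/* Number of Ack Vector options needed for len bytes of vector data. */
static size_t ackvec_nr_opts(size_t len)
{
	return (len + SINGLE_OPT_MAXLEN - 1) / SINGLE_OPT_MAXLEN;
}

/* Total option bytes on the wire: data plus a 2-byte header per option. */
static size_t ackvec_wire_len(size_t len)
{
	return len + 2 * ackvec_nr_opts(len);
}

/* Data length of the i-th option: full options first, remainder last. */
static size_t ackvec_chunk_len(size_t len, size_t i)
{
	size_t n = ackvec_nr_opts(len);

	assert(i < n);
	return i + 1 < n ? SINGLE_OPT_MAXLEN : len - (n - 1) * SINGLE_OPT_MAXLEN;
}
```

For example, a 600-byte vector needs three options (253 + 253 + 94 bytes of data), i.e. 606 option bytes in total.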
17 years ago[DCCP] ackvec: infrastructure for sending more than one ackvec per packet
Andrea Bittau [Fri, 24 Nov 2006 15:02:42 +0000 (13:02 -0200)]
[DCCP] ackvec: infrastructure for sending more than one ackvec per packet

Committer note:

This was split from Andrea's original patch; in the process I changed the type
of the ackvec index fields to u16 instead of int and haven't folded
dccp_ackvec_parse into dccp_ackvec_check_rcv_ackno.

The next patch will actually do the insertion of more than one ackvec per packet,
using, initially, up to a max of 2 ackvecs as per Andrea's original patch; then
I'll work on support for larger ackvecs, be it using a sysctl or using
setsockopt.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DCCP] ackvec: Remove unused dccpav_ack_ptr field from dccp_ackvec
Andrea Bittau [Tue, 21 Nov 2006 18:17:10 +0000 (16:17 -0200)]
[DCCP] ackvec: Remove unused dccpav_ack_ptr field from dccp_ackvec

Committer note: the original patch was split.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
17 years ago[DECNET] address: Convert to new netlink interface
Thomas Graf [Sat, 25 Nov 2006 01:14:51 +0000 (17:14 -0800)]
[DECNET] address: Convert to new netlink interface

Extends the netlink interface to support the __le16 type and
converts address addition, deletion, and dumping to use the
new netlink interface.

Fixes multiple possible illegal memory references due to
unvalidated netlink attributes.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[DECNET] address: Rename rtmsg_ifa() to dn_ifaddr_notify()
Thomas Graf [Sat, 25 Nov 2006 01:14:31 +0000 (17:14 -0800)]
[DECNET] address: Rename rtmsg_ifa() to dn_ifaddr_notify()

The name rtmsg_ifa is heavily overused and confusing.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[DECNET] address: Calculate accurate message size for netlink notifications
Thomas Graf [Sat, 25 Nov 2006 01:14:07 +0000 (17:14 -0800)]
[DECNET] address: Calculate accurate message size for netlink notifications

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV6]: Improve IPv6 tunnel error reporting
Ville Nuorvala [Sat, 25 Nov 2006 01:08:58 +0000 (17:08 -0800)]
[IPV6]: Improve IPv6 tunnel error reporting

Log an error if the remote tunnel endpoint is unable to handle
tunneled packets.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>