net: percpu net_device refcount
author Eric Dumazet <eric.dumazet@gmail.com>
Mon, 11 Oct 2010 10:22:12 +0000 (10:22 +0000)
committer David S. Miller <davem@davemloft.net>
Tue, 12 Oct 2010 19:35:25 +0000 (12:35 -0700)
commit 29b4433d991c88d86ca48a4c1cc33c671475be4b
tree 2ad21b86aab8193c4533820c40cd31af97a7377f
parent f0b9f4725180ea58c8da78b3de0b4e0ad180fc2c

We tried very hard to remove all possible dev_hold()/dev_put() pairs in
the network stack, using RCU conversions.

There is still an unavoidable device refcount change for every dst we
create/destroy, and this can slow down some workloads (routers, some
app servers, mmap af_packet).

We can switch to a percpu refcount implementation, now that the dynamic
per_cpu infrastructure is mature. On a 64-CPU machine, this consumes
256 bytes per device.
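
As a rough userspace illustration of the idea (not the kernel code in
this patch), a percpu refcount can be modeled as one plain int slot per
CPU: hold and put touch only the caller's slot, and a reader sums all
slots. The names fake_dev, fake_dev_hold(), fake_dev_put(),
fake_dev_refcnt_read(), the NR_CPUS value and the explicit cpu argument
are all hypothetical; the kernel would rely on its per_cpu
infrastructure rather than an array indexed by hand.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 64	/* hypothetical CPU count, as in the 64-CPU example */

/* Hypothetical stand-in for struct net_device: only the refcount is modeled. */
struct fake_dev {
	int *pcpu_refcnt;	/* one int slot per CPU */
};

/* Allocate zeroed per-CPU slots; returns 0 on success, -1 on failure. */
int fake_dev_init(struct fake_dev *dev)
{
	dev->pcpu_refcnt = calloc(NR_CPUS, sizeof(int));
	return dev->pcpu_refcnt ? 0 : -1;
}

/* Take a reference: only the slot of the calling CPU is written. */
void fake_dev_hold(struct fake_dev *dev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	dev->pcpu_refcnt[cpu]++;
}

/* Drop a reference, possibly on a different CPU than the hold. */
void fake_dev_put(struct fake_dev *dev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	dev->pcpu_refcnt[cpu]--;
}

/* The real count is the sum over all slots (the slow path). */
int fake_dev_refcnt_read(const struct fake_dev *dev)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += dev->pcpu_refcnt[cpu];
	return sum;
}

/* Release the per-CPU slots. */
void fake_dev_free(struct fake_dev *dev)
{
	free(dev->pcpu_refcnt);
	dev->pcpu_refcnt = NULL;
}
```

With 4-byte ints and 64 slots, the array takes 64 * 4 = 256 bytes,
which matches the per-device cost quoted above. A single slot may go
negative when a reference is taken on one CPU and dropped on another;
only the sum is meaningful.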

On x86, the dev_hold(dev) code is:

Before:
        lock    incl 0x280(%ebx)
After:
        movl    0x260(%ebx),%eax
        incl    %fs:(%eax)

Stress bench:

(Sending 160,000,000 UDP frames,
IP route cache disabled, dual E5540 @ 2.53GHz,
32-bit kernel, FIB_TRIE)

Before:

real    1m1.662s
user    0m14.373s
sys     12m55.960s

After:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/infiniband/hw/nes/nes_cm.c
drivers/infiniband/hw/nes/nes_verbs.c
include/linux/netdevice.h
net/core/dev.c