pandora-kernel.git
18 years agolockd: posix_test_lock() should not call locks_copy_lock()
Trond Myklebust [Mon, 20 Mar 2006 18:44:38 +0000 (13:44 -0500)]
lockd: posix_test_lock() should not call locks_copy_lock()

The caller of posix_test_lock() should never need to look at the lock
private data, so do not copy that information. This also means that there
is no need to call the fl_release_private methods.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free)
Trond Myklebust [Mon, 20 Mar 2006 18:44:37 +0000 (13:44 -0500)]
NFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Debugging code for nfs_direct_(read|write)_schedule()
Trond Myklebust [Mon, 20 Mar 2006 18:44:37 +0000 (13:44 -0500)]
NFS: Debugging code for nfs_direct_(read|write)_schedule()

Make sure that we're doing our list accounting correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: O_DIRECT async IO may lose context
Trond Myklebust [Mon, 20 Mar 2006 18:44:36 +0000 (13:44 -0500)]
NFS: O_DIRECT async IO may lose context

The struct nfs_direct_req currently keeps a pointer to the file descriptor
without referencing it. This may cause problems if the parent process is
killed.

The nfs_open_context should normally have all the information that we're
currently using the filp for, and unlike fput(), is safe to release from
an rpciod process context.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agonfs: Use UNSTABLE + COMMIT for NFS O_DIRECT writes
Trond Myklebust [Mon, 20 Mar 2006 18:44:36 +0000 (13:44 -0500)]
nfs: Use UNSTABLE + COMMIT for NFS O_DIRECT writes

Currently NFS O_DIRECT writes use FILE_SYNC so that a COMMIT is not
necessary.  This simplifies the internal logic, but this could be a
difficult workload for some servers.

Instead, let's send UNSTABLE writes, and after they all complete, send a
COMMIT for the dirty range.  After the COMMIT returns successfully, then do
the wake_up or fire off aio_complete().
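
As a rough illustration of the UNSTABLE-then-COMMIT flow described above,
here is a minimal userspace sketch in C.  It is not the kernel code:
struct server, write_unstable(), commit(), reboot() and direct_write() are
hypothetical names, and the model assumes that the NFSv3 write verifier is
what lets the client notice a server reboot and resend the data.

```c
#include <assert.h>
#include <string.h>

#define FILE_SIZE 8

/* Toy server: data sits in a volatile cache until COMMIT (hypothetical model). */
struct server {
    unsigned verifier;        /* changes whenever the server reboots */
    char cache[FILE_SIZE];    /* volatile copy of unstable writes */
    char disk[FILE_SIZE];     /* stable storage */
    int dirty[FILE_SIZE];     /* which cached bytes still need committing */
};

/* UNSTABLE write: the server may only cache the byte; returns its verifier. */
unsigned write_unstable(struct server *s, int off, char byte)
{
    assert(off >= 0 && off < FILE_SIZE);
    s->cache[off] = byte;
    s->dirty[off] = 1;
    return s->verifier;
}

/* COMMIT: flush cached bytes to stable storage; returns the verifier. */
unsigned commit(struct server *s)
{
    for (int i = 0; i < FILE_SIZE; i++) {
        if (s->dirty[i]) {
            s->disk[i] = s->cache[i];
            s->dirty[i] = 0;
        }
    }
    return s->verifier;
}

/* Reboot: uncommitted data is lost and the verifier changes. */
void reboot(struct server *s)
{
    memset(s->dirty, 0, sizeof(s->dirty));
    s->verifier++;
}

/*
 * Client side: send every byte UNSTABLE, then COMMIT.  If any verifier
 * differs from the first one seen, resend everything.  Returns the number
 * of passes needed.  reboots_to_inject simulates server reboots that
 * happen between the writes and the COMMIT.
 */
int direct_write(struct server *s, const char *buf, int len, int reboots_to_inject)
{
    int passes = 0;

    assert(len > 0 && len <= FILE_SIZE);
    for (;;) {
        unsigned verf = 0;
        int mismatch = 0;

        passes++;
        for (int i = 0; i < len; i++) {
            unsigned v = write_unstable(s, i, buf[i]);
            if (i == 0)
                verf = v;
            else if (v != verf)
                mismatch = 1;
        }
        if (reboots_to_inject > 0) {
            reboot(s);
            reboots_to_inject--;
        }
        if (!mismatch && commit(s) == verf)
            return passes;    /* only now is it safe to signal completion */
    }
}
```

In this model a reboot between the writes and the COMMIT costs one extra
pass; the wake_up or aio_complete() step would correspond to the point
where direct_write() returns.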

Test plan:
Async direct I/O tests against Solaris (or any server that requires
committed unstable writes).  Reboot server during test.

Based on an earlier patch by Chuck Lever <cel@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Make nfs_commit_alloc() extern
Trond Myklebust [Mon, 20 Mar 2006 18:44:35 +0000 (13:44 -0500)]
NFS: Make nfs_commit_alloc() extern

We need to use nfs_commit_alloc() in fs/nfs/direct.c.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: fix data_update accounting in NFS direct I/O path
Chuck Lever [Mon, 20 Mar 2006 18:44:35 +0000 (13:44 -0500)]
NFS: fix data_update accounting in NFS direct I/O path

^C against "iozone -I" is hitting the assertion in nfs_clear_inode().

Test plan:
"iozone -i0 -I -a -c" against a slow server, then control C.  This should
not cause an oops.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Replace atomic_t variables in nfs_direct_req with a single spin lock
Chuck Lever [Mon, 20 Mar 2006 18:44:34 +0000 (13:44 -0500)]
NFS: Replace atomic_t variables in nfs_direct_req with a single spin lock

Three atomic_t variables cause a lot of bus locking.  Because they are all
used in the same places in the code, just use a single spin lock.

Now that the atomic_t variables are gone, we can remove the request size
limitation since the code no longer depends on the limited width of atomic_t
on some platforms.
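
A minimal userspace sketch of the idea, assuming three fields that are
always updated together.  The struct layout and the names req_init() and
req_result() are illustrative only, and a C11 atomic_flag stands in for the
kernel spin lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical request: one lock guards all three fields. */
struct direct_req {
    atomic_flag lock;
    long count;          /* bytes transferred so far; plain long, no atomic_t width limit */
    long outstanding;    /* sub-requests still in flight */
    long error;          /* first error seen, or 0 */
};

static void req_lock(struct direct_req *r)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;
}

static void req_unlock(struct direct_req *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

void req_init(struct direct_req *r, long nreqs)
{
    atomic_flag_clear(&r->lock);
    r->count = 0;
    r->outstanding = nreqs;
    r->error = 0;
}

/*
 * Record the result of one sub-request.  All three fields are touched
 * under a single lock acquisition instead of three separate atomic ops.
 * Returns 1 if this was the last outstanding sub-request.
 */
int req_result(struct direct_req *r, long status)
{
    int last;

    req_lock(r);
    if (status >= 0)
        r->count += status;
    else if (r->error == 0)
        r->error = status;
    last = (--r->outstanding == 0);
    req_unlock(r);
    return last;
}
```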

Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.  Millions of fsx
operations, iozone, OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: clean up comments and tab damage in direct.c
Chuck Lever [Mon, 20 Mar 2006 18:44:34 +0000 (13:44 -0500)]
NFS: clean up comments and tab damage in direct.c

Clean up tab damage and comments.  Replace "file_offset" with more commonly
used "pos".

Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: support EIOCBQUEUED return in direct write path
Chuck Lever [Mon, 20 Mar 2006 18:44:33 +0000 (13:44 -0500)]
NFS: support EIOCBQUEUED return in direct write path

For async iocb's, the NFS direct write path now returns EIOCBQUEUED,
and calls aio_complete when all the requested writes are finished.  The
synchronous part of the NFS direct write path behaves exactly as it
was before.
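
The split between the synchronous and asynchronous cases might be sketched
like this.  It is a simplified userspace model, not the kernel code: the
iocb and dreq structures and both functions are hypothetical, EIOCBQUEUED
is defined locally for the sketch, and setting the completed field stands
in for the aio_complete call.

```c
#include <assert.h>
#include <stddef.h>

#define EIOCBQUEUED 529    /* kernel-internal errno, defined here for the sketch */

struct iocb {
    int is_sync;      /* 1 for an ordinary blocking write */
    long result;      /* filled in on asynchronous completion */
    int completed;    /* set once completion has been reported */
};

struct dreq {
    struct iocb *iocb;
    long bytes;       /* bytes written by all sub-requests */
};

/* Called when the last sub-request has finished. */
void dreq_complete(struct dreq *d)
{
    assert(d->iocb != NULL);
    if (!d->iocb->is_sync) {
        /* Asynchronous caller: report the result through the iocb. */
        d->iocb->result = d->bytes;
        d->iocb->completed = 1;
    }
}

/* Start a direct write of nbytes; the I/O itself is only pretended here. */
long direct_write_start(struct dreq *d, long nbytes)
{
    d->bytes = nbytes;
    if (d->iocb->is_sync) {
        /* Synchronous caller: wait for completion, return the byte count. */
        dreq_complete(d);
        return d->bytes;
    }
    /* Asynchronous caller: completion arrives later via dreq_complete(). */
    return -EIOCBQUEUED;
}
```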

Shared mapped NFS files will have some coherency difficulties when
accessed concurrently with aio+dio.  Will need to explore how this
is handled in the local file system case.

Test plan:
aio-stress with "-O". OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: make iocb available everywhere in direct write path
Chuck Lever [Mon, 20 Mar 2006 18:44:33 +0000 (13:44 -0500)]
NFS: make iocb available everywhere in direct write path

Pass the iocb argument all the way down to the direct write request
scheduler, and make it available in nfs_direct_write_result.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: remove support for multi-segment iovs in the direct write path
Chuck Lever [Mon, 20 Mar 2006 18:44:32 +0000 (13:44 -0500)]
NFS: remove support for multi-segment iovs in the direct write path

Eliminate the persistent use of automatic storage in all parts of the
NFS client's direct write path to pave the way for introducing support
for aio against files opened with the O_DIRECT flag.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: make direct write path generate write requests concurrently
Chuck Lever [Mon, 20 Mar 2006 18:44:32 +0000 (13:44 -0500)]
NFS: make direct write path generate write requests concurrently

Duplicate infrastructure from direct read path that will allow write
path to generate multiple write requests concurrently.  This will
enable us to add support for aio in this path.

Temporarily we will lose the ability to do UNSTABLE writes followed by
a COMMIT in the direct write path.  However, all applications I am
aware of that use NFS O_DIRECT currently write in relatively small
chunks, so this should not be inconvenient in any way.

Test plan:
Millions of fsx-odirect ops. OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: create common routine for handling direct I/O completion
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for handling direct I/O completion

Factor out the common piece of completing an NFS direct I/O request.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: create common routine for allocating nfs_direct_req
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for allocating nfs_direct_req

Factor out a small common piece of the paths that allocate nfs_direct_req
structures.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: create common routine for waiting for direct I/O to complete
Chuck Lever [Mon, 20 Mar 2006 18:44:31 +0000 (13:44 -0500)]
NFS: create common routine for waiting for direct I/O to complete

We're about to add asynchrony to the NFS direct write path.  Begin by
abstracting out the common pieces in the read path.

The first piece is nfs_direct_read_wait, which works the same whether the
process is waiting for a read or a write.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: support EIOCBQUEUED return in direct read path
Chuck Lever [Mon, 20 Mar 2006 18:44:30 +0000 (13:44 -0500)]
NFS: support EIOCBQUEUED return in direct read path

For async iocb's, the NFS direct read path should return EIOCBQUEUED and
call aio_complete when all the requested reads are finished.  The
synchronous part of the NFS direct read path behaves exactly as it was
before.

Test plan:
aio-stress with "-O".  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: make iocb available everywhere in direct read path
Chuck Lever [Mon, 20 Mar 2006 18:44:30 +0000 (13:44 -0500)]
NFS: make iocb available everywhere in direct read path

Pass the iocb argument all the way down to the direct read request
scheduler, and make it available in nfs_direct_read_result.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: remove support for multi-segment iovs in the direct read path
Chuck Lever [Mon, 20 Mar 2006 18:44:29 +0000 (13:44 -0500)]
NFS: remove support for multi-segment iovs in the direct read path

Eliminate the persistent use of automatic storage in all parts of the NFS
client's direct read path to pave the way for introducing support for aio
against files opened with the O_DIRECT flag.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Millions of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: use size_t type for holding rsize bytes in NFS O_DIRECT read path
Chuck Lever [Mon, 20 Mar 2006 18:44:29 +0000 (13:44 -0500)]
NFS: use size_t type for holding rsize bytes in NFS O_DIRECT read path

size_t is used for holding byte counts, so use it for variables storing rsize.
Note that the write path will be updated as we add support for async
O_DIRECT writes.

Test plan:
Need to verify that existing comparisons against new size_t variables behave
correctly.
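
The kind of comparison that needs checking can be illustrated with a small
C sketch.  The function names are made up for the example, and the explicit
cast shows what an implicit conversion from a signed value to size_t does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Naive check: a negative result (an error code) is converted to a very
 * large size_t, so it appears to exceed rsize.
 */
int exceeds_rsize_naive(int result, size_t rsize)
{
    return (size_t)result > rsize;
}

/* Checked version: rule out negative values before comparing. */
int exceeds_rsize_checked(int result, size_t rsize)
{
    return result >= 0 && (size_t)result > rsize;
}
```

With a result of -5 and an rsize of 4096, the naive version reports that
the result exceeds rsize, while the checked version does not.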

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: update comments and function definitions in fs/nfs/direct.c
Chuck Lever [Mon, 20 Mar 2006 18:44:28 +0000 (13:44 -0500)]
NFS: update comments and function definitions in fs/nfs/direct.c

Update to latest coding style standards.  Remove block comments on
statically defined functions, and place function definitions all on
one line.

Test plan:
Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: clean up NFS client's a_ops->direct_IO method
Chuck Lever [Mon, 20 Mar 2006 18:44:28 +0000 (13:44 -0500)]
NFS: clean up NFS client's a_ops->direct_IO method

The NFS client's a_ops->direct_IO method, nfs_direct_IO, is required to
be present to allow NFS files to be opened with O_DIRECT, but is never
called because the NFS client shunts reads and writes to files opened
with O_DIRECT directly to its own routines.

Gut the nfs_direct_IO function.  This eliminates the only part of the
NFS client's direct I/O path that requires support for multi-segment
iovs, allowing further simplification in subsequent patches.

Test plan:
Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.  Millions
of fsx-odirect ops.  OraSim.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Cleanup of NFS read code
Trond Myklebust [Mon, 20 Mar 2006 18:44:27 +0000 (13:44 -0500)]
NFS: Cleanup of NFS read code

Same callback hierarchy inversion as for the NFS write calls. This patch is
not strictly speaking needed by the O_DIRECT code, but avoids confusing
differences between the asynchronous read and write code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Cleanup of NFS write code in preparation for asynchronous o_direct
Trond Myklebust [Mon, 20 Mar 2006 18:44:27 +0000 (13:44 -0500)]
NFS: Cleanup of NFS write code in preparation for asynchronous o_direct

This patch inverts the callback hierarchy for NFS write calls.

Instead of having the NFSv2/v3/v4-specific code set up the RPC callback
ops, we allow the original caller to do so. This allows for more
flexibility w.r.t. how to set up and tear down the nfs_write_data
structure while still allowing the NFSv3/v4 code to perform error
handling.

The greater flexibility is needed by the asynchronous O_DIRECT code, which
wants to be able to hold on to the original nfs_write_data structures after
the WRITE RPC call has completed in order to be able to replay them if the
COMMIT call determines that the server has rebooted.
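
A sketch of the inverted hierarchy, under the assumption that it boils down
to the caller supplying its own completion callback.  All names below are
hypothetical and the structures are heavily simplified.

```c
#include <assert.h>
#include <stddef.h>

struct write_data;

/* Completion callbacks, chosen by whoever started the WRITE. */
struct call_ops {
    void (*done)(struct write_data *data);
};

struct write_data {
    int status;                    /* result of the WRITE call */
    int freed;                     /* set when the caller lets go of the data */
    int kept_for_replay;           /* set when the caller retains the data */
    const struct call_ops *ops;    /* supplied by the original caller */
};

/* Protocol-specific error handling (hypothetical): 0 means "carry on". */
int proto_write_done(struct write_data *data)
{
    return data->status < 0 ? -1 : 0;
}

/* Buffered-write caller: releases the data as soon as the call is done. */
void buffered_done(struct write_data *data)
{
    if (proto_write_done(data) == 0)
        data->freed = 1;
}

/* O_DIRECT caller: keeps the data so it can be resent after a failed COMMIT. */
void direct_done(struct write_data *data)
{
    if (proto_write_done(data) == 0)
        data->kept_for_replay = 1;
}

const struct call_ops buffered_ops = { buffered_done };
const struct call_ops direct_ops = { direct_done };

/* Generic RPC layer: knows nothing about the caller's teardown policy. */
void rpc_finish(struct write_data *data, int status)
{
    assert(data->ops != NULL && data->ops->done != NULL);
    data->status = status;
    data->ops->done(data);
}
```

Each caller decides what happens to its write data on completion, while the
protocol-specific check still runs from inside the caller's callback.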

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agolockd: Remove FL_LOCKD flag
J. Bruce Fields [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
lockd: Remove FL_LOCKD flag

Currently lockd identifies its own locks using the FL_LOCKD flag.  This
doesn't scale well to multiple lock managers--if we did this in nfsv4 too,
for example, we'd be left with only one free flag bit.

Instead, we just check whether the lock manager ops (fl_lmops) set on this
lock are our own.
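
The ownership test can be sketched as a pointer comparison.  The structures
below are simplified stand-ins and every name except fl_lmops is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a lock manager's operations table. */
struct lock_manager_ops {
    const char *name;
};

struct file_lock {
    const struct lock_manager_ops *fl_lmops;    /* NULL for plain local locks */
    struct file_lock *next;
};

/* One table per lock manager; its address identifies the owner. */
const struct lock_manager_ops nlm_ops = { "lockd" };
const struct lock_manager_ops other_ops = { "other" };

/* A lock belongs to lockd if its ops pointer is lockd's own table. */
int is_lockd_lock(const struct file_lock *fl)
{
    return fl->fl_lmops == &nlm_ops;
}

/* Count lockd-owned locks on a list, the way a traversal would find them. */
int count_lockd_locks(const struct file_lock *head)
{
    int n = 0;

    for (const struct file_lock *fl = head; fl != NULL; fl = fl->next)
        if (is_lockd_lock(fl))
            n++;
    return n;
}
```

No flag bit is consumed: any number of lock managers can be told apart as
long as each has its own ops table.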

The only use for this is in nlm_traverse_locks, which uses it to find locks
that need cleaning up when freeing a host or a file.

In the long run it might be nice to do reference counting instead of
traversing all the locks like this....

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agolocks,lockd: fix race in nlmsvc_testlock
Andy Adamson [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
locks,lockd: fix race in nlmsvc_testlock

posix_test_lock() returns a pointer to a struct file_lock which is unprotected
and can be removed while in use by the caller.  Move the conflicting lock from
the return to a parameter, and copy the conflicting lock.

In most cases the caller ends up putting the copy of the conflicting lock on
the stack.  On i386, sizeof(struct file_lock) appears to be about 100 bytes.
We're assuming that's reasonable.
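
The changed calling convention might look roughly like this in a
simplified userspace form.  The struct fields and the function test_lock()
are illustrative; real code would copy the lock while still holding
whatever protects the lock list.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified lock record (hypothetical fields). */
struct file_lock {
    long start;
    long end;
    int owner;
    struct file_lock *next;
};

/*
 * Look for a lock that conflicts with req.  Instead of returning a pointer
 * into the list, which could be freed under the caller, copy the
 * conflicting lock into caller-provided storage.
 * Returns 1 if a conflict was found, 0 otherwise.
 */
int test_lock(const struct file_lock *held, const struct file_lock *req,
              struct file_lock *conflock)
{
    assert(conflock != NULL);
    for (const struct file_lock *fl = held; fl != NULL; fl = fl->next) {
        int overlap = fl->start <= req->end && req->start <= fl->end;

        if (fl->owner != req->owner && overlap) {
            *conflock = *fl;          /* the caller gets its own copy */
            conflock->next = NULL;    /* do not leak the list linkage */
            return 1;
        }
    }
    return 0;
}
```

The caller typically declares the conflock storage as a local variable,
which is where the stack cost mentioned above comes from.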

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agolocks: remove unused posix_block_lock
Andy Adamson [Mon, 20 Mar 2006 18:44:26 +0000 (13:44 -0500)]
locks: remove unused posix_block_lock

posix_lock_file() is used to add a blocked lock to Lockd's block, so
posix_block_lock() is no longer needed.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agolockd: make nlmsvc_lock use only posix_lock_file
Andy Adamson [Mon, 20 Mar 2006 18:44:25 +0000 (13:44 -0500)]
lockd: make nlmsvc_lock use only posix_lock_file

Reorganize nlmsvc_lock() to make full use of posix_lock_file(), which does
everything nlmsvc_lock() needs - no need to call posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock() separately.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agolockd: simplify nlmsvc_grant_blocked
Andy Adamson [Mon, 20 Mar 2006 18:44:25 +0000 (13:44 -0500)]
lockd: simplify nlmsvc_grant_blocked

Reorganize nlmsvc_grant_blocked() to make full use of posix_lock_file().  Note
that there's no need for separate calls to posix_test_lock(),
posix_locks_deadlock(), or posix_block_lock().

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agolockd: clean up nlmsvc_lock
Andy Adamson [Mon, 20 Mar 2006 18:44:24 +0000 (13:44 -0500)]
lockd: clean up nlmsvc_lock

Slightly more consistent dprintk error reporting, consolidate some up()'s.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: directory trace messages
Chuck Lever [Mon, 20 Mar 2006 18:44:24 +0000 (13:44 -0500)]
NFS: directory trace messages

Reuse NFSDBG_DIRCACHE and NFSDBG_LOOKUPCACHE to provide additional
diagnostic messages that trace the operation of the NFS client's
directory behavior.  A few new messages are now generated when NFSDBG_VFS
is active, as well, to trace normal VFS activity.  This compromise
provides better trace debugging for those who use pre-built kernels,
without adding a lot of extra noise to the standard debug settings.

Test-plan:
Enable NFS trace debugging with flags 1, 2, or 4.  You should be able to
see different types of trace messages with each flag setting.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: minor cleanup
Chuck Lever [Mon, 20 Mar 2006 18:44:23 +0000 (13:44 -0500)]
SUNRPC: minor cleanup

RPC_DEBUG_DATA no longer needed in net/sunrpc/xprt.c.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: eliminate rpc_call()
Chuck Lever [Mon, 20 Mar 2006 18:44:23 +0000 (13:44 -0500)]
SUNRPC: eliminate rpc_call()

Clean-up: replace rpc_call() helper with direct call to rpc_call_sync.

This makes NFSv2 and NFSv3 synchronous calls more computationally
efficient, and reduces stack consumption in functions that used to
invoke rpc_call more than once.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Connectathon on NFS version 2,
version 3, and version 4 mount points.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: display human-readable procedure name in rpc_iostats output
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
SUNRPC: display human-readable procedure name in rpc_iostats output

Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.

Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number.  NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.

Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: add RPC I/O statistics to /proc/self/mountstats
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
NFS: add RPC I/O statistics to /proc/self/mountstats

NFS client now shows various RPC I/O metrics in /proc/self/mountstats.

Test plan:
Mount/umount while doing "cat /proc/self/mountstats", multiple iterations
of connectathon locking suite.  Test with NFS version 2, 3, and 4.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: provide a mechanism for collecting stats in the RPC client
Chuck Lever [Mon, 20 Mar 2006 18:44:22 +0000 (13:44 -0500)]
SUNRPC: provide a mechanism for collecting stats in the RPC client

Add a simple mechanism for collecting stats in the RPC client.  Stats are
tabulated during xprt_release.  Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Basic performance regression
testing with high-speed networking and high performance server.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: introduce per-task RPC iostats
Chuck Lever [Mon, 20 Mar 2006 18:44:17 +0000 (13:44 -0500)]
SUNRPC: introduce per-task RPC iostats

Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent.  Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.
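
The relationship between the two timers can be written out as simple
arithmetic.  The structure and function names below are hypothetical, and
the timestamps are in arbitrary units.

```c
#include <assert.h>

/* Timestamps recorded for one RPC task (hypothetical layout). */
struct task_times {
    long start;    /* task was created and queued */
    long sent;     /* request was handed to the transport */
    long reply;    /* reply was received */
};

/* Round trip time: from transmission to reply. */
long rpc_rtt(const struct task_times *t)
{
    assert(t->reply >= t->sent);
    return t->reply - t->sent;
}

/* Execution time: from task creation to reply. */
long rpc_execute(const struct task_times *t)
{
    assert(t->reply >= t->start);
    return t->reply - t->start;
}

/* Time spent waiting in queue is the difference between the two. */
long rpc_queue_wait(const struct task_times *t)
{
    return rpc_execute(t) - rpc_rtt(t);
}
```

For a task queued at 100, sent at 130 and answered at 180, the round trip
is 50, the execution time is 80, and the queue wait is 30.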

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: add a handful of per-xprt counters
Chuck Lever [Mon, 20 Mar 2006 18:44:16 +0000 (13:44 -0500)]
SUNRPC: add a handful of per-xprt counters

Monitor generic transport events.  Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: track length of RPC wait queues
Chuck Lever [Mon, 20 Mar 2006 18:44:15 +0000 (13:44 -0500)]
SUNRPC: track length of RPC wait queues

RPC wait queue length will eventually be exported to userland via the RPC
iostats interface.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: report how long an NFS file system has been mounted
Chuck Lever [Mon, 20 Mar 2006 18:44:15 +0000 (13:44 -0500)]
NFS: report how long an NFS file system has been mounted

Add a field in nfs_server to record a timestamp when a mount succeeds.
Report the number of seconds the file system has been mounted via
nfs_show_stats().

Test plan:
Mount an NFS file system, watch the mountstats reports and compare with
clock time.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: add hooks to account for NFSERR_JUKEBOX errors
Chuck Lever [Mon, 20 Mar 2006 18:44:14 +0000 (13:44 -0500)]
NFS: add hooks to account for NFSERR_JUKEBOX errors

Make an inode or an nfs_server struct available in the logic that handles
JUKEBOX/DELAY type errors so the NFS client can account for them.

This patch is split out from the main nfs iostat patch to highlight minor
architectural changes required to support this statistic.

Test plan:
None.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: add I/O performance counters
Chuck Lever [Mon, 20 Mar 2006 18:44:14 +0000 (13:44 -0500)]
NFS: add I/O performance counters

Invoke the byte and event counter macros where we want to count bytes and
events.

Clean-up: fix a possible NULL dereference in nfs_lock, and simplify
nfs_file_open.

Test-plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: introduce mechanism for tracking NFS client metrics
Chuck Lever [Mon, 20 Mar 2006 18:44:13 +0000 (13:44 -0500)]
NFS: introduce mechanism for tracking NFS client metrics

Add a per-superblock performance counter facility to the NFS client.  This
facility mimics the counters available for block devices and for
networking.  Expose these new counters via the new /proc/self/mountstats
interface.

Thanks to Andrew Morton and Trond Myklebust for their review and comments.

Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: clean up some mount options
Chuck Lever [Mon, 20 Mar 2006 18:44:13 +0000 (13:44 -0500)]
NFS: clean up some mount options

Get rid of "lock" and "posix", and spell out "vers=".

Test plan:
None.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: show retransmit settings when displaying mount options
Chuck Lever [Mon, 20 Mar 2006 18:44:12 +0000 (13:44 -0500)]
NFS: show retransmit settings when displaying mount options

Sometimes it's important to know the exact RPC retransmit settings the
kernel is using for an NFS mount point.  Add this facility to the NFS
client's show_options method.

Test plan:
Set various retransmit settings via the mount command, and check that the
settings are reflected in /proc/mounts.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoVFS: New /proc file /proc/self/mountstats
Chuck Lever [Mon, 20 Mar 2006 18:44:12 +0000 (13:44 -0500)]
VFS: New /proc file /proc/self/mountstats

Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on).  Use a mechanism similar to /proc/mounts and s_ops->show_options.

This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.

Thanks to Mike Waychison for his review and comments.

Test-plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: more verbose output for rpc auth weak error
Levent Serinol [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
SUNRPC: more verbose output for rpc auth weak error

This patch prints the server IP address when the "server
requires stronger authentication" error occurs.

Signed-off-by: Levent Serinol <lserinol@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Code comments update in NFS
Goldwyn Rodrigues [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
NFS: Code comments update in NFS

read_cache_mtime is no longer used in nfs_inode. This patch removes
references of read_cache_mtime in the code comments.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: sem2mutex idmap.c
Ingo Molnar [Mon, 20 Mar 2006 18:44:11 +0000 (13:44 -0500)]
NFS: sem2mutex idmap.c

semaphore to mutex conversion.

the conversion was generated via scripts, and the result was validated
automatically via a script as well.

build and boot tested.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: kzalloc conversion in fs/nfs
Eric Sesterhenn [Mon, 20 Mar 2006 18:44:10 +0000 (13:44 -0500)]
NFS: kzalloc conversion in fs/nfs

this converts fs/nfs to kzalloc() usage.
compile tested with make allyesconfig
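
The pattern being replaced is an allocation followed by a separate zeroing
step.  A userspace analogue, with malloc() plus memset() standing in for
kmalloc() plus memset() and calloc() standing in for kzalloc(), might look
like this; struct item and both helpers are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct item {
    int id;
    char name[16];
};

/* Old pattern: allocate, then zero by hand. */
struct item *item_alloc_old(void)
{
    struct item *p = malloc(sizeof(*p));

    if (p != NULL)
        memset(p, 0, sizeof(*p));
    return p;
}

/* New pattern: one call that returns zeroed memory. */
struct item *item_alloc_new(void)
{
    return calloc(1, sizeof(struct item));
}
```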

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFSv4: Kill braindead gcc warnings
Trond Myklebust [Mon, 20 Mar 2006 18:44:10 +0000 (13:44 -0500)]
NFSv4: Kill braindead gcc warnings

nfs4_open_revalidate: 'res' may be used uninitialized
nfs4_callback_compound: ‘hdr_res.nops’ may be used uninitialized
‘op_nr’ may be used uninitialized
encode_getattr_res: ‘savep’ may be used uninitialized

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state()
Trond Myklebust [Mon, 20 Mar 2006 18:44:09 +0000 (13:44 -0500)]
NFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state()

The reason is that the idmapper cleanup may call flush_workqueue() on
rpciod_workqueue.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: Ensure that rpc_mkpipe returns a refcounted dentry
Trond Myklebust [Mon, 20 Mar 2006 18:44:09 +0000 (13:44 -0500)]
SUNRPC: Ensure that rpc_mkpipe returns a refcounted dentry

If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and
clnt->cl_dentry are valid dentries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of generic
Trond Myklebust [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
SUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of generic

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoSUNRPC: Auto-load RPC authentication kernel modules
Olaf Kirch [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
SUNRPC: Auto-load RPC authentication kernel modules

This patch adds a request_module call to rpcauth_create which will try
to auto-load the kernel module for the requested authentication flavor.
For kernels with modular sunrpc, this reduces the admin overhead for
the user.

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: reduce the number of false cache invalidations.
Trond Myklebust [Mon, 20 Mar 2006 18:44:08 +0000 (13:44 -0500)]
NFS: reduce the number of false cache invalidations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: "const static" vs "static const" in nfs4
Jesper Juhl [Mon, 20 Mar 2006 18:44:07 +0000 (13:44 -0500)]
NFS: "const static" vs "static const" in nfs4

My previous "const static" vs "static const" cleanup missed a single case;
the patch below takes care of it.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFSv4: Don't invalidate cached attributes if change attribute is unchanged
Trond Myklebust [Mon, 20 Mar 2006 18:44:07 +0000 (13:44 -0500)]
NFSv4: Don't invalidate cached attributes if change attribute is unchanged

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: writes should not clobber utimes() calls
Trond Myklebust [Mon, 20 Mar 2006 18:44:06 +0000 (13:44 -0500)]
NFS: writes should not clobber utimes() calls

Ensure that we flush out writes in the case when someone calls utimes() in
order to set the file times.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agolockd: Don't expose the process pid to the NLM server
Trond Myklebust [Mon, 20 Mar 2006 18:44:06 +0000 (13:44 -0500)]
lockd: Don't expose the process pid to the NLM server

Instead we use the nlm_lockowner->pid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNLM: nlm_alloc_call should not immediately fail on signal
Trond Myklebust [Mon, 20 Mar 2006 18:44:05 +0000 (13:44 -0500)]
NLM: nlm_alloc_call should not immediately fail on signal

Currently, nlm_alloc_call tests for a signal before it even tries to
allocate memory.
Fix it so that it tries at least once.
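
The reordering can be sketched as a loop that allocates first and checks
for a signal only after a failure.  This is a userspace model:
alloc_retry() and the two stub functions are hypothetical, and the
allocator and signal check are passed in so the behaviour can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Stub: pretend a signal is always pending. */
int always_signalled(void)
{
    return 1;
}

/* Stub: an allocator that always fails. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Try the allocation at least once, even if a signal is already pending.
 * Give up only when an attempt has failed and a signal is pending.
 * attempts counts how many times the allocator was called.
 */
void *alloc_retry(size_t size, int (*signalled)(void),
                  void *(*alloc)(size_t), int *attempts)
{
    assert(signalled != NULL && alloc != NULL && attempts != NULL);
    for (;;) {
        void *p = alloc(size);

        (*attempts)++;
        if (p != NULL)
            return p;
        if (signalled())
            return NULL;
        /* Real code would sleep for a while here before retrying. */
    }
}
```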

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoVFS: Fix __posix_lock_file() copy of private lock area
Trond Myklebust [Mon, 20 Mar 2006 18:44:05 +0000 (13:44 -0500)]
VFS: Fix __posix_lock_file() copy of private lock area

The struct file_lock->fl_u area must be copied using the fl_copy_lock()
operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Fix buglet in fs/nfs/write.c
Neil Brown [Mon, 20 Mar 2006 18:44:04 +0000 (13:44 -0500)]
NFS: Fix buglet in fs/nfs/write.c

I've been reading through fs/nfs/write.c trying to track down a bug
that seems to be related to pages losing a refcount and getting
freed too early (you interested in detail??) and I spotted a little
bug which the following patch should fix.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Avoid races between writebacks and truncation
Trond Myklebust [Mon, 20 Mar 2006 18:44:04 +0000 (13:44 -0500)]
NFS: Avoid races between writebacks and truncation

Currently, there is no serialisation between NFS asynchronous writebacks
and truncation at the page level due to the fact that nfs_sync_inode()
cannot lock the pages that it is about to write out.

This means that it is possible to be flushing out data (and calling something
like set_page_writeback()) while the page cache is busy evicting the page.
Oops...

Use the hooks provided in try_to_release_page() to ensure that dirty pages
are always written back to storage before we evict them.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Fix a busy inodes issue...
Trond Myklebust [Mon, 20 Mar 2006 18:44:03 +0000 (13:44 -0500)]
NFS: Fix a busy inodes issue...

The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.

Make a couple of functions static while we're at it...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoLinux 2.6.16 v2.6.16
Linus Torvalds [Mon, 20 Mar 2006 05:53:29 +0000 (21:53 -0800)]
Linux 2.6.16

18 years ago[PATCH] Remove obsolete CREDITS address
Andrea Arcangeli [Sun, 19 Mar 2006 18:04:17 +0000 (19:04 +0100)]
[PATCH] Remove obsolete CREDITS address

This address is going to be obsolete, so I should update it.

18 years agoMerge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Mon, 20 Mar 2006 05:12:00 +0000 (21:12 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.
  [MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
  [MIPS] Sibyte: Fix interrupt timer off by one bug.
  [MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
  [MIPS] Protect more of timer_interrupt() by xtime_lock.
  [MIPS] Work around bad code generation for <asm/io.h>.
  [MIPS] Simple patch to power off DBAU1200
  [MIPS] Fix DBAu1550 software power off.
  [MIPS] local_r4k_flush_cache_page fix
  [MIPS] SB1: Fix interrupt disable hazard.
  [MIPS] Get rid of the IP22-specific code in arclib.
  Update MAINTAINERS entry for MIPS.

18 years ago[TG3]: 40-bit DMA workaround part 2
Michael Chan [Sun, 19 Mar 2006 21:21:12 +0000 (13:21 -0800)]
[TG3]: 40-bit DMA workaround part 2

The 40-bit DMA workaround recently implemented for 5714, 5715, and
5780 needs to be expanded because there may be other tg3 devices
behind the EPB Express to PCIX bridge in the 5780 class device.

For example, some 4-port card or mother board designs have 5704 behind
the 5714.

All devices behind the EPB require the 40-bit DMA workaround.

Thanks to Chris Elmquist again for reporting the problem and testing
the patch.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[AX.25]: Fix potential memory hole.
Ralf Baechle DL5RB [Sun, 19 Mar 2006 21:20:06 +0000 (13:20 -0800)]
[AX.25]: Fix potential memory hole.

If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3
(or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall
through the switch statement without calling ax25_send_iframe() or any
other function that would eventually free skbn thus leaking the packet.

Fix by restricting the sysctl interface to allow only actually supported
AX.25 dialects.

The system administration mistake needed for this to happen is rather
unlikely, so this is an uncritical hole.

Coverity #651.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PATCH] Kconfig: swap VIDEO_CX88_ALSA and VIDEO_CX88_DVB
Michael Krufky [Wed, 15 Mar 2006 05:36:13 +0000 (02:36 -0300)]
[PATCH] Kconfig: swap VIDEO_CX88_ALSA and VIDEO_CX88_DVB

VIDEO_CX88_ALSA should not be between VIDEO_CX88_DVB and
VIDEO_CX88_DVB_ALL_FRONTENDS

When cx88-alsa was added to cx88/Kconfig, it was added in between
VIDEO_CX88_DVB and VIDEO_CX88_DVB_ALL_FRONTENDS.  This caused
undesirable effects to the appearance of the menu options in
menuconfig.

This fix reorders cx88-alsa and cx88-dvb in Kconfig, to match saa7134,
and restore the correct menuconfig appearance.

Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] Fixed em28xx based system lockup
Markus Rechberger [Tue, 7 Feb 2006 10:49:13 +0000 (08:49 -0200)]
[PATCH] Fixed em28xx based system lockup

Fixed em28xx based system lockup: the device needs to be initialized before
starting the isoc transfer, otherwise the system will completely lock up.

Signed-off-by: Markus Rechberger <mrechberger@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] disable unshare(CLONE_VM) for now
Oleg Nesterov [Sat, 18 Mar 2006 17:41:10 +0000 (20:41 +0300)]
[PATCH] disable unshare(CLONE_VM) for now

sys_unshare() does mmput(new_mm).  This is not enough if we have
mm->core_waiters.

This patch is a temporary fix for soon to be released 2.6.16.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
[ Checked with Uli: "I'm not planning to use unshare(CLONE_VM).  It's
  not needed for any functionality planned so far.  What we (as in Red
  Hat) need unshare() for now is the filesystem side." ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.
Ralf Baechle [Sat, 18 Mar 2006 16:59:31 +0000 (16:59 +0000)]
[MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
Ralf Baechle [Wed, 15 Mar 2006 00:03:29 +0000 (00:03 +0000)]
[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining
value, however once this counter reaches 0 and the interrupt is raised,
it immediately resets and begins to count down again.

If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after
the timer has reset but prior to cpu 0 processing the interrupt and
taking write_seqlock() in timer_interrupt() it will return a full value
(or close to it) causing time to jump backwards 1ms. Once cpu 0 handles
the interrupt and timer_interrupt() gets far enough along it will jump
forward 1ms.

Fix this problem by implementing mips_hpt_*() on sb1250 using a spare
timer unrelated to the existing periodic interrupt timers. It runs at
1MHz with a full 23-bit counter.  This eliminated the custom
do_gettimeoffset() for sb1250 and allowed use of the generic
fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] Sibyte: Fix interrupt timer off by one bug.
Ralf Baechle [Tue, 14 Mar 2006 23:52:47 +0000 (23:52 +0000)]
[MIPS] Sibyte: Fix interrupt timer off by one bug.

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

The timers need to be loaded with 1 less than the desired interval not
the interval itself.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
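The off-by-one described above can be illustrated with a small model. This is a sketch under an assumption the commit message does not spell out: that the counter holds every value from the reload value down to 0 for one tick each before wrapping. The helper names are hypothetical and are not taken from the Sibyte code.

```c
#include <assert.h>
#include <stdint.h>

/* Reload value for a period of interval_ticks (hypothetical helper). */
static uint32_t timer_reload_value(uint32_t interval_ticks)
{
    assert(interval_ticks >= 1);
    return interval_ticks - 1;
}

/*
 * Model of one period: the counter holds reload, reload - 1, ..., 0
 * for one tick each and then wraps, so a period lasts reload + 1 ticks.
 */
static uint32_t period_in_ticks(uint32_t reload)
{
    uint32_t ticks = 0;
    uint32_t count = reload;

    for (;;) {
        ticks++;
        if (count == 0)
            break;
        count--;
    }
    return ticks;
}
```

Under this model, loading the interval itself gives a period that is one tick too long, while loading interval - 1 gives exactly the desired period.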
18 years ago[MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
Ralf Baechle [Tue, 14 Mar 2006 23:47:35 +0000 (23:47 +0000)]
[MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

Field width should be 23 bits not 20 bits.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
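To show why the field width matters, here is a minimal sketch of building a field mask from a width. The helper field_mask() is hypothetical and is not how the M_SCD_TIMER_* macros are defined in the kernel headers; it only shows that a 20-bit mask drops the top three bits of a 23-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `width` bits (hypothetical helper). */
static uint64_t field_mask(unsigned int width)
{
    assert(width >= 1 && width < 64);
    return (UINT64_C(1) << width) - 1;
}
```

For example, field_mask(23) is 0x7FFFFF and field_mask(20) is 0xFFFFF, so any timer value above 0xFFFFF would be truncated by the narrower mask.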
18 years ago[MIPS] Protect more of timer_interrupt() by xtime_lock.
Ralf Baechle [Tue, 14 Mar 2006 23:46:58 +0000 (23:46 +0000)]
[MIPS] Protect more of timer_interrupt() by xtime_lock.

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

* do_timer() expects the arch-specific handler to take the lock as it
  modifies jiffies[_64] and xtime.
* writing timerhi/lo in timer_interrupt() will mess up
  fixed_rate_gettimeoffset() which reads timerhi/lo.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] Work around bad code generation for <asm/io.h>.
Ralf Baechle [Wed, 15 Mar 2006 11:36:31 +0000 (11:36 +0000)]
[MIPS] Work around bad code generation for <asm/io.h>.

If a call to set_io_port_base() was being followed by usage of
mips_io_port_base in the same function gcc was possibly using the old
value due to some clever abuse of const.  Adding a barrier will keep
the optimization and result in correct code with latest gcc.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] Simple patch to power off DBAU1200
Matej Kupljen [Wed, 30 Nov 2005 09:20:01 +0000 (10:20 +0100)]
[MIPS] Simple patch to power off DBAU1200

Signed-off-by: Matej Kupljen <matej.kupljen@ultra.si>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] Fix DBAu1550 software power off.
Sergei Shtylyov [Tue, 14 Mar 2006 04:20:00 +0000 (07:20 +0300)]
[MIPS] Fix DBAu1550 software power off.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] local_r4k_flush_cache_page fix
Atsushi Nemoto [Mon, 13 Mar 2006 09:23:03 +0000 (18:23 +0900)]
[MIPS] local_r4k_flush_cache_page fix

If dcache_size != icache_size or dcache_size != scache_size, or with a
set-associative cache, the icache/scache is not flushed properly.  Make
blast_?cache_page_indexed() mask its index value correctly.  Also,
use the physical address for physically indexed pcache/scache.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] SB1: Fix interrupt disable hazard.
Ralf Baechle [Mon, 13 Mar 2006 16:16:29 +0000 (16:16 +0000)]
[MIPS] SB1: Fix interrupt disable hazard.

The SB1 core has a three cycle interrupt disable hazard but we were
wrongly treating it as fully interlocked.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[MIPS] Get rid of the IP22-specific code in arclib.
Ralf Baechle [Fri, 10 Mar 2006 19:47:17 +0000 (19:47 +0000)]
[MIPS] Get rid of the IP22-specific code in arclib.

This breaks the kernel build if sgiwd93 was configured as a module.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years agoUpdate MAINTAINERS entry for MIPS.
Ralf Baechle [Fri, 10 Mar 2006 13:47:21 +0000 (13:47 +0000)]
Update MAINTAINERS entry for MIPS.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
18 years ago[NET]: Fix race condition in sk_wait_event().
Alexey Kuznetsov [Sat, 18 Mar 2006 00:05:43 +0000 (16:05 -0800)]
[NET]: Fix race condition in sk_wait_event().

It is broken: the condition is checked outside of the socket lock. It is
a wonder the bug survived for such a long time.

[ This fixes bugzilla #6233:
  race condition in tcp_sendmsg when connection became established ]

Signed-off-by: David S. Miller <davem@davemloft.net>
18 years ago[PATCH] fix free swap cache latency
Hugh Dickins [Fri, 17 Mar 2006 07:04:09 +0000 (23:04 -0800)]
[PATCH] fix free swap cache latency

Lee Revell reported 28ms latency when a process with lots of swapped memory
exits.

2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all.  We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.

Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
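The accounting change described above can be sketched as a budget that is decremented per page table entry. This is an illustrative model only: the enum, the function names, and the 4096-byte page size are assumptions, and the real code in the unmap path is structured differently. It follows the costs given in the message: an empty entry costs 1, a present page costs PAGE_SIZE, and after the fix a swap entry costs the same as a present page.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096L   /* assumed page size, for illustration */

enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP };

/* Cost charged against the latency budget for one entry. */
static long zap_cost(enum pte_kind kind)
{
    return kind == PTE_NONE ? 1 : PAGE_SIZE_SKETCH;
}

/*
 * Process entries until the budget runs out and return how many were
 * handled; the caller would take a latency break and then resume.
 */
static size_t zap_until_break(const enum pte_kind *ptes, size_t n, long zap_work)
{
    size_t i = 0;

    assert(ptes != NULL || n == 0);
    while (i < n && zap_work > 0) {
        zap_work -= zap_cost(ptes[i]);
        i++;
    }
    return i;
}
```

With swap entries charged at zero, as before the fix, a long run of them would never exhaust the budget, which matches the unbounded latency Lee Revell reported.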
18 years ago[PATCH] kbuild: fix buffer overflow in modpost
Sam Ravnborg [Fri, 17 Mar 2006 07:04:08 +0000 (23:04 -0800)]
[PATCH] kbuild: fix buffer overflow in modpost

Jiri Benc <jbenc@suse.cz> reported that modpost would stop with SIGABRT if
used with long filepaths.
The error looked like:
>   Building modules, stage 2.
>   MODPOST
> *** glibc detected *** scripts/mod/modpost: realloc(): invalid next size:
+0x0809f588 ***
> [...]

Fix this by allocating at least the required memory + SZ bytes each time.
Before we sometimes ended up allocating too little memory resulting in the
glibc detected bug above.  Based on patch originally submitted by: Jiri
Benc <jbenc@suse.cz>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
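The growth rule described above ("at least the required memory + SZ bytes each time") can be sketched as follows. This is a hedged userspace illustration, not the modpost source: the struct layout, the function name buf_write_sketch(), and the value chosen for SZ are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SZ 500   /* assumed slack added on every growth step */

struct buffer {
    char *p;      /* start of the allocation */
    size_t pos;   /* bytes used so far */
    size_t size;  /* bytes allocated */
};

/* Append len bytes, growing by at least len + SZ when space runs out. */
static int buf_write_sketch(struct buffer *buf, const char *s, size_t len)
{
    assert(buf != NULL && s != NULL);

    if (buf->size - buf->pos < len) {
        size_t new_size = buf->size + len + SZ;
        char *np = realloc(buf->p, new_size);
        if (np == NULL)
            return -1;
        buf->p = np;
        buf->size = new_size;
    }
    memcpy(buf->p + buf->pos, s, len);
    buf->pos += len;
    return 0;
}
```

Because the new size always includes the full length of the pending write, a single long path can no longer overrun the allocation, whatever the fixed increment is.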
18 years ago[PATCH] nfsservctl(): remove user-triggerable printk
Peter Staubach [Fri, 17 Mar 2006 07:04:02 +0000 (23:04 -0800)]
[PATCH] nfsservctl(): remove user-triggerable printk

A user can use nfsservctl() to spam the logs.

This can happen because the arguments to the nfsservctl() system call are
versioned.  This is a good thing.  However, when a bad version is detected,
the kernel prints a message and then returns an error.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix race in pagevec_strip?
Christoph Lameter [Fri, 17 Mar 2006 07:04:07 +0000 (23:04 -0800)]
[PATCH] fix race in pagevec_strip?

We can call try_to_release_page() with PagePrivate off and a valid
page->mapping.  This may cause all sorts of trouble for the filesystem
*_releasepage() handlers.  XFS bombs out in that case.

Lock the page before checking for page private.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] dm stripe: Fix bounds
Kevin Corry [Fri, 17 Mar 2006 07:04:03 +0000 (23:04 -0800)]
[PATCH] dm stripe: Fix bounds

The dm-stripe target currently does not enforce that the size of a stripe
device be a multiple of the chunk-size.  Under certain conditions, this can
lead to I/O requests going off the end of an underlying device.  This
test-case shows one example.

echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
   dmsetup create stripe0
dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k

This will produce the output:
dd: writing '/dev/mapper/stripe0': Input/output error
97+0 records in
96+0 records out

And in the kernel log will be:
attempt to access beyond end of device
dm-0: rw=0, want=104, limit=100

The patch will check that the table size is a multiple of the stripe
chunk-size when the table is created, which will prevent the above striped
device from being created.

This should not affect tools like LVM or EVMS, since in all the cases I can
think of, striped devices are always created with the sizes being a
multiple of the chunk-size.

The size of a stripe device must be a multiple of its chunk-size.

(akpm: that typecast is quite gratuitous)

Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
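The check described above can be sketched as a simple divisibility test at table-creation time. The function name and parameter names are hypothetical, and the dm-stripe constructor may express the test differently. The numbers in the usage note come from the test case in the commit message (a 200-sector table with a chunk size of 32).

```c
#include <stdint.h>

/*
 * Return 1 if a striped table of len_sectors may be created with the
 * given chunk size, 0 otherwise (hypothetical helper).
 */
static int stripe_len_valid(uint64_t len_sectors, uint32_t chunk_sectors)
{
    if (chunk_sectors == 0)
        return 0;
    return len_sectors % chunk_sectors == 0;
}
```

Under this rule the 200-sector table from the test case is rejected, since 200 is not a multiple of 32, while a 192-sector table would be accepted.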
18 years ago[PATCH] x86: check for online cpus before bringing them up
Srivatsa Vaddagiri [Fri, 17 Mar 2006 07:04:06 +0000 (23:04 -0800)]
[PATCH] x86: check for online cpus before bringing them up

Bryce reported a bug wherein offlining CPU0 (on x86 box) and then
subsequently onlining it resulted in a lockup.

On x86, CPU0 is never offlined.  The subsequent attempt to online CPU0
doesn't take that into account.  It actually tries to boot up the already
booted CPU.  The following patch fixes the problem (as acknowledged by Bryce).
Please consider for inclusion in 2.6.16.

Check if cpu is already online.

Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] v9fs: fix overzealous dropping of dentry which breaks dcache
Eric Van Hensbergen [Fri, 17 Mar 2006 07:04:04 +0000 (23:04 -0800)]
[PATCH] v9fs: fix overzealous dropping of dentry which breaks dcache

There is a d_drop in dir_release which caused problems as it invalidates
dcache entries too soon.  This was likely a part of the weird cwd behavior
folks were seeing.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] posix-timers: fix requeue accounting when signal is ignored
Roman Zippel [Fri, 17 Mar 2006 07:04:01 +0000 (23:04 -0800)]
[PATCH] posix-timers: fix requeue accounting when signal is ignored

When the posix-timer signal is ignored then the timer is rearmed by the
callback function.  The requeue pending accounting has to be fixed up else
the state might be wrong.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] time_interpolator: add __read_mostly
Christoph Lameter [Fri, 17 Mar 2006 07:04:00 +0000 (23:04 -0800)]
[PATCH] time_interpolator: add __read_mostly

The pointer to the current time interpolator and the current list of time
interpolators are typically only changed during bootup.  Adding
__read_mostly takes them away from possibly hot cachelines.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] page migration: Fail with error if swap not setup
Christoph Lameter [Fri, 17 Mar 2006 07:03:59 +0000 (23:03 -0800)]
[PATCH] page migration: Fail with error if swap not setup

Currently the migration of anonymous pages will silently fail if no swap is
set up.  This patch makes page migration functions check for available swap
and fail with -ENODEV if no swap space is available.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] unshare: Use rcu_assign_pointer when setting sighand
Eric W. Biederman [Thu, 16 Mar 2006 17:31:38 +0000 (10:31 -0700)]
[PATCH] unshare: Use rcu_assign_pointer when setting sighand

The sighand pointer only needs the rcu_read_lock on the
read side.  So only depending on task_lock protection
when setting this pointer is not enough.  We also need
a memory barrier to ensure the initialization is seen first.

Use rcu_assign_pointer as it does this for us, and clearly
documents that we are setting an rcu readable pointer.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
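The ordering requirement described above (initialise the structure, then publish the pointer with a barrier) has a rough userspace analogue in C11 atomics. The sketch below is not the kernel's rcu_assign_pointer() and does not model RCU grace periods. The struct and function names are hypothetical, and a release store paired with an acquire load is used as a stand-in for the barrier semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure being published. */
struct sighand_sketch {
    int count;
};

static _Atomic(struct sighand_sketch *) current_sighand;

/* Initialise first, then publish with release ordering. */
static void publish_sighand(struct sighand_sketch *new_sh)
{
    assert(new_sh != NULL);
    new_sh->count = 1;
    atomic_store_explicit(&current_sighand, new_sh, memory_order_release);
}

/* A reader that sees the pointer also sees the initialised fields. */
static struct sighand_sketch *read_sighand(void)
{
    return atomic_load_explicit(&current_sighand, memory_order_acquire);
}
```

A plain pointer store would let the compiler or CPU reorder the publication ahead of the initialisation, which is the hazard the commit message points at when it says task_lock alone is not enough.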
18 years ago[netdrvr] fix array overflows in Chelsio driver
Scott Bardone [Fri, 17 Mar 2006 00:20:40 +0000 (19:20 -0500)]
[netdrvr] fix array overflows in Chelsio driver

Adrian Bunk wrote:
> The Coverity checker spotted the following two array overflows in
> drivers/net/chelsio/sge.c (in both cases, the arrays contain 3
> elements):
[snip]

This is a bug. The array should contain 2 elements.  Here is the fix.

Signed-off-by: Scott Bardone <sbardone@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
18 years ago[PATCH] e1000 endianness bugs
David S. Miller [Wed, 15 Mar 2006 22:26:28 +0000 (14:26 -0800)]
[PATCH] e1000 endianness bugs

return -E_NO_BIG_ENDIAN_TESTING;

[E1000]: Fix 4 missed endianness conversions on RX descriptor fields.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
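As an illustration of what a missed endianness conversion looks like, here is a portable sketch of reading little-endian fields from raw descriptor bytes. It assumes the RX descriptor fields are little-endian, which the commit message does not state, and the helper names are hypothetical; the driver itself would use the kernel's own conversion helpers.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian field regardless of host byte order. */
static uint16_t le16_to_host(const uint8_t b[2])
{
    return (uint16_t)((uint16_t)b[0] | ((uint16_t)b[1] << 8));
}

/* Decode a 32-bit little-endian field regardless of host byte order. */
static uint32_t le32_to_host(const uint8_t b[4])
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}
```

Reading such a field directly as a native integer gives the right answer on a little-endian host and a byte-swapped one on a big-endian host, which is why this class of bug tends to go unnoticed without big-endian testing.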
18 years agoMerge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linvil...
Jeff Garzik [Fri, 17 Mar 2006 00:16:59 +0000 (19:16 -0500)]
Merge branch 'upstream-fixes' of git://git./linux/kernel/git/linville/wireless-2.6