pandora-kernel.git
11 years ago  NFS: Split out remaining NFS v4 inode functions
Bryan Schumaker [Mon, 30 Jul 2012 20:05:21 +0000 (16:05 -0400)]
NFS: Split out remaining NFS v4 inode functions

Somehow I missed this in my previous patch series, but these functions
are only needed by the v4 code and should be moved to a v4-only file.  I
wasn't exactly sure where I should put these functions, so I moved them
into nfs4super.c where I could make them static.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Pass super operations and xattr handlers in the nfs_subversion
Bryan Schumaker [Mon, 30 Jul 2012 20:05:20 +0000 (16:05 -0400)]
NFS: Pass super operations and xattr handlers in the nfs_subversion

I can set all variables in the nfs_fill_super() function, allowing me to
remove the nfs4_fill_super() function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Only initialize the ACL client in the v3 case
Bryan Schumaker [Mon, 30 Jul 2012 20:05:19 +0000 (16:05 -0400)]
NFS: Only initialize the ACL client in the v3 case

v2 and v4 don't use it, so I create two new nfs_rpc_ops functions to
initialize the ACL client only when we are using v3.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Create a try_mount rpc op
Bryan Schumaker [Mon, 30 Jul 2012 20:05:18 +0000 (16:05 -0400)]
NFS: Create a try_mount rpc op

I'm already looking up the nfs subversion in nfs_fs_mount(), so I have
easy access to rpc_ops that used to be difficult to reach.  This allows
me to set up a different mount path for NFS v2/3 and NFS v4.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Remove the NFS v4 xdev mount function
Bryan Schumaker [Mon, 30 Jul 2012 20:05:17 +0000 (16:05 -0400)]
NFS: Remove the NFS v4 xdev mount function

I can now share this code with the v2 and v3 code by using the NFS
subversion structure.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Add version registering framework
Bryan Schumaker [Mon, 30 Jul 2012 20:05:16 +0000 (16:05 -0400)]
NFS: Add version registering framework

This patch adds in the code to track multiple versions of the NFS
protocol.  I created default structures for v2, v3 and v4 so that each
version can continue to work while I convert them into kernel modules.
I also removed the const qualifier from the rpc_version array so that I
can change it at runtime.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
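
As an illustration of the version-registering idea in the commit above, here is a minimal user-space sketch of a registry keyed by protocol version. The struct and every function name ending in _sketch are hypothetical and are not taken from the kernel source; real code would also take a lock around the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-version descriptor; fields are illustrative. */
struct nfs_subversion_sketch {
    unsigned int version;                 /* e.g. 2, 3 or 4 */
    const char *name;
    struct nfs_subversion_sketch *next;   /* singly linked registry */
};

static struct nfs_subversion_sketch *registered_versions;

/* Add a version descriptor to the head of the registry. */
void register_version_sketch(struct nfs_subversion_sketch *v)
{
    assert(v != NULL);
    v->next = registered_versions;
    registered_versions = v;
}

/* Remove a descriptor from the registry if it is present. */
void unregister_version_sketch(struct nfs_subversion_sketch *v)
{
    struct nfs_subversion_sketch **p = &registered_versions;

    while (*p != NULL) {
        if (*p == v) {
            *p = v->next;
            v->next = NULL;
            return;
        }
        p = &(*p)->next;
    }
}

/* Look up a descriptor by version number; NULL means "not registered". */
struct nfs_subversion_sketch *find_version_sketch(unsigned int version)
{
    struct nfs_subversion_sketch *v;

    for (v = registered_versions; v != NULL; v = v->next)
        if (v->version == version)
            return v;
    return NULL;
}
```

A mount path built on such a registry would look up the requested version first and fail early when it is not registered.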
11 years ago  NFS: Fix a number of bugs in the idmapper
David Howells [Wed, 25 Jul 2012 15:53:36 +0000 (16:53 +0100)]
NFS: Fix a number of bugs in the idmapper

Fix a number of bugs in the NFS idmapper code:

 (1) Only registered key types can be passed to the core keys code, so
     register the legacy idmapper key type.

     This is a requirement because the unregister function cleans up keys
     belonging to that key type so that there aren't dangling pointers to the
     module left behind - including the key->type pointer.

 (2) Rename the legacy key type.  You can't have two key types with the same
     name, and (1) would otherwise require that.

 (3) complete_request_key() must be called in the error path of
     nfs_idmap_legacy_upcall().

 (4) There is one idmap struct for each nfs_client struct.  This means that
     idmap->idmap_key_cons is shared without the use of a lock.  This is a
     problem because key_instantiate_and_link() - as called indirectly by
     idmap_pipe_downcall() - releases anyone waiting for the key to be
     instantiated.

     What happens is that idmap_pipe_downcall(), running in the rpc.idmapd
     thread, releases the NFS filesystem code to continue in whatever thread
     it is running in.  That code may then make another idmapper call,
     overwriting idmap_key_cons before idmap_pipe_downcall() gets the chance
     to call complete_request_key().

     I *think* that reading idmap_key_cons only once, before
     key_instantiate_and_link() is called, and then caching the result in a
     variable is sufficient.

Bug (4) is the cause of:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] SMP
CPU 1
Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff8801a3645d40  EFLAGS: 00010246
RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
FS:  00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
Stack:
 ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
Call Trace:
 [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100
 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0
 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
 [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc]
 [<ffffffff8117f833>] vfs_write+0xb3/0x180
 [<ffffffff8117fb5a>] sys_write+0x4a/0x90
 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8801a3645d40>
CR2: 0000000000000000

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.4]
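
The remedy proposed for bug (4), reading the shared pointer once and caching it in a local variable, can be sketched in plain C. This is a single-threaded simulation under stated assumptions: the types and the *_sketch functions are hypothetical stand-ins, and the "waiter" that overwrites the shared field is modelled by a direct assignment rather than a second thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the construction record and the idmap struct. */
struct key_cons_sketch { int completed; };
struct idmap_sketch { struct key_cons_sketch *idmap_key_cons; };

/*
 * Models the instantiate step: it releases a waiter, and that waiter may
 * immediately start a new request and overwrite the shared field.
 */
static void instantiate_and_wake_sketch(struct idmap_sketch *idmap,
                                        struct key_cons_sketch *next_request)
{
    idmap->idmap_key_cons = next_request;
}

static void complete_request_sketch(struct key_cons_sketch *cons)
{
    cons->completed = 1;
}

/* Read the shared pointer once, before anyone can be woken, then use the copy. */
void downcall_sketch(struct idmap_sketch *idmap,
                     struct key_cons_sketch *next_request)
{
    struct key_cons_sketch *cons = idmap->idmap_key_cons;

    assert(cons != NULL);
    instantiate_and_wake_sketch(idmap, next_request);
    complete_request_sketch(cons);   /* completes the original request */
}
```

Had the function re-read idmap->idmap_key_cons after the instantiate step, it would have completed the second request instead of the first.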
11 years ago  nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
Jeff Layton [Mon, 23 Jul 2012 17:58:51 +0000 (13:58 -0400)]
nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
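
The flag-based approach described above can be sketched in user-space C as follows. The task struct, the flag value and the helper names are assumptions made for illustration; only the idea (mark the task before allocating, and have the release path skip the blocking commit when the mark is set) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PF_FSTRANS 0x1u   /* hypothetical bit value */

/* Hypothetical stand-in for the per-task flags word. */
struct task_sketch { unsigned int flags; };

/* Mark the task as being inside a filesystem transaction; return the old bit. */
unsigned int fstrans_enter_sketch(struct task_sketch *t)
{
    unsigned int saved;

    assert(t != NULL);
    saved = t->flags & SKETCH_PF_FSTRANS;
    t->flags |= SKETCH_PF_FSTRANS;
    return saved;
}

/* Restore the bit to the state it had before fstrans_enter_sketch(). */
void fstrans_exit_sketch(struct task_sketch *t, unsigned int saved)
{
    assert(t != NULL);
    t->flags = (t->flags & ~SKETCH_PF_FSTRANS) | saved;
}

/* The release path may issue a blocking commit only when the mark is clear. */
bool may_block_on_commit_sketch(const struct task_sketch *t)
{
    assert(t != NULL);
    return (t->flags & SKETCH_PF_FSTRANS) == 0;
}
```

Saving and restoring the previous bit keeps nested enter/exit pairs from clearing a mark that an outer caller set.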
11 years ago  sunrpc: clarify comments on rpc_make_runnable
Jeff Layton [Mon, 23 Jul 2012 19:51:55 +0000 (15:51 -0400)]
sunrpc: clarify comments on rpc_make_runnable

rpc_make_runnable is not generally called with the queue lock held, unless
it's waking up a task that has been sitting on a waitqueue. This is safe
when the task has not entered the FSM yet, but the comments don't really
spell this out.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  pnfsblock: bail out partial page IO
Peng Tao [Tue, 29 May 2012 05:57:58 +0000 (13:57 +0800)]
pnfsblock: bail out partial page IO

The current block layout driver read/write code assumes page-aligned
I/O in many places. Add a check to validate that assumption.
Otherwise data corruption can occur, for example when an application
does open(O_WRONLY) followed by a page-unaligned write.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
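
A page-alignment check of the kind the commit describes might look like the sketch below. The 4096-byte page size and the function name are assumptions for illustration; the real driver would use the kernel's own page size constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* True when both the start offset and the length fall on page boundaries. */
bool is_page_aligned_io_sketch(uint64_t offset, uint64_t count)
{
    /* The modulo test below is only meaningful for a non-zero page size. */
    assert(SKETCH_PAGE_SIZE != 0);
    return (offset % SKETCH_PAGE_SIZE) == 0 &&
           (count % SKETCH_PAGE_SIZE) == 0;
}
```

A caller would fall back to another I/O path whenever the check returns false rather than proceed with a partial page.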
11 years ago  nfs: fix fl_type tests in NFSv4 code
Jeff Layton [Mon, 23 Jul 2012 19:49:56 +0000 (15:49 -0400)]
nfs: fix fl_type tests in NFSv4 code

fl_type is not a bitmap.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
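
Why a bitwise test on fl_type is wrong can be shown with a small sketch. The constants below are local to the example; their values mirror the common Linux definitions of F_RDLCK, F_WRLCK and F_UNLCK, but that correspondence is an assumption here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock types are enumerated values, not independent bits. */
enum lock_type_sketch { SK_RDLCK = 0, SK_WRLCK = 1, SK_UNLCK = 2 };

/* Broken: masking with a constant whose value is 0 can never be non-zero. */
bool is_read_lock_bitmask_sketch(int fl_type)
{
    return (fl_type & SK_RDLCK) != 0;
}

/* Correct: compare the whole value against the enumerator. */
bool is_read_lock_sketch(int fl_type)
{
    assert(fl_type == SK_RDLCK || fl_type == SK_WRLCK || fl_type == SK_UNLCK);
    return fl_type == SK_RDLCK;
}
```

With these values the bitmask version reports "not a read lock" for every input, including a genuine read lock.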
11 years ago  NFS: fix pnfs regression with directio writes
Fred Isaman [Wed, 18 Jul 2012 18:20:50 +0000 (14:20 -0400)]
NFS: fix pnfs regression with directio writes

Commit 57208fa7e51 "NFS: Create an write_pageio_init() function"
did not modify the calls in direct.c, preventing direct io from
using pnfs.  This reintroduces that capability.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: fix pnfs regression with directio reads
Fred Isaman [Wed, 18 Jul 2012 18:20:49 +0000 (14:20 -0400)]
NFS: fix pnfs regression with directio reads

Commit 1abb50886af "NFS: Create an read_pageio_init() function"
did not modify the call in direct.c, preventing direct io from
using pnfs.  This reintroduces that capability.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  sunrpc: clnt: Add missing braces
Joe Perches [Wed, 18 Jul 2012 18:17:11 +0000 (11:17 -0700)]
sunrpc: clnt: Add missing braces

Add a missing set of braces that commit 4e0038b6b24
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.4]
11 years ago  nfs: fix stub return type warnings
Randy Dunlap [Fri, 27 Jul 2012 18:49:26 +0000 (11:49 -0700)]
nfs: fix stub return type warnings

Fix numerous repeated warnings by making the stub function
void instead of non-void:

fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl':
fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: exit_nfs_v4() shouldn't be an __exit function
Bryan Schumaker [Tue, 17 Jul 2012 19:18:30 +0000 (15:18 -0400)]
NFS: exit_nfs_v4() shouldn't be an __exit function

... yet.  Right now, init_nfs() is calling this function if an error is
encountered when loading the nfs module.  An __exit function can't be
called from one declared as __init.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
Trond Myklebust [Tue, 17 Jul 2012 18:47:30 +0000 (14:47 -0400)]
SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors

The patch "SUNRPC: Add rpcauth_list_flavors()" introduces a new error
path in gss_mech_list_pseudoflavors, but fails to release the spin lock.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
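
The class of bug fixed above, an error path that returns while still holding a lock, is commonly avoided by routing every exit through a single unlock label. The sketch below uses a toy lock and hypothetical names; it is not the kernel function, only an instance of the pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy lock that only records whether it is held. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(l->held == 0);
    l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held == 1);
    l->held = 0;
}

/*
 * Copy n entries into out[] under the lock. If the caller's array is too
 * small, fail with -ENOMEM, but still leave through the unlock label.
 */
int list_flavors_sketch(struct toy_lock *lock, const unsigned int *src,
                        size_t n, unsigned int *out, size_t cap)
{
    int ret;
    size_t i;

    toy_lock_acquire(lock);
    if (n > cap) {
        ret = -ENOMEM;
        goto out_unlock;        /* error path must not skip the release */
    }
    for (i = 0; i < n; i++)
        out[i] = src[i];
    ret = (int)n;
out_unlock:
    toy_lock_release(lock);
    return ret;
}
```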
11 years ago  NFS: Split out NFS v4 client functions
Bryan Schumaker [Mon, 16 Jul 2012 20:39:21 +0000 (16:39 -0400)]
NFS: Split out NFS v4 client functions

These functions are only needed by NFS v4, so they can be moved into a
v4 specific file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Split out the NFS v4 filesystem types
Bryan Schumaker [Mon, 16 Jul 2012 20:39:20 +0000 (16:39 -0400)]
NFS: Split out the NFS v4 filesystem types

This allows me to move the v4 mounting and unmounting functions out of
the generic client and into a file that is only compiled when CONFIG_NFS_V4
is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Create a single nfs_clone_super() function
Bryan Schumaker [Mon, 16 Jul 2012 20:39:19 +0000 (16:39 -0400)]
NFS: Create a single nfs_clone_super() function

v2 and v3 shared a function for this, but v4 implemented something only
slightly different.  Might as well share code whenever possible...

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Split out NFS v4 server creating code
Bryan Schumaker [Mon, 16 Jul 2012 20:39:18 +0000 (16:39 -0400)]
NFS: Split out NFS v4 server creating code

These functions are specific to NFS v4 and can be moved to nfs4client.c
to keep them out of the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Initialize the NFS v4 client from init_nfs_v4()
Bryan Schumaker [Mon, 16 Jul 2012 20:39:17 +0000 (16:39 -0400)]
NFS: Initialize the NFS v4 client from init_nfs_v4()

And split these functions out of the generic client into a v4 specific
file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Move the v4 getroot code to nfs4getroot.c
Bryan Schumaker [Mon, 16 Jul 2012 20:39:16 +0000 (16:39 -0400)]
NFS: Move the v4 getroot code to nfs4getroot.c

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Split out NFS v4 file operations
Bryan Schumaker [Mon, 16 Jul 2012 20:39:15 +0000 (16:39 -0400)]
NFS: Split out NFS v4 file operations

This patch moves the NFS v4 file functions into a new file that is only
compiled when CONFIG_NFS_V4 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Initialize v4 sysctls from nfs_init_v4()
Bryan Schumaker [Mon, 16 Jul 2012 20:39:14 +0000 (16:39 -0400)]
NFS: Initialize v4 sysctls from nfs_init_v4()

And split them out of the generic client into their own file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Create an init_nfs_v4() function
Bryan Schumaker [Mon, 16 Jul 2012 20:39:13 +0000 (16:39 -0400)]
NFS: Create an init_nfs_v4() function

I want to initialize all of NFS v4 in a single function that will
eventually be used as the v4 module init function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Split out NFS v4 inode operations
Bryan Schumaker [Mon, 16 Jul 2012 20:39:12 +0000 (16:39 -0400)]
NFS: Split out NFS v4 inode operations

The NFS v4 file inode operations are already in nfs4proc.c, so
this patch just needs to move the directory operations to the same file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Split out NFS v3 inode operations
Bryan Schumaker [Mon, 16 Jul 2012 20:39:11 +0000 (16:39 -0400)]
NFS: Split out NFS v3 inode operations

This patch moves the NFS v3 file and directory inode functions into
files that are only compiled when CONFIG_NFS_V3 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Split out NFS v2 inode operations
Bryan Schumaker [Mon, 16 Jul 2012 20:39:10 +0000 (16:39 -0400)]
NFS: Split out NFS v2 inode operations

This patch moves the NFS v2 file and directory inode functions into
files that are only compiled when CONFIG_NFS_V2 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Clean up nfs4_proc_setclientid() and friends
Chuck Lever [Wed, 11 Jul 2012 20:30:59 +0000 (16:30 -0400)]
NFS: Clean up nfs4_proc_setclientid() and friends

Add documenting comments and appropriate debugging messages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Treat NFS4ERR_CLID_INUSE as a fatal error
Chuck Lever [Wed, 11 Jul 2012 20:30:50 +0000 (16:30 -0400)]
NFS: Treat NFS4ERR_CLID_INUSE as a fatal error

For NFSv4 minor version 0, currently the cl_id_uniquifier allows the
Linux client to generate a unique nfs_client_id4 string whenever a
server replies with NFS4ERR_CLID_INUSE.

This implementation seems to be based on a flawed reading of RFC
3530.  NFS4ERR_CLID_INUSE actually means that the client has presented
this nfs_client_id4 string with a different principal at some time in
the past, and that lease is still in use on the server.

For a Linux client this might be rather difficult to achieve: the
authentication flavor is named right in the nfs_client_id4.id
string.  If we change flavors, we change strings automatically.

So, practically speaking, NFS4ERR_CLID_INUSE means there is some other
client using our string.  There is not much that can be done to
recover automatically.  Let's make it a permanent error.

Remove the recovery logic in nfs4_proc_setclientid(), and remove the
cl_id_uniquifier field from the nfs_client data structure.  And,
remove the authentication flavor from the nfs_client_id4 string.

Keeping the authentication flavor in the nfs_client_id4.id string
means that we could have a separate lease for each authentication
flavor used by mounts on the client.  But we want just one lease for
all the mounts on this client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: When state recovery fails, waiting tasks should exit
Chuck Lever [Wed, 11 Jul 2012 20:30:41 +0000 (16:30 -0400)]
NFS: When state recovery fails, waiting tasks should exit

NFSv4 state recovery is not always successful.  Failure is signalled
by setting the nfs_client.cl_cons_state to a negative (errno) value,
then waking waiters.

Currently this can happen only during mount processing.  I'm about to
add an explicit case where state recovery failure during normal
operation should force all NFS requests waiting on that state recovery
to exit.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  SUNRPC: Add rpcauth_list_flavors()
Chuck Lever [Wed, 11 Jul 2012 20:31:08 +0000 (16:31 -0400)]
SUNRPC: Add rpcauth_list_flavors()

The gss_mech_list_pseudoflavors() function provides a list of
currently registered GSS pseudoflavors.  This list does not include
any non-GSS flavors that have been registered with the RPC client.
nfs4_find_root_sec() currently adds these extra flavors by hand.

Instead, nfs4_find_root_sec() should be looking at the set of flavors
that have been explicitly registered via rpcauth_register().  And,
other areas of code will soon need the same kind of list that
contains all flavors the kernel currently knows about (see below).

Rather than cloning the open-coded logic in nfs4_find_root_sec() to
those new places, introduce a generic RPC function that generates a
full list of registered auth flavors and pseudoflavors.

A new rpc_authops method is added that lists a flavor's
pseudoflavors, if it has any.  I encountered an interesting module
loader loop when I tried to get the RPC client to invoke
gss_mech_list_pseudoflavors() by name.

This patch is a pre-requisite for server trunking discovery, and a
pre-requisite for fixing up the in-kernel mount client to do better
automatic security flavor selection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: nfs_getaclargs.acl_len is a size_t
Chuck Lever [Wed, 11 Jul 2012 20:30:32 +0000 (16:30 -0400)]
NFS: nfs_getaclargs.acl_len is a size_t

Squelch compiler warnings:

fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]

Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get
acl data", Dec 7, 2011.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Clean up TEST_STATEID and FREE_STATEID error reporting
Chuck Lever [Wed, 11 Jul 2012 20:30:23 +0000 (16:30 -0400)]
NFS: Clean up TEST_STATEID and FREE_STATEID error reporting

As a finishing touch, add appropriate documenting comments and some
debugging printk's.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Clean up nfs41_check_expired_stateid()
Chuck Lever [Wed, 11 Jul 2012 20:30:14 +0000 (16:30 -0400)]
NFS: Clean up nfs41_check_expired_stateid()

Clean up: Instead of open-coded flag manipulation, use test_bit() and
clear_bit() just like all other accessors of the state->flag field.
This also eliminates several unnecessary implicit integer type
conversions.

To make it absolutely clear what is going on, a number of comments
are introduced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
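
For readers unfamiliar with the accessor style the commit switches to, here is a user-space sketch of bit helpers that operate on an unsigned long bitmap by bit number. The *_sketch names are hypothetical, and unlike the kernel's set_bit() and clear_bit() these versions are not atomic.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Report whether bit number nr is set in the bitmap starting at addr. */
bool test_bit_sketch(unsigned int nr, const unsigned long *addr)
{
    assert(addr != NULL);
    return ((addr[nr / SKETCH_BITS_PER_LONG] >>
             (nr % SKETCH_BITS_PER_LONG)) & 1UL) != 0;
}

/* Set bit number nr (non-atomic in this sketch). */
void set_bit_sketch(unsigned int nr, unsigned long *addr)
{
    assert(addr != NULL);
    addr[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Clear bit number nr (non-atomic in this sketch). */
void clear_bit_sketch(unsigned int nr, unsigned long *addr)
{
    assert(addr != NULL);
    addr[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}
```

Callers pass a bit number rather than a mask, which is what removes the implicit integer conversions that open-coded masking tends to introduce.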
11 years ago  NFS: State reclaim clears OPEN and LOCK state
Chuck Lever [Wed, 11 Jul 2012 20:30:05 +0000 (16:30 -0400)]
NFS: State reclaim clears OPEN and LOCK state

The "state->flags & flags" test in nfs41_check_expired_stateid()
allows the state manager to squelch a TEST_STATEID operation when
it is known for sure that a state ID is no longer valid.  If the
lease was purged, for example, the client already knows that state
ID is now defunct.

But open recovery is still needed for that inode.

To force a call to nfs4_open_expired(), change the default return
value for nfs41_check_expired_stateid() to force open recovery, and
the default return value for nfs41_check_locks() to force lock
recovery, if the requested flags are clear.  Fix suggested by Bryan
Schumaker.

Also, the presence of a delegation state ID must not prevent normal
open recovery.  The delegation state ID must be cleared if it was
revoked, but once cleared I don't think its presence or absence has
any bearing on whether open recovery is still needed.  So the logic
is adjusted to ignore the TEST_STATEID result for the delegation
state ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFS: Don't free a state ID the server does not recognize
Chuck Lever [Wed, 11 Jul 2012 20:29:56 +0000 (16:29 -0400)]
NFS: Don't free a state ID the server does not recognize

The result of a TEST_STATEID operation can indicate a few different
things:

  o If NFS_OK is returned, then the client can continue using the
    state ID under test, and skip recovery.

  o RFC 5661 says that if the state ID was revoked, then the client
    must perform an explicit FREE_STATEID before trying to re-open.

  o If the server doesn't recognize the state ID at all, then no
    FREE_STATEID is needed, and the client can immediately continue
    with open recovery.

Let's err on the side of caution: if the server clearly tells us the
state ID is unknown, we skip the FREE_STATEID.  For any other error,
we issue a FREE_STATEID.  Sometimes that FREE_STATEID will be
unnecessary, but leaving unused state IDs on the server needlessly
ties up resources.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
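
The three outcomes listed above reduce to a small decision function. The sketch below is an illustration, not the kernel code: the enum and function names are hypothetical, and treating the bad-stateid error (value 10025 in the NFSv4 RFCs) as the "server does not recognize the state ID" reply is an assumption of this example.

```c
#include <assert.h>

#define SK_NFS4_OK              0
#define SK_NFS4ERR_BAD_STATEID  10025   /* assumed "unknown state ID" reply */

enum recovery_action_sketch {
    SK_SKIP_RECOVERY,          /* state ID still usable */
    SK_RECOVER_WITHOUT_FREE,   /* server does not know the state ID */
    SK_FREE_THEN_RECOVER       /* any other error: free it, then recover */
};

/* status is 0 or a negative error value, as the callers would see it. */
enum recovery_action_sketch after_test_stateid_sketch(int status)
{
    assert(status <= 0);
    if (status == SK_NFS4_OK)
        return SK_SKIP_RECOVERY;
    if (status == -SK_NFS4ERR_BAD_STATEID)
        return SK_RECOVER_WITHOUT_FREE;
    return SK_FREE_THEN_RECOVER;
}
```

The default branch is the cautious one: an unnecessary FREE_STATEID costs one extra operation, while a skipped one leaves state behind on the server.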
11 years ago  NFS: Fix up TEST_STATEID and FREE_STATEID return code handling
Chuck Lever [Wed, 11 Jul 2012 20:29:45 +0000 (16:29 -0400)]
NFS: Fix up TEST_STATEID and FREE_STATEID return code handling

The TEST_STATEID and FREE_STATEID operations can return
-NFS4ERR_BAD_STATEID, -NFS4ERR_OLD_STATEID, or -NFS4ERR_DEADSESSION.

nfs41_{test,free}_stateid() should not pass these errors to
nfs4_handle_exception() during state recovery, since that will
recursively kick off state recovery again, resulting in a deadlock.

In particular, when the TEST_STATEID operation returns NFS4_OK,
res.status can contain one of these errors.  _nfs41_test_stateid()
replaces NFS4_OK with the value in res.status, which is then returned
to callers.

But res.status is not passed through nfs4_stat_to_errno(), and thus is
a positive NFS4ERR value.  Currently callers are only interested in
!NFS4_OK, and nfs4_handle_exception() ignores positive values.

Thus the res.status values are currently ignored by
nfs4_handle_exception() and won't cause the deadlock above.  Thanks to
this missing negative, it is only when these operations fail (which
is very rare) that a deadlock can occur.

Bryan agrees the original intent was to return res.status as a
negative NFS4ERR value to callers of nfs41_test_stateid().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
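
The sign issue described above can be made concrete with a toy exception handler that, like the one in the commit message, reacts only to negative error values. Everything here is a hypothetical sketch: the handler recognizes a single error, and the value 10025 stands for the bad-stateid status.

```c
#include <assert.h>

#define SK_NFS4ERR_BAD_STATEID 10025

/* Returns 1 when the error would kick off state recovery, 0 otherwise. */
int handle_exception_sketch(int errorcode)
{
    switch (errorcode) {
    case -SK_NFS4ERR_BAD_STATEID:
        return 1;
    default:
        return 0;   /* includes every positive value */
    }
}

/* Convert a raw (non-negative) protocol status to the negative convention. */
int to_negative_status_sketch(int status)
{
    assert(status >= 0);
    return -status;
}
```

A raw positive status slips through the handler unnoticed; only after conversion does it trigger recovery, which is why the conversion and the decision about which errors to hand to the handler have to change together.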
11 years ago  NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs list
Andy Adamson [Wed, 20 Jun 2012 19:03:34 +0000 (15:03 -0400)]
NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs list

mark_matching_lsegs_invalid() resets the mds_threshold counters and can
dereference the layout hdr on an initially empty plh_segs list. It returns 0
both in the case of an initially empty list and in the case of a non-empty list
that was cleared by calls to mark_lseg_invalid.

Don't send a LAYOUTRETURN if the list was initially empty.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFSv4.1 mark layout when already returned
Andy Adamson [Wed, 20 Jun 2012 19:03:33 +0000 (15:03 -0400)]
NFSv4.1 mark layout when already returned

When the file layout driver is fencing a DS, _pnfs_return_layout can be
called multiple times per inode due to in-flight I/O referencing lsegs on its
plh_segs list.

Remember that LAYOUTRETURN has been called, and do not call it again.
Allow LAYOUTRETURNs after a subsequent LAYOUTGET.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFSv4.1 don't send LAYOUTCOMMIT if data resent through MDS
Andy Adamson [Wed, 20 Jun 2012 19:03:32 +0000 (15:03 -0400)]
NFSv4.1 don't send LAYOUTCOMMIT if data resent through MDS

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  NFSv4.1 return the LAYOUT for each file with failed DS connection I/O
Andy Adamson [Wed, 20 Jun 2012 19:03:31 +0000 (15:03 -0400)]
NFSv4.1 return the LAYOUT for each file with failed DS connection I/O

First mark the deviceid invalid to prevent any future use. Then fence all
files involved in I/O to a DS with a connection error by sending a
LAYOUTRETURN.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11 years ago  Merge commit '9249e17fe094d853d1ef7475dd559a2cc7e23d42' into nfs-for-3.6
Trond Myklebust [Mon, 16 Jul 2012 16:01:42 +0000 (12:01 -0400)]
Merge commit '9249e17fe094d853d1ef7475dd559a2cc7e23d42' into nfs-for-3.6

Resolve conflicts with the VFS atomic open and sget changes.

Conflicts:
fs/nfs/nfs4proc.c

11 years ago  VFS: Pass mount flags to sget()
David Howells [Mon, 25 Jun 2012 11:55:37 +0000 (12:55 +0100)]
VFS: Pass mount flags to sget()

Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called.  They could also be passed to the
compare function.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago  VFS: Comment mount following code
David Howells [Mon, 25 Jun 2012 11:55:28 +0000 (12:55 +0100)]
VFS: Comment mount following code

Add comments describing what the directions "up" and "down" mean and ref count
handling to the VFS mount following family of functions.

Signed-off-by: Valerie Aurora <vaurora@redhat.com> (Original author)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago  VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors
David Howells [Mon, 25 Jun 2012 11:55:18 +0000 (12:55 +0100)]
VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors

copy_tree() can theoretically fail in a case other than ENOMEM, but always
returns NULL which is interpreted by callers as -ENOMEM.  Change it to return
an explicit error.

Also change clone_mnt() for consistency and because union mounts will add new
error cases.

Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix.
[AV: folded braino fix by Dan Carpenter]

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Valerie Aurora <valerie.aurora@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
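
Returning an explicit error instead of NULL from a pointer-returning function is usually done by encoding a small negative errno in the pointer value. The following user-space sketch imitates that convention; the helper names, the 4095 limit and the mount struct are all local to the example and only modelled on the kernel's ERR_PTR family.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr_sketch(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
long ptr_err_sketch(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top SK_MAX_ERRNO addresses of the address space. */
int is_err_sketch(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SK_MAX_ERRNO;
}

struct mount_sketch { int id; };

/* Clone a mount; report *why* it failed rather than a bare NULL. */
struct mount_sketch *clone_mount_sketch(const struct mount_sketch *old,
                                        int allowed)
{
    struct mount_sketch *m;

    assert(old != NULL);
    if (!allowed)
        return err_ptr_sketch(-EINVAL);
    m = malloc(sizeof(*m));
    if (m == NULL)
        return err_ptr_sketch(-ENOMEM);
    *m = *old;
    return m;
}
```

Callers test the result with is_err_sketch() and propagate ptr_err_sketch() unchanged, so a failure other than out-of-memory is no longer reported as -ENOMEM.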
11 years ago  VFS: Make chown() and lchown() call fchownat()
David Howells [Mon, 25 Jun 2012 11:55:09 +0000 (12:55 +0100)]
VFS: Make chown() and lchown() call fchownat()

Make the chown() and lchown() syscalls jump to the fchownat() syscall with the
appropriate extra arguments.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
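
The equivalence that the chown()/lchown() commit above relies on can be
sketched from user space with the POSIX fchownat() call. The two wrapper names
below are hypothetical; the kernel patch itself works at the syscall level,
so this only shows which "extra arguments" are involved.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>	/* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/types.h>
#include <unistd.h>	/* fchownat() */

/* chown(path, owner, group): resolve relative to the cwd, follow symlinks. */
static int chown_via_fchownat(const char *path, uid_t owner, gid_t group)
{
	return fchownat(AT_FDCWD, path, owner, group, 0);
}

/* lchown(path, owner, group): same, but operate on the symlink itself. */
static int lchown_via_fchownat(const char *path, uid_t owner, gid_t group)
{
	return fchownat(AT_FDCWD, path, owner, group, AT_SYMLINK_NOFOLLOW);
}
```

Passing AT_FDCWD makes a relative path resolve against the current working
directory, which is what chown() and lchown() do.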
11 years agodo_dentry_open(): close the race with mark_files_ro() in failure exit
Al Viro [Sat, 23 Jun 2012 18:49:45 +0000 (22:49 +0400)]
do_dentry_open(): close the race with mark_files_ro() in failure exit

we want to take it out of mark_files_ro() reach *before* we start
checking if we ought to drop write access.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomark_files_ro(): don't bother with mntget/mntput
Al Viro [Sat, 23 Jun 2012 18:41:54 +0000 (22:41 +0400)]
mark_files_ro(): don't bother with mntget/mntput

mnt_drop_write_file() is safe under any lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agonotify_change(): check that i_mutex is held
Andrew Morton [Tue, 19 Jun 2012 23:55:58 +0000 (09:55 +1000)]
notify_change(): check that i_mutex is held

Cc: Djalal Harouni <tixxdz@opendz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofs: add nd_jump_link
Christoph Hellwig [Mon, 18 Jun 2012 14:47:04 +0000 (10:47 -0400)]
fs: add nd_jump_link

Add a helper that abstracts out the jump to an already parsed struct path
from the ->follow_link operation in procfs.  Not only does this clean up
the code by moving the two sides of this game into a single helper, but
it also prepares for making struct nameidata private to namei.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofs: move path_put on failure out of ->follow_link
Christoph Hellwig [Mon, 18 Jun 2012 14:47:03 +0000 (10:47 -0400)]
fs: move path_put on failure out of ->follow_link

Currently the non-nd_set_link based versions of ->follow_link are expected
to do a path_put(&nd->path) on failure.  This calling convention is unexpected,
undocumented and doesn't match what the nd_set_link-based instances do.

Move the path_put out of the only non-nd_set_link based ->follow_link
instance into the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodebugfs: get rid of useless arguments to debugfs_{mkdir,symlink}
Al Viro [Sun, 10 Jun 2012 00:40:20 +0000 (20:40 -0400)]
debugfs: get rid of useless arguments to debugfs_{mkdir,symlink}

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodebugfs: fold debugfs_create_by_name() into the only caller
Al Viro [Sun, 10 Jun 2012 00:33:28 +0000 (20:33 -0400)]
debugfs: fold debugfs_create_by_name() into the only caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodebugfs: make sure that debugfs_create_file() gets used only for regulars
Al Viro [Sun, 10 Jun 2012 00:28:22 +0000 (20:28 -0400)]
debugfs: make sure that debugfs_create_file() gets used only for regulars

It, debugfs_create_dir() and debugfs_create_link() use the common helper
now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago__d_unalias() should refuse to move mountpoints
Al Viro [Fri, 8 Jun 2012 19:59:33 +0000 (15:59 -0400)]
__d_unalias() should refuse to move mountpoints

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agosysfs: just use d_materialise_unique()
Al Viro [Fri, 8 Jun 2012 00:56:54 +0000 (20:56 -0400)]
sysfs: just use d_materialise_unique()

same as for nfs et al.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agosysfs: switch to ->s_d_op and ->d_release()
Al Viro [Fri, 8 Jun 2012 00:51:39 +0000 (20:51 -0400)]
sysfs: switch to ->s_d_op and ->d_release()

a) ->d_iput() is wrong here - what we do to inode is completely usual, it's
dentry->d_fsdata that we want to drop.  Just use ->d_release().

b) switch to ->s_d_op - no need to play with d_set_d_op()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoget rid of kern_path_parent()
Al Viro [Thu, 14 Jun 2012 23:01:42 +0000 (03:01 +0400)]
get rid of kern_path_parent()

all callers want the same thing, actually - a kinda-sorta analog of
kern_path_create().  I.e. they want parent vfsmount/dentry (with
->i_mutex held, to make sure the child dentry is still their child)
+ the child dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoVFS: Fix the banner comment on lookup_open()
David Howells [Thu, 14 Jun 2012 15:13:46 +0000 (16:13 +0100)]
VFS: Fix the banner comment on lookup_open()

Since commit 197e37d9, the banner comment on lookup_open() no longer matches
what the function returns.  It used to return a struct file pointer or NULL and
now it returns an integer and is passed the struct file pointer it is to use
amongst its arguments.  Update the comment to reflect this.

Also add a banner comment to atomic_open().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodon't pass nameidata * to vfs_create()
Al Viro [Sun, 10 Jun 2012 22:09:36 +0000 (18:09 -0400)]
don't pass nameidata * to vfs_create()

all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodon't pass nameidata to ->create()
Al Viro [Sun, 10 Jun 2012 22:05:36 +0000 (18:05 -0400)]
don't pass nameidata to ->create()

A boolean "does it have to be exclusive?" flag is passed instead;
local filesystems should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real()
Al Viro [Sun, 10 Jun 2012 21:17:17 +0000 (17:17 -0400)]
fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agostop passing nameidata to ->lookup()
Al Viro [Sun, 10 Jun 2012 21:13:09 +0000 (17:13 -0400)]
stop passing nameidata to ->lookup()

Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofs/namei.c: don't pass nameidata to lookup_dcache()
Al Viro [Fri, 22 Jun 2012 08:42:10 +0000 (12:42 +0400)]
fs/namei.c: don't pass nameidata to lookup_dcache()

just the flags...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofs/namei.c: don't pass nameidata to d_revalidate()
Al Viro [Sun, 10 Jun 2012 20:10:59 +0000 (16:10 -0400)]
fs/namei.c: don't pass nameidata to d_revalidate()

since the method wrapped by it doesn't need that anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agostop passing nameidata * to ->d_revalidate()
Al Viro [Sun, 10 Jun 2012 20:03:43 +0000 (16:03 -0400)]
stop passing nameidata * to ->d_revalidate()

Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible
Al Viro [Sun, 10 Jun 2012 19:36:40 +0000 (15:36 -0400)]
fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agonfs_lookup_verify_inode() - nd is *always* non-NULL here
Al Viro [Sun, 10 Jun 2012 19:33:51 +0000 (15:33 -0400)]
nfs_lookup_verify_inode() - nd is *always* non-NULL here

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoswitch nfs_lookup_check_intent() away from nameidata
Al Viro [Sun, 10 Jun 2012 19:18:15 +0000 (15:18 -0400)]
switch nfs_lookup_check_intent() away from nameidata

just pass the flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodo_dentry_open(): take initialization of file->f_path to caller
Al Viro [Sun, 10 Jun 2012 18:32:45 +0000 (14:32 -0400)]
do_dentry_open(): take initialization of file->f_path to caller

... and get rid of a couple of arguments and a pointless reassignment
in finish_open() case.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofold __dentry_open() into its sole caller
Al Viro [Sun, 10 Jun 2012 18:24:38 +0000 (14:24 -0400)]
fold __dentry_open() into its sole caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoswitch do_dentry_open() to returning int
Al Viro [Sun, 10 Jun 2012 18:22:04 +0000 (14:22 -0400)]
switch do_dentry_open() to returning int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomake finish_no_open() return int
Al Viro [Sun, 10 Jun 2012 10:48:09 +0000 (06:48 -0400)]
make finish_no_open() return int

namely, 1 ;-)  That's what we want to return from ->atomic_open()
instances after finish_no_open().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofs/namei.c: get do_last() and friends return int
Al Viro [Fri, 22 Jun 2012 08:41:10 +0000 (12:41 +0400)]
fs/namei.c: get do_last() and friends return int

Same conventions as for ->atomic_open().  Trimmed the
forest of labels a bit, while we are at it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agokill struct opendata
Al Viro [Fri, 22 Jun 2012 08:40:19 +0000 (12:40 +0400)]
kill struct opendata

Just pass struct file *.  Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int.  Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agokill opendata->{mnt,dentry}
Al Viro [Sun, 10 Jun 2012 09:55:37 +0000 (05:55 -0400)]
kill opendata->{mnt,dentry}

->filp->f_path is there for purpose...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomake ->atomic_open() return int
Al Viro [Fri, 22 Jun 2012 08:39:14 +0000 (12:39 +0400)]
make ->atomic_open() return int

Change of calling conventions:
old             new
NULL            1
file            0
ERR_PTR(-ve)    -ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
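
The old-to-new mapping listed in the ->atomic_open() commit above can be
written out as a small user-space sketch. ERR_PTR(), PTR_ERR() and IS_ERR()
are simplified stand-ins for the kernel helpers in include/linux/err.h;
struct file_sketch and atomic_open_old_to_new() are hypothetical names used
only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space stand-ins for the kernel's err.h helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical placeholder for struct file. */
struct file_sketch {
	int fd;
};

/*
 * Translate an old-style return value into the new int convention:
 *   NULL         -> 1    (not opened, caller falls back)
 *   file         -> 0    (opened; caller already holds the file pointer)
 *   ERR_PTR(-ve) -> -ve  (error)
 */
static int atomic_open_old_to_new(struct file_sketch *old_ret)
{
	if (old_ret == NULL)
		return 1;
	if (IS_ERR(old_ret))
		return (int)PTR_ERR(old_ret);
	return 0;
}
```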
11 years agodon't modify od->filp at all
Al Viro [Sun, 10 Jun 2012 09:04:43 +0000 (05:04 -0400)]
don't modify od->filp at all

make put_filp() conditional on flag set by finish_open()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago->atomic_open() prototype change - pass int * instead of bool *
Al Viro [Sun, 10 Jun 2012 09:01:45 +0000 (05:01 -0400)]
->atomic_open() prototype change - pass int * instead of bool *

... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: move O_DIRECT check to common code
Miklos Szeredi [Tue, 5 Jun 2012 13:10:32 +0000 (15:10 +0200)]
vfs: move O_DIRECT check to common code

Perform open_check_o_direct() in a common place in do_last after opening the
file.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: do_last(): clean up retry
Miklos Szeredi [Tue, 5 Jun 2012 13:10:31 +0000 (15:10 +0200)]
vfs: do_last(): clean up retry

Move the lookup retry logic to the bottom of the function to make the normal
case simpler to read.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: do_last(): clean up bool
Miklos Szeredi [Tue, 5 Jun 2012 13:10:30 +0000 (15:10 +0200)]
vfs: do_last(): clean up bool

Consistently use bool for boolean values in do_last().

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: do_last(): clean up labels
Miklos Szeredi [Tue, 5 Jun 2012 13:10:29 +0000 (15:10 +0200)]
vfs: do_last(): clean up labels

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: do_last(): clean up error handling
Miklos Szeredi [Tue, 5 Jun 2012 13:10:28 +0000 (15:10 +0200)]
vfs: do_last(): clean up error handling

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: remove open intents from nameidata
Miklos Szeredi [Tue, 5 Jun 2012 13:10:27 +0000 (15:10 +0200)]
vfs: remove open intents from nameidata

All users of open intents have been converted to use ->atomic_{open,create}.

This patch gets rid of nd->intent.open and related infrastructure.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago9p: implement i_op->atomic_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:26 +0000 (15:10 +0200)]
9p: implement i_op->atomic_open()

Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoceph: implement i_op->atomic_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:25 +0000 (15:10 +0200)]
ceph: implement i_op->atomic_open()

Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoceph: remove unused arg from ceph_lookup_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:24 +0000 (15:10 +0200)]
ceph: remove unused arg from ceph_lookup_open()

What was the purpose of this?

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agocifs: implement i_op->atomic_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:23 +0000 (15:10 +0200)]
cifs: implement i_op->atomic_open()

Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steve French <sfrench@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofuse: implement i_op->atomic_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:22 +0000 (15:10 +0200)]
fuse: implement i_op->atomic_open()

Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agonfs: don't use intents for checking atomic open
Miklos Szeredi [Tue, 5 Jun 2012 13:10:21 +0000 (15:10 +0200)]
nfs: don't use intents for checking atomic open

is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether
it's okay to skip normal revalidation.

It does a racy check for mount read-onlyness and falls back to normal
revalidation if the open would fail.  This makes little sense now that this
function isn't used for determining whether to actually open the file or not.

The d_mountpoint() check still makes sense since it is an indication that we
might be following a mount and so open may not revalidate the dentry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agonfs: don't use nd->intent.open.flags
Miklos Szeredi [Tue, 5 Jun 2012 13:10:20 +0000 (15:10 +0200)]
nfs: don't use nd->intent.open.flags

Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent
flags were used for.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agonfs: clean up ->create in nfs_rpc_ops
Miklos Szeredi [Tue, 5 Jun 2012 13:10:19 +0000 (15:10 +0200)]
nfs: clean up ->create in nfs_rpc_ops

Don't pass the nfs_open_context to ->create().  Only the NFS4 implementation
needed that and only because it wanted to return an open file using open
intents.  That task has been replaced by ->atomic_open so it is not necessary
anymore to pass the context to the create rpc operation.

Despite nfs4_proc_create apparently being okay with a NULL context, it Oopses
somewhere down the call chain.  So allocate a context here.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agonfs: implement i_op->atomic_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:18 +0000 (15:10 +0200)]
nfs: implement i_op->atomic_open()

Replace NFS4-specific ->lookup implementation with ->atomic_open implementation
and use the generic nfs_lookup for other lookups.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: add i_op->atomic_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:17 +0000 (15:10 +0200)]
vfs: add i_op->atomic_open()

Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation.  If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.

i_op->atomic_open() is only called if the last component is negative or needs
lookup.  Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open().  If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.

For now leave the old way of using open intents in lookup and revalidate in
place.  This will be removed once all the users are converted.

David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
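
The dispatch rule described in the "vfs: add i_op->atomic_open()" commit above
can be summarised in a short sketch. Every name here (the two enums and
choose_open_path()) is hypothetical and does not exist in the kernel; the
fallback to a plain lookup when a filesystem has no ->atomic_open() is an
assumption about the surrounding code rather than something stated in the
commit message.

```c
#include <stdbool.h>

/* Hypothetical classification of the last path component. */
enum dentry_state {
	DENTRY_CACHED_POSITIVE,	/* cached and refers to an existing file */
	DENTRY_CACHED_NEGATIVE,	/* cached "no such file" entry */
	DENTRY_NEEDS_LOOKUP	/* nothing usable in the dcache */
};

/* Hypothetical labels for the route the open takes. */
enum open_path {
	OPEN_VIA_F_OP_OPEN,	/* plain f_op->open() on the cached dentry */
	OPEN_VIA_ATOMIC_OPEN,	/* i_op->atomic_open(): lookup+create+open */
	OPEN_VIA_LOOKUP		/* assumed fallback: ordinary lookup first */
};

static enum open_path choose_open_path(enum dentry_state state,
				       bool fs_has_atomic_open)
{
	/* Cached positive dentries gain little from ->atomic_open(). */
	if (state == DENTRY_CACHED_POSITIVE)
		return OPEN_VIA_F_OP_OPEN;

	/* Negative or uncached: let the filesystem do it in one step. */
	if (fs_has_atomic_open)
		return OPEN_VIA_ATOMIC_OPEN;

	return OPEN_VIA_LOOKUP;
}
```

If the cached positive dentry later proves invalid, the commit describes
retrying the open with a fresh dentry, which in this sketch corresponds to
calling choose_open_path() again with DENTRY_NEEDS_LOOKUP.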
11 years agovfs: lookup_open(): expand lookup_hash()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:16 +0000 (15:10 +0200)]
vfs: lookup_open(): expand lookup_hash()

Copy __lookup_hash() into lookup_open().  The next patch will insert the atomic
open call just before the real lookup.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: add lookup_open()
Miklos Szeredi [Tue, 5 Jun 2012 13:10:15 +0000 (15:10 +0200)]
vfs: add lookup_open()

Split out lookup + maybe create from do_last().  This is the part under i_mutex
protection.

The function is called lookup_open() and returns a filp even though the open
part is not used yet.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfs: do_last(): common slow lookup
Miklos Szeredi [Tue, 5 Jun 2012 13:10:14 +0000 (15:10 +0200)]
vfs: do_last(): common slow lookup

Make the slow lookup part of O_CREAT and non-O_CREAT opens common.

This allows atomic_open to be hooked into the slow lookup part.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>