Migration to self

Hi all,

Our QE has found that they can crash a client by having a server migrate the client to itself. Who would do that in real life? Is a server allowed to
refer to itself?

The client crashes because it starts a migration, sets the current
nfs_client's slot table to drain, probes the new nfs_client, gets another
NFS4ERR_MOVED, schedules another migration, and then sleeps on the
nfs_client waitq waiting for the state manager to finish error recovery.

The state manager then starts another migration, sets this new nfs_client's
slot table to drain, and its nfs_probe_fsinfo() call sleeps on the drained
slot table.

Finally someone notices the client is hung, and rpc_killall_tasks kills off
everything on the slot table and the server is freed, but then the
nfs_probe_fsinfo() that was waiting on the state manager in error recovery
wakes up and tries to read from that freed nfs_server:

Unable to handle kernel paging request for data at address 0x656465736b746fa0
Faulting instruction address: 0xd0000000046006a8
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache virtio_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c virtio_net virtio_console virtio_blk virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 12072 Comm: ::1-manager Not tainted 3.10.0-327.el7.ppc64 #1
task: c00000023678b010 ti: c00000002c3b8000 task.ti: c00000002c3b8000
NIP: d0000000046006a8 LR: d0000000046009bc CTR: c00000000048eba0
REGS: c00000002c3bb240 TRAP: 0300   Not tainted  (3.10.0-327.el7.ppc64)
MSR: 8000000100009032 <SF,EE,ME,IR,DR,RI>  CR: 28002024  XER: 20000000
CFAR: c000000000009368 DAR: 656465736b746fa0 DSISR: 40000000 SOFTE: 1
GPR00: d0000000046009bc c00000002c3bb4c0 d00000000468a9f0 c00000002c3bb530
GPR04: c00000023380e800 c00000002c3bb640 c00000002c3bb660 c00000002c3bb600
GPR08: c00000002c3bb660 656465736b746f70 c00000002c3bb600 d00000000467cf10
GPR12: c00000000048eba0 c000000007b83600 c00000000010c220 c000000233c8dc00
GPR16: c000000234800034 0000000000000010 c000000234a820f0 0000000000000801
GPR20: c000000234810000 c000000233900000 c00000023380e800 c00000000130fe00
GPR24: c000000003eb6600 0000000000000010 c000000237e645c0 c00000002a9aa700
GPR28: c000000235eb0008 c000000233c89c00 c000000235eb0008 c00000023380e800
NIP [d0000000046006a8] .nfs4_call_sync_sequence+0x48/0xa0 [nfsv4]
LR [d0000000046009bc] ._nfs4_server_capabilities+0x7c/0x2a0 [nfsv4]
Call Trace:
[c00000002c3bb4c0] [c000000000923240] .out_of_line_wait_on_bit+0xd0/0xe0 (unreliable)
[c00000002c3bb590] [d0000000046009bc] ._nfs4_server_capabilities+0x7c/0x2a0 [nfsv4]
[c00000002c3bb690] [d0000000046103fc] .nfs4_server_capabilities+0x3c/0x70 [nfsv4]
[c00000002c3bb730] [d0000000043e1f54] .nfs_probe_fsinfo+0x74/0x730 [nfs]
[c00000002c3bb830] [d00000000463b814] .nfs4_update_server+0x234/0x330 [nfsv4]
[c00000002c3bba00] [d000000004638e00] .nfs4_replace_transport+0x200/0x370 [nfsv4]
[c00000002c3bbaf0] [d00000000462aab4] .nfs4_try_migration+0x244/0x360 [nfsv4]
[c00000002c3bbb90] [d00000000462d328] .nfs4_state_manager+0x6b8/0xc00 [nfsv4]
[c00000002c3bbcb0] [d00000000462d898] .nfs4_run_state_manager+0x28/0x50 [nfsv4]
[c00000002c3bbd30] [c00000000010c308] .kthread+0xe8/0xf0
[c00000002c3bbe30] [c00000000000a470] .ret_from_kernel_thread+0x58/0x68

I would like to catch this sort of migration loop before we start following
it, and an easy place to do that is nfs4_set_client(): check whether
nfs_server->nfs_client is already set and is the same client that was just
found. Other than nfs4_try_migration(), the only other callpaths into
nfs4_set_client() are nfs4_init_server() and nfs4_create_referral_server(),
both of which pass newly initialized servers, so the check would only trip
for migrations.
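
For illustration, here is a rough sketch of what that check might look like
inside nfs4_set_client() (against the 3.10-era code in the trace above; the
argument list and surrounding setup are elided, and -ELOOP is just my
placeholder for whatever errno we settle on):

	/* fs/nfs/nfs4client.c: inside nfs4_set_client(), sketch only */
	clp = nfs_get_client(&cl_init, timeparms, ip_addr, authflavour);
	if (IS_ERR(clp)) {
		error = PTR_ERR(clp);
		goto error;
	}

	/*
	 * If the "new" client is the nfs_client this server is already
	 * using, the server has migrated us to ourselves.  Bail out
	 * before we start draining our own slot table.
	 */
	if (server->nfs_client == clp) {
		error = -ELOOP;		/* placeholder errno */
		nfs_put_client(clp);	/* drop the ref nfs_get_client took */
		goto error;
	}

	server->nfs_client = clp;

The callers would then just see the migration fail instead of the client
wedging itself.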

Alternatively, a fix would need to have rpc_killall_tasks also unwind tasks
that are waiting on the state manager partway through a series of migrations.

Any thoughts? I'll follow up with a patch for nfs4_set_client() to return
an error rather than set the same client on the server.

Ben