Re: [NLM] 2.6.27.14 breakage when grace period expires

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote:
> > On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> > > On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > > > I'm sorry to inform you but... it seems that there is a similar problem
> > > > in the NLM subsystem as reported previously but this time it is triggered
> > > > when the grace time expires after a reboot.
> > > > 
> > > > Client and server run 2.6.27.14 + previous fix, NFSv3.
> > > > 
> > > > On the client there are three shells running:
> > > > 
> > > > 	while :; do lck -w /mnt/foo 2; done
> > > > 
> > > > The "lck" program is the same as posted before and it obtains an exclusive
> > > > write lock then waits 2 seconds in above invocation (there's probably an
> > > > "fcntl" command equivalent). After an orderly server reboot + grace time
> > > 
> > > How are you rebooting the server?
> > 
> > "reboot"
> 
> Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
> server is actually sending the reboot notification to the client, and
> that the client is trying to reclaim?  (Wireshark should make this all
> fairly clear.  But capture the traffic with tcpdump -s0 -wtmp.pcap and
> send it to me if you're having trouble interpreting it.)

I have a capture with comment below. It raised so many questions
that I decided to do some more testing, trying to figure out how
it looks when the locking works. This issue now appears to predate the
fuse changes and is also present when both client and server run
2.6.24.4. I decided to stick with the traffic capture for 2.7.27.14 +
previous fix as discussed earlier. The full capture is available at
http://www.frankvm.com/tmp/2.6.27.14-nlm-grace.pcap. It's about 33k and
was started on the server as part of initscripts, right after the reboot
and filtered on client IP address.

Exported by wireshark (filter: nfs or stat or nlm) and condensed:

  #   time      src      prot
  1   0.000000  client:  NFS  V3 GETATTR Call (Reply In 42), FH:0x0308030a
  2   0.000018  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
  5   0.000583  server:  ICMP Destination unreachable (Port unreachable)
  6   0.000589  server:  ICMP Destination unreachable (Port unreachable)
  7   1.891277  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
  8   1.891320  server:  ICMP Destination unreachable (Port unreachable)
  9   5.827053  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 10   5.827119  server:  ICMP Destination unreachable (Port unreachable)
 11  14.626501  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 12  14.626587  server:  ICMP Destination unreachable (Port unreachable)
 15  15.726426  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 16  15.726505  server:  ICMP Destination unreachable (Port unreachable)
 17  17.926284  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 18  17.926368  server:  ICMP Destination unreachable (Port unreachable)
 25  22.326006  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 26  22.326090  server:  ICMP Destination unreachable (Port unreachable)
 35  30.022271  client:  NLM  V4 UNLOCK Call (Reply In 36) FH:0xcafa61cc svid:114 pos:0-0
 36  30.029511  server:  NLM  V4 UNLOCK Reply (Call In 35) NLM_DENIED_GRACE_PERIOD
 37  30.029660  client:  NLM  V4 LOCK Call (Reply In 39) FH:0xcafa61cc svid:116 pos:0-0
 38  30.029691  client:  NLM  V4 LOCK Call (Reply In 40) FH:0xcafa61cc svid:115 pos:0-0
 39  30.029884  server:  NLM  V4 LOCK Reply (Call In 37) NLM_DENIED_GRACE_PERIOD
 40  30.029914  server:  NLM  V4 LOCK Reply (Call In 38) NLM_DENIED_GRACE_PERIOD
 41  31.125403  client:  NFS  [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 42  31.127499  server:  NFS  V3 GETATTR Reply (Call In 1)  Directory mode:0755 uid:0 gid:0
 43  31.127942  client:  NFS  V3 GETATTR Call (Reply In 45), FH:0x0308030a
 45  31.129378  server:  NFS  V3 GETATTR Reply (Call In 43)  Directory mode:0755 uid:0 gid:0
 47  31.129958  server:  STAT V1 NOTIFY Call (Reply In 48)
 48  31.130301  client:  STAT V1 NOTIFY Reply (Call In 47)

Reboot notification ok.

 51  35.029968  client:  NLM  V4 UNLOCK Call (Reply In 54) FH:0xcafa61cc svid:114 pos:0-0
 52  35.030003  client:  NLM  V4 LOCK Call (Reply In 55) FH:0xcafa61cc svid:116 pos:0-0
 53  35.030016  client:  NLM  V4 LOCK Call (Reply In 56) FH:0xcafa61cc svid:115 pos:0-0
 54  35.030085  server:  NLM  V4 UNLOCK Reply (Call In 51) NLM_DENIED_GRACE_PERIOD
 55  35.030126  server:  NLM  V4 LOCK Reply (Call In 52) NLM_DENIED_GRACE_PERIOD
 56  35.030153  server:  NLM  V4 LOCK Reply (Call In 53) NLM_DENIED_GRACE_PERIOD

The three contending client processes. I don't see a lock registration for
svid:114, only UNLOCK calls which fail with NLM_DENIED_GRACE_PERIOD. The
above goes on for a while. Neither the server or client shows any lock
in /proc/locks at this point.

166 115.028376  client:  NLM  V4 LOCK Call (Reply In 168) FH:0xcafa61cc svid:115 pos:0-0
167 115.028394  client:  NLM  V4 LOCK Call (Reply In 169) FH:0xcafa61cc svid:116 pos:0-0
168 115.028440  server:  NLM  V4 LOCK Reply (Call In 166) NLM_DENIED_GRACE_PERIOD
169 115.028465  server:  NLM  V4 LOCK Reply (Call In 167) NLM_DENIED_GRACE_PERIOD
170 120.027233  client:  NLM  V4 UNLOCK Call (Reply In 171) FH:0xcafa61cc svid:114 pos:0-0
171 120.027337  server:  NLM  V4 UNLOCK Reply (Call In 170) NLM_DENIED_GRACE_PERIOD
172 120.028234  client:  NLM  V4 LOCK Call (Reply In 175) FH:0xcafa61cc svid:116 pos:0-0
173 120.028258  client:  NLM  V4 LOCK Call (Reply In 174) FH:0xcafa61cc svid:115 pos:0-0
174 120.030601  server:  NLM  V4 LOCK Reply (Call In 173)
175 120.030656  server:  NLM  V4 LOCK Reply (Call In 172) NLM_BLOCKED

This doesn't add up. There hasn't been a successful unlock for svid:114
(see #213 for that) but still one of the locks is granted.

176 120.030781  client:  NLM  V4 LOCK Call (Reply In 177) FH:0xcafa61cc svid:115 pos:0-0
177 120.030849  server:  NLM  V4 LOCK Reply (Call In 176)

Strange: an identical lock request but with a different rpc xid (i.e. no
packet duplication).

178 120.031078  client:  NFS  V3 GETATTR Call (Reply In 179), FH:0xcafa61cc
179 120.031154  server:  NFS  V3 GETATTR Reply (Call In 178)  Regular File mode:0644 uid:363 gid:1500
180 120.033973  client:  NFS  V3 ACCESS Call (Reply In 181), FH:0x0308030a
181 120.034030  server:  NFS  V3 ACCESS Reply (Call In 180)
182 120.034223  client:  NFS  V3 LOOKUP Call (Reply In 183), DH:0x0308030a/loc
183 120.034285  server:  NFS  V3 LOOKUP Reply (Call In 182), FH:0x81685ca0
184 120.034472  client:  NFS  V3 ACCESS Call (Reply In 185), FH:0x0308030c
185 120.034526  server:  NFS  V3 ACCESS Reply (Call In 184)
186 120.034722  client:  NFS  V3 ACCESS Call (Reply In 187), FH:0x0308030c
187 120.034776  server:  NFS  V3 ACCESS Reply (Call In 186)
188 120.034922  client:  NFS  V3 LOOKUP Call (Reply In 189), DH:0x0308030c/locktest
189 120.034993  server:  NFS  V3 LOOKUP Reply (Call In 188), FH:0xcafa61cc
190 120.035172  client:  NFS  V3 ACCESS Call (Reply In 191), FH:0xcafa61cc
191 120.035230  server:  NFS  V3 ACCESS Reply (Call In 190)
193 122.032218  client:  NLM  V4 UNLOCK Call (Reply In 195) FH:0xcafa61cc svid:115 pos:0-0
194 122.032253  client:  NLM  V4 LOCK Call (Reply In 197) FH:0xcafa61cc svid:119 pos:0-0
195 122.032343  server:  NLM  V4 UNLOCK Reply (Call In 193)
197 122.032794  server:  NLM  V4 LOCK Reply (Call In 194) NLM_BLOCKED
201 122.033767  server:  NLM  V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
202 122.034066  client:  NLM  V4 GRANTED_MSG Reply (Call In 201)
205 122.034665  client:  NLM  V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
206 122.034753  server:  NLM  V4 GRANTED_RES Reply (Call In 205)
207 122.036312  client:  NFS  V3 GETATTR Call (Reply In 208), FH:0xcafa61cc
208 122.036394  server:  NFS  V3 GETATTR Reply (Call In 207)  Regular File mode:0644 uid:363 gid:1500
209 122.036611  client:  NLM  V4 LOCK Call (Reply In 210) FH:0xcafa61cc svid:120 pos:0-0
210 122.036674  server:  NLM  V4 LOCK Reply (Call In 209) NLM_BLOCKED
213 125.027091  client:  NLM  V4 UNLOCK Call (Reply In 214) FH:0xcafa61cc svid:114 pos:0-0
214 125.027194  server:  NLM  V4 UNLOCK Reply (Call In 213)
215 125.029487  client:  NFS  V3 GETATTR Call (Reply In 216), FH:0xcafa61cc
216 125.029570  server:  NFS  V3 GETATTR Reply (Call In 215)  Regular File mode:0644 uid:363 gid:1500
217 125.029836  client:  NLM  V4 LOCK Call (Reply In 218) FH:0xcafa61cc svid:121 pos:0-0
218 125.029895  server:  NLM  V4 LOCK Reply (Call In 217) NLM_BLOCKED
224 152.032157  client:  NLM  V4 LOCK Call (Reply In 225) FH:0xcafa61cc svid:119 pos:0-0
225 152.032283  server:  NLM  V4 LOCK Reply (Call In 224) NLM_BLOCKED
226 152.035103  client:  NLM  V4 LOCK Call (Reply In 227) FH:0xcafa61cc svid:120 pos:0-0
227 152.035157  server:  NLM  V4 LOCK Reply (Call In 226) NLM_BLOCKED
230 155.029676  client:  NLM  V4 LOCK Call (Reply In 231) FH:0xcafa61cc svid:121 pos:0-0
231 155.029761  server:  NLM  V4 LOCK Reply (Call In 230) NLM_BLOCKED

To recap the problem: one of the fcntl calls to obtain a write lock
returns

	lck: fcntl: No locks available

shortly after the grace period expires. After that everything gets stuck,
server holding a write lock with no corresponding client side lock.


IMO looks like the client is to blame, even if/when the server
should/could have accepted UNLOCK during grace (I don't know, I'm not
an expert on that one).

-- 
Frank
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux