Re: Regression caused by commit c54f24e3 (nfsd: fix performance-limiting session calculation)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear Bruce,


On 7/2/19 11:59 PM, Paul Menzel wrote:

> Could it be that commit c54f24e3 (nfsd: fix performance-limiting
> session calculation) causes a regression on big memory machines (1
> TB)?
> 
>> From c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 Mon Sep 17 00:00:00 2001
>> From: "J. Bruce Fields" <bfields@xxxxxxxxxx>
>> Date: Thu, 21 Feb 2019 10:47:00 -0500
>> Subject: [PATCH] nfsd: fix performance-limiting session calculation
>>
>> We're unintentionally limiting the number of slots per nfsv4.1 session
>> to 10.  Often more than 10 simultaneous RPCs are needed for the best
>> performance.
>>
>> This calculation was meant to prevent any one client from using up more
>> than a third of the limit we set for total memory use across all clients
>> and sessions.  Instead, it's limiting the client to a third of the
>> maximum for a single session.
>>
>> Fix this.
>>
>> Reported-by: Chris Tracy <ctracy@xxxxxxxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches"
>> Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
>> ---
>>  fs/nfsd/nfs4state.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index fb3c9844c82a..6a45fb00c5fc 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1544,16 +1544,16 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
>>  {
>>      u32 slotsize = slot_bytes(ca);
>>      u32 num = ca->maxreqs;
>> -    int avail;
>> +    unsigned long avail, total_avail;
>>  
>>      spin_lock(&nfsd_drc_lock);
>> -    avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION,
>> -            nfsd_drc_max_mem - nfsd_drc_mem_used);
>> +    total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
>> +    avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
>>      /*
>>       * Never use more than a third of the remaining memory,
>>       * unless it's the only way to give this client a slot:
>>       */
>> -    avail = clamp_t(int, avail, slotsize, avail/3);
>> +    avail = clamp_t(int, avail, slotsize, total_avail/3);
>>      num = min_t(int, num, avail / slotsize);
>>      nfsd_drc_mem_used += num * slotsize;
>>      spin_unlock(&nfsd_drc_lock);
> 
> Booting a 80 threads, 1 TB server with Linux 4.19.56 and Linux
> 5.2-rc7 causes connections problems for the clients. The problems do
> not happen on servers with just 96 GB memory for example. Bisecting
> points to the two commits below (and I can only continue tomorrow).
> 
> c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 (nfsd: fix performance-limiting session calculation)
> 8127d82705998568b52ac724e28e00941538083d (NFS: Don't recoalesce on error in nfs_pageio_complete_mirror())
> 
> If you have things I could do to verify this besides reverting it
> tomorrow, please tell. It’d be great if it could be fixed before Linux
> 5.2 is released.

Reverting the suspected commit c54f24e3 indeed fixes the regression.
Please tell me how to continue. Should the commit be reverted or can
a fix be made shortly?

On a running system, `rpcdebug -m nfsd -s all` and
`rpcdebug -m rpc -s all` do not give a lot of information, as the
interesting values are set during module initialization.


Kind regards,

Paul


[  765.444765] svc: socket 0000000093976859 TCP (listen) state change 10
[  765.444784] svc: socket 0000000015376384 TCP (listen) state change 1
[  765.444869] svc: tcp_accept 000000002ad76b70 sock 000000005eae2611
[  765.444879] nfsd: connect from 141.14.17.51, port=785
[  765.444881] svc: svc_setup_socket 000000003fe99ad1
[  765.444883] setting up TCP socket for reading
[  765.444885] svc: svc_setup_socket created 000000006952397a (inet 0000000015376384), listen 0 close 0
[  765.444901] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  765.444904] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  765.444907] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.444912] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[  765.444913] svc: TCP record, 40 bytes
[  765.444919] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 40
[  765.444920] svc: TCP final record (40 bytes)
[  765.444926] svc: svc_authenticate (0)
[  765.444930] svc: calling dispatcher
[  765.444931] nfsd_dispatch: vers 4 proc 0
[  765.444954] svc: socket 000000006952397a sendto([000000005133ae9c 28... ], 28) = 28 (addr 141.14.17.51, port=785)
[  765.444963] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  765.444967] svc: tcp_accept 000000002ad76b70 sock 000000005eae2611
[  765.444975] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  765.444980] svc: server 00000000be144e7a, pool 0, transport 000000006952397a, inuse=2
[  765.444983] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.444988] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[  765.444989] RPC: TCP recv_record got -11
[  765.444990] RPC: TCP recvfrom got EAGAIN
[  765.444993] svc: server 00000000be144e7a waiting for data (to = 3600000)
[  765.445020] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[  765.445023] svc: server 0000000011abceff waiting for data (to = 3600000)
[  765.445177] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[  765.445401] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  765.445411] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.445414] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[  765.445415] svc: TCP record, 252 bytes
[  765.445418] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 252
[  765.445419] svc: TCP final record (252 bytes)
[  765.445427] svc: svc_authenticate (1)
[  765.445431] svc: calling dispatcher
[  765.445432] nfsd_dispatch: vers 4 proc 1
[  765.445436] nfsd4_exchange_id rqstp=00000000aad03597 exid=000000000472f450 clname.len=32 clname.data=00000000ed18490f ip_addr=141.14.17.51 flags 101, spa_how 0
[  765.445442] renewing client (clientid 5d1c72c1/ed57da3d)
[  765.445443] nfsd4_exchange_id seqid 0 flags 10001
[  765.445444] nfsv4 compound returned 0
[  765.445458] svc: socket 000000006952397a sendto([00000000d62d9bb6 136... ], 136) = 136 (addr 141.14.17.51, port=785)
[  765.445460] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  765.445462] svc: server 000000002b852bb6, pool 0, transport 000000006952397a, inuse=2
[  765.445464] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.445467] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[  765.445472] RPC: TCP recv_record got -11
[  765.445473] RPC: TCP recvfrom got EAGAIN
[  765.445475] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[  765.445682] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[  765.445760] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  765.445762] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.445765] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[  765.445769] svc: TCP record, 252 bytes
[  765.445773] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 252
[  765.445774] svc: TCP final record (252 bytes)
[  765.445779] svc: svc_authenticate (1)
[  765.445784] svc: calling dispatcher
[  765.445785] nfsd_dispatch: vers 4 proc 1
[  765.445789] nfsd4_exchange_id rqstp=00000000aad03597 exid=000000000472f450 clname.len=32 clname.data=00000000ed18490f ip_addr=141.14.17.51 flags 101, spa_how 0
[  765.445795] renewing client (clientid 5d1c72c1/ed57da3e)
[  765.445795] nfsd4_exchange_id seqid 0 flags 10001
[  765.445854] svc: server 000000002b852bb6, pool 0, transport 000000006952397a, inuse=3
[  765.445855] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.445859] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[  765.445860] RPC: TCP recv_record got -11
[  765.445860] RPC: TCP recvfrom got EAGAIN
[  765.445862] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[  765.445995] nfsv4 compound returned 0
[  765.446013] svc: socket 000000006952397a sendto([000000006f8e68ac 136... ], 136) = 136 (addr 141.14.17.51, port=785)
[  765.446020] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  765.446178] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[  765.446193] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  765.446198] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.446200] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[  765.446201] svc: TCP record, 220 bytes
[  765.446205] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[  765.446222] svc: TCP final record (220 bytes)
[  765.446226] svc: svc_authenticate (1)
[  765.446229] svc: calling dispatcher
[  765.446230] nfsd_dispatch: vers 4 proc 1
[  765.446233] nfsv4 compound returned 10008
[  765.446243] svc: socket 000000006952397a sendto([00000000dda22c57 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[  765.446244] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  765.446250] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  765.446251] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  765.446253] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[  765.446253] RPC: TCP recv_record got -11
[  765.446254] RPC: TCP recvfrom got EAGAIN
[  765.446257] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  765.446397] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[  766.500412] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[  766.500445] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  766.500448] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  766.500452] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[  766.500454] svc: TCP record, 220 bytes
[  766.500459] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[  766.500461] svc: TCP final record (220 bytes)
[  766.500467] svc: svc_authenticate (1)
[  766.500472] svc: calling dispatcher
[  766.500474] nfsd_dispatch: vers 4 proc 1
[  766.500480] nfsv4 compound returned 10008
[  766.500498] svc: socket 000000006952397a sendto([000000006f8e68ac 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[  766.500505] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  766.500509] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  766.500512] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  766.500515] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[  766.500516] RPC: TCP recv_record got -11
[  766.500517] RPC: TCP recvfrom got EAGAIN
[  766.500519] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  766.500546] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[  767.524372] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[  767.524405] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  767.524408] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  767.524413] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[  767.524415] svc: TCP record, 220 bytes
[  767.524420] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[  767.524422] svc: TCP final record (220 bytes)
[  767.524428] svc: svc_authenticate (1)
[  767.524434] svc: calling dispatcher
[  767.524435] nfsd_dispatch: vers 4 proc 1
[  767.524441] nfsv4 compound returned 10008
[  767.524448] svc: server 000000002b852bb6, pool 0, transport 000000006952397a, inuse=3
[  767.524451] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  767.524470] svc: socket 000000006952397a sendto([00000000dda22c57 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[  767.524474] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[  767.524476] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  767.524477] RPC: TCP recv_record got -11
[  767.524478] RPC: TCP recvfrom got EAGAIN
[  767.524482] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[  768.548339] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[  768.548372] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  768.548375] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  768.548379] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[  768.548381] svc: TCP record, 220 bytes
[  768.548386] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[  768.548387] svc: TCP final record (220 bytes)
[  768.548394] svc: svc_authenticate (1)
[  768.548400] svc: calling dispatcher
[  768.548401] nfsd_dispatch: vers 4 proc 1
[  768.548407] nfsv4 compound returned 10008
[  768.548425] svc: socket 000000006952397a sendto([000000006f8e68ac 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[  768.548432] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  768.548436] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[  768.548439] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[  768.548442] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[  768.548443] RPC: TCP recv_record got -11
[  768.548444] RPC: TCP recvfrom got EAGAIN
[  768.548446] svc: server 00000000aad03597 waiting for data (to = 3600000)
[  768.548477] svc: server 000000002b852bb6 waiting for data (to = 3600000)

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux