> On Feb 18, 2022, at 1:03 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote: > > > On 2/16/22 8:15 AM, Chuck Lever III wrote: >>> On Feb 16, 2022, at 4:56 AM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote: >>> >>> On 2/15/22 9:17 AM, Chuck Lever III wrote: >>>>> + */ >>>>> + if (!cour) { >>>>> + set_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags); >>>>> + clp->courtesy_client_expiry = ktime_get_boottime_seconds() + >>>>> + NFSD_COURTESY_CLIENT_TIMEOUT; >>>>> + list_add(&clp->cl_cs_list, &cslist); >>>> Can cl_lru (or some other existing list_head field) >>>> be used instead of cl_cs_list? >>> The cl_lru is used for clients to be expired, the cl_cs_list >>> is used for courtesy clients and they are treated differently. >> Understood, but cl_lru is not a list. It's a field that is >> used to attach an nfs4_client _to_ a list. > > Yes, cl_lru is a list head to hang the entry on a list. > >> >> You should be able to use cl_lru here if the nfs4_client is >> going to be added to either reaplist or cslist but not both. >> > We can not use cl_lru because the courtesy client is still > on nn->client_lru. We do not remove the courtesy client > from the nn->client_lru list. Thanks, that's what I was missing. >>>> I don't see anywhere that removes clp from cslist when >>>> this processing is complete. Seems like you will get >>>> list corruption next time the laundromat looks at >>>> its list of nfs4_clients. >>> We re-initialize the list head before every time the laundromat >>> runs so there is no need to remove the entries once we're done. >> Re-initializing cslist does not change the membership >> of the list that was just constructed, it simply orphans >> the list. Next time the code does a list_add(&clp->cl_cs_list) >> that list will still be there and the nfs4_client will still >> be on it. >> >> The nfs4_client has to be explicitly removed from cslist >> before the function completes. Otherwise, cl_cs_list >> will link those nfs4_client objects to garbage, and the >> next time nfs4_get_client_reaplist() is called, that >> list_for_each_entry() will walk off the end of the previous >> (now phantom) list that the cl_cs_list is still linked to. > > Chuck, I don't understand this. Once the cslist list head is > initialized, its next and prev pointer point to itself. When > the courtesy client is added to the tail of the cslist, the > next and prev pointer of cl_cs_list of the courtesy client > are not used and are overwritten so there should not be any > problem even if it was on an orphaned list. When nfs4_get_client_reaplist() returns, what is cl_cs_list pointing to? At that point IIUC it is linked onto a list of other nfs4_clients (via their cl_cs_list fields) but the anchor (cslist) is now in memory that has been returned to the stack. That's either a problem now, or it's some technical debt that will bite us later. Say the NFS client is having network problems. So it might transition from active to courtesy and back multiple times. list_add() does not simply initialize the .next and .prev pointers, it actually dereferences them. The next time this nfs4_client appears in the laundromat, the list_add will use the existing values of cl_cs_list.next and cl_cs_list.prev to link it into the list that cl_cs_list was in during previous laundromat pass. >> Please ensure that there is a "list_del();" somewhere >> before the function exits and cslist vanishes. You could, >> for example, replace the list_for_each_entry() with a >> >> while(!list_empty(&cslist)) { >> list_del(&clp->cl_cs_list /* or cl_lru */ ); >> ... >> } > > I added the list_del as you suggested but I don't think > it's needed, perhaps I'm missing something. We might even want list_del_init() here to ensure that the state of cl_cs_list is correct the next time it is used in a subsequent call to nfs4_get_client_reaplist. -- Chuck Lever