On Fri, Jul 23, 2021 at 12:08:10AM +0900, Kuniyuki Iwashima wrote:
> From: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx>
> Date: Thu, 22 Jul 2021 23:16:37 +0900
> > From: Martin KaFai Lau <kafai@xxxxxx>
> > Date: Thu, 1 Jul 2021 13:05:41 -0700
> > > st->bucket stores the current bucket number.
> > > st->offset stores the offset within this bucket that is the sk to be
> > > seq_show(). Thus, st->offset only makes sense within the same
> > > st->bucket.
> > >
> > > These two variables are an optimization for the common no-lseek case.
> > > When resuming the seq_file iteration (i.e. seq_start()),
> > > tcp_seek_last_pos() tries to continue from the st->offset
> > > at bucket st->bucket.
> > >
> > > However, it is possible that the bucket pointed to by st->bucket
> > > has changed and st->offset may end up skipping the whole st->bucket
> > > without finding a sk. In this case, tcp_seek_last_pos() currently
> > > continues to satisfy the offset condition in the next (and incorrect)
> > > bucket. Instead, regardless of the offset value, the first sk of the
> > > next bucket should be returned. Thus, a "bucket == st->bucket" check
> > > is added to tcp_seek_last_pos().
> > >
> > > The chance of hitting this is small and the issue is a decade old,
> > > so this is targeted for the next tree.
> >
> > Multiple read()s or lseek()+read() can call tcp_seek_last_pos().
> >
> > IIUC, the problem happens when sockets placed before the last-shown
> > socket in the list are closed between read()s, or between lseek() and
> > read().
> >
> > I think there is still a case where the bucket is valid but the offset
> > is invalid:
> >
> >   listening_hash[1] -> sk1 -> sk2 -> sk3 -> nulls
> >   listening_hash[2] -> sk4 -> sk5 -> nulls
> >
> >   read(/proc/net/tcp)
> >     end up with sk2
> >
> >   close(sk1)
> >
> >   listening_hash[1] -> sk2 -> sk3 -> nulls
> >   listening_hash[2] -> sk4 -> sk5 -> nulls
> >
> >   read(/proc/net/tcp) (resume)
> >     offset = 2
> >
> >     listening_get_next() returns sk2
> >
> >     while (offset--)
> >       1st loop: listening_get_next() returns sk3 (bucket == st->bucket)
> >       2nd loop: listening_get_next() returns sk4 (bucket != st->bucket)
> >
> >     show() starts from sk4
> >
> >     Only sk3 is skipped, but it should be shown.
>
> Sorry, this example is wrong.
> We can handle it properly by testing bucket != st->bucket.
>
> In the case below, we cannot check whether the offset is valid by
> testing the bucket:
>
>   listening_hash[1] -> sk1 -> sk2 -> sk3 -> sk4 -> nulls
>
>   read(/proc/net/tcp)
>     end up with sk2
>
>   close(sk1)
>
>   listening_hash[1] -> sk2 -> sk3 -> sk4 -> nulls
>
>   read(/proc/net/tcp) (resume)
>     offset = 2
>
>     listening_get_first() returns sk2
>
>     while (offset--)
>       1st loop: listening_get_next() returns sk3 (bucket == st->bucket)
>       2nd loop: listening_get_next() returns sk4 (bucket == st->bucket)
>
>     show() starts from sk4
>
>     Only sk3 is skipped, but it should be shown.
>
> > In listening_get_next(), we can check if we passed through sk2, but
> > this does not work well if sk2 itself is closed... then there is no
> > way to check whether the offset is valid or not.
> >
> > Handling this may be too much though. What do you think?

There will always be cases that miss a sk after the bucket lock is
released (and things change underneath). For example, another case could
be that sk_new is added to the head of the bucket, although that could
arguably be treated as a legit miss since "cat /proc/net/tcp" has
already been in progress. The chance of hitting the m->buf limit while
that bucket also gets changed should be slim.
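For concreteness, the resume path under discussion looks roughly like
the sketch below: a simplified tcp_seek_last_pos() with the added
"bucket == st->bucket" test. The helper names and tcp_iter_state fields
follow net/ipv4/tcp_ipv4.c, but the surrounding state handling is
trimmed, so treat it as illustrative rather than the exact diff:

        /* Simplified sketch, not the exact patch: replay st->offset
         * within the saved bucket, but stop as soon as iteration spills
         * into the next bucket. At worst the first sk of that next
         * bucket is returned instead of skipping partway into it.
         */
        static void *tcp_seek_last_pos(struct seq_file *seq)
        {
                struct tcp_iter_state *st = seq->private;
                int bucket = st->bucket;        /* bucket saved by the last read() */
                int offset = st->offset;        /* offset within that bucket */
                int orig_num = st->num;
                void *rc = NULL;

                switch (st->state) {
                case TCP_SEQ_STATE_LISTENING:
                        if (st->bucket >= INET_LHTABLE_SIZE)
                                break;
                        rc = listening_get_first(seq);
                        while (offset-- && rc && bucket == st->bucket)
                                rc = listening_get_next(seq, rc);
                        break;
                case TCP_SEQ_STATE_ESTABLISHED:
                        /* Walk till we hit the saved bucket */
                        rc = established_get_first(seq);
                        while (offset-- && rc && bucket == st->bucket)
                                rc = established_get_next(seq, rc);
                }

                st->num = orig_num;

                return rc;
        }

Note that in the second example above the test indeed cannot help: sk1
is removed from the same bucket, so bucket == st->bucket holds for the
whole replay and sk3 is still skipped.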
If there is a use case where lhash2 (already hashed by port+addr) still
ends up with a large bucket (e.g. many SO_REUSEPORT sockets), that will
be a better problem to solve first. imo, remembering sk2 just to fix
the "cat /proc/net/tcp" case alone is not worth it.

Thanks for the review!