On Thu, Aug 03, 2023 at 11:53:11AM +0200, Mateusz Guzik wrote: > On Mon, Jul 24, 2023 at 09:59:15AM -0700, Linus Torvalds wrote: > > I really hate making the traditional unix single-threaded file > > descriptor case take that lock. > > > > Maybe it doesn't matter. Obviously it can't have contention, and your > > patch in that sense is pretty benign. > > > > But locking is just fundamentally expensive in the first place, and it > > annoys me that I never realized that pidfd_getfd() did that thing that > > I knew was broken for /proc. > > > > So I got curious what the impact is and checked on quite a modern CPU > (Sapphire Rapid), so nobody can claim it's some old yeller and atomics > are nowhere near as expensive on modern uarchs. > > I used read1_processes from will-it-scale -- it is doing 4KB reads at a > time over a 1MB file and dodges refing the file, but it does not dodge > the lock with the patch at hand. > > In short, I got a drop of about 5% (~5778843 -> ~5500871 ops/s). > > The kernel was patched with a toggle to force or elide the proposed > mandatory locking, like so: > @@ -1042,8 +1044,10 @@ unsigned long __fdget_pos(unsigned int fd) > struct file *file = (struct file *)(v & ~3); > > if (file && (file->f_mode & FMODE_ATOMIC_POS)) { > - v |= FDPUT_POS_UNLOCK; > - mutex_lock(&file->f_pos_lock); > + if (file_count(file) > 1 || fdget_pos_mutex) { > + v |= FDPUT_POS_UNLOCK; > + mutex_lock(&file->f_pos_lock); > + } > } > return v; > } > > I got rather unstable single-threaded perf, going up and down several % > between runs, I don't know yet what's that about. But toggling back and We've had the whole lkp intel performance testsuite run on this for a long time to see whether there any performance regressions that aren't in the noise. This includes all of will-it-scale. Also with a focus on single-thread performance. So I'm a little skeptical about the reliability of manual performance runs in comparison. If Linus thinks it matters then I think we should only take the f_pos_lock unconditionally on directory fds as this is where it really matters. For read/write we're only doing it because of posix anyway and using pidfd_getfd() is squarely out of posix anyway. So we can document that using pidfd_getfd() on a regular file descriptor outside of a seccomp stopped task is not recommended.