Jamie Lokier <jamie@xxxxxxxxxxxxx> writes:

> Eric W. Biederman wrote:
>> > I don't have anything at hand but multithread/process server accepting
>> > on the same socket comes to mind.  I don't think it would be a very
>> > rare thing.  If you confine the scope to character devices or sysfs,
>> > it could be quite rare tho.
>>
>> Yes.  I think I can safely exclude sockets, and not bother with
>> reference counting them.
>
> Good idea.  As well as many processes calling accept(), it's not
> unusual to have two threads or processes reading and writing
> concurrently to TCP sockets, and to have a single UDP socket shared
> among threads/processes for sendto.

I have been playing with what I can see when I instrument up my code.

The first thing that popped up is that we have lots of reads/writes to
files with f_count > 1, which defeats my micro-optimization in
fops_read_lock.  In those cases I still have to pay the full cost of an
atomic even if I have an exclusive cache line.

I have also found that for make -j N I tend to get N processes all
reading from the same pipe at the same time.  Not a smoking gun against
my assumption that only one process will be using a file descriptor at
a time in performance paths, but it certainly shows that such sharing
is nowhere near as rare as I thought.

The good news is that I have found a much better/cheaper optimization.
Instead of per-cpu or per-file memory, use per-task memory.  It is
always uncontended, and a task appears to never use more than two files
simultaneously (stacking?).

I have just prototyped that and things are looking very promising.  Now
I just need to clean everything up and resend my patches.

>> The only strong evidence I have that multi-threading on a single file
>> descriptor is likely to be common is that we have pread and pwrite
>> syscalls.  At the same time the number of races we have in struct file
>> if it is accessed by multiple threads at the same time suggests
>> that, at least for cases where you have an offset, it doesn't happen often.
>
> Notice the preadv and pwritev syscalls added recently?  They were
> added because QEMU and KVM need them for performance.  Those programs
> have multiple threads doing I/O to the same file concurrently.  It's
> like a poor man's AIO, except it's more reliable than real Linux AIO :-)
>
> Databases probably should use concurrent p{read,write}{,v} if they're
> not using direct I/O and AIO.  I'm not sure if the well-known
> databases do.  In the past there have been some poor-quality
> "emulations" of those syscalls prone to races, on Linux and BSD I believe.
>
> What are the races you've noticed?

Besides f_pos (which the pread variants handle), there is no locking on
the file read-ahead state, and f_flags only got locking a month or two ago.

Eric
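
To make the per-task idea above concrete, here is a minimal userspace
sketch, not the actual patches: the names (task_files, task_file_get/put,
the two-slot array) are hypothetical stand-ins for whatever per-task
storage the real code uses.  The point is only that the fast path touches
memory private to the current task and falls back to a shared atomic
count in the rare case where a task holds more than two files at once.

	/*
	 * Hypothetical model of per-task file tracking.  Per-task slots are
	 * never contended, so the common get/put pair costs no atomics.
	 */
	#include <stdatomic.h>
	#include <stddef.h>
	#include <stdio.h>

	struct file_model {
		atomic_long count;		/* slow path: shared reference count */
	};

	/* A task rarely holds more than two files at once (plain read/write
	 * plus perhaps one level of stacking), so two slots cover it. */
	struct task_files {
		struct file_model *slot[2];
	};

	static void task_file_get(struct task_files *tf, struct file_model *f)
	{
		for (int i = 0; i < 2; i++) {
			if (!tf->slot[i]) {
				tf->slot[i] = f;	/* per-task, uncontended */
				return;
			}
		}
		atomic_fetch_add(&f->count, 1);		/* rare: third file in flight */
	}

	static void task_file_put(struct task_files *tf, struct file_model *f)
	{
		for (int i = 0; i < 2; i++) {
			if (tf->slot[i] == f) {
				tf->slot[i] = NULL;
				return;
			}
		}
		atomic_fetch_sub(&f->count, 1);
	}

	int main(void)
	{
		struct file_model f = { .count = 0 };
		struct task_files tf = { { NULL, NULL } };

		task_file_get(&tf, &f);			/* fast path, no atomic */
		task_file_put(&tf, &f);
		printf("shared count after fast path: %ld\n",
		       atomic_load(&f.count));		/* prints 0 */
		return 0;
	}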
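
For the p{read,write}{,v} discussion, a small illustration of why those
calls sidestep the shared f_pos: two threads reading the same descriptor
at independent offsets.  This is just the "poor man's AIO" pattern in
miniature (reading /etc/hostname is an arbitrary choice for the example),
not code from QEMU or KVM.

	/* Two threads issuing pread() on one shared fd.  Because pread()
	 * takes an explicit offset, neither thread touches f_pos, which is
	 * exactly the race the offset-taking syscalls were added to avoid. */
	#include <fcntl.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static int fd;

	static void *reader(void *arg)
	{
		off_t off = *(off_t *)arg;
		char buf[16];

		ssize_t n = pread(fd, buf, sizeof(buf), off);	/* no f_pos update */
		if (n >= 0)
			printf("offset %lld: read %zd bytes\n", (long long)off, n);
		return NULL;
	}

	int main(void)
	{
		pthread_t t1, t2;
		off_t off1 = 0, off2 = 8;

		fd = open("/etc/hostname", O_RDONLY);
		if (fd < 0)
			return 1;

		pthread_create(&t1, NULL, reader, &off1);
		pthread_create(&t2, NULL, reader, &off2);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		close(fd);
		return 0;
	}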