On Sat, Nov 23, 2024 at 06:27:30PM +0000, Al Viro wrote:
> On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> > According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> > fs.file-max represent the number of file-handles that can be opened
> > by each process and the entire system, respectively.
> >
> > Therefore, it's necessary to maintain a relative size between them,
> > meaning we should ensure that files_stat.max_files is not less than
> > sysctl_nr_open.
>
> NAK.
>
> You are confusing descriptors (nr_open) and open IO channels (max_files).
>
> We very well _CAN_ have more of the former.  For further details,
> RTFM dup(2) or any introductory Unix textbook.

Short version: there are 3 different notions -

1) file as a collection of data kept by a filesystem. Things like
contents, ownership, permissions and timestamps belong there.

2) IO channel used to access one of (1). open(2) creates one of those;
things like the current position in the file, whether it's a read-only
or read-write open, etc. belong there. It does not belong to a process -
after fork(), the child has access to all open channels the parent had
when it spawned the child. If you open a file in the parent, read 10
bytes from it, then spawn a child that reads 10 more bytes and exits,
then have the parent read another 5 bytes, the first read by the parent
will have read bytes 0 to 9, the read by the child - bytes 10 to 19 and
the second read by the parent - bytes 20 to 24. Position is a property
of the IO channel; it belongs neither to the underlying file (otherwise
another process opening the file and reading from it would play havoc
with your process) nor to the process (otherwise the reads done by the
child would not have affected the parent and the second read by the
parent would have gotten bytes 10 to 14). Same goes for access mode -
it belongs to the IO channel.

3) file descriptor - a number that has a meaning only in the context of
a process and refers to an IO channel.
That's what system calls use to identify the IO channel to operate upon;
open() picks a descriptor unused by the calling process, associates the
new channel with it and returns that descriptor (a number) to the caller.
Multiple descriptors can refer to the same IO channel; e.g. dup(fd) grabs
a new descriptor and associates it with the same IO channel fd currently
refers to.

IO channels are not directly exposed to userland, but they are very much
present in the Unix-style IO API. Note that the results of e.g.

	int fd1 = open("/etc/issue", O_RDONLY);
	int fd2 = open("/etc/issue", O_RDONLY);

and

	int fd1 = open("/etc/issue", O_RDONLY);
	int fd2 = dup(fd1);

are not identical, even though in both cases fd1 and fd2 are opened
descriptors and reading from them will access the contents of
/etc/issue; in the former case the positions accessed by read() from
fd1 and fd2 will be independent, in the latter they will be shared.

It's really quite basic - Unix Programming 101 stuff. It's not just that
POSIX requires that and that any Unix behaves that way; anything even
remotely Unix-like will be like that.

You won't find the words 'IO channel' in POSIX, but I refuse to use the
term they have chosen instead - 'file description'. Yes, right alongside
'file descriptor', in contexts where the distinction between these
notions is quite important. I would rather not say what I really think
of those unsung geniuses, lest CoC gets overexcited...

Anyway, in casual conversations the expression 'opened file' usually
refers to that thing. Which is somewhat clumsy (it sounds like 'file on
a filesystem that happens to be opened'), but usually it's good enough.
If you need to be pedantic (e.g. when explaining that material in the
aforementioned Unix Programming 101 class), 'IO channel' works well
enough, IME.