On Sat, 23 Nov 2024 19:32:27 +0000, Al Viro wrote:
> On Sat, Nov 23, 2024 at 06:27:30PM +0000, Al Viro wrote:
> > On Sun, Nov 24, 2024 at 02:08:55AM +0800, Jinliang Zheng wrote:
> > > According to Documentation/admin-guide/sysctl/fs.rst, fs.nr_open and
> > > fs.file-max represent the number of file-handles that can be opened
> > > by each process and the entire system, respectively.
> > >
> > > Therefore, it's necessary to maintain a relative size between them,
> > > meaning we should ensure that files_stat.max_files is not less than
> > > sysctl_nr_open.
> >
> > NAK.
> >
> > You are confusing descriptors (nr_open) and open IO channels (max_files).
> >
> > We very well _CAN_ have more of the former. For further details,
> > RTFM dup(2) or any introductory Unix textbook.
>
> Short version: there are 3 different notions -
> 1) file as a collection of data kept by filesystem. Such things as
> contents, ownership, permissions, timestamps belong there.
> 2) IO channel used to access one of (1). open(2) creates such;
> things like current position in file, whether it's read-only or read-write
> open, etc. belong there. It does not belong to a process - after fork(),
> child has access to all open channels parent had when it had spawned
> a child. If you open a file in parent, read 10 bytes from it, then spawn
> a child that reads 10 more bytes and exits, then have parent read another
> 5 bytes, the first read by parent will have read bytes 0 to 9, read by
> child - bytes 10 to 19 and the second read by parent - bytes 20 to 24.
> Position is a property of IO channel; it belongs neither to underlying
> file (otherwise another process opening the file and reading from it
> would play havoc on your process) nor to process (otherwise reads done
> by child would not have affected the parent and the second read from
> parent would have gotten bytes 10 to 14). Same goes for access mode -
> it belongs to IO channel.
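A minimal C sketch of the fork() scenario described above, assuming a POSIX system. Instead of an existing file it uses a 32-byte temporary file in which byte i has value i, so the first byte returned by each read reveals the offset that read started at; the child reports its starting offset through its exit status. The helper names (open_counting_file, fork_offset_demo) and the /tmp path template are illustrative assumptions, not anything from the thread. If the position lives in the shared IO channel, starts should come out as 0, 10 and 20; if it belonged to each process, the parent's second read would start at 10 instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a 32-byte temp file where byte i == i and
 * return a read-only descriptor for it (or -1).  The name is unlinked at
 * once; the open IO channel keeps the contents reachable. */
static int open_counting_file(void)
{
    char path[] = "/tmp/iochan-XXXXXX";
    unsigned char buf[32];
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    for (int i = 0; i < 32; i++)
        buf[i] = (unsigned char)i;
    ssize_t n = write(wfd, buf, sizeof buf);
    close(wfd);
    int fd = (n == (ssize_t)sizeof buf) ? open(path, O_RDONLY) : -1;
    unlink(path);
    return fd;
}

/* Parent reads 10 bytes, child reads 10 and exits, parent reads 5.
 * starts[k] receives the offset at which read k began (the value of the
 * first byte it got).  Returns 0 on success, -1 on error. */
int fork_offset_demo(int starts[3])
{
    unsigned char buf[10];
    int status;
    pid_t pid;
    int fd = open_counting_file();
    if (fd < 0)
        return -1;

    if (read(fd, buf, 10) != 10)
        goto fail;
    starts[0] = buf[0];

    pid = fork();
    if (pid < 0)
        goto fail;
    if (pid == 0) {
        /* Child: inherited descriptor, same IO channel, same position. */
        if (read(fd, buf, 10) != 10)
            _exit(255);
        _exit(buf[0]);          /* report where the child's read began */
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        goto fail;
    starts[1] = WEXITSTATUS(status);

    /* Parent again: the child's read advanced the shared position. */
    if (read(fd, buf, 5) != 5)
        goto fail;
    starts[2] = buf[0];

    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

Calling fork_offset_demo() from a small main() and printing the three values should show 0, 10 and 20, matching the byte ranges 0-9, 10-19 and 20-24 in the description.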

I'm sorry, I don't know much about how UNIX is implemented, but in the
Linux implementation specifically, struct file is more like a combination
of your 1) and 2).

But I see your point: I missed the dup() case. dup() occupies a slot in
the fdtable->fd array, but does not create a new struct file.

Thank you.
Jinliang Zheng

> 3) file descriptor - a number that has a meaning only in context
> of a process and refers to IO channel. That's what system calls use
> to identify the IO channel to operate upon; open() picks a descriptor
> unused by the calling process, associates the new channel with it and
> returns that descriptor (a number) to caller. Multiple descriptors can
> refer to the same IO channel; e.g. dup(fd) grabs a new descriptor and
> associates it with the same IO channel fd currently refers to.
>
> IO channels are not directly exposed to userland, but they are
> very much present in Unix-style IO API. Note that results of e.g.
> int fd1 = open("/etc/issue", 0);
> int fd2 = open("/etc/issue", 0);
> and
> int fd1 = open("/etc/issue", 0);
> int fd2 = dup(fd1);
> are not identical, even though in both cases fd1 and fd2 are opened
> descriptors and reading from them will access the contents of the
> /etc/issue; in the former case the positions being accessed by read from
> fd1 and fd2 will be independent, in the latter they will be shared.
>
> It's really quite basic - Unix Programming 101 stuff. It's not
> just that POSIX requires that and that any Unix behaves that way,
> anything even remotely Unix-like will be like that.
>
> You won't find the words 'IO channel' in POSIX, but I refuse
> to use the term they have chosen instead - 'file description'. Yes,
> alongside with 'file descriptor', in the contexts where the distinction
> between these notions is quite important. I would rather not say what
> I really think of those unsung geniuses, lest CoC gets overexcited...
>
> Anyway, in casual conversations the expression 'opened file'
> usually refers to that thing. Which is somewhat clumsy (sounds like
> 'file on filesystem that happens to be opened'), but usually it's
> good enough. If you need to be pedantic (e.g. when explaining that
> material in aforementioned Unix Programming 101 class), 'IO channel'
> works well enough, IME.
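The open() vs. dup() contrast from the quoted explanation can be checked the same way. This is a hedged sketch, not code from the thread: it substitutes a 32-byte temporary file (byte i == i) for /etc/issue, which may be shorter than 10 bytes or missing, and the helper names are hypothetical. After reading 10 bytes through fd1, a 1-byte read through fd2 should start at offset 0 when fd2 came from a second open() (two IO channels) and at offset 10 when fd2 came from dup(fd1) (one shared IO channel).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill a new temp file (name written into path, a
 * mkstemp template) with 32 bytes where byte i == i.  0 on success. */
static int make_counting_file(char *path)
{
    unsigned char buf[32];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    for (int i = 0; i < 32; i++)
        buf[i] = (unsigned char)i;
    ssize_t n = write(fd, buf, sizeof buf);
    close(fd);
    if (n != (ssize_t)sizeof buf) {
        unlink(path);
        return -1;
    }
    return 0;
}

/* Read 10 bytes via fd1, then 1 byte via fd2; the byte's value is the
 * offset at which fd2's read started.  Returns -1 on error. */
static int offset_seen_by_fd2(int fd1, int fd2)
{
    unsigned char buf[10];
    if (read(fd1, buf, 10) != 10 || read(fd2, buf, 1) != 1)
        return -1;
    return buf[0];
}

/* *two_opens: fd1 and fd2 from two open() calls (two IO channels).
 * *open_dup:  fd2 = dup(fd1)                (one shared IO channel). */
int open_vs_dup_demo(int *two_opens, int *open_dup)
{
    char path[] = "/tmp/iochan-XXXXXX";
    int fd1, fd2;
    if (make_counting_file(path) < 0)
        return -1;

    fd1 = open(path, O_RDONLY);
    fd2 = open(path, O_RDONLY);
    *two_opens = (fd1 >= 0 && fd2 >= 0) ? offset_seen_by_fd2(fd1, fd2) : -1;
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);

    fd1 = open(path, O_RDONLY);
    fd2 = (fd1 >= 0) ? dup(fd1) : -1;
    *open_dup = (fd2 >= 0) ? offset_seen_by_fd2(fd1, fd2) : -1;
    if (fd2 >= 0)
        close(fd2);
    if (fd1 >= 0)
        close(fd1);

    unlink(path);
    return 0;
}
```

In the dup() case the process holds two descriptors but only one open file (one struct file), which is the distinction between the per-process descriptor limit (fs.nr_open) and the system-wide open-file limit (fs.file-max) discussed in the thread.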