On Sun, 2020-02-23 at 14:30 +0300, Alexey Dobriyan wrote: > On Sat, Feb 22, 2020 at 12:39:39PM -0800, Joe Perches wrote: > > On Sat, 2020-02-22 at 23:15 +0300, Alexey Dobriyan wrote: > > > Now that "struct proc_ops" exist we can start putting there stuff which > > > could not fly with VFS "struct file_operations"... > > > > > > Most of fs/proc/inode.c file is dedicated to make open/read/.../close reliable > > > in the event of disappearing /proc entries which usually happens if module is > > > getting removed. Files like /proc/cpuinfo which never disappear simply do not > > > need such protection. > > > > > > Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such > > > "permanent" files. > > > > > > Enable "permanent" flag for > > > > > > /proc/cpuinfo > > > /proc/kmsg > > > /proc/modules > > > /proc/slabinfo > > > /proc/stat > > > /proc/sysvipc/* > > > /proc/swaps > > > > > > More will come once I figure out foolproof way to prevent out module > > > authors from marking their stuff "permanent" for performance reasons > > > when it is not. > > > > > > This should help with scalability: benchmark is "read /proc/cpuinfo R times > > > by N threads scattered over the system". > > > > Is this an actual expected use-case? > > Yes. > > > Is there some additional unnecessary memory consumption > > in the unscaled systems? > > No, it's the opposite. Less memory usage for everyone and noticeable > performance improvement for contented case. > > > > static ssize_t proc_reg_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) > > > { > > > struct proc_dir_entry *pde = PDE(file_inode(file)); > > > ssize_t rv = -EIO; > > > - if (use_pde(pde)) { > > > - typeof_member(struct proc_ops, proc_read) read; > > > > > > - read = pde->proc_ops->proc_read; > > > - if (read) > > > - rv = read(file, buf, count, ppos); > > > + if (pde_is_permanent(pde)) { > > > + return pde_read(pde, file, buf, count, ppos); > > > + } else if (use_pde(pde)) { > > > + rv = pde_read(pde, file, buf, count, ppos); > > > unuse_pde(pde); > > > > Perhaps all the function call duplication could be minimized > > by using code without direct returns like: > > > > rv = pde_read(pde, file, buf, count, pos); > > if (!pde_is_permanent(pde)) > > unuse_pde(pde); > > > > return rv; > > Function call non-duplication is false goal. Depends, copy/paste errors are common and object code size generally increases. > Surprisingly it makes code bigger: Not so far as I can tell. Are you sure? > $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux > add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10) > Function old new delta > proc_reg_read 108 118 +10 > > and worse too: "rv" is carried on stack through "unuse_pde" call. With gcc 9.2.1 x86-64 defconfig: Changing just proc_reg_read to: static ssize_t proc_reg_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { struct proc_dir_entry *pde = PDE(file_inode(file)); ssize_t rv; rv = pde_read(pde, file, buf, count, ppos); if (use_pde(pde)) unuse_pde(pde); return rv; } gives: $ size fs/proc/inode.o* text data bss dec hex filename 5448 28 0 5476 1564 fs/proc/inode.o.suggested 5526 28 0 5554 15b2 fs/proc/inode.o.alexey $ objdump -d fs/proc/inode.o.suggested (grep for proc_reg_read) 0000000000000410 <proc_reg_read>: 410: 41 54 push %r12 412: 55 push %rbp 413: 48 8b 47 20 mov 0x20(%rdi),%rax 417: 48 8b 68 d0 mov -0x30(%rax),%rbp 41b: 48 8b 45 30 mov 0x30(%rbp),%rax 41f: 48 8b 40 10 mov 0x10(%rax),%rax 423: 48 85 c0 test %rax,%rax 426: 74 28 je 450 <proc_reg_read+0x40> 428: e8 00 00 00 00 callq 42d <proc_reg_read+0x1d> 42d: 49 89 c4 mov %rax,%r12 430: 8b 45 00 mov 0x0(%rbp),%eax 433: 85 c0 test %eax,%eax 435: 78 12 js 449 <proc_reg_read+0x39> 437: 8d 50 01 lea 0x1(%rax),%edx 43a: f0 0f b1 55 00 lock cmpxchg %edx,0x0(%rbp) 43f: 75 f2 jne 433 <proc_reg_read+0x23> 441: 48 89 ef mov %rbp,%rdi 444: e8 e7 fc ff ff callq 130 <unuse_pde> 449: 4c 89 e0 mov %r12,%rax 44c: 5d pop %rbp 44d: 41 5c pop %r12 44f: c3 retq 450: 49 c7 c4 fb ff ff ff mov $0xfffffffffffffffb,%r12 457: eb d7 jmp 430 <proc_reg_read+0x20> 459: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) vs $ objdump -d fs/proc/inode.o.alexey (grep for proc_reg_read) 00000000000006e0 <proc_reg_read>: 6e0: 41 54 push %r12 6e2: 55 push %rbp 6e3: 48 8b 47 20 mov 0x20(%rdi),%rax 6e7: 48 8b 68 d0 mov -0x30(%rax),%rbp 6eb: f6 85 aa 00 00 00 01 testb $0x1,0xaa(%rbp) 6f2: 74 15 je 709 <proc_reg_read+0x29> 6f4: 48 8b 45 30 mov 0x30(%rbp),%rax 6f8: 48 8b 40 10 mov 0x10(%rax),%rax 6fc: 48 85 c0 test %rax,%rax 6ff: 74 3f je 740 <proc_reg_read+0x60> 701: 5d pop %rbp 702: 41 5c pop %r12 704: e9 00 00 00 00 jmpq 709 <proc_reg_read+0x29> 709: 8b 45 00 mov 0x0(%rbp),%eax 70c: 85 c0 test %eax,%eax 70e: 78 30 js 740 <proc_reg_read+0x60> 710: 44 8d 40 01 lea 0x1(%rax),%r8d 714: f0 44 0f b1 45 00 lock cmpxchg %r8d,0x0(%rbp) 71a: 75 f0 jne 70c <proc_reg_read+0x2c> 71c: 48 8b 45 30 mov 0x30(%rbp),%rax 720: 48 8b 40 10 mov 0x10(%rax),%rax 724: 48 85 c0 test %rax,%rax 727: 74 20 je 749 <proc_reg_read+0x69> 729: e8 00 00 00 00 callq 72e <proc_reg_read+0x4e> 72e: 49 89 c4 mov %rax,%r12 731: 48 89 ef mov %rbp,%rdi 734: e8 f7 f9 ff ff callq 130 <unuse_pde> 739: 4c 89 e0 mov %r12,%rax 73c: 5d pop %rbp 73d: 41 5c pop %r12 73f: c3 retq 740: 49 c7 c4 fb ff ff ff mov $0xfffffffffffffffb,%r12 747: eb f0 jmp 739 <proc_reg_read+0x59> 749: 49 c7 c4 fb ff ff ff mov $0xfffffffffffffffb,%r12 750: eb df jmp 731 <proc_reg_read+0x51> 752: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) 759: 00 00 00 00 75d: 0f 1f 00 nopl (%rax)