Re: [PATCH 1/2] pidmap(2)

On 9/7/17, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Sep 6, 2017 at 2:04 AM, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
>> On 9/6/17, Randy Dunlap <rdunlap@xxxxxxxxxxxxx> wrote:
>>> On 09/05/17 15:53, Andrew Morton wrote:
>>>> On Tue, 5 Sep 2017 22:05:00 +0300 Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
>>>>
>>>>> Implement system call for bulk retrieving of pids in binary form.
>>>>>
>>>>> Using /proc is slower than necessary: 3 syscalls + another 3 for each
>>>>> thread + converting with atoi().
>>>>>
>>>>> /proc may not be mounted, especially in containers. Natural extension
>>>>> of hidepid=2 efforts is to not mount /proc at all.
>>>>>
>>>>> It could be used by programs like ps, top or CRIU. Speed increase will
>>>>> become more drastic once combined with bulk retrieval of process
>>>>> statistics.
>>>>
>>>> The patches are performance optimizations, but their changelogs contain
>>>> no performance measurements!
>>>>
>>>> Demonstration of some compelling real-world performance benefits would
>>>> help things along a lot.
>>>>
>>>
>>> also, I expect that the tiny kernel people will want kconfig options for
>>> these syscalls.
>>
>> We'll add it, but the question is whether it is a good idea. Ideally these
>> system calls should be mandatory and /proc optional.
>>
>> $ size kernel/pidmap.o fs/fdmap.o
>>    text    data     bss     dec     hex filename
>>     560       0       0     560     230 kernel/pidmap.o
>>     617       0       0     617     269 fs/fdmap.o
>
> After much discussion at LPC/KS last year, I thought the idea was to
> try to speed up /proc rather than replacing it outright.  The two
> specific ideas I recall were:
>
> 1. Add a syscall like readfileat() that you can use to, in a single
> operation, open, read, and close a /proc file (or other file).  This
> should vastly reduce locking and RCU overhead.
>
> 2. Add a /proc file that has a nice binary format for task info.
> (nl_attr?)

If you put binary data in /proc, there is no need for the /proc part.
A system call can do everything /proc/$PID/bstat (or whatever the name
is) does.

> I don't see why pidmap() deserves to be significantly faster than
> getdents().

Just look at the profile. Entries marked XXX are pure slowdown. _Some_
of it can be deleted or sped up, but not everything. All the dcache
stuff is unavoidable.

XXX	6.35% [k] number
OK*	5.21% [k] proc_readfd_common  (* partially XXX)
OK	4.19% [k] __rcu_read_unlock
XXX	4.05% [.] __GI_____strtoll_l_internal
XXX	3.73% [k] dput
OK	3.64% [k] entry_SYSCALL_64_fastpath
XXX	3.23% [k] proc_fill_cache
XXX	3.10% [k] __d_lookup
XXX	3.09% [k] filldir
XXX	2.74% [k] format_decode
XXX	2.47% [k] link_path_walk
OK*	2.26% [k] _raw_spin_lock
OK	1.73% [k] get_files_struct
XXX	1.64% [k] __d_lookup_rcu
XXX	1.61% [k] do_sys_open
XXX	1.49% [k] pid_revalidate
OK	1.48% [k] __check_object_size
XXX	1.47% [k] do_filp_open
?	1.44% [.] __memmove_sse2
OK	1.40% [k] __rcu_read_lock
XXX	1.33% [.] __readdir64
XXX	1.32% [k] __follow_mount_rcu.isra.6
XXX	1.30% [k] set_root
XXX	1.27% [k] lookup_fast
XXX	1.23% [k] full_name_hash
OK?	1.17% [k] call_rcu
XXX	1.17% [k] sys_open
?	1.02% [k] lockref_put_or_lock
XXX	1.00% [k] pid_delete_dentry
XXX	0.99% [k] iterate_dir
XXX	0.95% [k] inode_permission
XXX	0.94% [k] __slab_alloc.isra.22.constprop.26
OK	0.93% [k] rcu_process_callbacks
XXX	0.93% [.] __getdents64
XXX	0.93% [k] vsnprintf
XXX	0.92% [k] sys_close

> Also, a pidmap() syscall like this inherently bypasses any security
> restrictions implied by the way that /proc is mounted.  It can respect
> hidepid, but hidepid (as a per-namespace concept) is an enormous turd
> that badly needs to be deprecated, and Djalal is working on exactly
> that.

I agree pid_ns->hide_pid is a silly idea. It should be a property of
an individual mount, but as posted pidmap() respects it (at the cost of
some slowdown).


