2016-07-31 8:22 GMT+02:00 Mateusz Guzik <mguzik@xxxxxxxxxx>: > On Tue, Jul 26, 2016 at 02:44:48PM +0200, Marcin Ślusarz wrote: >> Hey >> >> I have a simple program that mmaps 8MB of anonymous memory, spawns 16 >> threads, reads /proc/self/smaps in each thread and looks up whether >> mapped address can be found in smaps. From time to time it's not there. >> >> Is this supposed to work reliably? >> >> My guess is that libc functions allocate memory internally using mmap >> and modify process' address space while other thread is iterating over >> vmas. >> >> I see that reading from smaps takes mmap_sem in read mode. I'm guessing >> vm modifications are done under mmap_sem in write mode. >> >> Documentation/filesystem/proc.txt says reading from smaps is "slow but >> very precise" (although in context of RSS). >> > > Address space modification definitely happens as threads get their > stacks mmaped and unmapped. > > If you run your program under strace you will see all threads perform > multiple read()s to get the content as the kernel keeps return short > reads (below 1 page size). In particular, seq_read imposes the limit > artificially. > > Since there are multiple trips to the kernel, locks are dropped and > special measures are needed to maintain consistency of the result. > > In m_start you can see there is a best-effort attempt: it is remembered > what vma was accessed by the previous run. But the vma can be unmapped > before we get here next time. I added printks to m_start and I see that when last_addr is non-zero, find_vma succeeds, even in cases when my test can't find its mapping. So it seems the problem is not that simple. Just for testing I commented out m_next_vma call from m_start and now my test always succeeds (of course at the expense of duplicated entries). Maybe it's just because of changed timing or maybe the problem is deeper... > > So no, reading the file when the content is bigger than 4k is not > guaranteed to give consistent results across reads. > > I don't have a good idea how to fix it, and it is likely not worth > working on. This is not the only place which is unable to return > reliable information for sufficiently large dataset. > > The obvious thing to try out is just storing all the necessary > information and generating the text form on read. Unfortunately even > that data is quite big -- over 100 bytes per vma. This can be shrinked > down significantly with encoding what information is present as opposed > to keeping all records). But with thousands of entries per application > this translates into kilobytes of memory which would have to allocated > just to hold it, sounds like a non-starter to me. Another idea is to change seq_read to flush data to user buffer every time it's full, without "stopping/starting" the seq_file. The logic of seq_read is quite hairy, so I didn't try this. And I'm not sure if page fault in copy_to_user could retake mmap_sem in write mode. Another (crazy) idea is to implement write operation for smaps where buffer contents would pick the right VMA for the next read. Too crazy? :) Marcin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href