On Thu, Sep 01, 2011 at 03:49:46PM -0700, Andrew Morton wrote: > On Thu, 1 Sep 2011 14:46:34 +0400 > Cyrill Gorcunov <gorcunov@xxxxxxxxx> wrote: > > > On Wed, Aug 31, 2011 at 03:10:23PM -0700, Andrew Morton wrote: > > > > > > This function would benefit from a code comment. > > > > > > Given that it's pretty generic (indeed there might be open-coded code > > > which already does this elsewhere), perhaps it should be in mm/mmap.c > > > as a kernel-wide utility function. That will add a little overhead to > > > CONFIG_PROC_FS=n builds, which doesn't seem terribly important. > > > > > > > Andrew, here is an attempt to address concerns. Please review. > > Complains are welcome as always! > > Changelog still doesn't explain why /proc/pid/maps is unfixably unsuitable. > Will update. > > > > ... > > > > +static int map_files_d_revalidate(struct dentry *dentry, struct nameidata *nd) > > +{ > > + unsigned long vm_start, vm_end; > > + struct task_struct *task; > > + const struct cred *cred; > > + struct mm_struct *mm; > > + struct inode *inode; > > + > > + bool exact_vma_exists = false; > > > > Extraneous newline there. ok, thanks ... > > + > > + /* > > + * We need two passes here: > > + * > > + * 1) Collect vmas of mapped files with mmap_sem taken > > + * 2) Release mmap_sem and instantiate entries > > + * > > + * otherwise we get lockdep complained, since filldir() > > + * routine might require mmap_sem taken in might_fault(). > > + */ > > + > > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > > + if (vma->vm_file) > > + nr_files++; > > + } > > + > > + if (nr_files) { > > + mem_size = nr_files * sizeof(*info); > > + if (mem_size <= KMALLOC_MAX_SIZE) > > + info = kmalloc(mem_size, GFP_KERNEL); > > + else > > + info = vmalloc(mem_size); > > This still sucks :( > > A KMALLOC_MAX_SIZE allocation is huuuuuuuuuuuuge! I don't know how big kmalloc_sizes.h said it could be quite a big :( > it is nowadays, but over 100 kbytes. This will frequently send page > reclaim on a berzerk rampage freeing *thousands* of pages (or > relocating pages) until it manages to generate 20 or 30 physically > contiguous free pages. > > Also, vmalloc sucks. The more often we perform vmallocs (with a mix of > differently-sized ones), the more internally fragmented the vmalloc > arena will become. With some workloads we'll run out of > sufficiently-large contiguous free spaces and things will start > failing. This doesn't happen often. Yet. But the more vmalloc() > callsites we add, the more likely and the more frequent it becomes. So > vmalloc is something we should only use as a last resort. > > The most robust implementation here is to allocate a large number of > small objects - one per vma. A list would be a suitable way of > managing them. Actually I though about slab-cache with 64 bytes per object, but it requires more code to push here, that is why I stopped. Still this doesn't justify me. So yes, bad idea. Thanks! > > But do we *really* need to do it in two passes? Avoiding the temporary > storage would involve doing more work under mmap_sem, and a put_filp() > under mmap_sem might be problematic. > In real, for the particular case where we use this /proc/pid/map_files it could be done in one pass without mmap_sem taken (since we use it when task is frozen and we know noone is poking vmas) but the problem is that people might start using it not for c/r but in various different cases when task is pretty running and I wanna give them more-less robust result in ls -l over this directory. Andrew, I'll think some more, probably I'll find a way to drop this two passes requirement. Cyrill -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html