On Wed, Feb 22, 2023 at 05:04:32PM -0800, Yosry Ahmed wrote:
> On Wed, Feb 22, 2023 at 4:53 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> >
> > On Wed, Feb 08, 2023 at 12:33:37PM -0800, Yosry Ahmed wrote:
> > > On Wed, Feb 8, 2023 at 9:45 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Feb 08, 2023 at 08:01:01AM -0800, Luis Chamberlain wrote:
> > > > > On Tue, Feb 07, 2023 at 04:01:51AM +0000, Matthew Wilcox wrote:
> > > > > > On Mon, Feb 06, 2023 at 06:52:59PM -0800, Luis Chamberlain wrote:
> > > > > > > @@ -1334,11 +1336,15 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > > > > > >  	struct shmem_inode_info *info;
> > > > > > >  	struct address_space *mapping = folio->mapping;
> > > > > > >  	struct inode *inode = mapping->host;
> > > > > > > +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > > > > > >  	swp_entry_t swap;
> > > > > > >  	pgoff_t index;
> > > > > > >
> > > > > > >  	BUG_ON(!folio_test_locked(folio));
> > > > > > >
> > > > > > > +	if (wbc->for_reclaim && unlikely(sbinfo->noswap))
> > > > > > > +		return AOP_WRITEPAGE_ACTIVATE;
> > > > > > >
> > > > > > Not sure this is the best way to handle this. We'll still incur the
> > > > > > overhead of tracking shmem pages on the LRU, only to fail to write them
> > > > > > out when the VM thinks we should get rid of them. We'd be better off
> > > > > > not putting them on the LRU in the first place.
> > > > > >
> > > > > Ah, makes sense, so in effect then if we do that then on reclaim
> > > > > we should be able to even WARN_ON(sbinfo->noswap) assuming we did
> > > > > everything right.
> > > > >
> > > > > Hrm, we have invalidate_mapping_pages(mapping, 0, -1) but that seems a bit
> > > > > too late how about d_mark_dontcache() on shmem_get_inode() instead?
> > > >
> > > > I was thinking that the two calls to folio_add_lru() in mm/shmem.c
> > > > should be conditional on sbinfo->noswap.
> > > >
> > >
> > > Wouldn't this cause the folio to not show up in any lru lists, even
> > > the unevictable one, which may be a strange discrepancy?
> > >
> > > Perhaps we can do something like shmem_lock(), which calls
> > > mapping_set_unevictable(), which will make folio_evictable() return
> > > false and the LRUs code will take care of the rest?
> >
> > If shmem_lock() should take care of that is that because writepage()
> > should not happen or because we have that info->flags & VM_LOCKED stop
> > gap on writepage()? If the former, then shouldn't we WARN_ON_ONCE()
> > if writepage() is called on info->flags & VM_LOCKED?
> >
> > While I see the value in mapping_set_unevictable() I am not sure I see
> > the point in using shmem_lock(). I don't see why we should constrain
> > the noswap tmpfs option to RLIMIT_MEMLOCK.
> >
> > Please correct me if I'm wrong but the limit seems to be designed for
> > files / IPC / unprivileged perf limits. On the contrary, we'd bump the
> > count for each new inode. Using shmem_lock() would also complicate the
> > inode allocation on shmem as we'd have to unwind on failure from the
> > user_shm_lock(). It would also beg the question of when to capture a
> > ucount for an inode, should we just share one for the superblock at
> > shmem_fill_super() or do we really need to capture it at every single
> > inode creation? In theory we could end up with different limits.
> >
> > So why not just use mapping_set_unevictable() alone for this use case?
>
> Sorry if I wasn't clear, I did NOT mean that we should use
> shmem_lock(), I meant that we do something similar to what
> shmem_lock() does and use mapping_set_unevictable() or similar.

Ah OK! Sure, yeah. I reviewed the shmem_lock() usage and I don't think
it and its rlimit baggage make sense here, so the only thing to do is
just call mapping_set_unevictable().
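
To make the idea concrete, here is a rough userspace toy model of it,
not kernel code and not the v2 patch. Every toy_* name is made up for
illustration; they only mimic, as I understand them, the AS_UNEVICTABLE
mapping flag, mapping_set_unevictable() and folio_evictable(). Where the
real call would sit at inode setup time is an assumption too.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; these are NOT the kernel's structures. */
struct toy_mapping {
	bool unevictable;	/* models the AS_UNEVICTABLE mapping flag */
};

struct toy_sb_info {
	bool noswap;		/* models sbinfo->noswap from the patch */
};

enum toy_lru { TOY_LRU_EVICTABLE, TOY_LRU_UNEVICTABLE };

/* Models mapping_set_unevictable(): just sets the flag. */
static void toy_mapping_set_unevictable(struct toy_mapping *m)
{
	m->unevictable = true;
}

/*
 * Hypothetical inode-setup step: mark the mapping unevictable when the
 * superblock was mounted with noswap. No rlimit accounting is involved,
 * which is the difference from going through shmem_lock().
 */
static void toy_init_mapping(struct toy_mapping *m,
			     const struct toy_sb_info *sbinfo)
{
	assert(m && sbinfo);
	m->unevictable = false;
	if (sbinfo->noswap)
		toy_mapping_set_unevictable(m);
}

/* Models folio_evictable(): false once the mapping is unevictable. */
static bool toy_folio_evictable(const struct toy_mapping *m)
{
	return !m->unevictable;
}

/* Models the LRU code picking a list based on evictability. */
static enum toy_lru toy_lru_for(const struct toy_mapping *m)
{
	return toy_folio_evictable(m) ? TOY_LRU_EVICTABLE
				      : TOY_LRU_UNEVICTABLE;
}
```

In this model the folios of a noswap mount still show up on an LRU list
(the unevictable one), so reclaim should never get as far as asking
writepage() to push them out, and nothing is charged against
RLIMIT_MEMLOCK.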

> I think we just need to make sure that using
> mapping_set_unevictable() does not imply that shmem_lock() was used
> (i.e. no code assumes that if the shmem mapping is unevictable then
> shmem_lock() was used).

The *other* stuff that shmem_lock() does is rlimit related
(RLIMIT_MEMLOCK). I can't think offhand why we'd confuse the two use
cases at the moment, but I'll give it another good look with this in
mind.

I'll test what I have and post a v2 with the feedback received.

Thanks,

  Luis