On Mon, Apr 21, 2014 at 3:44 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> I came up with something pretty similar to what you've got. I used some
> local variables for the dirty state rather than using the pte, but
> otherwise looks pretty similar. It actually boots, runs, and
> superficially looks to be doing the right thing.

.. except your version doesn't seem to have a chance of even compiling
on anything that doesn't use asm-generic/tlb.h and thus
HAVE_GENERIC_MMU_GATHER.

Now, I don't know that mine works either, but at least I tried. I'd
love to hear from somebody who has a cross-compile environment set up
for the non-generic architectures. I tried 'um', but we have at least
arm, ia64, s390 and sh that don't use the generic mmu gather logic.

I'm not entirely sure why ARM doesn't do the generic one, but I think
s390 is TLB-coherent at the ptep_get_and_clear() point, so there, just
doing the set_page_dirty() is fine (assuming it compiles - there could
be some header file ordering issue).

> I fixed free_pages_and_swap_cache() by just making a first pass through
> the array and clearing the bits.

Yeah. I have to say, I think it's a bit ugly.

I am personally starting to think that we could just make
release_pages() ignore the low bit of the "struct page" pointer in the
array it is passed in, and then free_pages_and_swap_cache() could
easily just do the "set_page_dirty()" in the loop it already does.

Now, I agree that that is certainly *also* a bit ugly, but it ends up
simplifying everything else, so it's a preferable kind of ugly to me.

So here's a suggested *incremental* patch (on top of my previous patch
that did the interface change) that does that. Does this work for
people? It *looks* sane. It compiles for me (tested on x86 that uses
generic mmu gather, and on UM that does not).

            Linus
 mm/memory.c     |  5 +----
 mm/swap.c       |  8 +++++++-
 mm/swap_state.c | 14 ++++++++++++--
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 62fdcd1995f4..174542ab2b90 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -283,11 +283,8 @@ int __tlb_remove_page(struct mmu_gather *tlb, struct page *page, bool dirty)
 
 	VM_BUG_ON(!tlb->need_flush);
 
-	/* FIXME! This needs to be batched too */
-	if (dirty)
-		set_page_dirty(page);
 	batch = tlb->active;
-	batch->pages[batch->nr++] = page;
+	batch->pages[batch->nr++] = (void *) (dirty + (unsigned long)page);
 	if (batch->nr == batch->max) {
 		if (!tlb_next_batch(tlb))
 			return 0;
diff --git a/mm/swap.c b/mm/swap.c
index 9ce43ba4498b..1a58c58c7f41 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -821,8 +821,14 @@ void release_pages(struct page **pages, int nr, int cold)
 	struct lruvec *lruvec;
 	unsigned long uninitialized_var(flags);
 
+	/*
+	 * NOTE! The low bit of the struct page pointer in
+	 * the "pages[]" array is used as a dirty bit, so
+	 * we ignore it
+	 */
 	for (i = 0; i < nr; i++) {
-		struct page *page = pages[i];
+		unsigned long pageval = (unsigned long)pages[i];
+		struct page *page = (void *)(~1ul & pageval);
 
 		if (unlikely(PageCompound(page))) {
 			if (zone) {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e76ace30d436..bb0b2d675a82 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -258,6 +258,11 @@ void free_page_and_swap_cache(struct page *page)
 /*
  * Passed an array of pages, drop them all from swapcache and then release
  * them.  They are removed from the LRU and freed if this is their last use.
+ *
+ * NOTE! The low bit of the "struct page" pointers passed in is a dirty
+ * indicator, saying that the page needs to be marked dirty before freeing.
+ *
+ * release_pages() itself ignores that bit.
  */
 void free_pages_and_swap_cache(struct page **pages, int nr)
 {
@@ -268,8 +273,13 @@ void free_pages_and_swap_cache(struct page **pages, int nr)
 		int todo = min(nr, PAGEVEC_SIZE);
 		int i;
 
-		for (i = 0; i < todo; i++)
-			free_swap_cache(pagep[i]);
+		for (i = 0; i < todo; i++) {
+			unsigned long pageval = (unsigned long) pagep[i];
+			struct page *page = (void *)(~1ul & pageval);
+			if (pageval & 1)
+				set_page_dirty(page);
+			free_swap_cache(page);
+		}
 		release_pages(pagep, todo, 0);
 		pagep += todo;
 		nr -= todo;
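[Editor's note: as an aside, the pointer-tagging trick the patch above relies on can be shown in isolation. Below is a minimal user-space sketch, not part of the patch; it only assumes that a struct page pointer is at least 2-byte aligned, so bit 0 is free to carry the dirty flag, and that an unsigned long can hold a pointer (as kernel code assumes). The names fake_page, tag_dirty and untag are invented for illustration.]

#include <assert.h>
#include <stdio.h>

/* Stand-in for struct page; only its alignment matters here. */
struct fake_page {
	unsigned long flags;
};

/*
 * Fold the dirty bit into the low bit of the pointer, mirroring
 * "batch->pages[batch->nr++] = (void *) (dirty + (unsigned long)page)".
 */
static void *tag_dirty(struct fake_page *page, int dirty)
{
	return (void *)((unsigned long)page | (dirty ? 1 : 0));
}

/*
 * Recover the real pointer and the dirty bit on the consumer side,
 * mirroring the "(void *)(~1ul & pageval)" masking in release_pages()
 * and the "pageval & 1" test in free_pages_and_swap_cache().
 */
static struct fake_page *untag(void *entry, int *dirty)
{
	unsigned long pageval = (unsigned long)entry;

	*dirty = pageval & 1;
	return (struct fake_page *)(pageval & ~1ul);
}

int main(void)
{
	struct fake_page page = { 0 };
	int dirty;

	/* The trick is only safe because bit 0 of a valid pointer is zero. */
	assert(_Alignof(struct fake_page) >= 2);

	void *entry = tag_dirty(&page, 1);
	struct fake_page *p = untag(entry, &dirty);

	printf("page=%p dirty=%d\n", (void *)p, dirty);
	return 0;
}

The point of doing it this way, as the mail argues, is that the dirty information rides along in the existing pages[] array for free: the producer (__tlb_remove_page) sets the bit when batching, one consumer (free_pages_and_swap_cache) acts on it, and the other (release_pages) only has to mask it off.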