Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> > What I'd like to do is to make the GUP code not take a ref on the zero_page
> > if, say, FOLL_DONT_PIN_ZEROPAGE is passed in, and then make the bio cleanup
> > code always ignore the zero_page.
>
> I don't think that'll work, as we can't mix different pin vs get types
> in a bio.  And that's really a good thing.

True - but I was thinking of just treating the zero_page specially and never
holding a pin or a ref on it.  It can be checked by address, e.g.:

	static inline void bio_release_page(struct bio *bio, struct page *page)
	{
		if (page == ZERO_PAGE(0))
			return;
		if (bio_flagged(bio, BIO_PAGE_PINNED))
			unpin_user_page(page);
		else if (bio_flagged(bio, BIO_PAGE_REFFED))
			put_page(page);
	}

I'm slightly concerned about the possibility of overflowing the refcount.  The
problem is that it only takes about 2 million pins to do that (because the
zero_page isn't a large folio) - which is within reach of userspace.  Create an
8GiB anon mmap and do a bunch of async DIO writes from it.  You won't hit
ENOMEM because it will stick ~2 million pointers to zero_page into the page
tables.

> > Something that I noticed is that the dio code seems to wangle to page bits on
> > the target pages for a DIO-read, which seems odd, but I'm not sure I fully
> > understand the code yet.
>
> I don't understand this sentence.

I was looking at this:

	static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
	{
		...
		if (dio->is_async && dio_op == REQ_OP_READ && dio->should_dirty)
			bio_set_pages_dirty(bio);
		...
	}

but looking again, the lock is taken briefly and the dirty bit is set - which
is reasonable.  However, should we be doing it before starting the I/O?

David
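
For reference, the arithmetic behind the "about 2 million pins" figure and the
8GiB mapping can be sketched as below.  This is only an illustration under
stated assumptions: 4KiB base pages, a signed 32-bit page refcount, and a
per-pin increment of 1024 for a small folio (intended to mirror the kernel's
GUP_PIN_COUNTING_BIAS).  The SKETCH_* names and both helpers are hypothetical
and exist only for this example.

```c
#include <stdint.h>

/* Assumption: 4KiB base pages. */
#define SKETCH_PAGE_SIZE	4096ULL
/* Assumption: each pin on a small folio adds 1024 to the refcount. */
#define SKETCH_PIN_BIAS		1024ULL
/* Assumption: the refcount is a signed 32-bit value, so it wraps at 2^31. */
#define SKETCH_REFCOUNT_LIMIT	(1ULL << 31)

/* Number of pins on a single small page needed to reach the limit. */
static inline uint64_t sketch_pins_to_overflow(void)
{
	return SKETCH_REFCOUNT_LIMIT / SKETCH_PIN_BIAS;
}

/* Number of page table entries an anon mapping of 'bytes' can hold. */
static inline uint64_t sketch_pages_in_mapping(uint64_t bytes)
{
	return bytes / SKETCH_PAGE_SIZE;
}
```

Under those assumptions both helpers give 2^21 for the case in question: an
8GiB mapping has as many zero_page pointers as it takes pins to hit the limit.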