On Sun, May 31, 2015 at 10:26:01AM -0400, Theodore Ts'o wrote: > On Fri, May 29, 2015 at 11:08:34AM -0700, Jaegeuk Kim wrote: > > Not sure why __GFP_WAIT is defined here. > > Even GFP_NOFS has __GFP_WAIT. > > > > #define GFP_NOFS (__GFP_WAIT | __GFP_IO) > > > > IMO, __GFP_NOFAIL should be used here? > > Otherwise, we seem to handle ENOMEM instead. > > You're right, thanks for pointing that out. I've since changed the > patch so that we just return ENOMEM, and then made sure that we could > correctly handle ENOMEM. > > > On a different front, I've been thinking more about your proposal to > swap alloc_pages() and mempool_alloc(), since you were apparently > seeing OOM killers with a sufficiently aggressive fio workload on a > memory constrained device. > > I took a closer look at how mempool_alloc() works, and in fact it will > try alloc_pages() first; but with GFP flags which causes it to not try > very hard. (__GFP_NOMEMALLOC, __GFP_NORETRY, __GFP_NOWARN, > !__GFP_WAIT, !__GFP_IO). > > If this fails, it tries to use the memory pool, and then assuming the > original request includes __GFP_WAIT, it will retry the alloc_pages() > using the original gfp mask, if this *still* fails, it will wait for a > page to be released to the mempool. As a result, mempool_alloc() will > never fail when called with __GFP_WAIT, which means my original > concerns over mempool_alloc() failing were misplaced. Thank you for pointing out this. Indeed, mempool_alloc never fails if __GFP_WAIT is set. > > This has lead me to the following thoughts/conclusions: > > 1) We can just drop the alloc_page() call from alloc_bounce_page(), > since mempool_alloc() will call alloc_page() first. This also allows > us to remove all of the complexity relating to the > EXT4_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL flag. Agreed. So, once I tested a couple of flags given to the origial alloc_page(), it turns out that my OOM killer issue was occurred due to __GFP_WAIT. When I tested the original allo_page() with GFP_NOWAIT, the issue was gone. > > 2) Since mempool_alloc() will end up calling alloc_pages() --- and in > fact will try to alloc_pages() *twice* --- once with a "don't try > hard", and then a second time the way we would have alwyas called it > --- in practice this won't reduce the number of calls to allocate and > free pages assuming that there is enough memory in the system. Right, the call count of alloc/free_pages was not the actual issue here. > > 3) I don't think increasing num_prealloc_crypto_pages() to BIO_MAX is > the best idea; (a) it sequesters a large amount of memory, and (b) > even 256 pages is guaranteed to be enough. On a Samsung 840 EVO (for > example), you can have up to 32767k in a single request, and up to 31 > requests in its NCQ. So in the worst case, you can have close to a > gigabyte of outstanding writes, and thus with a sufficiently evil set > of fio options, we could end up needing that many bounce pages. And > remember, the mempool is a shared resource; if we have multiple > devices with encrypted file systems, the memory pressure could be even > greater. Agreed. I thought that mempool could be used to reuse the pages in the pool as many as possible. But, I misunderstood the usage of it. > > 4) I need to run the experiment, but what might make sense is to just > call mempool_alloc() with GFP_NOWAIT. This will allow us to use > memory if it is available, but if not, we will simply retry the page > for writeback. The mempool is there simply to provide a bit of extra > spare capacity so that we send out reasonably sized I/O's when under > very high memory pressure. We can make the size of the mempool a > tunable which is dynamically adjustable using a sysfs file, and we can > see what seems to be the best tradeoff between having a fixed memory > allocation and performance under high memory pressure and heavy write > loads. It seems that the number of preallocated pages is not a critical issue to me as of now. But, definitely, it should be tunable through sysfs for various environments. > > What do you think? Does that make sense? If so, you might want to > try a similar experiment; try removing the alloc_page() fallback > altogether, try calling mempool_alloc() with GFP_NOWAIT, and then try > different settings for num_prealloc_crypto_pages. Obviously, as your suggestion, it'd be better to call mempool_alloc(GFP_NOWAIT) as the final approach [1]. I've also seen that this can resolve all my concerns. Regarding to the num_prealloc_crypto_pages, let me take a time to investigate further in terms of the performance under different workloads. Thank you so much, [1] --- >From 16655dbe229610d679fd449a22b6f4565edfb89d Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim <jaegeuk@xxxxxxxxxx> Date: Mon, 1 Jun 2015 12:39:30 -0700 Subject: [PATCH] f2fs crypto: remove alloc_page for bounce_page We don't need to call alloc_page() prior to mempool_alloc(), since the mempool_alloc() calls alloc_page() internally. And, if __GFP_WAIT is set, it never fails on page allocation, so let's give GFP_NOWAIT and handle ENOMEM by writepage(). Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx> --- fs/f2fs/crypto.c | 33 ++++++++++++--------------------- fs/f2fs/f2fs_crypto.h | 3 +-- 2 files changed, 13 insertions(+), 23 deletions(-) diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c index 2c7819a..be6af18 100644 --- a/fs/f2fs/crypto.c +++ b/fs/f2fs/crypto.c @@ -83,10 +83,7 @@ void f2fs_release_crypto_ctx(struct f2fs_crypto_ctx *ctx) unsigned long flags; if (ctx->flags & F2FS_WRITE_PATH_FL && ctx->w.bounce_page) { - if (ctx->flags & F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL) - __free_page(ctx->w.bounce_page); - else - mempool_free(ctx->w.bounce_page, f2fs_bounce_page_pool); + mempool_free(ctx->w.bounce_page, f2fs_bounce_page_pool); ctx->w.bounce_page = NULL; } ctx->w.control_page = NULL; @@ -408,34 +405,28 @@ struct page *f2fs_encrypt(struct inode *inode, return (struct page *)ctx; /* The encryption operation will require a bounce page. */ - ciphertext_page = alloc_page(GFP_NOFS); + ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOWAIT); if (!ciphertext_page) { - /* - * This is a potential bottleneck, but at least we'll have - * forward progress. - */ - ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, - GFP_NOFS); - if (WARN_ON_ONCE(!ciphertext_page)) - ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, - GFP_NOFS | __GFP_WAIT); - ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL; - } else { - ctx->flags |= F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL; + err = -ENOMEM; + goto err_out; } + ctx->flags |= F2FS_WRITE_PATH_FL; ctx->w.bounce_page = ciphertext_page; ctx->w.control_page = plaintext_page; err = f2fs_page_crypto(ctx, inode, F2FS_ENCRYPT, plaintext_page->index, plaintext_page, ciphertext_page); - if (err) { - f2fs_release_crypto_ctx(ctx); - return ERR_PTR(err); - } + if (err) + goto err_out; + SetPagePrivate(ciphertext_page); set_page_private(ciphertext_page, (unsigned long)ctx); lock_page(ciphertext_page); return ciphertext_page; + +err_out: + f2fs_release_crypto_ctx(ctx); + return ERR_PTR(err); } /** diff --git a/fs/f2fs/f2fs_crypto.h b/fs/f2fs/f2fs_crypto.h index be59d91..c2c1c2b 100644 --- a/fs/f2fs/f2fs_crypto.h +++ b/fs/f2fs/f2fs_crypto.h @@ -84,8 +84,7 @@ struct f2fs_crypt_info { }; #define F2FS_CTX_REQUIRES_FREE_ENCRYPT_FL 0x00000001 -#define F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL 0x00000002 -#define F2FS_WRITE_PATH_FL 0x00000004 +#define F2FS_WRITE_PATH_FL 0x00000002 struct f2fs_crypto_ctx { union { -- 2.1.1 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html