On Fri 13-09-24 09:39:57, Zhaoyang Huang wrote:
> On Thu, Sep 12, 2024 at 6:16 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Thu 12-09-24 17:10:44, Zhaoyang Huang wrote:
> > > On Thu, Sep 12, 2024 at 4:41 PM Jan Kara <jack@xxxxxxx> wrote:
> > > >
> > > > On Wed 04-09-24 14:56:29, Zhaoyang Huang wrote:
> > > > > On Wed, Sep 4, 2024 at 10:44 AM Theodore Ts'o <tytso@xxxxxxx> wrote:
> > > > > > On Wed, Sep 04, 2024 at 08:49:10AM +0800, Zhaoyang Huang wrote:
> > > > > > >
> > > > > > > > After all, using GFP_MOVEABLE memory seems to mean that the buffer
> > > > > > > > cache might get thrashed a lot by having a lot of cached disk buffers
> > > > > > > > getting ejected from memory to try to make room for some contiguous
> > > > > > > > frame buffer memory, which means extra I/O overhead. So what's the
> > > > > > > > upside of using GFP_MOVEABLE for the buffer cache?
> > > > > > >
> > > > > > > To my understanding, no: using GFP_MOVEABLE memory doesn't introduce
> > > > > > > extra IO, because the pages are migrated to free pages rather than
> > > > > > > ejected outright when they sit in the target memory area. In terms of
> > > > > > > reclaim, all migrate types of page blocks are treated the same.
> > > > > >
> > > > > > Where is that being done? I don't see any evidence of this kind of
> > > > > > migration in fs/buffer.c.
> > > > >
> > > > > The journaled pages which carry jh->bh are treated as file pages
> > > > > during isolation of a range of PFNs in the call stack below [1]. The
> > > > > bh is migrated via the mapping's aops->migrate_folio, which does what
> > > > > you describe below: copy the content and reattach the bh to a new
> > > > > page. For a journal-enabled ext4 partition, the inode is a blockdev
> > > > > inode which uses buffer_migrate_folio_norefs as its migrate_folio [2].
> > > > >
> > > > > [1]
> > > > > cma_alloc/alloc_contig_range
> > > > >   __alloc_contig_migrate_range
> > > > >     migrate_pages
> > > > >       migrate_folio_move
> > > > >         move_to_new_folio
> > > > >           mapping->aops->migrate_folio(buffer_migrate_folio_norefs->__buffer_migrate_folio)
> > > > >
> > > > > [2]
> > > > > static int __buffer_migrate_folio(struct address_space *mapping,
> > > > >                 struct folio *dst, struct folio *src, enum migrate_mode mode,
> > > > >                 bool check_refs)
> > > > > {
> > > > > ...
> > > > >         if (check_refs) {
> > > > >                 bool busy;
> > > > >                 bool invalidated = false;
> > > > >
> > > > > recheck_buffers:
> > > > >                 busy = false;
> > > > >                 spin_lock(&mapping->i_private_lock);
> > > > >                 bh = head;
> > > > >                 do {
> > > > >                         if (atomic_read(&bh->b_count)) {
> > > > >                                 // my case fails here: the bh is referenced by a journal head
> > > > >                                 busy = true;
> > > > >                                 break;
> > > > >                         }
> > > > >                         bh = bh->b_this_page;
> > > > >                 } while (bh != head);
> > > >
> > > > Correct. Currently, pages with journal heads attached cannot be
> > > > migrated, mostly out of pure caution because the generic code isn't
> > > > sure what's happening with them. As I wrote in [1], we could make
> > > > pages whose jhs sit only on a checkpoint list migratable, since for
> > > > those the buffer lock is enough to stop anybody from touching the bh
> > > > data. Bhs which are part of a running / committing transaction are not
> > > > realistically migratable, but those states are more short-lived, so it
> > > > shouldn't be a big problem.
> > >
> > > From observing our test case, the jh remains attached for a long time
> > > whenever journal->j_free stays bigger than j_max_transaction_buffers,
> > > and that makes cma_alloc fail.
> > > So you think this is rare or abnormal?
> > >
> > > [6] j_free & j_max_transaction_buffers
> > > crash_arm64_v8.0.4++> struct journal_t.j_free,j_max_transaction_buffers 0xffffff80e70f3000 -x
> > >   j_free = 0x3f1,
> > >   j_max_transaction_buffers = 0x100,
> >
> > So a jh can stay attached to the bh for a very long time (basically only
> > memory pressure will evict it), and this is what blocks migration. But
> > what I meant is that, in fact, most of the time we can migrate a bh with
> > a jh attached just fine. There are only relatively short moments (max 5s)
> > where a buffer (jh) is part of a running or committing transaction, and
> > then we cannot really migrate.
>
> Please correct me if I am wrong. According to __buffer_migrate_folio, the
> bh cannot be migrated as long as it has a jh attached, which can remain
> until the next checkpoint transaction is launched. In my case, jbd2's log
> space is big enough (j_free = 0x3f1 > j_max_transaction_buffers = 0x100)
> that such a launch never happens.

Currently yes. My proposal was to make the bh migratable even with a jh
attached (which obviously needs some tweaks to __buffer_migrate_folio()).

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
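For illustration, a minimal sketch of the kind of tweak discussed above,
under two assumptions: (a) the buffers are already locked by
buffer_migrate_lock_buffers() when the busy check in
__buffer_migrate_folio() runs, and (b) a jh that sits only on a checkpoint
list holds exactly one extra b_count reference on its bh. The helper name
bh_ref_is_checkpoint_jh_only() is made up for this sketch, and reading
journal_head fields directly from fs/buffer.c glosses over the jh-lifetime
locking and layering questions a real patch would have to resolve.

/* Sketch only -- not an actual kernel patch. */
#include <linux/buffer_head.h>
#include <linux/jbd2.h>

/*
 * Return true if the elevated b_count on @bh comes solely from a journal
 * head that sits on a checkpoint list and belongs to no running or
 * committing transaction.  Per the discussion above, such a bh is inert
 * as long as the buffer lock is held, so migration could proceed.
 */
static bool bh_ref_is_checkpoint_jh_only(struct buffer_head *bh)
{
	struct journal_head *jh;

	/* Exactly one extra reference, and a jh must be attached. */
	if (atomic_read(&bh->b_count) != 1 || !buffer_jbd(bh))
		return false;

	jh = bh2jh(bh);
	/*
	 * A jh tied to a running or committing transaction can be touched
	 * by jbd2 at any moment; only the pure-checkpoint state is safe
	 * under the buffer lock alone.
	 */
	return jh->b_transaction == NULL &&
	       jh->b_next_transaction == NULL &&
	       jh->b_cp_transaction != NULL;
}

/*
 * The busy loop in __buffer_migrate_folio() would then become:
 *
 *	do {
 *		if (atomic_read(&bh->b_count) &&
 *		    !bh_ref_is_checkpoint_jh_only(bh)) {
 *			busy = true;
 *			break;
 *		}
 *		bh = bh->b_this_page;
 *	} while (bh != head);
 */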