On Thu 09-03-17 13:15:07, Jan Kara wrote: > On Thu 09-03-17 08:58:45, Vlastimil Babka wrote: > > On 03/09/2017 12:55 AM, akpm@xxxxxxxxxxxxxxxxxxxx wrote: > > > > > > The patch titled > > > Subject: compaction: add def_blk_aops migrate function for memory compaction > > > has been added to the -mm tree. Its filename is > > > compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch > > > > > > This patch should soon appear at > > > http://ozlabs.org/~akpm/mmots/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch > > > and later at > > > http://ozlabs.org/~akpm/mmotm/broken-out/compaction-add-def_blk_aops-migrate-function-for-memory-compaction.patch > > > > > > Before you just go and hit "reply", please: > > > a) Consider who else should be cc'ed > > > b) Prefer to cc a suitable mailing list as well > > > c) Ideally: find the original patch on the mailing list and do a > > > reply-to-all to that, adding suitable additional cc's > > > > > > *** Remember to use Documentation/SubmitChecklist when testing your code *** > > > > > > The -mm tree is included into linux-next and is updated > > > there every 3-4 working days > > > > > > ------------------------------------------------------ > > > From: zhouxianrong <zhouxianrong@xxxxxxxxxx> > > > Subject: compaction: add def_blk_aops migrate function for memory compaction > > > > That's not really a mm/compaction patch, but a block layer/migration patch. I > > don't know internals of those so well, so I added some CC's. > > Thanks! > > > > The reason for doing this is based on two factors. > > > > > > 1. larg file read/write operations with order 0 can fragmentize > > > memory rapidly. > > > > > > 2. when a special filesystem does not supply migratepage callback, > > > kernel would fallback to default function fallback_migrate_page. > > > but fallback_migrate_page could not migrate diry page nicely; > > > specially kcompactd with MIGRATE_SYNC_LIGHT could not migrate > > > diry pages due to this until clear_page_dirty_for_io in some > > > procedure. i think it is not suitable here in this scenario. > > > for dirty pages we should migrate it rather than skip or writeout > > > it in kcomapctd with MIGRATE_SYNC_LIGHT. i think this problem is > > > for all filesystem without migratepage not only for block device fs. > > > > > > So for compaction under large file writing supply migratepage for > > > def_blk_aops. > > > > Is this really safe to do? buffer_migrate_page() has some assumptions listed in > > its comment (and maybe more that are not listed). Do we know it's safe to use it > > for all def_blk_aops users? > > So this is definitely *not* OK for ext4 which is using def_blk_aops. Ext4 > will create temporary buffer heads that alias buffer heads attached to the > block device page when committing a transaction (see fs/jbd2/journal.c: > jbd2_journal_write_metadata_buffer()) and these will point to the original > page. > > Now I'm not saying ext4 / jbd2 cannot be modified so that it somehow makes > it clear to the page migration code it should stay away from the page. > However it demostrates that this patch is not safe as is... Which would suggest to enahnce the documentation of buffer_migrate_page. -- Michal Hocko SUSE Labs