mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, all.

Then porting patches from mainstream I've found some strange code:

 > commit a2b809672ee6fcb4d5756ea815725b3dbaea654e
 > Author: Christoph Hellwig <hch@xxxxxx>
 > Date:   Tue Nov 1 07:40:09 2016 -0600
 >
 >     block: replace REQ_NOIDLE with REQ_IDLE
 >
 >     Noidle should be the default for writes as seen by all the compounds
 >     definitions in fs.h using it.  In fact only direct I/O really should
 >     be using NODILE, so turn the whole flag around to get the defaults
 >     right, which will make our life much easier especially onces the
 >     WRITE_* defines go away.
 >
 >     This assumes all the existing "raw" users of REQ_SYNC for writes
 >     want noidle behavior, which seems to be spot on from a quick audit.
 >
 >     Signed-off-by: Christoph Hellwig <hch@xxxxxx>
 >     Signed-off-by: Jens Axboe <axboe@xxxxxx>
 >
 > diff --git a/include/linux/fs.h b/include/linux/fs.h
 > index ccedccb28ec8..46a74209917f 100644
 > --- a/include/linux/fs.h
 > +++ b/include/linux/fs.h
 > @@ -197,11 +197,11 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, 
loff_t offset,
 >  #define WRITE                  REQ_OP_WRITE
 >
 >  #define READ_SYNC              0
 > -#define WRITE_SYNC             (REQ_SYNC | REQ_NOIDLE)
 > -#define WRITE_ODIRECT          REQ_SYNC
 > -#define WRITE_FLUSH            (REQ_NOIDLE | REQ_PREFLUSH)
 > -#define WRITE_FUA              (REQ_NOIDLE | REQ_FUA)
 > -#define WRITE_FLUSH_FUA                (REQ_NOIDLE | REQ_PREFLUSH | 
REQ_FUA)
 > +#define WRITE_SYNC             REQ_SYNC
 > +#define WRITE_ODIRECT          (REQ_SYNC | REQ_IDLE)
 > +#define WRITE_FLUSH            REQ_PREFLUSH
 > +#define WRITE_FUA              REQ_FUA
 > +#define WRITE_FLUSH_FUA                (REQ_PREFLUSH | REQ_FUA)
 >
 >  /*
 >   * Attribute flags.  These should be or-ed together to figure out what

The above commit changes the meaning of the REQ_SYNC flag, before the 
patch it was equal to WRITE_ODIRECT and after the patch it is equal to 
WRITE_SYNC. And thus I think it became treated differently (I see only 
one place left in wbt_should_throttle.).

But in __swap_writepage() both before and after the mentioned patch we 
still pass a single REQ_SYNC without any REQ_IDLE/REQ_UNIDLE:

 > [snorch@snorch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e^ mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);
 > [snorch@snorch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);

It looks like we've changed the way how we handle swap page writes from 
"odirect" way to "regular" sync write way, these can be wrong. This may 
also affect deprecated cfq io-scheduler on older kernels.

Thanks in advance for any advice on what to do with these, may be I miss 
something.

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux