On Apr 16, 2014, at 9:15 AM, Phillip Susi <psusi@xxxxxxxxxx> wrote:
> On 4/16/2014 10:01 AM, Matthew Wilcox wrote:
> > On Tue, Apr 15, 2014 at 10:01:27PM -0400, Phillip Susi wrote:
> >> A lot of disk writes, especially when they are small individual
> >> files being written by several different processes, are hidden
> >> behind the flush thread.  Is there no way to properly track the
> >> process actually responsible for the IO, even when it is the
> >> flush thread that initiates the writeout?
> >
> > Correct.
>
> Wow.  If I understand things correctly, this also means that if
> process A dirties a ton of cache pages, then process B tries to write
> a relatively small amount, it can end up blocking in the synchronous
> flush path, and so it will appear that process B and flush are doing
> all of the writes, and not process A.
>
> That seems like a severe defect.  How can such a defect be tolerated
> in this day and age?  Why does the io accounting not track how many
> pages the process dirties rather than how many it actually initiates
> the writeout for?

For Lustre (which has the added difficulty that the thread doing the
actual writeout is on a remote server) we track the last process that
dirtied the inode in the Lustre-private part of the inode itself, and
account the writes against that process.

By default, we store the process name + UID in a 32-byte string in the
inode, but this can also be changed to store a cluster-unique job
identifier (jobid).  For us, recording the PID isn't very useful, since
the PID will be different on each node in the cluster, but since it is
just a string it would be possible to store any identifier that fits.

Cheers, Andreas
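
For illustration only, here is a toy Python model of the scheme described
above: tag the inode with the last dirtier at dirty time, then charge the
write to that tag at writeout time, no matter which thread does the
flush. This is not Lustre code. The names Inode, Accounting, make_tag and
the "name.uid" tag layout are assumptions made for the sketch; only the
32-byte string size and the "last process that dirtied the inode" rule
come from the mail.

```python
from collections import defaultdict
from dataclasses import dataclass

TAG_LEN = 32  # size of the per-inode tag field mentioned in the mail


def make_tag(comm: str, uid: int) -> bytes:
    """Build a tag from process name and UID, truncated to TAG_LEN bytes.

    The "name.uid" layout is an assumption for this sketch; any string
    that fits (for example a cluster-wide jobid) could be stored instead.
    """
    return f"{comm}.{uid}".encode()[:TAG_LEN]


@dataclass
class Inode:
    dirty_pages: int = 0
    last_dirtier: bytes = b""  # stands in for the filesystem-private field


class Accounting:
    def __init__(self) -> None:
        # pages written out, keyed by the tag of the last dirtier
        self.pages_written = defaultdict(int)

    def dirty(self, inode: Inode, comm: str, uid: int, pages: int) -> None:
        """A process dirties pages: remember who it was on the inode."""
        inode.dirty_pages += pages
        inode.last_dirtier = make_tag(comm, uid)

    def writeout(self, inode: Inode) -> None:
        """Called by whichever thread flushes; charge the last dirtier."""
        self.pages_written[inode.last_dirtier] += inode.dirty_pages
        inode.dirty_pages = 0


acct = Accounting()
ino = Inode()
acct.dirty(ino, "procA", 1000, 500)  # process A dirties 500 pages
acct.writeout(ino)                   # flush thread (or process B) writes them
```

After this runs, the 500 pages are recorded under process A's tag rather
than under the flusher. Note that only the last dirtier is remembered, so
if two processes dirty the same inode before writeout, all of its dirty
pages are charged to the second one.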