On Wed, Feb 24, 2021 at 01:38:48PM +0100, Jan Kara wrote:
> > We allocate a page and try to read it. 29 threads pile up waiting
> > for the page lock in filemap_update_page(). The error returned by the
> > original I/O is shared between all 29 waiters as well as being returned
> > to the requesting thread. The next request for index.html will send
> > another I/O, and more waiters will pile up trying to get the page lock,
> > but at no time will more than 30 threads be waiting for the I/O to fail.
>
> Interesting idea. It certainly improves current behavior. I just wonder
> whether this isn't a partial solution to a problem and a full solution of
> it would have to go in a different direction? I mean it just seems
> wrong that each reader (let's assume they just won't overlap) has to retry
> the failed IO and wait for the HW to figure out it's not going to work.
> Shouldn't we cache the error state with the page? And I understand that we
> then also have to deal with the problem how to invalidate the error state
> when the block might eventually become readable (for stuff like temporary
> IO failures). That would need some signalling from the driver to the page
> cache, maybe in a form of some error recovery sequence counter or something
> like that. For stuff like iSCSI, multipath, or NBD it could be doable I
> believe...

That felt like a larger change than I wanted to make. I already have
a few big projects on my plate!

Also, it's not clear to me that the host can necessarily figure out when
a device has fixed an error -- certainly for the three cases you list
it can be done. I think we'd want a timer to indicate that it's worth
retrying instead of returning the error.

Anyway, that seems like a lot of data to cram into a struct page. So I
think my proposal is still worth pursuing while waiting for someone to
come up with a perfect solution.
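
To make the timer idea a bit more concrete, here is a rough userspace
sketch of caching an error together with a retry deadline. It is not
kernel code and not part of the proposal: struct page_err_state,
read_page_cached_err(), RETRY_DELAY_TICKS, and the fake_dev helper are
all hypothetical names made up for illustration, and time is passed in
as a plain tick counter rather than taken from a real clock.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-page error record; not a real kernel structure. */
struct page_err_state {
	int err;           /* 0 if nothing is cached, else a negative errno */
	uint64_t retry_at; /* tick at which a retry becomes worthwhile */
};

/* Hypothetical I/O callback: returns 0 on success or a negative errno. */
typedef int (*read_fn)(void *ctx);

/* Assumed retry delay, in arbitrary ticks. */
#define RETRY_DELAY_TICKS 100u

static int read_page_cached_err(struct page_err_state *st, uint64_t now,
				read_fn do_read, void *ctx)
{
	int ret;

	/* The cached error is still fresh: fail fast, do not touch the device. */
	if (st->err && now < st->retry_at)
		return st->err;

	/* No cached error, or the timer expired: issue the I/O (again). */
	ret = do_read(ctx);
	if (ret) {
		st->err = ret;
		st->retry_at = now + RETRY_DELAY_TICKS;
	} else {
		st->err = 0;
	}
	return ret;
}

/* Hypothetical fake device: counts calls and returns a preset result. */
struct fake_dev {
	int calls;
	int result;
};

static int fake_read(void *ctx)
{
	struct fake_dev *dev = ctx;

	dev->calls++;
	return dev->result;
}
```

With this shape, a reader arriving before retry_at gets the cached error
without waiting for the hardware, and the first reader after retry_at
pays for one new attempt. Even this minimal version needs an error code
plus a timestamp per page, which is the space cost mentioned above.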