On Thu, Jul 25, 2019 at 09:44:35PM +0900, Tetsuo Handa wrote:
> On 2019/07/25 20:32, Dave Chinner wrote:
> > You've had writeback errors. This is somewhat expected behaviour
> > for most filesystems when there are write errors - space has been
> > allocated, but whatever was to be written into that allocated
> > space failed for some reason so it remains in an uninitialised
> > state....
>
> This is bad from a security perspective. The data I found includes,
> e.g., a random source file, /var/log/secure, and an SQL database
> server's access log containing secret values...

The results of a read after a write error are undefined. In this
case, the test is doing enough IO to evict the cached pages that
failed the write, so the followup read is pulling them from disk,
not from the page cache. i.e. if the test read back immediately
after the failure, it would get whatever the write put into the
page cache but writeback failed to write to disk....

Perhaps we need the write retry mechanisms we use for metadata in
the data IO path, too, so that a single writeback failure doesn't
cause this sort of thing to occur.

> > For XFS and sequential writes, the on-disk file size is not
> > extended on an IO error, hence it should not expose stale data.
> > However, your test code is not checking for errors - that's a bug
> > in your test code - and that's why writeback errors are resulting
> > in stale data exposure. i.e. by ignoring the fsync() error, the
> > test continues writing at the next offset and the fsync() for
> > that new data write exposes the region of stale data in the file
> > where the previous data write failed by extending the on-disk
> > EOF past it....
> >
> > So in this case stale data exposure is a side effect of not
> > handling writeback errors appropriately in the application.
>
> But blaming users for not handling writeback errors is pointless
> when thinking from a security perspective.

I'm not blaming anyone. I'm just explaining how the problem was
exposed and pointing out that the responsibility for writing data
correctly falls on *both* the filesystem and userspace applications.
i.e. when the kernel fails it is userspace's responsibility to clean
up the mess to ensure the /application's data/ is correctly on
stable storage and not corrupt, missing, stale, etc.

So forget security - fsync is a data integrity operation. Not
checking that it failed is extremely poor behaviour from a data
integrity point of view. Fix the data integrity problems in the
application and the security issues that data integrity failures
/may/ expose are very effectively mitigated.
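As an illustration - this is a minimal sketch, not the original
reproducer, and the file name and loop bounds here are made up - a
write loop that treats a failed fsync() as fatal instead of
continuing at the next offset would look something like:

/* Sketch: write sequentially, and stop on the first fsync()
 * failure rather than extending the on-disk EOF past a region
 * of uninitialised blocks.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("/mnt/testfile", O_WRONLY | O_CREAT | O_TRUNC,
		      0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 'x', sizeof(buf));

	for (int i = 0; i < 1024; i++) {
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			perror("write");
			return 1;
		}
		/* Data integrity checkpoint: if this fails, the data
		 * is not safely on stable storage, so don't keep
		 * writing at the next offset.
		 */
		if (fsync(fd) != 0) {
			perror("fsync");
			return 1;
		}
	}
	close(fd);
	return 0;
}

Note that once fsync() has failed, the page cache contents for that
range can no longer be trusted, so the only safe recovery is for the
application to rewrite the data from its own copy.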
> A bad guy might be trying to
> steal data from inaccessible files.

Which won't happen if user-triggered OOM does not cause writeback
failures. i.e. the bug we need to find and fix is whatever is
causing writeback to error out under OOM conditions. Writeback is a
key component of memory reclaim, and if it fails under memory
pressure we have much bigger problems...

> > But I have to ask: what is causing the IO to fail? OOM conditions
> > should not cause writeback errors - XFS will retry memory
> > allocations until they succeed, and the block layer is supposed
> > to be resilient against memory shortages, too. Hence I'd be
> > interested to know what is actually failing here...
>
> Yeah. It is strange that this problem occurs when close to OOM.
> But there are no failure messages at all (except OOM killer
> messages and writeback error messages).

Perhaps using things like trace_kmalloc and friends to isolate the
location of memory allocation failures would help....
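For example, something like this (a sketch - it assumes tracefs is
mounted at /sys/kernel/debug/tracing, and that a failed allocation
reaches the tracepoint with a NULL ptr, both of which are worth
verifying first):

cd /sys/kernel/debug/tracing

# Only report kmalloc events that returned NULL (assumption: a
# failed allocation is traced with ptr == 0).
echo 'ptr == 0' > events/kmem/kmalloc/filter
echo 'ptr == 0' > events/kmem/kmalloc_node/filter
echo 1 > events/kmem/kmalloc/enable
echo 1 > events/kmem/kmalloc_node/enable

# Stream events while the reproducer runs; call_site identifies
# the kernel code that made the failing allocation.
cat trace_pipe

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx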