RE: [BUG] Possible silent data corruption in filesystems/page cache

Hi Ted,
Thanks for your explanation; it convinced me.
Regards,
Mariusz.

-----Original Message-----
From: Theodore Ts'o [mailto:tytso@xxxxxxx] 
Sent: Monday, June 6, 2016 15:36
To: Barczak, Mariusz <mariusz.barczak@xxxxxxxxx>
Cc: Andreas Dilger <adilger@xxxxxxxxx>; Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>; Jens Axboe <axboe@xxxxxxxxx>; Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>; linux-mm@xxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Wysoczanski, Michal <michal.wysoczanski@xxxxxxxxx>; Baldyga, Robert <robert.baldyga@xxxxxxxxx>; Roman, Agnieszka <agnieszka.roman@xxxxxxxxx>
Subject: Re: [BUG] Possible silent data corruption in filesystems/page cache

On Mon, Jun 06, 2016 at 07:29:42AM +0000, Barczak, Mariusz wrote:
> Hi, Let me elaborate problem in detail. 
> 
> For buffered IO data are copied into memory pages. For this case, the 
> write IO is not submitted (generally). In the background opportunistic 
> cleaning of dirty pages takes place and IO is generated to the device. 
> An IO error is observed on this path and application is not informed 
> about this. Summarizing flushing of dirty page fails.
> And probably, this page is dropped but in fact it should not be.
> So if above situation happens between application write and sync then 
> no error is reported. In addition after some time, when the 
> application reads the same LBA on which IO error occurred, old data 
> content is fetched.

The application will be informed about it if it asks --- if it calls fsync(), the I/O will be forced, and if there is an error it will be returned to the user.  But if the user has not asked, there is no way for user space to know that there is a problem --- for that matter, the application may have exited already by the time we do the buffered writeback, so there may be nobody to inform.

If the error happens between the write and sync, then the address space mapping's AS_EIO bit will be set.  (See filemap_check_errors() and do a git grep on AS_EIO.)  So the user will be informed when they call fsync(2).

The problem with simply not dropping the page is that if we do that, the page will never be cleaned, and in the worst case, this can lead to memory exhaustion.  Consider the case where a user is writing huge numbers of pages (e.g., dd if=/dev/zero of=/dev/device-that-will-go-away): if the pages are never dropped, then the memory will never be reclaimed.

In other words, the current behavior was carefully considered, and deliberately chosen as the best design.

The fact that you need to call fsync(2), and then check the error returns of both fsync(2) *and* close(2) if you want to know for sure whether or not there was an I/O error, is a known, documented part of Unix/Linux and has been true for literally decades.  (Emacs learned and fixed this back in the late 1980s, to avoid losing user data if the user goes over quota on their Andrew File System on a BSD 4.3 system, for example.)  If you're using some editor that comes with some desktop package or some whizzy IDE, all bets are off, of course.  But if you're using such tools, you probably care about eye candy way more than you care about your data; certainly the authors of such programs seem to have this tendency, anyway.  :-)
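
To make the point above concrete, here is a minimal user-space sketch of the "check fsync(2) *and* close(2)" pattern. The helper name write_durably() and its return convention are hypothetical, chosen only for illustration; they do not come from the kernel tree or any library. It assumes a POSIX system and only uses open(2), write(2), fsync(2) and close(2).

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Writes len bytes from buf to path. Returns 0 only if write(2),
 * fsync(2) and close(2) all succeeded; otherwise returns -1 with
 * errno describing the first failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            int saved = errno;
            close(fd);
            errno = saved;          /* report the write error, not close's */
            return -1;
        }
        p += n;                     /* handle short writes */
        len -= (size_t)n;
    }

    /*
     * A successful write(2) only means the data reached the page cache.
     * fsync(2) forces writeback and is where a writeback error is
     * reported to the caller.
     */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close(2) can fail too (for example on network filesystems or
     * when over quota), so its return value is checked as well. */
    if (close(fd) < 0)
        return -1;

    return 0;
}
```

A caller would treat any -1 return as "the data may not be on stable storage" and keep its own copy of the data (or tell the user) rather than assuming the earlier write(2) success was enough.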

Cheers,

						- Ted
