Re: SSD + RAID

> A change has been written to the WAL and fsync()'d, so Pg knows it's hit
> disk. It can now safely apply the change to the tables themselves, and
> does so, calling fsync() to tell the drive containing the tables to
> commit those changes to disk.
>
> The drive lies, returning success for the fsync when it's just cached
> the data in volatile memory. Pg carries on, shortly deleting the WAL
> archive the changes were recorded in or recycling it and overwriting it
> with new change data. The SSD is still merrily buffering data to write
> cache, and hasn't got around to writing your particular change yet.

All right. I believe you. With the current Pg implementation, I need to turn off the disk cache.

But.... I would like to ask some theoretical questions. It is just an idea from me, and probably I'm wrong.
Here is a scenario:

#1. user wants to change something, resulting in a write_to_disk(data) call
#2. data is written into the WAL and fsync()-ed
#3. at this point the write_to_disk(data) call CAN RETURN, the user can continue his work (the WAL is already written, changes cannot be lost)
#4. Pg can continue writing the data to the disk, and fsync() it.
#5. Then WAL archive data can be deleted.
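
To make the steps concrete, here is a minimal sketch of #1 to #5 in Python. It is not how Pg is implemented: write_to_disk(), the record layout and the file descriptors are hypothetical names I made up for illustration, and it assumes a Unix-like OS (os.pwrite) and a drive that does not lie about fsync().

```python
import os

def write_to_disk(wal_fd, data_fd, offset, data):
    """Hypothetical illustration of steps #1 to #5; not PostgreSQL code."""
    # 2: append the change to the WAL and force it to stable storage.
    record = offset.to_bytes(8, "big") + len(data).to_bytes(4, "big") + data
    os.write(wal_fd, record)
    os.fsync(wal_fd)

    # 3: from here on the change cannot be lost, so the caller could
    #    already be told that the write succeeded.

    # 4: apply the change to the data file itself and fsync() it.
    os.pwrite(data_fd, data, offset)
    os.fsync(data_fd)

    # 5: only now is it safe to discard (or recycle) the WAL record.
    os.ftruncate(wal_fd, 0)
    os.lseek(wal_fd, 0, os.SEEK_SET)
```

If the second fsync() returns before the data is really on the platters or flash, step 5 throws away the only durable copy, which is exactly the failure described in the quoted text.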

Now maybe I'm wrong, but between #3 and #5 the data to be written is kept in memory. This is basically a write cache implemented in OS memory, and we could really handle it like one. That is, everything would remain the same, except that we add some latency: we wait some time after the last modification of a given block, and then write it out.
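
A rough sketch of what I mean by a write cache in OS memory, again in Python. DelayedWriteCache and its methods are hypothetical names, not anything that exists in Pg; the sketch assumes every change passed to write() is already fsync()-ed into the WAL, and that the drive's own write cache is off so that fsync() is honest.

```python
import os
import time

class DelayedWriteCache:
    """Hypothetical in-memory write-back cache; names are illustrative only."""

    def __init__(self, data_fd, delay_seconds):
        self.data_fd = data_fd
        self.delay_seconds = delay_seconds
        self.dirty = {}  # block offset -> (data, time of last modification)

    def write(self, offset, data, now=None):
        # Assumption: the change is already safely in the WAL.
        now = time.monotonic() if now is None else now
        self.dirty[offset] = (data, now)

    def flush_due(self, now=None):
        """Write out blocks untouched for delay_seconds; return their offsets."""
        now = time.monotonic() if now is None else now
        due = [off for off, (_, t) in self.dirty.items()
               if now - t >= self.delay_seconds]
        for off in due:
            data, _ = self.dirty.pop(off)
            os.pwrite(self.data_fd, data, off)
        if due:
            os.fsync(self.data_fd)
        # WAL records covering the returned offsets could now be recycled.
        return due
```

A block that is modified again before its delay expires is simply overwritten in the dictionary, so repeated changes to the same block cost only one disk write.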

Is this possible? If so, then we can turn off the write cache for all drives except the one holding the WAL, and write speed would still remain the same. I don't think any SSD drive has more than a few megabytes of write cache. The same amount of write cache could easily be implemented in OS memory, and then Pg would always know what has hit the disk.

Thanks,

  Laci


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
