As far as I understand the code, the current DAV implementation just asks the OS to rename the ".dav" file to the real destination file.

The problem is power failure and disk failure. We can lose data if either happens at the wrong time. I am experiencing this in production, and I see both cases: files missing because the ".dav" file was not actually renamed on disk before the power failure occurred, and files with the wrong content (in particular, truncated or partially written files).

Leaking ".dav" files is bad, but replacing a good file with a truncated one is evil. This holds even if we are only talking about power failures, sudden USB unplugging, etc.

I could suggest two (three) approaches:

First approach:

1. Before the rename (from ".dav" to the real filename), do a buffer + OS flush (file descriptor flush + OS fsync). That is, be sure the file is stable on disk before the client gets the ACK. If this operation is considered too costly, or it hurts benchmarks or whatever, please provide a configuration option and let the admin decide whether she prefers speed or protection against data loss/corruption.

This approach would still leak ".dav" files, and the client could get an ACK for a file just uploaded that will be missing after a power loss. But at least, if the file is there, it is correct. You don't get partial files, or good files replaced with bad files.

Second approach:

This approach would require a durable database. The current lock database could be reused, but I am not sure about the "durable" (or the entire ACID) guarantees Apache currently provides.

1. When a file is uploaded, write in the database (durable!) the ".dav" path and the final destination path.

2. When the upload is done, flush + sync the file, as explained in the previous approach. ACK the client.

3. Schedule a database update to remove the record. Do not do it now, do it in a few minutes. The idea here is to be sure that, by the time this record is deleted, nothing can happen to the data just uploaded. Also, you can group database deletes for better performance.

4. If the Apache HTTP server restarts, scan the database. For each registered ".dav" file, try to delete it. The file might not be there, and that would be OK.

5. Delete processed records in the database.

This approach would delete stale ".dav" files left behind if the server crashes or loses power while files are being uploaded.

Third approach:

This one would be bulletproof. Replace in the previous approach:

2. When the upload is done, flush + sync the file, as explained in the previous approach. Update the database and mark that record as "done". ACK the client.

...

4. If the Apache HTTP server restarts, scan the database. For each registered ".dav" file, try to delete it if it is not marked as "done". If it is marked as "done", try to rename the ".dav" file to the real filename. The file might not be there, and that would be OK.

In a POSIX filesystem, I am not sure whether a "rename" + "fsync" can guarantee stable storage. If it can, we would not need the database to be sure that the file is not going to vanish at a power failure after the client was ACK'ed.

Some of these approaches are costly. If you think the cost is unreasonable, please provide a configuration toggle and let the admin choose.

Thanks.

-- 
Jesús Cea Avión
jcea@xxxxxxx - https://www.jcea.es/
Twitter: @jcea
jabber / xmpp:jcea@xxxxxxxxxx
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz
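
P.S. To make the first approach concrete, here is a minimal sketch of the flush + fsync + rename sequence. It is Python only for brevity and is not mod_dav code; the function name durable_replace and the file names are hypothetical. The last step, an fsync on the containing directory, is an assumption about how to make the rename itself durable: as far as I know it is needed on Linux, and other platforms or filesystems may behave differently.

```python
import os


def durable_replace(tmp_path: str, final_path: str, data: bytes) -> None:
    """Write data to tmp_path, force it to disk, then rename it over final_path.

    Hypothetical helper for illustration only; assumes a POSIX system.
    """
    # 1. Write the upload to the temporary (".dav") file.
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # user-space buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> stable storage

    # 2. Only now rename: the destination either keeps its old content
    #    or gets the complete new content, never a partial file.
    os.replace(tmp_path, final_path)

    # 3. Assumption: fsync the directory so the rename is persisted too.
    dir_path = os.path.dirname(os.path.abspath(final_path))
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the machine dies before step 2, the old file is untouched and only a stale ".dav" file is left behind. Skipping step 3 would correspond to the weaker guarantee described in the first approach: the file may be missing after a crash, but if it is there it is complete.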