Re: data loss on full file system?

On Tue, 28 Jan 2020, Paul Emmerich wrote:

Yes, data that is not synced is not guaranteed to be written to disk,
this is consistent with POSIX semantics.

Getting all 0s back from a read() of a byte range for which a write() of non-zero data had already returned success does not seem to be consistent with POSIX:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

"
After a write() to a regular file has successfully returned:

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
"


The reason for my concern is a typical use case:

Using the standard tool 'cp -r' to copy a set of important files from one place into a CephFS filesystem.  If the out-of-space condition is not reported, even for large amounts of data, the user might remove the files from the original location without realising that the data is lost, possibly only discovering this months later.


Changing cp (or whatever standard tool is used) to call fsync() before each close() is not an option for a user.  Doing that would also lead to terrible performance in general.  Just tested: a recursive copy of a 70k-file Linux source tree went from 15 s to 6 minutes on a local filesystem I have at hand.
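
For concreteness, a rough sketch of what such a copy loop would look like (my own illustration, not cp's actual code):

    /* Copy loop that calls fsync() before close().  The fsync() is where
     * an ENOSPC or EIO from delayed write-back would surface at copy
     * time, but it also forces every file through write-back
     * individually, which is where the 15 s -> 6 min slowdown comes
     * from.  Partial writes are not retried, to keep the sketch short. */
    #include <stdio.h>
    #include <unistd.h>

    /* src_fd and dst_fd are assumed to be already-open file descriptors. */
    int copy_and_sync(int src_fd, int dst_fd)
    {
        char buf[1 << 16];
        ssize_t n;

        while ((n = read(src_fd, buf, sizeof(buf))) > 0) {
            if (write(dst_fd, buf, n) != n) {
                perror("write");
                return -1;
            }
        }
        if (n < 0) { perror("read"); return -1; }

        if (fsync(dst_fd) != 0) {   /* errors from earlier writes show up here */
            perror("fsync");
            return -1;
        }
        if (close(dst_fd) != 0) {
            perror("close");
            return -1;
        }
        return 0;
    }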

Best regards,
Håkan





Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 27, 2020 at 9:11 PM Håkan T Johansson <f96hajo@xxxxxxxxxxx> wrote:


Hi,

for test purposes, I have set up two 100 GB OSDs, one
holding the data pool and the other the metadata pool for CephFS.

Am running 14.2.6-1-gffd69200ad-1 with packages from
https://mirror.croit.io/debian-nautilus

Am then running a program that creates a lot of 1 MiB files by calling
   fopen()
   fwrite()
   fclose()
for each of them.  Error codes are checked.
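
Roughly along these lines (a simplified sketch of the pattern, not the exact program; the file name pattern and count are placeholders):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        long nfiles = (argc > 1) ? atol(argv[1]) : 1000;
        static char buf[1024 * 1024];
        memset(buf, 0x5A, sizeof(buf));          /* non-zero payload */

        for (long i = 0; i < nfiles; i++) {
            char name[64];
            snprintf(name, sizeof(name), "file_%06ld.dat", i);

            FILE *f = fopen(name, "wb");
            if (!f) { perror("fopen"); return 1; }

            if (fwrite(buf, 1, sizeof(buf), f) != sizeof(buf)) {
                perror("fwrite");
                fclose(f);
                return 1;
            }
            if (fclose(f) != 0) {                /* returns EOF on error */
                perror("fclose");
                return 1;
            }
        }
        return 0;
    }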

This works successfully for ~100 GB of data, and then strangely keeps succeeding
for many hundreds of GB more...  ??

All written files have size 1 MiB according to 'ls', and thus should contain the
data written.  However, on inspection, the files written after the first ~100 GiB
are full of just 0s (checked with hexdump -C).


To further test this, I use the standard tool 'cp' to copy a few random-content
files into the full CephFS filesystem.  cp reports no complaints, and after
the copy operations, the content is seen with hexdump -C.  However, after forcing
the data out of the cache on the client by reading other, earlier created files,
hexdump -C shows all-0 content for the files copied with 'cp'.  Data that was
there is suddenly gone...?


I am new to ceph.  Is there an option I have missed to avoid this behaviour?
(I could not find one in
https://docs.ceph.com/docs/master/man/8/mount.ceph/ )

Is this behaviour related to
https://docs.ceph.com/docs/mimic/cephfs/full/
?

(That page states 'sometime after a write call has already returned 0'. But if
write returns 0, then no data has been written, so the user program would not
assume any kind of success.)
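
For reference, this is roughly how I expect a careful caller to interpret write()'s return value; a sketch of the usual retry pattern, not code from my test program:

    #include <errno.h>
    #include <unistd.h>

    /* Write len bytes, retrying short writes and EINTR.
     * Returns len on success, -1 on error or if no progress is made. */
    ssize_t write_all(int fd, const char *buf, size_t len)
    {
        size_t done = 0;
        while (done < len) {
            ssize_t n = write(fd, buf + done, len - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted, retry */
                return -1;      /* real error, e.g. ENOSPC */
            }
            if (n == 0)
                return -1;      /* nothing written: not success */
            done += n;          /* short write: keep going */
        }
        return (ssize_t) done;
    }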

Best regards,

Håkan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



