On 20-06-2017 13:05, Carlos Maiolino wrote:
> AFAIK, it will return ENOSPC with O_DIRECT, yes. With async writes, you
> won't have any error returned until you issue an fsync/fdatasync, which,
> per my understanding, will return EIO.
Ok, I was missing that; so ENOSPC will be returned for O_DIRECT only.
I'll take a note ;)
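
To fix this in my mind, here is roughly how I picture the two cases; a
quick, untested sketch of mine (the /mnt/thin paths are just placeholders
for a filesystem sitting on a full thin volume, and the error values
follow your description above):

/* Rough sketch: buffered vs O_DIRECT writes on a full thin pool.
 * Untested; expected errors follow the description above
 * (EIO at fsync() for buffered writes, ENOSPC for O_DIRECT). */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    memset(buf, 'x', sizeof(buf));

    /* Buffered path: write() only dirties the page cache and succeeds... */
    int fd = open("/mnt/thin/buffered.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, buf, sizeof(buf)) < 0)
        perror("buffered write");   /* usually does NOT fire */
    if (fsync(fd) < 0)
        perror("fsync");            /* ...the error only shows up here (EIO) */
    close(fd);

    /* O_DIRECT path: the allocation failure is reported right away. */
    void *dbuf;
    if (posix_memalign(&dbuf, 4096, 4096)) return 1;
    memset(dbuf, 'x', 4096);
    fd = open("/mnt/thin/direct.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open O_DIRECT"); return 1; }
    if (write(fd, dbuf, 4096) < 0)
        perror("O_DIRECT write");   /* ENOSPC expected if the pool is full */
    close(fd);
    free(dbuf);
    return 0;
}
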
> The application won't be alerted in any way unless it uses
> fsync()/fdatasync(), no matter which filesystem is in use. Even with
> data=journal in ext4 this won't happen: ext4 gets remounted read-only
> because there were 'metadata' errors while writing the file to the
> journal. But again, that is not a fix for a faulty application, and it
> is not even a reliable way to shut down the filesystem the way you are
> thinking it will. It will only shut the filesystem down depending on
> the amount of blocks being allocated: even with data=journal, if the
> blocks allocated are enough to hold the metadata but not the data, you
> will see the same problem you are seeing with XFS (or ext4 without
> data=journal), so don't rely on it.
This somewhat scares me. From my understanding, a full thin pool will
eventually bring XFS to a halt (filesystem shutdown) but, from my
testing, this can take a fair amount of time and failed writes. During
this period, any writes will be lost without anybody noticing. In fact,
I opened a similar thread on the lvm mailing list discussing this very
same problem.
> Yes, these options won't help, because they are configuration options
> for metadata errors, not data errors.
>
> Please bear in mind that your question should really be: "how can I
> stop a filesystem when async writes return I/O errors?", because this
> isn't an XFS issue. But again, there isn't much you can do here; async
> writes are supposed to behave this way, and whoever is writing "data"
> to the device is supposed to take care of their own data.
>
> Imagine, for example, a situation where you have two applications using
> the same filesystem (quite common, right?). Applications A and B both
> issue buffered writes, and for some reason application A's data hits an
> I/O error: maybe a too-busy storage device, a missed SCSI command,
> whatever, anything that could be retried. Then the filesystem shuts
> down because of that, which also affects application B, even though
> nothing wrong happened to application B. One of the goals of
> multitasking is having applications run at the same time without
> affecting each other.
>
> Now, consider that application B is a well-written application and
> application A isn't. App B cares whether its data is written to disk,
> while app A doesn't. In case of a casual error, app B will retry
> writing its data, while app A won't. Should we really shut down the
> filesystem here, affecting everything on the system, because
> application A is not taking care of its own data?
>
> Shutting a filesystem down has basically one purpose: avoiding
> corruption. We basically only shut down a filesystem when keeping it
> alive could cause a problem for everything using it (a really, really
> simple explanation here).
>
> Surely this can be improved, but in the end, the application will
> always need to check for its own data.
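
Just to make sure I understand what "taking care of its own data" means
in practice, I picture app B doing something like this (a rough,
untested sketch of mine; the helper name and retry policy are made up):

/* Rough sketch of a "well-behaved" writer (app B in the example above):
 * it checks both write() and fsync() and retries a few times on a
 * transient error before giving up. Purely illustrative, my own code. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_RETRIES 3

/* Hypothetical helper: returns 0 on success, -1 if the data could not
 * be made durable even after retrying. */
static int write_durable(int fd, const void *buf, size_t len)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        const char *p = buf;
        ssize_t done = 0;

        while ((size_t)done < len) {
            ssize_t n = pwrite(fd, p + done, len - done, done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted, just try again */
                break;              /* hard error, retry the whole attempt */
            }
            done += n;
        }

        if ((size_t)done == len && fsync(fd) == 0)
            return 0;               /* data actually on stable storage */

        fprintf(stderr, "attempt %d failed: %s, retrying\n",
                attempt + 1, strerror(errno));
        sleep(1);                   /* naive backoff */
    }
    return -1;                      /* caller must handle the lost data */
}

int main(void)
{
    char buf[4096];
    memset(buf, 'y', sizeof(buf));

    int fd = open("/mnt/thin/appB.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write_durable(fd, buf, sizeof(buf)) < 0)
        fprintf(stderr, "giving up, data was NOT persisted\n");
    close(fd);
    return 0;
}

Obviously the retry policy, and what to do when it ultimately fails
(alert, queue, abort), is up to the application, which I think is
exactly your point.
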
I think the key improvement would be to let the filesystem know about
the full thin pool, i.e. returning ENOSPC at some convenient time (a
wild guess: could we return ENOSPC during delayed block allocation?)
> I am not really a device-mapper developer and I don't know its code in
> much depth. But I know it will issue warnings when there is no more
> space left, and you can configure a watermark too, to warn the admin
> when the space used reaches that watermark.
>
> For now, I believe the best solution is to have a reasonable watermark
> set on the thin device, and have the admin take the appropriate action
> whenever this watermark is reached.
Yeah, lvmthin *will* return appropriate warnings while the pool fills
up. However, this requires active monitoring which, albeit a great idea
and "the right thing to do (tm)", adds complexity and can itself fail.
In recent enough (experimental) versions, lvmthin can be instructed to
execute specific actions when data allocation rises above some
threshold, which somewhat addresses my concerns at the block layer.
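
For the record, the settings I am referring to live in lvm.conf;
something like the excerpt below (example values of mine, not a
recommendation), plus checking current usage with
"lvs -o lv_name,data_percent,metadata_percent".

# /etc/lvm/lvm.conf (excerpt), example values only
activation {
    # dmeventd monitoring must be enabled for the thresholds to be acted on
    monitoring = 1

    # start autoextending the thin pool once it is more than 70% full,
    # growing it by 20% each time (a threshold of 100 disables this)
    thin_pool_autoextend_threshold = 70
    thin_pool_autoextend_percent = 20
}
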
Thank you for your patience and sharing, Carlos.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8