Re: Shutdown filesystem when a thin pool becomes full

On 20-06-2017 13:05, Carlos Maiolino wrote:

AFAIK, it will return ENOSPC with O_DIRECT, yes. With async writes, you won't get any error back until you issue an fsync/fdatasync, which, per my understanding, will return EIO.


Ok, I was missing that; so ENOSPC will be returned for O_DIRECT only. I'll make a note of it ;)
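
To make sure I understand it, here is a minimal sketch of where the error would actually show up with buffered writes (hypothetical path, error handling trimmed):

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      /* Hypothetical file on a filesystem backed by a full thin pool. */
      int fd = open("/mnt/thin/testfile", O_WRONLY | O_CREAT, 0644);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      char buf[4096];
      memset(buf, 'x', sizeof(buf));

      /* Buffered write: this likely "succeeds" even on a full pool,
       * because the data only went to the page cache and no block
       * has been allocated yet. */
      if (write(fd, buf, sizeof(buf)) < 0)
          perror("write");

      /* Only here does the writeback error surface (as EIO). */
      if (fsync(fd) < 0)
          perror("fsync");

      close(fd);
      return 0;
  }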


The application won't be alerted in any way unless it uses fsync()/fdatasync(), whatever filesystem is being used. Even with data=journal in ext4 this won't happen: ext4 gets remounted read-only because there were 'metadata' errors when writing the file to the journal. But again, that is not a fix for a faulty application, and it is not even a reliable way of shutting down the filesystem the way you are thinking it will. Whether the filesystem shuts down depends on how many blocks have been allocated: even with data=journal, if the allocated blocks are enough to hold the metadata but not the data, you will see the same problem as you are seeing with XFS (or ext4 without data=journal). So, don't rely on it.


This somewhat scares me. From my understanding, a full thin pool will eventually bring XFS to a halt (filesystem shutdown) but, from my testing, this can take a fair amount of time and many failed writes. During this period, any writes will be lost without anybody noticing. In fact, I opened a similar thread on the lvm mailing list discussing this very same problem.
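
For reference, my test was roughly along these lines (volume group, pool and mount point names are made up, sizes are arbitrary):

  # Small thin pool, over-provisioned thin volume.
  lvcreate -L 100M -T vg/pool
  lvcreate -V 1G -T vg/pool -n thinvol
  mkfs.xfs /dev/vg/thinvol
  mount /dev/vg/thinvol /mnt/thin

  # Buffered writes well past the pool size appear to succeed...
  dd if=/dev/zero of=/mnt/thin/fill bs=1M count=500
  # ...while direct I/O fails with ENOSPC once the pool is full.
  dd if=/dev/zero of=/mnt/thin/fill2 bs=1M count=500 oflag=direct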


Yes, these options won't help, because they are configuration options
for metadata errors, not data errors.
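
For reference, the knobs in question are the per-device sysfs error attributes, e.g. (device name is just an example):

  /sys/fs/xfs/dm-3/error/metadata/EIO/max_retries
  /sys/fs/xfs/dm-3/error/metadata/EIO/retry_timeout_seconds
  /sys/fs/xfs/dm-3/error/metadata/ENOSPC/max_retries
  /sys/fs/xfs/dm-3/error/fail_at_unmount

They control how long XFS keeps retrying failed *metadata* writeback before shutting down; data writeback errors never reach them.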

Please bear in mind that your question should really be "how can I stop a filesystem when async writes return I/O errors?", because this isn't an XFS issue.

But again, there isn't much you can do here: async writes are supposed to behave this way, and whoever is writing "data" to the device is supposed to take care of their own data.

Imagine for example a situation where you have two applications using the same filesystem (quite common, right?). Applications A and B both issue buffered writes and, for some reason, application A's data hits an I/O error: maybe a too-busy storage device, a missed SCSI command, anything that can be retried.

Then the filesystem shuts down because of that, which also affects application B, even though nothing wrong happened to application B.

One of the goals of multitasking is having applications running at the same time
without affecting each other.

Now, consider that application B is a well-written application, and application A isn't.

App B cares for its data to be written to disk, while app A doesn't.

In case of a transient error, app B will retry writing its data, while app A won't.

Should we really shut down the filesystem here, affecting everything on the system, because application A does not care for its own data?

Shutting a filesystem down has basically one purpose: avoiding corruption. We only shut down a filesystem when keeping it alive could cause problems for everything using it (a really, really simplified explanation).

Surely this can be improved, but in the end, the application will always need to check for its own data.
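
As a sketch of what "taking care of its own data" can look like in practice (hypothetical helper, heavily simplified):

  #include <fcntl.h>
  #include <unistd.h>

  /* Write a whole buffer and only report success once fsync() has
   * confirmed it is on stable storage.  On failure the data is
   * rewritten from the application's own copy, because after a
   * failed fsync() the dirty pages in the page cache may be gone. */
  static int write_durably(const char *path, const void *buf,
                           size_t len, int retries)
  {
      while (retries-- > 0) {
          int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
          if (fd < 0)
              return -1;

          /* Simplified: a robust version would also handle short
           * writes and distinguish transient from fatal errnos. */
          if (write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0) {
              close(fd);
              return 0;     /* data confirmed on disk */
          }

          close(fd);        /* error: try again from our own copy */
      }
      return -1;            /* persistent failure, caller must react */
  }

  int main(void)
  {
      const char msg[] = "important data\n";
      return write_durably("/mnt/thin/app_b.dat", msg,
                           sizeof(msg) - 1, 3) ? 1 : 0;
  }

This is, roughly, what app B does and app A doesn't.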

I think the key improvement would be to let the filesystem know about the full thin pool, i.e., returning ENOSPC at some convenient time (a wild guess: can we return ENOSPC during delayed block allocation?).


I am not really a device-mapper developer and I don't know its code in much depth. But I know it will issue warnings when there is no space left, and you can also configure a watermark to warn the admin when the space used reaches it.

For now, I believe the best solution is to set a reasonable watermark on the thin device and have the admin take the appropriate action whenever this watermark is reached.

Yeah, lvmthin *will* return appropriate warnings while the pool fills. However, this requires active monitoring which, albeit a great idea and "the right thing to do (tm)", adds complexity and can itself fail. In recent enough (experimental) versions, lvmthin can be instructed to execute specific actions when data allocation rises above a given threshold, which somewhat addresses my concerns at the block layer.
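
For example, something along these lines (names and thresholds are made up; dmeventd monitoring must be enabled for it to work):

  # lvm.conf: have dmeventd auto-extend the pool when data usage
  # crosses 70%, instead of only logging a warning.
  activation {
      thin_pool_autoextend_threshold = 70
      thin_pool_autoextend_percent = 20
  }

  # Alternatively, make a full pool fail writes immediately
  # instead of queueing them:
  lvchange --errorwhenfull y vg/pool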

Thank you for your patience and sharing, Carlos.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8
--


