Re: trouble with generic/081

Zdenek Kabelac <zkabelac@xxxxxxxxxx> · Thu, 5 Jan 2017 22:12:25 +0100

On 5.1.2017 at 20:29, Eric Sandeen wrote:
On 1/5/17 1:13 PM, Zdenek Kabelac wrote:
Anyway, at this point I'm not convinced that anything but the filesystem
should be making decisions based on storage error conditions.

So far I'm not convinced that doing nothing is better than at least trying to unmount.

Doing nothing is known to cause SEVERE filesystem damage,
while I haven't heard of such damage when 'unmount' is in place.

I'm pretty sure that's exactly what started this thread.  ;)

Failing IOs should never cause "severe filesystem damage" - that is what
a journaling filesystem is /for/.  Can you explain further?

Well, all I know are user reports from people who managed to run XFS
on an exhausted thin-pool while having snapshots of their volumes.

Since there was no 'umount', and XFS, upon a write error, just retried
writing the block endlessly, over and over, the system appeared
nice & usable to the users for quite a long time (especially when the boxes
had 32G of RAM or more...)

Maybe writes to 'uniquely' owned blocks went through....

Then, after a day, two, or three, the OOM killer finally struck.
Users realized the thin-pool was out of space, added room to the VG and pool,
and tried xfs_repair, but the whole FS was largely lost.

With umount, the user cannot keep using the machine and is mostly forced to reboot
(which was the main point of umount: to interrupt the user's work).

So, if I hear all the voices correctly now:

we now want to let users continue to use such systems and let them figure out
for themselves that something is wrong when they get an occasional write failure,
and XFS now avoids destruction by shutting down on any journal failure.

(A journal may not be replayable on mount if it needs to allocate more
thin blocks on replay and is unable to do so, but that should just fail
gracefully.)

I don't have a test case myself, just some guidance on how to get there.

Use a thin LV and make some thin snapshots.

Then change various parts of the origin at various moments before the pool
runs out of space.

So you will get lots of different scenarios of missing data.

You will mostly not run into the trouble mentioned above if you
have just a single thin LV and exhaust the thin-pool while using it.

Games with snapshots are needed.
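The recipe above could be sketched as a dry-run script that only prints the commands, so they can be reviewed and run by hand on a scratch VG. Everything concrete here is an assumption, not part of the original guidance: the names vg0/pool/origin, the sizes, the mount point, and the choice of dd overwrites at different offsets (seek) as the way to "change various parts of the origin" between thin snapshots. It is an untested sketch, not a known reproducer.

```python
import shlex

# Hypothetical names and sizes: adjust to your own scratch test setup.
VG = "vg0"
POOL = "pool"
ORIGIN = "origin"
MNT = "/mnt/origin"


def build_commands(n_snapshots=3, pool_size="1G", virt_size="4G", region_mb=100):
    """Return argv lists for a dry run of the reproduction steps.

    A thin origin is filled with data, then a thin snapshot is taken and a
    different region of the origin is overwritten, repeatedly, so origin and
    snapshots own different blocks by the time the pool runs out of space.
    """
    dev = f"/dev/{VG}/{ORIGIN}"
    data = f"{MNT}/data"
    cmds = [
        ["lvcreate", "--type", "thin-pool", "-L", pool_size, "-n", POOL, VG],
        # Overprovisioned thin LV (virtual size larger than the pool).
        ["lvcreate", "-V", virt_size, "-T", f"{VG}/{POOL}", "-n", ORIGIN],
        ["mkfs.xfs", dev],
        ["mkdir", "-p", MNT],
        ["mount", dev, MNT],
        # Initial content, shared by all later snapshots.
        ["dd", "if=/dev/urandom", f"of={data}", "bs=1M",
         f"count={region_mb * (n_snapshots + 1)}", "conv=fsync"],
    ]
    for i in range(n_snapshots):
        # Thin snapshot (no size given), then overwrite a different region
        # of the origin, which makes the pool allocate fresh blocks for it.
        cmds.append(["lvcreate", "-s", "-n", f"snap{i}", f"{VG}/{ORIGIN}"])
        cmds.append(["dd", "if=/dev/urandom", f"of={data}", "bs=1M",
                     f"count={region_mb}", f"seek={i * region_mb}",
                     "conv=notrunc,fsync"])
    # Keep writing to the origin until the thin pool is exhausted.
    cmds.append(["dd", "if=/dev/urandom", f"of={MNT}/fill", "bs=1M", "conv=fsync"])
    return cmds


if __name__ == "__main__":
    # Print only; nothing is executed.
    for argv in build_commands():
        print(shlex.join(argv))
```

Varying n_snapshots and region_mb changes when, relative to the pool filling up, each snapshot diverges from the origin, which is the "various moments" part of the recipe.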

Regards

Zdenek

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel