On Sat, Jul 11, 2020 at 5:55 AM Antti <antti.aspinen@xxxxxxxxx> wrote:

> For example btrfs has for a long-time had this issue where after
> several months and being maybe more than 75% of disk space being in
> use, that when run on SSDs, system can randomly stops reading from
> the file system, starts thinking and then eventually returns. With
> each freezing the condition gets worse and eventually the system is
> eternally stuck and power reset is required. This is not normal and
> not acceptable.

It is unfortunately true that there is a disproportionate burden
placed on those having problems no one else is having. And
troubleshooting amounts to either poking it with a stick (try this!
no, try this! ok, now try this!) or providing sufficiently detailed
reproduction steps. And that's tedious too.

> The way this happens for example if you open Gnome Shell application
> launcher several times in a row, then likelyhood that Gnome
> completely freezes for duration of some seconds up to one minute
> increases. I don't see this behaviour when using any other file
> system so I've attributed it to btrfs but I have no way of knowing
> if it is an actual issue in btrfs other than it stopped when disk
> gets formatted to anything else.

My suggestion for any such freeze/hang is to issue sysrq+t. This might
not be easy to do at exactly the time of the hang, because the hang
prevents it from being typed fast enough.

  (a) remote ssh session with sysrq+t typed out and ready to just hit
      enter
  (b) netconsole, same concept.

Reproduce the problem and then hit enter. Then file a bug with
'journalctl -k -o short-monotonic > bug#_journal.txt' - likely the
default dmesg buffer will be too small to hold everything but the
journal will have it. That should expose the nature of the hang.

If kernel messages show there's a blocked task for 2 minutes, in that
case it's better to use sysrq+w.
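
For option (a), a rough sketch of what "typed out and ready" could look
like in a root shell over ssh. It assumes the kernel's
/proc/sysrq-trigger interface, where writing a single key letter has the
same effect as the keyboard combination. The function name dump_tasks
and the SYSRQ_TRIGGER variable are made up for illustration and are not
part of any tool.

```shell
#!/bin/sh
# Hypothetical helper for illustration only. Needs root to write to the
# real trigger file. SYSRQ_TRIGGER is overridable only so the function
# can be tried against a scratch file first.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# dump_tasks KEY
#   t = dump the state of all tasks to the kernel log
#   w = dump only tasks in blocked (uninterruptible) state
dump_tasks() {
    case "$1" in
        t|w) printf '%s' "$1" > "$SYSRQ_TRIGGER" ;;
        *)   echo "usage: dump_tasks t|w" >&2; return 1 ;;
    esac
}

# Typed out ahead of time, enter pressed once the hang is reproduced:
#   dump_tasks t
# Afterwards, save the kernel messages as described in the mail:
#   journalctl -k -o short-monotonic > bug#_journal.txt
```

Keeping the helper as a function with nothing executed at load time
means it can sit in the session without side effects until the hang
actually happens.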

In this case it's not necessary to have extremely detailed
reproduction steps, nor wait for someone to have a properly aged
system to see what's going on.

> And also notice that I wrote "maybe 75% full" because there is no
> way to know the actual free disk space from just "df -h". There are
> chapters about this in btrfs FAQ pages that df lies about disk space
> when using btrfs since evaluating free disk space in btrfs system is
> a tricky and challenging task with no good solution in sight. This
> is why e.g. use of "btrfs fs usage /" is required together with
> other tools to have some idea of available disk space.

In the single device case, 'df' is expected to tell the truth. In the
multiple device case, it should still tell the truth, but can be
confusing because it can't tell the whole truth. And for that, there
is 'btrfs filesystem usage /mnt', which provides quite a lot more
information, to the degree that it can be confusing at first. But the
single device case is really straightforward. I just use 'df' and 'du'
most of the time, unless for some reason I want more information.

Recent example of multiple device confusion:
https://bugzilla.redhat.com/show_bug.cgi?id=1855174
https://lore.kernel.org/linux-btrfs/0326afd3-9e14-b682-30e7-1c8ae44813ea@xxxxxxxxxxxxxx/T/#t

-- 
Chris Murphy