[Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



https://bugzilla.kernel.org/show_bug.cgi?id=216322

Theodore Tso (tytso@xxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@xxxxxxx

--- Comment #2 from Theodore Tso (tytso@xxxxxxx) ---
So the problem is that the FITRIM ioctl does not check if a signal is pending,
and so if the fstrim program requests that the entire SSD (len=ULLONG_MAX),
like the broomstick set off by Mickey Mouse in Fantasia's "Sorcerer's
Apprentive", it will mindlessly send discard requests for any blocks not in use
by the file system until it is done.   Or to put it another way, "Neither rain,
nor snow, or a request to freeze the OS, shall stop the FITRIM ioctl from its
appointed task."  :-)

The question is how to fix things.   The problem is that the FITRIM ioctl
interface is pretty horrible.   The fstrim_range.len variable is an IN/OUT
field where on the input it is the number of bytes that should be trimmed (from
start to start+len) and when the ioctl returns fstrm_range.len is the number of
bytes that were actually trimmed.   So this is not really amenable for
-ERESTARTSYS.

Worse, the fstrim program in util-linux doesn't handle an EAGAIN error return
code, so if it gets the EAGAIN after try_to_freeze_tasks send the fake signal
to the process, fstrim will print to stderr "fstrim: FITRIM ioctl failed" and
the rest of the file system trim operation will be aborted.

It might be that the only way we can fix this is to have FITRIM return EAGAIN,
which will stop the fstrim in its tracks.  This is... not great, but typically
fstrim is run out of crontab or a systemd timer once a month, so if the user
tries to suspend right as the fstrim is running, hopefully we'll get lucky next
month.    We can then try teach fstrim to do the right thing, and so this
lossage mode would only happen in the combination of a new kernel and an older
version of util-linux.

I'm not happy with that solution, but the alternative of creating a new FITRIM2
ioctl that has a sane interface means that you need an new kernel and a new
util-linux package, and if you don't, the user will have to deal with a hot
laptop bag and a drained battery.   And not changing FITRIM's behaviour will
have the same potential end result, if the user gets unlucky and tries to
suspend the laptop when there is more than 60 seconds left before FITRIM to
complete.   :-/

The other thing I'll note is that every file system has its own FITRIM
implementation, and I suspect they all have this issue, because the FITRIM
interface is fundamentally flawed.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux