Scriptlet to replace a directory can cause infinite loop in update

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

recently we were faced with a bug that caused an infinite loop in a container. It was from a package (node.js) retaining a scriptlet to replace a directory as recommended in Fedora Guidelines [0].

There the errno and reason returned [1] is effectively ignored. Only whether the operation succeed is checked. There is no differentiation whether the rename failed because the directory exists which will correctly cause a rename with a numerical index appended to the directory name. Or if the rename failed due to any of the other reasons we might consider as a fail where jumping out of a scriptlet is preferable to even attempting a rename. The next numerical index is found with a trial-and-error that runs in a loop until some index succeeds in the rename, therein lies the potential for infinite loop.

Conclusion, before I get to the long longer "why". I propose we change the form from:

~~~
path = "/path/to/dir"
st = posix.stat(path)
if st and st.type == "directory" then
  status = os.rename(path, path .. ".rpmmoved")
  if not status then
    suffix = 0
    while not status do
      suffix = suffix + 1
      status = os.rename(path .. ".rpmmoved", path .. ".rpmmoved." .. suffix)
    end
    os.rename(path, path .. ".rpmmoved")
  end
end
~~~
to something along the lines of:
~~~
path = "/path/to/dir"
st = posix.stat(path)
reason = nil
if st and st.type == "directory" then
  status, reason, errno  = os.rename(path, path .. ".rpmmoved")
  if not status and errno == 17 then
    suffix = 0
    while not status and errno == 17 do
      suffix = suffix + 1
      status, reason, errno = os.rename(path .. ".rpmmoved", path .. ".rpmmoved." .. suffix)
      if errno ~= 17 then
print("Failed to rename a directory. " .. reason)
        break
      end
    end
    os.rename(path, path .. ".rpmmoved")
  else
    print("Failed to rename a directory. " .. reason)
  end
end
~~~
Not the best I have ever written but it adds checks that it failed for an expected reason where we intend to continue.

A bit of Lua & overlayFS background of why this is happening.

The scriptlet uses `os.rename`, which maps quite closely, from what I can tell from the behavior, to the C function `rename`.

OverlayFS does not like to rename directories from lower layers (which could be a layer created by let's say `RUN mkdir /opt/foo` directive in a Containerfile) to upper layers without some specific mount options, like `metacopy` or `redirect_dir` enabled that enables the rename operation to succeed [2]. If there is not metacopy or redirect_dir, the operation will fail with errno 18 EXDEV.

From what I understand, when one `podman run`s a container, there is a new upper layer for separating what happens in the running container, giving the temporary nature to changes in a running container. Or this "new" layer atop lower layers can also be observed when attempting to build a container. So this is a concern for both the situation of container build and the situation of container running.

In both of those cases, if we attempt to rename a directory with that call it will fail without the mentioned mount options. However thanks to the loop that ignores the exact errno and just blindly tries to find an index until the `os.rename` returns `true` instead of `nil`, we get into a situation where the scriptlet, on package upgrade, goes into an infinite loop.

A simple reproducer for this issue:
~~~
podman run -it --rm quay.io/fedora/fedora:41 bash -c 'dnf install -y lua >/dev/null && lua -e "status, str, code = os.rename(\"/var/games\", \"/var/games.moved\"); print(status, str, code)"; cat /proc/mounts | grep overlay'
~~~

This will attempt the rename close to what we'd expect in a scriptlet, and print the reason and specific errno code. Additionally prints how the overlay is mounted, to inspect options. This operation will succeed if there is either `redirect_dir=on` or the `metacopy=on` present in the mount options on the last line printed from running the container. `/var/games` was chosen due to no particular reason other than the directory being present on lower layers and not strictly necessary for runtime, allowing for experimentation.

Specifically with node.js, it happened simply because there was already an existing container, similar to quay.io/fedora/nodejs-22 and an attempt to update via build was made in style of:
~~~
FROM quay.io/fedora/nodejs-22

USER 0
RUN dnf update -y
USER default
~~~

This was enough to trigger the problem. This is very similar to the reproducer: `os.rename` was called on an already existing directory. However the directory was in a lower layer, making the fail related to the errno 18 EXDEV instead of errno 17 EEXIST which we would regard as a valid errno where we can continue.

Thoughts? Feedback?

Regards,
Jarek Prokop

[0] https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/#_scriptlet_to_replace_a_directory
[1] https://www.lua.org/manual/5.3/manual.html#pdf-os.rename
[2] https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#renaming-directories

--
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Fedora Announce]     [Fedora Users]     [Fedora Kernel]     [Fedora Testing]     [Fedora Formulas]     [Fedora PHP Devel]     [Kernel Development]     [Fedora Legacy]     [Fedora Maintainers]     [Fedora Desktop]     [PAM]     [Red Hat Development]     [Gimp]     [Yosemite News]

  Powered by Linux