Re: [Second Draft] Proposal to mirror Docker images

Hello! Based on the responses I've received so far, some new
information I learned about Docker's Manifest v2, and discussions in
#fedora-releng, I would like to propose this second draft of the
proposal to mirror Docker images. Thank you for reading, and please do
voice your thoughts!


Notable change from the first post
==================================

Instead of metalink responses, we will add support to the Linux
docker client for the Docker manifest schema 2's urls feature.


High level view
===============

In summary, the proposal is to work with @runcom[0][1] to write a patch
for the docker client that will give it the capability to use the
Docker Manifest schema 2 urls feature[2] during docker pull operations.
We would also need to add support for Docker images to Mirror List and
Mirror Manager. Additionally, we will need a small tool to pull the
content to be mirrored out of a docker registry and write it to disk
in a format that can be mirrored, as well as some Ansible code to run
the tool when there is new content to be mirrored.


Background
==========

The Fedora project wishes to begin distributing types of content that
it has not distributed in the past. One of the types that has been
identified as a goal is the Docker image. Adam Miller has already done
the work that will allow packagers to build Docker images, but we still
need a way to distribute those builds to Fedora's users. Adam Miller's
implementation helpfully drops the builds we want into a Docker
registry.

                                            
Proposed Changes
================

Mirror List
-----------

Users will point their docker clients at Mirror List when they docker
pull Fedora's Docker images. For this to work, we will need to make two
changes to Mirror List so that it can respond to the docker client
properly. First, Mirror List will need to respond with a special header
and a body of "{}" when the docker client sends a GET request for
/v2/. Second, when the client makes additional requests, Mirror List
will need to return a Docker Manifest schema 2 document containing a
list of mirrors that have the requested blobs, so that clients can
retrieve the blobs from mirrors near their locations, similar to what
Mirror List does for the dnf client today.
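
As a rough illustration of the two responses described above, here is a
minimal Python sketch. It is not Mirror List code. The header name and
value are taken from the Docker Registry HTTP API V2 specification
rather than from this proposal, and the function names and the
blobs/<algorithm>/<hex> path layout on the mirrors are hypothetical.

```python
import copy

# Assumption: this is the "special header" the docker client looks for,
# as documented in the Docker Registry HTTP API V2 specification.
API_VERSION_HEADER = ("Docker-Distribution-API-Version", "registry/2.0")


def base_check_response():
    """Build the response for GET /v2/ as (status, headers, body)."""
    headers = [API_VERSION_HEADER, ("Content-Type", "application/json")]
    return 200, headers, "{}"


def add_mirror_urls(manifest, mirrors):
    """Return a copy of a schema 2 manifest with urls set on each layer.

    The on-mirror path layout used here is hypothetical.
    """
    result = copy.deepcopy(manifest)
    for layer in result.get("layers", []):
        algorithm, hexdigest = layer["digest"].split(":", 1)
        layer["urls"] = [
            "{}/blobs/{}/{}".format(mirror.rstrip("/"), algorithm, hexdigest)
            for mirror in mirrors
        ]
    return result
```

Mirror List would call something like add_mirror_urls() with the
mirrors it has chosen for the requesting client, leaving the stored
manifest untouched.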

Docker registries are conventionally run on port 5000. We could run a
second instance of Mirror List on port 5000 if we wanted to isolate it
from the current instance. We could also have the docker client pull
from port 443, as dnf does, if we want to keep the deployment simpler.


Mirror Manager
--------------

We will need to make a few changes to Mirror Manager as well. We will
need to provide an interface to allow mirror admins to opt in/out of
mirroring Docker content. We will also need to modify the crawler to
detect whether a given mirror is up to date or not. We will need to
make sure that UMDL is updated when content changes.

There was some discussion about how the Docker content would be
organized on the master mirror. We could either give an all-or-nothing
Docker module for mirrors to choose from, or we could split the Docker
content by arch (or perhaps even primary vs. secondary). I don't have
any preference about which we go with. At first it seemed that we
couldn't do it by arch since the docker client is presented with a list
of manifests by arch (which made it seem that all mirrors would need
all arches), but I *think* the client would then make a second request
to Mirror List for the specific arch it wanted. If I'm correct, this
would mean that this second request would be when Mirror List could
pick a list of mirrors that it knew had the requested content. I'm
happy to take the time to verify my guesses here if this mailing list
wants to pursue that option. I'm happy to go with any way of splitting
that is desired, but it has been rightly suggested that we not choose
to create a module per Docker repository since there could be hundreds
or thousands of them.
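
To make the per-arch idea concrete, here is a small Python sketch of
the lookup Mirror List might perform if the guess above holds. The
manifests/platform/architecture fields follow the schema 2 manifest
list format. The mirror-to-arches mapping and both function names are
hypothetical, and whether the client really makes the second request is
still unverified.

```python
def manifest_digest_for_arch(manifest_list, architecture, os_name="linux"):
    """Return the digest of the manifest for one platform, or None."""
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if (platform.get("architecture") == architecture
                and platform.get("os") == os_name):
            return entry["digest"]
    return None


def mirrors_for_arch(mirror_arches, architecture):
    """Pick the mirrors that carry a given arch.

    mirror_arches is a hypothetical mapping of mirror URL -> set of arches.
    """
    return sorted(
        url for url, arches in mirror_arches.items() if architecture in arches
    )
```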

There was a question about how we would deal with archived data, and I
believe that is still an open question. It sounds like we can plan that
out later.


docker
------

Patrick Uiterwijk suggested that I look into the new schema 2 manifest
that Docker has defined, and when I did so I happened upon a new
feature that was not part of schema 1: a list of URLs for each Blob can
be listed in the Manifest[2]. We had been thinking that we'd use
metalink responses and add support for them to the docker client, but
this feature is built into the Docker Manifest.

With great excitement, I thought it would be prudent to do some testing
with this feature. Sadly, I came to learn that the feature did not
work. I spent an unfortunate amount of time trying to figure out if I
had something wrong with my test setup before diving into the code.
Once I looked at the code, it became clear that the feature only works
for the Windows version of the Docker client! The original pull
request[3] was submitted by Microsoft and only worked for the Windows
client. Later, Antonio Murdaca submitted a pull request[1] to expand
the support for other operating systems, but it was not accepted. In
response to all that, he opened an issue[4] to request the feature be
expanded.

Despite the difficulty in getting this feature accepted upstream, I
think it might be good for us to work with Antonio to try to get this
feature implemented and accepted in upstream docker, rather than going
with the previous metalink proposal. We may be able to work with the
Fedora package maintainer to get Antonio's existing patch carried in
our downstream build until it is upstream, if everyone agrees that
would be a good mid-term solution.


New Tool
--------

The last piece that is needed is a tool that can create the filesystem
tree that we want to synchronize out to the mirrors. The mirrors only
need to carry manifests and blobs, so the tool needs only to pull these
documents out of the registry that Adam Miller has set up and write
them to disk in a particular structure. For optimization, we could use
hardlinks for blobs that are common across the various images (for
example, the Fedora base blob will be the same in all images) to save
rsync time and mirror disk space.
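
The hardlink idea could look roughly like the following Python sketch.
The directory layout (a shared pool plus a per-image blobs directory)
and the function name are hypothetical, since the on-disk structure has
not been decided yet.

```python
import os


def store_blob(tree_root, image, digest, data):
    """Write a blob once and hardlink it into the image's directory.

    Hypothetical layout:
      <tree_root>/pool/<algorithm>/<hex>            (single real copy)
      <tree_root>/<image>/blobs/<algorithm>/<hex>   (hardlink to the pool)
    """
    algorithm, hexdigest = digest.split(":", 1)
    pool_path = os.path.join(tree_root, "pool", algorithm, hexdigest)
    if not os.path.exists(pool_path):
        os.makedirs(os.path.dirname(pool_path), exist_ok=True)
        with open(pool_path, "wb") as handle:
            handle.write(data)
    image_path = os.path.join(tree_root, image, "blobs", algorithm, hexdigest)
    os.makedirs(os.path.dirname(image_path), exist_ok=True)
    if not os.path.exists(image_path):
        # Blobs shared between images end up as one inode on disk.
        os.link(pool_path, image_path)
    return image_path
```

Note that rsync only preserves hardlinks when run with -H, so mirrors
would need to sync with that option to get the disk space benefit.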

Additionally, we will need a playbook to run this new tool in response
to fedmsgs. We may be able to use Adam Miller's loopabull project to
run such a playbook at the right times.


Signing
-------

Patrick raised the question of signing. Docker supported signing within
the manifest with the schema 1 version, but with schema 2 the embedded
signatures have been removed in favor of the Notary service[5]. This
may be an option for Fedora, or there may be alternatives we could look
into if desired. I'm happy to dig more if we would prefer not to use
Notary for some reason.

The Blobs (Docker calls the Image layers Blobs) themselves are not
signed, but they are referenced by checksum in the Manifest. You can
see an example Manifest at [6]. Thus, theoretically, if the user trusts
the Manifest because it is signed by Fedora, they should be able to
trust the Blob layers that they download so long as they do match the
expected checksum. The Docker client does seem to check the checksum in
my experience.

By the way, the Manifest response from Mirror List will include the
expected checksums for the Blobs that the client is trying to pull.
Thus, in addition to Fedora signing the Manifest itself, we can also
have the client validate the checksums of the Blobs they receive from
the mirror. If we ensure that clients always communicate with Mirror
List over TLS, this will add another layer of validation for us.
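
For illustration, the checksum validation described above amounts to
something like this Python sketch. It assumes sha256 digests in the
"sha256:<hex>" form used in the example Manifest. The function name is
hypothetical, and this is not the docker client's actual code.

```python
import hashlib


def blob_matches_digest(data, digest):
    """Check downloaded blob bytes against the digest from the Manifest."""
    algorithm, expected = digest.split(":", 1)
    if algorithm != "sha256":
        raise ValueError("unsupported digest algorithm: " + algorithm)
    return hashlib.sha256(data).hexdigest() == expected
```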


Optional mirror registries
--------------------------

A notable drawback to this proposal is that users will not be able to
point their docker client directly at a mirror and docker pull. This is
because the docker client does not support a path prefix in front of
the docker v2 API, and because it expects to see certain headers in the
response. Instead, users will always have to point their clients at
Mirror List so that it can send them the manifest with URLs to the
blobs on the mirrors.

However, we could have a "phase 2" plan, where we ask mirrors to
consider running their own full registries for users to pull from. Of
course, this would require opt-in and hands-on work by the mirror
admins (similar to how some mirrors support ftp or rsync, but not all
do). Without a registry on the mirror, there isn't a good way that I
know of to allow users to docker pull directly from a specific mirror.
I'm not sure how we could communicate to users about which mirrors have
done this vs. which haven't.


Pros/Cons
=========

In comparison to the previous proposal I sent:

Pros:
* The needed change in the docker client is more likely to be accepted
  upstream, which means non-Fedora OS users will still be able to
  docker pull Fedora images.
* The needed change is smaller than would be necessary for the metalink
  solution.
* There is already a working patch available for us to carry in the
  mid-term, if we wish to do so.

Cons:
* Mirror list will need to dynamically serve Manifests so that it can
  insert the URLs into them, as opposed to serving the metalink
  documents. In my opinion, this is a minor difference.


Conclusion
==========

Thanks for reading, and please respond with any comments or questions
you have about this proposal. I'm happy to clarify any points further,
and if you have any alternative proposals I'd love to hear those as
well.


[0] https://github.com/docker/distribution/issues/1825
[1] https://github.com/docker/docker/pull/23014
[2] https://docs.docker.com/registry/spec/manifest-v2-2/#/image-manifest-field-descriptions
[3] https://github.com/docker/docker/pull/22866
[4] https://github.com/docker/distribution/issues/1825
[5] https://docs.docker.com/notary/
[6] https://docs.docker.com/registry/spec/manifest-v2-2/#/example-image-manifest
