Re: AI/ML Model and Pre-Trained Weight Packaging in Fedora

On Fri, Mar 1, 2024 at 4:52 PM Tim Flink <tflink@xxxxxxxxxxxxxxxxx> wrote:
>
> On 2/28/24 19:03, Richard Fontana wrote:
> > On Tue, Feb 27, 2024 at 5:58 PM Tim Flink <tflink@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 2/26/24 19:06, Richard Fontana wrote:
> >>
> >> <snip>
> >>
> >>>> 4. Is it acceptable to package code which downloads pre-trained weights from a non-Fedora source upon first use post-installation by a user if that model and its associated weights are
> >>>>       a. For a specific model?
> >
> > What do you mean by "upon first use post-installation"? Does that mean
> > I install the package, and the first time I launch it or whatever, it
> > automatically downloads some set of pre-trained weights, or is this
> > something that would be controlled by the user? The example you gave
> > suggests the latter but I wasn't sure if I was misunderstanding.
>
> Once the package is installed, pre-trained weights would be downloaded if and only if code written to use a specific model with pre-trained weights is run. In the cases I'm aware of, the code that would cause the weights to be downloaded is not directly part of the packaged libraries; anything that could trigger the download of pre-trained weights would have to be written by a user or contained in a separate package. If a specific model with pre-trained weights is not used and not executed by another library/application, the weights will not be downloaded. With the ViT example, the vitb16 weights would be downloaded when that code (not included in the package) is run, but the vitb32 weights would not be downloaded unless the example was changed or something else specified a pre-trained ViT model with the vitb32 weights. Similarly, the weights for other models (googlenet, as an example) would not be downloaded unless code that uses that specific model in its pre-trained form is executed post-installation.
>
> The implementations that I'm familiar with will check for downloaded weights as the code is initialized. When done in this way, the download is transparent to the user, and unless code using these models/weights is written in such a way that the user has a choice, there is not much a user could do to change the download URL or prevent the weights from being downloaded. The only ways I can think of offhand would be to modify the underlying libraries to override the hard-coded URLs or maybe to put identically named files in the cache location, but that would end up being dependent on the model implementation. For the specific libraries I used as examples, I don't know what the local download folder is off the top of my head, nor do I know if they do any verification of downloads, so putting files into the cache location may not work if they don't match the intended file contents.
>
> This is just my opinion but I doubt that many people writing code that uses pre-trained models are going to go out of their way to help users avoid downloading pre-trained weights. I know that for code that I've written using pre-trained models, it might be able to execute without the pre-trained weights but the output would just be noise in that situation. I would have a hard time justifying the work needed to make those downloads optional since it would make the code useless for what it was intended to do.
>
> It may also be worth noting that some models with pre-trained weights are almost useless without those weights. For some (mostly older) models, it's feasible to train a model from scratch, but for many of the recent models, it's just not feasible. As an example, the weights for Meta's Llama 2 took 3.3 million hours of GPU time to train [1], at a cost in the millions of USD, ignoring what it would take to obtain enough data to train a model that large.
>
> Apologies for my verbosity but I hope that I answered your question and the extra bits weren't entirely useless.
>

This sounds like it falls in the same bucket as pip, snapd, gem, and
other similar "package manager" functionality.
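
For anyone following along, here is a rough sketch of the download-on-first-use pattern described in the quoted text, written to show why it resembles a package manager fetching content on demand. It is not taken from any of the libraries mentioned; the registry, the URLs, the function name `ensure_weights`, and the injectable `fetch` callable are all hypothetical, and real libraries may or may not verify checksums.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry: model name -> (hard-coded URL, expected SHA-256 or None).
# The URLs are placeholders, not real download locations.
WEIGHT_REGISTRY: Dict[str, Tuple[str, Optional[str]]] = {
    "vitb16": ("https://example.invalid/weights/vitb16.bin", None),
    "vitb32": ("https://example.invalid/weights/vitb32.bin", None),
}


def ensure_weights(
    name: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes],
) -> Path:
    """Return the cached weight file for `name`, fetching it only if needed."""
    url, expected_sha256 = WEIGHT_REGISTRY[name]
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]

    # A file already sitting in the cache is reused, provided it passes the
    # (optional) checksum. This is the "identically named file" case.
    if target.exists():
        cached = target.read_bytes()
        if expected_sha256 is None or hashlib.sha256(cached).hexdigest() == expected_sha256:
            return target

    # Otherwise the download happens implicitly, without asking the user.
    data = fetch(url)
    if expected_sha256 is not None and hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {name}")
    target.write_bytes(data)
    return target
```

Nothing is fetched at install time or import time in this sketch. Only a call such as `ensure_weights("vitb16", ...)` triggers a download, and the vitb32 weights are never touched unless something asks for them by name.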



-- 
真実はいつも一つ!/ Always, there's only one truth!
--
_______________________________________________
legal mailing list -- legal@xxxxxxxxxxxxxxxxxxxxxxx



