On Mon, 2024-04-22 at 13:36 +1000, NeilBrown wrote: > On Wed, 17 Apr 2024, Jeff Layton wrote: > > On Wed, 2024-04-17 at 10:43 +1000, NeilBrown wrote: > > > On Wed, 17 Apr 2024, Jeff Layton wrote: > > > > On Wed, 2024-04-17 at 09:05 +1000, NeilBrown wrote: > > > > > On Wed, 17 Apr 2024, Jeff Layton wrote: > > > > > > On Wed, 2024-04-17 at 07:48 +1000, NeilBrown wrote: > > > > > > > On Tue, 16 Apr 2024, Jeff Layton wrote: > > > > > > > > On Tue, 2024-04-16 at 13:16 +1000, NeilBrown wrote: > > > > > > > > > On Tue, 16 Apr 2024, Lorenzo Bianconi wrote: > > > > > > > > > > Introduce write_version netlink command through a "declarative" interface. > > > > > > > > > > This patch introduces a change in behavior since for version-set userspace > > > > > > > > > > is expected to provide a NFS major/minor version list it wants to enable > > > > > > > > > > while all the other ones will be disabled. (procfs write_version > > > > > > > > > > command implements imperative interface where the admin writes +3/-3 to > > > > > > > > > > enable/disable a single version. > > > > > > > > > > > > > > > > > > It seems a little weird to me that the interface always disables all > > > > > > > > > version, but then also allows individual versions to be disabled. > > > > > > > > > > > > > > > > > > Would it be reasonable to simply ignore the "enabled" flag when setting > > > > > > > > > version, and just enable all versions listed?? > > > > > > > > > > > > > > > > > > Or maybe only enable those with the flag, and don't disable those > > > > > > > > > without the flag? > > > > > > > > > > > > > > > > > > Those don't necessarily seem much better - but the current behaviour > > > > > > > > > still seems odd. > > > > > > > > > > > > > > > > > > > > > > > > > I think it makes sense. > > > > > > > > > > > > > > > > We disable all versions, and enable any that have the "enabled" flag set > > > > > > > > in the call from userland. Userland technically needn't send down the > > > > > > > > versions that are disabled in the call, but the current userland program > > > > > > > > does. > > > > > > > > > > > > > > > > I worry about imperative interfaces that might only say -- "enable v4.1, > > > > > > > > but disable v3" and leave the others in their current state. That > > > > > > > > requires that both the kernel and userland keep state about what > > > > > > > > versions are currently enabled and disabled, and it's possible to get > > > > > > > > that wrong. > > > > > > > > > > > > > > I understand and support your aversion for imperative interfaces. > > > > > > > But this interface, as currently implemented, looks somewhat imperative. > > > > > > > The message sent to the kernel could enable some interfaces and then > > > > > > > disable them. I know that isn't the intent, but it is what the code > > > > > > > supports. Hence "weird". > > > > > > > > > > > > > > We could add code to make that sort of thing impossible, but there isn't > > > > > > > much point. Better to make it syntactically impossible. > > > > > > > > > > > > > > Realistically there will never be NFSv4.3 as there is no need - new > > > > > > > features can be added incrementally. > > > > > > > > > > > > > > > > > > > There is no need _so_far_. Who knows what the future will bring? Maybe > > > > > > we'll need v4.3 in order to add some needed feature? > > > > > > > > > > > > > So we could just pass an array of > > > > > > > 5 active flags: 2,3,4.0,4.1,4.2. I suspect you wouldn't like that and > > > > > > > I'm not sure that I do either. A "read" would return the same array > > > > > > > with 3 possible states: unavailable, disabled, enabled. (Maybe the > > > > > > > array could be variable length so 5.0 could be added one day...). > > > > > > > > > > > > > > > > > > > A set of flags is basically what this interface is. They just happen to > > > > > > also be labeled with the major and minorversion. I think that's a good > > > > > > thing. > > > > > > > > > > Good for whom? Labelling allows for labelling inconsistencies. > > > > > > > > > > > > > Now you're just being silly. You wanted a variable length array, but > > > > what if the next slot is not v4.3 but 5.0? A positional interpretation > > > > of a slot in an array is just as just as subject to problems. > > > > > > I don't think it could look like a imperative interface, which the > > > current one does a bit. > > > > > > > > > > > > Maybe the kernel should be able to provide an ordered list of labels, > > > > > and separately an array of which labels are enabled/disabled. > > > > > Userspace could send down a new set of enabled/disabled flags based on > > > > > the agreed set of labels. > > > > > > > > > > > > > How is this better than what's been proposed? One strength of netlink is > > > > that the data is structured. The already proposed interface takes > > > > advantage of that. > > > > > > > > > Here is a question that is a bit of a diversion, but might help us > > > > > understand the context more fully: > > > > > > > > > > Why would anyone disable v4.2 separately from v4.1 ?? > > > > > > > > > > > > > Furthermore, what does it mean to disable v4.1 but leave v4.2 enabled? > > > > > > Indeed! > > > > > > > > > > > > I understand that v2, v3, v4.0, v4.1 are effectively different protocols > > > > > and you might want to ensure that only the approved/tested protocol is > > > > > used. But v4.2 is just a few enhancements on v4.1. Why would you want > > > > > to disable it? > > > > > > > > > > The answer I can think of that there might be bugs (perish the > > > > > thought!!) in some of those features so you might want to avoid using > > > > > them. > > > > > But in that case, it is really the features that you want to suppress, > > > > > not the protocol version. > > > > > > > > > > Maybe I might want to disable delegation - to just write delegation. > > > > > Can I do that? What if I just want to disable server-side copy, but > > > > > keep fallocate and umask support? > > > > > > > > > > i.e. is a list of versions really the granularity that we want to use > > > > > for this interface? > > > > > > > > > > > > > Our current goal is to replace rpc.nfsd with a new program that works > > > > via netlink. An important bit of what rpc.nfsd does is start the NFS > > > > server with the settings in /etc/nfs.conf. Some of those settings are > > > > vers3=, vers4.0=, etc. that govern how /proc/fs/nfsd/versions is set. > > > > We have an immediate need to deal with those settings today, and > > > > probably will for quite some time. > > > > > > > > I'm not opposed to augmenting that with something more granular, but I > > > > don't think we should block this interface and wait on that. We can > > > > extend the interface at some point in the future to take a new feature > > > > bitmask or something, and just declare that (e.g.) declaring vers4.2=n > > > > disables some subset of those features. > > > > > > I agree that we don't want to block "good" while we wait for "perfect". I > > > just want to be sure that what we have is "good". > > > > > > The current /proc interface uses strings like "v3" and "v4.1". > > > The proposed netlink interface uses pairs on u32s - "major" and "minor". > > > So we lose some easy paths for extensibility. Are we comfortable with > > > that? > > > > > > This isn't a big deal - certainly not a blocked. I just don't feel > > > entirely comfortable about the current interface and I'm exploring to > > > see if there might be something better. > > > > > > > > > > Ok, I'm not convinced that anything you've proposed so far is better > > than what we have, but I'm open-minded. > > Thanks. I admit that I don't think anything I've come up with is better > either. > > > > > At this point we have these to-dos: > > > > 1) make the threads interface accept an array of integers rather than a > > singleton. This is needed when sunrpc.pool_mode is not global. > > > > While this I think probably still makes sense, I'm feeling less > enthusiastic about it and probably wouldn't complain if it didn't > happen. > Ok. I think making it an array is fine. For now, it will just error out when you send down more than one value, but I think we can make it work the way you'd expect. Regardlness, cleaning up and unifying the pooled vs. non-pooled server handling seems like a good thing to do. The existing one is a long-standing kludge. > I don't like that fact that we need to configure a number of threads. I > would much rather it grow/shrink dynamically. I've got more patches > that are heading in this direction but they aren't ready yet. > > If we do land these and they prove to be effective, then configuring > per-pool threads would become extremely uninteresting. The demand on > each pool would adjust each pool separately. > > So if use an array here turns out to be problematic for some reason, I > won't complain. > +1 on making the thread pool sizing dynamic. I had started looking at what heuristics we should use, but I haven't gotten very far yet. Some general thoughts: nfsd threads spend most of their time waiting. They are only very rarely cpu-bound, so the max size of the thread pools should mostly scale with the amount of memory in the host. I have a patch that adds a new percpu counter to count how often we go to wake a thread and don't find one, and have to queue it. It clearly hits a lot more when there are fewer running nfsd threads. I don't think that's sufficient to know when to spawn a new thread though. For that, we probably ought to be looking at how long RPCs are sitting and waiting for a thread to pick them up. When that starts growing then we probably need to bring in more threads. Conversely, when a thread is sitting idle for a long time (10s? 60s?), we can probably spin it down. Maybe we can have a new LRU for the threads and have a periodic thread reaper job that checks for those. > > 2) have the interface pass down the scope string in the THREADS_SET > > instead of making userland change the UTS namespace before starting > > threads. This should just mean adding a new optional string attribute to > > the threads interfaces. > > > > ...anything else we're missing that needs doing here? We'd really like > > to see this in -next soon, so we can possibly make v6.10 with this. > > I haven't yet found any obvious gaps. I agree that we can and should > move forward promptly. > Great. I think Lorenzo is planning to send another set soon that should hopefully be reasonable for merge. > > > > > Thanks, > > -- > > Jeff Layton <jlayton@xxxxxxxxxx> > > > -- Jeff Layton <jlayton@xxxxxxxxxx>