On 03/14/2012 07:42 PM, Jesse J. Cook wrote:
> 256 (8 bits) is insufficient for large scale deployments. 65536 (16 bits) is a
> more appropriate limit and should be sufficient. You are more likely to run
> into other system limitations first, such as the 31998 inode link limit on
> ext3.
> ---
>  src/remote/remote_protocol.x |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
> index 59774b2..58f0871 100644
> --- a/src/remote/remote_protocol.x
> +++ b/src/remote/remote_protocol.x
> @@ -103,7 +103,7 @@ const REMOTE_INTERFACE_NAME_LIST_MAX = 256;
>  const REMOTE_DEFINED_INTERFACE_NAME_LIST_MAX = 256;
>
>  /* Upper limit on lists of storage pool names. */
> -const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 256;
> +const REMOTE_STORAGE_POOL_NAME_LIST_MAX = 65536;

While I agree that it would be nice to avoid capping the number of pools
that can actually exist, I have to say NACK to the approach in this patch.

The reason we originally chose a limit of 256 is that each storage pool
name is an arbitrary-length string, and when you pile multiple strings in
a row, you can generate a rather lengthy RPC message, for which the
receiver has to pre-allocate memory in order to receive correctly. See
this comment:

/* Maximum length of a memory peek buffer message.
 * Note applications need to be aware of this limit and issue multiple
 * requests for large amounts of data. */
const REMOTE_DOMAIN_MEMORY_PEEK_BUFFER_MAX = 65536;

That is, you cannot send any RPC with more than 64k of data, because the
receiver will reject anything larger rather than pre-allocate an
arbitrarily large buffer. With a cap of 256 pool names, each pool name
can be (on average) about 256 bytes before you hit the RPC cap
(65536 / 256 = 256). Bumping REMOTE_STORAGE_POOL_NAME_LIST_MAX to 65536
is pointless; each remote_nonnull_string already occupies several bytes
towards the 64k cap (in XDR, even an empty string costs a four-byte
length field, so 65536 names could never fit in a 64k message anyway).
I could perhaps see bumping things to 512, which lowers your average
pool name to about 128 bytes before you hit the RPC cap, or even to a
cap of 1024 names averaging 64 bytes each. But the fact remains that you
will hit a hard wall on RPC sizing, with diminishing returns from
raising the pool name list maximum, which means that the _real_ solution
has to come at a more fundamental level.

We need a new API which lets you specify a starting point and range, in
order to query only part of the pool list. For example, see how the
recent virDomainGetCPUStats has start_cpu and ncpus parameters, as well
as a cap of only 128 cpus per RPC call. If you have a beefy machine with
256 cpus, you have to make two separate RPC calls to collect all of the
statistics. In other words, I think we need:

virConnectListStoragePoolsRange(virConnectPtr conn,
                                char **const names,
                                int start,
                                int maxnames,
                                unsigned int flags);

where flags controls whether we are listing active or defined pools
(virConnectListStoragePools and virConnectListDefinedStoragePools were
added back in the days before we realized how useful a flags argument
would be), where 'maxnames' can be at most 256 (per the RPC limits), but
where 'start' lets you choose the point in the array at which your
enumeration begins, so that multiple calls let you see the entire array.

It may also be worth considering the addition of _just_ a new ranged RPC
call, but no new public API, by making the public API smart enough to
call the ranged RPC multiple times automatically and build up a single
return to the user.
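To make the shape of that loop concrete, here is a minimal client-side
sketch in C. It assumes the proposed signature above;
virConnectListStoragePoolsRange, its return-value semantics, and the
string-ownership convention are all hypothetical (no such API exists
yet), and error handling is abbreviated:

#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

/* Hypothetical declaration of the API proposed above; assume it returns
 * the number of names stored (possibly fewer than maxnames), or -1 on
 * error. */
int virConnectListStoragePoolsRange(virConnectPtr conn, char **const names,
                                    int start, int maxnames,
                                    unsigned int flags);

#define POOLS_PER_CALL 256  /* per-call ceiling dictated by the RPC limits */

static int
print_all_pools(virConnectPtr conn, unsigned int flags)
{
    char *names[POOLS_PER_CALL];
    int start = 0;
    int got;

    do {
        /* Fetch the next window of up to POOLS_PER_CALL names,
         * beginning at index 'start'. */
        got = virConnectListStoragePoolsRange(conn, names, start,
                                              POOLS_PER_CALL, flags);
        if (got < 0)
            return -1;
        for (int i = 0; i < got; i++) {
            printf("%s\n", names[i]);
            free(names[i]);  /* assume the caller owns the returned strings */
        }
        start += got;
    } while (got == POOLS_PER_CALL);  /* a short batch means we hit the end */

    return start;  /* total number of pools enumerated */
}

Whether that loop lives in every client or just once inside the library,
hidden behind the existing entry points, is exactly the trade-off in
question.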
After all, if we exposed the ranged call to the user, then each client
would have to add the logic to make the multiple calls, whereas our
current limitation is _only_ in the RPC aspect (once the data is across
the RPC and into the target process' memory space, there is no longer a
64k cap to worry about). The same need for a new ranged API or RPC
applies to any situation where the current RPC limits start to feel too
small compared to the amount of data available to transmit in a single
call.

There is also an issue of concurrency: a single API call sees consistent
data, but back-to-back calls over subsets of the array could race with
another client adding or removing a pool in the meantime. I'm not sure
what can be done to prevent issues on that front; you already have
TOCTTOU races between grabbing the list of pools and acting on that
list, even without a range command. Maybe we could invent a new API for
batching several commands into one transaction, with multiple RPC calls
but with a guarantee that no RPCs from any other connection will alter
the state between the batch calls; but then we start to get into issues
with what happens if a network connection goes down in the middle of
operating on a batch command. So it may be something that we declare not
worth trying to solve.

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-919-301-3266
Libvirt virtualization library http://libvirt.org