On 2019-10-01T16:08:43, Mike Perez <miperez@xxxxxxxxxx> wrote:

Hi all,

yay, survey time!

> We conduct yearly user surveys to better understand how our users utilize
> Ceph. The Ceph Foundation collects the data under the Community Data License
> agreement [0]; which helps the community make more of an informed decision
> of where our efforts in the development of future releases should go.

I also like that we're collecting this under the CDLA 1.0 Sharing variant
(which means we need to avoid any e-mail addresses and org names though, I
think; folks probably don't want those globally shared).

> A second question that came up was how to layout questions for multiple
> cluster deployments. An idea I had was having our general Ceph user survey
> [2] separate from the deployment questions [3]. The general questions only
> need to be answered once, and the deployment survey can be answered
> multiple times to capture the different configurations. I'm looking into a
> way to link the answers of both surveys together.

I think, perhaps, we can just ask for the aggregates across all clusters.
If we make it too detailed, it'll be too complex for respondents to enter
and we won't hear from them.

Alternatively, we could perhaps have a table for the per-cluster questions,
each row representing one cluster or even pool. (e.g., if they have a 10 PiB
cluster hosting both 1 PiB of replicated metadata and hot data and 6 PiB of
8+4 S3 data, what is the answer?)

I guess my point is - unless we're going to that level of detail, we're
already aggregating (and losing details) at the per-cluster level, and
per-org aggregation isn't too bad.

I'd rather lower the barrier to respond -> "check all that apply" (would
make most of our questions multiple choice).

We should instead focus on getting that level of detail from Telemetry in
2020.

An intermediate solution could be to ask them to "run this command on each
of your clusters and paste the output here".
But if we ask them to spend 5-10 minutes answering questions per cluster ...

(Alas, we can't ask them to just turn on Telemetry; it is not backported and
feature complete in all relevant releases. Perhaps we could build a
standalone Telemetry client for pre-Nautilus releases? But not for this
cycle.)

For the fields where we do ask numbers, instead of endless drop-down lists,
I'd rather ask for a, well, number. Why give them a drop-down list for total
raw capacity? Why not just ask for the number of clusters? How many nodes?
Etc, even for replication size/EC profiles.

And we should filter redundant questions - if we ask, say, for both the
total number of nodes and the total number of OSDs, we don't have to ask
"how many OSDs per node". Unless we consider this a consistency check.

The survey pad has a lot of feedback, some of it contradictory, and not all
questions are asked consistently. So it's not perfectly clear to me what the
current consolidated draft would look like. Perhaps if we consolidate it
prior to posting to ceph-users, that'd be helpful.


Regards,
    Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
"Architects should open possibilities and not determine everything." (Ueli Zbinden)