Re: Regarding ODF import and Export support for HistogramChart

Hi Devansh,

This answer is long and touches on general topics, so it is not suitable as a direct comment on your patch. It contains my ideas about implementing ODF import/export.

Hi Michael, I have put you in CC because you can surely say something in regard to ODF and correct me where I'm wrong.

Devansh Varshney wrote on 14.12.2024 at 13:49:
Thanks, Regina, for such detailed information. This helped me to approach the
import/export for the Histogram Chart.

I have added changes to the ODF Export for the Histogram Chart.
Have to add support for the Import and addition in the RNG file.

https://gerrit.libreoffice.org/c/core/+/177364

Would anyone from the community be able to help me by reviewing the PR?


I assume that your intention is to implement a histogram chart similar to the one in Excel.

You should specify the new chart type as it would be specified in the standard. That text can go to our Wiki, linked from https://wiki.documentfoundation.org/Development/ODF_Implementer_Notes/List_of_LibreOffice_ODF_Extensions. Writing it down helps you to become clear about the functionality and helps in writing the UNO information in the idl file. Currently the information in the idl file is not detailed enough. You can look at section "19.15 chart:class" in ODF 1.3 [https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html] and at the corresponding information for Excel. Search for histogram on site:microsoft.com and look at its specification in [MS-ODRAWXML]. You need to extend the above-mentioned List_of_LibreOffice_ODF_Extensions in any case.


You must extend the schema. Those changes go to https://opengrok.libreoffice.org/xref/core/schema/libreoffice/OpenDocument-v1.4%2Blibreoffice-schema.rng. That is missing in your patch.


The histogram chart is not one of the chart types specified in the standard. Thus it needs a value for the chart:class attribute that has a loext prefix, e.g. chart:class="loext:histogram". A schema change is not needed for this value, because the data type of this attribute's value is already 'namespacedToken'.


You have added the bin-related information to the <chart:series> element. A <chart:plot-area> element can have several <chart:series> sub-elements. I guess that you do not want to allow several series in the same histogram. Excel does not allow it. Restricting it in the schema is difficult. (Or do you have an idea, Michael?) I suggest restricting it in the specification text.


You export the labels for the x-axis as loext:BinRange. I would not export them at all for these reasons:
(A) Excel does not export that information.
(B) The chart has a reference to the area of the data source in the table. The content of this area might come from an external source, e.g. a database engine. When the file is loaded, this data might be refreshed and change. Thus the bin labels and their frequency values might no longer fit the information that was put into the file when saving.


You write the bin-related information as attributes of the <chart:series> element. You should consider using one child element instead, which contains all needed information. That way you can use a dedicated context when loading the file. The schema would get one new child element for the <chart:series> element and a new section for this new element itself. Michael, what do you think?
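
To make the child-element idea concrete, here is a minimal sketch in Python (not LibreOffice code). The element name loext:histogram-binning and its attribute names are only placeholders that I made up for illustration; the real names have to be decided in the specification text and the schema.

```python
import xml.etree.ElementTree as ET

CHART_NS = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
LOEXT_NS = "urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0"
ET.register_namespace("chart", CHART_NS)
ET.register_namespace("loext", LOEXT_NS)


def build_series(bin_width=None, bin_count=None):
    """Build a <chart:series> with one child that carries all binning data."""
    series = ET.Element(f"{{{CHART_NS}}}series")
    # Hypothetical element name, not part of any schema yet.
    binning = ET.SubElement(series, f"{{{LOEXT_NS}}}histogram-binning")
    # Hypothetical attribute names; each is written only when it is set.
    if bin_width is not None:
        binning.set(f"{{{LOEXT_NS}}}bin-width", repr(float(bin_width)))
    if bin_count is not None:
        binning.set(f"{{{LOEXT_NS}}}bin-count", str(int(bin_count)))
    return series


xml_text = ET.tostring(build_series(bin_width=5), encoding="unicode")
print(xml_text)
```

With such a structure the import side only has to look for one child of <chart:series> and can hand it to its own context.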


Different variations (types) are possible for the histogram chart. You need to specify in the text how the bins are calculated, especially how 'automatic' works and how overflow and underflow bins influence the bin intervals.
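
As an illustration of the kind of rules the text has to fix, here is a small Python sketch for the fixed bin width case only. Everything in it is an assumption on my side and not taken from your patch or from Excel: intervals are closed on the right, a value equal to the underflow limit goes into the underflow bin, the regular bins start at the underflow limit (or at the minimum), and the last regular bin may be cut short by the overflow limit. It does not cover 'automatic'.

```python
import math


def histogram(values, bin_width, underflow=None, overflow=None):
    """Count values per bin; returns (underflow count, regular counts, overflow count)."""
    lo = min(values) if underflow is None else underflow
    hi = max(values) if overflow is None else overflow
    n_regular = max(1, math.ceil((hi - lo) / bin_width))
    counts = [0] * n_regular
    under = over = 0
    for v in values:
        if underflow is not None and v <= underflow:
            under += 1  # assumption: the limit itself belongs to the underflow bin
        elif overflow is not None and v > overflow:
            over += 1
        else:
            # Right-closed intervals (lo + i*w, lo + (i+1)*w];
            # the minimum itself is clamped into the first bin.
            i = math.ceil((v - lo) / bin_width) - 1
            counts[min(max(i, 0), n_regular - 1)] += 1
    return under, counts, over


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(histogram(data, 3))                           # (0, [4, 3, 3], 0)
print(histogram(data, 3, underflow=3, overflow=8))  # (3, [3, 2], 2)
```

The second call shows that the underflow limit shifts the start of all regular bins. This is exactly the kind of behaviour that needs to be written down.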


You use two attributes for an underflow bin, one stating whether such an underflow bin exists and one with its value. I think these can be combined. In implementation and schema the attribute would be optional. The specification text then needs to state what is used when this attribute is missing. The same applies to overflow. Excel has the data type ST_DoubleOrAutomatic for this.
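
A combined attribute could be read roughly like this. It is a Python sketch, not the real import code. I assume here that the literal for the automatic case is "auto" and that a missing attribute means that there is no underflow (or overflow) bin; both points have to be settled in the specification text.

```python
AUTO = "auto"  # assumed literal for the automatic case


def parse_double_or_automatic(raw):
    """Interpret one optional attribute value.

    None  -> attribute missing, assumed to mean "no such bin"
    AUTO  -> the limit is calculated automatically
    float -> explicit limit
    """
    if raw is None:
        return None
    text = raw.strip()
    if text == AUTO:
        return AUTO
    return float(text)  # raises ValueError for anything else


print(parse_double_or_automatic(None))
print(parse_double_or_automatic("auto"))
print(parse_double_or_automatic("2.5"))
```

This way one optional attribute carries the same information as the current pair of a boolean attribute and a value attribute.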


The 'binCount' and 'binWidth' information is coupled to the chart variants FrequencyType=2 'Number of bins (BinCount)' and FrequencyType=1 'Bin Width'. You write both in all cases. For the variant FrequencyType=3 'By Category' in particular, the binWidth attribute is meaningless. On the other hand, it is mandatory for the variant FrequencyType=1. Michael, do you have a nice idea for the schema? In the implementation both might be optional, with an assert if one is missing for its corresponding FrequencyType. For UNO it might be sufficient to make them optional and mention the dependencies in the text.
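
The dependency could look like this on the export side. Again this is only a Python sketch and not LibreOffice code. The name loext:histogram-bin-count is my guess by analogy to loext:histogram-bin-width, and I assume that FrequencyType=0 needs neither value.

```python
def binning_attributes(frequency_type, bin_width=None, bin_count=None):
    """Return only those binning attributes that are meaningful for the variant."""
    attrs = {}
    if frequency_type == 1:  # 'Bin Width': the width is mandatory
        if bin_width is None:
            raise ValueError("FrequencyType=1 requires a bin width")
        attrs["loext:histogram-bin-width"] = repr(float(bin_width))
    elif frequency_type == 2:  # 'Number of bins': the count is mandatory
        if bin_count is None:
            raise ValueError("FrequencyType=2 requires a bin count")
        attrs["loext:histogram-bin-count"] = str(int(bin_count))
    # FrequencyType 3 ('By Category') and, by assumption, 0: write neither.
    return attrs


print(binning_attributes(1, bin_width=2))
print(binning_attributes(3, bin_width=2, bin_count=5))
```

In the real code the ValueError would be the assert mentioned above.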


You write the new attributes with XML_NAMESPACE_CHART. It has to be XML_NAMESPACE_LO_EXT.


You can use the histogram chart only in ODF extended. The corresponding case distinctions are missing.


ODF attribute and element names are built from natural-language terms separated by hyphens. Please keep this style. So instead of an attribute loext:histogram-binwidth it should be loext:histogram-bin-width. And instead of loext:histo it should be loext:histogram.
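
If it helps, the mapping from the camelCase names used on the UNO side to ODF-style names can be written down as a small rule. This is a Python sketch; the helper name and the prefix handling are only for illustration.

```python
import re


def odf_name(camel_case, prefix="loext:histogram-"):
    """Turn e.g. 'binWidth' into 'loext:histogram-bin-width'."""
    # Insert a hyphen wherever a lower-case letter or digit is followed
    # by an upper-case letter, then lower-case the whole term.
    hyphenated = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", camel_case).lower()
    return prefix + hyphenated


print(odf_name("binWidth"))
print(odf_name("binCount"))
```

A name that is already written without any capital letter, like 'binwidth', cannot be split this way, so such names have to be corrected by hand.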


On the one hand you use a UNO property FrequencyType with data type short and possible values 0 to 3; on the other hand you assign the property value to aFrequencies, which is a Sequence< double > ???


For histograms Excel uses the element CT_Binning (see 2.24.3.7 in [MS-ODRAWXML]). It has the attribute intervalClosed, which determines whether the start or the end of the bin interval is open. A corresponding attribute is missing in your patch.
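
The effect of such an attribute only shows up for values that lie exactly on a bin boundary. Here is a Python sketch of it. I assume the two values "r" (interval closed on the right side) and "l" (interval closed on the left side); please check the exact literals in [MS-ODRAWXML].

```python
import math


def bin_index(value, start, width, closed="r"):
    """Index of the bin that contains value; bins start at 'start'."""
    offset = (value - start) / width
    if closed == "r":
        return math.ceil(offset) - 1  # intervals (a, b]
    return math.floor(offset)         # intervals [a, b)


# The boundary value 10 with bins of width 10 starting at 0:
print(bin_index(10, 0, 10, closed="r"))  # 0, i.e. the bin (0, 10]
print(bin_index(10, 0, 10, closed="l"))  # 1, i.e. the bin [10, 20)
```

With closed="r" a value equal to the start gets index -1. So the text also has to say what happens to the smallest value.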


Kind regards,
Regina













