On Sun, Jan 12, 2014 at 7:49 AM, Tomasz Figa <tomasz.figa@xxxxxxxxx> wrote: > On 11.01.2014 06:25, Thomas Abraham wrote: >> >> On Fri, Jan 10, 2014 at 7:48 PM, Lukasz Majewski <l.majewski@xxxxxxxxxxx> >> wrote: >>> >>> Hi Thomas, >>> >>>> Hi Lukasz, >>>> >>>> On Fri, Jan 10, 2014 at 5:34 PM, Lukasz Majewski >>>> <l.majewski@xxxxxxxxxxx> wrote: >>>>> >>>>> Hi Thomas, >>>>> >>>>>> Add a new clock provider for ARM clock domain. This clock provider >>>>>> is composed of multiple components which include mux_core, >>>>>> div_core, div_core2, div_corem0, div_corem1, div_periph, div_atb, >>>>>> div_pclk_dbg, div_copy and div_hpm. This composition of mutiple >>>>>> components into a single clock provider helps with faster >>>>>> completion of CPU clock speed switching during DVFS operations. >>>>>> >>>>>> Signed-off-by: Thomas Abraham <thomas.ab@xxxxxxxxxxx> >>>>>> --- >>>>>> drivers/clk/samsung/clk-exynos4.c | 96 >>>>>> ++++++++++++++++++++++++++++++++++++- 1 files changed, 95 >>>>>> insertions(+), 1 deletions(-) >>>>>> >>>>>> diff --git a/drivers/clk/samsung/clk-exynos4.c >>>>>> b/drivers/clk/samsung/clk-exynos4.c index d967571..4bf2f93 100644 >>>>>> --- a/drivers/clk/samsung/clk-exynos4.c >>>>>> +++ b/drivers/clk/samsung/clk-exynos4.c >>>>>> @@ -108,8 +108,11 @@ >>>>>> #define APLL_CON0 0x14100 >>>>>> #define E4210_MPLL_CON0 0x14108 >>>>>> #define SRC_CPU 0x14200 >>>>>> +#define STAT_CPU 0x14400 >>>>>> #define DIV_CPU0 0x14500 >>>>>> #define DIV_CPU1 0x14504 >>>>>> +#define DIV_STAT_CPU0 0x14600 >>>>>> +#define DIV_STAT_CPU1 0x14604 >>>>>> #define GATE_SCLK_CPU 0x14800 >>>>>> #define GATE_IP_CPU 0x14900 >>>>>> #define E4X12_DIV_ISP0 0x18300 >>>>>> @@ -289,7 +292,7 @@ static unsigned long exynos4_clk_regs[] >>>>>> __initdata = { }; >>>>>> >>>>>> /* list of all parent clock list */ >>>>>> -PNAME(mout_apll_p) = { "fin_pll", "fout_apll", }; >>>>>> +PNAME(mout_apll_p) = { "fin_pll", "fout_apll1", }; >>>>>> PNAME(mout_mpll_p) = { "fin_pll", "fout_mpll", }; >>>>>> PNAME(mout_epll_p) = { "fin_pll", "fout_epll", }; >>>>>> PNAME(mout_vpllsrc_p) = { "fin_pll", "sclk_hdmi24m", }; >>>>>> @@ -306,6 +309,7 @@ PNAME(mout_onenand_p) = {"aclk133", >>>>>> "aclk160", }; PNAME(mout_onenand1_p) = {"mout_onenand", >>>>>> "sclk_vpll", }; >>>>>> >>>>>> /* Exynos 4210-specific parent groups */ >>>>>> +PNAME(armclk_p) = { "fout_apll", }; >>>>> >>>>> >>>>> Here you only give no parent clock, but at >>>>> samsung_coreclk_register() it is expected to provide list of >>>>> parents. >>>> >>>> >>>> Here only one parent is listed, but the core clock type does not limit >>>> the number of parents that can be specified. A specific implementation >>>> can define and use multiple parents. >>> >>> >>> I only pointed out that the definition of the: >>> >>> samsung_coreclk_register("armclk", armclk_p, >>> ARRAY_SIZE(armclk_p), "fout_apll", >>> &exynos4210_armclk_clk_ops, arm_clk, >>> &exyno4210_armclk_table); >>> >>> Could only use parent, especially when you plan to change mux clock >>> (apll vs. mpll) by writing directly to registers (which I think is bad). >> >> >> This definition is not limited to be used only on Exynos4210. This is >> a generic core clock registration helper function intended to be >> reusable across multiple Samsung SoCs. >> > > I think Lukasz meant that you should rather use parent list to pass any > input clocks of the core clock block, instead of hardcoded clock look-up > inside clock ops. Ok. > > >>> >>>> >>>>> >>>>>> PNAME(sclk_vpll_p4210) = { "mout_vpllsrc", "fout_vpll", }; >>>>>> PNAME(mout_core_p4210) = { "mout_apll", "sclk_mpll", }; >>>>>> PNAME(sclk_ampll_p4210) = { "sclk_mpll", "sclk_apll", }; >>>>>> @@ -1089,6 +1093,92 @@ static struct samsung_pll_clock >>>>>> exynos4x12_plls[nr_plls] __initdata = { VPLL_LOCK, VPLL_CON0, >>>>>> NULL), }; >>>>>> >>>>>> +#define EXYNOS4210_DIV_CPU0(apll, pclk_dbg, atb, periph, corem1, >>>>>> corem0) \ >>>>>> + ((apll << 24) | (pclk_dbg << 20) | (atb << 16) | \ >>>>>> + (periph << 12) | (corem1 << 8) | (corem0 << 4)) >>>>>> +#define EXYNOS4210_DIV_CPU1(hpm, copy) \ >>>>>> + ((hpm << 4) | (copy << 0)) >>>>>> +static const unsigned long exynos4210_armclk_data[][2] = { >>>>>> + { EXYNOS4210_DIV_CPU0(7, 1, 4, 3, 7, 3), >>>>>> EXYNOS4210_DIV_CPU1(0, 5), }, >>>>>> + { EXYNOS4210_DIV_CPU0(7, 1, 4, 3, 7, 3), >>>>>> EXYNOS4210_DIV_CPU1(0, 4), }, >>>>>> + { EXYNOS4210_DIV_CPU0(7, 1, 3, 3, 7, 3), >>>>>> EXYNOS4210_DIV_CPU1(0, 3), }, >>>>>> + { EXYNOS4210_DIV_CPU0(7, 1, 3, 3, 7, 3), >>>>>> EXYNOS4210_DIV_CPU1(0, 3), }, >>>>>> + { EXYNOS4210_DIV_CPU0(7, 1, 3, 3, 7, 3), >>>>>> EXYNOS4210_DIV_CPU1(0, 3), }, >>>>>> + { EXYNOS4210_DIV_CPU0(0, 1, 3, 1, 3, 1), >>>>>> EXYNOS4210_DIV_CPU1(0, 3), }, +}; >>>>>> + >>>>> >>>>> >>>>> What do you think about adding those parameters (like CPU dividers) >>>>> as an attribute to /cpus/cpu@0 node? >>>> >>>> >>>> Not in CPU node but may be in clock controller node since these values >>>> are actually used by the clock controller. >>> >>> >>> /cpus/cpu@0 seems like a good place for them (since those DIVs are >>> related to core) >> >> >> DIVs belong to the clock controller, not the CPU, and are addressed >> from the clock controller address space. > > > I also think that the contents of cpu@0 node should be limited only to CPU > specific data. Personally I don't even like the idea of having operating > points there - I would rather see frequency limits there, i.e. maximum > frequency allowed at given voltage; specific frequency values could be > inferred from available APLL/core clock configurations. > > >> >>> . >>> However, we can choose any better DT node to add it. >>> >>>> But since these values are >>>> Exynos4210 specific and not generic enough to be reused across >>>> multiple Exynos SoCs, there is little benefit in defining bindings and >>>> parsing code for these values. It would be simpler enough to just >>>> embed them in the code. >>> >>> >>> It would be less to code, but isn't it the same ugly code, which we >>> have now at exynos4xxx-cpufreq.c? >>> >>> With those values parsed from DT we can write generic code for the >>> "arm_clk" clock. One clock implementation for cpufreq-cpu0.c (and maybe >>> for arm_big_little.c) reused by Exynos4/5. >> >> >> As replied in the 2/3 patch, if these values would change across >> multiple platforms based on the same SoC, it makes sense to put them >> into DT. Any data that is purely SoC specific and not going to change >> across platforms can be embedded into the code itself. > > > Well, they do change. If not on per board basis, then at least with SoC > revisions. > > Anyway, the biggest problem is that the same data needs to be duplicated > (well, triplicated) for each driver that needs them. If you can find a > reasonable way to avoid redundancy, without having DT involved, I will > probably be fine with it. I have listed down the need for the three tables in the reply to your comments in 2/6 patch. > > >>>> >>>> These are frequencies supported by the core clock. But the cpufreq >>>> table can use all or subset of the supported frequencies. >>> >>> >>> I see your point, but I find this distinction here a bit superfluous. >> >> >> Replied to similar comment in 2/3 patch. >> >>> >>>> The core >>>> clock should be usable with the clock api independently and not tied >>>> to be used only by cpufreq driver. >>> >>> >>> But then still for Exynos it will use PLL's M P S coefficients which >>> only corresponds to values defined at cpufreq's frequency table. >> >> >> The PLL clocks are now separated out as PLL clock types in >> samsung/clk-pll.c file. The P,M,S values of the PLLs are now handled >> over there. So now the PLL is independent of the cpufreq driver and >> can support any number of clock speeds not limited to the ones needed >> by cpufreq. > > > Don't forget about opposite side of this relation. The APLL needs to support > at least those that are required by cpufreq. The core clock can round the rate requested by the cpufreq driver and provide an output with least delta (but always less then the frequency requested by cpufreq). > > >> >>> >>> The set of frequencies for PLL, cpufreq and this clock is the same, so >>> I think that we shall not define them in three different places. >>> >>> Could you give any example supporting your point of view? >> >> >> A PLL is a hardware component that can be reused in multiple SoCs. A >> PLL can generate and support 'x' number of clock speeds but a SoC >> using that PLL might use only 'y' (a subset of 'x') number of clock >> speeds of the PLL due to certain hardware limitations. Then there are >> platforms using a SoC which might use only 'z' (a subset of 'y') clock >> speeds due to the power/performance requirements of the platform. >> > > As observed with our platforms, 'x' is usually SoC or SoC-revision specific > and equal to 'y' and 'z' on respective platforms. That is not a hard rule. When used in products, 'z' is influenced by the power/performance needs of the product and 'y' tends to be superset of 'z'. 'x' is SoC specific and 'y' can a derived set of 'x'. > > >>> >>>> >>>>> >>>>>> + >>>>>> +static const struct samsung_core_clock_freq_table >>>>>> exyno4210_armclk_table = { >>>>>> + .freq = exynos4210_armclk_freqs, >>>>>> + .freq_count = ARRAY_SIZE(exynos4210_armclk_freqs), >>>>>> + .data = (void *)exynos4210_armclk_data, >>>>>> +}; >>>>>> + >>>>>> +static int exynos4210_armclk_set_rate(struct clk_hw *hw, unsigned >>>>>> long drate, >>>>>> + unsigned long prate) >>>>>> +{ >>>>>> + struct samsung_core_clock *armclk; >>>>>> + const struct samsung_core_clock_freq_table *freq_tbl; >>>>>> + unsigned long *freq_data; >>>>>> + unsigned long mux_reg, idx; >>>>>> + void __iomem *base; >>>>>> + >>>>>> + if (drate == prate) >>>>>> + return 0; >>>>>> + >>>>>> + armclk = container_of(hw, struct samsung_core_clock, hw); >>>>>> + freq_tbl = armclk->freq_table; >>>>>> + freq_data = (unsigned long *)freq_tbl->data; >>>>>> + base = armclk->ctrl_base; >>>>>> + >>>>>> + for (idx = 0; idx < freq_tbl->freq_count; idx++, freq_data >>>>>> += 2) >>>>>> + if ((freq_tbl->freq[idx] * 1000) == drate) >>>>>> + break; >>>>>> + >>>>>> + if (!armclk->fout_pll) >>>>>> + armclk->fout_pll = __clk_lookup("fout_apll");\ >>>>> >>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[*] >>>>> >>>>> I'm a bit confused here for two reasons. Please correct me if I'm >>>>> wrong. >>>>> >>>>> 1. You go into this ->set_rate() because of calling clk_set_rate at >>>>> "arm_clk" clock (numbered as 12 at clk-exynos4.c) at cpufreq-cpu0.c >>>>> >>>>> In a Exynos4210 we have: >>>>> XXTI-> APLL -> fout_apll -> mout_apll -> mout_core -> div_core >>>>> -> div_core2 -> arm_clk >>>>> >>>>> In the code you call directly the fout_apll which changes >>>>> frequency. Then the change shall be propagated to all registered >>>>> clocks. >>>>> I think, that DIV and DIV1 shall be reduced before PLL change [*], >>>>> to reflect the changes at CCF. >>>> >>>> >>>> The core clock implementation encapsulates multiple clock blocks (such >>>> as dividers and muxes) which are in between the output of the APLL and >>>> the point that actually is the cpu domain clock output. >>> >>> >>> No problem with that. I mostly agree... >>> >>>> When a clock >>>> frequency change has to be made, all these clock blocks encapsulated >>>> within the core clock are programmed by pre-determined values. >>> >>> >>> And what about the situation with already defined clocks (like >>> "div_core" and "div_core2"). Those will not be updated when you first >>> call clk_set_rate() and change by hand DIV and DIV1. >>> >>> What if you would like to have the PCLK_DBG clock used in the future? >>> You would add it to CCF and the change will not propagate. >> >> >> I did intend to remove individual clock blocks which are now >> encapsulated within the core clock type from the clock driver file. I >> missed doing that in this patch series. > > > Yes, they should be removed, since they are encapsulated inside the core > clock now. > > >> >>> >>>> This >>>> approach allows very fast clock speed switching, instead of traversing >>>> the entire CCF clock tree searching for individual clock blocks to be >>>> programmed. >>> >>> >>> Those are mostly DIV and MUXes. Recalculation shouldn't be time >>> consuming. >> >> >> I was mainly referring to the time taken to search the clock tree for >> these individual clock blocks. > > > Hmm, why couldn't you simply look-up all the needed clock at initialization > and keep references to them? > > Still, I think it is fine to directly program registers of encapsulated > clocks, since they are not visible outside anymore and there is no need for > them to be visible. > > >> >>> >>>> >>>>> >>>>> >>>>>> + >>>>>> + if (drate < prate) { >>>>>> + mux_reg = readl(base + SRC_CPU); >>>>>> + writel(mux_reg | (1 << 16), base + SRC_CPU); >>>>>> + while (((readl(base + STAT_CPU) >> 16) & 0x7) != 2) >>>>>> + ; >>>>> >>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [**] >>>>> >>>>> 2. I think, the above shall be done in a following way: >>>>> >>>>> clk_set_parent(mout_core, mout_mpll); >>>>> clk_set_rate(armclk->fout_pll, drate); >>>>> clk_set_parent(mout_core, mout_apll); >>>>> >>>>> The direct write to registers [**] doesn't look compliant to CCF. >>>>> >>>> >>>> As mentioned above, the clock block encapsulates these clock blocks >>>> into a single clock and only this single encapsulated clock is >>>> registered with CCF. The internal implementation of how the different >>>> clock blocks are managed within this clock is independent of the CCF. >>> >>> >>> I agree, that the CPU_DIV and CPU_DIV1 shall be changed atomically >>> (without CCF). >>> >>> But on the situation [**] the MUX can be changed by clk_set_parent() as >>> it is now done at exynosXXXX-cpufreq.c code. >> >> >> The mux is also encapsulated into a larger clock type and this new >> clock type know how the mux has to be configured. > > > IMHO it's fine to encapsulate the mux as well. There are no users of it > other than the core clock. > > >> >>> >>> >>>> >>>>> >>>>> I'd rather thought about using "mout_core" instead of "arm_clk". >>>>> Then we would get access to the parent directly: >>>>> >>>>> struct clk *parent = clk_get_parent(hw->clk); >>>>> >>>>> so we set the parents explicitly (at clk registration) and call >>>>> ->recalc_rate for clocks which are lower in the tree (like >>>>> "div_core", "div_core2"). >>>> >>>> >>>> That was not the intention as mentioned above. >>> >>> >>> This is just another possible solution to the problem. > > > Those clocks should not be dealt with separately and this was the intention > of this composite clock. I agree with Thomas here. > > Best regards, > Tomasz Thanks for your comments Tomasz. Thanks, Thomas. -- To unsubscribe from this list: send the line "unsubscribe cpufreq" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html