Hi Loic, > But for a given PG, minimum_to_decode will always return the same chunks, right ? To be precise, PG, broken OSDs, generator matrix and complicated calculation logic are required to predict the size of minimum_chunks. Because those are almost equivalent to whole implementation of minimum_to_decode, we feel that might be no longer a testing. If you mentioned quantitative evaluation of SHEC's read data during recovery, we have already evaluated it as an average size of minimum_chunks by calling minimum_to_decode with all input patterns. Best regards, Takeshi Miyamae -----Original Message----- From: Loic Dachary [mailto:loic@xxxxxxxxxxx] Sent: Friday, January 23, 2015 6:09 PM To: Miyamae, Takeshi/宮前 剛 Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx); Kaga, Yoshihiro/加賀 芳宏; Kawaguchi, Shotaro/川口 翔太朗 Subject: Re: Deadline of Github pull request for Hammer release (question) Hi, On 23/01/2015 02:16, Miyamae, Takeshi wrote: > Hi Loic, > >> Could you give me the definition of a "lace bug" ? > > I'm sorry I meant "race bug". It was just a typo (I'm farthest from a native English speaker ...). > >> Note that there are two kind of tests in Ceph : ... > > OK. I understand the Ceph's style of testing. > >> Do you mean that two consecutive runs of minimum_to_decode will provide different results ? > > Yes, they will. The reason is that SHEC's layout (=calculation ranges > of parities) can be asymmetric, which means that the size of > minimum_chunks (=the number of chunks that must be read at least from the disks) depends on IDs of the broken chunks. > For instance, in case of SHEC(5,3,2) as described below, two chunks > would be needed in some PG cases (where single broken chunk were 1, 2, > or b), while three chunks in other PG cases (where single broken chunk were 3, 4, 5, or c). > > SHEC(5,3,2)'s layout : > data chunks 12345 > parity chunk a ----- > parity chunk b -- > parity chunk c --- > ("---" means calculation ranges of each parity) > > We called it a random aspect of SHEC's minimum_chunks. But for a given PG, minimum_to_decode will always return the same chunks, right ? Cheers > > Best regards, > Takeshi Miyamae > > -----Original Message----- > From: Loic Dachary [mailto:loic@xxxxxxxxxxx] > Sent: Friday, January 23, 2015 2:29 AM > To: Miyamae, Takeshi/宮前 剛 > Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; > Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx); Kaga, Yoshihiro/加賀 > 芳宏; Kawaguchi, Shotaro/川口 翔太朗 > Subject: Re: Deadline of Github pull request for Hammer release > (question) > > Hi, > > On 22/01/2015 17:34, Miyamae, Takeshi wrote: >> Hi Loic, >> >> Thank you for the code reference. We will use it to improve our codes. >> >>> However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ? >> >> Sure. The purpose of using threads is just to find lace bugs and ensure thread safety. > > I'm not familiar with the expression "lace bug" and all I could find is http://en.wiktionary.org/wiki/lace which does not help me (I'm french, english is not my native language). Could you give me the definition of a "lace bug" ? > >> However, I suppose those tests may be a little adhoc and be far from completeness. >> Actually, we couldn't detect some init lace bugs including #7914 (We will fix those bugs at the next release). >> Do you have any better ideas to combat these kind of bugs systematically? >> > > Race conditions such as http://tracker.ceph.com/issues/7914 are quite difficult detect with tests and I don't know of a sure method to do so. Carefully proofreading the code and keeping the logic simple is probably the most effective way ;-) Once a race condition has been fixed, however, it should be possible to recreate the conditions for it to appear in a functional test that can be run with make check. > > Note that there are two kind of tests in Ceph : the unit / functional tests that are run with make check, locally on the developer machine or by a bot on each pull request. These tests run quickly and do not include any benchmarking or stress tests, nor integration tests. The second kind is what you find in teuthology : these tests include stress tests, error injections etc. Most race conditions are discovered by running these teuthology tests and analyzing their output. > >>> What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ? >> >> I don't think current minimum_to_decode_2 is adequate, as you mentioned. >> The verification of return codes is required and we will compare it with expectations (EINVAL or EIO). >> However, as for the size of minimum_chunks, as it is hard to >> pre-estimate without knowing parity layout because of its random aspect, we won't improve it any more. > > Do you mean that two consecutive runs of minimum_to_decode will provide different results ? > >> Then, as for minimum_to_decode_3, we found it meaningless to verify >> minimum_chunks because that is an error case when array size of want_to_decode and available_chunks are illegally large. >> We will verify only return code of minimum_to_decode and remove the >> current check of minimum_chunks in minimum_to_decode_3. > > Ok. I'll take a closer look as soon as I can run them and try things. > > Thanks for explaining ! > >> >> Best regards, >> Takeshi Miyamae >> >> -----Original Message----- >> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] >> Sent: Thursday, January 22, 2015 6:28 PM >> To: Miyamae, Takeshi/宮前 剛 >> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; >> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >> Subject: Re: Deadline of Github pull request for Hammer release >> (question) >> >> Hi, >> >> On 22/01/2015 09:42, Miyamae, Takeshi wrote: >>> Hi Loic, >>> >>> Thanks for your additional comments. >>> >>>> Would you be so kind as to update the >>>> https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code >>>> / >>>> M >>>> akefile.am >>> >>> Sure. We'll do that. >>> >>>> This is likely to fail on a slow machine. Instead of waiting you should use a lock. >>> >>> The purpose of sleep(1) is to ensure sequentiality between threads, >>> but it's not enough on slow machines as you mentioned. So, we'd like >>> another example that uses a mutex lock. What files should we refer to? >> >> src/test/common/test_sharedptr_registry.cc contains an example. The >> classes that are helpful in implementing thread safe are in >> src/common/Cond.h and src/common/Mutex.h >> >> However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ? >> >>> >>>> They however all have the same assert ( EXPECT_TRUE(shec->matrix == >>>> NULL); ) >>> >>> We agree that EINVAL is better than EIO on verifying input variables. >>> We'll review our whole sources and fix both our plugin and test codes. >> >> Thanks :-) What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ? >> >> Cheers >> >>> >>> P.S. We are debugging thread safe mSHEC to release it tomorrow. >>> >>> Best regards, >>> Takeshi Miyamae >>> >>> -----Original Message----- >>> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] >>> Sent: Wednesday, January 21, 2015 11:13 PM >>> To: Miyamae, Takeshi/宮前 剛 >>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>> 鷹詔; >>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>> Subject: Re: Deadline of Github pull request for Hammer release >>> (question) >>> >>> Hi, >>> >>> I see them at https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/TestErasureCodeShec.cc etc. thanks ! Would you be so kind as to update the https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/Makefile.am to add the compilation instructions for these files ? >>> >>> I see the tests includes a few threads (thread1 to thread5). It looks like they are used for measuring performance, am I right ? >>> >>> pthread_t tid; >>> pthread_create(&tid,NULL,thread1,shec); >>> sleep(1); >>> >>> This is likely to fail on a slow machine. Instead of waiting you should use a lock. You will find examples in other tests, using various techniques depending on the context. If you'd like I could point to one. I'd have to better understand the intent of this test before I do. >>> >>> From init_5 to init_22 there are various combinations of parameters which suggest checking for all kinds of errors (m being negative, or a string instead of a number etc.). They however all have the same assert ( EXPECT_TRUE(shec->matrix == NULL); ). It would be better to have an expectation that verifies the error has been caught as expected. Is it part of what your completing at the moment ? >>> >>> minimum_to_decode_2 expectation for when it succeeds should be more >>> precise that >>> >>> EXPECT_TRUE(minimum_chunks.size()); >>> >>> since minimum_chunks is expecting to have a size that does not vary. And if it does that would be a problem. >>> >>> In minimum_to_decode_3 after >>> >>> EXPECT_NE(want_to_decode,minimum_chunks); >>> >>> you could check the expected content of minimum_chunks also. Unless there is a random aspect to it ? >>> >>> Cheers >>> >>> On 20/01/2015 10:07, Miyamae, Takeshi wrote: >>>> Hi Loic, >>>> >>>> I'd appreciated your help very much. >>>> I have just uploaded 2 test files into the fork repository. >>>> >>>> https://github.com/t-miyamae/ceph >>>> files: >>>> src/test/erasure-code/TestErasureCodeShec.cc >>>> src/test/erasure-code/TestErasureCodeShec364.cc >>>> >>>> I'm sorry that tests for the thread safety codes has not been included yet. >>>> >>>> Thank you in advance, >>>> Takeshi Miyamae >>>> >>>> -----Original Message----- >>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >>>> Sent: Tuesday, January 20, 2015 4:23 PM >>>> To: Miyamae, Takeshi/宮前 剛 >>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>> 鷹詔; >>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>> Subject: Re: Deadline of Github pull request for Hammer release >>>> (question) >>>> >>>> Hi, >>>> >>>> Thanks for the update. To speed up things, I could review what's already published. Did you add the tests already ? >>>> >>>> Cheers >>>> >>>> On 20/01/2015 01:48, Miyamae, Takeshi wrote: >>>>> Hi Loic, >>>>> >>>>> Thank you for your advice which suggested providing SHEC as an extra-package. >>>>> We believe it's generally a good idea, but we are hoping SHEC >>>>> would be merged into Hammer because that would have an apparent >>>>> advantage from a viewpoint of maintenance support. >>>>> >>>>> As for the thread safety issue, we totally agree with you and we are trying to solve it. >>>>> We will complete it in a few days and we'd like to ask you to >>>>> review it again so that it might be in time for Hammer's feature freeze (v0.93). >>>>> >>>>> Best regards, >>>>> Takeshi Miyamae >>>>> >>>>> -----Original Message----- >>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic >>>>> Dachary >>>>> Sent: Wednesday, January 14, 2015 3:03 AM >>>>> To: Miyamae, Takeshi/宮前 剛 >>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>>> 鷹詔; >>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>> (question) >>>>> >>>>> Hi, >>>>> >>>>> On 13/01/2015 18:05, Miyamae, Takeshi wrote: >>>>>> Hi Loic, >>>>>> >>>>>> Thank you for your quick review. >>>>>> >>>>>>> Could you reference the jerasure files (they are in the jerasure >>>>>>> plugin already) instead of including them in your patch ? >>>>>> >>>>>> We had used the reference of jerasure v2.0 files before, but >>>>>> changed to including v1.2 files after the patent issue. >>>>> >>>>> If you are refering to http://erasure-code-patents.xyz/, it can safely be ignored. >>>>> >>>>>> However, we can restore the reference right away if we are needed. >>>>>> >>>>>>> In ::prepare you reuse the matrix, if possible. If your >>>>>>> intention is to improve performances, you should consider using the same approach as the isa plugin. >>>>>> >>>>>> We have found a performance issue caused by redundant >>>>>> initializations of the module, and solved it by caching the initialized variables in memory. >>>>>> If that is the same approach as isa-plugin you mentioned, we also >>>>>> do not have the same issue any more. >>>>> >>>>> The ISA plugin approach is thread safe, that's the important part. >>>>> >>>>>>> I'll have more comments once you have unit / functional tests >>>>>> >>>>>> We do have the those unit/functional tests, but the server is unreachable at this moment. >>>>>> Please let us upload them to the repository tomorrow. >>>>> >>>>> Great. >>>>> >>>>>>> Regarding your initial question: I think it is too late for the Hammer release. >>>>>> >>>>>> We are so disappointed to hear that, but we must apologize for the late request. >>>>>> If there would be any chance that SHEC be pulled into hammer, we >>>>>> are very happy to conduct all the necessary tests by ourselves. >>>>>> Please let us know the detailed schedule if this is a feasible option. >>>>> >>>>> Note that since this is a plugin it will be easy for someone to install it on a Hammer cluster, simply by copying the plugin files to each OSD / MON and restarting them. If there is some kind of urgency for you to be able to install this new plugin, I would advise making a package that contains just this file, restart the OSD once installed and recommend that the installation is done on all machines of the cluster (because every OSD / MON need to know about the plugin). >>>>> >>>>> Cheers >>>>> >>>>>> Best regards, >>>>>> Takeshi Miyamae >>>>>> >>>>>> -----Original Message----- >>>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic >>>>>> Dachary >>>>>> Sent: Tuesday, January 13, 2015 9:46 PM >>>>>> To: Miyamae, Takeshi/宮前 剛 >>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>>>> 鷹詔; >>>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>>> (question) >>>>>> >>>>>> Hi, >>>>>> >>>>>> It's a great start :-) A few comments: >>>>>> >>>>>> Could you reference the jerasure files (they are in the jerasure plugin already) instead of including them in your patch ? >>>>>> >>>>>> In ::prepare you reuse the matrix, if possible. If your intention >>>>>> is to improve performances, you should consider using the same >>>>>> approach as the isa plugin. See >>>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-cod >>>>>> e >>>>>> / >>>>>> i >>>>>> s >>>>>> a >>>>>> /ErasureCodePluginIsa.cc#L36 which is and independent type >>>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-cod >>>>>> e / i s a /ErasureCodeIsaTableCache.h with only one instance and >>>>>> is populated as needed and protected by a lock to make it thread >>>>>> safe : >>>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-cod >>>>>> e >>>>>> / >>>>>> i >>>>>> s >>>>>> a >>>>>> /ErasureCodeIsaTableCache.h#L101 >>>>>> >>>>>> I'll have more comments once you have unit / functional tests ( similar to http://workbench.dachary.org/ceph/ceph/blob/giant/src/test/erasure-code/TestErasureCodeJerasure.cc ). >>>>>> >>>>>> Regarding your initial question: I think it is too late for the Hammer release. But from what I read it looks like we'll be able to merge in the early stages of the next release and that will give us time to properly test it. >>>>>> >>>>>> Cheers >>>>>> >>>>>> On 13/01/2015 11:34, Miyamae, Takeshi wrote: >>>>>>> Hi Loic, >>>>>>> >>>>>>> I'm so sorry. The following is the correct repository. >>>>>>> >>>>>>> https://github.com/t-miyamae/ceph >>>>>>> >>>>>>> Best regards, >>>>>>> Takeshi Miyamae >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] >>>>>>> Sent: Tuesday, January 13, 2015 7:26 PM >>>>>>> To: Miyamae, Takeshi/宮前 剛 >>>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, >>>>>>> Takanori/中尾 >>>>>>> 鷹詔; >>>>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>>>> (question) >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On 13/01/2015 11:24, Miyamae, Takeshi wrote: >>>>>>>> Hi Loic, >>>>>>>> >>>>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin. >>>>>>>> >>>>>>>> Thank you for your advices. >>>>>>>> We have uploaded our latest codes to the following folk repository for an early review. >>>>>>>> >>>>>>>> https://github.com/miyamae-takeshi/multiple-shec >>>>>>> >>>>>>> It's 404 ? Is it a private repository maybe ? >>>>>>> >>>>>>>> >>>>>>>> SHEC is located in src/erasure-code/shec directory. >>>>>>>> >>>>>>>> We are verifying SHEC's advantages on our Ceph cluster. It takes a little bit more. >>>>>>>> Would you please start the review before that? >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Takeshi Miyamae >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic >>>>>>>> Dachary >>>>>>>> Sent: Wednesday, January 7, 2015 12:52 AM >>>>>>>> To: Miyamae, Takeshi/宮前 剛 >>>>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, >>>>>>>> Takanori/中尾 >>>>>>>> 鷹詔 >>>>>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>>>>> (question) >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> On 06/01/2015 12:49, Miyamae, Takeshi wrote: >>>>>>>>> Dear Loic, >>>>>>>>> >>>>>>>>> I'm Takeshi Miyamae, one of the authors of SHEC's blueprint. >>>>>>>>> >>>>>>>>> Shingled Erasure Code (SHEC) >>>>>>>>> https://wiki.ceph.com/Planning/Blueprints/Hammer/Shingled_Eras >>>>>>>>> u >>>>>>>>> r >>>>>>>>> e >>>>>>>>> _ >>>>>>>>> C >>>>>>>>> o >>>>>>>>> d >>>>>>>>> e >>>>>>>>> _(SHEC) >>>>>>>> >>>>>>>> The work you have done is quite impressive :-) >>>>>>>> >>>>>>>>> We have revised our blueprint shown in the last CDS to extend >>>>>>>>> our erasure code layouts and describe the guideline for choosing SHEC among various EC plugins. >>>>>>>>> We believe the blueprint now answers all the comments given at the CDS. >>>>>>>> >>>>>>>> Great. >>>>>>>> >>>>>>>>> In addition, we would like to ask for your advice on the >>>>>>>>> schedule of our github pull request. More specifically, we >>>>>>>>> would like to know its deadline for Hammer release. >>>>>>>>> (As we have not really completed our verification of SHEC, we >>>>>>>>> are wondering if we should make it open for early preview.) >>>>>>>> >>>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin. >>>>>>>> >>>>>>>> Cheers >>>>>>>> >>>>>>>>> Thank you in advance, >>>>>>>>> Takeshi Miyamae >>>>>>>>> >>>>>>>>> -- >>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" >>>>>>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More >>>>>>>>> majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Loïc Dachary, Artisan Logiciel Libre >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Loïc Dachary, Artisan Logiciel Libre >>>>>>> >>>>>> >>>>>> -- >>>>>> Loïc Dachary, Artisan Logiciel Libre >>>>>> >>>>> >>>>> -- >>>>> Loïc Dachary, Artisan Logiciel Libre >>>>> >>>> >>>> -- >>>> Loïc Dachary, Artisan Logiciel Libre >>>> >>>> N r y b X ǧv ^ ){.n + z ]z {ay ʇڙ ,j f h z w >>> j:+v w j m zZ+ ݢj" !tml= >>>> >>> >>> -- >>> Loïc Dachary, Artisan Logiciel Libre >>> >>> N r y b X ǧv ^ ){.n + z ]z {ay ʇڙ ,j f h z w >> j:+v w j m zZ+ ݢj" !tml= >>> >> >> -- >> Loïc Dachary, Artisan Logiciel Libre >> > > -- > Loïc Dachary, Artisan Logiciel Libre > > N r y b X ǧv ^ ){.n + z ]z {ay ʇڙ ,j f h z w j:+v w j m zZ+ ݢj" !tml= > -- Loïc Dachary, Artisan Logiciel Libre ��.n��������+%������w��{.n����z��u���ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f