Hi Loic, > Could you give me the definition of a "lace bug" ? I'm sorry I meant "race bug". It was just a typo (I'm farthest from a native English speaker ...). > Note that there are two kind of tests in Ceph : ... OK. I understand the Ceph's style of testing. > Do you mean that two consecutive runs of minimum_to_decode will provide different results ? Yes, they will. The reason is that SHEC's layout (=calculation ranges of parities) can be asymmetric, which means that the size of minimum_chunks (=the number of chunks that must be read at least from the disks) depends on IDs of the broken chunks. For instance, in case of SHEC(5,3,2) as described below, two chunks would be needed in some PG cases (where single broken chunk were 1, 2, or b), while three chunks in other PG cases (where single broken chunk were 3, 4, 5, or c). SHEC(5,3,2)'s layout : data chunks 12345 parity chunk a ----- parity chunk b -- parity chunk c --- ("---" means calculation ranges of each parity) We called it a random aspect of SHEC's minimum_chunks. Best regards, Takeshi Miyamae -----Original Message----- From: Loic Dachary [mailto:loic@xxxxxxxxxxx] Sent: Friday, January 23, 2015 2:29 AM To: Miyamae, Takeshi/宮前 剛 Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx); Kaga, Yoshihiro/加賀 芳宏; Kawaguchi, Shotaro/川口 翔太朗 Subject: Re: Deadline of Github pull request for Hammer release (question) Hi, On 22/01/2015 17:34, Miyamae, Takeshi wrote: > Hi Loic, > > Thank you for the code reference. We will use it to improve our codes. > >> However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ? > > Sure. The purpose of using threads is just to find lace bugs and ensure thread safety. I'm not familiar with the expression "lace bug" and all I could find is http://en.wiktionary.org/wiki/lace which does not help me (I'm french, english is not my native language). Could you give me the definition of a "lace bug" ? > However, I suppose those tests may be a little adhoc and be far from completeness. > Actually, we couldn't detect some init lace bugs including #7914 (We will fix those bugs at the next release). > Do you have any better ideas to combat these kind of bugs systematically? > Race conditions such as http://tracker.ceph.com/issues/7914 are quite difficult detect with tests and I don't know of a sure method to do so. Carefully proofreading the code and keeping the logic simple is probably the most effective way ;-) Once a race condition has been fixed, however, it should be possible to recreate the conditions for it to appear in a functional test that can be run with make check. Note that there are two kind of tests in Ceph : the unit / functional tests that are run with make check, locally on the developer machine or by a bot on each pull request. These tests run quickly and do not include any benchmarking or stress tests, nor integration tests. The second kind is what you find in teuthology : these tests include stress tests, error injections etc. Most race conditions are discovered by running these teuthology tests and analyzing their output. >> What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ? > > I don't think current minimum_to_decode_2 is adequate, as you mentioned. > The verification of return codes is required and we will compare it with expectations (EINVAL or EIO). > However, as for the size of minimum_chunks, as it is hard to > pre-estimate without knowing parity layout because of its random aspect, we won't improve it any more. Do you mean that two consecutive runs of minimum_to_decode will provide different results ? > Then, as for minimum_to_decode_3, we found it meaningless to verify > minimum_chunks because that is an error case when array size of want_to_decode and available_chunks are illegally large. > We will verify only return code of minimum_to_decode and remove the > current check of minimum_chunks in minimum_to_decode_3. Ok. I'll take a closer look as soon as I can run them and try things. Thanks for explaining ! > > Best regards, > Takeshi Miyamae > > -----Original Message----- > From: Loic Dachary [mailto:loic@xxxxxxxxxxx] > Sent: Thursday, January 22, 2015 6:28 PM > To: Miyamae, Takeshi/宮前 剛 > Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; > Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) > Subject: Re: Deadline of Github pull request for Hammer release > (question) > > Hi, > > On 22/01/2015 09:42, Miyamae, Takeshi wrote: >> Hi Loic, >> >> Thanks for your additional comments. >> >>> Would you be so kind as to update the >>> https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/ >>> M >>> akefile.am >> >> Sure. We'll do that. >> >>> This is likely to fail on a slow machine. Instead of waiting you should use a lock. >> >> The purpose of sleep(1) is to ensure sequentiality between threads, >> but it's not enough on slow machines as you mentioned. So, we'd like >> another example that uses a mutex lock. What files should we refer to? > > src/test/common/test_sharedptr_registry.cc contains an example. The > classes that are helpful in implementing thread safe are in > src/common/Cond.h and src/common/Mutex.h > > However, I'm not sure I understand why you need threads at all to test the plugin. Could you explain ? > >> >>> They however all have the same assert ( EXPECT_TRUE(shec->matrix == >>> NULL); ) >> >> We agree that EINVAL is better than EIO on verifying input variables. >> We'll review our whole sources and fix both our plugin and test codes. > > Thanks :-) What do you think of my comments (about minimum_to_decode_3 and minimum_to_decode_2 ) ? > > Cheers > >> >> P.S. We are debugging thread safe mSHEC to release it tomorrow. >> >> Best regards, >> Takeshi Miyamae >> >> -----Original Message----- >> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] >> Sent: Wednesday, January 21, 2015 11:13 PM >> To: Miyamae, Takeshi/宮前 剛 >> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔; >> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >> Subject: Re: Deadline of Github pull request for Hammer release >> (question) >> >> Hi, >> >> I see them at https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/TestErasureCodeShec.cc etc. thanks ! Would you be so kind as to update the https://github.com/t-miyamae/ceph/blob/master/src/test/erasure-code/Makefile.am to add the compilation instructions for these files ? >> >> I see the tests includes a few threads (thread1 to thread5). It looks like they are used for measuring performance, am I right ? >> >> pthread_t tid; >> pthread_create(&tid,NULL,thread1,shec); >> sleep(1); >> >> This is likely to fail on a slow machine. Instead of waiting you should use a lock. You will find examples in other tests, using various techniques depending on the context. If you'd like I could point to one. I'd have to better understand the intent of this test before I do. >> >> From init_5 to init_22 there are various combinations of parameters which suggest checking for all kinds of errors (m being negative, or a string instead of a number etc.). They however all have the same assert ( EXPECT_TRUE(shec->matrix == NULL); ). It would be better to have an expectation that verifies the error has been caught as expected. Is it part of what your completing at the moment ? >> >> minimum_to_decode_2 expectation for when it succeeds should be more >> precise that >> >> EXPECT_TRUE(minimum_chunks.size()); >> >> since minimum_chunks is expecting to have a size that does not vary. And if it does that would be a problem. >> >> In minimum_to_decode_3 after >> >> EXPECT_NE(want_to_decode,minimum_chunks); >> >> you could check the expected content of minimum_chunks also. Unless there is a random aspect to it ? >> >> Cheers >> >> On 20/01/2015 10:07, Miyamae, Takeshi wrote: >>> Hi Loic, >>> >>> I'd appreciated your help very much. >>> I have just uploaded 2 test files into the fork repository. >>> >>> https://github.com/t-miyamae/ceph >>> files: >>> src/test/erasure-code/TestErasureCodeShec.cc >>> src/test/erasure-code/TestErasureCodeShec364.cc >>> >>> I'm sorry that tests for the thread safety codes has not been included yet. >>> >>> Thank you in advance, >>> Takeshi Miyamae >>> >>> -----Original Message----- >>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >>> Sent: Tuesday, January 20, 2015 4:23 PM >>> To: Miyamae, Takeshi/宮前 剛 >>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>> 鷹詔; >>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>> Subject: Re: Deadline of Github pull request for Hammer release >>> (question) >>> >>> Hi, >>> >>> Thanks for the update. To speed up things, I could review what's already published. Did you add the tests already ? >>> >>> Cheers >>> >>> On 20/01/2015 01:48, Miyamae, Takeshi wrote: >>>> Hi Loic, >>>> >>>> Thank you for your advice which suggested providing SHEC as an extra-package. >>>> We believe it's generally a good idea, but we are hoping SHEC would >>>> be merged into Hammer because that would have an apparent advantage >>>> from a viewpoint of maintenance support. >>>> >>>> As for the thread safety issue, we totally agree with you and we are trying to solve it. >>>> We will complete it in a few days and we'd like to ask you to >>>> review it again so that it might be in time for Hammer's feature freeze (v0.93). >>>> >>>> Best regards, >>>> Takeshi Miyamae >>>> >>>> -----Original Message----- >>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary >>>> Sent: Wednesday, January 14, 2015 3:03 AM >>>> To: Miyamae, Takeshi/宮前 剛 >>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>> 鷹詔; >>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>> Subject: Re: Deadline of Github pull request for Hammer release >>>> (question) >>>> >>>> Hi, >>>> >>>> On 13/01/2015 18:05, Miyamae, Takeshi wrote: >>>>> Hi Loic, >>>>> >>>>> Thank you for your quick review. >>>>> >>>>>> Could you reference the jerasure files (they are in the jerasure >>>>>> plugin already) instead of including them in your patch ? >>>>> >>>>> We had used the reference of jerasure v2.0 files before, but >>>>> changed to including v1.2 files after the patent issue. >>>> >>>> If you are refering to http://erasure-code-patents.xyz/, it can safely be ignored. >>>> >>>>> However, we can restore the reference right away if we are needed. >>>>> >>>>>> In ::prepare you reuse the matrix, if possible. If your intention >>>>>> is to improve performances, you should consider using the same approach as the isa plugin. >>>>> >>>>> We have found a performance issue caused by redundant >>>>> initializations of the module, and solved it by caching the initialized variables in memory. >>>>> If that is the same approach as isa-plugin you mentioned, we also >>>>> do not have the same issue any more. >>>> >>>> The ISA plugin approach is thread safe, that's the important part. >>>> >>>>>> I'll have more comments once you have unit / functional tests >>>>> >>>>> We do have the those unit/functional tests, but the server is unreachable at this moment. >>>>> Please let us upload them to the repository tomorrow. >>>> >>>> Great. >>>> >>>>>> Regarding your initial question: I think it is too late for the Hammer release. >>>>> >>>>> We are so disappointed to hear that, but we must apologize for the late request. >>>>> If there would be any chance that SHEC be pulled into hammer, we >>>>> are very happy to conduct all the necessary tests by ourselves. >>>>> Please let us know the detailed schedule if this is a feasible option. >>>> >>>> Note that since this is a plugin it will be easy for someone to install it on a Hammer cluster, simply by copying the plugin files to each OSD / MON and restarting them. If there is some kind of urgency for you to be able to install this new plugin, I would advise making a package that contains just this file, restart the OSD once installed and recommend that the installation is done on all machines of the cluster (because every OSD / MON need to know about the plugin). >>>> >>>> Cheers >>>> >>>>> Best regards, >>>>> Takeshi Miyamae >>>>> >>>>> -----Original Message----- >>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic >>>>> Dachary >>>>> Sent: Tuesday, January 13, 2015 9:46 PM >>>>> To: Miyamae, Takeshi/宮前 剛 >>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>>> 鷹詔; >>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>> (question) >>>>> >>>>> Hi, >>>>> >>>>> It's a great start :-) A few comments: >>>>> >>>>> Could you reference the jerasure files (they are in the jerasure plugin already) instead of including them in your patch ? >>>>> >>>>> In ::prepare you reuse the matrix, if possible. If your intention >>>>> is to improve performances, you should consider using the same >>>>> approach as the isa plugin. See >>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code >>>>> / >>>>> i >>>>> s >>>>> a >>>>> /ErasureCodePluginIsa.cc#L36 which is and independent type >>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code >>>>> / i s a /ErasureCodeIsaTableCache.h with only one instance and is >>>>> populated as needed and protected by a lock to make it thread safe : >>>>> http://workbench.dachary.org/ceph/ceph/blob/giant/src/erasure-code >>>>> / >>>>> i >>>>> s >>>>> a >>>>> /ErasureCodeIsaTableCache.h#L101 >>>>> >>>>> I'll have more comments once you have unit / functional tests ( similar to http://workbench.dachary.org/ceph/ceph/blob/giant/src/test/erasure-code/TestErasureCodeJerasure.cc ). >>>>> >>>>> Regarding your initial question: I think it is too late for the Hammer release. But from what I read it looks like we'll be able to merge in the early stages of the next release and that will give us time to properly test it. >>>>> >>>>> Cheers >>>>> >>>>> On 13/01/2015 11:34, Miyamae, Takeshi wrote: >>>>>> Hi Loic, >>>>>> >>>>>> I'm so sorry. The following is the correct repository. >>>>>> >>>>>> https://github.com/t-miyamae/ceph >>>>>> >>>>>> Best regards, >>>>>> Takeshi Miyamae >>>>>> >>>>>> -----Original Message----- >>>>>> From: Loic Dachary [mailto:loic@xxxxxxxxxxx] >>>>>> Sent: Tuesday, January 13, 2015 7:26 PM >>>>>> To: Miyamae, Takeshi/宮前 剛 >>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 >>>>>> 鷹詔; >>>>>> Paul Von-Stamwitz (PVonStamwitz@xxxxxxxxxxxxxx) >>>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>>> (question) >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 13/01/2015 11:24, Miyamae, Takeshi wrote: >>>>>>> Hi Loic, >>>>>>> >>>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin. >>>>>>> >>>>>>> Thank you for your advices. >>>>>>> We have uploaded our latest codes to the following folk repository for an early review. >>>>>>> >>>>>>> https://github.com/miyamae-takeshi/multiple-shec >>>>>> >>>>>> It's 404 ? Is it a private repository maybe ? >>>>>> >>>>>>> >>>>>>> SHEC is located in src/erasure-code/shec directory. >>>>>>> >>>>>>> We are verifying SHEC's advantages on our Ceph cluster. It takes a little bit more. >>>>>>> Would you please start the review before that? >>>>>>> >>>>>>> Best regards, >>>>>>> Takeshi Miyamae >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx >>>>>>> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic >>>>>>> Dachary >>>>>>> Sent: Wednesday, January 7, 2015 12:52 AM >>>>>>> To: Miyamae, Takeshi/宮前 剛 >>>>>>> Cc: Ceph Development; Shiozawa, Kensuke/塩沢 賢輔; Nakao, >>>>>>> Takanori/中尾 >>>>>>> 鷹詔 >>>>>>> Subject: Re: Deadline of Github pull request for Hammer release >>>>>>> (question) >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On 06/01/2015 12:49, Miyamae, Takeshi wrote: >>>>>>>> Dear Loic, >>>>>>>> >>>>>>>> I'm Takeshi Miyamae, one of the authors of SHEC's blueprint. >>>>>>>> >>>>>>>> Shingled Erasure Code (SHEC) >>>>>>>> https://wiki.ceph.com/Planning/Blueprints/Hammer/Shingled_Erasu >>>>>>>> r >>>>>>>> e >>>>>>>> _ >>>>>>>> C >>>>>>>> o >>>>>>>> d >>>>>>>> e >>>>>>>> _(SHEC) >>>>>>> >>>>>>> The work you have done is quite impressive :-) >>>>>>> >>>>>>>> We have revised our blueprint shown in the last CDS to extend >>>>>>>> our erasure code layouts and describe the guideline for choosing SHEC among various EC plugins. >>>>>>>> We believe the blueprint now answers all the comments given at the CDS. >>>>>>> >>>>>>> Great. >>>>>>> >>>>>>>> In addition, we would like to ask for your advice on the >>>>>>>> schedule of our github pull request. More specifically, we >>>>>>>> would like to know its deadline for Hammer release. >>>>>>>> (As we have not really completed our verification of SHEC, we >>>>>>>> are wondering if we should make it open for early preview.) >>>>>>> >>>>>>> Although we're late in the Hammer roadmap, it's a good time for an early preview. It will help show what needs to be changed to accomodate the SHEC plugin. >>>>>>> >>>>>>> Cheers >>>>>>> >>>>>>>> Thank you in advance, >>>>>>>> Takeshi Miyamae >>>>>>>> >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" >>>>>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More >>>>>>>> majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Loïc Dachary, Artisan Logiciel Libre >>>>>>> >>>>>> >>>>>> -- >>>>>> Loïc Dachary, Artisan Logiciel Libre >>>>>> >>>>> >>>>> -- >>>>> Loïc Dachary, Artisan Logiciel Libre >>>>> >>>> >>>> -- >>>> Loïc Dachary, Artisan Logiciel Libre >>>> >>> >>> -- >>> Loïc Dachary, Artisan Logiciel Libre >>> >>> N r y b X ǧv ^ ){.n + z ]z {ay ʇڙ ,j f h z w >> j:+v w j m zZ+ ݢj" !tml= >>> >> >> -- >> Loïc Dachary, Artisan Logiciel Libre >> >> N r y b X ǧv ^ ){.n + z ]z {ay ʇڙ ,j f h z w > j:+v w j m zZ+ ݢj" !tml= >> > > -- > Loïc Dachary, Artisan Logiciel Libre > -- Loïc Dachary, Artisan Logiciel Libre ��.n��������+%������w��{.n����z��u���ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f