[TOPIC 1/12] Next-gen reference backends

(Presenter: Patrick Steinhardt, Notetaker: Karthik Nayak)

* Summary: There have been multiple proposals for reference backends on the
  mailing list. The goal is to converge on one solution.
* Problem: At GitLab we have certain repos with a large number of references.
  Some repos have millions of refs, which causes scalability issues.
   * Current files backend uses a combination of loose files and packed-refs.
   * Deletion performance is bad.
   * Reference lookups are slow.
   * Storage space is also large.
   * Some patches have improved the situation, e.g. the skip-list for
     packed-refs by Taylor.
   * Atomic updates are currently not possible.
   * This is not an issue faced only by GitLab.
* Two solutions proposed:
   * Reftables: Originally implemented by JGit (Shawn Pearce, 2017)
      * Google was storing the data in a table with one ref per row. This data
        was encrypted, which changes the ordering.
      * This led to the realization that the ref storage itself was not
        optimal, so Shawn proposed a format based on existing solutions at
        Google, which was then implemented in JGit.
      * This solved the ref storage problem at Google.
      * Adoption of the JGit implementation was low because of the
        compatibility requirement with CGit.
      * A new patch series has been submitted which swaps out packed-refs for
        reftables while keeping the existing file-based loose refs.
   * Incremental take on the reference backend (a.k.a. packed-refs v2) by
     Derrick
      * Uses pre-existing infrastructure in the Git project, which makes it a
        more natural extension.
      * First part was to support a multi-backend structure.
      * Second part was packed references v2 in the Git project.
* Question: How do we take it forward from here?
   * Emily: If the existing backend existed as a library, it might be easier
     to replace and experiment with.
      * Jeff: A lot of work in that direction has already landed, but some of
        the implementation still bleeds into other parts of the code. It
        might be messy to clean up.
      * Patrick: Different implementations by different hosting providers with
        different requirements might cause issues for clients.
   * Deletion performance is not the only issue faced (at GitLab); there are
     also deadlocks around this.
   * brian: If you have a large number of remote-tracking refs, you face the
     same perf issues.
   * Patrick: Is there any preference for which solution to go forward with?
     GitLab is interested in picking this up and is mostly going forward with
     reftables.
   * The reftable format does support tombstoning, which should solve the
     problem with multiple deletions.
      * There is still a problem with refs being a prefix of other refs.
   * Is there a world where loose refs are removed completely and replaced
     with reftables?
      * Debugging is much easier with loose refs; reftables are a binary
        format, so additional tooling might be needed here. This has already
        been proven to work at Google.
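
To make the tombstoning point above more concrete, here is a minimal sketch
of the idea, assuming a toy in-memory model: each batch of updates becomes a
new table on top of a stack, a deletion only appends a tombstone entry, and
lookups read from the newest table down. This is not the on-disk reftable
format or any Git API; RefStack, TOMBSTONE and prefix_conflicts are
hypothetical names made up for illustration. The last helper only detects
the "ref is a prefix of another ref" case mentioned in the notes; it does
not resolve it.

```python
# Toy model of a stack of ref tables with tombstones.
# Assumption: this is an illustration only, not the real reftable format.

TOMBSTONE = None  # marker value meaning "this ref was deleted"


class RefStack:
    def __init__(self):
        # Oldest table first, newest last. Each table maps refname -> value.
        self.tables = []

    def update(self, changes):
        """Apply a batch of changes by appending one new table."""
        self.tables.append(dict(changes))

    def delete(self, refnames):
        """Delete refs by appending tombstones; older tables are untouched."""
        self.update({name: TOMBSTONE for name in refnames})

    def lookup(self, refname):
        """Return the newest value for refname, or None if absent or deleted."""
        for table in reversed(self.tables):
            if refname in table:
                return table[refname]  # may be TOMBSTONE, i.e. None
        return None

    def compact(self):
        """Merge all tables into one and drop the tombstones."""
        merged = {}
        for table in self.tables:  # later tables override earlier ones
            merged.update(table)
        self.tables = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


def prefix_conflicts(refnames):
    """Return pairs (a, b) where ref a is a path prefix of ref b."""
    names = sorted(refnames)
    conflicts = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if b.startswith(a + "/"):
                conflicts.append((a, b))
    return conflicts
```

In this model, deleting many refs costs one appended table rather than a
rewrite of every existing entry, and the tombstones disappear only when the
stack is compacted.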


