Re: Quesion about querying distributed databases

me nefcanto <sn.1361@xxxxxxxxx> · Thu, 6 Mar 2025 11:14:25 +0330

I appreciate your time guys. Thank you very much.

> Having 1 table per database per server is too ugly.

Our databases are not one table per database. They are mapped to DDD's bounded contexts and usually by one table per domain entity.
For example, we have these databases:

Contacts
Courses
Seo
Payment
Forms
Geo
Sales
Media
Taxonomy
...
These are the tables we have in the Contacts database:

Addresses
AddressTypes
Attributes
BankAccounts
ContactContents
Contacts
Emails
Genders
JobTitles
JuridicalPersons
NaturalPersonRelations
NaturalPersons
Persons
Phones
PhoneTypes
Relations
RelationTypes
SocialNetworks
SocialProfiles
Titles
And, these are the tables we have in the Geo database:

AdministrativeDivisions
AdministrativeDivisionTypes
Cities
CityDivisions
Countries
Locations
SpatialDataItems
TelephonePrefixes
TimeZones
But we also do have databases that only have one table in them. The number of tables is not our criteria to break them. The business semantics is our criteria.

> Cross-database on MSSQL is identical to the cross schema on Postgres.

Cross-database query in SQL Server is not equivalent to cross-schema queries in Postgres. Because SQL Server also has the concept of schemas. In other words, both SQL Server and Postgres let you create databases, create schemas inside them, and create tables inside schemas. So SQL Server's cross-schema query equals Postgres's cross-schema query.

> If you truly need cross server support (versus say beefier hardware) how did you come to choose postgres?

We chose Postgres for these reasons that we did R&D about:

Native array per column support
Not having multiple storage engines like MariaDB to be confused about
Supporting expressions in unique constraints
It's usually considered one of the best when it comes to performance, especially in GIS we intend to develop more upon
As it claims on its website, it's the most advanced open-source database engine (but to be honest, we saw many serious drawbacks to that statement)
But here's the deal. We don't have one project only. We don't need cross-server queries for all of our projects. But we tend to keep our architecture the same across projects as much as we can. We chose Postgres because we had experience with SQL Server and MariaDB and assumed that cross-database query on the same server is something natural. Both of them support that. And both are very performant on that. On MariaDB all you have to do is to use `db_name.table_name` and on SQL Server all you have to do is to use `database_name.schema_name.table_name`. So we thought, for projects that do not need more than one server, we keep databases on the same server. When it needed more resources, we start by taking heavy databases onto their own servers, and we start implementing table partitinong on them.

But we have experienced some amazing improvements too in our initial tests. For example, creating all databases and tables and database objects on MariaDB takes more than 400 seconds, while the same took 80 seconds on Postgres. So amazing performance on DDL.
Also, 1 million records in bulk insertion take almost one-sixth to on-fourth of the time on MariaDB. These are valuable numbers. They warmed our hearts to keep digging as much as we can to see if we can perform this migration.

Regards
Saeed

On Thu, Mar 6, 2025 at 7:14 AM Rob Sargent <robjsargent@xxxxxxxxx> wrote:

On Mar 5, 2025, at 8:03 PM, Igor Korot jnit worked great for SQL Server. If you're small, we host them all on one server. If you get bigger, we can put heavy databases on separate machines.

However, I don't have experience working with other types of database scaling. I have used table partitioning, but I have never used sharding.

Anyway, that's why I asked you guys. However, encouraging me to go back to monolith without giving solutions on how to scale, is not helping. To be honest, I'm somehow disappointed by how the most advanced open source database does not support cross-database querying just like how SQL Server does. But if it doesn't, it doesn't. Our team should either drop it as a choice or find a way (by asking the experts who built it or use it) how to design based on its features. That's why I'm asking.

Cross-database on MSSQL is identical to cross schema on postgres. If you truly need cross server support (versus say beefier hardware) how did you come to choose postgres?  The numbers you present are impressive but not unheard of on this list.