MongoDB vs MariaDB vs PostgreSQL for Strapi

sanvit · February 2020

I am trying to deploy Strapi for my API server, and I'm currently looking for the best database for it. I tried googling it but couldn't find any relevant info, and people just seemed to use whatever they normally prefer. Has anybody tried Strapi with multiple database backends? Which DB should perform best?

Any suggestion would be awesome, as well as suggestions for other headless CMSes (preferably with an admin panel).

Thanks

PHP_Backend · February 2020

@sanvit said:
people just seemed to use whatever they normally prefer.

I think that's your answer.
In IT, everything has trade-offs. So people tend to use the tool that they prefer (or know).

lightblade · February 2020

I use MongoDB for both production and local development. I've never had a headache about relations or anything else in the database.

joepie91 · February 2020

Avoid MongoDB. It's a mediocre database at best, and you're quite likely to end up with some form of data corruption over time, given that it doesn't validate data integrity like an RDBMS would.

I've not kept track of MySQL/MariaDB lately (it's much less pleasant to work with than PostgreSQL), but where performance is concerned, PostgreSQL will easily win out over MongoDB, no matter how many misleading benchmarks MongoDB (the company) tries to publish...

@PHP_Backend said: In IT, everything has trade-offs. So people tend to use the tool that they prefer (or know).

Everything has tradeoffs, sure. But sometimes there are technologies that are just bad, that provide literally no redeeming features over already-existing options. MongoDB is one of those cases; it basically only exists because the company behind it needed a product to sell, it doesn't have any actual redeeming features.

(For other, more serious databases, there are generally pros and cons to each option. MongoDB is a special case here.)

evnix · February 2020

@joepie91 said: Avoid MongoDB. It's a mediocre database at best, and you're quite likely to end up with some form of data corruption over time, given that it doesn't validate data integrity like an RDBMS would.

Completely agree. It's also an absolute memory hog, with constant breakdowns. It's good as a small DB, but when you need anything more than a few TBs, just stay away from it.
A nicely sharded MySQL or Postgres works much better.
If you have the cash, stick with Oracle or MSSQL (I have seen database sizes that make most big DBs look puny).
Cassandra could be the go-to DB; it works if you have the know-how and a team to maintain it.

willie · February 2020

Is Strapi something that uses TBs of data? Mongo is easy to get started with (document db instead of relational) and has an easy replication setup (as does Cassandra), plus it's very fast in unsafe mode, but I'd agree to stay away from it for serious purposes. Cassandra isn't especially hard to use either (I've played with it). ScyllaDB is an interesting alternative to Cassandra, though I haven't tried it yet.

sanvit · February 2020

@PHP_Backend @lightblade @joepie91 @evnix @willie thank you all for the replies, and sorry for responding so late. TBH I was really busy with other personal stuff and didn't really have time to look into it.

It seems like for testing purposes, MongoDB should work fine and fast (especially with unsafe mode), but I should stick with MariaDB in production since I am used to it and it is more stable.

Once again, thank you everyone!

joepie91 · February 2020

@willie said: Mongo is easy to get started with (document db instead of relational) and has an easy replication setup

Here's the problem, though: "getting started with" something is something you only do once, whereas "keeping it going" is something you will be doing effectively forever. MongoDB makes it "easy to get started", in exchange for making everything after that significantly harder and less reliable, forever.

That's great for their ability to market a subpar database product (and in fact, this seems to be quite literally their marketing strategy), but for the end user it means that the rare case is being optimized at the cost of the common case - ie. a very bad deal.

As for "easy replication setup" - it may be easy to get it running in something it claims is a replicated setup... but pretty much every single person I've spoken to who has actually maintained a serious production MongoDB cluster has called it a nightmare to operate and maintain, with constant inexplicable failures.

There's a reason there's a billion "this is how easy it is to get started with MongoDB" tutorials around the web, and virtually none that tell you how to maintain a MongoDB cluster in the long run. Most anyone running a serious deployment has migrated away to a serious database by that point.

(This is actually a great indicator for whether a new-ish technology is just hype, or a serious improvement; is the web full of "getting started" posts, or are there also in-depth articles about long-term use? If it's just the former and almost none of the latter, it's probably just hype.)

SagnikS · February 2020

@joepie91 said:
Avoid MongoDB. It's a mediocre database at best, and you're quite likely to end up with some form of data corruption over time, given that it doesn't validate data integrity like an RDBMS would.

Could you please elaborate a bit? I'm new to Mongo, but I have some experience with SQL. Mongo seems to look better than MySQL, but then good marketing makes shit look gold. I would love to hear your experiences/opinions

willie · February 2020

@joepie91 said: There's a reason there's a billion "this is how easy it is to get started with MongoDB" tutorials around the web, and virtually none that tell you how to maintain a MongoDB cluster in the long run. Most anyone running a serious deployment has migrated away to a serious database by that point.

I've done it, and there are definitely serious deployments around, though I agree with you that Mongo was always overhyped and people have caught onto it more by now. I still use it for a few things. SQL users now tend to do things like put JSON text into columns (MySQL and PostgreSQL both have acquired features to support this) to implement Mongo-like soft-schema approaches. I agree that for big permanent production applications you are better off jumping through all the SQL hoops.
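
To illustrate the "JSON text in a column" approach mentioned above, here is a minimal sketch. It uses Python's built-in sqlite3 module only because it needs no server; the table and field names are made up for the example. PostgreSQL and MySQL go further with native JSON column types that let the server query inside the document, which this sketch does not attempt to show.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed, typed columns for fields every row has; one free-form column for the rest.
# The "articles" table and its fields are hypothetical.
conn.execute(
    "CREATE TABLE articles ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " extra TEXT NOT NULL)"
)

# Each row can carry a differently shaped JSON document in "extra".
conn.execute(
    "INSERT INTO articles (title, extra) VALUES (?, ?)",
    ("Hello", json.dumps({"tags": ["intro"], "draft": True})),
)
conn.execute(
    "INSERT INTO articles (title, extra) VALUES (?, ?)",
    ("Second post", json.dumps({"author": "someone"})),
)

# Merge the fixed column and the decoded JSON back into one dict per row.
rows = conn.execute("SELECT title, extra FROM articles ORDER BY id").fetchall()
articles = [{"title": title, **json.loads(extra)} for title, extra in rows]
print(articles)
```

The fixed columns keep their constraints (here, NOT NULL), while only the loosely structured part goes without a schema.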

joepie91 · February 2020

@SagnikS said:

@joepie91 said:
Avoid MongoDB. It's a mediocre database at best, and you're quite likely to end up with some form of data corruption over time, given that it doesn't validate data integrity like an RDBMS would.

Could you please elaborate a bit? I'm new to Mongo, but I have some experience with SQL. Mongo seems to look better than MySQL, but then good marketing makes shit look gold. I would love to hear your experiences/opinions

That's a difficult question to answer, unless I know what impressions you've gotten of MongoDB, given how much different nonsense they've marketed over the years.

So yeah, in what way(s) does it seem to look better to you than MySQL? Then I can address those points specifically.

WSS · February 2020

@joepie91 said:
So yeah, in what way(s) does it seem to look better to you than MySQL? Then I can address those points specifically.

It's "Alternative".

isunbejo · February 2020

@willie said:

@joepie91 said: There's a reason there's a billion "this is how easy it is to get started with MongoDB" tutorials around the web, and virtually none that tell you how to maintain a MongoDB cluster in the long run. Most anyone running a serious deployment has migrated away to a serious database by that point.

I've done it, and there are definitely serious deployments around, though I agree with you that Mongo was always overhyped and people have caught onto it more by now. I still use it for a few things. SQL users now tend to do things like put JSON text into columns (MySQL and PostgreSQL both have acquired features to support this) to implement Mongo-like soft-schema approaches. I agree that for big permanent production applications you are better off jumping through all the SQL hoops.

This!

SagnikS · February 2020

@joepie91 said:
That's a difficult question to answer, unless I know what impressions you've gotten of MongoDB, given how much different nonsense they've marketed over the years.

Not much to be honest, I have been skeptical towards Mongo (probably because their marketing made it sound too good), but I've heard that they apparently scale better than MySQL.

So yeah, in what way(s) does it seem to look better to you than MySQL? Then I can address those points specifically.

I haven't tested it myself, but Mongo claims to be easier to configure HA on. Their marketing seems to indicate that it's much better than MySQL.

joepie91 · February 2020

@SagnikS said: but I've heard that they apparently scale better than MySQL.

@SagnikS said: I haven't tested it myself, but Mongo claims to be easier to configure HA on. Their marketing seems to indicate that it's much better than MySQL.

Okay, so what that is really referring to, is that MongoDB uses sharding (basically, distributing records across multiple servers and using a deterministic algorithm to determine what server to ask for what record), which makes it "easy" to scale up in the sense that it doesn't require you to architect your data storage around a particular distribution model across servers, it just throws all the records into a big content-addressable bucket.

This is not a technique that's unique to MongoDB, and in fact there are quite a few databases that can be sharded (or are sharded by default).
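
As a rough illustration of the "deterministic algorithm" idea described above, here is a minimal sketch of hash-based routing. It is not how MongoDB or any particular database actually routes requests (real systems use more elaborate schemes), and the shard names are invented for the example.

```python
import hashlib

# Hypothetical shard names, made up for this illustration.
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Map a record key to one shard, identically on every client."""
    # Use a stable hash rather than Python's built-in hash(), which is
    # randomized per process and would give different answers per client.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for key in ["user:1", "user:2", "article:42"]:
    print(key, "->", shard_for(key))
```

Every client that knows the shard list computes the same answer without asking anyone, which is the whole trick. The catch with this naive modulo version is that changing the number of shards remaps most keys.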

What their marketing copy doesn't mention, however, is that sharding comes with severe tradeoffs; you can't have relational integrity, because a sharded system cannot assume that other servers are available to check the validity of certain references against, and there can be significant overhead associated with lookups when different servers have a different idea of which servers are currently online and "healthy", as well as a lot of opportunities for servers to serve up outdated versions of records.

All the while this functionality is not necessary for the vast majority of projects (you can scale very far on a single server, enough for 99.9% of use cases), and so sharding is a really bad default as a replication strategy, because you will be trading in data integrity guarantees that you do need, for scalability features that you don't need. There's a reason why RDBMSes don't shard by default.

If you still really want to do sharding for some reason, then there's an implementation of that for PostgreSQL and, from a quick search, it seems for MySQL as well (though I don't bother with MySQL personally, PostgreSQL is better and nicer to work with in almost every way).

So yeah, not quite the unique selling point that MongoDB are presenting it to be.

lightblade · February 2020

@joepie91 said:

Okay, so what that is really referring to, is that MongoDB uses sharding (basically, distributing records across multiple servers and using a deterministic algorithm to determine what server to ask for what record), which makes it "easy" to scale up in the sense that it doesn't require you to architect your data storage around a particular distribution model across servers, it just throws all the records into a big content-addressable bucket.

How about replication?
https://docs.mongodb.com/manual/replication/

It's like a master<=>slave cluster. I'm not using it; I've only installed MongoDB in Docker for dev so far.

Another source I found on the internet:
https://dba.stackexchange.com/questions/52632/difference-between-sharding-and-replication-on-mongodb

joepie91 · February 2020

@lightblade said:

@joepie91 said:

Okay, so what that is really referring to, is that MongoDB uses sharding (basically, distributing records across multiple servers and using a deterministic algorithm to determine what server to ask for what record), which makes it "easy" to scale up in the sense that it doesn't require you to architect your data storage around a particular distribution model across servers, it just throws all the records into a big content-addressable bucket.

How about replication?
https://docs.mongodb.com/manual/replication/

It's like a master<=>slave cluster. I'm not using it; I've only installed MongoDB in Docker for dev so far.

Another source I found on the internet:
https://dba.stackexchange.com/questions/52632/difference-between-sharding-and-replication-on-mongodb

In a sharded model, you can have redundancy by simply having >1 nodes responsible for the same record, and modifying your algorithm to produce >1 results. That seems to be more or less what MongoDB does, with its "each shard can be a replica set" approach.
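
A minimal sketch of "modifying your algorithm to produce >1 results", using rendezvous (highest-random-weight) hashing as a stand-in technique. This is an illustration of the general idea only, not MongoDB's actual mechanism, and the node names are hypothetical.

```python
import hashlib

# Hypothetical node names, made up for this illustration.
NODES = ["node-1", "node-2", "node-3", "node-4"]


def _score(node: str, key: str) -> int:
    """Stable per-(node, key) score; every client computes the same value."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def nodes_for(key: str, copies: int = 2, nodes: list[str] = NODES) -> list[str]:
    """Return `copies` distinct nodes responsible for `key`, best score first."""
    ranked = sorted(nodes, key=lambda node: _score(node, key), reverse=True)
    return ranked[:copies]


print(nodes_for("user:1"))            # two nodes hold this record
print(nodes_for("user:1", copies=3))  # same first two, plus one more
```

Because the ranking is deterministic, asking for more copies only appends nodes to the list; it never reshuffles the ones already chosen.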

If you want just replication, then that's the standard mode of operation of most every RDBMS that supports a cluster of more than one instance. But HA replication is not trivial; and it's something that MongoDB definitely doesn't offer, despite its marketing (because for true HA, you need a guarantee that each node in the cluster will always produce non-stale data, which MongoDB doesn't provide).

Basically, MongoDB still doesn't do anything special here, and if you just run a replica set, you still don't get the relational integrity that an RDBMS would provide.

Edit: Also, you usually want none of the above things. The complexity of highly-redundant setups tends to cause more problems than it solves.

willie · February 2020

Sharding (splitting a single big dataset across multiple servers) and replication (maintaining multiple copies of a dataset for HA) are completely separate things. MySQL and Mongo both had better replication features than PostgreSQL (PG) did, as of a few years ago. Many people considered PG to be generally better than MySQL, but still chose to run MySQL because they needed replication. It's possible that PG has better replication by now: anyone know?

There was a time when Mongo had sharding and PG didn't. Later there was a proprietary PG fork called CitusDB that had it, and PG itself has managed to catch up (at least partially) with Citus since then. So the most we can say is that PG has (possibly) caught up to Mongo for sharding by now. At one point Mongo had an advantage in this area and that was an attraction for some users. We can't blame the users for that. I don't know if MySQL has any sharding to this day.

Maintaining consistency with replication active is not complicated if you can tolerate slow operations: just wait for acks from multiple (or all) slave servers before declaring any operation fully committed. Mongo has modes that do that, they are slow, but they are there if you need them. Consistency across sharding is more complicated and there are various "eventual consistency" schemes etc. I think Cassandra was more sophisticated than Mongo about that. Mongo despite appearances was not really that concerned with enormous sharded datasets. It was more about easy setup and prototyping. I remember hearing that Riak was the most solid of the sharded db's though I never used it.
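
The "wait for acks before declaring the write committed" idea can be sketched with a toy in-memory simulation. The `Replica` class and `replicated_write` function are invented for this illustration and do not correspond to any database's API; in particular, the sketch does not roll back a write on the replicas that accepted it when too few acks arrive, which a real system would have to deal with.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy in-memory replica; `online` simulates whether it is reachable."""
    name: str
    online: bool = True
    data: dict = field(default_factory=dict)

    def apply(self, key: str, value: str) -> bool:
        """Store the write and acknowledge it, or fail to ack if offline."""
        if not self.online:
            return False
        self.data[key] = value
        return True


def replicated_write(replicas: list[Replica], key: str, value: str,
                     required_acks: int) -> bool:
    """Report the write as committed only if enough replicas acknowledged it."""
    acks = sum(1 for replica in replicas if replica.apply(key, value))
    return acks >= required_acks


replicas = [Replica("a"), Replica("b"), Replica("c", online=False)]
print(replicated_write(replicas, "k", "v1", required_acks=2))  # True: two of three acked
print(replicated_write(replicas, "k", "v2", required_acks=3))  # False: one replica is offline
```

Requiring more acks makes each write wait on more servers, which is where the slowness comes from.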

Mongo in unsafe mode was really faster than the SQL databases, if you could accept the un-safety. It took a bad rap over that because maybe people expected unsafe mode to be safe and got upset when they found out that it wasn't. That seems silly to me. Redis is unsafe and fast and people use it exactly when that combination suits their purposes. There is no confusion, so Redis is popular.

These days I'm trying to get better at SQL, so I prototype with SQLite, but Mongo did make some things easy.

flips · February 2020

Hm, not intending to derail this thread, but if there are any available and easily understandable guidelines as to when (up to what size/other measurement) SQLite is preferable to PostgreSQL/MariaDB, that would be nice.
(And also, is MySQL/MariaDB better than PostgreSQL for low-memory VPSes?)

vimalware · February 2020

Just suck it up and learn Pg. Then spend 10 years mastering it.

Your career will thank you.
Pg: A real Swiss army knife for 95% of your data manipulation needs if you count the plugin ecosystem.

joepie91 · February 2020

@willie said: Sharding (splitting a single big dataset across multiple servers) and replication (maintaining multiple copies of a dataset for HA) are completely separate things. MySQL and Mongo both had better replication features than PostgreSQL (PG) did, as of a few years ago. Many people considered PG to be generally better than MySQL, but still chose to run MySQL because they needed replication. It's possible that PG has better replication by now: anyone know?

AFAIK it is still the case that MySQL has better replication for a particular replication model than PostgreSQL; I forgot which model that was, though.

@willie said: There was a time when Mongo had sharding and PG didn't. Later there was a proprietary PG fork called CitusDB that had it, and PG itself has managed to catch up (at least partially) with Citus since then. So the most we can say is that PG has (possibly) caught up to Mongo for sharding by now. At one point Mongo had an advantage in this area and that was an attraction for some users. We can't blame the users for that. I don't know if MySQL has any sharding to this day.

I strongly doubt that anyone who initially picked MongoDB, actually picked it because they needed sharding. There were a bunch of other databases that implemented some variant of sharding before Mongo came around, and people who actually had a sharding requirement were very likely using those. The hype around MongoDB has always been focused around nebulous unsubstantiated "performance" claims, and hype around the "MEAN stack" (which MongoDB coined themselves), neither of which really had anything to do with sharding.

(Even today, almost no one is actually using the sharding features in PostgreSQL, because it just isn't a common requirement.)

@willie said: Maintaining consistency with replication active is not complicated if you can tolerate slow operations: just wait for acks from multiple (or all) slave servers before declaring any operation fully committed. Mongo has modes that do that, they are slow, but they are there if you need them. Consistency across sharding is more complicated and there are various "eventual consistency" schemes etc. I think Cassandra was more sophisticated than Mongo about that. Mongo despite appearances was not really that concerned with enormous sharded datasets. It was more about easy setup and prototyping. I remember hearing that Riak was the most solid of the sharded db's though I never used it.

That still doesn't give you relational integrity, though.
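
For anyone unsure what "relational integrity" means in practice, here is a minimal sketch of a foreign key constraint rejecting a dangling reference. It uses Python's built-in sqlite3 module purely so it runs without a server (PostgreSQL and MariaDB enforce the same kind of constraint); the table names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when explicitly asked to.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'valid post')")


def try_insert_orphan() -> bool:
    """Try to insert a post whose author does not exist; True if rejected."""
    try:
        conn.execute("INSERT INTO posts (author_id, title) VALUES (999, 'orphan')")
    except sqlite3.IntegrityError:
        return True
    return False


print(try_insert_orphan())
```

The database itself refuses the bad row, so no application bug can leave a post pointing at an author that does not exist.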

@willie said: Mongo in unsafe mode was really faster than the SQL databases, if you could accept the un-safety.

I've seen that claim thrown around, but mysteriously people fell silent every time I asked for a source. I've never seen a single benchmark that backed up this claim and was also verified by a third party as being a representative benchmark. Every plausible benchmark I've seen showed MongoDB to be slower than PostgreSQL, even when running MongoDB in unsafe mode.

Far as I can tell, the performance claim has never been more than that; a claim. MongoDB's recent benchmarking bullshit just further reinforces that conclusion for me.

@willie said: Redis is unsafe and fast and people use it exactly when that combination suits their purposes. There is no confusion, so Redis is popular.

It's certainly less of an issue than with MongoDB, because the Redis documentation is fairly clear that it's not a reliable data store, and not designed to be. But even then, I have to convince someone every once in a while that no, Redis is not meant to be used as a primary database...

@flips said: Hm, not intending to derail this thread, but if there are any available and easily understandable guidelines as to when (up to what size/other measurement) SQLite is preferable to PostgreSQL/MariaDB, that would be nice.

If you need an embedded database, ie. not a separate server process but part of your application: use SQLite, I guess. In every other case: use PostgreSQL (or MySQL/MariaDB if you already have that running).

SQLite doesn't perform very well, and it's pretty awful to work with (for a long time it was missing fairly basic features such as "renaming columns", which only arrived in version 3.25, and it has a very questionable storage/typing model), but it's pretty much your only option for a vaguely-relational database without a separate process.

It's not really worth using SQLite "for prototyping" or "for testing", IMO -- after a few months you'll discover that it's just costing you time and effort, because you constantly have to replicate database features that SQLite doesn't have natively, just to be able to use them in production where you're running PostgreSQL/MySQL/MariaDB. Just installing a database server on your development system will be much simpler in the long run.

@flips said: (And also, is MySQL/MariaDB better than PostgreSQL for low-memory VPSes?)

I did a quick test at idle a while ago, and PostgreSQL idled below MySQL in memory usage. I don't have good data on memory usage under load, but I've seen no reason to believe that PostgreSQL is somehow unsuitable for low-memory systems; or at least, not any more unsuitable than MySQL would be.

I'd even say that it feels like PostgreSQL is more efficient, though that's of course not a very reliable statement without the data to back it up.

flips · February 2020

@joepie91 said: I did a quick test at idle a while ago, and PostgreSQL idled below MySQL in memory usage. I don't have good data on memory usage under load, but I've seen no reason to believe that PostgreSQL is somehow unsuitable for low-memory systems; or at least, not any more unsuitable than MySQL would be.
I'd even say that it feels like PostgreSQL is more efficient, though that's of course not a very reliable statement without the data to back it up.

Thanks, great answers. I haven't played with PostgreSQL for a long time, but I remember I used to like it better than MySQL. (Then I ended up working with a lot of systems that were built with MySQL.) So I'd really like to give PG another go for my next project.
