To be honest, the way compound indexes are implemented in Mongo is problematic (index intersection never worked for me). You sometimes have to tinker a lot to get the query params in the right order so the query planner doesn't decide to pick another index, or skip indexes entirely.
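A minimal sketch of the kind of tinkering this means, using pymongo against a hypothetical `events` collection (collection and field names are made up for illustration); `hint()` is one way to pin the planner when it keeps picking the wrong index:

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

events = MongoClient("mongodb://localhost:27017").appdb.events  # hypothetical

# Compound index: field order matters, since the planner can only use
# a leading prefix of the index to satisfy a query.
events.create_index([("tenant_id", ASCENDING), ("created_at", DESCENDING)])

# If the planner keeps choosing another index (or none), hint() pins it.
cursor = (
    events.find({"tenant_id": 42})
    .sort("created_at", DESCENDING)
    .hint([("tenant_id", ASCENDING), ("created_at", DESCENDING)])
)
```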
Not sure if this was fixed in recent versions, but if you have an index on an array of strings/ints and then use the `$addToSet` operator, the index is not used for the `$addToSet` operation, and the disk is hammered on each update to verify that the array's contents are unique. At pretty high volume you can easily saturate the disk queue, left wondering what is happening and how to fix it.
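For concreteness, this is the write pattern being described (a hypothetical `items` collection with an indexed `tags` array):

```python
from pymongo import MongoClient

items = MongoClient("mongodb://localhost:27017").appdb.items  # hypothetical

items.create_index("tags")  # multikey index on the array field

# Each $addToSet must check whether "new-tag" is already present in the
# array; per the comment above, that check does not go through the index,
# so at high update volume the disk takes the hit.
items.update_one({"_id": 1}, {"$addToSet": {"tags": "new-tag"}})
```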
After a few years of working with collections (tables) of a few TB in Mongo, I can say this DB is good if you want to insert data but don't need to read it all that often. If you need a JSON, read-heavy DB, consider something else, e.g. Elastic (with all its drawbacks), because at some point the limiting factor for you will be Mongo's indexes, which are the weakest part of this DB (even weaker than the infamous data loss).
That's a valid concern. I completely understand that compound indexes should not be applied blindly, and the database should select an appropriate execution plan on its own.
That's also why I decided to concentrate in the article on the entire workflow rather than on the index itself: find the slow query -> analyze the explain output -> check the ratio between scanned and returned documents -> create an index -> test the same query again...
In this particular case, a compound index was selected based on filters and a sort clause. I completely agree that using index intersections in a real workload could be quite dangerous, especially if MongoDB chooses to use another index instead.
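A rough sketch of that loop in pymongo (the `orders` collection, the field names, and the "filters plus sort" index shape are illustrative assumptions, not the article's exact schema):

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

db = MongoClient("mongodb://localhost:27017").appdb   # hypothetical database
query = {"status": "shipped", "region": "eu"}         # illustrative filters
sort = {"created_at": -1}

def scan_ratio():
    # Run explain with executionStats to compare scanned vs returned docs.
    plan = db.command(
        "explain",
        {"find": "orders", "filter": query, "sort": sort},
        verbosity="executionStats",
    )
    stats = plan["executionStats"]
    return stats["totalDocsExamined"], stats["nReturned"]

print("before:", scan_ratio())  # e.g. many docs examined per doc returned

# Compound index covering the equality filters first, then the sort key.
db.orders.create_index(
    [("status", ASCENDING), ("region", ASCENDING), ("created_at", DESCENDING)]
)

print("after:", scan_ratio())   # ideally examined ~= returned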
The `$addToSet` example is also very valid; however, for me, that is more like a schema problem than a regular indexing problem in a read query.
Still, the array could become extremely big regardless of the presence of indexes, and it might require a dedicated note in the article.
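One common schema-level mitigation, if the use case allows it, is capping the array on write instead of relying on `$addToSet` (again a sketch; `$slice` drops the oldest entries, which only works if occasional duplicates are acceptable):

```python
from pymongo import MongoClient

items = MongoClient("mongodb://localhost:27017").appdb.items  # hypothetical

# $push with $each + $slice keeps only the most recent 1000 entries,
# bounding the array size on every update.
items.update_one(
    {"_id": 1},
    {"$push": {"tags": {"$each": ["new-tag"], "$slice": -1000}}},
)
```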
Data infrastructure in the real world, for non-trivial systems, moves slowly. Migrations are costly and risky. If you need to optimize query performance, you are likely working on one of those non-trivial systems. We have some MongoDB clusters at work (one in my team). Would we have picked it today? Maybe not, because the mindshare isn't there. Does it work fine right now, though? Yes, definitely; we have no real issues (the pain in that system is elsewhere).
Absolutely, that's a very practical way to look at it. In reality, the optimal choice for an existing database doesn't necessarily match what people would pick today if starting from scratch.
If a system is working fine, the expense and risk of moving may end up being far greater than any optimization efforts on the existing database. This is also why I believe profiling tools, explain plans, and indexes are all useful; there are plenty of cases where MongoDB is sufficient but still needs optimization...
Here is my recipe for even faster query performance (rough sketch of steps 1-3 below):
1: export the data out of MongoDB
2: import the data in PostgreSQL, ClickHouse, DuckDB, whatever
3: create the indexes there
4: uninstall MongoDB
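A minimal sketch of steps 1-3, assuming a local Mongo and DuckDB as the target (the collection name, file path, and index column are all made up):

```python
import json

import duckdb
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017").appdb.orders  # hypothetical

# 1: export the data out of MongoDB as JSON lines
with open("orders.jsonl", "w") as f:
    for doc in orders.find({}, {"_id": 0}):
        f.write(json.dumps(doc, default=str) + "\n")

# 2: import into DuckDB (schema is inferred from the JSON)
con = duckdb.connect("analytics.duckdb")
con.execute("CREATE TABLE orders AS SELECT * FROM read_json_auto('orders.jsonl')")

# 3: create the indexes there
con.execute("CREATE INDEX idx_orders_status ON orders (status)")

# 4: left as an exercise for the reader
```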
But then they wouldn't be able to sell their dashboard view. That's the point of the tutorial/sales pitch. /s
Does anyone even use MongoDB anymore? Is this article from 2019?
I mean 70% of Fortune 500 companies are MongoDB customers
Are you kidding? It's top 5 worldwide...
https://db-engines.com/en/ranking