Databases are designed to hold data and retrieve it, so they are optimized to run queries. Over the years I’ve seen quite a few cases where the developers did things on the application side that could be easily done on the database side, and almost every time moving it to the database improved the performance. And I’m not talking about the business logic or the “application layer”, but the data access, which is the thing that the database is build for.
Here are some examples:
- Filtering data: I’ve seen cases where the application request more data that it needs and filter some of it in the application. This causes much more data to go over the network from the database to the application and it’s a huge overhead. This is common when the application is using paging to show the data to the user, so the application gets all the data but displays only a single page.
- When it’s difficult to write a single query to do the entire logic, developers sometimes use several queries and analyze the results in between. This also causes many round-trips to the database and back and is an overhead.
There are many posts and conversations about these issues and why it is better to do as much as you can on the database side. But in this post I wanted to write about the other cases, when it is a better idea to add some code to your application and do stuff there instead of the database.
Changing the view of the data
There are many cases in which users retrieve data in the application, and then ask to change the view on this data. The simplest example would be sorting. Let’s say I have a system that shows a list of cases I have. By default it shows the open cases ordered by date. After I take a look, I decide to sort it by the customer. In this case the developers might decide it’s easier to simply run the query again. It will use the same code but with a slightly different query (different “order by” clause). They are correct, of course, it’s easier to do that, easier to maintain and less code is involved. However, performance wise, this is not the right thing to do. Once the query is sent to the database, the database will need to execute it, and it might be a resource consuming operation (it might be a complex join for example). Even if the query is simple, the round trip from the application to the database and back is simply a waste of time. The application has all the data, so sorting it is quite straight forward. It might require some more work from the developers, but will release the database to do other things, and reduce the round trips between the application and the database.
A classic example that I ran into a several times is the dashboard model. It doesn’t have to be a dashboard, of course, but this scenario is very common in these cases. In several cases in the past I’ve seen databases that run a single, very simple query dozens or even hundreds times a second. Even though the query is simple and practically takes nothing to run, executing it so many times is a hard work for the database. In some cases the database performs many soft parses which causes contention on the shared pool.
When I investigated several cases, I realized that this is a model I call the dashboard model. An application that shows a dashboard to the users. The dashboard is using a single or very few simple queries, but it executes them all the time in order to keep the dashboard current. The developers didn’t consider the amount of queries that are sent to the database, so they simply executed the dashboard queries for every user every second. When the amount of users increased, so did the amount of executions in the database. In some cases this was most of the database load, while it would have been much more efficient to use these resources to other things.
The solution here is quite simple to understand, but requires changing the architecture of this part of the application. The application should execute the dashboard queries once every second, so it will have the most current information. Then it needs to cache the result and send them to the clients as needed. This method reduces the amount of executions in the database dramatically, as well as the application overhead and round trips, therefore the general performance is better and the application becomes more scalable.
These are only two examples for cases in which it is better to “leave the database alone” and do things on the application side (as oppose to what we, the DBAs, usually say). Let the database do all or most of the work is generally desired, but in some cases it is simply a waste of time and resources. When we analyze a system or design one, it is important to think about these scenarios and choose the right architecture that will be the best in terms of performance and scalability, and use the different components wisely.