What can you do design a database to be “performance friendly” to reports? In an ideal world you would like to have a perfectly normalized database, but it isn’t an ideal world. To support reports you often need to consider the following changes to your database design:
-
Take advantage of database features. Each database implements joins, indices, SQL Select statement execution, and access paths in slightly different ways. All of these things influence the performance of your queries and thus your reports. This sort of basic knowledge is taught in database certification courses and I highly recommend that all Agile DBAs become certified on the database technologies that they work with. The disadvantage is that this approach helps to lock you into your database vendor because you come to rely on the unique features that they provide.
-
Introduce aggregate tables. An aggregate table stores denormalized copies of data. For example, a CustomerOrders aggregate table would store summary totals of the orders for customers. There would be one row for each customer which records the number of orders placed, the number of shipments made, the grand total of the orders for that customer, and so on. The primary disadvantage is that you need to maintain the aggregate table.
-
Remove unnecessary data. The smaller the amount of data to process, the faster your queries will run. By removing unnecessary data, either by archiving it or simply deleting it, you can improve the performance of your reports. The disadvantages are that you need to develop an archiving/deletion strategy and in the case of archiving you also need to develop a date recovery process to support retrieval of archived data.
-
Caching. Caches, either of data or of objects, can dramatically improve your system performance by replacing relatively slow disk accesses with memory accesses. The disadvantages are the increased complexity and the increased chances of cross schema referential integrity problems introduced by caches.
-
Partition a table. The goal is to take a large table, which results in poor performance, and reorganize it into several smaller tables. Tables can be partitioned vertically by storing different columns in each table as well as horizontally by storing collections of rows in different tables. Combinations are also possible. The primary disadvantages are that partitioning will complicate your mapping efforts (Chapter 11) and queries can become more complex because you need to work with several tables to support a single concept.
-
Disallow real-time reports. Many organizations choose to only support batch reporting against databases to ensure that report queries do not interfere with operational applications, to ensure consistent performance levels within data marts and data warehouses, and to reserve update windows for those databases.
-
Introduce indices. If a report needs to obtain data in a different order in which it is stored a common way to support this is to introduce an index that access the data in the required order. The disadvantage is that this slows down run-time performance due to the need to update the additional index.