Newsletter May 2023

The biggest problems with performance and scalability arise from not using SQL Server optimally. On my site you will find a comprehensive list of recommendations for SQL Server programming. In this newsletter I will discuss the division of tasks between application and database code.

If you don't want to receive my newsletter, let me know.

Best regards,

Paul van Oordt

Stay on the server with stored procedures

'PL/SQL is Oracle's answer to client-server architecture,' I once read, in the late '80s, and I didn't understand much about it. Let me explain. 😉

In the client-server format, an application consists of several communicating processes, usually on separate computers. One of the (server) processes implements the data layer, and that process can be programmed with PL/SQL or T‑SQL as a procedural extension to SQL. Earlier this was not necessary, because the data was available in-process and could be directly accessed by Cobol or PL/1. Such software is still in use, by the way, and if you convert this 1-to-1 to a client-server architecture, you get a very poorly performing system, because often database calls are made per row. This is the worst case scenario for communicating with the database.

It is better to build SQL queries as strings in the application code and send them to the database server. The results are placed in, for example, a DataGrid object and are then available in the application. SQL Server performs set-based queries, and in that sense it uses the power of the database server. But there is also a lot left unused.

There is a sort of automated variant of this in the form of object-relational mappers (ORM), or code first. The calls to the database process are no longer written by a programmer, but generated by a tool such as Entity Framework. The generated code is hard to read, however, and hard to tune. Anyway, the simplicity of ORMs is somewhat deceptive. The data model and queries still need to be carefully thought through in order to obtain a well-functioning and scalable data layer.

For all of the above options there is an unnecessary amount of inter-process traffic, both in terms of round trips and in terms of the amount of data that goes over the line.

This is much improved by running data-related tasks, such as handling a payment, entirely in the database server process. Programming is then done in T‑SQL and the code is stored in the database as a stored procedure. The advantages being:

no unnecessary round trips between client and server
no unnecessary data over the line
transactions remain as short as possible and are started and ended within one procedure call
complex data operations are performed by the process that specializes in this: SQL Server with its advanced optimizer, indexes, statistics, monitoring and (auto-)tuning features, leading to optimal performance, scalability and manageability
sharp interfaces in the form of procedure calls with parameters, return code and data sets
through this minimal coupling coding and refactoring, tuning, testing and documenting can all be done in isolation
the data can be protected by making it accessible only through procedures (I will write more about this another time)

Many systems started with no real knowledge of SQL Server or T‑SQL. And sometimes, using an ORM, even practically without such knowledge. That can be a great way to begin, but it can also easily become a limiting factor for growth. A lot of applications that I encounter in my practice bump against this limit. Reprogramming parts of the logic in T‑SQL then becomes an interesting option.

If you consider this step, or if you want more insight into the performance and scalability or the limitations of your current system, I am happy to assist you. We can, for example, code some functionality in T‑SQL as a proof of principle. I like to share my knowledge, both on the job or in a classroom.

And triggers?

In addition to procedures, there are stored queries, also known as views, and stored functions. Inline table valued functions are sort of parameterized views and, like views, can be useful for reusing code and hiding complexity. Scalar and multi-statement table valued functions can have adverse effects on performance and should be used with caution.

And then there are triggers, modules with T-SQL code just like stored procedures, that execute on an insert, update and/or delete on a table. These also should be used sparingly. One of the problems is that they are difficult to test - which is precisely one of the advantages of procedures. Triggers are the recommended instrument to manage data denormalization, but denormalization itself is often not a good idea and rarely necessary.

Triggers can also be used to program business logic, but it is not my choice. Suppose we have a basic application for a library where a member is simply removed from the table when a membership ends. There are two rules: If the member still has books at home, the operation cannot be carried out, and any book reservations are removed. This can be programmed entirely server-side in a delete trigger. A rollback is done if there are still open loans for the member, and the relevant lines are removed from the reservations table. My preference, however, is to do all this in a procedure where the logic is immediately visible, testable, tuneable and documentable. See the simplified code below.

Thanks for reading.


create procedure dbo.member_delete @memberid int
as
if exists (select * from dbo.loan where memberid = @memberid)
begin
  raiserror('member has books at home', 16, 1);
  return(-1);
end;
begin try
  begin tran
    delete dbo.reservation
      where memberid = @memberid;
    delete dbo.member
      where memberid = @memberid;
    raiserror('member deleted', 10, 1);
  commit tran;
end try
begin catch
  rollback tran;
  throw;
end catch;

May 2023; the newsletter appears a couple of times per year; subscribe / unsubscribe