Newsletter October 2021

After two previous newsletters I sent before the summer, here's a new one with hopefully some more interesting topics from my practice.

If you don't want to receive the newsletter, let me know.

Best regards,

Paul van Oordt

The use of cursors and other loops

Most developers know that using cursors and while loops does not lead to high-performing code. Normally these constructs are unnecessary, and usually they are a sign of insufficient familiarity with set-based thinking and programming. Any developer writing SQL code should be well versed in set-based thinking and programming and resist the temptation to reduce the complexity of the problem by looping over the rows within a set. The SQL language is rich enough to insert, update or delete multiple rows in one statement, even in complex cases. SQL Server is designed to run this kind of code efficiently, and generally does so much faster than repeatedly executing small statements, even without server roundtrips.

Nevertheless, there are situations where the use of cursors or other loops is necessary or desirable. A loop is necessary, for example, if you want to send emails from SQL Server to addresses stored in a table. Calling the procedure sp_send_dbmail for each of the rows can only be done in a loop, not in a single statement.

Using a loop results in slower code, but can still be useful for data conversions. As an example, I will denormalize the order amount in a very large order table as the sum of the associated order lines. Whether this is desirable is another discussion. The following also applies to other, less simple data conversions, and also, for example, to archiving data.

Populating the new orders.amount field for the whole table can be done easily with this statement:

with Q as
(select O.orderid, sum(L.amount) amount
  from orders O
    left join orderlines L on O.orderid = L.orderid
  group by O.orderid)
update orders
  set amount = Q.amount
  from Q
  where orders.orderid = Q.orderid;

This code is written quickly, and executes faster than the alternatives, but has one major drawback: locking. The entire update takes place within one transaction and both tables are completely locked for the duration of the transaction. That means there will need to be a maintenance window for this operation. Even then it is risky, because when the end of the window is in sight, the query cannot just be stopped. Everything done until then needs to be rolled back, which may take a very long time.

Using a cursor over orders, this problem can be avoided. The denormalized amount is written per order, as well as a bit that indicates whether the denormalization has been carried out for this order.

update orders
  set amount = 
       (select sum(orderlines.amount) 
          from dbo.orderlines 
          where orderid = @orderid)
     ,amount_denormalized = 1 
  where orderid = @orderid;

The bit field amount_denormalized is read by any routine that wants to know the order amount. The code must be adapted anyway as a result of the model change, and in this way it always works, whether the data conversion has been carried out completely, not at all or only partially. The conversion code can easily be stopped and resumed by including amount_denormalized = 0 in the cursor definition. Locking is only very brief and there is no expensive rollback. The entire data conversion will take longer with a cursor than with a single statement, but with a cursor it is completely 'online'. Regular operations are not affected by the data conversion and can continue to run normally.

Obviously, an index on orderlines.orderid with amount incuded is necessary to properly handle this workload. That index is recommended in any case, especially when the amount is not denormalized.

Another day at somebody's office

Fortunately, I can work again together with the customer behind a laptop or with a few people in a room with a large screen. That's the way I prefer to work for one or a few days. On the customer's premisses. I use to say I do not intend to hang around with a client for long, and I mean that. I am generous in sharing my knowledge, and whatever I can show you how to do, I have no desire to do myself. I'd prefer when you call me in again if you have a new issue that you need help with. So that I can give you another insight into what SQL Server can do, how it works under the hood, and how, based on my experience with hundreds of systems, I would approach the issue.

Troubleshooting together with the customer, analyzing performance problems, thinking through the possibilities of new functionality, these are intense days. Together we sit behind that laptop or in that room, and we are in a constant dialogue. I tell what I'm doing, why I'm looking here now, that I want to check this, really want to write that out and test it because the devil is in the details. You tell me what the requirements are, what can be found where, what is most important, what you have already thought of and figured out yourself. It is fun, intense and educational.

We alternate breadth with depth of exploration. Without breadth there is the danger of overlooking an easy solution. And without depth, it all remains 'possible' and 'in principle'. With performance issues, I usually take at least one query and show you completely how it can be optimized through indexing and/or rewriting. So that you can do that yourself for the entire system. When we think about new functionality, we make a proof of principle together.

For over 20 years I have been working with SQL Server with about 200 customers and it remains a pleasure to create neat new code or to clean, improve and simplify existing software, to leave behind a smooth whirring system. I will be please to visit you, either from Utrecht or from Antwerp.

At Kegro Doors, Groesbeek.

October 2021; the newsletter appears a couple of times per year; subscribe / unsubscribe