Paul van Oordt SQL Services
Freelance SQL Server specialist and troubleshooter, Utrecht / Antwerpen
+31 627400408 - WhatsApp - email@example.com
LinkedIn - curriculum vitae
vanoordt.nl - Nederlands
Recommendations for SQL Server development
Below is a checklist for the development of a SQL Server application. This list can be used when you develop software yourself, or when assessing that of others. The rules below allow you to create a maintainable and well performing application with an optimally enforced quality of data.
As to my services, I am not available for extended development work, but I can train a development team or assist it in modelling and writing the algorithms and code. Experience learns that the sooner I am involved, the higher the gain and the less work that has to be redone. A fresh outside perspective may benefit a system a lot. This is especially the case when adapting existing software. Some ideas on how the model could be improved may be around for years, but they might just not be the best choice.
Typically a project for me is one or a couple of days. Knowledge transfer is always an important part of the work. See also the possibility of an in-house SQL Server crash course.
Planning and design
When you are planning to build a SQL Server application, be sure that you are well aware of the possibilities that the product offers. You have to decide on the features below in an early stage, as including them later will usually require big changes. When evaluating third party software this gives you an idea of the level of expertise of the supplier, and to what extent the application is updated to use new SQL Server features.
This is an incomplete list of features that you need to know and understand to some extent, or that you want the third party to have used where relevant.
- the system supplied data types and the possibility to create .NET user defined types
- nullability and defaults, computed columns, sequences and the identity property
- primary keys and unique constraints, foreign keys, and check constraints
- stored procedures, scalar and table valued functions, DML triggers, CLR integration, views
- transactions and isolation levels, row versioning
- server and database triggers
- (read only) filegroups, table partitioning and dropping or moving partitions
- file tables, temporal tables, external tables, synonyms
- logins, database users, server and database roles, application roles, schema's
- row level security, encryption, dynamic data masking
- full text search
- in memory OLTP
- service broker
Modelling and database design
Note: The rules below also apply when you are developing code first using an ORM like Entity Framework.
When modelling and creating databases and tables the following is recommended.
- Make quality, understandability and maintainability of your data (model) the highest priority.
- Normalise; make sure that every atomic fact is stored in the database exactly once. Prevent redundancy, other than redundancy that is managed by SQL Server itself, such as persisted computed columns, nonclustered indexes and indexed views.
- Avoid multiple tables with the same primary key.
- Denormalisation or duplicating data on another server are last resorts. Don't do this before all other possibilities to gain performance or scalability are exhausted. The added complexity will be a burdon for the rest of the applications life cycle.
- In principle use a single database for an application, avoiding complexity in database management, code version management, high availability and programmering. If useful apply schema's, row level security, (read only) filegroups, table partitioning, et cetera.
- The right use of data types and nullability simplifies your program code significantly, is good for performance, and makes the data model understandable for others. Always use the minimally fit data type, for example:
- no int or char(1), but bit for a boolean
- no datetime, but date for a date
- no varchar, but date, time of datetime for a date and/or time
- no int but smallint for a year and tinyint for a quarter, but of course these will usually be derived values (computed columns)
- no varchar, but char for values of fixed length such as a land code
- no nvarchar(4000), but for example nvarchar(200) for a person name; the used length of a data type may impact the estimated amount of memory for a query
- Don't use leading zero's such as in '0001' or other prefixes. If necessary do that in a computed column, and use a numerical data type for the number:
n int not null,
code as 'PR'+replicate('0', 8-len(cast(n as varchar(8)))) + cast(n as varchar(8)) -- PR00001234
- Never combine two attributes in a single column, for example corporate code and sequence number 'CORP-0001'. Use two columns and combine them if necessary in a computed column.
- Resist the temptation to misuse columns, such as code -1 means 'not used'. Specificy an extra column instead:
used bit not null default (1)
- Consider the possibility to use unicode instead of ASCII. That will solve all issues with ASCII codepages and the comparison of data from different regions, at the price of a double amount of storage for character data, two bytes instead of a single per character.
- Avoid the use of (n)text and image, use (n)varchar(max) and varbinary(max) instead.
- Use hierarchyid to represent hierarchical relations between records in a single table and get access to a number of useful predefined functions on hierarchies.
- Be strict with nullability, use defaults if required. Be sure however that you understand the logic of nulls very well.
- Use key, referential and check constraints wherever possible to guarantee data integrity. Programming against the data will be much easier as assumptions on regularities need not be tested, and exceptions (order lines without an order, non unique identification numbers) need not be dealt with. Besides constraints can benefit performance.
- Every table has a primary key. Use meaningless keys, for which there can be no reason to update them. Occasionally meaningful keys are acceptable, for instance a land code char(2) according to the ISO standard.
- Tables that implement an n:m relation can have a compound key consisting of the foreign keys and need not have another artificial key. This is the best option for understandability and performance. Note that the order of the key columns can be of importance for performance.
- Define unique keys where relevant.
- Use foreign keys for referential integrity. Consider cascade delete for FK's where that is useful and safe, for example a row from a n:m table can usually be deleted if the referenced row is deleted.
- Use check constraints to further limit the values within columns.
- Be sure that referential and check constraints are and remain trusted:
select object_name( parent_object_id) tab, name, is_not_trusted
select object_name( parent_object_id) tab, name, is_not_trusted
from sys. check_constraints;
- Use a naming convention, paying attention to
- use of explanatory names
- prefixing the names with the object type, like 'tbl', 'vw', 'usp' etc. ('sp_' is meant for procedures in the master-database, so don't use this one and neither use 'xp_', 'fn_', 'ms_' of 'if_')
- a verb in the procedure name, do or don't in a consistent manner
- table names in multiple or singular mode
- use of underscores, spaces, capitals, 'CamelCase'
- names of key columns, referencing columns, constraints
- English or another language. Reserved words can always be used with '[ ]':
create table dbo.[group]
The following is recommended to create clear, correct, maintainable and well performing T-SQL code.
Add comments on decisive points, not too little, and also not too much:
- Decide on a consistent way to access the database, for example using stored procedures for all queries including selects. This has the following advantages:
If you do not use stored procedures, parametrise statements using SqlCommand.Parameters.Add() (in c#). This way you avoid sending a lot of similar statements to SQL Server that all have to be compiled seperately.
- The T-SQL code can be found in a single place, the database and not in the application code.
- Reuse of code according to the principes of low coupling, high cohesion is easy.
- In most cases stored procedures perform better than ad hoc queries.
- Security can easily be done by granting permissions only to the procedures and not to the underlying tables.
- Use set based routines instead of procedural code. Use loops (including cursors) only when really necessary, for example when sending mails or writing files. Loops are one of the biggest performance killers in relational databases. Moreover they lead to code that is unnecessary long and complicated. In some cases loops can be avoided by joining with a 'nums' table. Also in some cases table valued parameters can be used to avoid loops. From the client application, don't pass a set of values as a string; use a table valued parameter.
- Put in a single query what can be put in a single query. To that end use for instance common table expressions (cte's) and the output clause.
Code can be divided into steps with cte's very well. This is much more efficient than the use of temporary tables to store intermediate results, as the statement including the cte will be optimized and executed as a single unit.
- When using temporary tables, add a primary key and apply the same rules as for ordinary tables (data types, nullability, constraints).
- Do not use table variables for big datasets. For table variables no statistics are maintained and the optimizer 'estimates' that there is a single row in the table. A table variable with many rows may therefor lead to a bad query plan.
- Be cautious with data types of variables and parameters. Use the data type of the column that it is compared with. When done in a sloppy way this can have big consequences for performance, because it may not be possible to use statistics. Avoid implicit data type conversions, in particular between varchar and nvarchar.
- The use of the output clause can simplify code with lesser chance for blocking and deadlocks. For example an output clause can be added to an insert .. select where the inserted data is used in the rest of the code.
- Use computed columns, possibly persisted, to store the logic of a derived attribute in the table, as in:
name as firstname+' '+isnull(prefix+' ', '') + familyname
- T-SQL is designed to work with relational data. It is less apt for such operations as string processing or big calculations. Consider using .NET assemblies executed within the SQL Server process space for such tasks, or perform the task on the client.
- Avoid index hints in operational code. These will overrule the optimizer, now and in the future. Besides it may lead to errors when when the index is ever dropped.
- Use the nolock hint or the read uncommitted isolation level only if you are really prepared to read uncommitted or even wrong data.
It may be better to consider the read committed snapshot option to prevent readers and writers blocking each other.
- Write triggers so that they are valid for an arbitrary number of affected rows. To do that join with the inserted and deleted tables. Do not use a cursor for this.
- Use transactions to group statements that need to be executed as a logical unity. Be aware that every query can fail due to reasons outside its control, for isntance a deadlock, and that the resulting database state should be clear and consistent. Put the transaction within a 'try .. catch ..' to do the error handling including rollback.
- Keep transactions as short as possible. Don't do work inside a transaction that can be done outside of it, such as checking the validity of parameters.
- Use the proper isolation level for transactions. Sometimes it is necessary to keep data locked from the moment it is first read within a transaction, for example when doing a money transfer and first checking for adequate balance. In this case the default isolation level of read committed is not good enough. Instead the following level should be used: repeatable read if the balance is stored in a single updatable field, or serializable when the balance is calculated from a set of records.
- Check the default isolation level that your client software uses. Sometimes that is needlessly high, leading to impaired concurrency. The isolation level of existing connections can be seen as follows (refer to SQL Docs for decoding the column transaction_isolation_level):
select session_id, program_name, client_interface_name, transaction_isolation_level
where is_user_process = 1;
- Be cautious with user defined scalar functions. These are executed per row and often lead to bad performance. From SQL Server 2019 this problem is alleviated with 'scalar UDF inlining'. Also multi-statement table valued functions usually perform badly.
- Do not use code from internet unless you understand exactly what happens or completely trust the source. Many scalar functions circulate, for instance for date formatting, that are real performance killers. Incorrect or outdated code is also quite common.
- Use 'yyyymmdd' to enter a date. This always works independent of the 'set dateformat' option.
- In the case of dynamic SQL, do not use exec, but use sp_executesql with explicit parameters. This allows query plans to be cached and protects against SQL injection.
- Preferably use 'with schemabinding' when creating views, functions and triggers.
- Take care of a consistent and clear layout of your code. This improves readability and lessens the chance for mistakes:
- Name the schema with every object reference, thus avoiding misunderstandings, unexpected results and also gain a (limited) performance advantage.
- Use explicit column lists with inserts and updates, so that adding a column to a table never leads to failing code.
- Qualify all columns with the schema and table name or the table alias (unless trivial).
- Use the join syntax with the 'join' keyword, instead of the outdated ANSI syntax.
- Use optional keywords such as 'inner' join in a consistent way.
- Use brackets in the where clause if 'and' and 'or' are used both, just to exclude all possibility of doubt.
- Use a consistent way to indent code and spread it over multiple rows.
- Apply capitals of keywords in a consistent way.
- End statements with ";".
- Document the purpose of the procedure, function etc, as in:
-- reports defaulters
- Provide meta information such as author, date, status, and modification history. Do this in a systematic way, as in:
-- PvO, 20180806
-- PvO, 20190220, added parameter @treshold
- Also add a comment to dynamic sql, so that it is possible to trace back a piece of dynamic sql to the code that generates it, as in:
declare @cmd nvarchar(max) = N'/* generated by proc p */ select .. '
- In larger pieces of code it can be useful to specify the conditions that apply for an 'else':
if @custid is null and @showall = 1
else -- @custid is not null or isnull(@showall, 0) = 0
- For every cte (or temporary table with intermediate results), specify the content of the set, as in:
with UI -- unpaid invoices
as (select ... )
, C2 -- customers with > two reminders in the past year
as (select ... )
, UIC2 -- unpaid invoices of customers with > two reminders in the past year
as (select ... )
- Assumptions that are not checked or enforced in the data model, as in:
-- string contains '<br/>' at least every other 200 characters
-- date is not smaller than 1 minute before getdate()
- Choices made because of performance, as in:
-- rows are inserted with dummy values for column C to prevent growth after an update
- Whether a covering index was created especially for the query. When the query is changed it is clear that also the index may need to be changed:
-- for this query: covering index Tbl1.Idx1
Most of the recommendations above not only benefit data quality and correct and maintainable code, but are also beneficial for performance. Neglecting these principles because of performance is usually a bad idea. Some extra recommendations to write well performing code:
- Index tables when writing queries (and not only if there is a performance problem in production).
- Never create an index without checking what there already is. SQL Server doesn't prevent you from creating two identical, or otherwise redundant indexes. Often an existing index may be modified by adding key- or included columns.
- Determine whether an index fragments and whether this is bad for some queries. If so, possibly set a fill factor and include the index in a defragmenting routine.
- Test the queries that you write for performance.
- An easy way to see performance data is using the following set options in the Management Studio:
set statistics io, time on;
- Have a look at the actual query plan, and handle any warnings and missing indexes.
- For more information see my checklist SQL Server performance tuning.
(c) Paul of Oordt, vanoordt.nl