Microsoft* SQL Server* Tuning Guide for Online Transaction Processing workload on 4th Generation Intel® Xeon® Scalable Processors Based Platform

ID 766472
Updated 11/7/2022
Version Latest
Public

author-image

By

Introduction

This guide is targeted towards users who are already familiar with Microsoft* SQL Server* and provides pointers and system settings for hardware and software that will provide the best performance for most situations. However, please note that we rely on the users to carefully consider these settings for their specific scenarios, since Microsoft SQL Server can be deployed in multiple ways, and this is a reference to one such use-case.  

Microsoft SQL Server is a relational database management system developed by Microsoft. SQL Server is offered in many different editions, but this guide will focus on the SQL Server Enterprise edition. SQL Server’s database architecture is based on a Client-Server architecture model. This tuning guide will focus on Online Transaction Processing (OLTP) workloads (TPCE like) while using SQL Server for Windows*.

4th Gen Intel® Xeon® Scalable processors deliver industry-leading, workload-optimized platforms with built-in AI acceleration, providing a seamless performance foundation to help speed data’s transformative impact, from the multi-cloud to the intelligent edge and back. Improvements of particular interest to this workload applications are:

  • Enhanced Performance
  • More Intel® Ultra Path Interconnect (Intel® UPI)
  • Increased DDR5 Memory Speed & Capacity

 

The hardware and software used in testing the server configuration for this tuning guide:

Hardware Memory 32 * 64GB Hynix DDR5, 4400MT/s
I/O Controllers 4x RS3SC008 Intel Raid
Disks

1 x Intel® 1.9TB SSD OS Drive

10 x Intel® SC2KG019T7 SSD (1.9TB) Backups

8 x Intel® SC2KG960G8 SSD (960GB) Log

5 x Intel® SC2BA800G3 SSD (800GB) Tempdb

2 x Intel® SC2BB016T4 SSD (1.6TB) mdf

85 x Intel® SC2KG960G8 SSD (960GB) DB

NIC 2x Intel® Ethernet Network Adapter X520 Series 210G BASE-T
Software Operating System Microsoft Windows Server 2022 Datacenter
Kernel 10.0.20348 Build 20348
SQL Version 2022.160.1000.6
Workload Kit TPCE Kit 1041
Benchcraft 2.5.6

 

Note The configuration described in this article is based on 4th Generation Intel Xeon processor hardware. Server platform, memory, hard drives, network interface cards can be determined according to customer usage requirements.

Hardware

BIOS

Please note, that all BIOS settings outlined below are based on Intel’s Software Development Platforms designed for the 4th Gen Intel Xeon Scalable processors. Begin by resetting your BIOS to default setting, then follow the suggestion below for changes:

Configuration Item Recommended Value
EDKII Menu -> Socket Configuration -> Processor Configuration -> Hardware Prefetcher Disabled
EDKII Menu -> Socket Configuration -> Processor Configuration -> Adjacent Cache Prefetch Disabled
EDKII Menu -> Socket Configuration -> Processor Configuration -> DCU Streamer Prefetcher Enabled
EDKII Menu -> Socket Configuration -> Processor Configuration -> DCU IP Prefetcher Enabled
EDKII Menu -> Socket Configuration -> Processor Configuration -> LLC Prefetch Disabled
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU P State Control -> Energy Efficient Turbo Disabled
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU P State Control -> Turbo Mode Enabled
EDKII Menu -> Socket Configuration -> UP Configuration -> UPI General Configuration -> Boot Performance Mode Max perf
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> Hardware PM State Control -> Hardware P State Disabled
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU - Advanced PM Tuning -> Energy Perf BIAS -> ENERGY_PERF_BIAS_CFG_mode Performance
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU - Advanced PM Tuning -> Energy Perf BIAS -> Workload Configuraiton I /O sensitive
EDKII Menu -> Socket Configuration -> Memory Configuration -> Memory RAS Configuration -> Patrol Scrub Disabled
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU C State Control -> Package C State Co\C1 state
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU C State Control -> Enhanced Halt State (C1E) Disable
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU C State Control -> CPU C6 Report Disable
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU C State Control ->CPU C1 auto demotion Disable
EDKII Menu -> Socket Configuration -> Advanced Power Management Configuration -> CPU C State Control -> CPU C1 auto undemotion Disable
EDKII Menu -> Socket Configuration -> Common RefCode Configuration -> UMA-Based Clustering Disable (All2All)
EDKII Menu -> Socket Configuration -> Uncore Configuration -> Uncore General Configuration -> SNC (Sub NUMA) Enable SNC2 (2-clusters)

 

Memory Configuration

This workload runs best with 2 DIMMS per channel populated with 4400 MTS DDR5 registered memory. For specifics consult your platform’s manual.

 

Storage

The table below details the type and number of drives used. Each type of drive sets (DB, Temp DB, etc.…) are organized into RAID volumes, with the DB drives being evenly split into RAID0 volumes per IO controller. The only exception to this is the LOG volume which is configured as RAID10. The following are the recommended settings for the RAID volumes and can be set via the Intel® RAID Web Console 3 for Windows.

Drive Purpose RAID Level Stripe Size (KB) Read Ahead Write Back Cache Policy Disk Cache Policy
DB 0 64 No Write Through Enabled
Temp DB 0 64 No Write Through Enabled
Backups 0 64 Yes Write Through Enabled
Log 10 64 No Write Back Enabled

Network

This workload runs best with the following settings for all adapters connecting the Server to the Client:

The following PowerShell script is used as an example with ‘Ethernet 2’(say):

Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -RegistryKeyword '*JumboPacket' –

RegistryValue '9014'

Set-NetAdapterRss

-Name "Ethernet 2" -Enable 1 -BaseProcessorGroup 0 -BaseProcessorNumber 0 -MaxProcessors 16 -NumaNode 65535 -MaxProcessorGroup 0 -MaxProcessorNumber 16 -NumberOfReceiveQueues 8  -Profile Closest

 

In addition, the following must be set in SQL Server using the SQL Server sp_configure command.

sp_configure network_packet_size,8192

go

RECONFIGURE WITH OVERRIDE

Go

 

Software Tuning

Software configuration tuning is essential. From the Operating System to SQL Server configuration settings, they are all designed for general purpose applications and default settings are almost never tuned for best performance.

 

Microsoft Windows Server* 2022 settings

Configuration Item Recommended Value
Windows Firewall Control Panel -> Windows Firewall -> Tunr Windows Firewall on or off (left side of panel) -> select Turn Off Windows Firewall for both private and public networks
Visual Performance Setting Computer -> properties -> advanced system settings -> advanced -> performance -> settings -> visual: adjust best performance
Virtual Memory Setting  Computer -> properties -> Advanced tab, virtual memory: change custom size: 4096 intial and maximum (not system managed)
Power Profile Control Panel -> Hardware and Sound -> Power Options -> High Performance Option
Large Page Enable Large Page Enable (Reboot to take effect):  Control Panel -> Administrative tools -> local security policy -> local policy -> user rights assignment -> Lock pages in memory -> properties -> add user or group -> Administrators (or the user to be used with SQL Server)
Remove Windows Defender PowerShell command:  Remove-WindowsFeature - Name Windows-Defender-Features

 

OLTP Architecture

The TPC-E like workload consists of transactions that simulate the interchanges commonly associated with brokers, customers, and a real-time stock exchange. These transactions are intended to represent a balanced mixture of disk input/output and CPU usage. Performance is measured in transactions per second and reported as tpsE.

Example of an OLTP benchmarking hardware configuration:

 

 

 

Tuning SQL Server for OLTP Workload

The following sp_configure commands should be used to configure SQL Server:

sp_configure show_advanced_options,1

go

RECONFIGURE WITH OVERRIDE

go

sp_configure backup_compression,1

go

RECONFIGURE WITH OVERRIDE

go

sp_configure "default trace enabled",0

go

RECONFIGURE WITH OVERRIDE

go

sp_configure lightweight_pooling,1

go

RECONFIGURE WITH OVERRIDE

go

sp_configure max_degree_of_parallelism,1

go

RECONFIGURE WITH OVERRIDE

go

sp_configure max_serv,<This number should be equal to 90% of system memory>

go

RECONFIGURE WITH OVERRIDE

go

sp_configure max_worker_threads,3000

go

RECONFIGURE WITH OVERRIDE

go

sp_configure priority_boost,1

go

RECONFIGURE WITH OVERRIDE

go

sp_configure recovery_interval,32767

go

RECONFIGURE WITH OVERRIDE

go

sp_configure remote_query_timeout,0

go

RECONFIGURE WITH OVERRIDE

go

sp_configure set_working_set_size,1

go

RECONFIGURE WITH OVERRIDE

go

sp_configure network_packet_size,8192

go

RECONFIGURE WITH OVERRIDE

go

 

The following flags should be used on the command line when starting up SQL Server prior to running the OLTP workload.

Sqlservr.exe -x -c -T827 -T652 -T661 -T834 -T3502 -T3979 -T8040 -T8095 -T8099 -T8088 -T8744 -T9038 -T8101

 

Related Tools

Microsoft Windows Performance Monitor tool (Perfmon) can be used to monitor overall system and disk performance metrics.

 

Best Practices for Testing and Verification

This workload is best run just after a fresh restoration of the database from backup and restart of the system. This workload should drive CPU Utilization to 99% across all logical processors. CPU kernel utilization should be around 12%. This is an I/O intensive workload that is characterized by approximate 850K IOPs, Random Access, and a 90/10 Read/Write ratio. The clients for this workload initiate transactions with 500+ simultaneous connections to the system under test with 500+ transactions in flight. This workload is more sensitive to memory speed than capacity. Typically, this workload reaches peak performance with memory size equal to about 3% of the database size. For verification of the database integrity, refer to the TPCE kit mentioned in resources for details. The TPCE kit provides verification tools suitable for an audit per the TPCE benchmark spec.

 

Conclusion

We have shared our best-known methods to optimally benchmark our 4th Generation Intel Xeon Scalable Processors using an OLTP benchmark. We have covered both software and hardware configuration considerations to get the best performance.

 

Additional Resources

Microsoft SQL Server

Transaction Processing Council

TPCE benchmark kit version 1.14.0