SQL Pt 2: Data Types

Janu Sung
7 min readJan 24, 2021

Intro

If you’re working with the backend of an application, a well-designed database is essential for efficient and effective queries. Part of this design is thinking about the type of data the database will store. This article is an overview of some commonly seen datatypes one can use within a SQL relational database.

Data types define the characteristics of the data that can be stored in a table’s column. It defines the possible set of values that are accepted — this allows for data within the column to be stored in a consistent and understood format. By knowing the data type within a column, it allows users to know which calculations and formulations can be applied to values within the column. Furthermore, it allows the database to sort the data in a way that is logical, increasing efficiency.

Data types affect storage and performance.

Different data types require different space allotments — some take up a lot of space, others not so much. This space allotment directly impacts the processing speed of the database — by appropriately defining the type of data used, the less time the database has to infer values or convert them, effectively increasing performance value.

For smaller databases, this may not seem too important, however, when dealing with larger databases with thousands of users, these small degrees of difference summate to large performance losses or gains.

The process of ‘right-sizing’ goes further into this concept and aims to appropriately set the data type for a given field based on business questions and the organization’s future direction — essentially answering the question: is this data type the appropriate type for the needs of this application/business? In order to provide the right answer, we need to know what datatypes are available and their unique characteristics.

String Data Types

String data types only convey text info, and thus is appropriately also known as an ‘alphanumerical’ data type. They cannot perform computations or calculations, and must be written within quotes “”.

CHAR (byte size) — Character Data Type

  • Is a fixed storage data type: this means that depending on how the user sets the byte size, the value within the field will always carry a byte size as defined by the user.

Example:

CHAR(5) → will set whatever value is placed within the field to a size of 5 bytes, even if the value doesn’t require the full 5 bytes. Conversely, it will only allow values that require 5 bytes or less.

‘James’ → has a length of 5 and therefore takes up 5 bytes of size and will be accepted as a field value.

‘Bob’ → has a length of 3 and requires 3 bytes of size, however it will be set to take up 5 bytes of size.

‘Timothy’ → has a length of 7 and exceeds the 5 bytes of size allotted to the field and therefore will not be stored.

  • CHAR can store a maximum value of 255 bytes and tends to be faster than VARCHAR, but less responsive.

VARCHAR (byte size) — Variable Character Data Type

  • Is a variable storage data type: this means a user can set a maximum byte size for the value, however, unlike CHAR, will only take up the necessary amount of space required to store the value.

Example:

VARCHAR (5) → will set a maximum byte size to 5 bytes and will accept any value within that limit.

‘Angie’ → has a size of 5 bytes and will be accepted as a field value taking up 5 bytes of space.

‘Beth’ → has a size of 4 bytes and will be accepted as a field value taking up 4 bytes of space.

‘Serena’ → has a size of 6 bytes and will not be accepted as a field.

  • VARCHAR can store a maximum value of 65,535 bytes and tends to be more responsive to data insertion but slower on retrieval.

NVARCHAR (byte size) — Variable Character Data Type for UNICODE

  • Is similar to VARCHAR, however can support multiple languages. It can store up to 8,000 bytes, with each UNICODE character taking two bytes (twice the size of VARCHAR), leaving a maximum of 4,000 characters.

ENUM (pre-set val-1, pre-set val-2, … pre-set val-n) — Enumerate Data Type

  • Will provide pre-set values for the field. The value must be within the provided values or will not be accepted.

Example:

Lets say a user is filling out a gender column and can opt for ‘male’, ‘female’ or ‘neutral’.

ENUM (‘M’, ‘F’, ‘N’) → will only allow for “M’, ‘F’, ’N’ as possible values. Any other value will throw an error.

Integer Data Types

An ‘integer’ is considered a whole number, negative or positive, without any decimals. While discussing these data types, we should keep in mind the two ways of defining the range of the data type: signed and unsigned.

Signed: implies negative values are included.

Unsigned: implies that negative values are excluded.

*queries are default set to signed

*We’ll get to decimals in the next section on Fixed and Floating-point Datatypes.

TINYINT

  • Size allotment: 1
  • Signed range: -128 to 127
  • Unsigned range: 0 to 255

SMALLINT

  • Size allotment: 2
  • Signed range: -32,768 to 32,767
  • Unsigned range: 0 to 65,535

MEDIUMINT

  • Size allotment: 3
  • Signed range: -8,388,608 to 8,388,607
  • Unsigned range: 0 to 16,777,215

INT

  • Size allotment: 4 bytes
  • Signed range: -2,147,483,648 to 8,388,607
  • Unsigned range: 0 to 16,777,215

BIGINT

  • Size allotment: 8 bytes
  • Signed range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
  • Unsigned range: 0 to 18,446,744,073,709,551,615

Fixed and Floating-point Data Types

A ‘fixed’ or ‘floating-point’ contains a decimal. Fixed point data represents exact values and sets an exact space allotment similarly to CHAR. Floating point data represents an approximation and sets a maximum space allotment similarly to VARCHAR. Both are fractional representations of a number and are defined by two traits: precision and scale.

Precision: refers to the number of digits within the number (including after the decimal).

Scale: refers to the number of digits within the number after the decimal.

DECIMAL (precision, scale) or NUMERIC (precision, scale)

  • The decimal data type is synonymous with the numeric data type and are fixed point data types. They will always take up the # of digits described in both precision and scale.

Example:

DECIMAL (5,3) → Entered Value: 10.5 → will return → 10.500

  • Since precision is set to 5, the total number of digits must amount to 5, and similarly since scale is set to 3, two zeros are added to the end of the entered value in order to fulfill this condition.

DECIMAL (5,3) → Entered Value: 10.57869 → will return → 10.579

  • In this instance, the entered value exceeds the allotted precision and scale, therefore the value will be rounded to a number that fits both conditions.

*when only a single value is entered in the parenthesis, it will treat the scale value as zero → DECIMAL (5) = DECIMAL (5,0)

FLOAT (precision, scale) and DOUBLE (precision, scale)

  • Both the float and double data types are floating point data types. They will represent an approximation, it aims to deliver a balance between range and precision. Double is similar to float, except that it allows for much larger numbers.

FLOAT:

  • Takes 4 bytes to be stored = size
  • Precision: single (based off size)
  • Maximum # of digits: 23

DOUBLE :

  • Takes 8 bytes to be stored = size
  • Precision: double (based off size)
  • Maximum # of digits: 53

Time Data Types

DATE

  • Used to represent a date in the format YYY-MM-DD
  • Time is not included in this representation
  • Range of DATE:
  • 1st of January 1000–31st of December 9999

DATETIME

  • Used to represent date + time
  • YYY-MM-DD HH:MM:SS
  • These data points represent date + time precisely

TIMESTAMP

  • Used for a well-defined, exact point in time
  • Range of TIMESTAMP:
  • 1st of January 1970 UTC — 19th of January 2038, 3:14:07 UTC (Coordinated Universal Time)
  • Timestamp records the moment in time as the number of seconds passed after the 1st of January 1970 00:00
  • Representing a moment in time as a number allows you to easily obtain the difference between two TIMESTAMP values
  • So if you have a start time and end time, you can find the difference to find the duration of the event
  • TIMESTAMP is appropriate if you need to handle time zones

*Time data types must be writing within quotes “”

Special Mention: BLOB

The BLOB datatype stands for Binary Large Object — it refers to a file of binary data with 1’s and 0’s. It can store a maximum size of 65,535 bytes.

The BLOB data type is very helpful for saving files in your database, to be used later on in your application, like a profile picture or music file.

Such files include:

  • .doc
  • .xlsx
  • .XML
  • .jpg
  • .wav

All these data types might seem like a jumble of ‘things’ but the sooner you start to use them within your applications, the sooner you’ll start to see their role within the design of your database and the efficiency/capability of your backend as a whole. Hope this overview of some common SQL data types helps you out in your next app and growth.

--

--

Janu Sung

Just another one of those dreamers with a sparkle in his eyes.