Database

System Design

Database

Database Indexing

Database Indexing

Imagine you have a large library with thousands of books. Let’s now say a client walks in and requests a book by its title. Without any organization, you would have to search all the books until you find the one you are looking for. This process is inefficient and time-consuming.

However, if the library has a well-designed indexing system it gets very easy to locate any book. The indexing system will contain a catalog that lists all the books in alphabetical order along with their corresponding shelf numbers.

By consulting the index, you can easily pinpoint the exact location of a book request saving you time and effort.

What is database Indexing?

Database Indices work in a similar way. In the analogy above, the library represents a database, the books represent the data stored and the library index represents a database index.

A database index is a powerful tool that enables you to quickly retrieve records from a database file. An index is a data structure that improves the speed of the data retrieval process. An index in a database is also very similar to a book index.

Computer Example

Let’s say you have a large database that has a user’s table which contains information for about 1 million users.

ID

FirstName

LastName

Email

Country

1

James

Omollo

[email protected]

Kenya

2

Gbemisola

Oluwasegun

[email protected]

Nigeria

3

Trishna

Pranav

[email protected]

India

4

Esther

Coleman

[email protected]

UK

5

Zara

Shahidi

[email protected]

USA

Assuming Zara is the last record on our table when you query the database for her details it will take a long time since you will have to traverse through all the 1 million records(full table scan).

If the data was ordered alphabetically using the first name, searching could happen faster since we could jump down to halfway through the data to see if Zara comes before or after that row.

We could then half the remaining rows and make the same comparison by use of binary search. Indices help us create a sorted list without really creating sorted tables which would take a lot of space.

Let’s create an index and see how it maps back to the user table using the “FirstName” column. In this example, we will use Postgres Database.

CREATE INDEX users_firstName ON users(FirstName);

FirstName

ID

Esther

5

Gbemisola

2

James

1

Trishna

3

Zara

4

The Index has the user’s first names sorted in alphabetical order. When you perform the SQL Query below to get a user’s record using FirstName as the filter, the database can swiftly locate the relevant rows, rather than scanning the entire table sequentially.

The key will be used to quickly find the rows that match a specific FirstName. This indexing technique significantly improves query performance, especially when dealing with larger datasets.

SELECT FirstName From Users WHERE FirstName = 'Zara';

Remember, indexing is not limited to a single column. You can create Indices on multiple columns or even compound Indices that span across multiple columns, depending on your specific use cases and query patterns.

Index architectures

1. Clustered Index

Popular SQL databases like SQL servers automatically create a clustered index if a constraint or primary key is defined on a table. Each table can only have one clustered index based on the primary key.

The PK defines the order in which the data is physically stored on a disk. Use the SQL script below to create a user table and insert data in an SQL Server Database.

    CREATE TABLE users (
        ID INTEGER PRIMARY KEY NOT NULL,
        FirstName VARCHAR(50) NOT NULL,
        LastName VARCHAR(50) NOT NULL,
        Email VARCHAR(100) NOT NULL,
        Country VARCHAR(50) NOT NULL
    );
    INSERT INTO users (ID, FirstName, LastName, Email, Country) VALUES
           (2, 'Gbemisola', 'Oluwasegun', '[email protected]', 'Nigeria'),
           (3, 'Trishna', 'Pranav', '[email protected]', 'India'),
     (1, 'James', 'Omollo', '[email protected]', 'Kenya'),
           (5, 'Zara', 'Shahidi', '[email protected]', 'USA'),
           (4, 'Esther', 'Coleman', '[email protected]', 'UK')

When you query the database:

SELECT * FROM users

We get the user’s data sorted ascending by the ID(PK). Remember we did not insert the data ascending by the PK.

The database automatically creates a clustered index. Run the following command to confirm.

 Execute sp_helpindex users;

A clustered index determines the physical order of the data rows in a table. However, inserts and updates can be more expensive compared to non-clustered Indices, as they may require data to be physically rearranged.

2. Non-Clustered Index

Our library analogy is a good example of a non-clustered index. The data is stored in one place, the index in another place. The index contains pointers to where the data is stored. Since a non-clustered stores data separate from actual data, the table can contain more than one clustered index.

Just like how a book can have an index by chapters at the beginning and another index at the end. Inserts and updates on non-clustered Indexes are generally faster than on clustered Indices, as they don’t require rearranging the data.

Non-clustered Indexes are commonly used to optimize query performance for specific columns that are frequently accessed.

Types of Database Indices Implementations

Databases Indices make use of data structures and algorithms to access indexed data efficiently. The most popular ones are:

1. Binary-Tree(B-Tree)

This is the most popular database index data structure. It organizes data in a self-balancing tree structure, typically a binary search tree. It provides efficient searching, insertion, and deletion operations by keeping the tree balanced and minimizing the number of disk access required.

2. Hash

Hash indexes use a hash function to map keys to specific locations in the index. The hash function distributes the data uniformly across the index, allowing for fast lookup operations.

Hashing architectures are efficient for equality searches, but they may suffer from collisions, where multiple keys hash to the same location, requiring additional handling.

Benefits of database Indices

  1. Improves query performance for Frequently accessed columns: If certain columns are frequently used in queries, using searching, filtering, or join operations, it’s beneficial to index those columns to retrieve data quickly and improve query performance.

  2. Faster Data Retrieval for Large Tables: Indexing is very useful for large tables that contain a significant amount of records. Without using Indices, querying large tables can be very slow and resource intensive.

  3. Improved Scalability: Indexing improves scalability in a database system. As the amount of data grows, Indices provides a way to efficiently access and retrieve data ensuring that query performance remains consistent even with large datasets.

Drawbacks of Database Indices

While database Indexing offers numerous benefits, it has some the following drawbacks:

  1. Overhead during Data Modification: When performing data modifications such as creating, updating, or delete in a table that has an index, the Indices need to be updated to reflect the data changes. These operations incur some overhead. The more Indices a database has, the longer it takes to perform data modifications, potentially affecting write operations.

  2. Increased Storage Space: Indices require additional storage space to store the index data structures. Indices can consume a significant amount of disk space and the space requirement increases proportionally with the number of indexed columns and the size of indexed data.

Conclusion

Database indexing is an important technique you can add to your toolbox to improve query performance and data retrieval in databases. When you create Indices on frequently accessed columns, the database can quickly map and retrieve data based on the specified conditions resulting in faster execution times.

Indexing is particularly beneficial in large tables, complex queries involving multiple tables joins, and frequently accessed tables. It’s important to carefully consider columns and queries that would benefit most from indexing. Creating Indices for every column can result in unnecessary overhead and additional costs.

Whenever you're ready

There are 4 ways we can help you become a great backend engineer:

The MB Platform

Join 1000+ backend engineers learning backend engineering. Build real-world backend projects, learn from expert-vetted courses and roadmaps, track your learnings and set schedules, and solve backend engineering tasks, exercises, and challenges.

The MB Academy

The “MB Academy” is a 6-month intensive Advanced Backend Engineering BootCamp to produce great backend engineers.

Join Backend Weekly

If you like post like this, you will absolutely enjoy our exclusive weekly newsletter, Sharing exclusive backend engineering resources to help you become a great Backend Engineer.

Get Backend Jobs

Find over 2,000+ Tailored International Remote Backend Jobs or Reach 50,000+ backend engineers on the #1 Backend Engineering Job Board