MySQL Index Selectivity
In your journey to master MySQL performance optimization, understanding index selectivity is crucial. This concept can dramatically impact how efficiently your queries run and ultimately determine whether your database performs like a sports car or crawls like a turtle in rush hour traffic.
What is Index Selectivity?
Index selectivity (or cardinality) refers to the ratio of unique values in an indexed column to the total number of rows in the table. It's expressed as a number between 0 and 1, where:
- High selectivity (closer to 1): Many unique values relative to total rows
- Low selectivity (closer to 0): Few unique values relative to total rows
The formula for calculating index selectivity is:
Selectivity = Number of Unique Values / Total Number of Rows
Why Index Selectivity Matters
MySQL's query optimizer uses selectivity to determine whether to use an index for a query. Indexes with higher selectivity are generally more efficient because they narrow down the result set more effectively.
Consider this analogy: Finding a person by their unique national ID number (high selectivity) versus finding all people with brown hair (low selectivity). The first search is much more targeted!
Measuring Index Selectivity in MySQL
Let's look at how to check the selectivity of your indexes:
SELECT
COUNT(DISTINCT column_name) / COUNT(*) AS selectivity
FROM
table_name;
For example, to check the selectivity of an email column:
SELECT
COUNT(DISTINCT email) / COUNT(*) AS email_selectivity
FROM
customers;
Output example:
+-----------------+
| email_selectivity |
+-----------------+
| 0.9973 |
+-----------------+
This result (0.9973) indicates very high selectivity, making it an excellent candidate for indexing.
High vs. Low Selectivity Examples
High Selectivity Columns
Columns with high selectivity typically include:
- Primary keys - Always unique, so selectivity = 1
- Email addresses - Usually unique for each user
- Username - Often unique by design
- Phone numbers - Generally unique per customer
Creating indexes on these columns is very effective:
CREATE INDEX idx_customer_email ON customers(email);
Low Selectivity Columns
Columns with low selectivity include:
- Status flags (active/inactive) - Usually just 2 values
- Gender - Typically few distinct values
- Category types - Often limited set of values
- Boolean fields - Only true/false values
For example, a gender
column might have a selectivity like this:
SELECT
COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity
FROM
users;
Output example:
+------------------+
| gender_selectivity |
+------------------+
| 0.0003 |
+------------------+
With such low selectivity, an index on this column alone would be inefficient in most cases.
Composite Index Selectivity
Sometimes combining low selectivity columns can create a more selective index:
CREATE INDEX idx_category_status ON products(category_id, status);
To measure composite index selectivity:
SELECT
COUNT(DISTINCT CONCAT(category_id, ':', status)) / COUNT(*) AS composite_selectivity
FROM
products;