Skip to main content

MySQL Globalization

In today's interconnected world, applications often need to support multiple languages and cultural formats. MySQL provides robust globalization features that enable developers to create databases that work seamlessly across different languages and regions. This guide will take you through MySQL's globalization capabilities, helping you build applications that can be used worldwide.

What is MySQL Globalization?

MySQL globalization refers to the set of features that enable MySQL to handle different languages, character sets, sorting rules, time zones, and regional formats. These capabilities are essential for creating applications that can be used by people worldwide, regardless of their language or location.

The main components of MySQL globalization include:

  • Character Sets: Encoding systems that define how characters are stored and represented
  • Collations: Rules that determine how characters are compared and sorted
  • Internationalization (i18n): Designing databases to handle various languages and regions
  • Localization (l10n): Adapting databases for specific languages and regional requirements

Character Sets and Encodings

Understanding Character Sets

A character set is a collection of symbols and encoding rules used to represent textual data. MySQL supports numerous character sets, with UTF-8 (specifically utf8mb4) being the recommended choice for most applications.

Common MySQL Character Sets

Character SetDescriptionBest Use Case
utf8mb4UTF-8 encoding with full Unicode support (includes emojis)Modern multilingual applications
latin1Western European languagesLegacy applications with Western European languages only
asciiOnly ASCII characters (0-127)When storage space is critical and only ASCII is needed
binaryBinary string storageFor storing binary data

Setting and Checking Character Sets

You can set character sets at various levels: server, database, table, column, or connection.

Server Level:

sql
-- Check server character set defaults
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

Database Level:

sql
-- Create database with specific character set
CREATE DATABASE multilingual_app
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;

-- Change existing database
ALTER DATABASE multilingual_app
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;

Table Level:

sql
-- Create table with specific character set
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100),
description TEXT
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Alter existing table
ALTER TABLE products
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;

Column Level:

sql
-- Specify character set for individual columns
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
notes TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
);

Connection Level:

sql
-- Set character set for current connection
SET NAMES 'utf8mb4';
-- Or more explicitly
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;

Understanding Collations

Collations are rules that determine how characters are compared and sorted. They are crucial for operations like ORDER BY, GROUP BY, and string comparisons.

Collation Naming Convention

MySQL collation names follow a specific pattern:

character_set_language_sensitivity

For example, in utf8mb4_unicode_ci:

  • utf8mb4 is the character set
  • unicode indicates it follows Unicode rules
  • ci means case-insensitive (vs. cs for case-sensitive)

Common Collation Types

SuffixMeaningExample
_ciCase-insensitive'a' = 'A'
_csCase-sensitive'a' ≠ 'A'
_binBinarySorts by code points
_generalGeneral language rules
_unicodeUnicode Collation Algorithm

Practical Collation Example

sql
-- Create a multilingual products table
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
description TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
category VARCHAR(100) COLLATE utf8mb4_swedish_ci -- Swedish-specific sorting
);

-- Insert multilingual data
INSERT INTO products (name, description, category) VALUES
('Café espresso', 'Strong Italian coffee', 'Drycker'), -- "Drinks" in Swedish
('Grüne Tee', 'Organic green tea from Japan', 'Drycker'),
('Crème brûlée', 'Classic French dessert', 'Efterrätt'); -- "Dessert" in Swedish

-- Query with proper sorting (Swedish rules for category)
SELECT * FROM products ORDER BY category, name;

The result will show items sorted according to Swedish rules for the category column and Unicode rules for the name column.

Best Practices for Character Sets and Collations

  1. Use utf8mb4 for new applications: It supports the full Unicode character set, including emojis and special characters.

  2. Be consistent: Use the same character set and collation throughout your database schema to avoid conversion issues.

  3. Choose appropriate collations:

    • Use utf8mb4_unicode_ci for general international usage
    • Use language-specific collations (like utf8mb4_swedish_ci) when sorting according to specific language rules is important
  4. Set character sets at all levels:

    sql
    -- In your MySQL configuration file (my.cnf or my.ini)
    [client]
    default-character-set = utf8mb4

    [mysql]
    default-character-set = utf8mb4

    [mysqld]
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci
  5. Check for conversion issues:

    sql
    -- Find columns with potential character set issues
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'your_database'
    AND CHARACTER_SET_NAME IS NOT NULL
    ORDER BY TABLE_NAME, COLUMN_NAME;

Time Zones and Date Handling

Setting MySQL Server Time Zone

The server time zone affects how TIMESTAMP values are stored and retrieved:

sql
-- Check current time zone setting
SELECT @@global.time_zone, @@session.time_zone;

-- Set global time zone
SET GLOBAL time_zone = 'UTC';

-- Set session time zone
SET time_zone = 'Europe/Paris';

Working with Different Time Zones

sql
-- Create a table storing timestamps
CREATE TABLE global_events (
id INT AUTO_INCREMENT PRIMARY KEY,
event_name VARCHAR(200),
event_time TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert an event
INSERT INTO global_events (event_name, event_time)
VALUES ('Virtual Conference', '2023-10-15 14:30:00');

-- Query the event in different time zones
SELECT
event_name,
event_time,
CONVERT_TZ(event_time, 'UTC', 'America/New_York') AS new_york_time,
CONVERT_TZ(event_time, 'UTC', 'Asia/Tokyo') AS tokyo_time,
CONVERT_TZ(event_time, 'UTC', 'Europe/London') AS london_time
FROM global_events;

Internationalization of MySQL Applications

Storing Translated Content

There are two common approaches for storing translations:

  1. Single Table with Language Code
sql
CREATE TABLE product_descriptions (
product_id INT,
language_code CHAR(2),
description TEXT,
PRIMARY KEY (product_id, language_code)
);

-- Insert translations
INSERT INTO product_descriptions VALUES
(1, 'en', 'Organic coffee from Colombia'),
(1, 'es', 'Café orgánico de Colombia'),
(1, 'fr', 'Café biologique de Colombie');

-- Query for specific language
SELECT description FROM product_descriptions
WHERE product_id = 1 AND language_code = 'es';
  1. Separate Tables for Each Language
sql
CREATE TABLE products_base (
id INT PRIMARY KEY,
price DECIMAL(10,2),
sku VARCHAR(50)
);

CREATE TABLE products_en (
product_id INT PRIMARY KEY,
name VARCHAR(200),
description TEXT,
FOREIGN KEY (product_id) REFERENCES products_base(id)
);

CREATE TABLE products_es (
product_id INT PRIMARY KEY,
name VARCHAR(200),
description TEXT,
FOREIGN KEY (product_id) REFERENCES products_base(id)
);

Handling Number and Currency Formats

MySQL provides functions to format numbers according to locale conventions:

sql
-- Create a table for products with international pricing
CREATE TABLE international_products (
id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
);

-- Insert sample data
INSERT INTO international_products VALUES
(1, 'Laptop', 1299.99),
(2, 'Smartphone', 899.50);

-- Query with different number formats
SELECT
name,
price,
FORMAT(price, 2, 'en_US') AS us_format, -- 1,299.99
FORMAT(price, 2, 'de_DE') AS german_format, -- 1.299,99
CONCAT('$', FORMAT(price, 2)) AS us_currency,
CONCAT('€', FORMAT(price, 2, 'de_DE')) AS euro_currency
FROM international_products;

Real-world Application: Multilingual E-commerce Database

Let's design a simplified schema for a multilingual e-commerce site:

sql
-- Create database with proper character set and collation
CREATE DATABASE ecommerce_global
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;

USE ecommerce_global;

-- Products table (language-neutral data)
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY,
sku VARCHAR(50) UNIQUE NOT NULL,
price DECIMAL(10,2) NOT NULL,
image_url VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Product translations table
CREATE TABLE product_translations (
product_id INT NOT NULL,
language_code CHAR(2) NOT NULL,
name VARCHAR(200) NOT NULL,
description TEXT,
PRIMARY KEY (product_id, language_code),
FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE
);

-- Categories with translations
CREATE TABLE categories (
id INT AUTO_INCREMENT PRIMARY KEY,
parent_id INT NULL,
FOREIGN KEY (parent_id) REFERENCES categories(id) ON DELETE SET NULL
);

CREATE TABLE category_translations (
category_id INT NOT NULL,
language_code CHAR(2) NOT NULL,
name VARCHAR(100) NOT NULL,
PRIMARY KEY (category_id, language_code),
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE CASCADE
);

-- Product-category relationships
CREATE TABLE product_categories (
product_id INT NOT NULL,
category_id INT NOT NULL,
PRIMARY KEY (product_id, category_id),
FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE,
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE CASCADE
);

-- Orders with location-aware data
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
currency_code CHAR(3) NOT NULL DEFAULT 'USD',
exchange_rate DECIMAL(10,6) NOT NULL DEFAULT 1.0,
total_amount DECIMAL(10,2) NOT NULL,
country_code CHAR(2) NOT NULL,
time_zone VARCHAR(40) NOT NULL DEFAULT 'UTC'
);

Sample Data and Queries

sql
-- Insert products
INSERT INTO products (sku, price) VALUES
('LAPTOP-01', 1299.99),
('PHONE-01', 899.50);

-- Insert product translations
INSERT INTO product_translations VALUES
(1, 'en', 'Professional Laptop', 'High-performance laptop for professionals'),
(1, 'es', 'Laptop Profesional', 'Laptop de alto rendimiento para profesionales'),
(1, 'fr', 'Ordinateur Portable Professionnel', 'Ordinateur portable haute performance pour les professionnels'),
(2, 'en', 'Smartphone', 'Latest model smartphone with advanced features'),
(2, 'es', 'Teléfono inteligente', 'Teléfono de última generación con características avanzadas'),
(2, 'fr', 'Smartphone', 'Smartphone de dernière génération avec des fonctionnalités avancées');

-- Query products in different languages
SELECT
p.id,
p.sku,
p.price,
COALESCE(pt_en.name, pt_any.name) AS name_en,
COALESCE(pt_es.name, pt_any.name) AS name_es
FROM products p
LEFT JOIN product_translations pt_en ON p.id = pt_en.product_id AND pt_en.language_code = 'en'
LEFT JOIN product_translations pt_es ON p.id = pt_es.product_id AND pt_es.language_code = 'es'
LEFT JOIN product_translations pt_any ON p.id = pt_any.product_id
GROUP BY p.id;

-- Currency conversion example
SELECT
p.id,
pt.name,
p.price AS usd_price,
p.price * 0.92 AS eur_price,
p.price * 1.36 AS cad_price,
FORMAT(p.price, 2, 'en_US') AS formatted_usd,
FORMAT(p.price * 0.92, 2, 'de_DE') AS formatted_eur
FROM products p
JOIN product_translations pt ON p.id = pt.product_id
WHERE pt.language_code = 'en';

Solving Common Globalization Challenges

1. Converting Data Between Character Sets

sql
-- Check if there are any conversion issues
SELECT
CONVERT('Café' USING utf8mb4) AS original,
CHAR_LENGTH(CONVERT('Café' USING utf8mb4)) AS char_length;

-- Convert data from one character set to another
ALTER TABLE legacy_products MODIFY
product_name VARCHAR(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

2. Handling Sorting for Special Characters

sql
-- Create a table with names
CREATE TABLE contacts (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) COLLATE utf8mb4_unicode_ci
);

-- Insert names with special characters
INSERT INTO contacts (name) VALUES
('Anna'), ('Ánna'), ('Bella'), ('Carlos'),
('Çetin'), ('David'), ('Élodie'), ('François');

-- Different sorting behaviors
SELECT name FROM contacts ORDER BY name COLLATE utf8mb4_general_ci;
SELECT name FROM contacts ORDER BY name COLLATE utf8mb4_spanish_ci;
SELECT name FROM contacts ORDER BY name COLLATE utf8mb4_swedish_ci;

3. Migrating from latin1 to utf8mb4

sql
-- Step 1: Back up your database!

-- Step 2: Check current encoding
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'your_database'
AND DATA_TYPE IN ('varchar', 'char', 'text', 'enum', 'set')
ORDER BY TABLE_NAME, COLUMN_NAME;

-- Step 3: Modify character set at database level
ALTER DATABASE your_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Step 4: Modify each table
ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Step 5: Check for any remaining latin1 columns
-- (Repeat the check query from Step 2)

Summary

MySQL's globalization features provide a robust foundation for building applications that work seamlessly across languages and regions. By properly implementing character sets, collations, and time zone handling, you can create databases that provide an excellent experience for users worldwide.

Key takeaways:

  1. Use utf8mb4 character set for new applications to ensure full Unicode support
  2. Choose appropriate collations based on your language requirements
  3. Be consistent with character sets and collations across your database
  4. Design your schema with internationalization in mind from the beginning
  5. Consider user experience factors like time zones and number formatting

With these practices, your MySQL applications will be ready to serve users from around the world with a localized, natural experience.

Additional Resources

  • MySQL Character Set Documentation: The official MySQL documentation covers character sets and collations in detail.
  • Unicode Consortium: For understanding the Unicode standard and its implementation.
  • Internationalization (i18n) Best Practices: Various resources on designing applications for global audiences.

Exercises

  1. Create a database for a blog that supports content in multiple languages.
  2. Write a query to display product prices in different currency formats based on a user's country.
  3. Implement a solution for storing and retrieving user profiles with names in different scripts (Latin, Cyrillic, Chinese, etc.).
  4. Create a date/time display function that shows event times in a user's local time zone.
  5. Design a schema for a multilingual documentation system where each document can exist in multiple language versions.


If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)