Python Data Types
When working with Pandas, you'll be handling lots of data. Before diving into Pandas' complex data structures, it's essential to understand Python's built-in data types that form the foundation of data manipulation.
Introduction
Python is a dynamically typed language, which means you don't need to declare a variable's type when creating it. Python determines the data type at runtime from the assigned value. Understanding these basic data types is crucial because Pandas builds upon them to create more complex data structures.
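As a minimal sketch of what dynamic typing means in practice, the snippet below rebinds one name to values of three different types. The variable names are illustrative only.

```python
# The same name can be rebound to values of different types.
value = 42
first_type = type(value).__name__    # name of the int class

value = "forty-two"
second_type = type(value).__name__   # name of the str class

value = 42.0
third_type = type(value).__name__    # name of the float class

print(first_type, second_type, third_type)  # int str float

# isinstance() is the usual way to check a type at runtime.
is_float = isinstance(value, float)
print(is_float)  # True
```

The type belongs to the value, not to the name, which is why no declaration is needed.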
Let's explore the fundamental data types you'll encounter when working with Python and later with Pandas.
Numeric Data Types
Python has three built-in numeric data types: integers, floating-point numbers, and complex numbers.
Integers
Integers are whole numbers without a decimal point.
# Integer examples
age = 25
count = -10
zero = 0
print(type(age))
print(age)
Output:
<class 'int'>
25
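Two integer behaviors are worth seeing early: Python integers have no fixed size limit, and the `/` operator always returns a float while `//` performs floor division. A short sketch with illustrative variable names:

```python
# Integers grow as large as needed; there is no overflow.
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

true_div = 7 / 2     # true division always returns a float
floor_div = 7 // 2   # floor division keeps an integer result
remainder = 7 % 2    # modulo gives the remainder
neg_floor = -7 // 2  # floor division rounds toward negative infinity

print(true_div)   # 3.5
print(floor_div)  # 3
print(remainder)  # 1
print(neg_floor)  # -4
```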
Floating-Point Numbers
Floating-point numbers (floats) are numbers with a decimal point.
# Float examples
height = 5.11
temperature = -2.5
pi_value = 3.14159
print(type(height))
print(height)
Output:
<class 'float'>
5.11
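Floats are stored in binary, so some decimal values cannot be represented exactly. The sketch below shows the classic example and one common way to compare floats safely, using `math.isclose` from the standard library.

```python
import math

total = 0.1 + 0.2
print(total)          # 0.30000000000000004
print(total == 0.3)   # False: exact comparison fails

# Compare with a tolerance instead of using ==.
close_enough = math.isclose(total, 0.3)
print(close_enough)   # True

# Rounding is useful for display, not for exact arithmetic.
rounded = round(total, 2)
print(rounded)        # 0.3
```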
Complex Numbers
Complex numbers have a real and imaginary part, with the imaginary part written with a "j" suffix.
# Complex number examples
complex_num = 3 + 4j
print(type(complex_num))
print(complex_num)
print(complex_num.real) # Access the real part
print(complex_num.imag) # Access the imaginary part
Output:
<class 'complex'>
(3+4j)
3.0
4.0
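Complex numbers also support ordinary arithmetic. As a small illustration, `abs()` returns the magnitude and `conjugate()` flips the sign of the imaginary part:

```python
complex_num = 3 + 4j

magnitude = abs(complex_num)          # sqrt(3**2 + 4**2)
conjugate = complex_num.conjugate()   # same real part, negated imaginary part
product = complex_num * conjugate     # magnitude squared, as a complex number

print(magnitude)  # 5.0
print(conjugate)  # (3-4j)
print(product)    # (25+0j)
```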
String Data Type
Strings are sequences of characters enclosed in quotes (single, double, or triple quotes).
# String examples
name = 'John'
message = "Python is fun"
multi_line = """This is a
multi-line
string"""
print(type(name))
print(name)
print(message)
print(multi_line)
Output:
<class 'str'>
John
Python is fun
This is a
multi-line
string
Common String Operations
Strings in Python come with many built-in operations:
text = "Python for Data Analysis"
# String methods
print(text.upper())
print(text.lower())
print(text.split())
print(len(text))
print('Data' in text) # Check if 'Data' exists in the string
Output:
PYTHON FOR DATA ANALYSIS
python for data analysis
['Python', 'for', 'Data', 'Analysis']
24
True
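Strings are immutable, so every string method returns a new string and leaves the original untouched. The following sketch, with illustrative sample text, shows a few cleaning operations that come up often when preparing data:

```python
text = "  Python for Data Analysis  "

cleaned = text.strip()                           # remove surrounding whitespace
replaced = cleaned.replace("Data", "Numerical")  # returns a new string
words = cleaned.split()                          # split on whitespace
rejoined = "-".join(words)                       # join a list back into a string

print(cleaned)        # Python for Data Analysis
print(replaced)       # Python for Numerical Analysis
print(rejoined)       # Python-for-Data-Analysis
print(cleaned[:6])    # Python   (slicing works like lists)
print(cleaned[-8:])   # Analysis

# The original string is unchanged by all of the above.
print(len(text))      # 28
```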
Boolean Data Type
Boolean values represent either True or False. They are commonly used for logical operations and control flow.
# Boolean examples
is_active = True
has_permission = False
print(type(is_active))
print(is_active)
print(has_permission)
# Boolean operations
print(True and False) # Logical AND
print(True or False) # Logical OR
print(not True) # Logical NOT
Output:
<class 'bool'>
True
False
False
True
False
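Two further details about Booleans are useful to know: every Python object has a truth value, and `bool` is a subclass of `int`, so `True` counts as 1 in arithmetic. A brief sketch with made-up sample data:

```python
# Empty or zero-like values are falsy; most other values are truthy.
falsy = [bool(0), bool(""), bool([]), bool(None)]
truthy = [bool(1), bool("a"), bool([0])]
print(falsy)   # [False, False, False, False]
print(truthy)  # [True, True, True]

# Comparisons produce Booleans.
is_adult = 25 >= 18
print(is_adult)  # True

# Because True behaves like 1, summing Booleans counts the True values.
flags = [True, False, True, True]
true_count = sum(flags)
print(true_count)              # 3
print(isinstance(True, int))   # True
```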
Collection Data Types
Python offers several collection data types that are essential for data manipulation, especially when working with Pandas.
Lists
Lists are ordered, mutable collections that can contain items of different data types.
# List examples
numbers = [1, 2, 3, 4, 5]
mixed_list = [1, "Hello", 3.14, True]
print(type(numbers))
print(numbers)
print(mixed_list)
# Accessing elements
print(numbers[0]) # First element
print(numbers[-1]) # Last element
print(numbers[1:3]) # Slicing (elements at index 1 and 2)
# Modifying lists
numbers.append(6) # Add an element
numbers.extend([7, 8]) # Add multiple elements
numbers.remove(3) # Remove an element
numbers.insert(2, 10) # Insert an element at index 2
print(numbers)
Output:
<class 'list'>
[1, 2, 3, 4, 5]
[1, 'Hello', 3.14, True]
1
5
[2, 3]
[1, 2, 10, 4, 5, 6, 7, 8]
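Because lists are mutable, assigning a list to a second name does not copy it; both names refer to the same object. The sketch below contrasts an alias with a shallow copy made by `list.copy()`. The variable names are illustrative.

```python
original = [1, 2, 3]
alias = original           # second name for the same list object
shallow = original.copy()  # new list with the same elements

alias.append(4)     # also visible through `original`
shallow.append(99)  # affects only the copy

print(original)           # [1, 2, 3, 4]
print(shallow)            # [1, 2, 3, 99]
print(alias is original)  # True
print(shallow is original)  # False
```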
Tuples
Tuples are ordered but immutable collections that can contain items of different data types.
# Tuple examples
coordinates = (10, 20)
mixed_tuple = (1, "Hello", 3.14)
print(type(coordinates))
print(coordinates)
# Accessing elements
print(coordinates[0])
print(coordinates[1])
# Tuples are immutable, so this will raise an error:
# coordinates[0] = 5 # TypeError: 'tuple' object does not support item assignment
# Tuple unpacking
x, y = coordinates
print(f"x: {x}, y: {y}")
Output:
<class 'tuple'>
(10, 20)
10
20
x: 10, y: 20
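One practical consequence of immutability is that tuples of hashable values can serve as dictionary keys, while lists cannot. A minimal sketch, where the `labels` mapping and its contents are invented for illustration:

```python
# Tuples can be dictionary keys because they are hashable.
labels = {(0, 0): "origin", (10, 20): "checkpoint"}
label = labels[(10, 20)]
print(label)  # checkpoint

# Lists are mutable and therefore unhashable, so this raises TypeError.
try:
    {[10, 20]: "checkpoint"}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# A one-element tuple needs a trailing comma.
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```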
Dictionaries
Dictionaries are mutable collections of key-value pairs. Since Python 3.7 they preserve insertion order, and they are highly efficient for lookups by key.
# Dictionary examples
person = {
    "name": "John",
    "age": 30,
    "city": "New York"
}
print(type(person))
print(person)
# Accessing values
print(person["name"])
print(person.get("age")) # Safe way to access (returns None if key doesn't exist)
# Modifying dictionaries
person["email"] = "[email protected]" # Add a new key-value pair
person["age"] = 31 # Update a value
print(person)
# Dictionary methods
print(person.keys()) # Get all keys
print(person.values()) # Get all values
print(person.items()) # Get all key-value pairs
Output:
<class 'dict'>
{'name': 'John', 'age': 30, 'city': 'New York'}
John
30
{'name': 'John', 'age': 31, 'city': 'New York', 'email': '[email protected]'}
dict_keys(['name', 'age', 'city', 'email'])
dict_values(['John', 31, 'New York', '[email protected]'])
dict_items([('name', 'John'), ('age', 31), ('city', 'New York'), ('email', '[email protected]')])
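A few more dictionary operations appear constantly in data work: supplying a default for a missing key, testing membership, iterating over pairs, and removing a key. The following is a small sketch; the `"phone"` key and the `"unknown"` default are illustrative.

```python
person = {"name": "John", "age": 30, "city": "New York"}

# get() accepts a default to return when the key is missing.
phone = person.get("phone", "unknown")
print(phone)  # unknown

# `in` tests for keys, not values.
has_age = "age" in person
print(has_age)  # True

# items() yields (key, value) pairs in insertion order.
pairs = [f"{key}={value}" for key, value in person.items()]
print(pairs)  # ['name=John', 'age=30', 'city=New York']

# pop() removes a key and returns its value.
removed = person.pop("city")
print(removed)        # New York
print(list(person))   # ['name', 'age']
```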
Sets
Sets are unordered collections of unique elements. They are useful for membership testing and eliminating duplicate entries.
# Set examples
fruits = {"apple", "banana", "orange", "apple"} # Note: duplicates are automatically removed
print(type(fruits))
print(fruits) # Note how duplicates are removed
# Modifying sets
fruits.add("grape") # Add an element
fruits.remove("banana") # Remove an element
print(fruits)
# Set operations
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
print(set1.union(set2)) # Elements in either set
print(set1.intersection(set2)) # Elements in both sets
print(set1.difference(set2)) # Elements in set1 but not in set2
Output (sets are unordered, so the element order in your output may differ):
<class 'set'>
{'apple', 'orange', 'banana'}
{'apple', 'orange', 'grape'}
{1, 2, 3, 4, 5, 6}
{3, 4}
{1, 2}
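The set methods above also have operator forms, and sets are a quick way to deduplicate a list. Since a set does not keep order, the sketch below also shows one common alternative, `dict.fromkeys`, for when first-seen order matters. The `visits` data is invented for illustration.

```python
visits = ["alice", "bob", "alice", "carol", "bob"]

unique = set(visits)  # duplicates removed, order not guaranteed
print(len(unique))      # 3
print(sorted(unique))   # ['alice', 'bob', 'carol'] (sorted for stable display)

# dict keys are unique and keep insertion order, so this preserves first-seen order.
ordered_unique = list(dict.fromkeys(visits))
print(ordered_unique)   # ['alice', 'bob', 'carol']

# Operator forms of the set methods.
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
print(set1 | set2 == set1.union(set2))         # True
print(set1 & set2 == set1.intersection(set2))  # True
print(set1 - set2 == set1.difference(set2))    # True

# Elements in exactly one of the two sets.
either_not_both = set1 ^ set2
print(sorted(either_not_both))  # [1, 2, 5, 6]
```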
None Type
Python has a special data type, None, that represents the absence of a value or a null value.
# None example
value = None
print(type(value))
print(value)
print(value is None) # The correct way to check for None
Output:
<class 'NoneType'>
None
True
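The reason `is None` is preferred over a plain truthiness test is that other values, such as 0 or an empty string, are also falsy. The sketch below uses a hypothetical helper, `find_score`, to show the difference:

```python
def find_score(scores, name):
    """Return the score stored for name, or None if there is no entry."""
    return scores.get(name)

scores = {"Alice": 0, "Bob": 82}  # illustrative data

alice = find_score(scores, "Alice")  # 0: a real value that happens to be falsy
carol = find_score(scores, "Carol")  # None: no entry at all

print(alice is None)  # False
print(carol is None)  # True

# A truthiness check cannot tell the two cases apart.
print(not alice)  # True
print(not carol)  # True
```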
Data Type Conversion
Python allows you to convert between different data types using built-in functions.
# Type conversion examples
num_str = "42"
num_int = int(num_str) # Convert string to integer
num_float = float(num_str) # Convert string to float
back_to_str = str(num_int) # Convert integer to string
print(num_str, type(num_str))
print(num_int, type(num_int))
print(num_float, type(num_float))
print(back_to_str, type(back_to_str))
# Converting collections
list_example = [1, 2, 3, 2, 1]
tuple_from_list = tuple(list_example)
set_from_list = set(list_example) # Removes duplicates
print(tuple_from_list)
print(set_from_list)
Output:
42 <class 'str'>
42 <class 'int'>
42.0 <class 'float'>
42 <class 'str'>
(1, 2, 3, 2, 1)
{1, 2, 3}
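Conversions can fail or behave in ways that surprise newcomers. The sketch below shows a few cases, together with a hypothetical helper, `to_int`, that returns a default instead of raising when a string is not a valid integer. It only handles `ValueError`; other inputs such as `None` would still raise `TypeError`.

```python
# int() truncates floats toward zero; it does not round.
print(int(3.99))    # 3
print(int(-3.99))   # -3
print(round(3.99))  # 4

# Any non-empty string is truthy, including "False".
print(bool("False"))  # True
print(bool(""))       # False

def to_int(text, default=None):
    """Convert text to int, returning default if it is not a valid integer string."""
    try:
        return int(text)
    except ValueError:
        return default

good = to_int("42")
bad = to_int("3.99")               # int() rejects strings with a decimal point
fallback = to_int("abc", default=0)
via_float = int(float("3.99"))     # convert through float first

print(good, bad, fallback, via_float)  # 42 None 0 3
```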
Practical Application: Student Data Management
Let's see how these data types might be used together to manage student data, which is a common scenario before using Pandas for more advanced data management.
# Student data management using Python data types
students = [
    {
        "id": 1,
        "name": "Alice",
        "scores": [85, 90, 78],
        "grade": "A",
        "active": True
    },
    {
        "id": 2,
        "name": "Bob",
        "scores": [75, 82, 79],
        "grade": "B",
        "active": True
    },
    {
        "id": 3,
        "name": "Charlie",
        "scores": [92, 95, 88],
        "grade": "A",
        "active": False
    }
]
# Calculate average score for each student
for student in students:
    avg_score = sum(student["scores"]) / len(student["scores"])
    student["average"] = round(avg_score, 2)
    print(f"{student['name']}'s average score: {student['average']}")
# Find active students with an A grade
a_grade_active = [s["name"] for s in students if s["grade"] == "A" and s["active"]]
print(f"Active students with A grade: {a_grade_active}")
# Store unique grades
grades = {s["grade"] for s in students}
print(f"Unique grades: {grades}")
# Create a dictionary mapping student IDs to names
id_to_name = {s["id"]: s["name"] for s in students}
print(f"ID to name mapping: {id_to_name}")
Output:
Alice's average score: 84.33
Bob's average score: 78.67
Charlie's average score: 91.67
Active students with A grade: ['Alice']
Unique grades: {'A', 'B'}
ID to name mapping: {1: 'Alice', 2: 'Bob', 3: 'Charlie'}
This example showcases how different Python data types work together. We used:
- Lists to store the collection of student records
- Dictionaries for each student's data
- Lists inside dictionaries for storing scores
- Strings for names and grades
- Booleans for tracking active status
- A set comprehension to collect the unique grades
- List comprehensions and dictionary comprehensions to extract specific data
Relevance to Pandas
Understanding these Python data types is crucial for working with Pandas because:
- Pandas' DataFrame columns often correspond to these basic Python types:
  - Numbers (int, float) for numerical data
  - Strings for text data
  - Booleans for binary/logical data
- Pandas' Series objects are similar to Python lists but with additional functionality, such as labeled indexes and vectorized operations.
- Pandas' DataFrames can be conceptualized as dictionaries of Series (columns).
- When extracting data from Pandas objects, you'll often get values that behave like these Python types (frequently as NumPy scalars such as numpy.int64 rather than a plain int).
- Converting between different data types is a common operation in data analysis.
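To make the "dictionary of columns" idea concrete without using Pandas itself, here is a plain-Python sketch that turns a list of row dictionaries into a dictionary of column lists. This is only an analogy for how a DataFrame is organized, not Pandas' actual implementation; the `records` data is illustrative and the code assumes every row has the same keys.

```python
# Row-oriented data: one dictionary per record.
records = [
    {"name": "Alice", "grade": "A", "active": True},
    {"name": "Bob", "grade": "B", "active": True},
    {"name": "Charlie", "grade": "A", "active": False},
]

# Column-oriented data: one list per field, keyed by field name.
columns = {key: [row[key] for row in records] for key in records[0]}

print(columns["name"])    # ['Alice', 'Bob', 'Charlie']
print(columns["active"])  # [True, True, False]

# Each column holds values of a single type, much like a typed DataFrame column.
column_types = {key: type(values[0]).__name__ for key, values in columns.items()}
print(column_types)  # {'name': 'str', 'grade': 'str', 'active': 'bool'}
```

A dictionary of equal-length lists like `columns` is also one of the input shapes that `pandas.DataFrame` accepts.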
Summary
In this lesson, we covered Python's fundamental data types:
- Numeric Types: Integers, floating-point numbers, and complex numbers
- Strings: Text sequences with various operations
- Booleans: True/False values for logical operations
- Collections: Lists, tuples, dictionaries, and sets
- None Type: Representing the absence of a value
- Type Conversion: Converting between different data types
These data types form the building blocks for more complex data structures in Pandas. Understanding how they work and interact with each other will greatly simplify your journey into data analysis with Pandas.
Exercises
- Create a list containing different data types (string, integer, float, boolean).
- Write a function that counts the occurrences of each data type in a list.
- Create a dictionary to store information about a book (title, author, year, genres).
- Use a set to find unique words in a paragraph.
- Create a nested data structure representing a simple e-commerce order with customer details, products, and quantities.
By mastering these fundamental data types, you're now ready to explore how Pandas extends them to create powerful data structures for data analysis.