Python Introduction
What is Python?
Python is a high-level, interpreted programming language known for its simplicity and readability. Guido van Rossum began designing it in the late 1980s and released the first version in 1991. Python has since become one of the most popular programming languages in the world, particularly for data science, machine learning, web development, and automation.
Python's philosophy emphasizes code readability with its notable use of significant whitespace (indentation). Its syntax allows programmers to express concepts in fewer lines of code than languages like C++ or Java.
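As a small illustration of significant whitespace, the sketch below relies on indentation alone to mark which statements belong to the if block. The variable name temperature and its value are made up for this example.

```python
temperature = 30  # hypothetical value, chosen only for illustration

if temperature > 25:
    # These two indented lines form the body of the if statement
    message = "It is warm"
    print(message)

# This line is not indented, so it runs whether or not the condition holds
print("Done checking")
```

Removing the indentation from the two lines under the if would raise an IndentationError, because Python would no longer find a body for the statement.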
Why Learn Python for Data Analysis?
Before diving into Pandas (a powerful data analysis library), understanding Python fundamentals is essential because:
- Python is the foundation: Pandas is built on top of Python
- Syntax understanding: You'll need to know basic Python syntax to write Pandas code
- Customization: Advanced data analysis often requires custom Python logic
- Problem-solving: Python's broad capabilities help in cleaning and transforming data
Installing Python
To get started with Python, you'll need to install it on your computer. Visit python.org to download the latest version.
To check if Python is already installed, open your terminal or command prompt and type:
python --version
or
python3 --version
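Once Python is installed, the version can also be checked from inside a script. This is a minimal sketch using the standard sys module; the names major and minor are just local variables chosen for the example.

```python
import sys

# sys.version_info describes the interpreter that is running this script
major = sys.version_info.major
minor = sys.version_info.minor
print(f"Running Python {major}.{minor}")
```

The number printed depends on the interpreter that runs the script, so it may differ between the python and python3 commands on the same machine.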
Python Basics
Comments
Comments in Python start with the # symbol and are ignored by the interpreter:
# This is a comment
print("Hello World") # This is an inline comment
Variables and Data Types
Variables in Python don't require explicit type declaration:
# Integer
age = 25
# Float
height = 5.9
# String
name = "John Doe"
# Boolean
is_student = True
# List
scores = [95, 87, 92, 78]
# Dictionary
person = {"name": "John", "age": 25, "city": "New York"}
# Print variable types
print(type(age)) # <class 'int'>
print(type(height)) # <class 'float'>
print(type(name)) # <class 'str'>
print(type(scores)) # <class 'list'>
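Because types are attached to values rather than to variable names, the same name can refer to values of different types over time. The short sketch below shows this, along with explicit conversion between types; the variable names are illustrative only.

```python
value = 42             # value currently refers to an int
print(type(value))     # <class 'int'>

value = "forty-two"    # the same name can later refer to a str
print(type(value))     # <class 'str'>

# Converting between types is done explicitly with int(), float(), and str()
count = int("7")       # str -> int
price = float("3.5")   # str -> float
label = str(100)       # int -> str
print(count + 1, price * 2, label + "%")  # 8 7.0 100%
```

Explicit conversion matters later on, since data read from files often arrives as text even when it represents numbers.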
Basic Operations
Python supports various operations on different data types:
# Arithmetic operations
a = 10
b = 3
print(a + b) # Addition: 13
print(a - b) # Subtraction: 7
print(a * b) # Multiplication: 30
print(a / b) # Division: 3.3333...
print(a // b) # Floor division: 3
print(a % b) # Modulus: 1
print(a ** b) # Exponentiation: 1000
# String operations
first_name = "John"
last_name = "Doe"
full_name = first_name + " " + last_name # String concatenation
print(full_name) # Output: John Doe
# List operations
numbers = [1, 2, 3, 4]
print(numbers[0]) # Accessing by index: 1
numbers.append(5) # Adding an element
print(numbers) # Output: [1, 2, 3, 4, 5]
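Dictionaries support a similar set of everyday operations. The sketch below reuses the person dictionary from the data types section; the email value is a made-up placeholder.

```python
person = {"name": "John", "age": 25, "city": "New York"}

print(person["name"])                 # Access a value by key: John
person["age"] = 26                    # Update an existing key
person["email"] = "john@example.com"  # Add a new key (hypothetical value)
print(person.get("phone"))            # get() returns None for a missing key
print("city" in person)               # Membership test on keys: True
print(len(person))                    # Number of key-value pairs: 4
```

Using get() instead of square brackets avoids a KeyError when a key might be absent.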
String Manipulation
Strings in Python are versatile and have many built-in methods:
text = "Hello, Python World!"
print(len(text)) # Length: 20
print(text.upper()) # HELLO, PYTHON WORLD!
print(text.lower()) # hello, python world!
print(text.replace("Hello", "Hi")) # Hi, Python World!
print(text.split(",")) # ['Hello', ' Python World!']
# String formatting
name = "Alice"
age = 30
print(f"My name is {name} and I am {age} years old.") # f-strings (Python 3.6+)
print("My name is {} and I am {} years old.".format(name, age)) # str.format()
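Cleaning up messy text is a common first step in data analysis. As a rough sketch, the example below combines split(), strip(), and join() on a made-up input string named raw.

```python
raw = "  apple, banana ,cherry  "  # hypothetical messy input

# Split on commas, then strip the whitespace around each piece
fruits = [part.strip() for part in raw.split(",")]
print(fruits)                            # ['apple', 'banana', 'cherry']

# join() is the counterpart of split(): it glues a list back into one string
joined = " | ".join(fruits)
print(joined)                            # apple | banana | cherry

print("banana" in raw)                   # Substring test: True
print(raw.strip().startswith("apple"))   # True
```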
Control Structures
Conditional Statements
x = 10
# If-else statement
if x > 10:
    print("x is greater than 10")
elif x == 10:
    print("x is equal to 10")
else:
    print("x is less than 10")
# Output: x is equal to 10
Loops
# For loop
for i in range(5):
    print(i, end=" ")
# Output: 0 1 2 3 4
print() # New line
# While loop
count = 0
while count < 5:
    print(count, end=" ")
    count += 1
# Output: 0 1 2 3 4
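In data work, loops usually run over collections rather than over a plain range of numbers. The sketch below assumes the scores list and person dictionary from the earlier sections and shows enumerate(), items(), and break.

```python
scores = [95, 87, 92, 78]

# enumerate() yields a running position alongside each element
for position, score in enumerate(scores, start=1):
    print(f"Score {position}: {score}")

person = {"name": "John", "age": 25, "city": "New York"}

# items() yields each key together with its value
for key, value in person.items():
    print(f"{key} -> {value}")

# break leaves the loop as soon as the condition is met
first_below_90 = None
for score in scores:
    if score < 90:
        first_below_90 = score
        break
print(first_below_90)  # 87
```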
Functions
Functions in Python are defined using the def keyword:
def greet(name):
    """This function greets the person passed in as a parameter"""
    return f"Hello, {name}!"
# Calling the function
message = greet("Python learner")
print(message) # Output: Hello, Python learner!
# Function with default parameters
def add_numbers(a=0, b=0):
    return a + b
print(add_numbers(5, 3)) # Output: 8
print(add_numbers(5)) # Output: 5
print(add_numbers()) # Output: 0
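Two further function features come up often: passing arguments by keyword and returning more than one value. The functions describe and min_and_max below are hypothetical helpers written only for this sketch.

```python
def describe(name, age, city="Unknown"):
    """Return a one-line description; city is optional."""
    return f"{name}, {age}, from {city}"

# Positional arguments, with the default used for city
print(describe("Alice", 30))                         # Alice, 30, from Unknown
# Keyword arguments can be given in any order
print(describe(age=30, name="Alice", city="Paris"))  # Alice, 30, from Paris

def min_and_max(numbers):
    """Return the smallest and largest value as a tuple."""
    return min(numbers), max(numbers)

# The returned tuple can be unpacked into separate variables
lowest, highest = min_and_max([95, 87, 92, 78])
print(lowest, highest)  # 78 95
```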
Real-World Python Application: Data Processing
Here's a practical example showing how Python can process data (a task you'll do more efficiently with Pandas later):
# Sample data: Monthly expenses
expenses = [
    {"month": "January", "rent": 1000, "utilities": 150, "groceries": 350, "entertainment": 100},
    {"month": "February", "rent": 1000, "utilities": 130, "groceries": 300, "entertainment": 120},
    {"month": "March", "rent": 1000, "utilities": 140, "groceries": 310, "entertainment": 90}
]
# Calculate total expenses per month
for month_data in expenses:
    total = sum(value for key, value in month_data.items() if key != "month")
    month_data["total"] = total
    print(f"{month_data['month']}: ${total}")
# Calculate average monthly expense for each category
categories = ["rent", "utilities", "groceries", "entertainment"]
for category in categories:
    avg = sum(month[category] for month in expenses) / len(expenses)
    print(f"Average {category}: ${avg:.2f}")
# Find the month with highest total expense
highest_month = max(expenses, key=lambda x: x["total"])
print(f"Month with highest expenses: {highest_month['month']} (${highest_month['total']})")
Output:
January: $1600
February: $1550
March: $1540
Average rent: $1000.00
Average utilities: $140.00
Average groceries: $320.00
Average entertainment: $103.33
Month with highest expenses: January ($1600)
This example demonstrates several Python concepts:
- List of dictionaries to store structured data
- Generator expressions (close relatives of list comprehensions) for calculations
- For loops to iterate through data
- The sum() function to calculate totals
- Dictionary manipulation
- String formatting to display results
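The sum(...) calls in the example take a generator expression, which looks like a list comprehension without the square brackets. As a minimal sketch of the difference, using one row of the expenses data:

```python
row = {"month": "January", "rent": 1000, "utilities": 150,
       "groceries": 350, "entertainment": 100}

# List comprehension: builds the full list in memory first
amounts = [value for key, value in row.items() if key != "month"]
print(amounts)       # [1000, 150, 350, 100]
print(sum(amounts))  # 1600

# Generator expression: feeds values to sum() one at a time, no list is built
total = sum(value for key, value in row.items() if key != "month")
print(total)         # 1600
```

Both forms give the same total. The generator form is a reasonable default when the intermediate list is not needed afterwards.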
Transitioning to Pandas
While Python's built-in data structures are powerful, analyzing large datasets with them becomes cumbersome. This is where Pandas comes in: it provides specialized data structures, DataFrame and Series, that make data manipulation more concise and more efficient.
Consider our expenses example above. In Pandas, the same analysis would be much more concise:
import pandas as pd
# Create a DataFrame
expenses_df = pd.DataFrame([
    {"month": "January", "rent": 1000, "utilities": 150, "groceries": 350, "entertainment": 100},
    {"month": "February", "rent": 1000, "utilities": 130, "groceries": 300, "entertainment": 120},
    {"month": "March", "rent": 1000, "utilities": 140, "groceries": 310, "entertainment": 90}
])
# Set month as index
expenses_df.set_index("month", inplace=True)
# Calculate total expenses per month
expenses_df["total"] = expenses_df.sum(axis=1)
print(expenses_df)
# Calculate average monthly expense for each category
print("\nAverage monthly expenses:")
print(expenses_df.mean())
# Find month with highest total expense
highest_month = expenses_df["total"].idxmax()
highest_amount = expenses_df.loc[highest_month, "total"]
print(f"\nMonth with highest expenses: {highest_month} (${highest_amount})")
Summary
In this introduction to Python, we've covered:
- Basic Python syntax and data types
- Variables and operations
- String manipulation
- Control structures (conditionals and loops)
- Functions
- A real-world data processing example
- How Python connects to Pandas
These fundamentals will serve as building blocks as you learn Pandas for data analysis. Understanding Python's syntax and data structures will make your journey with Pandas much smoother.
Exercises
- Create variables with different data types (int, float, string, boolean, list, dictionary) and print their types.
- Write a function that takes a list of numbers and returns their average.
- Create a dictionary to store information about your favorite book (title, author, year published, etc.) and write code to print each key-value pair.
- Write a program that asks a user for their name and age, then prints a greeting message that includes how many years until they turn 100.
- Create a list of dictionaries representing different products with name and price, then write code to:
- Print all products over a certain price
- Calculate the total price of all products
- Find the most expensive and least expensive products
Additional Resources