Neo4j Basics

Introduction

Neo4j is a popular graph database management system that belongs to the NoSQL family. Unlike traditional relational databases that store data in tables, Neo4j uses a graph structure to represent and store data. This makes it well suited to managing highly connected data and to queries that follow relationships across several hops, which in a relational database would require chains of joins.

In this tutorial, we'll explore the fundamentals of Neo4j, understand its core concepts, and learn how to perform basic operations using the Cypher query language.

What is a Graph Database?

At its core, a graph database stores data in nodes (entities) and relationships (connections between entities). This structure closely mirrors how we naturally think about many domains: as objects connected by various relationships.

Key Components of Neo4j

Nodes - Represent entities (like a person, place, or thing)
Relationships - Connect nodes and have a type and direction
Properties - Key-value pairs that can be attached to both nodes and relationships
Labels - Used to group nodes into sets
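As a minimal sketch, the four components above can be seen together in a single Cypher pattern. The Author and Book labels, the WROTE type, and all property values here are made up for illustration and are not used elsewhere in this tutorial:

```cypher
// Two nodes, each with one label (Author, Book) and its own properties.
// One relationship with a type (WROTE), a direction (author to book),
// and a property of its own (year).
CREATE (a:Author {name: 'Example Author'})-[:WROTE {year: 2020}]->(b:Book {title: 'Example Book'})
RETURN a, b
```

Round brackets denote nodes, square brackets denote relationships, and the arrow gives the direction.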


Setting Up Neo4j

Before diving into code examples, you'll need to set up Neo4j. The easiest way to get started is to use Neo4j Desktop or Neo4j Sandbox.

Using Neo4j Desktop

Download and install Neo4j Desktop
Create a new project
Add a database to your project
Start the database

Using Neo4j Sandbox

Neo4j Sandbox provides a free, temporary Neo4j instance in the cloud:

Visit Neo4j Sandbox
Create an account or sign in
Launch a blank sandbox

Basic Cypher Query Language

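Cypher is Neo4j's query language, designed to be visually intuitive and easy to understand. Patterns are written the way a graph is drawn: nodes go in round brackets, relationships in square brackets, and arrows show direction, as in (a)-[:KNOWS]->(b). Let's explore some basic operations.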

Creating Nodes

To create a simple node:

CREATE (n:Person {name: 'John', age: 30})

This creates a node with the label Person and two properties: name and age.

To create multiple nodes at once:

CREATE (a:Person {name: 'Alice', age: 25}),
       (b:Person {name: 'Bob', age: 27}),
       (c:Company {name: 'Acme', founded: 2010})
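When the nodes share a label, another option is to feed a list of maps through UNWIND, which is often how batches of rows are loaded. This is only a sketch; the names and ages below are hypothetical sample data:

```cypher
// Each map in the list becomes one row, and each row creates one Person node
UNWIND [
  {name: 'Carol', age: 31},
  {name: 'Dave', age: 45}
] AS row
CREATE (p:Person {name: row.name, age: row.age})
RETURN p.name, p.age
```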

Creating Relationships

To create relationships between nodes:

MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:FRIEND_OF]->(b)

This creates a FRIEND_OF relationship from Alice to Bob.

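You can also create nodes and relationships in a single query. Keep in mind that CREATE always makes new nodes, so running this after the earlier examples adds a second Alice node and a second Acme node rather than reusing the existing ones: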

CREATE (a:Person {name: 'Alice', age: 25})-[:WORKS_AT]->(c:Company {name: 'Acme'})
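If the nodes may already exist, MERGE is the usual alternative to CREATE: it matches the pattern when it is present and creates it only when it is missing. The following is a sketch of the same Alice and Acme example written that way, assuming name is what identifies a person or a company:

```cypher
// Reuse the Alice node if one exists; otherwise create it and set the age
MERGE (a:Person {name: 'Alice'})
  ON CREATE SET a.age = 25
// Same for the company
MERGE (c:Company {name: 'Acme'})
// Merging the relationship between two bound nodes avoids duplicate WORKS_AT links
MERGE (a)-[:WORKS_AT]->(c)
RETURN a.name, c.name
```

Note that if several Alice nodes are already present, the first MERGE matches all of them, so this works best when names are kept unique.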

Querying Data

To retrieve all Person nodes:

MATCH (p:Person)
RETURN p

To find specific nodes:

MATCH (p:Person {name: 'Alice'})
RETURN p

To query relationships:

MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN p.name, c.name

Example output:

╒══════════╤══════════╕
│ p.name   │ c.name   │
╞══════════╪══════════╡
│ "Alice"  │ "Acme"   │
└──────────┴──────────┘
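Queries can also filter, sort, and limit results. As a small sketch building on the Person nodes created earlier (the age threshold and the limit are arbitrary choices):

```cypher
// People aged 26 or older, oldest first, at most ten rows
MATCH (p:Person)
WHERE p.age >= 26
RETURN p.name AS name, p.age AS age
ORDER BY age DESC
LIMIT 10
```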

Updating Properties

To update node properties:

MATCH (p:Person {name: 'John'})
SET p.age = 31
RETURN p
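SET can do more than change a single value. The sketch below updates several properties at once with +=, adds a second label, and then removes a property again; the city property and the Employee label are hypothetical and only serve the illustration:

```cypher
MATCH (p:Person {name: 'John'})
// += adds or overwrites the listed properties and leaves all others untouched
SET p += {age: 32, city: 'Berlin'}
// Labels are added with SET as well
SET p:Employee
// REMOVE deletes a property (or a label) from the node
REMOVE p.city
RETURN p
```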

Deleting Nodes and Relationships

To delete a relationship:

MATCH (a:Person {name: 'Alice'})-[r:FRIEND_OF]->(b:Person {name: 'Bob'})
DELETE r

To delete a node together with all of its relationships, use DETACH DELETE. A plain DELETE fails if the node still has relationships attached:

MATCH (p:Person {name: 'John'})
DETACH DELETE p
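If you would rather not remove relationships implicitly, one cautious variant is to delete the node only when it has none. This is a sketch of that idea, not a replacement for DETACH DELETE:

```cypher
// Delete John only if no relationship of any type or direction is attached
MATCH (p:Person {name: 'John'})
WHERE NOT (p)--()
DELETE p
```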

Practical Example: Building a Movie Recommendation System

Let's build a simple movie recommendation database to demonstrate Neo4j's power with connected data.

Step 1: Create the Schema

// Create Movie nodes
CREATE (matrix:Movie {title: 'The Matrix', released: 1999, tagline: 'Welcome to the Real World'})
CREATE (cloudAtlas:Movie {title: 'Cloud Atlas', released: 2012, tagline: 'Everything is Connected'})
CREATE (forrestGump:Movie {title: 'Forrest Gump', released: 1994, tagline: 'Life is like a box of chocolates'})

// Create Person nodes
CREATE (keanu:Person {name: 'Keanu Reeves', born: 1964})
CREATE (tomHanks:Person {name: 'Tom Hanks', born: 1956})
CREATE (halleBerry:Person {name: 'Halle Berry', born: 1966})

// Create relationships
CREATE (keanu)-[:ACTED_IN {roles: ['Neo']}]->(matrix)
CREATE (keanu)-[:ACTED_IN {roles: ['Robert Frobisher', 'Hae-Joo Chang']}]->(cloudAtlas)
CREATE (tomHanks)-[:ACTED_IN {roles: ['Forrest Gump']}]->(forrestGump)
CREATE (tomHanks)-[:ACTED_IN {roles: ['Dr. Henry Goose', 'Isaac Sachs']}]->(cloudAtlas)
CREATE (halleBerry)-[:ACTED_IN {roles: ['Luisa Rey', 'Jocasta Ayrs']}]->(cloudAtlas)
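To check what Step 1 produced, a query along these lines lists every ACTED_IN relationship with its roles. The column aliases are our own choice:

```cypher
// One row per ACTED_IN relationship, grouped by movie for readability
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN m.title AS movie, p.name AS actor, r.roles AS roles
ORDER BY movie, actor
```

The variables used in Step 1 (keanu, matrix, and so on) exist only while that query runs, so the statements there need to be executed together as one query.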

Step 2: Query for Movie Recommendations

Now, let's find movie recommendations based on actors:

// Find movies that Keanu Reeves acted in
MATCH (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(movie:Movie)
RETURN movie.title;

// Find co-actors (people who acted in the same movies as Keanu)
MATCH (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name, movie.title;

// Find recommendations: Movies that co-actors acted in but Keanu didn't
MATCH (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(coActor:Person),
      (coActor)-[:ACTED_IN]->(recommendation:Movie)
WHERE NOT (keanu)-[:ACTED_IN]->(recommendation)
RETURN DISTINCT recommendation.title, coActor.name;

Example output of the last query:

╒══════════════════════╤═══════════════╕
│ recommendation.title │ coActor.name  │
╞══════════════════════╪═══════════════╡
│ "Forrest Gump"       │ "Tom Hanks"   │
└──────────────────────┴───────────────┘
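On a larger graph, the same pattern can rank recommendations by how many co-actors lead to each movie. The following is a sketch; the alias sharedCoActors and the limit of five are our own choices:

```cypher
// Count the distinct co-actors that connect Keanu to each candidate movie
MATCH (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person),
      (coActor)-[:ACTED_IN]->(recommendation:Movie)
WHERE NOT (keanu)-[:ACTED_IN]->(recommendation)
RETURN recommendation.title AS title,
       count(DISTINCT coActor) AS sharedCoActors
ORDER BY sharedCoActors DESC, title
LIMIT 5
```

Movies reached through more co-actors sort first, which is a simple stand-in for relevance.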

Advanced Features

Indexes

Indexes speed up queries that look up nodes by a property value, such as finding the starting node of a MATCH by name. To create an index:

CREATE INDEX FOR (p:Person) ON (p.name)
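In practice it can help to name the index and make its creation repeatable. The sketch below assumes a reasonably recent Neo4j version (4.4 or 5.x); the index name person_name_index is our own choice:

```cypher
// Named index; IF NOT EXISTS makes the statement safe to run more than once
CREATE INDEX person_name_index IF NOT EXISTS
FOR (p:Person) ON (p.name);

// List the indexes in the database, including their state
SHOW INDEXES;

// PROFILE runs the query and shows its plan; an index lookup appears as NodeIndexSeek
PROFILE MATCH (p:Person {name: 'Alice'}) RETURN p;
```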

Constraints

Constraints ensure data integrity. To create a uniqueness constraint:

CREATE CONSTRAINT FOR (m:Movie) REQUIRE m.title IS UNIQUE
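As with indexes, a constraint can be named and created idempotently. This sketch assumes Neo4j 4.4 or later, and the constraint name movie_title_unique is hypothetical:

```cypher
// Creation fails if duplicate titles already exist, so clean those up first
CREATE CONSTRAINT movie_title_unique IF NOT EXISTS
FOR (m:Movie) REQUIRE m.title IS UNIQUE;

// With the constraint in place, MERGE reuses the existing node rather than duplicating it
MERGE (m:Movie {title: 'The Matrix'})
RETURN m.title, m.released;
```

Once the constraint exists, a CREATE that would introduce a second movie with the same title is rejected with an error.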

Path Finding

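Neo4j excels at finding paths between nodes. The query below assumes that Person nodes for both actors exist, as they do in Neo4j's built-in Movies example dataset; the small graph built in this tutorial does not include them: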

// Find the shortest path between two actors
MATCH p=shortestPath((bacon:Person {name: 'Kevin Bacon'})-[*]-(meg:Person {name: 'Meg Ryan'}))
RETURN p
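The same idea can be tried on the data from the movie example. The sketch below restricts the search to ACTED_IN relationships and caps the path length at six hops (an arbitrary bound), which keeps the search cheap on larger graphs:

```cypher
// Shortest chain of shared movies between two actors from Step 1
MATCH p = shortestPath(
  (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN*..6]-(tom:Person {name: 'Tom Hanks'})
)
RETURN length(p) AS hops,
       // People have a name and movies have a title, so take whichever is set
       [n IN nodes(p) | coalesce(n.name, n.title)] AS stops
```

With the Step 1 data, the only movie the two actors share is Cloud Atlas, so the path runs through it.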

When to Use Neo4j

Neo4j is particularly well-suited for:

Social networks - Modeling users, friendships, follows, etc.
Recommendation engines - "People who bought X also bought Y"
Fraud detection - Identifying suspicious patterns in connected data
Knowledge graphs - Representing complex domains with many entity types and relationships
Network and IT operations - Modeling infrastructure dependencies

Summary

In this tutorial, we've explored the fundamentals of Neo4j and graph databases:

Neo4j represents data as nodes (entities) and relationships
Cypher is a powerful query language for working with graph data
Basic operations include creating, reading, updating, and deleting nodes and relationships
Neo4j excels at traversing relationships and finding paths between entities

Graph databases like Neo4j offer a flexible and intuitive way to model and query highly connected data. As your applications grow and require more complex relationship-based queries, Neo4j provides a powerful alternative to traditional relational databases.

Additional Resources

Practice Exercises

Extend the movie database by adding more movies, actors, and relationship types (like DIRECTED, PRODUCED, etc.)
Create a query to find all actors who have worked with the same director more than once
Build a small social network model with users, posts, comments, and friendships
Implement a query to recommend friends based on mutual connections
Try modeling a real-world domain of your choice (e.g., a transportation network, organization chart, or product catalog)
