Kubernetes Jobs

Introduction

In Kubernetes, most workloads are designed to run continuously (like web servers or databases), but there are scenarios where you need to run a task that completes and terminates. This is where Kubernetes Jobs come in.

A Job creates one or more Pods that run until they successfully complete their assigned tasks. Jobs are perfect for batch processing, data migrations, sending emails, or any task that needs to run once to completion rather than continuously.

Unlike regular Pods or Deployments that restart when they complete, a Job tracks task completion and ensures the specified number of successful completions is achieved before considering the Job done.

Basic Job Concepts

Before diving into examples, let's understand the core concepts of Kubernetes Jobs:

One-time execution: Jobs run tasks that complete and terminate, rather than running indefinitely
Completion tracking: A Job tracks the successful completion of Pods
Parallelism: Jobs can run multiple Pods in parallel
Retries: Jobs can automatically retry failed Pods
Completion count: Jobs can be configured to run a specific number of completions

Creating Your First Job

Let's start with a simple Job that runs a Pod to print a message and then exits.

apiVersion: batch/v1
kind: Job
metadata:
  name: hello-job
spec:
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo", "Hello from Kubernetes Job!"]
      restartPolicy: Never

Let's save this file as hello-job.yaml and apply it to our cluster:

kubectl apply -f hello-job.yaml

You can check the status of your Job using:

kubectl get jobs

Output:

NAME        COMPLETIONS   DURATION   AGE
hello-job   1/1           8s         12s

To see the logs from the Pod created by the Job:

kubectl get pods --selector=job-name=hello-job

Output:

NAME              READY   STATUS      RESTARTS   AGE
hello-job-abcd1   0/1     Completed   0          2m

And to see the output of the Job:

kubectl logs $(kubectl get pods --selector=job-name=hello-job -o jsonpath='{.items[0].metadata.name}')

Output:

Hello from Kubernetes Job!

Job Completion and Parallelism

Jobs can be configured to run multiple times or in parallel. This is useful for batch processing tasks that can be parallelized.

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 5      # Run 5 successful pods
  parallelism: 2      # Run up to 2 pods in parallel
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing item $RANDOM && sleep 5"]
      restartPolicy: Never

In this example:

completions: 5 means the Job will create Pods until 5 of them have successfully completed
parallelism: 2 means the Job will run up to 2 Pods at the same time

Job Timeouts and Retries

It's a good practice to set timeouts for your Jobs to prevent them from running indefinitely if they encounter issues.

apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-timeout
spec:
  backoffLimit: 4           # Number of retries before job is considered failed
  activeDeadlineSeconds: 100 # The job can run for a maximum of 100 seconds
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Starting work && sleep 60 && echo Work completed"]
      restartPolicy: OnFailure

In this example:

backoffLimit: 4 specifies that the Job will retry failed Pods up to 4 times
activeDeadlineSeconds: 100 means the entire Job will be terminated if it runs for more than 100 seconds, regardless of its completion state
restartPolicy: OnFailure tells Kubernetes to restart the Pod if it fails, rather than creating a new one

Real-World Example: Database Migration Job

Let's look at a practical example: using a Job to run a database migration.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  template:
    spec:
      containers:
      - name: migration
        image: flyway/flyway:latest
        args:
        - -url=jdbc:postgresql://my-db-service:5432/mydb
        - -user=postgres
        - -password=secretpassword
        - migrate
        env:
        - name: FLYWAY_LOCATIONS
          value: filesystem:/flyway/sql
        volumeMounts:
        - name: migration-scripts
          mountPath: /flyway/sql
      volumes:
      - name: migration-scripts
        configMap:
          name: db-migration-scripts
      restartPolicy: Never
  backoffLimit: 2

This Job uses Flyway (a database migration tool) to apply migration scripts to a PostgreSQL database. The migration scripts are stored in a ConfigMap for easy management.

Job Patterns

1. Work Queue Pattern

For processing items in a queue:

2. Indexed Job Pattern

For assigning specific indices to each pod in a batch job:

apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-job
spec:
  completions: 5
  parallelism: 3
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: indexed-container
        image: busybox
        command: ["sh", "-c", "echo My index is ${JOB_COMPLETION_INDEX}"]
      restartPolicy: Never

In this example, each Pod gets a unique index (0-4) in the JOB_COMPLETION_INDEX environment variable.

CronJobs: Scheduled Jobs

While Jobs run once, Kubernetes also offers CronJobs that can schedule Jobs to run at specific times:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cron
spec:
  schedule: "*/5 * * * *"  # Run every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "Hello from scheduled job!"]
          restartPolicy: OnFailure

This CronJob runs a new Job every 5 minutes.

Job Best Practices

Always set a backoffLimit to control the number of retries
Set appropriate timeouts (activeDeadlineSeconds) to prevent infinite runs
Use labels for easy identification and filtering of Jobs
Clean up completed Jobs to avoid cluster clutter:

kubectl delete jobs --field-selector status.successful=1

Use CronJobs for recurring tasks rather than manually creating Jobs
Set appropriate resource requests and limits to ensure your Jobs get the resources they need

TTL Controller for Automatic Cleanup

Kubernetes provides TTL controller for cleaning up finished Jobs:

apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-example
spec:
  ttlSecondsAfterFinished: 100  # Delete job 100 seconds after it finishes
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo", "This job will be auto-deleted after completion"]
      restartPolicy: Never

Practical Exercises

Exercise 1: Create a Job that processes a list of URLs and checks if they are reachable
Exercise 2: Create a Job that generates a report by processing data from a persistent volume
Exercise 3: Implement a CronJob that performs regular database backups

Summary

Kubernetes Jobs provide a powerful way to run batch processes, one-time tasks, or any workload that needs to run to completion. Key points to remember:

Jobs manage the execution of Pods that run to completion
They track successful completions and can retry failed Pods
Jobs can run multiple Pods in parallel for faster processing
CronJobs allow scheduling Jobs to run at specific times
Setting proper timeouts and cleanup policies is essential for production use

Additional Resources

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Basic Job Concepts​

Creating Your First Job​

Job Completion and Parallelism​

Job Timeouts and Retries​

Real-World Example: Database Migration Job​

Job Patterns​

1. Work Queue Pattern​

2. Indexed Job Pattern​

CronJobs: Scheduled Jobs​

Job Best Practices​

TTL Controller for Automatic Cleanup​

Practical Exercises​

Summary​

Additional Resources​

Introduction

Basic Job Concepts

Creating Your First Job

Job Completion and Parallelism

Job Timeouts and Retries

Real-World Example: Database Migration Job

Job Patterns

1. Work Queue Pattern

2. Indexed Job Pattern

CronJobs: Scheduled Jobs

Job Best Practices

TTL Controller for Automatic Cleanup

Practical Exercises

Summary

Additional Resources