Air-Gapped Deployment#
This guide covers deploying Scout in air-gapped environments where production nodes do not have internet access.
Overview#
Scout supports air-gapped deployments through a staging node architecture. Ansible automatically deploys a K3s cluster and Harbor registry on the staging node, which acts as a proxy between the internet and your production cluster.
Architecture#
┌──────────────┐         ┌──────────────────────────────┐
│   Internet   │────────▶│         Staging Node         │
└──────────────┘         │  - K3s cluster (standalone)  │
                         │  - Harbor registry proxy     │
                         │  - Squid forward proxy       │
                         └──────────────────────────────┘
                                        │
┌───────────────────────┐               │ HTTPS (registry)
│ Ansible Control Node  │               │
│     ("Jump Node")     │               │
│  - Helm CLI           │               │
│  - Scout repo         │               │
│  - Kubeconfig access  │               │
└───────────┬───────────┘               │
            │ K8s API                   │
            │                           │
            ▼                           ▼
┌─────────────────────────────────────────────────────────┐
│                 Production K3s Cluster                  │
│  - No internet access                                   │
│  - Pulls images via Harbor                              │
│  - Managed by Ansible from jump node                    │
└─────────────────────────────────────────────────────────┘
How it works:
Harbor pull-through proxy automatically caches container images from the internet
Production nodes pull images through Harbor without needing internet access
K3s artifacts are downloaded to the Ansible control node and distributed to air-gapped production nodes
SELinux packages are downloaded via a Kubernetes Job on the staging cluster
Requirements#
Operating System#
Critical: Production K3s nodes must run Rocky Linux 9 (or compatible RHEL 9-based distribution).
This requirement exists because air-gapped installations download SELinux packages (k3s-selinux and container-selinux) from Rancher’s repository using a Kubernetes Job that runs Rocky Linux 9 containers. The downloaded packages must match the production node OS to ensure compatibility.
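A quick pre-flight check on each production node can catch an OS mismatch before deployment. The following is a minimal sketch, not part of Scout; the function name `is_el9` is hypothetical, and it assumes the standard `ID`, `ID_LIKE`, and `VERSION_ID` fields of `/etc/os-release`.

```shell
# Hypothetical helper: succeeds when the given os-release file describes
# a RHEL 9-compatible distribution (for example Rocky Linux 9).
is_el9() {
  local os_release="${1:-/etc/os-release}"
  (
    # The subshell keeps the sourced variables out of the caller's environment.
    . "$os_release" || exit 1
    # Major version must be 9 (VERSION_ID is "9" or "9.x").
    case "${VERSION_ID:-}" in 9|9.*) ;; *) exit 1 ;; esac
    # Either the distribution is RHEL itself or it declares RHEL lineage.
    case " ${ID:-} ${ID_LIKE:-} " in *" rhel "*) exit 0 ;; *) exit 1 ;; esac
  )
}
```

Running `is_el9 && echo "OS is RHEL 9 compatible"` on a node reports whether it meets the requirement.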
Staging Node#
Internet access for downloading artifacts and container images
Separate physical or virtual machine from production cluster
Sufficient storage for Harbor registry cache (recommend 100Gi+)
SSH access from Ansible control node
Rocky Linux 9 (recommended for consistency, but not strictly required)
Production Nodes#
Rocky Linux 9 (required)
No internet access needed
Network connectivity to staging node Harbor registry
SSH access from Ansible control node
Ansible Control Node#
Network access to both staging and production K8s API servers (port 6443)
kubectl command-line tool installed
Ansible kubernetes.core collection installed
Internet access (for downloading Helm charts and k3s artifacts to control node)
Network Connectivity#
Production nodes → Staging Harbor (HTTPS, typically port 443 but configurable in the inventory)
Production nodes → Staging Squid proxy (TCP port 3128, for IdP authentication)
Ansible control → Staging K8s API (port 6443)
Ansible control → Production K8s API (port 6443)
Ansible control → All nodes (SSH, port 22)
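The connectivity paths listed above can be verified before deployment. This is a minimal sketch, assuming `bash` and coreutils `timeout` are available on the node; `check_port` is a hypothetical helper and is not part of Scout.

```shell
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be opened.
# usage: check_port HOST PORT [TIMEOUT_SECONDS]
check_port() {
  # bash's /dev/tcp pseudo-device opens a TCP socket; timeout bounds the attempt.
  timeout "${3:-3}" bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$1" "$2" 2>/dev/null
}

# Example checks, using the hostnames from this guide's sample inventory:
#   From a production node:
#     check_port staging.example.edu 443  && echo "Harbor reachable"
#     check_port staging.example.edu 3128 && echo "Squid reachable"
#   From the Ansible control node:
#     check_port staging.example.edu 6443     && echo "Staging K8s API reachable"
#     check_port prod-server.example.edu 6443 && echo "Production K8s API reachable"
```

A failed check only shows that the port is unreachable from that node; it does not say whether a firewall, routing, or the service itself is the cause.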
Staging Host Configuration#
Adding Staging to Inventory#
Add a staging group to your inventory.yaml:
staging:
  hosts:
    staging.example.edu:
      ansible_host: staging
      ansible_python_interpreter: /usr/bin/python3
  vars:
    # K3s cluster join token for staging cluster
    staging_k3s_token: !vault |
      $ANSIBLE_VAULT;1.1;AES256
      ...encrypted token...
    # Harbor admin password
    harbor_admin_password: !vault |
      $ANSIBLE_VAULT;1.1;AES256
      ...encrypted password...
    # Storage size for Harbor registry cache
    harbor_storage_size: 100Gi
Required Variables#
| Variable | Description |
|---|---|
| staging_k3s_token | Cluster join token for the staging K3s cluster (separate from production) |
| harbor_admin_password | Admin password for Harbor web UI and API |
| harbor_storage_size | Persistent volume size for cached images (recommend 100Gi minimum) |
Generating Staging Credentials#
Generate and encrypt credentials using Ansible Vault:
# Generate staging K3s token
openssl rand -hex 32 | ansible-vault encrypt_string --vault-password-file vault/pwd.sh --name 'staging_k3s_token'
# Generate Harbor admin password
openssl rand -hex 32 | ansible-vault encrypt_string --vault-password-file vault/pwd.sh --name 'harbor_admin_password'
See Configuring Secrets for more details on Ansible Vault.
Optional TLS Configuration#
By default, the staging playbook generates a self-signed TLS certificate for the Harbor ingress. To use your own certificate, define these variables in the staging vars:
staging:
  vars:
    tls_cert_path: /path/to/cert.pem
    tls_key_path: /path/to/key.pem
When using self-signed certs, k3s nodes skip TLS verification. When providing valid certs, TLS verification is enabled.
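Before pointing `tls_cert_path` and `tls_key_path` at your own files, it can help to confirm that the certificate and key belong together. The sketch below assumes the `openssl` CLI is installed and that both files are PEM-encoded; `cert_matches_key` is a hypothetical helper, not something the playbook provides.

```shell
# Hypothetical helper: succeeds when the certificate's public key matches
# the public key derived from the private key.
# usage: cert_matches_key CERT_PEM KEY_PEM
cert_matches_key() {
  local cert_pub key_pub
  # Extract the public key embedded in the certificate.
  cert_pub="$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)" || return 1
  # Derive the public key from the private key.
  key_pub="$(openssl pkey -in "$2" -pubout 2>/dev/null)" || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

For example, `cert_matches_key /path/to/cert.pem /path/to/key.pem || echo "certificate and key do not match"`.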
To use the bare hostname (no subdomain):
staging:
  vars:
    harbor_subdomain: '' # Uses staging hostname directly instead of harbor.<hostname>
Air-Gapped Configuration Variables#
Enabling Air-Gapped Mode#
Set the global air_gapped flag in your inventory:
all:
  vars:
    air_gapped: true
K3s Air-Gapped Variables#
Configure these variables in the k3s_cluster vars section or globally in all vars:
k3s_cluster:
  vars:
    # Enable air-gapped installation mode (default: false)
    air_gapped: true
    # Timeout for downloading k3s artifacts to control node in seconds (default: 300)
    k3s_artifact_download_timeout: 300
    # SELinux package installation (default: auto-detect based on target node SELinux status)
    # Set to true/false to override auto-detection
    # k3s_selinux_enabled: true
    # Rancher repository channel for SELinux packages (default: stable)
    # Options: stable, testing, latest
    k3s_selinux_channel: stable
Variable Reference#
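| Variable | Default | Description |
|---|---|---|
| air_gapped | false | Enable air-gapped installation mode |
| k3s_artifact_download_timeout | 300 | Timeout in seconds for downloading k3s binary and install script to Ansible control node |
| k3s_selinux_enabled | auto-detect | Install SELinux packages. Auto-detected from the target node's SELinux status unless set to true or false |
| k3s_selinux_channel | stable | Rancher repository channel for SELinux packages. Options: stable, testing, latest |
|  |  | Rancher RPM repository site (rarely needs changing) |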
Deployment Steps#
Follow these steps to deploy Scout in air-gapped mode:
1. Enable Air-Gapped Mode#
Set air_gapped: true in your inventory file:
all:
  vars:
    air_gapped: true
2. Configure Staging Host#
Add the staging host and credentials to your inventory (see Staging Host Configuration above).
3. Configure Production Cluster#
Define your production K3s cluster nodes as usual in the server, workers, and optionally gpu_workers groups. Ensure all nodes run Rocky Linux 9.
server:
  hosts:
    prod-server.example.edu:
      ansible_host: prod-server
workers:
  hosts:
    prod-worker-1.example.edu:
      ansible_host: worker-1
    prod-worker-2.example.edu:
      ansible_host: worker-2
4. Deploy#
Deploy Scout components normally:
ansible-playbook -i inventory.yaml playbooks/main.yaml
# Or use the Makefile
make all
What happens:
staging play: Installs a single-node K3s cluster on the staging host (online mode), deploys Traefik, Harbor (container image proxy), and Nexus (package proxy for conda/PyPI/Maven) via Helm, and installs the Squid forward proxy for outbound IdP access
k3s play:
  Downloads K3s artifacts (binary, install script) to the Ansible control node
  Downloads SELinux packages via a Kubernetes Job on the staging cluster
  Distributes artifacts to production nodes that lack internet access
  Installs K3s with Harbor registry mirrors configured so production nodes can pull container images through Harbor
Other Scout plays:
  Helm charts are deployed from the Ansible control node (charts are bundled in the Scout repository)
  Container images are pulled by production nodes through Harbor
  Harbor automatically caches images from upstream registries on first pull
How It Works#
Harbor Pull-Through Proxy#
Harbor acts as a transparent caching proxy for container registries:
Production pod requests an image (e.g., docker.io/postgres:15)
K3s containerd is configured to rewrite requests to Harbor (e.g., staging.example.edu/dockerhub-proxy/postgres:15)
Harbor checks its cache:
  Cache hit: Returns cached image immediately
  Cache miss: Downloads from internet, caches, returns to requester
Subsequent requests for the same image are served from Harbor cache
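The rewrite step is typically expressed in K3s through `/etc/rancher/k3s/registries.yaml`, which supports per-registry `mirrors` with `endpoint` and `rewrite` entries. The sketch below renders such a file for Docker Hub only. It is an illustration under assumptions, not the file Scout's Ansible role actually writes: `render_registries_yaml` is a hypothetical helper, and only the `dockerhub-proxy` project name is taken from the example above.

```shell
# Hypothetical helper: prints a K3s registries.yaml that mirrors docker.io
# through a Harbor proxy-cache project.
# usage: render_registries_yaml HARBOR_HOST [INSECURE_SKIP_VERIFY]
render_registries_yaml() {
  local harbor_host="$1" skip_verify="${2:-false}"
  # \$ keeps the regex anchor and the $1 capture reference literal in the output.
  cat <<EOF
mirrors:
  docker.io:
    endpoint:
      - "https://${harbor_host}"
    rewrite:
      "^(.*)\$": "dockerhub-proxy/\$1"
configs:
  "${harbor_host}":
    tls:
      insecure_skip_verify: ${skip_verify}
EOF
}
```

Calling `render_registries_yaml staging.example.edu true` corresponds to the self-signed certificate case, where TLS verification is skipped.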
Supported registries:
Docker Hub (docker.io)
GitHub Container Registry (ghcr.io)
Quay.io (quay.io)
K8ssandra Container Registry (cr.k8ssandra.io)
Kubernetes Registry (registry.k8s.io)
Elastic Docker Registry (docker.elastic.co)
NVIDIA GPU Cloud (nvcr.io)
Apache Superset (apachesuperset.docker.scarf.sh)
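Each upstream registry is served through its own Harbor proxy project, so an image reference changes only in its leading components. The sketch below mimics that name mapping for illustration. `harbor_ref` is a hypothetical helper, and apart from `dockerhub-proxy` (used in the example in this guide) the project names are placeholders that may not match the actual Harbor configuration.

```shell
# Hypothetical helper: prints the Harbor-side reference for an upstream image.
# usage: harbor_ref HARBOR_HOST IMAGE
harbor_ref() {
  local harbor_host="$1" image="$2"
  local registry="${image%%/*}"   # text before the first slash
  local path="${image#*/}"        # text after the first slash
  local project
  case "$registry" in
    docker.io) project="dockerhub-proxy" ;;
    ghcr.io)   project="ghcr-proxy" ;;   # placeholder project name
    quay.io)   project="quay-proxy" ;;   # placeholder project name
    *) echo "unsupported registry: $registry" >&2; return 1 ;;
  esac
  printf '%s/%s/%s\n' "$harbor_host" "$project" "$path"
}
```

With these assumptions, `harbor_ref staging.example.edu docker.io/postgres:15` prints `staging.example.edu/dockerhub-proxy/postgres:15`.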
Squid Forward Proxy#
Squid is installed as a system service on the staging node to provide outbound HTTPS access for services on the air-gapped production cluster that need to reach external APIs — specifically, Keycloak’s server-to-server calls to external identity provider (IdP) OAuth endpoints.
How it works:
Squid listens on port 3128 on the staging node with a strict domain allowlist
Keycloak on the production cluster is configured to route IdP traffic through the proxy
Squid permits HTTPS CONNECT requests only to allowed domains, denying all other traffic
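The allowlist behaviour can be pictured with a small matching function. This sketch follows the convention of Squid's `dstdomain` ACL type, where an entry with a leading dot matches the domain and all of its subdomains and an entry without one matches that exact host only. `domain_allowed` is a hypothetical helper for reasoning about the rules, it is not how Squid is configured, and `login.example.org` is a made-up domain.

```shell
# Hypothetical helper: succeeds if DOMAIN matches any allowlist entry.
# usage: domain_allowed DOMAIN ENTRY...
domain_allowed() {
  local domain="$1" entry
  shift
  for entry in "$@"; do
    case "$entry" in
      .*)
        # Leading dot: match the bare domain or any subdomain of it.
        case "$domain" in
          "${entry#.}"|*"$entry") return 0 ;;
        esac
        ;;
      *)
        # No leading dot: exact host match only.
        if [ "$domain" = "$entry" ]; then return 0; fi
        ;;
    esac
  done
  return 1
}
```

To test the real proxy from a production node, a request such as `curl --proxy http://staging.example.edu:3128 -I https://login.example.org` should succeed for an allowed domain and be denied for any other.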
Allowed domains are computed automatically from your IdP configuration:
| IdP configured | Domains allowed |
|---|---|
|  |  |
|  |  |
Keycloak integration:
When air_gapped: true and an external IdP is configured, the Keycloak Ansible role automatically configures Keycloak’s spi-connections-http-client-default-proxy-mappings SPI to route IdP traffic through Squid. No manual configuration is needed.
Adding extra domains:
To allow additional outbound access through the proxy (e.g., for the OpenAI API), set squid_extra_allowed_domains in your staging inventory vars:
staging:
  vars:
    squid_extra_allowed_domains:
      - api.openai.com
Network requirements:
Production cluster nodes must be able to reach the staging node on TCP port 3128, in addition to the Harbor registry access listed under Network Connectivity.
K3s Artifact Distribution#
In air-gapped mode, k3s installation artifacts are handled differently:
Download phase (on Ansible control node):
  K3s binary downloaded from GitHub releases
  Install script downloaded from get.k3s.io
  SELinux RPMs downloaded via Kubernetes Job on staging cluster
Distribution phase:
  Artifacts copied from control node to production nodes via SSH
  Install script run with INSTALL_K3S_SKIP_DOWNLOAD=true
  SELinux packages installed via dnf
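The two phases can be sketched as plain shell steps. This is an approximation of what the Ansible role automates, not its implementation: the function names are hypothetical, `curl` stands in for whatever download mechanism the role uses, the release URL pattern is the one used by the upstream K3s install script, and the 300-second limit mirrors the default of `k3s_artifact_download_timeout`.

```shell
# Hypothetical helper: prints the GitHub release URL of the K3s binary.
# usage: k3s_binary_url VERSION   (e.g. v1.30.0+k3s1, an example version)
k3s_binary_url() {
  printf 'https://github.com/k3s-io/k3s/releases/download/%s/k3s\n' "$1"
}

# Download phase: run on the Ansible control node, which has internet access.
# usage: download_k3s_artifacts VERSION DEST_DIR
download_k3s_artifacts() {
  local version="$1" dest="$2"
  curl -fsSL --max-time 300 -o "$dest/k3s" "$(k3s_binary_url "$version")" &&
    curl -fsSL --max-time 300 -o "$dest/install.sh" https://get.k3s.io
}

# Distribution phase: run as root on a production node after the artifacts
# in SRC_DIR have been copied over via SSH.
# usage: install_k3s_offline SRC_DIR
install_k3s_offline() {
  # The install script expects the binary to already be in place
  # when INSTALL_K3S_SKIP_DOWNLOAD=true is set.
  install -m 0755 "$1/k3s" /usr/local/bin/k3s &&
    INSTALL_K3S_SKIP_DOWNLOAD=true sh "$1/install.sh"
}
```

Keeping the download and install steps separate is what allows the production node to run the unmodified install script without any outbound access.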
SELinux Package Download#
SELinux packages are downloaded using a Kubernetes Job:
Ansible creates a Job on the staging cluster
Job runs a Rocky Linux 9 init container that:
  Configures Rancher k3s yum repository
  Downloads k3s-selinux and container-selinux with all dependencies
  Saves RPMs to a shared volume
Main container keeps the pod running
Ansible extracts RPMs using kubectl cp from the pod
RPMs are fetched to control node and distributed to production nodes
This approach ensures correct package versions for Rocky Linux 9 without requiring yum repositories on air-gapped nodes.
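A rough picture of what the init container and the extraction step do is given below. It is a sketch under assumptions rather than the Job's actual script: `render_rancher_repo` is a hypothetical helper, the repository layout and the `rpm.rancher.io` host follow the upstream K3s install script for EL9 systems, and the `/rpms` path and the pod, namespace, and container names are placeholders.

```shell
# Hypothetical helper: prints a yum repository definition for the Rancher
# K3s SELinux packages on EL9.
# usage: render_rancher_repo [CHANNEL] [SITE]
render_rancher_repo() {
  local channel="${1:-stable}" site="${2:-rpm.rancher.io}"
  cat <<EOF
[rancher-k3s-common-${channel}]
name=Rancher K3s Common (${channel})
baseurl=https://${site}/k3s/${channel}/common/centos/9/noarch
enabled=1
gpgcheck=1
gpgkey=https://${site}/public.key
EOF
}

# Inside the Rocky Linux 9 init container (sketch):
#   render_rancher_repo stable > /etc/yum.repos.d/rancher-k3s-common.repo
#   dnf install -y --downloadonly --downloaddir=/rpms k3s-selinux container-selinux
#
# From the Ansible control node, once the main container is running:
#   kubectl cp -c <main-container> <namespace>/<pod>:/rpms ./rpms
```

Because the download runs in a Rocky Linux 9 container, dependency resolution happens against the same package set the production nodes use.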
Limitations#
Version Upgrades#
When upgrading k3s or other components:
Test the upgrade in your staging environment first
Use -e flag to override versions temporarily (see Testing Upgrades)
Update group_vars/all/versions.yaml after validating the upgrade
Deploy to production
Network Isolation#
Air-gapped mode prevents production nodes from accessing the internet, but:
Production nodes still need access to staging Harbor
Ansible control node needs access to K8s APIs
The deployment is therefore not fully isolated: production nodes have no direct internet access, but they still depend on the staging node, which does
Operating System Support#
Air-gapped installations only support Rocky Linux 9 for production nodes due to SELinux package requirements. The staging node can run other distributions, but Rocky Linux 9 is recommended for consistency.
Additional Information#
Jump Node Architecture ADR - Security rationale for separating staging and jump nodes