Context

This is a greenfield Ansible project to automate the provisioning of a two-server chained proxy setup. The two servers are:

  • Relay server: Runs shadowsocks-rust as an encrypted proxy. Acts as the transit node: general client traffic exits here, and it is also the first hop for chained traffic to the landing server.
  • Landing server: Runs Trojan as an encrypted proxy disguised as HTTPS traffic. Provides an in-region exit IP for geo-sensitive services (AI platforms, streaming). Also supports direct client connections.

The chaining is done on the client side using Surge's underlying-proxy feature:

  • Client → Relay (Shadowsocks) → Landing (Trojan) → Internet (for chained traffic)
  • Client → Landing (Trojan) → Internet (for direct traffic)
  • Client → Relay (Shadowsocks) → Internet (for relay-only traffic)

The Ansible playbook provisions the server-side daemons only. Client-side Surge configuration is documented but not deployed by Ansible.

Goals / Non-Goals

Goals:

  • Fully automated, idempotent server provisioning via Ansible
  • shadowsocks-rust deployed on relay server as a systemd service
  • Trojan deployed on landing server as a systemd service with TLS (Let's Encrypt)
  • Basic server hardening (firewall, SSH key-only, fail2ban)
  • Document client-side Surge configuration with underlying-proxy chain and routing rules
  • Landing server supports both chained (via relay) and direct connections

Non-Goals:

  • Client-side Surge deployment or configuration management (document only)
  • Web UI or dashboard for proxy management
  • Automatic failover or high availability
  • VPN tunneling (this is a proxy-only setup)
  • Traffic logging or analytics

Decisions

1. Protocol pairing: Shadowsocks on relay, Trojan on landing

  • Relay (SS): Shadowsocks is lightweight and adds little protocol overhead, which suits the transit hop. shadowsocks-rust supports AEAD ciphers and is deployed as a single pre-built binary.
  • Landing (Trojan): Trojan disguises traffic as normal HTTPS, beneficial for the endpoint that handles geo-sensitive services. Requires a domain and TLS cert.

Why this pairing over the reverse: The relay is a transit node where speed matters most. The landing server faces service providers that may inspect traffic patterns — Trojan's HTTPS disguise is more valuable here.

2. Ansible project structure: roles-based layout

inventory/
  hosts.yml
group_vars/
  all.yml
  relay.yml
  landing.yml
roles/
  base/              # Common system setup
  shadowsocks/       # shadowsocks-rust installation and config
  trojan/            # Trojan installation and config
site.yml             # Main playbook

Why roles over a flat playbook: Roles enable reuse and a clear separation of concerns, and each role can be tested independently.
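
As a rough illustration of how the layout above could be wired together, the following is a minimal sketch of site.yml. It assumes the inventory defines two groups named relay and landing (matching group_vars/relay.yml and group_vars/landing.yml); the play names are illustrative only.

```yaml
# site.yml (sketch): map inventory groups to roles.
# Assumption: inventory/hosts.yml defines the groups "relay" and "landing".
- name: Provision relay server
  hosts: relay
  become: true
  roles:
    - base
    - shadowsocks

- name: Provision landing server
  hosts: landing
  become: true
  roles:
    - base
    - trojan
```

With this shape, a full run would look like `ansible-playbook -i inventory/hosts.yml site.yml --ask-vault-pass`, and `--limit relay` or `--limit landing` would restrict it to one server.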

3. shadowsocks-rust deployment

  • Download pre-built binary from GitHub releases (configurable version)
  • Configure via JSON config file generated from Jinja2 template
  • Run as systemd service under dedicated ssserver user
  • Use AEAD cipher (e.g., aes-256-gcm or chacha20-ietf-poly1305)
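
The steps above could look roughly like the play below. It is a sketch under several assumptions, not the project's actual role: a Debian/Ubuntu target, an x86_64 host, a release asset name that should be checked against the shadowsocks-rust releases page, and hypothetical variable names (ss_version, ss_port, ss_method, ss_password, ss_dir). The config is written inline for self-containment; a config.json.j2 file rendered with the template module would be the equivalent in a role.

```yaml
- name: Deploy shadowsocks-rust (sketch)
  hosts: relay
  become: true
  vars:
    ss_version: "1.15.3"               # example value; pin the release you want
    ss_port: 8388                      # hypothetical port
    ss_method: chacha20-ietf-poly1305
    # ss_password is assumed to be defined elsewhere (e.g. an Ansible Vault file)
    ss_dir: "/opt/shadowsocks-rust/{{ ss_version }}"
  tasks:
    - name: Install xz support needed to unpack the .tar.xz archive
      ansible.builtin.apt:
        name: xz-utils
        state: present

    - name: Create dedicated service user
      ansible.builtin.user:
        name: ssserver
        system: true
        shell: /usr/sbin/nologin
        create_home: false

    - name: Create versioned install directory and config directory
      ansible.builtin.file:
        path: "{{ item }}"
        state: directory
        mode: "0755"
      loop:
        - "{{ ss_dir }}"
        - /etc/shadowsocks-rust

    - name: Download and unpack the pre-built binary (skipped if already present)
      ansible.builtin.unarchive:
        # Asset naming is an assumption; verify it for the chosen version.
        src: "https://github.com/shadowsocks/shadowsocks-rust/releases/download/v{{ ss_version }}/shadowsocks-v{{ ss_version }}.x86_64-unknown-linux-gnu.tar.xz"
        dest: "{{ ss_dir }}"
        remote_src: true
        creates: "{{ ss_dir }}/ssserver"

    - name: Point the stable path at the selected version
      ansible.builtin.file:
        src: "{{ ss_dir }}/ssserver"
        dest: /usr/local/bin/ssserver
        state: link
      notify: Restart shadowsocks

    - name: Write server config (readable only by root and the service group)
      ansible.builtin.copy:
        dest: /etc/shadowsocks-rust/config.json
        owner: root
        group: ssserver
        mode: "0640"
        content: |
          {
            "server": "0.0.0.0",
            "server_port": {{ ss_port }},
            "method": "{{ ss_method }}",
            "password": {{ ss_password | to_json }}
          }
      notify: Restart shadowsocks

    - name: Install systemd unit
      ansible.builtin.copy:
        dest: /etc/systemd/system/shadowsocks-rust.service
        mode: "0644"
        content: |
          [Unit]
          Description=shadowsocks-rust server
          After=network-online.target
          Wants=network-online.target

          [Service]
          User=ssserver
          ExecStart=/usr/local/bin/ssserver -c /etc/shadowsocks-rust/config.json
          Restart=on-failure

          [Install]
          WantedBy=multi-user.target
      notify: Restart shadowsocks

    - name: Enable and start the service
      ansible.builtin.systemd:
        name: shadowsocks-rust
        enabled: true
        state: started
        daemon_reload: true

  handlers:
    - name: Restart shadowsocks
      ansible.builtin.systemd:
        name: shadowsocks-rust
        state: restarted
        daemon_reload: true
```

Unpacking into a versioned directory and symlinking keeps the tasks idempotent while still picking up a changed ss_version on the next run. In the roles layout, the tasks and handlers would move into roles/shadowsocks/tasks/main.yml and roles/shadowsocks/handlers/main.yml.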

4. Trojan deployment with TLS

  • Install Trojan (trojan-go or trojan-gfw) from release binary
  • TLS certificate via Let's Encrypt (certbot) with auto-renewal
  • Requires a domain name pointing to the landing server
  • Trojan listens on port 443, with a fallback web server for non-Trojan traffic (camouflage)
  • Run as systemd service under dedicated user
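
The certificate and config parts of this list might be sketched as follows. This is an illustration under stated assumptions rather than a finished role: trojan_domain, acme_email, and trojan_password are hypothetical variable names, the target is Debian/Ubuntu, and the config uses only keys that trojan-gfw and trojan-go are both understood to accept. Binary installation, the dedicated user, and the systemd unit are omitted.

```yaml
- name: Trojan TLS and config (sketch)
  hosts: landing
  become: true
  vars:
    trojan_domain: proxy.example.com   # hypothetical; must resolve to this server
    acme_email: admin@example.com      # hypothetical
    # trojan_password is assumed to be defined elsewhere (e.g. an Ansible Vault file)
  tasks:
    - name: Install certbot
      ansible.builtin.apt:
        name: certbot
        state: present
        update_cache: true

    - name: Obtain the initial certificate (skipped once a cert exists)
      ansible.builtin.command:
        argv:
          - certbot
          - certonly
          - --standalone
          - --non-interactive
          - --agree-tos
          - -m
          - "{{ acme_email }}"
          - -d
          - "{{ trojan_domain }}"
        creates: "/etc/letsencrypt/live/{{ trojan_domain }}/fullchain.pem"

    - name: Create config directory
      ansible.builtin.file:
        path: /etc/trojan
        state: directory
        mode: "0755"

    - name: Write Trojan server config
      ansible.builtin.copy:
        dest: /etc/trojan/config.json
        owner: root                    # adjust owner/group to the dedicated service user
        mode: "0600"
        content: |
          {
            "run_type": "server",
            "local_addr": "0.0.0.0",
            "local_port": 443,
            "remote_addr": "127.0.0.1",
            "remote_port": 80,
            "password": [{{ trojan_password | to_json }}],
            "ssl": {
              "cert": "/etc/letsencrypt/live/{{ trojan_domain }}/fullchain.pem",
              "key": "/etc/letsencrypt/live/{{ trojan_domain }}/privkey.pem"
            }
          }
```

Here remote_addr and remote_port point at the fallback web server, assumed to listen on 127.0.0.1:80. Standalone mode needs port 80 to be free and reachable while the certificate is issued, so if the fallback server already occupies that port, certbot's webroot mode may be the better fit. A non-root Trojan user would also need read access to the private key and permission to bind port 443.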

5. Client-side Surge configuration (documented, not deployed)

The project includes a reference Surge client config showing:

[Proxy]
Relay-SS = ss, relay_ip, ss_port, encrypt-method=aes-256-gcm, password=xxx
Landing-Trojan = trojan, landing_domain, 443, password=xxx
Landing-Chain = trojan, landing_domain, 443, password=xxx, underlying-proxy=Relay-SS

[Proxy Group]
Chain = select, Landing-Chain
Direct-Landing = select, Landing-Trojan

[Rule]
# Sukka's rulesets (https://github.com/SukkaW/Surge)
# DOMAIN-SET and non_ip rules MUST come before ip rules

# AI services - through chain (relay → landing, exit from landing IP)
DOMAIN-SET,https://ruleset.skk.moe/List/domainset/ai.conf,Chain
RULE-SET,https://ruleset.skk.moe/List/non_ip/ai.conf,Chain

# Streaming - through chain
RULE-SET,https://ruleset.skk.moe/List/non_ip/stream_us.conf,Chain

# IP-based rules last
RULE-SET,https://ruleset.skk.moe/List/ip/stream_us.conf,Chain

# Default - relay only
FINAL,Relay-SS

All domain matching is delegated to Sukka's externally maintained rulesets — no self-maintained domain lists. Rulesets auto-update every 12 hours.

The underlying-proxy on Landing-Chain means Surge first connects to the relay via SS, then through that connection reaches the landing server via Trojan.

6. Server hardening baseline

The base role handles:

  • UFW firewall with default deny, allowing only SSH + service-specific ports
  • SSH hardened (key-only auth, no root login)
  • fail2ban for SSH brute-force protection
  • Automatic security updates (unattended-upgrades)
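
A minimal sketch of this baseline is shown below, assuming a Debian/Ubuntu target (where the SSH unit is named ssh) and the community.general collection for the ufw module. The service_ports variable is a hypothetical name; in practice each group would set it in its own group_vars file.

```yaml
- name: Base hardening (sketch)
  hosts: all
  become: true
  vars:
    service_ports: ["443"]             # hypothetical; set per group in group_vars
  tasks:
    - name: Install ufw, fail2ban and unattended-upgrades
      ansible.builtin.apt:
        name: [ufw, fail2ban, unattended-upgrades]
        state: present
        update_cache: true

    - name: Allow SSH before the firewall is enabled
      community.general.ufw:
        rule: allow
        port: "22"
        proto: tcp

    - name: Allow service-specific ports
      community.general.ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop: "{{ service_ports }}"

    - name: Enable UFW with default deny for incoming traffic
      community.general.ufw:
        state: enabled
        policy: deny
        direction: incoming

    - name: Require key-based auth and disable root login
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "^#?\\s*{{ item.key }}\\s"
        line: "{{ item.key }} {{ item.value }}"
        validate: /usr/sbin/sshd -t -f %s
      loop:
        - { key: PasswordAuthentication, value: "no" }
        - { key: PermitRootLogin, value: "no" }
      notify: Restart ssh

  handlers:
    - name: Restart ssh
      ansible.builtin.service:
        name: ssh
        state: restarted
```

The SSH allow rule is applied before UFW is enabled so that the run cannot lock itself out. Some distributions ship drop-in files under /etc/ssh/sshd_config.d/ that can override these settings, so those may need the same treatment.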

Risks / Trade-offs

  • [Single point of failure on relay] → If the relay goes down, chained traffic stops. Mitigation: direct landing connection remains available; add monitoring as future enhancement.
  • [TLS certificate for Trojan] → Landing server requires a domain name and Let's Encrypt cert. Mitigation: automate cert provisioning with certbot in the Ansible role.
  • [Cert renewal] → Let's Encrypt certs expire after 90 days. Mitigation: certbot auto-renewal via cron/systemd timer, with a renewal hook that reloads Trojan so it picks up the new cert.
  • [Rule maintenance] → Domain lists change over time. Mitigation: delegated entirely to Sukka's rulesets (https://github.com/SukkaW/Surge), which auto-update every 12 hours — no local maintenance needed.
  • [Security of proxy credentials] → Passwords stored in Ansible vars. Mitigation: use Ansible Vault for all secrets; restrict deployed config file permissions.
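
For the cert renewal item, one possible approach is a certbot deploy hook, which certbot runs after each successful renewal. The sketch below assumes certbot is already installed (so the renewal-hooks directory exists) and that the Trojan systemd unit is named trojan; both are assumptions. It uses a restart rather than a reload, since reload support depends on the Trojan implementation chosen.

```yaml
- name: Restart Trojan after certificate renewal (sketch)
  hosts: landing
  become: true
  tasks:
    - name: Install certbot deploy hook
      ansible.builtin.copy:
        dest: /etc/letsencrypt/renewal-hooks/deploy/restart-trojan.sh
        owner: root
        group: root
        mode: "0755"
        content: |
          #!/bin/sh
          # Runs after each successful renewal.
          # The unit name "trojan" is an assumption; match it to the deployed service.
          systemctl restart trojan
```

The hook can be exercised without waiting for expiry by running `certbot renew --force-renewal` on the landing server, keeping Let's Encrypt rate limits in mind.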