Plain-English Recap

What We Did Today

A walkthrough of the dev-server cleanup, the concepts behind it, and the workflow you'll use from here.

2026-05-01 · mikvehworks.com (dev server) · 137.184.21.83

Quick Reference — How to Use the Workflow

Same pipeline, two ways to drive it. Pick whichever feels natural in the moment.

1. In the terminal

The three workflow scripts live at /home/agentruntime/bin/ and run as the agentruntime user. If you're logged in as root, switch over first: sudo -u agentruntime -i.

# start a new feature (creates branch, folder, subdomain, SSL cert, pm2 process)
wf-new my-feature

# pull the latest from dev into the feature's branch and restart its subdomain
wf-sync my-feature

# tear it all down after the PR is merged
wf-retire my-feature

Each script will pause before any sudo command and ask before continuing — so it doesn't surprise you with system-level changes.

2. After `wf-retire` — nginx cleanup (run as root)

The aigent doesn't have permission to delete files in /etc/nginx/ (intentional, for safety). After wf-retire finishes, switch to your root shell and run:

sudo rm -f /etc/nginx/sites-enabled/my-feature.mikvehworks
sudo rm -f /etc/nginx/sites-available/my-feature.mikvehworks
sudo nginx -t && sudo systemctl reload nginx
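If retyping these gets tedious, the three commands can be generated from the feature name. This is a minimal sketch, assuming feature names contain only lowercase letters, digits, and hyphens. `nginx_cleanup_cmds` is a hypothetical helper, not one of the installed scripts, and it only prints the commands so you can review them before running anything as root.

```shell
# Hypothetical helper: prints (does not run) the cleanup commands for one feature.
nginx_cleanup_cmds() {
    name=$1
    # Refuse anything that is not a plain feature name, so a stray "/" or ".."
    # can never end up inside an rm path.
    case $name in
        ''|*[!a-z0-9-]*) echo "invalid feature name: $name" >&2; return 1 ;;
    esac
    echo "sudo rm -f /etc/nginx/sites-enabled/$name.mikvehworks"
    echo "sudo rm -f /etc/nginx/sites-available/$name.mikvehworks"
    echo "sudo nginx -t && sudo systemctl reload nginx"
}

nginx_cleanup_cmds my-feature
```

Printing instead of executing keeps the "pause and ask" habit: nothing under /etc/nginx/ changes until you paste the lines yourself.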

3. Talking to your aigent

Or just say it in plain English. Each phrase maps directly to one script:

Set up a new feature called my-feature → wf-new my-feature

Sync my-feature with dev → wf-sync my-feature

Retire my-feature → wf-retire my-feature

The aigent will pause before sudo and ask. Automation does the work; you stay in control of when it actually fires.

What you still own either way: committing your code, pushing to GitHub, and merging the PR. The scripts (and the aigent) only handle the infrastructure plumbing — never the code review or merge decisions.

01 The TL;DR

In one paragraph.

You wanted multiple branches running on multiple subdomains at the same time, all on this one dev server. It was already working — sort of — but fragile and tangled. We cleaned it up and built an automation pipeline on top. Each feature now lives in its own folder, on its own subdomain, on its own port, with auto-restart and reboot survival. The four-phrase workflow is installed, tested twice end-to-end, and production-ready. The next time you start a feature, you just say "set up a new feature called X" and 30 seconds later it's live.

02 The Setup You Have Now

Five URLs, five folders, five ports. Each row is independent.

URL	Folder	Branch	Port
mikvehworks.com	/opt/watchflow-dev	dev	3009
invoices.mikvehworks.com	/home/agentruntime/wf-invoice	invoice-module	3011
activity.mikvehworks.com	/home/agentruntime/wf-activity-log	activity-log	3013
reporting.mikvehworks.com	/home/agentruntime/wf-reporting	reporting	3015
marketing.mikvehworks.com	/home/agentruntime/wf-marketing	marketing	3017

Edit code in a folder → changes show on its URL. Other rows keep running undisturbed.

03 What Was Broken (and Why)

Three problems we found and fixed today.

1. Processes were "babysat by luck"

Your subdomain servers were started by hand using nohup — a Linux command that means "run this and don't kill it when I log off." Works fine until something goes wrong:

If a process crashed → it stayed dead. No auto-restart.
If the server rebooted → everything died. Had to be manually restarted.

2. The disk was 100% full

Earlier in the session, root-Claude (running as the root user) made a mistake — it created four duplicate copies of your code under /opt/watchflow-*. Each copy needs its own node_modules/ folder, weighing about 4 GB (mostly because of onnxruntime-node and Playwright's bundled browsers). That's roughly 17 GB of pure waste, on a 96 GB disk that was already crowded.

3. Two Claude sessions didn't know about each other

You have two Linux user accounts on this server: root (system admin) and agentruntime (your aigent app). Each had its own Claude session, its own pm2, its own folders. They were stepping on each other's toes — both trying to run the same subdomains.

04 What We Actually Fixed

Five concrete changes, in order.

Deleted the 17 GB of duplicate folders, bringing free disk space back up to 16 GB.
Repaired the broken node_modules in /opt/watchflow-dev (it was a self-referencing symlink — pointing at itself, infinite loop).
Stopped the manually-started processes and re-started them under your aigent's pm2 instead.
Registered a systemd unit so your aigent's pm2 starts automatically on reboot.
Confirmed all four subdomains return HTTP 200 — healthy, no crash loops.
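The HTTP 200 check in the last step can be repeated at any time. This is a rough sketch, assuming curl is installed. `http_code` and `report_unhealthy` are hypothetical helper names, not part of the installed scripts.

```shell
# Ask one URL for its HTTP status code (needs curl and network access).
http_code() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Read "host code" pairs on stdin and print only the hosts that are not 200.
report_unhealthy() {
    awk '$2 != "200" { print $1 }'
}

# Usage on the dev server (hostnames taken from the setup table):
#   for h in invoices activity reporting marketing; do
#       echo "$h.mikvehworks.com $(http_code "https://$h.mikvehworks.com")"
#   done | report_unhealthy
```

No output from the loop means every subdomain answered 200.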

05 Concepts You Asked About

Plain definitions for the moving parts.

Branch vs. folder

A branch is a version of your code that lives on GitHub. A folder is where that code is downloaded on a computer.

You can check out the same repo into multiple folders, one per branch you want to work on at the same time. Each extra folder is called a git worktree. All your wf-* folders share one underlying .git directory, but each has a different branch checked out.
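To see the idea in isolation, here is a throwaway demo in a temporary directory. The folder names are made up for illustration and nothing here touches the real repo.

```shell
# Throwaway demo: one repository, two folders, two branches checked out at once.
demo=$(mktemp -d)
git init -q "$demo/main-checkout"
cd "$demo/main-checkout"
git symbolic-ref HEAD refs/heads/dev        # name the first branch "dev"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Add a second folder that shares this repository but has its own branch.
git worktree add -b invoice-module "$demo/wf-invoice" dev >/dev/null 2>&1

git -C "$demo/main-checkout" rev-parse --abbrev-ref HEAD   # prints: dev
git -C "$demo/wf-invoice" rev-parse --abbrev-ref HEAD      # prints: invoice-module
```

In the second folder, .git is a small file that points back to the shared repository rather than a full copy of it, so the git history itself is not duplicated.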

`nohup` vs. `pm2`

Two ways to run a Node.js server in the background:

nohup node server.js & → "Just run it and don't kill it when I log off." No safety net. Dies on crash. Dies on reboot.
pm2 start server.js → "Run it, watch it, restart it if it dies." Auto-recovers. Combined with a systemd unit, also survives reboot.
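In practice the pm2 route is two commands plus a one-time reboot hook. This is a minimal sketch. `start_supervised` is a hypothetical wrapper, and the `PM2` variable exists only so the commands can be previewed with `PM2=echo` before running them for real.

```shell
# PM2 can be overridden (e.g. PM2=echo) to preview the commands without running them.
PM2=${PM2:-pm2}

# Start a server under pm2 with a stable name, then save the process list
# so pm2 can bring it back after a reboot.
start_supervised() {
    name=$1
    script=$2
    "$PM2" start "$script" --name "$name"
    "$PM2" save
}

# Reboot survival is a one-time setup, run by root (user and home are from this server):
#   pm2 startup systemd -u agentruntime --hp /home/agentruntime
```

`pm2 save` is what writes the dump.pm2 file; the systemd unit replays that file at boot.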

Subdomains, nginx, and ports

Your server runs many Node.js processes at once. Each one listens on a numbered "port" (3009, 3011, 3013...). When a visitor types invoices.mikvehworks.com in their browser, here's the chain of events:

invoices.mikvehworks.com
  ↓ DNS lookup
137.184.21.83 (your server's IP)
  ↓ nginx reads /etc/nginx/sites-enabled/
127.0.0.1:3011 (forward to this local port)
  ↓
wf-invoice node process (responds with HTML)

Change the port number, or kill the process → the subdomain breaks. Keep them stable → everything works.
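The nginx step in that chain comes down to a small config file per subdomain. The sketch below prints a minimal version. It is an assumption about the general shape, not a copy of the real vhost files, which also carry the TLS settings that certbot adds. `vhost_config` is a hypothetical helper name.

```shell
# Print a minimal reverse-proxy vhost for one subdomain and one local port.
# Assumption: TLS lines added by certbot are omitted here.
vhost_config() {
    host=$1
    port=$2
    cat <<EOF
server {
    listen 80;
    server_name $host;

    location / {
        proxy_pass http://127.0.0.1:$port;
        proxy_set_header Host \$host;
    }
}
EOF
}

vhost_config invoices.mikvehworks.com 3011
```

The `server_name` line is how nginx picks the right file for the incoming hostname, and `proxy_pass` is the hand-off to the local port.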

Two servers, not one

The site at mywatchflow.com (your live production site, with real customer data) runs on a different physical machine at 24.144.109.84. This dev server cannot reach it. They're completely separate.

Merging code to main on GitHub does not automatically deploy to mywatchflow.com — that machine has its own deploy process, run separately.

06 Your Four-Phrase Workflow

Talk to your aigent like this. Three of the phrases run a script that handles the plumbing; the second one just points the aigent at the right folder.

Set up a new feature called X

aigent creates the branch, the folder, the subdomain, the SSL cert, and starts the process. You get back a working URL.

I want to work on X in this chat

aigent switches its working directory to the right folder so further edits go in the right place.

Sync X with dev

aigent pulls the latest dev branch into your feature branch, pushes, and restarts the subdomain.

Retire X

After you've merged the feature, aigent tears it down: stops the process, deletes the cert, removes the folder, deletes the branch.

You still own: committing your code, pushing to GitHub, and merging the PR. The aigent only handles the infrastructure plumbing — never the code review or deploy decisions.

07 What Lives Where

Reference table for paths you'll see again.

Thing	Where it lives
Main dev folder	/opt/watchflow-dev
Feature folders	/home/agentruntime/wf-*
nginx configs	/etc/nginx/sites-available/
SSL certs (Let's Encrypt)	/etc/letsencrypt/live/
pm2 dumps (root)	/root/.pm2/dump.pm2
pm2 dumps (aigent)	/home/agentruntime/.pm2/dump.pm2
Workflow scripts	/home/agentruntime/bin/wf-*
Production server	24.144.109.84 (never touched from here)

08 What Could Have Gone Wrong (but didn't)

The careful checks paid off.

We almost killed the wrong process. Your aigent's orchestrator (PID 843) was on the same machine. Stopping it would have cut you off from your own automation. We protected it explicitly.
npm install failed mid-way because the disk was full. The aigent caught it, stopped, and asked — instead of leaving things half-broken.
The first plan would have left subdomains owned by root, which would have broken your edit-and-see-it-live flow. The aigent flagged it and we changed direction.

Lesson: Slow, careful, "stop and ask" beats fast and confident. We checked git status before deleting, verified PIDs before killing, and read every error before moving on. Nothing of yours was lost.

09 The Pipeline Test (and the Two Bugs)

We installed the workflow scripts, then tested them — twice. The first run found two real bugs. Here's what happened.

Step 1 — Scripts installed

Your aigent installed three executables in /home/agentruntime/bin/: wf-new, wf-sync, wf-retire. Plus a section in /home/agentruntime/CLAUDE.md documenting the four phrases. All passed bash -n syntax check.

Step 2 — First end-to-end run: wf-new test-pipeline

The script:

Created branch test-pipeline off dev
Made a worktree at /home/agentruntime/wf-test-pipeline
Picked port 3019 automatically (next free)
Ran npm ci to install 1,107 packages
Paused for sudo — we said "go" — ran 5 commands: nginx vhost write, symlink, syntax check, reload, certbot SSL issue
Started the process under pm2, pushed the branch to GitHub
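The automatic port pick in the steps above can be sketched as follows. The real logic inside wf-new is not shown in this recap, so this assumes the pattern visible in the setup table (start at 3009, step by 2). `next_free_port` is a hypothetical name.

```shell
# Print the first port not already in use.
# Used ports arrive on stdin, one per line.
# $1 = first candidate (default 3009), $2 = step (default 2); both are assumptions.
next_free_port() {
    start=${1:-3009}
    step=${2:-2}
    used=$(cat)
    port=$start
    # Keep stepping while the candidate appears as a whole line in the used list.
    while printf '%s\n' "$used" | grep -qx "$port"; do
        port=$((port + step))
    done
    printf '%s\n' "$port"
}

printf '%s\n' 3009 3011 3013 3015 3017 | next_free_port   # prints: 3019
```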

Result: https://test-pipeline.mikvehworks.com returned HTTP 200. The pipeline worked.

But the aigent flagged two issues it had to manually patch around:

Bug #1 — `.env` wasn't being written

wf-new was passing the port to pm2 as a one-time argument, but never writing a .env file. So if you ever ran pm2 restart --update-env from a normal shell, pm2 would re-read the environment, find no PORT defined, fall back to the default 3010 (already taken), and crash with EADDRINUSE.

Fix: Patched wf-new to copy /opt/watchflow-dev/.env into the new worktree and override PORT with the assigned port. Now every new feature is self-contained and survives any pm2 restart.
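The copy-and-override step might look like the sketch below. It is an illustration of the idea, not the actual patch. `write_feature_env` is a hypothetical name and the example paths in the comments are assumptions.

```shell
# Copy a base .env into a feature folder, replacing PORT with the assigned port.
write_feature_env() {
    src=$1    # e.g. /opt/watchflow-dev/.env
    dest=$2   # e.g. the .env inside the new worktree
    port=$3
    # Keep every line except an existing PORT=..., then append the new one.
    grep -v '^PORT=' "$src" > "$dest" || true
    printf 'PORT=%s\n' "$port" >> "$dest"
}
```

Because the port now lives in a file inside the feature folder, a restart from any shell finds the same value.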

Bug #2 — storefront bypass was missing on `dev`

Your app has a check that looks at the incoming hostname — for *.mikvehworks.com subdomains, it skips the storefront and renders the app. That bypass was added on the invoice-module branch but never merged back to dev. So every new feature branched off dev would inherit the missing bypass and 404 itself out.

Fix: Cherry-picked commit 302cc325 onto dev (where it became commit e1b96af9), pushed, restarted watchflow-dev pm2. From now on, every wf-new inherits the bypass automatically.
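A cherry-pick copies one commit's changes onto another branch, and the copy gets its own commit ID, which is why the same fix shows up under two hashes. This is a throwaway demo in a temporary directory; the file name and commit messages are made up.

```shell
# Throwaway demo: a fix made on a feature branch is copied onto dev.
repo=$(mktemp -d)
cd "$repo"
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q .
g symbolic-ref HEAD refs/heads/dev          # name the first branch "dev"
g commit -q --allow-empty -m "base"

# The fix lands on the feature branch first.
g checkout -q -b invoice-module
echo "bypass" > bypass.txt
g add bypass.txt
g commit -q -m "add storefront bypass"
fix=$(g rev-parse HEAD)

# Meanwhile dev moves on, then receives the fix by cherry-pick.
g checkout -q dev
g commit -q --allow-empty -m "other dev work"
g cherry-pick "$fix" >/dev/null
picked=$(g rev-parse HEAD)

echo "original: $fix"
echo "on dev:   $picked"
```

The two printed IDs differ even though the change is identical, because a commit ID also covers the parent commit.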

Step 3 — Retire test-pipeline

Ran wf-retire test-pipeline. The aigent stopped the process, deleted the cert, removed the worktree, and deleted both the local and remote branches. The only manual step was the three sudo commands you ran in your root shell (two rm commands and an nginx reload), because the aigent doesn't have permission to delete files in /etc/nginx/ (intentional, for safety).

Step 4 — The second run: wf-new test-pipeline-2

This was the real test. Same command, same protocol, but with both bugs fixed:

Check	Result
HTTP status	200 OK at `test-pipeline-2.mikvehworks.com`
pm2 row	online, 0 restarts, port 3019
SSL cert	Issued, expires 2026-07-30
`.env` written	PORT=3019 written automatically
Storefront bypass inherited	Yes — no manual patch needed
Manual interventions during run	Zero (besides the sudo "go ahead")

Step 5 — Retire test-pipeline-2

Same retire flow, same clean exit. All four real feature subdomains (invoices, activity, reporting, marketing) returned 200 throughout. Nothing else on the server was disturbed.

Bottom line: The pipeline went from "in theory" to "tested twice, found bugs, fixed bugs, retested, clean." That's how you tell automation is actually production-ready — not by it working once, but by it working after the bugs are out.

A note on the `node_modules` detour

Mid-pipeline, when restarting watchflow-dev on the new code, we hit an unrelated mess: this repo tracks node_modules as a symlink (it shouldn't — node_modules is normally git-ignored). The previous broken-symlink cleanup left git's idea of the file out of sync with the real installed packages directory. Switching branches threw "Your local changes would be overwritten."

We sidestepped it by moving the real node_modules to /tmp, letting git restore its (broken) symlink, switching to dev, then moving the real folder back. No work lost. The underlying repo weirdness is still there — a real cleanup someday is to remove node_modules from git tracking and add it to .gitignore. Not urgent.

10 Still To Do (Low Priority)

Two small things, neither blocking.

Add narrow rm permission to aigent's sudoers, scoped to /etc/nginx/sites-{available,enabled}/*.mikvehworks. That would let wf-retire finish end-to-end without you running the manual sudo rm commands. Fairly low risk, since the glob targets nginx vhost files, but note that a sudoers * in command arguments also matches slashes and spaces, so the rule needs testing against paths containing ../ before you rely on it. Defer until you're tired of typing them.
Stop tracking node_modules in git. It's tracked-as-symlink for historical reasons. git rm --cached -r node_modules + add to .gitignore would clean it up. Worth doing eventually but doesn't affect anything today.
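For the second item, here is a throwaway demo of what the cleanup does, run in a temporary directory rather than the real repo. The folder `real_modules` is made up to stand in for the installed packages.

```shell
# Throwaway demo: stop tracking node_modules without deleting it from disk.
repo=$(mktemp -d)
cd "$repo"
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q .
mkdir real_modules
ln -s real_modules node_modules      # tracked as a symlink, like in this repo
g add node_modules
g commit -q -m "track node_modules (the historical mistake)"

# Remove it from git's index only, then ignore it from now on.
g rm -q --cached -r node_modules
echo "node_modules" >> .gitignore
g add .gitignore
g commit -q -m "stop tracking node_modules"

g ls-files                           # prints: .gitignore
```

The .gitignore entry is written without a trailing slash on purpose: with a slash the pattern would match only real directories and would miss a symlink.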

What's not on the list anymore: install scripts (done), test the pipeline (done twice), document the four phrases (done in CLAUDE.md), fix .env handling (done), get the storefront bypass onto dev (done). Workflow is complete.
