What this article delivers
- Three reference architectures (Retail AR, Industrial Vision, Fleet Telematics) with ASCII diagrams
- Latency budgets, SLOs, and CAM pillar mapping (I-PWR, I-COOL, I-NWK, I-DATA, I-CTRL)
- Bill of materials, deployment steps, and failure-mode notes
- Copy-paste snippets (Kubernetes/Helm, policy) you can adapt today
1) Shared assumptions and quick math
Latency budget model
End-to-end latency (RTT) ≈ Access(last-mile) + Metro + Backbone + DC-fabric + Inference + App logic
Speed of light in fiber ≈ 5 µs/km one way.
Every router/LB hop adds ~0.1–0.5 ms; TLS adds 1–2 RTT for handshake unless resumed.
P99 matters. Budget to P99, not P50.
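As a sanity check, the budget model above can be turned into a few lines of arithmetic. The hop cost, inference time, and TLS handling below are illustrative assumptions, not measurements:

```python
# Rough RTT estimator for the budget model above; all numbers illustrative.
FIBER_US_PER_KM = 5  # one-way light propagation in fiber, ~5 microseconds/km

def rtt_budget_ms(fiber_km, hops, hop_ms=0.3, inference_ms=10.0, app_ms=2.0,
                  tls_handshake_rtts=0):
    """Estimate end-to-end RTT in milliseconds."""
    propagation = 2 * fiber_km * FIBER_US_PER_KM / 1000.0  # out and back
    network = propagation + hops * hop_ms                  # routers/LBs
    # An unresumed TLS handshake costs 1-2 extra network round-trips
    return network * (1 + tls_handshake_rtts) + inference_ms + app_ms

# In-store inference: ~1 km of fiber, 4 hops, resumed TLS session
print(round(rtt_budget_ms(fiber_km=1, hops=4, inference_ms=9.0), 2))  # ~12 ms
```

Run the same function with your P99 inference time, not the P50, to budget against the tail.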
SLO taxonomy (use or adapt)
- Availability — % of successful requests within SLO latency window
- Latency — p50/p95/p99 target for “time to first token” (LLM) or “decision time” (vision)
- Freshness — Max model staleness (hours) allowed before redeploy
- Durability — RPO/RTO for authoritative data & artifacts
CAM in one minute
- A-Level: business tolerance for downtime/RPO (A0–A5).
- I-Score: mean of 5 pillars (0–5).
- CAM Tier: intersection of A-Level and I-Score (0–5).
Architectures below show target pillar scores and the resulting CAM Tier you can certify.
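For concreteness, the I-Score arithmetic can be sketched in a few lines. This covers only the mean-of-pillars step; the CAM Tier itself comes from intersecting the result with the A-Level in the CAM lookup table, which this sketch does not reproduce:

```python
# Hedged sketch of the I-Score arithmetic stated above: the mean of the
# five pillar scores, each 0-5. Pillar values here are illustrative.
def i_score(pillars):
    assert set(pillars) == {"I-PWR", "I-COOL", "I-NWK", "I-DATA", "I-CTRL"}
    return sum(pillars.values()) / 5

# Retail example (Architecture A, mid-range pillar scores)
retail = {"I-PWR": 2, "I-COOL": 2, "I-NWK": 3, "I-DATA": 3, "I-CTRL": 3}
print(i_score(retail))  # 2.6, i.e. I2 when rounding down
```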
2) Reference Architecture A — Retail AR “Pick-and-See” (sub-20 ms UX)
Use case: A shopper lifts a product; a phone/tablet overlay shows price, reviews, and in-stock alternatives within 20 ms.
[Phone/Tablet] --Wi-Fi 6E--> [In-Store Edge Pod] ==(Anycast)==> [Metro Edge PoP A/B]
       |                            |                                  |
 Camera + AR SDK              Triton/vLLM                  Model Registry (quorum)
       |                            |                                  |
 +-- tiny preproc         + Feature store read             + Object storage (core)
SLOs
- Latency: p95 ≤ 20 ms RTT store↔inference; p99 ≤ 35 ms
- Availability: 99.95% monthly
- Freshness: model ≤ 24 h old
- Durability: RPO ≤ 1 h (features & events)
Placement logic
Inference in-store (1–2 GPUs) for “first paint” with metro PoP as spillover/backup. Features/metadata from metro cache; authoritative store in two core colos.
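The spillover rule above can be sketched as a tiny routing predicate; the queue-depth threshold and health signal are hypothetical:

```python
# Hedged sketch of "first paint in-store, spill to metro"; the health flag
# and queue threshold are illustrative, not from a real deployment.
def pick_backend(store_gpu_healthy: bool, store_queue_depth: int,
                 max_queue: int = 8) -> str:
    if store_gpu_healthy and store_queue_depth < max_queue:
        return "in-store"   # 8-10 ms budget, fastest first paint
    return "metro-pop"      # spillover/backup, +8-12 ms round trip

print(pick_backend(True, 2))    # in-store
print(pick_backend(False, 0))   # metro-pop
```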
Latency budget (example)
| Component | Budget |
|---|---|
| Wi-Fi + local switch | 2 ms |
| In-store edge pod (LLM/vision) | 8–10 ms |
| Optional metro round-trip | 8–12 ms |
| Total | 18–24 ms |
Bill of Materials (per large store)
- Compute: 1× short-depth 2U server, 2× L4 or A10 class GPUs; 256 GB RAM
- Network: 2× WAN circuits (cable + 5G), SD-WAN CPE, Anycast-capable ingress (Envoy)
- Cooling: Fan-assisted short rack or wall-mount enclosure (I-COOL target 1–2)
- Storage: 2 TB NVMe for local warm cache; no durable state
- Security: TPM 2.0, secure boot, SPIFFE/SPIRE agent
CAM mapping (A-Level A2 typical)
- I-PWR 1–2 — Small UPS covers 10 min; app re-routes to metro if store drops
- I-COOL 1–2 — Enclosure fans; thermal alarms to fail to metro
- I-NWK 3 — Dual ISP or ISP + 5G; Anycast to two metros; dual DNS
- I-DATA 3 — Feature store cached; authoritative in two cores; immutable nightly backup
- I-CTRL 3 — GitOps; runners in metro & core; break-glass offline creds
Composite I-Score ≈ 2.2–2.6 → I2. At A-Level A2 this certifies CAM Tier 2 (I3 also yields Tier 2). To reach Tier 3, lift I-CTRL or I-NWK one notch.
Failure modes and behavior
- Store power loss: Fail to metro PoP; degrade to “lite overlay” without personalization.
- WAN flap: Keep serving from store; queue writes, eventually sync features.
- Model rollout gone bad: Blue-green with store→metro canary; auto-rollback via GitOps.
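The canary gate behind that auto-rollback reduces to one comparison; the thresholds below simply reuse this architecture's p99 budget and an assumed error-rate ceiling:

```python
# Illustrative store->metro canary gate: promote only if the canary meets
# the p99 latency SLO and an assumed error-rate ceiling.
def canary_verdict(p99_ms: float, error_rate: float,
                   p99_budget_ms: float = 35.0,
                   max_error_rate: float = 0.001) -> str:
    if p99_ms <= p99_budget_ms and error_rate <= max_error_rate:
        return "promote"
    return "rollback"   # GitOps controller reverts to the previous release

print(canary_verdict(22.0, 0.0002))  # promote
print(canary_verdict(48.0, 0.0002))  # rollback
```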
Deployment quick start (Helm)
# values-retail.yaml
ingress:
  anycast: true
  protocols: [h3, h2]
triton:
  image: nvcr.io/nvidia/tritonserver:24.05
  args: ["--exit-on-error=false", "--pinned-memory-pool-byte-size=268435456"]
  resources:
    limits: { nvidia.com/gpu: 1 }
featureStore:
  endpoint: http://feature-cache.metro.local:8080
mesh:
  spiffe: true
  mTLS: required
helm repo add retail-ai https://charts.retail.example
helm install store-edge retail-ai/retail-stack -f values-retail.yaml
3) Reference Architecture B — Industrial Vision “Stop the Arm” (≤ 20 ms deterministic)
Use case: High-speed line rejects defective parts; robotic arm must halt before contact.
[GigE Camera] -> [PoE Switch] -> [Edge GPU Node(s)] -> [PLC/SCADA]
                                        |                   ^
                               mTLS+PTP |                   | EtherCAT/Profinet
                                        v                   |
                              [Local Feature DB]            |
                        (bounded write-behind to core)
SLOs
- Decision time: p99 ≤ 20 ms from last frame to PLC signal
- Availability: 99.99% monthly (A-Level A3 or A4, depending on safety case)
- RPO/RTO: RPO ≤ 15 min (metadata), RTO ≤ 5 min for line restart
Placement logic
Everything critical stays on-prem: inference, feature cache, PLC link. Core/cloud only for analytics, retraining, and artifact storage.
Deterministic timing practices
- Use PTP (IEEE-1588) for sub-1 ms clock skew.
- Pin GPU/CPU interrupts; isolate NIC queues (RSS) for camera streams.
- On a single host, run gRPC over Unix domain sockets, or use shared memory, rather than TCP loopback.
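Tying these practices to the 20 ms SLO, a fail-safe wrapper can refuse to emit late decisions. Here `infer` and `plc_signal` are stand-ins for the real model and PLC interfaces, not actual APIs:

```python
import time

# Illustrative fail-safe wrapper: if inference misses the 20 ms decision
# budget, the PLC gets a "safe-stop" instead of a late verdict.
DECISION_BUDGET_S = 0.020  # p99 <= 20 ms from last frame to PLC signal

def decide(frame, infer, plc_signal):
    start = time.monotonic()
    verdict = infer(frame)                  # e.g. "pass" / "reject"
    elapsed = time.monotonic() - start
    if elapsed > DECISION_BUDGET_S:
        plc_signal("safe-stop")             # late decision: fail safe
        return "safe-stop"
    plc_signal(verdict)
    return verdict

# Fast path with a mock model that answers immediately
print(decide(b"frame", lambda f: "reject", lambda s: None))  # reject
```

A real implementation would also enforce the budget inside the inference call (e.g. a watchdog on the GPU stream), since a wrapper can only detect lateness after the fact.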
Bill of Materials (per line)
- Compute: 2× industrial GPU PCs (IP65 if needed), each with 1× L4/A2 (fanless if possible)
- Network: 1× PoE+ switch; 2× uplinks to plant core; OT/IT segmentation w/ firewall
- Cooling: Direct-to-chip kits or sealed fanless chassis (I-COOL 3)
- Power: UPS 2 kVA per node; plant micro-grid or generator (I-PWR 3–4)
- Storage: NVMe mirrored; WORM snapshot daily for config & models
- Safety: Interlock relays; SIL-rated PLC
CAM mapping
- I-PWR 3–4 — Genset or micro-grid + UPS with autonomy ≥ RTO
- I-COOL 3 — No thermal excursions during shift; redundant pump/fan where liquid cooling used
- I-NWK 3 — Dual uplinks; plant ring; out-of-band 4G for alerts; DNS split horizon
- I-DATA 3–4 — On-prem primary; second site async; nightly immutable backup; periodic restore drill
- I-CTRL 4 — GitOps; two admin teams (OT/IT) with RBAC; policy-as-code; break-glass tested
I-Score ≈ 3–4 → CAM Tier 3 (A3) or Tier 4 (A4 with I4).
Failure modes and behavior
- GPU node fails: Hot spare takes over the camera stream within 1 s via VIP; the PLC raises an alarm, but the line keeps running if the spare is healthy.
- WAN outage: No impact on line; metrics buffered; alerts via LTE.
- Cooling loop blockage: Thermal kill-switch halts inference; PLC puts line into safe state; maintenance SLA < 5 min.
Policy gate (OPA Rego) to block unsigned models
package admission

default allow = false

allow {
  input.request.kind.kind == "ModelDeployment"
  sig := input.request.object.spec.signature
  verify(sig, input.request.object.spec.digest)
  input.request.object.spec.tags[_] == "safety-approved"
}

verify(sig, digest) {
  # Abstracted: in practice, delegate to a cosign/Rekor verifier
  # (e.g. via an external data provider). Non-empty checks stand in here.
  sig != ""
  digest != ""
}
4) Reference Architecture C — Fleet Telematics “Sense → Decide → Sync” (≤ 70 ms assist; sub-second aggregation)
Use case: 100k vehicles run local safety inference; roadway MEC clusters aggregate events and serve personalized models back to the fleet.
[Vehicle Edge (ARM or x86)] --multi-IMSI/5G--> [MEC PoP A/B] ==Anycast==> [Regional Agg]
             |                                      |                          |
 Camera/LiDAR + tiny model           vLLM/Triton + KV cache        Model registry (quorum)
             |                                      |                          |
 +-- local decision (10–30 ms)                 Driver UI             Cloud core analytics
SLOs
- On-vehicle assist: p99 ≤ 30 ms (local only)
- MEC query: p95 ≤ 70 ms RTT vehicle↔MEC for map/traffic augment
- Availability: 99.9% monthly (A-Level A2/A3 by feature class)
- Freshness: regional map/model deltas ≤ 5 min
Placement logic
Primary decisions on vehicle to avoid WAN dependency. MEC holds regional embeddings, map tiles, and personalization features. Regional aggregator pushes model deltas; core runs training & analytics.
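The 5-minute freshness SLO reduces to a timestamp comparison. In practice the registry's delta metadata would be the source of truth, but a sketch looks like:

```python
import time

# Hedged sketch of the 5-minute freshness SLO for regional map/model
# deltas; a real check would read registry metadata, not a local clock.
FRESHNESS_BUDGET_S = 5 * 60

def is_fresh(last_delta_epoch_s: float, now_s: float = None) -> bool:
    now_s = time.time() if now_s is None else now_s
    return (now_s - last_delta_epoch_s) <= FRESHNESS_BUDGET_S

print(is_fresh(last_delta_epoch_s=1000, now_s=1000 + 240))  # True: 4 min old
print(is_fresh(last_delta_epoch_s=1000, now_s=1000 + 420))  # False: 7 min old
```

A stale result should trigger a pull from the regional aggregator, and page on repeated misses.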
Bill of Materials (per MEC PoP)
- Compute: 4× 2U servers, each 2× L4/A10; 512 GB RAM; NVMe cache 8 TB
- Network: Dual ISPs, Anycast VIPs; RPKI ROAs; DDoS scrubbing
- Power/Cooling: N+1 UPS; room or containerized pod; economizer if climate allows
- Security: HSM for signing model releases; SPIRE for workload identity
CAM mapping (for MEC)
- I-PWR 3 — N+1 UPS/genset; MEC must stay up during utility blips
- I-COOL 3 — Redundant CRAH or liquid-to-rack; free-cool if possible
- I-NWK 4 — Two ISPs + Anycast; secondary DNS; tolerate any one carrier outage
- I-DATA 3–4 — KV cache + quorum registry (two regions + MEC); RPO ≤ 15 min
- I-CTRL 3–4 — Active-active control-planes across two MECs; GitOps; signed releases
I-Score ≈ 3.2–3.6 → I3. For A2, CAM Tier 2–3; for A3, aim for I4 to hit Tier 3.
Failure modes and behavior
- Carrier down: Multi-IMSI modems prefer alt carrier; MEC Anycast shifts; degraded but safe.
- MEC loss: Vehicles operate locally; miss regional enrichments until next PoP reached.
- Model rollback: Signed model fails post-deploy checks; vehicle uses previous local model.
MEC bootstrap (Kubernetes)
# Control plane across two MEC sites
kubeadm init --control-plane-endpoint mec.anycast.local:6443 --upload-certs
kubeadm join mec-a ... --control-plane
kubeadm join mec-b ... --control-plane
# SPIRE server and agents (repo URL, chart names, and values are
# illustrative — check the SPIFFE Helm charts for the current ones)
helm repo add spiffe https://spiffe.github.io/helm-charts-hardened
helm install spire spiffe/spire-server -n spire --create-namespace
helm install spire-agent spiffe/spire-agent -n spire --set nodeAttestor=tpm
5) Decision matrix: where should your model run?
| Constraint | Core Cloud | Metro Edge PoP | In-Store/On-Prem Edge | On-Device |
|---|---|---|---|---|
| RTT target | > 70 ms | 15–40 ms | 3–20 ms | 0–10 ms |
| Data sovereignty | Medium | High (regional) | Very high | Highest |
| Cap-ex/ops | Lowest | Medium | Medium–High | Low per node, high at fleet scale |
| Control | Medium | High | Highest | High but fragmented |
| CAM pillar stress | I-DATA | I-NWK | I-PWR / I-COOL / I-CTRL | I-CTRL (supply chain) |
Pick the lowest-cost tier that meets your SLO. Prove it with CAM; invest only in the pillars that cap your Tier.
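That selection rule can be expressed directly against the RTT row of the matrix; the tier ordering and floors below are lifted from the table, cheapest first:

```python
# Sketch of "pick the lowest-cost tier that meets your SLO" using the RTT
# row of the matrix above. Floors are the low end of each tier's RTT range.
TIERS = [  # (name, lowest p95 RTT in ms the tier can realistically meet)
    ("core-cloud", 70),    # cheapest cap-ex/ops
    ("metro-edge", 15),
    ("on-prem-edge", 3),
    ("on-device", 0),      # lowest latency, fragmented control
]

def cheapest_tier(rtt_target_ms: float) -> str:
    # Tiers are listed cheapest-first; take the first whose floor fits.
    for name, floor_ms in TIERS:
        if rtt_target_ms >= floor_ms:
            return name
    return "on-device"

print(cheapest_tier(100))  # core-cloud
print(cheapest_tier(25))   # metro-edge
print(cheapest_tier(10))   # on-prem-edge
```

Data sovereignty, cap-ex, and pillar stress then act as tie-breakers on top of the latency cut.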
6) Implementation checklist (copy/paste into your runbook)
- Define SLOs (p95 latency, availability, freshness, durability) per user journey.
- Choose A-Level from the Availability Standard (A2 for “important,” A3/A4 for revenue-critical).
- Place compute: device → store → metro → core, stopping at the first tier that meets SLO.
- Score pillars with CAM rubric; list the cheapest improvements to hit target Tier.
- Ship identity: SPIFFE/SPIRE + mTLS; sign every artifact (images, models, policies).
- Test RTO/RPO: cut power, flap WAN, corrupt a replica; verify SLOs don’t break.
- Automate rollouts: GitOps, progressive delivery, canaries; fail closed on unsigned deploys.
- Attest continuously: export pillar KPIs to Grafana; publish green CAM badge.
7) Final take
Real-time AI isn’t a monolith. Some answers must come from the phone, some from the back room, some from the metro, and a few from the core. The right architecture is the one that meets your SLO at the lowest total cost, and the right proof is a CAM Tier you can show to customers, auditors, and your CFO.