GridMetro: A Metro Connectivity Fabric Built for Edge-First Workloads
Most metro networks were built to haul north–south traffic from buildings to a few centralized data centers and then out to the public internet. Edge flips the flow.
Inference clusters, sensor gateways, real-time observability stacks, and low-latency APIs want east–west paths inside a metropolitan area—deterministic, sub-millisecond, and easy to stitch across multiple last-mile owners. GridMetro is a thought-leadership blueprint for a neutral, partner-powered metro platform that unifies transit, transport, and peering into a single service plane optimized for edge computing.
Rather than overbuilding fiber or competing with incumbents, GridMetro assumes partnership: we federate the diverse plant of existing metro fiber operators behind a common technical standard, commercial framework, and automation API. Customers see one catalog, one SLA, one NOC. Partners keep owning their last-mile and rings, expose inventory and performance via an NNI/API, and participate in revenue share. The result is a metro fabric that behaves like a single network—even though it's stitched from many.
1) What "edge-optimized" means in a metro
For GPUs serving tokens to a million phones, or CDUs moving heat away from 80 kW racks across town, latency and jitter budgets matter more than raw throughput. GridMetro targets ≤1.0 ms one-way (≤2.0 ms RTT) between any two on-net edge POPs inside a metro "zone" (≈≤50 km radius), jitter <100 µs p95, and loss ≤0.01% under load. Those numbers are reachable only when the fabric is designed as if it were a single switch backplane stretched over a city: short, diverse paths; clean L2/L3 demarc; consistent queuing; identical clocking; and policy you can automate.
Concretely, GridMetro delivers three product families on one underlay:
Transport (L1/L2): deterministic private paths—dark/lit waves, Ethernet EVPN/VXLAN, or MEF-style EPL/EVPL—between any pair of on-net sites.
Transit (L3): high-quality IP with RPKI, MANRS, and always-on volumetric DDoS mitigation, engineered for low jitter and quick failover.
Peering: local exchange fabric for public/remote peering and metro anycast, so your edge services terminate traffic where the users are without tromboning to a distant IX.
Every port, service, and metric looks the same to the customer regardless of which partner's glass the bits actually traverse.
2) Physical and logical architecture
Physical topology. GridMetro's spine is a set of dual counter-rotating rings across the metro, landing in major carrier hotels and neutral facilities (the "core POPs") and fanning out along diverse laterals to micro-POPs at edge sites (retail, industrial parks, neighborhood DCs, cell/MEC sites). Where partners have existing rings, we interconnect with NNIs at two or more POPs; where gaps exist, we fill with leased waves or targeted builds. Every lateral aims for path diversity down to the conduit: separate river/rail crossings and handholes, with route maps in the as-builts.
Logical control plane. The fabric runs EVPN over VXLAN (or over MPLS/SRv6 where a partner requires it). EVPN gives us scalable multi-tenant L2/L3VPN, fast convergence, and clean interop at NNIs. The control plane is eBGP everywhere; each partner is a distinct ASN with signed RPKI ROAs. Customer edge (CE) devices speak either L2 (Q-in-Q handoff) or L3 (eBGP) into our PE.
Timing. To support PTP for radio fronthaul or highly synchronized applications, every core POP houses a GNSS-disciplined grandmaster clock with high-stability oscillators for holdover; edge POPs run boundary clocks. NTP/NTS is available metro-wide; PTP domains are scoped to customers who need them.
Capacity. Standard NNIs are 100 G and 400 G today, with 800 G on the roadmap. Access ports are 1/10/25/100 G. Wavelengths are available at 100/200/400 G line rates. All POPs provision A/B power, diverse PDUs, and field-replaceable optics inventory.
Security on the wire. NNIs and sensitive customer handoffs support MACsec. All BGP sessions enforce GTSM, TTL-security, max-prefix, and RPKI origin validation. Remote management is on an isolated OOB.
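As a rough illustration of how those session guardrails might be expressed in the automation layer, the sketch below models a per-NNI policy record that an orchestrator could render into vendor config. The field names and defaults are assumptions, not GridMetro's actual data model.

```python
from dataclasses import dataclass, asdict

@dataclass
class NniSessionPolicy:
    """Per-NNI BGP guardrails as described above; field names are illustrative."""
    partner_asn: int
    ttl_security_hops: int = 1      # GTSM: only accept directly connected peers
    max_prefixes: int = 5000        # warn/tear down past this prefix count
    rpki_origin_validation: bool = True
    macsec_required: bool = True

def render_policy(policy: NniSessionPolicy) -> dict:
    """Produce a vendor-neutral dict an orchestrator could translate into device config."""
    if policy.partner_asn <= 0:
        raise ValueError("partner ASN must be a positive integer")
    return asdict(policy)

if __name__ == "__main__":
    print(render_policy(NniSessionPolicy(partner_asn=64512)))
```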
3) Services, precisely defined
3.1 Transport
GridMetro Wave. Point-to-point optical wavelengths with hard SLAs on latency (documented per route), availability, and MTTR. Offered protected (ring-switch) or unprotected (customer diversity).
GridMetro EVPN. L2VNI for stretch-L2 use cases, L3VNI for routed domains. EVPN-ELAN (multipoint) and EVPN-VPWS (point-to-point) available. Deterministic QoS profiles (see §4) and optional bandwidth calendaring (e.g., 40 G baseline with time-boxed bursts to 100 G nightly for replication; a sketch of the calendaring logic follows this subsection).
MEF-aligned handoffs. EPL/EVPL semantics for enterprises that want strict EVC behavior.
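To make the calendaring idea concrete, here is a minimal sketch of the shaper logic implied by the EVPN example above (40 G baseline, nightly burst to 100 G). The window times and function names are assumptions; a real implementation would live in the orchestrator, not in customer code.

```python
from datetime import datetime, time

# Illustrative calendar: 40 G committed baseline with a nightly burst window
# to 100 G for replication, as described for GridMetro EVPN above.
BASELINE_GBPS = 40
BURST_GBPS = 100
BURST_START = time(23, 0)   # assumed window; the real schedule is per-service
BURST_END = time(6, 0)

def committed_rate_gbps(now: datetime) -> int:
    """Return the bandwidth the shaper should enforce at a given wall-clock time."""
    t = now.time()
    # The window wraps midnight: active if after start OR before end.
    in_window = t >= BURST_START or t < BURST_END
    return BURST_GBPS if in_window else BASELINE_GBPS

if __name__ == "__main__":
    print(committed_rate_gbps(datetime(2025, 1, 15, 23, 30)))  # 100
    print(committed_rate_gbps(datetime(2025, 1, 15, 12, 0)))   # 40
```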
3.2 Transit
GridMetro Transit. Metro-scoped IP transit with edge-biased route preference to keep flows local; multi-homed upstreams per metro; RPKI validation and MANRS controls; always-on scrubbing via metro scrubbing centers.
Metro Anycast. We announce your anycast /32 or /128 from POPs across the metro and weight them with real-time probe data, so clients land on the nearest healthy edge cluster.
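A minimal sketch of the weighting idea behind Metro Anycast: announce from every healthy POP and bias toward the lowest-latency ones. It assumes a simple inverse-latency rule and an RTT cap; the actual algorithm and probe format are not published here.

```python
# Announce the customer prefix from every healthy POP and weight each
# announcement inversely to its probe latency. Thresholds and the meaning of
# the returned "weight" are assumptions for this sketch.

def anycast_weights(probes_ms: dict[str, float], healthy: dict[str, bool],
                    max_rtt_ms: float = 2.0) -> dict[str, float]:
    eligible = {pop: rtt for pop, rtt in probes_ms.items()
                if healthy.get(pop) and rtt <= max_rtt_ms}
    if not eligible:
        return {}
    # Inverse-latency weighting, normalized so the weights sum to 1.0.
    inverse = {pop: 1.0 / rtt for pop, rtt in eligible.items()}
    total = sum(inverse.values())
    return {pop: w / total for pop, w in inverse.items()}

if __name__ == "__main__":
    rtts = {"POP-A": 0.4, "POP-B": 0.9, "POP-C": 3.5}
    health = {"POP-A": True, "POP-B": True, "POP-C": True}
    print(anycast_weights(rtts, health))  # POP-C excluded: over the RTT cap
```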
3.3 Peering
GridMetro Exchange. A neutral peering fabric connecting local ISPs, CDNs, clouds, and enterprises. Route servers support BFD, standard BGP communities, and built-in RPKI.
Remote Peering. VLANs to regional/national IXs (where justified) without hauling traffic out of the metro for local eyeballs.
Cloud on-ramps. Where clouds provide metro on-ramp POPs or local zones, we offer virtual circuits to them as first-class endpoints.
4) QoS, latency engineering, and failure behavior
Classes. The fabric exposes four classes with strict shaping at ingress:
- Real-Time (EF): voice, control-loops, latency-critical inference.
- Interactive (AF-low drop): APIs and storage metadata.
- Bulk (BE): backups, model syncs, prefetch.
- Scavenger: experiments and traffic you can drop without tears.
Each class has metro-specific queuing profiles and per-hop behaviors documented in the service guide. SLAs are per class; you don't buy "a circuit," you buy a latency budget with a queuing contract.
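The sketch below shows one way customer tooling could consume the class definitions: a table of DSCP markings and per-class one-way budgets plus a budget check. The specific DSCP values and budget numbers are illustrative assumptions; the authoritative values live in the service guide.

```python
# Illustrative mapping of the four GridMetro classes to DSCP markings and
# per-class one-way latency budgets. The numbers here are assumptions.
CLASS_PROFILES = {
    "real-time":   {"dscp": "EF",   "one_way_budget_ms": 1.0,  "queue": "priority"},
    "interactive": {"dscp": "AF31", "one_way_budget_ms": 2.0,  "queue": "weighted"},
    "bulk":        {"dscp": "BE",   "one_way_budget_ms": 5.0,  "queue": "weighted"},
    "scavenger":   {"dscp": "CS1",  "one_way_budget_ms": None, "queue": "scavenger"},
}

def budget_ok(service_class: str, measured_one_way_ms: float) -> bool:
    """True if a measured one-way latency fits the class's budget."""
    budget = CLASS_PROFILES[service_class]["one_way_budget_ms"]
    return budget is None or measured_one_way_ms <= budget

if __name__ == "__main__":
    print(budget_ok("real-time", 0.8))    # True
    print(budget_ok("interactive", 2.4))  # False
```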
Latency budget. We publish path-specific one-way latency (POP↔POP) and enforce that on turn-up using TWAMP and optical path data. Where partners hand traffic off, they must meet the same budget or advertise a higher one we expose to customers. Fast reroute targets <50 ms path switch on fiber cuts; jitter under failover is bounded and published.
Congestion signals. Customers can opt into In-band Network Telemetry (INT) or sFlow/NetFlow exports. The NOC projects per-class headroom at each hop so you can place jobs where the metro is quietest.
5) Security and route hygiene
GridMetro enforces RPKI origin validation on every BGP edge, max-prefix with sane defaults, and well-documented communities that give customers control without tickets. Example community behavior (illustrative):
- 65535:10 — Prefer local-metro exit (keep traffic in-metro if a path exists).
- 65535:20 — De-prefer upstream transit (use peering first).
- 65535:666 — Blackhole destination (DDoS sinkhole at scrubbing edge).
- 65535:300X — Color X (policy-based selection for multi-site anycast).
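As a sketch of how a customer's automation might attach these communities to an announcement, the helper below builds a vendor-neutral announcement record. The community values mirror the illustrative table above; the helper and record shape are assumptions.

```python
# The illustrative communities above as a small helper that customer automation
# could use when building an announcement. Nothing here is a published standard.
COMMUNITIES = {
    "prefer_local_metro": "65535:10",
    "deprefer_transit":   "65535:20",
    "blackhole":          "65535:666",
}

def color(x: int) -> str:
    """65535:300X, where X selects a policy color for multi-site anycast."""
    if not 0 <= x <= 9:
        raise ValueError("color is a single digit in this illustrative scheme")
    return f"65535:300{x}"

def announcement(prefix: str, *intents: str, color_id=None) -> dict:
    """Build a vendor-neutral announcement record with the requested communities."""
    communities = [COMMUNITIES[i] for i in intents]
    if color_id is not None:
        communities.append(color(color_id))
    return {"prefix": prefix, "communities": communities}

if __name__ == "__main__":
    print(announcement("203.0.113.7/32", "prefer_local_metro", color_id=2))
```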
DDoS protection is default-on: flows exceeding heuristics are diverted to metro scrubbing and returned via GRE/VXLAN or native handback. MACsec is offered on NNIs and large enterprise handoffs. Management planes are out-of-band; all change is authenticated, authorized, and logged.
6) Automation, telemetry, and APIs
The commercial promise ("one catalog, one SLA") is only real if provisioning is programmable across partners.
Inventory & quoting. Partners publish on-net buildings, fiber routes, and POP ports via an API (or periodic CSV/GeoJSON). GridMetro's portal surfaces availability by address, shows documented path diversity, and quotes lead time and NRC/MRC instantly.
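A minimal sketch of the availability lookup behind the quoting flow, assuming partners publish GeoJSON-style features; the schema and property names (on_net, lead_time_days, mrc_usd) are illustrative, not a defined partner API.

```python
# A sketch of the availability lookup the portal performs against partner-
# published inventory. Schema and property names are assumptions.
import json

INVENTORY = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"address": "100 Main St", "partner": "RingCo",
                        "on_net": True, "lead_time_days": 10, "mrc_usd": 1800},
         "geometry": {"type": "Point", "coordinates": [-96.80, 32.78]}},
    ],
}

def quote(address: str):
    """Return a quote dict for the first on-net match at an address, else None."""
    for feature in INVENTORY["features"]:
        props = feature["properties"]
        if props["address"].lower() == address.lower() and props["on_net"]:
            return {"partner": props["partner"],
                    "lead_time_days": props["lead_time_days"],
                    "mrc_usd": props["mrc_usd"]}
    return None

if __name__ == "__main__":
    print(json.dumps(quote("100 Main St"), indent=2))
```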
Order → turn-up. Customer orders generate intent (port_speed=100G, service=EVPN, class=Interactive, endpoints=[POP-A, POP-B]). The orchestrator programs the PEs and calls standard MEF LSO APIs on partner controllers to stand up their side of the NNI.
Acceptance. We run RFC 2544 or Y.1564 (EtherSAM) tests for L2, or TWAMP for L3, capture baselines (latency/jitter/loss per class), and attach them to the service record.
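The acceptance step can be pictured as a simple comparison of the captured baseline against the published budget before it is attached to the service record. The thresholds below reuse the metro targets from §1; the record shape is an assumption.

```python
# Acceptance sketch: compare turn-up baselines against the published budget.
# Thresholds reuse the metro targets from §1; the record shape is illustrative.
BUDGET = {"one_way_ms": 1.0, "jitter_p95_us": 100.0, "loss_pct": 0.01}

def accept(baseline: dict) -> dict:
    """Return a pass/fail acceptance record for a measured baseline."""
    checks = {
        "latency": baseline["one_way_ms"] <= BUDGET["one_way_ms"],
        "jitter":  baseline["jitter_p95_us"] <= BUDGET["jitter_p95_us"],
        "loss":    baseline["loss_pct"] <= BUDGET["loss_pct"],
    }
    return {"baseline": baseline, "checks": checks, "accepted": all(checks.values())}

if __name__ == "__main__":
    measured = {"one_way_ms": 0.62, "jitter_p95_us": 41.0, "loss_pct": 0.0}
    print(accept(measured))
```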
Telemetry. Per-class latency, jitter, loss, errors, optics health, and route-convergence counters stream to the portal and API. Customers can subscribe to webhooks for threshold breaches, and pull time-series data into their own Grafana.
Everything the portal does, the API does. That matters when your CI/CD needs to stand up ephemeral circuits for an event (e.g., "give me 40 G Interactive between POP-N and POP-S for six hours starting at 18:00").
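Here is what that ephemeral request could look like as code, assuming a hypothetical REST endpoint and payload fields that mirror the order intent above; neither the URL nor the schema is a published GridMetro API.

```python
# The ephemeral-circuit example above, expressed as the API call a CI/CD
# pipeline might make. Endpoint URL and payload fields are assumptions.
from datetime import datetime, timedelta
import json
import urllib.request

def ephemeral_circuit_intent(start: datetime) -> dict:
    """40 G Interactive between POP-N and POP-S for six hours from `start`."""
    return {
        "service": "EVPN",
        "class": "Interactive",
        "port_speed": "40G",
        "endpoints": ["POP-N", "POP-S"],
        "start": start.isoformat(),
        "end": (start + timedelta(hours=6)).isoformat(),
    }

if __name__ == "__main__":
    intent = ephemeral_circuit_intent(datetime(2025, 6, 1, 18, 0))
    body = json.dumps(intent).encode()
    # Hypothetical endpoint; constructed but not sent, to keep the sketch offline.
    req = urllib.request.Request("https://api.gridmetro.example/v1/services",
                                 data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    print(req.get_method(), req.full_url)
    print(json.dumps(intent, indent=2))
```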
7) Commercial construct with partners
GridMetro is neutral. The Partner Agreement standardizes:
NNI terms. Port speeds, LOA/CFA, MACsec options, optical light levels, loopback and test procedures, maintenance windows.
Performance disclosure. Partners publish path latencies, diversity metadata, and scheduled works at least 10 days ahead.
Revenue share. Simple split on MRC with volume tiers; NRC flows through.
SLA back-to-back. Credits flow downstream if a partner-owned segment causes an SLA miss; customers see a single credit schedule.
Data sharing. Minimal operational telemetry is shared for SLA verification (anonymized where required).
This lets smaller ring operators monetize spare capacity, and lets big incumbents win business they wouldn't have touched—without building portals and OSS for niche edge use cases.
8) Example: a metro edge pattern
Consider a DFW-sized metro with ten core POPs and twenty micro-POPs embedded in retail parks, industrial zones, and office clusters. A customer lands GPU pods in three micro-POPs near user density, keeps a storage spine in a core POP, and advertises anycast for the inference API. GridMetro provides:
- EVPN L3VNI between the three GPU pods and the storage spine with Interactive class at 2×100 G.
- Transit feeding those pods with strong local preference, RPKI, and DDoS scrubbing.
- Exchange peering with last-mile ISPs so most traffic never leaves the metro.
- Deterministic latency of ~0.5–0.9 ms one-way between any two pods; jitter under 100 µs.
- Calendared bulk window (23:00–06:00) during which the EVPN service bursts to 400 G for model refresh.
If a fiber cut takes out a northern ring span, fast reroute flips within 50 ms; latency may climb by 0.2–0.3 ms but remains inside the published budget. The customer's orchestrator sees class headroom drop via the API and shifts a batch job south automatically.
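That reactive shift can be sketched as a small placement rule: keep the job where it is while headroom covers its needs, otherwise move it to the POP with the most spare capacity. POP names, thresholds, and the job model below are illustrative.

```python
# Sketch of the reactive behaviour described above: when the API reports that
# Interactive-class headroom at the current pod has dropped below what a batch
# job needs, move the job to the pod with the most spare capacity.
def place_batch_job(headroom_gbps: dict[str, float], needed_gbps: float,
                    current_pop: str) -> str:
    """Stay put if headroom allows; otherwise pick the POP with the most spare capacity."""
    if headroom_gbps.get(current_pop, 0.0) >= needed_gbps:
        return current_pop
    return max(headroom_gbps, key=headroom_gbps.get)

if __name__ == "__main__":
    # After the northern reroute, POP-N headroom collapses; the job moves south.
    after_cut = {"POP-N": 4.0, "POP-S": 55.0, "POP-E": 30.0}
    print(place_batch_job(after_cut, needed_gbps=20.0, current_pop="POP-N"))  # POP-S
```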
9) Operations and change
A metro fabric is only as good as its change discipline. GridMetro runs a central NOC with a strict maintenance policy: concurrent maintainability across rings, night windows, and automated pre/post checks. All routing policy lives in Git, linted and promoted through rings (staging POP → one quadrant → full metro). Customer-facing incidents show the fault domain immediately (conduit, span, NNI, PE, route server), the blast radius (which POPs/classes are affected), and the ETA for restoration. Post-incident, customers get the timeline, counters, and route diffs—not prose.
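One way to picture the promotion discipline is as a staged pipeline in which a change only advances when automated pre/post checks pass at the previous ring. The stage names and check interface below are assumptions; real checks would diff routes and counters rather than return booleans.

```python
# Promotion-through-rings sketch: a routing-policy change advances from the
# staging POP to one quadrant to the full metro only if checks pass per stage.
from typing import Callable

STAGES = ["staging-pop", "quadrant-north", "full-metro"]

def promote(change_id: str, run_checks: Callable[[str, str], bool]) -> str:
    """Apply a change stage by stage; stop at the first stage whose checks fail."""
    for stage in STAGES:
        if not run_checks(change_id, stage):
            return f"{change_id}: halted at {stage}, roll back and investigate"
    return f"{change_id}: promoted to full metro"

if __name__ == "__main__":
    # Toy check: pretend the quadrant stage fails its post-checks.
    print(promote("CHG-1042", lambda cid, stage: stage != "quadrant-north"))
```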
10) Why this matters for edge
Edge loses its economic argument if packets meander to a faraway IX or if a single partner's maintenance takes out your east–west path. GridMetro's neutral fabric makes the metro feel like a local backplane: low and predictable latency, clean QoS, built-in peering, and API-driven provisioning. That is what lets you run synchronous replication across two sites without drama, deploy anycast inference that actually lands locally, and spin up ephemeral transport for events and surges without scheduling a month-long cross-connect dance.
If you already operate rings or last-mile in a city, GridMetro is an invitation: bring your plant, plug into a common NNI and API, and sell into edge-heavy demand with a product set that's coherent and modern. If you operate GPU clusters, storage spines, or latency-sensitive applications, GridMetro is the fabric you build on—one catalog, one SLA, the metro as a programmable switch.
Light tie-in to the GridSite ecosystem. GridSite/ComputeComplete sit one layer up the stack. We help source on-net edge sites, deploy the Facility Monitoring & Management Network (FMMN), and operate the NOC that watches cooling loops and route convergence in the same pane of glass. GridMetro is the connective tissue: the way your sites talk to each other and to local users. With the site layer (GridSite), the fabric (GridMetro), and the application layer (your workloads) in place, the metro finally behaves like what edge always promised—a short, reliable path between where compute lives and where value is created.