ExploreIt Scalability Architecture Proposal

From 53 to 2 Million Locations

Executive Review • March 2026

Executive Summary

53
Current Locations
2M
Target Locations
Key Insight: The "Pyramid of Attention" — Users need different data granularity at different zoom levels. Global view shows countries, city view shows neighborhoods, street view shows individual buildings.

Recommendation: Implement a Tiered Aggregation Architecture that pre-computes data at 4 levels of detail, reducing storage by 98% while maintaining sub-200ms response times.

The Problem

Scale Challenge

38,000×
Growth Factor

Current PostgreSQL database with 53 locations cannot handle 2 million records with spatial queries.

Technical Challenges

  • Query timeouts with large datasets
  • Unbounded memory usage
  • Wikipedia API rate limits
  • Image storage explosion (500GB+)
  • Viewport query performance
  • No caching strategy
Why Current Approach Fails: Querying 2M rows for every map pan/zoom creates database locks and 5-10 second response times. Users abandon the app.

Architecture Options

Option A: Simple Scale

Scale PostgreSQL vertically, add indexes, optimize queries

Verdict: Fails at >100K locations. Query times exceed 2 seconds.

⚠️ Option B: Viewport-Based Loading

Only load data visible in current map viewport

Verdict: Works but creates "loading holes" when panning. Complex implementation.

Option C: Tiered Aggregation (Recommended)

Pre-aggregate data into 4 tiers by zoom level (World → Country → City → Street)

Verdict: Predictable performance, 98% storage reduction, clean architecture.

Recommended Solution: Tiered Data Model

Tier 1: World View (Zoom 0-4)
~200 aggregated points representing countries/regions. Counts and summary data only.
Tier 2: Country View (Zoom 5-8)
~5,000 points. Major cities and landmarks per country. Basic metadata.
Tier 3: City View (Zoom 9-13)
~100,000 points. All locations within city bounds. Full metadata, no images.
Tier 4: Street View (Zoom 14+)
Full 2M locations. Complete data including images, descriptions, Wikipedia links.

Data Model & Storage Requirements

Data Tier Record Count Storage Latency
Tier 1 (World) 200 ~50 KB <10ms
Tier 2 (Country) 5,000 ~2 MB <20ms
Tier 3 (City) 100,000 ~40 MB <50ms
Tier 4 (Street) 2,000,000 ~1 GB <100ms
Redis Cache Hot data + tiles ~5 GB <5ms
Total 2.1M records ~12 GB

98% reduction vs. naive approach (would require 600GB+)

Image Strategy: Don't Store, Reference

Current Approach (Problem)

  • Download and store images locally
  • 2M locations × 250KB = 500GB storage
  • Storage costs: $200+/month
  • CDN bandwidth costs
  • Slow initial page loads

Recommended Approach

  • Store only image URLs (300MB total)
  • Load from Wikipedia CDN directly
  • Lazy loading on viewport entry
  • Progressive enhancement
  • Automatic image optimization
Cost Savings: $200/month → $20/month (90% reduction)
Performance Gain: First meaningful paint improves by 3-4 seconds

Performance Targets & SLAs

Metric Target Current Status
Global zoom (Tier 1) < 50ms N/A On Track
Country zoom (Tier 2) < 100ms N/A On Track
City zoom (Tier 3) < 150ms N/A On Track
Street zoom (Tier 4) < 200ms N/A On Track
Cache hit rate > 80% 0% On Track
Image load time < 2s 5-10s On Track

Implementation Roadmap (12 Weeks)

1

Phase 1: Foundation (Weeks 1-2)

Database schema design, PostgreSQL partitioning, Redis setup

2

Phase 2: Aggregation Engine (Weeks 3-5)

Build tiered aggregation pipeline, data transformation scripts

3

Phase 3: API Layer (Weeks 6-7)

Refactor location endpoints, implement tier selection logic

4

Phase 4: Image Strategy (Weeks 8-9)

Implement lazy loading, CDN integration, URL-only storage

5

Phase 5: Wikipedia Integration (Weeks 10-11)

Batch sync system, rate limiting, error handling

6

Phase 6: Testing & Deployment (Week 12)

Load testing, performance validation, production rollout

Infrastructure & Monthly Costs

Component Spec Cost/Month
PostgreSQL 4 vCPU, 8GB RAM, 100GB SSD $60
Redis Cache 6GB memory, clustered $40
Application Server 2 vCPU, 4GB RAM $35
CDN (Wikipedia) Free tier via Wikimedia $0
Bandwidth ~500GB/month estimated $20
Monitoring Datadog/DataDog equivalent $30
Total $185/month

Estimated first-year cost: $2,220 + $5,000 one-time development

Risk Assessment

Risk Severity Mitigation
Wikipedia API limits HIGH Implement exponential backoff, cache aggressively, request API key increase
Aggregation complexity MEDIUM Automated testing, gradual rollout, fallback to raw data
Cache invalidation MEDIUM Time-based TTL, versioned cache keys, manual flush capability
Image availability LOW Graceful degradation, placeholder images, fallback to description
Performance regression MEDIUM A/B testing, feature flags, rollback strategy

Success Metrics

Performance KPIs

  • Average API response: <100ms
  • 95th percentile: <300ms
  • Cache hit rate: >80%
  • Page load time: <2s
  • Zero query timeouts

Business KPIs

  • User session duration: +30%
  • Bounce rate: -20%
  • Map interactions: +50%
  • Global user reach: +200%

Technical Debt Metrics

  • Test coverage: >80%
  • Mean time to recovery: <30 minutes
  • Deployment frequency: Daily
  • Zero critical security vulnerabilities

Next Steps

Immediate Action Required: Approve Phase 1 budget and assign 2 backend engineers
1

Phase 1: Foundation

Begin database schema migration, set up Redis cluster, establish monitoring baseline

2

Immediate Actions (This Week)

• Finalize technical specification
• Request Wikipedia API key increase
• Set up staging environment

?

Go/No-Go Decision

Review Phase 1 results after Week 2. Proceed to Phase 2 if performance targets met.

Questions & Discussion

ExploreIt Scalability Architecture Proposal
From 53 to 2 Million Locations

Use arrow keys or buttons to navigate • ESC to exit

1 / 14