Building Scalable APIs with Go: A Startup’s Guide to Not Overengineering

Here’s an uncomfortable truth: most startups will never need to “scale.” Not in the way Twitter or Uber needed to scale. The infrastructure decisions that make sense for 100 million daily active users are actively harmful when you have 1,000.

This post is about right-sizing your API architecture. We’ll do the math on when you actually need to worry about scale, why containerization beats serverless for most cases, and how to build APIs that are simple enough to debug at 3 AM but robust enough to grow with you.

The Scale Math Most Startups Ignore

Let’s start with numbers. Engineers love talking about “millions of requests,” but let’s break down what that actually means.

AWS Lambda limits (as of 2024):

  • 1,000 concurrent executions (default, can request increase)
  • 10,000 concurrent executions (typical raised limit)
  • 15-minute maximum execution time
  • ~$0.20 per million requests + compute time

GCP Cloud Run limits:

  • 1,000 concurrent requests per instance (configurable)
  • Up to 1,000 instances per service (default)
  • Request timeout configurable up to 60 minutes
  • ~$0.40 per vCPU-hour + requests

Now let’s work backwards from “I need more than Lambda can handle”:

Timeframe    Requests    Requests/sec    Reality Check
1 month      10M         ~4/sec          Seed stage SaaS
1 month      100M        ~40/sec         Growing startup
1 month      1B          ~400/sec        Series B+
1 month      10B         ~4,000/sec      You have a scaling team

With 10,000 concurrent Lambda executions and assuming 100ms average response time, you can handle:

10,000 concurrent × (1000ms / 100ms) = 100,000 requests/second

That’s roughly 8.6 billion requests per day, or more than 250 billion per month.

Unless you’re building the next TikTok, Lambda can handle your traffic. The question isn’t whether serverless can scale—it’s whether it should be your architecture.

Why We Use Cloud Run (Not Lambda)

At SID Technologies, we run all our Go services on GCP Cloud Run. Not because we hit Lambda’s limits—we’re nowhere close—but because containerization makes life easier.

The Case Against Lambda for APIs

Cold starts matter for APIs. Lambda cold starts range from 100ms to several seconds depending on runtime, package size, and VPC configuration. For a CLI tool or batch job, who cares. For an API where p99 latency matters, those cold starts show up in your metrics.

Go on Lambda has relatively fast cold starts (~100-200ms), but Cloud Run serves many concurrent requests per warm instance and lets you set a minimum instance count so you never scale to zero. Our p99 latency stays consistent because we aren’t paying the cold start tax on the first request after idle.

Local development is easier with containers. A Dockerfile that works locally works in production. Lambda’s deployment model typically means the SAM CLI, local emulators, and configuration that never quite matches production. With Cloud Run:

# Local
docker run -p 8080:8080 my-service

# Production
gcloud run deploy my-service --image=gcr.io/project/my-service

Same container, same behavior.
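
The Dockerfile behind that workflow can stay boring. A minimal sketch (the Go version, module layout, and my-service binary name are placeholders for your own):

# Build stage: compile a static Go binary
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/my-service .

# Runtime stage: minimal image, just the binary
FROM gcr.io/distroless/static-debian12
COPY --from=build /bin/my-service /my-service
EXPOSE 8080
ENTRYPOINT ["/my-service"]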

Debugging is straightforward. When something breaks in production, I can pull the exact image, run it locally, and reproduce the issue. Lambda’s execution environment is opaque. You’re debugging through CloudWatch logs and guesswork.

No vendor lock-in. Our containers run on Cloud Run today. They could run on AWS ECS, Kubernetes, or a VPS tomorrow. Lambda functions require rewriting.

When Lambda Makes Sense

Lambda wins for:

  • Event-driven workloads (S3 triggers, SQS consumers)
  • Infrequent, bursty traffic (webhooks, cron jobs)
  • Cost optimization at very low scale (pay-per-invocation)

We use Lambda for background jobs and event processing. But for APIs? Containers.

The Simplest Architecture That Works

Here’s what our API architecture looks like:

                    ┌─────────────────┐
                    │   Cloud Load    │
                    │    Balancer     │
                    └────────┬────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  auth-service   │ │ billing-service │ │   api-service   │
│  (Cloud Run)    │ │  (Cloud Run)    │ │  (Cloud Run)    │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
                             │
                    ┌────────▼────────┐
                    │    Postgres     │
                    │  (Cloud SQL)    │
                    └─────────────────┘

That’s it. No Kubernetes. No service mesh. No API gateway (Cloud Run handles routing). No message queues for synchronous operations.

Each service:

  • Is a single Go binary in a Docker container
  • Connects directly to the database
  • Handles its own authentication via shared middleware
  • Scales independently based on CPU/memory
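
A stripped-down sketch of what one of these services looks like (the /v1/ping route and the header check are placeholders, not our real auth logic):

package main

import (
    "log/slog"
    "net/http"
    "os"
)

// authMiddleware stands in for the shared auth middleware each service imports.
func authMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.Header.Get("Authorization") == "" {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    // Cloud Run tells the container which port to listen on via PORT.
    port := os.Getenv("PORT")
    if port == "" {
        port = "8080"
    }

    mux := http.NewServeMux()
    mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    mux.Handle("/v1/ping", authMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("pong"))
    })))

    slog.Info("listening", "port", port)
    if err := http.ListenAndServe(":"+port, mux); err != nil {
        slog.Error("server exited", "err", err)
        os.Exit(1)
    }
}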

What We Don’t Have

  • No Kubernetes. Cloud Run gives us autoscaling containers without running a cluster ourselves. We get the scaling benefits without the operational overhead. When we need K8s features, we’ll migrate. We don’t need them yet.

  • No API Gateway. Cloud Run services get HTTPS endpoints automatically. Authentication happens in middleware. Rate limiting happens in middleware (a sketch follows this list). We don’t need a separate gateway service.

  • No Service Mesh. Our services talk to each other via HTTP. It’s simple. Yes, we could add Istio for observability and traffic management. We don’t need it yet.

  • No Event Bus for Sync Operations. If Service A needs data from Service B, it makes an HTTP call. We don’t wrap everything in Pub/Sub or Kafka. Events are for async operations (notifications, analytics), not request/response.
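
To make the middleware point concrete, here is a rough sketch of a per-client rate limiter built on golang.org/x/time/rate. Keying on the remote address is illustrative; in practice you would key on the API token:

package middleware

import (
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

// RateLimit allows roughly rps requests per second per client, with a small burst.
// Note: this map grows with the number of clients; evict stale entries in production.
func RateLimit(rps float64, burst int) func(http.Handler) http.Handler {
    var (
        mu       sync.Mutex
        limiters = map[string]*rate.Limiter{}
    )
    limiterFor := func(key string) *rate.Limiter {
        mu.Lock()
        defer mu.Unlock()
        if l, ok := limiters[key]; ok {
            return l
        }
        l := rate.NewLimiter(rate.Limit(rps), burst)
        limiters[key] = l
        return l
    }
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if !limiterFor(r.RemoteAddr).Allow() {
                http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }
}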

Go Patterns That Actually Matter

Forget the “high-throughput” patterns you read about for building the next trading platform. Here’s what actually matters for startup APIs:

Bounded Concurrency

The one pattern worth implementing properly: don’t spawn unbounded goroutines.

// Bad: unbounded goroutine creation
func ProcessRequests(requests <-chan Request) {
    for req := range requests {
        go handleRequest(req) // This will kill you under load
    }
}

// Good: worker pool with bounded concurrency
func ProcessWithPool(requests <-chan Request, numWorkers int) {
    for i := 0; i < numWorkers; i++ {
        go func() {
            for req := range requests {
                handleRequest(req)
            }
        }()
    }
}

For most APIs, you won’t need this. HTTP servers handle concurrency for you. But for background processing, batch jobs, or fan-out operations, bounded concurrency prevents OOM kills.
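
If you would rather not hand-roll the pool, golang.org/x/sync/errgroup gives you the same bound with less ceremony. A sketch reusing the handleRequest function from above:

import "golang.org/x/sync/errgroup"

func ProcessBounded(requests <-chan Request) error {
    g := new(errgroup.Group)
    g.SetLimit(10) // at most 10 handlers in flight at once

    for req := range requests {
        req := req // capture loop variable (unnecessary on Go 1.22+, harmless before)
        g.Go(func() error {
            handleRequest(req) // same handler as in the worker pool example
            return nil
        })
    }
    return g.Wait()
}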

Timeouts on Everything

Every external call needs a timeout. Every database query. Every HTTP request. No exceptions.

ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()

result, err := db.QueryContext(ctx, query)

The default timeout should be aggressive. 5 seconds for HTTP calls. 10 seconds for database queries. If you need longer, make it explicit and justify it.
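
The same rule applies to outbound HTTP and to your own server. A sketch with the kind of defaults suggested above (tune the numbers to your endpoints):

// Outbound: http.DefaultClient has no timeout at all; give every client one.
var httpClient = &http.Client{Timeout: 5 * time.Second}

// Inbound: bound how long a client can hold a connection on your server.
var srv = &http.Server{
    Addr:              ":8080",
    Handler:           mux, // your router
    ReadHeaderTimeout: 5 * time.Second,
    ReadTimeout:       10 * time.Second,
    WriteTimeout:      15 * time.Second,
    IdleTimeout:       60 * time.Second,
}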

Circuit Breakers for External Services

When calling third-party APIs (Stripe, SendGrid, external services), implement circuit breakers:

import (
    "errors"
    "sync"
    "time"
)

// ErrCircuitOpen is returned while the breaker is refusing calls.
var ErrCircuitOpen = errors.New("circuit breaker open")

type CircuitBreaker struct {
    failures    int
    maxFailures int
    resetAfter  time.Duration
    lastFailure time.Time
    mu          sync.Mutex
}

func (cb *CircuitBreaker) Execute(fn func() error) error {
    cb.mu.Lock()
    if cb.failures >= cb.maxFailures {
        if time.Since(cb.lastFailure) < cb.resetAfter {
            cb.mu.Unlock()
            return ErrCircuitOpen // fail fast instead of waiting on a dying dependency
        }
        cb.failures = 0 // reset window elapsed: let the next call through
    }
    cb.mu.Unlock()

    err := fn()
    cb.mu.Lock()
    defer cb.mu.Unlock()
    if err != nil {
        cb.failures++
        cb.lastFailure = time.Now()
    } else {
        cb.failures = 0 // a success closes the circuit again
    }
    return err
}

When Stripe has an outage, you don’t want every request in your system waiting 30 seconds for a timeout. Circuit breakers fail fast after detecting problems.
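
A hypothetical usage wrapping a payment call (chargeCustomer is a stand-in for whatever client call you actually make):

var stripeBreaker = &CircuitBreaker{maxFailures: 5, resetAfter: 30 * time.Second}

func Charge(ctx context.Context, customerID string, amountCents int64) error {
    return stripeBreaker.Execute(func() error {
        return chargeCustomer(ctx, customerID, amountCents) // hypothetical helper
    })
}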

Structured Logging from Day One

Not a performance pattern, but critical for debugging:

logger.Info("request processed",
    "user_id", userID,
    "duration_ms", duration.Milliseconds(),
    "status", status,
)

Use slog (Go 1.21+) or zerolog. JSON output. Structured fields. When you’re debugging at 3 AM, grep and jq are your friends.
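
The one-time setup for slog is a couple of lines at startup; JSON on stdout is what Cloud Logging ingests as structured payloads:

logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
    Level: slog.LevelInfo,
}))
slog.SetDefault(logger) // package-level slog.Info / slog.Error now emit JSON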

What Actually Causes Scale Problems

In my experience, scale problems at startups are rarely about request volume. They’re about:

N+1 Queries

Your API fetches a list of 100 users, then makes 100 additional database queries to get their profiles: 101 round trips where one JOIN (or two queries with an IN clause) would do.

// Bad: N+1
users, _ := db.Query("SELECT id FROM users LIMIT 100")
for users.Next() {
    var id int
    users.Scan(&id)
    profile, _ := db.Query("SELECT * FROM profiles WHERE user_id = $1", id)
    // ...
}

// Good: Single query with JOIN or IN clause
query := `
    SELECT u.id, p.*
    FROM users u
    LEFT JOIN profiles p ON p.user_id = u.id
    LIMIT 100
`

N+1 queries are the #1 cause of slow APIs I’ve seen. Use query logging in development to catch them.

Unbounded Result Sets

// Bad: fetching all records
users, _ := db.Query("SELECT * FROM users")

// Good: always paginate
users, _ := db.Query("SELECT * FROM users LIMIT 100 OFFSET ?", offset)

If a table can grow unboundedly, every query needs a LIMIT.

Missing Indexes

Your query is slow? Check EXPLAIN ANALYZE. Add an index. This solves 90% of database performance issues.

Synchronous External Calls

Don’t call Stripe, send emails, or hit external APIs in the request path if you can avoid it. Queue the work, return immediately, process async.

// Bad: synchronous email in request handler
func HandleSignup(w http.ResponseWriter, r *http.Request) {
    user := createUser(r)
    sendWelcomeEmail(user) // This blocks the response
    json.NewEncoder(w).Encode(user)
}

// Good: async via queue
func HandleSignup(w http.ResponseWriter, r *http.Request) {
    user := createUser(r)
    queue.Publish("send-welcome-email", user.ID)
    json.NewEncoder(w).Encode(user)
}
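
On GCP the queue half of this is usually Pub/Sub or Cloud Tasks. A rough sketch with the Pub/Sub client (the topic name matches the example above; the payload shape is up to you):

import (
    "context"

    "cloud.google.com/go/pubsub"
)

func publishWelcomeEmail(ctx context.Context, client *pubsub.Client, userID string) error {
    topic := client.Topic("send-welcome-email")
    res := topic.Publish(ctx, &pubsub.Message{Data: []byte(userID)})
    _, err := res.Get(ctx) // waits for the publish to be acknowledged
    return err
}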

Monitoring: The Metrics That Matter

Track these, ignore everything else until you need it:

  1. Request latency (p50, p95, p99): p99 is where problems hide
  2. Error rate: 5xx responses as a percentage of total
  3. Database query time: Slow queries kill APIs
  4. Saturation: CPU and memory utilization

Cloud Run provides these out of the box. Add application-level tracing when you have more than one engineer debugging issues.

The Anti-Patterns to Avoid

Don’t Prematurely Distribute

If your services are in the same monorepo, deploying to the same cloud, and owned by the same team—consider whether they need to be separate services at all.

A single service with good module boundaries is easier to develop, test, and debug than three services that always deploy together.

Don’t Add Caching Before You Need It

Redis adds operational complexity. Before adding a cache:

  1. Check your database queries (indexes? N+1?)
  2. Check your database connection pooling
  3. Profile actual request latency

Caching hides problems. Fix the root cause first.
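
On point 2: database/sql defaults to an unlimited number of open connections and only two idle ones, which is rarely what you want in front of Cloud SQL. A typical starting point, assuming the pgx stdlib driver (tune the numbers to your instance size):

db, err := sql.Open("pgx", dsn)
if err != nil {
    log.Fatal(err)
}
db.SetMaxOpenConns(25)                  // cap concurrent connections to Postgres
db.SetMaxIdleConns(25)                  // reuse them between requests
db.SetConnMaxLifetime(30 * time.Minute) // recycle before the server or a proxy does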

Don’t Use Kubernetes Until You Have a Platform Team

Kubernetes is powerful. Kubernetes is also a full-time job to operate. Cloud Run, ECS, and similar managed services give you 80% of the benefits with 10% of the complexity.

When you have dedicated platform engineers and multi-region requirements, revisit this decision.

Conclusion

The goal of API architecture at a startup is not to handle Twitter-scale traffic. It’s to:

  1. Ship fast: Simple architectures have fewer moving parts to break
  2. Debug easily: When things break at 3 AM, can you figure out why?
  3. Grow incrementally: The architecture should accommodate 10x growth without rewriting

Containers on Cloud Run give us all three. We write Go services, package them in Docker, and deploy with Pilum. No Kubernetes. No service mesh. No Lambda cold starts.

When we hit the limits of this architecture—when we’re processing billions of requests and our database is the bottleneck—we’ll evolve. But we won’t prematurely optimize for problems we don’t have.

Build for today. Monitor for tomorrow. Scale when the metrics tell you to, not when your ego does.