Cloud Architecture

Serverless Architecture: When It Makes Sense (And When It Really Doesn't)

TL;DR

Serverless is phenomenal for event-driven, bursty, stateless workloads — and absolutely terrible for long-running processes, stateful logic, and ML inference. The sweet spot is a hybrid: serverless at the edges (API handlers, webhooks, cron jobs, file processing) and containers at the core (business logic, data pipelines, anything that runs longer than 30 seconds). Do the math before you commit — a $47k Lambda bill is easier to rack up than you think.

March 8, 2026 · 23 min read
Serverless · AWS Lambda · Cloud Architecture · Cost Optimization

Let me tell you about the most expensive Lambda function I've ever seen. A team I was consulting for decided to deploy a machine learning model — a PyTorch image classifier, about 1.2 GB with dependencies — as an AWS Lambda function. Their reasoning? "We're a serverless shop. Everything goes in Lambda."

The model took 8-12 seconds per inference. Each invocation allocated 3 GB of memory. They were processing product images from an e-commerce catalog — about 50,000 images per day, with spikes during new inventory uploads.

The first month's Lambda bill: $47,000.

For context, the same workload running on two g4dn.xlarge GPU instances would have cost about $1,100/month. They were paying 42x more for worse throughput and higher latency. Cold starts alone added 15-20 seconds to the first request after idle periods, because Lambda had to pull and initialize a 1.2 GB container image on every cold start.

I helped them migrate the inference to ECS on GPU-backed EC2 instances in about a week (Fargate was out of the picture, since it doesn't support GPUs). Their bill dropped to $1,400/month: slightly more than the bare two-instance cost because we kept autoscaling headroom, but the operational simplicity was worth it. The CTO told me it was the highest-ROI consulting engagement they'd ever had, which says more about the original decision than about my skills.

This story isn't about Lambda being bad. Lambda is an incredible piece of technology. This story is about using the right tool for the job — and serverless is a very specific tool for very specific jobs.

What Serverless Actually Means

Let's clear up some confusion, because "serverless" is one of the most overloaded terms in our industry.

┌─────────────────────────────────────────────────────────────────┐
│                   The Serverless Spectrum                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  "Serverless" in practice means:                                │
│                                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │ FaaS     │  │ Managed  │  │ Managed  │  │ Managed  │       │
│  │          │  │ Queues   │  │Databases │  │  Auth    │       │
│  │ Lambda   │  │ SQS/SNS  │  │ DynamoDB │  │ Cognito  │       │
│  │ Cloud    │  │ EventBr. │  │ Aurora   │  │ Auth0    │       │
│  │ Functions│  │          │  │ Serverl. │  │          │       │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘       │
│                                                                  │
│  What it IS:                   What it ISN'T:                   │
│                                                                  │
│  ✓ No servers to manage        ✗ No servers exist               │
│  ✓ Pay per execution           ✗ Always cheaper                 │
│  ✓ Auto-scales to zero         ✗ Infinitely scalable            │
│  ✓ Event-driven                ✗ Good for everything            │
│  ✓ Managed infrastructure      ✗ Zero operational overhead      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

The key insight: serverless doesn't mean there are no servers. It means the servers are someone else's problem. You're trading control for convenience, and that trade-off has real implications for cost, performance, and flexibility.

The Serverless Mindset Shift

Serverless isn't just a deployment model — it's an architecture pattern. You don't just "put your app in Lambda." You restructure your application around events, stateless functions, and managed services. If you're trying to make a traditional web server work in Lambda, you're going to have a bad time.

The Sweet Spot: When Serverless Wins

After working with serverless in production across a dozen projects, I've identified the workloads where it genuinely shines. The common thread: event-driven, bursty, stateless, and short-lived.

1. Webhook Handlers

This is the serverless killer app. Webhooks are inherently event-driven, unpredictable in volume, and need to respond quickly. You might get zero webhooks for an hour, then 10,000 in a minute when someone does a bulk operation.

// webhook-handler/index.ts
import { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import { verifyWebhookSignature } from "./utils/crypto";
import { processStripeEvent } from "./processors/stripe";
import { processGithubEvent } from "./processors/github";
 
export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  const source = event.pathParameters?.source;
  const rawBody = event.body || "";
  const signature = event.headers["x-webhook-signature"] || "";
 
  // Verify authenticity against the RAW body. Re-serializing parsed JSON
  // can reorder keys or change whitespace and break the signature check
  if (!source || !verifyWebhookSignature(rawBody, signature, source)) {
    return { statusCode: 401, body: JSON.stringify({ error: "Invalid signature" }) };
  }
 
  const body = JSON.parse(rawBody || "{}");
 
  try {
    switch (source) {
      case "stripe":
        await processStripeEvent(body);
        break;
      case "github":
        await processGithubEvent(body);
        break;
      default:
        return { statusCode: 400, body: JSON.stringify({ error: `Unknown source: ${source}` }) };
    }
 
    return { statusCode: 200, body: JSON.stringify({ received: true }) };
  } catch (error) {
    console.error("Webhook processing failed:", error);
    // Return 200 anyway to prevent retries; enqueue the failure to a DLQ here, then acknowledge
    return { statusCode: 200, body: JSON.stringify({ received: true, queued: true }) };
  }
};

Always Return 200 for Webhooks

Return 200 even if processing fails, then route failures to a dead letter queue. Most webhook providers will retry on non-2xx responses, which can create thundering herd problems when your downstream service is already struggling. Accept the event, acknowledge receipt, and handle failures asynchronously.
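One way to wire that up is to build the dead-letter message in a pure helper and send it from the catch block. A sketch: the queue URL and field names are my own, and the actual send would be `sqs.send(new SendMessageCommand(input))` from `@aws-sdk/client-sqs`.

```typescript
// Hypothetical helper: package a failed webhook for a retry queue.
// Keep the RAW body; re-serializing parsed JSON can break signature checks
// when the retry worker verifies the event again.
export function buildDeadLetterInput(
  queueUrl: string,
  source: string,
  rawBody: string,
  error: unknown
) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({
      source,
      rawBody,
      error: error instanceof Error ? error.message : String(error),
      failedAt: new Date().toISOString(),
    }),
    // Attributes let the retry worker filter and route without parsing the body
    MessageAttributes: {
      source: { DataType: "String", StringValue: source },
    },
  };
}
```

The retry worker is then a second Lambda on that queue, retrying on your schedule instead of the provider's.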

2. File Processing Triggers

S3 event triggers are another area where serverless is genuinely the best option. When a file lands in a bucket, a Lambda fires. No polling, no idle servers waiting for files.

// image-processor/index.ts
import { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import sharp from "sharp";
 
const s3 = new S3Client({});
 
interface ThumbnailSize {
  suffix: string;
  width: number;
  height: number;
}
 
const SIZES: ThumbnailSize[] = [
  { suffix: "thumb", width: 150, height: 150 },
  { suffix: "medium", width: 600, height: 600 },
  { suffix: "large", width: 1200, height: 1200 },
];
 
export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
 
    // Skip if this is already a processed image
    if (key.includes("/processed/")) continue;
 
    console.log(`Processing: ${bucket}/${key}`);
 
    const { Body } = await s3.send(
      new GetObjectCommand({ Bucket: bucket, Key: key })
    );
 
    const imageBuffer = Buffer.from(await Body!.transformToByteArray());
 
    // Generate all thumbnail sizes in parallel
    await Promise.all(
      SIZES.map(async (size) => {
        const resized = await sharp(imageBuffer)
          .resize(size.width, size.height, { fit: "inside", withoutEnlargement: true })
          .webp({ quality: 85 })
          .toBuffer();
 
        const outputKey = key.replace(
          /^uploads\//,
          `processed/${size.suffix}/`
        ).replace(/\.[^.]+$/, ".webp");
 
        await s3.send(
          new PutObjectCommand({
            Bucket: bucket,
            Key: outputKey,
            Body: resized,
            ContentType: "image/webp",
          })
        );
      })
    );
 
    console.log(`Generated ${SIZES.length} thumbnails for ${key}`);
  }
};

3. Scheduled Tasks (Cron Jobs)

Running a server 24/7 just to execute a task every 6 hours? That's literally what EventBridge + Lambda was designed for.

// daily-report/index.ts
import { ScheduledEvent } from "aws-lambda";
import { getActiveUsers, getRevenueMetrics, getSystemHealth } from "./metrics";
import { sendSlackReport } from "./notifications";
 
export const handler = async (event: ScheduledEvent): Promise<void> => {
  console.log("Generating daily report:", event.time);
 
  const [users, revenue, health] = await Promise.all([
    getActiveUsers({ period: "24h" }),
    getRevenueMetrics({ period: "24h" }),
    getSystemHealth(),
  ]);
 
  const report = {
    date: event.time,
    activeUsers: users.count,
    newSignups: users.newSignups,
    mrr: revenue.mrr,
    churnRate: revenue.churn,
    errorRate: health.errorRate,
    p99Latency: health.p99,
    alerts: health.activeAlerts,
  };
 
  await sendSlackReport(report);
 
  console.log("Daily report sent successfully");
};

Cold Starts: The Reality in 2026

Cold starts have been the #1 complaint about serverless since Lambda launched. Let's look at where things actually stand in 2026 with real benchmarks.

┌─────────────────────────────────────────────────────────────────┐
│                  Cold Start Benchmarks (2026)                    │
│                  128-512 MB Memory | us-east-1                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Runtime              P50         P95         P99               │
│  ─────────────────────────────────────────────────────────────  │
│  Node.js 22           130ms       280ms       410ms             │
│  Python 3.12          140ms       310ms       450ms             │
│  Go (AL2023)           35ms        80ms       120ms             │
│  Rust (AL2023)         30ms        75ms       110ms             │
│  Java 21 (SnapStart)  250ms       520ms       800ms             │
│  .NET 8 (Native AOT)  210ms       380ms       550ms             │
│                                                                  │
│  With Provisioned      0ms*        0ms*        0ms*             │
│  Concurrency          (* plus ~$15/month per provisioned unit)  │
│                                                                  │
│  Container Image       800ms     2,100ms     3,500ms            │
│  (1 GB image)                                                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Some context for these numbers:

Node.js and Python have gotten remarkably fast. A 130ms cold start on a typical API Lambda is effectively invisible to end users. If your API response takes 200-500ms anyway, an extra 130ms on the first request after idle is fine. Most users won't notice.

Go and Rust are insanely fast. If cold starts are a concern and you can write Go or Rust, your cold start problem essentially doesn't exist. A 35ms cold start is faster than most network round trips.

Java with SnapStart was a game-changer. Before SnapStart, Java cold starts were 3-8 seconds — genuinely unusable for API workloads. SnapStart takes a snapshot of the initialized JVM and restores it, cutting cold starts by 80-90%.

Container images are still slow if they're large. If you're using Lambda with a 1 GB container image (which you might need for ML or heavy dependencies), cold starts are still painful. This is where that $47k story started.

Cold Start Traps

Memory allocation affects cold start duration. A Lambda with 128 MB gets less CPU and starts slower. Bumping to 512 MB or 1 GB can actually reduce cold starts (and total execution time) enough to be cheaper overall. Always benchmark — the cheapest memory setting is rarely the cheapest total cost.
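A back-of-envelope illustration of that trap, using the GB-second price quoted later in this post (the durations are invented for illustration, not benchmarked):

```typescript
// Lambda bills GB-seconds, so a faster run at higher memory can cost LESS.
// Durations below are hypothetical; always measure your own function.
const GB_SECOND_PRICE = 0.0000166667; // us-east-1, x86

function costPerInvocation(memoryMb: number, durationMs: number): number {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000);
  return gbSeconds * GB_SECOND_PRICE;
}

const at128 = costPerInvocation(128, 800); // CPU-starved, so it runs slowly
const at512 = costPerInvocation(512, 180); // 4x the memory, ~4.4x faster

console.log(at128 > at512); // → true: 512 MB is both faster and cheaper here
```

Tools like AWS Lambda Power Tuning automate exactly this sweep across memory settings.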

Minimizing Cold Starts in Practice

// GOOD: Lazy-load heavy dependencies
// Only import what you need, when you need it
import type { DynamoDBClient } from "@aws-sdk/client-dynamodb";
 
let dynamoClient: DynamoDBClient | undefined;
 
function getDynamoClient(): DynamoDBClient {
  if (!dynamoClient) {
    // CommonJS lazy require: only runs once per container, on the first
    // invocation after a cold start (in an ESM bundle, use `await import()`)
    const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
    dynamoClient = new DynamoDBClient({});
  }
  return dynamoClient;
}
 
// BAD: Importing everything at the top level
// import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
// import { S3Client } from "@aws-sdk/client-s3";
// import { SESClient } from "@aws-sdk/client-ses";
// import { SNSClient } from "@aws-sdk/client-sns";
// Even if this handler only uses DynamoDB, ALL of these get loaded
 
export const handler = async (event: any) => {
  const client = getDynamoClient();
  // ... handler logic
};

Serverless Patterns That Work

After years of building with serverless, these are the patterns I reach for repeatedly. They're battle-tested and they scale.

Pattern 1: API Gateway + Lambda + DynamoDB

The bread and butter. This stack handles a surprising number of use cases.

┌─────────────────────────────────────────────────────────────────┐
│           The Serverless API Stack                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Client                                                         │
│    │                                                            │
│    ▼                                                            │
│  ┌──────────────┐                                               │
│  │ API Gateway  │  ← Auth, throttling, request validation       │
│  │  (HTTP API)  │                                               │
│  └──────┬───────┘                                               │
│         │                                                       │
│    ┌────┴────┐                                                  │
│    ▼         ▼                                                  │
│  ┌──────┐ ┌──────┐                                              │
│  │GET / │ │POST /│  ← Individual Lambda per route               │
│  │users │ │users │    (or shared handler with routing)          │
│  └──┬───┘ └──┬───┘                                              │
│     │        │                                                  │
│     ▼        ▼                                                  │
│  ┌──────────────┐                                               │
│  │   DynamoDB   │  ← Single-digit ms latency                   │
│  └──────────────┘                                               │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
// api/users/get.ts
import { APIGatewayProxyHandlerV2 } from "aws-lambda";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
 
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = process.env.USERS_TABLE!;
 
export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  const userId = event.pathParameters?.id;
 
  if (!userId) {
    return { statusCode: 400, body: JSON.stringify({ error: "Missing user ID" }) };
  }
 
  const result = await ddb.send(
    new GetCommand({ TableName: TABLE, Key: { pk: `USER#${userId}`, sk: "PROFILE" } })
  );
 
  if (!result.Item) {
    return { statusCode: 404, body: JSON.stringify({ error: "User not found" }) };
  }
 
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(result.Item),
  };
};

Pattern 2: SQS + Lambda (Async Processing)

When you need to decouple the request from the processing. This is the "accept and process later" pattern.

// queue-processor/index.ts
import { SQSEvent, SQSBatchResponse } from "aws-lambda";
 
interface OrderEvent {
  orderId: string;
  userId: string;
  items: Array<{ productId: string; quantity: number }>;
  total: number;
}
 
export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const batchItemFailures: SQSBatchResponse["batchItemFailures"] = [];
 
  for (const record of event.Records) {
    try {
      const order: OrderEvent = JSON.parse(record.body);
 
      await processOrder(order);
 
      console.log(`Processed order ${order.orderId}`);
    } catch (error) {
      console.error(`Failed to process record ${record.messageId}:`, error);
      // Report individual item failure — SQS will retry only this message
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
 
  // Partial batch response — only failed items go back to the queue
  return { batchItemFailures };
};
 
async function processOrder(order: OrderEvent): Promise<void> {
  // Validate inventory, charge payment, send confirmation...
  // Each step could be its own Lambda in a Step Functions workflow
}

Partial Batch Responses Are Essential

Always use ReportBatchItemFailures with SQS + Lambda. Without it, if one message in a batch of 10 fails, ALL 10 get retried. With it, only the failed message retries. I've seen teams burn through their SQS budget because they didn't know this feature existed.
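For reference, here is roughly what enabling that looks like in CDK; the construct names, batch size, and timeouts are illustrative.

```typescript
// infra/queue-stack.ts (AWS CDK v2), a sketch with illustrative names
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as sqs from "aws-cdk-lib/aws-sqs";
import { SqsEventSource } from "aws-cdk-lib/aws-lambda-event-sources";

export class QueueStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);

    const ordersQueue = new sqs.Queue(this, "OrdersQueue", {
      // Common rule of thumb: visibility timeout >= 6x the function timeout
      visibilityTimeout: cdk.Duration.seconds(90),
    });

    const processor = new lambda.Function(this, "QueueProcessor", {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: "index.handler",
      code: lambda.Code.fromAsset("queue-processor"),
      timeout: cdk.Duration.seconds(15),
    });

    // The flag that enables partial batch responses. Without it, one bad
    // message sends the entire batch back to the queue.
    processor.addEventSource(
      new SqsEventSource(ordersQueue, {
        batchSize: 10,
        reportBatchItemFailures: true,
      })
    );
  }
}
```

The flag only helps if the handler actually returns `{ batchItemFailures }`, as the processor code above does.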

Pattern 3: Step Functions for Orchestration

When your workflow has multiple steps, branching logic, or error handling needs, Step Functions are the way to go. Don't try to orchestrate complex workflows inside a single Lambda.

┌─────────────────────────────────────────────────────────────────┐
│           Order Processing Step Function                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────┐                                            │
│  │ Validate Order  │                                            │
│  └────────┬────────┘                                            │
│           │                                                     │
│     ┌─────┴─────┐                                               │
│     │  Valid?   │                                               │
│     └─┬───────┬─┘                                               │
│    Yes│       │No                                               │
│       ▼       ▼                                                 │
│  ┌─────────┐ ┌─────────────┐                                    │
│  │ Reserve │ │ Notify User │                                    │
│  │Inventory│ │  (Invalid)  │                                    │
│  └────┬────┘ └─────────────┘                                    │
│       │                                                         │
│       ▼                                                         │
│  ┌──────────────┐                                               │
│  │Charge Payment│───── Fail ──▶ ┌──────────────┐               │
│  └──────┬───────┘               │Release Stock │               │
│         │ Success               │ + Notify     │               │
│         ▼                       └──────────────┘               │
│  ┌──────────────┐                                               │
│  │  Send        │                                               │
│  │  Confirmation│                                               │
│  └──────────────┘                                               │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
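In CDK, that diagram maps almost one-to-one onto a state machine definition. A sketch, assuming the Lambda functions already exist elsewhere in your app (the construct IDs mirror the diagram's boxes, and the validator is assumed to return `{ valid: boolean }`):

```typescript
// infra/order-workflow.ts (AWS CDK v2), a sketch; the Lambda functions
// passed in are assumed to be defined elsewhere.
import { Construct } from "constructs";
import * as sfn from "aws-cdk-lib/aws-stepfunctions";
import * as tasks from "aws-cdk-lib/aws-stepfunctions-tasks";
import * as lambda from "aws-cdk-lib/aws-lambda";

export function buildOrderWorkflow(
  scope: Construct,
  fns: Record<"validate" | "reserve" | "pay" | "release" | "confirm" | "notifyInvalid", lambda.IFunction>
): sfn.StateMachine {
  const validate = new tasks.LambdaInvoke(scope, "Validate Order", { lambdaFunction: fns.validate });
  const reserve = new tasks.LambdaInvoke(scope, "Reserve Inventory", { lambdaFunction: fns.reserve });
  const pay = new tasks.LambdaInvoke(scope, "Process Payment", { lambdaFunction: fns.pay });
  const release = new tasks.LambdaInvoke(scope, "Release Stock + Notify", { lambdaFunction: fns.release });
  const confirm = new tasks.LambdaInvoke(scope, "Send Confirmation", { lambdaFunction: fns.confirm });
  const notifyInvalid = new tasks.LambdaInvoke(scope, "Notify User (Invalid)", { lambdaFunction: fns.notifyInvalid });

  // Payment failure compensates by releasing stock, exactly as in the diagram
  pay.addCatch(release, { resultPath: "$.error" });

  const definition = validate.next(
    new sfn.Choice(scope, "Valid?")
      .when(
        sfn.Condition.booleanEquals("$.Payload.valid", true),
        reserve.next(pay).next(confirm)
      )
      .otherwise(notifyInvalid)
  );

  return new sfn.StateMachine(scope, "OrderWorkflow", {
    definitionBody: sfn.DefinitionBody.fromChainable(definition),
  });
}
```

You also get per-state retries with backoff (`addRetry`), which is much harder to get right inside a single Lambda.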

Serverless Anti-Patterns: When It Really Doesn't Work

Now for the part that might save you from a $47k bill. These are workloads that look like they could be serverless but absolutely should not be.

Anti-Pattern 1: Long-Running Processes

Lambda has a 15-minute timeout. If your process might take longer than that, Lambda is the wrong choice. Even if it usually finishes in 5 minutes, hitting the timeout on 2% of invocations means 2% of your operations get killed mid-flight, with no graceful shutdown and no cleanup.

┌─────────────────────────────────────────────────────────────────┐
│              Lambda Timeout Reality                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Execution Time Distribution:                                    │
│                                                                  │
│  ████████████████████████████████                    Typical     │
│  █████████████████████████████████████████           P95         │
│  ███████████████████████████████████████████████     P99         │
│  ████████████████████████████████████████████████████████████ XX │
│  0 min    3 min     6 min     9 min    12 min   15 min TIMEOUT  │
│                                                                  │
│  That tail? That's where your data gets corrupted because       │
│  the Lambda died mid-operation with no graceful shutdown.       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Better alternative: ECS Fargate tasks or EKS jobs. No timeout limits, graceful shutdown, and you can run for hours if needed.
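If the work is still triggered by an event, a thin Lambda can accept the event and hand the heavy lifting to a Fargate task. A sketch of the `RunTask` input as a pure builder — every resource name below is a hypothetical placeholder, and the actual call would be `new ECSClient({}).send(new RunTaskCommand(input))` from `@aws-sdk/client-ecs`:

```typescript
// Build the input for ECS RunTask: the task can run for hours, with graceful
// shutdown via SIGTERM. All names below are hypothetical placeholders.
export function buildRunTaskInput(jobId: string, inputS3Key: string) {
  return {
    cluster: "jobs-cluster",
    taskDefinition: "long-report-job",
    launchType: "FARGATE" as const,
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets: ["subnet-aaa111"], // your private subnets
        assignPublicIp: "DISABLED" as const,
      },
    },
    overrides: {
      containerOverrides: [
        {
          name: "worker",
          // Pass job parameters as env vars; the task pulls its own data
          environment: [
            { name: "JOB_ID", value: jobId },
            { name: "INPUT_KEY", value: inputS3Key },
          ],
        },
      ],
    },
  };
}
```

The Lambda stays far under its timeout because it only starts the task; progress and completion are reported out-of-band (for example via EventBridge task state-change events).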

Anti-Pattern 2: ML Inference

This was the $47k story. ML models are typically:

  • Large (hundreds of MBs to several GBs) — slow cold starts
  • CPU/GPU intensive — Lambda gives you limited compute
  • Latency-sensitive — cold starts kill user experience
  • Often steady in traffic — you're paying per-invocation prices for predictable, constant load

Better alternative: SageMaker endpoints, ECS with GPU instances, or even a simple EC2 instance with an auto-scaling group.

Anti-Pattern 3: Stateful Workloads

Lambda functions are stateless by design. If your application needs to maintain state between requests — WebSocket connections, in-memory caches, session data — Lambda is fighting you.

// DON'T DO THIS — state is lost between invocations
// (DatabasePool and CachedItem are stand-ins for your own types)
let connectionPool: DatabasePool;
let cache: Map<string, CachedItem> = new Map();
let requestCount = 0;
 
export const handler = async (event: any) => {
  requestCount++; // This resets on cold start
  // cache may or may not exist depending on whether
  // this is a warm or cold invocation
  // connectionPool might be stale, might not exist
 
  // You're building on quicksand
};

The Warm Container Trap

Yes, Lambda reuses containers and your global state persists between warm invocations. No, you should not rely on this. AWS provides zero guarantees about container reuse. If your application breaks when the cache is empty or the connection pool is gone, your application has a bug — it just manifests randomly, which is worse.
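The defensible version of warm-state reuse treats globals as a best-effort cache that may be empty on any invocation. A sketch, where `loadConfig` is a stand-in for whatever slow initialization you have (fetching from SSM, opening a pool):

```typescript
// Globals survive SOME invocations; treat them as an optional cache only.
let configCache: Record<string, string> | undefined;
let loads = 0; // observe how often the expensive init actually runs

async function loadConfig(): Promise<Record<string, string>> {
  loads++;
  // stand-in for SSM / Secrets Manager / a connection pool: anything slow
  return { tableName: "users", region: "us-east-1" };
}

// Correct pattern: lazily (re)initialize; never assume the cache is populated
async function getConfig(): Promise<Record<string, string>> {
  if (!configCache) {
    configCache = await loadConfig();
  }
  return configCache;
}

export const handler = async (): Promise<string> => {
  const config = await getConfig(); // works on cold AND warm invocations
  return config.tableName;
};
```

Warm invocations skip the load as an optimization, but correctness never depends on it: a cold start simply pays the init cost again.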

Anti-Pattern 4: High-Throughput, Steady-State APIs

If your API handles 1,000+ requests per second consistently, 24/7, serverless is almost certainly more expensive than containers. Lambda's per-invocation pricing doesn't make sense when you have predictable, constant load.

Cost Analysis: Honest Math

This is the section that most serverless articles skip, and it's the most important one. Let's do real math with real numbers.

Scenario: REST API Handling User Requests

Assumptions: Average 200ms execution time, 256 MB memory, Node.js runtime.

┌─────────────────────────────────────────────────────────────────┐
│           Monthly Cost Comparison (us-east-1, 2026)              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Daily Requests    Lambda        Fargate (0.5 vCPU)   EC2       │
│  ──────────────────────────────────────────────────────────────  │
│  10,000/day        $0.31         $27.00               $15.00    │
│  100,000/day       $3.10         $27.00               $15.00    │
│  500,000/day       $15.50        $27.00               $15.00    │
│  1,000,000/day     $31.00        $54.00*              $30.00*   │
│  5,000,000/day     $155.00       $108.00*             $60.00*   │
│                                                                  │
│  * Scaled to handle load (2-4 instances)                        │
│                                                                  │
│  ┌──────────────────────────────────────────────┐               │
│  │ The crossover point: ~500,000 requests/day   │               │
│  │ Below this, Lambda wins. Above this,         │               │
│  │ containers start winning.                    │               │
│  └──────────────────────────────────────────────┘               │
│                                                                  │
│  Lambda pricing:                                                 │
│  $0.20 per 1M requests + $0.0000166667 per GB-second            │
│  200ms × 256MB = 0.05 GB-s per request                          │
│  Cost per request: $0.0000010334                                │
│                                                                  │
│  Fargate pricing:                                                │
│  0.5 vCPU + 1 GB = ~$0.037/hour = ~$27/month                   │
│  Can handle ~200 requests per second                             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

But here's the thing these numbers don't capture: Lambda costs nothing when idle. If your traffic is bursty — heavy during business hours, dead at night and on weekends — Lambda's effective cost might be 60-70% of the steady-state calculation.

// Cost calculator — run this with your own numbers
interface CostParams {
  dailyRequests: number;
  avgExecutionMs: number;
  memoryMb: number;
  burstFactor: number; // 1.0 = steady, 3.0 = very bursty
}
 
function calculateMonthlyCost(params: CostParams): {
  lambda: number;
  fargate: number;
  recommendation: string;
} {
  const { dailyRequests, avgExecutionMs, memoryMb, burstFactor } = params;
 
  // Lambda costs
  const monthlyRequests = dailyRequests * 30;
  const gbSeconds = (avgExecutionMs / 1000) * (memoryMb / 1024) * monthlyRequests;
  const lambdaCost =
    (monthlyRequests / 1_000_000) * 0.20 + // Request cost
    gbSeconds * 0.0000166667; // Compute cost
 
  // Fargate costs (0.5 vCPU, 1 GB, handles ~200 req/s)
  const peakRps = (dailyRequests * burstFactor) / 86400;
  const instances = Math.max(1, Math.ceil(peakRps / 200));
  const fargateCost = instances * 0.037 * 24 * 30; // hourly rate × hours
 
  const recommendation =
    lambdaCost < fargateCost * 0.8
      ? "Lambda (significantly cheaper)"
      : lambdaCost < fargateCost
        ? "Lambda (slightly cheaper, but consider Fargate for simplicity)"
        : "Fargate (cheaper at this scale)";
 
  return { lambda: Math.round(lambdaCost * 100) / 100, fargate: Math.round(fargateCost * 100) / 100, recommendation };
}
 
// Example: your typical SaaS API
console.log(calculateMonthlyCost({
  dailyRequests: 100_000,
  avgExecutionMs: 200,
  memoryMb: 256,
  burstFactor: 2.5,
}));
// { lambda: 3.1, fargate: 26.64, recommendation: "Lambda (significantly cheaper)" }

Don't Forget the Hidden Costs

Lambda pricing doesn't include API Gateway ($1/million requests for HTTP API, $3.50/million for REST API), CloudWatch Logs (often $5-20/month for active Lambdas), X-Ray tracing, or the time your team spends debugging distributed systems. Containers have simpler observability and debugging stories. Factor everything in.
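To put numbers on it, here is the 100,000 requests/day scenario again with the add-ons included. The log volume is an assumption, and CloudWatch ingest is priced at $0.50/GB:

```typescript
// All-in monthly estimate for the 100k requests/day Lambda API.
// The CloudWatch log volume (3 GB/month) is assumed, not measured.
const monthlyRequests = 100_000 * 30;

const lambdaCompute =
  (monthlyRequests / 1_000_000) * 0.20 +          // request charge
  monthlyRequests * 0.05 * 0.0000166667;          // 0.05 GB-s per request

const httpApiGateway = (monthlyRequests / 1_000_000) * 1.00; // HTTP API, $1/M
const cloudwatchLogs = 3 * 0.50;                  // 3 GB ingested at $0.50/GB

const allIn = lambdaCompute + httpApiGateway + cloudwatchLogs;
console.log(allIn.toFixed(2)); // → "7.60", more than double compute alone
```

The Lambda line item is $3.10; everything around it adds another $4.50 per month before you even turn on tracing.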

The Hybrid Approach: Serverless Edges, Container Core

After building and operating serverless systems for years, the pattern I keep coming back to is the hybrid approach. It's not as clean or ideological as "full serverless" or "all containers," but it works.

┌─────────────────────────────────────────────────────────────────┐
│               The Hybrid Architecture                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  SERVERLESS EDGE                    CONTAINER CORE              │
│  (Event-driven, bursty)            (Steady-state, complex)     │
│                                                                  │
│  ┌──────────────────┐              ┌──────────────────┐        │
│  │ API Gateway      │              │ ECS/EKS Cluster  │        │
│  │ + Lambda         │──requests──▶│                  │        │
│  │ (lightweight     │              │ Core business    │        │
│  │  auth, routing,  │              │ logic, complex   │        │
│  │  validation)     │              │ queries, data    │        │
│  └──────────────────┘              │ processing       │        │
│                                     │                  │        │
│  ┌──────────────────┐              │                  │        │
│  │ S3 Triggers      │──events───▶│                  │        │
│  │ (file upload     │              │                  │        │
│  │  processing)     │              └──────────────────┘        │
│  └──────────────────┘                      │                   │
│                                            │                   │
│  ┌──────────────────┐              ┌───────▼──────────┐        │
│  │ EventBridge      │              │ RDS / ElastiCache│        │
│  │ + Lambda         │              │ (Persistent      │        │
│  │ (cron jobs,      │              │  state)          │        │
│  │  notifications)  │              └──────────────────┘        │
│  └──────────────────┘                                          │
│                                                                  │
│  ┌──────────────────┐                                          │
│  │ SQS + Lambda     │                                          │
│  │ (async tasks,    │                                          │
│  │  email sending,  │                                          │
│  │  webhooks)       │                                          │
│  └──────────────────┘                                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

The philosophy is simple:

  • Serverless for the edges: API routing, webhooks, file processing, cron jobs, async tasks. These are bursty, event-driven, and stateless — exactly what Lambda excels at.
  • Containers for the core: Business logic, data processing, anything that maintains state or runs longer than a few seconds. Predictable cost, full control, easy debugging.

This isn't a compromise — it's using each technology where it's strongest.

Implementing the Hybrid: API Gateway as a Router

// infra/api-stack.ts (AWS CDK)
import * as cdk from "aws-cdk-lib";
import * as apigateway from "aws-cdk-lib/aws-apigatewayv2";
import { HttpAlbIntegration, HttpLambdaIntegration } from "aws-cdk-lib/aws-apigatewayv2-integrations";
import * as lambda from "aws-cdk-lib/aws-lambda";
 
export class ApiStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);
 
    const httpApi = new apigateway.HttpApi(this, "HttpApi", {
      corsPreflight: {
        allowOrigins: ["https://myapp.com"],
        allowMethods: [apigateway.CorsHttpMethod.ANY],
      },
    });
 
    // Lightweight routes → Lambda
    // Auth check, input validation, then forward to core service
    const authHandler = new lambda.Function(this, "AuthHandler", {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: "index.handler",
      code: lambda.Code.fromAsset("functions/auth"),
      memorySize: 256,
      timeout: cdk.Duration.seconds(10),
    });
 
    // Heavy routes → ALB → ECS
    // Complex business logic stays in containers.
    // Note: HttpAlbIntegration takes the ALB *listener*, not the ALB itself.
    const albIntegration = new HttpAlbIntegration(
      "CoreServiceIntegration",
      this.coreServiceListener // Reference to your ECS service's ALB listener
    );
 
    httpApi.addRoutes({
      path: "/api/auth/{proxy+}",
      methods: [apigateway.HttpMethod.ANY],
      integration: new HttpLambdaIntegration("AuthIntegration", authHandler),
    });
 
    httpApi.addRoutes({
      path: "/api/v1/{proxy+}",
      methods: [apigateway.HttpMethod.ANY],
      integration: albIntegration,
    });
  }
}
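For completeness, here's a minimal sketch of what the lightweight auth handler behind that Lambda route might look like. The header handling and stub logic are illustrative assumptions, not a production implementation — a real handler would verify the JWT against your issuer:

```typescript
// functions/auth/index.ts — illustrative sketch of the lightweight
// auth Lambda. Event/response shapes are simplified stand-ins for the
// API Gateway v2 payload.

interface HttpEvent {
  headers: Record<string, string | undefined>;
  rawPath: string;
}

interface HttpResponse {
  statusCode: number;
  body: string;
}

export async function handler(event: HttpEvent): Promise<HttpResponse> {
  // API Gateway may deliver headers lowercased; check both casings.
  const auth = event.headers["authorization"] ?? event.headers["Authorization"];
  if (!auth || !auth.startsWith("Bearer ")) {
    return { statusCode: 401, body: JSON.stringify({ error: "Missing bearer token" }) };
  }
  // Real handler: verify the token signature/claims here, then respond
  // (or forward). Left as a stub for illustration.
  return { statusCode: 200, body: JSON.stringify({ ok: true, path: event.rawPath }) };
}
```

The point of keeping this function tiny: it's 256 MB, sub-second, and bursty — exactly the Lambda profile — while everything heavier lives behind the ALB route.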

Vendor Lock-In and Exit Strategies

Let's talk about the elephant in the room. Serverless, more than almost any other architectural choice, ties you to a specific cloud provider. Your Lambda functions use the AWS SDK, your data access patterns are shaped around DynamoDB, your Step Functions workflows are AWS-specific.

Is this a problem? It depends on how likely you are to switch clouds. For most companies, the answer is "approximately never." But it's still smart to minimize unnecessary coupling.

The Hexagonal Architecture Approach

Keep your business logic cloud-agnostic by wrapping cloud services behind interfaces:

// ports/storage.ts — the interface (cloud-agnostic)
export interface StoragePort {
  get(key: string): Promise<Buffer | null>;
  put(key: string, data: Buffer, contentType: string): Promise<void>;
  delete(key: string): Promise<void>;
  listByPrefix(prefix: string): Promise<string[]>;
}
 
// adapters/s3-storage.ts — AWS implementation
import { StoragePort } from "../ports/storage";
import { S3Client, GetObjectCommand, PutObjectCommand, DeleteObjectCommand, ListObjectsV2Command } from "@aws-sdk/client-s3";
 
export class S3Storage implements StoragePort {
  private client: S3Client;
  private bucket: string;
 
  constructor(bucket: string) {
    this.client = new S3Client({});
    this.bucket = bucket;
  }
 
  async get(key: string): Promise<Buffer | null> {
    try {
      const response = await this.client.send(
        new GetObjectCommand({ Bucket: this.bucket, Key: key })
      );
      return Buffer.from(await response.Body!.transformToByteArray());
    } catch (err: any) {
      if (err.name === "NoSuchKey") return null;
      throw err;
    }
  }
 
  async put(key: string, data: Buffer, contentType: string): Promise<void> {
    await this.client.send(
      new PutObjectCommand({ Bucket: this.bucket, Key: key, Body: data, ContentType: contentType })
    );
  }
 
  async delete(key: string): Promise<void> {
    await this.client.send(
      new DeleteObjectCommand({ Bucket: this.bucket, Key: key })
    );
  }
 
  async listByPrefix(prefix: string): Promise<string[]> {
    // Note: a single ListObjectsV2 call returns at most 1,000 keys;
    // paginate with ContinuationToken for larger prefixes.
    const response = await this.client.send(
      new ListObjectsV2Command({ Bucket: this.bucket, Prefix: prefix })
    );
    return (response.Contents || []).map((obj) => obj.Key!);
  }
}
 
// adapters/gcs-storage.ts — GCP implementation (if you ever need it)
// Same interface, different cloud SDK

// handlers/process-upload.ts — uses the port, not the adapter
import { StoragePort } from "../ports/storage";
 
export function createUploadProcessor(storage: StoragePort) {
  return async (fileKey: string): Promise<void> => {
    const file = await storage.get(fileKey);
    if (!file) throw new Error(`File not found: ${fileKey}`);
 
    // Process file... (pure business logic, no AWS imports)
    const processed = await transformFile(file); // your domain logic, defined elsewhere
 
    await storage.put(`processed/${fileKey}`, processed, "application/octet-stream");
  };
}
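One immediate payoff of the port pattern: tests and local development can run against an in-memory adapter with zero AWS dependencies. A minimal sketch (the interface is repeated here so the snippet stands alone; `InMemoryStorage` is my illustrative name, not from the codebase above):

```typescript
// adapters/in-memory-storage.ts — a test double for StoragePort.
// Same contract as S3Storage, backed by a Map instead of a bucket.

interface StoragePort {
  get(key: string): Promise<Buffer | null>;
  put(key: string, data: Buffer, contentType: string): Promise<void>;
  delete(key: string): Promise<void>;
  listByPrefix(prefix: string): Promise<string[]>;
}

class InMemoryStorage implements StoragePort {
  private objects = new Map<string, Buffer>();

  async get(key: string): Promise<Buffer | null> {
    return this.objects.get(key) ?? null;
  }

  async put(key: string, data: Buffer, _contentType: string): Promise<void> {
    this.objects.set(key, data);
  }

  async delete(key: string): Promise<void> {
    this.objects.delete(key);
  }

  async listByPrefix(prefix: string): Promise<string[]> {
    return [...this.objects.keys()].filter((k) => k.startsWith(prefix));
  }
}
```

Inject it into `createUploadProcessor` in unit tests and your business logic runs in milliseconds with no credentials, no network, and no mocking library.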

Pragmatic Lock-In Management

Don't abstract everything on day one — that's premature generalization. Wrap the services you're most likely to swap (storage, queues, databases). Accept direct coupling for services that are deeply integrated (Step Functions, EventBridge). The goal isn't zero lock-in — it's manageable switching cost.

Infrastructure as Code Is Your Exit Plan

Even if your code has cloud-specific imports, having your entire infrastructure defined in CDK or Terraform means you have a complete blueprint of your system. If you ever need to move, you know exactly what needs to be rebuilt.

┌─────────────────────────────────────────────────────────────────┐
│              Lock-In Risk Assessment                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Service              Lock-In Level    Migration Effort          │
│  ─────────────────────────────────────────────────────────────  │
│  Lambda (compute)     Low              Wrap in container         │
│  API Gateway          Low              Any reverse proxy         │
│  DynamoDB             HIGH             Redesign data model       │
│  Step Functions       HIGH             Rewrite orchestration     │
│  SQS/SNS              Medium           RabbitMQ/Kafka swap       │
│  S3                   Low              Any object storage        │
│  CloudWatch           Medium           Datadog/Grafana swap      │
│  Cognito              HIGH             Auth0/custom auth         │
│                                                                  │
│  Rule of thumb: data layer lock-in hurts the most.              │
│  Compute layer lock-in is usually manageable.                   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

My Serverless Decision Framework

After everything we've covered, here's the decision framework I actually use when a new workload comes in:

┌─────────────────────────────────────────────────────────────────┐
│              Should This Be Serverless?                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. How long does it run?                                       │
│     > 15 min ──────────────────▶ Containers/VMs                 │
│     < 15 min ──────────────────▶ Continue                       │
│                                                                  │
│  2. What's the traffic pattern?                                 │
│     Bursty/unpredictable ──────▶ Strong serverless signal       │
│     Steady 24/7 ───────────────▶ Probably containers            │
│                                                                  │
│  3. Is it stateless?                                            │
│     Yes ───────────────────────▶ Continue                       │
│     No ────────────────────────▶ Containers                     │
│                                                                  │
│  4. Package size?                                               │
│     < 250 MB ──────────────────▶ Lambda (zip)                   │
│     250 MB - 1 GB ─────────────▶ Lambda (container image)      │
│     > 1 GB ────────────────────▶ Containers                     │
│                                                                  │
│  5. Cost at expected scale?                                     │
│     < crossover point ─────────▶ Serverless                    │
│     > crossover point ─────────▶ Containers                    │
│                                                                  │
│  6. Latency requirements?                                       │
│     > 200ms acceptable ────────▶ Serverless is fine             │
│     < 100ms required ──────────▶ Provisioned or containers     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
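The flowchart condenses into a small function. The thresholds mirror the diagram's rules of thumb — the `Workload` shape and field names are my own framing, and real decisions involve judgment no function captures:

```typescript
// Encodes the decision flowchart above. Thresholds are rules of
// thumb from the diagram, not hard limits (except Lambda's 15 min).

interface Workload {
  maxRuntimeMinutes: number;      // worst-case single execution
  bursty: boolean;                // vs. steady 24/7 traffic
  stateless: boolean;
  packageSizeMb: number;          // code + dependencies
  monthlyCostServerless: number;  // from your own cost model
  monthlyCostContainers: number;
  latencyBudgetMs: number;        // acceptable p99, cold starts included
}

function recommend(w: Workload): "serverless" | "containers" {
  if (w.maxRuntimeMinutes > 15) return "containers"; // Lambda hard limit
  if (!w.stateless) return "containers";
  if (w.packageSizeMb > 1024) return "containers";   // past container-image comfort zone
  if (!w.bursty) return "containers";                // steady load favors reserved capacity
  if (w.monthlyCostServerless > w.monthlyCostContainers) return "containers";
  if (w.latencyBudgetMs < 100) return "containers";  // or pay for provisioned concurrency
  return "serverless";
}
```

A webhook handler (2 min max, bursty, stateless, 50 MB, cheap at expected volume, 500 ms budget) comes out `"serverless"`; the PyTorch classifier from the intro fails the package-size and cost checks immediately.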

Wrapping Up

Serverless isn't a religion. It's a tool. A very good tool, for the right problems.

The pattern I keep returning to after years of production experience:

  1. Start with the workload characteristics, not the technology. Bursty? Stateless? Short-lived? Event-driven? Serverless is probably your best bet.
  2. Do the math. Lambda's free tier is generous and per-invocation pricing is incredible for low-traffic workloads. But the curve crosses containers faster than most people think.
  3. Go hybrid. Use serverless at the edges where its strengths shine — API routing, webhooks, cron jobs, file processing. Use containers for the core business logic where you need control, state, and predictable costs.
  4. Design for portability where it's cheap to do so. Hexagonal architecture, infrastructure as code, and clean interfaces between your business logic and cloud services.
  5. Monitor costs obsessively. Set up billing alerts. Review Lambda costs weekly during the first month of any new deployment. That $47k bill wasn't a one-time event — it was a month of nobody looking at the numbers.

The best architecture isn't the most serverless one or the most container-native one. It's the one where each component uses the technology that best fits its workload characteristics, cost profile, and operational requirements. That's not a very exciting conclusion, but it's an honest one — and in this industry, honest engineering beats hype every time.

Osvaldo Restrepo

Senior Full Stack AI & Software Engineer. Building production AI systems that solve real problems.