
AWS Cloud Architecture: From EC2 to Serverless - A Solutions Architect's Guide

Tags: AWS, Cloud Architecture, EC2, Lambda, S3, RDS, Solutions Architect

Comprehensive guide to AWS architecture patterns covering compute, storage, databases, and serverless. Learn how to design scalable, cost-effective, and secure cloud infrastructure.

As an AWS Solutions Architect, I've designed dozens of production systems on AWS. This guide distills the lessons learned into practical patterns you can apply immediately. We'll cover compute, storage, databases, networking, and security with real-world examples.

Compute: Choosing the Right Service

AWS offers multiple compute options. Here's when to use each:

# Decision Tree for Compute Selection

# EC2 (Virtual Machines)
When to use:
  - Need full control over OS and runtime
  - Long-running processes (24/7 applications)
  - Legacy applications that can't be containerized
  - Specific instance types needed (GPU, high memory)

Example: Web application with persistent connections
Instance: t3.medium (2 vCPU, 4GB RAM)
Cost: ~$30/month (with Reserved Instance)

# AWS Lambda (Serverless Functions)
When to use:
  - Event-driven, short-lived tasks (<15 minutes)
  - Unpredictable or spiky traffic patterns
  - Want zero server management
  - Infrequent execution (pay per invocation)

Example: Image processing on S3 upload
Cost: $0.20 per 1 million requests + compute time
Free tier: 1M requests/month

# ECS/EKS (Container Orchestration)
When to use:
  - Microservices architecture
  - Need portability across clouds
  - Team already uses Docker/Kubernetes
  - Want easier scaling than EC2

Example: API microservices cluster
ECS Fargate: Pay per task, no EC2 management
Cost: ~$50/month for 2 tasks (0.5 vCPU, 1GB each)

# Elastic Beanstalk (Platform as a Service)
When to use:
  - Quick deployments without infrastructure code
  - Standard web applications (Node, Python, Java, etc.)
  - Small teams without DevOps expertise

Example: Django/Flask web app
Handles EC2, load balancer, auto-scaling automatically
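
The request-plus-compute pricing above makes a Lambda-vs-EC2 break-even estimate easy to sketch. The Python below uses the listed us-east-1 figures ($0.20 per million requests, the standard per-GB-second compute rate, and the ~$30/month t3.medium baseline from above); it ignores the free tier, so treat it as a rough guide, not a quote.

```python
# Rough Lambda monthly cost estimate (us-east-1 list prices,
# free tier ignored). The $30/month baseline is the t3.medium
# Reserved Instance figure quoted above.
LAMBDA_PER_REQUEST = 0.20 / 1_000_000   # $ per request
LAMBDA_PER_GB_SECOND = 0.0000166667     # $ per GB-second
EC2_BASELINE = 30.0                     # t3.medium w/ RI, $/month

def lambda_monthly_cost(requests, avg_duration_ms, memory_mb):
    """Estimated monthly Lambda bill for a given workload."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return requests * LAMBDA_PER_REQUEST + gb_seconds * LAMBDA_PER_GB_SECOND

# 1M requests/month at 100ms / 128MB: far cheaper than the EC2 baseline
print(round(lambda_monthly_cost(1_000_000, 100, 128), 2))   # ≈ 0.41

# 50M requests/month at 200ms / 512MB: grows well past the $30 baseline
print(lambda_monthly_cost(50_000_000, 200, 512) > EC2_BASELINE)
```

The crossover point depends heavily on memory size and duration, which is why "unpredictable or spiky traffic" is the Lambda sweet spot: you pay nothing between invocations.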

EC2 Best Practices

# 1. Use Auto Scaling for resilience and cost optimization
# Auto Scaling Group configuration example

# Launch Template for consistent EC2 configuration
aws ec2 create-launch-template \
  --launch-template-name app-server-template \
  --version-description "v1" \
  --launch-template-data '{
    "ImageId": "ami-0c55b159cbfafe1f0",
    "InstanceType": "t3.medium",
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "UserData": "IyEvYmluL2Jhc2gKY3VybCAtc1MgaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL215YXBwL2luc3RhbGwuc2ggfCBiYXNo",
    "IamInstanceProfile": {
      "Arn": "arn:aws:iam::123456789012:instance-profile/app-server-role"
    },
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [
        {"Key": "Name", "Value": "AppServer"},
        {"Key": "Environment", "Value": "production"}
      ]
    }]
  }'

# Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name app-server-asg \
  --launch-template "LaunchTemplateName=app-server-template,Version=1" \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 3 \
  --target-group-arns "arn:aws:elasticloadbalancing:region:account:targetgroup/app-tg" \
  --health-check-type ELB \
  --health-check-grace-period 300 \
  --vpc-zone-identifier "subnet-abc123,subnet-def456,subnet-ghi789"

# 2. Use Reserved Instances for cost savings (up to 72% off)
aws ec2 purchase-reserved-instances-offering \
  --reserved-instances-offering-id offering-id \
  --instance-count 3

# 3. Enable detailed monitoring for better insights
aws ec2 monitor-instances --instance-ids i-1234567890abcdef0

# 4. Use IMDSv2 for better security
aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1

Lambda Patterns and Pitfalls

// Lambda Best Practices with Node.js
// (aws-sdk v2 shown; Node 18+ runtimes bundle only the modular @aws-sdk v3 clients)

// 1. Initialize outside handler for connection reuse
const AWS = require('aws-sdk');
const s3 = new AWS.S3(); // ✅ Reused across invocations
const dbConnection = createDBConnection(); // ✅ Connection pooling (illustrative helper)

exports.handler = async (event) => {
  // ❌ Don't initialize here - creates a new client on every invocation
  // const s3 = new AWS.S3();

  try {
    // 2. Use async/await for cleaner error handling
    const data = await s3.getObject({
      Bucket: process.env.BUCKET_NAME,
      Key: event.key
    }).promise();

    // 3. Process data
    const result = processData(data.Body);

    // 4. Return proper response
    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*' // CORS
      },
      body: JSON.stringify(result)
    };
  } catch (error) {
    // 5. Log errors for CloudWatch
    console.error('Error:', error);

    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message })
    };
  }
};

// 6. Use environment variables for configuration
const CONFIG = {
  bucket: process.env.BUCKET_NAME,
  table: process.env.DYNAMODB_TABLE,
  apiKey: process.env.API_KEY // Use AWS Secrets Manager for sensitive data
};

// 7. Implement retry logic for external APIs
async function fetchWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch(url);
      return await response.json();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1))); // linear backoff
    }
  }
}

// 8. Cold start optimization
// Keep functions warm with scheduled events (or use provisioned concurrency)
// EventBridge (CloudWatch Events) rule: rate(5 minutes)
exports.warmUp = async (event) => {
  if (event.source === 'aws.events') {
    console.log('Warming up...');
    return { statusCode: 200 };
  }
  return exports.handler(event);
};

⚠️ Lambda Caveats: 15-minute max execution time, 10GB memory limit, /tmp storage is ephemeral (512MB by default, configurable up to 10GB), and cold starts can add anywhere from a few hundred milliseconds to a few seconds of latency on the first request.

S3 Storage Patterns

import boto3

s3 = boto3.client('s3')

# 1. Lifecycle policies for cost optimization
def create_lifecycle_policy(bucket_name):
    """
    Automatically transition objects to cheaper storage classes
    """
    lifecycle_config = {
        'Rules': [
            {
                'Id': 'Archive old logs',
                'Status': 'Enabled',
                'Filter': {'Prefix': 'logs/'},
                'Transitions': [
                    {
                        'Days': 30,
                        'StorageClass': 'STANDARD_IA'  # Infrequent Access
                    },
                    {
                        'Days': 90,
                        'StorageClass': 'GLACIER'  # Long-term archive
                    }
                ],
                'Expiration': {
                    'Days': 365  # Delete after 1 year
                }
            },
            {
                'Id': 'Delete incomplete multipart uploads',
                'Status': 'Enabled',
                'Filter': {'Prefix': ''},  # required; empty prefix = whole bucket
                'AbortIncompleteMultipartUpload': {
                    'DaysAfterInitiation': 7
                }
            }
        ]
    }

    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=lifecycle_config
    )

# 2. Pre-signed URLs for secure temporary access
def generate_upload_url(bucket, key, expiration=3600):
    """
    Allow users to upload directly to S3 without AWS credentials
    """
    url = s3.generate_presigned_url(
        'put_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expiration,
        HttpMethod='PUT'
    )
    return url

def generate_download_url(bucket, key, expiration=300):
    """
    Share private files temporarily
    """
    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expiration
    )
    return url

# 3. Enable versioning for data protection
def enable_versioning(bucket_name):
    s3.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={'Status': 'Enabled'}
    )

# 4. Server-side encryption
def upload_with_encryption(bucket, key, data):
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=data,
        ServerSideEncryption='AES256',  # or 'aws:kms' for KMS encryption
        # For KMS:
        # SSEKMSKeyId='arn:aws:kms:region:account:key/key-id'
    )

# 5. Event notifications for processing
def setup_lambda_trigger(bucket_name, lambda_arn):
    """
    Trigger Lambda when files are uploaded.
    The function must first grant s3.amazonaws.com invoke permission
    (lambda add-permission), or this call fails.
    """
    s3.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [
                {
                    'LambdaFunctionArn': lambda_arn,
                    'Events': ['s3:ObjectCreated:*'],
                    'Filter': {
                        'Key': {
                            'FilterRules': [
                                {'Name': 'prefix', 'Value': 'uploads/'},
                                {'Name': 'suffix', 'Value': '.jpg'}
                            ]
                        }
                    }
                }
            ]
        }
    )

# Storage Class Cost Comparison (per GB/month in us-east-1):
# S3 Standard: $0.023
# S3 Intelligent-Tiering: $0.023 (auto-moves between tiers)
# S3 Standard-IA: $0.0125 (minimum 30-day storage)
# S3 One Zone-IA: $0.01 (single AZ, less resilient)
# S3 Glacier: $0.004 (retrieval time: minutes to hours)
# S3 Glacier Deep Archive: $0.00099 (retrieval time: 12 hours)
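
Those per-GB prices translate directly into monthly bills, which makes the lifecycle payoff easy to quantify. A small sketch using the us-east-1 prices listed above (AWS adjusts these over time, so verify before planning budgets):

```python
# Monthly S3 storage cost per class, using the us-east-1
# per-GB prices from the comparison above (subject to change).
PRICE_PER_GB = {
    "STANDARD": 0.023,
    "INTELLIGENT_TIERING": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_storage_cost(gb, storage_class):
    """Storage cost only; excludes requests, retrieval, and transfer."""
    return gb * PRICE_PER_GB[storage_class]

# 1 TB of logs across the lifecycle target classes
for cls in ("STANDARD", "STANDARD_IA", "GLACIER"):
    print(f"{cls}: ${monthly_storage_cost(1024, cls):.2f}")
```

For the log lifecycle above, 1 TB drops from about $23.55/month in Standard to roughly $4.10/month once it reaches Glacier; note that retrieval and early-deletion fees can claw some of that back if cold data is accessed often.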

RDS and Database Selection

-- RDS Best Practices

-- 1. Enable automated backups (retention 7-35 days)
-- AWS Console or CLI:
-- aws rds modify-db-instance --db-instance-identifier mydb \
--   --backup-retention-period 7 --preferred-backup-window "03:00-04:00"

-- 2. Use Multi-AZ for high availability (automatic failover)
-- Cost: 2x single AZ, but essential for production
-- Failover time: typically 60-120 seconds

-- 3. Read replicas for read-heavy workloads
-- Replicas are created via the API/CLI, not SQL:
-- aws rds create-db-instance-read-replica \
--   --db-instance-identifier mydb-read-1 \
--   --source-db-instance-identifier mydb-source

-- Application connection strategy:
-- Write operations → Primary instance
-- Read operations → Read replica (load balance across multiple replicas)

-- 4. Parameter groups for optimization
-- On RDS, set these in a DB parameter group rather than via SET GLOBAL
-- (the master user lacks the SUPER privilege). Example MySQL values:
--   max_connections = 500
--   slow_query_log = 1
--   long_query_time = 2                          -- Log queries >2 seconds
--   innodb_buffer_pool_size = {DBInstanceClassMemory*3/4}  -- bytes, ~75% of RAM

-- 5. Connection pooling (application side)
-- ❌ Bad: New connection per query
-- connection = mysql.connect(host, user, password, database)
-- connection.execute(query)
-- connection.close()

-- ✅ Good: Connection pool
-- pool = mysql.createPool({
--   host: process.env.DB_HOST,
--   user: process.env.DB_USER,
--   password: process.env.DB_PASSWORD,
--   database: process.env.DB_NAME,
--   connectionLimit: 10,
--   queueLimit: 0
-- });

-- 6. Use RDS Proxy for serverless applications
-- Benefits:
--   - Connection pooling (reduces DB connections)
--   - Automatic failover (no code changes)
--   - IAM authentication support
--   - Enforces SSL/TLS

-- When to use alternatives:
-- DynamoDB: Key-value access, need single-digit ms latency, unpredictable scale
-- Aurora Serverless: Variable workload, want auto-scaling database
-- ElastiCache (Redis/Memcached): Caching layer, session storage, real-time analytics
Security Best Practices

{
  "security_checklist": {
    "iam": {
      "principles": [
        "Use IAM roles, never hardcode credentials",
        "Follow least privilege principle",
        "Enable MFA for root and admin users",
        "Rotate access keys every 90 days",
        "Use AWS Organizations for multi-account setup"
      ],
      "example_policy": {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:GetObject",
              "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::my-bucket/uploads/*",
            "Condition": {
              "IpAddress": {
                "aws:SourceIp": "203.0.113.0/24"
              }
            }
          }
        ]
      }
    },
    "network": {
      "vpc_design": "Use private subnets for databases and apps, public for load balancers",
      "security_groups": "Allow only specific ports and sources, never 0.0.0.0/0",
      "nacls": "Stateless firewall at subnet level",
      "flow_logs": "Enable VPC Flow Logs for audit trail"
    },
    "data_protection": {
      "encryption_at_rest": "Enable for S3, EBS, RDS, DynamoDB",
      "encryption_in_transit": "Use TLS/SSL for all connections",
      "secrets_management": "AWS Secrets Manager or Systems Manager Parameter Store",
      "key_management": "AWS KMS for encryption key management"
    },
    "monitoring": {
      "cloudwatch": "Set alarms for CPU, disk, memory, error rates",
      "cloudtrail": "Log all API calls for compliance",
      "guardduty": "Threat detection service",
      "config": "Track configuration changes"
    }
  }
}
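
Policies like the example above are worth generating programmatically so bucket names and CIDR ranges never get copy-pasted wrong. A minimal sketch; the helper name and its defaults are mine, but the output matches the least-privilege policy shown in the checklist:

```python
import json

def s3_prefix_policy(bucket, prefix, source_cidr):
    """Least-privilege S3 read/write policy scoped to one key prefix
    and one source IP range (mirrors the example_policy above)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            "Condition": {"IpAddress": {"aws:SourceIp": source_cidr}},
        }],
    }

# Same values as the checklist example
print(json.dumps(s3_prefix_policy("my-bucket", "uploads", "203.0.113.0/24"), indent=2))
```

The generated JSON can be fed straight to `iam.create_policy` or an IaC template, which also makes the scoping easy to unit-test.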

Cost Optimization Strategies

  • Right-sizing: Use AWS Compute Optimizer recommendations to match instance types to actual usage
  • Reserved Instances: Commit to 1 or 3 years for up to 72% savings on predictable workloads
  • Savings Plans: Flexible alternative to RIs with similar discounts
  • Spot Instances: Use for fault-tolerant workloads (up to 90% off), great for batch processing
  • S3 Intelligent-Tiering: Automatically moves data between access tiers based on usage patterns
  • Delete unused resources: EBS volumes, snapshots, Elastic IPs, old AMIs
  • Use AWS Cost Explorer: Identify spending trends and anomalies
  • Tag everything: Enable cost allocation tags for departmental charge-back
  • Auto-scaling: Scale down during off-hours with scheduled actions
  • CloudFront CDN: Reduce data transfer costs by caching at edge locations
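
The off-hours auto-scaling bullet maps onto Auto Scaling scheduled actions. A hedged boto3 sketch reusing the app-server-asg group from earlier; `put_scheduled_update_group_action` is the real Auto Scaling API, the schedule and sizes are placeholder assumptions, and the live call is commented out since it needs AWS credentials:

```python
def scale_down_schedule(asg_name, recurrence="0 20 * * 1-5"):
    """Build params for a nightly scale-down scheduled action.
    Recurrence is a UTC cron expression (here: 20:00 Mon-Fri)."""
    return {
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": f"{asg_name}-nightly-scale-down",
        "Recurrence": recurrence,
        "MinSize": 1,
        "MaxSize": 2,
        "DesiredCapacity": 1,
    }

# import boto3                      # requires AWS credentials to run
# autoscaling = boto3.client("autoscaling")
# autoscaling.put_scheduled_update_group_action(
#     **scale_down_schedule("app-server-asg"))

print(scale_down_schedule("app-server-asg"))
```

A matching morning action scales back up; together they can cut compute spend roughly in half for workloads that sleep at night.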

Key Architecture Patterns

  • 3-Tier Architecture: Load Balancer → App Servers (private subnet) → Database (private subnet)
  • Serverless API: API Gateway → Lambda → DynamoDB (no servers to manage)
  • Microservices: ECS/EKS with Service Mesh for inter-service communication
  • Event-Driven: S3 → EventBridge → Lambda for decoupled processing
  • Static Website: S3 + CloudFront + Route 53 (costs <$1/month)
  • High Availability: Multi-AZ deployments with Auto Scaling Groups across 3 AZs
  • Disaster Recovery: Pilot Light or Warm Standby in secondary region
  • Data Lake: S3 → Glue → Athena for analytics without moving data
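
The Serverless API pattern above (API Gateway → Lambda → DynamoDB) fits in a few lines of Python. This is an illustrative sketch, not a production handler: the table name and "id" field are assumptions, and the DynamoDB write is commented out so the validation logic stays testable without AWS.

```python
import json

def parse_create_item(event):
    """Validate the body of an API Gateway proxy event.
    Requires an 'id' field (an assumption for this sketch)."""
    body = json.loads(event.get("body") or "{}")
    if "id" not in body:
        raise ValueError("missing 'id'")
    return body

def handler(event, context):
    """Lambda entry point for the Serverless API pattern."""
    try:
        item = parse_create_item(event)
        # import boto3  # requires AWS credentials to run
        # boto3.resource("dynamodb").Table("items").put_item(Item=item)
        return {"statusCode": 201, "body": json.dumps({"id": item["id"]})}
    except ValueError as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
```

Splitting parsing from the AWS call is deliberate: the validation path gets unit tests, while the DynamoDB write stays a one-line integration concern.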

These patterns form the foundation of scalable AWS architectures. Always start with the Well-Architected Framework's six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. Design for failure, automate everything, and monitor relentlessly.