How to do serverless in AWS?

The AWS serverless ecosystem consists of many tools and services. Although there are a couple of guides on the implementation details of individual services, there is not enough material on the principles involved in choosing an AWS service. A wrong choice can be a costly mistake when a serverless system hits scale. This article is a survey of serverless fundamentals in AWS. It briefly covers composition paradigms and the services that can implement each of them, then discusses common use cases of AWS services and how they compare.

Before we architect a serverless system, we must understand what serverless systems offer and what their common use cases are.

Why serverless?

Serverless is no silver bullet, but it promises certain things that make it useful in certain scenarios.

  • It allows us to prototype things faster and thus enables us to move from ideas to market.
  • It makes us focus more on our product instead of spending time on infrastructure management.
  • It is a low cost solution because it follows the Pay as you use model.
  • Scalability and high availability are built into the AWS services.

Given what serverless promises, it makes more sense for some use cases than for others.

Serverless architecture is best used to perform short-lived tasks and manage workloads that experience infrequent or unpredictable traffic. Some of the use cases are:

  • Trigger Based Tasks: Any user activity that triggers an event or a series of events is a good candidate for serverless architecture.
  • Building RESTful APIs: You can leverage Amazon API Gateway with serverless functions to build REST APIs that scale with demand.
  • Asynchronous Processing: Serverless functions can handle behind-the-scenes application tasks, such as rendering product information or transcoding videos after upload, without interrupting the flow of the application or adding user-facing latency.
  • Web Applications: The evolving nature of SaaS products and web applications makes them suitable candidates for serverless.

How to design serverless architecture?

Now that we understand the benefits and popular use cases of serverless, let's move on to the fundamentals of serverless systems.

Whenever we design serverless systems or microservices, we must define the events in our system, the bounded contexts of our APIs and the composition paradigm we will use.

Events & Bounded Context

It is better to divide large business operations into small bounded contexts. This enables us to clearly define the responsibility of any given part of our system. We can break this process into three steps:

  1. Be explicit about the inter-relationship of our bounded contexts (small broken pieces of our large business operation).
  2. Define events and service boundaries for these bounded contexts.
  3. Group events by reducing data dependencies.
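As a rough illustration of the steps above, an event can be modelled as a small, self-describing record that names the bounded context it came from. This is only a sketch: the `DomainEvent` class, the `orders` context and the `OrderPlaced` event are hypothetical names chosen for the example, not something prescribed by AWS.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json
import uuid


@dataclass(frozen=True)
class DomainEvent:
    """A self-describing event emitted by one bounded context."""
    source: str        # bounded context that owns the event, e.g. "orders"
    detail_type: str   # what happened, e.g. "OrderPlaced"
    detail: dict       # the data that consumers need
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self))


# Hypothetical event from an "orders" context. It carries the fields that
# downstream contexts need, so they do not have to call back for more data.
order_placed = DomainEvent(
    source="orders",
    detail_type="OrderPlaced",
    detail={"order_id": "o-123", "total_cents": 4999},
)
```

Putting the needed data inside `detail` is one way to reduce the data dependencies mentioned in step 3.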

This infoq article has a good guide on bounded contexts.

A bounded context can be any part of our system clearly depicting the flow. One example of a bounded context in a serverless system can be seen in the following diagram.

[Figure: example bounded context in a serverless system]

We'll discuss the design decisions about these services later in the article.

Composition

There are two paradigms to compose serverless systems:

  1. Orchestration
  2. Choreography

Each of these paradigms has different characteristics that help us choose the right one for our use case.

Orchestration

  • Synchronous - Request Response paradigm
  • Services are controlled by a centralized function or process.
  • The controller function needs to wait for each downstream process to complete and return a response.
  • Services are very tightly coupled
  • Failure could stop the entire process / flow
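The orchestration traits above can be sketched in plain Python, with no AWS involved. The step functions and the `place_order` controller are hypothetical; the point is only that one controller calls every step, waits for each result, and stops when a step fails.

```python
def reserve_stock(order):
    return {**order, "stock_reserved": True}


def charge_payment(order):
    if order["total_cents"] <= 0:
        raise ValueError("invalid amount")
    return {**order, "paid": True}


def ship_order(order):
    return {**order, "shipped": True}


def place_order(order):
    """Central controller: calls each step in order and waits for its result."""
    for step in (reserve_stock, charge_payment, ship_order):
        # Blocks until the step returns; an exception here stops the whole flow.
        order = step(order)
    return order


result = place_order({"order_id": "o-1", "total_cents": 4999})
```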

Choreography

  • An asynchronous or event driven paradigm
  • Services work independently and are decoupled from each other
  • Failures and retries are easier to manage
  • Difficult to implement timeouts.
  • End-to-end monitoring and reporting are comparatively difficult.
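For contrast, here is a minimal in-process sketch of choreography. The `EventBus` class and the handler names are hypothetical, and a real event bus delivers events asynchronously rather than through direct function calls; the sketch only shows that no component knows the whole flow and that one failing subscriber does not block the others.

```python
from collections import defaultdict


class EventBus:
    """Toy event bus: routes each event type to its subscribers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            try:
                handler(payload)
            except Exception as exc:
                # A failing subscriber does not stop the others.
                print(f"{handler.__name__} failed: {exc}")


bus = EventBus()
log = []


def reserve_stock(order):
    log.append("stock reserved")
    bus.publish("StockReserved", order)  # emits a follow-up event


def send_receipt(order):
    log.append("receipt sent")


def charge_payment(order):
    log.append("payment charged")


bus.subscribe("OrderPlaced", reserve_stock)
bus.subscribe("OrderPlaced", send_receipt)
bus.subscribe("StockReserved", charge_payment)
bus.publish("OrderPlaced", {"order_id": "o-1"})
```

Adding a new reaction to `OrderPlaced` only requires another `subscribe` call; the existing handlers stay untouched.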

Now that we understand bounded contexts and composition paradigms, we can move on to implementing these principles in AWS.

Serverless tools in AWS

There are several serverless services in AWS. Some of them are listed below by category.

  • Compute
    • AWS Lambda
    • AWS Fargate
  • Application Integration
    • AWS EventBridge
    • AWS SQS
    • AWS SNS
    • AWS Step Functions
    • AWS API Gateway
    • AWS AppSync
  • Data Store
    • AWS S3
    • AWS DynamoDB
    • AWS RDS Proxy
    • AWS Aurora Serverless

AWS Kinesis, AWS Cognito and AWS CloudWatch are also used often in serverless systems.

Serverless Orchestration in AWS

AWS Step Functions is used for serverless orchestration in AWS. Step Functions is all about modelling business transactions as a series of state transitions. It enables end-to-end monitoring and reporting via audit histories. The flow is modeled explicitly and can be source controlled.

[Figure: Step Functions workflow graph]

Above is an example of a Step Functions flow. We can clearly see that it is a state machine. There are two types of Step Functions workflows:

  1. Standard (long-running workflows; an execution can run for up to one year)
  2. Express (short workflows that finish within 5 minutes, suited to high execution volume)
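A Step Functions workflow is described in Amazon States Language, a JSON document listing the states and the transitions between them. The sketch below builds a small definition as a Python dictionary and checks that every transition points at a state that exists. The order flow, the state names and the Lambda ARNs are placeholders invented for illustration.

```python
import json

# Hypothetical order flow. The ARNs are placeholders, not real resources.
definition = {
    "Comment": "Hypothetical order flow",
    "StartAt": "ReserveStock",
    "States": {
        "ReserveStock": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:reserve-stock",
            "Next": "IsInStock",
        },
        "IsInStock": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.in_stock", "BooleanEquals": True, "Next": "ChargePayment"}
            ],
            "Default": "OutOfStock",
        },
        "ChargePayment": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-payment",
            "End": True,
        },
        "OutOfStock": {"Type": "Fail", "Error": "OutOfStock", "Cause": "Item unavailable"},
    },
}


def referenced_states(definition):
    """Collect every state name that some transition points at."""
    targets = {definition["StartAt"]}
    for state in definition["States"].values():
        targets.update(state[key] for key in ("Next", "Default") if key in state)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return targets


# An empty set means every transition leads to a defined state.
missing = referenced_states(definition) - set(definition["States"])
asl_json = json.dumps(definition, indent=2)
```

Because the definition is plain JSON, it can live in the repository next to the function code, which is what makes the flow source controlled.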

Some common use cases for AWS Step Functions are:

  • Microservices orchestration
  • IT Automation
  • Data processing and ETL orchestration

Implementation details and extended flow examples can be found on the official AWS examples page.

Serverless Choreography in AWS

Choreography in AWS allows us to modify each step of the flow independently. It also enables us to scale the different parts of our system independently of each other. Because the system is built around the events and bounded contexts discussed earlier, extending functionality is comparatively easy.

There are two considerations which we must discuss before proceeding:

  • End to end monitoring and reporting is comparatively difficult.
  • Difficult to implement timeout logic.

Let's discuss the example choreography architecture given below:

[Figure: example choreography architecture]

There is no centralized service controlling everything. Instead each part of the system relies on the incoming event, does its part and forwards it to a destination. The whole system is driven by events. For this example:

  • A client makes a request to an API powered by AWS API Gateway.
  • API Gateway integrates with an AWS Lambda function asynchronously (more on this later).
  • The Lambda function does its processing and pushes an event to an AWS EventBridge event bus.
  • EventBridge evaluates its rules and delivers the event to the targets of every matching rule.
  • A Lambda function or an SNS topic can be the target of an event from the event bus.
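The publishing and routing steps above can be sketched as follows. The entry uses the field names that the EventBridge `PutEvents` API expects, and the pattern uses the `source` and `detail-type` keys of a delivered event. The bus name, source and event type are hypothetical, and `matches` is a deliberately simplified stand-in for EventBridge's rule matching that only handles exact values.

```python
import json


def build_put_events_entry(order_id: str) -> dict:
    """Entry a Lambda function could pass to PutEvents.

    With boto3 this would be sent as:
        boto3.client("events").put_events(Entries=[entry])
    """
    return {
        "EventBusName": "orders-bus",  # hypothetical bus name
        "Source": "orders",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"order_id": order_id}),
    }


# Event pattern of a rule that should receive the event above.
ORDER_PLACED_PATTERN = {"source": ["orders"], "detail-type": ["OrderPlaced"]}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must list the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not (isinstance(event[key], dict) and matches(expected, event[key])):
                return False
        elif event[key] not in expected:
            return False
    return True


entry = build_put_events_entry("o-1")

# Shape of the event as a rule target sees it (trimmed to the relevant keys).
delivered_event = {
    "source": "orders",
    "detail-type": "OrderPlaced",
    "detail": {"order_id": "o-1"},
}
```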

The example does not yet explain why SNS and EventBridge were chosen for their respective places. At this stage, more questions should come up, such as:

  • When to use AWS SQS and when to use AWS SNS?
  • When to use AWS SNS and when to use AWS EventBridge?
  • What is synchronous and asynchronous invocation of a Lambda?

A quick read about these services shows that some of them, such as SQS, SNS and EventBridge, appear to do very similar things. To build the correct mental model, we must understand how these services integrate with the compute part of our serverless system, i.e. Lambda.

Lambda Integration Models

There are three ways in which Lambda integrates with the other AWS services.

  1. Poll based integration (Synchronous)
  2. Push based integration (Synchronous)
  3. Event based integration (Asynchronous)

Before we proceed, let's discuss the synchronous and asynchronous invocation of Lambda functions.

Synchronous Invocations

In synchronous invocation, we wait for the function to process the event and return a response.

  • Lambda runs the function and waits for a response
  • Lambda returns the response from the function's code with additional data, such as the version of the function that was invoked

[Figure: synchronous invocation]

Asynchronous Invocations

In asynchronous invocation, Lambda queues the event for processing and returns a response immediately.

  • Lambda places the event in a queue and returns a success response without additional information
  • A separate process reads events from the queue and sends them to your function
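From the caller's side, the two invocation styles differ in a single parameter of the Lambda `Invoke` API. The helper below is a sketch that only builds the arguments; the function name passed in is a placeholder, and the actual call (shown in the comment) needs boto3 and AWS credentials.

```python
import json


def invoke_kwargs(function_name: str, payload: dict, wait_for_result: bool) -> dict:
    """Build the arguments for a Lambda Invoke call.

    With boto3 they would be used as:
        boto3.client("lambda").invoke(**kwargs)
    "RequestResponse" waits for the function's result (status code 200),
    while "Event" queues the event and returns right away (status code 202).
    """
    return {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse" if wait_for_result else "Event",
        "Payload": json.dumps(payload).encode("utf-8"),
    }


sync_call = invoke_kwargs("process-order", {"order_id": "o-1"}, wait_for_result=True)
async_call = invoke_kwargs("process-order", {"order_id": "o-1"}, wait_for_result=False)
```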

[Figure: asynchronous invocation]

AWS has very detailed documentation for both kinds of invocation. Let's move on to the Lambda integration models.

Poll Based Integration

Lambda is a service made up of several resources, such as our Lambda function, the event queue and event source mappings. In this kind of integration, an event source mapping is responsible for polling the source service and fetching data when it is available.

  • Lambda polls records from the stream or queue using an event source mapping
  • The event source mapping then invokes your function synchronously with a batch of records, controlled by the batch size and batching window settings
  • Each event that your function processes can contain hundreds or thousands of items
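Because the function receives a whole batch, the handler usually loops over the records. The sketch below handles an SQS batch and reports only the failed messages back to Lambda, assuming that `ReportBatchItemFailures` is enabled on the event source mapping. The `process` function and the `order_id` field are hypothetical.

```python
import json


def process(order: dict) -> None:
    """Hypothetical business logic; rejects orders without an id."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event, context):
    """Lambda handler for an SQS batch delivered by an event source mapping."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message is retried; the rest of the batch is
            # treated as processed.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


# Trimmed-down sample of the event shape Lambda passes for SQS.
sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"order_id": "o-1"})},
        {"messageId": "m-2", "body": json.dumps({"note": "no id"})},
    ]
}
result = handler(sample_event, None)
```

Without the partial batch response, a single failing message would cause the whole batch to be retried.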

Below are the services which do poll based integration with AWS Lambda:

  • AWS DynamoDB Streams
  • AWS Kinesis Streams
  • AWS SQS
  • Amazon MQ
  • Apache Kafka (both managed and self-managed)

Below is an example of this kind of integration from AWS Docs.

[Figure: poll-based integration, from the AWS docs]

Push Based Integration

In this kind of integration, the services themselves are responsible for pushing the event to the Lambda service.

  • The service pushes an event to Lambda
  • Lambda then invokes your function and returns a response

AWS API Gateway, AWS Cognito & AWS CloudFront integrate with AWS Lambda in this way. The diagram in the choreography section can be seen as an example of API Gateway's integration with Lambda.

For the API Gateway scenario, there is a catch. Because our Lambda function is invoked synchronously by default, API Gateway waits for the function to return a response. The integration can, however, be configured so that API Gateway invokes Lambda asynchronously. The official docs have a very good guide for it.
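For the default synchronous case, a handler behind an API Gateway Lambda proxy integration receives the HTTP request as an event and must return a dictionary with `statusCode`, `headers` and `body`. The sketch below assumes a hypothetical endpoint that accepts an order; the `order_id` field is invented for the example.

```python
import json


def handler(event, context):
    """Lambda handler for an API Gateway proxy integration."""
    # The request body arrives as a string (or None when it is empty).
    body = json.loads(event.get("body") or "{}")
    headers = {"Content-Type": "application/json"}

    if "order_id" not in body:
        return {
            "statusCode": 400,
            "headers": headers,
            "body": json.dumps({"error": "order_id is required"}),
        }

    return {
        "statusCode": 202,
        "headers": headers,
        "body": json.dumps({"accepted": body["order_id"]}),
    }


ok = handler({"body": json.dumps({"order_id": "o-1"})}, None)
bad = handler({"body": None}, None)
```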

Event Based Integration

In this kind of integration, a service invokes the Lambda function and immediately gets a success response after handing over the event. Lambda then processes the event in the background.

  • Services hand off the event to Lambda and Lambda handles the rest
  • Lambda manages the function's asynchronous event queue
  • We can configure Lambda to send an invocation record to another service
    • SQS Queue
    • SNS Topic
    • Lambda Function
    • EventBridge event bus
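As an illustration of an asynchronously invoked function, here is a sketch of a handler subscribed to an SNS topic. The message is delivered under `Records[].Sns.Message` as a string; the JSON payload with an `order_id` field is an assumption made for the example.

```python
import json


def handler(event, context):
    """Lambda handler invoked asynchronously by an SNS topic."""
    processed = []
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        processed.append(message["order_id"])
    # The publisher never sees this return value. An unhandled exception
    # would instead lead Lambda to retry the event from its internal queue.
    return processed


# Trimmed-down sample of the event shape Lambda passes for SNS.
sample_event = {
    "Records": [
        {"EventSource": "aws:sns", "Sns": {"Message": json.dumps({"order_id": "o-1"})}}
    ]
}
result = handler(sample_event, None)
```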

S3, SNS, EventBridge & CloudWatch integrate with Lambda in this way. Below is an example where SNS integrates with Lambda asynchronously.

[Figure: SNS integrating with Lambda asynchronously]

Most of the time, we deal with poll-based and event-based integration. Below is a short visual summary of all three types of integration.

[Figure: summary of the three Lambda integration types]

Now that we understand the different composition paradigms, how they are implemented in AWS and how Lambda integrates with the serverless services in AWS, let's briefly discuss the design principles and use cases for some common services.

When to use AWS SQS and AWS SNS?

SQS and SNS are both used to decouple our applications in the cloud. However, these services have very distinct characteristics that make them suitable for different purposes.

image.png

SQS

  • SQS is for 1:1 reliable asynchronous communication between two entities.
  • SQS temporarily holds messages and is owned by the consumer. It is the consumer's job to fetch messages from SQS. (Remember Lambda polling: Lambda does this for us automatically.)
  • It is used to decouple our services when we want to process messages at the pace the consumer can handle.

SNS

  • SNS is for 1:N message/event publishing (pub/sub model). This is also called “fanning out” events.
  • It is used for applications that need high throughput, which means we want to fan out events as soon as they arrive. It is the job of SNS to push the event to subscribers.
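The difference between the two can be shown with a toy in-memory model, which is not how the services are implemented. A queue holds messages until its single consumer pulls them at its own pace, while a topic pushes every message to all of its subscribers straight away. The `Queue` and `Topic` classes and the billing and shipping consumers are hypothetical.

```python
from collections import deque


class Queue:
    """Toy queue: holds messages until the consumer pulls them."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self, max_messages=1):
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch

    def __len__(self):
        return len(self._messages)


class Topic:
    """Toy topic: pushes each message to every subscriber immediately."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, queue):
        self._subscribers.append(queue)

    def publish(self, message):
        for queue in self._subscribers:
            queue.send(message)


orders_topic = Topic()
billing_queue, shipping_queue = Queue(), Queue()
orders_topic.subscribe(billing_queue)
orders_topic.subscribe(shipping_queue)

for order_id in ("o-1", "o-2", "o-3"):
    orders_topic.publish(order_id)  # fan-out: both queues get every order

# Billing pulls two messages now; shipping has not consumed anything yet.
billing_batch = billing_queue.receive(max_messages=2)
```

Subscribing queues to a topic, as done here, combines both properties: the topic fans out and each consumer still works through its own queue at its own pace.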

When to use SNS & EventBridge?

SNS & EventBridge are both used to distribute events. EventBridge is, under the hood, the same as the AWS CloudWatch Events API, but it provides more features, such as integration with external services. SNS is better suited to fanning out events to different services, while EventBridge is intended for application integration.

SNS

  • SNS is for 1:N message/event publishing (pub/sub model). This is also called “fanning out” events.
  • It is used for applications that need high throughput, which means we want to fan out events as soon as they arrive. It is the job of SNS to push the event to subscribers.

EventBridge

  • EventBridge is for 1:N (sometimes N:N) message distribution
  • Recommended for application integration, such as SaaS or third-party integrations like Datadog, Shopify etc.

image.png

Sometimes people run into the limit of 5 targets per rule when they try to use EventBridge in place of SNS. A workaround is to make an SNS topic the target of the matching rule and let SNS fan the event out. AWS EventBridge is primarily meant for routing events in our system. The convention is to have a separate rule for each service to which we want to route an event, which is good for separation of concerns.
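The one-rule-per-service convention can be sketched as a helper that prepares the arguments for the EventBridge `PutRule` and `PutTargets` APIs and refuses more than 5 targets on a rule. The bus name, rule names and ARNs below are placeholders; the calls themselves (shown in the comment) need boto3 and AWS credentials.

```python
import json

MAX_TARGETS_PER_RULE = 5  # the per-rule limit discussed above


def rule_requests(bus: str, rule_name: str, pattern: dict, targets: dict):
    """Build the arguments for one rule and its targets.

    With boto3 they would be used as:
        events = boto3.client("events")
        events.put_rule(**put_rule)
        events.put_targets(**put_targets)
    """
    if len(targets) > MAX_TARGETS_PER_RULE:
        raise ValueError(f"a rule can have at most {MAX_TARGETS_PER_RULE} targets")
    put_rule = {
        "Name": rule_name,
        "EventBusName": bus,
        "EventPattern": json.dumps(pattern),
    }
    put_targets = {
        "Rule": rule_name,
        "EventBusName": bus,
        "Targets": [{"Id": tid, "Arn": arn} for tid, arn in targets.items()],
    }
    return put_rule, put_targets


pattern = {"source": ["orders"], "detail-type": ["OrderPlaced"]}

# One rule per consuming service; the second one hands off to an SNS topic,
# which then fans the event out to as many subscribers as needed.
billing_rule = rule_requests(
    "orders-bus", "order-placed-to-billing", pattern,
    {"billing": "arn:aws:lambda:us-east-1:123456789012:function:billing"},
)
fanout_rule = rule_requests(
    "orders-bus", "order-placed-to-fanout", pattern,
    {"fanout": "arn:aws:sns:us-east-1:123456789012:order-placed"},
)
```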

Conclusion

We discussed bounded contexts and composition paradigms, how to implement them in AWS and mental models for some of the services central to event-driven systems. With this knowledge, we should be able to make an educated choice for our system based on our budget, our use case, our team size and the project timelines. For most people and systems there is no single right or wrong choice, but these design decisions and principles become important when we are dealing with serverless at scale.

This whole article is meant to serve as a mental model for designing serverless in AWS. For the implementation details of individual services, AWS Docs are the best place to go.

Note: I drew on my own experience and existing resources for this article. Any mistake or factual inaccuracy can be reported in the comments. I will try my best to maintain and update it over time.