CloudFront can be summed up as a CDN (Content Delivery Network) that caches your static assets in a data center closer to your viewers. But CloudFront is a lot more complex and versatile than this simple definition suggests. CloudFront is a "pull" CDN, which means you don't push your content to the CDN: content is pulled into the CloudFront edge from the origin the first time each piece of content is requested.
In addition to this traditional pull-and-cache usage, CloudFront can also be used as:
A Networking Router
A Firewall
A Web Server
An Application Server
Why is using a CDN relevant?
The main reason is to improve the delivery speed of static content. By caching content at the CDN edge, you not only cut the download time from a few seconds to a few milliseconds, you also reduce the load and the number of requests hitting your backend (network, I/O, CPU, memory, …).
Static content can be defined as content that does not change between two identical requests made within the same time frame.
"Identical" can be as coarse as the same URI, or as fine-grained as matching down to the authentication header. The time frame can range from one second to one year. The most common case is caching resources like JavaScript or CSS and serving the same file to every user forever. But caching a JSON response tailored to a user (keyed on the authentication header) for a few seconds reduces backend calls when that user suffers from the well-known "frenetic browser reload syndrome".
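To make this concrete, here is a hypothetical cache behavior, written as a CloudFormation-style YAML sketch, that caches a per-user JSON response for a few seconds by adding the Authorization header to the cache key. The path pattern, origin id, and TTL values are assumptions for illustration, not values from any real distribution.

# Sketch of one entry under DistributionConfig.CacheBehaviors of an
# AWS::EC2... no: an AWS::CloudFront::Distribution resource (illustrative values only)
- PathPattern: /api/*                  # assumed API path
  TargetOriginId: api-origin           # assumed origin id
  ViewerProtocolPolicy: redirect-to-https
  MinTTL: 0
  DefaultTTL: 5                        # cache identical authenticated requests for ~5 seconds
  MaxTTL: 5
  ForwardedValues:
    QueryString: true
    Headers:
      - Authorization                  # adds the auth header to the cache key, so the cached object is per-user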
Edges, Mid-Tier Caches, and Origins
CloudFront isn't "just" some servers in data centers around the world. The service is a layered network of Edge Locations and Regional Edge Caches (also called Mid-Tier Caches).
Edge Locations are distributed around the globe with more than 400 points of presence in over 90 cities across 48 countries. Each Edge Location is connected to one of the 13 Regional Edge Caches.
Regional Edge Caches are transparent to you and your visitors: you can't configure them or access them directly. Your visitors interact with the nearest Edge Location, which connects to the attached Regional Edge Cache and finally to your origin. Therefore, in this article, we will refer to CloudFront as the combination of Edge Locations and Regional Edge Caches.
What Have We Learned?
CloudFront is more than just a simple "pull-cache-serve" service
You improve delivery speed to your visitors
You can increase resilience by always using a healthy backend
You improve overall speed to your backend by leveraging AWS’s backbone
You can modify any request to tailor the response to your visitor’s device or region
You don’t always need a backend
You protect your backend by reducing the number of calls reaching it
If you're a Node.js/TypeScript developer with some experience in AWS (Lambda, API Gateway, VPC fundamentals, etc.), you can comfortably follow this article. In other words, this is a 201-level article.
In this article, we will look at the challenge of accessing AWS S3 from a Lambda when that Lambda is placed inside a private subnet of a VPC and the subnet has no outbound access to the internet. We will then walk through a solution: creating a gateway-type VPC endpoint to the AWS S3 service in that AWS account.
Note: Please make sure to remove the deployments after you're done with the examples below; this helps you avoid unwanted AWS bills 😉
It's assumed that you have an active AWS account and have Node.js/TypeScript, the Serverless Framework, Git, and your favourite IDE installed 🧘🏽
Let’s See it in Action
In real-world systems, resources like AWS S3, EventBridge, SQS, etc. either integrate multiple services or are consumed by multiple services. In this example, let's treat AWS S3 as a shared resource.
Hence, let's create a service called 'shared-resources' that manages all such shared resources. For the sake of this example, this service creates a bucket and a Lambda called 'health', which simply returns 200 upon successful invocation. A minimal sketch of this service's configuration is shown below.
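This is only a rough sketch of what the shared-resources serverless.yml might look like; the runtime, region, bucket name prefix, and handler path are assumptions, and the repo's own serverless.yml is the source of truth.

# serverless.yml (sketch) -- service that owns the shared S3 bucket
service: shared-resources

provider:
  name: aws
  runtime: nodejs18.x           # assumed runtime
  region: ap-southeast-1        # assumed region ("sg" in the bucket name suggests Singapore)

functions:
  health:
    handler: src/health.handler   # assumed path; simply returns a 200 response

resources:
  Resources:
    UsageDetailsBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: myprefix-usage-details-sg   # assumed; the article refers to it as "*usage-details-sg"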
See the 'How to Deploy & Test' section for deployment instructions.
Scenario 1:
Each time a .csv file is added to the 'PrivateCSVFiles' folder of the AWS S3 bucket (*usage-details-sg), an s3:ObjectCreated event is generated. This event triggers a Lambda (parseUsageDetailsPri). The expected responsibility of this Lambda is to read the file (test.csv), print its contents, and then exit. However, it will time out because it sits inside the private subnet which, as you'll recall, has neither internet access nor a private connection to S3.
The lambda-vpc-s3access-timesout repo takes you through the source code in which the Lambda times out whenever it tries to access S3.
See the 'How to Deploy & Test' section for deployment and test instructions.
The following configuration creates a VPC (10.0.0.0/16) and a private subnet (10.0.1.0/24), and associates that subnet with the VPC. It also creates a security group.
For the sake of this example, the security group allows all protocols and all IP addresses.
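The actual configuration lives in the repo's serverless.yml; what follows is only a minimal sketch, with assumed resource names, handler path, and bucket name.

# serverless.yml (sketch) -- Scenario 1: Lambda in a private subnet with no route to S3
functions:
  parseUsageDetailsPri:
    handler: src/parseUsageDetailsPri.handler   # assumed path; reads and prints the uploaded .csv
    timeout: 30
    vpc:
      securityGroupIds:
        - Ref: LambdaSecurityGroup
      subnetIds:
        - Ref: PrivateSubnet
    events:
      - s3:
          bucket: myprefix-usage-details-sg     # assumed; the article refers to "*usage-details-sg"
          event: s3:ObjectCreated:*
          rules:
            - prefix: PrivateCSVFiles/
            - suffix: .csv
          existing: true                        # the bucket is owned by the shared-resources service

resources:
  Resources:
    Vpc:
      Type: AWS::EC2::VPC
      Properties:
        CidrBlock: 10.0.0.0/16
    PrivateSubnet:
      Type: AWS::EC2::Subnet
      Properties:
        VpcId: { Ref: Vpc }
        CidrBlock: 10.0.1.0/24
    LambdaSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Allows all traffic (example only, not a recommended practice)
        VpcId: { Ref: Vpc }
        SecurityGroupIngress:
          - IpProtocol: "-1"
            CidrIp: 0.0.0.0/0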
When I manually uploaded test.csv to '*usage-details-sg/PrivateCSVFiles/', I could see in the AWS CloudWatch logs that the Lambda timed out.
See the 'How to Remove a CloudFormation Stack Deployed Using the Serverless Framework' section to remove/un-deploy this service so that there are no unwanted bills from AWS.
Scenario 2:
Each time a .csv file is added to the 'PrivateCSVFiles' folder of the AWS S3 bucket (*usage-details-sg), an s3:ObjectCreated event is generated. This event triggers a Lambda (parseUsageDetailsPri) whose expected responsibility is to read the file, print its contents, and then exit. In this scenario, we've created a gateway-type VPC endpoint to the AWS S3 service. The VPC endpoint provides a private connection from the VPC to the AWS S3 service in the same AWS account, without needing to reach the public internet.
Note: In this solution, no code changes are needed in the Lambda (parseUsageDetailsPri) compared to Scenario 1. The only change is adding a VPC endpoint through the Serverless Framework.
The lambda-vpc-vpcendpoint-sucess repo takes you through the source code in which the Lambda successfully accesses AWS S3.
See the 'How to Deploy & Test' section for deployment and test instructions.
Note: It's suggested that you remove the service created for Scenario 1 before proceeding with Scenario 2.
The following configuration creates a VPC (10.0.0.0/16), a private subnet (10.0.5.0/24), and a gateway-type VPC endpoint for the S3 service, and associates that private subnet with the VPC. It also creates a security group.
For the sake of this example, the security group again allows all protocols and all IP addresses.
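Again, this is only a sketch focused on what Scenario 2 adds on top of Scenario 1 (the endpoint and a route table for it); resource names are assumptions, and the repo's serverless.yml is the source of truth.

# serverless.yml (sketch) -- Scenario 2: gateway-type VPC endpoint for S3
resources:
  Resources:
    PrivateSubnet:
      Type: AWS::EC2::Subnet
      Properties:
        VpcId: { Ref: Vpc }               # Vpc defined as in Scenario 1
        CidrBlock: 10.0.5.0/24
    PrivateRouteTable:
      Type: AWS::EC2::RouteTable
      Properties:
        VpcId: { Ref: Vpc }
    PrivateSubnetRouteTableAssociation:
      Type: AWS::EC2::SubnetRouteTableAssociation
      Properties:
        SubnetId: { Ref: PrivateSubnet }
        RouteTableId: { Ref: PrivateRouteTable }
    S3GatewayEndpoint:
      Type: AWS::EC2::VPCEndpoint
      Properties:
        VpcId: { Ref: Vpc }
        ServiceName: com.amazonaws.${aws:region}.s3   # ${aws:region} is the Serverless Framework region variable
        VpcEndpointType: Gateway
        RouteTableIds:
          - Ref: PrivateRouteTable                    # the endpoint adds S3 routes to this table

A gateway endpoint does not place a network interface in the subnet; it works by adding a prefix-list route for S3 to the associated route table.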
The main difference from Scenario 1 is the VPC endpoint, which lets the VPC reach the S3 service without going over the public internet. The link to the VPC endpoint pricing page is here.
After uploading test.csv again, I could see in the AWS CloudWatch logs that the Lambda printed the contents of the file.
See the 'How to Remove a CloudFormation Stack Deployed Using the Serverless Framework' section to remove/un-deploy this service so that there are no unwanted bills from AWS.
How to Deploy & Test
Clone the Repo
From your command prompt, navigate to the directory where you want to clone this repo and clone it there.
Run yarn install. This will install all the dependent npm modules & serverless plugins.
$ yarn install
Deploy to AWS
$ yarn deploy:dev --aws-profile
Deploy Scenario 1 or Scenario 2
Change directory
// Run the following command to try Scenario 1
$ cd lambda-vpc/lambda-vpc-s3acess-no-success
// Run the following command to try Scenario 2
$ cd lambda-vpc/lambda-vpc-vpcendpoint-s3access-success
Run yarn install
$ yarn install
Deploy to AWS
Running the following command deploys the service; the resulting stack can be seen on the CloudFormation page of the AWS web console.
Note: Be mindful that usage of AWS services/resources can add cost.
On *nix and macOS, the AWS profile is configured in the '${HOME}/.aws/credentials' file:
$ yarn deploy:dev --aws-profile
For both Scenario 1 and Scenario 2, the deployment took me about three and a half minutes to complete; it may be different for you.
Testing
Navigate to the AWS S3 service in the AWS web console
Search for *usage-details-sg bucket
Navigate to that bucket and manually create a ‘PrivateCSVFiles’ directory in that bucket
Navigate to ‘PrivateCSVFiles’ directory and manually upload test.csv file.
Navigate to AWS CloudWatch logs in the AWS web console
In Log Groups, search for the *parseUsageDetailsPri Lambda logs:
a. Scenario 1 — the Lambda times out after 30 seconds
b. Scenario 2 — the Lambda prints the contents of the file test.csv
Move back to Scenario 1 or Scenario 2
How to Remove a CloudFormation Stack Deployed Using the Serverless Framework
1. Make sure you are in the same directory where you executed the 'yarn deploy:dev' command for Scenario 1 or Scenario 2, then run the following command.
$ yarn remove:dev --aws-profile
Note: Removing or undoing the deployment takes a good amount of time because it includes networking resources.
2. Make sure to remove the shared-resources service as well.
// navigate to the shared resources directory from the repo
$ cd s3bucket-usagedetails
$ yarn remove:dev --aws-profile
If you've read this far, you probably want to know why a Lambda placed inside a public subnet still fails with a timeout error when it tries to access S3.
The initial thought is that it should be able to access the S3 bucket. In fact, even though the Lambda is in a public subnet, its network interface does not get a public IP address, so it cannot access the internet. The following is an excerpt from the AWS documentation.
Ensure that instances in your subnet have a globally unique IP address (public IPv4 address, Elastic IP address, or IPv6 address)
Hence, it will still fail with a timeout error.
If you're interested in how to configure a NAT gateway for a private subnet through the Serverless Framework, go through the vpc.yml file of the lambda-vpc-natgateway-s3access-success repo. A rough sketch of what such a file typically contains follows.
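The sketch below is not the repo's actual vpc.yml; resource names and CIDRs are assumptions, and only the pieces needed for NAT-based outbound access are shown.

# vpc.yml (sketch) -- NAT gateway giving a private subnet outbound internet access
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: { Ref: Vpc }
      CidrBlock: 10.0.0.0/24
  PrivateSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: { Ref: Vpc }
      CidrBlock: 10.0.1.0/24
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  VpcGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: { Ref: Vpc }
      InternetGatewayId: { Ref: InternetGateway }
  NatEip:
    Type: AWS::EC2::EIP              # the Elastic IP mentioned in the note below
    Properties:
      Domain: vpc
  NatGateway:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: { Fn::GetAtt: [NatEip, AllocationId] }
      SubnetId: { Ref: PublicSubnet }   # the NAT gateway lives in the public subnet
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: { Ref: Vpc }
  PublicDefaultRoute:
    Type: AWS::EC2::Route
    DependsOn: VpcGatewayAttachment
    Properties:
      RouteTableId: { Ref: PublicRouteTable }
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: { Ref: InternetGateway }
  PublicSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: { Ref: PublicSubnet }
      RouteTableId: { Ref: PublicRouteTable }
  PrivateRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: { Ref: Vpc }
  PrivateDefaultRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: { Ref: PrivateRouteTable }
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: { Ref: NatGateway }   # outbound traffic from the private subnet goes via the NAT gateway
  PrivateSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: { Ref: PrivateSubnet }
      RouteTableId: { Ref: PrivateRouteTable }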
Note: This service will create an EIP (Elastic IP), which can add significant cost to your bill. Hence, it's highly recommended to undo the deployment of this service as soon as you're done with the testing.
Note: When you have multiple services and those services own their own VPCs, make sure the subnet ranges of the different services do not overlap.
Conclusion:
In this article, we’ve tested two scenarios.
Before testing these scenarios, we deployed the shared-resources service using the Serverless Framework. This service created the bucket consumed by Scenario 1 and Scenario 2.
In Scenario 1, a Lambda inside a private subnet tries to access AWS S3, and the private subnet has no internet access. The serverless.yml file in the source code shows how the private subnet is configured in a VPC; AWS CloudFormation is used for this configuration. We deployed the service and invoked the Lambda by adding the test.csv file to the PrivateCSVFiles folder of the S3 bucket. When the Lambda was invoked, it timed out after a 30-second wait, as we saw in the AWS CloudWatch logs. Because the private subnet has no internet access and the Lambda sits inside it, the Lambda can't reach AWS S3. After testing this, we removed the service.
In Scenario 2, a Lambda inside a private subnet again tries to access AWS S3, but the VPC hosting that private subnet is configured with a VPC endpoint. The serverless.yml file in the source code shows how the VPC is configured with a gateway-type VPC endpoint for the S3 service; AWS CloudFormation is used for this configuration. We deployed the service and invoked the Lambda by adding the test.csv file to the PrivateCSVFiles folder of the S3 bucket. This time, the AWS CloudWatch logs showed that the Lambda was able to read the contents of test.csv. The VPC endpoint lets the Lambda reach AWS S3 through a private connection between the VPC and S3. After testing this, we removed the service.
After testing both scenarios, we removed the shared-resources service as well.