Accessing S3 Using Lambda Inside a Private Subnet of a VPC

With No Outbound Access to the Internet

By
Ganesh Adapa
April 4, 2022

If you’re a Node.js/TypeScript developer with some experience in AWS, Lambda, API Gateway, and VPC fundamentals, you can comfortably follow this article. In other words, this is a 201-level article.

In this article, we’ll look at the challenge of accessing AWS S3 from a Lambda that sits inside a private subnet of a VPC, where that private subnet has no outbound access to the internet. We’ll then walk through a solution: creating a gateway-type VPC Endpoint for the AWS S3 service on that AWS account.

Note: Please make sure to undo the deployments after you’re done with the examples given below. It can help you to avoid unwanted bills from AWS 😉

It’s assumed that you have an active AWS account and have installed Node.js/TypeScript, the Serverless Framework, Git, and your favourite IDE 🧘🏽

Let’s See it in Action

In real-world systems, resources like AWS S3, EventBridge, and SQS integrate multiple services or are consumed by multiple services. In this example, let’s treat AWS S3 as a shared resource.

Hence, let’s create a service called 'shared-resources' to manage all shared resources. For the sake of this example, this service creates a bucket and also a Lambda called ‘health’. This Lambda simply returns 200 upon successful invocation.

See the 'How to Deploy & Test' section for deployment instructions.

Scenario 1:

Each time a .csv file is added to the 'PrivateCSVFiles' folder of an AWS S3 bucket (*usage-details-sg), an ObjectCreated event is generated.

This event triggers a Lambda (parseUsageDetailsPri).

The expected responsibility of this Lambda is to read the file (test.csv), print the contents and then exit. However, it times out because it is inside the private subnet. As noted above, the private subnet has neither internet access nor a private connection to the S3 bucket.
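As a rough illustration of what such a Lambda does, here is a minimal TypeScript sketch. It is not the code from the repo: the names `extractS3Objects`, `parseUsageDetails`, and the injected `readObjectText` reader are assumptions made for this example, with the reader standing in for an S3 GetObject call. The event types cover only the fields the sketch uses.

```typescript
// Minimal shape of the S3 notification fields used here (not the full event).
interface S3EventRecord {
  s3: { bucket: { name: string }; object: { key: string } };
}
interface S3Event {
  Records: S3EventRecord[];
}
interface S3ObjectRef {
  bucket: string;
  key: string;
}

// S3 URL-encodes object keys in notifications and encodes spaces as '+',
// so the key is decoded before it is used to read the object.
function extractS3Objects(event: S3Event): S3ObjectRef[] {
  return event.Records.map((record) => ({
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
  }));
}

// Hypothetical reader: in a real Lambda this would wrap an S3 GetObject call.
type ReadObjectText = (ref: S3ObjectRef) => Promise<string>;

// Reads every object named in the event, prints its contents,
// and returns the number of objects processed.
async function parseUsageDetails(
  event: S3Event,
  readObjectText: ReadObjectText,
): Promise<number> {
  const objectRefs = extractS3Objects(event);
  for (const ref of objectRefs) {
    // In Scenario 1 this is the call that never completes: there is no route to S3.
    const contents = await readObjectText(ref);
    console.log(`s3://${ref.bucket}/${ref.key}\n${contents}`);
  }
  return objectRefs.length;
}
```

Passing the reader in as a parameter keeps the sketch independent of any particular SDK version and makes the parsing logic easy to exercise locally with a fake reader.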

The lambda-vpc-s3access-timesout repo takes you through the source code in which the Lambda times out whenever it tries to access S3.

See the 'How to Deploy & Test' section for deployment and test instructions.

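The following configuration creates a VPC (10.0.0.0/16) and a private subnet (10.0.1.0/24) inside that VPC, along with a route table associated with the subnet. The route table defines no route to an internet gateway or NAT gateway, which is what keeps the subnet private. It also creates a security group.

For the sake of this example, the security group allows all protocols and all IP addresses.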

  
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      Tags:
        - Key: 'purpose'
          Value: 'blogging'
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: eu-west-2a
      CidrBlock: 10.0.1.0/24
      VpcId:
        Ref: Vpc
  LambdaSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow all traffic
      GroupName: ${self:service}-${self:provider.stage}-lambdaSecurityGroup
      VpcId:
        Ref: Vpc
      SecurityGroupIngress:
        - IpProtocol: '-1'
          CidrIp: 0.0.0.0/0
      SecurityGroupEgress:
        - IpProtocol: '-1'
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: 'purpose'
          Value: 'blogging'
  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: Vpc
  SubnetRouteTableAssociationPrivate1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId:
        Ref: PrivateSubnet1
      RouteTableId:
        Ref: PrivateRouteTable1
  

When I manually uploaded test.csv to ‘*usage-details-sg/PrivateCSVFiles/’, the AWS CloudWatch logs showed the Lambda timing out.

See the 'How to Remove a CloudFormation Stack Deployed using SLS Framework' section to remove/un-deploy this service (CloudFormation stack) so that there are no unwanted bills from AWS.

Scenario 2:

Each time a .csv file is added to the 'PrivateCSVFiles' folder of an AWS S3 bucket (*usage-details-sg), an ObjectCreated event is generated. This event triggers a Lambda (parseUsageDetailsPri). The expected responsibility of the Lambda is to read the file, print the contents and then exit.

In this scenario, we’ve created a gateway-type VPCEndpoint for the AWS S3 service. The VPCEndpoint provides a private connection from the VPC to the AWS S3 service on the same AWS account without the need to reach the public internet.

Note: In this solution, no code changes are needed in the Lambda (parseUsageDetailsPri) compared to Scenario 1. The change is the addition of a VPCEndpoint resource through the Serverless Framework.

The lambda-vpc-vpcendpoint-sucess repo takes you through the source code in which the Lambda can successfully access AWS S3.

See the 'How to Deploy & Test' section for deployment and test instructions.

Note: It’s suggested to remove the service created for Scenario 1 before proceeding with Scenario 2.

The following configuration creates a VPC (10.0.0.0/16), a private subnet (10.0.5.0/24) inside that VPC, and a gateway-type VPCEndpoint for the S3 service. It also creates a security group.

For the sake of this example, the security group again allows all protocols and all IP addresses.

The main difference from Scenario 1 is the VPCEndpoint, which lets the VPC reach the S3 service without going through the public internet. Because the endpoint is attached to the private route table, traffic from the subnet to S3 is routed through the endpoint. See the AWS VPC endpoint pricing page for costs.

  
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      Tags:
        - Key: 'purpose'
          Value: 'blogging'
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: eu-west-2a
      CidrBlock: 10.0.5.0/24
      VpcId:
        Ref: Vpc
  LambdaSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow all traffic
      GroupName: ${self:service}-${self:provider.stage}-lambdaSecurityGroup
      VpcId:
        Ref: Vpc
      SecurityGroupIngress:
        - IpProtocol: '-1'
          CidrIp: 0.0.0.0/0
      SecurityGroupEgress:
        - IpProtocol: '-1'
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: 'purpose'
          Value: 'blogging'
  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: Vpc
  VpcEndpointS3:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      RouteTableIds:
        - Ref: PrivateRouteTable1
      ServiceName: com.amazonaws.${self:provider.region}.s3
      VpcId:
        Ref: Vpc
  SubnetRouteTableAssociationPrivate1:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId:
        Ref: PrivateSubnet1
      RouteTableId:
        Ref: PrivateRouteTable1
  

After uploading test.csv, the AWS CloudWatch logs showed the Lambda printing the contents of the file.

See the 'How to Remove a CloudFormation Stack Deployed using SLS Framework' section to remove/un-deploy this service (CloudFormation stack) so that there are no unwanted bills from AWS.

How to Deploy & Test

Clone the Repo

From your command prompt, navigate to a directory where you want to clone this repo.

Then run the following command

  
$ git clone git@github.com:succulentpup/lambda-vpc.git
  

Deploy Shared Resources Service

Change directory

  
$ cd lambda-vpc/s3bucket-usagedetails
  

Run yarn install. This will install all the dependent npm modules & serverless plugins.

  
$ yarn install
  

Deploy to AWS

  
$ yarn deploy:dev --aws-profile <your-aws-profile>
  

Deploy Scenario 1 or Scenario 2

Change directory

  
# Run the following command (from the directory where you cloned the repo) to try Scenario 1
$ cd lambda-vpc/lambda-vpc-s3acess-no-success

# Run the following command (from the directory where you cloned the repo) to try Scenario 2
$ cd lambda-vpc/lambda-vpc-vpcendpoint-s3access-success
  

Run yarn install

  
$ yarn install
  

Deploy to AWS

Running the following command deploys the service. It can be seen in the CloudFormation page of the AWS web console.

Note: Be mindful that usage of AWS services/resources can add cost.

On *nix and macOS, AWS profiles are configured in the '$HOME/.aws/credentials' file. Pass the name of the profile you want to use:

  
$ yarn deploy:dev --aws-profile <your-aws-profile>
  

For both Scenario 1 and Scenario 2, this deployment took me about three and a half minutes; it may be different for you.

Testing

  1. Navigate to the AWS S3 service in the AWS web console.
  2. Search for the *usage-details-sg bucket.
  3. Navigate to that bucket and manually create a ‘PrivateCSVFiles’ directory in it.
  4. Navigate to the ‘PrivateCSVFiles’ directory and manually upload the test.csv file.
  5. Navigate to AWS CloudWatch logs in the AWS web console.
  6. In LogGroups, search for the *parseUsageDetailsPri Lambda logs:
        a. Scenario 1: the Lambda times out after 30 seconds
        b. Scenario 2: the Lambda prints the contents of the file test.csv

How to Remove a CloudFormation Stack Deployed using SLS Framework

1. Make sure you are in the same directory where you executed the ‘yarn deploy:dev’ command for Scenario 1 or Scenario 2, then run the following command.

  
$ yarn remove:dev --aws-profile <your-aws-profile>
  

Note: It takes a good amount of time to undo or remove the deployment as it has networking resources.

2. Make sure to remove the shared resources service as well.

  
# navigate to the shared resources directory from the repo
$ cd s3bucket-usagedetails
$ yarn remove:dev --aws-profile <your-aws-profile>
  

If you’ve read this far, you may also want to know why a Lambda placed inside a public subnet still fails with a timeout error when it tries to access S3.

My initial thought was that it should be able to access the S3 bucket. In fact, even though the Lambda is in a public subnet, it doesn’t have a public IP address and hence can’t access the internet. The following is an excerpt from the AWS documentation [3].

Ensure that instances in your subnet have a globally unique IP address (public IPv4 address, Elastic IP address, or IPv6 address)

Hence, it will still fail with a timeout error.
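The connectivity cases in this article can be condensed into a small TypeScript sketch. It is a deliberately simplified model, not an AWS API: the `LambdaNetwork` type and `canReachS3` function are hypothetical names, and the sketch assumes IPv4 only and ignores interface endpoints, security group rules, and bucket policies.

```typescript
// Hypothetical, simplified description of the routes available to the
// subnet that a VPC-attached Lambda runs in.
interface LambdaNetwork {
  hasS3GatewayEndpointRoute: boolean; // route table is associated with an S3 gateway endpoint
  hasNatGatewayRoute: boolean; // default route points to a NAT gateway
  hasInternetGatewayRoute: boolean; // default route points to an internet gateway (public subnet)
}

// Returns true when the modelled Lambda has a working path to S3.
function canReachS3(network: LambdaNetwork): boolean {
  // Scenario 2: traffic to S3 stays on a private path through the endpoint.
  if (network.hasS3GatewayEndpointRoute) return true;
  // A NAT gateway supplies the public address that the Lambda itself lacks.
  if (network.hasNatGatewayRoute) return true;
  // An internet gateway route alone is not enough: the Lambda has no public
  // IP address, so requests sent that way time out. The flag is therefore
  // intentionally ignored here.
  return false;
}

// Scenario 1: private subnet, no endpoint, no NAT gateway.
console.log(
  canReachS3({
    hasS3GatewayEndpointRoute: false,
    hasNatGatewayRoute: false,
    hasInternetGatewayRoute: false,
  }),
);
```

The point of the sketch is that `hasInternetGatewayRoute` never changes the result, which matches the public subnet behaviour described above.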

If you’re interested in how to configure a NAT gateway for a private subnet through the Serverless Framework, go through the vpc.yml file of the repo lambda-vpc-natgateway-s3access-success.

Note: This service creates an Elastic IP (EIP), which can add significant cost to your bill. Hence, it’s highly recommended to undo the deployment of this service as soon as you’re done with the testing.

Note: When you have multiple services that each own a VPC, make sure the subnet ranges of the different services don’t overlap.
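One way to check subnet ranges before deploying is to compare their CIDR blocks. The following TypeScript sketch is an illustration only: `ipv4ToInt`, `cidrRange`, and `cidrsOverlap` are hypothetical helper names, and it handles IPv4 CIDR blocks only. It uses the subnet ranges from Scenario 1 and Scenario 2 as sample input.

```typescript
// Converts a dotted IPv4 address such as "10.0.1.0" to a 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((part) => /^\d{1,3}$/.test(part) && Number(part) <= 255);
  if (!valid) throw new Error(`Invalid IPv4 address: ${ip}`);
  return parts.reduce((acc, part) => acc * 256 + Number(part), 0);
}

// Returns the first and last address covered by a CIDR block.
function cidrRange(cidr: string): { start: number; end: number } {
  const [ip, prefixText] = cidr.split("/");
  if (
    ip === undefined ||
    prefixText === undefined ||
    !/^\d{1,2}$/.test(prefixText) ||
    Number(prefixText) > 32
  ) {
    throw new Error(`Invalid CIDR block: ${cidr}`);
  }
  const size = 2 ** (32 - Number(prefixText));
  // Round down to the block boundary so host bits in the input are ignored.
  const start = Math.floor(ipv4ToInt(ip) / size) * size;
  return { start, end: start + size - 1 };
}

// Two blocks overlap when their address ranges intersect.
function cidrsOverlap(a: string, b: string): boolean {
  const rangeA = cidrRange(a);
  const rangeB = cidrRange(b);
  return rangeA.start <= rangeB.end && rangeB.start <= rangeA.end;
}

// Subnets from Scenario 1 and Scenario 2 do not overlap.
console.log(cidrsOverlap("10.0.1.0/24", "10.0.5.0/24")); // false
// A subnet always overlaps the VPC range that contains it.
console.log(cidrsOverlap("10.0.0.0/16", "10.0.1.0/24")); // true
```

Plain arithmetic is used instead of bitwise operators because JavaScript bitwise operations work on signed 32-bit integers, which would mishandle addresses above 127.255.255.255.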

Conclusion:

In this article, we’ve tested two scenarios.

Before testing these scenarios, we deployed the shared resources service using the Serverless Framework. This service created the bucket that is consumed by Scenario 1 or Scenario 2.

In Scenario 1, a Lambda inside a private subnet tries to access AWS S3, and the private subnet has no internet access. The serverless.yml file in the source code shows how a private subnet is configured in a VPC; AWS CloudFormation is used for this configuration. We deployed the service and invoked the Lambda by adding the test.csv file to the PrivateCSVFiles folder of the S3 bucket. The Lambda timed out after 30 seconds, as we saw in the AWS CloudWatch logs. Because the private subnet has no internet access and the Lambda is inside that subnet, the Lambda can’t reach AWS S3. After testing this, we removed the service.

In Scenario 2, a Lambda inside a private subnet again tries to access AWS S3, but the VPC hosting that private subnet is configured with a VPCEndpoint. The serverless.yml file in the source code shows how a VPC is configured with a gateway-type VPCEndpoint for the S3 service; AWS CloudFormation is used for this configuration. We deployed the service and invoked the Lambda by adding the test.csv file to the PrivateCSVFiles folder of the S3 bucket. This time, the AWS CloudWatch logs showed that the Lambda was able to read the contents of test.csv. The VPCEndpoint let the Lambda reach AWS S3 through a private connection between the VPC and S3. After testing this, we removed the service.

After testing both scenarios, we removed the shared resources service as well.

Happy learning 🙂

Sources

[1] https://www.serverless.com/framework/docs/getting-started

[2] https://github.com/succulentpup/lambda-vpc/tree/main/s3bucket-usagedetails

[3] https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html