Cloud

Deploy DataSet on AWS Elastic Beanstalk

by Amit Sharma
Published April 13, 2022 | 5 min read

AWS Elastic Beanstalk acts as a managed Platform as a Service to deploy your applications in the AWS cloud. With AWS Elastic Beanstalk, you can quickly deploy and manage applications without worrying about the infrastructure that runs those applications. AWS Elastic Beanstalk reduces management complexity without restricting choice or control. You upload your application, and AWS Elastic Beanstalk automatically handles the details of capacity provisioning, load balancing, scaling, and application health monitoring.

Elastic Beanstalk supports applications developed in Go, Java, .NET, Node.js, PHP, Python, and Ruby, as well as applications packaged as Docker containers. When you deploy your application, Elastic Beanstalk builds the selected supported platform version and provisions one or more AWS resources, such as Amazon EC2 instances, to run your application.

To use Elastic Beanstalk, you create an application, upload an application version in the form of an application source bundle (for example, a Java .war file) to Elastic Beanstalk, and then provide some information about the application. Elastic Beanstalk automatically launches an environment and creates and configures the AWS resources needed to run your code. After your environment is launched, you can manage your environment and deploy new application versions.

Why Traditional Log Management Approaches Don't Work

If you're using Elastic Beanstalk and getting at your logs by SSHing into EC2 instances or by exporting them from the Elastic Beanstalk console, that approach will not hold up. Elastic Beanstalk deployments are distributed and run in dynamic container environments. Even if you run Elastic Beanstalk environments directly on EC2, those compute instances are ephemeral: when an instance is terminated or replaced, the logs stored on it are gone too.

By default, web server, application server, and Elastic Beanstalk logs are stored locally on individual instances. You can also stream them to Amazon CloudWatch Logs. The eb logs command has two distinct purposes: to enable or disable log streaming to CloudWatch Logs, and to retrieve instance logs or CloudWatch Logs logs. With the --cloudwatch-logs (-cw) option, the command enables or disables log streaming. Without this option, it retrieves logs.
When retrieving logs, specify the --all, --zip, or --stream option to retrieve complete logs. If you don't specify any of these options, Elastic Beanstalk retrieves tail logs.
Relevant logs vary by container type. If the root directory contains a platform.yaml file specifying a custom platform, this command also processes logs for the builder environment.
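As a minimal sketch of the two purposes of eb logs described above, the helper below maps a mode name to the matching command line. The mode names and the eb_logs_command function are hypothetical, introduced only for illustration; the options themselves are the ones named in the text.

```python
import shlex

# Hypothetical mode names mapped to the `eb logs` options described above.
EB_LOGS_MODES = {
    "enable-streaming": ["--cloudwatch-logs", "enable"],
    "disable-streaming": ["--cloudwatch-logs", "disable"],
    "tail": [],  # no option: Elastic Beanstalk retrieves tail logs
    "all": ["--all"],
    "zip": ["--zip"],
    "stream": ["--stream"],
}


def eb_logs_command(mode):
    """Return the argument list for one `eb logs` invocation."""
    if mode not in EB_LOGS_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["eb", "logs", *EB_LOGS_MODES[mode]]


if __name__ == "__main__":
    # Print each command instead of running it, so the sketch has no side effects.
    for mode in EB_LOGS_MODES:
        print(f"{mode:18} {shlex.join(eb_logs_command(mode))}")
```

To actually run one of these, you could pass the returned list to subprocess.run from a project directory that has already been set up with the EB CLI.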
If you retrieve logs from EC2 instances, keep in mind that logs are rotated on the instance and that the tail and bundle logs you request are deleted from Amazon S3 after 15 minutes, so you will probably want to enable log rotation to S3. You then face longer retrieval times and have no analytics on the log data. Enter DataSet.

Why DataSet for Elastic Beanstalk Logging?

DataSet is the centralized log data analytics platform that enables teams to quickly get answers from all of their data, across different use cases and from all time periods – streaming or historical. Teams choose DataSet to elastically scale to petabytes of data while delivering real-time performance at a fraction of the cost.

You can use the DataSet Agent to collect data across your Elastic Beanstalk environments: infrastructure, platform, and applications. DataSet enables teams to achieve a combination of:

Peak Performance: Data is available to query within seconds of ingest, and queries return in milliseconds, even at petabyte scale.
Effortless Scalability: DataSet scales elastically to practically unlimited data volumes. There is no need to rebalance nodes, manage storage, or allocate resources, as you would with open source solutions such as the ELK stack.
Lower TCO: Delivered as a cloud-native service, DataSet lowers the total cost of ownership by orders of magnitude.

How to Deploy DataSet within Elastic Beanstalk

You can add AWS Elastic Beanstalk configuration files (.ebextensions) to your web application's source code or to your container images for configuring your environment and customizing the resources. Configuration files are YAML- or JSON-formatted documents with a .config file extension that you place in a folder named .ebextensions and deploy in your application source or container images. In this example, I am using a sample Docker application.

01. Create the .ebextensions directory at the root of your project. The project directory then looks like this:

drwxr-xr-x   8 amits  staff   256 Apr 13 10:56 .
drwxr-xr-x  19 amits  staff   608 Apr 10 21:56 ..
drwxr-xr-x   3 amits  staff    96 Apr 10 21:56 .ebextensions
-rwxr-xr-x@  1 amits  staff   149 Apr  7  2020 Dockerfile
-rwxr-xr-x@  1 amits  staff   106 Mar 18  2019 Dockerrun.aws.json
-rwxr-xr-x@  1 amits  staff  5217 Apr  4  2020 application.py
-rw-r--r--@  1 amits  staff    84 Mar 18  2019 cron.yaml

02. Within the .ebextensions directory, create a configuration file called dataset.config:

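# Agent configuration written to each instance.
# Fill in api_key with your DataSet API key before deploying.
files: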
  "/tmp/scalyr.json":
    mode: "000755"
    owner: root
    group: root
    content: |
      {
        "api_key": "",
        "server_attributes": {
          "environment": "production",
          "tier": "web"
        },
        "logs": [
          {
            "path": "/var/lib/docker/containers/*",
            "attributes": {
              "parser": "json"
            }
          }
        ],
        "monitors": [
        ]
      }
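# Commands run as root, in alphabetical order by name: download the installer,
# install the agent, put the configuration in place, then restart the agent.
commands: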
  01-getScalyrAgent:
    command: "wget -q https://www.scalyr.com/scalyr-repo/stable/latest/install-scalyr-agent-2.sh"
  02-install-scalyr-agent:
    command: "sudo bash ./install-scalyr-agent-2.sh"
  03-agentFile:
    command: "cp /tmp/scalyr.json /etc/scalyr-agent-2/agent.json"
  04-start:
    command: "scalyr-agent-2 restart"

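There are two main components to this configuration file. The first section (files) tells Elastic Beanstalk to create a temporary file at /tmp/scalyr.json with the given owner and permissions. Its content is the agent configuration: your API key, the server attributes, and the locations of the log data. Set api_key to your DataSet API key before you deploy. You can modify this file to use custom parsers and to add other locations to capture log data.
The second section (commands) tells Elastic Beanstalk to install the DataSet Agent (still packaged as the Scalyr Agent, scalyr-agent-2), copy the temporary configuration file to /etc/scalyr-agent-2/agent.json, and restart the agent so that it picks up the configuration.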
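If you plan to add more log locations or parsers, it can help to generate the agent configuration rather than hand-edit the JSON. The following is a minimal sketch, not part of the DataSet tooling: build_agent_config is a hypothetical helper that produces content with the same shape as the files section above, and the nginx path and the accessLog parser name in the usage example are assumptions for illustration.

```python
import json


def build_agent_config(api_key, logs, server_attributes=None):
    """Return agent configuration JSON with the same shape as the example above.

    logs maps a log path (wildcards allowed) to the name of the parser to use.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    config = {
        "api_key": api_key,
        "server_attributes": server_attributes
        or {"environment": "production", "tier": "web"},
        "logs": [
            {"path": path, "attributes": {"parser": parser}}
            for path, parser in logs.items()
        ],
        "monitors": [],
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(
        build_agent_config(
            "YOUR_API_KEY",  # placeholder, replace with your real key
            {
                "/var/lib/docker/containers/*": "json",  # path from the example above
                "/var/log/nginx/access.log": "accessLog",  # assumed extra location and parser
            },
        )
    )
```

The printed JSON can be pasted under the content key of dataset.config, indented to match the YAML block.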

You can create an archive for your project, including the hidden .ebextensions folder, by running the following command from the project directory:
$ zip ../dockerapp.zip -r * .[^.]*
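If zip is not available, or you want to build the bundle from a script, the same archive can be sketched with Python's standard zipfile module. The make_source_bundle function is a hypothetical helper, not part of the EB CLI. It walks the project directory, hidden files and folders included, and stores paths relative to the project root so that .ebextensions sits at the top level of the bundle.

```python
import os
import zipfile


def make_source_bundle(project_dir, bundle_path):
    """Zip every file under project_dir, hidden files and folders included.

    Paths inside the archive are relative to project_dir. Returns the sorted
    list of archived file paths, relative to project_dir.
    """
    bundle_abs = os.path.abspath(bundle_path)
    names = []
    with zipfile.ZipFile(bundle_abs, "w", zipfile.ZIP_DEFLATED) as bundle:
        for root, _dirs, files in os.walk(project_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                if os.path.abspath(full_path) == bundle_abs:
                    continue  # never add the bundle to itself
                arcname = os.path.relpath(full_path, project_dir)
                bundle.write(full_path, arcname)
                names.append(arcname)
    return sorted(names)
```

Calling make_source_bundle(".", "../dockerapp.zip") from the project directory mirrors the zip command above. Note that this sketch archives files only, so empty directories are not included.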

You are now ready to create your Elastic Beanstalk application environment via the console or using the CLI.

That's it! Once the environment is up, DataSet automatically collects logs from every source you configured, across your entire Elastic Beanstalk environment.

Get Centralized Logging for Elastic Beanstalk Applications

DataSet provides real-time visibility into your entire Elastic Beanstalk environment.

Sign up for a free, fully functional trial of DataSet and start getting value from your data immediately.
