Bolt: On-Instance Diagnostics for Amazon EC2 | Netflix TechBlog

http://techblog.netflix.com/2017/04/introducing-bolt-on-instance-diagnostic.html

Introducing Bolt: On-Instance Diagnostics for AWS Components

Each day, thousands of Netflix engineers build, test, and deploy software and services on AWS. To support their work, we've developed a range of tools and services that help them quickly diagnose and resolve issues.

One of our most popular tools is Chaos Monkey, which randomly terminates instances in production to test our systems' resilience. However, Chaos Monkey can sometimes cause problems that are difficult to diagnose, particularly when the instance is running several services.

To address this problem, we've created Bolt, a new tool that provides on-instance diagnostics for AWS components. Bolt can be used to collect a variety of information from an instance, including:

  • System metrics (CPU, memory, disk I/O, etc.)
  • Process information (list of processes, CPU and memory usage, etc.)
  • Network information (list of open ports, network traffic, etc.)
  • AWS component information (list of running AWS services, their configuration, etc.)

Bolt can be used to diagnose a wide range of problems, such as:

  • High CPU or memory usage
  • Slow network performance
  • Failed AWS services
  • Application crashes

Bolt is easy to use. It can be installed on any instance with an internet connection. Once installed, Bolt can be run from the command line or through a web interface.

Bolt is open source and available on GitHub.

How Bolt Works

Bolt is a Python application that uses a variety of techniques to collect data from an instance. These techniques include:

  • The Python psutil library to collect system metrics and process information.
  • The Python netifaces library to collect network information.
  • The AWS Python SDK to collect information about AWS components.
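
As a rough illustration of the first technique, the sketch below gathers a handful of system metrics. It is not Bolt's actual code: the function name `system_metrics` and the returned keys are hypothetical, and the sketch falls back to the standard library when psutil is not installed.

```python
import os
import shutil

try:
    import psutil  # third-party library; optional in this sketch
except ImportError:
    psutil = None


def system_metrics(path=os.sep):
    """Return a small dict of host metrics (hypothetical helper, not Bolt's API)."""
    metrics = {
        "cpu_count": os.cpu_count(),
        "disk_free_bytes": shutil.disk_usage(path).free,
    }
    if psutil is not None:
        # cpu_percent blocks for `interval` seconds while it samples CPU usage.
        metrics["cpu_percent"] = psutil.cpu_percent(interval=0.1)
        metrics["memory_percent"] = psutil.virtual_memory().percent
    return metrics
```

Calling `system_metrics()` returns a plain dictionary, which keeps the result easy to print, log, or serialize as JSON.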

Bolt can be run in two modes:

  • Diagnostic mode: This mode collects a snapshot of information from the instance. The snapshot can be used to diagnose problems that are occurring at the time Bolt is run.
  • Monitoring mode: This mode collects information from the instance over a period of time. The data can be used to track the instance's performance and to identify trends that may indicate potential problems.
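
The difference between the two modes can be sketched as a one-off snapshot versus a loop of repeated snapshots. The function names, the `collect` callback, and the injectable `sleep` parameter below are assumptions made for illustration and are not taken from Bolt itself.

```python
import time


def run_diagnostic(collect):
    """Diagnostic mode: take a single timestamped snapshot right now."""
    return {"timestamp": time.time(), "data": collect()}


def run_monitoring(collect, samples, interval_s, sleep=time.sleep):
    """Monitoring mode: take `samples` snapshots, `interval_s` seconds apart."""
    history = []
    for i in range(samples):
        history.append(run_diagnostic(collect))
        if i < samples - 1:  # no need to wait after the last sample
            sleep(interval_s)
    return history
```

Passing `sleep` as a parameter is a design choice that lets the loop be exercised without real delays; in normal use the default `time.sleep` applies.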

Bolt can be configured to collect different types of information depending on the user's needs. For example, a user may choose to collect only system metrics and process information, or they may choose to collect all of the information that Bolt can provide.
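
One simple way to model this kind of selective collection is a registry of named collectors from which the user picks a subset. The registry layout, the collector names, and the placeholder values below are hypothetical and only meant to show the idea.

```python
def collect(collectors, enabled=None):
    """Run only the named collectors; run all of them when `enabled` is None."""
    names = list(collectors) if enabled is None else list(enabled)
    unknown = [name for name in names if name not in collectors]
    if unknown:
        raise KeyError(f"unknown collectors: {unknown}")
    return {name: collectors[name]() for name in names}


# Placeholder collectors with made-up values; real ones would query the OS or AWS.
COLLECTORS = {
    "system": lambda: {"cpu_percent": 12.5},
    "process": lambda: {"count": 87},
    "network": lambda: {"open_ports": [22, 443]},
}
```

For instance, `collect(COLLECTORS, ["system", "process"])` skips the network collector, while `collect(COLLECTORS)` runs every registered collector.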

Using Bolt

Bolt can be used to diagnose a wide range of problems. Here are a few examples:

High CPU or memory usage: Bolt can be used to identify the process or processes that are using the most CPU or memory. This information can be used to troubleshoot performance problems.
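
Finding the heaviest processes comes down to ranking per-process records by CPU or memory usage. The sketch below assumes records shaped like the hypothetical `sample` list; in practice, similar records could be built from `psutil.process_iter`. The helper name and field names are illustrative, not Bolt's.

```python
def top_processes(records, key, n=3):
    """Return the n records with the highest value for `key` ("cpu" or "mem")."""
    return sorted(records, key=lambda r: r[key], reverse=True)[:n]


# Made-up sample data standing in for a real process listing.
sample = [
    {"pid": 101, "name": "java", "cpu": 82.0, "mem": 41.5},
    {"pid": 202, "name": "nginx", "cpu": 3.1, "mem": 1.2},
    {"pid": 303, "name": "python", "cpu": 12.4, "mem": 55.0},
]
```

With this sample, `top_processes(sample, "cpu", n=1)` returns the `java` record, and ranking by `"mem"` puts the `python` record first.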

Slow network performance: Bolt can be used to identify the source of slow network performance. This information can be used to troubleshoot network problems and improve performance.

Failed AWS services: Bolt can be used to identify the cause of failed AWS services. This information can be used to troubleshoot AWS problems and restore service.

Application crashes: Bolt can be used to identify the cause of application crashes. This information can be used to troubleshoot application problems and improve stability.

Bolt is a powerful tool that can be used to diagnose a wide range of problems. It is easy to use and can be installed on any instance with an internet connection.

Conclusion

Bolt is a valuable tool for diagnosing issues on AWS instances. It is easy to use and provides a wealth of information that can be used to troubleshoot problems and improve performance.

We encourage you to try Bolt and see how it can help you improve the reliability and performance of your AWS applications.

Additional Resources