In this post, I summarize the issues that arise in operating Prometheus at the enterprise level and introduce NexClipper's development roadmap to solve them.
Overview for Monitoring Kubernetes
Kubernetes was the first project to graduate from the Cloud Native Computing Foundation (CNCF) and is now the de facto standard for container orchestration, spanning IaaS and PaaS. It has grown tremendously in just a few years, energized its community, and been adopted by many companies in real production environments.
Kubernetes metrics are broadly divided into system metrics and service metrics, and monitoring approaches are broadly divided into the core metrics pipeline and the monitoring pipeline.
Initially, Heapster was widely used together with Kubernetes as a monitoring solution, as indicated in that document. It started as a tool that transmits monitoring data to an external system, and has since grown into its own monitoring system. However, Heapster was deprecated in Kubernetes version 1.11, and from version 1.13 onward most components of a Kubernetes cluster can be measured with Prometheus in a cloud native way.
Kubernetes monitoring is based on cluster monitoring by default. By monitoring your cluster, you can get information about the overall health and workloads of the system, such as node utilization and the number of running pods. The services and tools most widely used in this ecosystem are the following.
- Prometheus : Open source monitoring system with a built-in time series database
- kube-state-metrics : Service that generates metrics about the various objects and workloads in a Kubernetes cluster
- node_exporter : Service that exposes hardware and OS metrics provided by *NIX kernels
- Pushgateway : Service that receives metrics from ad-hoc and small batch jobs and exposes them to Prometheus
- Alertmanager : Service that handles (deduplicates, groups, and sends) notifications for alerts sent from the Prometheus server
- Grafana : Dashboard visualization tool for monitoring and metric analysis
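To make the role of the exporters in this list more concrete, here is a minimal sketch of the text format that a scrape target returns to Prometheus. The metric name `demo_pods_running` and the helper functions are hypothetical and exist only for illustration. A real exporter would normally use an official client library instead of building the text by hand.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family as exposition text.

    samples is a list of (labels_dict, value) pairs.
    This sketch does not escape help_text, so keep it free of newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical gauge with one sample per namespace.
text = render_metric(
    "demo_pods_running",
    "Number of running pods (hypothetical metric).",
    "gauge",
    [({"namespace": "default"}, 3.0), ({"namespace": "kube-system"}, 12.0)],
)
print(text)
```

Prometheus periodically fetches text like this over HTTP from each target and stores every line as a sample of a labeled time series.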
How to install Prometheus
Installing Prometheus is not difficult. However, operating Prometheus with a thorough understanding of it is very different from simply deploying it by following the Getting Started guide, and the difference shows up later in real operation.
There are several ways to set it up yourself. You can run a precompiled binary, build it from source, or run it with Docker. In addition, it can be installed using configuration management tools such as Ansible and Puppet.
The Prometheus community also provides Helm charts, currently in beta.
You can use them to deploy only Prometheus and Alertmanager, or to deploy various exporters.
The Prometheus Operator uses Custom Resources to simplify the deployment and configuration of Prometheus, Alertmanager, and related components.
Alternatively, everything can be deployed through the kube-prometheus-stack Helm chart, which includes the operator above.
Advantages of using Prometheus
There are several advantages to using Prometheus as a monitoring tool for Kubernetes.
- Ease of management: Prometheus is easy to manage. It runs as a single binary with no separate installation step, needs only a local disk, and has few dependencies on other systems such as a database or cache.
- Service discovery: File-based or DNS-based service discovery can be configured so that scrape targets are registered, for example, through a DNS domain name that is queried periodically. Prometheus can also query the Kubernetes REST API to discover what to scrape, which keeps it in sync with your Kubernetes cluster at all times.
- Powerful and simple data model: All collected monitoring data is stored as metrics in the built-in time series database (TSDB). In addition to the metric name, every sample carries a set of labels that describe its characteristics. Each time series is uniquely identified by its metric name and a set of optional key-value pairs called labels, and it stores a sequence of samples in chronological order. Each sample consists of a float64 value and a millisecond-precision timestamp.
- Query language (PromQL): You can easily query and aggregate monitoring data based on labels and time series. You can apply functions and operators to metric queries, filter and group by label, and use regular expressions for matching and filtering. PromQL is also used for visualization in tools such as Grafana and for alerting and notification rules.
- Monitoring internal status with the pull collection method: Prometheus recommends that services expose their own internal status for monitoring. You can check the metrics of the actual application or solution using various client libraries or exporters.
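The data model and label matching described in the list above can be sketched in a few lines. This is an illustrative toy, not how the Prometheus TSDB is implemented. The names `SeriesKey`, `make_key`, and `TinyTSDB` are hypothetical, and the `select` method only imitates the equality and regular expression matchers of a PromQL selector.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identity of a time series: metric name plus its label set."""
    name: str
    labels: frozenset  # frozenset of (label_name, label_value) pairs


def make_key(name, **labels):
    return SeriesKey(name, frozenset(labels.items()))


class TinyTSDB:
    def __init__(self):
        # Each series maps to a list of (timestamp_ms, float value) samples.
        self._series = {}

    def append(self, key, timestamp_ms, value):
        samples = self._series.setdefault(key, [])
        if samples and timestamp_ms <= samples[-1][0]:
            raise ValueError("samples must be appended in chronological order")
        samples.append((timestamp_ms, float(value)))

    def select(self, name, equal=None, regex=None):
        """Return series of `name` whose labels satisfy all matchers.

        A missing label is treated as an empty string, and regular
        expressions must match the whole label value.
        """
        equal, regex = equal or {}, regex or {}
        result = {}
        for key, samples in self._series.items():
            labels = dict(key.labels)
            if key.name != name:
                continue
            if any(labels.get(k, "") != v for k, v in equal.items()):
                continue
            if any(re.fullmatch(p, labels.get(k, "")) is None for k, p in regex.items()):
                continue
            result[key] = samples
        return result


db = TinyTSDB()
api = make_key("http_requests_total", job="api", instance="10.0.0.1:8080")
web = make_key("http_requests_total", job="web", instance="10.0.0.2:8080")
db.append(api, 1_600_000_000_000, 10)
db.append(api, 1_600_000_015_000, 14)
db.append(web, 1_600_000_000_000, 7)

# Roughly what the selector http_requests_total{job=~"a.*"} expresses.
matched = db.select("http_requests_total", regex={"job": "a.*"})
print(matched)
```

Because the label set is part of the series identity, the two `job` values above produce two separate series even though they share one metric name.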
Cons of Prometheus
- Long-term storage: As a rough estimate, ingesting 100,000 samples per second at 2 to 3 bytes per sample uses about 500 to 800 GB of storage in 30 days. If Prometheus is operated as a local instance or Pod, long-term storage is not easy, and the more monitoring targets there are, the more data is stored.
- Data source redundancy: When Prometheus is installed and operated in multiple clusters, the data sources are not easy to manage, and when a dashboard such as Grafana is configured, it is difficult to tell clusters apart when querying multiple clusters for the same metric. Because of this, when Prometheus is installed per cluster, it is difficult to view the data in an integrated way.
- It lacks group management and user authentication management (security).
- Raw log/event collection is not possible.
- Application-based request tracing is not supported by Prometheus itself.
- Additional data analysis capability is required for anomaly detection.
- Configuration for horizontal scaling and high availability is complicated.
- It takes a lot of labor and effort to deploy and operate (learning PromQL, configuring Grafana dashboards, creating alert rules, etc.).
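The long-term storage estimate in the list above is simple arithmetic: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample. The sketch below assumes 100,000 samples per second and 30 days of retention, and tries both 2 and 3 bytes per sample, because the result is very sensitive to that figure. The function name is hypothetical, and actual usage depends on how well the data compresses.

```python
def estimate_disk_bytes(samples_per_second, bytes_per_sample, retention_days):
    """Rough disk estimate: retention seconds * samples/s * bytes/sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return samples_per_second * bytes_per_sample * retention_seconds


for bytes_per_sample in (2, 3):
    gigabytes = estimate_disk_bytes(100_000, bytes_per_sample, 30) / 1e9
    print(f"{bytes_per_sample} bytes/sample -> {gigabytes:.1f} GB")
# 2 bytes/sample -> 518.4 GB
# 3 bytes/sample -> 777.6 GB
```

Doubling the number of monitoring targets or the retention period doubles the estimate, which is why local disks on a single instance or Pod fill up quickly.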
NexClipper is being developed along the following roadmap to solve the problems of the current Prometheus ecosystem. We are working hard to add new features while maintaining our existing open source projects and solutions.
The following features are currently being developed.
The Prometheus ecosystem can be deployed quickly and easily in any production environment. (https://github.com/NexClipper/provbee)
- NexClipper Cloud allows you to quickly and easily deploy the Prometheus ecosystem through a simple installation command.
- NexClipper On-Premise provides a separate Prometheus server cluster, along with installation and operation environments for the surrounding ecosystem, across multi-cloud environments.
Long-term storage can be configured based on open source software, avoiding vendor dependency.
- Storage is provisioned with Promscale, which is built on TimescaleDB, and the retention period and capacity can be increased whenever necessary.
Similar to a bastion host, the open source Task Manager (https://github.com/NexClipper/klevr) executes asynchronous jobs even in a restricted environment, such as behind a firewall, so NexClipper can be operated in a private cloud environment.
- Separate charts or resources can be deployed directly, and operation is possible in a private environment that is isolated for security reasons.
- Cluster management and operation are possible without direct access to the Kubernetes API.
- Prometheus and Alertmanager configuration can be modified and managed remotely.
NexClipper provides convenient functions related to queries and rules.
- Multiple Prometheus instances can be queried simultaneously through a single endpoint, and multi-cluster Grafana dashboards can be operated with minimal switching between data sources.
- PromLens (https://github.com/promlabs/promlens-public) preview features are included so you can write and test simple queries.
- With exporter management, you can easily install exporters and configure alert rules for operation. (ExporterHub.io, https://github.com/NexClipper/exporterhub.io)
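To illustrate why a single query endpoint helps with multi-cluster dashboards, here is a minimal sketch of merging instant query results from several Prometheus instances while keeping them distinguishable. This is not NexClipper's implementation. The cluster names and the `merge_cluster_results` function are hypothetical, and the input is assumed to follow the shape of a Prometheus HTTP API instant vector result (a `metric` label map and a `value` pair of timestamp and string).

```python
def merge_cluster_results(results_by_cluster):
    """Merge per-cluster results, tagging each series with a cluster label.

    results_by_cluster maps a cluster name to a list of series, each of the
    form {"metric": {label: value, ...}, "value": [timestamp, "value"]}.
    An existing "cluster" label on a series is kept as is.
    """
    merged = []
    for cluster, series_list in results_by_cluster.items():
        for series in series_list:
            labels = dict(series["metric"])  # copy, do not mutate the input
            labels.setdefault("cluster", cluster)
            merged.append({"metric": labels, "value": series["value"]})
    return merged


# The same metric and labels reported by two hypothetical clusters.
results = {
    "cluster-a": [{"metric": {"__name__": "up", "job": "node"}, "value": [1600000000, "1"]}],
    "cluster-b": [{"metric": {"__name__": "up", "job": "node"}, "value": [1600000000, "0"]}],
}

merged = merge_cluster_results(results)
for series in merged:
    print(series["metric"]["cluster"], series["value"][1])
```

Without the extra label, the two series above would be indistinguishable in a dashboard. In a plain Prometheus setup, a similar effect is commonly achieved by giving each instance a distinct external label.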
We are working hard to provide many other features.
In this post, I explained monitoring in the Kubernetes environment, the pros and cons of Prometheus, and NexClipper's roadmap for overcoming the drawbacks of Prometheus.
If you have any questions or would like additional information, please visit the NexClipper booth at KubeCon North America 2020.
We appreciate all your feedback on our technologies and products, including blog content. If you have any questions at any time, please contact us at firstname.lastname@example.org and we will reply as soon as possible.
About NexCloud and Careers
NexCloud is a container-based cloud technology company. We're planning to kick-start our US operations in early 2021 and are looking for talented people to grow with the company.