
Service Catalog

The Service Catalog offers a centralized, data-rich resource for managing and optimizing your system's services. It provides a holistic view of service health, enabling better decision-making and faster issue resolution, ultimately improving the performance and reliability of your entire system.

Overview

The Service Catalog lists every service in your system and displays the health of each one. For each service, the catalog shows the service type, the number of requests received, and the error rate and latency of those requests.

  • Use the search bar to search by service or any other parameter.

  • Select the timeframe for which you want to view your services.

  • Use dimensions to filter your services. A dimension adds a new label to a metric, so you can filter the services shown according to the tags you define.

Prerequisites

Access the Service Catalog

1

In your Coralogix toolbar, navigate to APM. Click on the Service Catalog tab.

2

Select the timeframe for which you want to view information.

3

Select a service to view the service drill-down.

Filter services using dimensions

Creating a dimension involves adding a new label to a metric, allowing you to filter the services shown using the tags you define.

1

Click { } Add Dimension in the upper right-hand corner of the Service Catalog tab.

2

Enter a filter name and select a span tag from the dropdown menu to pair the data source with the dimension.

3

To add more filters, click { } ADD DIMENSION and repeat step 2.

4

Click ADD DIMENSIONS.

5

Once you have created one or more dimensions, the Dimensions toolbar will appear above the Service Catalog.

To filter the services using a specific dimension, choose from the dimensions bar at the top and select the results you wish to see.

Note

When you open a specific service with dimensions selected, the service drill-down remains filtered by the selected dimensions.
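Conceptually, a dimension copies the value of a span tag onto the service metrics as a label, so the catalog can be filtered by that label. The sketch below illustrates the idea with plain Python dictionaries. The span fields (`service`, `tags`, `error`) and both helper functions are hypothetical names chosen for illustration; they are not part of any Coralogix API.

```python
from collections import defaultdict

# Hypothetical span records; the field names are assumptions for illustration.
spans = [
    {"service": "checkout", "tags": {"region": "eu-west-1"}, "error": False},
    {"service": "checkout", "tags": {"region": "us-east-1"}, "error": True},
    {"service": "cart", "tags": {"region": "eu-west-1"}, "error": False},
    {"service": "search", "tags": {}, "error": False},
]


def request_counts(spans, span_tag):
    """Count requests per (service, label value), taking the label from span_tag."""
    counts = defaultdict(int)
    for span in spans:
        value = span["tags"].get(span_tag)
        if value is not None:  # spans without the tag carry no label for this dimension
            counts[(span["service"], value)] += 1
    return dict(counts)


def filter_services(spans, span_tag, selected_value):
    """Return the services with at least one request carrying the selected label value."""
    counts = request_counts(spans, span_tag)
    return sorted({service for (service, value) in counts if value == selected_value})


print(filter_services(spans, "region", "eu-west-1"))  # ['cart', 'checkout']
```

In this sketch, the `search` service never emits the `region` tag, so it drops out of the results whenever that dimension is used as a filter.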

Limitations

  • Dimensions create metrics from spans and are therefore considered part of your quota. Use a maximum of 5 dimensions, each of which can filter up to 10k labels (cardinality).

  • Only team admins have permission to create dimensions.
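Because each dimension counts toward your quota, it can help to estimate a span tag's cardinality before turning it into a dimension. The following is a minimal sketch, assuming you can export a sample of tag values. The constants mirror the limits listed above, and `check_dimensions` is a hypothetical helper, not a Coralogix function.

```python
MAX_DIMENSIONS = 5        # limit stated above
MAX_CARDINALITY = 10_000  # maximum distinct label values per dimension, as stated above


def check_dimensions(tag_values):
    """Return a list of problems for a proposed set of dimensions.

    tag_values maps each candidate span tag to a sample of its observed values.
    """
    problems = []
    if len(tag_values) > MAX_DIMENSIONS:
        problems.append(f"{len(tag_values)} dimensions requested, maximum is {MAX_DIMENSIONS}")
    for tag, values in tag_values.items():
        cardinality = len(set(values))
        if cardinality > MAX_CARDINALITY:
            problems.append(f"{tag}: {cardinality} distinct values, maximum is {MAX_CARDINALITY}")
    return problems


proposed = {
    "region": ["eu-west-1", "us-east-1", "eu-west-1"],
    "user_id": [f"user-{i}" for i in range(12_000)],  # high-cardinality tag
}
print(check_dimensions(proposed))
```

A sample only gives a lower bound on the true cardinality, so a tag that passes this check may still exceed the limit in production.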

Service drill-down

The Service drill-down displays more detailed information about the specific service selected.

The drill-down identifies the service you are viewing and repeats all the details shown on the main Service Catalog page. It also includes visualizations and additional information that change depending on the tab you are viewing.

The service drill-down includes the following tabs:

  • Overview

  • Flows

  • Operations

  • SLO

  • Resources

  • Logs

  • Map

Overview

The Overview tab gives a summary of the service.

Overview widgets

The widgets in this tab give you a broad overview of the service for the timeframe selected in the top bar.

The Overview widgets include:

  • SLO. An overview of the current SLOs, how many are okay, how many are breached, and how many are not available.

  • Average Latency. Shows the average latency for the current service.

  • Throughput. Shows the throughput for the current service.

  • Error Percentage. Displays the percentage of errors in relation to the total number of requests.

  • Requests and Errors. Shows a graph with the number of requests and errors for the service.

  • Error Percentage for Top 5 Requests. Displays the percentage of errors in the top five incoming service requests.

  • Apdex Score. Displays the Apdex (Application Performance Index) score over the selected timeframe. The Apdex score is a standardized metric used to measure and quantify user satisfaction with the response time of software applications. For more information about Apdex, including defining the threshold, view our Apdex Score tutorial.

  • Highest Consumption. Shows the five operations with the highest consumption.

  • Latency. Shows a graph with the service’s P99, P75, P50, and Average latency.

Note

Latency percentiles are calculated using the histogram_quantile() function, which is commonly used in systems such as Prometheus to compute quantiles (for example, the 95th percentile) from histogram data. In Coralogix APM, with Event2Metrics, a predefined set of buckets (all in microseconds) is used for this calculation. These buckets include: 1; 2.5; 5; 7.5; 10; 25; 50; 75; 100; 250; 500; 750; 1,000; 2,500; 5,000; 7,500; 10,000; 25,000; 50,000; 75,000; 100,000; 250,000; 500,000; 750,000; 1,000,000; 2,500,000; 5,000,000; 7,500,000; and 10,000,000.

  • Map. Shows a mini version of the service map.
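The latency percentile calculation described in the note above can be sketched as follows. This is a simplified Python rendering of the linear-interpolation approach that Prometheus documents for histogram_quantile() on classic histograms, not the Coralogix implementation. The sample bucket counts are invented, and only three of the predefined bucket boundaries are used to keep the example short.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, ending with the (math.inf, total_count) overflow bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The quantile falls in the overflow bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, prev_count = (0.0, 0) if index == 0 else buckets[index - 1]
    if count == prev_count:
        return lower  # only reachable for q == 0 with an empty first bucket
    # Assume observations are spread evenly inside the bucket and interpolate.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Invented sample: cumulative request counts per bucket boundary (microseconds).
sample = [(1000, 50), (2500, 90), (5000, 100), (math.inf, 100)]
print(histogram_quantile(0.75, sample))  # 1937.5
```

Because values are interpolated within a bucket, the reported percentile is an estimate whose precision depends on how wide the surrounding bucket is.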

Manage widgets

To manage a widget, click its ellipsis (...) menu. Click View Query to see the queries underlying the widget.

Group metrics per service version

You can monitor your service health by displaying metrics for each version of your service. Use this data to track changes resulting from version updates or from multiple service versions running in parallel. Visualize the changes in the Coralogix UI, and continue with further investigations, such as displaying related traces. For details, see the Group by Service Version documentation.

Flows

The Service Flows tab allows you to rapidly investigate the impact radius of different services in your system over time.

Use it to:

  • Investigate the performance of each service flow by breaking it down into its constituent operations.

  • Gain a granular understanding of how each sub-flow, a collection of related operations, affects the performance of the entire service flow over time.

  • Rapidly identify and troubleshoot the sub-flows causing performance issues over time.

Find out more here.

Operations

The Operations tab presents incoming, outgoing, and internal requests for your service through various spans. Select which request type you would like to view in the dropdown menu in the upper right-hand corner.

At the top of the page, three charts are shown displaying the service operations for each of the following:

  • Time Consumption

  • Throughput

  • Error Rate

Incoming requests

View the requests that the service receives, in the form of server and consumer spans.

For each operation, view the operation type, method, time consumed, percentage of errors caused by the operation, and the operation's share of the total number of operations. These are all shown for the timeframe and dimensions selected.

View a deeper drill-down of each operation by clicking on an operation row or a series.

The deep drill-down shows the time when the operation occurred, the operation type, the service that the operation belongs to, the duration of the operation, and how many errors it generated. It also shows the Throughput, Error Rate, and Latency graphs for that specific operation.
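The per-operation columns described above can be read as simple aggregations over the spans in the selected timeframe. The following rough sketch assumes each span carries an operation name, a duration, and an error flag. The field names are hypothetical, and the error percentage is interpreted here as the share of an operation's own requests that failed, which is an assumption rather than the documented Coralogix formula.

```python
from collections import defaultdict

# Hypothetical server/consumer spans for one service; durations are in milliseconds.
spans = [
    {"operation": "GET /cart", "duration_ms": 120, "error": False},
    {"operation": "GET /cart", "duration_ms": 80, "error": True},
    {"operation": "POST /order", "duration_ms": 300, "error": False},
    {"operation": "GET /cart", "duration_ms": 100, "error": False},
]


def summarize_operations(spans):
    """Aggregate time consumed, error percentage, and share of requests per operation."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["operation"]].append(span)
    total = len(spans)
    summary = {}
    for operation, group in grouped.items():
        errors = sum(1 for span in group if span["error"])
        summary[operation] = {
            "time_consumed_ms": sum(span["duration_ms"] for span in group),
            "error_pct": 100 * errors / len(group),
            "share_of_requests_pct": 100 * len(group) / total,
        }
    return summary


for operation, stats in summarize_operations(spans).items():
    print(operation, stats)
```

In this sample, both operations consume the same total time even though `GET /cart` accounts for three quarters of the requests, which is the kind of contrast the Time Consumption and Throughput charts are meant to surface.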

Outgoing requests

View operations that the service requested from other services, in the form of client and producer spans.

For each operation, you can view the operation type, method, P95 latency, percentage of total requests, percentage of errors caused by the operation, and the time consumed. These are all shown for the timeframe selected in the top bar.

You can see a deeper drill-down of each operation by clicking on an operation row.

Internal requests

View operations internal to the service with internal spans.

For each internal operation, you can view the operation type, method, P95 latency, percentage of total requests, percentage of errors caused by the operation, and the time consumed by the operation. These are all shown for the timeframe selected in the top bar.

You can see a deeper drill-down of each operation by clicking on an operation row.

SLO

A Service Level Objective (SLO) is a measurable target that defines the acceptable performance or reliability of a service, ensuring it meets user expectations. By tracking key metrics such as error rates and latency against predefined thresholds, SLOs enable teams to maintain high service quality while optimizing resource usage.

The SLOs view in Coralogix UI offers a comprehensive overview of each SLO's status, target, and remaining error budget, displayed through visual indicators and detailed metrics. This enables teams to proactively address potential issues, prioritize engineering efforts, and ensure alignment between reliability and business objectives.

Whether for incident management, capacity planning, or enhancing user experience, SLOs are a crucial tool for optimizing service health and reliability in Coralogix APM. For details, see Service SLOs.
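To make the idea of a remaining error budget concrete, here is a minimal sketch using the common request-based definition: the budget is the share of requests allowed to fail under the target, and the remaining budget is the part of that allowance not yet spent. The function name and parameters are hypothetical, and Coralogix may compute its figures differently.

```python
def remaining_error_budget(target, good_events, total_events):
    """Return the fraction of the error budget still unspent (negative if breached).

    target is the SLO target as a fraction, for example 0.99 for 99%.
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_bad = (1 - target) * total_events
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


# 99% target, 10,000 requests, 40 failures: 100 failures allowed, 40 spent.
print(round(remaining_error_budget(0.99, 9_960, 10_000), 2))  # 0.6
```

A negative result means the service has failed more requests than the target allows, which corresponds to a breached SLO.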

Resources

The Resources tab presents resources used by the service.

The resources in this tab present CPU utilization, memory used (bytes), and network usage (bytes) for the timeframe selected in the top bar.

Logs

The Logs tab presents all related logs for the selected service.

On the right-hand side of the Logs tab, click OPEN LOG QUERY to open a new tab showing the logs in your Coralogix Explore screen.

Set up Correlation Mapping to allow your system to identify the fields in a log that are related to the service. The feature does this by mapping a single key to one or more replacement keys in the service’s logs.

1

Click Setup Correlation on the right-hand side of the Logs tab.

2

Select the replacement logs key from the dropdown menu.
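The effect of a correlation mapping can be pictured as a lookup: a log is treated as related to the service if any of the replacement keys holds the service's identifying value. The sketch below is illustrative only. The key names (`service.name`, `subsystem`, `kubernetes.labels.app`) and the `related_logs` helper are assumptions, and dotted names are treated as plain flat keys.

```python
# Hypothetical mapping: the key identifying the service on spans, and the log keys
# that may hold the same value. Key names are illustrative assumptions.
correlation = {"service.name": ["subsystem", "kubernetes.labels.app"]}

logs = [
    {"subsystem": "checkout", "message": "payment accepted"},
    {"kubernetes.labels.app": "checkout", "message": "pod restarted"},
    {"subsystem": "cart", "message": "item added"},
]


def related_logs(logs, service_name, replacement_keys):
    """Return the logs in which any replacement key holds the service name."""
    return [
        log for log in logs
        if any(log.get(key) == service_name for key in replacement_keys)
    ]


matches = related_logs(logs, "checkout", correlation["service.name"])
print([log["message"] for log in matches])  # ['payment accepted', 'pod restarted']
```

Mapping one key to several replacement keys is what lets logs from differently structured sources show up under the same service.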


Map

The Map tab displays the service map centered on a selected service.

Services that send requests to the selected service are shown on the left.

Services that receive requests from the selected service are displayed on the right.

The latency of each connection is shown on the line between the services, and the line's thickness changes with the latency. The thicker the line, the greater the latency.

Where multiple services have an error rate greater than 0%, the service with the highest error rate is encircled in red.

Hovering over a service shows a tooltip with the service's throughput, error rate, and average duration in relation to the central service.

Clicking on a service brings up a context menu with the option to view the Service Overview, its errors, traces, or related logs.
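The layout rules above can be summarized in a short sketch: callers of the central service go on the left, the services it calls go on the right, and the service with the highest non-zero error rate is highlighted. The edge list, error rates, and function names below are hypothetical and serve only to illustrate the rules.

```python
# Hypothetical call edges (caller, callee) and per-service error rates in percent.
edges = [("gateway", "checkout"), ("checkout", "payments"), ("checkout", "inventory")]
error_rates = {"gateway": 0.0, "checkout": 1.5, "payments": 4.0, "inventory": 0.0}


def map_layout(edges, center):
    """Split the neighbours of center into callers (left) and callees (right)."""
    left = sorted(caller for caller, callee in edges if callee == center)
    right = sorted(callee for caller, callee in edges if caller == center)
    return left, right


def highlighted_service(error_rates):
    """Return the service with the highest non-zero error rate, or None."""
    failing = {name: rate for name, rate in error_rates.items() if rate > 0}
    return max(failing, key=failing.get) if failing else None


print(map_layout(edges, "checkout"))     # (['gateway'], ['inventory', 'payments'])
print(highlighted_service(error_rates))  # payments
```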

Dynamic view

The Map view dynamically changes depending on the size of your screen. On larger screens, you can see the throughput, error rate, and SLO status in a box with the service name. All of the information presented relates to the central service, except for SLO status, which is provided for each service.

On smaller screens, the service name is shown to the side of a circular icon, and the rest of the information moves into a tooltip.

Service retention

Coralogix presents services in the Service Catalog based on their metrics. If no metrics are received for a service for 30 days, the service is removed from the Catalog. This practice keeps the Catalog current and uncluttered, simplifying navigation and minimizing unnecessary resource usage. To adjust the 30-day retention period, use our Service Retention gRPC API. If metrics are received for a service after its removal, the service reappears in the Service Catalog and its historical metrics become available again.
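The retention rule can be expressed as a simple time comparison. The sketch below assumes a service stays listed while its most recent metric is no older than the retention period. The `is_listed` helper is hypothetical, the exact boundary behavior at 30 days is an assumption, and the sketch does not call the Service Retention gRPC API.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(days=30)  # default period stated above


def is_listed(last_metric_at, now, retention=DEFAULT_RETENTION):
    """Return True while the service should still appear in the catalog."""
    return now - last_metric_at <= retention


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(is_listed(datetime(2024, 6, 10, tzinfo=timezone.utc), now))  # True
print(is_listed(datetime(2024, 5, 1, tzinfo=timezone.utc), now))   # False
```

Passing a different `retention` value models the effect of changing the retention period.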

Additional resources

Documentation

  • Application Performance Monitoring (APM)

  • Apdex Score

  • Service Retention gRPC API

Support

Need help?

Our world-class customer success team is available 24/7 to walk you through your setup and answer any questions that may come up.

Feel free to reach out to us via our in-app chat or by sending us an email at [email protected].