API Error Tracking

Overview

Consistently monitoring the errors collected by Coralogix is essential for maintaining your system's health. When there are many individual error events, it becomes hard to prioritize errors for troubleshooting. API Error Tracking simplifies debugging of backend services by assembling thousands of similar API errors into a single group, enabling you to:

  • Follow, grade, and resolve fatal errors.

  • Categorize similar errors into error groups. For example, organize all errors with HTTP status 502 Bad Gateway into one group or collect all errors with gRPC status 5 - NOT_FOUND into another. These groupings help you identify and prioritize API errors that are most impactful, reduce noise, and minimize service downtimes.

  • Track issues over time to determine when they started, whether they are ongoing, and how frequently they occur.
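As a rough illustration of the grouping idea, the sketch below counts error events per protocol and status code. The event fields (`protocol`, `status`, `operation`) and the sample data are assumptions made for this example; they are not the Coralogix data model or API.

```python
from collections import Counter

# Hypothetical error events; the field names are illustrative only.
events = [
    {"protocol": "http", "status": 502, "operation": "GET /cart"},
    {"protocol": "http", "status": 502, "operation": "GET /checkout"},
    {"protocol": "grpc", "status": 5, "operation": "CartService/GetCart"},
    {"protocol": "http", "status": 502, "operation": "GET /cart"},
]


def group_errors(events):
    """Count error events per (protocol, status code) pair."""
    return Counter((e["protocol"], e["status"]) for e in events)


groups = group_errors(events)

# Largest groups first, so the noisiest errors surface at the top.
for (protocol, status), count in groups.most_common():
    print(f"{protocol} {status}: {count}")
```

Collapsing many events into a handful of keyed counts is what makes it practical to compare groups rather than scan individual errors.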

Availability

To enable API error tracking using span metrics, follow the instructions here.

Data sources

Service error data is extracted from spans during the interval selected in the time picker, according to HTTP or gRPC status codes.
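The sketch below shows one way a span could be classified as an error from its status code. The thresholds are assumptions for illustration (HTTP codes of 400 and above, and any non-zero gRPC code, since gRPC code 0 is OK); the product's exact rules are not stated here, and the span fields `http_status` and `grpc_status` are hypothetical.

```python
def is_error_span(span):
    """Return True if the span's status code indicates an error.

    Assumed rules (not taken from the product): HTTP codes >= 400 and
    any non-zero gRPC code (0 is OK) count as errors.
    """
    http_status = span.get("http_status")
    if http_status is not None:
        return http_status >= 400
    grpc_status = span.get("grpc_status")
    if grpc_status is not None:
        return grpc_status != 0
    return False  # no status code recorded, so nothing to classify


# Hypothetical spans within the selected time interval.
spans = [
    {"http_status": 200},
    {"http_status": 502},
    {"grpc_status": 0},
    {"grpc_status": 5},
]
error_spans = [s for s in spans if is_error_span(s)]
```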

Track your service errors

  1. Navigate to APM > Service Catalog. Click on a service of interest.
  2. On the service page, go to the API Errors tab.
  3. View aggregated information related to service errors:
    • Number of error groups
    • Total number of API errors
    • Percentage of API errors in relation to the total number of service requests
  4. Click on the Errors chart to display a modal with a detailed view of the error occurrences. Use it for a better understanding of error dynamics in the service and pinpointing error spikes. The chart presents errors over time (count and percentage) for the top error groups that affect most of your service operations.
  5. Scroll down to the detailed Error Groups summary to study the following:

    • Error messages and related operations for each group
    • The first and last appearances of this error within the selected time range
    • Total number of occurrences and error percentage

    Use this information to cut down on noise and improve the visibility of the error data. Easily locate a specific error group using the Search field above the Error Groups tables.

  6. Click on an error group to display a modal with a detailed view of all error occurrences.
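To make the aggregates in step 3 and the per-group details in step 5 concrete, here is a minimal sketch that derives them from a list of requests. The input shape (a timestamp paired with an error group label, or `None` for a successful request) and the sample values are assumptions for this example, not how the tab computes its figures.

```python
from datetime import datetime

# Hypothetical input: (timestamp, error group or None for a successful request).
requests = [
    (datetime(2024, 1, 1, 9, 0), None),
    (datetime(2024, 1, 1, 9, 5), "HTTP 502"),
    (datetime(2024, 1, 1, 9, 7), "HTTP 502"),
    (datetime(2024, 1, 1, 9, 8), None),
    (datetime(2024, 1, 1, 9, 30), "gRPC 5"),
]


def summarize(requests):
    """Compute summary aggregates from (timestamp, group) pairs."""
    groups = {}
    for ts, group in requests:
        if group is None:
            continue  # successful request, only counted in the total
        g = groups.setdefault(
            group, {"count": 0, "first_seen": ts, "last_seen": ts}
        )
        g["count"] += 1
        g["first_seen"] = min(g["first_seen"], ts)
        g["last_seen"] = max(g["last_seen"], ts)

    total_errors = sum(g["count"] for g in groups.values())
    # Share of errors among all requests, guarding against empty input.
    error_pct = 100.0 * total_errors / len(requests) if requests else 0.0
    return {
        "group_count": len(groups),
        "total_errors": total_errors,
        "error_pct": error_pct,
        "groups": groups,
    }


summary = summarize(requests)
```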

Typical use cases

The API Errors tab focuses on profiling errors to uncover their specific impact on service operations. Its time-based analysis lets you slice the data by various dimensions, showing how frequently each error group occurs and which parts of the service it affects. The use cases below focus on understanding error patterns, prioritizing their resolution, and minimizing their operational impact.

Isolating errors by component

Identify the specific endpoint or outgoing call causing particular error types, enabling you to take precise, targeted actions to resolve the issue efficiently.

Scenario

You notice gRPC status 14 - UNAVAILABLE errors associated with a specific service endpoint, hipstershop.CartService/GetCart.

Solution

Debugging reveals configuration issues within the region, which are quickly identified and resolved.
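Isolating by component amounts to breaking one error type down by the operation that produced it. The sketch below assumes hypothetical `(operation, status)` pairs with made-up operation names; it is an illustration of the idea, not a Coralogix query.

```python
from collections import Counter

# Hypothetical error events: (operation, status label).
errors = [
    ("cart/GetCart", "UNAVAILABLE"),
    ("cart/GetCart", "UNAVAILABLE"),
    ("cart/AddItem", "NOT_FOUND"),
    ("checkout/PlaceOrder", "UNAVAILABLE"),
]


def operations_for_status(errors, status):
    """Count how often each operation produced the given status."""
    return Counter(op for op, st in errors if st == status)


by_operation = operations_for_status(errors, "UNAVAILABLE")
# The operation with the most occurrences is the first place to look.
worst_operation, worst_count = by_operation.most_common(1)[0]
```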

Profiling and grouping errors

Understand error categories, such as HTTP 500 or gRPC UNIMPLEMENTED, to pinpoint specific issues and their operational impact.

Scenario

Frequent HTTP 500 Internal Server Errors are observed in the checkout API.

Solution

Profiling the error reveals the issue stems from a misconfigured database operation. This insight enables swift optimization and resolution of the issue.

Time-based analysis

Use error trends to identify recurring patterns, such as spikes triggered by deployments or traffic surges.

Scenario

Spikes in HTTP 503 Service Unavailable errors are detected every morning between 9 and 10 a.m.

Solution

Through time analysis, you can correlate the issue with an automated backup process that is overloading the API. This insight helps pinpoint the root cause, enabling you to address the overload effectively.
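A recurring daily pattern like this one becomes visible when errors from several days are pooled by hour of day. The following sketch assumes a plain list of hypothetical error timestamps; the data is invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps of HTTP 503 errors across two days.
error_times = [
    datetime(2024, 1, 1, 9, 12),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 15, 3),
    datetime(2024, 1, 2, 9, 5),
    datetime(2024, 1, 2, 9, 55),
]


def errors_by_hour_of_day(timestamps):
    """Count errors per hour of day (0-23), pooled across all days."""
    return Counter(ts.hour for ts in timestamps)


hourly = errors_by_hour_of_day(error_times)
# The busiest hour is a candidate window to compare against scheduled jobs.
peak_hour, peak_count = hourly.most_common(1)[0]
```

An hour that dominates across multiple days points toward a scheduled cause, such as a recurring job, rather than a one-off incident.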

Prioritizing error resolution by impact and urgency

Prioritize resolving errors that have the greatest operational impact to minimize service disruptions and maintain optimal performance.

Scenario

An authentication service is reporting frequent HTTP 401 Unauthorized errors, while less frequent HTTP 500 Internal Server Errors are having a notable impact on user logins.

Solution

Prioritizing the resolution of HTTP 500 errors is crucial, as fixing them first restores critical functionality and ensures uninterrupted user logins.
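One simple way to reason about this trade-off is to weight each group's occurrence count by how severely a single occurrence affects users. The sketch below is purely illustrative: the `impact_weight` values and counts are hypothetical, team-assigned numbers, not something the product computes.

```python
# Hypothetical error groups. "impact_weight" is a made-up, team-assigned
# number expressing how badly one occurrence hurts users.
groups = [
    {"name": "HTTP 401 Unauthorized", "count": 120, "impact_weight": 1},
    {"name": "HTTP 500 Internal Server Error", "count": 30, "impact_weight": 10},
]


def prioritize(groups):
    """Sort groups by count * impact_weight, highest score first."""
    return sorted(
        groups,
        key=lambda g: g["count"] * g["impact_weight"],
        reverse=True,
    )


ranked = prioritize(groups)
```

With these sample numbers the HTTP 500 group ranks first (30 × 10 = 300) even though HTTP 401 occurs four times as often (120 × 1 = 120), which mirrors the scenario above.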

Additional resources

Documentation: Application Performance Monitoring: Components, Metrics, and Practices
Tutorial: Introduction to APM
