Continuously Manage Your CircleCI Implementation with Coralogix

For many companies, success depends on efficient build, test and delivery processes resulting in higher quality CI/CD solutions. However, development and deployment environments can become complex very quickly, even for small and medium companies.

A contributing factor to this complexity is the high adoption rate of microservices. This is where modern CI/CD solutions like CircleCI come in to provide greater visibility. In this post, we’ll walk through the Coralogix-CircleCI integration and how it provides data and alerts to allow CircleCI users to get even more value from their investment.

See this post for how to set up the integration.

Once completed, the integration ships CircleCI logs to Coralogix. Coralogix then applies our product capabilities and ML algorithms to give users deep insight and proactive management capabilities, allowing them to continuously adjust, optimize, and improve their CI processes. To borrow from CI/CD vocabulary, it becomes a CM[CI/CD], or Continuously Managed CI/CD, process.

In addition, using the CircleCI Orb will automatically tag each new version you deploy, allowing you to enjoy our ML Auto Version Benchmarks.

The rest of this document provides examples of visualizations and alerts that will help implement this continuous management of your CircleCI implementation.

Visualizations

Top committers

This table shows the top committers per day. The time frame can be adjusted based on specific customer profiles. It uses the ‘user.name’ field to identify the user.

Jobs with most failures

This visualization uses the ‘status’ and ‘workflows.job_name’ fields. It shows the jobs that had the most failures per time frame specified by the customer profile.
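The aggregation behind this visualization can be sketched in a few lines of Python. This is only an illustration: the sample records are hypothetical, and the nested layout of the log is an assumption based on the dotted field names ‘status’ and ‘workflows.job_name’.

```python
from collections import Counter

# Hypothetical sample records; only the field names come from the text above.
sample_logs = [
    {"status": "failed", "workflows": {"job_name": "build"}},
    {"status": "success", "workflows": {"job_name": "build"}},
    {"status": "failed", "workflows": {"job_name": "smtt"}},
    {"status": "failed", "workflows": {"job_name": "smtt"}},
]


def failures_per_job(logs):
    """Count failed runs per job name, most failures first."""
    counts = Counter(
        log["workflows"]["job_name"]
        for log in logs
        if log.get("status") == "failed"
    )
    return counts.most_common()


top_failures = failures_per_job(sample_logs)
```

Restricting `logs` to a given time frame before counting would mirror the per-time-frame view described above.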

Class distribution

Class usage relates to memory and resource allocation. This visualization uses the field ‘picard.resource_class.class’, part of the log’s picard object. Its optional values are ‘small’, ‘medium’, and ‘large’. The second field used is ‘workflows.workflow_name’, which holds the workflow name. Although classes are set by configuration and do not change dynamically, they are tightly related to your credit charges. It is worth monitoring this to spot when a developer unexpectedly runs a new build that will impact your quota.

For the same reasons mentioned above, it could be beneficial to see a view of the class distribution per executors like the following example:

Average job runtime

The following visualizations help identify trends in job execution time, per build environment, and per executor. They give different perspectives on the average job runtime.

The first one gives an executor perspective for each job environment. It uses the ‘picard.build_agent.executor’ and the ‘build_time_millis’ to calculate the average per environment and per day (in this example day is the aggregation unit).

Depending on your needs, you can change the time frame for calculating the averages. It is important to note that the visualization should calculate the average time of successful job runs based on the filter ‘status:success’. 
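As a rough sketch of the calculation described here, the following Python averages ‘build_time_millis’ per executor over successful runs only. The sample records, the executor names, and the nested log layout are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; field names follow the text, values are made up.
sample_logs = [
    {"status": "success", "build_time_millis": 12000,
     "picard": {"build_agent": {"executor": "docker"}}},
    {"status": "success", "build_time_millis": 18000,
     "picard": {"build_agent": {"executor": "docker"}}},
    {"status": "failed", "build_time_millis": 90000,
     "picard": {"build_agent": {"executor": "docker"}}},
    {"status": "success", "build_time_millis": 40000,
     "picard": {"build_agent": {"executor": "machine"}}},
]


def average_runtime_by_executor(logs):
    """Average build_time_millis per executor, counting successful runs only."""
    totals = defaultdict(lambda: [0, 0])  # executor -> [sum of millis, run count]
    for log in logs:
        if log.get("status") != "success":
            continue  # same effect as the 'status:success' filter
        executor = log["picard"]["build_agent"]["executor"]
        totals[executor][0] += log["build_time_millis"]
        totals[executor][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


averages = average_runtime_by_executor(sample_logs)
```

Note how the failed 90-second run is excluded, so it does not distort the docker average.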

In a very similar way, this visualization shows the average job runtime per workflow, using the ‘workflows.workflow_name’ field:

This visualization shows the average job runtime for each job:

Depending on preference these visualizations can be configured to show a line graph. This is applicable to companies with higher frequency runs:

Job runs distribution per workflow

This visualization gives information about the number of runs for each workflow’s job. It can alert the analyst or engineer to situations where a job has more than the usual number of runs due to failures or frequent builds:

Number of workflows

These two are quite simple. Again, it is the user specific circumstances that will determine the time range for data aggregation.

Job status ratio

This visualization shows the distribution of job completion statuses. There are four optional values: ‘success’, ‘onhold’, ‘failed’, and ‘cancelled’. The ‘onhold’ and ‘cancelled’ values are very rare. It is important to get visibility into the ratio as an indicator of when things actually do go wrong.

Alerts

In this section we will show how you can make managing your deployment environment more proactive by using some of Coralogix’s innovative capabilities, such as its ability to identify changes from normal application behavior and its easy-to-use alert engine. Here are a few examples of alerts covering different aspects of monitoring CircleCI insights.

This tutorial explains how to use and define Coralogix alerts.

Job failure

This is an operational alert. It sends a notification when a job named ‘smtt’ fails.

Alert query:

workflows.job_name:smtt AND status:failed

If we have more than one job with the same name, running in different workflows, we can add the workflow name to the query, ‘AND workflows.workflow_name:CS’. The alert query can also be changed to capture a set of jobs using the query language’s wildcard or regular expression syntax. For example, ‘workflows.job_name:sm*’ or ‘workflows.job_name:/sm[a-z]+/’.

The alert condition is an immediate alert.
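To see how the wildcard pattern ‘sm*’ and the regular expression ‘sm[a-z]+’ differ in what they select, here is a small Python sketch. The job names are hypothetical, and it assumes the regular expression must match the whole field value, which is how Lucene-style regex queries usually behave.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical job names used only to illustrate the two pattern styles.
job_names = ["smtt", "sm2", "build", "smoke"]

# Wildcard style: 'sm' followed by anything.
wildcard_matches = [name for name in job_names if fnmatchcase(name, "sm*")]

# Regex style: 'sm' followed by one or more lowercase letters, whole value.
regex_matches = [name for name in job_names if re.fullmatch(r"sm[a-z]+", name)]
```

The wildcard picks up ‘sm2’ as well, while the regular expression does not, because a digit is not in the `[a-z]` class.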

Job duration

This is another example of an operational alert that sends a notification if a job runtime is larger than a threshold. We use ‘build_time_millis.numeric’. This is a numeric version Coralogix creates for every field. Since every job is different, the query for this alert can come with a job or workflow name. It can also look for outlier values like in this example:

Alert query:

build_time_millis.numeric:>20000

Alert condition:

Ratio of ‘failure’ compared to all job runs is above the threshold

In this operational alert, a user will get a notification when the ratio of failed job runs to the overall number of runs is over a certain threshold. For this purpose, we’ll use the Coralogix ‘Ratio alerts’ type. For this alert type, users define two queries and then alert on the ratio between the number of logs in both queries’ results. Our query example counts the overall number of jobs and the number of failures:

Alert query 1:

status:*

Alert query 2:

status:failed

This condition alerts the user if failures are more than 25% of job outcomes:
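The arithmetic behind this condition can be sketched as follows. This is not how Coralogix evaluates ratio alerts internally; it only mirrors the two queries and the 25% threshold on hypothetical sample data.

```python
def failure_ratio(logs):
    """Failed runs divided by all runs that carry a status field."""
    total = sum(1 for log in logs if "status" in log)                   # query 1: status:*
    failed = sum(1 for log in logs if log.get("status") == "failed")    # query 2: status:failed
    return failed / total if total else 0.0


def should_alert(logs, threshold=0.25):
    """True when failures exceed the given share of all job outcomes."""
    return failure_ratio(logs) > threshold


# Hypothetical sample: 2 failures out of 5 runs.
sample_logs = [
    {"status": "failed"},
    {"status": "success"},
    {"status": "success"},
    {"status": "failed"},
    {"status": "success"},
]

alert = should_alert(sample_logs)
```

With two failures in five runs the ratio is 40%, which is above the 25% threshold, so the alert would fire.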

SSH Disabled

This is a security alert. The field ‘ssh_disabled’ is a boolean field. When false, it indicates that users are running jobs and workflows remotely using SSH. For some companies, SSH runs will be considered a red flag.

Alert query:

ssh_disabled:false

Alert condition:

If choosing specific fields for the notification using the Coralogix Alert Notification Content, make sure you include ‘ssh_users’. Its value is an array of strings that includes the user names of the SSH users. 
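For illustration, this Python sketch mimics what the alert surfaces: it keeps the runs where ‘ssh_disabled’ is false and collects their ‘ssh_users’ values. The sample records and user names are hypothetical.

```python
# Hypothetical sample records; field names follow the text above.
sample_logs = [
    {"ssh_disabled": True, "ssh_users": []},
    {"ssh_disabled": False, "ssh_users": ["alex"]},
]


def ssh_enabled_runs(logs):
    """Return the ssh_users lists of runs where SSH was not disabled."""
    return [
        log.get("ssh_users", [])
        for log in logs
        if log.get("ssh_disabled") is False
    ]


flagged = ssh_enabled_runs(sample_logs)
```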

You can of course set security alerts based on other key-value pairs like ‘user.login’, ‘user.name’, or ‘picard.build_agent.instance_ip’.

As an example, this query will create an alert if picard.build_agent.instance_ip does not belong to a group of approved IP addresses that start with 170:

Alert query:

NOT picard.build_agent.instance_ip.keyword:/170\.\d{1,3}\.\d{1,3}\.\d{1,3}/

To learn more about keyword fields and how to use regular expressions in queries see our queries tutorial.
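If you want to try the approved-IP pattern before putting it in an alert, a quick Python check like the one below can help. It assumes the pattern is meant to match dotted IPv4 addresses starting with 170 and that the query matches the whole field value; the sample addresses are made up.

```python
import re

# Addresses that start with 170 followed by three dotted groups of 1-3 digits.
APPROVED_IP = re.compile(r"170\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def is_unapproved(ip):
    """True when the address does not match the approved pattern (the NOT in the query)."""
    return APPROVED_IP.fullmatch(ip) is None


checks = {ip: is_unapproved(ip) for ip in ["170.12.3.4", "10.0.0.5", "1170.1.1.1"]}
```

Only the first address is treated as approved; the other two would trigger the alert.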

As you know, each company has its own build schedule and configuration. One of the powers of Coralogix is its ease of use and flexibility, allowing you to take the concepts and examples found in this document and adapt them to your own environment and needs. We are always an email or Intercom chat away.

Travis CI vs CircleCI

The way we think about development and CI/CD solutions started with Waterfall – sequential, solid, conservative – moved to Agile, whose origins can be traced back to a somewhat romantic story at a ski resort in Utah, and is now heavily influenced by DevOps.

This progression, from a stage-by-stage process that often meant reaching the “end” of a project only to begin the herculean task of debugging, then to a world of Scrums and sprints, and finally to a full DevOps culture, has led to the adoption of Continuous Integration.

What is CI?

This new approach is consistent with the values of collaboration, quicker release cycles, and the integrated disciplines of development, QA, and Operations.

Continuous Integration (related to Continuous Delivery and Continuous Deployment) is a development practice where developers integrate code into a shared repository regularly, often several times a day. These check-ins are verified by an automated build, which allows teams to detect problems early.

With this regular approach, errors are detected quickly and located more easily, last-minute pre-release chaos is averted, the most current build is always available, and feedback on changes is immediate. All of this saves time, money, and other resources, and contributes to a more efficient organizational culture.

There are caveats: the CI approach may not be efficient for smaller projects, the automated test suite may require a significant cost and time investment, and larger teams may encounter queueing issues. Overall, though, this method is seen by many as the most valuable of all the software development practices today.

We’ll look at two popular CI platforms to help decision makers choose the right one for different projects or organizations.

Travis CI

Travis CI is a hosted distributed continuous integration service used to build and test software, and is known for its easy integrations and quick setup.

Originally designed for open source projects (and still free for these), Travis CI has now expanded to closed source projects.

Travis bills itself as “The simplest way to test and deploy your projects”. This is not far from the truth, as using Travis CI can be as easy as logging in with GitHub, instructing Travis CI to test a project, and then pushing to GitHub.

When you run a build, Travis CI clones your GitHub repository into a new virtual environment. It then carries out a series of tasks to build, and then test, your code. If none of these tasks fail, the build is considered passed (well done!), and Travis CI is able to deploy your code. If one of these tasks fails, the build is considered broken.

You can watch tests as they run and easily integrate Travis CI with tools like Slack to keep your team up to date with any build issues.

More details about builds, jobs, stages, and phases can be found in the documentation. The Travis CI documentation is generally excellent, providing guidance and advice to experts and novices alike.

Travis CI supports most programming languages, including Android, C, C#, C++, Clojure, Crystal, Dart, Erlang, Elixir, Go, Groovy, Haskell, Haxe, Java, JavaScript (with Node.js), PHP, Python, Ruby, Scala and Visual Basic.

Complaints from users about Travis CI often refer to clunky UI, slowness at times, and the relatively steep pricing for the paid version.

In terms of Travis CI pricing, the first 100 builds are free, no matter what plan you’re on. As mentioned, open source projects are always free, and closed source projects are billed based on the number of concurrent jobs, from the Bootstrap package at $69 per month (1 concurrent job), to the Premium package at $489 a month (10 concurrent jobs). All packages come with unlimited build minutes, unlimited repositories, and unlimited collaborators.

There is also an enterprise option, an on-premises product for companies wanting the regular features of Travis CI, but with additional on-site security needs.

CircleCI

Like Travis, CircleCI is a continuous integration and delivery platform that automates the software development process and is used by the likes of Facebook, Kickstarter, and Spotify.

CircleCI integrates with GitHub, GitHub Enterprise, and Bitbucket, creating a build every time you commit code. It then automatically tests your build in a clean container or virtual machine, notifying you of build failures so that issues can be addressed as quickly as possible. Once a build has passed, it is deployed to various environments.

Other integrations include Fastlane, Azure, Jira, Slack, and many others depending on an organization’s specific needs. CircleCI is also known for its Docker support.

CircleCI can be hosted on the cloud or run behind a firewall on an organization’s private infrastructure. Any language that builds on Linux or macOS is supported, including C++, JavaScript, .NET, PHP, Python, and Ruby. CircleCI also emphasizes its security credentials, including LDAP for user management, full-level virtual machine isolation, and audit logging.

In terms of user feedback, many users prefer the config setup in Travis, some find the CircleCI dashboard confusing, and there have been complaints of slowness, though this is probably project-dependent.

Pricing is offered as a cloud plan or a server plan. On the cloud plan, Linux is billed per container, with the first container free and each additional container costing $50 a month. For macOS, pricing is calculated based on concurrency and build minutes and starts at $39 a month. The server plan is priced at $35 per user per month, with a 10 user minimum.
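To compare the two plans for a given team, the figures quoted above can be turned into a quick back-of-the-envelope calculation. The sketch below covers only the Linux cloud and server numbers from this post; the function names are made up for illustration, and current pricing may differ.

```python
def linux_cloud_monthly_cost(containers):
    """First container is free; each additional one costs $50 per month."""
    if containers < 1:
        raise ValueError("at least one container is required")
    return (containers - 1) * 50


def server_monthly_cost(users):
    """$35 per user per month, billed for a minimum of 10 users."""
    return max(users, 10) * 35


cloud_cost = linux_cloud_monthly_cost(4)   # three paid containers
server_cost = server_monthly_cost(6)       # billed as 10 users
```

For example, four Linux containers on the cloud plan come to $150 a month, while a six-person team on the server plan is billed for the 10 user minimum, or $350 a month.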

Comparison

Choosing the right tool will depend on factors such as your chosen programming language and application architecture, your team’s skills, experience and preferences, and your roadmap in terms of growth and scaling.

CircleCI has a whole page dedicated to its advantages over Travis CI, and while many points are valid, there are some Travis CI advantages that are not presented.

Right now, Travis CI seems to have the edge when it comes to open-source or smaller projects, while CircleCI is generally seen to be the better option for larger teams and projects. Both, however, have been successfully used in large and small projects, and in this case, the desired option will probably come down to personal preference.