An Introduction to Windows Event Logs
The value of log files goes far beyond their traditional remit of diagnosing and troubleshooting issues reported in production. They provide a wealth of information about…
Proper logging is something you want to plan and standardize before you start writing your code, especially if it involves different teams or separate locations.
During the last couple of years, we witnessed the strong connection between quality and standardize logging and the ability to track and resolve production problems.
In this post, I will focus on a few lessons we’ve learned about Java logging and how to do it right.
Many times, too often actually, we see a complete log file written with the same log severity, this makes your logs harder to understand, and hides the important logs you want to notice.
To make it easier for you to decide what severity to set for each log, I made this simple log severity guidelines:
When writing your application logs, keep in mind that besides you, other people are going to read these logs. Whether its programmers, QA or support consuming the logs you wrote, they better be clear and informative. On the other hand, logs that are long and detailed can be hard to parse automatically (grep, awk, etc.) so either you find a way to write a clear log that can also be parsed easily, or you can simply print two logs, one for humans and one for computers:
e.g – print these two logs together, first log for humans and second for computers:
Integration issues can be the hardest to debug; my suggestion is that you log every event that comes in/out of your system to an external system, whether is HTTP headers, authentications, keep alive, etc.
In complex and high scale system this can be a performance overhead, but in case you experience performance issues you can always switch off the logging for that particular log level (usually Debug or Trace) and use it when something goes wrong with your production.
Many times programmers write great log text and severity but forget to add the log metadata such as Category, Class, method or threadID. Adding metadata to your logs can significantly enhance your capability of pinpointing production problems as you can search and identify problematic categories, classes or methods, or follow a thread to understand the root cause of an error you see. The more metadata you add, the better your log is.
Logging API’s make it much easier to add log destinations and integrate with logging tools seamlessly and without any code modifications. In addition, they make your logs more clear by standardizing them and enriching them with metadata fields such as thread ID.
The two most common logging API’s for Java are Log4J and Logback (slf4j).
Note that one of the greatest benefits of Log4J and Logback is that they allow you to send logs from any Java-based Apache project easily! (Kafka, Hazelcast, etc.)
A Logback log will be written in the following structure:
log.warning(“Retried {} times before succeeding to create user: ‘{}'”, retries, username);
The same log in Log4J would be:
log.warning(“retried” + retries + “times before succeeding to create user” + username);
When writing logs, especially when calling functions and variables within that log, make sure you understand what will be the outcome of that print. Bad logs can be:
It’s great to write detailed and descriptive logs, but many times we see single log entries with an enormous amount of characters (20K+) as the logs are used to store data that is completely unrelated to logs and should be managed separately. This can cause serious performance issues when writing logs to your disk and create bandwidth issues when using hosted logging solutions. Remember what the is the main purpose of your logs and stick to it, you want clear logs that tell the story of your software in order for you to understand its behavior and be able to debug it.
We all probably agree that reporting exceptions is a crucial part of the logging process. On that basis, some tend to both report the exception and then wrap it with their own custom exception and throw it again, this will probably cause the stack trace to be printed twice, a fact that will most likely cause confusion. We suggest never to report and re-throw, decide what works for you best and stick to it. We generally recommend to throw the exception with your own custom exception and catch them all in a centralized handler which will log them and handle any other activities that are needed.
Here are some examples for Java exception logging:
BAD:
try {
Integer x = null;
++x;
} catch (Exception e) {
log.error(“IO exception”, e);
throw new MyCustomException(e);
}
BETTER:
try {
Integer x = null;
++x;
} catch (Exception e) {
log.error(“IO exception”, e);
}
BEST:
try {
Integer x = null;
++x;
} catch (Exception e) {
throw new MyCustomException(e);
}
As a rule of thumb, let the logging framework that you are using help you log exceptions and don’t do it yourself. Remember, the first argument is always the text message, write something about the nature of the problem. Don’t include the exception message, as it will be printed automatically after the log statement, preceding stack trace. But to do so, you must pass the exception itself as the second argument; other logging standards will most likely cause the message to be printed wrongly.
BAD:
log.error(e);
log.error(e, e);
log.error(“” + e);
log.error(e.toString());
log.error(e.getMessage());
log.error(null, e);
log.error(“”, e);
log.error(“{}”, e);
log.error(“{}”, e.getMessage());
log.error(“Error reading configuration file: ” + e);
log.error(“Error reading configuration file: ” + e.getMessage());
GOOD:
log.error(“Error reading configuration file”, e);
This method will allow you to easily filter or search for a specific event that you want to track. The idea is that whoever is responsible for creating an event (e.g client, worker etc) generates a unique ID that is passed through all functions and service calls that are used to process that event. Then once an exception or error occur, it is simple to take the event ID from that error and query for its history throughout the different functions services and components.