...

kevindong

2748

Karma

2016-07-02

Created

Recent Activity

  • Sorry if I was ambiguous before. When I said "log format", I was referring to the message part of the log line. Standardized timestamp, line in the source code that emitted the log line, and level are the bare minimum for all logging.

    Keeping the message part of the log line's format in sync with some external store is deviously difficult particularly when the interesting parts of the log are the dynamic portions that can take on multiple shapes.

  • You can't realistically expect every log format to get a custom schema declared for it prior to deployment.

  • Sure the relative difference between the fastest and slowest approach is 5x. But the absolute difference is still just 125.23 ns. For some perspective, 1 millisecond / 125.23 ns = ~7,985.

    There certainly are cases where that slowdown matters. But for the vast, vast, VAST majority of applications, it is completely irrelevant.

  • As you correctly point out, everything you can do with metrics can be achieved via logs if you have enough compute and I/O. But that ignores the reality of what happens when you do indeed have too many logs to use something fancier like column store/inverted indices as you point out? I agree that in the vast majority of cases, it's likely fine to just take the approach of using logs and counting. But plenty of of developers (particularly here on HN) are in that overall small slice of the overall community that does have a genuine need for greater performance than is afforded by some form of optimized logging.

    Likewise, traces are indeed (as you point out) functionally just correlating logs from multiple services which is akin to syntactic sugar. But again, that's precisely its value _at scale_: easing the burden of use. I've personally seen traces that reach across tens of services in an interconnected and cyclical graph of dependencies that would be hellish to query the logs for by hand.

  • Graphing metrics and doing transformations/comparisons is (much) faster. Yes, metrics do have to be defined first but in my experience that's a non-issue since the things you want to monitor are usually immediately obvious during development (e.g. request-response times, errors returned, new customers acquired, pages loaded, etc.).

    With that being said, it's not a mutually exclusive situation. You can have both. However, some logs used for plotting metrics have near-zero debugging value (e.g. a log line that just includes the timestamp and the message "event x occurred"). Those kinds of logs should be fully converted over to metrics.

    Some other logs however are genuinely useful (e.g. an exception occurred, the error count should be incremented, and this is the stack trace).

HackerNews