Last night I attended a Cloud Austin user group meeting that covered various logging solutions. Some were fully open source, whereas others were proprietary.
LogStash was first up. It's a Ruby-based tool that provides a way to template and parse logs into a common JSON format. From there, it's up to you where you send them and what you do with them. Options include pushing alerts out through Nagios or sending events on to the likes of Elasticsearch. Visualization options include Graphite and Kibana.
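To make the normalization idea concrete, here is a minimal Python sketch of what a LogStash filter conceptually does: match a raw line against a pattern and emit a JSON event. It is not LogStash or its grok syntax; the regular expression, the field names and the `_parsefailure` tag are assumptions chosen for illustration.

```python
import json
import re

# Hypothetical pattern for a classic BSD-syslog-style line, e.g.
# "Mar  7 22:14:15 web01 sshd[4123]: Failed password for root".
# An illustrative stand-in for a LogStash grok pattern, not LogStash itself.
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def normalize(line: str) -> dict:
    """Parse one raw line into a flat dict; unparseable lines are kept, not dropped."""
    match = SYSLOG_RE.match(line)
    if match is None:
        # Keep the raw text and tag it so a later stage can inspect it.
        return {"message": line, "tags": ["_parsefailure"]}
    event = match.groupdict()
    if event["pid"] is not None:
        event["pid"] = int(event["pid"])
    else:
        del event["pid"]  # the optional [pid] part was absent
    return event


raw = "Mar  7 22:14:15 web01 sshd[4123]: Failed password for root"
print(json.dumps(normalize(raw)))
```

Keeping unparseable lines (tagged rather than discarded) mirrors the point that the value lies in getting everything into one consistent shape.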
LogStash has quite a nice feature that lets you daisy-chain instances, which allows you to build a log processing pipeline.
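Conceptually, a daisy-chained setup is a series of stages that each receive an event, transform or filter it, and hand it on. The following Python sketch uses hypothetical stage functions (`add_shipper_host`, `drop_debug`) as stand-ins for separate LogStash instances; in a real deployment the hops would typically be separate processes connected over the network or through a broker, not function calls.

```python
from typing import Callable, Iterable, Iterator, Optional

# A "stage" takes one event dict and returns a (possibly modified) event,
# or None to drop it. These stages are hypothetical stand-ins for separate
# LogStash instances (e.g. a lightweight shipper feeding a central indexer).
Stage = Callable[[dict], Optional[dict]]


def chain(stages: Iterable[Stage]) -> Callable[[Iterable[dict]], Iterator[dict]]:
    stages = list(stages)

    def run(events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            for stage in stages:
                event = stage(event)
                if event is None:
                    break  # a stage dropped the event; skip the remaining stages
            else:
                yield event  # every stage passed the event through
    return run


def add_shipper_host(event: dict) -> dict:
    # First hop: stamp which (hypothetical) shipper forwarded the event.
    return {**event, "shipper": "edge-01"}


def drop_debug(event: dict) -> Optional[dict]:
    # Second hop: filter out noisy debug-level events before indexing.
    return None if event.get("level") == "debug" else event


pipeline = chain([add_shipper_host, drop_debug])
out = list(pipeline([{"level": "info", "message": "ok"},
                     {"level": "debug", "message": "noise"}]))
print(out)
```

Because each stage only sees events and returns events, stages can be added or reordered without the others knowing, which is what makes chaining instances attractive.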
LogStash's strength seems to be that it can take log data from pretty much anywhere and bring it into a consistent format.
Graylog2 looks like an interesting solution for actually making sense of the logs, but it's somewhat restrictive in what it will accept. Natively, it only accepts GELF (Graylog Extended Log Format) or syslog-formatted messages. There are ways around this, though. A series of appenders can be added to the configuration of various logging frameworks so that they output GELF, although this might not be ideal, as it requires control over the application creating the logs. An approach that seems to have a lot of traction is to combine the two: LogStash for normalization and Graylog2 for analytics, alerting, searching and stream analysis.
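For a sense of what those GELF appenders emit, here is a minimal Python sketch that builds a GELF message by hand, based on my reading of the GELF 1.1 format: a JSON object with `version`, `host` and `short_message`, optional `timestamp` and `level`, and custom fields prefixed with an underscore, zlib-compressed for UDP. The server name is a placeholder, and chunking for large messages is left out.

```python
import json
import socket
import time
import zlib


def gelf_payload(host: str, short_message: str, level: int = 6, **extra) -> bytes:
    """Build a GELF 1.1 message as zlib-compressed JSON bytes.

    Additional fields are prefixed with an underscore; `_id` is reserved.
    """
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the epoch, fractional part allowed
        "level": level,            # syslog severity; 6 = informational
    }
    for key, value in extra.items():
        if key == "id":
            raise ValueError("'_id' is reserved in GELF")
        message["_" + key] = value
    return zlib.compress(json.dumps(message).encode("utf-8"))


def send_udp(payload: bytes, server: str = "graylog.example.com", port: int = 12201) -> None:
    # Placeholder server name; 12201 is the conventional GELF UDP port.
    # Payloads larger than one datagram would need GELF chunking, omitted here.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))


payload = gelf_payload("web01", "Failed password for root", level=4, program="sshd")
decoded = json.loads(zlib.decompress(payload))
```

This is also roughly the output a LogStash GELF output would have to produce when LogStash sits in front of Graylog2.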
Graylog2 is Java-based, and its dashboard is in the process of being ported from a Rails application to Java and Scala. Logs are stored in Elasticsearch, and all of the required metadata is stored in MongoDB.
Graylog2 has commercial support available via its creator, www.torch.sh.
SumoLogic is a SaaS solution that some users have been migrating to because of the cost of Splunk. It does seem to be cheaper, and it's offered on a utility-based pricing model. It feels very Splunk-like and is clearly targeted at that market. It has some novel features; LogReduce is an interesting one that groups together log messages that look like they were produced by the same kind of error. This is useful because you don't have to wade through the same message repeated over and over.
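I don't know how LogReduce works internally, so the following is only a naive Python illustration of the general idea, not SumoLogic's algorithm: mask the variable parts of each message (IP addresses, hex values, numbers) so that messages differing only in those parts collapse into one signature, then count the signatures. The masking rules are assumptions.

```python
import re
from collections import Counter

# Naive stand-in for "group messages that look like the same error".
# Order matters: IPs and hex values are masked before plain numbers.
# This is NOT SumoLogic's LogReduce algorithm, just an illustration.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def signature(message: str) -> str:
    """Replace variable tokens with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


def group(messages):
    """Return (signature, count) pairs, most frequent first."""
    return Counter(signature(m) for m in messages).most_common()


logs = [
    "Timeout after 3000 ms connecting to 10.0.0.12",
    "Timeout after 5000 ms connecting to 10.0.0.40",
    "User 42 logged in",
    "Timeout after 3000 ms connecting to 10.0.1.7",
]
for sig, count in group(logs):
    print(count, sig)
```

Three timeout lines collapse into a single signature, which is the "not having to see everything repeated" benefit in miniature.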
Splunk is considered a market leader, but it comes with a price tag to match. There is a free version, but it doesn't go very far. Other solutions look to be catching up fast and offer a lot more bang for the buck. If you want a turnkey log analysis solution and don't care too much about cost, Splunk might be the way to go, but it probably wouldn't sit well as a strategic investment.
An interesting one to watch is Project Meniscus, which is being sponsored by Rackspace. Its goal is to provide a hyperscale, multi-tenant, multiplexable solution that sits well with OpenStack. It will probably never be an OpenStack core project, but it does leverage common OpenStack components such as Keystone.
It is written in Python and builds on the Common Event Expression (CEE) standard. Its goals are to leverage syslog, rsyslog and liblognorm. It has a coordinator process that observes the load on the system and brings workers online and offline as required. Logs are typically stored in HDFS and Elasticsearch.
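The coordinator's behaviour can be pictured as a simple control loop. This Python sketch is purely hypothetical: the `ClusterState` fields, the 60-second drain target and the worker limits are assumptions for illustration, not Meniscus's actual logic. It computes how many workers are needed to clear the current backlog and returns how many to bring online (positive) or take offline (negative).

```python
import math
from dataclasses import dataclass

# Hypothetical model of what a coordinator might observe; not Meniscus's API.


@dataclass
class ClusterState:
    queued_events: int            # events waiting to be processed
    events_per_worker_sec: float  # measured throughput of one worker
    online_workers: int


def desired_workers(state: ClusterState, target_drain_sec: float = 60.0,
                    min_workers: int = 1, max_workers: int = 50) -> int:
    # Workers needed to drain the current backlog within the target window,
    # clamped to an assumed minimum and maximum pool size.
    per_worker_capacity = state.events_per_worker_sec * target_drain_sec
    needed = math.ceil(state.queued_events / per_worker_capacity)
    return max(min_workers, min(max_workers, needed))


def scaling_action(state: ClusterState, **kwargs) -> int:
    """Positive: bring that many workers online; negative: take them offline."""
    return desired_workers(state, **kwargs) - state.online_workers


busy = ClusterState(queued_events=120_000, events_per_worker_sec=500.0, online_workers=2)
print(scaling_action(busy))
```

Clamping to a minimum keeps one worker alive when the system is idle, and the maximum stops a sudden burst from spinning up an unbounded number of workers.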
My conclusion from this session is that there is a lot of activity in this space, and it should be quite possible to build out a scalable log management solution without having to pay license fees. Further investigation will be needed into how suitable the open solutions really are, as well as their total cost of ownership.
After this I’m most interested in a LogStash + Graylog2 solution.