We have always used Ruby for the scripting (we're predominantly a Ruby shop, so this was key for future adoption). The very first MVP for this tool was a set of individual Ruby scripts running against Graphite and scheduled via cron. The first real backend scheduler was built in Scala, but for various reasons we've converted to Rails/Puma/Celluloid running in a VM using JRuby. The monitors themselves run in an MRI sandbox for security purposes.
I'm not sure lugubrious means what you intended it to mean. :) At any rate, see my reply here https://news.ycombinator.com/item?id=6646402 for a sampling of things Rearview brings to the table. The tl;dr is that it's not a NOC tool, it's more for process monitoring whether that be application processes, engineering processes, or business processes. It also does provide a central location for anyone to see the state and history of an application or business unit.
>The tl;dr is that it's not a NOC tool, it's more for process monitoring whether that be application processes, engineering processes, or business processes. It also does provide a central location for anyone to see the state and history of an application or business unit.
We use NewRelic and Pingdom as well. Where Rearview really shines is creating monitors like this: 1) control charts to alert when a process deviates from a range of 3 standard deviations above or below the mean based on historical data (e.g. purchases/logins are lower than expected, process failures are higher than expected, etc.), 2) deployment-triggered monitors that automatically analyze data before and after a deploy for shifts in mean or increases in variance (e.g. do we see more login failures after this deploy, do we see more 4xx/5xx responses, did page load time increase, etc.), 3) response time monitors: while this seems straightforward enough, Rearview can not only tell you when a service or page response time has exceeded some statistical limit, it can also present you with more information regarding causes (e.g. this process is slow because of an issue with the database, Redis, a dependent process/service, etc.), 4) it allows you to use SPAN as a means of monitoring load time or response time (SPAN is the 95th percentile minus the 5th percentile, and it gives a much more accurate representation of what users experience than mean or median), 5) process efficiencies can be checked by making sure they complete on time and execute the expected number of commands (e.g. sent email, updated databases, etc.), and many more. Basically you are only limited by your imagination and coding skills. Of course the other benefit is in performing similar monitoring on business metrics and not just application performance (e.g. is the funnel performing as expected/needed, are our customer tools being used on a regular basis, are our marketing campaigns paying off, etc.)
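To make items 1 and 4 a bit more concrete, here is a rough plain-Ruby sketch of a control-chart check and a SPAN calculation. This is not Rearview's actual monitor API: the helper names (`mean`, `stdev`, `out_of_control?`, `percentile`, `span`) are made up for illustration, and the nearest-rank percentile method is just one reasonable choice among several.

```ruby
# Hypothetical helpers for illustration only; not part of Rearview.

# Arithmetic mean of a numeric array.
def mean(values)
  values.sum.to_f / values.size
end

# Sample standard deviation (divides by n - 1).
def stdev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

# Control-chart check: true when `latest` falls outside
# mean +/- k standard deviations of the historical data.
def out_of_control?(history, latest, k = 3)
  (latest - mean(history)).abs > k * stdev(history)
end

# Nearest-rank percentile; `p` is a number between 0 and 100.
def percentile(values, p)
  sorted = values.sort
  rank = (p * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# SPAN as described above: 95th percentile minus 5th percentile.
def span(values)
  percentile(values, 95) - percentile(values, 5)
end
```

A monitor built on this might call `out_of_control?(last_weeks_logins, current_logins)` and alert when it returns true, or track `span(response_times)` over time instead of the mean.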
You're welcome! We did a second feature release of our Ruby version today that has even more UI goodness. Basically we've added the ability to group categories of monitors under one dashboard. You can then switch between categories using carousel controls or directly from a drop-down. We're hoping to open source this version soon, and we're crossing our fingers that the Ruby version will see more collaboration from outside developers.
I was installing on a laptop to demo during a conference talk. I also had aspirations of contributing to Graphite (something I've so far failed to do). Having said that, I do know that hosting and maintaining Graphite internally is a barrier for some companies. Your product looks like it will fill that need very nicely. Kudos!
Thanks. I wrote a simple Ruby script that will "replay" data as if the event were happening live. It's great for a demo because you can leave it up while you're talking and known data is streaming in. And for me it's even better since I was demo-ing a monitoring tool and could have known alerts occurring live. :)
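For anyone curious, a replay script along these lines can be quite short. The following is a minimal sketch, not the actual script: the `replay` method, its parameters, and the sample layout are all assumptions for illustration. It writes recorded samples in Graphite's plaintext format (`path value timestamp`), re-stamps them relative to a new start time, and sleeps between samples to preserve the original gaps.

```ruby
require "socket"

# Hypothetical replay helper; names and sample layout are assumptions.
# `samples` is an array of [metric_path, value, original_unix_timestamp],
# ordered by timestamp. `io` is anything that responds to #write.
# `sleeper` is injectable so the pauses can be skipped or recorded.
def replay(samples, io, start: Time.now.to_i, sleeper: ->(seconds) { sleep(seconds) })
  return if samples.empty?

  first_ts = samples.first[2]
  previous = first_ts
  samples.each do |metric, value, ts|
    gap = ts - previous
    sleeper.call(gap) if gap > 0  # wait as long as the original gap
    # Shift the original timestamp so the series begins at `start`.
    io.write("#{metric} #{value} #{start + (ts - first_ts)}\n")
    previous = ts
  end
end

# Example usage, assuming a Carbon listener on localhost:2003
# (Graphite's default plaintext port):
#   samples = [["app.logins", 5, 1000], ["app.logins", 7, 1010]]
#   TCPSocket.open("localhost", 2003) { |sock| replay(samples, sock) }
```

Injecting the sleeper also makes it easy to speed up a rehearsal by passing a lambda that sleeps for a fraction of each gap.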