Exploring the Insanity of Log Pricing
I’ve been off consulting in the OpenStack Outlands for the past 6 months. In my absence, the Logging River Valley has apparently been busy: Splunk Storm went free, Logentries and Loggly both launched a new look and feel using their respective hauls, and Google's BigQuery changed their pricing structure and added a slew of new features.
When I started working on TinyProbe about a year ago, the idea was simple. Add a user interface to BigQuery and make it easy to write simple
Now that Google has changed its pricing on BigQuery and added features I need to complete TinyProbe, I decided to jump back in and see where we were at with pricing.
WTF is up with all the knobs?
I worked at Splunk and founded Loggly (obviously I’m no longer there), so I know my way around the various terms used in the industry when discussing log management offerings. Fancy phrases like retention times and indexing volumes rolled off my tongue on a daily basis, mostly due to the long hours I spent with the best of the best discussing pricing models.
Today, it seems only one company really gets it when it comes to pricing event indexing and storage as a service: Google.
Google is big on transparency and flexibility nowadays, so much so in fact that they have a section titled pricing philosophy on their pricing page for BigQuery.
Their pricing strategy is simple: charge once a month for storing the data and charge again when you search it.
In comparison, here’s a compiled list of all the other companies’ features/knobs/limits/crap you have to wade through to figure out how much sending them your logs is going to cost you and how long you’ll have access to your data:
GB storage pricing (less the full text searchable index)
GB sent pricing
variable retention time limits (7, 15, 30, infinity days)
max storage size retention limits
daily volume limits
monthly volume limits
indexing hard limits
overage fees (daily/monthly)
extended support options
It’s no wonder they need all those fancy plan names like Development, Free, Gold, Most Popular, Platinum, and ‘Pay for what you need’ - it’s fucking confusing figuring out how much it’s going to cost you!
Leveling the Field
Given I’m going to eventually charge something for TinyProbe, I decided to do a cost comparison across services, based on a logging volume of 20GB/day and a desired retention of 30 days. At that volume, you should have a maximum of 600GB of data available and searchable in your account at the end of each monthly billing period.
Here are the results of my research:
I should explain a few terms I use here before diving into each service's pricing. First, the term retention hit refers to the amount of time before you hit storage OR time retention limits if you tried to shove in 20GB a day or 600GB a month to the service. For example, Logentries has a 7 day retention hit because accounts are limited to 150GB/month total.
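The retention hit idea can be sketched as a small calculation. This is only an illustration of the definition above, not any vendor's formula; the function name and parameters are made up for the example, and it assumes a storage cap fills at a steady daily rate.

```python
def retention_hit_days(daily_gb, storage_cap_gb=None, time_limit_days=None):
    """Return the whole days of logs retained before a limit is hit.

    Whichever limit (storage cap or time limit) is reached first wins.
    Returns None when the service advertises neither limit.
    Hypothetical helper: names and rounding are assumptions for illustration.
    """
    candidates = []
    if storage_cap_gb is not None:
        # Whole days of data that fit under the storage cap.
        candidates.append(int(storage_cap_gb // daily_gb))
    if time_limit_days is not None:
        candidates.append(time_limit_days)
    return min(candidates) if candidates else None


# Logentries example from the text: a 150GB cap at 20GB/day.
print(retention_hit_days(20, storage_cap_gb=150))  # 7

# A 15 day time limit with no storage cap.
print(retention_hit_days(20, time_limit_days=15))  # 15
```

At 20GB/day a 150GB cap fills in 7.5 days, which is where the 7 day retention hit for Logentries comes from.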
The term max retention is used for indicating a service’s apparent hard limit on retaining searchable data.
The term daily limits refers to whether or not the pricing model shows GB/day rates, which could possibly imply to the end user there are daily limits to sending in data. In reality, I don’t think any of the services above marked ‘Yes’ in the daily limit column rate-limit their inbound data due to data loss concerns.
Starting out, we have Splunk Storm. I always figured Splunk would use Storm as lead generation, and here we are and I was right: it’s now completely free. You can send in up to 20GB/month of data and indexes and storage are trimmed monthly. Indexing stops after 20GB though, so it’s tough titties if you go over. That’s why there’s an N/A in the projected cost column. Hey, it’s free. Sell your first born, buy their software and run it yourself if you don’t like it.
Logentries has two types of pricing: plans and metered. I used the Plus plan rate above to calculate the cost per month because I couldn’t figure out what the hell they were talking about on the metered page and what the limits were. They charge a combination of GB sent per month and GB stored per month for metered. I suppose it might be $1.99 + $0.69 = $2.68/GB month sent and stored, but that seems more expensive than the $1.66 for the Plus plan. I gave up on that tactic and multiplied their top plan price by 4 yielding about $1K a month for 600GB of data. Imagine you have four accounts with them, paying about $250/month each to get around the hard limits.
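Here is a rough sketch of that "four accounts" arithmetic. It assumes the $1.66 figure is a per-GB monthly rate on a plan capped at 150GB/month, which is my reading of the numbers above rather than anything stated on a pricing page; the function name is hypothetical.

```python
import math


def plan_stacking_cost(monthly_gb, plan_cap_gb, price_per_gb):
    """Estimate the cost of stacking capped accounts to cover a monthly volume.

    Assumption: each account is billed for its full capped plan.
    Returns (number_of_accounts, total_monthly_cost).
    """
    accounts = math.ceil(monthly_gb / plan_cap_gb)
    cost_per_account = plan_cap_gb * price_per_gb
    return accounts, round(accounts * cost_per_account, 2)


# 600GB/month against a 150GB plan at an assumed $1.66/GB.
accounts, total = plan_stacking_cost(600, 150, 1.66)
print(accounts, total)
```

Under those assumptions each account runs about $249, and four of them land just under the $1K a month quoted above.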
Loggly's monthly cost projection is confusing in a similar way because the site shows they only retain data for 15 days. Like Papertrail’s primary pricing page, they are showing them charging you a monthly rate for a thing they do on a daily basis: indexing your data, and something else they do on a semi-monthly basis: storing your data. One way to think about it is that you pay them monthly for storing half the data you’ve sent them during the month.
*Note: Shortly after posting this, Loggly contacted me saying a) they didn’t have limits on retention, b) their pricing was $1,350 for 30 days retention of 20GB/day and 600GB/month storage, c) they didn’t have daily GB limits. I’ve since changed the table to reflect the pricing in the email I received from them, but have left references to the 15 day retention times as that’s what their site says it is. I added a paragraph above explaining my reference to daily limits.
And, it’s still confusing!
Papertrail wins the most expensive service award, which isn’t surprising given they have a jumbled set of pricing pages with massive numbers of buttons on them. The first pricing page takes a similar tactic to Loggly’s where they only keep the index for half the month, but then ‘store’ it for you for a year. It took me a while to find their pricing slider which seems to indicate they have monthly volume limits with daily or weekly based retention times. You can send in up to 500GB/month (a real month) and store and search logs for up to 4 weeks (a fake month). As an aside, I actually invented the idea for the pricing slider in a frustrated fit of pricing creativity one day at Loggly. Good to see it still in use somewhere.
Ah, domo SumoLogic. It took me a good 5 minutes wading through their pages to find their pricing page. Like Papertrail, Sumo's pricing is done on a sliding scale and I like how they seem ready and willing to provide larger retention times - up to and over a year. This makes sense as the founders are from ArcSight and compliance use-cases are worth a lot of money and require hella long retention times. Once you find the page, cost calculation is simple. There’s no talk of index or search limits, and they are nearly as cheap as Logentries while supporting higher volumes. Good stuff.
Google's simple philosophy shows well here price-wise, but the lack of a UI is certainly a big barrier to entry. It’s also a business opportunity for some given how ridiculously cheap it is compared to the other offerings. Pricing is simple and stupid cheap at just under $50 for 600GB/month stored and searchable. The result is a price that is an order of magnitude cheaper than other offerings.
Google recently added streaming and table decorators to BigQuery, which makes things a little more approachable, logging-wise. Of all the offerings, only Google chooses to charge extra for searching the data, which raises the interesting question of how much it’s going to run me to use it. I honestly don’t know the answer to this question, but I can speculate a bit.
Speculating on Charging for Search Usage
There are literally hundreds of event/log management use cases. Analytics. Monitoring. Alerting. Troubleshooting. Compliance. Most of the more common use cases like monitoring require a regular timed job run on all the data that gets indexed in the system. For example, if you must alert on the term error then you must search all 20GB/day of data for that term if that’s how much data you are indexing.
Google does several interesting things with its indexes in BigQuery to help with this. First, when you do a query on BigQuery, only the fields specified are searched, which means fields not used in the query are not included in the quota. Second, you can search for things like “error AND failure” in a single query, which means you can lump together certain monitoring queries.
As for user interactions, consider that when you manually search using these services, you are searching a bounded time range. That means you are necessarily limiting your query to a smaller data set. This would translate to smaller costs for searches on BigQuery.
It’s hard to tell, but my gut says that for any given data set a user sends into a log management system, they end up searching through all that data roughly 10 times on average. Keep in mind you could search subsets of this data hundreds of times over as needed and still fall under the 10 time average estimate, especially with the tricks BigQuery plays with data.
If we use Google’s cost of $35 per TB of data processed as a guide, that means our 600GB/month of data would cost us about 0.6 x 10 x $35 = $210 to search for the month. Add that to the measly 48 bucks above, and you get a total cost of about $258 - nearly a tenth the cost of the competitors. Google gives breaks for doing batch queries as well, so it’s probably cheaper still than I outline here.
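The back-of-the-envelope math can be written out as a short sketch. The $0.08/GB storage rate is back-calculated from the $48 for 600GB figure above, and the 10 full scans per month is just my gut estimate, so treat both as assumptions. The function name is made up for the example, and 1TB is taken as 1000GB to match the 0.6 used above.

```python
def bigquery_monthly_cost(stored_gb, storage_per_gb, full_scans, query_per_tb):
    """Estimate monthly storage, search, and total cost in dollars.

    Assumptions: every scan reads the full stored volume, and 1TB = 1000GB.
    Returns (storage_cost, search_cost, total_cost).
    """
    storage = stored_gb * storage_per_gb
    search = (stored_gb / 1000) * full_scans * query_per_tb
    return round(storage, 2), round(search, 2), round(storage + search, 2)


# 600GB stored, assumed $0.08/GB storage, 10 full scans, $35/TB processed.
storage, search, total = bigquery_monthly_cost(600, 0.08, 10, 35)
print(storage, search, total)
```

Dropping the scan count, or querying only the fields and time ranges you need, pulls the search figure down from there.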
After all this analysis I’ve decided to stick to Google’s new pricing model and charge customers based on the amount of data sent in and, separately, searched on TinyProbe. I’ll probably create a couple of dirt cheap accounts which have hard limits and a single reasonably priced metered plan for larger data sets.
And if it weren’t obvious by now, I’m using BigQuery as the storage engine for TinyProbe. Thanks to Google I get a twofer on my offering - fantastic search capabilities coupled with simple, non-confusing pricing.
Kord Campbell was formerly at Splunk and Loggly. His link always seems to be broken, so I’ve reposted on my blog.