Glacier redux (storagemojo.com)
124 points by wmf on April 30, 2014 | hide | past | favorite | 54 comments


> Old hard drives that are no longer economical for more intensive service, supported by disk-handling robotics.

I am with the author in that I do not think they are using older disks. From personal experience (working with petabytes of disk-based storage), I would put money on AWS not using stock off-the-shelf older disks and powering them on/off as a method of storage. The last thing you want to do with an older disk (5+ years) is ever power it off. The chances of it coming back to life decrease rapidly with each cycle. Picture seized bearings, etc. Your best hope is to keep them continually spinning, never powered down for extended periods of time.

Also, the ratio of GB to power to physical space required does not make sense after X number of years. For example, you could replace 3 shelves of 1TB disks with one shelf of 3TB disks, yielding less power consumption in less physical space! When you are at scale, data centre space and power are big factors. For these reasons, I do not think they are using old stock disks.

NOTE: Maybe they are using some custom-made disks, but from what I know about standard disks, even enterprise ones, they do not like being powered on/off when they get old.


Emphasizing the above: the study a few years ago of disk drives at big HPC centers found that they don't follow a bathtub failure model. Rather, there are very few infant failures, and wear becomes noticeable starting in the range of 1-2 years in service.


The Google results I saw were more like an exponential curve, i.e. a memoryless distribution, which would mean old drives are just as reliable as new ones.


The HPC paper struck me as a lot more detailed, rigorous and useful. I read the Google paper later (assuming we're talking about the same one; they came out at roughly the same time), and the only useful takeaway I got from it was that disk companies seem to have solved moderately-high-temperature issues: Google actually saw a correlation between higher temperatures and longer life.


The author handwaves away the power savings of powering down disks by talking about the capital cost of providing power.

"Unless the prices of copper, PDUs and diesel-generators have started following Moore’s Law, this is probably more true today than in 2007."

But this is fallaciously assuming that Glacier servers would need to be like EC2 or S3 servers that are switched on for x random hours per day. This isn't the case at all.

Diesel generators, for instance - what's the use? If you're replicating the data around the world, it doesn't matter if your Glacier servers in one location are powered down for days.

Cooling, power - just switch off every single Glacier server at each location during the peak hours of the day. No additional capital costs, because you're never adding to peak power usage.

Distribution of reads - Profile customers based on how frequently they're reading from the data store. I bet 50% of customers never read a single byte back. Colocate these customers on the same servers, and you only have to power up the server once a month to check if the data is still there.

Bonus - Profile data access patterns with simple heuristics to determine what's likely to be read back in, and temporarily store that data in S3. Imagine a company that archives everything to Glacier daily, but restores day, week and month old backups regularly. Keep all data less than a month old in S3, and the rest on the indefinitely powered down servers talked about above.
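A toy sketch of that colocation idea (the tier names and read thresholds here are invented, not anything Amazon has described): bucket archives by observed read rate so the coldest ones share servers that stay powered down.

```python
from collections import defaultdict

def assign_tiers(reads_per_month):
    """Map {customer: reads/month} into tiers of colocated archives.

    Thresholds are hypothetical; the point is that zero-read customers
    can all land on hardware that only spins up for integrity checks.
    """
    tiers = defaultdict(list)
    for customer, reads in reads_per_month.items():
        if reads == 0:
            tiers["cold"].append(customer)   # power up ~monthly to scrub
        elif reads < 10:
            tiers["warm"].append(customer)
        else:
            tiers["hot"].append(customer)    # candidates for S3 staging
    return dict(tiers)
```

Customers in the "cold" tier never trigger a spin-up except for the monthly check described above.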

I actually quite liked the author's case for BDXL, but it seems he's attacking a straw man of the other possible solutions. Well-implemented BDXL being more cost-efficient than a naively implemented disk strategy? Not exactly news.


Makes me wonder something

Database machines are usually never shut down. And when they are, it's usually a manual process (unless we're talking about a power failure or something)

"Diesel generators, for instance - what's the use?"

Backup power (not only for Glacier).

Reducing power consumption in peak hours is a good strategy, however, I'm not sure how worried Amazon is about this (and the difference in price between regular hours and peak hours)


It's not about the price they're paying for power (peak hours from the power company's perspective), it's about the peak usage of their data center.

As the article (correctly) says, the capital costs are dependent on your peak usage. If you have 1000 servers using 400kW at peak, you need sufficient air conditioning to extract 400kW worth of heat, and backup generation capable of producing 400kW. It doesn't matter if you only use 100kW 16 hours a day - the capital costs are the same.

I'm suggesting that Glacier could live entirely in non peak periods, meaning that the capital costs are unchanged and the demand curve is flattened.
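The claim can be illustrated with a toy model (the $/kW figure and the load numbers are made up for illustration): capital cost follows the single worst hour, so load added strictly off-peak is capital-free.

```python
COST_PER_PEAK_KW = 10_000  # assumed $ of cooling + generators per kW of peak

def capital_cost(hourly_kw):
    """Provisioning is sized for the worst hour of the day, not total energy."""
    return max(hourly_kw) * COST_PER_PEAK_KW

# Front-end load: 400 kW for 8 peak hours, 100 kW for the other 16.
frontend = [400] * 8 + [100] * 16
# Add 50 kW of Glacier activity only during the 16 off-peak hours.
flattened = [400] * 8 + [150] * 16
```

Energy use goes up, but `capital_cost(frontend) == capital_cost(flattened)`: the demand curve flattens without any new peak capacity.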


The real question then is: is there really a peak hour for datacenter usage/consumption? What's the consumption difference between peak and regular hours (and low-demand hours)?

How much power does Amazon use during mornings compared to Netflix watching peak time?


Yes, I'm sure there is. Look at Google clicks by hour or similar metrics: there is a significant curve during the day, and peak can be 4-5x the low.


I think the power requirements are much different in the case of Glacier than with more generic computing. With a service like Glacier, running on hard disks, you could probably completely turn off large parts of your infrastructure. With a 5-hour read latency there should be plenty of opportunities to optimize the reads so that only a small part of the infra needs to be powered on at any time.

Also, you might be able to skip the backup power part completely. The data is distributed to a few locations anyway. The probability of many of them losing power at the same time for several hours is likely to be very small.

Maybe there are also large differences in how the customers use the system. I would assume some customers are more likely to retrieve data than others. Once you start learning these patterns you could further optimize the storage.

The pricing of Glacier is not necessarily driven by technical reasons. Amazon already has S3 and they need to differentiate the products.


If you have a lot of heavily-read data distributed across machines, you're probably constrained by available spindles rather than available storage space. So co-locating data that will almost never be read with heavily-read data is effectively free. I would guess that Amazon stores Glacier data alongside S3 data, and the lower price reflects the fact that the limiting factor in their storage system is IO rather than capacity.


That doesn't offer any clear explanation of why they would charge extra for early deletion of that data, though.


One possible explanation: the cost of initial IO to write your data is amortized over that data's lifetime. The early-delete penalty ensures that Amazon always makes enough on your data to justify the cost of writing it into their system in the first place.
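As arithmetic, using Glacier's launch-era $0.01/GB-month price (the amortization framing is the parent's; the numbers below just check it): billing a 3-month minimum makes revenue per GB independent of early deletion.

```python
STORAGE_RATE = 0.01   # $/GB-month, Glacier's advertised price at the time
MIN_MONTHS = 3        # early-deletion window

def revenue_per_gb(months_stored):
    """Amazon bills at least MIN_MONTHS, so the one-time ingest IO
    is always covered regardless of when the customer deletes."""
    return STORAGE_RATE * max(months_stored, MIN_MONTHS)
```

Deleting after one month yields the same $0.03/GB as keeping the data for three, which matches the per-GB figure other commenters mention.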


  But NONE of the Hacker News commenters addressed Sony and 
  Panasonic’s continued investment in high-density optical 
  disc technology. [...] There has to be a business reason 
  for the continued investment, i.e. customers prepared to 
  buy a lot of product in the future and buying a lot right 
  now.
If Amazon were the only customer for high-density optical storage, Sony and Panasonic would be crazy to invest in developing it, because Amazon would have all the negotiating power in the relationship.

There must be other customers somewhere if two companies are continuing to develop this stuff.


Well a couple of other organizations that have lots of data and don't want to share their practices come to mind: NSA, FBI, etc.


Is it really that unusual to have two vendors competing for a single very large customer?


If the customer is a major government, no. Otherwise? Yes.


Well, it's sold at Amazon, it's off the shelf, hardly something with an "exclusive customer"

Backing up to the cloud is great, but a lot of people can't do that


The deletion charge is interesting. Does deletion guarantee a scrubbing of the data as soon as possible? If so, I could see that as justification for the fee, since some sort of significant work is involved in retrieving data, implied by the hours of wait time. If not, though, the author has a good point in that Amazon needs to make at least $0.03/GB in order to be profitable.


I would be surprised. Anyway, why would they charge for scrubbing if the data is less than 3 months old and do it for free if not? If they do scrub, I don't think the two are related.


Okay, this is beyond silly. Still ignoring tape.

Explain how BDXL, a new format, which has never proved itself beyond a few years, which costs $45 for a few hundred GB, which AFAIK has very little data on re-writability (which is probably terrible and close to one-time use) could be any more profitable or reliable, or useful than tape.

I don't buy it. Tape would be the logical short-term choice to get started, because Amazon could just go buy an off-the-shelf tape library and add tape as they needed from Oracle, versus again, engineering their own BDXL library on the expectation that it would cost less than tape, taking into account factors such as:

1. Reliability and degradation

2. Supply

3. Cost

4. Reusability/re-writability



No, it doesn't definitively say the backing storage is not tape based. Only that:

    "Essentially you can see this as a replacement for tape," 
and:

    "inexpensive commodity hardware components"
Nowhere did they explicitly deny that it may be tape-backed.

In addition, from the article:

     Instead, Glacier runs on "inexpensive commodity hardware components", he said, noting that the service is designed to be hardware-agnostic.
Which may allude to the fact that the backing storage itself may be flexible (a combination of HDD, Tape, possible BDXL)

The author himself only acknowledges:

    This suggests the system will be based on very large storage arrays consisting of a multitude of high-capacity low-cost discs.

Which isn't definitive in the slightest. Also, the article is over 18 months old.

I've said it before: I wouldn't be terribly surprised if they used old commodity hardware to get started, but the economics and characteristics of tape still seem much more amenable to the use case.


"Asked what IT equipment Glacier uses, Amazon told ZDNet it does not run on tape."


"Replacement for tape" is tape? OK.


"Essentially you can see this (our tape) as a replacement for [your] tape".


What I'd like to see is a storage product that aggregates the unused storage space on EC2 instances.

* By default instance storage isn't attached, so there's probably a lot of completely available capacity.

* Even if attached, it's rare that the full capacity would be used, so thin provisioning would leave some space available.

* Some host machines won't be fully allocated.

I imagine taking this pool of capacity and using erasure coding and replication to build reliable storage. As your volumes come and go, you need to make sure it remains available, which is why I imagine erasure coding across a large number of customers. By integrating with the guest -> host assignment function you can ensure that you never lose data, if need be delaying scheduling until you've copied data elsewhere.

You'd have to throttle reads & writes to ensure that the guests weren't unduly impacted (easier with SSDs' predictable IOPS), and the splitting and erasure coding would make for slow reads & writes as well. But this makes the economics a lot more attractive (free!).
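A minimal stand-in for the erasure-coding step (a real system would use Reed-Solomon with several parity shards; single XOR parity here just shows the mechanism of surviving a lost shard):

```python
def xor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(shards):
    """Append one parity shard; any single shard can then be lost."""
    parity = shards[0]
    for s in shards[1:]:
        parity = xor(parity, s)
    return shards + [parity]

def recover(survivors):
    """XOR of all surviving shards reconstructs the missing one."""
    out = survivors[0]
    for s in survivors[1:]:
        out = xor(out, s)
    return out
```

Losing any one shard, including a guest volume that suddenly reclaims its space, costs only a rebuild from the survivors, which is why spreading shards across many hosts makes the pool reliable.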

One thing that suggests Glacier could be doing this: if I was AWS, and I was doing this, I would not be in any rush to tell EC2 customers that I was "stealing" their unused capacity!


I don't see why Amazon would want to buy enough disks to support 100% of what they promise and then sell the unused capacity at below cost when they could just buy fewer disks instead. Either way they'd be in trouble if their EC2 customers suddenly started wanting to use all the space they were promised.


It's a good point: a cloud provider could choose to try to under-provision disks instead. The problem though is that (some) disks are local to the machine; it's not easy to move physical hard disks around when the calculations are wrong (and if you have to get it right on a per-host basis, it's more likely to go wrong). It is however easy to move chunks of data around, particularly if you use something like erasure coding to give you a huge amount of flexibility.

In short, your way is a good alternative, but my guess is that buying the full capacity and selling the surplus is probably roughly cost-equivalent, and considerably less likely to end up with you not being able to sell the full capacity of any given host.


Maybe for unattached instance storage this is possible (though not a good business decision, as others have noted). But you are forgetting how Amazon sees those attached instance disks: as raw block devices from the hypervisor.

They see a computer with an attached disk of, say, 1 TB. They don't see the filesystem at all. They don't know if a sequence of zeros is unused space, or are literally zeros in data. There is no way that Amazon can use slack space because they don't know what is slack space.

This incomplete view of how disks (both instance and EBS) are used is also evident in what metrics CloudWatch can track: CPU usage, and raw I/O stats for disk and network. To get richer metrics, you have to install scripts/agents running inside your EC2 instance which then beacon the info to CloudWatch via the API.

In short, Amazon's current setup means they cannot use the slack space inside of attached block devices, even if they wanted to.


The way thin provisioning normally works (e.g. on ZFS or LVM) is that only non-zero blocks are stored; an unstored block is read as zero; a block written as zeroes could be "unstored". It doesn't matter whether a block of zeroes is stored or not: it still reads as zero. So a cloud provider only needs to store non-zero blocks, and thus doesn't need to provision the full capacity to back a volume.
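A sketch of that behavior (class and method names invented for illustration): only non-zero blocks consume backing store, and writing zeroes releases it.

```python
BLOCK = 4096
ZERO = bytes(BLOCK)

class ThinVolume:
    """Toy thin-provisioned volume: a dict from block index to data."""
    def __init__(self):
        self.stored = {}

    def write(self, idx, data):
        if data == ZERO:
            self.stored.pop(idx, None)    # "unstore" an all-zero block
        else:
            self.stored[idx] = data

    def read(self, idx):
        return self.stored.get(idx, ZERO)  # unstored blocks read as zero

    def backing_bytes(self):
        return len(self.stored) * BLOCK    # real backing-store usage
```

A volume advertised as 1 TB with a single non-zero block written backs onto 4 KiB of real disk; that gap is the surplus the grandparent proposes pooling.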

As to whether EC2 does this I don't know - based on your observations they may not. I think it's an interesting way to build a storage product like Glacier, even if Amazon may have chosen to do something different.


The Amazon Glacier secret is hidden in plain sight. I think it is just a virtual product: probably unused S3 capacity at a much lower price, not a different technology.


Maybe Glacier stores data in the unused parts of S3 disk sectors: the empty part of the last sector of files that don't completely fill 512k. It takes 4-5 hours to retrieve because it has to pull the data without impacting dozens or hundreds of S3 drives.


Full disclosure: I work at UpCloud, a cloud hosting provider and I think business in this industry day and night.

One thing that gets overlooked in almost all comparisons is the pricing model. I actually use Glacier through Arq (a brilliant backup tool for Mac), but the catch is in the requests. I recently uploaded 200 GB of photos to Glacier and the upload process cost me about $10. The monthly storage is about $2.

The thing is that you shouldn't compare the storage price alone, but the total cost of storing your data in Glacier, which was also overlooked in the original article.

I'm sure AWS has understood this through their S3 storage lifecycle and thus developed an appropriate pricing model for glacier that arouses interest in the product like no other.


$10 for upload? Pricing is $0.05 per 1,000 UPLOAD requests plus $0.00 per byte. That means that your backup system made about 200,000 UPLOAD requests. I don't think I have that many files which I want to back up, and I would hope that any system using Glacier as a back-end would bundle smaller files into larger archives.

(Well, I guess it means that your request size averaged 1 MB. If I'm already using Glacier, I would be perfectly happy with archive granularity coarser than 1 MB.)
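Checking that arithmetic (request rate as quoted above; the 1 MB chunk size is inferred from the totals, not stated by the original poster):

```python
PER_1000_REQUESTS = 0.05   # $ per 1,000 upload requests

def upload_request_cost(total_bytes, chunk_bytes):
    """Request cost of uploading total_bytes in fixed-size chunks."""
    requests = -(-total_bytes // chunk_bytes)   # ceiling division
    return requests / 1000 * PER_1000_REQUESTS
```

200 GB in 1 MB chunks is 200,000 requests, i.e. the $10 reported; the same data bundled into 100 MB archives would cost roughly $0.10 in requests.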


    Glacier is significantly cheaper than S3
Yes, as long as you put something in and almost never take it out.

    They charge for deletions in the first 3 months
What if this is just disincentive to pull content out and treat Glacier like S3?

    Power is not the driving cost for Internet scale infrastructure
It is not the only cost, but it is still one of the largest factors, no?

    Sony and Panasonic continue to invest in a product that has no visible commercial uptake
That means nothing in and of itself. The only person who will win such a market needs to be one of the first there, innovator's dilemma, etc.

    Facebook believes optical is a reasonable solution to their archive needs
Do they? I saw one mention, in the author's previous post, of James Hamilton commenting on a Facebook cold-storage system using Blu-ray, but it is unclear to me whether it is in production.

Assuming it is true though, it is likely an apples-to-oranges comparison. Glacier provides archival restoration for presumably largely enterprise-level customers. Facebook backs up data from users, and I'd presume this is from deactivated accounts, etc., and unlikely to need urgent restoration.


Is there another example of Amazon doing disincentive pricing?

Every example of their pricing I've seen has been cost plus, cost plus, and/or cost plus.

Thus it seems at least an order of magnitude more likely that the $0.03 reflects some cost.


Fulfilled by Amazon has disincentives for heavy/large objects, for pulling inventory back to mail it to you, and for not prepping your item correctly when shipping it in.


Other than burying the cite (2 links to his own blog) for why the "cost of provisioning a single watt of power is more expensive than 10 years of power consumption," the author's canonical source is eventually a 404:

http://static.googleusercontent.com/media/labs.google.com/en...

The data is also > 7 years old.

EDIT: working link:

http://static.googleusercontent.com/media/research.google.co...


If power provisioning and space are an issue it seems unlikely to me that they're sitting with many drives all hooked up for immediate power-up. It would make more sense to separate the storage of media and the power connection so you get the most from each power connection. Therefore: removable media, whether those are hard drives or optical.

There's a data retrieval period of multiple hours [1]. That doesn't sound like they're just powering up a drive. It sounds like they're moving something around or doing some kind of linear read (as opposed to random-access). I'd bet on "a retrieval job fetches this stack of read-only media and connects it to the powered device, which loads it onto hard drives for quick download".

[1] "Retrieval jobs typically complete within 3-5 hours" - http://aws.amazon.com/glacier/faqs/


I've always thought that the deletion fee for <3-month-old data was a way to limit "misuse of the service" - call it a deterrent if you will - more than actually a way of recouping costs. Because I guess the costs will be there no matter when you delete the data.


I still think it is possible to make the whole thing work using a tape library.


If the goal is to manage capital costs, I don't see how funding the development of an entirely new data storage hardware ecosystem is a reasonable answer. You control capital costs with commodity hardware, not cutting edge.

Yet that is what the author thinks Amazon is doing with BDXL.

So who's funding high-capacity optical storage? Hmm, can we think of a customer who ingests huge amounts of data, wants to keep it for a long time, and has no fear of funding cutting-edge product development? Yes: defense departments.


My friend Robin Harris wrote the original article.

A little off topic, but it seems really strange to me that Amazon is not transparent about the technology. Because of the high charge for fast reads, I tend to believe that the underlying storage is some form of media that gets mounted, perhaps like Facebook's Blu-ray archival system.


Is data really so heterogeneous? How many files in your home directory are generic, with bit-exact duplicates existing in the home directories of many other users? And the remaining files, which are uniquely yours -- what percent of each of them consists of generic data, like file headers?


I have 100 GB of gaming screen captures. That data exists nowhere else (some people may have copies of the final cuts on YouTube, but that's tiny in comparison); if you could store it as input data + game code you could compress it by a lot, but I highly doubt Amazon is that far ahead of mainstream video-encoding technology.

Other than that, I think the big culprit will be photos; everyone's family photos are different (the JPEG header is a tiny proportion of a modern 5MB photo) and that's one of the most popular things for people to back up on these kind of services.

Plenty of data is generic in the way you say, but plenty of it isn't. So I don't think there's any free lunch here.


Don't forget that S3 costs additional $0.12/GB for transfer out.

So either cheap VPSs or dedicated servers are still much cheaper if you don't want to deal with Glacier.


Another question: what is the difference in power consumption between an HDD that's powered up but idle, and one that's being read from or written to?

As an aside, erasure codes allow you to reduce power consumption, as the redundant fragments are only necessary for safety, not for regular retrieval. You don't need Glacier to benefit from that. (But Glacier might be an optimization of that strategy.)


Most of the power is used to keep the drive spinning.




Question: how does Amazon's Glacier relate to this? http://dl.acm.org/citation.cfm?id=1251214


It doesn't. Just coincidence.


Erm, isn't it getting a bit meta to have a post to respond to the comments on HN for a previous post of yours? Surely, that is exactly what the HN comments are for?


Robin's blog post is not only a response to HN comments (as you may have noticed, he addresses an observation made by another commenter on his own blog).

And he not only responds to the comments, but adds plenty of other information and reasons why he suspects optical media is used by Glacier.

So no, I don't see anything "meta" about this post.



