Thursday, March 24, 2011

Tape Is Dead ? No, Not Really...

In recent years, we have seen a shift away from tape media as a means of protecting corporate data. Tape has a long history as the medium of choice, due to its ability to pack data into high-density, low-cost, portable packages. But it is also sensitive to magnetic interference and temperature extremes. Still, the paradigm remains: how does an IT organization protect its valuable data in an offsite, secure way, without incurring unreasonable costs ?

Many companies, like HP, believe that tape is dead. In fact, HP resells a product line from SEPATON - "no tapes" spelled backwards. They believe that the cost of near line storage - inexpensive secondary data arrays - has come down far enough to challenge tape as a low-cost medium. In some respects, that is true.

Others would argue that the near line array needs to be offsite in order to satisfy even the most basic Business Continuity schemes. So the pat response is to house the secondary array in a second data centre ! Well, doesn't that fly in the face of keeping the data on the cheapest media possible ? Imagine the face of the CFO when you suggest you want to DOUBLE his costs for IT !

A recent "replacement" for traditional tape systems is the Virtual Tape Library (VTL). A VTL takes common storage devices, like SAN or NAS, and mimics a tape library. The data is streamed to "tapes", which in reality are small disk volumes on the storage device. I've never really seen the value of a VTL, as you lose the portability.

In all reality, current ideas about Information Lifecycle Management (ILM) point towards a tiered storage approach. Data being used by business applications in day-to-day operations is stored on a primary, high-speed array. When data is considered stale, either due to its age (a few days, for example) or a change in its IO requirements, it can move to a secondary tier of storage, on less expensive and lower performance disks. Finally, data which has not been accessed in a longer period of time (say, 6 months) gets relegated to long-term retention - AKA tape.

This ILM means that data can be moved around automatically, according to pre-defined policies, in line with the business's goals & requirements. Further, tape is still used for long-term data retention, but is not the primary means of recovery. The secondary tier provides the majority of that function, minimizing the tape requirements but not eliminating them.

The opinions expressed in this post are purely those of the author. Opinions are like noses; everyone has one and they are entitled to it !

Monday, March 21, 2011

Data Lifecycle Management

I have been spending a significant amount of time lately researching how computer data storage has been evolving in the last 5 years. As many of you know, I earn my living as a consultant who develops complex IT solutions for my clients - matching appropriate technology solutions to real business problems. Data Lifecycle Management is a hot topic these days !

Business-grade data storage solutions generally take on two flavors: Network Attached Storage (NAS) and Storage Area Network (SAN). Typically, what differentiates these types of data storage is how the host computers connect to them. NAS devices are connected to the Ethernet network, and use TCP/IP protocols like CIFS (Windows) and NFS (UNIX/Linux). A newer standard called iSCSI, which carries block storage traffic over ordinary Ethernet, is seeing a rapid rise in popularity.

NAS devices are attractive to smaller organizations because they are less expensive to deploy. They are simply attached to the existing Ethernet network, and then they expose their data volumes to the host computers. SAN devices require Fibre Channel cabling and switching gear to connect the hosts. This is significantly more expensive than NAS, so it is considered a "higher-end" solution. It is also considered more secure, as the users of the applications and the storage are on physically separate networks, using different access protocols.

In almost all data storage systems, the device is a collection of computer hard disks in an enclosure. It will also house the controller heads (the brains to read & write the data) and the means of connection, be it Ethernet or fibre-channel. The brains provide different means of striping the data across the disks, mitigating the risk that a hardware failure of a single drive permanently destroys data.
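To make the striping-with-parity idea concrete, here is a toy sketch in Python - purely illustrative arithmetic, not any vendor's implementation. The parity block is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt from the survivors:

```python
# Toy illustration of XOR parity (the idea behind RAID 5-style protection).
# Not any real controller's implementation -- just the arithmetic.

def make_parity(blocks):
    """Compute a parity block as the byte-wise XOR of all data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_blocks, parity):
    """Recover one lost block from the survivors plus the parity block."""
    return make_parity(list(surviving_blocks) + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # three "disks"
parity = make_parity(data)

# Simulate losing disk 1 and rebuilding it from the other disks + parity:
recovered = rebuild([data[0], data[2]], parity)
assert recovered == b"BBBB"
```

The same XOR trick works regardless of which single drive fails, which is why one parity drive can protect a whole stripe set.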

Newer data storage systems have evolved from the traditional ones to include different types of computer hard disk in the same array. The heads (brains) understand that the different types of disks will have different price:performance ratios. So rather than treating all workloads the same, those with higher performance requirements - real-time applications or databases - will be intelligently migrated to the faster disks. Those with low performance requirements will move to the less-expensive, lower-performance disks.

Then there are other software advances in the heads (brains) which help by deduplicating the stored data. Imagine you wrote a paper about encyclopedias. Rather than writing out the word "encyclopedia" every time it appears in the paper, it gets written once, and all subsequent copies are stubs which simply point back to the first - like an abbreviation. But the heads can do this with every word in the paper ! This can lead to efficiencies in the storage system of up to 90% ! Then there are other means of making the data storage more efficient, like thin-provisioning.
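As a rough sketch of the mechanism (real arrays deduplicate fixed- or variable-size blocks rather than words, and the block size here is absurdly small for illustration), deduplication boils down to hashing each block and storing only the unique ones:

```python
import hashlib

# Minimal sketch of block-level deduplication: store each unique block once,
# and keep only a list of hash "stubs" pointing back to the stored copies.

BLOCK_SIZE = 8  # tiny, for illustration; real systems use e.g. 4-128 KB blocks

def dedup_store(data, store):
    """Split data into blocks; store unique blocks keyed by hash; return stubs."""
    stubs = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # written once, referenced many times
        stubs.append(digest)
    return stubs

def rehydrate(stubs, store):
    """Reassemble the original data from the stubs."""
    return b"".join(store[d] for d in stubs)

store = {}
data = b"encyclopediaencyclopediaencyclopedia"  # highly repetitive input
stubs = dedup_store(data, store)

assert rehydrate(stubs, store) == data   # nothing is lost
assert len(store) < len(stubs)           # fewer unique blocks than references
```

The savings come entirely from how repetitive the data is, which is why backup data (many near-identical copies) deduplicates so well.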

As you can see, organizations will go to great lengths to protect their data. The final tier in this Data Lifecycle Management paradigm is the role of tape. In traditional data storage systems, data would be backed up onto tapes, which are then sent offsite for safe keeping. Companies such as Iron Mountain have made a great business out of managing the logistics of off-site data storage for other organizations.

But tape is not without its problems. While it IS remarkably inexpensive, in terms of cost-per-Gigabyte stored, it is also somewhat fragile. Like most magnetic media, it is sensitive to strong magnetic fields, and storing your tapes too close to a powerful magnet places them at risk !

Since the price of disk-based storage is rapidly coming down, and high performance flash disks (no spinning platters, these are solid state !) are becoming commonplace, the concept of "near line" storage is really taking hold. Organizations will acquire a second, inexpensive storage array and use it to store copies of the Production data.

This near line storage allows for all kinds of operational efficiencies. In the event that a user accidentally deletes an important file, it is quickly and easily restored from the near line storage system. In a traditional system, the operator would have to identify which tape the file was on, order the tape back from the off-site facility, and, after it arrives, restore the file. This could take a significant amount of time.

With data archival systems (like traditional tape backup systems), there are three key factors: Recovery Time Objective (RTO), Recovery Point Objective (RPO), and - of course - cost. RTO is the amount of time it takes from when the file was lost to the time it has been restored. RPO is how far back in time the deleted file was last copied to the archival medium - that could be measured in minutes, hours, or days ! The total exposure is the sum of RTO + RPO. Finally, cost is tied to those targets: conventional wisdom is that the shorter the outage window, the more expensive the protection scheme.
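To put some (purely illustrative, assumed) numbers on this, the exposure arithmetic looks like:

```python
# Worked example of the RTO + RPO arithmetic described above.
# The figures are illustrative assumptions, not from any real SLA.

schemes = {
    "offsite tape":    {"rpo_h": 24, "rto_h": 8},    # nightly backup, tape recall
    "near line array": {"rpo_h": 1,  "rto_h": 0.5},  # hourly copies, local restore
}

for name, s in schemes.items():
    exposure = s["rpo_h"] + s["rto_h"]  # total window of lost work + downtime
    print(f"{name}: worst-case exposure = {exposure} hours")
```

Running it makes the conventional wisdom visible: the near line scheme cuts the worst-case window from 32 hours to 1.5, but only by paying for a second array.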

So the final tier of Data Lifecycle Management is still tape, but it is used more for long-term data retention. When the organization looks at the age of its data, it may no longer be cost-effective to keep stale, unused files on the near line storage system. Tools are available to keep track of when data is accessed and to migrate it down through the storage tiers, based on policies. For example, a policy can state that anything not accessed in 14 days should be moved from the primary storage system to the near line storage system. And if it remains untouched there for 30 more days, it gets marked for long-term data retention, offsite on tapes.
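The 14-day / 30-day policy above can be sketched as a simple rule - the tier names and thresholds are this example's, not any product's:

```python
from datetime import datetime, timedelta

# Sketch of the tiering policy described above: untouched for 14 days ->
# near line; untouched for 30 more days (44 total) -> offsite tape.

def choose_tier(last_accessed, now):
    idle = now - last_accessed
    if idle >= timedelta(days=44):
        return "tape (long-term retention)"
    if idle >= timedelta(days=14):
        return "near line"
    return "primary"

now = datetime(2011, 3, 24)
assert choose_tier(datetime(2011, 3, 20), now) == "primary"    # 4 days idle
assert choose_tier(datetime(2011, 3, 1), now) == "near line"   # 23 days idle
assert choose_tier(datetime(2011, 1, 1), now) == "tape (long-term retention)"
```

A real ILM tool evaluates rules like this continuously and moves the data without operator involvement.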


The opinions expressed in this post are purely those of the author. Opinions are like noses; everyone has one and they are entitled to it !

Renewable Energy and the Grid

It seems that every time we turn around, our electricity providers are extolling the virtues of going Green. By that they mean we should be using energy from renewable sources, whether wind - like the experimental wind farm near Pincher Creek, in Southern Alberta - or the solar panels that Enmax (the power retailer in Calgary) will install on your roof. It seems they all want to demonstrate how conscious they are of environmental issues.

Don't get me wrong, that's admirable and everything, but until the infrastructure changes, it is really little more than lip-service. "Why do you say that, Mark ?" you ask... Well, we have to remember that most Renewable Energy sources aren't steady-state. The wind goes calm, and the sun hides behind clouds. As such, we can't always predict how much power they will generate.

As I noted in my previous post, the Grid is like a pipe. When the demand for water at the output end changes, so must the call for water at the input end. In Alberta, the Grid is managed in a similar fashion - electrical demand is forecast and provided for in advance. When unplanned spikes or dips happen, the operators of the Grid call upon the producers to ramp up or slow down their production.

But the operator of the Grid can't ask to increase the wind or decrease the sun. It simply doesn't work that way. The generation "curve" from Renewable Energy sources would look very spiky, if you were to graph it on paper.

The problem is further exacerbated by simple physics - supply and demand on the Grid must match in real time, so any electricity not consumed is simply wasted. Like the "water in the pipe" analogy, anything not used before it gets to the end flows out the far end, unusable and unrecoverable.

The State of California is desperate for electricity. Every summer, we read news articles about rolling blackouts and brown-out conditions. So bad is the situation that Google sited a data centre right beside a hydro-electric dam ! But California struggles to use Renewable Energy from their neighbors in Oregon, because they can't manage its production.

"So what's the solution, Mark ?" you ask... Well, I think if we could store any excess power when the Renewable Energy producers can over-generate, we could draw from that reserve to augment the production when the sun sneaks behind a cloud.

I'm quite sure we are all imagining a huge building full of truck batteries, in a quiet corner of the Province ! But we can store power in many different ways. Electricity is really just one form of energy, so it can be stored in other forms, such as heat or mechanical energy.

One power-storage means that I find intriguing stores potential energy in a huge fly-wheel. We use the excess electricity to drive a motor, which in turn spins up the fly wheel. It has lots & lots of mass, so once it is spinning, we need less energy to keep it spinning at the desired rate.

Then, when we need to draw power from the spinning mass, we simply engage a generator - which turns the mechanical energy back into electricity. As long as the mass continues to spin, power is generated. This is not a remarkably efficient means of storing energy, because as soon as the power input is removed, the flywheel's rate of spin begins to decay through friction, eventually stopping altogether.
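A quick back-of-envelope calculation (with made-up but plausible figures) shows how much energy such a mass can hold, using E = ½Iω² with I = ½mr² for a solid disk:

```python
import math

# Back-of-envelope energy stored in a spinning flywheel.
# All figures below are illustrative assumptions, not a real installation.

mass_kg = 5000.0   # a 5-tonne steel disk
radius_m = 1.0
rpm = 3000.0

inertia = 0.5 * mass_kg * radius_m**2   # I = 1/2 * m * r^2 for a solid disk
omega = rpm * 2 * math.pi / 60          # angular speed in rad/s
energy_j = 0.5 * inertia * omega**2     # E = 1/2 * I * w^2

print(f"Stored energy: {energy_j / 3.6e6:.1f} kWh")
```

A few tens of kilowatt-hours per wheel is enough for short "bridging" bursts, which is why flywheels suit smoothing duty better than bulk storage.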

Another, more efficient means of energy storage would pump water from a source in a lower basin UP a hill. The excess electricity powers the pump, and a reservoir at the top of the hill acts as the storage medium. When electricity is required, gates open up, channelling a flow of water through a hydro-electric generator. The trick is having large enough storage reservoirs to handle enough volume to provide power for a reasonable period of time.
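Again with assumed figures, the potential energy E = mgh gives a sense of how much a reservoir of a given size can hold:

```python
# Back-of-envelope capacity of a pumped-hydro reservoir: E = m * g * h.
# Reservoir size, head, and efficiency are illustrative assumptions only.

volume_m3 = 1_000_000    # a 100 m x 100 m x 100 m upper reservoir
head_m = 300             # height of the hill
efficiency = 0.75        # round-trip losses in the pump and generator
g = 9.81                 # gravitational acceleration, m/s^2
density = 1000           # kg/m^3 for water

mass_kg = density * volume_m3
energy_j = mass_kg * g * head_m * efficiency

print(f"Usable storage: {energy_j / 3.6e9:.0f} MWh")
```

Hundreds of megawatt-hours from one modest reservoir is why pumped hydro remains the workhorse of grid-scale storage - the trick, as noted, is finding sites with enough volume and height.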

So the solution is not in the creation of electricity from renewable sources; it's truly more a question of how to manage it. Our current Grid is inelastic, in that it cannot cope with fluctuations in the production of electricity as well as it does with fluctuations in consumption. Implementing some form (or many !) of energy storage would flatten the spiky generation curve, making it easier for the Grid operators to properly manage the load.

The opinions expressed in this post are purely those of the author. Opinions are like noses; everyone has one and they are entitled to it !

Sunday, March 13, 2011

The other side of the equation: how DOES the grid work ?

Astute readers will notice that I haven't written in my journal in quite some time. Life gets busy: a new job, family pressures, and a host of other things have occupied my time. But on a positive note, I have taken a long term contract as an IT Consultant at the Alberta Electric System Operator (AESO). This is an eye-opening experience, and I'd like to share a little about what I have learned so far.


The AESO is mandated by the Provincial Government of Alberta to operate "the Grid". The Grid is a mesh of components: the electricity producers, the transmission facilities, and the retailers, who handle the "last mile" of connection to the homes & businesses in Alberta.

The AESO manages the supply and demand cycle of electricity on the grid. That means it forecasts usage in advance, but tweaks the generation on a minute-by-minute basis. Electricity has to always be on, so imagine it's like a pipe full of water: water goes into the pipe, and flows to the far end, which is open. If you opened a faucet in the middle of the pipe, the amount of water coming out the far end would be diminished. In order to keep the flow and pressure up, the producer must monitor the flow and add more water when the faucet is open - and, equally, reduce production when that faucet is closed again.


Behind the scenes there is a lot of metering, and a lot of logic behind the estimation of demand. Complex models are created which identify how to re-route the electricity should a segment of the grid become unavailable, such as in severe weather when transmission lines get damaged.

The AESO monitors the demand and, as demand rises in the morning or slows in the evening, must call for more power from the producers, or ask the producers to throttle back their production. Internally, it operates much like a commodities market, buying electricity according to demand, and in turn selling it to the retailers.
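As a generic sketch of how such a market dispatches power - not the AESO's actual mechanics - the cheapest offers are called on first, and the last unit dispatched sets the price:

```python
# Toy merit-order dispatch: meet forecast demand from the cheapest offers
# first. A generic sketch with invented figures, not the AESO's real market.

offers = [  # (producer, capacity_MW, offer_price_$_per_MWh)
    ("wind farm",  150, 0),    # renewables typically offer at ~zero cost
    ("coal plant", 600, 35),
    ("gas peaker", 300, 90),
]

def dispatch(demand_mw, offers):
    """Fill demand cheapest-first; return the schedule and clearing price."""
    schedule, remaining = [], demand_mw
    for name, cap, price in sorted(offers, key=lambda o: o[2]):
        take = min(cap, remaining)
        if take > 0:
            schedule.append((name, take, price))
            remaining -= take
    clearing_price = schedule[-1][2]  # price of the last unit dispatched
    return schedule, clearing_price

schedule, price = dispatch(700, offers)
print(schedule, price)
```

With 700 MW of demand, the wind is taken in full, coal covers the rest, and the expensive peaker never runs - which also illustrates why zero-cost renewables, when they show up, pull the clearing price down.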

Of course this is where things get interesting. With traditional electricity-generating methods, the aim is to create so-called "steady state" production, ramping up as demand rises, and ramping down as demand subsides. Producers which are able to adjust more quickly to demand changes have a higher "value" in the market than those which react more slowly. Remember, we are measuring demand on a minute-by-minute basis !


But how do you do that with some of the renewable energy sources ? Wind and solar generation schemes are fickle. It's either windy or it's not. It's either sunny or it's not. From the AESO's point of view, renewable energy is usable, but has ZERO value on the Grid, due to the fact that it cannot ramp up or down according to demand.

When it is available to the Grid, it is added at a net-zero cost, which effectively lowers the market price over a long period of time. In effect, sunny and windy days CAN help lower your power bills ! BUT we also can't count on them, as their production tapers off as the sun ducks behind a cloud, or the wind dies down.

So while we value the electricity generated by non-traditional, renewable energy sources, there are some drawbacks. The Grid wasn't really built to support non-"steady state" producers, so the "value" is zero. But we can all rest assured that at least some of our power is generated from non-polluting sources, and that is worth being proud of.

The opinions expressed in this post are purely those of the author. Opinions are like noses; everyone has one and they are entitled to it !