principal investigators and chief architect for the NSF TeraGrid
Dan Reed recently gave a great presentation on the Future of Cyber-Infrastructure at a SURA meeting. You can see a copy of his presentation at http://www.sura.org/news/2009/it_matsf.html
His basic thesis is that the bulk of academic computing will probably move to commercial clouds. Although some very high-end, closely coupled applications will still need dedicated supercomputers, the majority of academic computing can be done with clouds. Despite the presence of grids and HPC on our campuses, most academic applications still run on small clusters in closets or stand-alone servers. Moreover, the challenge with academic grids is building robust, high-quality middleware for distributed systems and solving the myriad political problems of sharing computational resources across different management domains. As well, the ever-increasing costs of energy, space and cooling will soon force researchers to start looking for computing alternatives. Clouds are a solution to many of these problems and in many ways represent the commercialization of the original vision for grids.
Dan also ruminates about the possibility of building a “follow the
sun/follow the wind” cloud architecture on his blog, which of course
is music to my ears:
**Geo-dispersion: The Other Alternative**
If it were possible to replicate data and computation across multiple, geographically distributed data centers, one could reduce or eliminate UPS costs, and the failure of a single data center would not disrupt the cloud service or unduly affect its customers. Rather, requests to the service would simply be handled by one of the service replicas at another data center, perhaps with slightly greater latency due to time of flight delays. This is, of course, more easily imagined than implemented, but its viability is assessable on both economic and technical grounds.
In this spirit, let me begin by suggesting that we may need to
rethink our definition of broadband WANs. Today, we happily talk of
deploying 10 Gb/s lambdas, and some of our fastest transcontinental
and international networks provision a small number of lambdas (i.e.,
10, 40 or 100 Gb/s). However, a single-mode optical fiber
has much higher total capacity with current dense wavelength division
multiplexing (DWDM) technology, and typical multistrand cables contain many
fibers. Thus, the cable has an aggregate bandwidth of many terabits,
even with current DWDM.
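The aggregate-bandwidth claim above is simple multiplication, and a minimal sketch may make it concrete. The fiber count, lambdas per fiber, and per-lambda rate below are illustrative assumptions, not figures from the original post or from any particular cable.

```python
def aggregate_capacity_gbps(fibers: int, lambdas_per_fiber: int, gbps_per_lambda: float) -> float:
    """Total capacity of a cable if every lambda on every fiber were lit."""
    return fibers * lambdas_per_fiber * gbps_per_lambda


# Hypothetical cable: 24 fibers, 80 DWDM lambdas per fiber, 10 Gb/s per lambda.
# All three numbers are assumptions chosen only for illustration.
cable_gbps = aggregate_capacity_gbps(fibers=24, lambdas_per_fiber=80, gbps_per_lambda=10)

# Convert Gb/s to Tb/s for readability.
print(f"Aggregate capacity: {cable_gbps / 1000:.1f} Tb/s")
```

Under these assumed values the cable totals 19.2 Tb/s, while a single provisioned 10 Gb/s lambda uses only a small fraction of that.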
Despite the aggregate potential bandwidth of the cables, we are
really provisioning many narrowband WANs across a single fiber.
Rarely, if ever, do we consider bonding all of those lambdas to
provision a single logical network. What might one do with terabits of
bandwidth between data centers? If one has the indefeasible right to
use (IRU) or owns the dark fiber, one need only provision the equipment
to exploit multiple fibers for a single purpose.
Of course, exploiting this WAN bandwidth would necessitate dramatic
change in the bipartite separation of local area networks (LANs) and
WANs in cloud data centers. Melding these would also expose the full
bisection bandwidth of the cloud data center to the WAN and its
interfaces, simplifying data and workload replication and moving us
closer to true geo-dispersion and geo-resilience. There are deep
technical issues related to on-chip photonics, among others, to make
this a reality.
In the end, these technical questions devolve to risk assessment and
economics. First, the cost of replicated, smaller data centers without
UPS must be less than that of a larger, non-replicated data center
with UPS. Second, the wide area network (WAN) bandwidth, its fusion
with data center LANs and their cost must be included in the economic
analysis.
These are interesting technical and economic questions, and I invite
economic analyses and risk assessments. I suspect, though, that it is
time we embraced the true meaning of high-speed networking and put our
eggs in multiple baskets.
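
The two economic conditions in the quoted passage can be sketched as a single comparison: several smaller data centers without UPS, plus the WAN that ties them together, must cost less than one larger data center with UPS. The function names and every cost figure below are hypothetical placeholders in arbitrary units; a real analysis would also need to price risk, which this sketch leaves out.

```python
def replicated_total_cost(sites: int, site_cost_no_ups: float, wan_cost: float) -> float:
    """Cost of several smaller, UPS-free data centers plus the WAN linking them."""
    return sites * site_cost_no_ups + wan_cost


def geo_dispersion_is_cheaper(
    sites: int,
    site_cost_no_ups: float,
    wan_cost: float,
    single_site_cost_with_ups: float,
) -> bool:
    """True when the replicated design costs less than one large site with UPS."""
    return replicated_total_cost(sites, site_cost_no_ups, wan_cost) < single_site_cost_with_ups


# Hypothetical figures in arbitrary cost units, chosen only for illustration.
cheaper = geo_dispersion_is_cheaper(
    sites=3,
    site_cost_no_ups=30.0,
    wan_cost=15.0,
    single_site_cost_with_ups=120.0,
)
print("Geo-dispersion is cheaper:", cheaper)
```

With these assumed numbers the replicated design totals 105 units against 120 for the single site, so the comparison comes out in favor of geo-dispersion; different inputs can easily reverse the result.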