NVIDIA GPGPU Conference

Posted: October 12th, 2009 | Author: | Filed under: GPGPU | No Comments »

I went to the NVIDIA GPU Technology Conference in San Jose at the beginning of October and there were a few interesting things worth pointing out.  The show highlighted how scientists are using the technology, and how businesses are trying to capitalize on it. Some of the scientific projects taking advantage of the low cost, highly parallel computational capacity of GPUs were really amazing. On the other hand, with a few exceptions, the businesses highlighted were surprisingly uninteresting. I guess this is a good thing for me. It shows that other than GPU-izing existing applications, new businesses haven’t really figured out what new things are possible with the technology.

Here is a smattering of cool things to check out:

  • Pat Hanrahan – A professor of CS at Stanford, he showed two very cool things. One is a visual query language, now embodied in Tableau Software, a great Seattle start-up. (Tableau does to visualizing data what Lotus/Excel did for modeling it.)  Later he showed a very ambitious project at Stanford on Domain Specific Languages (DSLs). Long story short: the idea is to enable domain specialists (for instance, Computational Fluid Dynamics experts) to be able to focus on modeling their science, and not on implementing their modeling. For example, for CFD, a DSL could allow researchers to work on modeling the science and not modeling the implementation of meshes or data structures for computing. Then the compiler could take care of the mesh implementations and data structures. And even better, the compiler can optimize the computation for different computing platforms, such as GPUs. Very cool stuff.
  • Computer Vision with Horst Bischof – Horst showed a lot of very cool things, including this one: Building Rome in a Day.  Here, they imported about 200,000 amateur photos of Rome from Flickr, used a bunch of computation, and built a highly accurate 3-D model of Rome. Awesome.
  • Hanspeter Pfister of Harvard showed some amazing research projects going on at Harvard.  The common element was the extraordinary amount of data processing involved. One was reconstructing the neural circuitry in brains. Another was massive arrays of radio telescopes. Check out the video.
  • One company really doing some cool stuff was MotionDSP. These GPUs are great at visual computation, unsurprisingly, and MotionDSP really takes advantage of them. They process video to clean up the images, either for consumers or for police / defense forensics. Watch the videos.
  • Enodo is both really boring and really interesting. They are boring because they don’t seem to create any real technology but simply implement it for consulting projects. They use the CryEngine (the technology behind the video game Crysis) to build 3-D models for training and other uses. For instance, they modeled an oil rig and built simulations to train people on what to do in emergencies.  Instead of reading training documents, people work through realistic 3-D simulations – essentially playing training video games. Here’s what’s cool: First, they are licensing a technology that was designed for video gaming and using it for something more valuable. Wait – sounds exactly like GPGPU, doesn’t it? Second, what a great idea – real-world, 3-D simulations as part of a normal business. Given how realistic 3-D simulations and rendering are these days, and the trajectory of technology, it seems clear that this will become more commonplace.

These are pretty sketchy writeups. Follow the links to learn more. They are the cream of the crop – you won’t be disappointed.


GPUs for Databases

Posted: September 22nd, 2009 | Author: | Filed under: GPGPU | 1 Comment »

Here’s a really good article on how GPUs may revolutionize databases: Why graphics processors will transform database processing

This is worth a read because it gives a good overview of the potential of GPUs overall, and not just with respect to databases.  The authors also restated my thesis pretty well:

GPUs per se do not enable anything radically different from what can already be done with today’s CPUs. However, they may very well be the key to an epochal change. GPUs are democratizing supercomputing the way the PC democratized computing, making an enormous amount of computational power—previously the exclusive domain of government agencies, research institutes, and large companies—available to the masses.

As for GPUs with respect to databases, here is an interesting point:

Enterprise data have been growing at a slower rate than the number of transistors on microchips (see Moore’s Law), so computer memory is growing faster than the amount of enterprise data. … The implications of that fact are tremendous: It is now possible to handle huge databases in main memory, which means that the data can be pulled in microseconds, as opposed to the several milliseconds needed for disk access.

This mirrors something I heard from the dean of Stanford’s CS dept (who is a database researcher). She said that databases are no longer I/O bound since you can increasingly store all the data in memory instead of disk.


Agent Based Modeling

Posted: August 21st, 2009 | Author: | Filed under: GPGPU | No Comments »

One application for GPGPU is Agent Based Modeling (ABM).  ABM is a technique where you model the behavior of a system, not by modeling aggregate or average aspects of the system, but instead by modeling the activities of individual agents. For instance, think about modeling traffic. One way to do it is to say that the average speed of a motorist is X, there is a Y chance of an accident, if there is an accident the flow is impacted in a certain way, etc. With agent based modeling, instead of dealing with aggregates, you deal with the individual actors. So each motorist has an individual speed profile, likelihood of getting into an accident, location they are going to, etc. Then instead of trying to solve for the global solution directly, you arrive at it by programming a million individual motorists and then watching what happens. (This is obviously a highly parallel problem, hence its suitability for GPGPU.) [Wikipedia on ABM]

Agent Based Modeling can be applied to a wide range of phenomena, including modeling:

  • Traffic patterns
  • The flow of people leaving a stadium
  • The path of flocking birds
  • The emergence of genocide
  • Macroeconomic activity (emerging from the activity of individual agents)
  • Swine flu propagation

One of the key elements of ABM is the concept of emergence.  “Emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions,” often in unpredictable ways. For instance, imagine a bunch of people trying to leave a packed stadium quickly. Each has a few simple rules about how he or she behaves. It turns out that if you put a column in front of the exit door, slightly to one side, the throughput of exiting increases. This is a non-obvious result that emerges, somewhat surprisingly, from the interaction of many agents following simple individual rules of behavior.

A Simple ABM Example: Boids

Simulator (Video): http://www.dcs.shef.ac.uk/~paul/publications/boids/index.html

Imagine trying to program the behavior of a flock of birds. Sounds bafflingly difficult, don’t you think? Well, it turns out that you can do a pretty good job modeling the behavior of an entire flock by creating a bunch of fake birds (or “boids” in this case), giving them a few simple rules of interaction, and then turning them loose. Recognizable patterns of flocking result.

The model for the above simulation is slightly more complex, but the original boids model had three rules for each boid:

  1. Separation: Steer to avoid crowding local flockmates
  2. Alignment: Steer towards the average heading of local flockmates
  3. Cohesion: Steer to move toward the average position of local flockmates

That’s it. Give a boid these simple rules, put a bunch together in a fake 3D space, and watch them fly like a real flock of birds. This really shows the power of ABM. Simple and understandable rules can lead to interesting and complex group behavior.
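To make the three rules concrete, here is a rough Python sketch of one simulation tick. It is not the code behind the simulator linked above: it works in 2D rather than 3D to stay short, it uses average velocity as a stand-in for "average heading", and every weight, radius, and function name is an assumption picked for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=10.0, sep_dist=2.0,
         w_sep=0.05, w_ali=0.05, w_coh=0.01, max_speed=2.0):
    """Advance every boid one tick using the three rules (all weights are assumed)."""
    updated = []
    for b in boids:
        # "Local flockmates" are the other boids within `radius`.
        neighbors = [o for o in boids
                     if o is not b and math.hypot(o.x - b.x, o.y - b.y) < radius]
        vx, vy = b.vx, b.vy
        if neighbors:
            n = len(neighbors)
            # Rule 1, separation: steer away from flockmates that are too close.
            for o in neighbors:
                if math.hypot(o.x - b.x, o.y - b.y) < sep_dist:
                    vx += w_sep * (b.x - o.x)
                    vy += w_sep * (b.y - o.y)
            # Rule 2, alignment: nudge velocity toward the neighbors' average velocity.
            vx += w_ali * (sum(o.vx for o in neighbors) / n - b.vx)
            vy += w_ali * (sum(o.vy for o in neighbors) / n - b.vy)
            # Rule 3, cohesion: nudge toward the neighbors' average position.
            vx += w_coh * (sum(o.x for o in neighbors) / n - b.x)
            vy += w_coh * (sum(o.y for o in neighbors) / n - b.y)
        # Cap the speed so the flock does not accelerate without bound.
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        updated.append(Boid(b.x + vx, b.y + vy, vx, vy))
    return updated


def make_flock(n=50, size=50.0, seed=0):
    """Scatter n boids at random positions with random velocities."""
    rng = random.Random(seed)
    return [Boid(rng.uniform(0, size), rng.uniform(0, size),
                 rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
```

Calling `step(make_flock())` in a loop moves the flock forward tick by tick. Note that each boid's new state is computed only from the previous tick's state, so every boid can be updated independently. That independence is what makes this kind of model a natural fit for a GPU.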

ABM Example: Emergence of Riots

With boids, you can see how simple rules can lead to recognizable behavior. In the social sciences, ABM has been done to give insight into the causes of group behavior.

Here is a fantastic paper on an ABM that shows the emergence of civil violence and genocide: Link.  It’s not difficult, but does take more effort to read than watching boids fly. But it’s worth it.

Imagine a grid of squares. Each square is either empty or occupied by a person or a cop. Each person feels some amount of personal hardship (eg: poverty) and some amount of faith in the legitimacy of the government. Put the two together, and each person feels some grievance towards the government. (EG: If you are hungry and you believe the government is corrupt, you’ll hate the govt. If you’re hungry but you believe the govt. is doing its best, you won’t blame them.) Based on this grievance, you may decide to rebel against the government. (Say, by rioting.)

What’s holding you back are cops. Or, more specifically, your fear of getting arrested by the cops, which is dependent on the number of cops near you, and the likelihood that they will arrest you.

That’s the essence of the model:

  • Agent rule: If your grievance > your fear of getting arrested, go active. Otherwise, be quiet.
  • Cop rule: Arrest one random rioter near you.

Then you can run the simulation (thousands of times) and see what happens. And what happens looks a lot like real mob behavior.
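The two rules are small enough to sketch in a few lines of Python. This is an illustration of the idea rather than the paper's actual equations: the way hardship and legitimacy are combined, the shape of the arrest-probability curve, the constant `k`, and all the names below are assumptions. Arrest is also simplified to "flip the agent back to quiet" instead of removing them from the grid.

```python
import math
from dataclasses import dataclass


@dataclass
class Person:
    hardship: float       # 0..1, e.g. poverty
    risk_aversion: float  # 0..1, assumed personal trait
    active: bool = False  # True means currently rioting


def grievance(person, legitimacy):
    # Assumed form: hardship counts for more the less legitimate the regime looks.
    return person.hardship * (1.0 - legitimacy)


def arrest_probability(cops_nearby, actives_nearby, k=2.3):
    # Assumed form: more cops per rioter means a higher chance of arrest.
    # The person counts themselves as one would-be rioter, hence the +1.
    return 1.0 - math.exp(-k * cops_nearby / (actives_nearby + 1))


def decide(person, legitimacy, cops_nearby, actives_nearby):
    """Agent rule: go active if grievance exceeds fear of arrest, else be quiet."""
    fear = person.risk_aversion * arrest_probability(cops_nearby, actives_nearby)
    person.active = grievance(person, legitimacy) > fear
    return person.active


def cop_arrest(nearby_people, rng):
    """Cop rule: arrest one random rioter nearby (here: just mark them quiet)."""
    actives = [p for p in nearby_people if p.active]
    if not actives:
        return None
    arrested = rng.choice(actives)
    arrested.active = False
    return arrested
```

With these assumed forms, an aggrieved but cautious person stays quiet when several cops are nearby and nobody else is rioting, and goes active when the same neighborhood has one cop and twenty rioters, because the perceived chance of being the one arrested drops as more people riot.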

But then it gets interesting.  You can actually learn something from the model.

  • When a cop is nearby, agents go inactive (ie: non rebellious). When the cop leaves, they rebel again.  Which makes sense when you think about it, but it wasn’t anticipated by the model authors.
  • The illegitimacy of a regime isn’t what is important – it is fast changes in perceived illegitimacy that precipitate rebellion. Here, the modelers slowly reduced the legitimacy of the regime from very high to very low, but no rebellion broke out. Why? Because as legitimacy decreased, some small numbers of agents would be pushed over the tipping point from quiet to rebellious. But then they were quickly arrested by the cops, never catalyzing a full-scale rebellion. But when illegitimacy spiked, lots of people simultaneously went from quiet to rebellious. The cops couldn’t keep up. So more people saw that others were rebelling and felt emboldened themselves and went active. The result was a mob / rebellion / riot. Once again, in retrospect this makes sense. But it would have been difficult to predict.

I’m not doing this paper justice. It’s well worth reading.

Conclusion

Agent Based Modeling is a very cool and very powerful modeling technique that is still in the early days of adoption.  It also scales very well to GPU acceleration.  It is an area I am going to continue to look into.


Throughput vs. Latency Processing

Posted: August 12th, 2009 | Author: | Filed under: GPGPU | No Comments »

Bill Dally, the Chief Scientist at Nvidia, has a very interesting classification of throughput vs. latency processing.

However you package it, the PC of the future is going to be a heterogeneous machine. It could have a small number of cores (processing units) optimized for delivering performance on a single thread (or one operating program). You can think of these as latency processors. They are optimized for latency (the time it takes to go back and forth in an interaction). Then there will be a lot of cores optimized to deliver throughput (how many tasks can be done in a given time). Today, these throughput processors are the GPU. Over time, the GPU is evolving to be a more general-purpose throughput computing engine that is used in places beyond where it is used today. (Source)

It makes a lot of sense to think about chips this way.  General purpose CPUs (like Intel x86 chips) have been optimized to reduce latency.  They have all sorts of fancy predictive logic to guess what instructions could be executed next, big caches to keep data local, and are designed to keep the CPU pipe full.  (They have all sorts of tricks for achieving this: superscalar execution, out-of-order execution, instruction pipelining, etc.) But at some point, this breaks down. Your cache may have a high hit rate, but it isn’t 100%. You may guess pretty well at future operations, but eventually the logic required to support this guessing hits diminishing returns.

Throughput processors (eg: GPUs) are different. They don’t have huge caches or complex management logic to fill the instruction pipe. But if you can feed them, they can crank. That’s why a current Intel x86 CPU has 4 cores, and a current Nvidia GPU has 240.  They spend their transistors on different logic. For GPUs, it is having a lot of very simple processors that can’t do much, but can do what they do very quickly.  The challenge is keeping these throughput processors full and busy, so you can take advantage of their potential speed.
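A toy calculation shows the trade-off. In the sketch below, the core counts (4 and 240) are the ones mentioned above, but the per-task times are made-up units, chosen only so that each GPU core is ten times slower than a CPU core. The model also assumes the tasks are fully independent and ignores memory transfers, which real workloads cannot.

```python
import math


def time_to_finish(num_tasks, cores, time_per_task):
    """Wall-clock time when tasks are independent and each core runs one at a time."""
    batches = math.ceil(num_tasks / cores)
    return batches * time_per_task


# Core counts come from the post; per-task times are illustrative assumptions.
latency_cpu = dict(cores=4, time_per_task=1.0)        # few fast cores
throughput_gpu = dict(cores=240, time_per_task=10.0)  # many slow cores

for n in (1, 4, 40, 10_000):
    cpu = time_to_finish(n, **latency_cpu)
    gpu = time_to_finish(n, **throughput_gpu)
    print(f"{n:>6} tasks: CPU {cpu:>7.1f}  GPU {gpu:>7.1f}")
```

Under these assumptions a single task finishes ten times sooner on the latency processor, the two break even at 40 tasks, and 10,000 tasks finish roughly six times sooner on the throughput processor. That only holds if there are enough independent tasks to keep all 240 cores busy.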


GPGPU Application Categorization

Posted: August 12th, 2009 | Author: | Filed under: GPGPU | 1 Comment »

In trying to figure out what GPUs can be used for in the future, it seems instructive to start by looking at what they are currently being used for.  What follows is a pretty boring and dry, but I believe important, overview of the types of things GPUs are being used for (beyond graphics). My source data is the NVIDIA CUDA site.

Disclaimer: NVIDIA isn’t the only GPU maker whose chips support general-purpose processing, but it seems to have the greatest current traction. Also, the user-submitted content on the NVIDIA site certainly has its biases, but it has info on over 500 applications, so it’s an interesting source for real-world examples.

High-Level Categorization

Below is a rough categorization of different types of applications. It isn’t comprehensive, or even accurately categorized, but should give you a high-level feel for what is going on.

  • Numerical / Scientific computation
    • Computational Fluid Dynamics
    • Signal Processing
    • Computational Chemistry
    • Neural Networks
    • Cryptography
    • Genetic Programming
    • Algorithms
      • Linear algebra
      • Linear optimization
      • Sparse matrix vector product
      • Gaussian mixture models
      • Stochastic differential equations
      • Fourier transforms
      • k Nearest Neighbor
      • 3D Particle Boltzmann solver
      • Parallel sorting
      • List ranking
      • Traveling salesman problem
  • Imaging
    • Medical Imaging
      • Image reconstruction
      • Image compression
    • Other
      • Ray tracing
      • Holography
  • Oil & Gas exploration
  • Finance
  • Hybrid physics / visualization
  • Gaming

Examples

The above list may give you a general notion of the types of issues, but let me dig in to a few to give you some deeper insight.

Scientific Computation: Computational Fluid Dynamics

A great way to get a sense for what scientific computing is about is to look at the Wikipedia entry for CFD.  Below is an extended excerpt. Skimming it should give you a good flavor.

Computational fluid dynamics (CFD) is one of the branches of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows. Computers are used to perform the millions of calculations required to simulate the interaction of liquids and gases with surfaces defined by boundary conditions. Even with high-speed supercomputers only approximate solutions can be achieved in many cases.

The most fundamental consideration in CFD is how one treats a continuous fluid in a discretized fashion on a computer. One method is to discretize the spatial domain into small cells to form a volume mesh or grid, and then apply a suitable algorithm to solve the equations of motion (Euler equations for inviscid, and Navier-Stokes equations for viscous flow). In addition, such a mesh can be either irregular (for instance consisting of triangles in 2D, or pyramidal solids in 3D) or regular; the distinguishing characteristic of the former is that each cell must be stored separately in memory. Where shocks or discontinuities are present, high resolution schemes such as Total Variation Diminishing (TVD), Flux Corrected Transport (FCT), Essentially NonOscillatory (ENO), or MUSCL schemes are needed to avoid spurious oscillations (Gibbs phenomenon) in the solution.

If one chooses not to proceed with a mesh-based method, a number of alternatives exist, notably :

It is possible to directly solve the Navier-Stokes equations for laminar flows and for turbulent flows when all of the relevant length scales can be resolved by the grid (a Direct numerical simulation). In general however, the range of length scales appropriate to the problem is larger than even today’s massively parallel computers can model. In these cases, turbulent flow simulations require the introduction of a turbulence model. Large eddy simulations (LES) and the Reynolds-averaged Navier-Stokes equations (RANS) formulation, with the k-ε model or the Reynolds stress model, are two techniques for dealing with these scales.

In many instances, other equations are solved simultaneously with the Navier-Stokes equations. These other equations can include those describing species concentration (mass transfer), chemical reactions, heat transfer, etc. More advanced codes allow the simulation of more complex cases involving multi-phase flows (e.g. liquid/gas, solid/gas, liquid/solid), non-Newtonian fluids (such as blood), or chemically reacting flows (such as combustion).
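To give a feel for what "discretize the spatial domain into small cells" means in practice, here is a deliberately tiny Python sketch. It does not solve the Navier-Stokes equations. It swaps in a much simpler problem, 1D diffusion on a regular periodic grid with an explicit finite-difference update. The function name and the `alpha` value are assumptions for illustration.

```python
def diffuse_step(u, alpha=0.25):
    """One explicit finite-difference step of 1D diffusion on a periodic grid.

    u     -- list of cell values (e.g. temperature or concentration per cell)
    alpha -- diffusion number; this simple scheme is only stable for alpha <= 0.5
    """
    n = len(u)
    # Each cell's new value depends only on itself and its two neighbors
    # from the previous step; u[i - 1] wraps around to the last cell when i == 0.
    return [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[(i + 1) % n])
            for i in range(n)]


# A spike in the middle of the grid spreads out to its neighbors over time.
cells = [0, 0, 1, 0, 0]
cells = diffuse_step(cells)
```

Every cell is updated with the same small formula using only nearby values from the previous step. Real CFD codes are far more involved, but that "same operation on millions of cells" shape is what makes the field a good match for GPUs.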

Imaging: Tomographic Reconstruction

“Tomography is imaging by sections or sectioning.” For instance, when you take a CT-scan, you are taking lots of individual slices of a picture, and then you need to put the data together. “Reconstruction” is the process of putting these different slices together.

Here’s what’s cool. There is a trade-off between the number of slices and detail of the slices you take and the computation required to reconstruct an image. You can take less data (and hence have the patient spend less time strapped into a CT scanner, or process more patients), but then it might take days to process the data. But with GPU computing you get the best of both worlds: fast scanning and fast reconstruction. Thus, GPGPU is significantly changing what is possible.

There is a great video that talks about one example of this: http://fastra.ua.ac.be/en/index.html

Video Enhancement / Cleanup

This is such a no-brainer. Someday this will be standard.  Check it out: http://www.vreveal.com/video_demos

Performance Improvement

I’ll be shocked if anyone’s made it this far. I know I’d have quit. But at the risk of burying the lead, here’s the cool part.

The performance improvement demonstrated with some of these CUDA applications (ie: GPGPU apps) is pretty remarkable. Some show modest improvements of 3x to 10x. Not bad, but not revolutionary. Many, however, show speedups in the 30x to 40x range. And these are compared to apps often already optimized for CPUs. And some algorithms or apps show speedups in the 100x to 300x range. That’s obviously amazing.  (Though, per Amdahl’s Law, if the algorithm is a small portion of the total computation time, that isn’t that helpful.)
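The Amdahl's Law caveat is easy to put numbers on. The sketch below uses the standard formula; the function and parameter names are mine, and the example fractions are illustrative rather than measurements from any of the CUDA applications.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the runtime gets faster.

    accelerated_fraction -- share of the original runtime spent in the accelerated part (0..1)
    kernel_speedup       -- how many times faster that part becomes
    """
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


# Illustrative cases: a 100x faster algorithm that is 50%, 95%, or 100% of the runtime.
for fraction in (0.5, 0.95, 1.0):
    print(f"{fraction:.0%} of runtime accelerated 100x -> "
          f"{amdahl_speedup(fraction, 100):.1f}x overall")
```

So a 100x speedup on an algorithm that accounts for half the total runtime yields just under 2x overall, while the same speedup on 95% of the runtime yields roughly 17x.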

Conclusion

Not surprisingly, the majority of the effort in GPGPU to date has been in hard-core scientific and mathematical computations. These are the areas that lend themselves to parallel computing of floating point operations, the problems have been studied and worked on for years, and the jump to GPUs is obvious (though difficult). Yet the performance improvements can be remarkable.

I still believe my original thesis: that this sort of massive computing power will have impact for general business applications, and not just be relegated to traditional HPC-type problems.


Geek Stuff: GPU Computing

Posted: August 3rd, 2009 | Author: | Filed under: GPGPU | No Comments »

I started this blog after learning that you can now buy a 4 TFLOP supercomputer for under $10,000.  This post will examine how this is possible and the implications and challenges for the future.

History: CPUs and GPUs

CPUs (Central Processing Units) are the chips that run a computer. Over time they have grown faster and smarter and capable of doing more complex things. GPUs (Graphics Processing Units) are specialized chips that are focused on a much narrower set of tasks: doing all the things necessary to draw fancy graphics on your screen. Over time, GPUs also got faster, but did so in large part by becoming highly parallel.  CPUs have also gone multi-core, but a multicore CPU may have 4 cores, while a current GPU may have 240 smaller, more specialized processing units.

Recently, GPUs have made their specialized functions more programmable. And while they still are not capable of doing everything needed to run your computer (ie: they are not CPU replacements), they can now run tasks beyond merely drawing things on your screen.  It turns out that the hardware required for drawing on your screen is also good for generalized mathematical processing. By making the GPUs programmable, manufacturers are now opening up these chips to developers.

Limitations and Challenges of GPU Computing

GPU Computing (or General Purpose computing on GPUs) holds great potential, but it also has significant limitations.  For instance:

  • GPUs are massively parallel. A 4 TFLOP NVIDIA computer has almost 1000 cores. Programming massively parallel computers is very difficult, even for smart people. (This was drilled home to me when I spoke to two super-smart CS professors, both of whom said parallel programming is hard, even for them.)
  • GPU programming is highly constrained. First, you need a task that can be parallelized. Not all can.  But even then you need tasks that are “embarrassingly parallel” – tasks that can be split up and processed without a lot of interaction between the atomic tasks. GPUs can be really fast with these sorts of problems, but if threads need to share data amongst each other, a low-end GPU computer will bog down. (The memory architecture isn’t up to the task.)
  • The tools and skillsets needed to support this sort of programming are nascent at best.  We’re early in the world of broadly adopted parallel programming.
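For readers who have not run into the term, here is a small Python sketch of what "embarrassingly parallel" looks like in code. The workload is a made-up stand-in (a simple pseudo-random number chain), and the function names are hypothetical. It uses threads from the standard library purely to show the decomposition. In CPython, threads will not actually speed up CPU-bound work like this, and on a GPU each task would instead map to its own hardware thread.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chain(seed, steps=1000):
    """One unit of work: a chain where every step needs the previous result.

    The chain itself is inherently serial and cannot be split up.
    """
    x = seed
    for _ in range(steps):
        x = (1103515245 * x + 12345) % 2**31
    return x


def run_many_chains(seeds, workers=8):
    """Embarrassingly parallel: the chains never read each other's results,
    so they can be handed to workers in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_chain, seeds))


results = run_many_chains(range(100))
```

The point is the shape of the problem: one long chain gains nothing from extra cores, while a hundred independent chains can keep a hundred cores busy. Most real programs sit somewhere in between, which is where the difficulty starts.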

Opportunity

Despite these challenges, it seems to me inevitable that industry will find a way to overcome them and take advantage of this new processing power.  First, the switch to parallel programming has to happen. Main CPUs have already gone multi-core, and soon will be 8, then 16, 32, 64, then 128 cores. To take advantage of this, parallel programming will be a must.  GPUs (with 240 cores) are ahead of the curve, but the industry will catch up.  And second, industry always finds ways to take advantage of new technologies.  It’s not obvious why a construction company needs 4 TFLOPs of power – but eventually they will have it.  (Ok – maybe it’s not that hard to imagine a construction company having very sophisticated computer models of the building they will construct, rather than mere blueprints.)

Right now, much of the potential of GPU computing is being applied to traditional supercomputer or computationally intensive applications: scientific modeling, Computer Aided Design, Computer Aided Diagnosis (eg: having a computer automatically read a CAT scan), oil & gas exploration, etc. These are arenas where people already had difficult, computationally oriented challenges, and applying the new technology is more straightforward. (ie: do the same thing, just better / cheaper / in more situations).  NVIDIA has seen speedups of 3x to 40x for some of these applications.

More interesting to me is which general business applications could benefit from this processing power. And here, for applications that are not already optimized for parallel processing, NVIDIA has seen speedups of as much as 100x.  That’s amazing – two orders of magnitude is transformational. So, applications will arise, and it is up to the entrepreneur to find them.


The computing revolution no one knows about

Posted: July 24th, 2009 | Author: | Filed under: GPGPU | 1 Comment »

I’m a pretty technical and informed person when it comes to information technology. I’m not an engineer (any more), but compared to most business people working in the industry, I know a fair bit. But I had no idea that there is a massive revolution going on in the computing industry. The fundamental paradigm that has powered the industry for the last 20 years has changed. It amazes me I didn’t know this. And I’m guessing that many techno-savvy people don’t either.

The Past

Since the mid-80s, the computer industry has been built on the fact that the speed of computers doubles every 18 months. This is widely described as “Moore’s Law,” but Moore’s Law is slightly different.  Moore’s Law states that the density of transistors in a chip doubles every 18 months.  For nearly 20 years microprocessor companies could squeeze more transistors into their chips and run them twice as fast – doubling the frequency the chips run at. This doubling of frequency directly led to a doubling of performance. (Frequency ~  the number of instructions that can be run in a second.) Thus, year after year, the speed of a single processor grew at about 52% per year.  This predictable increase in processing power has driven the growth of the computer industry.  (Think about how awesome your iPhone is.  That wasn’t possible 4 years ago.)

The Change

People have been predicting the demise of Moore’s Law for years, but even with existing technology projections, Moore’s Law seems to have at least a few cycles left. That said, the predictable implication of Moore’s Law – that single processor performance doubles every 18 months – has already broken.

This is a huge huge point and suggests a fundamental shift in the computer industry. But before I discuss the implications, let me explain what is going on.

For 20 years, Intel kept doubling the frequency of their chips, thereby doubling performance. But in 2004, Intel hit “the power wall.” (Source) The power consumption of a chip is directly related to the frequency it runs at. So every time you double the frequency (other things being equal), you double the power required to run the chip. This gets costly in terms of straight electricity cost, but you also have to spend a lot on expensive air conditioning to suck the excess heat away. And the further you push the chips, the more these power costs dominate the benefit of increased performance. And on top of that, it gets difficult to cool these chips even if you wanted to. So single-processor performance stalled.

But Moore’s Law keeps trucking, so what can you do with the extra transistors? Well, instead of building a processor that is twice as fast, just build two of them.  When transistor density doubles again, build 4. Then 8. Etc.  You can see this taking place already in dual core and now quad core processors.

The Hidden Revolution

Anyone who’s been paying attention to computers knows that we’ve moved to a multi-core world. And maybe in the back of our minds we’ve wondered about the implications. Is 2 really twice as good as 1? But I could imagine one processor is doing my antivirus check while one lets me surf the net and do email. And when we get to 4? And 8? 16? … 128 cores? What in the world would my laptop do with 128 cores? And would it really be 128 times as fast as 1 core?

There’s the hidden revolution: the answer is, basically, “no.”  Or, at least, a 128 core computer is not going to be 128 times as fast as a 1 core computer without some serious changes in the industry.  That’s the revolution.

In the past, if you wrote a program and didn’t touch it, in 7 generations (about 18 months * 7 = 10.5 years), your program would run 128 times faster. (Ignoring I/O issues, which are obviously crucial.) With many-core, to take advantage of the 128 cores, you would have to entirely rewrite your code to do parallel processing.  And this is assuming it is even possible to parallelize your code. If your program is fundamentally serial, then 7 microprocessor generations down the road it might not run any faster.
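The "7 generations" figure is just a base-2 logarithm, as this small Python check shows. The function name is made up, and the 18-month doubling period is the assumption used throughout this post.

```python
import math


def years_to_speedup(target_speedup, months_per_doubling=18):
    """How many doublings, and how many years, to reach a given speedup."""
    doublings = math.log2(target_speedup)
    return doublings, doublings * months_per_doubling / 12


doublings, years = years_to_speedup(128)
print(f"{doublings:.0f} doublings, {years:.1f} years")
```

The same arithmetic applies to core counts: going from 1 core to 128 is also seven doublings, which is why the two scenarios line up so neatly in the comparison above.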

What This Means

There is a fundamental shift that needs to take place in the computing industry, from serial programming, to parallel programming. This is a very non-trivial change. Some companies, like Intel, are betting that it is too big a change, and their goal is to hide the complexity and paradigm shift from users beneath smarter processors, compilers, and OS’s. My gut feel, admittedly knowing little, is this is a short term solution at best. If the future of computing is parallel, then the future of programming will be parallel too, and all new students coming out of college or grad school will be well versed in parallel programming models.

Another implication may be that for many applications, speed may simply not improve very much. It is not obvious that all programs (or algorithms) are parallelizable. (In fact, many probably aren’t.) So if you have a set of tasks that require serial processing, the doubling of performance every 18 months will no longer apply.

Conclusion

This is just a taste of the issues involved. (And I apologize for any mistakes or simplifications I’ve made – my knowledge is days old.) But the implications are huge.


Supercool: The personal supercomputer

Posted: July 17th, 2009 | Author: | Filed under: GPGPU | 2 Comments »

The discovery that got me re-energized about starting something, and that is providing my jump-off point for investigations, is the following:

You can now buy a 4 Teraflop supercomputer for under $10,000.

Why this is amazing

For someone like me (ie: someone who knows enough to be amazed, but not enough to be blasé), this is extraordinary. In 1999, the world’s fastest supercomputer clocked in at 2 TFLOPS (Source). It was a typical supercomputer – taking up over 2,000 sq. ft. of space and probably costing close to $100 million (I’m just guessing on the cost). And now you can buy something TWICE as fast for 1/10,000th the cost, plug it in, and put it under your desk.

The thing that’s incredible to me is that this isn’t a comparison with some top-of-the-line computer from 1950, but from 1999. Sure – my cell phone has more processing power than the fastest supercomputer in the world from the distant past, but that comparison has lost meaning to me. Back then people were still driving around in horse-drawn carriages and playing pong. But 1999 is CURRENT. That’s AFTER the internet exploded. That’s the modern era. And now that supercomputer - that top-of-the-line, hardcore, crazy technology - is available for $10k. Wow.  [Actually, this comparison is a bit of apples vs. oranges, but it's close enough to true to be startling.]

How they do it

I think there is increasing activity in the field of personal supercomputing, but the one that got me started is the NVIDIA Tesla (Nvidia, Wikipedia). NVIDIA makes graphics accelerator chips - the chips in your computer used to render graphics (GPU – Graphics Processing Unit). These chips are designed to do all the things needed to display stuff on your computer screen. If you’re just reading text, that’s not that big a deal. But if you are playing a high-end video game on a wide-screen monitor, that IS a big deal.

It turns out that GPUs, with some work, can be used to perform general purpose and floating point (ie: math) processing (General Purpose Computing on GPU).  NVIDIA put 240 of these GPU cores on a card, strung 4 of the cards together, and voila – you’ve got a 4 teraflop supercomputer for under $10k.

What are the implications?

This is where things get a bit murkier. The way these HPC (High Performance Computing) computers work is that they string together hundreds or thousands of individual processors (or cores). You can only get a single processor to be so fast, but if you string 1000 together, then it’s a thousand times faster.

Actually, it’s not that easy.  It’s only a thousand times faster if you can split up your problem into 1000 smaller problems that can be worked on in parallel. This is fine for some hard math problems like weather simulation, nuclear modeling, and other traditional supercomputing tasks. But it doesn’t mean you can just throw a Tesla computer under your desk and have Windows boot in 1/10th of a second. (Even disregarding I/O issues.) So one big issue is solving problems that are parallel-izable.

Also, what really needs that sort of computing power? I’m particularly interested in general-purpose business applications. Sure – experimental physicists and the defense department are always going to need hardcore computing power. But does your average company? A standard desktop computer that you can buy for under $1,000 probably has more computing power than most people currently ever use (gaming aside). And while businesses often need big computing power, it is currently more about I/O, databases, transactions, and web serving. This is cool stuff, no doubt, but different from raw processing power.

All that said, there will be uses for teraflop computing for the masses. I don’t know what they will be, but it sounds revolutionary to me and my initial research suggests that others also see this as an upcoming technology transition.

Next Steps

This is cool stuff. My goal now is to start pulling on the HPC thread and see what I can discover. The question I’m trying to answer is:

What becomes possible when you can buy a 4 TFLOP computer for under $10k?