Sunday, August 30, 2009

Why You Want a Dual or Better

The MacBook headphone problem seems to be solved, most of the time, by blowing into the jack. Why that works is beyond me. I'm still quite upset about it, but at least I can get my music playing now while I write.

That is, of course, not the point of this post, just an aside. I thought I'd write up the principal arguments for a dual socket system and then turn the next post over to a friend of mine who holds the opposing view.

First of all, if you do any database work using a process-based database such as Oracle or Postgres, more hardware threads is generally better. In most architectures, each database process maps to at least one thread or process in the middleware or front end, so things go faster when there are more hardware threads to spread that work across. When the constraints of the problem leave you with hundreds of threads and tens of database connections in a time-critical application, you will find that the dual socket system, with its much higher hardware thread count, completes in a lot less time.
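
To make that mapping concrete, here is a minimal Java sketch of the pattern I mean: one worker per hardware thread, each holding its own Postgres connection, which on the server side becomes its own backend process. The connection string, credentials, and table name are made up for the example, and you'd need the Postgres JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    public static void main(String[] args) {
        // One worker per hardware thread; 8 on a dual socket quad-core box,
        // only 4 on the single socket equivalent.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                // Each worker opens its own connection, which Postgres
                // services with its own dedicated backend process.
                try (Connection c = DriverManager.getConnection(
                        "jdbc:postgresql://localhost/mydb", "me", "secret");
                     Statement s = c.createStatement();
                     ResultSet r = s.executeQuery("SELECT count(*) FROM some_table")) {
                    while (r.next()) {
                        System.out.println(r.getLong(1));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}
```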

One argument that has been put forward is that a cluster is more efficient than a single large machine. It may be more cost-effective, but it cannot be faster, because the latency of communicating over even a well-designed gigabit network is much, much higher than communicating locally. You can architect around this, using lots more processes and so on, but at the end of the day, a single bigger machine will beat several smaller machines for database manipulation.
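
To put rough numbers on that, consider a job that issues a million small queries. The figures below are assumptions for illustration, not measurements, but the orders of magnitude are about right: a gigabit round trip costs on the order of a hundred microseconds or more, while a local Unix-socket round trip is down in the single-digit to low tens of microseconds.

```java
public class LatencyNapkin {
    public static void main(String[] args) {
        long queries = 1_000_000L;        // small queries in a batch job
        double lanRoundTripUs = 150.0;    // assumed gigabit round trip, microseconds
        double localRoundTripUs = 10.0;   // assumed local Unix-socket round trip

        double lanSeconds = queries * lanRoundTripUs / 1_000_000.0;
        double localSeconds = queries * localRoundTripUs / 1_000_000.0;

        // Pure protocol latency, ignoring query execution time entirely.
        System.out.printf("over the network: %.0f s, local: %.0f s%n",
                lanSeconds, localSeconds);
    }
}
```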

There are lots of work profiles for which this is not true, such as video processing, where, oddly enough, a single socket machine easily eats up the bandwidth of any sufficiently large drive, meaning that more than one socket really isn't helpful. The problem with video actually becomes one of getting the video to the processor, not one of getting enough processor threads.
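
A quick back-of-envelope example of why (simple assumed figures, not a benchmark): uncompressed 8-bit 1080p at 30 frames per second already works out to roughly 178 MB/s, which is more than a single spinning drive of this era will sustain, so the cores end up waiting on I/O.

```java
public class VideoBandwidth {
    public static void main(String[] args) {
        // Uncompressed 8-bit RGB 1080p, deliberately the simplest case.
        long bytesPerFrame = 1920L * 1080L * 3L;
        long framesPerSecond = 30;
        double mbPerSecond = bytesPerFrame * framesPerSecond / (1024.0 * 1024.0);

        // Around 178 MB/s of raw frames that have to reach the processor.
        System.out.printf("uncompressed 1080p30: %.0f MB/s%n", mbPerSecond);
    }
}
```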

However, I and many like me find it a lot more convenient to have one large box rather than many smaller boxes, primarily because the maintenance burden is so much lower: I don't have multiple machines to constantly shepherd.

It is true, barely, that several fast single socket machines are cheaper for the amount of processing power they have than one large machine, but there are mitigating factors even here. For starters, any enterprise geek will tell you that disk performance matters, and once you're done outfitting those single socket machines with adequate disk, the price tilts in favor of the single large machine. This comes down to the price of SSDs, which may decline enough in the future to reverse the position, but, for now, a single fast SSD is enough for most profiles on the one big machine, while you would need one in each small machine to reach the same level of performance.

Further, ram aggregation is a good thing. There are consumer products that can take more than 8gb of ram, but they are frightfully expensive, both for the board and for the ram, which invariably means 4gb modules, which are themselves very expensive. A decent dual socket board will have a minimum of 8 dimm slots, good for up to 32gb, and 16gb of ram is very cheap at the moment. Now, in a single big server, you will find that the majority of that ram goes to buffers, which are shared among your processes. In other words, a lot of your data is common to the processes you're running; with multiple machines, it all gets duplicated around, so three 4gb boxes each caching the same hot tables effectively cache far less than one box with 12gb of buffers.

Now, if you're doing 3D rendering or video processing and have already solved the throughput problem mentioned above (I recommend a massive NFS server with at least 2gb, preferably 8gb, of cache), then you don't really care, because neither of those really burns ram. However, if you are working with a database such as Postgres, you will find an 8gb system inadequate once you add up the base system and the shared memory (I was running 3gb of shared memory for Postgres alone!). You end up with as little as 3gb left for buffers and cache, which causes lots of disk access.

Switch to 16gb and you can run everything at once: two 2gb Java instances (one of them Eclipse), 3gb of shared memory for Postgres, about 1gb of Postgres process footprint, and a 1.2gb os footprint, for 9.2gb, leaving nearly 7gb for buffers and cache, which is normally enough to avoid significant disk access. And, yes, you'd have to run at least two machines to achieve that same ram footprint using consumer hardware, meaning you either have to fiddle with 10 gigabit ethernet (don't tempt me; I already have one card) or accept much higher database latency. Alternatively, you can get the whole thing into one machine by shrinking some of the Java heaps, but that significantly increases the cost of garbage collection, which slows the Java processes down.
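
If you want to see where your own ram is actually going, the kernel already reports the split between used memory, buffers, and cache. Here's a quick sketch that pulls the relevant lines out of /proc/meminfo (Linux-specific, obviously):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MemBudget {
    public static void main(String[] args) throws IOException {
        // /proc/meminfo reports sizes in kB; we only care about a few lines.
        List<String> lines = Files.readAllLines(Paths.get("/proc/meminfo"));
        for (String line : lines) {
            if (line.startsWith("MemTotal:") || line.startsWith("MemFree:")
                    || line.startsWith("Buffers:") || line.startsWith("Cached:")) {
                System.out.println(line);
            }
        }
    }
}
```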

So, the reality is that, even though the main reason I have such a big machine remains that dual socket machines are just more interesting to me, I really could not live with anything less broad-shouldered. And yes, I am trying to save for a quad socket system in the future, which is why I have the 8000 series Opteron processors rather than the 2000 series: it would only cost about three hundred dollars to get two more 8358se procs, nine hundred for a quad socket board, and around $240 for 16gb more ram. My case is large enough, and my psu is both large enough and equipped with the appropriate connectors. I do need to replace the stock fans (100mm case fans, four to a side) with Scythe Kaze2s, for around two to three times the airflow, as those 8358se procs run very hot.

I am also eventually going to add a second Radeon HD 3870, because I am running into a problem with dual screens: if both screens are receiving significant amounts of data, they tend to stutter. It is possible the problem is actually in XFCE, which I need to investigate first, but I do have to say the machine is pretty much bored out of its life playing Hulu on one screen and Freeciv on the other. When the Freeciv app starts moving stuff around in a hurry, the Flash Hulu player stutters, a problem that is worse full-screen. Since I only ever tax at most three cores, as evidenced by the fact that only three cores go to full speed with the ondemand governor, and since the Hulu playback happens without ever pushing a cpu above the floor frequency (1.2ghz), I think I can reasonably assume it is not a cpu load problem. I think I can also reasonably assume it is not a bandwidth problem, as the video card is in an x16 PCIe slot and the system has four DDR2-667 memory busses, with interleaving on top of that.
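
For what it's worth, the core-speed claim is easy to check yourself: each core publishes the frequency the ondemand governor has currently picked for it under sysfs. Here's a minimal sketch using the standard cpufreq paths (adjust if your kernel lays things out differently):

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CoreFreqs {
    public static void main(String[] args) throws Exception {
        // Each core exposes its current cpufreq choice in sysfs (in kHz).
        File cpuDir = new File("/sys/devices/system/cpu");
        for (File cpu : cpuDir.listFiles((d, name) -> name.matches("cpu\\d+"))) {
            Path freq = Paths.get(cpu.getPath(), "cpufreq", "scaling_cur_freq");
            if (Files.exists(freq)) {
                String khz = Files.readAllLines(freq).get(0).trim();
                System.out.println(cpu.getName() + ": " + khz + " kHz");
            }
        }
    }
}
```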

One more thing I'm fiddling with is getting OpenGL working on this Radeon card under Linux. I just want the pretty OpenGL screensavers. Seriously.
