
How about also comparing the power usage? At 25 watts, you can get three of these for one six-core Intel CPU, so you're at 24 cores at 2 GHz vs 6 cores at 3 GHz (and probably still at a lower price). The GHz can't really be directly compared, though. Even comparing GHz across Intel product generations isn't useful. I have a 3.16 GHz Core 2 Duo in my desktop that I think (I haven't really benchmarked, but I've run a Litecoin miner on both for testing) does about half the work of the 3.2 GHz i7 in my laptop.
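The comparison above is back-of-envelope arithmetic. A minimal sketch of it, using the commenter's figures (three hypothetical 8-core 2 GHz ARM parts at 25 W each vs one six-core 3 GHz Intel part; these numbers are from the comment, not a datasheet):

```python
# Hypothetical figures taken from the comment above.
arm_chips, arm_cores_per_chip, arm_ghz = 3, 8, 2.0
intel_cores, intel_ghz = 6, 3.0

arm_total_core_ghz = arm_chips * arm_cores_per_chip * arm_ghz  # 48 core-GHz
intel_total_core_ghz = intel_cores * intel_ghz                 # 18 core-GHz

# Naive aggregate-clock ratio. As the comment notes, this is NOT a real
# performance comparison: per-clock work differs widely across architectures.
ratio = arm_total_core_ghz / intel_total_core_ghz
print(ratio)
```

The point of the print is only that the aggregate-clock ratio looks large; the following sentences explain why that number can't be taken at face value.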

All that said, I have a 16 core AMD server in colo that is running at about 3% usage across all CPUs, and yet it is slow as hell because the disk subsystem can't keep up (replacing the spinning disks with SSDs as we speak). The reality is that CPU is not the bottleneck in the vast majority of web service applications. Memory and disk subsystems are the bottlenecks in every system I manage.

So, I love the idea of a low-power, low-cost CPU that still packs enough of a punch to work in virtualized environments. Dedicating one of these cores to each of your VMs would be pretty nice, I think.



The Avoton Atom C2750 which is already out now is also 8 cores, but at 2.4GHz and with the entire SoC at 20W. It's supposed to have comparable performance to a Xeon E5520 quad-core/8-thread 2.26GHz CPU from 3 generations back, or about half the performance of a current quad-core/8-thread Xeon E3 1230v3. And it supports virtualization extensions.

I agree that I/O and not the CPU is usually the bottleneck though.


That's pretty impressive, actually. I just built a new server, and was comparing 80W and 95W packages. I didn't realize Intel had sorted out power so effectively that they could compete with ARM architecture. (Though, to be fair, this new ARM from AMD is a beast...ARM's grown up and Intel has shrunk down.)

Three generations back is still plenty fast for me. I'm running a Core 2 Duo on my desktop, as mentioned, and I don't see any reason to upgrade. I can't imagine needing vastly more in a web server.

I think it'd be interesting to see how these two product lines stack up on all the variables: cost, performance, power under load, etc.


The AMD might have more I/O bandwidth: it ships with dual 10GbE, while the C2750 ships with 4x 2.5GbE (usually 1GbE unless on a backplane). The C2750 has PCIe too, though, so who knows about total bandwidth.


I've got a C2758 on my desk with dual 10G over PCIe.

Works just fine.


Except that costs ~$700 for the processor + mobo + NIC, right? Seattle is supposed to be cheaper.


The Rangely C2758 adds QuickAssist for even more crypto/compression throughput.

http://ark.intel.com/products/77988/Intel-Atom-Processor-C27...

I have one of these sitting on my desk with an m-sata SSD. It compiles emacs just as fast as a R510 with 2 X5570s @ 2.93GHz.

When Intel goes 'tock' on these this Summer and manufactures them in 14nm rather than the current 22nm, the power consumption will drop even further.

and then there is Denverton. :-)


I think you mean 'tick' for a die shrink, which is Denverton, and is supposed to have "more cores and more of everything". 'tock' refers to a new microarchitecture, and the details on the generation that will follow Denverton haven't been announced yet.

There's also going to be the Broadwell SoC, which will fit somewhere between Denverton and E3 v4's.


I had no idea this existed!


Memory / Cache subsystems are generally where ARM chips fall down (in terms of throughput), and Intel has several patents on cache hierarchies that AMD is licensed to use.

So it's technically possible that AMD could build ARM chips that are a lot more competitive (FLOPS-wise) with Intel than other manufacturers can.


The efficiency of a memory hierarchy doesn't factor into raw FLOPS throughput at all. Rather, it affects your ability to bring real data into registers and get useful work done.

What sort of issues with ARM memory/cache do you have in mind? These systems have been sufficiently powerful to keep the ALUs saturated on compute-heavy tasks on all recent ARM micro architectures with which I am familiar.


What?!

Of course it does in real life. Unless you're working on very small amounts of data, cache-level latencies (where Intel chips, non-Atom at any rate, are generally much lower), cache prefetchers, and branch prediction units (where Intel is generally 5-6 years ahead) can make the difference between the FP units being constantly busy or regularly stalled waiting for data.


In mainstream raw-flops workloads (things like lapack), a correct implementation re-uses the data from each load many times such that the FPUs are not “stalled waiting for data”. Unless the software implementation is terrible, the memory hierarchy does not pose a significant bottleneck for these tasks, and even older ARM designs like Cortex-A9 can achieve > 75% FPU utilization, comparable to x86 cores.

There are more specialized HPC workloads (sparse matrix computation, for example) where gather and scatter operations are critical, and the efficiency of the memory hierarchy comes much more into play (but in these cases even current x86 designs are stalled waiting for data). There are also streaming workloads (which you seem to reference) where you have O(1) FPU ops per data element, which stress raw memory throughput and prefetch. However, one doesn't typically use these to make a general claim about which core is "more competitive (FLOPS-wise)", precisely because they are so dependent on other parts of the system.
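The dense-vs-streaming distinction above comes down to arithmetic intensity (flops per byte moved). A rough illustrative calculation, assuming double-precision (8-byte) elements and counting only the obvious data traffic:

```python
# Arithmetic intensity: flops per byte of data touched.
# Dense matmul reuses each element O(n) times; a streaming op like daxpy
# (y = a*x + y) does a constant amount of work per element.

def matmul_intensity(n, bytes_per_elem=8):
    flops = 2 * n**3                      # n^3 multiply-adds
    data = 3 * n**2 * bytes_per_elem      # A, B, C are each n x n
    return flops / data                   # grows linearly with n

def daxpy_intensity(n, bytes_per_elem=8):
    flops = 2 * n                         # one multiply + one add per element
    data = 3 * n * bytes_per_elem         # read x, read y, write y
    return flops / data                   # constant, ~0.083 flops/byte

print(matmul_intensity(1024))   # ~85 flops/byte -> compute-bound
print(daxpy_intensity(1024))    # ~0.083 flops/byte -> memory-bound
```

High intensity means the FPUs can stay busy even with a modest memory system (the LAPACK case); low intensity means raw memory throughput dominates regardless of core design (the streaming case).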


fair enough - I guess I might be biased in my interpretation of what "normal" use-cases are...


I'm talking about real-world usage - not benchmarks that only stretch the NEON FP units in unrealistic situations.


What "real-world usage" are you talking about, specifically?

EDIT: looking at your comment history, you seem to be focused on VFX tasks, which tend to be entirely bound by memory hierarchy; even on x86 the FPU spends most of its time waiting for data. For a workload like that, you absolutely want to buy the beefiest cache/memory system you can, but that shouldn’t be confused with a processor being more competitive “flops-wise".


I don't know why you guys are arguing; he wouldn't have much use for this ARM part. Where this will excel is in CRUD web apps. The ARM part is there just to shuttle data between the 10GbE and the 128GB of RAM.


Well, Intel already sells a 4-core Xeon at 1.8/2.8 GHz with a 25 W TDP [1]; between the hyperthreading and higher IPC, you've mostly made up for the fewer cores relative to this ARM part.

[1] http://ark.intel.com/products/75053/Intel-Xeon-Processor-E3-...


An i7 will auto-boost its clock speed (if it's cool enough); it could actually be running at 3.9GHz during your mining. That won't account for twice the work on its own, so there are still some architectural improvements, but clock isn't the whole story.
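A crude decomposition of the earlier "half the work" observation, assuming the i7 really was boosting to 3.9 GHz (both the boost clock and the 2x speedup are the commenters' estimates, not measurements):

```python
c2d_ghz = 3.16          # Core 2 Duo base clock from the earlier comment
i7_boost_ghz = 3.9      # assumed turbo clock during the mining test
observed_speedup = 2.0  # "about half the work", per the earlier comment

clock_ratio = i7_boost_ghz / c2d_ghz          # ~1.23x from clock alone
ipc_ratio = observed_speedup / clock_ratio    # ~1.62x left for per-clock gains
print(clock_ratio, ipc_ratio)
```

In other words, even at full boost, clock explains only about a quarter of the gap; the rest would be architectural (IPC) improvements.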



