
How about also comparing the power usage? At 25 watts, you can get three of these for one six-core Intel CPU, so you're at 24 cores at 2 GHz vs 6 cores at 3 GHz (and probably still at a lower price). The GHz can't really be directly compared, though. Even comparing GHz across Intel product generations isn't useful. I have a 3.16 GHz Core 2 Duo in my desktop that I think (I haven't really benchmarked, but I've run a Litecoin miner on both for testing) does about half the work of the 3.2 GHz i7 in my laptop.
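The comparison above is back-of-envelope arithmetic. A minimal sketch of it, using the commenter's figures (three hypothetical 8-core 2 GHz ARM parts at 25 W each vs one six-core 3 GHz Intel part; these numbers are from the comment, not a datasheet):

```python
# Hypothetical figures taken from the comment above.
arm_chips, arm_cores_per_chip, arm_ghz = 3, 8, 2.0
intel_cores, intel_ghz = 6, 3.0

arm_total_core_ghz = arm_chips * arm_cores_per_chip * arm_ghz  # 48 core-GHz
intel_total_core_ghz = intel_cores * intel_ghz                 # 18 core-GHz

# Naive aggregate-clock ratio. As the comment notes, this is NOT a real
# performance comparison: per-clock work differs widely across architectures.
ratio = arm_total_core_ghz / intel_total_core_ghz
print(ratio)
```

The point of the print is only that the aggregate-clock ratio looks large; the following sentences explain why that number can't be taken at face value.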

All that said, I have a 16 core AMD server in colo that is running at about 3% usage across all CPUs, and yet it is slow as hell because the disk subsystem can't keep up (replacing the spinning disks with SSDs as we speak). The reality is that CPU is not the bottleneck in the vast majority of web service applications. Memory and disk subsystems are the bottlenecks in every system I manage.

So, I love the idea of a low-power, low-cost CPU that still packs enough of a punch to work in virtualized environments. Dedicating one of these cores to each of your VMs would be pretty nice, I think.



The Avoton Atom C2750 which is already out now is also 8 cores, but at 2.4GHz and with the entire SoC at 20W. It's supposed to have comparable performance to a Xeon E5520 quad-core/8-thread 2.26GHz CPU from 3 generations back, or about half the performance of a current quad-core/8-thread Xeon E3 1230v3. And it supports virtualization extensions.

I agree that I/O and not the CPU is usually the bottleneck though.


That's pretty impressive, actually. I just built a new server, and was comparing 80W and 95W packages. I didn't realize Intel had sorted out power so effectively that they could compete with ARM architecture. (Though, to be fair, this new ARM from AMD is a beast...ARM's grown up and Intel has shrunk down.)

Three generations back is still plenty fast for me. I'm running a Core 2 Duo on my desktop, as mentioned, and I don't see any reason to upgrade. I can't imagine needing vastly more in a web server.

I think it'd be interesting to see how these two product lines stack up on all the variables: cost, performance, power under load, etc.


The AMD might have more I/O bandwidth: it ships with dual 10GbE, while the C2750 ships with 4x 2.5GbE (usually 1GbE unless on a backplane). The C2750 has PCIe too, though, so who knows about total bandwidth.


I've got a C2758 on my desk with dual 10G over PCIe.

Works just fine.


Except that costs ~$700 for the processor + mobo + NIC, right? Seattle is supposed to be cheaper.


The Rangely C2758 adds QuickAssist for even more crypto/compression throughput.

http://ark.intel.com/products/77988/Intel-Atom-Processor-C27...

I have one of these sitting on my desk with an m-sata SSD. It compiles emacs just as fast as a R510 with 2 X5570s @ 2.93GHz.

When Intel goes 'tock' on these this Summer and manufactures them in 14nm rather than the current 22nm, the power consumption will drop even further.

and then there is Denverton. :-)


I think you mean 'tick' for a die shrink, which is Denverton, and is supposed to have "more cores and more of everything". 'tock' refers to a new microarchitecture, and the details on the generation that will follow Denverton haven't been announced yet.

There's also going to be the Broadwell SoC, which will fit somewhere between Denverton and E3 v4's.


I had no idea this existed!


Memory / Cache subsystems are generally where ARM chips fall down (in terms of throughput), and Intel has several patents on cache hierarchies that AMD is licensed to use.

So it's technically possible that AMD could build ARM chips that are a lot more competitive (FLOPS-wise) with Intel than other manufacturers can.


The efficiency of a memory hierarchy doesn't factor into raw FLOPS throughput at all. Rather, it affects your ability to bring real data into registers and get useful work done.

What sort of issues with ARM memory/cache do you have in mind? These systems have been sufficiently powerful to keep the ALUs saturated on compute-heavy tasks on all recent ARM micro architectures with which I am familiar.


What?!

Of course it does in real life. Unless you're working on very small amounts of data, cache-level latencies (where Intel chips, non-Atom at any rate, are generally much lower), cache prefetchers, and branch prediction units (where Intel is generally 5-6 years ahead) can make the difference between the FP units being constantly busy or regularly stalled waiting for data.


In mainstream raw-flops workloads (things like lapack), a correct implementation re-uses the data from each load many times such that the FPUs are not “stalled waiting for data”. Unless the software implementation is terrible, the memory hierarchy does not pose a significant bottleneck for these tasks, and even older ARM designs like Cortex-A9 can achieve > 75% FPU utilization, comparable to x86 cores.

There are more specialized HPC workloads (sparse matrix computation, for example) where gather and scatter operations are critical, and the efficiency of the memory hierarchy comes much more into play (but in these cases even current x86 designs are stalled waiting for data). There are also streaming workloads (which you seem to reference) where you have O(1) FPU ops per data element, which stress raw memory throughput and prefetch. However, one doesn't typically use these to make a general claim about which core is "more competitive (FLOPS-wise)", precisely because they are so dependent on other parts of the system.
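The dense-vs-streaming distinction above comes down to arithmetic intensity (flops per byte moved). A rough illustrative calculation, assuming double-precision (8-byte) elements and counting only the obvious data traffic:

```python
# Arithmetic intensity: flops per byte of data touched.
# Dense matmul reuses each element O(n) times; a streaming op like daxpy
# (y = a*x + y) does a constant amount of work per element.

def matmul_intensity(n, bytes_per_elem=8):
    flops = 2 * n**3                      # n^3 multiply-adds
    data = 3 * n**2 * bytes_per_elem      # A, B, C are each n x n
    return flops / data                   # grows linearly with n

def daxpy_intensity(n, bytes_per_elem=8):
    flops = 2 * n                         # one multiply + one add per element
    data = 3 * n * bytes_per_elem         # read x, read y, write y
    return flops / data                   # constant, ~0.083 flops/byte

print(matmul_intensity(1024))   # ~85 flops/byte -> compute-bound
print(daxpy_intensity(1024))    # ~0.083 flops/byte -> memory-bound
```

High intensity means the FPUs can stay busy even with a modest memory system (the LAPACK case); low intensity means raw memory throughput dominates regardless of core design (the streaming case).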


fair enough - I guess I might be biased in my interpretation of what "normal" use-cases are...


I'm talking about real-world usage - not benchmarks that only stretch the NEON FP units in unrealistic situations.


What "real-world usage" are you talking about, specifically?

EDIT: looking at your comment history, you seem to be focused on VFX tasks, which tend to be entirely bound by memory hierarchy; even on x86 the FPU spends most of its time waiting for data. For a workload like that, you absolutely want to buy the beefiest cache/memory system you can, but that shouldn’t be confused with a processor being more competitive “flops-wise".


I don't know why you guys are arguing; he wouldn't have much use for this ARM part. Where this will excel is in CRUD web apps. The ARM part is there just to shuttle data between the 10GbE and the 128GB of RAM.


Well, Intel already sells a 4-core Xeon at 1.8/2.8 GHz with a 25 W TDP [1]; between the hyperthreading and higher IPC, you've mostly made up for the fewer cores relative to this ARM part.

[1] http://ark.intel.com/products/75053/Intel-Xeon-Processor-E3-...


An i7 will auto-boost its clock speed (if it's cool enough); it could actually be running at 3.9GHz during your mining. That won't account for twice the work on its own, so there are still some architectural improvements, but clock isn't the whole story.
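A crude decomposition of the earlier "half the work" observation, assuming the i7 really was boosting to 3.9 GHz (both the boost clock and the 2x speedup are the commenters' estimates, not measurements):

```python
c2d_ghz = 3.16          # Core 2 Duo base clock from the earlier comment
i7_boost_ghz = 3.9      # assumed turbo clock during the mining test
observed_speedup = 2.0  # "about half the work", per the earlier comment

clock_ratio = i7_boost_ghz / c2d_ghz          # ~1.23x from clock alone
ipc_ratio = observed_speedup / clock_ratio    # ~1.62x left for per-clock gains
print(clock_ratio, ipc_ratio)
```

In other words, even at full boost, clock explains only about a quarter of the gap; the rest would be architectural (IPC) improvements.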



