What was interesting about Optane was that it was kinda an attempt to get rid of...

rcxdude · on Dec 5, 2023

The problem is optane wasn't as fast as DRAM, nor as cheap as disk. So you still needed the conceptual split anyway, and there wasn't a compelling reason to really use it (OK, I can get a lot of RAM but it's 1/10th the speed of my normal RAM, which will kill both memory latency and memory bandwidth limited applications, which is basically any workload which wants lots of RAM. Or I can get super fast persistent storage with high durability but much higher cost, which basically makes it good for a persistent cache but not that much else, considering SSDs are now similarly fast even though their durability is not as good). It wasn't sunk by a stuck-in-the-mud way of thinking about memory versus disk, it was sunk because the tech never got good enough to actually achieve their ambitions.

crabbone · on Dec 5, 2023

Where Optane could've been used: distributed storage. The whole game there is about how fast can you ack the writes while writing as far away as you can. The company I worked for in this specific area had used pmem when that was available for that specific reason.

I'm no expert on costs of these things and especially wouldn't dare to predict how these costs could've behaved in the future, but my guess would be not that the technology had no use, rather that it was too expensive for what it was good at, and the second best was cheap enough to warrant choosing it over Optane.

smueller1234 · on Dec 5, 2023

You're right and it did get used for that. Unfortunately that's nowhere large enough a use case to support the entire product/business/r&d.

hardware2win · on Dec 5, 2023

Hmm, ive seen report by fujitsu that it was fine if used with dram

>Intel Optane persistent memory is blurring the line between DRAM and persistent storage for in-memory computing. Unlike DRAM, Intel Optane persistent memory retains its data if power to the server is lost or the server reboots, but it still provides *near-DRAM performance*. In SAP HANA operation this results in tangible benefits

>Speeds up shutdowns, starts and restarts many times over – significantly reduce system downtime and lower operational costs

 Process more data in real-time with increased memory capacity

 Lower total cost of ownership by transforming the data storage hierarchy

 Improve business continuity with persistent memory and fast data loads at startup

https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-perfor...

mike_hock · on Dec 5, 2023

Like Itanium, it was one of those Intel projects that was simultaneously too ambitious and not ambitious enough.

If Intel really wanted to redesign the Von Neumann architecture, they would have had to be prepared to absorb losses for much longer, way north of a decade.

The alternative might have been to focus exclusively on providing SSDs using the new technology and maybe try to segue into this new memory architecture 10 years later. Like Itanium should have initially focused on beating competing x86_32 chips of the era in benchmarks and ship the new ISA as an afterthought.

Thank you, Intel, for trying to push the envelope, though.

acdha · on Dec 5, 2023

The other similarity was that they needed to treat developers like VIPs: Itanium failed in part because almost nobody was interested in paying a premium for a slow chip, licensed compiler, and then spend their time optimizing code to match competing chip’s out of the box performance. In both cases, they really needed to flood developers with free hardware & help - especially open source developers working on things like databases where you could see the most wins.

People might have bought Optane if the pitch was “Postgres/MySQL runs twice as fast” rather than hoping someone else would make your purchase cost-effective later.

avidphantasm · on Dec 5, 2023

I think it will be basically impossible to move away from von Neumann unless you control the entire stack, including OS and software. I don’t even think Apple could do it with Mac, because they are general purpose machines (for now at least). Maybe Apple could do it with iOS devices. Nintendo might be able to pull it off, though people trying port titles from other platforms may no longer try to do so. Because so many AAA titles go between XBox and PS, I don’t think Sony would try.

theandrewbailey · on Dec 5, 2023

I'm not sure if Sony would want to make a system with a very unique architecture again. Devs complained about how hard it was to program PS2 games, and again with PS3. PS4 and 5 are practically PCs by comparison.

acdha · on Dec 5, 2023

I don’t want to say that we’ll never see big architecture changes again but I think a company like Sony would want more confidence that they’d get real advantages. Cell wasn’t just unpopular with developers but also never delivered compelling performance; I suspect if they’d had a PC CPU and a Blu-ray player at the same price it would’ve sold identically.

Anyone trying this needs to figure out a decade-long schedule with points where something would be worth using for some reason so they don’t have to run the whole thing in a vacuum hoping it’ll be worth it at the end.

Dylan16807 · on Dec 5, 2023

At the very least wait for CXL to become commonplace before you get ready to cancel, since that's the kind of interface that is a perfect fit for optane.

zinodaur · on Dec 5, 2023

Yeah - the article was talking about mmap... but what i wanted was to not have to define persistence boundary. I wanted the entire in-memory state of my program to be persisted - perhaps even duplicated and moved elsewhere

tanelpoder · on Dec 5, 2023

mmap on Optane direct-access-aware (DAX) filesystems like EXT4/XFS now, is not like mmap on block devices where the OS gets in your way and pages stuff in from disk and (maybe) later syncs it back to persistent storage. Optane is the persistent storage, it's just usable/addressable as regular RAM as it's plugged in to DIMM slots.

And in the later Xeon architecture (Xeon Scalable 3rd gen, I think), intel expanded the persistence domain to CPU caches too. So, you didn't even have to bother with CLFLUSH and CLWB instructions to manually ensure that some cache lines (not 512B blocks, but 64B cache lines) get persisted. You could operate in the CPU cache and in the event of power loss, the CPU/mem controllers/and the capacitors on Optane DCPMMs ensured that the dirty cache lines got persisted to Optane before the CPU lights went off. But all this coolness a bit too late...

Another note: Intel's marketing had terrible naming for Optane stuff. Optane DCPMMs are the ones that go into DIMM slots and have all the cool features. Optane Memory SSDs (like Optane H10) are just NAND SSDs with some Optane cache in front of them. These are flash disks, installed in PCIe slots but Intel decided to call these disks "Optane Memory" ...

zozbot234 · on Dec 5, 2023

> mmap on Optane direct-access-aware (DAX) filesystems like EXT4/XFS now, is not like mmap on block devices where the OS gets in your way and pages stuff in from disk

Yes, the real case for Optane memory is that, supposedly, you don't have to fsync(). And insisting on proper fsync() tends to tank the performance of even the fastest NVMe SSD's. So the argument for a real, transformative performance improvement is there.

adastra22 · on Dec 5, 2023

Why would you not have to fsync? The fsync is a memory barrier that is just as useful with octane to ensure integrity.

Do you mean the latency of ensuring fsync safety is lower?

RyanHamilton · on Dec 5, 2023

No you don't have to fsync. Think of it like RAM. You don't fsync RAM.

adastra22 · on Dec 5, 2023

You do, in fact. It’s called a memory write barrier. Ensures consistency of data structures as needed. And it call stall the cpu pipeline, so there’s a nontrivial cost involved.

pbalcer · on Dec 5, 2023

The point is that on PMem that is simply "sfence", and not a potentially super-expensive "fsync" syscall... Fsync is an fsync, not a memory barrier...

adastra22 · on Dec 5, 2023

They both involve flushing cache to backing stores, and waiting for confirmation of the write. It’s literally the same thing. It’s just writing a cache line to RAM is orders of magnitude faster than writing a disk sector to storage, even with NVME SSDs. Octane is/was somewhere in the middle.

Dylan16807 · on Dec 5, 2023

> They both involve flushing cache to backing stores, and waiting for confirmation of the write.

No they don't. A fence only imposes ordering. It's instant. It can increase the chance of a stall when it forbids certain optimizations, but it won't cause a stall by itself.

CLWB is a small flush, but as tanelpoder explained the more recent CPUs did not need CLWB.

buildbot · on Dec 5, 2023

There are also fully optane based SSDs such as the 905p or the 5800x

tanelpoder · on Dec 5, 2023

Yes, which made it even more confusing, why call the Optane-cached consumer NAND disks "memory" ... but perhaps they thought that it's easier to fool the consumer segment (?)

barrkel · on Dec 5, 2023

That's an easy way to accumulate data corruption.

It's better to design for unexpected restarts than design for a golden in-memory image which needs to be carefully ported around, have all its connections wired back up, and so on.

You're going to get unexpected restarts anyway. The faster and more reliable you can make recovery from that, it benefits you in the moving use case. The kinds of things you might want to do to enable reliable restart - like retry mechanisms for incoming requests - make migration work too.

tanelpoder · on Dec 5, 2023

You can design for that. For example, when building a persistent-memory native database engine, you probably need some sort of data versioning anyway - either Postgres style multi-version rows (or some other memory structures) that later need to be vacuumed or Oracle/InnoDB style rollback segments that hold previous values of some modified objects. Then you probably want WAL for efficient replication to other machines and point in time recovery (in case things go wrong or just DB snapshots for dev/test).

Transient & disposable memory structures like keeping track who's logged in or compiled SQL execution plans that facilitate access to the persistent "business data", much of that stuff will need to be in RAM/HBM/CPU cache anyway, for performance reasons and as these things do not necessarily need to persist across a crash/reboot. The data (and likely indexes, etc) need to. But you won't need a buffer cache manager that copies entire blocks around from storage to different places in memory and vice versa. Your giant index or graph could rely just on direct memory pointers instead of physical disk block addresses that need to get read to somewhere in memory and then are accessed via various hashtable lookups & indirect pointers. And you don't have to ship entire 512B-8kB blocks around just to access the next index/graph pointer, just access only the relevant cache line, etc.

With proper design, you'd still have layers of code that take care of coherency, consistency and recovery...

ddalex · on Dec 5, 2023

But then you can't just reset the state when crashes happen and data corrupts

gumby · on Dec 5, 2023

This shouldn’t be voted down. This problem, in the more general case, was inherent at the system level with the persistent and PARC’s immersive Smalltalk and Interlisp environments on the D-Machines. It was much better to have full source you could reload into a fresh environment.

gwd · on Dec 5, 2023

I'll admit this sounded cool when I first heard about it; but it's actually a lot harder to program if you want to be able to recover from sudden power outages (which would be the main reason for having persistence in the first place).

addaon · on Dec 5, 2023

Look at developing for MSP430s with FRAM -- these microcontrollers have a decent amount of FRAM with full persistence, full XIP etc, up to 256 kB; but only 8 kB or less of traditional SRAM. Even in this world, where you /could/ have everything persisted, you still end up aware of the persistence boundary and using SRAM both for the absolutely highest-performance code (e.g. interrupt handlers; FRAM has more wait states than SRAM in this implementation), but more interestingly for things that specifically /should not/ be persisted (e.g. the bytes storing whether your POST has completed, hardware initialized, etc). You can come close to persistence-oblivious, especially at a conceptual "application layer", but the overall implementation still ends up persistence-aware.

jared_hulbert · on Dec 5, 2023

memverge.com does some cool work around making that happen.

hlandau · on Dec 5, 2023

Still waiting for the Memristor...

pclmulqdq · on Dec 5, 2023

Optane was probably a memristor technology under the hood.

jamiek88 · on Dec 5, 2023

Oh jeez I was so excited for that!

dghughes · on Dec 5, 2023

At first I was resistant but now I am charged for it.

Aardwolf · on Dec 5, 2023

I had a calculator from the early 90s like that, the HP 48GX, 128K RAM, but it kept it storage when shutting it off and on again. Certainly very convenient!

xtracto · on Dec 5, 2023

I was just thinking about optane now that I'm dealing with AWS lambda cold starts.

It wold be a great use for that technology to decrease function startup time .

ori_b · on Dec 5, 2023

The problem is that if you want to keep the server state around, there's already a cheap and easy solution: write a persistent server.

Dalewyn · on Dec 5, 2023

So the winning move now is for Apple to announce iOptane and sweep the data storage and cache industries by storm?

ritwikgupta · on Dec 5, 2023

Apple may already be headed in that direction. They already have unified CPU and GPU RAM. It doesn’t seem far-fetched to imagine that they could unify persistent storage and memory.

znpy · on Dec 5, 2023

Knowing apple’s marketing moves, they could definitely do that: just use a single number to describe memory. And then pretend it’s a big number.

FirmwareBurner · on Dec 5, 2023

I can already see it: base 128GB of total system unified memory for your files and data.

Affric · on Dec 5, 2023

Keep all other files in iCloud for big money. Genius

satiric · on Dec 5, 2023

Well technically, Intel and AMD both use the regular system RAM for their integrated graphics VRAM, but I see what you mean.

masklinn · on Dec 5, 2023

Intel and AMD also do support unified memory for their integrated graphics. It’s been a while since you needed a statically cordoned-off area of main memory (“shared memory”) for the iGPU to work.

Consoles have been using unified memory since the 8th gen (PS4/XB1/Switch, kinda sorta even WiiU).

And NVidia has CUDA unified-memory slide decks going back at least 5 years.

alpaca128 · on Dec 5, 2023

The Xbox360 already had unified memory too. That gave it a slight edge compared to the PS3 in the long term because it was more flexible compared to a fixed 50:50 split.

NikkiA · on Dec 5, 2023

The original xbox before it too.

pezezin · on Dec 5, 2023

And the N64 before them. I think it was the first console with unified memory.

DeathArrow · on Dec 5, 2023

I think they'd rather call it iMemory and jack up the price of the iDevices using it by 25‰.

rewmie · on Dec 5, 2023

> What was interesting about Optane was that it was kinda an attempt to get rid of disk and ram distinction.

That might be the promise, but that premise is fundamentally flawed. I mean, what drives the need to classify memory devices is performance characteristics, and even if your starting point is this idealized world where Optane was ubiquitous then all it took was a ephemeral memory technology to significantly outperform Optane to create need support the performance-oriented non-persistent memory type.