> mmap on Optane, on direct-access-aware (DAX) filesystems like ext4/XFS, is not like mmap on block devices, where the OS gets in your way and pages stuff in from disk
Yes, the real case for Optane memory is that, supposedly, you don't have to fsync(). And insisting on proper fsync() tends to tank the performance of even the fastest NVMe SSDs. So the argument for a real, transformative performance improvement is there.
You do, in fact. It’s called a memory write barrier. It ensures consistency of data structures as needed, and it can stall the CPU pipeline, so there’s a nontrivial cost involved.
They both involve flushing cache to backing stores, and waiting for confirmation of the write. It’s literally the same thing. It’s just that writing a cache line to RAM is orders of magnitude faster than writing a disk sector to storage, even with NVMe SSDs. Optane is/was somewhere in the middle.
> They both involve flushing cache to backing stores, and waiting for confirmation of the write.
No they don't. A fence only imposes ordering. It's instant. It can increase the chance of a stall when it forbids certain optimizations, but it won't cause a stall by itself.
CLWB is a small flush, but as tanelpoder explained, the more recent CPUs did not need CLWB.
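
For anyone following along, here is a rough sketch of the two conventional durability paths being compared in this thread: write() followed by fsync(), versus storing through a shared mmap and flushing the mapping. The function names and file layout are made up for illustration. This runs on any ordinary filesystem; on a DAX mapping the claim above is that the flush step shrinks to CPU cache-line flushes (or disappears on newer platforms), which Python cannot express directly, so treat it as a picture of the baseline rather than of Optane itself.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def write_with_fsync(path: str, payload: bytes) -> None:
    """Block-device style: write() lands in the page cache, fsync() forces it out."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # returns only once the OS reports the data as written
    finally:
        os.close(fd)


def write_with_mmap(path: str, payload: bytes) -> None:
    """Mapped style: store through a shared mapping, then flush the mapping.

    Assumes len(payload) <= PAGE; the file is sized to exactly one page.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, PAGE)
        with mmap.mmap(fd, PAGE, access=mmap.ACCESS_WRITE) as m:
            m[: len(payload)] = payload  # plain memory store, no syscall
            m.flush()  # msync() on POSIX; the step DAX is meant to make cheap
    finally:
        os.close(fd)
```

Timing the two functions in a loop on an NVMe SSD should show how much of the cost sits in the sync call rather than in the store itself, which is the part the disagreement above is about.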