My reading of the article is not that it promotes "anti-intellectualism" or that it argues to remove all implicit precedence or associativity rules.
I think it actually gives a nuanced answer to your question of where to draw the line, namely: with a partial precedence scheme, one does not need explicit parentheses for those operator combinations whose precedence is "clear enough". E.g., in "3 + 2 * 4" the precedence is clear to virtually all programmers because it follows the standard rules from math ("PEMDAS"), which is why the precedence table in the post specifies an ordering for it. However, especially for operators that are used less frequently, or whose precedence differs between languages, one should add explicit parentheses for clarity. I think this is a very sensible argument.
I have certainly made errors because of unclear precedence (e.g., with boolean operators, exponentiation, casting, etc.) in the past. And given that there is even a CWE entry (https://cwe.mitre.org/data/definitions/783.html) for this kind of error, it seems frequent enough to warrant discussion.
I don't know Swift, but does this mean there are expressions in Swift where the compiler returns an error, because the expression is ambiguous without additional parentheses?
(My understanding of the article is that it mainly argues for _partial_ precedence, not so much that precedence/associativity can be defined in the program - the latter is also the case in other languages, e.g., Haskell.)
I don't think that's a fair assessment, and it is also not fair to the designers of WebAssembly. They certainly have spent a lot of thought on protecting the host (e.g., browser, underlying OS) from _malicious_ WebAssembly code. This is what we call "host security" in the introduction of the paper. Modulo implementation bugs (which are orthogonal to the design of the language), WebAssembly applications cannot break out of the sandbox any more than arbitrary JavaScript already can.
However, what we look at in the paper is "binary security", i.e., whether _vulnerable_ WebAssembly binaries can be exploited through malicious _inputs_. Our paper says: yes, and in some cases write primitives are more easily obtainable and more powerful than in native programs. (Examples: stack-based buffer overflows can overwrite into the heap; string literals are not truly constant, but can be overwritten.)
> They certainly have spent a lot of thought on protecting the host (e.g., browser, underlying OS) from _malicious_ WebAssembly code.
You say that like the Java Virtual Machine designers didn't do the same thing.
> Modulo implementation bugs (which are orthogonal to the design of the language), WebAssembly applications cannot break out of the sandbox any more than arbitrary JavaScript already can.
That has a very familiar sound to it. ;-)
> However, what we look at in the paper is "binary security", i.e., whether _vulnerable_ WebAssembly binaries can be exploited through malicious _inputs_. Our paper says: yes, and in some cases write primitives are more easily obtainable and more powerful than in native programs. (Examples: stack-based buffer overflows can overwrite into the heap; string literals are not truly constant, but can be overwritten.)
I think you've highlighted an important subtlety here for sure. Would you say that Java applets were similarly vulnerable?
Good summary and clarification. Yes, we do not aim to break out of the (browser) sandbox, and yes, the example exploits only use functions that are already imported into the vulnerable WebAssembly module.
However, I would draw a bit more attention to the consequences when memory vulnerabilities in a WebAssembly binary are exploited:
(1) Not every WebAssembly binary runs in a browser or in a sandboxed environment. The language is small and appealing, so more people are trying to use it outside of those "safe" environments, e.g., "serverless" cloud computing, smart contract systems (Ethereum 2.0, EOS.IO), Node.js, standalone VMs, and even Wasm Linux kernel modules. With different hosts and sandboxing models, memory vulnerabilities inside a WebAssembly binary can become dangerous.
(2) Even if an attacker can "only" call functions already imported into the binary, how powerful the attack can be depends on those imported functions. Currently, yes, most WebAssembly modules are relatively small and import little "dangerous" functionality from the host. I believe this will change once people start using it, e.g., for frontend development -- then you have DOM access in WebAssembly, potentially enabling XSS. Or access to network functions. Or the binary is a multi-megabyte program containing sensitive data in its memory.
Sure, the warning is early, but I'd rather fix these issues before they become a common problem in the wild.
You are right, some of the issues highlighted in the paper could be solved by compilers targeting WebAssembly. One such mitigation that is (currently) missing is stack canaries. In contrast, stack canaries are typically employed by compilers targeting native architectures. They cost performance there as well (typically single-digit percentages), but evidently compiler authors have decided that this cost is worth the added security benefit, since fixing "old C issues" in all legacy code in existence is not realistic.
Note, however, that other security issues highlighted in the paper _are_ characteristics of the language, notably linear memory without page protection flags. One consequence of this design is that there is no way of having "truly constant" memory -- everything is always writable. I do think this is surprising and certainly weaker than virtual memory in native programs, where you _cannot_ overwrite the value of string literals at runtime.
The decision to not (yet) have finer-grained protection for WebAssembly memory pages was in part motivated by the JS API surface in the web embedding. WebAssembly memories are exposed to JS as ArrayBuffers, and optimizing JITs can and do optimize ArrayBuffer access down to naked stores and loads. In addition to the optimized JIT code, ArrayBuffers are used throughout the web stack for moving data into and out of programs (read: Chrome's 6 million lines of code). The entire web stack is not really prepared for read-only ArrayBuffers causing signals underneath.
That said, we always knew a day would come when WebAssembly would get the ability to set more page protections. For non-web embeddings this is easier, but as of yet, none of the engines I know of either have an API for this or are really prepared for its implications. I am still bullish on eventually getting that capability, though.
Thanks a lot for the additional background, and also for all the work on WebAssembly. It is a very cool language, and having it available now with plain linear memory is much better than if it were still in the works while page protections were being figured out.
As you said, I just hope page protections can still be added later (somebody needs to specify it, embedders need to be able to implement them, toolchains need to pick it up, etc.).
Maybe memory vulnerabilities inside WebAssembly programs can also be mitigated in other ways that do not require such pervasive changes, e.g., by keeping modules small and compiling each "protection domain" (e.g., a library) into its own module, or by giving each its own memory. I am not sure about the performance cost of such an approach, though.
Daniel [1], one of the authors here. Feel free to also ask technical questions; I'll try my best to answer them.
If you don't feel like reading the whole paper, there is also a short (~10 min) video on the conference website [2], where I explain the high-level bits.
Great overview. Would it be correct to characterize the fwrite capability as one of the more concerning potential exploits (i.e., especially when combined with other browser vulnerabilities)?
You are referring to the example exploit in section 5.3, right? Please note that this example is for a standalone VM, not inside the browser (where JavaScript programs -- and by extension, WebAssembly modules -- do not have direct access to the filesystem).
Whether that exploit is more or less concerning than the browser and Node.js examples, I think is hard to answer in general without additional qualifications. If the standalone VM uses fine-grained capabilities (e.g., libpreopen) or is sandboxed, then changing the file that is being written to might be possible inside WebAssembly memory but access could be blocked by the VM.
> This doesn't really seem to offer anything useful, or even contain new information.
You are right in that we do not try to attack a concrete host implementation or aim to break out of the sandbox.
I disagree, however, that security inside a WebAssembly binary is irrelevant. If an attacker can arbitrarily read/write in linear memory, what can be done depends on the imported host functions. The larger WebAssembly applications become (and we believe they will grow larger in the future and incorporate DOM functions, as in our PoC application), the higher the risk that host APIs can be abused. This is troubling especially where there is no sandbox (on some standalone VMs), but even in browsers it opens the door for a new type of cross-site scripting.
For example, even if your C code only calls JS eval() with a constant string, because WebAssembly has no truly constant memory, an attacker with a memory write primitive inside linear memory can turn this into XSS.
> The html page there seems to be missing the "main.js" to see what's happening. :(
Oops, good point. They are now pushed, sorry about that.
> main.js, which looks (without checking) like it came from Emscripten
On one hand, the problem doesn't seem to be any different than happens with standard JS. The exploit was possible only because the wasm (literally) did no input validation. And "validate all input" is the first thing web programmers learn. (or very close to first ;>)
On the other hand, it is a vector. And we've seen plenty of cases where vectors are chained together in novel ways to enable unexpected attacks. So there's that. ;)
> C++ compiled to WebAssembly generally manages its own parallel "shadow stack" in linear memory
In the paper, we call the compiler-organized stack in linear memory the "unmanaged stack", to differentiate it from the "evaluation stack" (WebAssembly is a stack-based VM, so this contains arguments and results of instructions) and the "managed call stack" (which contains call frames, return addresses, and local variables; it is managed by the VM and cannot be inspected explicitly by WebAssembly instructions).
> attacker can only jump to a function with a compatible type signature
This is true, but note that WebAssembly types are fairly low-level. That is, there are only four primitive types (i32/i64, f32/f64), and, e.g., a C function that takes a string (char *) and a size_t would be type-compatible with a function that takes a signed int and a struct pointer (all four of those types map to the same Wasm type on wasm32: i32).
(In asm.js, memory was provided by an ArrayBuffer of fixed size, so there memory could truly not grow at runtime.)