Any sufficiently advanced serialization standard will let you serialize/deserialize a wide variety of objects. When J2EE SOAP libraries are passing large amounts of XML back and forth over the wire, it is often going to get instantiated via calling a zero-argument constructor and then firing a bunch of setter methods. As J2EE has learned to its sorrow, there are some good choices for objects to allow people to deserialize and some not so good ones.
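That setter-driven revival pattern can be sketched in a few lines (Ruby here rather than Java, and the `Account` class and `revive` helper are invented for illustration):

```ruby
# Hypothetical class -- never designed to accept untrusted input.
class Account
  attr_accessor :name, :admin
end

# The bean-style revival pattern: create an instance without running
# any meaningful constructor logic, then fire a setter for every
# field the incoming data names.
def revive(klass, fields)
  obj = klass.allocate                        # skips initialize entirely
  fields.each { |k, v| obj.send("#{k}=", v) }
  obj
end

acct = revive(Account, "name" => "mallory", "admin" => true)
acct.admin   # => true -- the data, not the code, decided to set this
```

The danger is exactly that last line: any invariant the constructor would have enforced is gone, and the wire format decides which setters run.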
If J2EE is a boring platform to you, pick your favorite and Google for a few variants. You'll find a serialization vulnerability. It's hard stuff, by nature.
*The bug in the YAML parser was reported and the author of the YAML library genuinely couldn't figure out why this mattered or how it could be bad.*
Do you have a citation for this? What particular bug in the parser are you referring to? The behavior which is being exploited is a fairly complicated interaction between the parser and client Rails code -- I banged my head against the wall trying to get code execution with Ruby 1.8.7's parser for over 12 hours, for example, without any luck unless I coded a too-stupid-to-be-real victim class. (It's my understanding that at least one security researcher has a way to make that happen, but that knowledge was hard won.)
> Any sufficiently advanced serialization standard ... it is often going to get instantiated via calling a zero-argument constructor and then firing a bunch of setter methods.
Yes, this is always a bad idea. It's actually in a similar problem space as the constant stream of vulnerabilities in the Java security sandbox (e.g., applets); all it takes is one mistake and you lose.
And thus, people have been saying to turn off Java in the browser for 4+ years, and this is also why Spring shouldn't have implemented such code.
> It's hard stuff, by nature.
Which is why deserializing into executable code is a bad idea, by nature. I'd thought this was well established by now, but apparently it is not.
> Do you have a citation for this? What particular bug in the parser are you referring to?
> This is systemic engineering incompetence that apparently pervades an entire language community
The original target of that claim was the Ruby community. Since this comment concedes the same issue exists in the Java community, are you leveling the same claim against it? Does every severe security issue that goes unnoticed by a community for some time before eventually being found suggest pervasive engineering incompetence throughout that entire community? Maybe you would be entirely right to make that claim, since any security issue is indicative of incompetence at some level, but I think the closer your definition of incompetence comes to including everybody, the less useful that definition is.
> Which is why deserializing into executable code is a bad idea, by nature. I'd thought this was well established by now, but apparently it is not
I'm not sure that means anything. In an OO language, you are always de-serializing into objects, and objects are always 'executable code'. Hashes and Arrays are executable code too, right?
The problem is actually when you allow de-serializing into _arbitrary_ objects of arbitrary classes, where some of those objects have dangerous side effects _just by being instantiated_, and/or have functionality that can turn into an arbitrary code execution vector. (Hopefully Hashes and Arrays don't.)
It is a problem, and it's probably fair to say that you should never have a de-serialization format that takes untrusted input and de-serializes to anything but a small whitelisted set of classes/types. And many have violated this, and not just in Ruby.
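For what it's worth, Psych did eventually grow exactly this kind of whitelist API. A minimal sketch, assuming a modern Psych with `safe_load` and its `permitted_classes:` keyword (older versions lack both):

```ruby
require 'yaml'
require 'date'

# Plain scalars and collections deserialize fine...
YAML.safe_load("--- [1, two, 3.0]")   # => [1, "two", 3.0]

# ...but a data-specified class outside the whitelist is rejected
# before anything gets instantiated.
begin
  YAML.safe_load("--- !ruby/object:OpenStruct {}")
rescue Psych::DisallowedClass => e
  e.message   # the tag was refused, not revived
end

# Classes you trust can be opted in explicitly:
YAML.safe_load("--- 2013-01-28", permitted_classes: [Date])
```

The key property is that the rejection happens during parsing, before any object of the disallowed class exists -- a post-hoc check would be too late, since the danger is instantiation itself.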
But if you can't even describe the problem/guidance clearly yourself, I think that rather belies your insistence that it's an obvious thing known by the standard competent programmer.
(I am not ashamed to admit it was not obvious to me before these exploits. I think it was not obvious to a bunch of the people who are in retrospect _claiming_ it was obvious to them.)
> I'm not sure that means anything. In an OO language, you are always de-serializing into objects, and objects are always 'executable code'. Hashes and Arrays are executable code too, right?
No. You're conflating code and state (which was the problem to begin with!)
Let's disassemble parsing a list of strings:
When you instantiate the individual string objects, you do not 'eval' the data to allow it to direct which string class should be instantiated. You also do not 'eval' the data to determine which fields to set on the string class.
You instantiate a known String type, and you feed it the string representation as an array of non-executable bytes using a method you specified when writing your code -- NOT a method the data specifies.
The data is not executable. It's an array of untrusted bytes. The string code is executable, and it operates on state: the data.
You repeat this process, feeding the string objects into the list object. At no point do you ask the data what class or code you should run to represent it. Your parsing code dictates what classes to instantiate, and the data is interpreted according to those fixed rules, and your data is never executed.
It should never be possible for data to direct the instantiation of types. The relationship must always occur in the opposite direction, whereby known types dictate how to interpret data.
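The steps above in miniature, as a toy deserializer (the format and function name are made up for illustration):

```ruby
# A toy deserializer for a comma-separated list of strings. The code
# fixes the types up front: no matter what bytes arrive, only Strings
# are ever instantiated, and only an Array ever holds them.
def parse_string_list(data)
  data.split(",").map { |chunk| chunk.strip }
end

parse_string_list("alpha, beta, gamma")
# => ["alpha", "beta", "gamma"]

# Even hostile-looking input is just state, never code -- the parser
# never asks the data which class to build:
parse_string_list("!ruby/object:OpenStruct, system('id')")
# => ["!ruby/object:OpenStruct", "system('id')"]
```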
> I think it was not obvious to a bunch of people who are in retrospect _claiming_ it was obvious to.
Given the preponderance of prior art, this seems unlikely.
The YAML vulnerability was not from any 'eval' in the YAML library itself, you realize, right?
It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
To use your language, I don't think it's 'intellectually honest' to call allowing de-serialization to data-specified classes "a YAML parser that executed code" -- that's misleading -- or to say that a 'trained monkey should have known it was a bad idea' (allowing de-serialization to arbitrary data-specified classes).
There have been multiple vulnerabilities _just like this_ in other environments, including several in Java (and in major popular Java packages). You could say that with all that prior art it ought to have been obvious, but of course you could have said the same about each of the prior vulnerabilities too. Each time there's even more prior art, and for whatever reason this one finally got enough publicity that maybe this kind of vulnerability will become common knowledge now.
> The YAML vulnerability was not from any 'eval' in the YAML library itself, you realize, right?
> It was from allowing de-serialization to arbitrary classes, when it turned out that some classes had dangerous side-effects merely from instantiation -- including in some cases, 'eval' behavior, yes, but the eval behavior wasn't in YAML, it was in other classes, where it could be triggered by instantiation.
What you are looking for is not "OO language", but "dynamic interpreted language".
In a traditionally compiled OO language like C++, classes cease to exist after compilation; there is no fully generic way to instantiate an object of a class determined by data at runtime. So this whole concept of deserializing to whatever class the protocol specifies goes completely out the door.
So your conclusion is that dynamically interpreted languages are all insecure?
(You can instantiate objects with classes specified by data in Java too, although Java isn't usually considered exactly dynamically interpreted. In fact, there was a very analogous bug in Spring, as mentioned in many places in this comment thread. But anyway, okay: being sufficiently dynamic to allow instantiation of objects with classes chosen at runtime is the root of the problem, you're suggesting? If everyone just used C++, it would be fine?)
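The capability in question is about two lines of Ruby; the danger is entirely in where `name` comes from:

```ruby
# Data-driven instantiation: a string picks the class, and allocate
# skips every constructor invariant. Harmless with "String" --
# catastrophic when the name comes off the wire and some reachable
# class has side effects on instantiation or field assignment.
name = "String"                       # imagine this arrived in a request
obj  = Object.const_get(name).allocate
obj.class   # => String
```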
One could argue that since every call goes through a runtime messaging framework, Objective C is really just an interpreted language with pre-JITed function bodies.
Here's the thing: you cannot safely load YAML containing attacker-controlled data. Period. If you do, you have to assume bad things are going to happen. The fact that psych calls []=(key, val) on instantiated objects, in combination with ActionController::Routing::RouteSet::NamedRouteCollection calling eval on the key, made for a particularly easy drive-by attack on a huge range of deployments -- but even without the []=, there are still plenty of ways to exploit loading arbitrary YAML, though they may require more custom targeting.
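A stripped-down version of that gadget chain (the class and its behavior are invented for illustration, loosely modeled on the named-route collection; this is not the actual Rails code):

```ruby
# A hypothetical gadget class: it evals its keys to define helper
# methods. Perfectly fine for trusted route names...
class NamedRoutes
  def []=(key, value)
    instance_eval "def #{key}_path() #{value.inspect} end"
  end
end

routes = NamedRoutes.new
routes["home"] = "/"          # intended use: defines home_path

# ...but if a deserializer lets attacker data reach []=, the key can
# smuggle arbitrary code into the eval:
routes["x() end; $pwned = true; def y"] = "/evil"
$pwned   # => true
```

The class author did nothing obviously wrong in isolation; the disaster happens when a deserializer hands untrusted strings to a method that was only ever meant to see trusted ones.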
In terms of that feature request, I doubt that adding a safe_load option would have stopped the Rails vulnerability. After all, the Rails guys _already knew_ that they should not be loading YAML from the request body; that's why it was not allowed directly. The issue was loading XML, which then allowed YAML to be loaded. Allowing YAML to be loaded there was a mistake; it seems unlikely that someone would make that mistake while at the same time mitigating it by adding safe_load.
You're describing the previous Ruby on Rails vulnerability. The latest one involved them deliberately using the YAML parser to parse untrusted JSON data. Also, the RubyGems compromise was a result of them parsing gem metadata represented using YAML - since the metadata is YAML you pretty much have to use a YAML parser to parse it.
Using YAML to parse JSON was obviously non-optimal, which is (presumably) why Rails stopped doing it in 3.1 (thus the vulnerability you refer to is only present in 3.0 and 2.x).
W.r.t. RubyGems, I hear what you're saying, but that doesn't mean there's a bug in psych. Even the feature request of adding a safe_load option strikes me as problematic: either you're limiting the markup to JSON with comments, or you'd have to name the option something like sort_of_safe_load.
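The hazard of pointing a full YAML parser at "JSON" input, sketched (this uses `unsafe_load`, which newer Psych provides; older versions spell the same behavior `load`):

```ruby
require 'yaml'
require 'json'
require 'ostruct'

doc = '{"user": "alice"}'
JSON.parse(doc)    # => {"user"=>"alice"}
YAML.load(doc)     # YAML accepts JSON syntax, same result...

# ...but YAML also honors things JSON's grammar flatly rejects:
evil = '{"user": !ruby/object:OpenStruct {}}'
(JSON.parse(evil) rescue :rejected)    # => :rejected -- strict grammar

load = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
YAML.send(load, evil)["user"].class    # => OpenStruct, instantiated
```

So a YAML parser fed "JSON" silently widens the input language from pure data to arbitrary tagged objects.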
Spring isn't the only widely-used Java framework that's had these problems. The Struts developers put a general-purpose interpreter (OGNL) in their parameter parsing pipeline, and thought they'd kept things safe by blacklisting dangerous syntax.
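A toy illustration of why that approach fails (the filter and the expressions are invented for illustration, not actual OGNL):

```ruby
# A naive syntax blacklist: ban the scary-looking tokens.
BLACKLIST = /\b(eval|system|exec)\b/

def looks_safe?(expr)
  expr !~ BLACKLIST
end

looks_safe?("system('id')")
# => false -- the obvious spelling is caught

# But an expression language usually offers indirect spellings that
# build the same call without ever writing the banned token:
looks_safe?("public_send(%w[sys tem].join, 'id')")
# => true -- slips straight past the filter
```

A blacklist has to anticipate every spelling of danger; a whitelist only has to enumerate the handful of things that are known safe.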
It would obviously be unfair to claim on this basis, or the recent problems with the Java browser plugin, that the "entire Java language community" has a bad attitude on security matters. Communities are big, each of them has a range of attitudes within it, and most importantly --- regardless of attitude --- sooner or later, everyone screws up.
Parsing is not deserialization. I keep having to say this on all these threads. There should be a giant glowing force field between these two practices.
The security fuckup is a lot simpler than that -- they fucked up as soon as they opened the door to this kind of complicated interaction by letting untrusted input instantiate arbitrary classes and pass strings of its choice to them. It doesn't matter that they weren't aware of any way this could be exploited; as soon as they let an attacker pass data to random classes that were never designed to accept untrusted input, a security disaster was basically inevitable.
http://wouter.coekaerts.be/2011/spring-vulnerabilities