I'm really surprised this can work at all in any automated way. You can't just m...

morgante · on Aug 3, 2024

Line by line is infeasible, which is precisely why you need to use AI to make larger semantic inferences.

You also don't have to one-shot translate everything. One of the valuable things about the Rust compiler is it gives lots of specific information that you can feed back into an LLM to iterate.
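Here is a minimal sketch of that iterate-on-compiler-feedback loop. It assumes the checker and the model are passed in as closures. `repair_loop`, `RepairOutcome`, `check`, and `propose_fix` are hypothetical names, not part of any real tool. In practice `check` might wrap something like `cargo check`, and `propose_fix` might call an LLM with the diagnostics.

```rust
/// Outcome of a bounded repair loop (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum RepairOutcome {
    /// The code passed the checker after `attempts` fix rounds.
    Compiled { code: String, attempts: usize },
    /// Still failing after the attempt budget was spent.
    GaveUp { code: String, last_errors: Vec<String> },
}

/// Hypothetical loop: run the checker. If it reports errors, hand the code
/// plus those specific diagnostics to the fixer and try again, up to
/// `max_attempts` fix rounds.
pub fn repair_loop<C, F>(
    mut code: String,
    max_attempts: usize,
    check: C,
    mut propose_fix: F,
) -> RepairOutcome
where
    C: Fn(&str) -> Result<(), Vec<String>>,
    F: FnMut(&str, &[String]) -> String,
{
    let mut attempts = 0;
    loop {
        match check(&code) {
            Ok(()) => return RepairOutcome::Compiled { code, attempts },
            Err(errors) => {
                if attempts == max_attempts {
                    return RepairOutcome::GaveUp { code, last_errors: errors };
                }
                // Feed the diagnostics back into the next attempt.
                code = propose_fix(&code, &errors);
                attempts += 1;
            }
        }
    }
}
```

The bounded budget matters because nothing guarantees the model converges. A fixer can keep oscillating between two broken versions.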

I've been working on similar problems for my startup (grit.io) and think C -> Rust is definitely tractable in the near term. Definitely not easy but certainly solvable.

stogot · on Aug 3, 2024

What about converting to an AST and then asking the AI to convert that to Rust? Would that work?

Someone · on Aug 3, 2024

That’s probably the route they would take, but the C AST won’t have ownership attributes. You’d have to discover those yourself.

ASTs also don’t carry much information about threading (that’s more or less limited to “the program starts a thread with entry point foo at some time” and “foo waits for another thread to finish”).

morgante · on Aug 3, 2024

Foundation models aren't primarily trained on ASTs, so you're typically going to have worse results than just using text unless you do extensive fine-tuning yourself.

ASTs also generally don't actually have magical information in them. They won't solve the lifetime issues for you.

Someone · on Aug 3, 2024

> Pointers and aliasing are ubiquitous in c programs

If we ignore multi-threaded programs, is long-term aliasing actually ubiquitous in C programs? For many programs, I would expect most of it to happen within the scope of a single function (and, from there, across function calls, but borrowing will solve that, won’t it?).
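A minimal sketch of the "aliasing stays inside one function" case. It uses a hypothetical C function `bump_max(int *a, int *b)` that picks whichever pointer targets the larger value and increments through that alias. In Rust the alias becomes a short-lived reborrow.

```rust
/// Hypothetical C original:
///     void bump_max(int *a, int *b) { int *m = (*a > *b) ? a : b; *m += 1; }
/// The local alias `m` maps directly onto a mutable reborrow that never
/// escapes the function.
pub fn bump_max(a: &mut i32, b: &mut i32) {
    let m: &mut i32 = if *a > *b { a } else { b };
    *m += 1;
}
```

One caveat: the C version can legally be called as `bump_max(&x, &x)`, while `bump_max(&mut x, &mut x)` is rejected by the borrow checker. A translator therefore also has to show that no caller passes overlapping pointers.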

If so, I would try to tackle that as one sub-problem (you have to start somewhere), and treat detecting how data gets shared between threads as another. For the latter, I expect that many programs will have some implicit ownership rule such as “thread T1 puts stuff in queue Q where thread T2 will pick it up” that can be translated as “putting it in the queue transfers ownership”.
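The "putting it in the queue transfers ownership" rule maps naturally onto a Rust channel, because `send` moves the value. This is a minimal sketch using `std::sync::mpsc`. `Job` and `run_pipeline` are hypothetical names standing in for whatever the C program passes through its queue.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical work item. In the C original this might be a malloc'd
/// struct whose pointer T1 pushes onto queue Q and T2 later frees.
#[derive(Debug)]
pub struct Job {
    pub id: u32,
    pub payload: Vec<u8>,
}

/// The calling thread plays T1 and the spawned thread plays T2.
pub fn run_pipeline(n: u32) -> Vec<(u32, usize)> {
    let (tx, rx) = mpsc::channel::<Job>();

    let consumer = thread::spawn(move || {
        let mut seen = Vec::new();
        // The loop ends once every Sender has been dropped.
        for job in rx {
            seen.push((job.id, job.payload.len()));
            // `job` is dropped (freed) here, on the consumer thread.
        }
        seen
    });

    for id in 0..n {
        let job = Job { id, payload: vec![0; id as usize] };
        // `send` moves `job` into the channel. Touching `job` after this
        // line would be a compile error, which encodes the ownership rule.
        tx.send(job).expect("consumer hung up");
    }
    drop(tx); // close the queue so the consumer's loop terminates

    consumer.join().expect("consumer panicked")
}
```

The hard part the comment points at is the detection step: recognizing that a given C queue really follows this hand-off discipline before rewriting it this way.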

Detecting such rules may not be easy, but it doesn’t look completely out of reach to me, either, and that would be good enough for a research project.

ip26 · on Aug 3, 2024

For a naive newcomer - could you go line by line, wrap the whole thing in “unsafe”, compile to an identical binary, and then slowly peel away the “unsafe” while continuing to validate equivalence?

That would at least get you to as much rust as possible, and then let engineers tackle rethinking just those concepts.

jcranmer · on Aug 3, 2024

Converting C to legal (unsafe) Rust is quite possible; there is indeed already a tool that does this (https://github.com/immunant/c2rust).

The problem you run into is that the conversion is so pedantically correct that the resulting code is useless. The result retains all of the problems that the C code has, and it is so far from idiomatic Rust that it's easier to toss the code and start from scratch. Progressively lifting unsafe Rust to safe Rust is a very tall order, and the project I mentioned had a tool to do that... which is now abandoned and unmaintained.

At the end of the day, the chief issue with converting to safe Rust is not just that you have to copy semantics over, but you also have to recover a lot of high-level preconditions. Turning pointers into slices is perhaps the easiest task of the lot; given the very strict mutability rules in Rust, you also have to work out when and where to insert things like Cell or Rc or Mutex or what have you, as well as building out lifetime analysis. And chances are the original code doesn't get all these rules right, which is why there are bugs in the first place.
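To show what "turning pointers into slices" means, here is a hedged sketch based on a hypothetical C function `int sum(const int *p, size_t n)`. The first version is roughly what a literal translation looks like (not verbatim c2rust output). The second is the safe form once `(p, n)` is recognized as a single slice.

```rust
/// Literal-style translation of the hypothetical C function
///     int sum(const int *p, size_t n) { int s = 0; for (size_t i = 0; i < n; i++) s += p[i]; return s; }
/// Raw pointer plus length, no bounds checks. The caller must guarantee
/// that `p` points to at least `n` valid, initialized ints.
pub unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut s: i32 = 0;
    let mut i = 0;
    while i < n {
        unsafe {
            s += *p.add(i);
        }
        i += 1;
    }
    s
}

/// Safe version: the pointer/length pair becomes one `&[i32]`, so the
/// length travels with the data and indexing is bounds-checked.
pub fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}
```

The mechanical rewrite is easy. The real work is proving that every caller's `n` actually matches the buffer `p` points to, and that proof is exactly the kind of precondition that has to be recovered.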

Solving that problem is the goal of this DARPA proposal, or perhaps more accurately, determining how feasible it is to solve that problem automatically. Personally, I think the better answer is to have a semi-automated approach, where users provide as input the final Rust struct layouts (and possibly parts of the API, to fix lifetime issues), and the tool automates the drudgery of getting the same logic ported to that mapping.
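A sketch of what that semi-automated split could look like, assuming a hypothetical C struct `struct buf { char *data; size_t len; size_t cap; }`. The human supplies the Rust layout and the lifetime-bearing API. The ported function bodies are then the "drudgery" part a tool would fill in. All names here are illustrative.

```rust
/// Layout chosen by a human for the hypothetical C struct
///     struct buf { char *data; size_t len; size_t cap; };
/// `Vec<u8>` owns the data and already tracks len and cap.
pub struct Buf {
    data: Vec<u8>,
}

impl Buf {
    pub fn new() -> Self {
        Buf { data: Vec::new() }
    }

    /// Port of a hypothetical `int buf_push(struct buf *b, char c)`.
    /// The C grow-with-realloc branch disappears because `Vec` handles it.
    pub fn push(&mut self, c: u8) {
        self.data.push(c);
    }

    /// Port of a hypothetical `char *buf_last(struct buf *b)` that returns
    /// NULL when empty. The human-chosen signature ties the returned
    /// reference's lifetime to `&self`, fixing the lifetime question up front.
    pub fn last(&self) -> Option<&u8> {
        self.data.last()
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```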

Animats · on Aug 3, 2024

Right. Used c2rust once. Been there, done that. The Rust code that comes out is awful. Does the same thing as the C code, bugs and all. You don't get Rust subscript check errors, you get segfaults from unsafe Rust code. What comes out is hopeless for manual "refactoring".

The hardest part may be Rust's affine type rules. Reference use in Rust is totally different than pointers in C/C++. Object parenting relationships are hard to express in Rust.
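For the parent/child case, one common safe-Rust encoding is sketched below. Parents own children through `Rc`, children point back through `Weak` so the cycle does not leak, and `RefCell` allows mutation through shared handles. `Node`, `add_child`, and `parent_name` are hypothetical names. This is one option among several (arena indices are another).

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// In C a node would typically hold `struct node *parent` plus child
/// pointers, all freely aliased. Here ownership flows one way only.
pub struct Node {
    pub name: String,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    pub fn new(name: &str) -> Rc<Node> {
        Rc::new(Node {
            name: name.to_string(),
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        })
    }

    /// Attach `child` under `parent`, setting both directions of the link.
    pub fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
        *child.parent.borrow_mut() = Rc::downgrade(parent);
        parent.children.borrow_mut().push(child);
    }

    /// Name of the parent, or `None` if it has already been dropped.
    pub fn parent_name(&self) -> Option<String> {
        self.parent.borrow().upgrade().map(|p| p.name.clone())
    }

    pub fn child_count(&self) -> usize {
        self.children.borrow().len()
    }
}
```

Compared with the C version, every back-pointer dereference becomes a fallible `upgrade()`, and every mutation goes through a runtime-checked borrow. That overhead is part of why these relationships feel hard to express.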

j-krieger · on Aug 3, 2024

There are "warts" with unsafe Rust that would make this feat very difficult. Aliasing rules still apply.
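A small illustration of that wart, using a hypothetical C function `add_into(int *dst, const int *src)` that is legal to call with the same pointer twice. `unsafe` does not switch off the aliasing rules for references: producing an overlapping `&mut i32` and `&i32` would be undefined behaviour. Keeping raw pointers, and never turning them into references, is one way to preserve the C semantics.

```rust
/// Hypothetical C original, fine to call as add_into(&x, &x):
///     void add_into(int *dst, const int *src) { *dst += *src; }
/// Translating the parameters to `&mut i32` and `&i32` would forbid that
/// call in safe Rust. Manufacturing such overlapping references with
/// `unsafe` is undefined behaviour, so this version stays on raw pointers.
pub unsafe fn add_into(dst: *mut i32, src: *const i32) {
    unsafe {
        // Read through `src`, then write through `dst`. No references are
        // created here, so `dst` and `src` may point to the same int.
        let v = *src;
        *dst += v;
    }
}
```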

rolph · on Aug 3, 2024

You need to create a transpiler philosophy.

Transform C to ASM, then ASM to Rust.

What you need is to avoid incompatibilities between different high-level languages by going through a low-level intermediary, so you aren't stuck attempting to convert one high-level hardware abstraction directly into another high-level hardware abstraction.

alkonaut · on Aug 3, 2024

A line-by-line translation doesn't require much "AI" either. You could probably make a rough translation into some (mostly unsafe) Rust.

I assume the AI actually needs to figure out lifetimes and so on to be useful and produce valid programs, which would be impressive if it does.

alex_suzuki · on Aug 3, 2024

I wonder about this as well, especially in codebases that make heavy use of macros.
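One reason macros are tricky: a faithful translation and an idiomatic one can behave differently. The sketch below uses a hypothetical C macro `#define SQUARE(x) ((x) * (x))`. In C, `SQUARE(i++)` evaluates `i++` twice, which is undefined behaviour. A function port evaluates its argument once. A textual `macro_rules!` port reproduces the double evaluation. `square` and `square_textual` are illustrative names.

```rust
use std::ops::Mul;

/// Function-style port of the hypothetical `#define SQUARE(x) ((x) * (x))`.
/// The argument expression is evaluated exactly once.
pub fn square<T: Mul<Output = T> + Copy>(x: T) -> T {
    x * x
}

/// Textual port: expands the argument twice, so side effects in the
/// argument run twice, just like the C macro.
macro_rules! square_textual {
    ($x:expr) => {
        ($x) * ($x)
    };
}
```

A translator has to decide, per macro, whether callers ever rely on the textual behaviour. Macros that paste tokens or expand to partial statements have no function equivalent at all.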