My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years. INI files, XML, JSON, and YAML aren't sufficiently expressive -- lacking for loops, conditionals, variable references, and any sort of abstraction -- so, of course, we add templates to them. But as the author (IMHO rightfully) points out, we just end up with a funky, poor approximation of a language.
I think this approach is a byproduct of thinking about infrastructure and configuration -- and the cloud generally -- as an "afterthought," not a core part of an application's architecture. Containers, Kubernetes, serverless, and more hosted services all change this, and Chef, Puppet, and others laid the groundwork to think differently about what the future looks like. More developers today than ever before need to think about how to build and configure cloud software.
We started the Pulumi project to solve this very problem, so I'm admittedly biased, and I hope you forgive the plug -- I only mention it here because I think it contributes to the discussion. Our approach is to simply use general purpose languages like TypeScript, Python, and Go, while still having infrastructure as code. An important thing to realize is that infrastructure as code is based on the idea of a goal state. Using a full blown language to generate that goal state generally doesn't threaten the repeatability, determinism, or robustness of the solution, provided you've got an engine handling state management, diffing, resource CRUD, and so on. We've been able to apply this universally across AWS, Azure, GCP, and Kubernetes, often mixing their configuration in the same program.
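To make that concrete, here is a minimal sketch (assuming the @pulumi/aws provider; the names, the stage variable, and the bucket settings are purely illustrative):

```
import * as aws from "@pulumi/aws";

// Ordinary TypeScript -- a loop, a conditional expression -- generates the
// goal state; the engine then diffs it against the deployed state and
// performs whatever creates/updates/deletes are needed.
const stage: string = "dev"; // hypothetical environment name
for (let i = 0; i < 3; i++) {
    new aws.s3.Bucket(`${stage}-logs-${i}`, {
        versioning: { enabled: stage !== "dev" },
    });
}
```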
Again, I'm biased and want to admit that. Still, if you're sick of YAML, it's definitely worth checking out. We'd love your feedback:
This is a great analysis, but it's missing a fundamental point: why do we have a problem with these approximations of a programming language, or with just using a programming language to template stuff?
Because your build then becomes an actual program (i.e. Turing complete) and you have to refactor and maintain it! This is the common problem of using a "programming language as configuration" (e.g. gulp?)
Dhall has the same premise as Pulumi, but without the Turing completeness (I don't know if/how Pulumi avoids that, but if it does, it should be part of the pitch), so you cannot shoot yourself in the foot by building an abstraction castle in your build system/infrastructure config.
We use it at work to generate all the Infra-as-Code configurations from a single Dhall config: Terraform, Kubernetes, SQL, etc.
> We use it at work to generate all the Infra-as-Code configurations from a single Dhall config
This is the key bit, and not something that's pitched well enough on the Dhall landing pages: using straight YAML forces you to repeat yourself in multiple areas for each individual tool being used, and these repetitions have to stay consistent across multiple tools. What Dhall does is allow you to write a single config and use it to derive the correct configurations for each tool that you use. So you can write a single configuration file from which, eventually, every single part of your system is derived -- Terraform infrastructure, Kubernetes objects, application config, everything. When you pull it off, it's simply magical.
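Dhall's own syntax aside, the shape of the idea can be sketched in any language. A hedged TypeScript illustration -- the config shape and values are invented, and js-yaml's dump is assumed for emitting the per-tool files:

```
import * as yaml from "js-yaml";

// The single source of truth:
const app = { name: "web", port: 8080, replicas: 3 };

// Derive the Kubernetes Service manifest from it...
const serviceYaml = yaml.dump({
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: app.name },
    spec: { ports: [{ port: app.port }] },
});

// ...and the application's own config from the very same values,
// so the two can never drift apart.
const appConfigYaml = yaml.dump({ listenPort: app.port, workers: app.replicas });
```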
You can think of it like this: JavaScript is a horrible, no-good, very bad language, and yet all browser programming is done in JavaScript because every browser supports it -- so too are JSON and YAML horrible configuration languages. But JavaScript gave rise to abstractions like TypeScript, a much better language that compiles down to JavaScript for compatibility. TypeScript is to JavaScript what Dhall is to JSON and YAML. The fact is, pretty much everything is configured with JSON and YAML, and Dhall makes it much, much easier to live in that world, with no need for the systems being configured to support it.
Considering the relative obscurity of Dhall, it's basically the best-kept secret in the DevOps world right now, and it's a shame more people don't know about it.
Dhall appears to be expressive enough that I can't see why you wouldn't still have to refactor and maintain the Dhall code.
Writing Dhall code looks exactly like programming to me, and the programmer must possess the necessary programming skills to produce good Dhall code. A random guy with a text editor will make just as much of a mess in Dhall as they would with a “real” programming language.
I don't see how the restrictions in Dhall really help much in this regard. Turing completeness feels like a red herring to me.
Not a user of Dhall, just a fan, but refactoring of Dhall configuration should be extremely easy. You make a change, and you can easily verify that your configuration stays the same. (Thanks to https://en.wikipedia.org/wiki/Normalization_property_(abstra... )
For TC languages, comparing whether two programs (original and refactored) do the same thing is not decidable in general. If the language is not TC, then it is more feasible.
You can do more than just compare the output of two programs in Dhall. You can verify using a semantic integrity check that two programs are the same for all possible inputs. For example:
Actually, with Dhall you should be able to compare the programs themselves, even without full "input" (there is even an example on the Dhall page; see "You can reduce functions to normal form, even when they haven't been applied to all of their arguments").
So you can for example leave some parameters out of your config and still validate the correctness of refactoring.
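You can approximate the flavor of this outside Dhall, though only weakly. A hedged TypeScript sketch -- hashing a canonical form of the evaluated output, whereas Dhall's semantic integrity check hashes the expression's normal form itself:

```
import { createHash } from "crypto";

// Reduce each config to a canonical form, then hash it.
// Equal hashes imply the refactoring preserved the configuration.
const canonical = (x: unknown) =>
    createHash("sha256").update(JSON.stringify(x)).digest("hex");

const before = { replicas: 2 + 1 }; // "refactored" expression
const after = { replicas: 3 };      // original literal

console.log(canonical(before) === canonical(after)); // true
```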
If you use a general purpose programming language, then even comparing just the output might be difficult -- most languages allow I/O, so it's possible that the configuration depends on some side channel.
I would say that if you are only using a general purpose language "sensibly" for configuration, then you are effectively restricting yourself in the same way that Dhall does.
I don't get the problem with using a turing complete language to generate configuration. There's nothing wrong with maintaining and refactoring a program, that's a natural process for any program. If you don't want an infinite loop, don't write one, as you wouldn't in any other program. You can choose as much or as little abstraction as you so wish.
Give me a real language any day over dhall or jsonnet.
What's so bad about Turing completeness? I haven't had a decent look at Dhall, but I'm betting I could probably write an exponential Dhall program that won't terminate in the lifetime of the universe.
The real reason for giving up Turing equivalence was probably to get dependent types. This gives very powerful static guarantees, including the presence/absence of fields under non-trivial record operations such as merge. In using dependent types, they have also had to give up significantly on type inference, which is really going to annoy the average JavaScript/Ruby programmer.
> My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years.
I think that you’re right, and I think it’s great, because we have a programming model in which code is data and data is code: Lisp & S-expressions.
It’d be downright awesome to have a Lisp-based system which used dynamic scoping to meld geographical & environmental (e.g. production/development) configuration items. But then, it’d be downright awesome if the world had seriously picked up Lisp in the 80s & 90s, and had spent the last twenty years innovating, rather than reïnventing the wheel, only this time square-shaped. But then, the same thing could be said about Plan 9 …
I’ve not yet had the time to take a look at Pulumi, but I hope to have time soon.
Seriously, this has happened again and again and again. You have software, so you configure it via a clean and simple text syntax, then the configuration needs to be generated and the syntax becomes more complicated, then the next system you do has an "API" instead so you can configure it via programming, which is too complicated so the next time you Do it Right and go with a simple text file, which is then outgrown when the configuration it stores becomes too complicated...
I think the parts of Lisp that tended to be rebuilt have mostly been incorporated into the newer languages. (At least, it's been a very long time since I've had to rewrite a fundamental data structure, etc.)
You don’t need code-is-data for what your parent is describing. All you need is code that outputs data. Or even better, code that initiates contact with other code.
The only requirement is a commitment to doing things imperatively in a real programming language. It’s hard to resist the temptation to do things declaratively (because it’s easier to imagine a declarative interface that describes your problem than an abstraction of the procedure which will solve it) but you are never forced to.
As the kids say: stop trying to make Lisp happen, it's not going to happen.
It has become yet another community that's fighting a struggle that everyone else ended years ago, like the few Japanese in jungles who refused to surrender. I'm not entirely sure why it's not been adopted, but I suspect it's because most people strongly prefer (a) scope delimiters that are visually different when semantically different and (b) function-outside-brackets syntax, i.e. f(a, b) rather than (f a b).
Or you could go the other way and say that JSON is s-exps with curly brackets so it should be made executable as such, and build that language.
> As the kids say: stop trying to make Lisp happen, it's not going to happen.
That's probably true, but I think it's useful to fight the good fight regardless. Even if Lisp & s-expressions don't, in fact, take over the world (and I think they will), arguing in their favour might help increase the chance that whatever inferior technology does end up getting adopted is better than it could have been.
> Or you could go the other way and say that JSON is s-exps with curly brackets so it should be made executable as such, and build that language.
The problem is that without symbols, that ends up being hideously ugly. This:
["if",
["<", 1, 2],
"less than",
"greater than or equal to"
]
is appreciably worse than:
(if (< 1 2)
    "less than"
    "greater than or equal to")
And alternatives like:
{"if": [[1, "<", 2], "less than", "greater than or equal to"]}
are so much worse that I don't think anyone could seriously expect to use them.
> It has become yet another community that's fighting a struggle that everyone else ended years ago, ... like the few Japanese in jungles who refused to surrender.
Nice imagery, but the wrong point.
Except for the syntax, everybody else joined Lisp.
"We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." --Guy Steele
Flash back to the mid-1980's (when the mainstream was C, Pascal, BASIC, FORTRAN, COBOL, etc.) and it's Lisp/Scheme (and Smalltalk) that have features like Garbage Collection, interactive development, lexical closures, decent built-in data structures, dynamic typing.
The fact that all of this is commonplace today both justifies a lot of what Lisp did in the first half of its existence and undermines its (technical) competitive advantages now.
> but I suspect it's because most people strongly prefer (a) scope delimiters that are visually different when semantically different and (b) function-outside-brackets syntax, i.e. f(a, b) rather than (f a b).
It's not technical. I don't think it ever was. So much of it is around social concerns: a performance stigma dating back to the 1970's, fear of not being able to hire people to do the work, fear of what VCs will think, worries about whether the language will still be available... And then at the end of the day, the problems whatever language will solve are a tiny fraction of the overall problem of doing something relevant and lasting and useful to others.
> As the kids say: stop trying to make Lisp happen, it's not going to happen.
Life is too short and the world is too big to try to confine other people's ideas of how they should think or work.
The point of the market economy and of the scientific process is that people get to try what they think is going to be useful and then let the world decide. The fact that Lisp is still in the conversation at all, when its contemporaries (Autocoder, Fortran) either aren't or are highly specialized, says a lot that we can learn from.
I think what you're doing with Pulumi is the right answer, and it's only a matter of time before this becomes the norm. The author's examples could easily be done with plain ol' JS/ES/TS, with far more extensibility and customization when the need arises.
I also feel this is where JSX got it right. Instead of creating yet-another-templating-language (looking at you Angular!), they used JavaScript and did a great job of outlining how interpolation works. Any new templating language is always going to be missing some key feature you expect out of a general programming language and your customers will continue to ask for more features.
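For example, a small TSX sketch (the component and data are invented) -- the "template" constructs are just TypeScript expressions:

```
import * as React from "react";

const users = ["ada", "grace"]; // hypothetical data

// Interpolation is plain expressions in braces; loops and conditionals
// come from the language itself, not from a separate template dialect.
const UserList = () => (
    <ul>
        {users.map(u => (
            <li key={u}>{u.toUpperCase()}</li>
        ))}
    </ul>
);
```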
Paired with TypeScript, we would have the clarity of a declarative language with the power and flexibility of a real language that is also easy to extend and navigate.
In ROS we have these XML launch files that are just awful. They have enough features to be a really bad programming language for configuring and launching (often conditionally) numerous robot software nodes.
In ROS2 the launch file can now just be a Python script. We very much learned all this the hard way, and the solution was to just support Python. I think it's brilliant.
- the Django-like situation: the configuration is pure code, and it's a mistake. It was not necessary, and it brought plenty of problems. I wish they had gone with a templated TOML file.
- the Ansible-like situation: the configuration is templated static text. But with something as complex as deployment, they ended up adding more and more constructs, until they had created a monstrous DSL on top of their implementation language, with zero benefits compared to it and plenty of pitfalls. In that case, they should have made a library, with an API and documentation emphasizing best practices.
- and of course a big spectrum between those
The thing is, we see configuration as one big problem, but it's not. Not every configuration scenario has the same constraints and goals. Maybe you need to accept several sources of data. Maybe you need validation. Maybe you need generation. Maybe you need to be able to change settings live. Maybe you need to enforce immutable settings. Maybe you need to pub sub your settings. Maybe you need to share them in a central place. Maybe they are just for you. Maybe you want them to be distributed. Maybe you need logic. Maybe you want to be protected from logic. Maybe the user can input settings. Maybe you just read conf. Maybe you generate it.
So many possibilities. And that's why there is not a single configuration tool.
What you would need is a configuration framework, dealing with things like merging conf, parsing files, getting conf from the network, expressing constraints, etc.
But if you recreate a DSL for your config, it's probably wrong.
In defence of Django, the way settings.py works has been very stable for the entire lifetime of Django.
It may have its problems (I don't have many issues with it) but it doesn't seem to have this problem of attracting ever more layers of abstraction on top of it. It works.
Actually, I think settings.py is not a bad idea, but it's half-baked.
There should be a schema checking the settings file. There should be a better way to extend settings, and to make different settings according to context, such as prod, staging or dev.
There should be a linter avoiding stupid mistakes like missing a comma in a tuple, resulting in string concatenation.
There should be variables giving you basic stuff like current dir, log dir, var dir, etc. We all make them anyway.
And there should be a better way to debug the settings import problem.
But all in all, it's quick and easy to edit, and very powerful.
> There is already a mechanism to validate the settings.py file inside django.
It's not exposed, and it's very limited.
> The different context stuff can be handled by using env vars, and a nice python wrapper, like python-decouple.
It's just one of the ways to do it. Go to a new project, and they use a different way. The main benefit of Django is that a Django project is well integrated, and you find similar conventions and structure from project to project, allowing you to reuse the skills you learned and build an ecosystem of pluggable apps.
Has anybody here personally suffered the problems that the Turing complete Django configuration creates? (I mean, not the ones caused by lack of completeness checks or good library support, but the ones caused by too much power.)
Now that you say it, it's true I didn't have problems with too much power.
I never had an untrusted party editing my config, nor did I use data from any.
Also, you can make the same mistakes in the settings file as in any other code file, but they're no more or less important there.
In fact, all the problems I had could have been solved by better integration: solving the import problem, making composition easy, adding checks, allowing data to be loaded from several sources and merged, presenting them in a unified interface.
If I'm being honest, the problem with settings.py may not have been that it's Python, but that it's a flat file with no strong conventions, tooling, or best practices.
I could raise the issue that you can't read the config from another language, but I never had to, and good tooling would allow a synced export or an API to consume the settings.
After years of working with cfengine and then Ansible, I finally went to a bespoke BSD-ports workalike with optional client/server and JSON configuration components. Never looked back.
RCS-stored, directory-based modules with tasks in subdirectories. Make- or shell-script-style module execution as part of each task dir, plus variable files containing settings for the install task. JSON configuration files that define all necessary module params (e.g. log, task selection, stop on error, initialization, build command per task, etc.), and remote scheduling of module/task execution via a per-agent SysV IPC command queue serviced by a JSON-RPC microservice, which allows both serialized and non-blocking task scheduling by queue priority.
I owned the majority of the configuration system and ecosystem for Borg, Google's internal cluster management and application platform.
Unfortunately, what's described here is good on many levels, but not excellent at any.
If you are OK with describing the complexity of your infrastructure in something close to a general purpose programming language, then a well-abstracted API built on the cloud providers' original APIs is more familiar to devs, and it will be more reliable, more performant, and more flexible.
If you want a config experience, something like kustomize is leaner and more compatible with the text config model.
I also cannot see how this interoperates with other tools, which will seriously limit its appeal to people using other tools.
The problem with code as configuration is that the config file is nondeterministic and it takes longer to extract information from the file.
This has long been a problem in the Python/pip community, as it's basically impossible for the build tools to determine the dependencies of a package without fully downloading and running the setup.py file.
Unless you call rand(), your code should be deterministic. You're right about needing to run the thing to get the data (that's the point), but there is a middle ground between pure literals and fully side-effecting code. For example, you could impose pure functions (no side effects).
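A minimal sketch of that middle ground, assuming nothing beyond plain TypeScript (the Env shape, names, and URL are invented):

```
// Configuration as a pure function of explicit inputs:
// no I/O, no clock, no randomness inside -- same input, same output.
interface Env {
    region: string;
    replicas: number;
}

function makeConfig(env: Env) {
    return {
        serviceName: "api",
        region: env.region,
        replicas: env.replicas,
        // Derived values are fine; the result is still deterministic in `env`.
        healthCheckUrl: `https://api.${env.region}.example.com/healthz`,
    };
}

const prod = makeConfig({ region: "us-east-1", replicas: 3 });
```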
That's what Haskell already does. Dhall is optimizing along different dimensions (making sure the script's execution ends, making scripts verifiable statically, making it convenient to merge files, making it convenient to centralize your configuration).
As a happy pulumi user, I have to say I am very impressed with the experience. An order of magnitude improvement on maintainability over our old terraform code base. Highly recommended.
This is my experience, and it's clearly biased from maybe one bad example, but... SCons is an example of code over configuration, and from what I could tell, I never met someone who truly understood it. Because it was code over configuration, every programmer added their own interpretation of what was supposed to happen, no programmer truly understood what was really going on, and it turned into one giant mess of trying to understand different programmers' hacks and code to get the build to work. I'm sure some SCons expert will tell me how I'm full of crap, but I'm just saying, that's my experience.
So, what's my point? My point is that configuration languages help in that they push "the one true way" and help to enforce it. Sure, there are times you end up having to work around the one true way, but giving programmers the very powerful tools of a full language for configuration leads to chaos -- or at least that's my experience. Instead of being able to glance at the configuration and understand what's happening because it follows the one true way, you instead end up with a configuration language per programmer, since every programmer will code stuff up a different way.
For what it's worth -- I've been using Pulumi on a couple of different projects and, today, I couldn't imagine starting a cloud-based project on anything else. The Pulumi team has spent more time than almost anybody I know on understanding how to attack these problems; I guess I have a bit of an understanding of just how much work that is, as I've tried to do the same thing and their solution is better.
I appreciate that their revenue model doesn't require making the open-source version frustrating or stupid and I appreciate that they're incredibly responsive. And some of the stuff you'll see around cloud functions/Lambdas and the deployment thereof will fucking blow your mind.
I have been using ksonnet but that is now officially dead. Working with jsonnet seemed unnecessarily painful when coming from coding typescript. This information is quite timely and welcome, I'll look further at the ts example.
We have ksonnet expats on the team (we're all in cloud city -- Seattle), and I've been keeping an eye on that project myself, since I think it got a lot of things right and frankly many of the ideas for Pulumi were inspired by early chats with the Heptio team. But, as you say, why create a new language when an existing one will do -- that was our original stance and it's working great in practice.
Oh! I don’t know where I got that impression from then! perhaps I just thought that we couldn’t use the free tier because of the number of licenses we’d need, but you’re right, it’s still there!
Build files (e.g. makefiles and their various descendants like SCons, rake, etc.) seem to be in the same general boat, except very early on mixing "real languages" (or at least shell scripting) was obviously allowed, so they've always leaned far more towards the "yes, it is a general purpose language" end of the spectrum.
> My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years. INI files, XML, JSON, and YAML aren't sufficiently expressive -- lacking for loops, conditionals, variable references, and any sort of abstraction -- so, of course, we add templates to them. But as the author (IMHO rightfully) points out, we just end up with a funky, poor approximation of a language.
This is why I prefer to use a JS file for configuration instead of a native JSON or YAML file when those options are available.
I still don't know how to get it to do exactly what I want. There is far too much magic involved, and experience has long demonstrated that magic is bad (Webpack confirms that belief).
That being said, the concept of defining a function in, essentially, a config file seems like a step in the right direction. I don't think I'd trust that functionality outside of builds or infra-as-code, though.
What's magic about webpack? The online documentation provides quite a lot of insight into how it all fits together.
It probably only seems like magic because you didn't build a fundamental understanding of how it works before using it. I use some massive webpack configurations and I understand them all quite thoroughly thanks to well-written, modularized configuration files.
Javascript is a scripting language without native module support. That isn't Webpack's fault.
Webpack also handles much, much more than just Javascript. It handles CSS, HTML, images, files, pretty much any kind of asset. Java/Scala doesn't have anything like that. Asset management is completely different due to the nature of how assets are transferred to the client.
And Android? Give me a break. The moment you stray from the strict layout of an Android app you run into a wall and have to learn how Gradle operates. This strict layout is good for some but others hate when an environment forces particular constraints upon them.
Webpack is completely configurable at every stage, works with plugins (which compilers don't do) and again, isn't magic. Not knowing how something works doesn't make it magic. That's not what magic means with respect to code.
Besides... Maybe if you just like getting by, you can program in C/Java/etc without learning about compilers. Web dev is fucked and transpiler knowledge is basically required, but sure you can get by in other domains without it. But if you want to be a good programmer, an expert at what you do, someone who lives and breathes and understands computer science, someone who will excel in his career and not remain a code monkey forever... You have to learn about how your compilers work just like you should know how the silicon in your computer is doing its own "magic".
It was very successful. Complicated projects require complicated build config. Parcel does fine for simple projects, but lacks the raw power & configurability of webpack.
Webpack now does simple config as well with the 'mode: "production"' and 'mode: "development"' presets.
Having dealt with puppet, cloudformation, ansible and other solutions that have gone in and out of fashion and also dealing regularly with Kotlin, Java, Javascript, and recently typescript, my view is that configuration files are essentially DSLs.
DSLs ought to be type safe and type checked since getting things wrong means all kinds of trouble. E.g. with cloudformation I've wasted countless hours googling for all sort of arcane weirdness that amazon people managed to come up with in terms of property names and their values. Getting that wrong means having to dig through tons of obscure errors and output. Debugging broken cloudformation templates is a great argument against how that particular system was designed. It basically requires you know everything listed ever in the vastness of its documentation hell and somehow be able to produce thousands of lines of json/yaml without making a single mistake, which is about as likely as it sounds. Don't get me started on puppet. Very pleased to not have that in my life anymore.
On a positive note, Kotlin recently became a supported language for defining Gradle build files. Awesome stuff. Those used to be written in Groovy. The main difference: Kotlin is statically compiled, and tools like IntelliJ can now tell you when your build file is obviously wrong and autocomplete both the standard stuff and any custom things you hooked up. It makes the whole thing much easier to customise, and it removes a whole lot of the uncertainty around the "why doesn't this work" kind of stuff that I regularly experience with Groovy-based Gradle files.
Not that I'm arguing for using Kotlin in place of JSON/YAML. But TypeScript seems like a sane choice. JSON is actually valid JavaScript, which in turn is valid TypeScript. Add some interfaces and boom, you suddenly have type safety. Now using a number instead of a boolean or string is obviously wrong. TypeScript can also do multi-line strings, comments, etc., and it supports embedding expressions in strings. No need to reinvent all of that and template JSON when you could just be writing TypeScript.
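For instance (a hedged sketch -- the interface and its fields are invented):

```
interface ServiceConfig {
    name: string;
    port: number;
    debug: boolean;
}

// The "config file" is just a typed literal now. Writing port: "8080"
// or misspelling a key is a compile-time error instead of a runtime surprise.
const config: ServiceConfig = {
    name: "api",
    port: 8080,
    debug: false,
};

export default config;
```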
I recently moved a YAML-based localization file to TypeScript. It only took a few minutes. This resulted in zero extra verbosity (all the types are inferred), but I gained type safety. Any missing language strings are now errors that VS Code will tell me about, and I can now autocomplete language strings all over the code base, which saves me from having to look them up and copy-paste them around. So: no pain, plenty of gain.
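The move looks roughly like this (a sketch with invented keys; `as const` plus `keyof` gives the inference and autocompletion described):

```
// messages.ts -- the former YAML file, now a typed literal.
export const messages = {
    welcomeBanner: "Welcome back!",
    emptyCart: "Your cart is empty.",
} as const;

export type MessageKey = keyof typeof messages; // "welcomeBanner" | "emptyCart"

export function t(key: MessageKey): string {
    return messages[key];
}

t("welcomeBanner"); // autocompletes; t("welcomBanner") fails to compile
```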
And yes, people are ahead of me and there are actually several projects out there offering typescript support for cloudformation as well.
To go with your general line of thought, see how many JS-based projects are increasingly moving towards a JS file with a default export as a config file.
Definitely. I was a part of C# in the early days, so little else would make me happier than awesome first-class .NET support. This'll be great for Azure folks -- who knows, PowerShell too?
We are actively working on https://github.com/pulumi/pulumi/issues/2430, which will make it easier for our small team to manage multiple languages. Once that lands, I would expect this to be high priority.
> Definitely. I was a part of C# in the early days, so little else would make me happier than awesome first-class .NET support. This'll be great for Azure folks -- who knows, PowerShell too?
PowerShell would be great; it has nice support for building DSLs.
I know I'm in a minority, but I really dislike YAML... I recently did a lot of Ansible and boy, at the beginning, I was just struggling a lot. Syntactic whitespace kills me.
I don't like it in Python either, but for some reason, when I write Python, it's a lot easier. Maybe YAML is just a bit more complex (and Python has better IDE support..?)
Okay, I'm gonna be the asshole in the room, but how hard is it to just use consistent indentation? I can't count how many times I've heard people complain about significant whitespace in languages.
Not only is it not difficult to begin with, but every code editor and IDE will show you where there's a syntax error in your YAML. People are free to dislike YAML, even for its significant whitespace, but how does it "kill you"?
Look at this example from the article:
```
something: nothing
 hello: goodbye
```
This is pure sloppiness, and anyone who gets into trouble by carelessly adding pointless bytes to code, no matter the language, is being sloppy. I don't understand why people criticize YAML and Python because "whitespace is hard".
P.S.: There's a configuration language called ArchieML, which is similar to YAML but doesn't have significant whitespace.
Three big things that annoy me even though I'm happily writing Python:
- "cut and paste and edit" is broken. You can't autoformat the pasted code into the right place, you have to go back and fix the whitespace. Since whitespace is semantically significant, this can introduce bugs.
- visually identical whitespace may not be textually identical whitespace. Unless you go around breaking the tab key off your colleagues' keyboards, you'll trip over this. Especially (again) if you paste. Occasionally seen in merges too.
- editors can no longer give you 100% correct indentation.
> - "cut and paste and edit" is broken. You can't autoformat the pasted code into the right place, you have to go back and fix the whitespace. Since whitespace is semantically significant, this can introduce bugs.
Depends on how your editor is configured / its feature set. Which makes me wonder how editorconfig would handle this when enabled. It seems like an insignificant issue to me; you can auto-PEP8 the code before pasting it. You should probably be following PEP8 anyway (as far as spacing is concerned, at least).
> - visually identical whitespace may not be textually identical whitespace. Unless you go around breaking the tab key off your colleagues' keyboards, you'll trip over this. Especially (again) if you paste. Occasionally seen in merges too.
I turn on show-all-whitespace in my editors regardless of programming language. I've been burned by Sublime Text not figuring out a file's already-defined whitespace conventions from what the file is using, and just shoving in its own defaults. I wish all editors would base whitespace on what the file's structure looks like; if there are mixed tabs and spaces, give me a warning.
> - editors can no longer give you 100% correct indentation.
I don't understand this, it sounds like you've got your editor configured poorly or something? But it goes back to how unintuitive the nice editors can be. You can use editorconfig to define the indentation project wide, then any editor should pick it up, of course if you define PEP8 at a minimum it guarantees spacing settings.
I'm not sure if PyCharm covers a few of those cases, since I use it so seamlessly I don't usually have complaints.
I’m on the opposite end. I just had to export a JSON based AWS CodePipeline configuration and had a hell of a time trying to edit it and paste things in the right place.
I ended up converting it to yaml, making the edits and converting it back to JSON.
Before anyone asks the obvious -- how do I handle deeply nested code in brackets? Simple: I don't. When things start getting nested deeply, I use my IDE to Extract Method.
In the YAML case: It's hard if you don't have editor support and good diagnostics. Not because you're unusually sloppy, but because you make human mistakes and because you don't know the syntax. (YAML syntax is surprisingly complex and poorly documented in the pedagogic sense). Also, the edit-debug cycle is slow with Ansible or YAML-using CI systems, so this is doubly painful.
In the Python case it's much better, because people less often casually edit .py files without editor support, and because Python has good diagnostics and it's much much harder to produce syntactically correct but semantically wrong Python by whitespace mixups.
Everything is hard when you don't have editor support and good diagnostics. Don't blame YAML because you prefer to use Notepad.exe
However, missing/extra whitespace is not "hard". You would be docked points in an English paper and you should be docked points as a programmer.
So, whitespace aside... Tell me what is easier to edit without built-in syntax support: JSON, or YAML?
If we define "easy" as "how long it takes to complete a task" or "how quickly you can grok the structure of a given block of code", then YAML beats out JSON every time.
I see you restate your argument for clarity, let me try the same :)
1) YAML is a configuration file format, and it's targeting user groups and environments where people use ad hoc terminal-based or OS-bundled editors -- such tools being nano or Notepad, and such users being sysadmins, for example. 2) YAML implementations (= parsers) have poor diagnostics compared to Python, separate from the editor issue. And 3) YAML syntax is more prone than Python to parsing correctly but producing unwanted semantics when you make a mistake.
I think there is value in your English paper analogy: many/most people editing YAML files don't know YAML syntax very well compared to this scenario. If their knowledge of English was at the same level, misplaced whitespace would not be chief of their problems in a graded English paper.
It is of course a structurally valid (philosophically consistent) argument that people should not make mistakes and they should suffer when they do, but this goes generally against the consensus of configuration language usability thinking.
In my opinion no one should be using Notepad for programming work or configurations that are more than 1-2 dozen lines. Nano is about the same: It's a text editor with no inherent tooling for configuration files and syntax support.
A construction worker can't complain that nails are hard to use because they showed up to work with a baseball bat. Or that they're designed badly because they brought a soft aluminum hammer with a tiny head instead of one made with a stronger metal and large impact surface. Tooling is important. Vim and several graphical editors have syntax support. Notepad++ if you're on Windows.
> YAML syntax is more prone than Python to parsing correctly but producing unwanted semantics when you make a mistake.
If you made a mistake, you made a mistake. Why do you expect a program with a mistake to work correctly? Use tooling which prevents you from making mistakes. And the particulars of YAML semantics are orthogonal to how your editor handles it. Yes = True, No = False, etc., for better or worse, but that's got nothing to do with your editor.
> many/most people editing YAML files don't know YAML syntax very well compared to this scenario. If their knowledge of English was at the same level, misplaced whitespace would not be chief of their problems in a graded English paper.
I wholeheartedly agree. So if a programmer complained to me that they were having issues related to inconsistency with whitespace, I would be suspect of their general programming abilities and would start reading their code to determine if the problem lies deeper than just getting an extra space here or there: Incorrect tooling, linting, sloppiness, inattention to detail... All of these things get in the way of well-written software.
As for whitespace in general, and the fact that it's harder for linters to determine and highlight if a block is correctly scoped without enclosures... Python and others have this same issue.
> It is of course a structurally valid (philosophically consistent) argument that people should not make mistakes and they should suffer when they do, but this goes generally against the consensus of configuration language usability thinking.
True, and I agree. Everyone makes mistakes even with things as simple as rote data entry. This is why tooling is incredibly important.
Tightrope-walking at great heights is incredibly dangerous. Practitioners accept this danger. They typically wear harnesses to mitigate the danger of falling. Of course, some people like to live on the edge and set records involving no harnesses. If someone like Dean Potter fell while walking a tight-rope freeform with no harness and plunged to their death, their last thought wouldn't be, "Shit, I knew that tightrope was poorly designed and dangerous," it will be "Shit, I wish I'd been wearing a harness."
We can't remove our harnesses and then complain that mistakes are too frequent and costly.
Editing JSON is OK without specific format support; it just looks like any other C-like language. Editing YAML is basically impossible without specific support; your editor will almost certainly break any file you open and destroy relevant information in the process.
Mine do no such thing. The only whitespace that gets stripped in /any/ editor I have is trailing whitespace and extra whitespace before the EOF, and that's only in certain IDEs where I have consciously enabled these options. They are disabled by default.
Removing trailing whitespace should never change the logic of a file in general, but as for YAML it certainly doesn't. And editors should never remove leading whitespace... who does that?
Press tab on a line in emacs, and the whitespace will get rearranged. It's more explicit in vi, but don't bother (un)indenting blocks there either.
Just writing characters anywhere in a file in the MS IDEs I've tried is enough to rearrange the line's whitespace, while the JetBrains ones I've tried are more conservative and won't break lines you haven't changed somehow.
Ok, now show me a single editor that doesn't make whitespace changes when you press tab.
I've only ever had issues with vim messing up whitespace on the line I'm typing specifically with regard to YAML, and yes, that's an issue, but it has nothing to do with YAML. For example, adding another colon to a string, wrapped in whitespace or not, will often reduce indentation. That's just plain bad behavior, but it's not intended behavior.
How hard is it to use HN formatting? I can’t count how many times people screw it up.
It’s not difficult to begin with, the documentation is free, yet here I am reading your comment with broken formatting.
something: nothing
hello: goodbye
Anyone who has trouble with this is just being sloppy. No useless backticks! You might think you’re doing it right, but unless you check, maybe you’re not.
In fairness, that documentation makes the process out to be far more complicated than it actually is. Plus, their point about errors being difficult to debug can be equally true of other data formats (e.g. some JSON parsers can throw really unhelpful errors if you accidentally include a comma at the end of a list).
Please consider simply believing and trusting those who tell you they hate significant whitespace and that it is a real impediment to work.
Another take, perhaps: Assigning deep semantic significance to invisible symbols is simply stupid. It is stupid to a much greater degree than wanting to be free from having to care about the amount of invisible symbols is “sloppy”.
YAML is a generic format that leaves the effects of formatting up to you. Ansible puts rules on top of it, which makes indentation not always trivial: it is easy to have a dangling key-value pair that doesn't cause an error but only takes effect with the right indentation.
YAML is a bit bonkers in that it's a superset of JSON (all valid JSON is valid YAML), so if you don't like the whitespace sensitivity, you can write your YAML like this:
{
  a: 42,
  # But you can have comments!
  b: "hello world",
  c: "and
      multi-line
      strings!", # and trailing commas!
}
I wrote a pared down version of YAML because while I like the basic structure I hated the complicated bullshit like the "we also parse JSON" layered on top:
I've always liked YAML, it's always seemed pretty intuitive to me coming from Python, and I like human-readable resource files, but those are some pretty damning counterexamples.
JSON.NET has an insane default "smart deserialization" mode which checks if string values are valid ISO dates, and if so, deserializes them to DateTime. The result is that your typical unsuspecting app works fine for a long time, until the user just happens to throw data at it that has a date-like string in it somewhere - and so the app code gets a DateTime instance where it expected a string.
And depending on how exactly it was accessed, this can go two ways. The best case is that the app just gets the value via the untyped API, casts it to string, and blows up with an invalid cast - best because you actually know what went wrong.
The worst case is when the app specifically tells JSON.NET that it wants a string value (via generic type parameters), at which point it will helpfully implicitly convert the actual date value back to a string... except it can reformat it, and even helpfully adjust it from one timezone to another. Semantically it's the same date, of course, but it's not at all the same string, and sometimes that matters a lot. So this is the worst case because it's just silent data corruption.
For some mysterious reason, the author believes that this is acceptable default behavior -- i.e. "it's a feature, not a bug." It's especially ironic to look at all the mentions in the GitHub ticket, as various projects that rely on the library run into this issue (one of them is mine):
I find YAML to be almost unusable. IMO it's just not intuitive. If I get to choose a format for my config files I would only use TOML, it's just better (again IMO).
All these comments amuse me, because I feel the opposite. YAML has always made immediate intuitive sense to me. Meanwhile TOML feels like a terrible hack.
Also, I'm guessing I'm in a tiny minority who loves YAML but hates Python's semantic indentation...
I like YAML but it has some minor quirks and it feels overused in domains in which it simply doesn't make sense to use YAML. I can think of ansible or complex dynamic configurations that depend on external values as mentioned in the article. If simple merging of a base file + dev, staging, prod files isn't enough for the task at hand then YAML is a bad fit.
I'm the same way. I think it's a difference in my expectations of a programming language versus a hierarchical data storage format. I'm fine with (and even prefer) enforced whitespace in data formats. That makes it easier to view and edit.
In programming languages, it makes me twitch. I don't have any problem with "accepted" formatting styles (i.e. linux kernel c style), but for the language itself to enforce that for some reason feels like it's adding perpetual cognitive overhead (like whenever I use python). I don't know why; it shouldn't be any different than using a particular formatting style voluntarily in a more flexible language, but somehow, it feels different.
Agreed. From a readability perspective, I started out with INI, which I ditched partially due to it having no standard format; skipped JSON because it can't have comments; skipped YAML for not looking intuitive enough; considered JSON5 but skipped it for not being popular enough; and landed on TOML.
Other than that, no major complaints. My editor understands YAML and shows the indentation level in the background (highlight-indentation-mode) and auto-formats files so they all have consistent indentation (prettier-mode). As a result, it is not much of a nightmare to edit, despite the fact that semantic whitespace COULD cause you a lot of problems.
Yeah, that's a little crazy. It's the classic case of in-band signalling. It never works. I wish quotes around strings were mandatory, then having 83 ways to say "true" would be OK. But when strings randomly get upgraded to other primitive types... it's a little weird.
I like looking at YAML when it doesn't use any of the insane YAML features. Even so I'm not convinced it should be used (at least not as widely as it is) for one big reason: it can be truncated almost anywhere and still be valid. This causes way more issues than you might think. JSON has no such issue - the only case I can think of where you can truncate JSON into valid JSON is if your JSON is just a number.
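To make the hazard concrete, a small sketch using the js-yaml parser (assumed available); the document is invented:

```
import * as yaml from "js-yaml";

const full = "servers:\n  - host: a\n  - host: b\n";
const truncated = full.slice(0, 21); // cut off right after the first entry

// Both parse without error; the truncated document silently loses a server.
console.log(yaml.load(full));      // { servers: [ { host: 'a' }, { host: 'b' } ] }
console.log(yaml.load(truncated)); // { servers: [ { host: 'a' } ] }
```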
Tools keep using it even where it's the wrong tool for the job, so someone must like it, surely?
I'm sure it will be just like XML where it's the trendy thing for a while in the early days, then everyone stops and hates it for a while. Except XML at least has a handful of applications where it's the right tool for the job (it has a nice streaming mode), YAML doesn't even have that.
heh, I didn't even consider this interpretation of the above; sorry about that. I meant "tools like Docker keep using it", not people. But I still don't know what you're talking about; YAML has an 83-page spec that includes pointers, and uses tons of random confusing symbols. I say "maybe it's just me" in some of these posts, but I know it's not: I've watched many of my coworkers get it wrong the first time, for years, and then have to be corrected. A quick common example I see in CI config all the time: if I write version: 1.10, that's a number; then I decide to move to latest, so I write version: 1.10.x -- that's a string. Oops: we were never using version "1.10", we were using "1.1". Everything about it is implicit and bad. Now, it's easy to say "always use quoted strings", and I agree, but then why the hell does it have bare strings in the first place? That seems like an easy enough oversight or typo to make, and it will be made.
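That exact trap, sketched with js-yaml (assumed available); the expected results are in the comments:

```
import * as yaml from "js-yaml";

// Implicit typing decides what you meant, silently:
console.log(yaml.load("version: 1.10"));    // { version: 1.1 }      -- a float
console.log(yaml.load("version: 1.10.x"));  // { version: '1.10.x' } -- a string
console.log(yaml.load('version: "1.10"'));  // { version: '1.10' }   -- quoted, safe
```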
I generally dislike languages like YAML or Python where whitespace matters and can break your code. However, YAML is way more easily human readable than JSON, so I started to appreciate it for readability purposes.
I guess YMMV, but after you've used both YAML and JSON for a while, you might appreciate YAML a little bit more.
Yep, so true. Any decent config file that will be seen/edited by a human needs comment support. And any file that will not be seen by a human could be json (or whatever).
Yeah this is a huge selling point of YAML. JSON should have comments added to the spec. The other benefit to YAML is human readability, which is usually better in YAML compared to JSON. A specific glaring example of this is when there are long string-literal snippets inside the document, in YAML this is massively more readable than in JSON.
JSON shouldn't have comments added to the spec, because people shouldn't be trying to read or write JSON. It's an application interchange language, meant to be written and read by machines. YAML is a markup language, meant to be written and read by people. Ever notice how most YAML libraries don't even have a "dump" function?
>It's an application interchange language, meant to be written and read by machines.
Not entirely true: JSON is based on JavaScript objects; it was meant to be written and read by humans, just like JavaScript, INI, or any other basic serialized data format or text-based programming language. If JSON were truly never meant to be viewed or edited by human beings, it would have been published as bytecode.
You can just use JSON with comments though. If you have sufficient control over the technology in question to be able to completely change it to a YAML parser, surely you can change it to be a JSON+Comment parser too. See: VS Code's config files.
My point is that if you're in sufficient control of the stack to be able to convert the whole thing over to YAML, you could just as easily convert the whole thing over to JSON+Comments. And of course bad things would happen if you treat JSON+Comments as JSON, but similar bad things would happen if you treat YAML as JSON, so I don't see your point. It's not like people are trying to send their tsconfig's on the wire as "application/json" and expecting arbitrary parsers to support it.
When you think about it, JSON lacks so much (check JSON5 for what it's lacking) that it's hard to believe even comments are not allowed when pretty much everything else allows them -- which is a showstopper by itself.
> YAML is way more easily human readable than JSON, so I started to appreciate it for readability purposes.
> I guess YMMV, but after you've used both YAML and JSON for a while, you might appreciate YAML a little bit more.
I've used JSON a lot, and XML and s-expressions and MessagePack and ini and YAML and a whole bunch of other formats.
I usually have to fire up Google to read YAML. YAML is the only one where I routinely have to Google for a syntax cheatsheet and wade through tables of redundancy and edge-cases.
YAML made sense before JSON became a thing. Why people persist with it in new projects is baffling to me.
A raw YAML file is readable with your eyes. A JSON file needs to be prettified before you go through it. JSON is good for APIs, but it is not made for readability.
We must disagree with what "readable" means then. I find JSON readable (as long as it's nicely layed out, e.g. by piping through 'jq "."'), in the sense that I can skim over the structure looking for [/]/{/}/". If I want to read some of the content, like a string, I just need to read '\"' as '"' and '\\' as '\', which is a small constant cost per (usually rare) occurrence.
With YAML it's difficult to even know the structure of what I'm looking at, due to anchors and extensions. It's also hard to discern structure from skimming, since strings can appear unquoted, and may contain unescaped lexical tokens (depending on which particular symbols it started with); hence we must carefully consider each and every character, rather than just skimming for the next token.
If I know I'm looking at a perfect YAML file, then I should be able to guess the gist of what it says, since I can make assumptions about what the syntax means. If I want to be sure, I'd be Googling for cheatsheets. Yet as a programmer, I mostly look at files when they're buggy, meaning I can't just assume that, say, an unescaped quotation mark won't terminate the string; or that a certain piece of text is allowed to run across multiple lines; or that the indentation corresponds to the nesting; etc.
I use Notepad++ for YAML. Besides coloring what it thinks I'm thinking, it displays vertical lines corresponding to the indentation levels.
(I prefer INI/TOML whenever I can help it; hierarchies in TOML are so counterintuitive that they incentivize a simple flat structure. But then, some things are irremediably hierarchical.)
Writing YAML is easily the part I hate most about writing/deploying software. It's unstructured, feedback cycles tend to be slow (e.g. when deploying k8s configs into prod), and you can't possibly write something useful without documentation pulled up. It's definitely easier to implement than purpose built DSLs, but it's not a good experience.
Python has better IDE support and, maybe more importantly, Python does not completely disallow any indentation style, so it handles unsupported IDEs better. But it's essentially an IDE-support problem.
My life improved a lot once I got a YAML mode for Emacs. Now things would be just perfect if Haskell's cabal migrated to Dhall...
> In this case, lots of discussions show everyone is busy, has no time, and also ... increasingly they have interest in low-code/no-code type solutions. This is not open source as whole, just the IT ops vertical
Looks like he doesn’t believe the code approach is viable as much as other people are claiming in this thread.
I... No. The OpsMop Twitter has a tweet from January 31, so it seems like if the project had died it would have to be really recent. That would be sad.
YAML is useless because it replaces JSON (a tree structure that is just verbose enough not to be confusing) with something worse (a tree structure that is just enough less verbose than JSON to be slightly confusing).
Came here to say something similar. In particular, Dhall does allow scripting (functions etc.) but is non-Turing-complete as a feature. This seems like a particular sweet spot to me, as it allows for more dynamism than data formats like JSON/YAML while constraining the scope sensibly.
It also has very nice bindings for Haskell and Nix.
It allows each line to be completely independent of its neighbors; you can comment out and/or add lines without needing to touch neighboring lines. Also, it makes it visually easy to spot missing commas. Give it a try sometime; it's actually quite nice.
Having prefixed commas is a rather common style in the Haskell community, because it ends up nicely matching open/closing brackets/braces and lining things up.
Since the author of Dhall comes from the Haskell community, he's kept this style.
Not sure what you mean by "mismatched brace styles". The convention of putting separators like commas at the start of the following line rather than the end of the preceding line is common in Haskell, which Dhall is built with.
The advantages are:
- All of the separators are in the same column, along with the opening and closing characters. This makes it trivial to check if we've missed a separator.
- Appending new lines to the end will not affect previous lines (i.e. we don't need to go and add a comma). This avoids making mistakes and polluting diffs.
Unfortunately the error-prone diff pollution we avoid at the last line instead occurs at the first line. It's still less error-prone than trailing commas, since we can look in the separator column and either spot that it's empty, or that it contains two opening braces (depending on whether we inserted or copy/pasted).
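For what it's worth, the same style transposes directly into TypeScript, where it's equally legal (the object is invented for illustration):

```
const settings =
    { host: "localhost"
    , port: 5432
    , name: "mydb"
    // , poolSize: 10  <- any non-first line comments out cleanly
    };
```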
It's a haskell thing. The main advantage is that each line is independent. You can comment out a line or add a line at the end without modifying anything else.
Each line is not independent: you cannot comment out the first line. A better approach is to allow trailing commas. (I suppose you could allow leading commas, does Haskell support this?)
Allowing trailing commas, like Python does, would be really great. Unfortunately trailing commas already mean something: (a,b,) is a function that still takes 1 argument to make a triple. It's called "TupleSections".
For the sake of this comment, let's define "templating" to be attempts to solve the problem "I need $FORMAT due to an existing constraint, but $FORMAT does not entirely meet my needs on its own" (in this article, $FORMAT is YAML). Additionally let's say that in order to be a "template" something must be a text file (e.g. exporting a database table as $FORMAT does not count as "templating" for the purposes of this comment).
I think there are three very different kinds of tools that people use for this:
1. Interpolation/preprocessor languages: This is what the author is talking about. There are delimiters/tags/sigils to distinguish "the templated parts" from "the rest" and the primary operation done by the template engine is substitution. "The rest" is literal content that's already in $FORMAT and it remains mostly/entirely unchanged during template rendering. Languages of this type are basically glorified sed. This can be nice because they're agnostic as to their embedding (any string will do) so they're very portable/flexible (you don't have to create "handlebars for YAML", "handlebars for HTML", "handlebars for CSV", etc; one implementation does it all). Languages of this kind can work in the small but don't scale well for all the reasons mentioned in the article/comments. The language doesn't know anything about the semantics of $FORMAT and that can cause all kinds of pain. Examples include golang templates, PHP, ERB, handlebars, the C preprocessor, Jinja, etc.
2. Compilers/code generators: These are "complete" languages that compile to $FORMAT. The difference between these and interpolation/preprocessor languages is that the entire input is the language, not just specific chunks/tags. This kind of language can be nice because you have complete control and can therefore guarantee valid output and do tricks like supporting multiple different output formats for the same input, but the downside is that you're working with an entirely new language, so there's a learning curve, you need specialized syntax highlighters and other tools to work with templates, etc. Examples include HAML, Jsonnet, Dhall, etc.
3. Embedded DSLs: Templates of this kind are valid $FORMAT from the beginning, but have embedded ways to specify transformations to be applied to the parsed AST. These languages are homoiconic with respect to $FORMAT. First $FORMAT is parsed, then the template engine iterates through the AST to perform evaluations, then the result can either be used as-is in memory or serialized back to (a possibly different) $FORMAT. This is sort of like an interpolation/preprocessor language with the evaluation order swapped: preprocessing is "run the template engine, then parse $FORMAT" while this is "parse $FORMAT, then run the template engine". A downside of this approach is that it is less general, e.g. it only really makes sense when $FORMAT has a well-defined structure (you probably can't template plain english sentences with this approach), but these days most "data languages" have converged towards being semantically equivalent to JSON (lists, dictionaries, and primitives) and this approach works well for any of them. An upside is that like compilers/code generators you can guarantee that the output will be valid $FORMAT no matter what the template looks like. Examples include JSON-e, Lisp macros, CloudFormation templates, etc.
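To make the order-swap concrete, here is a minimal sketch in Python, using string.Template and PyYAML as stand-ins for real engines of kinds 1 and 3:

    import string
    import yaml

    # Kind 1 (interpolation): render the text first, parse $FORMAT second.
    rendered = string.Template("replicas: $n").substitute(n=3)
    print(yaml.safe_load(rendered))   # {'replicas': 3}

    # Kind 3 (embedded DSL): parse $FORMAT first, then transform the tree.
    tree = yaml.safe_load("replicas: 1")
    tree["replicas"] *= 3             # operate on structure, never on raw text
    print(tree)                       # {'replicas': 3}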
It's unfortunate that all of these get called "templating languages" because they're very different beasts from one another, and usually when I see conversations about this stuff these distinctions get blurred and you end up with apples-vs-oranges comparisons. If I had my druthers we'd reserve the word "templating" for the first one and use different terminology for the others, but that ship has sailed.
If you have been around long enough, you still remember the world that was excited about XML and templating it with XSLT. In hindsight, it was a horrible world.
Even though YAML is not optimal, it is a human-friendly compromise between too-verbose XML and machine-only JSON. It lacks native templating, leading to funny constructs, e.g. in Ansible files. However, humankind has made progress and will keep making progress, so it is just a matter of time until someone comes up with a sane "native templated YAML" and all projects adopt it.
> If you have been around long enough, you still remember the world that was excited about XML and templating it with XSLT. In hindsight, it was a horrible world.
I actually really like the idea behind XSLT: machine-friendly, human-tolerable, structured data + declarative rules for turning that data into a display, or a report, or whatever else.
The execution was horrible though: incredibly verbose, lots of overcomplication due to XML weirdness/asymmetries (e.g. attributes vs elements vs text, namespaces, ...); mixtures of different languages hidden inside each other (e.g. XPath hidden in attributes); etc.
I would really like to see what this could look like if done in a more minimalist, lispy fashion (normal code-is-data stuff in Lisp is similar, but I think term-rewriting is a more appropriate evaluation mechanism for such rules)
The syntactic mistake of XSLT was writing it in XML, XPath was a redeeming feature. Imagine if XPath was also written in XML...
jq occupies the same role as XSLT, but for JSON. It can be used for templating but it's not quite as declarative as XSLT (you must pipe things through).
> The syntactic mistake of XSLT was writing it in XML, XPath was a redeeming feature. Imagine if XPath was also written in XML...
Yes, I didn't mean to imply that XPath itself is bad (although it also has to handle XML quirks like element/attribute/text, etc.).
Rather, my point was that the reason to write XSLT as XML in the first place is that it's machine-readable, we can mix and match elements from different vocabularies, etc., yet most of the heavy lifting ends up as opaque string attributes :(
PS: I've done a few projects which make heavy use of jq; it's really nice, but as you say it's more of a pipeline.
XQuery was halfway between XSLT and XPath in expressiveness - functions, loops, queries with joins etc, but no pattern matching. If it only had the latter, it'd be perfect.
It is more concise. Similarly, I was a fan of using attributes instead of text elements (with their unnecessary closing tags), but eventually was won over by neatness, e.g. after translating an example from
https://www.gnu.org/software/guile/manual/html_node/SXML.htm...
XSLT was one in a litany of domain-specific languages (ant, Apache rewrite rules, LaTeX macros, etc.) that evolved towards Turing completeness because that's what the problem space demanded.
In most if not all of these cases, an existing and well-designed Turing-complete programming language would likely have served them better.
I don't see a reason that a DSL can't be Turing complete and still a better option than an existing language. If you look at old-school Makefiles, they are little more than shell script with top-level rules that you can invoke from the command line. You could theoretically just use shell script, but the make rules still simplify the task quite a bit.
Less power means less to go wrong. Automatic checks can also be deeper in a simpler system.
Your point is valid though, the power seems to end up being needed, sometimes, in some parts, in some cases. Escaping to a full language when needed seems to retain the benefits of both worlds.
There was a post on HN a few weeks back to the effect that it's rather easy for Turing completeness to emerge accidentally. I wish I remembered more specifics so I could find it again.
I find XSLT intolerable in practice. Thankfully I've only had to touch it a handful of times. I agree the idea behind it is neat but boy is it a headache.
I kinda miss it, actually. XML had many warts, but at least everybody spoke it, and it was the same everywhere. Occasionally you still had some overlapping but different things, like XSD and RELAX NG schemas (though even there, there was a big difference - one is a language for describing data types, and the other is a language for describing grammars). But it's better than several dialects of JSON, YAML, TOML etc.
I also rather liked thorough extensibility. Namespaces were the right idea, despite clunky syntax. Today you can see Clojure doing something similar in Spec.
And while we're on the subject of XML, XSLT and Clojure: I feel like this is the best solution for readable serialization of tree-like data, and an associated ecosystem of tools (to validate, transform, etc). Note some nice features for humans, like the ability to comment out a specific node, in addition to the usual line-oriented comments.
I saw this title and immediately knew the article would be about Helm. I don't think anyone wants to use Helm. People use it for a set-and-forget thing that they don't care about (who cares that it's called impressive-leopard-kubernetes-dashboard, after all.)
It is actually a little bit too magical for my taste, but I continue to use it because it hasn't done anything stupid. I have one file that maps logical names to images in a container repository. If I create a service called "foo" pointing to selector.app.label="foo" in the base, then in production it's called foo-prd and the label magically updates to foo-prd for the selector. It actually understands what it's generating, and while they might have taken it a little bit too far, it's far better than just dumb text replacement.
I’m in agreement; it seems lots of projects use helm charts for hello world / standard deployment demos, but considerably fewer run Helm charts exclusively in production clusters.
Helm 2.0 introduced the package as a first-class concept for Kubernetes and created the standard for distributing applications. Thanks to Helm, thousands of people could discover and collaborate on cloud-native deployments of the open source software published and managed by organizations and contributors all over the world: https://github.com/helm/charts/tree/master/stable
Helm 3.0 keeps innovating: it adopts the most forward-thinking approach to package management and Kubernetes config management by using a higher-level domain-specific language based on Lua to create an expressive package management system:
I don't really trust Helm to do anything that's actually useful in the long term. It will get something running very quickly, but whether it's maintainable, I'm not yet sure. For example, very early on, I installed the helm chart for prometheus. Now I want it to live in the kube-system namespace because I am tired of seeing its resources in the default namespace. For some reason, I highly doubt that changing values.yaml to change the namespace is going to do anything other than give me a fresh instance of prometheus running in another namespace. It's not going to use the already allocated storage volume to satisfy the persistent volume claim in the new namespace. It's not going to update the other stuff in my cluster to refer to prometheus-pushgateway.kube-system.svc.cluster.local. It's not going to update my Grafana dashboards to refer to the new namespace, even though I installed Grafana with Helm! So what did I really gain? Helm isn't giving me the ability to manage the long-term lifecycle of third-party software. It just explodes some API objects all over my cluster and lets me delete most of them automatically. That's all it does.
I get why Helm is popular. You can get some piece of software running in Kubernetes with minimal effort. I would have never successfully made some random complex piece of software work correctly in Kubernetes on day 1, especially using something that assumes you deeply understand the core API objects like kustomize does. What that boils down to is that Helm doesn't go far enough, and in its current state, just encourages people to make mistakes early.
As others in this thread have said: I ask this question all the time, except s/templating/using/.
YAML is insanely over complicated; it's as bad or worse than XML for config files, and it doesn't even have the nice streaming mode.
Not to mention that it's a bit of a security nightmare (seriously, who put pointers into the YAML spec?).
And, on a more subjective note, YAML is just confusing: between all the significant whitespace and the random single character symbols that no one ever remembers what they do, I never get a YAML document right on the first try.
Templating it really does add a whole new level of headache too.
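For anyone wondering what "pointers" means here: YAML anchors (&) and aliases (*). A quick sketch, assuming PyYAML:

    import yaml

    # The alias *b refers back to the anchored node &b; both keys end up
    # pointing at the very same parsed object.
    doc = yaml.safe_load("base: &b {cpu: 1}\nprod: *b")
    print(doc)                         # {'base': {'cpu': 1}, 'prod': {'cpu': 1}}
    print(doc["base"] is doc["prod"])  # True
    # Nesting aliases of aliases is also how "billion laughs"-style
    # expansion attacks are built against naive loaders.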
XML works very well for config files. It's schema-optional (but the option is there), well-specified, human-readable, has a plethora of supporting technologies (making things like templating easy), and is well supported in every language.
XML is not tedious at all with the right tooling. For example, a tool like Visual Studio IntelliSense proposes only the elements and attributes valid in the current context, automatically closes tags, formats the file, and completes opening tags too, making editing an XML file a breeze.
Notepad is a terrible editor for anything other than simple property files. For example, editing JSON that has any sort of complexity would be just as painful.
Not really. For quick authoring, configs should have pre-authored snippets for common things, that are commented out, and have adjacent descriptive comments ("Uncomment the following to ...") - this is regardless of their syntax.
And for complicated stuff, you're going to spend a lot more time reading the manual than you will actually typing those closing tags. In fact, in most cases you'd be copy/pasting bits from the manual as well.
Codelite's XML-based project files can be easily read and modified by hand. Diffing them yields useful information about files added and moved, config values changed, etc.
Eclipse's project files, also written in XML, are an Eldritch Horror.
I think the failing of XML is also its strength. It doesn't do typing and schemas, doesn't even try. Which means that it can be sane. Or not.
You changed the model when you adapted the xml to lisp. You decided that some tags are unnecessary, dropped some attributes and assumed others are merely different types of child nodes - and now your sample doesn't actually have the same semantic meaning as the XML example. You also removed some comments. Was all this done to emphasize how much cleaner a lisp alternative would be? If we're playing this game, you can actually simplify the XML configuration file as well. If you attempted to capture everything that the XML does, it would make your lisp sample much more ugly.
Anyway, to each his own, but I think XML holds up very well and I do find it more readable and easier to work with than your lisp example.
I also never said XML is the best configuration format. For simple configurations a simple property file is by far the best option. For anything complicated (as in your example) XML does a great job. To contrast, JSON would fall flat on its face with this. Not to mention the fact that XML parsing is typically part of the standard library of most programming languages and most people are familiar with it.
> You changed the model when you adapted the xml to lisp.
That was a conscious decision: the verbosity of XML prevents clear understanding of a data model, while the cleanness of S-expressions gives a clarity of vision that allows prudent judgement when laying out a data structure.
> You also removed some comments.
Yes, because they were akin to:
// Add 1 & 2, assign to X
x = 1 + 2
If you really want an S-expression version of the XML in that example, here is SXML[0]:
(*top*
(fof
(*comment* " Common settings ")
(common
(*comment* " Container configuration ")
(container
(option (@ (name "componentNamespace")) "MyCompany\\MyApplication"))
(*comment* " Dispatcher configuration ")
(dispatcher
(option (@ (name "defaultView")) "items"))
(*comment* " Transparent authentication configuration ")
(authentication
(option (@ (name "totpKey")) "ABCD123456")
(option (@ (name "authenticationMethods"))
"HTTPBasicAuth_TOTP,QueryString_TOTP"))
(*comment* " Model configuration. One tag for each Model. ")
(model (@ (name "orders"))
(*comment* " Model configuration ")
(config
(option (@ (name "tbl")) "#__fakeapp_orders"))
(*comment* " Field aliasing. One tag per aliased field ")
(field (@ (name "enabled")) "published")
(*comment* " Relation setup. One tag per relation ")
(relation (@ (type "hasMany") (name "items")))
(relation
(@ (type "belongsToMany") (name "transactions")
(localKey "foobar_order_id") (foreignKey "foobar_transaction_id")
(pivotLocalKey "foobar_order_id")
(pivotForeignKey "foobar_transaction_id")
(pivotTable "#__foobar_orders_transactions")))
(relation
(@ (type "belongsTo") (name "client")
(foreignModelClass "Users@com_fakeapp")))
(*comment*
" Behaviour setup. Use merge=\"1\" to merge with already defined behaviours. ")
(behaviors (@ (merge "1")) "foo,bar,baz"))
(*comment* " Controller, View and Toolbar setup. One tag per view. ")
(view (@ (name "item"))
(*comment* " Controller task aliasing ")
(taskmap
(task (@ (name "list")) "browse"))
(*comment* " Controller ACL mapping ")
(acl
(task (@ (name "dosomething")))
(task (@ (name "somethingelse")) "core.manage"))
(*comment* " Controller and View options ")
(config
(option (@ (name "autoRouting")) "3"))
(*comment* " Toolbar configuration ")
(toolbar (@ (title "COM_FOOBAR_TOOLBAR_ITEM") (task "edit"))
(button (@ (type "save")))
(button (@ (type "saveclose")))
(button (@ (type "savenew")))
(button (@ (type "cancel"))))))
(*comment* " Component backend options ")
(backend
(*comment* " The same options as Common Settings apply here, too "))
(*comment* " Component frontend options ")
(frontend
(*comment* " The same options as Common Settings apply here, too "))))
Which I think is still indubitably and inarguably clearer & cleaner than the XML version.
Technically, the XML spec requires whitespace preservation, so really it’s this:
But I think that rather proves my point: XML obscures that which should be obvious.
(and apologies for these terribly vertical posts — I think that they go a long way towards demonstrating the need for a compact information representation).
I end up hitting bare-string implicit casting problems constantly. I also end up catching them in code review when coworkers do it constantly, and yet I still end up doing it too. This might be the best example of why YAML is overengineered garbage (that, and the fact that the spec is 83 pages long and has pointers… WTF?).
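If you haven't hit these casts yourself, a quick session with PyYAML (which implements YAML 1.1) shows the usual suspects:

    import yaml

    print(yaml.safe_load("country: NO"))    # {'country': False} -- not the string "NO"
    print(yaml.safe_load("version: 3.10"))  # {'version': 3.1}   -- parsed as a float
    print(yaml.safe_load("when: 12:30"))    # {'when': 750}      -- a sexagesimal integer!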
Unfortunately, this isn't practical. For any of my own tools I will never use YAML, but I don't just use my own tools, and reinventing the wheel just to not use YAML has its own problems which are (in some cases) worse.
That's a nice thought, but it comes with its own problems. For example, XMPP uses a sane subset of XML, which is nice, except that people throw full XML parsers at it (because why wouldn't you? Your XML library parses XML and limiting that is more work for the developer to do) and then end up with vulnerabilities they don't know about, like entity-expansion DoS attacks or system directive stuff (and YAML has lots of tricky behavior that can be abused too, like pointers).
Using a subset creates more work for the developer, so many just won't bother (if they even know that it's using a subset and they have to do more work at all), which leads to issues.
We can agree, but that doesn't mean that others won't do it anyways; unless you operate in a silo, it's not likely that you're the only one writing software to use your system.
Standards get written and implemented by lots of people, and even tooling like Docker gets alternative implementations.
That might solve one problem but fighting your tools and occasionally hating the complex giant messes we've engineered is a fact of life for any programmer. I absolutely love programming, just not always the process.
We get better and better tools each year but it still seems unavoidable. We're ultimately building incredibly complex systems with each layer using multiple development approaches, style choices, language choices, degrees of quality/time investment by the creator, etc.
TLDR: you can't help but bang your head against the wall in any real-world day-to-day programming
Let's agree to disagree here. No human should ever write XML. No human should ever be forced to read it.
YAML is very readable and writable if you stay away from the corners. Templating allows you to stay clear of the corners (the 1 char operators that concatenate stuff, b64 stuff and so on).
File-based configs are a troublesome abstraction: they package unrelated concerns into a rigid document whose form must take a particular, application-dependent shape, and the assembly and disassembly of that document essentially becomes an API where key-value pairs are mixed with complex glue code. The application has to do this internally, but anyone who's generating their configs is also doing parts of this externally.
Templates try to bandage over that by drilling down the abstraction to key-value pairs themselves. And imperative constructs that sneak into templating languages are an artifact of wanting to gain expressiveness without losing the benefits of declarative form -- but really, the two are at odds.
YAML is a red herring -- we had the same headaches with XML a decade prior. The problem is always that there's relationships among the data (or even multiple instances of the config) that we care about, but that the structure of a single config file at rest cannot model.
Databases -- let's say, an SQL one -- are actually among the better solutions, because they allow the universe of config items to live in structured places without overspecifying the exact form the data must take when serialized into a file. Then, data can be normalized where it makes sense to avoid repetition and introduce propagation. An SQL database gives all the tools needed to accomplish this, using mostly declarative code.
Databases in the KV sense are often used for configuration, and SQLite's rise has made richly structured configs, specified at a higher level than is typical with other serialization formats, more common; but the full approach has not caught on outside big enterprise systems and complex applications. Which is a shame, because it's hardly more complex than the current awkward pairing of a full serializer and a templating engine.
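As a sketch of what that can look like, using Python's built-in sqlite3 and hypothetical tables, the repeated value lives in exactly one row and every environment derives from it:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE service (name TEXT PRIMARY KEY, cpu REAL);
        CREATE TABLE deployment (env TEXT, service TEXT REFERENCES service(name));
        INSERT INTO service VALUES ('web', 0.5);
        INSERT INTO deployment VALUES ('dev', 'web'), ('prod', 'web');
    """)
    # Normalization in action: change 'cpu' once, and every environment's
    # rendered config picks it up.
    rows = db.execute("SELECT env, name, cpu FROM deployment "
                      "JOIN service ON deployment.service = service.name")
    for env, name, cpu in rows:
        print(f"{env}: {name} cpu={cpu}")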
I feel this article is missing the bigger problem - one that for some reason just cannot die.
The problem is that of gluing strings together. YAML is not an unstructured text file, it's a tree notation. Whatever "templating" or "generation" mechanism you want to use, it needs to respect the tree nature of the language it operates on. It needs to respect semantics.
Gluing strings together is literally what causes SQL injection to exist. It caused countless defacements on the web, and countless broken websites. I would think we've learned our lesson, but for some reason I see these template languages still alive and kicking.
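A small example of why gluing strings breaks while respecting the tree doesn't, assuming PyYAML:

    import yaml

    value = "web: v2"                # an awkward but perfectly legal string value
    glued = "service: " + value      # gluing strings together...
    # yaml.safe_load(glued) now fails ("mapping values are not allowed here"),
    # the YAML equivalent of an injection bug.
    print(yaml.safe_dump({"service": value}))  # service: 'web: v2' -- quoted for us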
The article goes on to talk about Jsonnet, which takes the exact approach you describe - it generates JSON by aiming to be a "templated JSON" where the templating involves generating semantic objects, not strings.
Here's an example (adapted from some real-world code) where I specify the k8s cpu limit in one place, and then look up that info in several other places to avoid needing to change multiple values later:
Note how I can patch the container.requests object with an alternate memory limit, and how I can calculate an expression for the NUM_THREADS value in order to automatically set it to ceil() of the requested cpu.
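A rough Python rendering of the same idea, with hypothetical values (the original was Jsonnet): the cpu request is specified once and everything else derives from it.

    import math

    cpu = 1.5                                    # hypothetical request, set once
    requests = {"cpu": cpu, "memory": "512Mi"}
    container = {
        "resources": {"requests": requests,
                      "limits": dict(requests, memory="1Gi")},
        "env": {"NUM_THREADS": str(math.ceil(cpu))},  # derived from the cpu request
    }
    # Patching container["resources"]["requests"] with an alternate memory value:
    big = dict(container,
               resources=dict(container["resources"],
                              requests=dict(requests, memory="2Gi")))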
It’s DRY run amok. People don’t want to see the same bit of anything in two places, forgetting that code will be read, so they remove a “redundancy” but create duplicated effort for everyone, every time they have to decipher the thing later.
I also don’t think we have a workable definition of “configuration”. 70% of the config at my work is hard coded service discovery. If we moved the service discovery anywhere else (say, consul, kubernetes, hell - Docker swarm), we’d need far fewer sets of config than we have deployment environments. When there are only two or three you don’t need templating.
How often do you really have the same service deployed twice in prod and legitimately want it to work differently? I can count the scenarios I know on one hand and none of them have occurred for me in almost ten years, except read replicas and that shouldn’t be more than a few lines of config.
There have been times I've wanted to templatize my configuration, but I don't want to do it with text-based templates, rather with templates within the configuration file's own syntax (be it YAML, TOML, or something else). Not sure what this is called; I've been calling it "structural templating".
This looks really nice. I'll have to give this a try at some point to see how I feel about it vs Azure Pipelines. From my quick look, this looks more general purpose at the cost of more verbosity.
Dhall-lang ( https://dhall-lang.org ) is another, somewhat interesting, attempt to solve this problem: it comes with a non-Turing-complete programming language, so you can bring some abstraction to your configuration files without having to worry about things like infinite loops.
JSON requires constant quoting, can't support multiline strings, has no comments, has no/little typing (e.g., no datetime type). It's not good if a human needs to encode data.
For configs, I think TOML beats YAML hands down; I think YAML's spot is at encoding data structures that humans need to read/write.
I do agree that YAML, the spec, is fairly complicated. But YAML, as used in most projects, by most people, is not, and can be picked up fairly quickly. It is easier to visually read as it removes much of the clutter that would exist in the comparable JSON. It isn't typically necessary to know the entirety of the YAML spec to be useful with YAML, and most of the parts you won't know will get introduced by an obvious-looking sigil, which can be used to figure out what you're dealing with.
When I've actually sat with folks struggling with YAML, it's almost always in configuration tools, and it's also always around the templating bits. Ansible, in particular, has a bizarre templating model: it happens after YAML parsing, which is not the mental model most people use when approaching it. I've also found that most of the people I've spoken to intertwine YAML and Ansible's templating functions, thinking they're one and the same.
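You can reproduce the surprise directly with the pieces Ansible is built from (PyYAML + Jinja2):

    import jinja2
    import yaml

    # yaml.safe_load("msg: {{ greeting }}") blows up: YAML reads the braces as a
    # flow mapping long before Jinja ever runs -- hence Ansible's quoting rule.
    task = yaml.safe_load('msg: "{{ greeting }}"')             # 1. YAML parses first
    print(jinja2.Template(task["msg"]).render(greeting="hi"))  # 2. templating after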
I do not think Ansible makes good use of YAML: I would rather write task files in an actual programming language, since they are — at their core — a program. (The tasks do have some metadata attached to them, but the core task itself is a program. A function in some real language can get metadata attached to it in a number of ways, and that would be a better solution.)
If you compare a medium sized, 3 level deep TOML document with an equivalent YAML document you'll find that the TOML document is up to 50% longer.
None of that additional verbosity adds meaning - or readability.
Part of this is the extra syntactic noise, but most of it is because every key name has to be written out explicitly above its value, whereas in YAML you just put the values below an indent.
TOML is tolerable for small, very lightly nested config files but even medium sized TOML configurations get ugly very fast.
It's worth giving Ansible some credit here. Alternatives like Chef can and do contain arbitrary Ruby logic, which can make Chef "cookbooks" horrible to figure out and debug.
So while Ansible may not be great, other solutions also have drawbacks, and it isn't quite as easy as "just do <x>".
I've worked with both. The simple items are nicer in ansible but when there are a lot of custom roles and functions ansible isn't much better than chef from what I've seen in practice.
I really like TOML as a compromise between "looks pretty" and "parses well".
That being said, I find it funny that the author is crying foul, saying that YAML isn't as well suited to being templated as JSON is, when XML had schemas sorted out ages ago.
People ran away from XML, saying it was this verbose, ugly, bandwidth-hungry (back when we were sending XML over AJAX) behemoth (which it totally is)... but I think when your use case is complicated enough to worry about templating, you should take a hard look at XML and ask whether it might suit you better.
Baffles me. I don't like any language or file format where whitespace matters. Even Haskell bothers me in this regard. To me white space shouldn't add cognitive load. I want to look at the symbols not the formatting of the antisymbols to understand what is going on.
Write JSON and use your editor tools to format it with nice indentation, and you are sweet!
That said, YAML makes an excellent format for reading, but not for writing.
I personally love Python where whitespace matters a lot too. But I regularly find myself fighting with the whitespace in yaml for some reason.
To me yaml (especially with templating) seems like a very contrived way of avoiding having to program, while still effectively programming. I much prefer json, or if more intelligence is required, actual javascript objects.
The only big downside of json where yaml does shine is the support for comments within your files.
I have the same experience. I just think it is possible to design easy-to-write indentation-sensitive formats, but YAML is not one. For example, it always baffles me that
a:
- b
- c
has a list inside an object but the list is not further indented. There's in fact a hierarchy relationship but absolutely no indentation.
I think it's because Python has very strict and very simple indentation rules (a nested block must be indented, and the file requires consistent indentation. Is there anything else?)
YAML gives you options, or varies necessity somewhat arbitrarily on the structures, which is (marginally) good for reading, but a lot of headache in writing
I never quite understood this attitude. Not treating whitespace as meaningful means it's necessary to have a longer, syntactically noisier file.
That's fine if you're primarily concerned with computers exchanging data (JSON) but where readability and writability matters, extra syntactic weight is a headache.
I can't imagine writing JSON by hand but I write plenty of YAML by hand every day without issues.
Because deleting a tab could entirely change the meaning of the file, breaking everything, with no compile-time errors. IDE tooling or linting might go some way toward making me more comfortable with YAML.
Ok: I don't like any language or file format where the regex s/\s+/ /g would change the meaning of the program, except within string delimiters, and except for the parts of string delimiters which contain literal code, e.g. in ES6 `Hello ${world}`.
OK, but I'd restrict the set even more to exclude voluntary indentation. I mean, you'd want that caught by style checkers anyway, even if the language didn't require it.
So we're left with the cases where you'd like compact code / one-liners but are forced to use multiple lines for language syntax reasons.
I'd say that that's a pretty small set of cases and long way from the phrase "I don't like whitespace sensitive languages" which you (and many others!) use.
I conceded it can be annoying in Python and I do wish there was an escape route sometimes.
Experienced programmers use decent editors to indent their code anyway, but at least with Python you don't see code whose indentation wanders left and right at random locations.
The best thing that ever happened to CloudFormation was the new YAML syntax.
Writing YAML is easy with a good editor like VSCode. Install the YAML code outline extension for sugar; works great for OpenAPI specs. YAML flow style offers some good options for keeping the file compact.
I'm all for good editor support, but you shouldn't need it to write a simple document. Complicated editors should be supporting tools at best, because you won't always be in a position to use one. The things we write should be simple enough to be written and understood by hand with minimal mistakes (then we can make it even easier in some cases using nice editors and tooling).
I don't think I've ever used anything other than vanilla vim for editing YAML files... with next to zero issues.
The problem for me is that the multiple ways of doing the same thing make it hard to write clear specifications for anything beyond very simple examples / structures.
Writing YAML is not easy for end-users. The indentation--especially in large files--is really difficult to keep track of, and some really basic stuff like when to quote strings is not consistent at all.
Nope, indentation is not a problem with indentation guides. A misplaced comma in a JSON file however kills it, gets me every time I have to write it by hand.
Also, it is true YAML has too many features, though I find they are typically ignored or disabled.
Syntax highlighting doesn't count commas. The problem is JSON doesn't allow trailing commas, so adding or reordering frequently results in a difficult to diagnose error.
The languages we're comparing here are JSON and YAML. The latter does not require quoted strings, except in ambiguous cases. (And even then, actually, it technically isn't required, though it is usually the easiest thing to do.) The absence of these quotes makes the syntax get out of the way of the reader, and makes YAML comparatively easier to read.
> comma requirements inane is an opinion, not an objective fact.
I boiled it down to a simple statement, but disallowing a comma at the end of repeated grammars means that adding something to that line causes unnecessary noise in the diff. For example, say I add a single item to the end of a JSON list. The diff I would love to see is:
[
a,
b,
+ c,
]
which makes it rather straight-forward to the code reviewer / reader of the diff that we're simply adding a single item c. But JSON's grammar forces this diff:
[
a,
- b
+ b,
+ c,
]
which makes it harder to see what the semantic change is, because simple syntactic changes are now clouding the picture. Also, I find that people's mental model of the task ("add item to end of list") causes them to forget that they need to add a comma to the item above it, resulting in syntax errors down the road.
(Better diff tooling can help here, but often I find we have to work with the most primitive of tooling; if the grammar can lend itself towards such simple tooling, s.t. that tooling is more effective despite its simplicity, why not? And here, we can: grammatically, whether the list ends or does not end in a comma is rather meaningless, and JSON (and for a long time, JS) were pretty alone in this opinion. Most other languages allow that trailing comma. The only other one I can think of is SQL.)
> You're not saying anything about the readability of one versus another, you're just listing gripes you have about JSON.
The gripes I have about JSON don't apply to YAML. Hence, that's why I prefer using YAML, which was the original question.
> The latter does not require quoted strings, except in ambiguous cases.
I believe it is the right choice to always quote strings; there should be various quoting formats, as no single one is perfect (normal strings with reasonable escapes, raw strings, multiline-with-indentation, multiline raw strings...), and you should be able to use unquoted keys (TOML strikes a good balance on this problem), but unquoted strings as a default are unnecessary and problematic.
No comments is... fair. That does put a limit on the legibility of JSON files, unless you want to count dumb hacks like comment strings.
Having to quote strings in JSON, however is still simpler than the multitude of ways strings can be declared in YAML. You know a string is a string in JSON because of the quotes... knowing whether something is a string or not in context is more difficult in YAML because of the more complex syntax.
You can learn the entire syntax of JSON in minutes. Objects, arrays, string keys, and a few primitive types... that's it. How long would it take to learn all of YAML? How explicit is its syntax versus JSON? Of course JSON is far simpler, and being simpler, it's easier to read.
I dismissed "inane comma requirements" as an opinion, not everything the commenter said. The only reason it's "inane" is because the commenter doesn't like it personally.
Not having multiline strings, to me, doesn't affect readability much at all, although it is unfortunate. Turn on text wrapping in your editor, it's the same thing.
Typing support doesn't affect readability either. I'd like to have a date type in JSON too but
Not when you take into account all the clutter in JSON: braces, brackets, quotes. Those are extra things to type and extra things to have to filter out when you read.
I think it is partly momentum - at this point most CI systems use YAML in spite of its insanity - even Azure Pipelines!
And partly it's because the syntax for multiline string literals is very minimal, which is kind of a nice feature for CI since you tend to have a lot of them.
It's still insane though. TOML is much more reasonable.
I've taken the strategy of emitting JSON, but accepting YAML input (maybe a restricted subset in cases where it's untrusted data). YAML can function as a super-set of JSON, so you can have comments in hand-written/modified data this way while emitting a simpler to parse data format.
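In Python the whole strategy is a couple of lines, assuming PyYAML:

    import json
    import yaml

    text = "# humans get comments and lighter syntax\nreplicas: 3\n"
    data = yaml.safe_load(text)   # accept YAML (a superset of JSON) on the way in
    print(json.dumps(data))       # emit plain JSON on the way out: {"replicas": 3}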
Because json is poorly suited for config files, and it is very convenient to have config files that closely mirror apis, like kubernetes does. I've seen too many teams who try to use json for config end up developing yet another custom superset of json to allow comments or multiline strings or such. If you are going to use a superset of json might as well use an existing one like yaml or json5.
That said, I agree with all the criticisms of templating YAML. We have to do the same (with Helm and other tools), and I have pushed hard to adopt conventions that we only use flow style and not block style, to avoid all the whitespace problems when splicing together chunks of YAML. And on the plus side we get trailing commas and other such niceties which don't exist in JSON and make it harder to template.
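A quick check of those two niceties, assuming PyYAML:

    import yaml

    # Flow style sidesteps the indentation headaches when splicing, and YAML
    # (unlike JSON) tolerates trailing commas in flow collections.
    print(yaml.safe_load("{name: web, ports: [8080, 8443,],}"))
    # {'name': 'web', 'ports': [8080, 8443]}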
LOL we are doing the same with k8s. We deploy to any environment, where each environment is a different k8s namespace. We even have a namespace for each developer. The variables are things like the image name (the tag is effectively a git commit id, or sometimes different for not-committed yet stuff). Beyond that just about nothing really needs to be a variable, but we do have different RAM amounts.
We have a couple ways that we template this out, but mostly we literally just do this in bash:
(Where CI_COMMIT_SHA comes from gitlab)
ENV comes from our gitlab CI file.
That all being said, the extent of our k8s integration is lots of stuff like that. We could write a JS file that creates a JSON k8s template, but honestly, that would be more work and more learning than we had to for what we're doing. Why would we do more just because we want to avoid templating in a YAML file?
I think they are missing the real selling point of JSON. It's basically interoperable with JavaScript objects.
That means you write it, send it, store it, operate on it, etc. with little or no modification.
The author says "converting between the two is trivial" which may be true, but the developer overhead is less trivial. And it will always be JSON in the client - JS doesn't support YAML objects.
Is that really the selling point? I interact with numerous services that speak JSON...and most of them aren't written in JS. YAML and JSON both have to be parsed to be used, even by JS. Otherwise it's just a string.
To me the real selling point in JSON is the dead-simplicity for humans to read and CHANGE. YAML is human-READABLE, but frankly I often screw up changing it because the formatting is a little too magical and I'm a little too unfamiliar. JSON is downright picky and obnoxious...but that makes it really easy to make a change. Screw up the quotes? It'll complain about the quote. Dangling comma? It will complain. Did I cut-and-paste from an HTML display that screwed up my whitespace by compressing everything down to a single space? Nothing cares.
The difference, at least for Vue single file template components is that it's 3 separate areas in one file:
- <template /> (Templated HTML)
- <script /> (JS OR Typescript)
- <style /> (CSS OR SASS/etc)
Whereas in old PHP files you could mix it in anywhere and the files were a big mess. Including inline SQL into your view templates which is hardly a good separation of concerns. While a Vue component can be separated into separate files, at least as one it all represents one isolated piece of the interface.
Off-topic: The whole discussion is about deploy-time configuration management, but our problem is more about run-time configuration management.
We have done the classical memcached+database custom solution, but I was wondering if there is any accepted library/tool to change application run-time behavior. We have tried the Consul KV store [1], but it does not quite fit our environment.
My ideal solution would be a webapp with some text editor (think codemirror). Changes in this text file would push the configuration data to a running application.
It makes me throw up a little in my mouth every time I see hundreds of lines of YAML to configure something like Traefik with Kubernetes. The worst is when people say they prefer that because "I don't have to write a config file for my backend". That's true but instead now you have extremely verbose configuration mixed in with other verbose configuration.
But in YAML's defense I think it's more of a problem with the tools that use it more so than YAML itself. Ansible is a great example of how amazing YAML can be to manage complex configuration in a concise way.
I agree that simple YAML can be nice as a quick and clean tool, but Ansible is an example of everything wrong with how YAML is used. Layering program-flow constructs like loops, variables, templates, references, etc. on top is exactly the sort of abuse that makes YAML feel awkward.
YAML is a data stream, not a program. Please do not shove programs into data.
Your data does not need to be "expressive", it just needs to provide input to a program. If your data files need to be complex, you need a program to generate them for you.
I've danced the dance of ini -> json -> yaml -> weird hybrid -> embedded logic, and it ends with "program that asks for what the thing you want looks like and generates data files". Industrial software design figured this out ages ago.
And you end up with a program compiled to a config anyway, so with a proper toolchain it means your real config is the program.
People keep increasing complexity unintentionally precisely because they don't realize that code = data. There's no real distinction. Code is data is code.
You will end up having Turing completeness somewhere, it's just a matter of choosing (or blindly selecting, like most people do) where. For a popular product, it eventually gets embedded in the configuration language, turning it into half-assed programming language (see most web-related templating). For less popular products / more enterprise'y settings, you can probably get away with embedding the Turing-complete part in your bureaucracy. That is, I can't code my config to make it do what I want, but I can pay you to get developers to write some code and export it to the config language as a keyword. There's a spectrum to this, and tradeoffs galore.
But ultimately, YAML is nothing but a tree notation. Tree notation is enough to represent high-level programming languages. Lisp without parentheses, if you will, or Python, if you squint your eyes.
"Data" to me is the least complex input from a human, whereas "code" is more complex and necessitates a lot more work to make sure it's correct and bug-free.
If you embed "code" in "data", you made your thing way more complex and subject to software design patterns. But in software operation, we already have to contend with highly complex systems, so we want to remove as much time and effort and complexity as possible from the instrumentation.
To put it another way: if you had to run a nuclear reactor, do you want to instrument it by constantly writing new code, or turning a dial? I'd rather turn a dial. That means I have to develop the code for that dial ahead of time, but in the end, actually using it will be safer.
* all configuration may be generated from any other format imaginable, but it's sure as fuck going into the Big Main Godlike Application as XML.
Separation, interfaces, etc. Disclaimer: I work in .NET almost exclusively. The .NET configuration APIs generally work, as long as you only ever use them for reading; treating config as something the application itself can fiddle with is a fast route to madness.
I feel like Ansible generating YAML with Jinja templates is really a sweet-spot, with idempotency and reusability.
I find it pushes me to write plain YAML files for variables and defaults (Ansible), while allowing strong templating of generated files (Jinja) and letting the result be readable (YAML). By readable I mean minimum programming bloat (code spread on many lines just to write a for loop) and minimum extra syntax that clutters the screen (brackets and quotes). It also lets me write very little custom code (aside from variables, obviously).
If I had to use a more "powerful" YAML-like replacement, it would mix all these into files, written differently by people with different styles, and it would have bloat all over the place.
The main issue I have with helm is that values.yml is not templatable by default so you have to generate it if you want reusability.
YAML does one thing and does it well, it's readable and bloat-free. Maybe we need more tools like "kubectl explain" to know the syntax though.
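The vars-plus-Jinja split described above fits in a few lines of Python, assuming PyYAML and Jinja2, with hypothetical contents standing in for the vars and template files:

    import jinja2
    import yaml

    variables = yaml.safe_load("app: web\nreplicas: 3")   # plain YAML vars file
    template = jinja2.Template("name: {{ app }}\n"
                               "replicas: {{ replicas }}\n")
    print(template.render(**variables))                   # readable YAML comes out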
Reading the commentary on whitespace, my thoughts immediately jumped to the C preprocessor. Even though it's built into the language, it has the same sorts of problems as a template engine has with generating YAML: the preprocessor just wasn't enough aware of the syntactic structure of the language to make it easy to generate anything of significant complexity.
I'm not proud of this (and like to think I could come up with something better these days), but this code was a bit of a nightmare for that reason:
Lisp macros do better, but they have the problem that the macros (and their potentially unusual evaluation rules) can easily just blend in with ordinary function calls.
Helm charts are great and accomplish the task, but things get really weird and complex if you're not careful. Oh man, the indentation and _helpers.tpl nightmares. It's not obvious at all. And there's still a lot of repetition for each chart; writing a deployment or whatever is so verbose in K8s.
On the other hand, Ansible uses YAML and there it works great. I feel like Ansible uses YAML in a way that's easier to understand and the way it was meant to be written. With Ansible, you're writing configuration, not templates of configuration. I don't think a layer on top of Ansible, like Helm is for K8s, would make sense.
In contrast to other template systems, Emrichen templates are not just "based on YAML", they _are_ YAML. YAML tags like "!Var varname" are used to perform things like variable substitution, loops, etc. Variables can be of any JSON type, not just strings, and the template is evaluated top-down.
Over the last week I created a tool for processing YAML and JSON files using Jsonnet. It's called [ycat](https://github.com/alxarch/ycat) and is inspired by `jq` but uses Jsonnet for processing. It can also be used just as `cat` to concatenate JSON/YAML files. It's still young but very useful, especially for handling complex kubernetes configurations.
Wholly agree with this which is why I've been experimenting with a project similar in goals to jsonnet as a side project of my own. I didn't know about jsonnet when I started or I'd probably have just used/contributed to that instead.
Why template yaml when you could just generate it? Or generate json, or toml, or xml, or environment variables...
Jsonnet is nice, but not actively developed, unfortunately. I highly recommend supporting UCG [0], which is written in Rust (Jsonnet has two implementations, which is one source of the slowness).
His main argument is that you need to template differently for different environments (dev/stage/prod) and cloud regions (us-west/us-east/emea/apac).
This is actually a solved problem, and you shouldn't be doing it in your YAML/JSON templates. You should be using an external parameter store to do this, and using a single template for everything.
This is a simple use case: I want to deploy the latest AMI (Amazon Machine Image) in any region, so I always get the latest patched Linux base image to run my application on. I don't (and shouldn't) want to update my YAML/JSON every time a new image is published.
So, why are people having to go to these crazy templating macro lengths? Just store the changing bits in an external config/parameter store like etcd and let your infrastructure as code templates remain unchanged.
I think his main argument is: why are we using text-based templating when we could go one step further and have language-based templating?
like why have:
{"foo": "<%= bar %>"}
when you could just have
{"foo": bar}
the problem with text based templating is the templating language has to make a decision about escaping and it is sometimes the wrong one. for example rebar used an erlang haml [at some point... maybe they fixed it :)] which meant it escaped html special characters by default. but this makes almost zero sense when generating erlang configuration files.
i guess the reason that it is like this is because it is just easy and for most YAML/JSON/etc configuration files it is not a problem because you are basically doing static substitution or 'dynamic' substitution but with a safe range of characters. so the reason things are 'bad' is because the current solution works for 99% of the use cases and no-one wants to spend time fixing it when they could spend that time fixing a real problem. heh :/
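The language-based version sidesteps the escaping decision entirely, because the serializer owns it; a minimal Python illustration:

    import json

    bar = 'tricky "value" <with> special characters'
    print(json.dumps({"foo": bar}))   # all escaping handled by the serializer,
                                      # correctly for the target format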
Based on the appearance of Helm, this is probably referring to Kubernetes, which really does have per-environment things that can't be represented in any form other than namespaces. For example, you have an app running (that's a deployment in k8speak). You want network traffic to get to this app, so you set up a load balancer. The load balancer config needs to know the name of your app (or more precisely, a selector for "pods" that are created by the deployment), which will change between environments, so now there is that common variable that has to be updated in both places. That's a simple example but is why people are templating their k8s configs. Yes, you could work around it by giving every environment its own namespace and using the same set of objects in every environment (differing only by namespace, which should probably be in the file... but since you can't edit the files, you can pass it in to kubectl probably), but there are other cases where even that doesn't work.
You do need some way of saying "this is the base configuration and this is what we change for staging and production". Helm is a way to do that, and a popular one, but it's pretty ugly. Hence this article.
That gives you the latest, fully patched base OS image no matter which of 19 regions you launch it in.
Even hardcoding that in a K/V store is going to get outdated unless you manually update it. Parameters like this are great because you can simply write your code once and never have to update unless you're adding new functionality. All base parameters and external systems (APIs, etc) are parameterized and never need to get updated, except by your SaaS partners that update them for you.
For anyone using EC2 Parameter Store, a free service in AWS:
    Parameters:
      AMI:
        Type: AWS::SSM::Parameter::Value<String>
        Default: /aws/service/ecs/optimized-ami/amazon-linux/recommended/image_id
Gives you the latest, fully patched Linux OS in any of 19 regions that you launch it in. Free K/V store that works really well for apps and custom parameters.
YAML is the bastard offspring of XML. A bunch of ways to write semantically identical stuff is bullshit. Let data files be data files, and build them using languages actually suited for the job. JSON is plenty complex for 105% of the use cases of YAML, with much fewer downsides.
When I joined Google, I realized it was kinda surprising I hadn't seen something like GCL in the outside world (not to say it doesn't exist, just that it isn't ubiquitous enough for me to have heard about it). This seems to be an example of that hole being filled.
Same here... just using the same language as the rest of the application, usually a file with constants or a class/struct/module/whatever.
The only reason config files in another language might be required is when you need to configure the application after it has been compiled to native binary code.
Obviously this doesn't apply to all these applications written in scripting languages.
I do wonder how you create type-safe config files though. I currently have a `development.ts` and a `production.ts`, where the `production.ts` is only loaded when `node_env === production`. `production.ts` contains blank/default values and the file is overwritten on the server with production secrets.
I use j2cli to template my YAMLs; it's pretty powerful for templating stuff without using Ansible. You just need the right environment variables. And you can add if-logic if you want.