My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years. INI files, XML, JSON, and YAML aren't sufficiently expressive -- lacking for loops, conditionals, variable references, and any sort of abstraction -- so, of course, we add templates to them. But as the author (IMHO rightfully) points out, we just end up with a funky, poor approximation of a language.
I think this approach is a byproduct of thinking about infrastructure and configuration -- and the cloud generally -- as an "afterthought," not a core part of an application's architecture. Containers, Kubernetes, serverless, and more hosted services all change this, and Chef, Puppet, and others laid the groundwork to think differently about what the future looks like. More developers today than ever before need to think about how to build and configure cloud software.
We started the Pulumi project to solve this very problem, so I'm admittedly biased, and I hope you forgive the plug -- I only mention it here because I think it contributes to the discussion. Our approach is to simply use general purpose languages like TypeScript, Python, and Go, while still having infrastructure as code. An important thing to realize is that infrastructure as code is based on the idea of a goal state. Using a full blown language to generate that goal state generally doesn't threaten the repeatability, determinism, or robustness of the solution, provided you've got an engine handling state management, diffing, resource CRUD, and so on. We've been able to apply this universally across AWS, Azure, GCP, and Kubernetes, often mixing their configuration in the same program.
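To make that concrete, here is a minimal sketch (assuming the @pulumi/aws provider; the names, the stage variable, and the bucket settings are purely illustrative):

```
import * as aws from "@pulumi/aws";

// Ordinary TypeScript -- a loop, a conditional expression -- generates the
// goal state; the engine then diffs it against the deployed state and
// performs whatever creates/updates/deletes are needed.
const stage: string = "dev"; // hypothetical environment name
for (let i = 0; i < 3; i++) {
    new aws.s3.Bucket(`${stage}-logs-${i}`, {
        versioning: { enabled: stage !== "dev" },
    });
}
```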
Again, I'm biased and want to admit that. Still, if you're sick of YAML, it's definitely worth checking out. We'd love your feedback:
This is a great analysis, but it's missing a fundamental point: why do we have a problem with these approximations of a programming language, or with just using a programming language to template stuff?
Because your build then becomes an actual program (i.e. Turing complete) and you have to refactor and maintain it! This is the common problem of using a "programming language as configuration" (e.g. gulp?)
Dhall has the same premise as Pulumi, but without the Turing completeness (I don't know if/how Pulumi avoids that, but if it does, it should be part of the pitch), so you cannot shoot yourself in the foot by building an abstraction castle in your build system/infrastructure config.
We use it at work to generate all the Infra-as-Code configurations from a single Dhall config: Terraform, Kubernetes, SQL, etc.
> We use it at work to generate all the Infra-as-Code configurations from a single Dhall config
This is the key bit, and not something that's pitched well enough on the Dhall landing pages: using straight YAML forces you to repeat yourself in multiple areas for each individual tool being used, and these repetitions have to stay consistent across multiple tools. What Dhall does is allow you to write a single config and use it to derive the correct configurations for each tool that you use. So you can write a single configuration file from which, eventually, every single part of your system is derived -- Terraform infrastructure, Kubernetes objects, application config, everything. When you pull it off, it's simply magical.
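Dhall's own syntax aside, the shape of the idea can be sketched in any language. A hedged TypeScript illustration -- the config shape and values are invented, and js-yaml's dump is assumed for emitting the per-tool files:

```
import * as yaml from "js-yaml";

// The single source of truth:
const app = { name: "web", port: 8080, replicas: 3 };

// Derive the Kubernetes Service manifest from it...
const serviceYaml = yaml.dump({
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: app.name },
    spec: { ports: [{ port: app.port }] },
});

// ...and the application's own config from the very same values,
// so the two can never drift apart.
const appConfigYaml = yaml.dump({ listenPort: app.port, workers: app.replicas });
```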
You can think of it like this: JavaScript is a horrible, no-good, very bad language, and yet all browser programming is done in JavaScript because every browser supports it -- so too are JSON and YAML horrible configuration languages. But JavaScript gave rise to abstractions like TypeScript, a much better language that compiles down to JavaScript for compatibility. TypeScript is to JavaScript what Dhall is to JSON and YAML. The fact is, pretty much everything is configured with JSON and YAML, and Dhall makes it much, much easier to live in that world, with no need for the systems being configured to support it.
Considering the relative obscurity of Dhall, it's basically the best-kept secret in the DevOps world right now, and it's a shame more people don't know about it.
Dhall appears to be expressive enough that I can't see why you wouldn't still have to refactor and maintain the Dhall code.
Writing Dhall code looks exactly like programming to me, and the programmer must possess the necessary programming skills to produce good Dhall code. A random guy with a text editor will make just as much of a mess in Dhall as they would with a “real” programming language.
I don't see how the restrictions in Dhall really help much in this regard. Turing completeness feels like a red herring to me.
Not a user of Dhall, just a fan, but refactoring of Dhall configuration should be extremely easy. You make a change, and you can easily verify that your configuration stays the same. (Thanks to https://en.wikipedia.org/wiki/Normalization_property_(abstra... )
For TC languages, comparing whether two programs (original and refactored) do the same thing is not decidable in general. If the language is not TC, then it is more feasible.
You can do more than just compare the output of two programs in Dhall. You can verify using a semantic integrity check that two programs are the same for all possible inputs. For example:
Actually, with Dhall you should be able to compare the programs themselves, even without full "input" (there is even an example on the Dhall page; see "You can reduce functions to normal form, even when they haven't been applied to all of their arguments").
So you can for example leave some parameters out of your config and still validate the correctness of refactoring.
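You can approximate the flavor of this outside Dhall, though only weakly. A hedged TypeScript sketch -- hashing a canonical form of the evaluated output, whereas Dhall's semantic integrity check hashes the expression's normal form itself:

```
import { createHash } from "crypto";

// Reduce each config to a canonical form, then hash it.
// Equal hashes imply the refactoring preserved the configuration.
const canonical = (x: unknown) =>
    createHash("sha256").update(JSON.stringify(x)).digest("hex");

const before = { replicas: 2 + 1 }; // "refactored" expression
const after = { replicas: 3 };      // original literal

console.log(canonical(before) === canonical(after)); // true
```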
If you use a general purpose programming language, then even comparing just the output might be difficult -- most languages allow I/O, so it's possible that the configuration depends on some side channel.
I would say that if you are only using a general purpose language "sensibly" for configuration, then you are effectively restricting yourself in the same way that Dhall does.
I don't get the problem with using a turing complete language to generate configuration. There's nothing wrong with maintaining and refactoring a program, that's a natural process for any program. If you don't want an infinite loop, don't write one, as you wouldn't in any other program. You can choose as much or as little abstraction as you so wish.
Give me a real language any day over dhall or jsonnet.
What's so bad about Turing completeness? I haven't had a decent look at Dhall, but I'm betting I could probably write an exponential Dhall program that won't terminate in the lifetime of the universe.
The real reason for giving up Turing equivalence was probably to get dependent types. This gives very powerful static guarantees, including the presence/absence of fields under non-trivial record operations such as merge. In using dependent types, they have also had to give up significantly on type inference, which is really going to annoy the average JavaScript/Ruby programmer.
> My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years.
I think that you’re right, and I think it’s great, because we have a programming model in which code is data and data is code: Lisp & S-expressions.
It’d be downright awesome to have a Lisp-based system which used dynamic scoping to meld geographical & environmental (e.g. production/development) configuration items. But then, it’d be downright awesome if the world had seriously picked up Lisp in the 80s & 90s, and had spent the last twenty years innovating, rather than reïnventing the wheel, only this time square-shaped. But then, the same thing could be said about Plan 9 …
I’ve not yet had the time to take a look at Pulumi, but I hope to have time soon.
Seriously, this has happened again and again and again. You have software, so you configure it via a clean and simple text syntax, then the configuration needs to be generated and the syntax becomes more complicated, then the next system you do has an "API" instead so you can configure it via programming, which is too complicated so the next time you Do it Right and go with a simple text file, which is then outgrown when the configuration it stores becomes too complicated...
I think the parts of Lisp that tended to be rebuilt have mostly been incorporated into the newer languages. (At least, it's been a very long time since I've had to rewrite a fundamental data structure, etc.)
You don’t need code-is-data for what your parent is describing. All you need is code that outputs data. Or even better, code that initiates contact with other code.
The only requirement is a commitment to doing things imperatively in a real programming language. It’s hard to resist the temptation to do things declaratively (because it’s easier to imagine a declarative interface that describes your problem than an abstraction of the procedure which will solve it) but you are never forced to.
As the kids say: stop trying to make Lisp happen, it's not going to happen.
It has become yet another community that's fighting a struggle that everyone else ended years ago, like the few Japanese in jungles who refused to surrender. I'm not entirely sure why it's not been adopted, but I suspect it's because most people strongly prefer (a) scope delimiters that are visually different when semantically different and (b) function-outside-brackets syntax, i.e. f(a, b) rather than (f a b).
Or you could go the other way and say that JSON is s-exps with curly brackets so it should be made executable as such, and build that language.
> As the kids say: stop trying to make Lisp happen, it's not going to happen.
That's probably true, but I think it's useful to fight the good fight regardless. Even if Lisp & s-expressions don't, in fact, take over the world (and I think they will), arguing in their favour might help increase the chance that whatever inferior technology does end up getting adopted is better than it could have been.
> Or you could go the other way and say that JSON is s-exps with curly brackets so it should be made executable as such, and build that language.
The problem is that without symbols, that ends up being hideously ugly. This:
["if",
["<", 1, 2],
"less than",
"greater than or equal to"
]
is appreciably worse than:
(if (< 1 2)
    "less than"
    "greater than or equal to")
And alternatives like:
{"if": [[1, "<", 2], "less than", "greater than or equal to"]}
are so much worse that I don't think anyone could seriously expect to use them.
> It has become yet another community that's fighting a struggle that everyone else ended years ago, ... like the few Japanese in jungles who refused to surrender.
Nice imagery, but the wrong point.
Except for the syntax, everybody else joined Lisp.
"We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." --Guy Steele
Flash back to the mid-1980's (when the mainstream was C, Pascal, BASIC, FORTRAN, COBOL, etc.) and it's Lisp/Scheme (and Smalltalk) that have features like Garbage Collection, interactive development, lexical closures, decent built-in data structures, dynamic typing.
The fact that all of this is commonplace today both justifies a lot of what Lisp did in the first half of its existence and undermines its (technical) competitive advantages now.
> but I suspect it's because most people strongly prefer (a) scope delimiters that are visually different when semantically different and (b) function-outside-brackets syntax, i.e. f(a, b) rather than (f a b).
It's not technical. I don't think it ever was. So much of it is around social concerns: a performance stigma dating back to the 1970's, fear of not being able to hire people to do the work, fear of what VCs will think, worries about whether the language will still be available... And then at the end of the day, the problems whatever language will solve are a tiny fraction of the overall problem of doing something relevant and lasting and useful to others.
> As the kids say: stop trying to make Lisp happen, it's not going to happen.
Life is too short and the world is too big to try to confine other people's ideas of how they should think or work.
The point of the market economy and of the scientific process is that people get to try what they think is going to be useful and then let the world decide. The fact that Lisp is still in the conversation at all, when its contemporaries (Autocoder, Fortran) either aren't or are highly specialized, says a lot that we can learn from.
I think what you're doing with Pulumi is the right answer, and it's only a matter of time before this becomes the norm. The author's examples could easily be done with plain ol' JS/ES/TS, with far more extensibility and customization when the need arises.
I also feel this is where JSX got it right. Instead of creating yet-another-templating-language (looking at you Angular!), they used JavaScript and did a great job of outlining how interpolation works. Any new templating language is always going to be missing some key feature you expect out of a general programming language and your customers will continue to ask for more features.
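For example, a small TSX sketch (the component and data are invented) -- the "template" constructs are just TypeScript expressions:

```
import * as React from "react";

const users = ["ada", "grace"]; // hypothetical data

// Interpolation is plain expressions in braces; loops and conditionals
// come from the language itself, not from a separate template dialect.
const UserList = () => (
    <ul>
        {users.map(u => (
            <li key={u}>{u.toUpperCase()}</li>
        ))}
    </ul>
);
```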
Paired with TypeScript, we would have the clarity of a declarative language with the power and flexibility of a real language that is also easy to extend and navigate.
In ROS we have these XML launch files that are just awful. They have enough features to be a really bad programming language for configuring and launching (often conditionally) numerous robot software nodes.
In ROS2 the launch file can now just be a Python script. We very much learned all this the hard way, and the solution was to just support Python. I think it's brilliant.
- the Django-like situation: the configuration is pure code, and it's a mistake. It was not necessary, and it brought plenty of problems. I wish they had gone with a templated TOML file.
- the Ansible-like situation: the configuration is templated static text. But with something as complex as deployment, they ended up adding more and more constructs, until they had created a monstrous DSL on top of their implementation language, with zero benefits compared to it and plenty of pitfalls. In that case, they should have made a library, with an API and documentation emphasizing best practices.
- and of course a big spectrum between those
The thing is, we see configuration as one big problem, but it's not. Not every configuration scenario has the same constraints and goals. Maybe you need to accept several sources of data. Maybe you need validation. Maybe you need generation. Maybe you need to be able to change settings live. Maybe you need to enforce immutable settings. Maybe you need to pub sub your settings. Maybe you need to share them in a central place. Maybe they are just for you. Maybe you want them to be distributed. Maybe you need logic. Maybe you want to be protected from logic. Maybe the user can input settings. Maybe you just read conf. Maybe you generate it.
So many possibilities. And that's why there is not a single configuration tool.
What you would need is a configuration framework, dealing with things like merging conf, parsing files, getting conf from the network, expressing constraints, etc.
But if you recreate a DSL for your config, it's probably wrong.
In defence of Django, the way settings.py works has been very stable for the entire lifetime of Django.
It may have its problems (I don't have many issues with it) but it doesn't seem to have this problem of attracting ever more layers of abstraction on top of it. It works.
Actually, I think settings.py is not a bad idea, but it's half-baked.
There should be a schema checking the settings file. There should be a better way to extend settings, and to make different settings according to context, such as prod, staging or dev.
There should be a linter avoiding stupid mistakes like missing a comma in a tuple, resulting in string concatenation.
There should be variables giving you basic stuff like current dir, log dir, var dir, etc. We all make them anyway.
And there should be a better way to debug the settings import problem.
But all in all, it's quick and easy to edit, and very powerful.
> There is already a mechanism to validate the settings.py file inside django.
It's not exposed, and it's very limited.
> The different context stuff can be handled by using env vars, and a nice python wrapper, like python-decouple.
It's just one of the ways to do it. Go to a new project, and they use a different way. The main benefit of Django is that a Django project is well integrated, and you find similar conventions and structure from project to project, allowing you to reuse the skills you learned and build an ecosystem of pluggable apps.
Has anybody here personally suffered the problems that the Turing complete Django configuration creates? (I mean, not the ones caused by lack of completeness checks or good library support, but the ones caused by too much power.)
Now that you say it, it's true I didn't have problems with too much power.
I never had an untrusted party editing my config, nor did I use data from any.
Also, you can make the same mistakes in the settings file as in any other code file, but they're no more or less important there.
In fact, all the problems I had could have been solved by better integration: solving the import problem, making composition easy, adding checks, allowing data to be loaded from several sources and merged, presenting them in a unified interface.
If I'm being honest, the problem with settings.py may not have been that it's Python, but that it's a flat file with no strong conventions, tooling, or best practices.
I could raise the issue that you can't read the config from another language, but I never had to, and good tooling would allow a synced export or an API to consume the settings.
After years of working with cfengine and then Ansible, I finally went to a bespoke BSD-ports workalike with optional client/server and JSON configuration components. Never looked back.
RCS-stored, directory-based modules with tasks in subdirectories. Make- or shell-script-style module execution as part of each task dir, plus variable files containing settings for the install task. JSON configuration files that define all necessary module params (e.g. log, task selection, stop on error, initialization, build command per task, etc.), and remote scheduling of module/task execution via a per-agent SysV IPC command queue serviced by a JSON-RPC microservice, which allows both serialized and non-blocking task scheduling by queue priority.
I owned the majority of the configuration system and ecosystem for Borg, Google's internal cluster management and application platform.
Unfortunately, what's described here is good on many levels, but not excellent at any.
If you are OK with describing the complexity of your infrastructure in something close to a general purpose programming language, then a well-abstracted API built on the cloud providers' original APIs is more familiar to devs, and it will be more reliable, more performant, and more flexible.
If you want a config experience, something like kustomize is leaner and more compatible with the text config model.
I also cannot see how this interoperates with other tools, which will seriously limit its appeal to people using other tools.
The problem with code as configuration is that the config file is nondeterministic and it takes longer to extract information from the file.
This has long been a problem in the Python/pip community, as it's basically impossible for the build tools to determine the dependencies of a package without fully downloading and running the setup.py file.
Unless you call rand(), your code should be deterministic. You're right about needing to run the thing to get the data (that's the point), but there is a middle ground between pure literals and fully side-effecting code. For example, you could impose pure functions (no side effects).
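A minimal sketch of that middle ground, assuming nothing beyond plain TypeScript (the Env shape, names, and URL are invented):

```
// Configuration as a pure function of explicit inputs:
// no I/O, no clock, no randomness inside -- same input, same output.
interface Env {
    region: string;
    replicas: number;
}

function makeConfig(env: Env) {
    return {
        serviceName: "api",
        region: env.region,
        replicas: env.replicas,
        // Derived values are fine; the result is still deterministic in `env`.
        healthCheckUrl: `https://api.${env.region}.example.com/healthz`,
    };
}

const prod = makeConfig({ region: "us-east-1", replicas: 3 });
```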
That's what Haskell already does. Dhall is optimizing along different dimensions (making sure the script's execution ends, making scripts verifiable statically, making it convenient to merge files, making it convenient to centralize your configuration).
As a happy pulumi user, I have to say I am very impressed with the experience. An order of magnitude improvement on maintainability over our old terraform code base. Highly recommended.
This is my experience, and it's clearly biased from maybe one bad example, but... SCons is an example of code over configuration, and from what I could tell, I never met someone who truly understood it. Because it was code over configuration, every programmer added their own interpretation of what was supposed to happen, no programmer truly understood what was really going on, and it turned into one giant mess of trying to understand different programmers' hacks and code to get the build to work. I'm sure some SCons expert will tell me how I'm full of crap, but I'm just saying, that's my experience.
So, what's my point? My point is that configuration languages help in that they push "the one true way" and help to enforce it. Sure, there are times you end up having to work around the one true way, but giving programmers the very powerful tools of a full language for configuration leads to chaos -- or at least that's my experience. Instead of being able to glance at the configuration and understand what's happening because it follows the one true way, you instead end up with a configuration language per programmer, since every programmer will code stuff up a different way.
For what it's worth -- I've been using Pulumi on a couple of different projects and, today, I couldn't imagine starting a cloud-based project on anything else. The Pulumi team has spent more time than almost anybody I know on understanding how to attack these problems; I guess I have a bit of an understanding of just how much work that is, as I've tried to do the same thing and their solution is better.
I appreciate that their revenue model doesn't require making the open-source version frustrating or stupid and I appreciate that they're incredibly responsive. And some of the stuff you'll see around cloud functions/Lambdas and the deployment thereof will fucking blow your mind.
I have been using ksonnet but that is now officially dead. Working with jsonnet seemed unnecessarily painful when coming from coding typescript. This information is quite timely and welcome, I'll look further at the ts example.
We have ksonnet expats on the team (we're all in cloud city -- Seattle), and I've been keeping an eye on that project myself, since I think it got a lot of things right and frankly many of the ideas for Pulumi were inspired by early chats with the Heptio team. But, as you say, why create a new language when an existing one will do -- that was our original stance and it's working great in practice.
Oh! I don’t know where I got that impression from then! perhaps I just thought that we couldn’t use the free tier because of the number of licenses we’d need, but you’re right, it’s still there!
Build files (e.g. makefiles and their various descendants like SCons, rake, etc.) seem to be in the same general boat, except very early on mixing "real languages" (or at least shell scripting) was obviously allowed, so they've always leaned far more towards the "yes, it is a general purpose language" end of the spectrum.
> My belief is that we've been slowly building up to using general purpose languages, one small step at a time, throughout the infrastructure as code, DevOps, and SRE journeys these past 10 years. INI files, XML, JSON, and YAML aren't sufficiently expressive -- lacking for loops, conditionals, variable references, and any sort of abstraction -- so, of course, we add templates to them. But as the author (IMHO rightfully) points out, we just end up with a funky, poor approximation of a language.
This is why I prefer to use a JS file for configuration instead of a native JSON or YAML file when those options are available.
I still don't know how to get it to do exactly what I want. There is far too much magic involved, and experience has long demonstrated that magic is bad (Webpack confirms that belief).
That being said, the concept of defining a function in, essentially, a config file seems like a step in the right direction. I don't think I'd trust that functionality outside of builds or infra-as-code, though.
What's magic about webpack? The online documentation provides quite a lot of insight into how it all fits together.
It probably only seems like magic because you didn't build a fundamental understanding of how it works before using it. I use some massive webpack configurations and I understand them all quite thoroughly thanks to well-written, modularized configuration files.
Javascript is a scripting language without native module support. That isn't Webpack's fault.
Webpack also handles much, much more than just Javascript. It handles CSS, HTML, images, files, pretty much any kind of asset. Java/Scala doesn't have anything like that. Asset management is completely different due to the nature of how assets are transferred to the client.
And Android? Give me a break. The moment you stray from the strict layout of an Android app you run into a wall and have to learn how Gradle operates. This strict layout is good for some but others hate when an environment forces particular constraints upon them.
Webpack is completely configurable at every stage, works with plugins (which compilers don't do) and again, isn't magic. Not knowing how something works doesn't make it magic. That's not what magic means with respect to code.
Besides... Maybe if you just like getting by, you can program in C/Java/etc without learning about compilers. Web dev is fucked and transpiler knowledge is basically required, but sure you can get by in other domains without it. But if you want to be a good programmer, an expert at what you do, someone who lives and breathes and understands computer science, someone who will excel in his career and not remain a code monkey forever... You have to learn about how your compilers work just like you should know how the silicon in your computer is doing its own "magic".
It was very successful. Complicated projects require complicated build config. Parcel does fine for simple projects, but lacks the raw power & configurability of webpack.
Webpack now does simple config as well with the 'mode: "production"' and 'mode: "development"' presets.
Having dealt with puppet, cloudformation, ansible and other solutions that have gone in and out of fashion and also dealing regularly with Kotlin, Java, Javascript, and recently typescript, my view is that configuration files are essentially DSLs.
DSLs ought to be type safe and type checked since getting things wrong means all kinds of trouble. E.g. with cloudformation I've wasted countless hours googling for all sort of arcane weirdness that amazon people managed to come up with in terms of property names and their values. Getting that wrong means having to dig through tons of obscure errors and output. Debugging broken cloudformation templates is a great argument against how that particular system was designed. It basically requires you know everything listed ever in the vastness of its documentation hell and somehow be able to produce thousands of lines of json/yaml without making a single mistake, which is about as likely as it sounds. Don't get me started on puppet. Very pleased to not have that in my life anymore.
On a positive note, Kotlin recently became a supported language for defining Gradle build files. Awesome stuff. Those used to be written in Groovy. The main difference: Kotlin is statically compiled, and tools like IntelliJ can now tell you when your build file is obviously wrong and autocomplete both the standard stuff and any custom things you hooked up. It makes the whole thing much easier to customise, and it removes a whole lot of the uncertainty around the "why doesn't this work" kind of stuff that I regularly experience with Groovy-based Gradle files.
Not that I'm arguing for using Kotlin in place of JSON/YAML. But TypeScript seems like a sane choice. JSON is actually valid JavaScript, which in turn is valid TypeScript. Add some interfaces and boom, you suddenly have type safety. Now using a number instead of a boolean or string is obviously wrong. TypeScript can also do multi-line strings, comments, etc., and it supports embedding expressions in strings. No need to reinvent all of that and template JSON when you could just be writing TypeScript.
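For instance (a hedged sketch -- the interface and its fields are invented):

```
interface ServiceConfig {
    name: string;
    port: number;
    debug: boolean;
}

// The "config file" is just a typed literal now. Writing port: "8080"
// or misspelling a key is a compile-time error instead of a runtime surprise.
const config: ServiceConfig = {
    name: "api",
    port: 8080,
    debug: false,
};

export default config;
```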
I recently moved a YAML-based localization file to TypeScript. It only took a few minutes. This resulted in zero extra verbosity (all the types are inferred), but I gained type safety. Any missing language strings are now errors that VS Code will tell me about, and I can now autocomplete language strings all over the code base, which saves me from having to look them up and copy-paste them around. So: no pain, plenty of gain.
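The move looks roughly like this (a sketch with invented keys; `as const` plus `keyof` gives the inference and autocompletion described):

```
// messages.ts -- the former YAML file, now a typed literal.
export const messages = {
    welcomeBanner: "Welcome back!",
    emptyCart: "Your cart is empty.",
} as const;

export type MessageKey = keyof typeof messages; // "welcomeBanner" | "emptyCart"

export function t(key: MessageKey): string {
    return messages[key];
}

t("welcomeBanner"); // autocompletes; t("welcomBanner") fails to compile
```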
And yes, people are ahead of me and there are actually several projects out there offering typescript support for cloudformation as well.
To go with your general line of thought, see how many JS-based projects are increasingly moving towards a JS file with a default export as a config file.
Definitely. I was a part of C# in the early days, so little else would make me happier than awesome first-class .NET support. This'll be great for Azure folks -- who knows, PowerShell too?
We are actively working on https://github.com/pulumi/pulumi/issues/2430, which will make it easier for our small team to manage multiple languages. Once that lands, I would expect this to be high priority.
> Definitely. I was a part of C# in the early days, so little else would make me happier than awesome first-class .NET support. This'll be great for Azure folks -- who knows, PowerShell too?
PowerShell would be great; it has nice support for building DSLs.
I know I'm in a minority, but I really dislike YAML... I recently did a lot of Ansible and boy, at the beginning, I was just struggling a lot. Syntactic whitespace kills me.
I don't like it in Python either, but for some reason, when I write Python, it's a lot easier. Maybe YAML is just a bit more complex (and Python has better IDE support..?)
Okay, I'm gonna be the asshole in the room, but how hard is it to just use consistent indentation? I can't count how many times I've heard people complain about significant whitespace in languages.
Not only is it not difficult to begin with, but every code editor and IDE will show you where there's a syntax error in your YAML. People are free to dislike YAML, even for its significant whitespace, but how does it "kill you"?
Look at this example from the article:
```
something: nothing
 hello: goodbye
```
This is pure sloppiness, and anyone who gets into trouble by carelessly adding pointless bytes to code, no matter the language, is being sloppy. I don't understand why people criticize YAML and Python because "whitespace is hard".
P.S.: There's a configuration language called ArchieML, which is similar to YAML but doesn't have significant whitespace.
Three big things that annoy me even though I'm happily writing Python:
- "cut and paste and edit" is broken. You can't autoformat the pasted code into the right place, you have to go back and fix the whitespace. Since whitespace is semantically significant, this can introduce bugs.
- visually identical whitespace may not be textually identical whitespace. Unless you go around breaking the tab key off your colleagues' keyboards, you'll trip over this. Especially (again) if you paste. Occasionally seen in merges too.
- editors can no longer give you 100% correct indentation.
> - "cut and paste and edit" is broken. You can't autoformat the pasted code into the right place, you have to go back and fix the whitespace. Since whitespace is semantically significant, this can introduce bugs.
Depends on how your editor is configured / its feature set. Which makes me wonder how editorconfig would handle this when enabled. It seems like an insignificant issue to me; you can auto-PEP8 the code before pasting it. You should probably be following PEP8 anyway (as far as spacing is concerned, at least).
> - visually identical whitespace may not be textually identical whitespace. Unless you go around breaking the tab key off your colleagues' keyboards, you'll trip over this. Especially (again) if you paste. Occasionally seen in merges too.
I turn on show-all-whitespace in my editors regardless of programming language. I've been burned by Sublime Text not figuring out a file's already-defined whitespace conventions from what the file is using, and just shoving in its own defaults. I wish all editors would base whitespace on what the file's structure looks like; if there are mixed tabs and spaces, give me a warning.
> - editors can no longer give you 100% correct indentation.
I don't understand this, it sounds like you've got your editor configured poorly or something? But it goes back to how unintuitive the nice editors can be. You can use editorconfig to define the indentation project wide, then any editor should pick it up, of course if you define PEP8 at a minimum it guarantees spacing settings.
I'm not sure if PyCharm covers a few of those cases, since I use it so seamlessly I don't usually have complaints.
I’m on the opposite end. I just had to export a JSON based AWS CodePipeline configuration and had a hell of a time trying to edit it and paste things in the right place.
I ended up converting it to yaml, making the edits and converting it back to JSON.
Before anyone asks the obvious -- how do I handle deeply nested code in brackets? Simple: I don't. When things start getting nested deeply, I use my IDE to Extract Method.
In the YAML case: It's hard if you don't have editor support and good diagnostics. Not because you're unusually sloppy, but because you make human mistakes and because you don't know the syntax. (YAML syntax is surprisingly complex and poorly documented in the pedagogic sense). Also, the edit-debug cycle is slow with Ansible or YAML-using CI systems, so this is doubly painful.
In the Python case it's much better, because people less often casually edit .py files without editor support, and because Python has good diagnostics and it's much much harder to produce syntactically correct but semantically wrong Python by whitespace mixups.
Everything is hard when you don't have editor support and good diagnostics. Don't blame YAML because you prefer to use Notepad.exe
However, missing/extra whitespace is not "hard". You would be docked points in an English paper and you should be docked points as a programmer.
So, whitespace aside... Tell me what is easier to edit without built-in syntax support: JSON, or YAML?
If we define "easy" as "how long it takes to complete a task" or "how quickly you can grok the structure of a given block of code", then YAML beats out JSON every time.
I see you restate your argument for clarity, let me try the same :)
1) YAML is a configuration file format, and it's targeting user groups and environments where people use ad hoc terminal-based or OS-bundled editors -- such tools being nano or Notepad, and such users being sysadmins, for example. 2) YAML implementations (= parsers) have poor diagnostics compared to Python, separate from the editor issue. And 3) YAML syntax is more prone than Python to parsing correctly but producing unwanted semantics when you make a mistake.
I think there is value in your English paper analogy: many/most people editing YAML files don't know YAML syntax very well compared to this scenario. If their knowledge of English was at the same level, misplaced whitespace would not be chief of their problems in a graded English paper.
It is of course a structurally valid (philosophically consistent) argument that people should not make mistakes and they should suffer when they do, but this goes generally against the consensus of configuration language usability thinking.
In my opinion no one should be using Notepad for programming work or configurations that are more than 1-2 dozen lines. Nano is about the same: It's a text editor with no inherent tooling for configuration files and syntax support.
A construction worker can't complain that nails are hard to use because they showed up to work with a baseball bat. Or that they're designed badly because they brought a soft aluminum hammer with a tiny head instead of one made with a stronger metal and large impact surface. Tooling is important. Vim and several graphical editors have syntax support. Notepad++ if you're on Windows.
> YAML syntax is more prone than Python to parsing correctly but producing unwanted semantics when you make a mistake.
If you made a mistake, you made a mistake. Why do you expect a program with a mistake to work correctly? Use tooling which prevents you from making mistakes. And the particulars of YAML semantics are orthogonal to how your editor handles it. Yes = True, No = False, etc., for better or worse, but that's got nothing to do with your editor.
> many/most people editing YAML files don't know YAML syntax very well compared to this scenario. If their knowledge of English was at the same level, misplaced whitespace would not be chief of their problems in a graded English paper.
I wholeheartedly agree. So if a programmer complained to me that they were having issues related to inconsistency with whitespace, I would be suspect of their general programming abilities and would start reading their code to determine if the problem lies deeper than just getting an extra space here or there: Incorrect tooling, linting, sloppiness, inattention to detail... All of these things get in the way of well-written software.
As for whitespace in general, and the fact that it's harder for linters to determine and highlight if a block is correctly scoped without enclosures... Python and others have this same issue.
> It is of course a structurally valid (philosophically consistent) argument that people should not make mistakes and they should suffer when they do, but this goes generally against the consensus of configuration language usability thinking.
True, and I agree. Everyone makes mistakes even with things as simple as rote data entry. This is why tooling is incredibly important.
Tightrope-walking at great heights is incredibly dangerous. Practitioners accept this danger. They typically wear harnesses to mitigate the danger of falling. Of course, some people like to live on the edge and set records involving no harnesses. If someone like Dean Potter fell while walking a tight-rope freeform with no harness and plunged to their death, their last thought wouldn't be, "Shit, I knew that tightrope was poorly designed and dangerous," it will be "Shit, I wish I'd been wearing a harness."
We can't remove our harnesses and then complain that mistakes are too frequent and costly.
Editing JSON is OK without specific format support; it just looks like any other C-like language. Editing YAML is basically impossible without specific support; your editor will almost certainly break any file you open and destroy relevant information in the process.
Mine do no such thing. The only whitespace that gets stripped in /any/ editor I have is trailing whitespace and extra whitespace before the EOF, and that's only in certain IDEs where I have consciously enabled these options. They are disabled by default.
Removing trailing whitespace should never change the logic of a file in general, but as for YAML it certainly doesn't. And editors should never remove leading whitespace... who does that?
Press tab on a line in emacs, and the whitespace will get rearranged. It's more explicit in vi, but don't bother (un)indenting blocks there either.
Just writing characters anywhere in a file in the MS IDEs I've tried is enough to rearrange the line's whitespace, while the JetBrains ones I've tried are more conservative and won't break lines you haven't changed somehow.
Ok, now show me a single editor that doesn't make whitespace changes when you press tab.
I've only ever had issues with vim messing up whitespace on the line I'm typing specifically with regard to YAML, and yes, that's an issue, but it has nothing to do with YAML. For example, adding another colon to a string, wrapped in whitespace or not, will often reduce indentation. That's just plain bad behavior, but it's not intended behavior.
How hard is it to use HN formatting? I can’t count how many times people screw it up.
It’s not difficult to begin with, the documentation is free, yet here I am reading your comment with broken formatting.
something: nothing
hello: goodbye
Anyone who has trouble with this is just being sloppy. No useless backticks! You might think you’re doing it right, but unless you check, maybe you’re not.
In fairness, that documentation makes the process out to be far more complicated than it actually is. Plus, their point about errors being difficult to debug can be equally true of other data formats (e.g. some JSON parsers can throw really unhelpful errors if you accidentally include a comma at the end of a list).
Please consider simply believing and trusting those who tell you they hate significant whitespace and that it is a real impediment to work.
Another take, perhaps: Assigning deep semantic significance to invisible symbols is simply stupid. It is stupid to a much greater degree than wanting to be free from having to care about the amount of invisible symbols is “sloppy”.
YAML is a generic format that leaves the effects of formatting up to you. Ansible puts rules on top of it, which makes indentation not always trivial: it is easy to have a dangling key-value pair that doesn't cause an error but only takes effect with the right indentation.
YAML is a bit bonkers in that it's a superset of JSON (all valid JSON is valid YAML), so if you don't like the whitespace sensitivity, you can write your YAML like this:
{
  a: 42,
  # But you can have comments!
  b: "hello world",
  c: "and
      multi-line
      strings!", # and trailing commas!
}
I wrote a pared down version of YAML because while I like the basic structure I hated the complicated bullshit like the "we also parse JSON" layered on top:
I've always liked YAML, it's always seemed pretty intuitive to me coming from Python, and I like human-readable resource files, but those are some pretty damning counterexamples.
JSON.NET has an insane default "smart deserialization" mode which checks if string values are valid ISO dates, and if so, deserializes them to DateTime. The result is that your typical unsuspecting app works fine for a long time, until the user just happens to throw data at it that has a date-like string in it somewhere - and so the app code gets a DateTime instance where it expected a string.
And depending on how exactly it was accessed, this can go two ways. The best case is that the app just gets the value via the untyped API, casts it to string, and blows up with an invalid cast - best because you actually know what went wrong.
The worst case is when the app specifically tells JSON.NET that it wants a string value (via generic type parameters), at which point it will helpfully implicitly convert the actual date value back to a string... except it can reformat it, and even helpfully adjust it from one timezone to another. Semantically it's the same date, of course, but it's not at all the same string, and sometimes that matters a lot. So this is the worst case because it's just silent data corruption.
For some mysterious reason, the author believes that this is acceptable default behavior -- i.e. "it's a feature, not a bug." It's especially ironic to look at all the mentions in the GitHub ticket, as various projects that rely on the library run into this issue (one of them is mine):
I find YAML to be almost unusable. IMO it's just not intuitive. If I get to choose a format for my config files I would only use TOML, it's just better (again IMO).
All these comments amuse me, because I feel the opposite. YAML has always made immediate intuitive sense to me. Meanwhile TOML feels like a terrible hack.
Also, I'm guessing I'm in a tiny minority who loves YAML but hates Python's semantic indentation...
I like YAML but it has some minor quirks and it feels overused in domains in which it simply doesn't make sense to use YAML. I can think of ansible or complex dynamic configurations that depend on external values as mentioned in the article. If simple merging of a base file + dev, staging, prod files isn't enough for the task at hand then YAML is a bad fit.
I'm the same way. I think it's a difference in my expectations of a programming language versus a hierarchical data storage format. I'm fine with (and even prefer) enforced whitespace in data formats. That makes it easier to view and edit.
In programming languages, it makes me twitch. I don't have any problem with "accepted" formatting styles (i.e. linux kernel c style), but for the language itself to enforce that for some reason feels like it's adding perpetual cognitive overhead (like whenever I use python). I don't know why; it shouldn't be any different than using a particular formatting style voluntarily in a more flexible language, but somehow, it feels different.
Agreed. From a readability perspective, I started out with INI, which I ditched partially due to it having no standard format; skipped JSON because it can't have comments; skipped YAML for not looking intuitive enough; considered JSON5 but skipped it for not being popular enough; and landed on TOML.
Other than that, no major complaints. My editor understands YAML and shows the indentation level in the background (highlight-indentation-mode) and auto-formats files so they all have consistent indentation (prettier-mode). As a result, it is not much of a nightmare to edit, despite the fact that semantic whitespace COULD cause you a lot of problems.
Yeah, that's a little crazy. It's the classic case of in-band signalling. It never works. I wish quotes around strings were mandatory, then having 83 ways to say "true" would be OK. But when strings randomly get upgraded to other primitive types... it's a little weird.
I like looking at YAML when it doesn't use any of the insane YAML features. Even so I'm not convinced it should be used (at least not as widely as it is) for one big reason: it can be truncated almost anywhere and still be valid. This causes way more issues than you might think. JSON has no such issue - the only case I can think of where you can truncate JSON into valid JSON is if your JSON is just a number.
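To make the hazard concrete, a small sketch using the js-yaml parser (assumed available); the document is invented:

```
import * as yaml from "js-yaml";

const full = "servers:\n  - host: a\n  - host: b\n";
const truncated = full.slice(0, 21); // cut off right after the first entry

// Both parse without error; the truncated document silently loses a server.
console.log(yaml.load(full));      // { servers: [ { host: 'a' }, { host: 'b' } ] }
console.log(yaml.load(truncated)); // { servers: [ { host: 'a' } ] }
```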
Tools keep using it even where it's the wrong tool for the job, so someone must like it, surely?
I'm sure it will be just like XML where it's the trendy thing for a while in the early days, then everyone stops and hates it for a while. Except XML at least has a handful of applications where it's the right tool for the job (it has a nice streaming mode), YAML doesn't even have that.
heh, I didn't even consider this interpretation of the above; sorry about that. I meant "tools like Docker keep using it", not people. But I still don't know what you're talking about; YAML has an 83-page spec that includes pointers, and uses tons of random confusing symbols. I say "maybe it's just me" in some of these posts, but I know it's not: I've watched many of my coworkers get it wrong the first time, for years, and then have to be corrected. A quick common example I see in CI config all the time: if I write version: 1.10, that's a number; then I decide to move to latest, so I write version: 1.10.x -- that's a string. Oops: we were never using version "1.10", we were using "1.1". Everything about it is implicit and bad. Now, it's easy to say "always use quoted strings", and I agree, but then why the hell does it have bare strings in the first place? That seems like an easy enough oversight or typo to make, and it will be made.
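That exact trap, sketched with js-yaml (assumed available); the expected results are in the comments:

```
import * as yaml from "js-yaml";

// Implicit typing decides what you meant, silently:
console.log(yaml.load("version: 1.10"));    // { version: 1.1 }      -- a float
console.log(yaml.load("version: 1.10.x"));  // { version: '1.10.x' } -- a string
console.log(yaml.load('version: "1.10"'));  // { version: '1.10' }   -- quoted, safe
```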
I generally dislike languages like YAML or Python where whitespace matters and can break your code. However, YAML is way more easily human readable than JSON, so I started to appreciate it for readability purposes.
I guess YMMV, but after you've used both YAML and JSON for a while, you might appreciate YAML a little bit more.
Yep, so true. Any decent config file that will be seen/edited by a human needs comment support. And any file that will not be seen by a human could be json (or whatever).
Yeah this is a huge selling point of YAML. JSON should have comments added to the spec. The other benefit to YAML is human readability, which is usually better in YAML compared to JSON. A specific glaring example of this is when there are long string-literal snippets inside the document, in YAML this is massively more readable than in JSON.
JSON shouldn't have comments added to the spec, because people shouldn't be trying to read or write JSON. It's an application interchange language, meant to be written and read by machines. YAML is a markup language, meant to be written and read by people. Ever notice how most YAML libraries don't even have a "dump" function?
>It's an application interchange language, meant to be written and read by machines.
Not entirely true: JSON is based on JavaScript objects; it was meant to be written and read by humans, just like JavaScript, INI, or any other basic serialized data format or text-based programming language. If JSON were truly never meant to be viewed or edited by human beings, it would have been published as bytecode.
You can just use JSON with comments though. If you have sufficient control over the technology in question to be able to completely change it to a YAML parser, surely you can change it to be a JSON+Comment parser too. See: VS Code's config files.
My point is that if you're in sufficient control of the stack to be able to convert the whole thing over to YAML, you could just as easily convert the whole thing over to JSON+Comments. And of course bad things would happen if you treat JSON+Comments as JSON, but similar bad things would happen if you treat YAML as JSON, so I don't see your point. It's not like people are trying to send their tsconfig's on the wire as "application/json" and expecting arbitrary parsers to support it.
When you think about it, JSON lacks so much (check JSON5 for what it's lacking) that it's hard to believe even comments are not allowed when pretty much everything else allows them -- which is a showstopper by itself.
> YAML is way more easily human readable than JSON, so I started to appreciate it for readability purposes.
> I guess YMMV, but after you've used both YAML and JSON for a while, you might appreciate YAML a little bit more.
I've used JSON a lot, and XML and s-expressions and MessagePack and ini and YAML and a whole bunch of other formats.
I usually have to fire up Google to read YAML. YAML is the only one where I routinely have to Google for a syntax cheatsheet and wade through tables of redundancy and edge-cases.
YAML made sense before JSON became a thing. Why people persist with it in new projects is baffling to me.
A raw YAML file is readable with your eyes. A JSON file needs to be prettified before you go through it. JSON is good for APIs, but it is not made for readability.
We must disagree with what "readable" means then. I find JSON readable (as long as it's nicely layed out, e.g. by piping through 'jq "."'), in the sense that I can skim over the structure looking for [/]/{/}/". If I want to read some of the content, like a string, I just need to read '\"' as '"' and '\\' as '\', which is a small constant cost per (usually rare) occurrence.
With YAML it's difficult to even know the structure of what I'm looking at, due to anchors and extensions. It's also hard to discern structure from skimming, since strings can appear unquoted, and may contain unescaped lexical tokens (depending on which particular symbols it started with); hence we must carefully consider each and every character, rather than just skimming for the next token.
If I know I'm looking at a perfect YAML file, then I should be able to guess the gist of what it says, since I can make assumptions about what the syntax means. If I want to be sure, I'd be Googling for cheatsheets. Yet as a programmer, I mostly look at files when they're buggy, meaning I can't just assume that, say, an unescaped quotation mark won't terminate the string; or that a certain piece of text is allowed to run across multiple lines; or that the indentation corresponds to the nesting; etc.
I use Notepad++ for YAML. Besides coloring what it thinks I'm thinking, it displays vertical lines corresponding to the indentation levels.
(I prefer INI/TOML whenever I can help it; hierarchies in TOML are so counterintuitive that they incentivize a simple flat structure. But then, some things are irremediably hierarchical.)
Writing YAML is easily the part I hate most about writing/deploying software. It's unstructured, feedback cycles tend to be slow (e.g. when deploying k8s configs into prod), and you can't possibly write something useful without documentation pulled up. It's definitely easier to implement than purpose built DSLs, but it's not a good experience.
Python has better IDE support and, maybe more importantly, Python does not completely disallow any indentation style, so it handles unsupported IDEs better. But it's essentially an IDE-support problem.
My life improved a lot once I got a YAML mode for Emacs. Now things would be just perfect if Haskell's cabal migrated to Dhall...
> In this case, lots of discussions show everyone is busy, has no time, and also ... increasingly they have interest in low-code/no-code type solutions. This is not open source as whole, just the IT ops vertical
Looks like he doesn’t believe the code approach is viable as much as other people are claiming in this thread.
I... No. The OpsMop Twitter has a tweet from January 31, so it seems like if the project had died it would have to be really recent. That would be sad.
YAML is useless because it replaces JSON (a tree structure that is just verbose enough not to be confusing) with something worse (a tree structure that is just enough less verbose than JSON to be slightly confusing).
Came here to say something similar. In particular, Dhall does allow scripting (functions etc.) but is non-Turing-complete as a feature. This seems like a particular sweet spot to me, as it allows for more dynamism than data formats like JSON/YAML while constraining the scope sensibly.
It also has very nice bindings for Haskell and Nix.
It allows each line to be completely independent of its neighbors; you can comment out and/or add lines without needing to touch neighboring lines. Also, it makes it visually easy to spot missing commas. Give it a try sometime; it's actually quite nice.
Having prefixed commas is a rather common style in the Haskell community, because it ends up nicely matching open/closing brackets/braces and lining things up.
Since the author of Dhall comes from the Haskell community, he's kept this style.
Not sure what you mean by "mismatched brace styles". The convention of putting separators like commas at the start of the following line rather than the end of the preceding line is common in Haskell, which Dhall is built with.
The advantages are:
- All of the separators are in the same column, along with the opening and closing characters. This makes it trivial to check if we've missed a separator.
- Appending new lines to the end will not affect previous lines (i.e. we don't need to go and add a comma). This avoids making mistakes and polluting diffs.
Unfortunately the error-prone diff pollution we avoid at the last line instead occurs at the first line. It's still less error-prone than trailing commas, since we can look in the separator column and either spot that it's empty, or that it contains two opening braces (depending on whether we inserted or copy/pasted).
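For what it's worth, the same style transposes directly into TypeScript, where it's equally legal (the object is invented for illustration):

```
const settings =
    { host: "localhost"
    , port: 5432
    , name: "mydb"
    // , poolSize: 10  <- any non-first line comments out cleanly
    };
```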
It's a haskell thing. The main advantage is that each line is independent. You can comment out a line or add a line at the end without modifying anything else.
Each line is not independent: you cannot comment out the first line. A better approach is to allow trailing commas. (I suppose you could allow leading commas, does Haskell support this?)
Allowing trailing commas, like Python does, would be really great. Unfortunately trailing commas already mean something: (a,b,) is a function that still takes 1 argument to make a triple. It's called "TupleSections".
For the sake of this comment, let's define "templating" to be attempts to solve the problem "I need $FORMAT due to an existing constraint, but $FORMAT does not entirely meet my needs on its own" (in this article, $FORMAT is YAML). Additionally let's say that in order to be a "template" something must be a text file (e.g. exporting a database table as $FORMAT does not count as "templating" for the purposes of this comment).
I think there are three very different kinds of tools that people use for this:
1. Interpolation/preprocessor languages: This is what the author is talking about. There are delimiters/tags/sigils to distinguish "the templated parts" from "the rest" and the primary operation done by the template engine is substitution. "The rest" is literal content that's already in $FORMAT and it remains mostly/entirely unchanged during template rendering. Languages of this type are basically glorified sed. This can be nice because they're agnostic as to their embedding (any string will do) so they're very portable/flexible (you don't have to create "handlebars for YAML", "handlebars for HTML", "handlebars for CSV", etc; one implementation does it all). Languages of this kind can work in the small but don't scale well for all the reasons mentioned in the article/comments. The language doesn't know anything about the semantics of $FORMAT and that can cause all kinds of pain. Examples include golang templates, PHP, ERB, handlebars, the C preprocessor, Jinja, etc.
2. Compilers/code generators: These are "complete" languages that compile to $FORMAT. The difference between these and interpolation/preprocessor languages is that the entire input is the language, not just specific chunks/tags. This kind of language can be nice because you have complete control and can therefore guarantee valid output and do tricks like supporting multiple different output formats for the same input, but the downside is that you're working with an entirely new language, so there's a learning curve, you need specialized syntax highlighters and other tools to work with templates, etc. Examples include HAML, Jsonnet, Dhall, etc.
3. Embedded DSLs: Templates of this kind are valid $FORMAT from the beginning, but have embedded ways to specify transformations to be applied to the parsed AST. These languages are homoiconic with respect to $FORMAT. First $FORMAT is parsed, then the template engine iterates through the AST to perform evaluations, then the result can either be used as-is in memory or serialized back to (a possibly different) $FORMAT. This is sort of like an interpolation/preprocessor language with the evaluation order swapped: preprocessing is "run the template engine, then parse $FORMAT" while this is "parse $FORMAT, then run the template engine". A downside of this approach is that it is less general, e.g. it only really makes sense when $FORMAT has a well-defined structure (you probably can't template plain english sentences with this approach), but these days most "data languages" have converged towards being semantically equivalent to JSON (lists, dictionaries, and primitives) and this approach works well for any of them. An upside is that like compilers/code generators you can guarantee that the output will be valid $FORMAT no matter what the template looks like. Examples include JSON-e, Lisp macros, CloudFormation templates, etc.
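To make the order-swap concrete, here is a minimal sketch in Python, using string.Template and PyYAML as stand-ins for real engines of kinds 1 and 3:

    import string
    import yaml

    # Kind 1 (interpolation): render the text first, parse $FORMAT second.
    rendered = string.Template("replicas: $n").substitute(n=3)
    print(yaml.safe_load(rendered))   # {'replicas': 3}

    # Kind 3 (embedded DSL): parse $FORMAT first, then transform the tree.
    tree = yaml.safe_load("replicas: 1")
    tree["replicas"] *= 3             # operate on structure, never on raw text
    print(tree)                       # {'replicas': 3}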
It's unfortunate that all of these get called "templating languages" because they're very different beasts from one another, and usually when I see conversations about this stuff these distinctions get blurred and you end up with apples-vs-oranges comparisons. If I had my druthers we'd reserve the word "templating" for the first one and use different terminology for the others, but that ship has sailed.
If you have been around long enough, you still remember the world that was excited about XML and templating it with XSLT. In hindsight, it was a horrible world.
Even though YAML is not optimal, it is a human-friendly compromise between too-verbose XML and machine-only JSON. It lacks native templating, leading to funny constructs, e.g. in Ansible files. However, humankind has made progress and will keep making progress, so it is just a matter of time until someone comes up with a sane "native templated YAML" and all projects adopt it.
> If you have been around long enough, you still remember the world that was excited about XML and templating it with XSLT. In hindsight, it was a horrible world.
I actually really like the idea behind XSLT: machine-friendly, human-tolerable, structured data + declarative rules for turning that data into a display, or a report, or whatever else.
The execution was horrible though: incredibly verbose, lots of overcomplication due to XML weirdness/asymmetries (e.g. attributes vs elements vs text, namespaces, ...); mixtures of different languages hidden inside each other (e.g. XPath hidden in attributes); etc.
I would really like to see what this could look like if done in a more minimalist, lispy fashion (normal code-is-data stuff in Lisp is similar, but I think term-rewriting is a more appropriate evaluation mechanism for such rules)
The syntactic mistake of XSLT was writing it in XML, XPath was a redeeming feature. Imagine if XPath was also written in XML...
jq occupies the same role as XSLT, but for JSON. It can be used for templating but it's not quite as declarative as XSLT (you must pipe things through).
> The syntactic mistake of XSLT was writing it in XML, XPath was a redeeming feature. Imagine if XPath was also written in XML...
Yes, I didn't mean to imply that XPath itself is bad (although it also has to handle XML quirks like element/attribute/text, etc.).
Rather, my point was that the reason to write XSLT as XML in the first place is that it's machine-readable, we can mix and match elements from different vocabularies, etc., yet most of the heavy lifting ends up as opaque string attributes :(
PS: I've done a few projects which make heavy use of jq; it's really nice, but as you say it's more of a pipeline.
XQuery was halfway between XSLT and XPath in expressiveness - functions, loops, queries with joins etc, but no pattern matching. If it only had the latter, it'd be perfect.
It is more concise. Similarly, I was a fan of using attributes instead of text elements (with their unnecessary closing tags), but eventually was won over by neatness, e.g. after translating an example from
https://www.gnu.org/software/guile/manual/html_node/SXML.htm...
XSLT was one in a litany of domain-specific languages (ant, Apache rewrite rules, LaTeX macros, etc.) that evolved towards Turing completeness because that's what the problem space demanded.
In most if not all of these cases, an existing and well-designed Turing-complete programming language would likely have served them better.
I don't see a reason that a DSL can't be Turing complete and still a better option than an existing language. If you look at old-school Makefiles, they are little more than shell script with top-level rules that you can invoke from the command line. You could theoretically just use shell script, but the make rules still simplify the task quite a bit.
Less power means less to go wrong. Automatic checks can also be deeper in a simpler system.
Your point is valid though, the power seems to end up being needed, sometimes, in some parts, in some cases. Escaping to a full language when needed seems to retain the benefits of both worlds.
There was a post on HN a few weeks back to the effect that it's rather easy for Turing completeness to emerge accidentally. I wish I remembered more specifics so I could find it again.
I find XSLT intolerable in practice. Thankfully I've only had to touch it a handful of times. I agree the idea behind it is neat but boy is it a headache.
I kinda miss it, actually. XML had many warts, but at least everybody spoke it, and it was the same everywhere. Occasionally you still had some overlapping but different things, like XSD and RELAX NG schemas (though even there, there was a big difference - one is a language for describing data types, and the other is a language for describing grammars). But it's better than several dialects of JSON, YAML, TOML etc.
I also rather liked thorough extensibility. Namespaces were the right idea, despite clunky syntax. Today you can see Clojure doing something similar in Spec.
And while we're on the subject of XML, XSLT and Clojure: I feel like this is the best solution for readable serialization of tree-like data, and an associated ecosystem of tools (to validate, transform, etc). Note some nice features for humans, like the ability to comment out a specific node, in addition to the usual line-oriented comments.
I saw this title and immediately knew the article would be about Helm. I don't think anyone wants to use Helm. People use it for a set-and-forget thing that they don't care about (who cares that it's called impressive-leopard-kubernetes-dashboard, after all.)
It is actually a little bit too magical for my taste, but I continue to use it because it hasn't done anything stupid. I have one file that maps logical names to images in a container repository. If I create a service called "foo" pointing to selector.app.label="foo" in the base, then in production it's called foo-prd and the label magically updates to foo-prd for the selector. It actually understands what it's generating, and while they might have taken it a little bit too far, it's far better than just dumb text replacement.
I’m in agreement; it seems lots of projects use helm charts for hello world / standard deployment demos, but considerably fewer run Helm charts exclusively in production clusters.
Helm 2.0 introduced the package as a first-class concept for Kubernetes and created the standard for distributing applications. Thanks to Helm, thousands of people could discover and collaborate on cloud-native deployments of the open source software published and managed by organizations and contributors all over the world: https://github.com/helm/charts/tree/master/stable
Helm 3.0 keeps innovating: it adopts the most forward-thinking approach to package management and Kubernetes config management by using a higher-level domain-specific language based on Lua to create an expressive package management system:
I don't really trust Helm to do anything that's actually useful in the long term. It will get something running very quickly, but whether it's maintainable, I'm not yet sure. For example, very early on, I installed the helm chart for prometheus. Now I want it to live in the kube-system namespace because I am tired of seeing its resources in the default namespace. For some reason, I highly doubt that changing values.yaml to change the namespace is going to do anything other than give me a fresh instance of prometheus running in another namespace. It's not going to use the already allocated storage volume to satisfy the persistent volume claim in the new namespace. It's not going to update the other stuff in my cluster to refer to prometheus-pushgateway.kube-system.svc.cluster.local. It's not going to update my Grafana dashboards to refer to the new namespace, even though I installed Grafana with Helm! So what did I really gain? Helm isn't giving me the ability to manage the long-term lifecycle of third-party software. It just explodes some API objects all over my cluster and lets me delete most of them automatically. That's all it does.
I get why Helm is popular. You can get some piece of software running in Kubernetes with minimal effort. I would have never successfully made some random complex piece of software work correctly in Kubernetes on day 1, especially using something that assumes you deeply understand the core API objects like kustomize does. What that boils down to is that Helm doesn't go far enough, and in its current state, just encourages people to make mistakes early.
As others in this thread have said: I ask this question all the time, except s/templating/using/.
YAML is insanely over complicated; it's as bad or worse than XML for config files, and it doesn't even have the nice streaming mode.
Not to mention that it's a bit of a security nightmare (seriously, who put pointers into the YAML spec?).
And, on a more subjective note, YAML is just confusing: between all the significant whitespace and the random single character symbols that no one ever remembers what they do, I never get a YAML document right on the first try.
Templating it really does add a whole new level of headache too.
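For anyone wondering what "pointers" means here: YAML anchors (&) and aliases (*). A quick sketch, assuming PyYAML:

    import yaml

    # The alias *b refers back to the anchored node &b; both keys end up
    # pointing at the very same parsed object.
    doc = yaml.safe_load("base: &b {cpu: 1}\nprod: *b")
    print(doc)                         # {'base': {'cpu': 1}, 'prod': {'cpu': 1}}
    print(doc["base"] is doc["prod"])  # True
    # Nesting aliases of aliases is also how "billion laughs"-style
    # expansion attacks are built against naive loaders.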
XML works very well for config files. It's schema-optional (but the option is there), well-specified, human-readable, has a plethora of supporting technologies (making things like templating easy), and is well supported in every language.
XML is not tedious at all with the right tooling. For example, a tool like Visual Studio IntelliSense proposes only the elements and attributes valid in the current context, automatically closes tags, formats the file, and completes opening tags too, making editing an XML file a breeze.
Notepad is a terrible editor for anything other than simple property files. For example, editing JSON that has any sort of complexity would be just as painful.
Not really. For quick authoring, configs should have pre-authored snippets for common things, that are commented out, and have adjacent descriptive comments ("Uncomment the following to ...") - this is regardless of their syntax.
And for complicated stuff, you're going to spend a lot more time reading the manual than you will actually typing those closing tags. In fact, in most cases you'd be copy/pasting bits from the manual as well.
Codelite's XML-based project files can be easily read and modified by hand. Diffing them yields useful information about files added and moved, config values changed, etc.
Eclipse's project files, also written in XML, are an Eldritch Horror.
I think the failing of XML is also its strength. It doesn't do typing and schemas, doesn't even try. Which means that it can be sane. Or not.
You changed the model when you adapted the xml to lisp. You decided that some tags are unnecessary, dropped some attributes and assumed others are merely different types of child nodes - and now your sample doesn't actually have the same semantic meaning as the XML example. You also removed some comments. Was all this done to emphasize how much cleaner a lisp alternative would be? If we're playing this game, you can actually simplify the XML configuration file as well. If you attempted to capture everything that the XML does, it would make your lisp sample much more ugly.
Anyway, to each his own, but I think XML holds up very well and I do find it more readable and easier to work with than your lisp example.
I also never said XML is the best configuration format. For simple configurations a simple property file is by far the best option. For anything complicated (as in your example) XML does a great job. To contrast, JSON would fall flat on its face with this. Not to mention the fact that XML parsing is typically part of the standard library of most programming languages and most people are familiar with it.
> You changed the model when you adapted the xml to lisp.
That was a conscious decision: the verbosity of XML prevents clear understanding of a data model, while the cleanness of S-expressions gives a clarity of vision that allows prudent judgement when laying out a data structure.
> You also removed some comments.
Yes, because they were akin to:
// Add 1 & 2, assign to X
x = 1 + 2
If you really want an S-expression version of the XML in that example, here is SXML[0]:
(*top*
(fof
(*comment* " Common settings ")
(common
(*comment* " Container configuration ")
(container
(option (@ (name "componentNamespace")) "MyCompany\\MyApplication"))
(*comment* " Dispatcher configuration ")
(dispatcher
(option (@ (name "defaultView")) "items"))
(*comment* " Transparent authentication configuration ")
(authentication
(option (@ (name "totpKey")) "ABCD123456")
(option (@ (name "authenticationMethods"))
"HTTPBasicAuth_TOTP,QueryString_TOTP"))
(*comment* " Model configuration. One tag for each Model. ")
(model (@ (name "orders"))
(*comment* " Model configuration ")
(config
(option (@ (name "tbl")) "#__fakeapp_orders"))
(*comment* " Field aliasing. One tag per aliased field ")
(field (@ (name "enabled")) "published")
(*comment* " Relation setup. One tag per relation ")
(relation (@ (type "hasMany") (name "items")))
(relation
(@ (type "belongsToMany") (name "transactions")
(localKey "foobar_order_id") (foreignKey "foobar_transaction_id")
(pivotLocalKey "foobar_order_id")
(pivotForeignKey "foobar_transaction_id")
(pivotTable "#__foobar_orders_transactions")))
(relation
(@ (type "belongsTo") (name "client")
(foreignModelClass "Users@com_fakeapp")))
(*comment*
" Behaviour setup. Use merge=\"1\" to merge with already defined behaviours. ")
(behaviors (@ (merge "1")) "foo,bar,baz"))
(*comment* " Controller, View and Toolbar setup. One tag per view. ")
(view (@ (name "item"))
(*comment* " Controller task aliasing ")
(taskmap
(task (@ (name "list")) "browse"))
(*comment* " Controller ACL mapping ")
(acl
(task (@ (name "dosomething")))
(task (@ (name "somethingelse")) "core.manage"))
(*comment* " Controller and View options ")
(config
(option (@ (name "autoRouting")) "3"))
(*comment* " Toolbar configuration ")
(toolbar (@ (title "COM_FOOBAR_TOOLBAR_ITEM") (task "edit"))
(button (@ (type "save")))
(button (@ (type "saveclose")))
(button (@ (type "savenew")))
(button (@ (type "cancel"))))))
(*comment* " Component backend options ")
(backend
(*comment* " The same options as Common Settings apply here, too "))
(*comment* " Component frontend options ")
(frontend
(*comment* " The same options as Common Settings apply here, too "))))
Which I think is still indubitably and inarguably clearer & cleaner than the XML version.
Technically, the XML spec requires whitespace preservation, so really it’s this:
But I think that rather proves my point: XML obscures that which should be obvious.
(and apologies for these terribly vertical posts — I think that they go a long way towards demonstrating the need for a compact information representation).
I end up hitting bare-string implicit casting problems constantly. I also end up catching them in code review when coworkers do it constantly, and yet I still end up doing it too. This might be the best example of why YAML is overengineered garbage (that, and the fact that the spec is 83 pages long and has pointers… WTF?).
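If you haven't hit these casts yourself, a quick session with PyYAML (which implements YAML 1.1) shows the usual suspects:

    import yaml

    print(yaml.safe_load("country: NO"))    # {'country': False} -- not the string "NO"
    print(yaml.safe_load("version: 3.10"))  # {'version': 3.1}   -- parsed as a float
    print(yaml.safe_load("when: 12:30"))    # {'when': 750}      -- a sexagesimal integer!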
Unfortunately, this isn't practical. For any of my own tools I will never use YAML, but I don't just use my own tools, and reinventing the wheel just to not use YAML has its own problems which are (in some cases) worse.
That's a nice thought, but it comes with its own problems. For example, XMPP uses a sane subset of XML, which is nice, except that people throw full XML parsers at it (because why wouldn't you? Your XML library parses XML and limiting that is more work for the developer to do) and then end up with vulnerabilities they don't know about, like entity-expansion DoS attacks or system directive stuff (and YAML has lots of tricky behavior that can be abused too, like pointers).
Using a subset creates more work for the developer, so many just won't bother (if they even know that it's using a subset and they have to do more work at all), which leads to issues.
We can agree, but that doesn't mean that others won't do it anyways; unless you operate in a silo, it's not likely that you're the only one writing software to use your system.
Standards get written and implemented by lots of people, and even tooling like Docker gets alternative implementations.
That might solve one problem but fighting your tools and occasionally hating the complex giant messes we've engineered is a fact of life for any programmer. I absolutely love programming, just not always the process.
We get better and better tools each year but it still seems unavoidable. We're ultimately building incredibly complex systems with each layer using multiple development approaches, style choices, language choices, degrees of quality/time investment by the creator, etc.
TLDR: you can't help but bang your head against the wall in any real-world day-to-day programming
Let's agree to disagree here. No human should ever write XML. No human should ever be forced to read it.
YAML is very readable and writable if you stay away from the corners. Templating allows you to stay clear of the corners (the 1 char operators that concatenate stuff, b64 stuff and so on).
File-based configs are a troublesome abstraction: they package unrelated concerns into a rigid document whose form must take a particular, application-dependent shape, and the assembly and disassembly of that document essentially becomes an API where key-value pairs are mixed with complex glue code. The application has to do this internally, but anyone who's generating their configs is also doing parts of this externally.
Templates try to bandage over that by drilling down the abstraction to key-value pairs themselves. And imperative constructs that sneak into templating languages are an artifact of wanting to gain expressiveness without losing the benefits of declarative form -- but really, the two are at odds.
YAML is a red herring -- we had the same headaches with XML a decade prior. The problem is always that there's relationships among the data (or even multiple instances of the config) that we care about, but that the structure of a single config file at rest cannot model.
Databases -- let's say, an SQL one -- are actually among the better solutions, because they allow the universe of config items to live in structured places without overspecifying the exact form the data must take when serialized into a file. Then, data can be normalized where it makes sense to avoid repetition and introduce propagation. An SQL database gives all the tools needed to accomplish this, using mostly declarative code.
Databases in the KV sense are often used for configuration, and SQLite's rise has made richly structured configs, specified at a higher level than is typical with other serialization formats, more common; but the full approach has not caught on outside big enterprise systems and complex applications. Which is a shame, because it's hardly more complex than the current awkward pairing of a full serializer and a templating engine.
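As a sketch of what that can look like, using Python's built-in sqlite3 and hypothetical tables, the repeated value lives in exactly one row and every environment derives from it:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE service (name TEXT PRIMARY KEY, cpu REAL);
        CREATE TABLE deployment (env TEXT, service TEXT REFERENCES service(name));
        INSERT INTO service VALUES ('web', 0.5);
        INSERT INTO deployment VALUES ('dev', 'web'), ('prod', 'web');
    """)
    # Normalization in action: change 'cpu' once, and every environment's
    # rendered config picks it up.
    rows = db.execute("SELECT env, name, cpu FROM deployment "
                      "JOIN service ON deployment.service = service.name")
    for env, name, cpu in rows:
        print(f"{env}: {name} cpu={cpu}")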
I feel this article is missing the bigger problem - one that for some reason just cannot die.
The problem is that of gluing strings together. YAML is not an unstructured text file, it's a tree notation. Whatever "templating" or "generation" mechanism you want to use, it needs to respect the tree nature of the language it operates on. It needs to respect semantics.
Gluing strings together is literally what causes SQL injection to exist. It caused countless defacements on the web, and countless broken websites. I would think we've learned our lesson, but for some reason I see these template languages still alive and kicking.
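A small example of why gluing strings breaks while respecting the tree doesn't, assuming PyYAML:

    import yaml

    value = "web: v2"                # an awkward but perfectly legal string value
    glued = "service: " + value      # gluing strings together...
    # yaml.safe_load(glued) now fails ("mapping values are not allowed here"),
    # the YAML equivalent of an injection bug.
    print(yaml.safe_dump({"service": value}))  # service: 'web: v2' -- quoted for us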
The article goes on to talk about Jsonnet, which takes the exact approach you describe - it generates JSON by aiming to be a "templated JSON" where the templating involves generating semantic objects, not strings.
Here's an example (adapted from some real-world code) where I specify the k8s cpu limit in one place, and then look up that info in several other places to avoid needing to change multiple values later:
Note how I can patch the container.requests object with an alternate memory limit, and how I can calculate an expression for the NUM_THREADS value in order to automatically set it to ceil() of the requested cpu.
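A rough Python rendering of the same idea, with hypothetical values (the original was Jsonnet): the cpu request is specified once and everything else derives from it.

    import math

    cpu = 1.5                                    # hypothetical request, set once
    requests = {"cpu": cpu, "memory": "512Mi"}
    container = {
        "resources": {"requests": requests,
                      "limits": dict(requests, memory="1Gi")},
        "env": {"NUM_THREADS": str(math.ceil(cpu))},  # derived from the cpu request
    }
    # Patching container["resources"]["requests"] with an alternate memory value:
    big = dict(container,
               resources=dict(container["resources"],
                              requests=dict(requests, memory="2Gi")))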
It’s DRY run amok. People don’t want to see the same bit of anything in two places, forgetting that code will be read, so they remove a “redundancy” but create duplicated effort for everyone, every time they have to decipher the thing later.
I also don’t think we have a workable definition of “configuration”. 70% of the config at my work is hard coded service discovery. If we moved the service discovery anywhere else (say, consul, kubernetes, hell - Docker swarm), we’d need far fewer sets of config than we have deployment environments. When there are only two or three you don’t need templating.
How often do you really have the same service deployed twice in prod and legitimately want it to work differently? I can count the scenarios I know on one hand and none of them have occurred for me in almost ten years, except read replicas and that shouldn’t be more than a few lines of config.
There have been times I've wanted to templatize my configuration, but I don't want to do it with text-based templates, rather with templates within the configuration file's own syntax (be it YAML, TOML, or something else). Not sure what this is called; I've been calling it "structural templating".
This looks really nice. I'll have to give this a try at some point to see how I feel about it vs Azure Pipelines. From my quick look, this looks more general purpose at the cost of more verbosity.
Dhall-lang ( https://dhall-lang.org ) is another, somewhat interesting, attempt to solve this problem: it comes with a non-Turing-complete programming language, so you can bring some abstraction to your configuration files without having to worry about things like infinite loops.
JSON requires constant quoting, can't support multiline strings, has no comments, has no/little typing (e.g., no datetime type). It's not good if a human needs to encode data.
For configs, I think TOML beats YAML hands down; I think YAML's spot is at encoding data structures that humans need to read/write.
I do agree that YAML, the spec, is fairly complicated. But YAML, as used in most projects, by most people, is not, and can be picked up fairly quickly. It is easier to visually read as it removes much of the clutter that would exist in the comparable JSON. It isn't typically necessary to know the entirety of the YAML spec to be useful with YAML, and most of the parts you won't know will get introduced by an obvious-looking sigil, which can be used to figure out what you're dealing with.
When I've actually sat with folks struggling with YAML, it's almost always in configuration tools, and it's also always around the templating bits. Ansible, in particular, has a bizarre templating model: it happens after YAML parsing, which is not the mental model most people use when approaching it. I've also found that most of the people I've spoken to intertwine YAML and Ansible's templating functions, thinking they're one and the same.
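You can reproduce the surprise directly with the pieces Ansible is built from (PyYAML + Jinja2):

    import jinja2
    import yaml

    # yaml.safe_load("msg: {{ greeting }}") blows up: YAML reads the braces as a
    # flow mapping long before Jinja ever runs -- hence Ansible's quoting rule.
    task = yaml.safe_load('msg: "{{ greeting }}"')             # 1. YAML parses first
    print(jinja2.Template(task["msg"]).render(greeting="hi"))  # 2. templating after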
I do not think Ansible makes good use of YAML: I would rather write task files in an actual programming language, since they are — at their core — a program. (The tasks do have some metadata attached to them, but the core task itself is a program. A function in some real language can get metadata attached to it in a number of ways, and that would be a better solution.)
If you compare a medium sized, 3 level deep TOML document with an equivalent YAML document you'll find that the TOML document is up to 50% longer.
None of that additional verbosity adds meaning - or readability.
Part of this is the extra syntactic noise, but most of it is because every key name has to be written out explicitly above its value, whereas in YAML you just put the values below an indent.
TOML is tolerable for small, very lightly nested config files but even medium sized TOML configurations get ugly very fast.
It's worth giving Ansible some credit here. Alternatives like Chef can and do contain arbitrary Ruby logic, which can make Chef "cookbooks" horrible to figure out and debug.
So while Ansible may not be great, other solutions also have drawbacks, and it isn't quite as easy as "just do <x>".
I've worked with both. The simple items are nicer in ansible but when there are a lot of custom roles and functions ansible isn't much better than chef from what I've seen in practice.
I really like TOML as a compromise between "looks pretty" and "parses well".
That being said, I find it funny that the author is crying foul, saying that YAML isn't as well suited to being templated as JSON is, when XML had schemas sorted out ages ago.
People ran away from XML, saying it was this verbose, ugly, bandwidth-hungry (back when we were sending XML over AJAX) behemoth (which it totally is)... but I think when your use case is complicated enough to worry about templating, you should take a hard look at XML and ask whether it might suit you better.
Baffles me. I don't like any language or file format where whitespace matters. Even Haskell bothers me in this regard. To me white space shouldn't add cognitive load. I want to look at the symbols not the formatting of the antisymbols to understand what is going on.
Write JSON and use your editor tools to format it with nice indentation, and you are sweet!
That said, YAML makes an excellent format for reading, but not for writing.
I personally love Python where whitespace matters a lot too. But I regularly find myself fighting with the whitespace in yaml for some reason.
To me yaml (especially with templating) seems like a very contrived way of avoiding having to program, while still effectively programming. I much prefer json, or if more intelligence is required, actual javascript objects.
The only big downside of json where yaml does shine is the support for comments within your files.
I have the same experience. I just think it is possible to design easy-to-write indentation-sensitive formats, but YAML is not one. For example, it always baffles me that
a:
- b
- c
has a list inside an object but the list is not further indented. There's in fact a hierarchy relationship but absolutely no indentation.
I think it's because Python has very strict and very simple indentation rules (a nested block must be indented, and the file requires consistent indentation. Is there anything else?)
YAML gives you options, or varies necessity somewhat arbitrarily on the structures, which is (marginally) good for reading, but a lot of headache in writing
I never quite understood this attitude. Not treating whitespace as meaningful means it's necessary to have a longer, syntactically noisier file.
That's fine if you're primarily concerned with computers exchanging data (JSON) but where readability and writability matters, extra syntactic weight is a headache.
I can't imagine writing JSON by hand but I write plenty of YAML by hand every day without issues.
Because deleting a tab could entirely change the meaning of the file, breaking everything, with no compile-time errors. IDE tooling or linting might go some way toward making me more comfortable with YAML.
Ok: I don't like any language or file format where the regex s/\s+/ /g would change the meaning of the program, except within string delimiters, and except for the parts of string delimiters which contain literal code, e.g. in ES6 `Hello ${world}`.
OK, but I'd restrict the set even more to exclude voluntary indentation. I mean, you'd want that caught by style checkers anyway, even if the language didn't require it.
So we're left with the cases where you'd like compact code / one-liners but are forced to use multiple lines for language syntax reasons.
I'd say that that's a pretty small set of cases and long way from the phrase "I don't like whitespace sensitive languages" which you (and many others!) use.
I conceded it can be annoying in Python and I do wish there was an escape route sometimes.
Experienced programmers use decent editors to indent their code anyway, but at least with Python you don't see code whose indentation wanders left and right at random locations.
The best thing that ever happened to CloudFormation was the new YAML syntax.
Writing YAML is easy with a good editor like VSCode. Install the YAML code outline extension for sugar; works great for OpenAPI specs. YAML flow style offers some good options for keeping the file compact.
I'm all for good editor support, but you shouldn't need it to write a simple document. Complicated editors should be supporting tools at best, because you won't always be in a position to use one. The things we write should be simple enough to be written and understood by hand with minimal mistakes (then we can make it even easier in some cases using nice editors and tooling).
I don't think I've ever used anything other than vanilla vim for editing YAML files... with next to zero issues.
The problem for me is that the multiple ways of doing the same thing make it hard to write clear specifications for anything beyond very simple examples / structures.
Writing YAML is not easy for end-users. The indentation--especially in large files--is really difficult to keep track of, and some really basic stuff like when to quote strings is not consistent at all.
Nope, indentation is not a problem with indentation guides. A misplaced comma in a JSON file however kills it, gets me every time I have to write it by hand.
Also, it is true YAML has too many features, though I find they are typically ignored or disabled.
Syntax highlighting doesn't count commas. The problem is JSON doesn't allow trailing commas, so adding or reordering frequently results in a difficult to diagnose error.
The languages we're comparing here are JSON and YAML. The latter does not require quoted strings, except in ambiguous cases. (And even then, actually, it technically isn't required, though it is usually the easiest thing to do.) The absence of these quotes makes the syntax get out of the way of the reader, and makes YAML comparatively easier to read.
> comma requirements inane is an opinion, not an objective fact.
I boiled it down to a simple statement, but disallowing a comma at the end of repeated grammars means that adding something to that line causes unnecessary noise in the diff. For example, say I add a single item to the end of a JSON list. The diff I would love to see is:
[
a,
b,
+ c,
]
which makes it rather straight-forward to the code reviewer / reader of the diff that we're simply adding a single item c. But JSON's grammar forces this diff:
[
a,
- b
+ b,
+ c,
]
which makes it harder to see what the semantic change is, because simple syntactic changes are now clouding the picture. Also, I find that people's mental model of the task ("add item to end of list") causes them to forget that they need to add a comma to the item above it, resulting in syntax errors down the road.
(Better diff tooling can help here, but often I find we have to work with the most primitive of tooling; if the grammar can lend itself towards such simple tooling, s.t. that tooling is more effective despite its simplicity, why not? And here, we can: grammatically, whether the list ends or does not end in a comma is rather meaningless, and JSON (and for a long time, JS) were pretty alone in this opinion. Most other languages allow that trailing comma. The only other one I can think of is SQL.)
> You're not saying anything about the readability of one versus another, you're just listing gripes you have about JSON.
The gripes I have about JSON don't apply to YAML. Hence, that's why I prefer using YAML, which was the original question.
> The latter does not require quoted strings, except in ambiguous cases.
I believe it is the right choice to always quote strings; there should be various quoting formats, as no single one is perfect (normal strings with reasonable escapes, raw strings, multiline-with-indentation, multiline raw strings...), and you should be able to use unquoted keys (TOML strikes a good balance on this problem), but unquoted strings as a default are unnecessary and problematic.
No comments is... fair. That does put a limit on the legibility of JSON files, unless you want to count dumb hacks like comment strings.
Having to quote strings in JSON, however is still simpler than the multitude of ways strings can be declared in YAML. You know a string is a string in JSON because of the quotes... knowing whether something is a string or not in context is more difficult in YAML because of the more complex syntax.
You can learn the entire syntax of JSON in minutes. Objects, arrays, string keys, and a few primitive types... that's it. How long would it take to learn all of YAML? How explicit is its syntax versus JSON? Of course JSON is far simpler, and being simpler, it's easier to read.
I dismissed "inane comma requirements" as an opinion, not everything the commenter said. The only reason it's "inane" is because the commenter doesn't like it personally.
Not having multiline strings, to me, doesn't affect readability much at all, although it is unfortunate. Turn on text wrapping in your editor, it's the same thing.
Typing support doesn't affect readability either. I'd like to have a date type in JSON too but
Not when you take into account all the clutter in JSON: braces, brackets, quotes. Those are extra things to type and extra things to have to filter out when you read.
I think it is partly momentum - at this point most CI systems use YAML in spite of its insanity - even Azure Pipelines!
And partly it's because the syntax for multiline string literals is very minimal, which is kind of a nice feature for CI since you tend to have a lot of them.
It's still insane though. TOML is much more reasonable.
I've taken the strategy of emitting JSON, but accepting YAML input (maybe a restricted subset in cases where it's untrusted data). YAML can function as a super-set of JSON, so you can have comments in hand-written/modified data this way while emitting a simpler to parse data format.
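In Python the whole strategy is a couple of lines, assuming PyYAML:

    import json
    import yaml

    text = "# humans get comments and lighter syntax\nreplicas: 3\n"
    data = yaml.safe_load(text)   # accept YAML (a superset of JSON) on the way in
    print(json.dumps(data))       # emit plain JSON on the way out: {"replicas": 3}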
Because json is poorly suited for config files, and it is very convenient to have config files that closely mirror apis, like kubernetes does. I've seen too many teams who try to use json for config end up developing yet another custom superset of json to allow comments or multiline strings or such. If you are going to use a superset of json might as well use an existing one like yaml or json5.
That said, I agree with all the criticisms of templating YAML. We have to do the same (with Helm and other tools), and I have pushed hard to adopt conventions that we only use flow style and not block style, to avoid all the whitespace problems when splicing together chunks of YAML. And on the plus side we get trailing commas and other such niceties which don't exist in JSON and make it harder to template.
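A quick check of those two niceties, assuming PyYAML:

    import yaml

    # Flow style sidesteps the indentation headaches when splicing, and YAML
    # (unlike JSON) tolerates trailing commas in flow collections.
    print(yaml.safe_load("{name: web, ports: [8080, 8443,],}"))
    # {'name': 'web', 'ports': [8080, 8443]}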
LOL we are doing the same with k8s. We deploy to any environment, where each environment is a different k8s namespace. We even have a namespace for each developer. The variables are things like the image name (the tag is effectively a git commit id, or sometimes different for not-committed yet stuff). Beyond that just about nothing really needs to be a variable, but we do have different RAM amounts.
We have a couple ways that we template this out, but mostly we literally just do this in bash:
(Where CI_COMMIT_SHA comes from gitlab)
ENV comes from our gitlab CI file.
That all being said, the extent of our k8s integration is lots of stuff like that. We could write a JS file that creates a JSON k8s template, but honestly, that would be more work and more learning than we had to for what we're doing. Why would we do more just because we want to avoid templating in a YAML file?
I think they are missing the real selling point of JSON. It's basically interoperable with JavaScript objects.
That means you write it, send it, store it, operate on it, etc. with little or no modification.
The author says "converting between the two is trivial" which may be true, but the developer overhead is less trivial. And it will always be JSON in the client - JS doesn't support YAML objects.
Is that really the selling point? I interact with numerous services that speak JSON...and most of them aren't written in JS. YAML and JSON both have to be parsed to be used, even by JS. Otherwise it's just a string.
To me the real selling point in JSON is the dead-simplicity for humans to read and CHANGE. YAML is human-READABLE, but frankly I often screw up changing it because the formatting is a little too magical and I'm a little too unfamiliar. JSON is downright picky and obnoxious...but that makes it really easy to make a change. Screw up the quotes? It'll complain about the quote. Dangling comma? It will complain. Did I cut-and-paste from an HTML display that screwed up my whitespace by compressing everything down to a single space? Nothing cares.
The difference, at least for Vue single file template components is that it's 3 separate areas in one file:
- <template /> (Templated HTML)
- <script /> (JS OR Typescript)
- <style /> (CSS OR SASS/etc)
Whereas in old PHP files you could mix it in anywhere and the files were a big mess. Including inline SQL into your view templates which is hardly a good separation of concerns. While a Vue component can be separated into separate files, at least as one it all represents one isolated piece of the interface.
Off-topic: The whole discussion is about deploy-time configuration management, but our problem is more about run-time configuration management.
We have done the classical memcached+database custom solution, but I was wondering if there is any accepted library/tool to change application run-time behavior. We have tried the Consul KV store [1], but it does not quite fit our environment.
My ideal solution would be a webapp with some text editor (think codemirror). Changes in this text file would push the configuration data to a running application.
It makes me throw up a little in my mouth every time I see hundreds of lines of YAML to configure something like Traefik with Kubernetes. The worst is when people say they prefer that because "I don't have to write a config file for my backend". That's true but instead now you have extremely verbose configuration mixed in with other verbose configuration.
But in YAML's defense I think it's more of a problem with the tools that use it more so than YAML itself. Ansible is a great example of how amazing YAML can be to manage complex configuration in a concise way.
I agree that simple YAML can be nice as a quick and clean tool, but Ansible is an example of everything wrong with how YAML is used. Layering program-flow constructs like loops, variables, templates, references, etc. on top is exactly the sort of abuse that makes YAML feel awkward.
YAML is a data stream, not a program. Please do not shove programs into data.
Your data does not need to be "expressive", it just needs to provide input to a program. If your data files need to be complex, you need a program to generate them for you.
I've danced the dance of ini -> json -> yaml -> weird hybrid -> embedded logic, and it ends with "program that asks for what the thing you want looks like and generates data files". Industrial software design figured this out ages ago.
And you end up with a program compiled to a config anyway, so with a proper toolchain it means your real config is the program.
People keep increasing complexity unintentionally precisely because they don't realize that code = data. There's no real distinction. Code is data is code.
You will end up having Turing completeness somewhere, it's just a matter of choosing (or blindly selecting, like most people do) where. For a popular product, it eventually gets embedded in the configuration language, turning it into half-assed programming language (see most web-related templating). For less popular products / more enterprise'y settings, you can probably get away with embedding the Turing-complete part in your bureaucracy. That is, I can't code my config to make it do what I want, but I can pay you to get developers to write some code and export it to the config language as a keyword. There's a spectrum to this, and tradeoffs galore.
But ultimately, YAML is nothing but a tree notation. Tree notation is enough to represent high-level programming languages. Lisp without parentheses, if you will, or Python, if you squint your eyes.
"Data" to me is the least complex input from a human, whereas "code" is more complex and necessitates a lot more work to make sure it's correct and bug-free.
If you embed "code" in "data", you made your thing way more complex and subject to software design patterns. But in software operation, we already have to contend with highly complex systems, so we want to remove as much time and effort and complexity as possible from the instrumentation.
To put it another way: if you had to run a nuclear reactor, do you want to instrument it by constantly writing new code, or turning a dial? I'd rather turn a dial. That means I have to develop the code for that dial ahead of time, but in the end, actually using it will be safer.
* all configuration may be generated from any other format imaginable, but it's sure as fuck going into the Big Main Godlike Application as XML.
Separation, interfaces, etc. Disclaimer: I work in .NET almost exclusively. The .NET configuration APIs generally work, as long as you only ever use them for reading; treating config as something the application itself can fiddle with is a fast route to madness.
I feel like Ansible generating YAML with Jinja templates is really a sweet-spot, with idempotency and reusability.
I find it pushes me to write plain YAML files for variables and defaults (Ansible), while allowing strong templating of generated files (Jinja) and letting the result be readable (YAML). By readable I mean minimum programming bloat (code spread on many lines just to write a for loop) and minimum extra syntax that clutters the screen (brackets and quotes). It also lets me write very little custom code (aside from variables, obviously).
If I had to use a more "powerful" YAML-like replacement, it would mix all these into files, written differently by people with different styles, and it would have bloat all over the place.
The main issue I have with helm is that values.yml is not templatable by default so you have to generate it if you want reusability.
YAML does one thing and does it well, it's readable and bloat-free. Maybe we need more tools like "kubectl explain" to know the syntax though.
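The vars-plus-Jinja split described above fits in a few lines of Python, assuming PyYAML and Jinja2, with hypothetical contents standing in for the vars and template files:

    import jinja2
    import yaml

    variables = yaml.safe_load("app: web\nreplicas: 3")   # plain YAML vars file
    template = jinja2.Template("name: {{ app }}\n"
                               "replicas: {{ replicas }}\n")
    print(template.render(**variables))                   # readable YAML comes out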
Reading the commentary on whitespace, my thoughts immediately jumped to the C preprocessor. Even though it's built into the language, it has the same sorts of problems as a template engine has with generating YAML: the preprocessor just wasn't enough aware of the syntactic structure of the language to make it easy to generate anything of significant complexity.
I'm not proud of this (and like to think I could come up with something better these days), but this code was a bit of a nightmare for that reason:
Lisp macros do better, but they have the problem that the macros (and their potentially unusual evaluation rules) can easily just blend in with ordinary function calls.
Helm charts are great and accomplish the task, but things get really weird and complex if you're not careful. Oh man, the indentation and _helpers.tpl nightmares. It's not obvious at all. And there's still a lot of repetition for each chart; writing a deployment or whatever is so verbose in K8s.
On the other hand, Ansible uses YAML and there it works great. I feel like Ansible uses YAML in a way that's easier to understand and the way it was meant to be written. With Ansible, you're writing configuration, not templates of configuration. I don't think a layer on top of Ansible, like Helm is for K8s, would make sense.
In contrast to other template systems, Emrichen templates are not just "based on YAML", they _are_ YAML. YAML tags like "!Var varname" are used to perform things like variable substitution, loops, etc. Variables can be of any JSON type, not just strings, and the template is evaluated top-down.
Over the last week I created a tool for processing YAML and JSON files using Jsonnet. It's called [ycat](https://github.com/alxarch/ycat) and is inspired by `jq` but uses Jsonnet for processing. It can also be used just as `cat` to concatenate JSON/YAML files. It's still young but very useful, especially for handling complex kubernetes configurations.
Wholly agree with this which is why I've been experimenting with a project similar in goals to jsonnet as a side project of my own. I didn't know about jsonnet when I started or I'd probably have just used/contributed to that instead.
Why template yaml when you could just generate it? Or generate json, or toml, or xml, or environment variables...
Jsonnet is nice, but not actively developed, unfortunately. I highly recommend supporting UCG [0], which is written in Rust (Jsonnet has two implementations, which is one source of the slowness).
His main argument is that you need to template differently for different environments (dev/stage/prod) and cloud regions (us-west/us-east/emea/apac).
This is actually a solved problem, and you shouldn't be doing it in your YAML/JSON templates. You should be using an external parameter store to do this, and using a single template for everything.
This is a simple use case: I want to deploy the latest AMI (Amazon Machine Image) in any region, so I always get the latest patched Linux base image to run my application on. I don't (and shouldn't) want to update my YAML/JSON every time a new image is published.
So, why are people having to go to these crazy templating macro lengths? Just store the changing bits in an external config/parameter store like etcd and let your infrastructure as code templates remain unchanged.
I think his main argument is: why are we using text-based templating when we could go one step further and have language-based templating?
like why have:
{"foo": "<%= bar %>"}
when you could just have
{"foo": bar}
the problem with text based templating is the templating language has to make a decision about escaping and it is sometimes the wrong one. for example rebar used an erlang haml [at some point... maybe they fixed it :)] which meant it escaped html special characters by default. but this makes almost zero sense when generating erlang configuration files.
i guess the reason that it is like this is because it is just easy and for most YAML/JSON/etc configuration files it is not a problem because you are basically doing static substitution or 'dynamic' substitution but with a safe range of characters. so the reason things are 'bad' is because the current solution works for 99% of the use cases and no-one wants to spend time fixing it when they could spend that time fixing a real problem. heh :/
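The language-based version sidesteps the escaping decision entirely, because the serializer owns it; a minimal Python illustration:

    import json

    bar = 'tricky "value" <with> special characters'
    print(json.dumps({"foo": bar}))   # all escaping handled by the serializer,
                                      # correctly for the target format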
Based on the appearance of Helm, this is probably referring to Kubernetes, which really does have per-environment things that can't be represented in any form other than namespaces. For example, you have an app running (that's a deployment in k8speak). You want network traffic to get to this app, so you set up a load balancer. The load balancer config needs to know the name of your app (or more precisely, a selector for "pods" that are created by the deployment), which will change between environments, so now there is that common variable that has to be updated in both places. That's a simple example but is why people are templating their k8s configs. Yes, you could work around it by giving every environment its own namespace and using the same set of objects in every environment (differing only by namespace, which should probably be in the file... but since you can't edit the files, you can pass it in to kubectl probably), but there are other cases where even that doesn't work.
You do need some way of saying "this is the base configuration and this is what we change for staging and production". Helm is a way to do that, and a popular one, but it's pretty ugly. Hence this article.
That gives you the latest, fully patched base OS image no matter which of 19 regions you launch it in.
Even hardcoding that in a K/V store is going to get outdated unless you manually update it. Parameters like this are great because you can simply write your code once and never have to update unless you're adding new functionality. All base parameters and external systems (APIs, etc) are parameterized and never need to get updated, except by your SaaS partners that update them for you.
For anyone using EC2 Parameter Store, a free service in AWS:
    Parameters:
      AMI:
        Type: AWS::SSM::Parameter::Value<String>
        Default: /aws/service/ecs/optimized-ami/amazon-linux/recommended/image_id
Gives you the latest, fully patched Linux OS in any of 19 regions that you launch it in. Free K/V store that works really well for apps and custom parameters.
YAML is the bastard offspring of XML. A bunch of ways to write semantically identical stuff is bullshit. Let data files be data files, and build them using languages actually suited for the job. JSON is plenty complex for 105% of the use cases of YAML, with much fewer downsides.
When I joined Google, I realized it was kinda surprising I hadn't seen something like GCL in the outside world (not to say it doesn't exist, just that it isn't ubiquitous enough for me to have heard about it). This seems to be an example of that hole being filled.
Same here... just using the same language as the rest of the application, usually a file with constants or a class/struct/module/whatever.
The only reason config files in another language might be required is when you need to configure the application after it has been compiled to native binary code.
Obviously this doesn't apply to all these applications written in scripting languages.
I do wonder how you create type-safe config files though. I currently have a `development.ts` and a `production.ts`, where the `production.ts` is only loaded when `node_env === production`. `production.ts` contains blank/default values and the file is overwritten on the server with production secrets.
I use j2cli to template my YAMLs; it's pretty powerful for templating stuff without using Ansible. You just need the right environment variables. And you can add if-logic if you want.