The padding character =, used to pad the base64 string to a multiple of 4 characters, is technically optional as well, since it can be inferred from the length, though some decoders are stricter than others. So virtually all ASCII words are valid base64.
(I got burned by this a few months ago: Chrome can produce HAR files with base64-encoded bodies that lack padding characters, and .NET's built-in decoder expects them.)
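To make the strict-versus-lenient difference concrete, here is a minimal Python sketch. Python's standard decoder also insists on padding, so it stands in for the strict case here. The helper name `decode_lenient` is made up for illustration and is not part of any library.

```python
import base64
import binascii

def decode_lenient(text: str) -> bytes:
    """Decode base64 whose trailing '=' characters may have been stripped."""
    # Re-add the padding implied by the length; -len % 4 is 0, 1, 2 or 3.
    # (A remainder of 1 character is never valid and will still raise.)
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)

unpadded = "aGVsbG8"  # base64 of b"hello" with the trailing "=" removed

try:
    base64.b64decode(unpadded)
    strict_ok = True
except binascii.Error:
    # The standard decoder rejects input whose padding is missing.
    strict_ok = False

print(strict_ok)
print(decode_lenient(unpadded))
```

The same idea (pad to a multiple of 4 before handing the string to a strict decoder) should carry over to other runtimes, but check the decoder's documentation before relying on it.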
I'm not quite sure what OP thinks base64 means, but:
> the probability that changeme is actually valid base64 encoding must be very low
This makes no sense. The characters are part of the base64 alphabet and you have an acceptable number of them.
You wouldn't claim that the probability that deadbeef is a valid base 16 number must be very low. Probability doesn't come into it, the characters are part of the alphabet, that's all there is to it.
Base64 is just like base10, but it needs 54 more characters beyond 0-9. So someone decided to use letters (both lower case and upper case) and two symbols (+ and / in the standard alphabet).
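A small Python sketch of the "it's just an alphabet" point, assuming the standard base64 alphabet (A-Z, a-z, 0-9, + and /). The names `BASE64_ALPHABET`, `HEX_ALPHABET` and `uses_only` are invented for this example.

```python
import string

# Standard base64 alphabet: 26 + 26 + 10 + 2 = 64 symbols.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)
# Lowercase hex alphabet: 16 symbols.
HEX_ALPHABET = string.digits + "abcdef"

def uses_only(text: str, alphabet: str) -> bool:
    """Return True if every character of text belongs to alphabet."""
    return all(ch in alphabet for ch in text)

print(len(BASE64_ALPHABET))                     # 64
print(uses_only("changeme", BASE64_ALPHABET))   # True
print(uses_only("deadbeef", HEX_ALPHABET))      # True
print(int("deadbeef", 16))                      # 3735928559
```

Membership in the alphabet is a yes/no check on each character, so there is nothing probabilistic about it.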
> I always thought that it would fail to decode the string since the probability that changeme is actually valid base64 encoding must be very low
I'm a bit confused, I thought any string with only lowercase letters was "valid base64" (more precisely, I thought "valid base64" is equivalent to "string consists only of the 64 special characters we're using to represent digits 0-63").
Not any string, but if it is a multiple of 4 characters then yes it will always be valid. In particular 'changeme' has 8 characters and therefore represents exactly 6 bytes.
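This is easy to check with Python's standard library. The sketch below decodes 'changeme' with strict alphabet validation turned on and confirms that 8 characters yield 6 bytes.

```python
import base64

# validate=True makes the decoder reject characters outside the alphabet
# instead of silently skipping them.
decoded = base64.b64decode("changeme", validate=True)

print(len(decoded))   # 6
print(decoded.hex())

# 8 characters * 6 bits = 48 bits = 6 whole bytes, so nothing is lost
# and re-encoding gives back the original string.
roundtrip = base64.b64encode(decoded).decode("ascii")
print(roundtrip)      # changeme
```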
It gets more complicated if you're using base64 to represent a number of bytes that isn't a multiple of 3; those encodings would be unlikely to happen randomly. They usually end in one or two '=' signs to pad their length to a multiple of 4 and indicate how many bytes are missing. Although apparently there are also versions of base64 that don't include the padding.
No, because of padding rules. E.g. "ab" isn't valid base64, because it only encodes 12 bits, which is not a whole number of bytes. Base64 output is thus always padded as if the input were a multiple of 3 bytes (3*8/6 = 4 output characters).
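The padding arithmetic can be seen by encoding inputs of different lengths with Python's standard encoder. This is only a sketch; the helper `encoded_info` is a made-up name for illustration.

```python
import base64

def encoded_info(n: int) -> tuple[str, int]:
    """Encode n bytes of filler and return (encoded text, number of '=')."""
    encoded = base64.b64encode(b"x" * n).decode("ascii")
    return encoded, encoded.count("=")

for n in range(1, 7):
    encoded, pad = encoded_info(n)
    # Output length is always a multiple of 4; padding is 2, 1 or 0
    # depending on n modulo 3.
    print(n, encoded, len(encoded), pad)

print(base64.b64encode(b"a"))    # b'YQ=='
print(base64.b64encode(b"ab"))   # b'YWI='
print(base64.b64encode(b"abc"))  # b'YWJj'
```

One input byte produces two '=' signs, two bytes produce one, and a multiple of three produces none, which is why an unpadded two-character string like "ab" trips up stricter decoders.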
I feel like this is an example of a failure of coordination where one confuses teleological design choices made by someone with specific intents with mere technical facts/incidental conditions of the context you find yourself in.
The author is surprised that the dummy string is valid base64. Other comments on this thread discuss whether all lowercase strings are valid and whether implementations differ.
But ... Isn't the point of using base64 (rather than hex or something) to make a large share of easily typeable strings valid (which implies they can be shorter/faster/cheaper)? I.e. someone upstream of you _wanted_ "changeme" to be valid, and the misunderstanding is less about the technical details of base64, and more about a hidden conflict in desires between people making design choices separated by time, space, and perhaps organization.
I recently learned that in journalism, the word TK is used as a placeholder to be filled in later - it's a fairly rare bigram and not a real word, so it's easier to notice it (or ctrl-f for it).
I use a VS Code extension for TODOs (Todo Tree), which highlights TODO comments and aggregates them in a centralized list, which is useful. I think it can be configured to include custom words, which might be useful for your personal "changeme"s.