'changeme' is valid base64

susam · on Oct 24, 2020

Any alphanumeric string with a length that is a multiple of 4 is a valid Base64 string.

  $ printf AAAA | base64 --decode | od -tx1
  0000000    00  00  00
  0000003

  $ printf AAAAAAAA | base64 --decode | od -tx1
  0000000    00  00  00  00  00  00
  0000006

  $ printf AQEB | base64 --decode | od -tx1
  0000000    01  01  01
  0000003

  $ printf AQID | base64 --decode | od -tx1
  0000000    01  02  03
  0000003

  $ printf main | base64 --decode | od -tx1
  0000000    99  a8  a7
  0000003

  $ printf scrabble | base64 --decode | od -tx1
  0000000    b1  ca  da  6d  b9  5e
  0000006

  $ printf 12345678 | base64 --decode | od -tx1
  0000000    d7  6d  f8  e7  ae  fc
  0000006

Since '+' and '/' are also used as symbols in Base64 encoding (for binary 111110 and 111111, respectively), we also have:

  $ printf 1+2+3+4+5/11 | base64 --decode | od -tx1
  0000000    d7  ed  be  df  ee  3e  e7  fd  75
  0000011

  $ printf "\xd7\xed\xbe\xdf\xee\x3e\xe7\xfd\x75" | base64
  1+2+3+4+5/11
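
The same checks can be sketched in Python, assuming the standard library's base64 module as a stand-in for the base64 command-line tool. The helper name decode_hex is only for illustration:

```python
import base64

def decode_hex(text):
    """Strictly decode Base64 text and return the decoded bytes as hex."""
    # validate=True rejects any character outside the Base64 alphabet.
    return base64.b64decode(text, validate=True).hex()

# Alphanumeric (plus '+' and '/') strings whose length is a multiple of 4.
for word in ["AQID", "main", "scrabble", "1+2+3+4+5/11"]:
    print(word, "->", decode_hex(word))
```

The hex strings it prints should match the od output above, byte for byte.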

billyhoffman · on Oct 24, 2020

The padding character =, used to pad the base64 string to a multiple of 4 characters, is technically optional as well, since it can be inferred from the length, though some decoders are stricter than others. So virtually all ASCII words are valid base64.

https://en.m.wikipedia.org/wiki/Base64

(I got burned by this a few months ago: Chrome can produce HAR files that have base64-encoded bodies without padding characters, and .NET's built-in decoder expects them.)
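
A minimal sketch of inferring the padding from the length, in Python rather than .NET, and assuming the input uses the standard alphabet. The function name decode_unpadded is hypothetical:

```python
import base64

def decode_unpadded(text):
    """Decode Base64 text whose trailing '=' padding may have been stripped."""
    # The number of missing '=' signs is whatever brings the length
    # up to the next multiple of 4 (0, 1, or 2 for well-formed input).
    missing = -len(text) % 4
    return base64.b64decode(text + "=" * missing)

print(decode_unpadded("aGk"))   # padded form would be "aGk="
print(decode_unpadded("aA"))    # padded form would be "aA=="
```

A length of 4n+1 can never be repaired this way, because a single leftover character carries only 6 bits, which is less than one byte.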

m463 · on Oct 24, 2020

  $ echo "changeme" | base64 --decode | hexdump -C
  00000000  72 16 a7 81 e9 9e                                 |r.....|
  00000006
  $   echo "*lock*" | base64 --decode | hexdump -C
  base64: invalid input

rootusrootus · on Oct 24, 2020

That's because of the asterisks, not the length.

  $ echo "llockk" | base64 --decode | hexdump -C
  00000000  96 5a 1c                 |.Z.|
  00000003

m463 · on Oct 25, 2020

no! ha, I meant the old /etc/shadow trick of adding * lock * or * LCK * helps.

jstanley · on Oct 24, 2020

I'm not quite sure what OP thinks base64 means, but:

> the probability that changeme is actually valid base64 encoding must be very low

This makes no sense. The characters are part of the base64 alphabet and you have an acceptable number of them.

You wouldn't claim that the probability that deadbeef is a valid base 16 number must be very low. Probability doesn't come into it, the characters are part of the alphabet, that's all there is to it.

ThreeFx · on Oct 24, 2020

I must say I never dove into the internals of base64 encoding, but this does indeed make a lot of sense :)

Thanks!

matt-attack · on Oct 25, 2020

Base64 is just like base10 but they needed 55 more characters beyond the 0-9. So someone decided to use letters (both lower case and upper case) and a few symbols.

jstanley · on Oct 26, 2020

> Base64 is just like base10 but they needed 55 more characters beyond the 0-9

How did you work that out?

anandoza · on Oct 24, 2020

> I always thought that it would fail to decode the string since the probability that changeme is actually valid base64 encoding must be very low

I'm a bit confused, I thought any string with only lowercase letters was "valid base64" (more precisely, I thought "valid base64" is equivalent to "string consists only of the 64 special characters we're using to represent digits 0-63").

contravariant · on Oct 24, 2020

Not any string, but if it is a multiple of 4 characters then yes it will always be valid. In particular 'changeme' has 8 characters and therefore represents exactly 6 bytes.

It gets more complicated if you're using base64 to represent a number of bytes that isn't a multiple of 3; those encodings would be unlikely to happen randomly. They will usually end in a number of '=' signs to pad their length to a multiple of 4 and indicate how many bytes are missing. Although apparently there are also versions of base64 that don't include the padding.
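
To make the padding rule concrete, here is a small Python sketch (standard library only; the helper name padding_for is just for illustration) that reports the encoded length and the number of '=' signs for inputs of 1 to 6 bytes:

```python
import base64

def padding_for(n_bytes):
    """Return (encoded length, number of '=' signs) for n_bytes of input."""
    # bytes(n) is n zero bytes; the content does not affect the padding.
    encoded = base64.b64encode(bytes(n_bytes)).decode("ascii")
    return len(encoded), encoded.count("=")

for n in range(1, 7):
    print(n, padding_for(n))
```

Only byte counts that are a multiple of 3 come out with no '=' at all, which is the 'changeme' case: 8 characters, 6 bytes.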

al2o3cr · on Oct 24, 2020

Depends on the decoder's default settings - some applications of base64 ignore padding but not all of them.

As an example, Elixir's stdlib includes a base64 decoder; passing "padding: false" will give the "decode anything" behavior you're describing.

https://hexdocs.pm/elixir/Base.html#decode64/2

detaro · on Oct 24, 2020

No, because of padding rules. E.g. "ab" isn't valid base64, because it only encodes 12 bits, which is not a whole number of bytes. Padded base64 thus always encodes groups of 3 input bytes (equaling 3*8/6 = 4 output characters).
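
A quick way to check this with a strict decoder, sketched in Python with the standard library (other decoders may be more lenient; is_strict_base64 is a made-up helper name):

```python
import base64
import binascii

def is_strict_base64(text):
    """Return True if text decodes as padded, standard-alphabet Base64."""
    try:
        # validate=True also rejects characters outside the alphabet.
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False

for candidate in ["ab", "aQ==", "changeme", "*lock*"]:
    print(repr(candidate), is_strict_base64(candidate))
```

Here "ab" fails for its missing padding and "*lock*" fails for its asterisks, while "aQ==" and "changeme" are accepted.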

IshKebab · on Oct 24, 2020

Right but they're still pretty confused, because the probability is 1/4, not "very low".

notRobot · on Oct 24, 2020

That does appear to be correct. I just did a bunch of base64decode("random-string-here") and always got an output, never an error.

tux3 · on Oct 24, 2020

Each base64 character represents 6 bits of data (which is why base64 is padded with ='s at the end).

Try running `base64 -d <<< a` and it will fail, but `base64 -d <<< aaaa` works (4*6 == 24 bits, which is divisible by 8 and gives 3 bytes of output)
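
The 4*6 == 24 arithmetic can be spelled out by hand. This is a rough Python sketch that decodes one group of four characters without the base64 module; ALPHABET and decode_quad are names made up for the example:

```python
import string

# Standard Base64 alphabet: a character's index is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def decode_quad(four_chars):
    """Decode exactly 4 Base64 characters (24 bits) into 3 bytes."""
    if len(four_chars) != 4:
        raise ValueError("need exactly 4 characters (4 * 6 == 24 bits)")
    bits = 0
    for ch in four_chars:
        # Shift the accumulator left and append this character's 6 bits.
        bits = (bits << 6) | ALPHABET.index(ch)
    return bits.to_bytes(3, "big")

print(decode_quad("aaaa").hex())
print(decode_quad("main").hex())
```

A single character leaves only 6 bits in the accumulator, which is why `a` on its own cannot produce even one byte.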

pantalaimon · on Oct 24, 2020

I mean the whole point of base64 is to encode binary using only printable characters.

ramshorns · on Oct 24, 2020

So "valid+base64" is valid base64.

ralusek · on Oct 24, 2020

I am equally confused for the exact same reason.

abeppu · on Oct 24, 2020

I feel like this is an example of a failure of coordination where one confuses teleological design choices made by someone with specific intents with mere technical facts/incidental conditions of the context you find yourself in.

The author is surprised that the dummy string is valid base64. Other comments on this thread discuss whether all lowercase strings are valid and whether implementations differ.

But ... Isn't the point of using base64 (rather than hex or something) to make a large share of easily typeable strings valid (which implies they can be shorter/faster/cheaper)? I.e. someone upstream of you _wanted_ "changeme" to be valid, and the misunderstanding is less about the technical details of base64, and more about a hidden conflict in desires between people making design choices separated by time, space, and perhaps organization.

kevin_thibedeau · on Oct 24, 2020

The point of base64 is to have a safe way to get 8-bit data through channels that only support 7-bit ASCII and strip control chars.
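
As a rough illustration (a Python sketch with a hypothetical helper name, not a statement about any particular mail or transport system), every possible byte value encodes to printable 7-bit ASCII:

```python
import base64

def encodes_to_printable_ascii(data):
    """Return True if the Base64 encoding of data is printable 7-bit ASCII."""
    encoded = base64.b64encode(data)
    # 0x21..0x7E are the printable, non-space ASCII characters.
    return all(0x21 <= byte <= 0x7E for byte in encoded)

# All 256 byte values, including control characters and 8-bit data.
print(encodes_to_printable_ascii(bytes(range(256))))
```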

bobbiechen · on Oct 24, 2020

I recently learned that in journalism, the word TK is used as a placeholder to be filled in later - it's a fairly rare bigram and not a real word, so it's easier to notice it (or ctrl-f for it).

I use a VS Code extension for TODOs (Todo Tree), which highlights TODO comments and aggregates them in a centralized list, which is useful. I think it can be configured to include custom words, which might be useful for your personal "changeme"s.

detaro · on Oct 24, 2020

Some software late in a publishing/printing toolchain will error out/warn about "lorem ipsum" placeholder text.

I wonder what the TK equivalent in German publishing is, here I'd expect a TK bigram to happen too often.

jp57 · on Oct 24, 2020

We used to use 0xDEADBEEF for this purpose in hex.

throw0101a · on Oct 24, 2020

0xDECAFBAD

M2Ys4U · on Oct 24, 2020

And if you need to occupy more space, 0xDEADBEEFCAFE works quite well

klyrs · on Oct 24, 2020

Barely related: "Fucko" is a valid graph6 string