Some results, normalized by overall language popularity. Specifically, the entry in row R and column C is 1000 * (hits for C in language R) / (hits for "the" in language R), or "---" if either the numerator or the denominator was too small to make the top-10 list on GitHub.
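The normalization above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script actually used (the scraping was done by hand). The function name `normalized_score`, the use of `None` for a count that missed the top-10 list, and rounding to one decimal place are all assumptions made for the example.

```python
from typing import Optional


def normalized_score(term_hits: Optional[int], the_hits: Optional[int]) -> str:
    """Return 1000 * term_hits / the_hits, formatted to one decimal place.

    Hypothetical helper: None means the count was too small to appear in
    GitHub's top-10 language list, in which case the table entry is "---".
    """
    if term_hits is None or the_hits is None or the_hits == 0:
        return "---"
    return f"{1000 * term_hits / the_hits:.1f}"


# Hypothetical counts, not real data: 50 hits for a term, 10000 for "the".
print(normalized_score(50, 10000))   # 5.0
print(normalized_score(None, 10000))  # ---
```

Dividing by the count for "the" is a rough proxy for how much text each language has on GitHub, so the scores can be compared across rows.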
All the scraping was done by hand and the numbers were rounded to a limited number of places along the way, so there may well be mistakes.
[EDIT: oops, initially I failed to paste in the actual data.]
Tentative conclusions: Python is the ugly-hack-iest and (almost exactly tied with C) the ugliest; HTML is the most beautiful, with Python a close second; XML is the lolliest, C++ the WTFiest, and C the buggiest.
Tentative meta-conclusion: these numbers have no value beyond idle amusement. But they idly amused me, so that's OK.
(The weirdest result of the lot, to me, is XML coming top for "lol". If you do the search and click on "XML" on the left, you'll see why. There are lots of instances of what I think is the same file, full of "&lol;" entities. LOL, that's pretty ugly. WTF? An ugly hack, I guess.)
I read your table and immediately jumped to a different conclusion: that Python programmers are more sensitive to ugly hacks and more likely to call them out. I'm not saying I'm right, but I don't know that the data can distinguish the two hypotheses.
My intuition tells me that if your alternative hypothesis were true, then PHP programmers would have higher standards, because they adjudge more code 'ugly', and C++ programmers would be universally perfectionists... or at least when they are not confused, which it appears they usually are.
Just for the fun of hypothesizing: does Python's near 50/50 split between 'ugly' and 'beautiful' suggest a large degree of random use? Or, more interestingly, does it suggest a tendency to classify middle cases as extreme ones, and is this a result of the community having 'a Pythonic Way'?
I immediately felt the same and left a comment. What would distinguish the hypotheses is if "Go" had a high proportion of such comments while "Perl" had a low one.
Could .py have the greatest number of "ugly hacks" because the community's standards for explicit, "beautiful" designs are higher? This would be shown if languages like Go had a much higher prevalence of "ugly hack" than a language like Perl. (Where even core language features, ahem, you can complete the thought.)