Hi there, yes I used AI to help build this website. I personally don't have the time nor the talent to build something like this from scratch! I do have knowledge of how historical and crime data are supposed to be parsed, viewed, analyzed, and presented to the world :) If someone would like to take this work and improve it, please do!
In 2026, tools like WAVE, Lighthouse, and a real screen reader should be part of any website design process. They catch issues early. A stitch in time saves nine.
I know you may not be a designer. That’s fine. Starting with a solid, off-the-shelf CSS framework can get you much closer to Web Content Accessibility Guidelines (WCAG) compliance from day one. It sets a baseline so you’re not reinventing solved problems.
Building from scratch is absolutely valid. It’s cool, even. But right now it reads less like an intentional design choice and more like missing fundamentals.
I'm not trying to be a dick; the project has potential! A few design improvements would make it usable for a lot more people.
Thanks! I am definitely not a front-end web designer lol, and I for sure don't want to limit people's access. I will look into the standards and see how best to implement them into the website :)
Hey there, yeah, definitely. I maintain .txt change logs for all data modifications. To be clear, no information is added or altered — the Factbook content is exactly what the CIA published. The parsing process structures the raw text into fields (removing formatting artifacts, sectioning headers, and deduplicating noise lines), but the actual data values are untouched. What I've added on top are lookup tables that map the CIA's FIPS 10-4 codes to ISO Alpha-2/3 and a unified MasterCountryID, so the different code systems can be joined and queried together.
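To make the crosswalk idea concrete, here's a minimal sketch of how those lookup tables let the code systems be joined. The table and column names (country_codes, factbook, master_country_id) are my assumptions for illustration, not the project's actual schema; the FIPS/ISO codes for Germany (GM vs. DE/DEU) are real, the sample value is made up.

```python
import sqlite3

# Illustrative schema: a code crosswalk plus a raw Factbook table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE country_codes (
    master_country_id INTEGER PRIMARY KEY,
    fips_10_4 TEXT, iso_alpha2 TEXT, iso_alpha3 TEXT
);
CREATE TABLE factbook (
    fips_10_4 TEXT, year INTEGER, field TEXT, value TEXT
);
-- The CIA's FIPS 10-4 code for Germany is GM, while ISO uses DE/DEU.
INSERT INTO country_codes VALUES (1, 'GM', 'DE', 'DEU');
INSERT INTO factbook VALUES ('GM', 2025, 'population', '84,100,000');
""")

# Join through the crosswalk so Factbook rows can be queried by ISO code.
rows = con.execute("""
    SELECT c.iso_alpha3, f.year, f.field, f.value
    FROM factbook f
    JOIN country_codes c ON c.fips_10_4 = f.fips_10_4
    WHERE c.iso_alpha2 = 'DE'
""").fetchall()
print(rows)  # [('DEU', 2025, 'population', '84,100,000')]
```

The point of the unified ID is exactly this kind of join: datasets keyed on FIPS and datasets keyed on ISO can meet in one query without any renaming of the underlying values.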
Hi there, thanks for linking this! My GitHub and website both link to and use this source! I just thought putting it in a SQL database and making the entire 1990-2025 queryable was needed since I couldn't find one anywhere :)
It is a lot of fun and rewarding to do this! I've done it several times for medium-sized datasets, like Wikipedia dumps and an entire geospatial dataset, loaded into PostgreSQL so I could mapreduce over it. The Wikipedia one was great; I had it set up to query things like "show me all ammunition manufactured after 1950 that is between .30 and .40" and it could return results nearly instantly. The Wikimedia dumps keep the infoboxes and relations intact, so you can do queries like this easily.
Do you have a write-up of this somewhere? When I last looked at the Wikipedia dumps, they looked like a mess to parse. How were you getting structured information?
Ohh that is a great idea! And we already have the political field in SQL! I will start working on some of this and update the website this week. Thank you for the awesome suggestions!
Found the problem: the total regex doesn't handle magnitude suffixes:
2018: total: 17,856,024 → parses as 17856024 (correct raw count)
2020: total: 18.17 million → parses as 18.17 (WRONG - drops "million")
2025: total: 39.3 million → parses as 39.3 (WRONG)
So the chart jumps from ~18 million down to ~18, making it wrong. The fix is to handle "million/billion/trillion" after total.
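A minimal sketch of that fix, assuming the parser is a simple regex over the "total:" line (the function name and pattern here are mine, not the project's actual code):

```python
import re

# Multipliers for the magnitude suffixes the Factbook uses.
MULTIPLIERS = {
    "million": 1_000_000,
    "billion": 1_000_000_000,
    "trillion": 1_000_000_000_000,
}

# Capture the number, then an optional magnitude word after it.
TOTAL_RE = re.compile(r"total:\s*([\d,.]+)\s*(million|billion|trillion)?",
                      re.IGNORECASE)

def parse_total(text):
    m = TOTAL_RE.search(text)
    if not m:
        return None
    value = float(m.group(1).replace(",", ""))
    if m.group(2):
        value *= MULTIPLIERS[m.group(2).lower()]
    # round() avoids float truncation, e.g. 39.3 * 1e6 = 39299999.999...
    return int(round(value))

print(parse_total("total: 17,856,024"))    # 17856024
print(parse_total("total: 18.17 million")) # 18170000
print(parse_total("total: 39.3 million"))  # 39300000
```

With the optional suffix group, raw counts like the 2018 value still parse unchanged, while the 2020 and 2025 values come out in the same units instead of collapsing to ~18 and ~39.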