Hi there, yes I used AI to help build this website. I personally don't have the time nor the talent to build something like this from scratch! I do have knowledge of how historical and crime data are supposed to be parsed, viewed, analyzed, and presented to the world :) If someone would like to take this work and improve it, please do!
In 2026, tools like WAVE, Lighthouse, and a real screen reader should be part of any website design process. They catch issues early. A stitch in time saves nine.
I know you may not be a designer. That’s fine. Starting with a solid, off-the-shelf CSS framework can get you much closer to Web Content Accessibility Guidelines (WCAG) compliance from day one. It sets a baseline so you’re not reinventing solved problems.
Building from scratch is absolutely valid. It’s cool, even. But right now it reads less like an intentional design choice and more like missing fundamentals.
I'm not trying to be a dick; the project has potential! A few design improvements would make it usable for a lot more people.
Thanks! I am definitely not a front-end web designer lol, and I for sure don't want to limit people's access. I will look into the standards and see how best to implement them into the website :)
Hey there, yeah, definitely. I maintain .txt change logs for all data modifications. To be clear, no information is added or altered — the Factbook content is exactly what the CIA published. The parsing process structures the raw text into fields (removing formatting artifacts, sectioning headers, and deduplicating noise lines), but the actual data values are untouched. What I've added on top are lookup tables that map the CIA's FIPS 10-4 codes to ISO Alpha-2/3 and a unified MasterCountryID, so the different code systems can be joined and queried together.
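To make the crosswalk idea concrete, here's a minimal sketch of how those lookup tables let the code systems be joined. The table and column names (country_codes, factbook, master_country_id) are my assumptions for illustration, not the project's actual schema; the FIPS/ISO codes for Germany (GM vs. DE/DEU) are real, the sample value is made up.

```python
import sqlite3

# Illustrative schema: a code crosswalk plus a raw Factbook table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE country_codes (
    master_country_id INTEGER PRIMARY KEY,
    fips_10_4 TEXT, iso_alpha2 TEXT, iso_alpha3 TEXT
);
CREATE TABLE factbook (
    fips_10_4 TEXT, year INTEGER, field TEXT, value TEXT
);
-- The CIA's FIPS 10-4 code for Germany is GM, while ISO uses DE/DEU.
INSERT INTO country_codes VALUES (1, 'GM', 'DE', 'DEU');
INSERT INTO factbook VALUES ('GM', 2025, 'population', '84,100,000');
""")

# Join through the crosswalk so Factbook rows can be queried by ISO code.
rows = con.execute("""
    SELECT c.iso_alpha3, f.year, f.field, f.value
    FROM factbook f
    JOIN country_codes c ON c.fips_10_4 = f.fips_10_4
    WHERE c.iso_alpha2 = 'DE'
""").fetchall()
print(rows)  # [('DEU', 2025, 'population', '84,100,000')]
```

The point of the unified ID is exactly this kind of join: datasets keyed on FIPS and datasets keyed on ISO can meet in one query without any renaming of the underlying values.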
Hi there, thanks for linking this! My GitHub and website both link to and use this source! I just thought putting it in a SQL database and making the entire 1990-2025 queryable was needed since I couldn't find one anywhere :)
It is a lot of fun and rewarding to do this! I've done it several times for medium-sized datasets, like Wikipedia dumps and an entire geospatial dataset, loaded into PostgreSQL so I could mapreduce over it. The Wikipedia one was great; I had it set up to query things like "show me all ammunition manufactured after 1950 that is between .30 and .40" and it could return results nearly instantly. The Wikimedia dumps keep the infoboxes and relations intact, so you can do queries like this easily.
Do you have a write-up of this somewhere? When I last looked at the Wikipedia dumps, they looked like a mess to parse. How were you getting structured information?
Ohh that is a great idea! And we already have the political field in SQL! I will start working on some of this and update the website this week. Thank you for the awesome suggestions!
Found the problem: the total regex doesn't handle magnitude suffixes:
2018: total: 17,856,024 → parses as 17856024 (correct raw count)
2020: total: 18.17 million → parses as 18.17 (WRONG - drops "million")
2025: total: 39.3 million → parses as 39.3 (WRONG)
So the chart jumps from ~18 million down to ~18, making it wrong. The fix is to handle "million/billion/trillion" after total.
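A minimal sketch of that fix, assuming the parser is a simple regex over the "total:" line (the function name and pattern here are mine, not the project's actual code):

```python
import re

# Multipliers for the magnitude suffixes the Factbook uses.
MULTIPLIERS = {
    "million": 1_000_000,
    "billion": 1_000_000_000,
    "trillion": 1_000_000_000_000,
}

# Capture the number, then an optional magnitude word after it.
TOTAL_RE = re.compile(r"total:\s*([\d,.]+)\s*(million|billion|trillion)?",
                      re.IGNORECASE)

def parse_total(text):
    m = TOTAL_RE.search(text)
    if not m:
        return None
    value = float(m.group(1).replace(",", ""))
    if m.group(2):
        value *= MULTIPLIERS[m.group(2).lower()]
    # round() avoids float truncation, e.g. 39.3 * 1e6 = 39299999.999...
    return int(round(value))

print(parse_total("total: 17,856,024"))    # 17856024
print(parse_total("total: 18.17 million")) # 18170000
print(parse_total("total: 39.3 million"))  # 39300000
```

With the optional suffix group, raw counts like the 2018 value still parse unchanged, while the 2020 and 2025 values come out in the same units instead of collapsing to ~18 and ~39.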