And, after looking again at HN's API and determining that front pages aren't represented in it ... I'm crawling the 5,939 front pages from 20 Feb 2007 to 25 May 2023 and scraping those to find mentions of states on the actual front pages.
Aggressive crawls result in 403 after a few fetches, so my current attempt has an apparently-adequate delay parameter set. It'll take a day or two for the crawl to complete. Presently up through 22 Sept 2007.
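A minimal sketch of a crawl loop with a fixed delay and a back-off on 403, under stated assumptions rather than the script actually in use. The URL pattern, the User-Agent string, the 30-second default delay, and all function names are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from datetime import timedelta

# Assumed URL pattern for one day's front page (not given in the comment).
FRONT_URL = "https://news.ycombinator.com/front?day={day}"


def front_page_days(start, end):
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def fetch(url):
    """Fetch one page and return (status, body); HTTP errors become a status."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "hn-frontpage-crawl (hypothetical)"}
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


def crawl(days, fetch_page=fetch, delay=30.0, max_tries=3, sleep=time.sleep):
    """Fetch each day's page politely; the delay value is a guess."""
    pages = {}
    for day in days:
        wait = delay
        for _ in range(max_tries):
            status, body = fetch_page(FRONT_URL.format(day=day.isoformat()))
            sleep(wait)      # pause after every request, success or not
            if status == 200:
                pages[day] = body
                break
            if status != 403:
                break        # unexpected status: skip this day
            wait *= 2        # throttled: wait longer before retrying
    return pages
```

Passing `fetch_page` and `sleep` as parameters keeps the loop testable without touching the network. At 15 to 30 seconds per page, 5,939 pages works out to roughly one to two days, which is in line with the estimate above.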
State-name matches (with some false positives) in that set:
Delaware corporate franchise tax is due today (March 1st) (March 1, 2007)
An Early Stage Entrepreneur's Guide to New York City (March 5, 2007)
Web 2.0 Expo 2007, April 15-18, 2007, San Francisco, California (March 10, 2007)
Zillow Becomes Illegal in Arizona (April 16, 2007)
New York Times on the social news uproar (May 3, 2007)
What Silicon Valley Could Learn from Columbus, Ohio (May 22, 2007)
New York Times Will Lower Editorial Standards Online And Reduce Size Of Print Newspaper (May 26, 2007)
eHarmony sued in California for excluding gays (June 1, 2007)
New York Times versus Digg (June 6, 2007)
A Patent Lie - New York Times (June 9, 2007)
Interactive Colorado Nightlife Guide - "Social Nightworking" (June 16, 2007)
In the state of Hawaii, tax breaks make tech investments nearly risk-free? (June 27, 2007)
Top Web Apps in Arizona (June 28, 2007)
Millionaires Cash Out Of California (July 16, 2007)
C.E.O. Libraries Reveal Keys to Success - New York Times (July 21, 2007)
Pics of Web Company HQs in California - Digg, Facebook, OpenDNS.. (July 25, 2007)
Hackers find serious problems in California voting machines (July 30, 2007)
A Mystery Solved: "Fake Steve" Blogger Comes Clean - New York Times (Aug 5, 2007)
Colorado Startup Scene (Aug 5, 2007)
New York Times Sees Sense: Paywall Comes Crashing Down (Aug 7, 2007)
Parsing Miss South Carolina's Statement (Sept 1, 2007)
Texas Startup Says It Has Batteries Beat (Sept 5, 2007)
Spider-like vessel hits New York waters - can cross the Atlantic on one load of diesel fuel (Sept 8, 2007)
New York Times Launches Facebook App (Sept 12, 2007)
Theory Girl by the University of Washington CSE Band [mp3] (Sept 15, 2007)
Note that eight of the ten "New York" mentions are for "The New York Times".
In the event anyone wants to suggest analyses to perform, I'm game.
For now, I'm parsing out title, points, comments, submitter, and the submission site.
I'm classifying by state and city (using a list of the 330 largest US cities by population), as well as a few "false positive" categories (e.g., "Washington Post", "New York Times").
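One way the state classification with false-positive categories could work, as a sketch only: mask the known false-positive phrases first, then match state names on word boundaries. The state list here is a small illustrative subset, and `classify` is a hypothetical name.

```python
import re

# Illustrative subset; a real run would list all 50 states (and the cities).
STATES = ["Arizona", "California", "Colorado", "New York", "Ohio",
          "South Carolina", "Texas", "Washington"]

# Phrases that contain a state name but refer to something else.
FALSE_POSITIVES = ["New York Times", "Washington Post"]


def classify(title):
    """Return (states, false_positives) found in a submission title."""
    found_fps = [fp for fp in FALSE_POSITIVES
                 if re.search(rf"\b{re.escape(fp)}\b", title)]
    # Blank out the false-positive phrases so they cannot match as states.
    masked = title
    for fp in found_fps:
        masked = masked.replace(fp, " ")
    states = [s for s in STATES
              if re.search(rf"\b{re.escape(s)}\b", masked)]
    return states, found_fps
```

For example, `classify("A Patent Lie - New York Times")` puts the title in the "New York Times" category and reports no state, while `classify("Zillow Becomes Illegal in Arizona")` reports Arizona. Cases like "New York City" or "University of Washington" would still need their own rules.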
One interesting angle I'm looking at is the average votes and comments by site within the data. I'd been curious for a while as to which domains seem to be most highly regarded (or at least most successful in garnering votes and comments). I'm still waiting for the full archive to get pulled in, but early results are ... interesting. I'll leave it there. (Paulgraham.com rather predictably does well, though it's not the highest-rated.)
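A sketch of the per-site averaging, assuming each parsed submission has already been reduced to a `(site, points, comments)` tuple. The function name and the `min_count` cutoff are assumptions, not part of the actual analysis.

```python
from collections import defaultdict
from statistics import mean


def site_averages(rows, min_count=1):
    """Map site -> (submissions, mean points, mean comments).

    rows is an iterable of (site, points, comments) tuples.
    Sites with fewer than min_count submissions are dropped.
    """
    by_site = defaultdict(list)
    for site, points, comments in rows:
        by_site[site].append((points, comments))

    result = {}
    for site, vals in by_site.items():
        if len(vals) >= min_count:
            result[site] = (
                len(vals),
                mean(p for p, _ in vals),
                mean(c for _, c in vals),
            )
    return result
```

The `min_count` cutoff matters for ranking: a site with a single high-scoring front-page hit would otherwise outrank sites that appear hundreds of times.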
I'm also thinking of ways to do trending over time. In early data (I'm up to late 2011 / early 2012 as I'm writing this), sites including Quora and even jwz.com appear and ... do well. plus.google.com hasn't yet appeared, and a few others that come to mind don't seem to be represented yet. Top sites by year or over five-year intervals seem potentially illuminating.
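A sketch of the "top sites by year or five-year interval" idea, assuming rows of `(day, site, points)` with `day` as a `datetime.date`. It ranks by total points per bucket, which is one choice among several; the function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict


def top_sites_by_period(rows, top_n=3, years_per_bucket=1):
    """Map bucket start year -> [(site, total points), ...], best first.

    rows is an iterable of (day, site, points) tuples.
    With years_per_bucket=5, years 2005-2009 share the bucket 2005.
    """
    totals = defaultdict(Counter)
    for day, site, points in rows:
        bucket = day.year - (day.year % years_per_bucket)
        totals[bucket][site] += points
    return {bucket: counts.most_common(top_n)
            for bucket, counts in sorted(totals.items())}
```

Swapping total points for mean points or for a submission count is a small change to what gets accumulated, and each gives a different view of which sites "do well".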
Oh, and as far as states go, when restricted to front-page submissions only, California does far better than the Algolia search results (all non-dead/killed submissions) indicate.