I am so curious about your implementation. For instance, what sort of preprocessing did you have to carry out? I wrote a script some time back to analyze Paul Graham's essays (link: https://github.com/futureUnsure/pg-essay-lda), and had to remove dates and times because they appeared a lot and distorted the top topics. I'm wondering if you had to do something similar for text that describes code?
Also, did you write an LDA library yourself or did you leverage an existing library?
I apologize in advance if my questions sound naive/stupid, I'm just a noob...
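
For anyone wondering what the date/time removal mentioned above might look like, here is a minimal sketch. The patterns and the `strip_dates` name are assumptions made up for illustration; they are not taken from the pg-essay-lda script and would need tuning for real text.

```python
import re

# Hypothetical patterns, chosen only for illustration.
MONTHS = (r"(?:january|february|march|april|may|june|july|august|"
          r"september|october|november|december)")

DATE_TIME = re.compile(
    r"\b\d{1,2}:\d{2}(?:\s?[ap]m)?\b"                    # times: 10:30, 9:05 pm
    r"|\b\d{4}-\d{2}-\d{2}\b"                            # ISO dates: 2012-04-05
    r"|\b" + MONTHS + r"\s+(?:\d{1,2},?\s+)?\d{4}\b"     # March 2008, April 5, 2012
    r"|\b(?:19|20)\d{2}\b",                              # bare years: 2008
    re.IGNORECASE,
)


def strip_dates(text):
    """Replace date/time tokens with a space, then collapse whitespace."""
    without_dates = DATE_TIME.sub(" ", text)
    return re.sub(r"\s+", " ", without_dates).strip()
```

Month names are only removed when a year follows, so that the verb "may" is left alone. The cleaned text can then go through the usual tokenization before topic modeling.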
Thanks, I am using the gensim package for LDA. In a nutshell:
1. Get descriptions of the repos the user is interested in
2. Cleanup/Filtering/Tokenization
3. Use LDA to generate topics
4. Use the topics to search GitHub for matching repositories.
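
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the stopword list, the function names (`tokenize`, `build_topics`, `search_url`), and the parameter values are all hypothetical. The gensim calls (`corpora.Dictionary`, `doc2bow`, `models.LdaModel`, `show_topic`) and the GitHub repository search endpoint are real, but how the project uses them is a guess.

```python
import re
from urllib.parse import urlencode

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "is"}


def tokenize(description):
    """Step 2: lowercase, keep alphabetic tokens, drop stopwords and short words."""
    words = re.findall(r"[a-z]+", description.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def build_topics(descriptions, num_topics=5, topn=3):
    """Step 3: fit LDA on the tokenized descriptions and return top words per topic."""
    # Imported here so the other helpers work even without gensim installed.
    from gensim import corpora, models

    texts = [tokenize(d) for d in descriptions]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = models.LdaModel(corpus, num_topics=num_topics,
                          id2word=dictionary, passes=10)
    return [[word for word, _ in lda.show_topic(i, topn=topn)]
            for i in range(num_topics)]


def search_url(topic_words):
    """Step 4: build a GitHub repository search URL from one topic's words."""
    query = urlencode({"q": " ".join(topic_words), "sort": "stars"})
    return "https://api.github.com/search/repositories?" + query
```

Step 1 (fetching the descriptions) is left out since it depends on how the user's repos are collected. Each list returned by `build_topics` can be passed to `search_url` to get a query for that topic.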
So, I guess the post has been sort of banned from HN, as I cannot see it on the Hacker News trending/new/show pages. It was on the front page for quite some time until just a while ago (so obviously it wasn't downvoted into oblivion). I guess I will have to live with that :(
Hey! I am sorry if I rubbed you the wrong way. It was definitely not my intention to eulogize Paul Graham unnecessarily. I wrote that out of sincerity, I genuinely feel that his essays have been highly educational for me, and for a lot of my friends as well. Maybe his essays have not had the same effect on you, and a lot of other people. I guess if enough people complain about it, I'd edit that line out of the README, no questions asked.
I'd also like to add that I mentioned, "one of the.." and "pundit[s]" implying that there are many others who'd be in the club, and Paul Graham isn't the only or most influential. Hope that helps!
Don't get me wrong, what you've done is cool, and everybody's free to just love this or that "pundit" (PG is a lot of things, but I wouldn't call him a pundit). It's just that here on HN sometimes PG's writing's treated like the fifth Gospel.
Sorry if it came out harsher than I wanted, I had a hard day with a hard client. :)
That's a really good point! No, I did not, although in hindsight I should have thought about that. As I mentioned in the README, the modeling is perhaps not very high quality, but I do hope to make it better once I get some free time. Thanks for your suggestion! Means a lot :-)