I have personally gotten a lot of mileage from just writing the compute-heavy parts of my code in C++ and exposing them to Python with a tool like pybind11 [1] or NumpyEigen [2]. I find tools like Numba and Cython to be more trouble than they're worth.
I prototype in Python or whatever; then, if the project survives to market and has legs, I either buy more hardware or rewrite the expensive parts in C++.
That reduces calendar time, risk, and cost. And I'm likely to make better decisions once the code and the market are better understood, after the prototype has been tested under real-world conditions and the requirements have changed (like they always seem to do).
+1 for pybind11. I wrote Python bindings using pybind11 for two C++-based simulators: MOOSE and Smoldyn. It was surprisingly easy to use given how badly the Python C API and C++ tooling suck. Though you do have to build binary wheels separately for every Python version and platform.
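A minimal sketch of this pattern (the `fastmath` module name and the `HAVE_PYBIND11` guard are my own; `PYBIND11_MODULE` and the `stl.h` converters are real pybind11 features):

```cpp
// The hot loop lives in plain C++ and can be tested natively.
#include <numeric>
#include <vector>

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    // Compute-heavy kernel: a simple dot product as a stand-in.
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// When built with pybind11 available, expose the kernel to Python.
// The guard keeps this file compilable as ordinary C++ as well.
#ifdef HAVE_PYBIND11
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // automatic std::vector <-> Python list conversion

PYBIND11_MODULE(fastmath, m) {
    m.def("dot", &dot, "Dot product computed in C++");
}
#endif
```

Once compiled as an extension module (pybind11's docs show the `c++ -O3 -shared -fPIC $(python -m pybind11 --includes)` incantation), the Python side is just `import fastmath; fastmath.dot([1, 2], [3, 4])`.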
I grew up in Stanstead. I have fond memories of story time as a child in the library, borrowing movies and comic books, and playing Age of Empires 2 with my best friend on the two shared computers in the front room.
There’s also a street in the town (aptly named Canusa St.) which is half in the US and half in Canada. Interestingly, the houses on one side have flags reminding you where you are, while there are no flags on the other. Figuring out which side is left as an exercise to the reader ;)
Yes! My mother-in-law grew up in Stanstead, and we’ve spent a lot of time there with her side of the family (and lived and worked in Sherbrooke for a couple years, though we are Australian and are in Australia now).
(Also note that in Quebec province flying a Canadian flag can be a bit of a political statement… it seemed easier to me to be carefully neutral on such things).
I would say likely Ontario, and specifically the Ottawa capital region. Though, since the Convoy, waving a Canadian flag has taken on a Conservative/right-wing/populist connotation that I think is still around today, so it may be shifting in the direction of Alberta and Saskatchewan.
> But even putting aside the fact that claiming someone else's writing as one's own is wrong, the value in survey papers is in how they re-frame the field. A survey paper that just copies directly from the prior paper hasn't contributed anything new to the field that couldn't be obtained from a list of references.
Good survey papers can be important contributions in their own right (e.g. [1]). A good survey should contextualize works within a subject area with respect to each other and identify high level trends/ideas in that subject. These connections are not only useful for learning a topic, but also for positioning novel work or identifying under-researched areas to focus on.
If the authors felt that one of the papers they plagiarized concisely expressed what they wanted to say, they could simply quote and cite that work. Otherwise, it could be construed that the authors are claiming to be the ones drawing the conclusions they wrote. Moreover, from the article, the survey in question seems to be pretty egregiously plagiarizing, which deserves to be called out/shamed.
> But even putting aside the fact that claiming someone else's writing as one's own is wrong, the value in survey papers is in how they re-frame the field. A survey paper that just copies directly from the prior paper hasn't contributed anything new to the field that couldn't be obtained from a list of references.
Whether or not a survey paper is "good" is irrelevant here. Yes, a survey paper that just lists other papers may be a bad survey paper, but it does nothing wrong as long as it cites the original papers, which this one does. A bad survey paper may not get published in a journal; that's what peer review is for. But there is nothing wrong with publishing it openly on the web.
And there is still value in aggregating other papers, even if it's just a list with descriptions. That's why these "awesome-XX" GitHub repos are so popular. Time to hunt them down?
If you look at the plagiarized language in the article, it seems as if the BM paper authors are claiming contributions. Credit is a major currency in research, and it's important to give it where it is due. If someone did this with one of my papers, I'd be quite upset.
For example (emphasis mine):
> The risks of data memorization, for example, the ability to extract sensitive data such as valid phone numbers and IRC usernames, are highlighted by Carlini et al. [41]. While their paper identifies 604 samples that GPT-2 emitted from its training set, we show that over 1% of the data most models emit is memorized training data. In computer vision, memorization of training data has been studied from various angles for both discriminative and generative models. Deduplicating training data does not hurt perplexity: models trained on deduplicated datasets have no worse perplexity compared to baseline models trained on the original datasets. In some cases, deduplication reduces perplexity by up to 10%. Further, because recent LMs are typically limited to training for just a few epochs
Yes, I agree that's bad but looks like sloppy copy and pasting as opposed to intentional plagiarism to claim contributions. Would it have been okay if they said "they" instead of "we"?
I want to echo the other comments here that low-expense-ratio (<0.25%) funds from Vanguard, Fidelity, Schwab, etc. are all great, stable investments.
My time horizon is longer than 5 years, and I buy broad market index funds split up as follows: 55% US large cap (e.g. VIIIX, VTSAX, SWTSX), 15% US mid cap (e.g. VMCPX), 10% US small cap (e.g. VSCPX), and 20% international (e.g. VTSNX, SWISX, VXUS).
I also highly recommend dollar cost averaging, i.e. buying a fixed dollar amount of your portfolio at fixed intervals. I have my bank do this automatically every 2 weeks. The benefits of dollar cost averaging are that (1) it takes the emotion out of investing, and (2) over a long time window, more of your assets will be purchased at low prices than at high prices (because you're buying a fixed dollar amount every N days, you buy fewer shares when prices are high and more when prices are low).
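The arithmetic behind point (2) is easy to check directly; the prices and budget below are made up for illustration:

```python
# Fixed-dollar purchases at alternating hypothetical prices: $100 each period.
prices = [10.0, 20.0, 10.0, 20.0]   # illustrative share prices per period
budget = 100.0                       # dollars invested each period

shares = sum(budget / p for p in prices)   # fixed dollars buy more when cheap
invested = budget * len(prices)

avg_cost = invested / shares               # your average cost per share
mean_price = sum(prices) / len(prices)     # simple average of market prices

# avg_cost is the harmonic mean of the prices, which never exceeds the
# arithmetic mean: fixed-dollar buying skews purchases toward low prices.
print(f"avg cost {avg_cost:.2f} vs mean price {mean_price:.2f}")
```

With these numbers you pay about $13.33 per share on average against a mean market price of $15.00, because twice as many shares were bought at $10 as at $20.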
I was a software engineer briefly before starting grad school. During that time, I found I didn't have the time to sit down and learn about topics that interested me. I also wanted to be in research-y roles where I could build things that were more experimental and less well understood.
During my PhD, I got to spend time learning, and attending talks/seminars/conferences. Gaining deeper background knowledge in my field as well as learning how to quickly evaluate and explore new ideas gave me the tools to have the type of job I wanted. I'm a research scientist at an industrial lab now and quite enjoy it.
That being said, I agree with the grandparent post that doing a PhD can be a grueling experience. I had to carry the bulk of the work for many of the papers I submitted. If I took a day off, nobody would pick up the slack. Tight deadlines meant the only way to succeed was putting in long hours. My advisors were also spread very thin, so it was difficult to get much time with them. There were times when I felt very alone. This was a really stark contrast to how collaborative engineering in industry was, and I don't think I ever fully adjusted to it. My current job feels like a happy middle ground: I publish papers alongside other people and we split the work.
It’s not the site you’re looking for, but I found https://poolside.fm recently and it’s become one of those quirky corners of the internet that I have come to enjoy. I definitely miss the days of discovering weird specialty sites, and poolside gave me a bit of that new site discovery rush (also the music is great).
No mention of conda for environment management? In my experience, conda is by far the best tool for this, especially when using packages that have non-Python dependencies.
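For instance, a conda `environment.yml` can pin non-Python packages right alongside Python ones; the package choices below are just illustrative:

```yaml
# environment.yml -- illustrative; swap in your own packages
name: myproject
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - ffmpeg            # a non-Python dependency conda installs and manages
  - pip
  - pip:
      - some-pure-python-package   # hypothetical; installed via the pip fallback
```

Then `conda env create -f environment.yml` builds the whole environment, compiled libraries included, without touching the system package manager.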
Hi, it's the author here. You're right, conda is fantastic, and in fact you can use pyenv/pyenv-virtualenv with conda. [1] The article was already getting long, and the goal was a survey from 42,000 feet built around the example that was created.
I also own https://stonks.money and am looking for good ideas for what to do with it