on March 8, 2014


So, this article appears to be plagiarized directly (as in, word for word) from Wes McKinney's Python for Data Analysis (which is an excellent, excellent resource):

http://books.google.com/books?id=JtJAkfzds4wC&lpg=PA46&ots=p...

Seeing as the 'author' of this post was the submitter, I'd love to hear an explanation of what happened here.



That's disgusting. They have stolen entire articles word for word. And received upvotes in the process.

In a way, they are being rewarded for stealing other people's work. Sad.


My post below asks the same question. Is this a regular phenomenon w/ PyPix?

Also, I'd post a link to the original chapter here if it was publicly available; it's (of course) much more complete. Check out your local library's e-resources (I found it at Oakland).


This is the first time I've looked at the site, but the other article in the 'Scientific' category, Machine Learning with Python (http://pypix.com/scientific/machine-learning-python/), also appears to be taken from Building Machine Learning Systems with Python (http://books.google.com/books?id=C-yglCEcK0sC&pg=PT31&lpg=PT...).

Unless I'm misunderstanding something, this is pretty awful behavior.


cough content farm cough cough


I'm a researcher and social scientist, and today my tool stack is a complete mess. I use python for crawling and organizing the data that I collect. I then use GIS software to analyze geographic data, R to produce statistical charts and then write documents in LaTeX, integrated with sweave, to pull in outputs from R. In an ideal world I would do this all in python.

    def getdata():
        ...

    def cleanandstoredata():
        ...

    def producecharts():
        ...

    def producetables():
        ...

    def writepaper():
        ...
I hear one of Julia's purported aims is to make something like this possible, though I have not tried it out.


Have you tried the IPython notebook? I don't know about the GIS software, but I do most of what you mention inside my notebooks.


Right, I have not. I'm very much wedded to emacs, and I like the IPython style of working. Maybe one of these days I will set up emacs + IPython and see how far it gets me. The main problem I've found so far is that python is quite bad at advanced statistical analysis, like producing regressions with the right standard errors -- charts and tables are better, but still not there. But maybe things have changed ...


Pandas + Statsmodels for linear regression / GLM / Discrete Choice, ARMA / ARIMA, and various non-parametric ops.
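
For a sense of what those regressions involve, here is a minimal hand-rolled OLS sketch with plain NumPy (the data are made up), computing both the coefficients and the classical standard errors that the parent comments care about. With statsmodels itself this would be roughly `sm.OLS(y, X).fit().summary()`.

    ```python
    # Hand-rolled OLS: coefficients plus classical standard errors.
    # Illustrative synthetic data only.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)
    y = 2.0 + 3.0 * x + rng.normal(size=n)       # true intercept 2, slope 3

    X = np.column_stack([np.ones(n), x])         # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None) # OLS coefficients

    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # classical covariance matrix
    se = np.sqrt(np.diag(cov))                   # standard errors

    print(beta)   # close to [2, 3]
    print(se)
    ```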

For GIS: Fiona (Python OGR operations); Shapely for geometric object manipulation, analysis, and set-theoretic operations; RasterIO (self-explanatory); Descartes and/or Basemap for drawing; PySAL for spatial analysis.

GeoPandas (an unholy, massively useful combination of most of the above) is also under heavy development.
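
To give a flavor of the Shapely piece of that stack, here is a tiny sketch of geometric objects and set-theoretic operations (the coordinates are arbitrary):

    ```python
    # Shapely sketch: point-in-polygon test, buffering, and intersection.
    from shapely.geometry import Point, Polygon

    site = Point(1.0, 1.0)
    parcel = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])

    print(parcel.contains(site))      # -> True
    buffered = site.buffer(1.5)       # 1.5-unit disc around the point
    overlap = parcel.intersection(buffered)
    print(round(overlap.area, 2))     # area of the clipped disc
    ```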


> The main problem I've found so far is that python is quite bad with advanced statistical analysis, like producing regressions with the right standard errors

Can you give me an example? SciPy, NumPy, statsmodels, etc. pretty much handle these.


Looking at the statsmodels Google Summer of Code 2013 announcement,

"Google Summer of Code 2013: We have had two students accepted to work on statsmodels as part of the Google Summer of Code 2013. The first project will focus on improving the discrete choice models, adding, for example, Conditional Logit, Nested Logit, and Mixed Logit models. The second project will focus on time series analysis, including regime-switching models such as SETAR, STAR, and Markov Switching models."

These are basics that I would expect any decent statistical language to have. I can see the potential, but I don't expect python to have the kind of support I'm looking for until many years from now.


Hardcore python user here. It sounds like you haven't used R. The OP is a social scientist and researcher. These things are vastly easier, and more comprehensively implemented and documented, in R. (Upvoted you accidentally in the Android app...) statsmodels is a WIP and has an emphasis on econometrics.


Especially for more cutting-edge, non-machine-learning statistics, R has packages where Python does not. I've found SciPy/NumPy and statsmodels to be an adequate "what if I need a GLM in my Python code?" solution, but less of a great solution for heavy stats work.


With regards to regressions, have you tried statsmodels?


I think I have some good news for you at least (but sadly not about the fact that statistics is easier in R than anything else!). For context, I'm a longtime, daily emacs and python user; I used to use R and ESS more than I do now.

1. You say you use sweave and emacs. So I assume you are using Emacs Speaks Statistics[1]. Running a python REPL in emacs now works very well thanks to Fabian Gallina's python.el, which is in recent releases of emacs 24. I use it as my main python REPL. Use ipython instead of python as the interpreter running inside emacs. For config, see the comments at the top of the python.el file[2].

2. ipython notebook is amazing! [3] But we don't want to write code in a browser. Good news: you can have the ipython notebook in the browser share the same python kernel as a python shell in the terminal or in emacs. The steps are: (1) run `ipython notebook` in a shell and watch for the localhost URL it prints out, (2) open the URL in a browser and start a new notebook, (3) back in the shell, it has just printed the name of the kernel, like `Kernel started: 5df5d119-85b5-4f33-8ba1-1c6b2eaa950f`; copy that, (4) now start a shell ipython sharing the same kernel: `ipython console --existing 5df5d119-85b5-4f33-8ba1-1c6b2eaa950f`. To do (4) in emacs, one way to add the arguments is to pass a prefix argument to run-python. So `C-u M-x run-python` and then edit the interpreter arguments in the minibuffer.
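
As a shell transcript, those steps look something like this (the kernel id is just the example from above; use whatever your own server prints):

    ```shell
    # Terminal 1: start the notebook server; note the kernel id it logs
    ipython notebook
    # ... Kernel started: 5df5d119-85b5-4f33-8ba1-1c6b2eaa950f

    # Terminal 2: attach a console to that same kernel
    ipython console --existing 5df5d119-85b5-4f33-8ba1-1c6b2eaa950f
    ```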

So now you can define variables, functions etc in your shell/emacs ipython, and they are available in the web browser notebook.

3. No need to write code in the browser notebook! Implement functions in .py files in emacs, and in the web browser just import modules and call functions from within them.
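
That edit-in-emacs, import-in-notebook loop can be sketched like this; the module is created on the fly here just so the example is self-contained (in practice it's your own .py file):

    ```python
    # Keep functions in a .py module; in the notebook, import it and
    # reload after each edit in your editor.
    import importlib, pathlib, sys, tempfile

    sys.dont_write_bytecode = True        # always re-read the source
    workdir = tempfile.mkdtemp()
    sys.path.insert(0, workdir)
    mod = pathlib.Path(workdir, "analysis.py")

    mod.write_text("def answer():\n    return 1\n")
    import analysis
    print(analysis.answer())              # -> 1

    # simulate saving an edit in emacs, then reload in the notebook
    mod.write_text("def answer():\n    return 2\n")
    importlib.reload(analysis)
    print(analysis.answer())              # -> 2
    ```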

4. Now, I don't have experience trying to replicate the sweave workflow in ipython notebook, but it is very flexible. See ipython author Min RK's reply here: http://stackoverflow.com/a/13222501/583763. That suggests you can indeed replicate much of sweave functionality. (Typesetting mathematical notation in latex and having MathJax display it in the browser I know works beautifully).

5. You can call R from ipython notebook, see e.g. http://nbviewer.ipython.org/github/ipython/ipython/blob/mast.... Rather than getting fancy, you may just want to have the R graphics device write figures to file explicitly and then work in python with those graphics files on disk.

6. There is also RPy2, I don't have much experience with that.

[1] For those not familiar, ESS is an implementation of an R REPL in emacs with extremely good usability and "feel". I would always use it over the vanilla R shell.

[2] `M-x find-library python` or https://github.com/fgallina/python.el/blob/master/python.el#...

[3] https://github.com/ipython/ipython/wiki/A-gallery-of-interes...


The Orgmode extension to Emacs has quite an elegant, flexible, and powerful way of doing what you want. It can be hard to wrap your mind around at the beginning but has a good payoff once you're up to speed. Here's a video showing how one guy uses it with Python (though it integrates with a multitude of languages, including R): http://www.youtube.com/watch?v=1-dUkyn_fZA
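
A minimal sketch of that workflow: an Org file with a Babel source block whose results get inserted back into the document (the header arguments are standard Babel ones):

    ```org
    #+TITLE: Analysis notes

    #+BEGIN_SRC python :results output :exports both
    print("mean:", sum([1, 2, 3]) / 3)
    #+END_SRC
    ```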

Here's a paper (itself produced from LaTeX generated by Orgmode) describing the use of Orgmode for this kind of "reproducible research": http://www.jstatsoft.org/v46/i03/paper


Python is quite good for this, actually. Julia is fantastic, but the toolchain isn't as mature. (E.g., GIS functionality is more or less absent in Julia, as far as I know.) To be fair, R is also fantastic for a lot of it.

At any rate, have a look at the pgf backend for matplotlib, cartopy, shapely, etc. Feel free to drop me a line if you want suggestions, examples, etc.
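
For the pgf backend mentioned above, a matplotlibrc fragment along these lines selects it (these option names are the backend's standard rcParams; you'll need a working TeX installation for the output):

    ```
    # matplotlibrc fragment for LaTeX-native figures via pgf
    backend       : pgf
    pgf.texsystem : pdflatex
    pgf.rcfonts   : False
    ```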


Julia is indeed designed to give you the power of R along with much of the ease of python, although it's a bit off from doing that in terms of packages and extensions. I would suggest that if you know python, Julia would be easy to pick up.

A pure python alternative would be to use Sage (http://sagemath.org/), which wraps a lot of the Python math and science packages into a cohesive set of APIs. I use that a lot and enjoy it. I have yet to find a way to do the document generation in LaTeX, although I think with some templating it should be easy (e.g. jinja2 or something, emitting LaTeX instead of HTML). Charts are pretty easy using matplotlib, for example. There's also rpy (http://rpy.sourceforge.net/rpy2.html), which binds Python and R.
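
The templating idea can be sketched with the stdlib's string.Template standing in for jinja2 (the table and numbers here are made up):

    ```python
    # Emit a LaTeX fragment from computed results via a template.
    from string import Template

    template = Template(r"""\begin{tabular}{lr}
      Mean & $mean \\
      N    & $n \\
    \end{tabular}
    """)

    data = [2.0, 4.0, 6.0]
    fragment = template.substitute(mean=sum(data) / len(data), n=len(data))
    print(fragment)   # LaTeX table with the computed values filled in
    ```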


Interesting coincidence - a colleague of mine just shared this link with me: http://www.wolfram.com/data-science-platform/

Seems to be in line with what you are looking for.


Great share, thanks!

Note: the 'NotebookCloud' tag near the bottom (in section: 'IPython HTML Notebook') points to a 404 page (http://notebookcloud%20.appspot.com/).


Great guide; learning IPython and ipdb (or pdb) is crucial to good debugging and other quick tasks in python.

Note, though, that %timeit is also good for slow code, since it iterates fewer times (usually 3) when the statement is slow.
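
%timeit is an IPython magic; the stdlib timeit module below shows the same idea, where a small `number` keeps slow statements from taking forever (the statement and counts are just illustrative):

    ```python
    # Stdlib version of what %timeit does: run a statement a few times
    # and report the best total. For slow code, keep `number` small.
    import timeit

    slow_stmt = "sum(i * i for i in range(100_000))"
    best = min(timeit.repeat(slow_stmt, repeat=3, number=3))
    print(f"best of 3 runs (3 loops each): {best:.4f}s")
    ```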


This is great, but I would recommend anyone doing a "beginner's" guide for python tools like this first refer the reader to virtualenv.


What? The article title is "IPython: The Complete Beginners Guide", it serves its purpose.


Why virtualenv?



