Have you read the book by its author, "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython" by Wes McKinney (2017)?
I think it is a bit mean to say that a package as popular as this has “no design philosophy”. You should read about its design philosophy before making that comment.
In my experience, I went all in when I discovered pandas and then dialed it back (partly because I was inexperienced at the time).
Pandas is most useful for exploratory data analysis. It follows something like the philosophy of working in the terminal (with UNIX pipes, etc.): explore things quickly. That’s why you see people in the wild chaining tons of methods together, much like people writing terminal one-liners that chain a lot of pipes.
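To illustrate the pipe analogy, here is a minimal sketch of that chaining style. The data and column names are made up for illustration; the point is that each method returns a new DataFrame, so each step feeds the next the way `cmd | cmd | cmd` does in a shell.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima", "Pune"],
    "temp_c": [3.0, 5.0, 19.0, 21.0, 30.0],
})

# Each method returns a new DataFrame, so the steps chain like shell pipes
summary = (
    df
    .assign(temp_f=lambda d: d["temp_c"] * 9 / 5 + 32)  # add a derived column
    .query("temp_c > 4")                                 # filter rows
    .groupby("city", as_index=False)["temp_f"].mean()    # aggregate per group
    .sort_values("temp_f", ascending=False)              # order the result
)
print(summary)
```

This is great for poking at data interactively, but a long chain like this can also be hard to debug once it grows, which is one reason to rewrite it later.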
It is also useful as a dictionary container; in fact, in terms of API you can treat a DataFrame as a dictionary of dictionaries of values. Conversely, if you have an internal structure that is a dict of dicts of values, you can convert it to a DataFrame as a drop-in replacement (I’ve done that when working with software that does not use pandas).
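A minimal sketch of the dict-of-dicts correspondence, using made-up data: `pd.DataFrame` turns the outer keys into columns and the inner keys into row labels, and `DataFrame.to_dict()` (default `orient="dict"`) returns the same `{column: {row: value}}` shape.

```python
import pandas as pd

# A plain dict of dicts: outer keys -> columns, inner keys -> row labels
scores = {
    "math": {"alice": 90, "bob": 72},
    "art": {"alice": 85, "bob": 95},
}

df = pd.DataFrame(scores)

# Nested-dict style lookup works the same way on both objects
assert scores["math"]["bob"] == df["math"]["bob"] == 72

# Membership tests on the outer level check column labels, like dict keys
assert ("math" in df) and ("math" in scores)

# And the DataFrame converts back to the same dict-of-dicts shape
assert df.to_dict() == scores
```

The correspondence is not exact, though: for example, `len(df)` counts rows, while `len(scores)` counts outer keys (columns), so code that relies on such details needs checking before swapping one for the other.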
For simple things that one has prototyped, the pandas code can be left as is for “production”.
But for more complicated things, one should “productionize” it using easier-to-understand and/or more performant logic.
One mistake when using pandas is to treat it as your “data container”, as if the table itself were self-explanatory. In my experience, I’ve been confused by tables I saved in the past. So now I write classes that have a to_frame method, so that my internal data structure can be converted to a DataFrame for further exploration if needed.
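Here is a minimal sketch of that pattern. The classes `Measurement` and `MeasurementLog` and their fields are hypothetical, made up for illustration; only the idea of a `to_frame` method comes from the text above. The class (with its docstrings and comments) stays the documented source of truth, and the DataFrame is just a view for exploration.

```python
from dataclasses import asdict, dataclass, field

import pandas as pd


@dataclass
class Measurement:
    """One sensor reading (hypothetical); the fields document what each column means."""
    sensor_id: str
    value: float  # hypothetical unit: degrees Celsius
    ok: bool = True  # False if the reading failed validation


@dataclass
class MeasurementLog:
    """Hypothetical internal structure; the class, not a saved table, is the source of truth."""
    readings: list[Measurement] = field(default_factory=list)

    def add(self, sensor_id: str, value: float, ok: bool = True) -> None:
        self.readings.append(Measurement(sensor_id, value, ok))

    def to_frame(self) -> pd.DataFrame:
        """Flatten the readings into a DataFrame for ad hoc exploration."""
        # Passing columns explicitly keeps the order stable, even for an empty log
        return pd.DataFrame(
            [asdict(r) for r in self.readings],
            columns=["sensor_id", "value", "ok"],
        )


log = MeasurementLog()
log.add("a1", 21.5)
log.add("b2", -999.0, ok=False)
frame = log.to_frame()
print(frame[frame["ok"]])
```

Because the DataFrame is derived on demand, it can always be regenerated from the class, so there is no bare saved table whose meaning has to be remembered later.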