Looks like you handle CSVs and non-GitHub links to datasets. You should add transparent handling of gz/zip files, as well as xls. That would be useful for poking around government datasets.
Yeah, agreed. We do actually handle xls files in our main product (this is sort of a demo of an API connection). Probably should have enabled that for this little implementation.
Can't yet take gz/zip files, probably should though.
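For the gz/zip point, transparent handling can be fairly small if you detect the format by magic bytes instead of file extension (so mislabeled downloads still work). A minimal sketch, assuming a single data file per zip archive; this is a hypothetical helper, not the product's actual code:

```python
import gzip
import io
import zipfile

def open_text(path):
    """Open a possibly-compressed file and return a text stream.

    Detects gzip and zip by their magic bytes rather than the file
    extension. Hypothetical sketch: assumes UTF-8 and, for zip,
    one data file per archive.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic[:2] == b"\x1f\x8b":        # gzip magic number
        return io.TextIOWrapper(gzip.open(path, "rb"), encoding="utf-8")
    if magic == b"PK\x03\x04":          # zip local-file header
        zf = zipfile.ZipFile(path)
        inner = zf.namelist()[0]        # assumption: first entry is the data
        return io.TextIOWrapper(zf.open(inner), encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

The caller then treats every input as a plain text stream, whatever the compression.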
Is there any limitation on the delimiter? I often choose less frequently used characters like * as the delimiter so it won't collide with characters inside the values. It would be nice to have an option to specify the delimiter, or even detect it automatically. The same goes for headers.
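Delimiter and header detection of the kind asked about here can be done with the stdlib's `csv.Sniffer` on a sample of the file. A sketch, assuming you pass the candidate delimiters you want considered (including rare ones like `*`); this is not necessarily how the API does it:

```python
import csv
import io

def sniff_csv(text, candidates=",;\t|*"):
    """Guess the delimiter and header presence from a sample of a CSV.

    Uses csv.Sniffer; '*' is included among the candidates since some
    people deliberately pick rare characters as delimiters. Returns
    (delimiter, has_header, parsed_rows).
    """
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(text, delimiters=candidates)
    has_header = sniffer.has_header(text)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, has_header, rows
```

In practice you'd sniff only the first few KB of the file rather than the whole thing.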
Things will definitely start slowing down roughly linearly past 100k lines, but we often see millions, and most files shouldn't break us as long as they're under ~500MB.
Nope. Folks who sign up for the API generally have some awareness of what size files work and what don't. And since they're uploading for their users, we're aligned around wanting those users to have a good experience.
We haven't worried about it in this particular API integration because we didn't run across many large raw GitHub files. And even when we do get the odd big one, we just refuse to process it once we've received it, so it doesn't hurt us much if someone sends a few GB.
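One way to make that refusal cheaper is to enforce the cap while the bytes are still arriving, so a multi-GB upload costs at most the limit in transfer rather than being rejected only after full receipt. A minimal sketch over a binary stream; hypothetical helper, not the actual ingestion code:

```python
def read_capped(stream, limit, chunk=1 << 16):
    """Read a binary stream, aborting as soon as it exceeds `limit` bytes.

    Checking per chunk means oversized inputs are refused early instead
    of after the whole file has been buffered. Hypothetical sketch.
    """
    buf = bytearray()
    while True:
        piece = stream.read(chunk)
        if not piece:
            break
        buf += piece
        if len(buf) > limit:
            raise ValueError(f"input exceeds {limit}-byte limit")
    return bytes(buf)
```

For HTTP fetches you could additionally bail out early on a `Content-Length` header above the limit, before reading any body at all.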