Looks like you handle CSVs and non-GitHub links to datasets. You should add transparent handling of gz/zip files, as well as xls. That would be useful for poking around government datasets.
Yeah, agreed. We do actually handle xls files in our main product (this is sort of a demo of an API connection). Probably should have enabled that for this little implementation.
Can't yet take gz/zip files, probably should though.
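For the gz/zip point, transparent handling can be fairly small if you detect the format by magic bytes instead of file extension (so mislabeled downloads still work). A minimal sketch, assuming a single data file per zip archive; this is a hypothetical helper, not the product's actual code:

```python
import gzip
import io
import zipfile

def open_text(path):
    """Open a possibly-compressed file and return a text stream.

    Detects gzip and zip by their magic bytes rather than the file
    extension. Hypothetical sketch: assumes UTF-8 and, for zip,
    one data file per archive.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic[:2] == b"\x1f\x8b":        # gzip magic number
        return io.TextIOWrapper(gzip.open(path, "rb"), encoding="utf-8")
    if magic == b"PK\x03\x04":          # zip local-file header
        zf = zipfile.ZipFile(path)
        inner = zf.namelist()[0]        # assumption: first entry is the data
        return io.TextIOWrapper(zf.open(inner), encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

The caller then treats every input as a plain text stream, whatever the compression.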
Is there any limitation on the delimiter? I often choose less frequently used characters like * as the delimiter so it won't collide with characters inside the values. It would be nice to have an option to specify the delimiter, or even detect it automatically. The same goes for headers.
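Delimiter and header detection of the kind asked about here can be done with the stdlib's `csv.Sniffer` on a sample of the file. A sketch, assuming you pass the candidate delimiters you want considered (including rare ones like `*`); this is not necessarily how the API does it:

```python
import csv
import io

def sniff_csv(text, candidates=",;\t|*"):
    """Guess the delimiter and header presence from a sample of a CSV.

    Uses csv.Sniffer; '*' is included among the candidates since some
    people deliberately pick rare characters as delimiters. Returns
    (delimiter, has_header, parsed_rows).
    """
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(text, delimiters=candidates)
    has_header = sniffer.has_header(text)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, has_header, rows
```

In practice you'd sniff only the first few KB of the file rather than the whole thing.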
Things will definitely start slowing down roughly linearly past 100k lines, but we often see millions, and most files shouldn't break us as long as they're under ~500MB.
Nope. Folks who sign up for the API generally have some awareness of what size files work and what don't. And since they're uploading for their users, we're aligned around wanting those users to have a good experience.
We haven't worried about it in this particular API integration because we didn't run across many large raw GitHub files. And even when we do get the odd big one, we just refuse to process it once we've received it, so it doesn't hurt us much if someone sends a few GB.
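One way to make that refusal cheaper is to enforce the cap while the bytes are still arriving, so a multi-GB upload costs at most the limit in transfer rather than being rejected only after full receipt. A minimal sketch over a binary stream; hypothetical helper, not the actual ingestion code:

```python
def read_capped(stream, limit, chunk=1 << 16):
    """Read a binary stream, aborting as soon as it exceeds `limit` bytes.

    Checking per chunk means oversized inputs are refused early instead
    of after the whole file has been buffered. Hypothetical sketch.
    """
    buf = bytearray()
    while True:
        piece = stream.read(chunk)
        if not piece:
            break
        buf += piece
        if len(buf) > limit:
            raise ValueError(f"input exceeds {limit}-byte limit")
    return bytes(buf)
```

For HTTP fetches you could additionally bail out early on a `Content-Length` header above the limit, before reading any body at all.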