
Curious about the workload, since I'm building a JSON tool myself: what are those files compressed with? What is the average file size? And what is their structure (NDJSON? A dict with some huge data structure a few levels deep)?


In S3 the JSON is stored in plain old .zip files. When we download them locally, they are unzipped to plain JSON. Each file is basically one object containing tons of data about a website I manage, including all the HTML fragments and metadata used on the site. It can get quite large, since some sites have thousands of pages. We often need to find things stored many levels deep in the JSON, and they can be tricky to locate: there usually isn't a specific path, and lots of iterable arrays and objects are involved. The files range from ~20MB to ~400MB, depending on how much content each site has, and we have ~9000 sites in total.
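To make the "no specific path" kind of lookup concrete, here is a minimal Python sketch of a recursive search that walks every nested array and object and reports the path to each match. It is not the tooling we actually use. The names `load_site_json` and `find_matches`, the sample keys (`pages`, `fragments`, `html`), and the assumption that each .zip holds a single .json member are all made up for illustration.

```python
import json
import zipfile


def load_site_json(zip_path):
    """Load the first .json member of a .zip archive.

    Assumption: one JSON document per archive. The whole document is
    parsed into memory, which for a ~400MB file can take considerably
    more RAM than the file size.
    """
    with zipfile.ZipFile(zip_path) as archive:
        name = next(n for n in archive.namelist() if n.endswith(".json"))
        with archive.open(name) as fh:
            return json.load(fh)


def find_matches(node, predicate, path=()):
    """Yield (path, value) for every node where predicate(path, value) is true.

    `path` is a tuple of dict keys and list indexes leading to the node.
    Traversal is depth-first, in document order.
    """
    if predicate(path, node):
        yield path, node
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_matches(value, predicate, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_matches(value, predicate, path + (index,))


if __name__ == "__main__":
    # Hypothetical miniature of one site's document.
    site = {
        "pages": [
            {"url": "/a", "fragments": [{"html": "<div class='promo'>x</div>"}]},
            {"url": "/b", "fragments": [{"html": "<p>hi</p>"}]},
        ],
        "meta": {"title": "promo site"},
    }

    def contains_promo(path, value):
        return isinstance(value, str) and "promo" in value

    for path, value in find_matches(site, contains_promo):
        print(path, "->", value)
```

The predicate receives the path as well as the value, so the same walker can match on key names, on values, or on both. For very deeply nested documents an explicit stack would avoid Python's recursion limit.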



