This dataset is part of the AWS Open Data program, so it is freely available from S3. Because the API runs within AWS, you get a massive latency and bandwidth advantage.
So with GribStream you are pushing the computation closer to where the big data is and only downloading the small data. And GribStream uses a custom grib2 parser that allows it to extract the data in a streaming fashion, using very little memory.
It makes a huge difference if you need to extract timeseries of a handful of coordinates for months at a time.
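To make the "streaming, very little memory" idea concrete, here is a minimal sketch in Python. It is not GribStream's actual parser. It assumes the decoded grid values arrive as an iterable of chunks and that you already know the flat grid indices of the coordinates you want; `pick_values` and its parameters are hypothetical names made up for illustration.

```python
from typing import Dict, Iterable, List, Set


def pick_values(chunks: Iterable[List[float]], wanted: Set[int]) -> Dict[int, float]:
    """Scan grid values chunk by chunk and keep only the wanted flat indices.

    Hypothetical helper: the full grid is never held in memory, and the scan
    stops as soon as the highest wanted index has been passed.
    """
    found: Dict[int, float] = {}
    if not wanted:
        return found
    last = max(wanted)
    pos = 0  # flat index of the first value in the current chunk
    for chunk in chunks:
        for offset, value in enumerate(chunk):
            idx = pos + offset
            if idx in wanted:
                found[idx] = value
        pos += len(chunk)
        if pos > last:
            break  # everything we need has been seen, stop consuming input
    return found
```

If `chunks` is a generator fed by a network response, breaking out of the loop means the remaining bytes never have to be read, so memory use is bounded by one chunk plus the handful of values kept.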
Grib2 files are much more compact than netCDF, just less convenient to use. But GribStream takes care of that and just returns you the timeseries for the coordinates you need.
Besides using the usual index files to make HTTP range requests only for the weather parameters of interest, GribStream also avoids creating big memory buffers to decode/decompress the whole grid. It decodes in a streaming fashion and only accumulates the values being looked up, which is what makes it so efficient. It doesn't even finish downloading the partial grib file; it aborts early. It also skips many headers and parts of the grib2 format that are not really required or that can be assumed constant across the whole dataset. In other words, it cuts every corner it can, and the parser is (currently) specifically optimized for the NBM and GFS datasets.
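For anyone who hasn't used the index files: a rough sketch of the range-request part in Python, assuming the usual colon-separated `.idx` layout (message number, start byte, date, variable, level, forecast). `byte_ranges` and `range_header` are hypothetical helpers for illustration, not GribStream code, and the sample index below is made up.

```python
from typing import List, Optional, Set, Tuple


def byte_ranges(idx_text: str, wanted: Set[Tuple[str, str]]) -> List[Tuple[int, Optional[int]]]:
    """Return (start, end) byte ranges for the wanted (variable, level) pairs.

    Each message ends one byte before the next message starts; the last
    message has no known end, so its end is None (read to end of file).
    """
    rows = []
    for line in idx_text.splitlines():
        parts = line.split(":")
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        rows.append((int(parts[1]), parts[3], parts[4]))

    ranges: List[Tuple[int, Optional[int]]] = []
    for i, (start, var, level) in enumerate(rows):
        if (var, level) not in wanted:
            continue
        end = rows[i + 1][0] - 1 if i + 1 < len(rows) else None
        ranges.append((start, end))
    return ranges


def range_header(start: int, end: Optional[int]) -> str:
    """Format an HTTP Range header value for one byte range."""
    return f"bytes={start}-{'' if end is None else end}"


# Made-up index content in the assumed layout.
SAMPLE_IDX = """1:0:d=2024010100:PRMSL:mean sea level:anl:
2:990253:d=2024010100:TMP:2 m above ground:anl:
3:1500000:d=2024010100:UGRD:10 m above ground:anl:
"""
```

With that, `byte_ranges(SAMPLE_IDX, {("TMP", "2 m above ground")})` gives the single range to send in a `Range` header, so only that one message is requested instead of the whole file.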
I do intend to support several others, though, like the Rapid Refresh (RAP) model.
And because this process runs close to the data (in AWS), it can do all of this way faster than you could by running it anywhere else.
This is a little different though.
This dataset is part of the AWS Open Data program, so it is freely available from S3. Because the API runs within AWS, you get a massive latency and bandwidth advantage.
So with GribStream you are pushing the computation closer to where the big data is and only downloading the small data. And GribStream uses a custom grib2 parser that allows it to extract the data in a streaming fashion, using very little memory.
It makes a huge difference if you need to extract timeseries of a handful of coordinates for months at a time.
Cheers!