You didn't need to rewrite in C# to do that. Python should be able to handle streaming that amount/velocity of data fine, at least through a native extension like msgspec or pydantic. Additionally, you made it much harder for other data engineers who need to maintain/extend the project in the future to do so.
The C# is probably far more maintainable and less error prone than Python. At least in my experience that's almost always the case.
I've lost count of the Python jobs I've had that run fine for several hours and then break with runtime errors, whereas with C# you can be fairly sure that if it starts running it will finish running.
Not a language problem, it's a dev culture problem. You can hold your devs accountable for the quality of their code. Stronger typing support via static analysis, as well as runtime validation of untrusted input/data, has really helped Python a lot.
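To make the "runtime validation with untrusted input" point concrete, here is a minimal stdlib-only sketch. The `Event` shape, the `SCHEMA` mapping, and `parse_event` are hypothetical names made up for illustration; libraries like msgspec or pydantic do this far more thoroughly. The idea is just that a bad record fails at the boundary, on the first row, instead of several hours into the job.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape, chosen only for illustration.
    user_id: int
    action: str
    amount: float


# Expected type for each field of the hypothetical Event record.
SCHEMA = {"user_id": int, "action": str, "amount": float}


def parse_event(raw: str) -> Event:
    """Decode one JSON record and reject it immediately if its shape is wrong."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    values = {}
    for name, expected in SCHEMA.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            raise ValueError(f"field {name}: bool is not accepted")
        # JSON has a single number type, so accept an int where a float is expected.
        if expected is float and isinstance(value, int):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(
                f"field {name}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        values[name] = value
    return Event(**values)
```

With the type annotations in place, a static checker can then verify everything downstream of `parse_event`, which is the combination the comment above is describing.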
I'm not necessarily the biggest fan of Python, but writing a data engineering tool in a non-data-engineering-focused language seems like a bad decision. Now, when the OP leaves, the organization is in a much tougher position.
> Now, when the OP leaves, the organization is in a much tougher position.
Are they really, though? You're assuming their org is unfamiliar with C#. Not all data engineers only know Python. The ones I work with mainly use C# because we all do!
I'm a software and data engineer. I work with C# pretty extensively in my software day job. I've never seen a data engineer job listing mention C#.
Additionally, given the way the OP's comment reads, I'm OK with the assumption I made. It reads like it was a unilateral decision on their part and not something that got buy-in from the team.
Yes, that's part of the problem. Deploying Next.js to Cloudflare in the first place used to be an absolute nightmare, let alone the dev experience (I think it's better now).
That doesn't sound too preposterous; I wouldn't assume you'd be able to run a React Router project on Turbopack or Webpack either, and Next.js I think has a way more intricate dependence on the bundler to power a significant chunk of its features.
Definitely irrational. There are lots of logical reasons to dislike Next (like the fact that they pile new shiny bit on top of new shiny bit without caring about the regular user experience) ... but being mad that it can't run on Vite is silly.
It's like being mad that Rails can't run on Python, or that React can't run on jQuery. Next already has its own build system, so of course it doesn't work with another build system.
Luckily DX is much better now with Turbopack as a bundler. First they improved the dev server, now with Turbo builds the production builds are faster as well. Still not fully stable in my opinion, but they will get there.
It's also wise to use monorepo orchestration with build caching like Turborepo.
They did well on the turbo stuff, no doubt about it.
The main bottleneck with big projects in my experience is TypeScript. Looking forward to the Go rewrite. :)
I feel like with custom vector-based styles, you should be able to get pretty dang close to cloning the look of it? Also, subjectively, I find the Protomaps basemap themes to be much nicer.
I feel bad about this comment as of today's news. I love Tailwind and feel like it has supercharged my ability to be productive with CSS, but I recognize that it can be overprescribed.
Look into using DuckDB with remote HTTP/S3 Parquet files. Parquet files are organized as columnar vectors, grouped into chunks of rows (row groups). Each row group stores metadata about the set it contains that can be used to prune out data that doesn’t need to be scanned by the query engine. https://duckdb.org/docs/stable/guides/performance/indexing
LanceDB has a similar mechanism for operating on remote vector embeddings/text search.
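For anyone unfamiliar with how row-group metadata lets an engine skip data, here is a toy sketch in plain Python. It is not DuckDB's or Parquet's actual implementation; `RowGroup`, `make_row_groups`, and `scan_between` are hypothetical names, and a single integer column stands in for a real file. Each group keeps min/max statistics, and a range filter only reads the groups whose statistics overlap the query.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Toy stand-in for a Parquet row group: min/max statistics plus the rows.
    min_value: int
    max_value: int
    rows: list


def make_row_groups(values, group_size):
    """Split a column into fixed-size groups and record min/max for each one."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(min(chunk), max(chunk), chunk))
    return groups


def scan_between(groups, low, high):
    """Return (matching values, number of row groups actually read)."""
    matches = []
    groups_read = 0
    for group in groups:
        # The statistics alone prove no row can match, so skip the group
        # without touching its rows.
        if group.max_value < low or group.min_value > high:
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if low <= v <= high)
    return matches, groups_read


if __name__ == "__main__":
    groups = make_row_groups(list(range(100)), group_size=10)
    matches, groups_read = scan_between(groups, 42, 47)
    print(matches, f"read {groups_read} of {len(groups)} row groups")
```

Pruning like this pays off most when the data is sorted or clustered on the filtered column; if values are scattered randomly, every group's min/max range tends to overlap the query and little gets skipped.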
> Look into using DuckDB with remote HTTP/S3 Parquet files. Parquet files are organized as columnar vectors, grouped into chunks of rows (row groups). Each row group stores metadata about the set it contains that can be used to prune out data that doesn’t need to be scanned by the query engine. https://duckdb.org/docs/stable/guides/performance/indexing
But when using this on the frontend, are portions of files fetched specifically with HTTP range requests? I tried to search for it but couldn't find details.
Yes, you should be able to see the byte range requests and 206 responses from an S3-compatible bucket or HTTP server that supports those access patterns.
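If you want to check this yourself outside the browser, a rough stdlib-only sketch of a single byte-range fetch looks like the following. The helper names (`build_range_request`, `parse_content_range`, `fetch_range`) are made up for illustration, and this is not how DuckDB issues its requests internally; it only shows the `Range` header going out and the `206` status and `Content-Range` header coming back.

```python
import re
import urllib.request
from typing import Optional, Tuple


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for bytes start..end (inclusive) of a remote file."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def parse_content_range(header: str) -> Tuple[int, int, Optional[int]]:
    """Parse 'bytes 0-1023/146515' into (start, end, total); total is None for '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", header.strip())
    if match is None:
        raise ValueError(f"unrecognized Content-Range header: {header!r}")
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch one byte range and insist on a 206 Partial Content response."""
    request = build_range_request(url, start, end)
    with urllib.request.urlopen(request) as response:
        # A server that ignores Range answers 200 with the whole file instead.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        print(parse_content_range(response.headers["Content-Range"]))
        return response.read()
```

Calling `fetch_range` against any Parquet file URL you control should show the same pattern as the network tab: small partial reads rather than one full download. Parquet keeps its footer metadata at the end of the file, so a reader will typically request a range near the end first and then fetch only the row groups it needs.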