zurawiki's comments

zurawiki · on April 11, 2024

Bun.sh advertised JSC perf improvements, some listed under the “Various Miscellaneous Optimizations for Real World Use Cases“ section.

How much of an effect those optimizations had the Speedometer score (if any)?

The ~6% + the few ~1% changes don’t add up to the 60% score increase mentioned.

zurawiki · on March 8, 2024

I'm really happy to see we're exploring ways to make SQL more maintainable. TypeScript did a lot for the JS ecosystem, giving us nice autocomplete and making many kinds of bugs impossible to type-check, adding this kind of type-system to SQL makes a bunch of sense.

I selfishly hope this can smooth over all the missing SQL functionality in redshift

Very cool work team!

cotera-grant · on March 8, 2024

> I selfishly hope this can smooth over all the missing SQL functionality in redshift

Hi! I'm Grant, I work at Cotera and wrote most of the warehouse compatibility stuff for NASTY

Redshift is the bane of my existence. It was definitely the hardest warehouse to write a NASTY compatible SQL gen for.

A couple of annoyances that immediately come to mind.

1. Redshift Query Planner does wild stuff

At Cotera we'll typically develop analytics libraries on one warehouse working closely with a customer and use the same library for other customers afterwards. A library will go to prod on one warehouse and then get start running on others as new customers with different warehouses want the functionality.

Moving a library between Snowflake, BigQuery an Postgres is almost never a problem performance wise. In, Redshift the semantics will be correct but performance can unexpectedly fall off a cliff for innocuous stuff. We typically write a bunch of unit tests so it's pretty easy to refactor, but I've been shocked at the things that Redshift can't optimize that everyone else had no problem with

2. Redshift does silly stuff with types of literals.

  with cte as (select *, 'foo' as "bar" from "cotera_data".foo) select coalesce("bar", 'baz') from cte;

Fails with the error `[XX000] ERROR: failed to find conversion function from "unknown" to text` Because 'foo' is passed as `any` type...

this fixes is but the error is bizarre and shows up way far away from the problem

  with cte as (select *, 'foo'::text as "bar" from "cotera_data".foo) select coalesce("bar", 'baz') from cte;

(NASTY now fixes this for you when it detects it will happen)

3. The `super` type breaks referential transparency

Here's just one head scratching example, but there are many super type limitations

  -- Allowed
  with bar as (select (json_parse('{"a": 1}')) as foo) select foo.a from bar;

  -- Not allowed
  select (json_parse('{"a": 1}')).a as foo

  -- [0A000] ERROR: applying array subscript on complex expression of SUPER type is currently not supported

4. Leader Only vs Compute Node Functions. Basic things like `generate_series` blow up in surprising ways

From the NASTY source code for Redshift

  // Valid redshift
  // ```
  // select generate_series(0, 10);
  // ```
  //
  // Not valid redshift
  // ```
  // -- Inserts run on compute nodes
  // insert into foo (a) (
  //     -- Leader only function
  //     select generate_series(0, 10) as a
  // )
  // ```
  //
  // This is because `generate_series` is a leader only function, so it can’t be run on worker nodes
  // https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_functions_leader_node_only.html
  // https://docs.aws.amazon.com/redshift/latest/dg/c_sql-functions-leader-node.html
  // https://stackoverflow.com/questions/62716606/redshift-loading-data-issue-specified-types-or-functions-one-per-info-message
  // https://stackoverflow.com/questions/17282276/using-sql-function-generate-series-in-redshift#comment96402527_22782384
  //
  // Recurive CTEs are NOT supported in subqueries
  // ```
  // -- Not valid
  // select \* from (
  //    with recursive t(n) as (
  //        select 1::integer union all select n + 1 from t where n < 100
  //    ) select n from t
  // );
  // ```
  // To get around this, we can use the approach outlined by how dbt does ansi sql generate_series

  // https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/generate_series.sql
  const numbers = (upperBound: number) => {
    if (upperBound > 2 ** 11) {
      throw new Error(
        `We only support generating series in Reshift where the upperBound is less than ${
          2 ** 11
        }`
      );
    }

    return `
  (
    with p as (
      select 0::integer as generated_number union all select 1::integer
    ),
      unioned as (
      select
        (   p0.generated_number * power(2, 0) 
  // ... Omitted for brevity
         +  p11.generated_number * power(2, 11) 
        ) as generated_number
      from
        p as p0
        cross join p as p1
  // ... Omitted for brevity

        cross join p as p11
    )
    select generated_number::integer from unioned where generated_number <= ${upperBound} order by generated_number
  )
  `;
  };

zurawiki · on Jan 11, 2024

What about just using Linear or another off-the-shelf task management software?

Seems like most of the features are covered there

jpochtar · on Jan 11, 2024

Totally— OpsQueue is for custom internal React apps to have integrated task management. If workers can switch between Linear and your internal apps, or you don’t have a custom internal React app, you should use Linear.

In some ways OpsQueue’s goal is to be a headless, API-first Linear.

zurawiki · on Sept 19, 2023

Excited to see the progress here on the public launch!

Stacking was a meaningful time-saving as a FAANG sw eng. Excited to see stacking grow within the great tech community

tragic95 · on Sept 19, 2023

Agree. After leaving FB, I was surprised to see no open source tool recreate its stacking workflow.

Graphite is a god send in my post FB career!

Xiulung · on Sept 19, 2023

Thanks for the support @zurawiki

Agreed! We think this is a huge performance boost and why we think any engineer should benefit from stacking!

zurawiki · on May 22, 2023

Happy to see the tide turning on microservices. Monoliths make sense for small teams in terms of development and operational complexity.

I was pleasantly surprised to see that AWS offers EC2 pretty beefy instances these days. 64 CPU cores, 128+ GB RAM should scale to most of what a small team would ever need...

rbanffy · on May 22, 2023

The goal of microservices isn't to have many (no pun intended) "micro" "services" doing each a little bit of work, but to allow an organization to deploy parts of an application separately.

Even if all the two 64-core 128GB nodes on the cluster run copies of all the two services.

zurawiki · on May 22, 2023

When an organization is large and siloed, there is value in decoupling different parts of the application, assuming the ownership and expertise are properly distributed.

As the comment below mentions, microservices is a fashion–one that took off among overly-staffed eng teams. Having separate parts of the app to deploy can be a liability for a small team frequently deploying breaking changes.

rbanffy · on May 23, 2023

> When an organization is large and siloed

I'd argue that lose coupling is a core principle of good software architecture. Bing able to deploy separately only makes it more convenient to manage.

> frequently deploying breaking changes

That's never a good sign. Frequently deploying, yes. Frequently breaking, not so much.

zurawiki · on March 18, 2023

In case others find this useful. Just updated for GPT-4, tiktoken-rs is a Rust crate you can use to tokenize text sent to and from OpenAI's LLM models

zurawiki · on March 3, 2023

The idea behind gptee is to bring composability to language models at the shell script level. In line with the unix philosophy, we simplify the LLM interface to just stdin and stdout

https://en.wikipedia.org/wiki/Unix_philosophy

zurawiki · on Nov 17, 2021

> This idea isn't new, and if you've worked at a company like Facebook, Google, Uber, Dropbox, or Lyft (among others), you've either had first-class support for stacked changes built into your code review platform (i.e. Phabricator at Facebook...

Phabricator got a lot of stuff about code review right (I may be biased from working at FAANG). I'm happy to see someone trying to improve the code review situation on GitHub. When I did use GitHub at the startups I worked at, we were never fully satisfied with the code review experience.

I hope they surface "Phabricator versions" as first-class objects. I never liked it how GitHub just clobbers the history left behind from force-pushes from to branch.

Denvercoder9 · on Nov 17, 2021

Gitlab got this right, Merge Requests are versioned and you can look at the differences between versions of a MR.

It's somewhat amazing and shocking how little the review experience on GitHub has improved over the last 10 years.

masklinn · on Nov 18, 2021

Honestly what i want is for github to purchase https://reviewable.io/ and make it native.

Maybe gate sone of the more advanced features behind an opt-in so beginners are not too overwhelmed (and because you probably don’t need them on smaller projects), but the current state of things is just appalling, and more importantly has not at all kept up with workflows on large projects or long / complicated PRs.

The complete lack of support for reviewing over force pushes is insane.

gibolt · on Nov 17, 2021

The ability to quickly manage branches and reorder commits within them in Mercurial is super powerful, something that Git and `rebase -i` could only dream of.

Doing the same in Phabricator is awesome. One blocking commit PR on the bottom of a stack can easily be moved to the top or a new stack, unblocking the rest of your changes. Re-writing and entire stack due to one small change is mostly eliminated.

(Good) Tooling is a major gain in efficiency.

tadfisher · on Nov 18, 2021

> The ability to quickly manage branches and reorder commits within them in Mercurial is super powerful, something that Git and 'rebase could only dream of.

This is a git CLI wart more than anything. Magit (git interface in Emacs) makes juggling branches/worktrees and rewriting history intuitive to humans. I feel super-powerful in Magit and absolutely feeble using the CLI.

surajrmal · on Nov 18, 2021

I completely agree. When Google decided to invest in a mercurial front end to their version control system a few years ago, I was confused, but after using it for a while, I can honestly say it has much better ergonomics for common workflows. Creating a tree of dependent commits is simple and trivial. I sometimes wish I could create a dag instead, but that's probably more trouble than it's worth.

xapata · on Nov 17, 2021

One thing Phabricator did/does worse than GitHub is the interface for browsing the individual commits that make up a PR. I like being able to tell a story with the sequence of commits and their messages.

baby · on Nov 17, 2021

I honestly don't feel like Github did a good job there either, I mean at least on the web UI because the VSCode Github extension helps a lot (when it's not buggy).

Lhiw · on Nov 18, 2021

Gerrit is the only thing I've seen do change exploration well.

Github is annoying as crap. Can't even comment on a line in the file that needs changing but was missed in the PR, can only comment on a tual changes.

Phab is god awful, it's like someone built that whole system with a purist mindset, everything is perfectly designed with perfect referential IDs. But every screen is a view into the db table, no screens join information and make it usable. This is worse when you have n:m relationships because they expose those tables via screens too, it's just plain awful.

disgruntledphd2 · on Nov 18, 2021

> Phab is god awful, it's like someone built that whole system with a purist mindset, everything is perfectly designed with perfect referential IDs. But every screen is a view into the db table, no screens join information and make it usable. This is worse when you have n:m relationships because they expose those tables via screens too, it's just plain awful.

I actually like Phabricator, but the DB schema is insane. Like, literally insane. I spent hours trying to figure it out once before giving up in disgust.

zurawiki · on Feb 22, 2016

Facebook still delivers ads on their platform sites, and links your browser (not just cookies) to your identity. FWIW, Google isn't that much different