
If you need anything more complicated than simple $var substitution, it's time to use a general-purpose scripting language with appropriate libraries to generate your data structure. A half-baked template DSL will never work.


Why not buy a giant display panel and connect it to a computer that you fully control?


This. I do something similar, but instead got a laser projector. Similar price to a TV. I can either plug in my laptop or cast remotely to a plugged-in stick. It just works. I get the added benefit from the projector that the screen needs to be pulled down - it's not in view, and I have to actively decide to start watching something.


I didn't think of projectors because I don't have a good set-up for it, but that's another great option alongside dumb TVs, TVs that work without internet, and monitors/digital signage. Starting with a mini PC or something similar unlocks plenty of good options for avoiding all of the smart TV issues.


Also you don't even need an extremely dedicated space these days. Laser projectors like https://www.epson.com.au/LS12000B/ give a better image than some TVs... without closing the curtains during the day. (Based on my older model, I assume they have only gotten better.)


I do a version of this approach. I have a 2018 model TCL TV that I do not give access to the internet (this one works without any issue, but I don't know if that is true of newer models). My TV is right next to my desktop computer, but in order to reduce noise and power, I bought a cheap Beelink mini PC and set it up with Fedora-GNOME and Kodi. Works very well for simple purposes and can easily handle light games. There are plenty of other (possibly better) solutions to this effect such as SBCs, spare laptops, or more capable mini PCs to handle some newer games. I would prefer using a computer instead of the Smart TV OS even without all of the concerns listed in the original post.


What's an example of a "giant display panel" that you can buy?


I don't seem to understand where OpenAI's market segment ends and Azure's begins.


It's helpful to think of OpenAI as Microsoft's R&D lab for AI without the political and regulatory burdens that MSR has to abide by. Through that lens, it's really all just the same thing. There is no endgame for OpenAI that doesn't involve being a part of Microsoft.


IIRC it is impossible for OpenAI to become part of Microsoft since the incorporation documents of the for-profit bit of OpenAI prevent anyone from having a majority of the shares (except the non-profit foundation, of course).


Yes, their corporate structure is unprecedented. Very weird and unintuitive.


I'm quite positive that can be addressed when the time comes.


There will probably be overlap. If you are an Azure customer you use Azure; if not, you use OpenAI.


It's Azure all the way down. The OpenAI stuff is certainly hosted on Azure.


This should give a second life to Test-Driven Development.

One of the under-appreciated insights of TDD is that in many problems there is a complexity asymmetry between finding a solution and (fully or partially) verifying it. Examples of asymmetric problems: inverting matrices, sorting an array, computing a function's gradient, compressing a byte stream, etc.

The human writes the easier part -- the test suite -- and the language model writes the harder part -- the solution. This can be a net gain in productivity.
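As a sketch of that asymmetry, consider sorting: the checker is a few lines and runs in near-linear time, while a correct, efficient sort is the harder artifact. The `sortFn` argument below stands in for whatever solution a model might produce.

```javascript
// Checking sortedness is a one-liner.
function isSorted(arr) {
  return arr.every((x, i) => i === 0 || arr[i - 1] <= x);
}

// A full check also needs the output to be a permutation of the input.
function sameMultiset(a, b) {
  const key = (xs) => [...xs].sort().join('\u0000');
  return key(a) === key(b);
}

// The test-suite side of the asymmetry: cheap to write, cheap to run.
function verifySort(sortFn, input) {
  const out = sortFn([...input]);
  return Array.isArray(out) && isSorted(out) && sameMultiset(input, out);
}
```

Any candidate implementation, human- or model-written, can be accepted or rejected by `verifySort` without reading a line of it.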


The problem is that without a formal definition of the program's semantics, you run the risk of overfitting, or of uncovered behaviors that would be implicit for a human developer who understands the intent of the program.

And given how hard formal verification is, I don't know that you'll ever get away with not having to manually check these programs, at which point I question just how much productivity you've gained.

It's kinda like self-driving cars: when they work, they work great. But when they fail, they fail in ways a human never would, and therefore a human struggles to anticipate or trust their behaviour.

That said, I'm waiting to see the rise of programming languages designed with LLMs in mind, where a human could use contract-oriented programming or similar (think: Ada) combined with TDD methods to more formally specify the problem that an LLM is being asked to solve.
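A low-tech approximation of that idea works even in today's languages: wrap the model-written function in executable pre- and postconditions. This is only a sketch of the contract concept, not any particular library's API.

```javascript
// A lightweight stand-in for language-level contracts: wrap a function in
// pre- and postconditions that are checked on every call.
function withContract(fn, pre, post) {
  return (...args) => {
    if (!pre(...args)) throw new Error('precondition violated');
    const result = fn(...args);
    if (!post(result, ...args)) throw new Error('postcondition violated');
    return result;
  };
}

// Example: an integer square root whose postcondition pins down the result
// exactly -- a much stronger spec than a handful of example-based tests.
const isqrt = withContract(
  (n) => Math.floor(Math.sqrt(n)),
  (n) => Number.isInteger(n) && n >= 0,           // precondition
  (r, n) => r * r <= n && (r + 1) * (r + 1) > n   // postcondition
);
```

A postcondition like this one fully characterizes correct outputs, so an LLM-written implementation could be checked against it on every input it ever sees, not just the cases someone thought to write down.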


> I question just how much productivity you've gained.

Me too. It's an empirical question to be answered by those who will dare to try.

> It's kinda like self-driving cars

Strong disagree. Yes, neural nets are blackboxes, but the generated code can be idiomatic, modular, easy to inspect with a debugger, etc.

> more formally specify the problem that an LLM is being asked to solve.

That would be a great direction to explore.


> Strong disagree. Yes, neural nets are blackboxes, but the generated code can be idiomatic, modular, easy to inspect with a debugger, etc.

I think you missed my point.

If I'm inspecting code from another human, I'm going to make assumptions about the kinds of errors they're gonna make. There's probably obvious dumb stuff I won't look for, because a human would never typically make certain classes of mistake. Those mistakes are the self-driving-car equivalent of driving into the back of a stopped semi truck because it was mistaken for a billboard - an error no human of sound mind and body would make.

So if I'm inspecting code written by a computer, I'll either 1) make those same assumptions and then run the risk of missing unexpected problems in the code, or 2) I'm gonna be overly cautious (because I don't trust the machine) and will examine the code with a fine tooth comb, which will take a great deal more time.


Based on my experience with Autopilot and Copilot, I think this is way less of a problem in code.

You can put code mistakes on a gradient, from subtle to obvious. Obvious bugs are like when the LLM finds a pattern and repeats it for 100 lines. Subtle mistakes are like misnaming a variable so you use one left over from earlier, not the correct one.

Obvious mistakes are easy to catch because they're obvious, and the LLM makes more of those. Perhaps because of the way LLMs work, I have never seen Copilot make a subtle mistake that I wouldn't expect of a person. People are so good at making surprising bugs that it's really hard for Copilot to beat them.


Finally something will kill the last somewhat satisfying part of the job, for the greater profit of the corporate overlords!


>Human writes the easier part -- the test suite, the language model writes the harder part -- the solution. This can be a net gain in productivity.

I've been doing just this with Llama 2 local lately, and I'm convinced that anyone who isn't will be a dinosaur in a couple years.

Fact is, these things are damn good (at JS at least) when dealing with pure unit-testable functions. To the point that I can accept practically any solution it comes up with will be better than mine. Everything is still verified of course, but the days of writing algorithms from scratch are basically over for me.


How do you avoid overfitting and how do you test performance?


It would be quite hard to overfit with property-based tests.
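As a hand-rolled sketch of that idea, here is a property check for the regex-based `emailTest` implementation shown elsewhere in the thread. In practice you'd use a library like fast-check; the generators here are illustrative.

```javascript
// The implementation under test (the regex-based emailTest from the
// example elsewhere in the thread).
function emailTest(input) {
  const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
  return emailRegex.test(input) ? input.split('@')[1] : 'Invalid email address';
}

// Tiny random-string generator (a property-testing library would normally
// provide this).
function randomWord(chars, min, max) {
  const len = min + Math.floor(Math.random() * (max - min + 1));
  return Array.from({ length: len },
    () => chars[Math.floor(Math.random() * chars.length)]).join('');
}

// Property: for any well-formed local@host input, the host comes back
// verbatim. A thousand random cases are much harder to overfit to than a
// handful of hard-coded examples.
function checkEmailProperty(runs = 1000) {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  const alnum = letters + '0123456789';
  for (let i = 0; i < runs; i++) {
    const host = randomWord(alnum, 1, 10) + '.' + randomWord(letters, 2, 4);
    if (emailTest(randomWord(alnum, 1, 10) + '@' + host) !== host) return false;
  }
  return true;
}
```

An LLM can memorize fixed expected values, but it can't memorize a property that is re-sampled on every run.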


What plugins or other setup is needed to get Llama 2 integrated with your IDE?


There's a good one called "Continue" for VS Code. There are also a million Copilot chat clones if you search the VS Code extensions; probably some of those do it. The VS Code extension "Cody" uses Anthropic.


Any tips on how to use it locally this way?


Yeah I'd be interested in watching a demo video, or something like that.


How readable is the code? Does it follow SOLID design and the likes?


>How readable is the code? Does it follow SOLID design and the likes?

More readable and better commented than my own. The thing is, this isn't generating any kind of "compiler output" type code that you're used to. It writes exactly as a person would. Here's an example I just ran on llama2-7b-chat on my Macbook:

##PROMPT

Llama: My name is Llama. I will write a Javascript function that implements the "emailTest" function, given the following test case that it is expected to pass. I am sure to always fully explain all of my code, and include comments to provide context.

  describe('it takes a string as input, validates that it is an email address, and returns the host name of the email address.', () => {
    const testVal = emailTest('test@google.com');
    expect(testVal).toBe('google.com')
  });
##END PROMPT

##RESPONSE

Llama: Here are the test cases:

  test('it takes a string as input, validates that it is an email address, and returns the host name of the email address.', () => {
      const testVal = emailTest('test@google.com');
      expect(testVal).toBe('google.com')
  })

  test('it should validate email addresses with different domains', () => {
      const testVal1 = emailTest('user@example.com');
      const testVal2 = emailTest('user@example.net');
      expect(testVal1).toBe('example.com')
      expect(testVal2).toBe('example.net')
  })

  test('it should validate email addresses with special characters', () => {
      const testVal3 = emailTest('hello@$%^&*()_!');
      expect(testVal3).toBe('example.com')
  })

  test('it should validate email addresses with typos', () => {
      const testVal4 = emailTest('user@example.com');
      expect(testVal4).toBe('example.com')
  });
Here is the function that I have written to implement the emailTest function:

  function emailTest(input) {
    const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
  
    if (emailRegex.test(input)) {
      // If the input is a valid email address, return the host name of the email address
      const hostName = input.split('@')[1];
      return hostName;
    } else {
      // Otherwise, return an error message indicating that the input is not a valid email address
      return 'Invalid email address';
    }
  }
Can you please provide feedback on my implementation? Specifically, I would like to know if there are any improvements I could make to the function's performance or accuracy.

##END RESPONSE

And from there you can have a conversation with it, and ask it to fix anything that's mistaken. You can also have it extend your test suite with fuzzing and negative cases. This is obviously a trivial example, but you get the idea. I haven't been able to stump it yet.


    const testVal3 = emailTest('hello@$%^&*()_!');
    expect(testVal3).toBe('example.com')
At the risk of terribly embarrassing myself, shouldn't the expected result be "invalid email address" in this case?


> At the risk of terribly embarrassing myself, shouldn't the expected result be "invalid email address" in this case?

Indeed. This particular test case got jumbled up, but you should always be verifying the output anyways. You can ask it to correct itself when this happens, i.e.:

  User: That's wrong. The test would return 'Invalid email address' here, not 'example.com'

  Llama: Ah, I see! Thank you for correcting me! In that case, here's the modified test case:

  test('it should validate email addresses with special characters', () => {
    const testVal3 = emailTest('hello@$%^&*()_!');
    expect(testVal3).toBe('Invalid email address');
  })

  Llama: Now, when the `emailTest` function is called with the input 'hello@$%^&\*()_!', it will return 'Invalid email address' instead of 'example.com', because the
  input does not match the regular expression pattern defined in the function.


There are other problems in there in addition to that one, but it isn’t clear to me that the GP is intended to be an example of completed good output.


Is writing System prompts in first person better than writing them in third person?


> Is writing System prompts in first person better than writing them in third person?

That's what I've found. Llama doesn't really seem to have a concept of "you" or "they". But if you make it assume that it is something specific, and that it is doing something, it will continue doing that thing.


What is the most difficult thing you’ve gotten it to do?


>What is the most difficult thing you’ve gotten it to do?

It's particularly good at recursive tasks I don't want to expend the mental effort on. E.g.:

  Llama: I am Llama, a programming assistant. Please provide me with a task.

  User: Thanks. Here is a data structure. Write a Javascript algorithm which will recursively parse the structure to collect all 'id' properties into a single array.
  Be sure to check the input and not cause any errors which could result in an infinite loop.

  ```
  const data = {
      'id': 'test1',
      'nodes': [{
          'data': {},
          'list': [{
              'id': 'test2',
              'values': [{
                  'id': 'test3',
                  'name': 'val'
              }]
          }]
      }]
  }
  ```

  Llama: Sure thing! Here is the algorithm in Javascript:

  ```
  function parseId(data) {
    const idArray = [];

    // Recursively traverse the data structure
    function traverse(data) {
      Object.keys(data).forEach((key) => {
        if (key === 'id') {
          idArray.push(data[key]);
        } else if (typeof data[key] === 'object') {
          traverse(data[key]);
        }
      });
    }

    // Start the recursion
    traverse(data);

    return idArray;
  }

  parseId(data) // Output: ["test1","test2","test3"]
  ```


What I think people aren't getting yet is that this is generalizable beyond software. We can do this with LLMs and anything that can be designed.

https://michaelfeathers.silvrback.com/prompt-hoisting-for-gp...


And with code that already exists, we can have the AI help us write the tests for TDD. Many large companies have huge amounts of code that is essential to their operation, but changing it is difficult, and replacing it with a new system is difficult or even impossible, since it's very hard to even know what it does.


Using AI to automate TDD is a great idea, but unfortunately AIs can get stuck, and we are far from solving this problem. There are many pieces of information that must be manipulated exactly, while LLMs are all soft and fuzzy.


the LLM should write the test suite too, from the description of the app


There is a major flaw in placement of the EC2 instance in this diagram.

Each EC2 instance has 1 or more Network Interfaces, where each Network Interface resides in 1 subnet and can have multiple public and private IP addresses. The diagram currently suggests that an EC2 instance is located in a single subnet — it’s not. Each Network Interface connected to an EC2 instance is only required to be in the same Availability Zone where the instance was launched.


I would rephrase this concern as "dconf is not a stable API" rather than "you are not supposed to do this". It's my computer, I am supposed to do whatever I want, but using unstable API comes with costs that should be understood.


> It's my computer, I am supposed to do whatever I want

I don't think the Gnome devs share that view.


I suppose Gnome is aimed at the Red Hat enterprise customer base. Those computers are not private systems; they are company hardware. What matters for company hardware is standardization: easy to roll out, easy to train users, which means a standardized, simple GUI. Building @hellodanylo's dream workflow isn't the mission and doesn't scale anyway.

If you want a custom desktop workflow exactly to your specification build it yourself or pay someone to build it. Expecting people with different opinions to do it for you for free will certainly be a disappointment.


>> Gnome is aimed at the Red Hat enterprise customer base. Those computers are no private systems, they are company hardware. What matters for company hardware is standardization: easy to roll out, easy to train users, which means a standardized, simple GUI.

This is largely the case:

https://www.redhat.com/en/about/press-releases/gnome-desktop...

As stated in https://lwn.net/Articles/600506/: "When a project is controlled by a single company, that company's needs will almost certainly win out over anything that the wider community may want to do."


On the other hand, when a project tries to please everybody it will undoubtedly regress into a mess of options, toggles, extra buttons, have an atrocious UX and only be usable for the 'in crowd'.

The few OSS tools I know to not have a terrible UX are tools built by a single author or a small team with a coherent vision. It's definitely a place where the bazaar model of software development doesn't seem to work as wonderful as with OS kernels or development tools.


I agree that there is a balance between customization and "cleanness" in design and implementation.

However, I think the GNOME 3 and 4 designers went too far and alienated many users:

https://www.zdnet.com/article/linus-torvalds-finds-gnome-3-4...

https://medium.com/@fulalas/gnome-42-the-nonsense-continues-...

https://www.reddit.com/r/linux/comments/wte7tr/gnomes_design...

https://linuxreviews.org/GNOME_Developers_have_Made_Their_Mo...

https://www.osnews.com/story/133955/gnome-to-prevent-theming...

When a designer's "coherent vision" eclipses the needs of the software's users then users get frustrated and either fork the project or go to another project. MATE (https://mate-desktop.org/), Cinnamon (https://github.com/linuxmint/Cinnamon), and Unity (https://unityd.org/) exist largely because of how far the GNOME 3 designers went and how they were not willing to compromise their "coherent vision":

https://bbs.archlinux.org/viewtopic.php?id=121162

https://blog.linuxmint.com/?p=1910

https://web.archive.org/web/20101129161856/http://www.pcworl...


The Gnome 3 adversity is ancient history at this point and in my opinion very much a fabricated feud to drive up news article engagement. Publishing an article with a few open bugs or a disagreeing opinion doesn't say much about the quality of the software in general. I find the latest versions of Fedora with Gnome very usable, also in comparison with Windows/macOS. I don't have a need for very specific customisations and I think the Gnome people know it's but a tiny percentage of their users that use the very exotic features.


> If you want a custom desktop workflow exactly to your specification build it yourself or pay someone to build it.

Or use a DE that allows sufficient customization.


They do, as long as you do it without expecting them to bend over backwards to accommodate your specific snowflake setup.


Ah yes... my snowflake setup where I expect to be able to paste file paths into Gtk file->open dialogs without it erroring out.


Bugs will always happen, of course, but let's not pretend that those are conscious decisions.


Generally true. But this one 100% is, and it's been a conscious decision since it was introduced by the removal of the gsetting org.gtk.Settings.FileChooser location-mode. This was done intentionally by Gtk dev mclassen, and he has been downright hostile about not fixing it since then. Gtk/GNOME devs do not think being able to type/paste file paths in File->Open dialogs (gtkfilechooserwidget.c) should be allowed. https://gitlab.gnome.org/GNOME/gtk/-/issues/938

They didn't even acknowledge or leave the bug reports it caused until this month: https://gitlab.gnome.org/GNOME/gtk/-/issues/5872 despite it being introduced in 2014 and me reporting it 2 times every year since then. And there's about 0% chance of this bug being fixed despite them finally leaving the bug report there instead of closing it.


Hmm, maybe I'm misunderstanding, but AFAICS I am able to paste file paths into those dialogs? Steps to reproduce: copy a file path, Ctrl+O, Ctrl+L, Ctrl+V. So I guess I'm misunderstanding what behaviour you're referring to?


Yeah, that attitude is one of the things that leads me to avoid Gnome.


Does it matter though?


In the case of GNOME, absolutely.

GNOME devs are infamous for actively fighting users' desire for customization. They have a long history; here's[¹] an example from a couple of years ago (not sure how the situation described evolved); this [²] is an article that describes the wider point of view.

[¹] https://www.osnews.com/story/133955/gnome-to-prevent-theming...

[²] https://igurublog.wordpress.com/2012/11/05/gnome-et-al-rotti...


Unless you're able to maintain a fork of the whole system by yourself, yes it does.


There is also ECR Public Gallery, which mirrors many public images from DockerHub. https://gallery.ecr.aws


or more specifically, https://gallery.ecr.aws/docker

for any "official" image you're pulling from docker hub, just prefix it with "public.ecr.aws/docker/library/" to pull from ecr instead


I find SQL very hard to use when the data schema and/or transformation graph needs to be dynamic (e.g. depends on the data itself). It's hard to make SQL dynamic even at build time -- Jinja+SQL is one of the worst development experiences I have ever had.


> I find SQL very hard to use when the data schema and/or transformation graph needs to be dynamic

If the schema is "dynamic" then I'd accuse the business of being poorly-defined and not worthy of any development time.


That's great except for when you're interacting with a decrepit data system from 10 years ago with a variable record format.

Some things can't be locked in stone, and SQL will leave you out to dry when that's the case.


I use a graph database, and resources of the graph are typed with types/supertypes. Relationships are also typed with types/supertypes. And my queries are heavily dependent on that typing structure.

Honestly, I cannot live without that feature. [sorry, that is my OOP minute. Continue without me...]


Yeah, I am also struggling to interpret the metrics in this post positively.

The 50% success rate is also best out of 3200 completions. For best out of 1 completion, the success rate is in low single digits.

I think the lesson here is that these models bring a lot more value when: 1. you have unit tests, 2. can afford compute/time to let the model try many solutions, 3. have enough isolation to run unverified code.
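Point 2 amounts to best-of-k sampling against the test suite: keep drawing candidate solutions until one passes. In this sketch, `generate` and `passesTests` are hypothetical stand-ins for an LLM call and a sandboxed test run.

```javascript
// Best-of-k sampling: return the first candidate that passes the tests,
// or null if all k samples fail. `generate` and `passesTests` are
// hypothetical stand-ins for an LLM call and a sandboxed test run.
function bestOfK(generate, passesTests, k) {
  for (let i = 0; i < k; i++) {
    const candidate = generate(i);
    if (passesTests(candidate)) return candidate;
  }
  return null;
}
```

If samples were independent with per-sample success rate p, the chance of at least one success in k tries would be 1 - (1 - p)^k, which is why a low pass@1 can still yield a usable pass@k - at the cost of k-fold compute and a trustworthy, isolated test harness.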


> I'm sure the recovery process is not as easy.

A lot of services offer one-time backup codes and connecting multiple 2FA devices. Making Yubikey a single point of failure is certainly a bad idea.

