Hacker News | liquidki's comments

I think this is the Achilles' heel of LLM-based AI: the attention mechanisms are far, far inferior to a human's, and I haven't seen much progress here. I regularly test models by feeding in a 20-30 minute podcast transcript and asking them to state the key points.

This is not a lot of text, maybe 5 pages. I then skim it myself in about 2-3 minutes and write down what I would consider the key points. When I compare the results, I find the AI usually (more than 50% of the time) misses one or more points that I would consider key.

I encourage everyone to reproduce this test just to see how well current AI works for this use case.

For me, AI can't adequately do one of the first things people claim it does really well: summarization. I'll keep testing; maybe someday it will be satisfactory at this, but I think this is a basic flaw in the attention mechanism that won't be solved by throwing more data and more GPUs at the problem.
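The comparison step of the test above can be sketched in a few lines of Python. The fuzzy-match threshold, function names, and the example key points are all my own assumptions for illustration, not anything from the comment; real key points are paraphrased loosely enough that you'd likely want a human eye (or a stricter semantic-similarity model) over this.

```python
from difflib import SequenceMatcher


def matches(human_point: str, model_points: list[str], threshold: float = 0.6) -> bool:
    """True if any model point reads as a close paraphrase of the human point."""
    return any(
        SequenceMatcher(None, human_point.lower(), p.lower()).ratio() >= threshold
        for p in model_points
    )


def missed_key_points(human_points: list[str], model_points: list[str],
                      threshold: float = 0.6) -> list[str]:
    """Key points the human wrote down that the model summary never mentions."""
    return [p for p in human_points if not matches(p, model_points, threshold)]


# Hypothetical example: the model paraphrases one point and drops the other.
human = ["The host argues open models will dominate",
         "Funding round closed at $2B"]
model = ["Open models will dominate, says the host"]
missed = missed_key_points(human, model)
```

Running the transcript test repeatedly and tracking `len(missed) / len(human)` gives a rough miss rate to compare across models.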


> I encourage everyone to reproduce this test just to see how well current AI works for this use case.

I do this regularly and find it very enlightening. After I've read a news article or done my own research on a topic, I'll ask ChatGPT to do the same.

You have to be careful when reading its response not to grade on a curve: read it as if you hadn't done the research and didn't know the background. I find myself saying, "I can see why it might be confused into thinking X, but that doesn't change the fact that it was wrong/misleading."

I do like it when LLMs cite their sources, mostly because that's how I find out they're wrong. Many times I've read a summary, followed it to the source, read the entire source, and realized it says nothing of the sort. But almost always, I can see where the model glued together pieces of the source incorrectly.

A great micro example of this is Apple's Siri notification summaries. Every time they mess up hilariously, I can see exactly how they got there. But it's also a mistake no human would ever make.


To be sure, this is also taught in writing workshops, speaking workshops, and comedy and improv: the rule of three.

https://en.wikipedia.org/wiki/Rule_of_three_(writing)


Yes, and ChatGPT is incredibly heavy-handed with it. This is not how most NYT articles are written.


> One pernicious stat is that we are judged by graduation rate.

What were you judged on before this? How do you evaluate the quality of a teacher? My mom is an elementary school teacher, and she says her schools are similarly pushing hard to have the highest graduation rates.

That sounds like a dangerous stat to judge by, since, as you and she note, teachers' jobs are now tied to how many kids pass (a number that keeps going up), not to how well they teach.


Kinda like giving home loans to people who have no ability to pay them back?


That's exactly why it seems obvious!

