
Also a cute prompt I saw on the same Twitter thread:

```

This is a variant of the monty hall problem:

Suppose you're on a game show, and you're given the choice of three doors. IMPORTANT: All the doors are transparent. Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who clearly knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

```

The response from ChatGPT (v3, non-Plus) is:

```

Yes, it is to your advantage to switch your choice.

In this variant of the Monty Hall problem, the fact that the doors are transparent does not change the underlying probability of the problem.

```

I wonder how GPT-4 does. There does seem to be a bit of overfitting, but I consider the prompt fair enough, and it even calls out the trick in capital letters, so that even a tired human should be able to get the answer.

Edit: An even more obvious variant of the prompt is seen in https://news.ycombinator.com/item?id=35192466, which goes further and spells out that the contestant explicitly picks the door with the car. ChatGPT still gets it wrong.



GPT 4 gave me this answer:

In this variant of the Monty Hall problem, since all the doors are transparent, you can clearly see what is behind each door. This changes the nature of the problem entirely. The element of uncertainty, which is present in the original Monty Hall problem, is no longer a factor.

In this scenario, if you pick door No. 1 and see a car behind it, you should stick with your choice, as there is no advantage to switching. If you pick door No. 1 and see a goat behind it, you should switch to door No. 2, as you can clearly see the car behind it.

Since you can see what's behind the doors, the probability of winning the car is no longer based on conditional probability, and the original Monty Hall paradox does not apply. Instead, your decision is simply based on your observation of what's behind each door.


Somehow, this would be one of the most impressive things I've read about GPT-4. It's really difficult to argue that it doesn't have a well-founded understanding of the question, assuming, of course, that this wasn't actually in its training set.

And I see someone DID ask GPT-3.5-based ChatGPT the same question at least a month ago [1], so OpenAI certainly has it on record. That's long enough ago that it could well have been used to fine-tune GPT-4.

[1] https://www.reddit.com/r/bing/comments/117hj14/bing_chat_can...


GPT-4 responses.

-- First variant --

Since all the doors are transparent in this variant of the Monty Hall problem, you can already see what is behind each door when making your initial choice. Therefore, there is no need to rely on probability or switching strategies to maximize your chances of winning the car.

In this case, you simply choose the door with the car visible behind it. The host's action of opening another door and offering you the chance to switch doesn't provide any new information or change the odds, as you can already see the contents of each door.

-- Explicit variant --

In this particular case, since the doors are transparent and you can see that the car is behind door number one, you should stick with your original choice. There is no need to switch doors as the probability of winning the car by staying with door number one is 100%. The Monty Hall problem typically assumes that the doors are not transparent and that the contents behind them are unknown to the contestant.


> I wonder how gpt-4 does.

The problem is that as soon as people started tricking ChatGPT (GPT-3) with problems like that, the correct answers started being used to train the next versions and will end up in the dataset.

So GPT-4 or GPT-5 may get the answer right, but that still wouldn't mean anything.


Not the case for GPT-4 though, as its knowledge cutoff is the same as GPT-3's; that's why it's easy to compare the two on the same problems and see the difference.


Not so fast!

Yes, the GPT-4 paper says

> GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its pre-training data cuts off in September 2021 [footnote: The pre-training and post-training data contain a small amount of more recent data], and does not learn from its experience.

But note the mention of more recent data. We know that InstructGPT (GPT-3.5) was RL-trained on examples of previous queries to GPT-3, such as those trick questions. We could assume everything ever sent to OpenAI (after filtering, e.g. for benchmark contamination) is in that post-training set. This is indeed a very small amount of data compared to the trillion-plus tokens of older data it was surely trained on. We also know that when ARC did their evaluations of GPT-4, OpenAI hadn't finished fine-tuning yet, so they've certainly been continuing to do so recently.

See also my other comment https://news.ycombinator.com/item?id=35300668


I suspect you're right. Part of the supervised "learning" is hard-coding answers to gotchas posted on Twitter.


Fairly certain that is not what is going on here. GPT-4 seems genuinely better at reasoning and harder to trick from my testing.


I've read your prompt several times now and still don't understand it. It seems intentionally crafted to be confusing, with messy punctuation. I get lost before finishing the paragraph every time.

Just a couple of years ago, anything other than it responding "I don't understand" would have been science fiction, and now we are surprised that it answers incorrectly on something even humans have a hard time parsing.


The point of the prompt is that in the classic game you don’t know what’s behind the doors.

But in the variant of the game used in the prompt, the doors are transparent.

So in this variant of the game you already know what is behind every door, meaning you can simply choose the right door from the start, and also meaning that "revealing" what is behind one of the doors tells you nothing new about the others.
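For anyone who wants to sanity-check the numbers, here is a minimal Python sketch of the classic opaque-door game (just my own illustration; the function name and trial count are arbitrary). The transparent variant needs no simulation at all, since the contestant can simply pick the car.

```
import random

def classic_monty_hall(trials=100_000):
    # Opaque doors: the contestant picks blindly, then the host (who knows
    # the layout) opens a goat door the contestant didn't pick.
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        opened = next(d for d in range(3) if d != pick and d != car)
        switched = next(d for d in range(3) if d != pick and d != opened)
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / trials, switch_wins / trials

print(classic_monty_hall())  # roughly (0.33, 0.67): switching only helps
                             # when you can't see behind the doors

# Transparent doors: you just pick the car, so staying wins 100% of the
# time and switching can only lose.
```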



