Maybe it comes down to semantics, but when I read things like [1] I come away with the idea that the weights are altered. But it could also just be my misunderstanding.
I think it does come down to semantics. When you say "weights", people will take you to mean the pre-trained parameters of the network.
I agree that in some sense the attention weights are more like meta-weights that are applied to the context of the conversation to decide how to actually weight the various words. So it's totally correct to say that previous words in the conversation affect how future words will be weighted, and I think it's reasonable to call that 'learning': for example, you can tell ChatGPT new words and it will be able to use them in context. Again though, people usually take 'learning' to mean making updates to the trained parameters of the model itself, which obviously isn't happening here.
An attention mechanism is one or more layers in the neural network. When someone talks about attention altering the input vectors, they're referring to what those layers are doing and how data is transformed as it passes through them. But zooming out to the big picture, a neural network is a bunch of layers full of weights, and none of the weights changes except during training (including the weights in the attention layers).
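To make that concrete, here's a rough single-head self-attention sketch in Python/NumPy. The dimensions and random matrices are toy values I made up for illustration, not any real model, but it shows the point: the attention pattern is recomputed from the input every time, while the parameter matrices (W_q, W_k, W_v) are never modified at inference.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model = 8  # toy embedding size

    # These are the trained parameters ("weights"). They are fixed after training.
    W_q = rng.normal(size=(d_model, d_model))
    W_k = rng.normal(size=(d_model, d_model))
    W_v = rng.normal(size=(d_model, d_model))

    def self_attention(x):
        """x: (seq_len, d_model) token embeddings for the current context."""
        Q = x @ W_q
        K = x @ W_k
        V = x @ W_v
        # The attention scores depend on the context and are recomputed for
        # every input -- but nothing here writes back to W_q, W_k, or W_v.
        scores = Q @ K.T / np.sqrt(d_model)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        return attn @ V

    # Two different contexts produce two different attention patterns...
    out_a = self_attention(rng.normal(size=(5, d_model)))
    out_b = self_attention(rng.normal(size=(7, d_model)))
    # ...but the parameter matrices are untouched either way.

So "attention weights" in the sense of the softmax scores change with every prompt, while "weights" in the sense of W_q, W_k, W_v only change during training.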
1: https://towardsdatascience.com/an-intuitive-explanation-of-s...