Curious about the interaction between this memory behavior and fine-tuning. If the base model has these emergent memory patterns, how do they transfer or adapt when we fine-tune for specific domains?
Has anyone experimented with deliberately structuring prompts to take advantage of these memory patterns?
Has anyone experimented with deliberately structuring prompts to take advantage of these memory patterns?