You're probably better off analyzing how different models perform on different RAG subjects/content.
It serves as a proxy for what you're actually trying to do: analyzing models trained on different datasets. Find a Hugging Face (HF) model trained on natural language/web text, one trained on code, etc.
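A rough sketch of what that comparison could look like: run every model over every content domain and tabulate a score per pair. Everything named here is an assumption for illustration. `evaluate_grid`, the two stub "models", and the tiny `DOMAINS` set are hypothetical placeholders, and the substring-match scoring is just one simple choice. In practice you'd swap each stub for a wrapper around a real HF model and use your own retrieved passages.

```python
from typing import Callable, Dict, List, Tuple

# (question, retrieved context, substring expected in a correct answer)
Example = Tuple[str, str, str]
# A "model" here is any callable: (question, context) -> answer text
Model = Callable[[str, str], str]


def evaluate_grid(
    models: Dict[str, Model], domains: Dict[str, List[Example]]
) -> Dict[str, Dict[str, float]]:
    """Return scores[model][domain] = fraction of examples answered correctly."""
    scores: Dict[str, Dict[str, float]] = {}
    for model_name, model in models.items():
        scores[model_name] = {}
        for domain_name, examples in domains.items():
            # Count answers that contain the expected substring (case-insensitive).
            hits = sum(
                expected.lower() in model(question, context).lower()
                for question, context, expected in examples
            )
            scores[model_name][domain_name] = hits / len(examples) if examples else 0.0
    return scores


# Hypothetical stand-ins for real models, so the sketch runs without downloads.
def first_line_model(question: str, context: str) -> str:
    lines = context.splitlines() or [""]
    return lines[0]  # answers with the first line of the context


def last_line_model(question: str, context: str) -> str:
    lines = context.splitlines() or [""]
    return lines[-1]  # answers with the last line of the context


MODELS: Dict[str, Model] = {
    "first_line_model": first_line_model,
    "last_line_model": last_line_model,
}

# Made-up toy examples, one per content type.
DOMAINS: Dict[str, List[Example]] = {
    "prose": [
        (
            "What is the capital of France?",
            "Paris is the capital of France.\nIt lies on the Seine.",
            "Paris",
        )
    ],
    "code": [
        (
            "What does add return?",
            "def add(a, b):\n    return a + b",
            "a + b",
        )
    ],
}

scores = evaluate_grid(MODELS, DOMAINS)
for model_name, per_domain in scores.items():
    print(model_name, per_domain)
```

The point of the grid shape is that you read across a row to see where one model is strong or weak, and down a column to see which model handles a given content type best.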