That’s a great idea. I hope it can be done for other languages, too.
I used to help prepare study materials for Japanese learners of English. The other editors and I would try to adjust the vocabulary to keep it at an appropriate level for the target learners. Word-frequency lists provided some guidance, but they showed only how often words appeared in the surveyed texts, not the meanings in which they were used. The word “medium,” for example, might have a fairly high frequency, but could we expect the learners to know the meanings “a substance through which a force travels” or “someone who claims to have the power to receive messages from dead people”?
Multiword idioms posed a similar problem. The verb “make” is one of the most common words in English, but how common are “make it,” “make do,” “make up,” “make away with,” or “make out”? Ten years ago, I was unable to find any reliable answers. We had to rely on our gut feelings.
Good luck with your project. LLMs should be a big help.
Thank you! Yep, multi-word idioms are tough. How do you quantify whether a phrase is just the "sum" of its words or carries some additional meaning, some "idiomness"? I haven't thought a lot about that yet, but it's a problem that I need to solve for this.
If you’d like to discuss these issues, feel free to get in touch. My website URL is on my profile page. I’m not a programmer or expert on natural language processing, but I have worked on over a dozen Japanese-English and English-Japanese dictionaries and enjoy thinking about such problems.