Hacker News | simedw's comments

Great suggestion, added a toggle to see pinyin.

Thanks for the great feedback!

I have just added sandhi support; please let me know if it's working better.


I'm still seeing some issues that match my previous comment. I'll try to follow your blog and give more feedback as you work on it.

Will comment that the shorter phrases (2-4 characters long) were generally accurate at normal speed, but the longer sentences have issues.

Maybe focusing on the accuracy of the smaller phrases and then scaling that might be a good way to go, since those smaller phrases are returning better accuracy.

Again, really think this is a great initiative, want to see how it grows. :)


ACKing your comment.

Will check once the TV is off in the house. :)


Hi, thanks for the feedback. The 了 issue was a bug on the JavaScript side; that should be fixed (training did thankfully handle it correctly).

The other two are probably things that could be fixed with a bigger and more varied dataset.


It’s fairly sensitive to background noise at the moment. I’m planning to train an improved version with stronger data augmentation, including background noise.
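As a rough illustration of what background-noise augmentation could look like (a minimal sketch, not the actual training pipeline), the snippet below mixes a noise clip into a speech clip at a chosen signal-to-noise ratio. The function names `rms` and `mix_at_snr`, and the use of plain lists of float samples, are assumptions made for the example.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB).

    Hypothetical helper: assumes both clips share a sample rate and that the
    noise clip is at least as long as the speech clip.
    """
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(speech)  # silent noise clip: nothing to add
    # SNR(dB) = 20 * log10(rms_speech / rms_noise), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]
```

During training, the SNR would typically be drawn at random per example (for instance somewhere between 0 and 20 dB, an assumed range) so the model sees both clean and noisy versions of the same phrase.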

For accents, I’ve mostly tested with a few friends so far. I’m wondering whether region should be a parameter, because training on all dialects might make the system too lax.


It would probably be a lot of work, but it would be really interesting if you had sufficient datasets to train across accents.

Highly recommend taking a look at Phonemica for this:

https://phonemica.net/


Thank you.

I had a quick look at Farsi datasets, and there seem to be a few options. That said, written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?


> written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?

You can't, but Farsi dictionaries list the missing short vowels/diacritics/"eraab" for every word.

For instance, see this entry: https://vajehyab.com/dehkhoda/%D8%AD%D8%B3%D8%A7%D8%A8?q=%D8...

With the short vowel on the first letter it would be written حِساب (normally written as just حساب)

The dictionary entry linked shows that there is a ِ on the first letter ح

But you would have to disambiguate between homographs that differ only in the eraab.
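A minimal sketch of the dictionary-lookup idea, assuming a lexicon of fully vocalized forms is available: strip the diacritics to get the usual written form, index by that form, and return every vocalized candidate. The tiny word list and the names `strip_diacritics`, `build_index`, and `candidates` are hypothetical; a real lexicon would be built from dictionary entries, and choosing between homograph candidates would still need context.

```python
import unicodedata

def strip_diacritics(word):
    """Drop combining marks (short vowels and similar) to get the usual written form."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))

# Hypothetical mini-lexicon of vocalized forms, for illustration only.
# The last two entries are homographs once the short vowels are removed.
VOCALIZED = ["حِساب", "کِرم", "کَرَم"]

def build_index(vocalized_words):
    """Map each undiacritized spelling to all vocalized forms that share it."""
    index = {}
    for w in vocalized_words:
        index.setdefault(strip_diacritics(w), []).append(w)
    return index

INDEX = build_index(VOCALIZED)

def candidates(written):
    """Return every vocalized form known for a written word (possibly several)."""
    return INDEX.get(strip_diacritics(written), [])
```

A word with a single candidate can be vocalized directly; a word with several candidates is exactly the homograph case described above and needs a separate disambiguation step.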


It would be neat if it had a headless mode.


https://simedw.com personal site, mostly posts regarding various experiments


First of all, big kudos for not missing a single day. When I used flashcards in the past, missing even a couple of days led to an avalanche of cards to review.

Since you’ve been so consistent and are using your own software, have you experimented with different resurfacing rates? Did you notice a material difference in recall?


I have done a lot of algorithmic experiments. I wouldn't say I noticed a big difference in recall, but my graphs say that there definitely is one!


Is there a reason you chose to roll your own software instead of just using Anki? Custom algorithm you think is better than FSRS? UI preferences?


I don't exactly think I have an algorithm better than FSRS yet, but I have an algorithm I like better. Hopefully I'll have more to say about this soon.


Thanks for the questions. Very fair concerns. Take all of this with a fairly large pinch of salt; this is still an experiment.

1. How does it know which words I already know? It doesn’t automatically. You provide that set. For example, if you’ve completed HSK 1, you can paste the HSK 1 word list into LangSeed and mark those as "known". From there, new explanations are constrained to that vocabulary. You can also paste in real text and mark the easy words as known, though that’s a bit more manual.

2. How much might I misunderstand word meanings? Depends on how advanced the vocab is and how large your known-word set is. I think of this as building intuition rather than giving dictionary-precise definitions. As you see words in more contexts, that intuition sharpens. This is just my experience from testing it over the last couple of weeks.

3. How inaccurate are the explanations? I tested it on Swedish (my native language). There are occasional awkward or slightly odd phrasings, but it’s rarely outright wrong.
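To make point 1 concrete, here is a minimal sketch of how an explanation could be checked against a known-word set. This is not LangSeed's actual implementation; the function name `unknown_words` is hypothetical, and the explanation is assumed to be already split into words (segmenting Chinese text is a separate step).

```python
def unknown_words(explanation_tokens, known_words):
    """Return the tokens of an explanation that are not in the known set.

    Each unknown token is reported once, in order of first appearance.
    """
    seen = set()
    missing = []
    for token in explanation_tokens:
        if token not in known_words and token not in seen:
            seen.add(token)
            missing.append(token)
    return missing

# Example: a learner's known set and a pre-tokenized explanation.
known = {"我", "喜欢", "喝", "水"}
explanation = ["我", "喜欢", "喝", "茶"]
print(unknown_words(explanation, known))  # prints ['茶']
```

An empty result would mean the explanation stays within the learner's vocabulary; a non-empty one could trigger a rewrite of the explanation.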

