Good points, but I think we underestimate how much situational context humans use when they interpret language. Sometimes we can communicate with very little language simply because we know what the purpose of the interaction is.
Another thing I keep wondering about is why so little emphasis is put on dialog. When humans don't understand something, they ask, or offer an interpretation and ask whether it's the right one.
Speech recognition systems don't seem to do that. They say "Sorry, I could not understand what you said. Please repeat". That's not very helpful for the computer of course. It should say: "Huh, Peas? Why would anyone rest in peas for heaven's sake??". Then the human could sharpen his S's and say "PeaCCCEE!!! not peas. I'm not talking about food, I'm talking about dying!".
Context is huge for human interpretation. If you've ever had someone address you in a different language than you were expecting, you know what I mean. It's almost like you can imagine the search just going deeper and deeper without finding anything that makes sense until it swaps in the other language and goes: Ah, you said "good morning"! :-)
It is true that humans do use situational context. In cases where complex semantics are needed to understand an utterance, a computer will fail even more, because it gets neither the semantics nor the speech signal.
On the topic of dialog, this is arguably the area where speech recognition has gained the most over the last nine years. Prior to 2001 there were not many usable dialog systems, and now (depending on your definition of "usable") there are many deployed in call centers around the world.
Most call center dialog systems have a rudimentary mechanism for asking people to repeat things when the system doesn't understand. If it asks more than once, though, callers tend to get very angry.
It shouldn't interrupt you once every 5 words of course. What it should try to do is to create a model of what you meant to say. At some point, if the system is unsure, it should ask you to confirm or correct what it has understood so far.
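To make the "confirm or correct" idea concrete, here is a rough sketch of how a system might decide between accepting, asking for confirmation, and asking for a repeat. Everything in it is an assumption for illustration: the `Hypothesis` class, the `choose_response` function, the confidence scores, and the thresholds are made up, not taken from any real recognizer.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    confidence: float  # hypothetical recognizer score between 0.0 and 1.0


def choose_response(hypotheses, accept_at=0.8, confirm_at=0.4, margin=0.1):
    """Pick a dialog move from a list of recognition hypotheses.

    The thresholds are illustrative assumptions, not tuned values.
    """
    if not hypotheses:
        return "Sorry, I could not understand what you said. Please repeat."

    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]

    # Confident enough: carry on without interrupting the speaker.
    if best.confidence >= accept_at:
        return f'OK, "{best.text}".'

    # Unsure: offer an interpretation and ask whether it is the right one.
    if best.confidence >= confirm_at:
        # If the runner-up is nearly as likely, offer both readings.
        if len(ranked) > 1 and best.confidence - ranked[1].confidence < margin:
            return f'Did you say "{best.text}" or "{ranked[1].text}"?'
        return f'Did you say "{best.text}"?'

    # Nothing plausible: fall back to asking for a repeat.
    return "Sorry, I could not understand what you said. Please repeat."


guesses = [Hypothesis("rest in peas", 0.55), Hypothesis("rest in peace", 0.50)]
print(choose_response(guesses))
```

With the two guesses above, the system would ask which of the two readings was meant instead of just asking for a repeat, which gives the speaker something specific to correct.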