> However, the idea is that often a substantial share of the probability mass will be concentrated around the maximum likelihood estimate, which is why it makes a good estimate and is worth using.
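A minimal numerical sketch of that intuition, assuming a Bernoulli model with a flat Beta(1, 1) prior (so the posterior mode coincides with the MLE k/n); the `true_p` value and the ±0.05 window are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_p = 0.3  # arbitrary "true" parameter for the simulation

for n in (10, 100, 1000):
    k = rng.binomial(n, true_p)  # observed successes
    mle = k / n                  # maximum likelihood estimate
    # With a flat Beta(1, 1) prior the posterior is Beta(k + 1, n - k + 1),
    # whose mode is exactly the MLE k / n.
    posterior = stats.beta(k + 1, n - k + 1)
    mass = posterior.cdf(mle + 0.05) - posterior.cdf(mle - 0.05)
    print(f"n={n:4d}  MLE={mle:.3f}  posterior mass within 0.05 of MLE: {mass:.3f}")
```

As n grows, more and more posterior mass piles up near the MLE, which is the sense in which it is a reasonable point estimate.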
This is a Bayesian point of view. The other answers are more frequentist, pointing out that the likelihood at a parameter theta is NOT the probability that theta is the true parameter (given the data). So we can't, and don't, interpret it as a probability.
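One quick way to see this, as a minimal sketch assuming a Bernoulli model (the n and k values below are arbitrary): viewed as a function of the parameter, the likelihood need not integrate to 1, so it is not a probability density over theta.

```python
from scipy import integrate

# Bernoulli likelihood for k successes in n trials, as a function of p
n, k = 10, 7

def likelihood(p):
    return p**k * (1 - p)**(n - k)

total, _ = integrate.quad(likelihood, 0.0, 1.0)
# Analytically this equals the Beta function B(k + 1, n - k + 1)
# = k! (n - k)! / (n + 1)!  ~ 0.00076 here -- nowhere near 1,
# so the likelihood is not a density over the parameter.
print(total)
```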
That's not a Bayesian point of view: you can reword it in terms of a confidence interval / coverage probability. It is true that in frequentist statistics parameters don't have probability distributions, but their estimators very much do, and one of the main properties of a good estimator, consistency, is formulated in terms of convergence in probability to the true parameter value.
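A minimal simulation sketch of consistency, assuming a N(mu, 1) model where the MLE of mu is the sample mean (`true_mu` and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu = 2.0  # arbitrary "true" mean for the simulation

# For N(mu, 1) data the MLE of mu is the sample mean; consistency means
# the MLE converges in probability to true_mu as n grows.
for n in (10, 1_000, 100_000):
    sample = rng.normal(true_mu, 1.0, size=n)
    mle = sample.mean()
    print(f"n={n:6d}  MLE={mle:.4f}  |MLE - true_mu|={abs(mle - true_mu):.4f}")
```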