Yes, most shells are UTF-8 aware, but terminals are largely not bidi-aware. Bidi, short for bidirectional, is what allows you to mix left-to-right and right-to-left scripts (Arabic, Farsi, Hebrew, Pashto, Urdu, etc.) and read/write both in harmony.
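To make "bidi" a bit more concrete: every Unicode character carries a bidirectional class, and a bidi-aware renderer uses those classes to decide display order. Here is a rough Python sketch using the standard `unicodedata` module. The helper names `bidi_classes` and `has_rtl` are made up for illustration, and this only inspects the classes; it does not implement the reordering itself.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def has_rtl(text):
    """True if any character has a strong right-to-left class.

    "R" covers scripts such as Hebrew; "AL" covers Arabic letters.
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
```

A terminal that is not bidi-aware simply ignores these classes and lays the characters out in storage order, which is why mixed Arabic/Latin text comes out backwards.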
There are very, very few terminals that do this at all. I read and write Arabic, so I use mlterm. It is quite good and very flexible, but unlike other emulators it requires some setup and learning its configuration. I use Mutt to read emails and mailing lists in Arabic, Farsi, and Urdu, and I rarely have problems.
In the Linux console, the situation is abysmal. There is a project that works there, bicon from the Arabeyes project, but it does not understand SIGWINCH (the signal sent when the terminal is resized). This is problematic when I share a tmux session between the Linux console and X.org with my preferred WM, for example. But for running small, mostly console programs with Arabic, bicon is old yet can do some jobs, for some value of jobs.
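For anyone wondering what "understanding SIGWINCH" involves: a program that sits between you and the terminal has to catch that signal and re-read the terminal size, otherwise it keeps drawing for the old dimensions. A minimal Python sketch of the idea (Unix only; this is not how bicon is written, and `resize_events` and `on_winch` are hypothetical names):

```python
import shutil
import signal

# Every size seen after a resize notification, newest last.
resize_events = []


def on_winch(signum, frame):
    """Record the current terminal size whenever SIGWINCH arrives."""
    # get_terminal_size() falls back to 80x24 when no terminal is attached.
    resize_events.append(shutil.get_terminal_size())


# Install the handler; a real wrapper would also redraw and pass the
# new size on to the program it is wrapping.
signal.signal(signal.SIGWINCH, on_winch)
```

A wrapper that skips this step keeps the stale size, which is exactly the kind of breakage you hit when the same tmux session is attached from terminals of different sizes.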
A big problem is that Arabic input, and RTL input in general, is something to which most devs in this space respond "each application must address it in its own way." This is why few console or GUI applications handle Arabic well.
FYI, until two or three years ago, Firefox was for a while the only browser in which I could view Arabic-language news on any computer. The others were terrible. And this is coming from a linguistics guy; this was long before I wanted to dig deeper into the machine. Linux guys in the Arab world fight an uphill battle, since much software will not work for them and their needs are ignored.
Come join arabeyes.org and help some of them out if you are interested in translating documentation or applications, or in building tools to help the process.
tmux (and screen) don't even let you search the history for non-ASCII characters :-(
That said, the git and shell mentions in the article made things look worse than they are – e.g. in Xubuntu I didn't have to do anything special to at least display the characters, though in the wrong order: http://bildr.no/view/dFN0elo4
Emacs has had bidi support since 24.1, works great, and most importantly, those of us who get totally confused by it can turn it off (I still haven't gotten the hang of movement commands suddenly going the opposite direction, very confusing when RTL and LTR scripts are in the same file).