Command line tools for the novice (andymatthews.net)
63 points by commadelimited on July 27, 2012 | 56 comments


I knew before I even clicked on this article that something inside was going to cause me nerd rage; it's almost always the case when I read these 'Baby's first UNIX' type articles. I thought this one might be different when I saw Andy's formidable neck beard, and for the most part it was decent, but then I saw it.

> ...plus Ack is 4 or 5 times faster than grep.

Ack is implemented in Perl, which has a full-featured regex engine but is not generally known for being super fast. It's going up against GNU grep, whose regex engine is highly tuned and very fast. The ack guys make the argument that ack is faster because it doesn't search stuff you, the developer, don't care about, which in my estimation is horseshit. Try it yourself. I threw a regex at my email spool from the 90s, which is about 128M. I made sure it was in disk cache first and then did:

    chunky:~$ time egrep -c -i '(d|m|j)ingle' RMAIL
    23

    real    0m3.129s
    user    0m3.075s
    sys     0m0.051s

    chunky:~$ time ack -c -i '(d|m|j)ingle' RMAIL
    RMAIL:23

    real    0m26.208s
    user    0m26.060s
    sys     0m0.121s
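
For anyone who wants to repeat the measurement, here is a minimal sketch of the procedure described above: read the file once so it sits in the disk cache, then count the matching lines. The helper name `count_matches` is made up for illustration; the pattern and flags are the ones from the transcript, with `grep -E` standing in for `egrep`.

```shell
# count_matches FILE
# Hypothetical helper: warm the cache, then count lines matching the
# pattern used in the transcript above.
count_matches() {
    cat -- "$1" > /dev/null                  # read once so the file is in the disk cache
    grep -E -c -i '(d|m|j)ingle' -- "$1"     # -c: print the number of matching lines
}
```

In bash or zsh, where `time` is a shell keyword, `time count_matches RMAIL` should give numbers comparable to the grep run; swapping the grep line for the ack invocation gives the other half of the comparison.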


Yeah, I don't like it when people spout speed comparisons without backing it up either.

A lot of people do want or need PCRE, and ack relieves that pain point.


GNU grep also supports PCRE.

    chunky:~$ otool -L /usr/bin/grep | grep pcre; grep --help | grep perl
      /usr/lib/libpcre.0.dylib (compatibility version 1.0.0, current version 1.1.0)
      -P, --perl-regexp         PATTERN is a Perl regular expression
Many users choose ack over grep for the same reason they choose htop over top, tmux over screen, zsh over bash, and postgres over mysql -- the perception that one is significantly better than the other. This advocacy often comes from folks whose use and understanding of either tool is superficial at best.


> htop over top, tmux over screen, zsh over bash, and postgres over mysql

That doesn't seem like a good comparison to me. I use htop over top because it has features that top doesn't have (the funky coloured ASCII graphs). I use tmux over screen because unicode seems to work without any effort on my part. I use bash because I have never looked at zsh in detail and don't care enough. I use postgres because I like its license better than mysql's and it's not controlled by any one company.

So it's not at all about the perception that one is significantly better than the other, at least for me; it's about specific features (license is a feature). Or in the case of zsh vs bash, where there are no specific features, I use bash because it seems to be the default on most distros.


I often see ack recommended by word of mouth but unfortunately I feel it's between programmers that don't understand the Unix command-line enough to instruct grep, find, xargs, etc., sufficiently accurately to do what they intend and ack has the allure of doing the right thing, though often slowly, as you say. GNU grep's -I option to ignore binary files for example, or -boa to search binary files.

Once the standard commands are learnt then by all means use ack if you understand the trade-offs and can fall back to the normal way when on another system or when ack's insufficient.
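
As a small illustration of the `-I` option mentioned above, here is a sketch of a recursive search that lists only text files containing a pattern. It assumes GNU grep (BSD grep accepts the same flags, but other implementations may not), and the function name `text_grep` is invented for the example.

```shell
# text_grep PATTERN [DIR]
# Hypothetical helper: list files under DIR (default: .) that contain
# PATTERN, treating binary files as if they had no match.
text_grep() {
    # -r: recurse, -I: skip binary files, -l: print file names only
    grep -rIl -e "$1" -- "${2:-.}"
}
```

Typical use would be something like `text_grep TODO src`.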


How would you exclude the same kinds of files that ack does in grep without typing out the exclude patterns?

As described here ( http://blog.sanctum.geek.nz/default-grep-options/ ), you could use GREP_OPTIONS, but then you run into the problems here ( http://brainstorm.ubuntu.com/idea/24141/ ) -- scripts that use grep without clearing GREP_OPTIONS will fail.

Do you have an alias? Or another script that wraps grep?


My normal way of running grep is ~/bin/g.

    #! /bin/sh

    exec egrep --color "$@"
(Not a bash, my shell, alias because they're unavailable in many contexts.)

egrep is AKA grep -E. I don't use GREP_OPTIONS because it pollutes those that don't expect it, as you pointed out.

At the top of a particular source tree I may have a ./g that knows more about what's interesting.

    #! /bin/bash

    find \( -name .svn -o -name tags \) -prune -o -type f -exec egrep "$@" {} +
It checks -name before -type because the latter needs a stat(2).

For other things I do ad hoc queries using grep, find, xargs, etc.


    find . -type f -name \*.[ch] | grep -v \\.git | xargs grep something
Normally my directories aren't going to have .bzr, .git, .hg, and .svn in the same directory tree, but if they did it wouldn't be that much harder.

    alias nocrap="egrep -v \\.\(git\|bzr\|hg\|svn\)"
    find . -type f -name \*.[ch]| nocrap | xargs grep whatever
This is how I do it because I'm lazy. I suppose I could tell find to exclude directories so it wouldn't have to recurse into them, but the list of directories and files in my source tree is almost always in buffer cache anyway, so I never really worry about it.
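
For comparison, the "tell find to exclude directories" variant could look roughly like the sketch below. The function name `cgrep` and the particular list of VCS directories are illustrative choices, not something from the comments above.

```shell
# cgrep PATTERN [DIR]
# Hypothetical helper: grep C sources under DIR (default: .) without
# descending into version-control metadata directories.
cgrep() {
    # -prune stops find from entering the VCS directories at all.
    # /dev/null is an extra operand so grep always prints the file name,
    # even when find hands it a single file.
    find "${2:-.}" \( -name .git -o -name .svn -o -name .hg -o -name .bzr \) -prune \
        -o -type f -name '*.[ch]' -exec grep -- "$1" /dev/null {} +
}
```

Because it uses `-exec ... {} +` rather than a pipe into xargs, file names containing spaces are not a problem.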


You might want to consider using single quotes rather than backslashes to protect your globs and regexps. \✳.[ch] still leaves [ch] open to the shell, so it will match a ./✳.c that may have been accidentally created by an earlier error; find is then handed ✳.c as its pattern and silently stops looking for the .h files. '✳.[ch]' or \✳.\[ch] is what's meant.

(Edited to use ✳ to workaround unescapable mark-up.)
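
A small sketch of that failure mode, written with a real asterisk in place of the ✳ stand-in. It runs in a throwaway directory and prints the argument a command would actually receive; the helper names `show_args` and `demo_glob_quoting` are made up for the demonstration, and the behaviour shown assumes default bash or POSIX sh globbing.

```shell
# Print each argument on its own line, wrapped in <>.
show_args() { printf '<%s>\n' "$@"; }

# Runs in a subshell so the cd does not leak into the caller.
demo_glob_quoting() (
    cd "$(mktemp -d)" || exit 1
    : > './*.c'            # a stray file literally named *.c
    show_args \*.[ch]      # only the * is protected; [ch] still globs
    show_args '*.[ch]'     # fully quoted; reaches the command untouched
)
```

The first call prints `<*.c>` and the second prints `<*.[ch]>`, so only the quoted form hands find the pattern that was intended.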


Good point, thanks!


I like ack because I like the output format better.


> In Grep grep -R 'some term' * , in Ack ack 'some term'. The -R tells grep to search through all directories and subdirectories while the * tells it what types of files.

No, that's not how it works with grep. The shell is expanding the glob * to all the things in the current directory that don't start with a dot and that's what grep sees. It then reads each of them in turn or if they're a directory it descends into them recursively; the * plays no further part in the descent.


I was under the impression that * expanded to all things not starting with a dot in all directories-- or better: in each directory that -R dictates. I understand that "* tells it what types of files," isn't correct, but I don't understand what you mean by "plays no further part in the descent."


  $ mkdir x
  $ cd x
  $ echo aa > .y
  $ mkdir y
  $ echo aa > y/z
  $ echo aa > y/.z
  $ grep -r a *
  y/.z:aa
  y/z:aa
Notice that hidden files in the top-level (.y) don't show up, but those in subdirectories (y/.z) do.

It's kind of a gotcha. Fortunately one rarely finds dotfiles in subdirs.
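
The same scratch layout can be used to compare searching `*` with searching `.`. This is only a sketch: the function name `demo_dotfiles` is invented, `-l` is used so that only file names are printed, and the output is sorted in the C locale so the order is predictable.

```shell
# Runs in a subshell so the cd does not affect the caller.
demo_dotfiles() (
    cd "$(mktemp -d)" || exit 1
    mkdir y
    echo aa > .y; echo aa > y/z; echo aa > y/.z
    echo 'with *:'; grep -rl a * | LC_ALL=C sort   # the shell expands * to just "y"
    echo 'with .:'; grep -rl a . | LC_ALL=C sort   # grep walks everything itself
)
```

With `*` the top-level `.y` is never handed to grep; with `.` it is found along with the two files under `y`.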


holy crap - I didn't realize this. Thanks!


What do you think is doing the expansion of the asterisk? It's the shell, not grep, and is called globbing. http://en.wikipedia.org/wiki/Globbing

The reason there are files that start with a dot is that ls(1) and globs starting with an asterisk ignore them by default. ls needs -a (all) to show them. Some shells provide a globbing syntax for the recursive expansion you think you require here: zsh has **, and bash has had it since 4.0 behind `shopt -s globstar'.

But if, as normal, you want to grep recursively down from the current directory then that's what you should specify; just pass grep one path to search, dot, the current directory.

    grep -r foo .
This is very basic knowledge of the Unix command line for a programmer and I highly recommend the classic text _The Unix Programming Environment_ by Kernighan and Pike to learn the philosophy behind Unix. It has nothing on window systems, ssh, etc., nor should it have as that's modern-day noise that hides the essence. It and Kernighan and Ritchie's _The C Programming Language_ were the key books every starting-out Unix programmer read. http://www.amazon.com/exec/obidos/ASIN/013937681X/mqq-20


Just run echo * in your directory. That's the same argument list grep will get (instead of the *) when you pass it *.

grep will run through all those files, if any, and recurse through all the directories, if any. grep recursing down a directory will process all the files and subdirectories in that directory (which includes dot files).

If the shell can't expand the *, grep will receive a literal * and try to open a file or directory named *.


Personally, I `shopt -s failglob' in my .bashrc so any glob that fails to expand is treated as an error by the shell and the command never gets started. I then quote all special globbing characters that I intend to pass to the command as it avoids the sometimes hard to spot errors where a glob expanded unintentionally and the command didn't see the wildcards. Relying on non-matching globbing to reach the command will bite one day.
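
A rough sketch of the difference failglob makes, run in a child bash so the current shell's options are left alone. The function name `unmatched_glob` and the path `/no/such/dir` are placeholders; the path is assumed not to exist.

```shell
# unmatched_glob on|off
# Hypothetical helper: show what printf receives for a glob that matches
# nothing, with failglob enabled ("on") or disabled ("off").
unmatched_glob() {
    bash -c '
        if [ "$1" = on ]; then shopt -s failglob; fi
        printf "%s\n" /no/such/dir/*.txt
    ' _ "$1" 2>/dev/null
}
```

`unmatched_glob off` prints the pattern itself, because the unmatched glob is passed through literally. `unmatched_glob on` prints nothing: bash reports the failed match on stderr (discarded here) and never starts printf.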


I've wanted to try zsh for a long time now, but I worry about the mental context switching I'll have to do when developing locally and then ssh'ing into another server to perform other tasks (servers, I should add, where I can't install zsh).

Is anyone in a similar situation? If you switched, was it worth it?


I switched cold turkey from bash to zsh a few months ago, and going back to bash on servers hasn't been a problem. In my experience, if you've used bash for a long time you don't forget it.

Minor tip - you can use the same .aliases file for both shells, I just source it in .bashrc and .zshrc on my dev machine. So if I ever need to switch to bash for some reason, my most used aliases are still available.
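
A minimal sketch of that setup, assuming the shared file lives at `~/.aliases` as in the tip. The helper name `load_shared_aliases` is made up, and the guard simply avoids an error on machines where the file has not been copied over yet.

```shell
# load_shared_aliases FILE
# Hypothetical helper: source FILE if it exists and is readable.
load_shared_aliases() {
    if [ -r "$1" ]; then
        . "$1"
    fi
}

# In both ~/.bashrc and ~/.zshrc:
#   load_shared_aliases "$HOME/.aliases"
```

Keeping the file to plain `alias name='command'` lines should make it work unchanged in either shell.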

If you switch, don't forget oh-my-zsh [1].

1. https://github.com/robbyrussell/oh-my-zsh


I'm actually not very taken with oh-my-zsh. It seems to add a layer of complexity while not really adding anything you couldn't just paste into your .zshrc. If it eliminated tinkering overhead that would be one thing, but in my experience it just gave me a new layer of software that needed tinkering.


It's along similar lines to copying someone else's rc file willy-nilly, without understanding what you're getting. Extra layer of quirks and features that you need to figure out, on top of the already existing layer.


zsh can mostly be thought of as a collection of enhancements to bash, so I find that rather than changing my thought process, I just try to use a zsh feature, realize it's not working, and do it the crummy bash way instead. If a feature exists in bash, the zsh syntax is usually identical.

Also just for the record zsh is installed by default on many OSes, including OSX, so you might be surprised to learn that some of your servers already have it.


I've been on zsh for ages, and the thing that always kills me dead when stuck in bash is the terrible default command line editor. Whereas in zsh, if I go back to edit a multi-line loop or something, it preserves the white-space, bash by default presents the commands as a semicolon delimited list on one line. That's ass. Is there a handy bash shortcut to mimic zsh's behavior here?


How I wish there was a better language for interacting with Unix command line utilities than the godawful POSIXy shells.


Try Tom Duff's rc shell: http://rc.cat-v.org

It's available on *nix systems as part of Plan 9 from User Space: http://plan9.us


Interesting. I'll give it a go.

I've tried living in iPython and eshell, and while I prefer python or lisp as a programming language to almost anything, neither is nearly as quick and dirty as sh for doing stuff with stdin/out from a pipeline of tools. I don't quite know what a better language than sh would look like, but a man can dream.

Also, universe, if you're going to get on fixing the sh problem, can we restructure how the shell operates entirely? Don't do argument expansion! This is one area where the Unix Haters got it really right.


Please define "argument expansion".


In many (all?) unix shells, globs are expanded by the shell before being passed to the program.

   'ls *.jpg' 
is expanded to 'ls 1.jpg 2.jpg 3.jpg' BEFORE ls is run. This choice means that if a program wants to take a pattern in another syntax (e.g. a PCRE regular expression), the pattern must be enclosed in quotes on the shell, e.g.:

  "ls | grep -P '.*jpg'"


Yes, that's globbing, but I was wondering what the OP meant by argument expansion. It could be just globbing, or include globbing but extend to variable expansion, braces, command substitution... It's hard to put a case that having the shell do these things is the right way to do it without understanding the OP's point clearly.


Minor point: if the glob doesn't match it isn't expanded. Sometimes useful. I rarely escape scp commands for instance.
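
The three cases in this subthread (a glob that matches, a quoted glob, and a glob that matches nothing) can be seen by counting the arguments a command receives. This is a sketch under default bash or POSIX sh settings; zsh, for one, treats the unmatched case as an error by default. The names `count_args` and `demo_expansion` are invented for the example.

```shell
# Print how many arguments were received.
count_args() { echo "$#"; }

# Runs in a subshell so the cd does not affect the caller.
demo_expansion() (
    cd "$(mktemp -d)" || exit 1
    : > 1.jpg; : > 2.jpg; : > 3.jpg
    count_args *.jpg       # expanded by the shell before the call
    count_args '*.jpg'     # quoted: the literal pattern is passed
    count_args *.png       # nothing matches: passed through unchanged
)
```

The output is 3, 1 and 1: three file names in the first case, and a single literal pattern in the other two.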


See `failglob' elsewhere on this page. What you suggest is sometimes useful but also sometimes wrong and hard to spot because the glob might match locally and one not realise.


You might like to look at scheme shell.


As someone who's converted from bash -> fish (arguably a much bigger context switch as fish isn't bash compatible) I haven't found this to be an issue.


Grey text on grey background: http://contrastrebellion.com/


Contrast Rebellion refers to W3C's Web Content Accessibility Guidelines, which specify a contrast ratio of at least 4.5:1. Just for fun, I plugged in the numbers: the site has a contrast ratio of about 4.9:1.

W3C: http://www.w3.org/TR/UNDERSTANDING-WCAG20/visual-audio-contr...
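
For anyone who wants to plug in their own numbers, here is a sketch of the WCAG 2.0 calculation: each sRGB channel is linearised, the channels are combined into a relative luminance, and the ratio is (lighter + 0.05) / (darker + 0.05). The function names are invented, and the colours are assumed to be given as six hex digits with an optional leading `#`; the site's actual colours are not stated in the comment, so none are filled in here.

```shell
# hex_to_rgb RRGGBB  ->  "R G B" as decimal numbers
hex_to_rgb() {
    h=${1#\#}                                  # drop a leading '#', if any
    printf '%d %d %d\n' \
        "0x$(printf %s "$h" | cut -c1-2)" \
        "0x$(printf %s "$h" | cut -c3-4)" \
        "0x$(printf %s "$h" | cut -c5-6)"
}

# contrast_ratio COLOUR1 COLOUR2  ->  WCAG 2.0 contrast ratio, 2 decimals
contrast_ratio() {
    { hex_to_rgb "$1"; hex_to_rgb "$2"; } | awk '
        # Linearise one 0-255 sRGB channel as defined by WCAG 2.0.
        function lin(c) {
            c /= 255
            return (c <= 0.03928) ? c / 12.92 : ((c + 0.055) / 1.055) ^ 2.4
        }
        # Relative luminance of each of the two colours.
        { L[NR] = 0.2126 * lin($1) + 0.7152 * lin($2) + 0.0722 * lin($3) }
        END {
            hi = (L[1] > L[2]) ? L[1] : L[2]
            lo = (L[1] > L[2]) ? L[2] : L[1]
            printf "%.2f\n", (hi + 0.05) / (lo + 0.05)
        }'
}
```

Black on white (`contrast_ratio 000000 ffffff`) gives 21.00, the maximum possible; the guideline quoted above asks for at least 4.5.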


It's not nearly enough for my liking.

I'll take a look when I've got a few more moments to see where various of the W3C's recommendation levels fall on my own assessment of readability.

Speaking as a 40-something with decent vision, the site isn't impossible to read, but it's most definitely annoying. For someone with compromised or old eyes, it would be much worse.


I agree, annoying, and I didn't realise many others thought the same. Thanks for the link, I've tried to spread the word. https://plus.google.com/115649437518703495227/posts/iWtSJhR8...


I'd love to hear other tools that you guys are using for your daily workflow.


HTTPie, an easy to use cURL replacement: https://github.com/jkbr/httpie (disclaimer: I'm the author)


This was a good list for the intermediate command line user looking to customize. For the absolute newbie I'd reduce it down to git/hg, ssh, grep/ack and curl.

For crazy power users, I'm a huge fan of vim+tmux+zsh combo as described in http://www.drbunsen.org/text-triumvirate.html


That's a good list. I considered including Git but that could be an entire article by itself. I've looked at tmux but I just don't think I'd use it.


What's the difference between Tmux and iTerm2?


tmux is a CLI terminal multiplexer. As an interactive CLI program, you need a shell to use it. To run such a shell, you need a terminal emulator. iTerm2 is a Mac OS X only GUI terminal emulator.

You can run tmux in iTerm2.


CLI (Mac OS X and Linux): bash, git/tig, svn, cdargs, vifm or ranger (can't decide), vim, grep, man, ssh, a few custom functions and tmux.

GUI (Mac OS X): MacVim, ClipMenu, YummyFTP, Quicksilver, iTerm2, SourceTree, FileMerge, Chrome dev tools.

GUI (Linux): GVim, Glippy, FileZilla, Synapse, Gnome terminal, SmartGit/SmartSVN, Meld, Chrome dev tools.


Terminator (http://www.tenshu.net/p/terminator.html) is a nice replacement for Gnome Terminal that's very similar to iTerm2.


Thanks, I'll take a look.

I don't use any advanced features of iTerm, though, I just use it over Terminal.app because it supports 256 colors. We have not been updated to Lion or Mountain Lion at work so Terminal.app is still limited to 16 colors for me.


zsh, vim, git, pry, ack, htty, pv, multitail to name just a few of the most common and powerful things in my daily text shell arsenal.


Tools similar to z:

1. [Autojump](https://github.com/joelthelion/autojump), first implementation and what z was based on, written in python. Functionality similar to z.

2. [Fasd](https://github.com/clvv/fasd), feature-rich fork of z, support for files, support for more shells and platforms (BSDs, Android).


tmux - Without a doubt, the biggest upgrade to my workflow in years. If you aren't using tmux (or at least screen), you are doing yourself a big disservice.

Also, I keep seeing iTerm2 get mentioned. You mentioned split panes and whatnot, which are not critical when I have that with tmux (as well as the tabs). And while Terminal's color support was weak before Lion, that's changed. So I really wonder what the big reason for using iTerm2 over Terminal is?


What's the benefit from tmux? Why is it better than just using multiple windows?


Because I can shut down my computer, move to another one, and pick up right where I left off. This means even if I leave a process running, it doesn't stop.

tmux is like screen, only it has more features.

Which is really why I asked. What does iTerm2 offer that I can't get from tmux and the current Terminal? People always refer to iTerm's ability to do multiple windows/tabs. For me, this is useless if it's just available client side.

There is an amazing amount of freedom being able to just shut down, move to another computer, and be able to pick up exactly where I left off.
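
In practice the "pick up where I left off" workflow comes down to attaching to a named session if it exists and creating it otherwise. A minimal sketch, run on the machine that hosts the session; the function name `work_session` and the default session name `work` are arbitrary choices.

```shell
# work_session [NAME]
# Hypothetical helper: reattach to the tmux session NAME (default: work),
# or start it if it does not exist yet.
work_session() {
    name=${1:-work}
    if tmux has-session -t "$name" 2>/dev/null; then
        tmux attach-session -t "$name"     # carry on where you left off
    else
        tmux new-session -s "$name"        # first use: create the session
    fi
}
```

Detaching (Ctrl-b d with the default key bindings) leaves everything in the session running, so the same command from another computer, after ssh-ing in, brings it all back.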


So you have to install tmux on the server?


See tmux's homepage: http://tmux.sourceforge.net/

Tmux is not only a window manager, it allows me to:

* login only once into my VPS and run all kinds of things in parallel

* leave a session at work and come back to it at home

* run processes in the background with textual feedback

Its precursor, screen, works similarly, with slightly fewer bells and whistles.


So, first (and most obvious) point of interest. Your function "server" will not work. It contains "python -m SImpleHTTPServer .." .. Notice the typo? It should be SimpleHTTPServer. In addition, that other fluff around it is unneeded in general. Just "python -m SimpleHTTPServer $port" is almost always "good enough", though the SimpleHTTPServer module contains severe bugs. Secondly, your link to Paul Irish right above that snippet contains an href="" which, obviously, links it to your own page. It makes it seem like you're trying to take credit for his work. This is all sloppy and bad form.

Honestly, anyone who puts code into a blog post without even bothering to RUN IT THEMSELVES first should get out of the business of writing even slightly technical posts. Feel free to review food without tasting it first instead; at least then it's subjective so you can't be wrong. Here, the "/usr/bin/python: No module named SImpleHTTPServer" you'd get if you had arsed yourself to actually check your code to be valid easily clues the reader into your carelessness.

In addition to all that, there's the audacity of posting a post for the "novice" to Hacker News when the post is neither relevant to any hacker who has performed any real work or has any real intelligence (they would already know all the mentioned tools) nor news, since all the tools mentioned are fairly old and well known. You might as well just call us all stupid and dilute the front page further with crap.


Citnoven...

I appreciate you taking the time not only to read the post, but to offer your opinion and corrections to my article. I've updated those two mistakes you pointed out.

As I understand it, SimpleHTTPServer offers basic functionality to the user, which is all I use it for...a quick way to view pages using AJAX requests. I attempted to use http.server instead, but as I'm not running Python 3.0 I'm not able to use that option. Since Lion ships with Python 2.7, and many people are using that OS, I felt safe leaving the reference in place.
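
The article's actual function is not quoted in this thread, so the following is only a hypothetical sketch of a `server` helper that works on either Python line: it prefers `http.server` when a `python3` is on the PATH and falls back to the Python 2 module otherwise. The default port of 8000 is an arbitrary choice.

```shell
# server [PORT]
# Hypothetical helper: serve the current directory over HTTP.
server() {
    port=${1:-8000}
    if command -v python3 >/dev/null 2>&1; then
        python3 -m http.server "$port"          # Python 3 module name
    else
        python -m SimpleHTTPServer "$port"      # Python 2 module name
    fi
}
```

Running `server 9000` in a project directory should then make it reachable at http://localhost:9000/ until interrupted with Ctrl-C.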

I do run the server function myself, many times a day in fact, but I made the mistake of typing it in manually rather than copying and pasting from my .zshrc file.

This post isn't intended for hackers, thus the title "...for the novice". The comments I've received about the article thus far, most likely from people not as advanced as yourself, have been positive and appreciative for the information. I'd love to hear some of the tools that improve your workflow daily...would you care to share them?



