-
Use your browser to open any web page and save the source as page.html in your home directory.
-
View the file's contents:
$ cat page.html
-
That usually makes the contents whiz by so that you only see the very end. This variation allows you to page through the file one screen at a time:
$ more page.html
Type a space to page forward, and type q to quit. The more command only pages forward, but less lets you page back and forth by typing the additional commands b or f:
$ less page.html
- page
-
See the start or end of the file:
$ head page.html
$ tail page.html
For either command, specify the number of lines to display:
$ head -10 page.html
Optionally clear the screen before displaying the output:
$ clear ; head -10 page.html
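To see exactly which lines head and tail pick, it can help to try them on a file whose contents are predictable. The sketch below builds a throwaway 20-line file (the temporary file is an assumption for illustration, not part of page.html) and uses the -n form of the line-count option, which behaves like the -10 shorthand above.

```shell
# Build a throwaway file containing the numbers 1 through 20, one per line.
sample="$(mktemp)"
seq 1 20 > "$sample"

# head keeps lines from the top of the file, tail keeps lines from the bottom.
first_three="$(head -n 3 "$sample")"
last_two="$(tail -n 2 "$sample")"

printf '%s\n' "$first_three"   # prints 1, 2, 3 on separate lines
printf '%s\n' "$last_two"      # prints 19, 20 on separate lines

rm -f "$sample"
```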
-
Look for stuff in the file:
$ grep body page.html
It displays every line of text that contains body anywhere in it. In Unix, lines are a very important unit of content, even though each line can be very long. Lines longer than the terminal width (often 80 characters) display as if they wrap onto the following line.
- line
-
A case-insensitive search for a multi-word string:
$ grep -i "BODY class" page.html
Note the need for the quotes. Without them, the shell passes BODY and class to grep as separate arguments, so grep searches for BODY and treats class as the name of a file to search. (If this doesn't work, try it with any other two adjacent words you see in the file.) A string is any sequence of characters, and can include letters, digits, whitespace, and punctuation characters.
- string
- case-insensitive
- line
-
See more context in the output:
$ grep -n -C 1 "body" page.html
The -n option shows the line number on which the match appears, which tells you how far down from the top of the file it is. The -C (context) option shows one extra line before and after each match, with separate chunks of text divided by -- lines. Try changing the 1 to 2 or more. It lets you see more of the file that surrounds the match.
-
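As a small illustration of the -n and -C output format, this sketch runs the same search on a tiny made-up HTML file (its contents are assumptions chosen so that the two matches are far enough apart to produce separate chunks). Matching lines are marked with a colon after the line number, and context lines with a hyphen.

```shell
# A tiny stand-in for page.html; the contents are invented for this example.
sample="$(mktemp)"
printf '%s\n' '<html>' '<head><title>t</title></head>' '<body class="home">' \
  '<p>one</p>' '<p>two</p>' '<p>three</p>' '</body>' '</html>' > "$sample"

# One line of context on each side of every match, with line numbers.
context="$(grep -n -C 1 "body" "$sample")"
printf '%s\n' "$context"
# 2-<head><title>t</title></head>
# 3:<body class="home">
# 4-<p>one</p>
# --
# 6-<p>three</p>
# 7:</body>
# 8-</html>

rm -f "$sample"
```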
The grep command stands for global regular expression print, a fancy way of saying the stuff within the quotes is special. Regular expressions (aka regex or regexps) offer a system of matching patterns. These patterns resemble shell wildcards, but work differently. Suppose the word body appears all over the place and you only want to see it when it's used as an HTML tag. You could run one command each to match the open and close tags:
$ grep -n "<body" page.html
$ grep -n "</body" page.html
But in this variation, the * specifies zero or more of the preceding / character, so it matches both scenarios:
$ grep -n "</*body" page.html
This is a simple regular expression, but they can do very powerful and complex things. You'll encounter slight variations in support for them in three kinds of environment: in simple line-based Unix utilities such as grep and sed, in more complex text editors such as Emacs, and in full programming languages such as Python, JavaScript, and Perl.
- pattern
- regex, regexp
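To check what the pattern with * really matches, here is a minimal sketch on a three-line made-up file (the file contents are assumptions for illustration). Only the two tag lines match the pattern, while a plain search for body matches all three lines.

```shell
# Three lines: an open tag, ordinary prose, and a close tag.
sample="$(mktemp)"
printf '%s\n' '<body class="home">' 'The body of the text.' '</body>' > "$sample"

# "/*" means zero or more slashes, so both "<body" and "</body" match.
tag_hits="$(grep -n "</*body" "$sample")"
# A plain search also catches the prose line.
all_hits="$(grep -c "body" "$sample")"

printf '%s\n' "$tag_hits"
# 1:<body class="home">
# 3:</body>
echo "lines containing body anywhere: $all_hits"

rm -f "$sample"
```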
-
Maybe you just want to count (-c) how many lines match in each of a bunch of files. Perhaps you also want to search recursively (-r) through all nested subdirectories. This is a powerful way to inspect a directory structure:
$ grep -cr "body" *
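The following sketch shows what the recursive count looks like on a small made-up directory tree (the directory and file names are hypothetical). Note that -c counts matching lines per file, not individual occurrences, so a line that contains body twice still counts once.

```shell
# A temporary directory with one file at the top and one in a subdirectory.
dir="$(mktemp -d)"
mkdir "$dir/sub"
printf '<body>\n</body>\n' > "$dir/a.html"
printf '<body>hi</body>\n<p>none</p>\n' > "$dir/sub/b.html"

# Run the count from inside the directory so * expands to its contents.
counts="$(cd "$dir" && grep -cr "body" *)"
printf '%s\n' "$counts"
# a.html:2
# sub/b.html:1

rm -rf "$dir"
```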
-
The -v option inverts the match, so this shows every line that doesn't contain the opening angle bracket that marks HTML tags:
$ grep -v "<" *.html
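Here is a minimal sketch of the inverted match on a made-up four-line file (the contents are assumptions for illustration). With a single file argument, grep prints the surviving lines without a file name prefix.

```shell
# Two lines with tags, two lines without.
sample="$(mktemp)"
printf '%s\n' '<h1>Title</h1>' 'plain text' '<p>para</p>' 'more text' > "$sample"

# Keep only the lines that do NOT contain "<".
no_tags="$(grep -v "<" "$sample")"
printf '%s\n' "$no_tags"
# plain text
# more text

rm -f "$sample"
```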
-
Find out how big the file is in lines, words, and bytes (the same as characters for plain ASCII text):
$ wc page.html
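To see each of the three numbers separately, this sketch runs wc with one option at a time on a tiny made-up file (the contents are an assumption for illustration). Reading the file from standard input keeps the file name out of the output, and the arithmetic expansion strips the padding that some versions of wc add.

```shell
# Two lines, three words, fourteen bytes (including the two newlines).
sample="$(mktemp)"
printf 'one two\nthree\n' > "$sample"

lines="$(wc -l < "$sample")"   # line count
words="$(wc -w < "$sample")"   # word count
bytes="$(wc -c < "$sample")"   # byte count

echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
# lines=2 words=3 bytes=14

rm -f "$sample"
```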
-
Copy a file, make some random edits, and save the file:
$ cp page.html page2.html
$ open -e page2.html
Now compare to the original:
$ diff page.html page2.html
Diff output like this appears in version-control systems such as Git, and is the main way you compare one version of a file to another and track changes. Diffs are much easier to read when no line of text exceeds 80 characters, the standard width of most terminals.
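As a small illustration of how to read the default output, this sketch compares two made-up three-line files that differ only on the second line (the file names and contents are assumptions). The 2c2 header means line 2 of the first file changed into line 2 of the second. Lines starting with < come from the first file and lines starting with > from the second. Git shows the related unified format, which diff -u produces.

```shell
# Two nearly identical files; only the middle line differs.
dir="$(mktemp -d)"
printf 'alpha\nbeta\ngamma\n' > "$dir/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$dir/new.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
changes="$(diff "$dir/old.txt" "$dir/new.txt" || true)"
printf '%s\n' "$changes"
# 2c2
# < beta
# ---
# > BETA

rm -rf "$dir"
```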