-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathlxr_search_help.html
executable file
·173 lines (173 loc) · 7.34 KB
/
lxr_search_help.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Balthisar Tidy, version 5.1.24">
<link rel="stylesheet" href="tidy.site.css" type="text/css">
<meta charset="UTF-8">
<title>
Linux Cross-Reference
</title>
<style type="text/css">
body {
background-color: white;
}
h1.c2 {text-align: center}
a.c1 {font-style: italic}
</style>
</head>
<body>
<div class="legacy">
<h1>
Legacy Content Notice
</h1>
<p>
The content on this page is no longer under active maintenance, however
it remains here for historical interest as well as for preventing link
rot.
</p>
<p>
Links have been updated, the site stylesheet has been applied, and
the content source has been cleaned with HTML Tidy,
but there is no guarantee of future changes to this page.
</p>
</div>
<h1 class="c2">
Help doing searches<br>
<a class="c1" href=
"http://frontierkernel.sourceforge.net/cgi-bin/lxr/source/">Browse the
code</a>
</h1>
<p>
<i>This text is directly stolen from the Glimpse manual page. I have
tried to remove things that do not apply to the lxr searcher, but beware,
some things might have slipped through. I'll try to put together
something better when I get the time. For more information on glimpse go
to the <a href="http://webglimpse.net">Glimpse homepage</a>.</i> <a name=
"Patterns" id="Patterns"></a>
</p>
<h2>
Patterns
</h2>
<p>
glimpse supports a large variety of patterns, including simple strings,
strings with classes of characters, sets of strings, wild cards, and
regular expressions (see <a href="#Limitations">Limitations</a>).
</p>
<h3>
Strings
</h3>
<p>
Strings are any sequence of characters, including the special symbols `^'
for beginning of line and `$' for end of line. The following special
characters (`$', `^', `*', `[', `^', `|', `(', `)', `!', and `\' ) as
well as the following meta characters special to glimpse (and agrep):
`;', `,', `#', `>', `<', `-', and `.', should be preceded by `\\'
if they are to be matched as regular characters. For example, \\^abc\\\\
corresponds to the string ^abc\\, whereas ^abc corresponds to the string
abc at the beginning of a line.
</p>
<h3>
Classes of characters
</h3>
<p>
a list of characters inside [] (in order) corresponds to any character
from the list. For example, [a-ho-z] is any character between a and h or
between o and z. The symbol `^' inside [] complements the list. For
example, [^i-n] denote any character in the character set except
character 'i' to 'n'. The symbol `^' thus has two meanings, but this is
consistent with egrep. The symbol `.' (don't care) stands for any symbol
(except for the newline symbol).
</p>
<h3>
Boolean operations
</h3>
<p>
Glimpse supports an `AND' operation denoted by the symbol `;' an `OR'
operation denoted by the symbol `,', a limited version of a 'NOT'
operation (starting at version 4.0B1) denoted by the symbol `~', or any
combination. For example, pizza;cheeseburger' will output all lines
containing both patterns. '{political,computer};science' will match
'political science' or 'science of computers'.
</p>
<h3>
Wild cards
</h3>
<p>
The symbol '#' is used to denote a sequence of any number (including 0)
of arbitrary characters (see <a href="#Limitations">Limitations</a>). The
symbol # is equivalent to .* in egrep. In fact, .* will work too, because
it is a valid regular expression (see below), but unless this is part of
an actual regular expression, # will work faster. (Currently glimpse is
experiencing some problems with #.)
</p>
<h3>
Combination of exact and approximate matching
</h3>
<p>
Any pattern inside angle brackets <> must match the text exactly
even if the match is with errors. For example, <mathemat>ics
matches mathematical with one error (replacing the last s with an a), but
mathe<matics> does not match mathematical no matter how many errors
are allowed. (This option is buggy at the moment.)
</p>
<h3>
Regular expressions
</h3>
<p>
Since the index is word based, a regular expression must match words that
appear in the index for glimpse to find it. Glimpse first strips the
regular expression from all non-alphabetic characters, and searches the
index for all remaining words. It then applies the regular expression
matching algorithm to the files found in the index. For example, glimpse
'abc.*xyz' will search the index for all files that contain both 'abc'
and 'xyz', and then search directly for 'abc.*xyz' in those files. (If
you use glimpse -w 'abc.*xyz', then 'abcxyz' will not be found, because
glimpse will think that abc and xyz need to be matches to whole words.)
The syntax of regular expressions in glimpse is in general the same as
that for agrep. The union operation `|', Kleene closure `*', and
parentheses () are all supported. Currently '+' is not supported. Regular
expressions are currently limited to approximately 30 characters
(generally excluding meta characters). The maximal number of errors for
regular expressions that use '*' or '|' is 4.
</p>
<p>
<a name="Limitations" id="Limitations"></a>
</p>
<h2>
Limitations
</h2>
<p>
The index of glimpse is word based. A pattern that contains more than one
word cannot be found in the index. The way glimpse overcomes this
weakness is by splitting any multi-word pattern into its set of words and
looking for all of them in the index. For example, <i>'linear
programming'</i> will first consult the index to find all files
containing both <i>linear</i> and <i>programming</i>, and then apply
agrep to find the combined pattern. This is usually an effective
solution, but it can be slow for cases where both words are very common,
but their combination is not.
</p>
<p>
As was mentioned in the section on <a href="#Patterns">Patterns</a>
above, some characters serve as meta characters for glimpse and need to
be preceded by '\\' to search for them. The most common examples are the
characters '.' (which stands for a wild card), and '*' (the Kleene
closure). So, "glimpse ab.de" will match abcde, but "glimpse ab\\.de"
will not, and "glimpse ab*de" will not match ab*de, but "glimpse ab\\*de"
will. The meta character - is translated automatically to a hypen unless
it appears between [] (in which case it denotes a range of characters).
</p>
<p>
There is no size limit for simple patterns and simple patterns within
Boolean expressions. More complicated patterns, such as regular
expressions, are currently limited to approximately 30 characters. Lines
are limited to 1024 characters.
</p>
<hr>
<address>
<a href="mailto:[email protected]">Arne Georg Gleditsch and Per Kristian
Gjermshus</a>
</address>
</body>
</html>