The Index
The index is where the spider-collected data are stored. When you perform a
search on a major search engine, you are not searching the live web, but the
cached copy of the web held in that search engine’s index.
Reverse Index
Search engines organize their content in what is called a reverse index (more
commonly known as an inverted index). A reverse index sorts web documents by
word: for each word, it lists the documents that contain it. When you search
Google and it displays results 1-10 out of 143,000, it means that there are
approximately 143,000 web pages that either have the words from your search on
them or have inbound links containing them. Also, note that search engines do
not store punctuation, just words.
The following is an example of a reverse index and how a typical search engine
might classify content. While this is an oversimplified version of the real thing, it
does illustrate the point. Imagine each of the following sentences is the content of
a unique page:
Page 1: The dog ate the cat.
Page 2: The cat ate the mouse.

Word     Document #    Position #
The      1, 2          1-1, 1-4, 2-1, 2-4
Dog      1             1-2
Ate      1, 2          1-3, 2-3
Cat      1, 2          1-5, 2-2
Mouse    2             2-5
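
To make the table concrete, here is a minimal Python sketch that builds the
same kind of index from the two example sentences. It is a toy under stated
assumptions, not how any real search engine is implemented: the function names
(tokenize, build_index, search) are made up for illustration, words are
lowercased, punctuation is dropped, and only on-page words are indexed (inbound
link text is ignored). Each position is stored as a (document number, word
number) pair, matching the "1-4" style used in the table.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters/digits (no punctuation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to a list of (document number, word position), both 1-based."""
    index = defaultdict(list)
    for doc_id, text in enumerate(pages, start=1):
        for position, word in enumerate(tokenize(text), start=1):
            index[word].append((doc_id, position))
    return dict(index)


def search(index, query):
    """Return the sorted document numbers that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    doc_sets = [{doc_id for doc_id, _ in index.get(word, [])} for word in words]
    return sorted(set.intersection(*doc_sets))


pages = ["The dog ate the cat.", "The cat ate the mouse."]
index = build_index(pages)

for word, postings in index.items():
    print(word, postings)

print(search(index, "cat mouse"))
```

A query is answered by looking up each word in the index and intersecting the
document lists, so the pages themselves are never rescanned at search time.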
Storing Attributes
Since search engines view pages from their source code in a linear format, it is best
to move JavaScript and other extraneous code to external files to help move the
page copy higher in the source code.
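
As a rough illustration of what "higher in the source code" means, the Python
sketch below uses the standard library's html.parser to report the source line
on which the first visible text appears. The two sample pages, the file name
menu.js, and the names CopyLocator and first_copy_line are hypothetical, and
the sketch only measures position in the source; it says nothing about how a
search engine actually weighs that position.

```python
from html.parser import HTMLParser


class CopyLocator(HTMLParser):
    """Record the source line on which the first visible text appears."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0          # greater than 0 while inside <script> or <style>
        self.first_copy_line = None

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.first_copy_line is None and self.skip_depth == 0 and data.strip():
            line, _ = self.getpos()  # position where this run of text starts
            leading = data[: len(data) - len(data.lstrip())]
            self.first_copy_line = line + leading.count("\n")


def first_copy_line(html):
    locator = CopyLocator()
    locator.feed(html)
    locator.close()
    return locator.first_copy_line


# Hypothetical page with the script written inline in the head.
inline_page = """<html><head>
<script>
function menu() {
  return 1;
}
</script>
</head><body>
<p>Page copy</p>
</body></html>"""

# The same page with the script moved to an external file.
external_page = """<html><head>
<script src="menu.js"></script>
</head><body>
<p>Page copy</p>
</body></html>"""

print("inline script:  ", first_copy_line(inline_page))
print("external script:", first_copy_line(external_page))
```

Moving the script out of the page shortens the source that precedes the copy,
so the second page reports an earlier line than the first.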
Some people also use Cascading Style Sheets (CSS) or a blank table cell to place the
page content ahead of the navigation. To decide which words come first, search
engines look at the order in which the words appear in the source code. I have not
done significant testing to determine if it is worth the effort to make your unique
