Wildcard Searches

Using wildcard searches

OCR (Optical Character Recognition) technology sometimes has problems with historical newspapers. In old newspapers especially, the letters can “bleed” into each other, so that the computer either cannot recognize the shapes or mistakes them for other letters. Any flaw on the original paper page, such as inkblots, speckles, poor type quality, fading, folds, wrinkles, tears or discoloration, can interfere with the OCR process. When the computer cannot recognize or misinterprets some of the letter shapes on the page, the researcher may get false hits or miss relevant ones.

A few of the most commonly misinterpreted characters are a, o, e, r, i, and n. Researchers can often minimize the problems caused by these misinterpreted characters in OCR databases by using wildcard searches. Wildcard searches let users allow for unexpected variations in spelling that might be caused by the OCR process. For our search engine, the single-character wildcard is a question mark (?) and the multi-character wildcard, which stands in for up to 5 characters, is an asterisk (*).
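The two wildcards described above can be sketched in Python as a translation into a regular expression. This is only an illustration, not the search engine's actual implementation: the function names wildcard_to_regex and matches are hypothetical, and the sketch assumes that ? stands for exactly one letter or digit, that * stands for zero to five of them, and that matching is case-insensitive.

```python
import re

def wildcard_to_regex(term, max_star=5):
    """Compile a wildcard term into a case-insensitive regular expression.

    Assumptions (not confirmed by the search engine's documentation):
    '?' is exactly one word character, '*' is zero to max_star of them.
    """
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(r"\w")                    # exactly one character
        elif ch == "*":
            parts.append(r"\w{0,%d}" % max_star)   # zero up to max_star characters
        else:
            parts.append(re.escape(ch))            # ordinary literal character
    return re.compile("".join(parts), re.IGNORECASE)

def matches(term, word):
    """Return True if the whole word fits the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

print(matches("m?j?sty", "majesty"))     # True
print(matches("m?j?sty", "majsty"))      # False: each ? needs one character
print(matches("St*nbock", "Steinbock"))  # True
```

Limiting the asterisk to a fixed number of characters keeps a term such as St*nbock from matching arbitrarily long words.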

For example, a search for the term majesty might yield broader results if you use wildcards for the most commonly misinterpreted characters, as in this example: m?j?sty. You might even want to try something like this: m?j?sty or m?j?fty, since many of the older newspapers used the old-fashioned elongated s character, which OCR can sometimes interpret as an f. You can also use the multi-character wildcard to account for variations in spelling and possible misinterpretation of certain characters. For example, you could search for St*nbock instead of limiting your results to Steinbock.

Wildcarding/truncation

Wildcarding, or truncation, is the use of certain symbols (? or *) to replace one or more letters or characters in a search term. This can be useful when:

  • you want to make sure you find items containing slight variants of your search term.
  • you are not completely sure how to spell your subject.

Single Character Wildcarding: Question Mark (?)

Use the question mark in place of single letters. For example:

  • WOM?N will search for items containing woman or women.

Multiple Character Wildcarding: Asterisk (*)

Use the asterisk in place of several letters, or of none at all. For example:

  • ENVIRONMENT* will search for items containing environment, environments, environmental, environmentalist, etc.
  • COL*R will find all the items containing color or colour -- a great help for finding variant spellings of words.
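To see which words a pattern would pick up before running a search, the examples above can be tried against a word list with Python's standard fnmatch module. This is only an approximation of a search engine's behavior: the word list and the find helper are made up for illustration, and fnmatch places no upper limit on how many characters an asterisk may cover.

```python
from fnmatch import fnmatchcase

# A made-up word list for illustration.
WORDS = ["woman", "women", "womens", "color", "colour", "collar",
         "environment", "environmentalist"]

def find(pattern, words=WORDS):
    """Return the words that fit the wildcard pattern, ignoring case."""
    p = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), p)]

print(find("WOM?N"))         # ['woman', 'women']
print(find("COL*R"))         # ['color', 'colour', 'collar']
print(find("ENVIRONMENT*"))  # ['environment', 'environmentalist']
```

Note that COL*R also picks up collar: a wildcard broadens the search, so some results will be unrelated to the word you had in mind.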