About natural language search

For general search information, see Overview of Goldfire Researcher.

The challenge: precision vs. completeness

In a computer-aided search, you want to limit the results to those that precisely address your information needs. However, a single concept can be expressed in numerous grammatical variations, which makes it difficult for traditional search technologies to retrieve all the available relevant information from electronic documents.

Consider an exact-phrase search, for example. In an exact-phrase search, the results contain your query words exactly as typed: the same words in the same order, adjacent to one another. This type of search was developed to limit unwieldy keyword search results, and it would have been the answer if finding an exact phrase were equivalent to finding an exact concept. However, although all exact-phrase results are relevant, the retrieved results are far from complete. For example, if your exact-phrase search retrieved documents with reduce soap viscosity, it missed documents with reduced soap viscosity, or reducing the viscosity of soap, or soap viscosity is reduced by, and so on. Clearly, the missed documents are equally relevant to the original query because they all deal with the concept where soap viscosity is being reduced.
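The gap can be illustrated with a minimal Python sketch. This is not Goldfire code: the function name exact_phrase_match is hypothetical, and the snippets are simply the phrase variations mentioned above.

```python
def exact_phrase_match(query: str, text: str) -> bool:
    """True only if the query words occur in the text adjacent and in the same order."""
    q = query.lower().split()
    t = text.lower().split()
    # Slide a window of len(q) words over the text and compare word for word.
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


query = "reduce soap viscosity"
snippets = [
    "reduce soap viscosity",
    "reduced soap viscosity",
    "reducing the viscosity of soap",
    "soap viscosity is reduced by",
]

hits = [s for s in snippets if exact_phrase_match(query, s)]
missed = [s for s in snippets if s not in hits]

print("retrieved:", hits)
print("missed:", missed)
```

Only the first snippet is retrieved; the other three express the same concept but are missed.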

Keyword searches find some of the relevant results that would be missed by an exact-phrase search. Keyword searches use logical conditions to specify which words and their variations must be included in or excluded from a document. However, building conventional Boolean expressions with all the possible keyword variations is time-consuming and does not guarantee both complete and relevant results.
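As a rough illustration, a Boolean keyword search for the same concept might be sketched as follows. The word list REDUCE_FORMS and the function boolean_keyword_match are hypothetical, and the last snippet is invented for this example to show a result that matches the keywords but not the concept.

```python
# Every variation of the action word has to be listed by hand.
REDUCE_FORMS = {"reduce", "reduces", "reduced", "reducing"}


def boolean_keyword_match(text: str) -> bool:
    """(reduce OR reduces OR reduced OR reducing) AND soap AND viscosity."""
    words = set(text.lower().split())
    return bool(words & REDUCE_FORMS) and "soap" in words and "viscosity" in words


snippets = [
    "reduce soap viscosity",
    "reduced soap viscosity",
    "reducing the viscosity of soap",
    "soap viscosity is reduced by",
    "soap is added to reduce oil viscosity",  # invented: a different concept
]

matches = [s for s in snippets if boolean_keyword_match(s)]
print(matches)
```

All five snippets match, including the last one, where it is the viscosity of oil, not of soap, that is reduced. The expression is more complete than an exact-phrase search but cannot tell which word the action is directed at.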

Search technologies that do not take into account the structure and the grammar of language are limited to matching strings of text. This approach makes it difficult and cumbersome to retrieve both complete and relevant results.

The solution: natural language "exact concept" search

Goldfire's natural language search uses sophisticated linguistic technology that matches the grammatical relationships between search words. This makes Goldfire's natural language search an exact concept search because it extracts results based on meaning, rather than by simply matching text strings.

Meaning in a sentence is conveyed by the grammatical structures subject, action, and object, also called semantic structures. Although there can be numerous grammatical variations in sentences that have the same subject, the same action, and the same object, these sentences necessarily have the same meaning.

In addition, Goldfire's natural language search automatically finds all variations of query words based on their stem, or root.

Stemming in Goldfire searches is sophisticated enough to take into account whether the word is a noun or a verb. For example, in the query a light, documents that contain the noun lights are also selected. However, if your query includes to light, documents are selected based on verb variations of this word (such as lit).
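The idea of part-of-speech-aware stemming can be sketched in Python as below. This is only a toy under stated assumptions: the VARIANTS table is hand-written for the single word light, and the part of speech is guessed from the leading a or to, whereas a real linguistic engine would determine it from the grammar of the whole query. None of these names come from Goldfire.

```python
# Hypothetical, hand-written variant table for one word.
VARIANTS = {
    ("light", "noun"): {"light", "lights"},
    ("light", "verb"): {"light", "lights", "lit", "lighted", "lighting"},
}

# Toy assumption: an article signals a noun, "to" signals a verb.
MARKERS = {"a": "noun", "an": "noun", "the": "noun", "to": "verb"}


def expand_query(query: str) -> set:
    """Return the word forms to search for, given a two-word query such as 'a light'."""
    marker, word = query.lower().split()
    return VARIANTS[(word, MARKERS[marker])]


print(sorted(expand_query("a light")))
print(sorted(expand_query("to light")))
```

With this table, the query a light expands to the noun forms only, while to light also expands to verb forms such as lit.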

Example: how natural language search works

When you submit a natural language query in Goldfire, it is analyzed into semantic structures: subject, action, and object. Then the semantic structures in your query are matched to the knowledge bases of pre-indexed documents.

For example, the query apply pulse laser is analyzed as follows: First the noun phrase pulse laser is identified. Here laser is recognized as the main noun, and pulse is recognized as its modifier. In addition, the relationship between apply and pulse laser is established as an action-object relationship. This means that you are looking only for sentences where to apply is an action, and pulse laser is the object. Goldfire's natural language search retrieves sentences with all grammatical variations of this action-object relationship.
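The analysis described above can be sketched in Python. Everything here is illustrative and assumed: the ActionObject type, the toy parse_query (which assumes the query has the shape verb, modifiers, main noun), the rule that query modifiers must be a subset of the sentence's modifiers, and the hand-annotated sentences, which are invented and are not actual Goldfire results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionObject:
    action: str                        # verb in its base form
    head: str                          # main noun of the object phrase
    modifiers: frozenset = frozenset() # modifiers of the main noun


def parse_query(query: str) -> ActionObject:
    """Toy parse: first word is the action, last word the main noun, the rest modifiers."""
    words = query.lower().split()
    return ActionObject(words[0], words[-1], frozenset(words[1:-1]))


def matches(query: ActionObject, sentence: ActionObject) -> bool:
    """Same action, same main noun, and every query modifier present in the sentence."""
    return (query.action == sentence.action
            and query.head == sentence.head
            and query.modifiers <= sentence.modifiers)


# Invented sentences with hand-written annotations standing in for a pre-built index.
INDEX = [
    ("A pulse laser is applied to the coating.",
     ActionObject("apply", "laser", frozenset({"pulse"}))),
    ("The method includes applying a high-power pulse laser.",
     ActionObject("apply", "laser", frozenset({"high-power", "pulse"}))),
    ("The pulse laser applies heat to the film.",
     ActionObject("apply", "heat")),
]

query = parse_query("apply pulse laser")
retrieved = [text for text, structure in INDEX if matches(query, structure)]
print(retrieved)
```

The passive and gerund variations are retrieved because their action-object structure matches the query. The third sentence contains all three query words but is skipped, because its action is directed at heat.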

When this example query was used to search U.S. patents, the following sentences were retrieved:

However, Goldfire's natural language search will not extract nonrelevant results such as the following:

...and so on.

These results are not relevant because their semantic structures do not match those of the query apply pulse laser. In the first example, the action apply is directed not at the object pulse laser, but at heat. In the second example, the word apply modifies the noun optics and is not an action. In the third example, the word application is a noun rather than an action directed at the object pulse laser. However, such results would be retrieved by a conventional keyword search, which matches keywords without taking into account the semantic relationships between the words.

Why natural language results answer questions

By design, Goldfire's natural language search is able to retrieve information that is not explicitly specified in a query. This implicit information is retrieved by virtue of its relationship to the information that you did specify in the query.

For example, if you type What absorbs carbon dioxide?, Goldfire extracts all sentences where the action is some form of the verb to absorb, and the object is carbon dioxide. The subjects of the retrieved sentences provide answers to this question in terms of the process or the technology related to absorbing carbon dioxide.
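One way to picture this is to treat the question word as an empty subject slot that the index fills in. The following Python sketch assumes a hypothetical SAO (subject, action, object) record and a tiny hand-written index; the entries are invented for illustration and are not Goldfire output.

```python
from typing import NamedTuple, Optional


class SAO(NamedTuple):
    subject: Optional[str]  # None means "unknown, this is what the question asks for"
    action: str
    obj: str


# Invented, hand-annotated structures standing in for pre-indexed sentences.
INDEX = [
    SAO("seawater", "absorb", "carbon dioxide"),
    SAO("amine solution", "absorb", "carbon dioxide"),
    SAO("carbon dioxide", "absorb", "infrared radiation"),
]


def answers(question: SAO) -> list:
    """Return the subjects of all indexed structures that match on action and object."""
    return [s.subject for s in INDEX
            if s.action == question.action and s.obj == question.obj]


# "What absorbs carbon dioxide?" leaves the subject unspecified.
question = SAO(subject=None, action="absorb", obj="carbon dioxide")
print(answers(question))
```

The third entry is not returned even though it contains both absorb and carbon dioxide, because there carbon dioxide is the subject rather than the object.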

Goldfire's natural language search matches synonyms as well as exact words. Matches on synonyms of an action word in your query are included in your search results. For example, results of the query reduce soap viscosity also include results from minimize soap viscosity, since the verb to reduce is a synonym of the verb to minimize. If you are connected to a corporate Goldfire Server, your search results also include a list of subject, action, and object synonyms. If you click a synonym in this list, the synonym replaces the original word in the query, and your query is run again. For example, if your original query was How to reduce cholesterol? and you click the synonym minimize in the synonym list, your original query is replaced by How to minimize cholesterol?, and that query is automatically run.
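Both behaviors, matching on action synonyms and replacing a query word with a clicked synonym, can be sketched in Python. The SYNONYMS table and both function names are hypothetical and contain only the reduce/minimize pair mentioned above; this does not reflect how Goldfire stores or applies synonyms.

```python
import re

# Hypothetical synonym table limited to the pair used in this topic.
SYNONYMS = {"reduce": {"minimize"}, "minimize": {"reduce"}}


def action_matches(query_action: str, sentence_action: str) -> bool:
    """A sentence action matches if it is the query action or one of its synonyms."""
    return (sentence_action == query_action
            or sentence_action in SYNONYMS.get(query_action, set()))


def replace_with_synonym(query: str, original: str, synonym: str) -> str:
    """Swap the original word for the chosen synonym, matching whole words only."""
    return re.sub(rf"\b{re.escape(original)}\b", synonym, query)


print(action_matches("reduce", "minimize"))
print(replace_with_synonym("How to reduce cholesterol?", "reduce", "minimize"))
```

The rewritten query string would then be submitted as a new search.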
