About natural language search
For general search information, see
The challenge: precision vs. completeness
In computer-aided searches, you want to limit the results to those that precisely address your information needs. However, a concept can be expressed in many grammatical variations, which makes it difficult for traditional search technologies to retrieve all the relevant information available in electronic documents.
Consider an exact-phrase search, for example. In an exact-phrase search, the results contain your query words exactly as typed (the same words in the same order, adjacent to one another). This type of search was developed to limit unwieldy keyword search results. It would have been the answer if finding an exact phrase were equivalent to finding an exact concept. However, although all exact-phrase results are relevant, they are far from complete. For example, if your exact-phrase search retrieved documents with reduce soap viscosity, it missed documents with reduced soap viscosity, or reducing the viscosity of soap, or soap viscosity is reduced by, and so on. Clearly, these missed documents are equally relevant to the original query because they all deal with the concept of reducing soap viscosity.
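The gap described above can be illustrated with a small sketch. It is not the product's implementation; the function name exact_phrase_match and the word-splitting rule are assumptions made for illustration. It treats an exact phrase as the same words, in the same order, adjacent to one another, and runs the soap viscosity variants from the paragraph above through that test.

```python
def exact_phrase_match(query: str, text: str) -> bool:
    """Return True if the query words occur in text in the same order, adjacent."""
    q = query.lower().split()
    words = text.lower().split()
    # Slide a window of len(q) words over the text and compare word by word.
    return any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1))


# Phrase variants taken from the paragraph above.
documents = [
    "reduce soap viscosity",
    "reduced soap viscosity",
    "reducing the viscosity of soap",
    "soap viscosity is reduced by",
]

hits = [d for d in documents if exact_phrase_match("reduce soap viscosity", d)]
missed = [d for d in documents if d not in hits]

print("retrieved:", hits)
print("missed:", missed)
```

Only the first variant is retrieved. The other three express the same concept but fail the word-for-word comparison.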
Keyword searches find some of the relevant results that would be missed by an exact-phrase search. Keyword searches use logical conditions to specify which words and their variations must be included in or excluded from a document. However, building conventional Boolean expressions with all the possible keyword variations is time-consuming and does not guarantee both complete and relevant results.
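As a rough sketch of why Boolean keyword queries are laborious and still imprecise, the example below spells out each word variation by hand and requires one variant from every group to be present. The helper names (tokens, boolean_match) are hypothetical, and the third document is an invented sentence added to show a match that does not address the concept.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def boolean_match(text: str, all_of, none_of=()) -> bool:
    """AND across groups, OR within a group, NOT for excluded words."""
    words = tokens(text)
    has_all = all(words & group for group in all_of)
    has_excluded = bool(words & set(none_of))
    return has_all and not has_excluded


# Every variation has to be listed by hand.
query = [
    {"reduce", "reduces", "reduced", "reducing"},
    {"soap", "soaps"},
    {"viscosity"},
]

documents = [
    "reducing the viscosity of soap",
    "soap viscosity is reduced by",
    "soap reduced the viscosity of the oil",  # invented: soap is the agent here
]

hits = [d for d in documents if boolean_match(d, query)]
print(hits)
```

All three documents match, including the invented one in which soap is doing the reducing rather than being the thing whose viscosity is reduced. Excluding a word such as oil would remove that document, but only after the problem has been noticed and the query extended by hand.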
Search technologies that do not take into account the structure and the grammar of language are limited to matching strings of text. This approach makes it difficult and cumbersome to retrieve both complete and relevant results.
The solution: natural language "exact concept" search
Meaning in a sentence is conveyed by the grammatical structures subject, action, and object, also called semantic structures. Sentences that have the same subject, the same action, and the same object necessarily have the same meaning, however many grammatical variations there are among them.
In addition,
Stemming in
Example: how natural language search works
When you submit a natural language query in
For example, the query apply pulse laser is analyzed as follows: First, the noun phrase pulse laser is identified. Here laser is recognized as the main noun, and pulse is recognized as its modifier. In addition, the relationship between apply and pulse laser is established as an action-object relationship. This means that you are looking only for sentences where to apply is an action and pulse laser is its object.
When this example query was used to search U.S. patents, the following sentences were retrieved:
- The present invention further provides a defect repair apparatus comprising an ultrashort pulse laser generator for adjustably generating an
- Consequently, there is normally used a pulse laser irradiation method in which only the semiconductor film is heated and molten in a short time by applying an excimer pulse laser to the amorphous semiconductor film or the semiconductor film comprising a fine crystal...
- In this embodiment mode, a pulse oscillation type KrF
- The laser sputtering, called also laser ablation or laser deposition, applies pulse laser of high energy density to a solid state target to form a layer on the opposite substrate.
However,
- ... using a pulse laser which applies heat onto the surface...
- ... calibration of pulsed lasers," Applied Optics, ...
- ... pulse of laser, application of voltage to needle ...
...and so on.
These results are not relevant because their semantic structures do not match those of the query apply pulse laser. In the first example, the action apply is directed not at the object pulse laser, but at heat. In the second example, the word Applied modifies the noun Optics and is not an action. In the third example, the word application is a noun rather than an action directed at the object pulse laser. However, such results would be retrieved by a conventional keyword search, which matches keywords without taking into account the semantic relationships between the words.
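The contrast between keyword matching and action-object matching can be sketched as follows. This is an illustration under stated assumptions, not the actual search engine: the action-object relations are annotated by hand here (a real system would derive them with a linguistic parser), and the names Relation, keyword_match, and concept_match are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Relation(NamedTuple):
    action: str  # base form of the verb
    obj: str     # noun phrase the action is directed at


# The analyzed query: apply is the action, pulse laser is the object.
QUERY = Relation("apply", "pulse laser")

# Sentence fragments quoted in the text, with hand-annotated relations.
ANNOTATED: List[Tuple[str, List[Relation]]] = [
    ("applies pulse laser of high energy density to a solid state target",
     [Relation("apply", "pulse laser")]),
    ("using a pulse laser which applies heat onto the surface",
     [Relation("use", "pulse laser"), Relation("apply", "heat")]),
    ("calibration of pulsed lasers, Applied Optics", []),      # Applied is a modifier
    ("pulse of laser, application of voltage to needle", []),  # application is a noun
]


def keyword_match(text: str, stems=("appl", "puls", "laser")) -> bool:
    """Crude keyword search: every stem must occur somewhere in the text."""
    lowered = text.lower()
    return all(stem in lowered for stem in stems)


def concept_match(relations: List[Relation], query: Relation = QUERY) -> bool:
    """Match only when the same action is directed at the same object."""
    return query in relations


keyword_hits = [text for text, _ in ANNOTATED if keyword_match(text)]
concept_hits = [text for text, rels in ANNOTATED if concept_match(rels)]

print(len(keyword_hits), "keyword hits")
print(len(concept_hits), "concept hit:", concept_hits)
```

The keyword test accepts all four fragments because each contains some form of apply, pulse, and laser. The relation test keeps only the fragment in which apply is an action directed at pulse laser.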
Why natural language results answer questions
By design,
For example, if you type What absorbs carbon dioxide?