Version 11 (modified by csa, 15 years ago)
SEARCH SubSystem
ADEI has a modular search subsystem. The search capabilities are provided by search engines, each of which provides one or more search modules. Besides the search term, the search string may specify which search modules to run and a number of limits to filter the results. Module parameters can be specified along with the modules.
Format of search string
The search string consists of four components:
- The first component defines the type of search. Examples are item search, channel value search, and datetime search.
- The second component provides global options, for example requesting an exact or fuzzy match.
- The third and fourth components are type-dependent and contain the search string and additional limits.
I: The format is as follows:
[type/module specification] [global flags] <search string> [limits]
Everything besides the search string is optional. If the type is not specified, the search string is analyzed: the analysis routine guesses the type of search and executes a default set of modules for this type. The default behavior is to search for channel and group names. See the String Analysis section for details.
The search type is specified in curly brackets at the beginning of the search string. It names a search module available in classes/search (the name of the class should be specified). Options for the class constructor can be given as well. If multiple modules are specified, the searches are performed sequentially. The following format is expected:
{module_name(opt1=value1,opt=value2), another_module(...)}
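The module specification can be split off the search string with a few lines of parsing. The following is a minimal Python sketch for illustration only, not ADEI's PHP code; the function name parse_module_spec is hypothetical, and accepting both "," and ";" as option separators is an assumption based on the two separators that appear in the examples on this page.

```python
import re

def parse_module_spec(search):
    """Split a leading {module(opt=value,...), ...} block from the rest.

    Returns (modules, remainder), where modules is a list of
    (module_name, options_dict) pairs in the order they were written.
    """
    m = re.match(r'\s*\{(.*?)\}\s*(.*)$', search, re.S)
    if not m:
        # No type given: the whole string is left for string analysis.
        return [], search.strip()
    modules = []
    # A module is a name optionally followed by a parenthesised option list.
    for name, opts in re.findall(r'(\w+)\s*(?:\(([^)]*)\))?', m.group(1)):
        options = {}
        for item in re.split(r'[,;]', opts):
            item = item.strip()
            if not item:
                continue
            key, sep, value = item.partition('=')
            # Options without a value are treated as boolean switches.
            options[key.strip()] = value.strip() if sep else True
        modules.append((name, options))
    return modules, m.group(2)
```

For instance, parse_module_spec('{item(opt1=value1), group} sinus') yields the two modules item and group, with 'sinus' left over as the remainder.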
II: The global options follow the search type and are specified in square brackets. These options are then passed to the search modules along with the search string and handled by the module code. The following options are supported:
- = - Exact match: the search string is matched as a whole, without being split into phrases
- w - Word match, if not overridden by match modifiers, see below
- ~ - Fuzzy match, if not overridden by match modifiers, see below
III: Then the search string follows. If the Exact match flag is not specified, it consists of phrases. A phrase is one of:
- words consisting of alphanumeric symbols, dash and underscore symbols (-,_)
- multiple words enclosed in single or double quotes (' or ")
- regular expressions enclosed in / from both ends
This is an example of a search string consisting of 4 components (two words, one phrase, and a regular expression):
word1 word2 "phrase 3" /regexp/
Before each phrase, a match modifier can be specified. The following match modifiers are supported:
- if no modifier is specified, words starting with the search term are matched
- = - full match, the whole words are matched
- ~ - fuzzy match, any part of a word could be matched
Consider the following example to understand the meaning of match modifiers. By default, a search for sin matches the words sin and sinus, but not cosinus. However, if a fuzzy search is given (~sin), cosinus is matched as well. On the other hand, if a full match is required (=sin), only sin is matched; both sinus and cosinus are rejected.
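The sin example can be reproduced with a small Python sketch. It is an illustration only: the function name word_rating is hypothetical, ratings are simplified to 0 or 1, and the comparison is case-sensitive, which may differ from ADEI's behavior.

```python
def word_rating(term, word):
    """Return 1.0 if `word` matches `term` under its match modifier, else 0.0."""
    if term.startswith('='):
        # Full match: the whole word must equal the term.
        return float(word == term[1:])
    if term.startswith('~'):
        # Fuzzy match: the term may appear anywhere inside the word.
        return float(term[1:] in word)
    # Default: the word must start with the term.
    return float(word.startswith(term))

for term in ('sin', '~sin', '=sin'):
    matched = [w for w in ('sin', 'sinus', 'cosinus') if word_rating(term, w)]
    print(term, matched)
```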
The match of each phrase against the data records produces a rating from 0 to 1 indicating match quality. The value 0 means that the record is not matched and the value 1 indicates a full match. If several phrases are listed in the search string, the ratings of the individual phrase matches are multiplied to produce the overall rating. For example, if phrase1 matched with rating 0.70, phrase2 matched with rating 0.30, and word3 is fully matched, the overall rating would be 0.70 * 0.30 * 1 = 0.21.
Rating computation can be altered using unary and binary operations. Let us assume that [word] is the rating of word; then the ratings of these operations are computed as follows:
- ! word - The resulting rating would be 1 - [word]
- + word - A rating below 1 will be cut to 0
- - word - All non-zero ratings will be cut to zero, and zero rating will be replaced with 1
- (word1|word2) - The maximal rating amongst [word1] and [word2]
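The rating rules above are simple arithmetic and can be written down directly. The Python sketch below is illustrative; the function names (combine, negate, require, exclude, either) are hypothetical and do not come from the ADEI sources.

```python
import math

def combine(*ratings):
    """Phrases listed side by side: the ratings are multiplied."""
    return math.prod(ratings)

def negate(rating):
    """'! word': the resulting rating is 1 - [word]."""
    return 1 - rating

def require(rating):
    """'+ word': any rating below 1 is cut to 0."""
    return rating if rating >= 1 else 0.0

def exclude(rating):
    """'- word': non-zero ratings become 0, a zero rating becomes 1."""
    return 1.0 if rating == 0 else 0.0

def either(*ratings):
    """'(word1|word2)': the maximal rating among the alternatives."""
    return max(ratings)

# The example from the text: two partial matches and one full match.
overall = combine(0.70, 0.30, 1)
```

With these helpers, a string such as `+a (b|c)` would be rated as combine(require(a), either(b, c)), where a, b, and c stand for the ratings of the individual words.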
A few examples of complex search strings:
=sinus | cos1
!"a b c" ~d -e +('f g' !(!i (k))) "m n"
IV: One or more limits can be set in the last part of the search string. The following format is expected:
limit_name:limit_value another_limit:another_limit_value
The limits handling is completely module specific. Example:
+sinus | cos1 interval:2006
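As an illustration, the limits part can be turned into a name/value mapping with one regular expression. This Python sketch is an assumption-laden simplification: parse_limits is a hypothetical name, it expects to receive only the limits portion of the search string, and it allows spaces inside a value because this page uses a filter written as interval:June 2005.

```python
import re

# A limit is "name:value"; the value runs until the next "name:" or the end.
LIMIT_RE = re.compile(r'(\w+):(.*?)(?=\s+\w+:|\s*$)')

def parse_limits(text):
    """Return a dict mapping each limit name to its (stripped) value."""
    return {name: value.strip() for name, value in LIMIT_RE.findall(text)}
```

What a module does with each value is up to the module; the sketch only separates the pairs.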
Default Implementation
The search modules are implemented using SEARCHEngines. Each SEARCHEngine can provide one or more search modules. The SEARCHEngines are placed in the classes/search folder in the ADEI source tree. They should implement a Search function which accepts four parameters (module, search string, search filter, global options) and returns a SEARCHResults object with the results, or false if nothing is found. However, standard modules can reuse the default Search function implemented in the base class classes/searchengine.php. The following procedure is executed in this case:
- The Search function of the search engine is executed with four parameters: module, search string, search filter, global options.
- The GetList function is called to get the complete associative list of elements. In this list the key is the element identifier and the value is an associative array with the terms to check against the search terms. The array includes the following members:
  - uid - unique identifier of the record, if any (used for matching)
  - name - short name of the record (used for matching)
  - title - title used to present this record in the results
  - description - longer description (HTML content is allowed)
  - props - an associative array containing standard ADEI properties fully describing this record. For example, for a found data item the props array will contain the db_server, db_name, db_group, and db_mask properties. For a found interval, it would be just the window property.
- The CheckString function is called on each element of the list. The elements for which a non-zero rating is returned are checked against the filters and added to the search results.
- To prevent duplicate results, the SEARCHResults::Accept function is used. The results are compared using GetCmpFunction.
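The control flow of this default procedure can be summarized in a short Python sketch. It is a loose illustration, not a port of classes/searchengine.php: default_search and its parameters are hypothetical, filters are modeled as plain callables, and comparing records by their props stands in for whatever GetCmpFunction actually compares.

```python
def default_search(records, check_string, filters=()):
    """Sketch of the default Search flow.

    records:      dict mapping element id -> info dict (as from GetList)
    check_string: callable(info) -> rating between 0 and 1
    filters:      callables(info, rating) -> (rejected, info, rating)
    Returns a list of (rating, info) pairs, or False if nothing was found.
    """
    results, seen = [], set()
    for key, info in records.items():
        rating = check_string(info)
        if rating == 0:
            continue  # not matched at all
        rejected = False
        for flt in filters:
            rejected, info, rating = flt(info, rating)
            if rejected:
                break
        if rejected:
            continue
        # Assumed duplicate check: two records with equal props are the same.
        ident = tuple(sorted(info.get('props', {}).items()))
        if ident in seen:
            continue
        seen.add(ident)
        results.append((rating, info))
    return results or False
```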
CheckString works in the following way:
- The search string is split into phrases, and the CheckPhrase function is called for each phrase.
- Depending on the module used, the CheckPhrase function selects a single string value from the associative array describing the record and passes it to the CheckTitlePhrase function.
- CheckTitlePhrase checks whether the passed string fits the current search phrase and returns the rating. The matching is performed in one of 4 supported modes, depending on the match modifiers and global options:
  - default - The search phrase should match at the beginning of a word. The string sinus cosinusfff matches the phrase sinus cosinus, but xsinus cosinus does not.
  - word match - The words should match completely. The string sinus cosinus fff matches, but sinus cosinusfff does not.
  - fuzzy match - Word boundaries are not important, and even xsinus cosinusx matches the sinus cosinus search phrase.
  - regex match - In this mode the search phrase is considered a regular expression, and this regular expression is matched against the passed string.
- Finally, the ratings computed for all search phrases are combined into an overall rating using the rules described in the section above.
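The four matching modes can be expressed with word-boundary anchors, as in the Python sketch below. This is one interpretation of the descriptions above, not ADEI's implementation: check_title_phrase and the mode strings are hypothetical names, ratings are simplified to 0 or 1, and matching is case-sensitive.

```python
import re

def check_title_phrase(title, phrase, mode=None):
    """Rate `title` against `phrase`; mode is None, 'word', 'fuzzy' or 'regex'."""
    if mode == 'regex':
        # The phrase itself is the regular expression.
        pattern = phrase
    else:
        escaped = re.escape(phrase)
        if mode == 'word':
            # Whole words only: boundaries on both sides.
            pattern = r'\b' + escaped + r'\b'
        elif mode == 'fuzzy':
            # Boundaries do not matter: plain substring search.
            pattern = escaped
        else:
            # Default: the phrase must start at the beginning of a word.
            pattern = r'\b' + escaped
    return 1.0 if re.search(pattern, title) else 0.0
```

The examples from the list behave as described: check_title_phrase('sinus cosinusfff', 'sinus cosinus') matches in default mode but not with mode='word'.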
Search Filters
Filters are used to reject part of the search results as well as to add or modify information associated with a found record. Filters are specified in the search string as follows:
interval:June 2005
If such a filter is found, an INTERVALSearchFilter object (from classes/search/intervalfilter.php) is constructed. This object gets the filter value (June 2005) as the single parameter of its constructor. It should implement a single function, FilterResult, which returns true if the current record should be filtered out and false otherwise. FilterResult receives two parameters:
- associative array with information on current record
- a number between 0 and 1 with the rating of match
Both of these parameters can be altered by the FilterResult function.
Example. Let us consider the standard item search used in conjunction with the interval filter. The search will provide multiple records describing the found items (i.e. the associative array with information will contain the standard properties: db_server, db_name, db_group, and db_mask). The interval filter is intended to limit the display interval. Therefore, when the FilterResult function is called, it adds the window property to the associative array, limiting the display window to June 2005.
If multiple filters are specified, they are executed sequentially until one of them rejects the current record.
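A filter chain of this kind might look as follows in Python. The sketch is illustrative only: the class and function names are hypothetical, MinRatingFilter is an invented filter added just to show a rejection, and the interval filter stores the raw filter value instead of converting it to timestamps as a real filter presumably would.

```python
class IntervalFilter:
    """Never rejects a record; it only adds a window property to it."""
    def __init__(self, value):
        self.value = value

    def filter_result(self, info, rating):
        # Simplification: keep the textual value as the window.
        info.setdefault('props', {})['window'] = self.value
        return False, rating


class MinRatingFilter:
    """Hypothetical filter rejecting records rated below a threshold."""
    def __init__(self, minimum):
        self.minimum = minimum

    def filter_result(self, info, rating):
        return rating < self.minimum, rating


def apply_filters(filters, info, rating):
    """Run filters in order; stop at the first one that rejects the record."""
    for flt in filters:
        rejected, rating = flt.filter_result(info, rating)
        if rejected:
            return None
    return info, rating
```

Because the chain stops at the first rejection, later filters never see (or modify) a record that an earlier filter has dropped.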
New Search Engine
- The search engine should provide a list of supported modules in the modules member of the class. It is an associative array where the key is the module id and the value is the module title.
- It should either define a special Search function or provide at least the GetList function to be used in conjunction with the approach described above.
The GetList function should return an array containing the records. Each record is represented by an associative array with the following members:
- title - the title used to describe record in the search results
- description - the longer description of the record, HTML content is allowed
- props - the associative array with standard ADEI properties describing the record
- certain - this option indicates that the search module is completely certain that this record is what the user is actually looking for
- Arbitrary properties used by the search engine for record matching
Example:
array(
    array(
        'title' => 'January 2005',
        'props' => array(
            'window' => "1104537600-1107216000"
        ),
        'description' => false,
        'certain' => true
    )
)
Besides the GetList function, it is highly desirable to provide a CheckPhrase function which checks the record info against the search phrase and returns the match rating, from 0 (not matched) to 1 (fully matched). The CheckPhrase function accepts the following parameters:
- The associative array with information described above
- The phrase to match
- Type of match: SEARCH::WORD_MATCH, SEARCH::FUZZY_MATCH, SEARCH::REGEX_MATCH, false (default)
- The search module
- The global options
Special search engines intended to return custom XHTML content should use the following approach in the Search function:
$result = new SEARCHResults(NULL, $this, $module, "");
$result->Append("<XHTML content>");
return $result;
The <?xml?> declaration should not be included in the content.
INTERVALSearch Engine
Provided Modules:
- interval - Tries to parse the time interval from the textual representation given in the search string. The only property returned is window, holding the interval as UNIX timestamps.
Supported Filters:
- interval - allows to find intersection of two intervals
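To make the window format concrete, the Python sketch below converts a "Month Year" string into a start-end pair of UNIX timestamps. It is an assumption-based illustration: month_window is a hypothetical name, only this one textual form is handled, and the interval is taken in UTC from the first second of the month to the first second of the next month. Under these assumptions it reproduces the window value shown for January 2005 in the GetList example above.

```python
import calendar
from datetime import datetime

def month_window(text):
    """Convert e.g. 'January 2005' to a 'start-end' string of UNIX timestamps."""
    parsed = datetime.strptime(text, '%B %Y')  # English month names assumed
    year, month = parsed.year, parsed.month
    # Roll over to January of the following year after December.
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    start = calendar.timegm((year, month, 1, 0, 0, 0))
    end = calendar.timegm((next_year, next_month, 1, 0, 0, 0))
    return f'{start}-{end}'
```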
ITEMSearch Engine
Provided Modules:
- channel - Searches items by uid only
- item - Searches items by uid and name
- group - Searches groups by name
- mask - Searches masks by name
- control - Searches controls by uid only
- control_item - Searches controls by uid and name
- control_group - Searches control groups by name
Supported Filters:
- interval - adds window property to the items specification
PROXYSearch Engine
Provided Modules:
- proxy - downloads an XML document from the specified location and applies an XSLT stylesheet to convert it into XHTML. Accepts several parameters:
  - xml - the service to obtain the XML document from (mandatory)
  - xslt - the stylesheet to apply to the XML; can be omitted if the service returns XHTML directly
  - noprops - instructs ADEI not to add the current properties when calling the service; otherwise the passed db_server, db_name, and other properties are added at the end of the service request
Supported Filters:
- interval - adds window property to the XML service request
Example usage:
{proxy(xml=katrin.php?target=runs;xslt=katrinsearch;noprops)} interval:1218431322-1253472677
String Analysis
At the moment, string analysis is performed by the DetectModule function defined in classes/search.php. It should be extended to let search engines claim the search string.