Changes between Version 12 and Version 13 of adeiSEARCH

Author:
csa (IP: 141.52.232.84)
Timestamp:
09/14/09 01:24:39 (15 years ago)
Comment:

--

  • adeiSEARCH

    v12 v13  
    33ADEI has a modular search subsystem. The search capabilities are provided by search engines, each of which provides one or more search modules. Besides the search term, the search string may specify the search modules to run and a number of limits to filter the results. Module parameters can be specified along with the modules. 
    44 
    5 === Format of search string === 
    6 The search string consists of four components: 
    7  * The first component defines type of the search. Examples are ''item search'', ''channel value search'', ''datetime search''. 
    8  * The second component provides options, for example requesting an exact or a fuzzy match 
    9  * The third and fourth components are type-dependent and contain the search string and additional limits 
     5The search subsystem is implemented around three main classes: 
     6 * ''SEARCHEngine'' - Search engine providing one-or-more search modules, defined in ''classes/searchengine.php'' 
     7 * ''SEARCHFilter'' - Search filter providing an interface to restrict search results (like Google's ''site:kernel.org''), defined in ''classes/searchfilter.php'' 
     8 * ''SEARCHResults'' - Provides the results of a search 
    109 
    11 '''I''': The format is as follows: 
    12   {{{ [type/module specification] [global flags] <search string> [limits] }}} 
    13 Everything besides the search string is optional. If the type is not specified, the search string is analyzed: the analysis routine guesses the type of search and executes a default set of modules for this type. The default behavior is to search for channel and group names. See the [wiki:adeiSEARCH#StringAnalysis String Analysis ] section for details. 
     10The user supplies the search subsystem with: 
     11 * A list of search modules to run, along with their parameters 
     12 * Global search options 
     13 * Search string 
     14 * A set of limits to restrict search results 
    1415 
    15 The search type is specified in curly brackets at the beginning of the search string. It names a search module available in ''classes/search'' (the name of the class should be specified). Options for the class constructor may be given as well. If multiple modules are specified, the searches are performed sequentially. The following format is expected: 
    16   {{{ {module_name(opt1=value1,opt=value2), another_module(...)} }}} 
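The module specification format above could be parsed roughly as follows. This is only a sketch: the function name `sketch_parse_modules` and the module names `item` and `interval` in the usage note are hypothetical and are not taken from the ADEI source.

```php
<?php
// Hypothetical helper: turns "{mod(opt1=v1,opt2=v2), other}" into
// array('mod' => array('opt1' => 'v1', 'opt2' => 'v2'), 'other' => array()).
function sketch_parse_modules($spec) {
    $modules = array();
    $spec = trim($spec, "{} \t");   // drop the surrounding curly brackets
    // A module name, optionally followed by a parenthesized option list.
    preg_match_all('/([A-Za-z_]\w*)\s*(?:\(([^)]*)\))?/', $spec, $found, PREG_SET_ORDER);
    foreach ($found as $m) {
        $opts = array();
        $optstr = isset($m[2]) ? $m[2] : '';
        foreach (explode(',', $optstr) as $pair) {
            if (strpos($pair, '=') === false) continue;   // skip empty or malformed options
            list($name, $value) = explode('=', $pair, 2);
            $opts[trim($name)] = trim($value);
        }
        $modules[$m[1]] = $opts;
    }
    return $modules;
}
```

For instance, `sketch_parse_modules('{item(opt1=value1,opt=value2), interval}')` yields two entries, the first carrying two options and the second none.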
     16For each specified module, the search subsystem identifies a search engine providing this module and executes the ''Search'' function of that engine. The function returns a ''SEARCHResults'' object with the results, or ''false'' if nothing is found. Finally, the search subsystem merges the results from the individual modules and returns the merged ''SEARCHResults'' object, or ''false'' if no module provided results. 
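The dispatch-and-merge flow described above can be sketched as follows. The class `SketchResults` and the function `sketch_run_modules` are hypothetical stand-ins, not the real ''SEARCHResults'' or ''SEARCHEngine'' code, and engines are modeled as plain callables taking the four documented ''Search'' parameters.

```php
<?php
// Hypothetical stand-in for SEARCHResults: holds per-item arrays and custom XHTML.
class SketchResults {
    public $items = array();    // standard results (associative arrays)
    public $custom = array();   // custom results (XHTML fragments)

    function Merge(SketchResults $other) {
        $this->items = array_merge($this->items, $other->items);
        $this->custom = array_merge($this->custom, $other->custom);
    }
}

// $engines maps a module name to a callable acting as the engine's Search function.
function sketch_run_modules(array $engines, array $modules, $search, $filter, $opts) {
    $merged = false;
    foreach ($modules as $module) {
        if (!isset($engines[$module])) continue;          // no engine provides this module
        $res = call_user_func($engines[$module], $module, $search, $filter, $opts);
        if ($res === false) continue;                      // module found nothing
        if ($merged === false) $merged = $res;
        else $merged->Merge($res);
    }
    return $merged;   // false if no module provided results
}
```

The function returns `false` only when every module returned `false`, which mirrors the behavior stated in the text.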
    1717 
    18 '''II''': The global options follow the search type and are specified in square brackets. These options are then passed to the search modules together with the search string and are handled by the module code. The following options are supported: 
    19  * ''='' - Exact match: the search string is matched as a whole, without splitting it into phrases 
    20  * ''w'' - Word match, if not overridden by match modifiers, see below  
    21  * ''~'' - Fuzzy match, if not overridden by match modifiers, see below 
     18The ''SEARCHResults'' object is able to store results of two different types (simultaneously): 
     19 * Standard results: each result item is described by an associative array. The following members are defined: 
     20  * ''title'' - short title describing the result item 
     21  * ''description'' - a longer description of the result item; HTML content is allowed 
     22  * ''props'' - an associative array with standard ADEI properties describing the item 
     23  * ''certain'' - this flag indicates that the search module is completely certain that this record is what the user is actually looking for 
     24  * An arbitrary number of other properties which are used by the search engine internally (for record matching, for example) 
     25An example of an associative array describing a found time interval: 
     26{{{ 
     27   array( 
     28     'title' => 'January 2005', 
     29     'props' => array( 
     30         'window' => "1104537600-1107216000" 
     31     ), 
     32     'description' => false, 
     33     'certain' => true 
     34  ) 
     35}}} 
     36 * Custom results: The search engine is mainly used to provide results in the ADEI web display. In some cases the results are provided by third-party applications which do not follow the ADEI way of structuring results, but just provide an HTML page with all results in ready-to-display form. To support such third-party applications, an ADEI search module may return a ''SEARCHResults'' object with custom results. In this case, instead of per-item associative arrays, the ''SEARCHResults'' object stores the XHTML content representing all results provided by the search module.  
    2237 
    23 '''III''': The search string follows. If the ''Exact match'' flag is not specified, it consists of phrases. A phrase is one of: 
    24  * a word consisting of alphanumeric symbols, dash and underscore symbols (''-'',''_'') 
    25  * multiple words enclosed in single or double quotes (' or ") 
    26  * a regular expression enclosed in ''/'' at both ends 
    27  
    28 This is an example of a search string consisting of 4 components: two words, one phrase, and a regular expression: 
    29 {{{ 
    30 word1 word2 "phrase 3" /regexp/ 
    31 }}} 
    32  
    33 Before each phrase, a match modifier can be specified. The following match modifiers are supported: 
    34  * if no modifier is specified, words starting with the search term are matched 
    35  * ''='' - full match, only whole words are matched 
    36  * ''~'' - fuzzy match, any part of a word can be matched 
    37  
    38 Consider the following example to understand the meaning of the match modifiers. By default, if a search for '''sin''' is performed, the words '''sin''' and '''sinus''' will be matched, but '''cosinus''' will not. However, if a fuzzy search is requested ('''~sin'''), '''cosinus''' will be matched as well. On the other hand, if a full match is required ('''=sin'''), only '''sin''' will be matched; both '''sinus''' and '''cosinus''' will be rejected. 
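The three match modes from the '''sin''' example can be illustrated with a small sketch. The function `sketch_match` is hypothetical, it returns a plain boolean instead of a rating, and case-insensitive comparison is an assumption that the text does not state.

```php
<?php
// Hypothetical helper: does $term match any word of $text under $modifier?
//   ''  (default) - a word starts with the term
//   '=' (full)    - a whole word equals the term
//   '~' (fuzzy)   - the term occurs anywhere inside a word
function sketch_match($modifier, $term, $text) {
    // Words consist of alphanumeric symbols, dash and underscore.
    $words = preg_split('/[^A-Za-z0-9_-]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if ($modifier === '=') {
            $hit = (strcasecmp($word, $term) === 0);
        } elseif ($modifier === '~') {
            $hit = (stripos($word, $term) !== false);
        } else {
            $hit = (stripos($word, $term) === 0);
        }
        if ($hit) return true;
    }
    return false;
}
```

With this sketch, `sketch_match('', 'sin', 'cosinus')` is false while `sketch_match('~', 'sin', 'cosinus')` is true, matching the example in the text.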
    39  
    40 The match of each phrase against the data records produces a rating from ''0'' to ''1'' indicating match quality. The value ''0'' means that the record is not matched and the value ''1'' indicates a full match. If several phrases are listed in the search string, the ratings of the individual phrase matches are multiplied to produce the overall rating. For example, if phrase1 matched with rating ''0.70'', phrase2 matched with rating ''0.30'' and word3 is fully matched, the overall rating would be: 0.21 = 0.70 * 0.30 * 1. 
    41  
    42 Rating computation can be altered using unary and binary operations. Let us assume that ''[word]'' is the rating of ''word''; then the ratings of these operations are computed as follows: 
    43  * ''! word'' - The resulting rating is 1 - [word] 
    44  * ''+ word'' - Ratings below 1 are cut to 0 
    45  * ''- word'' - All non-zero ratings are cut to zero, and a zero rating is replaced with 1 
    46  * ''(word1|word2)'' - The maximal rating amongst [word1] and [word2] 
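The rating rules listed above (multiplication across phrases plus the four operations) can be written down as a few small functions. All function names here are hypothetical and only illustrate the arithmetic, not how ADEI implements it.

```php
<?php
// Overall rating of several phrases: the product of the individual ratings.
function sketch_rate_all(array $ratings) { return array_product($ratings); }

// "! word": invert the rating.
function sketch_rate_not($r) { return 1 - $r; }

// "+ word": anything below a full match is cut to 0.
function sketch_rate_plus($r) { return $r < 1 ? 0 : $r; }

// "- word": non-zero ratings become 0, a zero rating becomes 1.
function sketch_rate_minus($r) { return $r > 0 ? 0 : 1; }

// "(word1|word2)": the maximum of both ratings.
function sketch_rate_or($a, $b) { return max($a, $b); }
```

For the example from the text, `sketch_rate_all(array(0.70, 0.30, 1))` gives approximately 0.21 (subject to floating-point rounding).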
    47  
    48 A few examples of complex search strings: 
    49 {{{ 
    50 =sinus | cos1 
    51 }}} 
    52 {{{ 
    53 !"a b c" ~d -e +('f g' !(!i (k))) "m n" 
    54 }}} 
    55  
    56 '''IV''': One or more limits can be set in the last part of the search string. The following format is expected: 
    57 {{{ 
    58  limit_name:limit_value another_limit:another_limit_value 
    59 }}} 
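A minimal sketch of splitting such a limit string into name/value pairs is shown below. The function `sketch_parse_limits` is hypothetical, and it assumes that limit values contain no whitespace, which the text does not specify.

```php
<?php
// Hypothetical helper: "name:value other:value2" -> array('name' => 'value', 'other' => 'value2').
function sketch_parse_limits($limits) {
    $result = array();
    foreach (preg_split('/\s+/', trim($limits), -1, PREG_SPLIT_NO_EMPTY) as $token) {
        $pos = strpos($token, ':');
        if ($pos === false || $pos === 0) continue;   // not a name:value pair
        // Split on the first colon only, so values may contain further colons.
        $result[substr($token, 0, $pos)] = substr($token, $pos + 1);
    }
    return $result;
}
```

Since limit handling is module-specific, a real module would interpret the returned values itself; this sketch only separates names from values.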
    60  
    61 Limit handling is completely module-specific. Example: 
    62 {{{ 
    63 +sinus | cos1 interval:2006 
    64 }}} 
     38The merged results of multiple search modules may contain both per-item associative arrays for the results of some modules and XHTML results for others. 
    6539 
    6640== Default Implementation == 
    67 The search modules are implemented using [wiki:adeiClassSEARCHEngine SEARCHEngines]. Each SEARCHEngine could provide one or more search module. The !SEARCHEngines are placed in ''classes/search'' folder in ADEI source tree. They should implement a ''Search'' function which accepts four parameters (module, search string, search filter, global options) and returns the [wiki:adeiClassSEARCHResults SEARCHResults] object with results or ''false'' if nothing found. However, standard modules can reuse default ''Search'' function implemented in base class ''classes/searchengine.php''. The following procedure is exeuted in this case: 
     41 
     42 
     43The search modules are implemented using [wiki:adeiClassSEARCHEngine SEARCHEngines]. Each SEARCHEngine could provide one or more search module. The !SEARCHEngines are placed in ''classes/search'' folder in ADEI source tree. They should implement a ''Search'' function which accepts four parameters (module, search string, search filter, global options) and returns the [wiki:adeiClassSEARCHResults SEARCHResults] object with results or ''false'' if nothing found. However, standard modules can reuse default ''Search'' function implemented in base class ''classes/searchengine.php''. The following procedure is executed in this case: 
    6844 * ''Search'' function of ''Search Engine'' is executed with four parameters: module, search string, search filter, global options. 
    6945 * ''GetList'' function is called to get the complete associative list of elements. In this list the key is the element identifier and the value contains an associative array with terms to check against the search terms. Besides  
    181157== String Analysis == 
    182158At the moment, string analysis is performed by the ''DetectModule'' function defined in classes/search.php. It should be extended by search engines claiming the search string. 
     159 
     160[wiki:adeiSEARCH/String]