US8296279B1 - Identifying results through substring searching - Google Patents
Identifying results through substring searching Download PDFInfo
- Publication number
- US8296279B1 US8296279B1 US12/132,094 US13209408A US8296279B1 US 8296279 B1 US8296279 B1 US 8296279B1 US 13209408 A US13209408 A US 13209408A US 8296279 B1 US8296279 B1 US 8296279B1
- Authority
- US
- United States
- Prior art keywords
- word
- search
- index
- substrings
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Links
- 238000000034 method Methods 0.000 claims abstract description 40
- 238000004590 computer program Methods 0.000 abstract description 4
- 238000010586 diagram Methods 0.000 description 9
- 230000008569 process Effects 0.000 description 7
- 230000006870 function Effects 0.000 description 3
- 240000001972 Gardenia jasminoides Species 0.000 description 2
- 230000004075 alteration Effects 0.000 description 1
- 210000003484 anatomy Anatomy 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000009193 crawling Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3338—Query expansion
Definitions
- Locating results for certain search queries containing one or more search terms can be difficult in a conventional search engine.
- Some conventional search engines only retrieve results that exactly match a user-input search term so that pertinent results can be missed.
- those search engines that do identify results that include partial matches of search terms conventionally do so by comparing a string of characters (e.g., a search word or a portion thereof) to all possible results within a database. This type of identification of all pertinent results is computationally intensive. The problem is compounded if a search engine permits the use of a wildcard character that broadens search terms to simultaneously represent numerous words or phrases.
- the present disclosure describes systems, methods, and computer program products that provide substring search results responsive to search queries.
- One method includes identifying a search query, the search query including at least one search term, and using an index to identify a word as a search result for the search query.
- the index includes a substring of the word, one or more inclusive strings corresponding to the substring, the one or more inclusive strings comprising the substring and at least one more character, and one or more word objects, the one or more word objects identifying content including the substring of the word.
- the word occurs in a web page identified by a uniform-resource locator (URL).
- the method includes identifying a numerical variable ‘k’, and generating the index using the numerical variable ‘k’, wherein the substring of the word does not include a greater number of characters than the numerical variable ‘k’.
- the index can include at least one index table or at least on indexed tree.
- identifying the search query can include receiving a search query comprising at least one search term having a wildcard character.
- the search term can divide the search term into two or more sub-patterns.
- a sub-pattern, from the two or more sub-patterns, that is identical to the substring of the word can be identified.
- the word can be compared to the sub-patterns to determine if the word satisfies the sub-patterns.
- the word can be identified as a search result for the search query if the word satisfies the sub-patterns.
- an index can be created, the index corresponding to a word occurring at a network location, the index including two or more substrings of the word, one or more substrings associated with at least one of the two or more substrings of the word, and one or more locations, the one or more locations identifying content including the at least one of the two or more substrings of the word.
- the method also includes identifying a search query, the search query including at least one search term, and using the index to identify the word as a search result for the search query.
- creating an index corresponding to a word can include creating an index corresponding to a word occurring in an object identified by a memory address or a unique ID in a storage device.
- creating an index corresponding to a word can include creating an index corresponding to a word occurring in a web page identified by a uniform-resource locator (URL).
- the method includes receiving a numerical variable ‘k’, where two or more substrings do not include a greater number of characters than the numerical variable ‘k’.
- the index can include at least one index table or at least on indexed tree.
- identifying the search query can include receiving a search query comprising at least one search term having a wildcard character.
- the search term can be divided into two or more sub-patterns.
- a sub-pattern, from the two or more sub-patterns is identified that is identical to the substring, of the two or more substrings, having the least number of associated sub-strings.
- Search results for queries including one or more search terms, including one or more wildcard characters can be retrieved without comparing the search terms to all entries in an index or database. Instead, the search terms are compared to only a subset of indexed entries. Additionally, search results are returned that are inclusive of search terms so that exact matches are not the only search results identified.
- FIG. 1 is a block diagram of an example search system.
- FIG. 2 is a block diagram of the example index system and search system.
- FIG. 3 shows an example index table
- FIG. 4 shows a tree structure corresponding to the example index table of FIG. 3 .
- FIG. 5 shows another example index table.
- FIG. 6 shows an example index process
- FIG. 7 shows an example process for identifying search results
- FIG. 8 shows an example of a generic computing device and a generic mobile device.
- the present disclosure describes systems, methods, and computer program products that identify search results to a query, where the query can include one or more wildcard characters.
- the search results include results that are inclusive of a query.
- a query for the term ‘cycle’ can return results (e.g., words or URLs for web pages containing words) including ‘motorcycle’, ‘bicycle’, ‘tricycle’, and the like.
- results e.g., words or URLs for web pages containing words
- index tables are generated for words and sub-strings of the word.
- the index tables optionally include word objects that can identify, for instance, the URL of one or more web pages including the sub-strings. Search results are identified by comparing terms within a query against the sub-strings in the index tables.
- FIG. 1 is a block diagram of an example online environment 100 .
- the online environment 100 can facilitate the identification and serving of content items, e.g., web pages, advertisements, etc., to users.
- One or more computer networks 140 such as a local area network (LAN), wide area network (WAN), the Internet, or a combination thereof, connect advertisers a search system 120 and client devices 110 a , 110 b , 110 c , . . . , 110 ⁇ .
- Example client devices 110 include personal computers, mobile communication devices, television set-top boxes, etc.
- the search system 120 can include one or more servers that gather, process, maintain, manage information and/or provide search results to the client devices 110 .
- the search system 120 includes a search server 135 that can receive search queries from client devices 110 and facilitate the return of relevant information to the client devices 110 .
- a search server 135 can receive search queries from client devices 110 and facilitate the return of relevant information to the client devices 110 .
- the search system 120 can include an index server 125 that processes and stores information associated with, for example, web pages, which may be collected during a crawl of a network, such as the Internet or a LAN.
- the index server stores word lists that identify words that occur in web pages, along with a word object.
- the word can occur in an object identified by a memory address or a unique ID in a storage device.
- the word object can include, for example, a location of a web page (e.g., URL) in which the word occurs.
- the word object may be a null value, which may occur where a user seeks only words that occur but not the locations at which they occur.
- the system 120 could be used to search a dictionary, where only search results, and not the location of the search results, are provided to a client device.
- the word lists are used by the index server 125 to generate index tables of strings against which user-provided search terms can be compared by the search server 135 .
- the index server 125 can also be separate from and remote to the search system 120 .
- a client device such as client device 110 a submits a search query to the search server 135 , and search results can be provided to the client device 110 a in response to the query.
- the search query can be submitted to a search interface (not illustrated) by a client device 110 , and the search system 120 uses the one or more search terms (including, e.g., one or more ASCII characters and/or words) in the search query to identify relevant search results.
- the search interface can be separate from the search server 135 and/or search system 120 .
- the search results can include a link to web pages provided by the one or more web page publishers and/or by another client device 110 . Other types of results are possible.
- a search term can be, for example, a keyword or group of keywords submitted as part of a search query.
- a user can search for a particular type of yellow flower with a long stem.
- the query submitted can be a search query for ‘yellow gardenia.’
- the search terms for the search query are ‘yellow’ and ‘gardenia.’
- the wildcard characters ‘*’ and ‘?’ can be used as part of a search term, where the question mark ‘?’ can be replaced by any character, and the asterisk ‘*’ can be replaced by any number of characters, or no characters.
- the search term ‘c?r’ is inclusive of the search term ‘car’.
- the query ‘*cycle’ is inclusive of the search terms ‘cycle’, ‘bicycle’, and ‘motorcycle’.
- the search system 120 can index the content provided by content providers (e.g., web page providers or other publishers) for later search and retrieval of search results that are relevant to the queries.
- content providers e.g., web page providers or other publishers
- An exemplary search engine is described in S. Brin and L. Page, ‘The Anatomy of a Large-Scale Hypertextual Search Engine,’ Seventh International World Wide Web Conference, Brisbane, Australia (1998) and in U.S. Pat. No. 6,285,999.
- Search results can include, for example, lists of web page titles, snippets of text extracted from those web pages, and hypertext links to those web pages, and may be grouped into a predetermined number (e.g., ten) of search results.
- FIG. 2 is a functional block diagram of the example index server 125 and search server 135 of FIG. 1 .
- the logical blocks illustrated in FIG. 2 can be implemented in software, hardware, or a combination of hardware and software.
- each of the functional blocks can represent one or more computer processors, threads, and/or objects.
- the functions performed by one of the blocks in FIG. 2 can be performed by another block, or by other components internal or external to the search system 120 .
- a single logical block/processing device can perform the functions of the index server 125 and search server 135 .
- the index server 125 generally includes word list files 215 , an indexer 220 , and one or more indexed tables in memory 225 .
- the word list files 215 identify words occurring within content at certain locations, for instance, web pages having URLs.
- the word list files 215 can be generated by a crawling of a network such as the Internet.
- the word list files can be generated by a search of a database of content, such as documents or emails.
- the indexer 220 takes the list of words and generates an index table for each word.
- Each index table includes a word, a word object (e.g., a location of content in which the word was included, such as a URL), one or more substrings of the word, and word objects associated with each substring, which may identify, for instance, locations (e.g., URLs) of content where the substrings exist.
- the word and word objects can be extracted from the word list files 215 . In some implementations, however, the word objects may be a null value. Additionally, for each word and substring the index table includes a list of inclusive strings that contain the word or substring.
- FIG. 3 shows an example index table 300 for the word ‘xyz’.
- the example index table 300 includes three fields: string 305 , inclusive string(s) 310 , and word object(s) 315 .
- the first string 320 identifies the indexed word ‘xyz’, which represents a search term that is part of a search query.
- the word objects in this example identify the location of content that includes each string in the table 300 .
- ‘xyz’ represents a word identified within content at five locations, e.g., web pages, identified by URLs 1 , 3 , 5 , 7 , and 11 .
- the URL information is provided, for instance, by a word list file to the indexer 220 .
- the remaining strings 330 within the index table 300 are substrings of the indexed word ‘xyz’, which means that each substring includes one or more, but not all of, the characters within the indexed word ‘xyz’. According to some implementations, the remaining strings 330 can correspond to a word (e.g., the first string) in another index such that the other index is referenced for any substrings.
- the inclusive strings field 310 identifies the one or more parent strings that include each substring plus at least one more character, e.g., an ASCII character.
- the inclusive strings for substring ‘z’ 333 are ‘xz’ and ‘yz’, because ‘xz’ and ‘yz’ include the character ‘z’ along with one more ASCII character ('x′ and ‘y’, respectively).
- the inclusive strings for substrings ‘xz’ and ‘yz’ are the indexed word ‘xyz’.
- the indexer 220 can routinely update index tables, for instance, once a day, or whenever word list files are updated or new word list files are generated. If a word is already present in an index table and an additional location is identified for that word the indexer 220 will update the index table for the word with the new location. The indexer 220 also ensures that duplicative word objects (e.g., locations such as URLs) are not identified within an index table for a particular word. On the other hand, if a new word is identified the indexer 220 will generate a new index table for the word.
- duplicative word objects e.g., locations such as URLs
- index table 300 includes data that can be represented by the example tree structure 400 shown in FIG. 4 .
- Each unique string and its corresponding word objects (here, URLs) represents a single node within the tree.
- the word that the index table 300 was generated for (here, ‘xyz’) represents the top node 402 in the tree. For this reason there is no inclusive string (‘IS’) for the word ‘xyz’.
- each node also contains the word object, which in this example identifies the location at which the string occurs, as extracted from the word list files 215 .
- the top node 402 for the word ‘xyz’ identifies URLs 1 , 3 , 5 , 7 , and 11 , just as in the example index table 300 of FIG. 3 .
- FIG. 4 shows nodes 404 , 406 , 408 , 410 , 412 , 414 corresponding to the remaining strings (i.e., substrings of the string ‘xyz’) contained within the example index table 300 .
- Index tables permits the search system 135 to identify search results without comparing a search term to every entry in a database or tree. For instance, if a search is executed for the term ‘y’, the search results can use the table to identify the search results from corresponding inclusive strings containing the searched string, which are ‘xy’, ‘yz’, and ‘xyz’ in the example index table 300 of FIG. 3 and the tree 400 of FIG. 4 . In this example, because ‘xy’ and ‘yz’ do not identify any URLs, the search result will identify only the URLs associated with the string ‘xyz’.
- index table entries (or nodes, such as nodes 412 , 414 , and 408 ) that are associated with characters that are not included in the search term are not considered. This is in contrast to previous, exact match search systems that would require all entries to be considered as possible matches to a search term. As a result, the speed of identifying search results is increased.
- FIG. 5 shows an example index table 500 for the word ‘mukeskumar’ 520 .
- the index system 125 can limit the number of substrings to all possible substrings of size ‘K’ or less, where ‘K’ is a variable that can be set by an administrator of the index system 125 . This serves to decrease the size of the index tables for each word.
- the variable ‘K’ would be greater than or equal to 1, and less than or equal to the length of an indexed word (e.g., less than or equal to 10 in the example index table 500 ).
- the value of ‘K’ is set to 4. Therefore, only substrings of length 4 or less (i.e., substrings that have 4 or less ASCII characters) exist in the index table 500 .
- the size of the variable ‘K’ will impact the number of index table substrings, the number of inclusive strings identified for each substring, and the search time to identify search results. For instance, if the variable ‘K’ is large, e.g., 6 or greater, the number of index table entries and the number of inclusive strings for substrings will be large. If ‘K’ is large and a search is executed for a search term (e.g., word) having a string length (i.e., the number of characters in the search term) that is near ‘K’, the search time will be minimized.
- a search term e.g., word
- the search server 135 uses the index tables to identify search results for a search query having one or more search terms.
- the search server 135 generally includes a word list generator 230 , a pattern verifier 255 , a pattern divider 250 , and a sub-pattern selector 245 .
- a search query 270 input by a user is received from a client device 110 by the pattern divider 250 .
- the search query 270 can include one or more search terms, such as words.
- the one or more search terms can each include one or more wildcard characters to allow a user to submit searches for words including various combination of characters and wildcard characters.
- the wildcard characters ‘*’ and ‘?’ can be used, where the question mark ‘?’ can be replaced by any character, and the asterisk ‘*’ can be replaced by any number of characters, or no characters.
- the pattern divider 250 identifies sub-patterns from each search term of the search query 270 , and transmits 260 each in a sub-pattern array to the sub-pattern selector 245 . This permits the search server 135 to identify all possible search results for search terms including a wildcard expression. For instance, if a search term is ‘ab?cd’, the pattern divider 250 can identify sub-patterns that satisfy portions of the wildcard expression. According to some implementations, the pattern divider 250 can separate a search term into sub-patterns, where the wildcard characters ‘*’ and ‘?’ are used as separators.
- the pattern divider 250 can generate a sub-pattern array that includes ‘ab’ and ‘cd’. If a search term does not include a wildcard character, however, the pattern divider 250 does not divide the search term into sub-patterns.
- the sub-pattern selector 245 receives the sub-pattern array 260 (i.e., list of sub-patterns) and identifies the sub-pattern matching the substring least number of inclusive strings within the indexed tables 225 . For instance, if a search term is ‘ab?cd’, the sub-pattern selector 245 can identify the number of inclusive strings associated with the substring and sub-pattern ‘ab’, and the number of inclusive strings associated with the substring and sub-pattern ‘cd’. The sub-pattern selector 245 will identify the sub-pattern having fewer number of inclusive strings associated with it. This may be done by a simply lookup of the number of inclusive strings in an index table for each substring that matched the sub-patterns.
- the sub-pattern selector 245 could use the index table 500 to identify that the sub-pattern and substring ‘mar’ has fewer inclusive strings than the sub-pattern ‘k’.
- the number of inclusive strings may be determined not only from the number of inclusive strings of the substring matching the sub-pattern, but a sum of the number of inclusive strings for the substring matching the sub-pattern and the number of inclusive strings corresponding to any string in an index identified by the inclusive strings.
- use of the sub-pattern ‘mar’ to identify matching strings reduces the number of inclusive strings that are considered as matches to the search term.
- the sub-pattern selector 245 will count and store the number of inclusive strings associated with each node in an index table for each sub-pattern.
- the sub-pattern selector can attempt to guess the sub-pattern that has the least number of inclusive strings based on the size of a sub-pattern and/or based on the frequency of characters. For instance, if one sub-pattern includes several rarely-occurring characters, the sub-pattern selector can identify that sub-pattern as a better sub-pattern than another sub-pattern having only one or two characters that occur at a much higher frequency (i.e., exist in more locations, e.g., within web pages).
- the word list generator 230 is operable to compare a sub-pattern 243 received from the sub-pattern selector 245 to index tables.
- the word list generator looks up the index tables (or indexed tree) for all the strings that are inclusive strings of the sub-pattern 243 received from the sub-pattern selector. For instance, referring to the example tree 400 of FIG. 4 , if the sub-pattern 243 is ‘xy’, the word list generator 230 will identify the word ‘xyz’ because it is the inclusive string of the sub-pattern and substring ‘xy’.
- the strings (including, e.g., words) identified by the word list generator 230 are provided 257 in a word list to the pattern verifier 255 .
- sub-patterns may be identified as ‘muke’, ‘ukes’, and ‘kesh’.
- the sub-pattern selector 245 will use those sub-patterns to identify the best sub-pattern for use by the word list generator 230 .
- the pattern verifier 255 verifies that the strings and/or words identified by the word list generator 230 satisfy the sub-pattern provided by the pattern divider 250 . If so, the words are added to a word list 265 with their corresponding location as possible search results for a search pattern 270 . Each search term within a query is handled in this manner by the pattern divider 250 , sub-pattern selector 245 , word list generator 230 , and pattern verifier 255 , and the intersection or union of locations within the word list 265 satisfying a search query can be provided as search results.
- search query includes two terms ‘ab?cd’ and ‘d?e’
- intersection of the word lists for each of the individual terms ‘ab?cd’ and ‘d?e’ would identify the locations of content that includes words that satisfy both search terms.
- FIG. 6 shows an example index process 600 .
- a word occurring at one or more locations e.g., within one or more web pages having URLs, is identified along with the one or more locations 605 .
- the indexer 220 receives the word and optionally, its corresponding locations from at least one word list file 215 . If an index table fails to exist for the word, an index table is created for the word 610 , 615 . If an index table already exists for the word, it is determined if any new word objects (e.g., locations such as URLs) are identified by the at least one word list file 610 , 620 . According to some implementations, the determination can be performed by the indexer 220 .
- new word objects e.g., locations
- new word objects e.g., locations
- they are added to a corresponding entry in the index 630 , for example, to the index table.
- the creation and/or update of an index table with new word objects is performed by the indexer 220 .
- a ‘K’ variable is identified to determine whether the index table will contain all substrings of the word, or only substrings of the word having ‘K’ or fewer characters 640 .
- the value of ‘K’ is set by an administrator. If the ‘K’ variable is set to a value that equals or exceeds the string length of the word, the index table for the word is populated with all substrings of the word 645 , 650 . If the ‘K’ variable is set to a value that is less than the string length of the word, the index table for the word is populated with substrings of the word having characters of length ‘K’ or less 645 , 655 .
- the index table is then populated with the strings (the word and its substrings, if any), along with the inclusive strings, and the location (e.g., URLs) at which the words occur 665 .
- the indexer 200 generates the index table and populates the information in the index table.
- FIG. 7 shows an example process for identifying search results 700 .
- a search query having one or more search terms is identified 705 .
- the one or more search terms are received by the search server 135 from a client 110 .
- Each search term can optionally include one or more wildcard characters to allow a user to submit searches for partial words.
- the wildcard characters ‘*’ and ‘?’ can be used, where the question mark ‘?’ can be replaced by any character, and the asterisk ‘*’ can be replaced by any number of characters, or no characters.
- the search term is divided into sub-patterns 710 and a sub-pattern array is generated 715 .
- the pattern divider 250 divides the search term into sub-patterns and each of the sub-patterns are transmitted in a sub-pattern array to the sub-pattern selector 245 .
- the sub-pattern within the sub-pattern array that corresponds to the least number of inclusive strings within the indexed tables 225 (or indexed tree) is identified 720 .
- the sub-pattern selector 245 receives the sub-pattern array 260 (i.e., list of sub-patterns) and identifies the sub-pattern identical to the substring that has the least number of inclusive strings within the indexed tables 225 .
- the identified sub-pattern is compared to the index tables (and/or to an indexed tree containing the same information as the index tables) 725 .
- the word list generator 230 is operable to compare a sub-pattern 243 received from the sub-pattern selector 245 to strings in the index tables. Inclusive strings corresponding to the identified sub-pattern are identified and added to a word list 730 .
- the word list can also include, in some implementations, a word object, such as a word object identifying the location at which each inclusive string occurs. For instance, the word list can include one or more URLs that identify the location of content in which words occur.
- the strings (which may include, e.g., an indexed word) are identified by the word list generator and are compared to the sub-pattern identified in block 720 to determine if the strings satisfy the sub-pattern 735 . If so, the strings are added to a word list (e.g., word list 265 ) with their corresponding location as possible search results 740 .
- a word list e.g., word list 265
- Each search term within a search query is handled in this manner by the pattern divider 250 , sub-pattern selector 245 , word list generator 230 , and pattern verifier 255 . Thus, the process repeats itself until no additional search terms exist within the search query 745 . If there is only one search term, the word object (which may include, e.g., the location of content) for words satisfying the identified sub-pattern are identified, and can be provided and/or displayed to the client 110 as search results 750 , 760 .
- the one or more word objects for words satisfying the identified sub-patterns for each search term is determined, for example, as the intersection of the word objects corresponding to the words satisfying the identified sub-patterns for each search term 750 , 755 .
- a search query includes two terms ‘ab?cd’ and ‘d?e’, where the term ‘ab?cd’ results in the identification of a word at URLs 1 , 2 , and 3 , and the term ‘d?e’ results in the identification of a word at URLs 1 , 3 , and 5
- the intersection of the word objects for the terms ‘ab?cd’ and ‘d?e’ will be URLs 1 and 3 because the content at those locations satisfy both search terms.
- the union of results for two or more search terms may be identified as search results.
- FIG. 8 is a block diagram of an example computer system 800 that can be utilized to implement the systems and methods described herein.
- the system 800 includes a processor 810 , a memory 820 , a storage device 830 , and an input/output device 840 .
- Each of the components 810 , 820 , 830 , and 840 can, for example, be interconnected using a system bus 850 .
- the processor 810 is capable of processing instructions for execution within the system 800 .
- the processor 810 is a single-threaded processor.
- the processor 810 is a multi-threaded processor.
- the processor 810 is capable of processing instructions stored in the memory 820 or on the storage device 830 .
- the memory 820 stores information within the system 800 .
- the memory 820 is a computer-readable medium.
- the memory 820 is a volatile memory unit.
- the memory 820 is a non-volatile memory unit.
- the storage device 830 is capable of providing mass storage for the system 800 .
- the storage device 830 is a computer-readable medium.
- the storage device 830 can, for example, include a hard disk device, an optical disk device, or some other large capacity storage device.
- the input/output device 840 provides input/output operations for the system 800 .
- the input/output device 840 can include one or more of a network interface devices, e.g., an Ethernet card, a serial communication device, e.g., and RS-232 port, and/or a wireless interface device, e.g., and 802.11 card.
- the input/output device can include driver devices configured to receive input data and send output data to other input/output devices, e.g., keyboard, printer and display devices 860 .
- the apparatus, methods, flow diagrams, and structure block diagrams described in this patent document may be implemented in computer processing systems including program code comprising program instructions that are executable by the computer processing system. Other implementations may also be used. Additionally, the flow diagrams and structure block diagrams described in this patent document, which describe particular methods and/or corresponding acts in support of steps and corresponding functions in support of disclosed structural means, may also be utilized to implement corresponding software structures and algorithms, and equivalents thereof.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (28)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/132,094 US8296279B1 (en) | 2008-06-03 | 2008-06-03 | Identifying results through substring searching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/132,094 US8296279B1 (en) | 2008-06-03 | 2008-06-03 | Identifying results through substring searching |
Publications (1)
Publication Number | Publication Date |
---|---|
US8296279B1 true US8296279B1 (en) | 2012-10-23 |
Family
ID=47017520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/132,094 Expired - Fee Related US8296279B1 (en) | 2008-06-03 | 2008-06-03 | Identifying results through substring searching |
Country Status (1)
Country | Link |
---|---|
US (1) | US8296279B1 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120109994A1 (en) * | 2010-10-28 | 2012-05-03 | Microsoft Corporation | Robust auto-correction for data retrieval |
US20140222852A1 (en) * | 2013-01-31 | 2014-08-07 | Verint Systems Ltd. | System and method for bit-map based keyword spotting in communication traffic |
WO2016076604A1 (en) * | 2014-11-12 | 2016-05-19 | Samsung Electronics Co., Ltd. | Apparatus and method for processing query |
US20160335294A1 (en) * | 2015-05-15 | 2016-11-17 | Bjorn J. Gruenwald | System and Method for Organizing Data |
TWI604403B (en) * | 2016-05-24 | 2017-11-01 | Fungogo Co Ltd | A quick and accurate way to search for tourist attractions |
US9842111B2 (en) * | 2013-12-22 | 2017-12-12 | Varonis Systems, Ltd. | On-demand indexing |
US9959354B2 (en) | 2015-06-23 | 2018-05-01 | Google Llc | Utilizing user co-search behavior to identify search queries seeking inappropriate content |
US10169451B1 (en) | 2018-04-20 | 2019-01-01 | International Business Machines Corporation | Rapid character substring searching |
US20190294735A1 (en) * | 2018-03-26 | 2019-09-26 | Apple Inc. | Search functions for spreadsheets |
US10732972B2 (en) | 2018-08-23 | 2020-08-04 | International Business Machines Corporation | Non-overlapping substring detection within a data element string |
US10747819B2 (en) | 2018-04-20 | 2020-08-18 | International Business Machines Corporation | Rapid partial substring matching |
US10782968B2 (en) | 2018-08-23 | 2020-09-22 | International Business Machines Corporation | Rapid substring detection within a data element string |
US10996951B2 (en) | 2019-09-11 | 2021-05-04 | International Business Machines Corporation | Plausibility-driven fault detection in string termination logic for fast exact substring match |
US11042371B2 (en) | 2019-09-11 | 2021-06-22 | International Business Machines Corporation | Plausability-driven fault detection in result logic and condition codes for fast exact substring match |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5799299A (en) * | 1994-09-14 | 1998-08-25 | Kabushiki Kaisha Toshiba | Data processing system, data retrieval system, data processing method and data retrieval method |
US6285999B1 (en) | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US6556990B1 (en) * | 2000-05-16 | 2003-04-29 | Sun Microsystems, Inc. | Method and apparatus for facilitating wildcard searches within a relational database |
US20060253431A1 (en) * | 2004-11-12 | 2006-11-09 | Sense, Inc. | Techniques for knowledge discovery by constructing knowledge correlations using terms |
US20070106640A1 (en) * | 2005-10-05 | 2007-05-10 | Udaya Shankara | Searching for strings in messages |
US20070162445A1 (en) * | 2005-11-23 | 2007-07-12 | Dun And Bradstreet | System and method for searching and matching data having ideogrammatic content |
US20080097988A1 (en) * | 2004-11-22 | 2008-04-24 | International Business Machines Corporation | Methods and Apparatus for Assessing Web Page Decay |
US20080104502A1 (en) * | 2006-10-26 | 2008-05-01 | Yahoo! Inc. | System and method for providing a change profile of a web page |
US7444326B1 (en) * | 2002-06-17 | 2008-10-28 | At&T Corp. | Method of performing approximate substring indexing |
US20080294597A1 (en) * | 2006-09-19 | 2008-11-27 | Exalead | Computer-implemented method, computer program product and system for creating an index of a subset of data |
US20080313128A1 (en) * | 2007-06-12 | 2008-12-18 | Microsoft Corporation | Disk-Based Probabilistic Set-Similarity Indexes |
US20090300770A1 (en) * | 2002-09-18 | 2009-12-03 | Rowney Kevin T | Mechanism to search information content for preselected data |
-
2008
- 2008-06-03 US US12/132,094 patent/US8296279B1/en not_active Expired - Fee Related
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5799299A (en) * | 1994-09-14 | 1998-08-25 | Kabushiki Kaisha Toshiba | Data processing system, data retrieval system, data processing method and data retrieval method |
US6285999B1 (en) | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US6556990B1 (en) * | 2000-05-16 | 2003-04-29 | Sun Microsystems, Inc. | Method and apparatus for facilitating wildcard searches within a relational database |
US7444326B1 (en) * | 2002-06-17 | 2008-10-28 | At&T Corp. | Method of performing approximate substring indexing |
US20090300770A1 (en) * | 2002-09-18 | 2009-12-03 | Rowney Kevin T | Mechanism to search information content for preselected data |
US20060253431A1 (en) * | 2004-11-12 | 2006-11-09 | Sense, Inc. | Techniques for knowledge discovery by constructing knowledge correlations using terms |
US20080097988A1 (en) * | 2004-11-22 | 2008-04-24 | International Business Machines Corporation | Methods and Apparatus for Assessing Web Page Decay |
US20070106640A1 (en) * | 2005-10-05 | 2007-05-10 | Udaya Shankara | Searching for strings in messages |
US20070162445A1 (en) * | 2005-11-23 | 2007-07-12 | Dun And Bradstreet | System and method for searching and matching data having ideogrammatic content |
US20080294597A1 (en) * | 2006-09-19 | 2008-11-27 | Exalead | Computer-implemented method, computer program product and system for creating an index of a subset of data |
US20080104502A1 (en) * | 2006-10-26 | 2008-05-01 | Yahoo! Inc. | System and method for providing a change profile of a web page |
US20080313128A1 (en) * | 2007-06-12 | 2008-12-18 | Microsoft Corporation | Disk-Based Probabilistic Set-Similarity Indexes |
Non-Patent Citations (2)
Title |
---|
Brin, S. and Page, L., "The anatomy of a large-scale hypertextual web search engine," Computer Networks and ISDN Systems, 30:107-117, 1998. |
W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization, In Symposium on Document Analysis and Information Retrieval, pp. 161-176, University of Nevada-Las Vegas, 1994. * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120109994A1 (en) * | 2010-10-28 | 2012-05-03 | Microsoft Corporation | Robust auto-correction for data retrieval |
US20140222852A1 (en) * | 2013-01-31 | 2014-08-07 | Verint Systems Ltd. | System and method for bit-map based keyword spotting in communication traffic |
US9690873B2 (en) * | 2013-01-31 | 2017-06-27 | Verint Systems Ltd. | System and method for bit-map based keyword spotting in communication traffic |
US9842111B2 (en) * | 2013-12-22 | 2017-12-12 | Varonis Systems, Ltd. | On-demand indexing |
US10810247B2 (en) | 2013-12-22 | 2020-10-20 | Varonis Systems, Ltd. | On-demand indexing |
US10482082B2 (en) | 2014-11-12 | 2019-11-19 | Samsung Electronics Co., Ltd. | Apparatus and method for processing query |
WO2016076604A1 (en) * | 2014-11-12 | 2016-05-19 | Samsung Electronics Co., Ltd. | Apparatus and method for processing query |
KR20160056591A (en) * | 2014-11-12 | 2016-05-20 | 삼성전자주식회사 | Query processing apparatus and method |
KR102329333B1 (en) | 2014-11-12 | 2021-11-23 | 삼성전자주식회사 | Query processing apparatus and method |
US20160335294A1 (en) * | 2015-05-15 | 2016-11-17 | Bjorn J. Gruenwald | System and Method for Organizing Data |
US9959354B2 (en) | 2015-06-23 | 2018-05-01 | Google Llc | Utilizing user co-search behavior to identify search queries seeking inappropriate content |
TWI604403B (en) * | 2016-05-24 | 2017-11-01 | Fungogo Co Ltd | A quick and accurate way to search for tourist attractions |
US20190294735A1 (en) * | 2018-03-26 | 2019-09-26 | Apple Inc. | Search functions for spreadsheets |
US10747819B2 (en) | 2018-04-20 | 2020-08-18 | International Business Machines Corporation | Rapid partial substring matching |
US10169451B1 (en) | 2018-04-20 | 2019-01-01 | International Business Machines Corporation | Rapid character substring searching |
US10732972B2 (en) | 2018-08-23 | 2020-08-04 | International Business Machines Corporation | Non-overlapping substring detection within a data element string |
US10782968B2 (en) | 2018-08-23 | 2020-09-22 | International Business Machines Corporation | Rapid substring detection within a data element string |
US10996951B2 (en) | 2019-09-11 | 2021-05-04 | International Business Machines Corporation | Plausibility-driven fault detection in string termination logic for fast exact substring match |
US11042371B2 (en) | 2019-09-11 | 2021-06-22 | International Business Machines Corporation | Plausability-driven fault detection in result logic and condition codes for fast exact substring match |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8296279B1 (en) | Identifying results through substring searching | |
CA2669236C (en) | Extending keyword searching to syntactically and semantically annotated data | |
US8489573B2 (en) | Search engine | |
US8812531B2 (en) | Concept bridge and method of operating the same | |
CN100590617C (en) | Phrase-based indexing method and system in information retrieval system | |
US8751484B2 (en) | Systems and methods of identifying chunks within multiple documents | |
US7937395B2 (en) | Systems and methods of displaying and re-using document chunks in a document development application | |
US6754650B2 (en) | System and method for regular expression matching using index | |
US9275144B2 (en) | System and method for metadata search | |
US8924374B2 (en) | Systems and methods of semantically annotating documents of different structures | |
US8352485B2 (en) | Systems and methods of displaying document chunks in response to a search request | |
US20110119262A1 (en) | Method and System for Grouping Chunks Extracted from A Document, Highlighting the Location of A Document Chunk Within A Document, and Ranking Hyperlinks Within A Document | |
US20090217159A1 (en) | Systems and Methods of Performing a Text Replacement Within Multiple Documents | |
US8126880B2 (en) | Systems and methods of adaptively screening matching chunks within documents | |
US8924421B2 (en) | Systems and methods of refining chunks identified within multiple documents | |
Wheeldon et al. | DbSurfer: A search and navigation tool for relational databases | |
US20060179046A1 (en) | Web operation language | |
Ahuja et al. | Hidden web data extraction tools | |
Aggarwal et al. | Ranking of Web Documents for Domain Specific Database | |
Abdulrahman | Web Pages Ranking Algorithms: A Survey | |
Rao et al. | Web Search Engine | |
KR20020067162A (en) | Method and system for indexing document | |
Singhal et al. | Text content based web page refresh policy | |
Sharma et al. | Crawler indexing using tree structure and its implementation | |
Gupta et al. | A Framework for Context Ontology Driven Indexing, Ranking and Search in Search Engines. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: GOOGLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SINGH, MUKESH KUMAR;REEL/FRAME:021236/0239 Effective date: 20080603 |
|
ZAAA | Notice of allowance and fees due |
Free format text: ORIGINAL CODE: NOA |
|
ZAAB | Notice of allowance mailed |
Free format text: ORIGINAL CODE: MN/=. |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
CC | Certificate of correction | ||
CC | Certificate of correction | ||
REMI | Maintenance fee reminder mailed | ||
FPAY | Fee payment |
Year of fee payment: 4 |
|
SULP | Surcharge for late payment | ||
AS | Assignment |
Owner name: GOOGLE LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044101/0405 Effective date: 20170929 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20241023 |