JPH11328191A

JPH11328191A - Www robot retrieving system

Info

Publication number: JPH11328191A
Application number: JP10129829A
Authority: JP
Inventors: Takashi Kato; 剛史加藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-05-13
Filing date: 1998-05-13
Publication date: 1999-11-30

Abstract

PROBLEM TO BE SOLVED: To provide a method of automatically deriving search reference points so that, instead of searching indiscriminately, the robot retrieves only the relevant pages by predicting in advance which WWW pages have been updated. SOLUTION: The system comprises an update frequency calculation engine 22 that calculates the update frequency of an arbitrary WWW page from search results, an update expectation calculation engine 23 that calculates an expected update value at an arbitrary time from the update frequency, and a search order table creation engine 24 that automatically extracts the search priority order. Using the engines 22 and 23 to estimate whether an arbitrary WWW page will have been updated at a given time, the system prepares a search order table, and the WWW robot search system searches according to this table.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for deriving search reference points for searches by a WWW (World Wide Web) robot, and in particular to a WWW robot search system that determines an optimal search order over a plurality of search reference points serving as starting points when a WWW robot performs a search.

[0002]

[Prior Art] A WWW robot search system is a system characterized by a function that detects the structure of the WWW pages inside a WWW server and updates to each WWW page. Starting from the top page of a particular WWW server or from some specific WWW page, the system searches the WWW server, parses the HTML (HyperText Markup Language) describing each page from the WWW page information obtained by the search, and extracts the locations of the next WWW pages hyperlinked from that page. HTML is the language used to describe hypertext exchanged via HTTP (HyperText Transfer Protocol), the protocol by which clients and servers communicate on the WWW.
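The link-extraction step described above can be sketched as follows. This is a minimal illustration using Python's standard `html.parser` and `urllib.parse` modules; the class name `LinkExtractor`, the helper `extract_links`, and the example URL are assumptions introduced here and do not appear in the specification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> found in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the hyperlink targets of `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content, used only for illustration.
sample = '<a href="/news.html">News</a> <a href="about/index.html">About</a>'
print(extract_links("http://www.example.com/index.html", sample))
```

Each extracted URL would then become a candidate for the next page to visit.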

[0003] In addition, this search system stores the content and location information of newly created, modified, and deleted WWW pages detected in the course of searching the WWW server. By the above procedure, the WWW robot manages the structure and update history of the WWW pages inside the target WWW server.

[0004] However, in a conventional WWW search robot, the search start position is always specified in advance by the operator, and so is the search order. The robot's search processing follows a rule of sequentially parsing HTML and retrieving WWW pages, beginning from the start position and in the order that the operator specified in advance.

[0005] In a WWW robot search system that operates under a search rule using the operator's pre-specified parameters as they are, when a large number of WWW pages are retrieved over a wide range, the range the WWW robot searches, the search time, and the load on the network are proportional to the product of the number of WWW servers searched and the depth of the hyperlinks.

[0006] Among conventional document search systems, one has been proposed (for example, JP-A-10-21255) that, when searching for documents containing a phrase specified by the user, provides both a full-text search function that searches the entire set of stored documents and a keyword search function that finds documents containing the specified phrase by consulting an index built from phrases extracted in advance from each document. It further provides a determination means that decides, from the specified phrase and other conditions, which of the two functions should be used and performs whichever search is more advantageous. However, this document search system determines only whether full-text search or keyword search is more advantageous; the reference point at which the search starts and the search order still follow predetermined criteria.

[0007] These conventional WWW robot search systems always start from the point specified by the operator and continue searching hyperlinked WWW pages one after another. However, the WWW page structure inside today's WWW servers is complex and the hyperlink hierarchy is very deep. It has been pointed out that, under the conventional search rule in which the operator supplies a starting point in advance and the search proceeds sequentially from it, a very long time is needed to reach the WWW pages that have actually changed inside the WWW server. Furthermore, because WWW server searches are performed over a network, performing many unnecessary searches wastes network resources and increases the network load.

[0008]

[Problems to Be Solved by the Invention] An object of the present invention is to solve the problems of the prior art described above by deriving the update frequency and the update expectation of each WWW page in a WWW server and automatically deriving from these two values an optimal search order over the search start points, and thereby to provide a WWW robot search system that reduces the load on the network during searches by a WWW robot.

[0009]

[Means for Solving the Problems] In the present invention, the WWW robot search system automatically derives the priority order in which WWW pages are to be searched and performs the actual WWW page search accordingly. More specifically, as shown in FIG. 1, the WWW robot search system comprises an update frequency calculation engine 22 that calculates the update frequency of an arbitrary WWW page from search results, an update expectation calculation engine 23 that calculates the expected update value at an arbitrary time from the update frequency of a WWW page, and a search order table creation engine 24 that automatically extracts the search priority of WWW pages from the update frequency and update expectation values.

[0010]

[Operation] The WWW server/page structure storage unit 32 stores the results of searches by the WWW server search engine 21: information on the WWW pages inside the WWW server and structural information representing the links between pages. The WWW server/page update frequency calculation engine 22 calculates the update frequency value of each WWW page from the WWW page structure information and stores the result in the WWW server/page update frequency storage unit 33.

[0011] The WWW server/page update expectation calculation engine 23 automatically generates, from the information in the update frequency storage unit 33, a parameter for calculating the expectation that each WWW page will have been updated at a given time, and stores this parameter in the WWW server/page update expectation storage unit 34. From the update frequency and the update expectation, the WWW server search order table creation engine 24 automatically generates a table that orders which WWW server pages the WWW server search engine 21 should search first the next time it runs.

[0012]

[Embodiments of the Invention] Next, embodiments of the present invention will be described in detail with reference to the drawings. Referring to FIG. 1, the first embodiment of the present invention includes an input/output device 1 such as a keyboard and display, an arithmetic processing unit 2 that operates under program control, a storage device 3 that stores information, and a network communication device 4 that exchanges information with external WWW servers via the Internet or the like.

[0013] The storage device 3 comprises a search schedule storage unit 31, a WWW server/page structure storage unit 32, a WWW server/page update frequency storage unit 33, a WWW server/page update expectation storage unit 34, and a search order table 35.

[0014] The search schedule storage unit 31 stores history information on the start times of past WWW server searches and the time each search took.

[0015] The WWW server/page structure storage unit 32 stores the text content of the WWW pages located in the WWW servers, as obtained by searching, and how these WWW pages are linked to one another.

[0016] The WWW server/page update frequency storage unit 33 stores, for each WWW page, a numerical value computed to represent how frequently the page is updated.

[0017] The WWW server/page update expectation storage unit 34 stores, for each WWW page, a numerical value computed to represent the likelihood that the page will have been updated at a given point in time.

[0018] The search order table 35 stores information prioritized according to which WWW page of which WWW server should be searched first in the next search.
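One way to picture the five parts of the storage device 3 is as a single in-memory record. The sketch below is only an illustration: the class and field names (`SearchRun`, `PageRecord`, `Storage`, and so on) are assumptions introduced here, and the specification does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SearchRun:
    """One entry of the search schedule storage unit 31."""
    start_time: float   # when a past search started (seconds since the epoch)
    duration: float     # how long that search took, in seconds


@dataclass
class PageRecord:
    """One entry of the page structure storage unit 32."""
    html: str                                           # text content of the page
    parent: Optional[str] = None                        # page that links to this one
    children: List[str] = field(default_factory=list)   # pages this one links to


@dataclass
class Storage:
    """Hypothetical in-memory stand-in for the storage device 3."""
    search_schedule: List[SearchRun] = field(default_factory=list)       # unit 31
    page_structure: Dict[str, PageRecord] = field(default_factory=dict)  # unit 32
    update_frequency: Dict[str, float] = field(default_factory=dict)     # unit 33
    update_expectation: Dict[str, float] = field(default_factory=dict)   # unit 34
    search_order: List[str] = field(default_factory=list)                # table 35


store = Storage()
store.page_structure["/index.html"] = PageRecord(html="<html></html>", children=["/news.html"])
store.search_schedule.append(SearchRun(start_time=0.0, duration=12.5))
```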

[0019] The arithmetic processing unit 2 comprises a WWW server search engine 21, a WWW server/page update frequency calculation engine 22, a WWW server/page update expectation calculation engine 23, and a WWW server search order table creation engine 24.

[0020] Triggered by an execution command from the input/output device 1, the WWW server search engine 21 searches external WWW server pages via the network communication device 4 in the order given by the search order table 35. As a result of the search, it stores in the WWW server/page structure storage unit 32 information such as updates to WWW pages, the addition and deletion of WWW pages, and the hyperlink relationships between WWW pages. It stores the times at which WWW pages were searched in the search schedule storage unit 31.

[0021] The WWW server/page update frequency calculation engine 22 newly calculates the update frequency of each WWW page based on the WWW page update information and hyperlink information stored in the WWW server/page structure storage unit 32 and the per-page update frequency information stored in the WWW server/page update frequency storage unit 33, and stores the result in the WWW server/page update frequency storage unit 33.

[0022] Based on the information stored in the WWW server/page update frequency storage unit 33, the WWW server/page update expectation calculation engine 23 calculates the degree to which each WWW page is expected to have been updated at a given point in time, and stores the result in the WWW server/page update expectation storage unit 34.

[0023] The WWW server search order table creation engine 24 calculates the search order from the key entered via the input/output device 1 and the information stored in the WWW server/page update frequency storage unit 33 and the WWW server/page update expectation storage unit 34, and stores the result in the search order table 35.

[0024]

[Description of Operation] Next, the operation of the configuration according to the present invention will be described in detail with reference to FIGS. 1 to 5.

[0025] On receiving an execution instruction from the input/output device 1, the WWW server search engine 21 uses the network communication device 4 to retrieve WWW page information from external WWW servers in the search order stored in advance in the search order table 35. The WWW page information obtained by a search consists of four items: the HTML describing the WWW page currently being searched; whether that page has been updated since it was last searched; the parent WWW page that hyperlinks to that page; and the child WWW pages to which that page hyperlinks. The WWW server search engine 21 stores these four items in the WWW server/page structure storage unit 32.
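The four items recorded per page can be sketched as below. The specification does not say how an update is detected, so the content-digest comparison here is an assumption, as are the function name `record_page` and the dictionary layout standing in for the page structure storage unit 32.

```python
import hashlib


def record_page(structure, url, html, parent, children):
    """Store the four items for one fetched page and return the new entry.

    `structure` is a dict standing in for the page structure storage unit 32.
    """
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    previous = structure.get(url)
    # Assumption: a page counts as updated when its content digest differs
    # from the one recorded at the previous search. A page seen for the
    # first time is also treated as updated.
    updated = previous is None or previous["digest"] != digest
    structure[url] = {
        "html": html,                 # item 1: HTML of the current page
        "updated": updated,           # item 2: changed since the last search?
        "parent": parent,             # item 3: page that links to this one
        "children": list(children),   # item 4: pages this one links to
        "digest": digest,             # bookkeeping for the next comparison
    }
    return structure[url]


structure = {}
first = record_page(structure, "/news.html", "<p>v1</p>", "/index.html", [])
second = record_page(structure, "/news.html", "<p>v1</p>", "/index.html", [])
third = record_page(structure, "/news.html", "<p>v2</p>", "/index.html", [])
print(first["updated"], second["updated"], third["updated"])
```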

[0026] Based on the information stored in the WWW server/page structure storage unit 32, the WWW server/page update frequency calculation engine 22 calculates and determines the WWW server/page update frequency, which indicates how frequently a WWW server or WWW page is updated.

[0027] The structure of the WWW pages inside a WWW server searched by the WWW server search engine 21 has a shape equivalent to a perceptron-type neural network, as shown in FIG. 2. The present invention therefore regards each WWW page as a node of a neural network and calculates the update frequency of each WWW page on the assumption that the weight of each WWW page (node) corresponds to that page's update frequency. For a WWW server with a WWW page structure like that of FIG. 2, the WWW server/page update frequency calculation engine 22 obtains the update frequency of an arbitrary WWW page a from the update frequencies of the child WWW pages hyperlinked from page a and the current update frequency value of page a, using equation (1) shown in FIG. 2.

[0028] To derive the update frequencies of all WWW pages in a WWW server, the WWW server/page update frequency calculation engine 22 focuses on one WWW page at a time and calculates the update frequency of the page in focus. The procedure for choosing the page to focus on is described below.

[0029] First, the WWW server/page update frequency calculation engine 22 focuses on the child WWW page located lowest in the hierarchy among the WWW pages stored in the WWW server/page structure storage unit 32 and derives the update frequency of that page. Second, it marks the page whose update frequency has just been derived. Third, it takes as the next page of focus the lowest-positioned child WWW page among the unmarked WWW pages stored in the WWW server/page structure storage unit 32. By repeating this procedure thereafter, it derives the update frequency of every WWW page.

[0030] When eight WWW pages are hyperlinked in the relationship shown in FIG. 3, the calculation proceeds, for example, in the order shown in frame 1 of FIG. 3: page 8 → page 5 → page 6 → page 7 → page 2 → page 3 → page 4 → page 1.
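The deepest-page-first ordering can be sketched as follows. FIG. 3 is not reproduced in this text, so the link structure below is a hypothetical tree chosen to be consistent with the example order; the function name `bottom_up_order` and the tie-breaking by ascending page number are likewise assumptions.

```python
from collections import deque


def bottom_up_order(children, root):
    """List pages deepest-first; pages at equal depth keep ascending number."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for child in children.get(page, []):
            if child not in depth:  # ignore links back to already-seen pages
                depth[child] = depth[page] + 1
                queue.append(child)
    # Picking the lowest unmarked page again and again is the same as
    # sorting all pages by decreasing depth.
    return sorted(depth, key=lambda p: (-depth[p], p))


# Hypothetical structure: page 1 links to 2, 3, 4; page 2 to 5, 6;
# page 3 to 7; page 5 to 8.
children = {1: [2, 3, 4], 2: [5, 6], 3: [7], 5: [8]}
order = bottom_up_order(children, root=1)
print(order)  # [8, 5, 6, 7, 2, 3, 4, 1]
```

Processing pages in this order means every child's update frequency is already available when its parent is calculated.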

[0031] The WWW server/page update frequency calculation engine 22 derives the update frequency of each WWW page by the method described above and stores the resulting update frequencies in the WWW server/page update frequency storage unit 33.

[0032] The WWW server/page update expectation calculation engine 23 calculates a value representing the degree to which a WWW page can be expected to have been updated at a given point in time, and stores the result in the WWW server/page update expectation storage unit 34.

[0033] The WWW server/page update expectation calculation engine 23 obtains the update expectation of each WWW page as follows. To calculate the update expectation of a WWW page, the engine uses the elapsed time from the previous search to the current search, as recorded in the search schedule storage unit 31, together with the update frequency w stored in the WWW server/page update frequency storage unit 33. From these it calculates, by equation (2) of FIG. 4, the parameter ε of equation (1) of FIG. 4, which is used to derive the update expectation Ex. The update expectation Ex at an arbitrary time t is then derived from equation (1) of FIG. 4.

[0034] The WWW server search order table creation engine 24 creates the search order information for the WWW search robot by the following procedure. First, the engine copies the WWW page information stored in the WWW server/page structure storage unit 32 into the search order table 35 (Table 5-1 in FIG. 5).

[0035] Second, the WWW server search order table creation engine 24 calculates the update confidence of each WWW page from the information stored in the WWW server/page update expectation storage unit 34 and the search schedule storage unit 31, and stores the result in the search order table 35 (Table 5-2 in FIG. 5).

[0036] Third, the WWW server search order table creation engine 24 treats a larger calculated update expectation as indicating a higher search priority and sorts the information in the search order table 35 in descending order of priority (Table 5-3 in FIG. 5).

[0037] Fourth, the WWW server search order table creation engine 24 deletes from the information stored in the search order table 35 any WWW page information that does not satisfy the conditions given by a human (the operator) via the input/output device 1 (Table 5-4 in FIG. 5).
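The four steps of creating the search order table can be sketched as below. The per-page scores are taken as given inputs, since the equations of FIG. 4 are not reproduced in this text. The function name `build_search_order`, the row layout, the example pages, and the threshold used as the operator's condition are all assumptions for illustration.

```python
def build_search_order(pages, expectation, keep):
    """Build the search order table from page names and per-page scores.

    `keep` is a predicate standing in for the operator's condition.
    """
    # Step 1: copy the page information into the table.
    table = [{"url": url} for url in pages]
    # Step 2: attach the update expectation of each page.
    for row in table:
        row["expectation"] = expectation.get(row["url"], 0.0)
    # Step 3: a larger expectation means a higher priority.
    table.sort(key=lambda row: row["expectation"], reverse=True)
    # Step 4: drop the rows that fail the operator's condition.
    return [row for row in table if keep(row)]


# Hypothetical inputs.
pages = ["/index.html", "/news.html", "/about.html", "/archive.html"]
expectation = {"/index.html": 0.6, "/news.html": 0.9,
               "/about.html": 0.1, "/archive.html": 0.3}
table = build_search_order(pages, expectation,
                           keep=lambda row: row["expectation"] >= 0.3)
print([row["url"] for row in table])
# ['/news.html', '/index.html', '/archive.html']
```

Filtering after sorting mirrors the order of the steps in the text; because the filter preserves order, filtering first would give the same table.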

[0038]

[Effects of the Invention] The effect obtained with the WWW robot search system according to the present invention is that the update expectation of a target WWW page at a given time is generated automatically from the update state of that page and of the child WWW pages to which it hyperlinks, and the priority order for the next search is determined from the result. The WWW robot therefore searches first the WWW pages most likely to have been updated, and can quickly obtain information on updated WWW pages.

[0039] In addition, when the search table is created, WWW pages not worth searching can be culled according to conditions given by a human (the operator) (Table 5-4 in FIG. 5), which cuts unnecessary search work and reduces the network load generated when the WWW robot search system searches.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing the method of calculating the update frequency in the operation of the embodiment of the present invention.

FIG. 3 is an explanatory diagram showing a specific example of the calculation order used when calculating the update frequency in the operation of the embodiment of the present invention.

FIG. 4 is an explanatory diagram showing the method of calculating the update expectation in the operation of the embodiment of the present invention.

FIG. 5 is an explanatory diagram showing the method of creating the search table in the operation of the embodiment of the present invention.

[Explanation of symbols]

1 Input/output device
2 Arithmetic processing unit
3 Storage device
4 Network communication device
21 WWW server search engine
22 WWW server/page update frequency calculation engine
23 WWW server/page update expectation calculation engine
24 WWW server search order table creation engine
31 Search schedule storage unit
32 WWW server/page structure storage unit
33 WWW server/page update frequency storage unit
34 WWW server/page update expectation storage unit
35 WWW server search order table

Claims

[Claims]

1. A WWW robot search system that performs a search in accordance with a search order generated automatically at the time of WWW page search.

2. A WWW robot search system, wherein an optimum reference point is specified from a plurality of search reference points that can be a starting point at the time of WWW page search, and the search is performed in accordance with an automatically generated search order.

3. A WWW robot search system which automatically extracts an update frequency of an arbitrary WWW page based on the structure of the WWW page searched by the WWW robot.

4. A WWW robot search system that automatically extracts an update expectation at an arbitrary time from an update frequency uniquely determined for each WWW page on the basis of the hyperlink relationships between WWW pages.

5. A WWW robot search system, wherein a priority order of search destinations when a WWW robot searches is automatically generated from an update frequency and an expected degree of update of each WWW page.

6. A WWW robot search system comprising: an update frequency calculation engine that calculates the update frequency of each WWW page; an update expectation calculation engine that calculates an expected update value at an arbitrary time from the update frequency of a WWW page; and a search order table creation engine that automatically extracts, from the update frequency and update expectation values, the search priority used when the WWW robot searches.