Table of contents:
Why WebSQL?
Information can be
WWW is ?
Yossi home page has links
Assumption: link distance related to conceptual relevance
What are Yossi Matias's publications?
A different search pattern.
Where are Moti Matias's publications?
SELECT * FROM PUBLICATIONS WHERE AUTHOR = "Yossi Matias"
Assume no PUBLICATIONS relation
CREATE VIEW PUBLICATIONS SELECT * FROM WWW_PAGES WHERE WWW_PAGES contains "publications"
SELECT d.url, d.title FROM PUBLICATIONS WHERE PUBLICATIONS contains "Curtis"
SELECT d.url, d.title FROM Document d SUCH THAT "http://www.infoseek.com" -> d, Document p SUCH THAT d -> p WHERE d.title contains "Yossi" AND p.title contains "publications"
SELECT d.url, d.title FROM Document d SUCH THAT "http://www.math.tau.edu" -> d, Document p SUCH THAT d -> p WHERE d.title contains "Yossi" AND p.title contains "publications"
This section presents the WebSQL compiler, query engine, and user interfaces.
Both the WebSQL compiler and query engine are implemented as a set of Java classes, which form the WebSQL class library. The library can be used from any Java program.
The WebSQL system architecture is depicted in the following figure.
The Architecture of the WebSQL System
The Compiler and Virtual Machine. The WebSQL compiler parses the query and translates it into a nested loop program in a custom-designed object language. The object program is executed by an interpreter that implements a stack machine. The evaluation of the range specified in the FROM clause is done via specially designed operation codes whose results are vectors of Document or Anchor tuples.
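As a rough illustration of what a "nested loop program" means here, the sketch below evaluates a two-range query in Python over a tiny in-memory link graph. The `PAGES` data, its URLs, and the helper names `linked_from` and `run_query` are hypothetical, and the toy graph does not distinguish link types. The actual compiler emits object code for its own stack machine, not Python.

```python
# Toy in-memory "web": url -> (title, list of linked urls). Hypothetical data.
PAGES = {
    "http://example.org/": ("Index", ["http://example.org/yossi",
                                      "http://example.org/other"]),
    "http://example.org/yossi": ("Yossi home page",
                                 ["http://example.org/yossi/pubs"]),
    "http://example.org/yossi/pubs": ("Yossi publications", []),
    "http://example.org/other": ("Other page",
                                 ["http://example.org/yossi/pubs"]),
}


def linked_from(url):
    """Range generator: documents reachable from url by a single link."""
    for target in PAGES[url][1]:
        if target in PAGES:
            yield target


def run_query(start):
    """Nested-loop evaluation of a query shaped like:

    SELECT d.url, d.title
    FROM Document d SUCH THAT start -> d,
         Document p SUCH THAT d -> p
    WHERE d.title contains "Yossi" AND p.title contains "publications"
    """
    rows = []
    for d in linked_from(start):        # outer loop: range of d
        for p in linked_from(d):        # inner loop: range of p depends on d
            d_title = PAGES[d][0]
            p_title = PAGES[p][0]
            # WHERE clause
            if "Yossi" in d_title and "publications" in p_title:
                rows.append((d, d_title))   # SELECT list
    return rows


print(run_query("http://example.org/"))
```

Each range in the FROM clause becomes one loop, and a range that mentions an earlier variable (here `p`, which depends on `d`) is nested inside the loop for that variable.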
The Query Engine. Whenever the interpreter encounters an operation code corresponding to a range specifying condition, the query engine is invoked to perform the actual evaluation. Depending on the type of condition, this involves either sending a request to index servers or a depth-first traversal of a sub-part of the document network.
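The traversal branch can be sketched as follows. This is a minimal Python illustration, not the engine's Java code: it walks depth-first from a start URL and follows only links that stay on the same server, which roughly corresponds to a range such as `->*`. The `get_links` callable and the `GRAPH` data are assumptions standing in for fetching and parsing real pages.

```python
from urllib.parse import urlsplit


def same_server(a, b):
    """True when both URLs name the same host (a local link)."""
    return urlsplit(a).netloc.lower() == urlsplit(b).netloc.lower()


def reachable_local(start, get_links):
    """Depth-first traversal from start, following same-server links only.

    get_links(url) must return the list of URLs linked from url.
    The start document itself is included (the zero-length path).
    """
    seen = set()
    order = []
    stack = [start]
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        # Push in reverse so links are explored in document order.
        for target in reversed(get_links(url)):
            if target not in seen and same_server(url, target):
                stack.append(target)
    return order


# Hypothetical link graph used in place of real HTTP requests.
GRAPH = {
    "http://a.example/": ["http://a.example/x", "http://b.example/"],
    "http://a.example/x": ["http://a.example/y", "http://a.example/"],
    "http://a.example/y": [],
}

print(reachable_local("http://a.example/", lambda u: GRAPH.get(u, [])))
```

The `seen` set keeps the traversal from looping on cyclic links, and the same-server test is what restricts it to a sub-part of the document network.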
There are three different interfaces that allow us to use the language interactively. The simplest interface is an HTML form connected to a CGI script. The user can either fill in the form to assemble a query or type a complete WebSQL query directly. When the Submit button is pressed, the query is sent to the CGI script, which invokes a stand-alone Java application running on our server. This application parses the query and, if no errors are found, hands it to the query execution engine, which produces the result as a list of tuples. The tuples are formatted into an HTML table and sent back to the user. This interface, although slow and limited in user interaction, has the advantage that it can be used with any browser.
Find all documents accessible from the "ISG Technologies" home page (only documents on the same server will be accessible).
select d.url, d.title, d.type, d.length, d.modif from Document d SUCH THAT "http://www.isgtec.com" ->* d;
Find documents about aluminum.
select d.url, d.title from Document d such that d mentions "aluminum";
d.url | d.title |
---|---|
http://www.cygnus.nb.ca/retail/universal/univrsl5.html | UNIVERSAL SIGNS |
http://altavista.software.digital.com/ | AltaVista Software |
http://www.drms.dla.mil/drmo/newengland/56800002.html | 56800002 |
http://www-cmrc.sri.com/CIN/sep-oct94/article02.html | ALUMINUM CHEMICALS |
http://westpasco.com/members/Aluminum.html | West Pasco Chamber Of Commerce Member Directory |
http://www.crisny.org/communities/colonie/government/gen.colonie.html | PURCHASING DEPARTMENT - Sand for Ice Control |
http://www.digital.com/ | |
http://www.rmc.com/divs/rasco/areas/rascogrm.html | RASCO - Grand Rapids, Michigan |
http://www.gassprings.com/mc-as_.htm | Guden Continuous Hinges -- Aluminum / Stainless Pin |
http://www.metalogic.be/MatWeb/reading/mat-cor/al___ccc.htm | Aluminum Alloys : Corrosion Hazards Overview |
Note: The result of this WebSQL query is constructed by sending the string pattern ("aluminum") to an index server. There is a default index server (currently AltaVista), but a different one can be selected by using the "define index" statement (see Language Reference).
Anchor(base, label, href)
where base is the URL of the anchor document, label is the link's label and href is the URL of the destination document, all represented as character strings. Now we can pose queries that refer to the links present in documents.
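To make the shape of the Anchor relation concrete, here is a small Python sketch that builds `(base, label, href)` tuples from an HTML string using only the standard library. The names `AnchorExtractor` and `anchors_of`, and the sample URL and HTML, are hypothetical; this is one way such tuples could be produced, not a description of how WebSQL itself extracts them.

```python
from collections import namedtuple
from html.parser import HTMLParser
from urllib.parse import urljoin

# One tuple per hypertext link, all fields as character strings.
Anchor = namedtuple("Anchor", ["base", "label", "href"])


class AnchorExtractor(HTMLParser):
    """Collects Anchor tuples for every <a href=...> in a document."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.anchors = []
        self._href = None     # href of the <a> currently open, if any
        self._label = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # Resolve relative links against the anchor document's URL.
                self._href = urljoin(self.base, href)
                self._label = []

    def handle_data(self, data):
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._label).split())
            self.anchors.append(Anchor(self.base, label, self._href))
            self._href = None


def anchors_of(base, html_text):
    parser = AnchorExtractor(base)
    parser.feed(html_text)
    parser.close()
    return parser.anchors


page = ('<p><a href="pubs.html">My publications</a> and '
        '<a href="http://other.example/">elsewhere</a></p>')
rows = anchors_of("http://www.example.org/~someone/", page)
# Analogue of: ... anchor y such that base = x where y.label contains "publications"
print([a.href for a in rows if "publications" in a.label])
```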
select x.url from document x such that "http://www.math.tau.ac.il/~matias" =>|-> x, anchor y such that base = x where y.label contains "publications";
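A hypertext link in an HTML document is said to be interior if its destination is within the same document, local if its destination is a different document on the same server, and global if its destination is on a different server.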
If we assign an arrow-like symbol to each of the three link types, we can write path regular expressions in a compact, intuitive way. Let #> denote an interior link, -> a local link, and => a global link. Also, let = denote the empty path. Path regular expressions are built from these symbols using concatenation, alternation (|) and repetition (*). For example, =|=>->* is a regular expression that represents the set containing the zero-length path and all paths that start with a global link and continue with zero or more local links.
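One way to experiment with path regular expressions is to encode each link on a path as a single character and reuse an ordinary regular expression engine. The Python sketch below does that under stated assumptions: `link_type`, `compile_path_regex` and `path_matches` are hypothetical helper names, "same server" is approximated by comparing host names, and only the operators named above (concatenation, `|`, `*`) are supported.

```python
import re
from urllib.parse import urldefrag, urlsplit

# Arrow symbol for each link type.
SYMBOL = {"interior": "#>", "local": "->", "global": "=>"}

# Translation of each token into Python regex syntax.
# "=" (the empty path) becomes an empty group.
_CODE = {"#>": "I", "->": "L", "=>": "G", "=": "(?:)", "|": "|", "*": "*"}

# "=>" is listed before "=" so the longer token wins.
_TOKEN = re.compile(r"#>|->|=>|=|\||\*")


def link_type(source, target):
    """Classify a link by comparing the source and destination URLs."""
    if urldefrag(source).url == urldefrag(target).url:
        return "interior"                       # same document
    if urlsplit(source).netloc.lower() == urlsplit(target).netloc.lower():
        return "local"                          # same server
    return "global"                             # different server


def compile_path_regex(expr):
    """Translate a path regular expression into a compiled Python regex."""
    expr = expr.replace(" ", "")
    pos = 0
    parts = []
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        parts.append(_CODE[m.group()])
        pos = m.end()
    return re.compile("".join(parts))


def path_matches(expr, urls):
    """True if the links between consecutive urls match expr."""
    encoded = "".join(_CODE[SYMBOL[link_type(a, b)]]
                      for a, b in zip(urls, urls[1:]))
    return compile_path_regex(expr).fullmatch(encoded) is not None


# Zero-length path, then a global link followed by one local link.
print(path_matches("=|=>->*", ["http://a.example/"]))
print(path_matches("=|=>->*", ["http://a.example/",
                               "http://b.example/",
                               "http://b.example/x"]))
```

Because every link type maps to exactly one character, `*` applies to a single link symbol and `|` has the lowest precedence, which matches the reading of =|=>->* given in the text.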
Find all documents directly accessible from the Computer Science department home page that refer to graduate studies.
select x.url from document x such that "http://www.cs.tau.edu/" ->|=> x where x.url contains "grad" and not (x.url contains "undergrad");
Note: The expression ->|=> is a path regular expression that means either a local link (->) on the same server or a global link (=>) to a remote server.
Find all the computer science graduate students interested in databases, on a remote or local server.
select x.url from document x such that "http://www.cs.tau.ca/homepages.html" =>|-> x where x.text contains "database";
Find all documents related to Java and the documents directly accessible from them (the documents accessible from them are on a remote host, not on the local server).
select y.url from document x such that x mentions "java", document y such that x => y;