WHOIS++ & centroids
-------------------

(... plus some other stuff :-)

Martin Hamilton, Loughborough University
martin@mrrl.lut.ac.uk


Pros of search and retrieval protocols
--------------------------------------

o can deliver a structured query to the server.

o return structured results back to the client.

o facilitate searches across multiple servers.

o whilst not typically supported at the moment in WWW browsers,
  can be retro-fitted via proxy servers, plug-ins, or by using
  (say) CGI gateways.


Cons of search and retrieval protocols
--------------------------------------

o no provision (typically) made for inclusion of presentation
  oriented material in search results - i.e. adverts !

o some search and retrieval protocols are open-ended about query
  and response formats, hindering interoperability.

o some query and response formats are very open-ended in
  themselves, which also hinders interoperability.

o still very much a specialized interest, rather than a
  mainstream activity like (say) HTTP.


Where does this leave us ?
--------------------------

o Wide variety of protocols to choose from, usually with
  associated query and response formats.

o Mainstream interest not necessarily a good match for the sorts
  of things we are trying to do.

o Possible to support a variety of protocols and formats, up to a
  point. Expect some loss of information if deriving everything
  from one database.

o May be desirable to identify a lowest common denominator
  (subset?) of database elements a la Dublin Core.


Where next for protocols and formats ?
--------------------------------------

o IAFA template format provided a reasonable starting point for
  simple Internet resource descriptions - low cost of creation,
  easily transcribed into other formats. Dublin Core & Warwick
  Framework the next evolutionary step ?

o WHOIS++ protocol attractive as a lowest common denominator
  search and retrieval protocol, and as the driving force behind
  centroids. Still quite young, though - the RFC only came out a
  year ago.

o History suggests LCDs (lowest common denominators) are usually
  the most successful, but other lowest common denominator
  approaches abound, e.g. Harvest and LDAP. Z-Lite might be on
  its way, etc etc. Watch the skies!


What about centroids ?
----------------------

o Provide an abstract characterization of the database in a
  standard format.

o Can export this to any number of index servers by pushing or
  pulling.

o Index servers act as brokers between client and potentially
  multiple servers.

o Next step on from Harvest Brokers/Gatherers - index servers
  don't need all of the info for a server, just its centroid.

o Obvious role would be for subject services to act as brokers
  (index servers) for their disciplines.


Conclusions
-----------

o Much interest in centroid approach for whole-Web indexing and
  searching. May want to be able to search index server via
  multiple protocols, and index Web stuff in general.

o No clear winner in the lowest common denominator search and
  retrieval protocol / data format stakes. More work needed.
  Heavyweight solutions like Z39.50 and X.500 not popular for
  Internet applications due to interoperability problems and
  (perceived?) implementation complexity.

o Do we even need/want a search and retrieval protocol ? Search
  and retrieval protocol model fundamentally incompatible with
  advertising based revenue model, which is one possible future
  for subject services ?


Martin Hamilton
Wed Aug 7 00:31:59 BST 1996
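

Illustration: the centroid idea in miniature

The "What about centroids ?" points can be made concrete with a small
sketch. It assumes a centroid is simply the set of unique words seen
for each (template, attribute) pair in a server's database, and that an
index server only has to answer "which servers might hold a match ?".
The record layout, the host names, and the names build_centroid and
IndexServer are all hypothetical; this is not the WHOIS++ wire format
or any real implementation.

```python
from collections import defaultdict


def build_centroid(records):
    """Collapse records into {(template, attribute): set of unique words}.

    Assumed record shape (hypothetical):
        {"template": "DOCUMENT", "fields": {"title": "Some Title"}}
    """
    centroid = defaultdict(set)
    for record in records:
        template = record["template"]
        for attribute, value in record["fields"].items():
            for word in value.lower().split():
                centroid[(template, attribute)].add(word)
    return dict(centroid)


class IndexServer:
    """Holds one centroid per server and hands out referrals, not results."""

    def __init__(self):
        self.centroids = {}

    def register(self, server_name, centroid):
        # Stands in for a centroid being pushed to (or pulled by) the index.
        self.centroids[server_name] = centroid

    def referrals(self, template, attribute, word):
        # Return the servers whose centroid contains the word; the client
        # would then send the real query to each of those servers.
        word = word.lower()
        return sorted(
            name
            for name, centroid in self.centroids.items()
            if word in centroid.get((template, attribute), set())
        )


# Two toy subject services, each summarised by its centroid.
physics = [{"template": "DOCUMENT", "fields": {"title": "Quantum Optics Primer"}}]
history = [{"template": "DOCUMENT", "fields": {"title": "Medieval Trade Routes"}}]

index = IndexServer()
index.register("physics.example.org", build_centroid(physics))
index.register("history.example.org", build_centroid(history))

print(index.referrals("DOCUMENT", "title", "optics"))
```

The index server never sees the full records, only the word sets, which
is the sense in which it needs "just the centroid". A word appearing in
a centroid means a server may hold a match, not that it does, so the
client still has to query the referred servers.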