Web Intelligence and Fuzzy Logic--The Concept of Web IQ (WIQ)


Lotfi A. Zadeh, University of California, Berkeley


In moving further into the age of machine intelligence and automated reasoning, we have reached a point where we can speak, without exaggeration, of systems which have a high machine IQ (MIQ). The Web and especially search engines--with Google at the top--fall into this category. In the context of the Web, MIQ becomes Web IQ, or WIQ, for short.

Existing search engines have many remarkable capabilities. However, what is not among them is the deduction capability--the capability to answer a query by a synthesis of information which resides in various parts of the knowledge base. A question-answering system is by definition a system which has this capability. One of the principal goals of Web intelligence is that of evolving search engines into question-answering systems. Achievement of this goal requires a quantum jump in the WIQ of existing search engines.

Can this be done with existing tools such as the Semantic Web and ontology-centered systems--tools which are based on bivalent logic and bivalent-logic-based probability theory? It is beyond question that, in recent years, very impressive progress has been made through the use of such tools. But can we achieve a quantum jump in WIQ? A view which is advanced in the following is that bivalent-logic-based methods have intrinsically limited capability to address complex problems which arise in deduction from information which is pervasively ill-structured, uncertain and imprecise.

The major problem is world knowledge--the kind of knowledge that humans acquire through experience and education. Simple examples of fragments of world knowledge are: usually it is hard to find parking near the campus in early morning and late afternoon; Berkeley is a friendly city; affordable housing is nonexistent in Palo Alto; almost all professors have a Ph.D. degree; and Switzerland has no ports.

Much of the information which relates to world knowledge--and especially to underlying probabilities--is perception-based. Reflecting the bounded ability of sensory organs, and ultimately the brain, to resolve detail and store information, perceptions are intrinsically imprecise. More specifically, perceptions are f-granular in the sense that (a) the boundaries of perceived classes are unsharp; and (b) the values of perceived attributes are granular, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity and functionality.

Imprecision of perception-based information is a major obstacle to dealing with world knowledge through the use of methods based on bivalent logic and bivalent-logic-based probability theory. What is needed for this purpose is a collection of tools drawn from fuzzy logic--a logic in which everything is, or is allowed to be, a matter of degree. The principal tool is Precisiated Natural Language (PNL).

The point of departure in PNL is the assumption that the meaning of a proposition, p, drawn from a natural language, NL, can be represented as a generalized constraint of the form X isr R, where X is the constrained variable; R is the constraining relation; and r is a modal variable, that is, a variable whose value defines the modality of the constraint. The principal modalities are: possibilistic (r=blank); veristic (r=v); probabilistic (r=p); random set (r=rs); fuzzy graph (r=fg); usuality (r=u); and Pawlak set (r=ps). The set of all generalized constraints, together with their combinations, qualifications and rules of constraint propagation, constitutes the Generalized Constraint Language (GCL). By construction, GCL is maximally expressive.
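
As a rough illustration of the form X isr R, a generalized constraint can be held as a small record whose modality field takes one of the values of r listed above. This is only a sketch: the class names, the choice to keep R as a plain string, and the two sample precisiations are assumptions made for illustration and are not part of PNL itself.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """Values of the modal variable r, as listed in the text."""
    POSSIBILISTIC = ""    # r = blank
    VERISTIC = "v"
    PROBABILISTIC = "p"
    RANDOM_SET = "rs"
    FUZZY_GRAPH = "fg"
    USUALITY = "u"
    PAWLAK_SET = "ps"


@dataclass(frozen=True)
class GeneralizedConstraint:
    """Hypothetical container for a constraint of the form X isr R."""
    variable: str         # X, the constrained variable
    modality: Modality    # r, the modal variable
    relation: str         # R, the constraining relation (kept symbolic here)

    def __str__(self) -> str:
        return f"{self.variable} is{self.modality.value} {self.relation}"


# Hand-written precisiations (assumed, not produced by any parser).
eva = GeneralizedConstraint("Age(Eva)", Modality.POSSIBILISTIC, "young")
parking = GeneralizedConstraint("Parking(campus)", Modality.USUALITY, "hard to find")

print(eva)      # Age(Eva) is young
print(parking)  # Parking(campus) isu hard to find
```

In a fuller treatment R would be a fuzzy relation or distribution rather than a label; the string is used here only to show how r selects the modality.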

A proposition, p, in NL is precisiable if it is translatable into GCL. In this sense, PNL consists of precisiable propositions, with the understanding that not every proposition in NL is precisiable. The importance of PNL derives from the fact that it has a far greater expressive power than predicate-logic-based synthetic languages like LISP, Prolog, SQL, etc. A concept which plays a key role in PNL is that of a protoform--an abbreviation of "prototypical form." Informally, the protoform of a lexical entity such as a proposition, command, question, or scenario is its abstracted summary. For example, the protoform of p: Eva is young, is A(B) is C, where A is abstraction of age, B is abstraction of Eva, and C is abstraction of young. Similarly, the protoform of p: Most Swedes are tall, is Count(B/A) is Q, where A is abstraction of Swedes, B is abstraction of tall Swedes, Count(B/A) is abstraction of the relative count of tall Swedes among Swedes, and Q is abstraction of most.
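
To make the protoform Count(B/A) is Q more concrete, the sketch below evaluates "Most Swedes are tall" on a made-up sample. It assumes that the relative count is computed as a relative sigma-count (the sum of membership grades in "tall" divided by the population size) and that "tall" and "most" have simple piecewise-linear membership functions. The breakpoints and the sample heights are illustrative assumptions, not values from the text.

```python
from typing import Sequence


def clamp01(value: float) -> float:
    """Restrict a value to the unit interval [0, 1]."""
    return min(1.0, max(0.0, value))


def tall(height_cm: float) -> float:
    """Assumed membership function for 'tall': 0 at 165 cm, 1 at 185 cm."""
    return clamp01((height_cm - 165.0) / 20.0)


def most(proportion: float) -> float:
    """Assumed membership function for 'most': 0 at 0.5, 1 at 0.9."""
    return clamp01((proportion - 0.5) / 0.4)


def relative_count(heights: Sequence[float]) -> float:
    """Count(B/A): total grade of membership in 'tall' over the size of A."""
    return sum(tall(h) for h in heights) / len(heights)


heights = [185.0, 190.0, 185.0, 175.0]   # made-up sample standing in for A
count = relative_count(heights)          # (1 + 1 + 1 + 0.5) / 4 = 0.875
truth = most(count)                      # degree to which "Count(B/A) is most"

print(count, round(truth, 4))
```

The result is a degree between 0 and 1 rather than a true/false verdict, which is the sense in which the proposition is treated as a matter of degree.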

The importance of the concept of a protoform derives from the fact that it places in evidence the deep semantic structure of the lexical entity to which it applies. In this sense, propositions p and q are PF-equivalent, written as PFE(p, q), if they have identical protoforms, that is, identical deep semantic structures. As a simple example, p: Most Swedes are tall, and q: Few professors are rich, are PF-equivalent.
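
A minimal sketch of PF-equivalence follows, assuming that each proposition has already been abstracted to its protoform by hand; no natural-language parsing is attempted. The `Proposition` record and the `pfe` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A proposition paired with its hand-assigned protoform."""
    text: str
    protoform: str    # abstracted form, e.g. "Count(B/A) is Q"
    bindings: tuple   # (symbol, instance) pairs that instantiate the protoform


def pfe(p: Proposition, q: Proposition) -> bool:
    """PFE(p, q): true when p and q share the same protoform."""
    return p.protoform == q.protoform


p = Proposition(
    "Most Swedes are tall",
    "Count(B/A) is Q",
    (("A", "Swedes"), ("B", "tall Swedes"), ("Q", "most")),
)
q = Proposition(
    "Few professors are rich",
    "Count(B/A) is Q",
    (("A", "professors"), ("B", "rich professors"), ("Q", "few")),
)
r = Proposition(
    "Eva is young",
    "A(B) is C",
    (("A", "age"), ("B", "Eva"), ("C", "young")),
)

print(pfe(p, q))  # True: both abstract to Count(B/A) is Q
print(pfe(p, r))  # False: different protoforms
```

The bindings differ between p and q, yet the comparison ignores them, which mirrors the idea that PF-equivalence concerns deep structure rather than surface content.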

The concept of PF-equivalence serves as a basis for what may be called a protoform-centered mode of knowledge organization. In this mode, a protoformal module consists of all propositions which have a specified protoform in common, e.g., A(B) is C. Submodules of such a module are generated through instantiation of A, B and C. For example, the partially instantiated protoform: price(B) is low, would represent all objects in a universe of discourse, U, whose price is low.
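
One way to picture this mode of organization is a dictionary keyed by protoform, with submodules obtained by fixing some of the abstract symbols. The facts, the dictionary layout and the `submodule` helper below are assumptions made for illustration; for simplicity, "low" is treated as a crisp label rather than a fuzzy set.

```python
from collections import defaultdict

# Made-up facts: each is a protoform plus an instantiation of its symbols.
facts = [
    ("A(B) is C", {"A": "price", "B": "bicycle", "C": "low"}),
    ("A(B) is C", {"A": "price", "B": "yacht", "C": "high"}),
    ("A(B) is C", {"A": "age", "B": "Eva", "C": "young"}),
    ("Count(B/A) is Q", {"A": "Swedes", "B": "tall Swedes", "Q": "most"}),
]

# A protoformal module collects all propositions sharing one protoform.
modules = defaultdict(list)
for protoform, bindings in facts:
    modules[protoform].append(bindings)


def submodule(protoform, **fixed):
    """Return the instantiations of a protoform that agree with the fixed symbols."""
    return [
        bindings
        for bindings in modules.get(protoform, [])
        if all(bindings.get(symbol) == value for symbol, value in fixed.items())
    ]


# Partially instantiated protoform "price(B) is low": B is left free.
low_priced = [b["B"] for b in submodule("A(B) is C", A="price", C="low")]
print(low_priced)  # ['bicycle']
```

Leaving B free while fixing A and C corresponds to the partially instantiated protoform in the example above.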

An important function of PNL is that of serving as a deduction language. For this purpose, PNL contains a Deduction Database, DB, which consists of so-called protoformal rules of deduction. Basically, such rules govern generalized constraint propagation, with antecedents and consequents expressed as protoforms. Typically, a protoformal rule of deduction has two parts: symbolic and computational. A simple example is the compositional rule of inference in fuzzy logic. In this case, the symbolic part is: if X is A and (X, Y) is B, then Y is C; and the computational part is: C = A∘B, that is, C is the composition of A and B.
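
The computational part of the compositional rule of inference can be sketched on finite universes. The code assumes the commonly used max-min form of composition, C(y) = max over x of min(A(x), B(x, y)); the text does not say which composition is intended, and the membership grades below are made up.

```python
def compose(a: dict, b: dict) -> dict:
    """Max-min composition of a fuzzy set A on X with a fuzzy relation B on X x Y."""
    c = {}
    for (x, y), grade in b.items():
        # Candidate grade for y contributed by this x, combined by max over x.
        candidate = min(a.get(x, 0.0), grade)
        c[y] = max(c.get(y, 0.0), candidate)
    return c


# Antecedent "X is A": membership grades over X = {x1, x2, x3}.
A = {"x1": 1.0, "x2": 0.6, "x3": 0.2}

# Antecedent "(X, Y) is B": membership grades over X x Y.
B = {
    ("x1", "y1"): 0.3, ("x1", "y2"): 0.8,
    ("x2", "y1"): 0.9, ("x2", "y2"): 0.4,
    ("x3", "y1"): 1.0, ("x3", "y2"): 0.1,
}

# Consequent "Y is C".
C = compose(A, B)
print(C)  # {'y1': 0.6, 'y2': 0.8}
```

For y1 the candidates are min(1.0, 0.3), min(0.6, 0.9) and min(0.2, 1.0), whose maximum is 0.6; for y2 they are 0.8, 0.4 and 0.1, whose maximum is 0.8.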

The Deduction Database contains a large number of modules and submodules comprising protoformal rules drawn from a wide range of domains. Examples of such modules are: the Search module, the World Knowledge module, the Extension Principle module, the Probability module, the Possibility module, the Usuality module, etc.

In summary, abandonment of bivalence is a prerequisite to achieving a quantum jump in WIQ. By abandoning bivalence, the door is opened to the use of tools such as PNL for adding to search engines two essential capabilities: (a) capability to operate on perception-based information; and (b) question-answering capability. What should be stressed, however, is that achievement of this goal will be a major challenge involving exploration of many new directions.