Cyc and the Cyc Knowledge Base


We first discussed Cyc in the last edition of the AI page (which, by the way, was some time ago now; we expect to publish more regularly in the future). Cyc, which is not an acronym but is short for enCYClopedia, aims to capture in a knowledge base all, or at least a large segment, of the human knowledge that might be found in a typical encyclopedia. Cyc is the brainchild of Douglas Lenat, who conceived back in 1984 of the idea of encapsulating all of human knowledge in a rule-based system, and who has had significant assistance from R.V. Guha. At the time, Cyc was imagined to be a 10-year project. Since it is now 2001 and the first open-source version of Cyc has not yet been released, the project is a bit behind schedule, but it remains a very ambitious and exciting one.

Cyc is a very large knowledge base containing, in its open-source version, about 6,000 concepts and 60,000 assertions about those concepts. These concepts cover the basic human knowledge available in a typical encyclopedia. The full version of Cyc contains many more concepts, although the Cyc team does not give a precise number. In addition to the concepts, Cyc provides a rule-based system for reasoning about them (the team calls this rule-based system a "knowledge server"). To use Cyc, one needs to couple the general knowledge server with specifics about the particular problem to be solved. For example, let's say we know, on our own without using Cyc, that a given business is located in Chicago. Cyc will probably have concepts describing Chicago, Illinois, the Midwest, and the United States. The knowledge base will be able to infer that the business located in Chicago is also located in Illinois, the Midwest, and the U.S., without having to be told those facts explicitly. If we also know that an Illinois business has to file certain state tax returns, that a U.S. business has to file certain federal tax returns, or that a Midwest business has to work with particular conditions in the labor market, we can determine additional facts about this business.
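The Chicago example can be made concrete with a small sketch. This is not Cyc's API or its rule language; it is a toy illustration in Python, and every name in it (LOCATED_IN, REGION_RULES, enclosing_regions, infer_facts, and the business name "AcmeCo") is invented here for the purpose. It assumes each place sits inside exactly one larger region, which is far simpler than what a real knowledge base would handle.

```python
# Toy stand-in for general knowledge; none of these names come from Cyc itself.
LOCATED_IN = {
    "Chicago": "Illinois",
    "Illinois": "Midwest",
    "Midwest": "UnitedStates",
}

# Hypothetical region-level rules, echoing the examples in the text.
REGION_RULES = {
    "Illinois": "files Illinois state tax returns",
    "Midwest": "faces Midwest labor-market conditions",
    "UnitedStates": "files U.S. federal tax returns",
}


def enclosing_regions(place):
    """Follow located-in links upward and return every region containing `place`."""
    regions = []
    seen = {place}
    current = place
    while current in LOCATED_IN:
        current = LOCATED_IN[current]
        if current in seen:  # guard against accidental cycles in the data
            break
        seen.add(current)
        regions.append(current)
    return regions


def infer_facts(business, city):
    """Combine one problem-specific fact (business is in city) with general knowledge."""
    places = [city] + enclosing_regions(city)
    facts = [f"{business} is located in {p}" for p in places]
    facts += [f"{business} {REGION_RULES[p]}" for p in places if p in REGION_RULES]
    return facts


if __name__ == "__main__":
    for fact in infer_facts("AcmeCo", "Chicago"):
        print(fact)
```

The only fact supplied about the business is its city; everything else is derived from the general tables, which is the division of labor the paragraph above describes.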

One important point here is that Cyc only provides basic, general knowledge. It is still necessary to supply specific information about the particular problem to be solved before Cyc is likely to do anything useful. The hope, however, is that this process is made significantly easier by the presence of the knowledge server. Other applications suggested on the Cyc website include speech understanding, database integration, rapid development of an ontology in a vertical area (essentially what we've just described), and email prioritizing, routing, summarizing, and annotating. Following is an example of a concept from the Cyc knowledge base:

  The present (though not the original) capital city of the #$UnitedStatesOfAmerica, seat of its Federal government, which is located in the specially created Federal district between the States of Maryland and Virginia.
isa: #$USCity #$Entity #$CountrySubsidiary #$CapitalCityOfRegion
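
To show how the pieces of this entry fit together, here is a minimal sketch that reads the printed text into a simple structure. It is only an illustration of the layout shown above (a free-text comment, references to other concepts marked with #$, and an isa: line listing the collections the concept belongs to). It is not Cyc's storage format or API, and the names ENTRY, CONSTANT, and parse_entry are made up for this example.

```python
import re

# The entry exactly as printed above, held as plain text.
ENTRY = """The present (though not the original) capital city of the #$UnitedStatesOfAmerica, seat of its Federal government, which is located in the specially created Federal district between the States of Maryland and Virginia.
isa: #$USCity #$Entity #$CountrySubsidiary #$CapitalCityOfRegion"""

# Matches a concept reference such as #$USCity and captures the bare name.
CONSTANT = re.compile(r"#\$([A-Za-z0-9_-]+)")


def parse_entry(text):
    """Split an entry into its comment, its isa collections, and the concepts it mentions."""
    comment_lines = []
    isa = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("isa:"):
            isa.extend(CONSTANT.findall(stripped))
        elif stripped:
            comment_lines.append(stripped)
    comment = " ".join(comment_lines)
    return {
        "comment": comment,
        "isa": isa,
        "mentions": CONSTANT.findall(comment),
    }


if __name__ == "__main__":
    entry = parse_entry(ENTRY)
    print("isa:", entry["isa"])
    print("mentions:", entry["mentions"])
```

Each name on the isa: line is itself a concept in the knowledge base, so an entry like this is effectively a node linked to several others.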

As can be seen, the knowledge base tends to contain fairly factual information about its concepts. That is, it contains knowledge about a particular concept (in this case, Washington, DC) which is largely unambiguous and not open to debate. It doesn't attempt to include fuzzier information such as "Washington, DC, tends to vote overwhelmingly Democratic in presidential elections". Fuzzy reasoning is a very important part of human reasoning, and it is not a direction the Cyc team seems to have addressed as yet. Additionally, the team will probably find that in building vertical apps, virtually everyone needs a good deal of business-specific knowledge and perhaps less general human knowledge. There are many areas of human endeavor which are fascinating but are not relevant to the vast majority of business vertical apps. It seems highly likely that Cyc will need to move towards a general business ontology as its developers seek to build a client base.

Despite these possible shortcomings (which, given the outstanding reputation of Doug Lenat and his colleagues, I am sure they are well aware of and have clear plans for resolving), it is very exciting to see the Cyc project reach the point where it is nearly ready for its first release. That release will be the culmination of a long period of effort, and should open the door to the development of many business-related ontologies which will bring AI into the business community in a way which hasn't happened up to now. The other areas of application mentioned for Cyc, such as speech understanding or email handling, are intriguing as well, although Cyc may be too powerful a tool for these relatively simple problems. The first open-source release had previously been planned for mid-2001; this has obviously been pushed back, although the team still seems to expect it in the first half of 2002. More information about Cyc may be found at www.cyc.com and www.opencyc.org.

Increasingly, therefore, AI research is meeting the "real world" and needing to adjust and revise its lofty goals to reflect the needs of the business community. For this reason, the next edition of this page will focus on "Using AI Techniques in the Real World".


