Introduction to the initiative
Building bot-ready knowledge bases: Introduction
By blogging about this initiative, we hope to inspire knowledge base owners and managers to try new, complementary techniques that improve both the quality of their information and the efficiency of their AI-based delivery systems.
The grocery shopping project
Some years ago, while writing DITA/XML technical documentation for the DITA Open Toolkit, we created a set of short, non-technical, structured articles for prototyping purposes and called it “Grocery Shopping.” The seven articles in our KB prototype are written in the traditional DITA style and linked through a map: two each of concept, task, and reference topics, plus one topic that connects them conceptually. Our information domain is limited to produce and canned goods.
The articles can be processed as a set or individually into various output formats, including HTML and PDF. Our tool of choice for creating, editing, and processing is oXygen, but there are a number of other products on the market that we could have used.
Grocery shopping output files
The following image shows the Grocery Shopping table of contents, as processed by oXygen in PDF format.

Below is the output text for one of the KB articles, also processed by oXygen to PDF.

Grocery shopping source files
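DITA/XML is a tag-based markup language. Like JSON (JavaScript Object Notation), it stores information in a structured, machine-readable form, although JSON uses key-value pairs rather than tags. In most DITA authoring tools you can view files in either tag or “author” format.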
The following image shows the source of one of the Grocery Shopping files in tag format.

Below is the same source file in author format.

One of the key benefits of DITA/XML is the ability to attach metadata to the files. For example, you can create a short description explaining the audience and purpose of a file. Information in short descriptions is also useful in bot training.

Keywords and index terms are also part of the metadata of typical DITA projects. Index terms are used by readers to understand and locate information. The same terms (appearing as “entities”) are also used in bot processing.
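To illustrate how this metadata could be pulled out of a DITA topic before handing it to a bot, here is a minimal Python sketch that uses only the standard library. The sample topic, its id, and its wording are hypothetical stand-ins, not the actual Grocery Shopping source, and the function name extract_metadata is our own invention for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA concept topic; the id, title, and wording are
# illustrative and are not copied from the real Grocery Shopping files.
SAMPLE_TOPIC = """\
<concept id="produce_overview">
  <title>Produce: Overview</title>
  <shortdesc>Introduces shoppers to the produce section.</shortdesc>
  <prolog>
    <metadata>
      <keywords>
        <keyword>produce</keyword>
        <keyword>vegetables</keyword>
        <indexterm>fruit</indexterm>
        <indexterm>organic produce</indexterm>
      </keywords>
    </metadata>
  </prolog>
  <conbody>
    <p>Produce includes fresh fruit and vegetables.</p>
  </conbody>
</concept>
"""


def extract_metadata(topic_xml: str) -> dict:
    """Collect the title, short description, keywords, and index terms."""
    root = ET.fromstring(topic_xml)

    def texts(tag: str) -> list:
        # Gather the non-empty text of every element with this tag name.
        return [
            el.text.strip()
            for el in root.iter(tag)
            if el.text and el.text.strip()
        ]

    return {
        "title": root.findtext("title", default="").strip(),
        "shortdesc": root.findtext("shortdesc", default="").strip(),
        "keywords": texts("keyword"),
        "index_terms": texts("indexterm"),
    }


if __name__ == "__main__":
    for field, value in extract_metadata(SAMPLE_TOPIC).items():
        print(f"{field}: {value}")
```

A list like this is one way to produce the keywords and index terms that a bot can treat as entities. Real topics may nest index terms or include inline markup in titles, which this sketch does not handle.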
Building and training a KB chatbot involves both handing processed information to the bot directly (for example, giving it a list of keywords and index terms) and providing raw text that allows the bot to discover information on its own (for example, the Produce: Overview output file). Building and training a bot is an iterative process that requires lots of trial-and-error experimentation.
When we embarked on this initiative, we looked for a chatbot-building environment that we could pair with our Grocery Shopping knowledge base. Our first test case was built on Dialogflow, a Google product.
Grocery shopping files we put into the Dialogflow knowledge base
- PDF output files of the 7 individual DITA source files
- CSV file containing FAQs created manually
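As a rough illustration of the FAQ file, the Python sketch below writes question-and-answer pairs to CSV text. The questions and answers are made up for this example, and we assume a simple two-column layout (question, then answer) with no header row; check the Dialogflow documentation for the exact format it expects before uploading anything.

```python
import csv
import io

# Hypothetical FAQ pairs for illustration only; these are not the
# entries from our actual FAQ file.
FAQS = [
    ("Where can I find canned tomatoes?",
     "Canned tomatoes are in the canned goods aisle."),
    ("How do I choose ripe produce?",
     "Look for firm texture, bright color, and no bruises."),
]


def faqs_to_csv(faqs) -> str:
    """Return CSV text with one question,answer row per FAQ pair."""
    buffer = io.StringIO(newline="")
    # Quote every field so commas inside answers do not split columns.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    for question, answer in faqs:
        writer.writerow([question.strip(), answer.strip()])
    return buffer.getvalue()


if __name__ == "__main__":
    print(faqs_to_csv(FAQS), end="")
```

Quoting every field is a defensive choice here: answers often contain commas, and unquoted commas would break the two-column structure.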
What’s next?
In our next “Building bot-ready knowledge bases” blog post, we will talk about how we built and trained GROCERYbot using Dialogflow.