VR Communications LLC


Building bot-ready knowledge bases #2: The grocery shopping project

December 2, 2019 Anna

Our experimental initiative to prototype a bot-ready information solution using Google’s Dialogflow


This post is part of a series. For more information and links to other posts in the series, see the “Building bot-ready knowledge bases” home page.

Overview

Some years ago, when we were writing DITA/XML technical documentation for the DITA Open Toolkit, we created a set of short, non-technical, structured articles for prototyping purposes and called it “Grocery Shopping.” The seven articles in our KB prototype are written in the traditional DITA style and linked through a map: two each of concept, task, and reference topics, plus one topic that ties them together conceptually. Our information domain is limited to produce and canned goods.

The articles can be processed as a set or individually into various output formats, including HTML and PDF. Our tool of choice for creating, editing, and processing the files is oXygen, but there are a number of other products on the market that we could have used.
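To make the map structure described above concrete, here is a minimal sketch in Python that reads a small DITA map and lists the topics it links. The map content, file names, and the function name `list_topics` are illustrative assumptions on our part, not the project's actual source files; only the overall shape (seven topics, two each of concept, task, and reference, plus one connecting topic) comes from the description above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical DITA map. The file names and nesting are assumptions made
# for illustration; they are not the real Grocery Shopping source.
DITAMAP = """\
<map>
  <title>Grocery Shopping</title>
  <topicref href="about_grocery_shopping.dita" type="topic">
    <topicref href="produce_overview.dita" type="concept"/>
    <topicref href="canned_goods_overview.dita" type="concept"/>
    <topicref href="choosing_produce.dita" type="task"/>
    <topicref href="buying_canned_goods.dita" type="task"/>
    <topicref href="produce_list.dita" type="reference"/>
    <topicref href="canned_goods_list.dita" type="reference"/>
  </topicref>
</map>
"""


def list_topics(map_xml):
    """Return (type, href) pairs for every topicref, in document order."""
    root = ET.fromstring(map_xml)
    return [(ref.get("type"), ref.get("href")) for ref in root.iter("topicref")]


topics = list_topics(DITAMAP)
type_counts = Counter(topic_type for topic_type, _ in topics)

for topic_type, href in topics:
    print(f"{topic_type:10} {href}")
```

A listing like this is one way to check that a map still contains the expected mix of topic types before the set is processed to HTML or PDF.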

Grocery shopping output files

The following image shows the Grocery Shopping table of contents, as processed by oXygen in PDF format.

Grocery shopping table of contents (PDF format)

Below is the output text for one of the KB articles, also processed by oXygen to PDF.

Produce overview output file (PDF file)

Grocery shopping source files

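DITA/XML is a tag-based markup language. Like JSON (JavaScript Object Notation), it stores structured information as plain text, although XML wraps content in named tags where JSON uses braces and key-value pairs. In most DITA authoring tools you can view files in either tag or “author” format.

The following image shows the “tag” input file for the same “Produce: Overview” topic shown above.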

DITA source file, tag view

Value of DITA metadata

One of the key benefits of DITA/XML is the ability to attach metadata to the files.

For example, you can write a short description that explains the audience and purpose of a file. The information in short descriptions (also known as abstracts) is valuable to readers and can also serve as an effective tool for bot training.

Short description of the Produce: Overview article
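As a rough illustration of how a short description could be pulled out of a topic for bot training, the sketch below parses a small concept topic and returns its title and short description. The topic text, including the wording of the short description, is a hypothetical stand-in written for this example, and `title_and_shortdesc` is our own helper name; the `title` and `shortdesc` elements themselves are standard DITA.

```python
import xml.etree.ElementTree as ET

# Hypothetical concept topic. The wording is invented for illustration and
# does not reproduce the actual "Produce: Overview" source file.
TOPIC = """\
<concept id="produce_overview">
  <title>Produce: Overview</title>
  <shortdesc>Introduces the produce section for first-time shoppers.</shortdesc>
  <conbody>
    <p>Produce includes fresh fruit and vegetables.</p>
  </conbody>
</concept>
"""


def title_and_shortdesc(topic_xml):
    """Return the topic title and short description as plain strings."""
    root = ET.fromstring(topic_xml)
    title = root.findtext("title", default="").strip()
    shortdesc = root.findtext("shortdesc", default="").strip()
    return title, shortdesc


title, shortdesc = title_and_shortdesc(TOPIC)
print(f"{title}: {shortdesc}")
```

Pairs like these could be collected across all topics in the map and handed to a bot as compact summaries, which is usually easier to work with than the full article text.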

Keywords and index terms are also part of the metadata of typical DITA projects. Index terms are used by readers to understand and locate information. The same terms can also be used in bot creation and configuration.

A grocery shopping bot will need to recognize entity names like “oranges” and “pears,” and it will also need to know that both oranges and pears are members of the “fruit” category.

Keywords and index terms in a grocery shopping file
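The category relationship described above (oranges and pears both belong to “fruit”) can be read straight from nested index terms. The following sketch assumes a prolog in which each top-level index term names a category and its nested index terms name the members; that layout and the helper name `category_members` are assumptions for illustration, and a real project might organize its index terms differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical prolog metadata. The nesting convention (category term with
# member terms inside it) is an assumption made for this example.
PROLOG = """\
<prolog>
  <metadata>
    <keywords>
      <keyword>produce</keyword>
      <indexterm>fruit
        <indexterm>oranges</indexterm>
        <indexterm>pears</indexterm>
      </indexterm>
    </keywords>
  </metadata>
</prolog>
"""


def category_members(prolog_xml):
    """Map each top-level index term to the index terms nested inside it."""
    root = ET.fromstring(prolog_xml)
    categories = {}
    for parent in root.findall("./metadata/keywords/indexterm"):
        name = (parent.text or "").strip()
        members = [(child.text or "").strip() for child in parent.findall("indexterm")]
        categories[name] = members
    return categories


categories = category_members(PROLOG)
print(categories)
```

The resulting dictionary gives a bot both the entity names it needs to recognize and the category each one belongs to.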

Building and training a KB chatbot can involve two kinds of input. One is raw text that allows the bot to discover information on its own (for example, the Produce: Overview PDF output file). The other is processed information handed to the bot directly (for example, a set of entity names like oranges and pears, along with attributes like variety, color, and price).
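The second kind of input, processed information handed to the bot directly, might look something like the sketch below. The `ProduceItem` class, the sample varieties, colors, and prices, and both helper functions are hypothetical. The value-plus-synonyms shape of the entries is modeled loosely on how chatbot platforms such as Dialogflow describe entity entries; treat the exact field names as an assumption rather than a confirmed API format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProduceItem:
    """One grocery entity with the attributes named in the text above."""
    name: str
    category: str
    variety: str
    color: str
    price_per_lb: float  # sample value only, not real pricing data
    synonyms: List[str] = field(default_factory=list)


# Hypothetical sample data, invented for illustration.
ITEMS = [
    ProduceItem("orange", "fruit", "navel", "orange", 1.29, ["oranges"]),
    ProduceItem("pear", "fruit", "Bartlett", "green", 1.79, ["pears"]),
]


def to_entity_entries(items):
    """Build value/synonyms entries (field names are an assumption)."""
    return [{"value": i.name, "synonyms": [i.name, *i.synonyms]} for i in items]


def attributes_by_name(items):
    """Index the remaining attributes so a bot can answer follow-up questions."""
    return {
        i.name: {
            "category": i.category,
            "variety": i.variety,
            "color": i.color,
            "price_per_lb": i.price_per_lb,
        }
        for i in items
    }


entries = to_entity_entries(ITEMS)
attributes = attributes_by_name(ITEMS)
print(entries)
print(attributes["pear"])
```

Keeping the recognition data (names and synonyms) separate from the attribute data means the entity list can be loaded into the bot while the attributes stay in whatever store answers the user's questions.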

Why “grocery shopping”?

If your knowledge base consists of topics like “How to set up your new Dell laptop,” “Understanding your 401(k) account,” or “Predicting the weather,” it might be difficult for you to relate to our simplistic grocery shopping example.

We can only say that, in our years of experience with content strategy and implementation, we’ve found that the best way to lay the foundation for a complex, enterprise-level documentation set in any domain is often to start with a simple, everyday example. This is how “grocery shopping” came to be, and we believe that it can still serve the same purpose in the AI world.

What’s next?

When we embarked on this initiative, we looked for a suitable chatbot-building environment that we could pair with our grocery shopping sample. Our first test case was created on Dialogflow, a Google product.

In our “Building bot-ready knowledge bases #3” post, we’ll talk about the objectives we established for our grocery shopping chatbot.



Copyright

2019-2021 VR Communications LLC
