Setting Up and Managing Knowledge Sources


A knowledge source is a collection of web content that powers your InSite agent's responses. When a visitor asks a question, the agent searches its assigned knowledge sources to find a relevant answer. This article walks through creating and configuring a knowledge source, assigning it to your agent, and keeping its content current.

 

Quick Reference (Advanced Users)
  • Knowledge sources collect content from your website to power agent responses. An agent can use one or more.
  • Three collection strategies are available: Recursive, Sitemap, and Discrete.
  • You can exclude up to 500 URLs or patterns from collection.
  • If some of your content is behind a login, you can configure authentication credentials.
  • Content refreshes automatically every 30 days. You can also trigger a manual sync at any time.
  • To remove an individual page from a knowledge source, add it to the content exclusions or remove it from the source URLs, then run a resync.
Try it like this: Create a knowledge source using the Recursive strategy starting from your homepage. Let it run, then review the collected pages and add exclusions for any content you don't want the agent using, such as internal pages or outdated blog posts.

Creating a knowledge source

  1. In Act-On, navigate to InSite > Knowledge Sources.
  2. Click Create source, give it a name, and optionally add an internal-facing description.
  3. Choose a collection pattern (see below).
  4. Open the Content tab and add your URLs:
    • Recursive: Add one or more source URLs. InSite will start from these and follow links to collect reachable content.
    • Sitemap: Add one or more sitemap URLs. InSite will collect the pages listed in your sitemap.
    • Discrete: Add the URL for each individual page you want included.
  5. Click Start content update. InSite will begin collecting content.

Choosing a collection pattern

When you create a knowledge source, you choose how InSite discovers and collects pages from your website.

Recursive: Automatically collects pages starting from your source URLs. InSite follows the links it finds on each page and continues until it has crawled all reachable content. InSite only collects pages within the domains represented in your source URLs; links to external sites are not followed. Best for most websites where you want broad coverage.
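To illustrate the link-following behavior described above, here is a minimal breadth-first crawl in Python. It is a sketch, not InSite's implementation: the SITE link graph and crawl function are hypothetical, pages come from an in-memory dictionary instead of being fetched over the network, and a URL's host name is treated as its domain.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for live pages: each URL maps to the
# links found on that page. A real crawler would fetch and parse HTML instead.
SITE = {
    "https://example.com/": ["https://example.com/pricing", "https://example.com/blog/"],
    "https://example.com/pricing": ["https://example.com/", "https://partner.example.org/deal"],
    "https://example.com/blog/": ["https://example.com/blog/post-1"],
    "https://example.com/blog/post-1": ["https://example.com/pricing"],
    "https://example.com/orphan": [],  # not linked from anywhere, so never reached
}


def crawl(source_urls, links_for):
    """Breadth-first crawl that follows links only within the source URLs' domains."""
    allowed_domains = {urlparse(url).netloc for url in source_urls}
    seen = set(source_urls)
    queue = deque(source_urls)
    collected = []
    while queue:
        url = queue.popleft()
        collected.append(url)
        for link in links_for(url):
            # Skip external domains and pages already queued or collected.
            if urlparse(link).netloc in allowed_domains and link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


pages = crawl(["https://example.com/"], lambda url: SITE.get(url, []))
print(pages)
```

Running it collects the homepage, pricing page, blog index, and blog post. The link to partner.example.org is skipped because that domain isn't among the source URLs, and /orphan is never collected because no crawled page links to it. With the Recursive strategy, pages that aren't reachable by links from your source URLs may need to be added as additional source URLs.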

Sitemap: Uses your sitemap to determine which pages to collect. InSite reads the URLs listed in your sitemap file rather than following links. Best when you have a well-maintained sitemap and want precise control over what's included.
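For a sense of what reading a sitemap involves, the following Python sketch extracts page URLs from a sitemap in the sitemaps.org XML format. The SITEMAP_XML content and the urls_from_sitemap function are hypothetical examples; InSite's own parser isn't documented here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; in practice this would be fetched from a URL
# such as https://example.com/sitemap.xml.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>"""


def urls_from_sitemap(xml_text):
    """Return the <loc> value of every <url> entry, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


print(urls_from_sitemap(SITEMAP_XML))
```

This sketch doesn't follow sitemap index files (sitemaps that list other sitemaps) and ignores optional tags such as <lastmod>.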

Discrete: Only collects the specific URLs you provide. InSite will not follow links or read a sitemap. Best when you want only a defined set of pages included.

Excluding content

If there are pages you don't want the agent to use — such as internal tools, outdated content, or pages not relevant to visitors — you can exclude them by URL or pattern.

Open the Content tab for your knowledge source and click add/edit in the Content exclusions section. Add URLs or wildcard patterns (e.g., https://example.com/blog/archive/*). You can add up to 500 exclusions.

Exclusions take effect the next time content is collected.
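The wildcard example above can be illustrated with a short Python sketch of exclusion matching. It assumes that an entry without * must match a URL exactly, that * matches any run of characters (including /), and that matching is case-sensitive. This article doesn't specify InSite's exact matching rules, and compile_exclusions and is_excluded are hypothetical names.

```python
import re

MAX_EXCLUSIONS = 500  # limit stated in this article


def compile_exclusions(patterns):
    """Turn exclusion entries into regexes where only '*' is a wildcard (assumed semantics)."""
    if len(patterns) > MAX_EXCLUSIONS:
        raise ValueError(f"at most {MAX_EXCLUSIONS} exclusions are allowed")
    compiled = []
    for pattern in patterns:
        # Escape everything literally, then let each '*' match any run of characters.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        compiled.append(re.compile(regex))
    return compiled


def is_excluded(url, compiled):
    """True if the whole URL matches any exclusion entry."""
    return any(rx.fullmatch(url) for rx in compiled)


exclusions = compile_exclusions([
    "https://example.com/blog/archive/*",
    "https://example.com/internal-tools",
])
urls = [
    "https://example.com/blog/archive/2019/old-post",
    "https://example.com/blog/new-post",
    "https://example.com/internal-tools",
    "https://example.com/pricing?plan=pro",
]
kept = [url for url in urls if not is_excluded(url, exclusions)]
print(kept)
```

Only * is treated as special here, so characters such as ? in a query string are matched literally.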

Collecting from login-protected pages

If some of your content is behind a login, you can configure InSite to authenticate before collecting those pages.

To set this up, you'll need:

  • The URL of the page that contains the login form
  • The HTML id attribute of the username or email field
  • The username or email value to enter
  • The HTML id attribute of the password field
  • The password value to enter

Finding field IDs with browser developer tools

  1. Open the login page in Chrome.
  2. Right-click on the username field and select Inspect (or press F12 to open developer tools, then click the element picker and click the field).
  3. In the HTML panel, look for an id attribute on the <input> element. For example: <input id="user_email" type="email" ...> — the field ID here is user_email.
  4. Repeat for the password field.
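If a login page has many fields, it can be quicker to list every input's id from the page source. This Python sketch uses the standard library's html.parser to do that; InputIdFinder and LOGIN_HTML are hypothetical, and you would paste your own page's HTML in place of the sample.

```python
from html.parser import HTMLParser


class InputIdFinder(HTMLParser):
    """Record the id, type, and name of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            self.inputs.append({
                "id": attributes.get("id"),  # None if the field has no id attribute
                "type": attributes.get("type", "text"),
                "name": attributes.get("name"),
            })


# Hypothetical login-page markup; replace with the real page source.
LOGIN_HTML = """
<form action="/session" method="post">
  <input id="user_email" type="email" name="email">
  <input id="user_password" type="password" name="password">
  <input type="hidden" name="csrf_token" value="abc123">
  <button type="submit">Sign in</button>
</form>
"""

finder = InputIdFinder()
finder.feed(LOGIN_HTML)
for field in finder.inputs:
    print(field)

# Password fields are usually the inputs with type="password".
password_ids = [f["id"] for f in finder.inputs if f["type"] == "password"]
print(password_ids)
```

A field reported with an id of None has no id attribute in the markup. Some login forms are built by JavaScript after the page loads and won't appear in the raw HTML source; for those, the Inspect method in the steps above is more reliable.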

Once you have these values, enter them in the Authentication section of your knowledge source settings.

Note: InSite uses these credentials only to collect your own content during scheduled and manual syncs. Credentials are stored securely and are never surfaced in agent responses.

Assigning knowledge sources to your agent

Each agent can draw from one or more knowledge sources. To assign a knowledge source to an agent:

  1. Navigate to InSite > Web Agents and edit the agent you want to configure.
  2. In the agent settings, find the Knowledge Sources section.
  3. Add the knowledge sources this agent should use.

When an agent has multiple knowledge sources assigned, it searches across all of them when responding to visitors.
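This article doesn't describe how InSite ranks content, so the Python sketch below uses simple word overlap purely as a stand-in retrieval method. What it illustrates is the pooling: pages from every assigned source are scored together, and the best match can come from any of them. KNOWLEDGE_SOURCES and search_all are hypothetical names.

```python
import re

# Hypothetical knowledge sources assigned to one agent: each maps a page URL to its text.
KNOWLEDGE_SOURCES = {
    "Main website": {
        "https://example.com/pricing": "Plans start at 49 dollars per month with annual billing.",
        "https://example.com/about": "We are a marketing automation company.",
    },
    "Help center": {
        "https://help.example.com/billing": "Change your billing plan from the account settings page.",
    },
}


def tokens(text):
    """Lowercase word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_all(question, sources):
    """Score every page in every source and return matches, best first."""
    query = tokens(question)
    results = []
    for source_name, pages in sources.items():
        for url, text in pages.items():
            score = len(query & tokens(text))
            if score:
                results.append((score, source_name, url))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


results = search_all("How do I change my billing plan?", KNOWLEDGE_SOURCES)
for score, source_name, url in results:
    print(score, source_name, url)
```

Here the top match comes from the Help center source even though the Main website source also has a related page, because both sources are searched together.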

Keeping your content current

Automatic refresh: InSite automatically re-collects content for each of your knowledge sources every 30 days to pick up changes on your site.

Manual sync: To trigger a resync immediately — for example, after publishing significant new content — open your knowledge source and click Update content.
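The 30-day automatic refresh schedule can be expressed as simple date arithmetic. This Python sketch assumes the interval is counted from the most recent collection; the article doesn't say whether a manual sync restarts that clock, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Interval stated in this article. Counting it from the last collection of any
# kind (including manual syncs) is an assumption of this sketch.
REFRESH_INTERVAL = timedelta(days=30)


def next_automatic_refresh(last_collected: date) -> date:
    """Date of the next scheduled re-collection."""
    return last_collected + REFRESH_INTERVAL


def is_refresh_due(last_collected: date, today: date) -> bool:
    """True once 30 or more days have passed since the last collection."""
    return today >= next_automatic_refresh(last_collected)


last = date(2024, 1, 15)
print(next_automatic_refresh(last))            # 2024-02-14
print(is_refresh_due(last, date(2024, 2, 1)))  # False
print(is_refresh_due(last, date(2024, 2, 14)))  # True
```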

 
