tendril/kb/scripts/generate-index.sh
Data Warrior aecc370e1d docs(kb): implement Phase 2 KB System Setup
This commit implements the complete Knowledge Base (KB) system for the Tendril
project, establishing a structured, LLM-friendly system for capturing and
organizing external information that informs project development.

## What Was Implemented

### 1. KB System Documentation (kb/README.md)
   - Comprehensive documentation explaining the KB system's purpose and structure
   - Directory structure explanation with all 8 category directories
   - File naming schema: YYYY-MM-DD--slug--type.md with regex validation
   - Complete frontmatter schema documentation (18 required fields for Tendril)
   - Routing decision tree for categorizing content
   - Routing confidence system (0.00-1.00 scale) with policy for low-confidence items
   - Usage guidelines for creating and managing KB files
   - Integration notes with phase documentation system
   - Index generation and changelog update procedures

### 2. KB File Templates (kb/_templates/)
   Created three template files with complete frontmatter:

   - note.md: General notes template with draft status default
   - decision.md: ADR-style decision template with active status default
   - howto.md: How-to guide template with active status default

   All templates include:
   - All 18 required frontmatter fields (base + Tendril-specific)
   - Placeholder syntax (${VARIABLE}) for easy customization
   - Appropriate default values (routing_confidence, status, etc.)
   - Template-specific content sections
   - Customized for Tendril project (project: ["tendril"])
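
   As a rough illustration of how the ${VARIABLE} placeholders might be filled, the sketch below substitutes two placeholders with sed. The placeholder names TITLE and DATE and the helper names are assumptions made for this example; the actual templates may use different names.

```shell
#!/usr/bin/env bash
# Sketch: fill ${NAME}-style placeholders in a KB template.
# TITLE and DATE are hypothetical placeholder names chosen for illustration.

# Escape characters that are special in a sed replacement (& | \).
escape_replacement() {
    printf '%s' "$1" | sed 's/[&|\\]/\\&/g'
}

# render_template TEMPLATE_FILE TITLE DATE -> rendered text on stdout
render_template() {
    local template="$1"
    local title date
    title="$(escape_replacement "$2")"
    date="$(escape_replacement "$3")"
    # [$] matches a literal dollar sign, so ${TITLE} is treated as plain text
    sed -e "s|[\$]{TITLE}|${title}|g" -e "s|[\$]{DATE}|${date}|g" "$template"
}

# Example: render a two-line template held in a temporary file
template_file="$(mktemp)"
printf '%s\n' 'title: "${TITLE}"' 'date: ${DATE}' > "$template_file"
render_template "$template_file" "Index design notes" "2025-11-11"
rm -f "$template_file"
```

   Escaping the replacement text keeps titles containing &, | or a backslash from being misread by sed.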

### 3. KB Ingestion Prompt (kb/_guides/KB_INGEST_PROMPT.md)
   Complete system prompt for LLM-assisted KB ingestion:

   - System instructions for content analysis and routing
   - Classification and routing rules for all 8 categories
   - Routing decision tree with 9-step decision process
   - Routing confidence assessment guidelines
   - File naming standards with examples and validation
   - Complete frontmatter requirements documentation
   - JSON output format specification
   - Quality and style guidelines
   - Safety constraints (NEVER/ALWAYS rules)
   - Validation checklist
   - Completion summary format with mandatory index/changelog updates

### 4. Index Generation Script (kb/scripts/generate-index.sh)
   Bash script for automatic KB index generation:

   - Scans all KB files in category directories (01_projects through 08_archive)
   - Excludes special directories (_guides, _templates, _inbox, _review_queue)
   - Extracts YAML frontmatter from each KB file
   - Parses metadata fields (title, date, type, summary, topics, tags, phases)
   - Generates kb/_index.md with:
     * File listing organized by category
     * Topics index (all unique topics with file references)
     * Tags index (all unique tags with file references)
     * Phase relevance index (files organized by phase)
     * Summary statistics
   - Compatible with Windows (Git Bash) and Unix systems
   - Uses temporary files for cross-platform compatibility
   - Handles errors gracefully (missing frontmatter, invalid files)
   - Script is executable (chmod +x)

### 5. KB Changelog (kb/CHANGELOG.md)
   Change tracking for KB system:

   - Initial entry documenting Phase 2 setup
   - Date-based format: ## [YYYY-MM-DD] KB System Setup
   - Lists all files created during setup
   - Notes about customization for Tendril project

### 6. Initial Index (kb/_index.md)
   Auto-generated searchable index:

   - Generated by running generate-index.sh
   - Currently empty (no KB files exist yet)
   - Ready to be populated as KB files are added
   - Includes proper structure for all index sections

## Why This Implementation

### Structured Knowledge Capture
   The KB system provides a lightweight staging area for external information
   (Pulse Daily chats, ideas, notes, research) that may inform Tendril project
   development. Unlike formal phase documentation, KB entries capture informal
   knowledge that complements the structured phase blueprints.

### LLM-Friendly Design
   The system is designed for LLM-assisted ingestion and management:
   - Clear routing decision tree enables automated classification
   - Confidence scoring allows human review of uncertain routing
   - Complete frontmatter ensures rich metadata for searchability
   - JSON output format enables automated file creation

### Searchability and Discovery
   The automatic index generation creates multiple access paths:
   - By category (for browsing related content)
   - By topic (for finding content on specific subjects)
   - By tag (for cross-cutting categorization)
   - By phase relevance (for finding content related to specific phases)

### Integration with Phase Documentation
   KB decisions complement phase-specific ADRs, KB research informs phase
   planning, and KB playbooks provide operational guides. The phase_relevance
   field creates explicit links between KB content and project phases.

### Project Customization
   All files are customized for Tendril:
   - Project name: "tendril" (replaced "pairs" references)
   - Default project field: ["tendril"]
   - Path references updated for Tendril structure
   - Gitea Actions noted (not GitHub Actions) for Phase 3

## Technical Details

### Frontmatter Schema (18 Fields)
   Base fields (14): title, date, author, source, project, topics, tags, type,
   status, routing_hint, proposed_path, routing_confidence, related, summary

   Tendril-specific (4): captured_at, source_type, related_projects,
   phase_relevance

   Optional (2): key_takeaways, action_candidates
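
   A minimal sketch of checking a file against the 18 required fields listed above. The helper name missing_fields is hypothetical and is not part of the committed script; the check only looks for top-level "field:" lines and does not validate values.

```shell
#!/usr/bin/env bash
# Sketch: list required frontmatter fields that a KB file is missing.
# missing_fields is a hypothetical helper, not part of generate-index.sh.

REQUIRED_FIELDS=(
    title date author source project topics tags type status
    routing_hint proposed_path routing_confidence related summary
    captured_at source_type related_projects phase_relevance
)

missing_fields() {
    local file="$1" field frontmatter
    # Lines between the first and second "---" delimiters
    frontmatter="$(awk '/^---[[:space:]]*$/ { if (++n == 2) exit; next } n == 1' "$file")"
    for field in "${REQUIRED_FIELDS[@]}"; do
        if ! grep -q "^${field}:" <<< "$frontmatter"; then
            echo "$field"
        fi
    done
}

# Example: a file that declares only three of the required fields
sample="$(mktemp)"
printf '%s\n' '---' 'title: "Example"' 'date: 2025-11-11' 'type: note' '---' 'Body' > "$sample"
echo "Missing fields:"
missing_fields "$sample"
rm -f "$sample"
```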

### File Naming Pattern
   Regex: ^\d{4}-\d{2}-\d{2}--[a-z0-9-]{3,}--(idea|note|spec|decision|howto|retro|meeting)(--p[0-9]+)?\.md$

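   Components: Date (YYYY-MM-DD) + Slug (3-8 words, no stop-words) + Type + optional --p<N> suffix

   Note: the regex only requires the slug to be at least 3 characters from
   [a-z0-9-]; the word-count and stop-word rules are not enforced by it.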
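
   The pattern above can be checked in Bash roughly as follows. Bash's =~ operator uses POSIX extended regular expressions, which have no \d, so the sketch spells the digit classes out as [0-9]. The function name is_kb_filename is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: validate a KB filename against the documented naming pattern.
# is_kb_filename is a hypothetical helper name.

KB_NAME_REGEX='^[0-9]{4}-[0-9]{2}-[0-9]{2}--[a-z0-9-]{3,}--(idea|note|spec|decision|howto|retro|meeting)(--p[0-9]+)?\.md$'

is_kb_filename() {
    local name
    name="$(basename "$1")"
    # The regex variable is left unquoted so it is treated as a pattern
    [[ "$name" =~ $KB_NAME_REGEX ]]
}

for candidate in \
    "2025-11-11--kb-system-setup--note.md" \
    "2025-11-11--routing-policy--decision--p2.md" \
    "notes.md"; do
    if is_kb_filename "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```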

### Routing Confidence Policy
   - >= 0.60: File goes to proposed_path
   - < 0.60: File goes to _review_queue/ (with proposed_path in frontmatter)
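
   The policy can be sketched as a small routing helper. Bash has no floating-point comparison, so the sketch delegates the threshold test to awk. The function name route_target is hypothetical, and placing the file directly under _review_queue/ with its original basename is an assumption about the queue layout.

```shell
#!/usr/bin/env bash
# Sketch: choose a destination from routing_confidence and proposed_path.
# route_target is a hypothetical helper; the flat _review_queue/ layout
# is an assumption.

route_target() {
    local confidence="$1"
    local proposed_path="$2"
    # awk exits 0 when the confidence meets the 0.60 threshold
    if awk -v c="$confidence" 'BEGIN { exit !(c >= 0.60) }'; then
        echo "$proposed_path"
    else
        echo "_review_queue/$(basename "$proposed_path")"
    fi
}

route_target 0.75 "05_decisions/2025-11-11--routing-policy--decision.md"
route_target 0.40 "03_research/2025-11-11--index-design--note.md"
```

   In the low-confidence case the proposed_path stays in the frontmatter, so a reviewer can move the file later without re-running classification.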

## Next Steps

Phase 2 complete. Ready for Phase 3: Gitea Actions Workflows configuration.

## Files Added

- kb/README.md (290 lines)
- kb/_templates/note.md
- kb/_templates/decision.md
- kb/_templates/howto.md
- kb/_guides/KB_INGEST_PROMPT.md (~400 lines)
- kb/scripts/generate-index.sh (executable)
- kb/CHANGELOG.md
- kb/_index.md (auto-generated)
2025-11-11 11:43:52 -07:00

271 lines
8.1 KiB
Bash

#!/bin/bash
# KB Index Generation Script
# Generates kb/_index.md with searchable metadata from all KB files
set -e
# Get the script directory and KB root directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
KB_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
INDEX_FILE="$KB_ROOT/_index.md"
echo "Generating KB index..."
# Create temporary files for indexing
TMP_DIR=$(mktemp -d 2>/dev/null || mktemp -d -t 'kb-index')
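# Single quotes defer expansion until the trap runs, so paths containing
# quotes or spaces are removed safely
trap 'rm -rf "$TMP_DIR"' EXIT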
TOPICS_FILE="$TMP_DIR/topics.txt"
TAGS_FILE="$TMP_DIR/tags.txt"
PHASES_FILE="$TMP_DIR/phases.txt"
FILES_FILE="$TMP_DIR/files.txt"
touch "$TOPICS_FILE" "$TAGS_FILE" "$PHASES_FILE" "$FILES_FILE"
# Categories to scan
CATEGORIES=("01_projects" "02_systems" "03_research" "04_design" "05_decisions" "06_glossary" "07_playbooks" "08_archive")
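# Function to extract frontmatter from a file
extract_frontmatter() {
    local file="$1"
    if [[ ! -f "$file" ]]; then
        return 1
    fi
    # Print the lines between the first and second "---" delimiters,
    # excluding the delimiters themselves. A trailing CR (CRLF line
    # endings, e.g. files edited on Windows) is stripped from every line.
    awk '
        { sub(/\r$/, "") }
        /^---[[:space:]]*$/ { if (++count == 2) exit; next }
        count == 1
    ' "$file" 2>/dev/null || true
}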
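# Function to extract a YAML field value (simple fields)
extract_yaml_simple() {
    local frontmatter="$1"
    local field="$2"
    # Take the first "field: value" line, trim trailing whitespace, and
    # strip one layer of surrounding single or double quotes
    echo "$frontmatter" \
        | grep "^${field}:" \
        | head -1 \
        | sed "s/^${field}:[[:space:]]*//" \
        | sed 's/[[:space:]]*$//;s/^["'\'']//;s/["'\'']$//'
}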
# Function to extract array values from YAML
extract_yaml_array() {
    local frontmatter="$1"
    local field="$2"
    # Collect the raw array text. Handles the inline form
    # (field: ["a", "b"]) and the multi-line form (one "- item" per line,
    # with or without indentation).
    local array_content=$(echo "$frontmatter" | awk -v field="$field:" '
        $0 ~ "^" field {
            sub("^" field "[[:space:]]*", "")
            if ($0 ~ /\[.*\]/) {
                print $0
                exit
            }
            in_array = 1
            next
        }
        in_array {
            if ($0 ~ /^[[:space:]]*[-[]/) {
                print $0
            } else if ($0 ~ /^[^[:space:]]/) {
                exit
            }
        }
    ')
    # Strip list markers and brackets, split on commas, then trim
    # whitespace and surrounding quotes. Values that themselves contain
    # commas are not supported by this simple parser.
    echo "$array_content" \
        | sed 's/^[[:space:]]*-[[:space:]]*//; s/^[[:space:]]*\[//; s/\][[:space:]]*$//' \
        | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/^["'\'']//; s/["'\'']$//' \
        | grep -v '^$' || true
}
# Append "value|relative_path" entries to an index file, one per value
append_index_entries() {
    local values="$1"
    local relative_path="$2"
    local index_file="$3"
    local value
    while IFS= read -r value; do
        # Trim surrounding whitespace in pure bash; xargs fails on values
        # with an unbalanced quote, which would abort the run under set -e
        value="${value#"${value%%[![:space:]]*}"}"
        value="${value%"${value##*[![:space:]]}"}"
        if [[ -n "$value" ]]; then
            echo "$value|$relative_path" >> "$index_file"
        fi
    done <<< "$values"
}
# Function to process a KB file
process_kb_file() {
    local file="$1"
    local relative_path="${file#"$KB_ROOT"/}"
    local category=""
    local candidate
    # Determine category from path
    for candidate in "${CATEGORIES[@]}"; do
        if [[ "$relative_path" == "$candidate"/* ]]; then
            category="$candidate"
            break
        fi
    done
    if [[ -z "$category" ]]; then
        return 0 # Skip files not in known categories
    fi
    # Extract frontmatter
    local frontmatter=$(extract_frontmatter "$file")
    if [[ -z "$frontmatter" ]]; then
        echo "Warning: No frontmatter found in $relative_path" >&2
        return 0
    fi
    # Extract metadata
    local title=$(extract_yaml_simple "$frontmatter" "title")
    local date=$(extract_yaml_simple "$frontmatter" "date")
    local type=$(extract_yaml_simple "$frontmatter" "type")
    local summary=$(extract_yaml_simple "$frontmatter" "summary")
    # Store file info
    echo "$category|$relative_path|$title|$date|$type|$summary" >> "$FILES_FILE"
    # Extract and index topics, tags, and phase relevance
    append_index_entries "$(extract_yaml_array "$frontmatter" "topics")" "$relative_path" "$TOPICS_FILE"
    append_index_entries "$(extract_yaml_array "$frontmatter" "tags")" "$relative_path" "$TAGS_FILE"
    append_index_entries "$(extract_yaml_array "$frontmatter" "phase_relevance")" "$relative_path" "$PHASES_FILE"
}
# Scan all KB files
echo "Scanning KB files..."
for category in "${CATEGORIES[@]}"; do
    category_dir="$KB_ROOT/$category"
    if [[ ! -d "$category_dir" ]]; then
        continue
    fi
    # Find all .md files in category
    find "$category_dir" -type f -name "*.md" | while IFS= read -r file; do
        # Skip if in a special subdirectory
        if [[ "$file" == *"/_guides/"* ]] || \
           [[ "$file" == *"/_templates/"* ]] || \
           [[ "$file" == *"/_inbox/"* ]] || \
           [[ "$file" == *"/_review_queue/"* ]]; then
            continue
        fi
        # Check if filename matches the documented KB pattern
        # (the slug must be at least 3 characters long)
        filename=$(basename "$file")
        if [[ "$filename" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}--[a-z0-9-]{3,}--(idea|note|spec|decision|howto|retro|meeting)(--p[0-9]+)?\.md$ ]]; then
            process_kb_file "$file"
        fi
    done
done
# Count files (tr strips the padding that some wc implementations emit)
FILE_COUNT=$(wc -l < "$FILES_FILE" 2>/dev/null | tr -d '[:space:]')
FILE_COUNT=${FILE_COUNT:-0}
# Generate index file
echo "Generating index file..."
{
    cat << EOF
# KB Index

_Last updated: $(date +%Y-%m-%d)_

This index is automatically generated from KB file metadata. It provides searchable access to all KB content organized by category, topic, tag, and phase relevance.

---

## File Listing by Category

EOF
    # Output files by category
    for category in "${CATEGORIES[@]}"; do
        category_files=$(grep "^$category|" "$FILES_FILE" 2>/dev/null || true)
        if [[ -n "$category_files" ]]; then
            echo "### $category"
            echo ""
            # The first field (category) is already known, so it is discarded
            while IFS='|' read -r _ path title date type summary; do
                echo "- [\`$path\`]($path) - $title ($date, $type)"
            done <<< "$category_files"
            echo ""
        fi
    done
    # Print one index section: a heading per unique key, followed by the
    # files that reference it. Keys are compared as literal strings, so a
    # topic such as "c++" is not misread as a regular expression.
    print_index_section() {
        local heading="$1"
        local index_file="$2"
        local key file
        [[ -s "$index_file" ]] || return 0
        echo "## $heading"
        echo ""
        cut -d'|' -f1 "$index_file" | sort -u | while IFS= read -r key; do
            echo "### $key"
            KEY="$key" awk -F'|' '($1 "") == ENVIRON["KEY"] { print $2 }' "$index_file" \
                | sort -u | while IFS= read -r file; do
                    echo "- [\`$file\`]($file)"
                done
            echo ""
        done
    }
    # Topics, Tags, and Phase Relevance indexes
    print_index_section "Topics Index" "$TOPICS_FILE"
    print_index_section "Tags Index" "$TAGS_FILE"
    print_index_section "Phase Relevance Index" "$PHASES_FILE"
    # Summary (tr strips the padding that some wc implementations emit)
    TOPIC_COUNT=$(cut -d'|' -f1 "$TOPICS_FILE" 2>/dev/null | sort -u | wc -l | tr -d '[:space:]')
    TAG_COUNT=$(cut -d'|' -f1 "$TAGS_FILE" 2>/dev/null | sort -u | wc -l | tr -d '[:space:]')
    PHASE_COUNT=$(cut -d'|' -f1 "$PHASES_FILE" 2>/dev/null | sort -u | wc -l | tr -d '[:space:]')
    echo "---"
    echo ""
    echo "## Summary"
    echo ""
    echo "- **Total KB Files**: $FILE_COUNT"
    echo "- **Unique Topics**: ${TOPIC_COUNT:-0}"
    echo "- **Unique Tags**: ${TAG_COUNT:-0}"
    echo "- **Phases Referenced**: ${PHASE_COUNT:-0}"
    echo ""
    echo "_Index generated on $(date +%Y-%m-%d\ %H:%M:%S)_"
} > "$INDEX_FILE"
echo "Index generated successfully: $INDEX_FILE"
echo " - Files indexed: $FILE_COUNT"
echo " - Topics: $TOPIC_COUNT"
echo " - Tags: $TAG_COUNT"
echo " - Phases: $PHASE_COUNT"