Outstatic and FlexSearch indexing
Tags: Outstatic, NextJs, flexsearch


While experimenting with making this website searchable with FlexSearch, I couldn't find a really good way of integrating the indexing into the build process. The most obvious solution is to run the indexing as a script referenced from the build command in package.json, but there are a few issues with that approach. With TypeScript the script has to be run with ts-node, which often causes problems with ESM and imports. It would also be nicer to have the indexing integrated into the Next.js framework itself.

Since Next.js can generate the sitemap statically at build time, the sitemap file seems like a good place to build the indexes as well. It looks something like this:

import { MetadataRoute } from "next";
import fs from "fs";
import { NEXT_PUBLIC_APP_URL } from "../lib/constants";
import { load } from "outstatic/server";
import { Page, Section, pageIndex, sectionIndex } from "@/lib/search";
import { join } from "path";
import matter from "gray-matter";
 
const postsDirectory = join(process.cwd(), "outstatic/content/posts");
const publicDirectory = join(process.cwd(), "public");
 
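// Read a post's markdown file and return its front matter plus the raw content.
// Note: `fields` is currently unused; every front matter field is returned.
function getPostBySlug(slug: string, fields: string[]) {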
  const fullPath = join(postsDirectory, `${slug}.md`);
  const fileContents = fs.readFileSync(fullPath, "utf8");
 
  const { data, content } = matter(fileContents);
 
  return {
    ...data,
    content,
    title: data.title,
    url: `${NEXT_PUBLIC_APP_URL}/${data.slug}`,
    slug: data.slug,
  };
}
 
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const db = await load();
  // fetch all published posts
  const items = await db
    .find(
      {
        collection: "posts",
        status: "published",
      },
      ["slug", "publishedAt"],
    )
    .toArray();
 
  // --------------------
  // Flexsearch indexing
  // --------------------
  items.forEach((item, index) => {
    const post = getPostBySlug(item.slug, ["slug", "title", "content", "url"]);
 
    let pageContent = "";
 
    const { title, content, url } = post;
    const paragraphs = content.split("\n");
 
    sectionIndex.add({
      id: post.slug,
      url,
      title,
      pageId: `page_${index}`,
      content: title,
      ...(paragraphs[0] && { display: paragraphs[0] }),
    });
 
    paragraphs.forEach((paragraph, i) => {
      sectionIndex.add({
        id: `${url}_${i}`,
        url,
        title,
        pageId: `page_${index}`,
        content: paragraph,
      });
    });
 
    pageContent += ` ${title} ${content}`;
 
    pageIndex.add({
      id: index,
      title,
      content: pageContent,
    });
  });
 
  const indexes: {
    pageIndex: { [key: string]: Page };
    sectionIndex: { [key: string]: Section };
  } = {
    pageIndex: {},
    sectionIndex: {},
  };
 
  // Export both indexes; FlexSearch hands the data over as key/data pairs
  await pageIndex.export(async (key, data) => {
    indexes.pageIndex[key] = data;
  });
  await sectionIndex.export(async (key, data) => {
    indexes.sectionIndex[key] = data;
  });
 
  fs.writeFileSync(
    join(publicDirectory, "search-index.json"),
    JSON.stringify(indexes),
  );
 
  // Return the sitemap
  return items.map((post) => ({
    url: `${NEXT_PUBLIC_APP_URL}/${post.slug}`,
    lastModified: post.publishedAt,
  }));
}
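
To make the shape of the section documents more concrete, here is a small standalone sketch of the paragraph splitting done in the loop above. The `SectionDoc` interface and the `toSectionDocs` helper are hypothetical names used for illustration only (the real `Section` type lives in `@/lib/search`), and skipping blank lines is an addition that the code above does not do.

```typescript
// Assumed shape of one section document; the real type is `Section` in "@/lib/search".
interface SectionDoc {
  id: string;
  url: string;
  title: string;
  pageId: string;
  content: string;
  display?: string;
}

// Hypothetical helper: turns one post into a title document plus one document per paragraph.
function toSectionDocs(
  post: { slug: string; title: string; content: string; url: string },
  pageId: string,
): SectionDoc[] {
  const { slug, title, content, url } = post;

  // Split on newlines and drop blank lines (an addition; the original indexes them too).
  const paragraphs = content.split("\n").filter((p) => p.trim() !== "");

  // One document for the title, showing the first paragraph as preview text if there is one.
  const titleDoc: SectionDoc = {
    id: slug,
    url,
    title,
    pageId,
    content: title,
    ...(paragraphs[0] ? { display: paragraphs[0] } : {}),
  };

  // One document per remaining paragraph, with an id derived from the url and position.
  const paragraphDocs: SectionDoc[] = paragraphs.map((paragraph, i) => ({
    id: `${url}_${i}`,
    url,
    title,
    pageId,
    content: paragraph,
  }));

  return [titleDoc, ...paragraphDocs];
}

// Usage with a made-up post:
const exampleDocs = toSectionDocs(
  {
    slug: "hello",
    title: "Hello",
    content: "First paragraph.\n\nSecond paragraph.",
    url: "https://example.com/hello",
  },
  "page_0",
);
console.log(exampleDocs.map((doc) => doc.id));
// prints [ 'hello', 'https://example.com/hello_0', 'https://example.com/hello_1' ]
```

Every document carries the same `pageId`, which is what ties the sections back to their page.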

The way this works is that FlexSearch builds a page index and a section index, which are exported to a JSON file that is shipped all the way to the client, where the search is performed. The code above creates the indexes and stores them in the public folder, so the file is available at https://ahlstrand.es/search-index.json. It is a fairly large file, so I wonder whether this will scale once a lot of content is added, but I guess we'll see.
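
On the client the exported data has to be fed back into fresh indexes before searching. The following is only a rough sketch of that step, assuming the indexes expose an `import(key, data)` method as the counterpart to `export` (as FlexSearch documents it) and that the call is synchronous. The `Importable` interface and both helper functions are hypothetical names, not part of FlexSearch or of this site's code.

```typescript
// Minimal structural type for anything with a FlexSearch-style import(key, data) method.
interface Importable {
  import(key: string, data: unknown): unknown;
}

// One exported index: the key/data pairs collected by the export callbacks.
type ExportedIndex = Record<string, unknown>;

// Layout of search-index.json as written by the sitemap code.
interface SearchIndexFile {
  pageIndex: ExportedIndex;
  sectionIndex: ExportedIndex;
}

// Hypothetical helper: replay every exported key/data pair into a target index.
// Assumes import() is synchronous; await the call if your version returns a promise.
function restoreIndex(target: Importable, exported: ExportedIndex): void {
  for (const [key, data] of Object.entries(exported)) {
    target.import(key, data);
  }
}

// Hypothetical helper: restore both indexes from the parsed JSON file.
function restoreSearchIndexes(
  file: SearchIndexFile,
  pageIndex: Importable,
  sectionIndex: Importable,
): void {
  restoreIndex(pageIndex, file.pageIndex);
  restoreIndex(sectionIndex, file.sectionIndex);
}
```

In the browser, `file` would be the parsed response of fetching `/search-index.json`, and the two targets would be empty indexes created with the same options as the ones used at build time.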