Building an Iterative Summarizer

Context

I am interested in having better ideas and in improving my ability to implement them. My experience suggests that a good way to improve at both is to actually implement some ideas. This post documents the implementation of one of mine.

Vision

Internalizing information is sometimes more difficult than I would like it to be. I am curious how generative AI can be leveraged to make this easier.

At the extreme, I am imagining on-demand, fully generative, multi-sensory, world-class learning experiences. How much more of my Calculus III class would I recall if each chapter were taught by a celebrity avatar, in a lifelike changing virtual location, with the teaching skill of Paul Erdős?

A minimal incremental step towards this vision is making at least one piece of information more comprehensible to at least one human. Text is a natural candidate.

When internalizing text, a key step is condensing the information into summaries. I generally do this iteratively, producing successively smaller summaries.

Let’s try using an LLM to make these summaries for me, then make the app available online.

Action

Selecting Stack

  • Figma - a friend who builds lots of products recommended it.
  • React - because it seems popular, and I used it in college.
  • Material Design via MUI - a friend who builds many web apps recommended it.
  • Next.js - a friend who builds many web apps recommended it.
  • OpenAI - I’ve heard their language models are pretty good.

Re-learning React

This was straightforward - https://react.dev/learn is extensive and has in-browser coding exercises.

Frames in Figma

After making an account, I snagged this MUI Figma component kit. This was extremely useful because all of the Figma component names match the React library, so using this UI kit served as a warmup for the real thing.

The two issues I encountered in this part were:

  • The <Pagination> component from the MUI library does not support customizing button labels, which I wanted.
  • There is apparently no way to generate a Material color palette and import it into Figma via Token Studio, which MUI uses for Figma theming.

After some time, I had the following:

Doing Full-Stack Web Development

Familiarizing myself with Next.js, the MUI component library, and CSS layout techniques was straightforward, given the extensive documentation. A few issues I encountered follow.

Theme Issue

MUI provides a nice way to enable theming, but the theming is broken on Next.js 13 because of an issue with the Emotion dependency. I worked around this by forcing all components to be client components in my layout.tsx, but this is a hack.
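
Concretely, the workaround looks something like this (a minimal sketch, assuming the Next.js 13 App Router; the theme values are placeholders, not the palette I built in Figma):

// app/layout.tsx
'use client'; // the hack: opts the whole tree out of server components

import type { ReactNode } from 'react';
import CssBaseline from '@mui/material/CssBaseline';
import { ThemeProvider, createTheme } from '@mui/material/styles';

// Placeholder theme; swap in your own palette.
const theme = createTheme({ palette: { mode: 'light' } });

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        <ThemeProvider theme={theme}>
          <CssBaseline />
          {children}
        </ThemeProvider>
      </body>
    </html>
  );
}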

This was my first time hitting an issue in the open-source JavaScript ecosystem. It was very exciting. I feel confident estimating that I could resolve this issue in fewer than 54 pomodoros (likely far fewer), which makes me feel powerful.

Custom Pagination Hook w/ MUI

If the reader looks closely at my Figma frames, they will see that my pagination component has custom labels. Unfortunately, the MUI <Pagination> component does not support custom labels. This was my first extension of the MUI library, and you can see the code here:
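
The gist is roughly the following (a minimal sketch built on MUI's usePagination hook; the component name and label strings are placeholders, not my actual code):

import Button from '@mui/material/Button';
import usePagination from '@mui/material/usePagination';

// Placeholder labels; the real ones match my Figma frames.
const LABELS = { previous: 'Less Detail', next: 'More Detail' } as const;

export default function LabeledPagination({ count }: { count: number }) {
  const { items } = usePagination({ count });

  return (
    <nav>
      {items.map(({ page, type, selected, ...item }, index) => {
        if (type === 'previous' || type === 'next') {
          // This is what <Pagination> does not allow: custom button labels.
          return (
            <Button key={index} {...item}>
              {LABELS[type]}
            </Button>
          );
        }
        if (type === 'start-ellipsis' || type === 'end-ellipsis') {
          return <span key={index}>…</span>;
        }
        return (
          <Button key={index} variant={selected ? 'contained' : 'text'} {...item}>
            {page}
          </Button>
        );
      })}
    </nav>
  );
}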

Testing

As a responsible person, I felt compelled to learn how to write tests for my JavaScript components. I immediately hit this issue with Jest / Next.js compatibility. Besides this, testing with Jest feels straightforward.
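
Once the compatibility issue is worked around, the tests themselves are unremarkable. A minimal sketch, exercising the hypothetical LabeledPagination component from above with React Testing Library:

import '@testing-library/jest-dom';
import { render, screen } from '@testing-library/react';
import LabeledPagination from './LabeledPagination';

test('renders the custom previous/next labels', () => {
  render(<LabeledPagination count={3} />);
  // The whole point of the custom component: these labels exist.
  expect(screen.getByText('Less Detail')).toBeInTheDocument();
  expect(screen.getByText('More Detail')).toBeInTheDocument();
});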

Prompt Engineering

Once my app infrastructure was in place, it was time to coax the LLM into doing what I wanted.

Initial Prompting

I initially tried prompting techniques like chain-of-thought, few-shot, and step-by-step prompting, resulting in:

Input JSON with keys "t", "maxw", "minw".

Your job, described step by step:

1. Identify word limits "maxw", "minw".
2. Summarize "t" respecting the word limits.
3. Count the words in the result of (2).
4. Check whether the result of (3) matches the constraint from (1).
5. If (4) is false, output a new summary that matches the constraint from (1).

Here are examples:

<Examples>

GPT-4 was decent with this prompt, but GPT-3.5-turbo was not: it did not respect the output length limits. Additionally, I had no visibility into how it chose which information to include or omit.

Better - Extract, Rank, Rewrite

I achieved better results using a multi-step approach. First, I extract the key pieces of information from the text:

Input JSON with key "text".

Your job is to take text "text", extract all information conveyed by the text into a list of complete sentences, and provide a short title describing the content of the text.

Your output should be a JSON with a key "title" pointing to the string title and a key "info_list" pointing to a list of strings representing the result of your job.

Here are a few examples:

<Examples>

Then, I score each item based on “importance”:

Input JSON with key "title" and key "info_list".

Suppose you are interested in the input "title" value. For each element of the "info_list" provide a score from 0 to 1 indicating how interesting this piece of information is.

Your output should be a JSON with a key "info_list_scored" pointing to a list of (string, double) tuples representing the result of your job.

Here are a few examples:

<Examples>

On the client side, I can now sort the pieces of information by score and make arbitrary decisions on how much info to include. Once I make this selection, I submit the information again to the LLM for a rewrite:

Input JSON with keys "title" and "info_list".

Your job is to write a paragraph containing the information from "info_list" about the given "title". Your result should only use information from the "info_list".

Your output should be a JSON with a key "text" containing the result.

Here are a few examples:

<Examples>
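
Putting the three prompts together, the client-side flow is roughly the following (a minimal sketch, assuming the openai npm SDK; the prompt constants stand in for the prompts above, and keepFraction is an arbitrary cutoff, not a tuned value):

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stand-ins for the three prompts shown above.
const EXTRACT_PROMPT = '<the extract prompt above>';
const SCORE_PROMPT = '<the scoring prompt above>';
const REWRITE_PROMPT = '<the rewrite prompt above>';

async function complete<T>(system: string, payload: unknown): Promise<T> {
  const res = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: JSON.stringify(payload) },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? '{}') as T;
}

export async function summarize(text: string, keepFraction = 0.5): Promise<string> {
  // Step 1: extract a title and a flat list of facts.
  const { title, info_list } = await complete<{
    title: string;
    info_list: string[];
  }>(EXTRACT_PROMPT, { text });

  // Step 2: score each fact for "importance".
  const { info_list_scored } = await complete<{
    info_list_scored: [string, number][];
  }>(SCORE_PROMPT, { title, info_list });

  // Step 3: sort by score, keep an arbitrary fraction, and ask for a rewrite.
  const kept = [...info_list_scored]
    .sort((a, b) => b[1] - a[1])
    .slice(0, Math.ceil(info_list_scored.length * keepFraction))
    .map(([sentence]) => sentence);

  const { text: summary } = await complete<{ text: string }>(REWRITE_PROMPT, {
    title,
    info_list: kept,
  });
  return summary;
}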

Generating Examples

I generated the <Examples> for the previous prompts using GPT-4 via the chat UI. As I understand it, this is called a “poor man’s Alpaca”.

Latency

Unfortunately, submitting all of my prompts for a body of text to the OpenAI API takes longer than reading the original text. This seems bad.

OpenAI is Slow

It is a well-documented phenomenon that OpenAI model access via the developer API is significantly slower than access via the provided chat UI. See this issue, one of many. Surprisingly, a developer advocate for OpenAI locked this thread with an unsubstantiated (no data) claim that they do not slow down the API.

Trying Huggingface

I decided to try out some hosted models on Huggingface, but the results were not good enough to use.

The natural next step is to fine-tune a smaller model, but I’ve decided this is out-of-scope for my initial build.

Trying Claude via Anthropic

I also experimented with Claude via Anthropic, which looked promising in their Slack integration. Unfortunately, I do not have API access. I considered using a Slack bot shim, but this seemed risky and was also extra work.

Implementing Streaming

At a bare minimum, I needed to expose some feedback to the user to indicate that the reason it is taking so long is somebody else’s fault. To do this, I use the streaming API from OpenAI via a library from GitHub. The library is pretty good.
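
The shape of the streaming endpoint is roughly this (a minimal sketch using the openai SDK's stream: true option, not necessarily what the library does internally; the route path is hypothetical):

// app/api/summarize/route.ts (hypothetical path)
import OpenAI from 'openai';

const openai = new OpenAI();

export async function POST(req: Request) {
  const { prompt } = await req.json();

  // Ask OpenAI for a streaming completion instead of a single response.
  const stream = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  // Re-emit each token to the client as soon as it arrives.
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.choices[0]?.delta?.content ?? ''));
      }
      controller.close();
    },
  });

  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}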

Deployment

Heroku + SSL Enforcement via Next.js

I used Heroku for deployment via GitHub, which is very easy. The only snafu in my deployment was having to write some custom code to force SSL redirects. There was a promising library, but I opened an issue because it did not work.
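
One way to force the redirect with Next.js middleware (a minimal sketch; my actual code may differ, but Heroku reports the original request scheme in the x-forwarded-proto header):

// middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(req: NextRequest) {
  // Heroku terminates TLS at its router and records the original scheme here.
  if (req.headers.get('x-forwarded-proto') === 'http') {
    const url = req.nextUrl.clone();
    url.protocol = 'https:';
    return NextResponse.redirect(url, 308); // permanent, method-preserving
  }
  return NextResponse.next();
}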

Final Result

Code: GitHub

Live: internalize.ai

Video

Next Steps

The next step here is to learn how to fine-tune and host smaller models so that I can provide a more responsive service.

Reflection

Six-Point Review

  • Motivation: 1. I am in the process of aligning my passions with how I spend my time. 2. I am passionate about skill acquisition and learning. 3. I am interested in taking ideas from conception to completion. 4. I am interested in building community around technology. 5. I am interested in building my online persona.
  • Expectation: 1. That I continue to use the pomodoro technique for managing my intention. 2. That I avoid being self-conscious and fearing failure. 3. That I seek support from people I know as I do this work.
  • Goals: 1. Average ≥ 13 pomodoros a day over the course of building. 2. Actually ship something. 3. Make content about what you ship.
  • Postures: Determined // Secure // Passionate
  • Distractions: Lack of Will // Pessimism // Scope Creep
  • Diligence: Yessir

Result

In the context of these goals, I did well. I still need to write some Twitter threads about this work, but I will do that after making this post public.

Fun Diversions

A collection of interesting diversions while building this: