- Context
- Vision
- Action
- Selecting Stack
- Re-learning React
- Frames in Figma
- Doing Full-Stack Web Development
- Theme Issue
- Custom Pagination Hook w/ MUI
- Testing
- Prompt Engineering
- Initial Prompting
- Better - Extract, Rank, Rewrite
- Generating Examples
- Latency
- OpenAI is Slow
- Trying Huggingface
- Trying Claude via Anthropic AI
- Implementing Streaming
- Deployment
- Heroku + SSL Enforcement via Next.js
- Final Result
- Next Steps
- Reflection
- Six-Point Review
- Result
- Fun Diversions
Context
I am interested in having better ideas and improving my ability to implement them. My experience suggests that the best way to improve at implementing ideas is to actually implement some. This post documents the implementation of one of mine.
Vision
Internalizing information is sometimes more difficult than I would like it to be. I am curious how generative AI can be leveraged to make this easier.
At the extreme, I am imagining on-demand, fully generative, multi-sensory, world-class learning experiences. How much more of my Calculus III class would I recall if each chapter were taught by a celebrity avatar, in a lifelike changing virtual location, with the teaching skill of Paul Erdős?
A minimal incremental step towards this vision is making at least one piece of information more comprehensible to at least one human. Text is a natural candidate.
When internalizing text, a key step is condensing the information via summaries. I generally do this iteratively, producing successively smaller summaries.
Let’s try using an LLM to make these summaries for me, then make the app available online.
Action
Selecting Stack
- Figma - a friend who builds lots of products told me so.
- React - because it seems popular, and I used it in college.
- Material Design via MUI - a friend who builds many web apps told me so.
- Next.js - a friend who builds many web apps told me so.
- OpenAI - I’ve heard their language models are pretty good.
Re-learning React
This is straightforward - https://react.dev/learn is extensive and has in-browser coding exercises.
Frames in Figma
After making an account, I snagged this MUI Figma component kit. This was extremely useful because all of the Figma component names match the React library, so using this UI kit served as a warmup for the real thing.
The two issues I encountered in this part were:
- The `<Pagination>` component from the MUI library does not allow me to customize button labels, but I wanted custom button labels.
- There is ostensibly no way to generate a Material color palette and import it into Figma via Token Studio, which MUI uses for Figma theming.
After some time, I had the following:
Doing Full-Stack Web Development
Familiarizing myself with Next.js, the MUI component library, and CSS layout techniques was straightforward, given the extensive documentation. A few issues I encountered follow.
Theme Issue
MUI provides a nice way to enable theming. But the theming is broken for Next 13 because of an issue with the `emotion` dependency. I worked around this by forcing all components to be client components in my `layout.tsx`, but this is a hack.
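A minimal sketch of what this workaround looks like, assuming the app-router file `app/layout.tsx` (this is hypothetical, not the actual project code):

```typescript
// app/layout.tsx (hypothetical sketch of the workaround)
// Marking the root layout as a client component forces everything it
// renders to be a client component, sidestepping emotion's
// incompatibility with Next 13 server components.
'use client';

import * as React from 'react';
import { ThemeProvider, createTheme } from '@mui/material/styles';

const theme = createTheme();

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body>
        <ThemeProvider theme={theme}>{children}</ThemeProvider>
      </body>
    </html>
  );
}
```

The tradeoff is giving up server components for the entire tree below the root, which is why this is a hack rather than a fix.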
This was my first time hitting an issue in the open-source JavaScript ecosystem. It was very exciting. I feel confident estimating that I could resolve this issue in under 54 pomodoros (likely far fewer), which makes me feel powerful.
Custom Pagination Hook w/ MUI
If the reader looks closely at my Figma frames, they will see that my pagination component has custom labels. Unfortunately, the MUI `<Pagination>` component does not support custom labels. This was my first extension of the MUI library, and you can see the code here:
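The shape of the extension can be sketched as a small mapping layer over the items MUI's `usePagination` hook produces. The `labelFor` helper and its label strings are hypothetical, for illustration only:

```typescript
// A sketch (not the actual project code) of supplying custom labels
// for pagination controls. MUI's usePagination hook yields items with
// a `type` field; mapping that type to a label lets the UI render
// custom text instead of the default "<" / ">" icons.
type PaginationItemType =
  | 'page'
  | 'previous'
  | 'next'
  | 'start-ellipsis'
  | 'end-ellipsis';

interface PaginationItem {
  type: PaginationItemType;
  page: number | null;
}

// Hypothetical helper: custom labels for navigation buttons, while
// numbered pages keep their page number.
function labelFor(item: PaginationItem): string {
  switch (item.type) {
    case 'previous':
      return 'Back'; // custom label instead of "<"
    case 'next':
      return 'More Detail'; // custom label instead of ">"
    case 'start-ellipsis':
    case 'end-ellipsis':
      return '…';
    case 'page':
      return String(item.page ?? '');
  }
}
```

In a component, each rendered button would use `labelFor(item)` as its child while keeping the `onClick` and `selected` props from the hook's items.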
Testing
As a responsible person, I felt compelled to learn how to write tests for my JavaScript components. I immediately hit this issue with `jest`/`nextjs` compatibility. Beyond this, testing with Jest feels straightforward.
Prompt Engineering
Once my app infrastructure was in place, it was time to coax the LLM into doing what I wanted it to do.
Initial Prompting
I initially tried prompt techniques like chain-of-thought prompting, few-shot prompting, and step-by-step prompting, resulting in:
Input JSON with keys "t", "maxw", "minw".
Job described step by step:
1 Identify word limits "maxw", "minw".
2 Summarize "t" respecting word limits.
3 Count words in result of (2).
4 Check whether result (3) matches constraint (1).
5 If (4) is false, output new summary that matches constraint (1).
Here are examples:
<Examples>
GPT-4 was decent with this prompt, but GPT-3.5-turbo was not: it did not respect the output length limits, and I had no insight into how it chose which information to include or omit.
Better - Extract, Rank, Rewrite
I achieved better results using a multi-step approach. First, I extract the key pieces of information from the text:
Input JSON with key "text".
Your job is to take text "text", extract all information conveyed by the text into a list of complete sentences, and provide a short title describing the content of the text.
Your output should be a JSON with a key "title" pointing to the string title and a key "info_list" pointing to a list of strings representing the result of your job.
Here are a few examples:
<Examples>
Then, I score each item based on “importance”:
Input JSON with key "title" and key "info_list".
Suppose you are interested in the input "title" value. For each element of the "info_list" provide a score from 0 to 1 indicating how interesting this piece of information is.
Your output should be a JSON with a key "info_list_scored" pointing to a list of (string, double) tuples representing the result of your job.
Here are a few examples:
<Examples>
On the client side, I can now sort the pieces of information by score and make arbitrary decisions on how much info to include. Once I make this selection, I submit the information again to the LLM for a rewrite:
Input JSON with keys "title" and "info_list".
Your job is to write a paragraph containing the information from "info_list" about the given "title". Your result should only use information from the "info_list".
Your output should be a JSON with a key "text" containing the result.
Here are a few examples:
<Examples>
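The client-side selection step between the rank and rewrite calls can be sketched as follows. The `selectTopInfo` helper and its signature are hypothetical, for illustration only:

```typescript
// A sketch (not the actual project code) of the client-side step:
// sort the (sentence, score) tuples returned by the ranking prompt
// and keep the highest-scored items before submitting them for the
// rewrite.
type ScoredInfo = [string, number]; // (sentence, importance score in [0, 1])

function selectTopInfo(scored: ScoredInfo[], keep: number): string[] {
  return [...scored]
    .sort((a, b) => b[1] - a[1]) // descending by score
    .slice(0, keep)              // arbitrary cutoff chosen by the client
    .map(([sentence]) => sentence);
}
```

The cutoff `keep` is where the "arbitrary decisions on how much info to include" live; it could just as well be a score threshold or a target word count.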
Generating Examples
I generated the `<Examples>` in the previous prompts using GPT-4 chat. As I understand it, this is called a "poor man's Alpaca".
Latency
Unfortunately, submitting all of my prompts for a body of text to the OpenAI API takes longer than reading the original text. This seems bad.
OpenAI is Slow
It is a well-documented phenomenon that OpenAI model access via the developer API is significantly slower than access via the provided chat UI. See this issue, one of many. Surprisingly, a developer advocate for OpenAI locked this thread with an unsubstantiated (no data) claim that they do not slow down the API.
Trying Huggingface
I decided to try out some hosted models on Huggingface but:
- https://huggingface.co/facebook/bart-large-cnn - fast but non-iterative and has quality issues.
- https://huggingface.co/bigscience/bloom - pretty good but also slow, like OpenAI.
The natural next step is to fine-tune a smaller model, but I’ve decided this is out-of-scope for my initial build.
Trying Claude via Anthropic AI
I also experimented with Claude via Anthropic AI, which looked promising in their Slack integration. Unfortunately, I do not have API access. I considered using a Slack bot shim, but this seemed risky and was also extra work.
Implementing Streaming
At a bare minimum, I needed to give the user some feedback indicating that the long wait is somebody else's fault. To do this, I use the streaming API from OpenAI via Embed GitHub. The library is pretty good.
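The core of consuming the stream is parsing OpenAI's server-sent-event chunks so partial tokens can be shown as they arrive. This is a sketch of that parsing, not the library's code; the field names follow the chat completions streaming payload:

```typescript
// A sketch of extracting tokens from an OpenAI streaming response.
// Each chunk contains lines of the form `data: {...json...}`, with a
// final `data: [DONE]` sentinel marking the end of the stream.
function extractTokens(sseChunk: string): string[] {
  const tokens: string[] = [];
  for (const line of sseChunk.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (typeof delta === 'string') tokens.push(delta);
  }
  return tokens;
}
```

Appending each token to the page as it is parsed is what makes the wait feel tolerable, even though total latency is unchanged.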
Deployment
Heroku + SSL Enforcement via Next.js
I used Heroku for deployment via GitHub, which is very easy. The only snafu in my deployment was writing some custom code to force SSL redirects. There was a promising library, but I opened an issue because it did not work.
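The redirect logic itself is small. Heroku's router terminates TLS and sets an `x-forwarded-proto` header on each request, so the check can be sketched as a pure function (hypothetical helper name, not the actual deployed code):

```typescript
// A sketch of forcing SSL behind Heroku's router. Returns the https
// URL to redirect to, or null when the request is already secure.
function sslRedirectTarget(
  forwardedProto: string | null,
  host: string,
  path: string
): string | null {
  if (forwardedProto === 'https') return null; // already secure, no redirect
  return `https://${host}${path}`;
}
```

In a Next.js app this check would typically live in middleware, reading the header from the incoming request and returning a redirect response when the function yields a URL.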
Final Result
Code: Embed GitHub
Live: internalize.ai
Video
Next Steps
The next step here is to learn how to fine-tune and host smaller models such that I can provide a more responsive service.
Reflection
Six-Point Review
| Item | Notes |
| --- | --- |
| Motivation | 1. I am in the process of aligning my passions with how I spend my time. 2. I am passionate about skill acquisition and learning. 3. I am interested in taking ideas from conception to completion. 4. I am interested in building community around technology. 5. I am interested in building my online persona. |
| Expectation | 1. That I continue to use the pomodoro technique for managing my intention. 2. That I avoid being self-conscious and fearing failure. 3. That I seek support from people I know as I do this work. |
| Goals | 1. Average ≥ 13 pomodoros a day over the course of building. 2. Actually ship something. 3. Make content about what you ship. |
| Postures | Determined // Secure // Passionate |
| Distractions | Lack of Will // Pessimism // Scope Creep |
| Diligence | Yessir |
Result
In the context of these goals, I did well. I still need to write some Twitter threads about some stuff, but I will do that after making this post public.
Fun Diversions
Collection of interesting diversions while building this:
- Chip Huyen's MLOps Guide - https://huyenchip.com/mlops/
  - Interesting because it offers a clear way to upskill.
- Callaway Cloud - https://www.callawaycloud.com/#meet-our-team
  - Interesting because it looks like an intentionally small, lifestyle-oriented company. Curious what the founder's perspective is like.
- AWS Lambda Tutorial - went through this, which was cool, but decided to just use this other stack.
- Replit
  - Tried doing the AWS Lambda tutorial here and ran into some issues. Wrote a Twitter thread on this: https://twitter.com/phildakin_/status/1653867726530551809?s=20.