Programming with AI — image-based book recommender — Part 2
How I built a book recommender app without writing a line of code
In my previous post, I set myself a goal: build a book recommender app using only AI (the Claude web app). There, we completed the computer vision functionality for the “Book Shazam”.
Let’s review the planned stages:
- Computer vision functionalities
- Recommendation engine <- we are here
- UX alignment (user login, modern look and feel)
Intro and planning
So now we are going to add a recommendation system.
Recommendation systems are a well-researched field with many techniques, such as collaborative filtering and matrix factorization, most of which predict a user’s preferences from those of their peers.
But we are going to apply a simpler trick, the best value-for-money approach: an LLM-based recommender.
We can apply the LLM’s knowledge to estimate whether a user would like a book, given some data about that user. I hypothesize that the “best” signal here is “a few books you’ve liked”. So our plan is as follows:
- Ask the user to fill in a few books they liked recently. To make it robust, we’ll ask for a minimum of 3 but won’t cap the number.
- Given a book to inquire about, we’ll send this list of favorite books to the LLM and query it for an estimated rating of the book (see the sketch after this list).
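To make the plan concrete, here is a minimal sketch of such a rating query, assuming the Anthropic Python SDK. The `rate_book` helper, the prompt wording, and the model ID are my own illustration, not the exact code Claude produced:

```python
import os

import anthropic

# Illustrative helper: given a list of books the user liked, ask the
# LLM for an estimated rating of a candidate book.
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def rate_book(favorite_books: list[str], candidate: str) -> str:
    prompt = (
        f"A user liked these books: {', '.join(favorite_books)}.\n"
        f"On a scale of 1 to 10, how much would they like '{candidate}'? "
        "Answer with the number first, then a one-sentence explanation."
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model ID; pin it explicitly
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```

The whole “recommender” is really just this one prompt; everything else is plumbing.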
But wait, we are not going to ask the user for their favorite books with every query. We are going to save them at the session level, at least for now.
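For illustration, here is what session-level storage could look like if the app were a small Flask service. The `/favorites` route and its details are hypothetical; the actual app may hold this state differently:

```python
from flask import Flask, jsonify, request, session

app = Flask(__name__)
app.secret_key = "replace-me"  # required for Flask sessions; placeholder only

@app.post("/favorites")
def save_favorites():
    # Hypothetical endpoint: keep the user's favorite books for the
    # lifetime of the browser session only; no database involved yet.
    books = request.json.get("books", [])
    if len(books) < 3:
        return jsonify(error="Please enter at least 3 books"), 400
    session["favorite_books"] = books
    return jsonify(saved=len(books))
```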
If we are to productize this MVP, we’ll add a login system and a database, so we can:
- Remember the user across sessions
- Save this data across sessions
- Allow the user to enrich their data
But this is for later. For now, let’s ask Claude to add:
- Acquisition of the user’s favorite books
- Querying the LLM for a recommendation
The second item carries an implicit task: parsing the book title from the image, since the OCR output might be a bit wrong.
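One way to handle that implicit task is to let the LLM itself repair the OCR output. A sketch, reusing the `client` from the earlier snippet, with a prompt of my own invention:

```python
def clean_ocr_title(raw_ocr_text: str) -> str:
    # Ask the LLM to recover the most likely real title from noisy OCR
    # output (e.g. "Harry Pot1er and the Ph1losopher's Stone").
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model ID
        max_tokens=60,
        messages=[{
            "role": "user",
            "content": (
                "The following text was read from a book cover by OCR and "
                f"may contain mistakes: '{raw_ocr_text}'. "
                "Reply with the most likely correct book title only."
            ),
        }],
    )
    return response.content[0].text.strip()
```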
The process: jumping back in
Let’s continue from our previous work. Since the Claude session became a bit unstable, I started a new session and uploaded both of our files.
We are kind of living on the edge here. Asking Claude for such an extensive bulk of features this deep into the process may be challenging. Let’s see how it handles it. Pay attention: I’ve asked Claude to use the latest LLM version, since the rapid version changes of the APIs tend to confuse it a bit. Without this instruction, Claude might return our code with an API call to “Davinci 2”, an LLM from 2021. Even with this request, Claude put the Opus model in the code instead of the better and cheaper Sonnet.
Additionally, Claude returned a few small LLM-related bugs. After some back and forth, we actually got this:
- Claude used the Anthropic API, and the API key needs to be ingested (see the sketch after this list)
- The “add books” UX is nice and intuitive
- The “Rate” button was properly added
- Clicking the rating button does what it should, and a rating is returned after around 5 seconds.
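On the first point, “ingesting” the key can be as simple as reading an environment variable and failing early when it’s missing. A minimal sketch (`ANTHROPIC_API_KEY` is the SDK’s conventional variable name):

```python
import os
import sys

import anthropic

# Fail fast with a clear message if the key is missing, instead of
# letting the first API call crash mid-request.
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
    sys.exit("Please set the ANTHROPIC_API_KEY environment variable.")

client = anthropic.Anthropic(api_key=api_key)
```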
And… we have it! All the boxes of our planned product are ticked! We can now rent a small server, run the script, and wait for users…
Or can we?
Analysis
Indeed, we completed our MVP, but a few issues still require our attention, in addition to the ones we discussed in the previous part (computer vision, deployment, and architecture):
- Is this recommender any good?
- The UX
Let’s start with the second one, which, with all due respect to the recommender, is more important:
UX
Indeed, the MVP is ugly: the landing page, the green boxes around the books, the popups for the final rating message.
But that is not all: the entire user experience should be polished, made smoother and more intuitive. This is extremely important for a consumer product. Still, we do have a working product without it, right? We’ll try to address all of this in the next part.
The recommender
Naturally, the “LLM-based recommender” is quite a new concept, and I really don’t know whether any serious company uses one. It does seem to work, though. But even if we adopt it, we need to evaluate it.
As with the image recognition part before, we need a test set. But how are we going to get one?
Fortunately, there are a few book rating datasets that can be useful for our task, like this one: https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset. To test our strategy, we can choose a few users, preferably “active” ones (users who rated a significant number of books), and simulate our process: take 3 of their top-rated books as favorites, then query the ratings of their other rated books. This will show us the gap between the real preferences and the recommendations.
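A rough sketch of that evaluation loop, assuming the Kaggle CSVs and the `rate_book` helper from earlier; the column names follow the Book-Crossing files and may need adjusting:

```python
import pandas as pd

# Load explicit ratings and join in the book titles. Column names
# follow the Kaggle Book-Crossing files (Ratings.csv / Books.csv).
ratings = pd.read_csv("Ratings.csv").merge(
    pd.read_csv("Books.csv")[["ISBN", "Book-Title"]], on="ISBN"
)
ratings = ratings[ratings["Book-Rating"] > 0]  # drop implicit (zero) ratings

# "Active" users: at least 20 explicit ratings; sample a handful.
counts = ratings["User-ID"].value_counts()
for user_id in counts[counts >= 20].index[:10]:
    user_ratings = ratings[ratings["User-ID"] == user_id]
    favorites = user_ratings.nlargest(3, "Book-Rating")["Book-Title"].tolist()
    held_out = user_ratings[~user_ratings["Book-Title"].isin(favorites)]
    for _, row in held_out.iterrows():
        predicted = rate_book(favorites, row["Book-Title"])
        print(user_id, row["Book-Title"], row["Book-Rating"], predicted)
```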
Another thing we have yet to discuss: along with the rating, the LLM, unlike classic recommenders, returns a textual explanation of why it gave that rating. This is also valuable information for our analysis.
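Since we get both a number and a reason, it’s worth asking for a structured reply so the two can be logged separately. Once more a sketch, with my own prompt wording and the same assumed `client`:

```python
import json

def rate_with_explanation(favorite_books: list[str], candidate: str) -> dict:
    # Ask for JSON so the numeric rating and the textual reason can be
    # stored side by side; a real app should validate the parse.
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model ID
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": (
                f"A user liked these books: {', '.join(favorite_books)}. "
                f"Rate '{candidate}' for them from 1 to 10. "
                'Reply with JSON only, e.g. {"rating": 7, "reason": "..."}'
            ),
        }],
    )
    return json.loads(response.content[0].text)
```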
But once again, the models are not our focus here, so we’ll just leave what we’ve got.
What’s Next
That’s it for now. Everything is mostly working as planned. In the next posts, we’ll make a real app out of it:
- We’ll add a modern design and look and feel.
- We’ll smooth some rough edges in the app.
- We’ll add a login system.
- We’ll deploy the app.
Here is a sneak peek at the next post:
Code of the current state.