Programming with AI — image based book recommender — Part 3

How I built a book recommender app without writing a line of code

Gidi Shperber
6 min read · Aug 22, 2024

Intro

In our previous post (Part 2), we built the recommender module for the “Book Shazam” app.

We have de facto completed the MVP. Now, let’s take one big step towards a full-fledged app.

Let’s review the planned stages

So we have a working app, but a pretty ugly one:

The process

As I’ve mentioned, I don’t know much about frontend development or web design, so I’ll give Claude some general instructions:

Again, I’m letting Claude “run wild”, and again it delivers. First, it moves the JavaScript out of the HTML file into its own file for the first time, and adds a CSS stylesheet.

Since frontend development is not my domain, I’ll add Claude’s explanations here:

  1. We’ve used Tailwind CSS for quick and responsive styling.
  2. We’ve added custom CSS for additional styling and animations.
  3. We’ve improved the UI layout and added visual feedback for user actions.
  4. We’ve implemented book highlighting on the uploaded image.
  5. We’ve added loading indicators and better error handling.
  6. We’re using SweetAlert2 for more attractive and informative popups.

Items 3 and 5 are truly awesome: Claude doesn’t limit itself to coding the design, but acts as a “product manager” and adds functionality that I didn’t ask for explicitly: responsive buttons, loading indicators, and error handling. If you are experienced in app development you will probably notice these features; if not, you might miss them, yet these are exactly the things that separate good professional UX from bad amateur UX.

Here are the results:

Book Buddy! I hadn’t yet revealed my genius name “Book Shazam” to Claude, so it named the app Book Buddy.

The second form has this nice design as well:

The rating popup also looks better than before:

That is great, although it didn’t go 100% smoothly: the bounding boxes were a bit out of bounds and required a prompt or two to get them in place.
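The post doesn’t say what exactly pushed the boxes out of bounds. One common cause, assumed here purely for illustration, is that the OCR returns boxes in the pixel coordinates of the original photo while the browser shows the image at a different size. A minimal Python sketch of a scale-and-clamp fix (the `Box` type and the `scale_box` function are hypothetical names, not taken from the app’s code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: top-left corner plus width and height."""
    x: float
    y: float
    width: float
    height: float


def scale_box(box: Box, original_size: tuple, displayed_size: tuple) -> Box:
    """Map a box from original-image pixels to displayed-image pixels.

    Sizes are (width, height) tuples. The result is clamped so that it
    never extends past the edges of the displayed image.
    """
    orig_w, orig_h = original_size
    disp_w, disp_h = displayed_size
    sx, sy = disp_w / orig_w, disp_h / orig_h

    # Scale both corners, then clamp each one into the displayed image.
    left = min(max(box.x * sx, 0), disp_w)
    top = min(max(box.y * sy, 0), disp_h)
    right = min(max((box.x + box.width) * sx, 0), disp_w)
    bottom = min(max((box.y + box.height) * sy, 0), disp_h)
    return Box(left, top, right - left, bottom - top)
```

In the real app this logic would live in the frontend JavaScript; the Python version only shows the arithmetic.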

Now let’s make a small but crucial correction: the name should be Book Shazam.

And the result:

It also added some related microcopy in some places which is really cool:

And also added some sparkles to the rating pop-up:

Now, let’s fix a small but annoying issue: the messy OCR

If we look at this book and its OCR result, they don’t align. This is a “minor” case. There are cases where some of the text will be occluded by another book or bad lighting. In the past, we had to use NLP trickery to solve such issues. Nowadays, we can just ask Claude to add LLM code to “clean” this.
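To make the idea concrete, here is a minimal Python sketch of such a cleaning step. It is not the code Claude generated for the app: the prompt wording, the `clean_ocr_text` function, and the `ask_llm` hook are all assumptions made for illustration. `ask_llm` stands for any function that sends a prompt to a model (for example a thin wrapper around the Claude API) and returns the text of the reply.

```python
from typing import Callable, Optional

# Hypothetical prompt; the wording used in the actual app is not shown in the post.
PROMPT_TEMPLATE = (
    "The following text was read by OCR from the spine or cover of a book. "
    "It may contain typos, missing words, or fragments of neighbouring books.\n"
    "OCR text: {ocr_text}\n"
    "Reply with the most likely book in the form 'Title | Author' and nothing else. "
    "If you cannot tell, reply 'UNKNOWN'."
)


def clean_ocr_text(ocr_text: str, ask_llm: Callable[[str], str]) -> Optional[dict]:
    """Ask an LLM to turn noisy OCR text into a title/author pair.

    Returns None when the model cannot identify the book or when the reply
    does not follow the requested 'Title | Author' format.
    """
    reply = ask_llm(PROMPT_TEMPLATE.format(ocr_text=ocr_text)).strip()
    if reply.upper() == "UNKNOWN" or "|" not in reply:
        return None
    title, author = (part.strip() for part in reply.split("|", 1))
    return {"title": title, "author": author}
```

Passing the model call in as a function keeps the parsing logic testable without network access: in a test, `ask_llm` can be a stub that returns a canned reply.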

And… it works:

Now let's add an important feature: a login system. This is challenging. It sounds simple, but a login system includes both frontend and backend components. Additionally, it includes a database. It really takes Claude’s understanding of our code base to the edge.

And… it works as well.

Claude does quite some work here:

Claude also explains what goes where, and if you ask, it will explain how to access and inspect the user database.
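For readers who want a feel for the backend half of a login system, here is a minimal Python sketch. It is not the code Claude produced: the choice of SQLite, the `users` table layout, PBKDF2 hashing, and all function names are assumptions for illustration. The key idea is that the database stores a random salt and a slow hash, never the password itself.

```python
import hashlib
import hmac
import os
import sqlite3


def create_users_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) users table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, salt BLOB NOT NULL, password_hash BLOB NOT NULL)"
    )


def _hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 with many iterations makes brute-forcing stolen hashes slow.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)


def register(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Store a new user; return False if the username is already taken."""
    salt = os.urandom(16)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users VALUES (?, ?, ?)",
                (username, salt, _hash_password(password, salt)),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def verify(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Check a login attempt against the stored salt and hash."""
    row = conn.execute(
        "SELECT salt, password_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored_hash = row
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(stored_hash, _hash_password(password, salt))
```

With this layout, inspecting the user database is a one-liner such as `conn.execute("SELECT username FROM users").fetchall()`.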

Now our page really begins to look like an application:

Analysis

I must say that what we achieved here is wild, in terms of both implementation and understanding: it is evident that Claude truly understands our entire code base (which it wrote on its own) and knows exactly what goes where, and its ability to add features, fix bugs, and so on is really extensive.

It is also wild how much you can learn from such a project. Whether you are a non-developer, a junior developer, or an experienced one, you can build yourself a personalized tutorial with a private tutor for whichever skill you want to acquire.

Deployment

I wasn’t initially planning to write about deployment, but this series might encourage some of you to try it yourself, and the deployment part can be challenging.

There are numerous ways to deploy such a project. Many of them claim to be as easy as a few clicks, but it is never that easy, and those few clicks are for people who know what they are doing.

Additionally, our app has several components, e.g. a computer vision model, that might make things a bit harder than expected.

A few deployment options:

Pseudo deployment

  • Ngrok: this is a kind of “pseudo” deployment. This tool lets you expose a local endpoint (in other words, one running on your computer) to the outside world and let your friends try it. A nice bonus is that modern HTML looks good on most platforms, so you can open the URL on your mobile phone and take live photos.

Easy

  • Heroku: actually the easiest option, but it is limited to 500MB instances, and our package will be much bigger because of computer vision packages such as torch.
  • Good old cloud server: this is what I eventually did. The cloud server is the “least cool” solution here, but it’s really the simplest. Running our app on such a server is very similar to running it on a PC; it just requires some fiddling with the AWS console. I’ve also added a gunicorn wrapper for robustness and a domain with SSL to make it secure to use.
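For the cloud server route, gunicorn can read its settings from a Python config file. The sketch below is illustrative only and not my actual configuration: the port, the worker count, the timeout, and the idea of putting a reverse proxy in front for SSL are all assumptions.

```python
# gunicorn.conf.py: illustrative settings, assumed for this example
import multiprocessing

# Listen only on localhost; the assumption is that a reverse proxy
# (such as nginx) sits in front and handles the domain and SSL.
bind = "127.0.0.1:8000"

# Each worker is a separate process that loads its own copy of the app,
# including any torch model, so the count is kept low on an 8GB machine.
workers = min(2, multiprocessing.cpu_count())

# Image processing and external API calls can be slow; this raises the
# worker timeout above gunicorn's 30-second default.
timeout = 120
```

The server would then be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder for the module and variable that hold the web application.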

Devops favourites

  • AWS Elastic Beanstalk
  • Google App Engine
  • Google Cloud Run

What's Next

I genuinely can’t believe we made it this far.

As mentioned, the next step is to deploy this app and make it available to users, as described above.

But there are many more things to do, and at this stage, I believe Claude will be able to complete most of them (some of these features were requested by test users):

  • Adding instructions
  • Rating a book just by name (without an image)
  • Retrieving the highest-rated books
  • Verifying and improving the computer vision and recommender pipelines

Costs

One final note on such apps: nowadays nothing is free, and APIs can get expensive.

  • AWS machine: a t3.large instance with 8GB RAM costs roughly $2 per day.
  • Google Vision API: every uploaded image triggers an API call, and 1000 API calls cost $1.5.
  • Claude Sonnet: costs $3 per million input tokens and $15 per million output tokens. Every recommendation uses roughly 200 input tokens and up to 150 output tokens, so it comes to at most about $2.85 per 1000 calls (200 × $3/1M + 150 × $15/1M ≈ $0.00285 per call), the same order of magnitude as Google Vision.
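A quick way to sanity-check these numbers is to put them in a small calculator. The Python sketch below uses only the prices quoted above; the function names, the 30-day month, and the assumption of exactly one Vision call and one Claude call per recommendation are mine. With the full 200 input and 150 output tokens, the Claude part comes out at the upper end, about $2.85 per 1000 calls; shorter replies bring it down.

```python
# Prices quoted in the post (USD).
SERVER_COST_PER_DAY = 2.0           # t3.large, roughly
VISION_COST_PER_1000_CALLS = 1.5    # Google Vision API
CLAUDE_INPUT_COST_PER_MTOK = 3.0    # per million input tokens
CLAUDE_OUTPUT_COST_PER_MTOK = 15.0  # per million output tokens


def claude_cost_per_1000_calls(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of 1000 Claude calls with the given token counts per call."""
    per_call = (
        input_tokens * CLAUDE_INPUT_COST_PER_MTOK
        + output_tokens * CLAUDE_OUTPUT_COST_PER_MTOK
    ) / 1_000_000
    return per_call * 1000


def monthly_cost(
    recommendations_per_day: int,
    days: int = 30,
    input_tokens: int = 200,
    output_tokens: int = 150,
) -> float:
    """Estimated monthly bill in USD.

    Assumes one Vision call and one Claude call per recommendation,
    which is a simplification of the real pipeline.
    """
    calls = recommendations_per_day * days
    api_cost_per_1000 = VISION_COST_PER_1000_CALLS + claude_cost_per_1000_calls(
        input_tokens, output_tokens
    )
    return SERVER_COST_PER_DAY * days + calls / 1000 * api_cost_per_1000


if __name__ == "__main__":
    print(f"Claude, per 1000 calls: ${claude_cost_per_1000_calls(200, 150):.2f}")
    print(f"Month at 100 recommendations/day: ${monthly_cost(100):.2f}")
```

At a hypothetical 100 recommendations per day, the server ($60 a month) dominates the bill and the APIs add about $13, so for a small app the fixed machine cost matters more than the per-call prices.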

The code for this stage can be found here
