Rechek

A clean iOS app that delivers real-time NYC restaurant inspection data and uses on-device Apple Intelligence to turn complex health reports into simple, bite-sized summaries.

November 2025

  • React Native

  • Expo

  • TypeScript

  • React JS

  • AI-SDK

Lessons from a Small Language Model

I want to share my experience working with Apple Intelligence's on-device foundation model, a Small Language Model (SLM) that runs on Apple's latest devices. The first thing to note is that the model is tiny: only about 3 billion parameters. For context, Google's Gemini 3 is, at the time of writing, estimated to have around 1 trillion parameters. As a result, SLMs have far more limited knowledge, reasoning, and context-handling capabilities, and understanding those limits was key to designing an effective solution.

The Goal

The objective was to take all the inspection violations a restaurant had and have Apple Intelligence summarize them in a way that anyone could quickly understand.

The Naive Solution

My first attempt was to pass the entire JSON blob into the model and ask it for a summary. This one-shot approach often works with bigger models. As you can probably guess, the results were verbose, inconsistent, and not very helpful. It quickly became clear that a different approach was needed.

Less Is More

The NYC Open Data service returns far more data than the summary actually needs. Reducing the input to only the relevant fields made the model's responses faster and cleaner. This helped, but the results were still not consistently reliable.
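As a rough sketch of that reduction step, the function below strips a raw inspection record down to the handful of fields a summary depends on. The field names (`camis`, `dba`, `inspection_date`, `violation_code`, `violation_description`, `critical_flag`) follow the public NYC restaurant inspection dataset; the slim shape and function names are my own illustration, not the app's actual code.

```typescript
// Shape of a raw record from the NYC Open Data restaurant inspection
// dataset. Real records carry many more fields (location, cuisine,
// phone) that the summary never uses.
interface RawInspection {
  camis: string;
  dba: string;
  inspection_date: string;
  violation_code?: string;
  violation_description?: string;
  critical_flag?: string;
  [extra: string]: unknown;
}

// Minimal shape the model actually needs to write a summary.
interface SlimViolation {
  date: string;
  code: string;
  description: string;
  critical: boolean;
}

// Keep only records that describe a violation, and only the fields
// the summary depends on.
function toSlimViolations(records: RawInspection[]): SlimViolation[] {
  return records
    .filter((r) => r.violation_code && r.violation_description)
    .map((r) => ({
      date: r.inspection_date,
      code: r.violation_code!,
      description: r.violation_description!,
      critical: r.critical_flag === "Critical",
    }));
}
```

Everything the model sees after this step is something it might plausibly mention in the summary; everything else is noise it no longer has to ignore.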

Finding the Right Balance

Even with smaller inputs, the model sometimes struggled. The key question became what users really care about in a summary. I decided it should focus on two things: recent violations and serious past violations.

When I asked the SLM to follow this structure, the results were still inconsistent. The model often reported no recent violations even when there were some. Injecting the current date into the prompt and standardizing date formats improved accuracy slightly, but the results remained unreliable.
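The two fixes mentioned above can be sketched in a few lines: normalize every timestamp to a single `YYYY-MM-DD` format and state today's date explicitly in the prompt, so the model never has to parse dates or know what day it is. The function names and prompt wording here are illustrative, not the app's actual prompt.

```typescript
// The dataset returns floating timestamps like "2025-01-02T00:00:00.000".
// Truncating to YYYY-MM-DD gives every date in the prompt one format.
function toIsoDay(timestamp: string): string {
  return timestamp.slice(0, 10); // "2025-01-02T00:00:00.000" -> "2025-01-02"
}

// Inject today's date into the prompt so the model does not have to
// guess it when deciding which violations count as "recent".
function buildPrompt(
  violations: { date: string; description: string }[],
  today: Date
): string {
  const lines = violations
    .map((v) => `- ${toIsoDay(v.date)}: ${v.description}`)
    .join("\n");
  return [
    `Today's date is ${today.toISOString().slice(0, 10)}.`,
    "Summarize these restaurant inspection violations for a diner:",
    lines,
  ].join("\n");
}
```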

My Aha Moment

The breakthrough came when I started preprocessing the data before giving it to the model. Instead of handing the model raw objects, I converted everything into a well-defined XML layout. This simple structure provided the model with exactly the scaffolding it needed.

Example

<RecentInspections>
  <!-- recent inspection nodes here, or placeholder if none -->
</RecentInspections>

<PastInspections>
  <!-- past inspection nodes here, or placeholder if none -->
</PastInspections>
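A sketch of the preprocessing behind that layout: plain code, not the model, decides which inspections are recent (here an assumed twelve-month cutoff) and emits the fixed XML scaffold, so the model only has to summarize each section. The `Violation` node shape and the `<None />` placeholder are illustrative choices.

```typescript
interface Violation {
  date: string; // YYYY-MM-DD
  description: string;
  critical: boolean;
}

// Escape the few characters that would break the XML scaffold.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function toInspectionXml(violations: Violation[], today: Date): string {
  // Assumed cutoff: anything within the last 12 months counts as recent.
  const cutoff = new Date(today);
  cutoff.setFullYear(cutoff.getFullYear() - 1);

  const node = (v: Violation) =>
    `  <Violation date="${v.date}" critical="${v.critical}">${escapeXml(v.description)}</Violation>`;

  const recent = violations.filter((v) => new Date(v.date) >= cutoff);
  const past = violations.filter((v) => new Date(v.date) < cutoff);

  // Always emit both sections; an explicit placeholder stops the model
  // from inventing content for an empty one.
  const section = (tag: string, items: Violation[]) =>
    items.length > 0
      ? `<${tag}>\n${items.map(node).join("\n")}\n</${tag}>`
      : `<${tag}>\n  <None />\n</${tag}>`;

  return [section("RecentInspections", recent), section("PastInspections", past)].join("\n\n");
}
```

Because the recency decision happens in code, the model can no longer misclassify a violation as old or missing; its only job is to phrase what each section already contains.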

Key Recommendations

From this experience, I have distilled three guidelines for working with small language models. Start by assuming the model alone will not solve the problem: first think about what can be done without the SLM. This exercise forces you to focus on what truly requires the model, which reduces its cognitive load and leads to better results.

1. Keep Input Minimal

Only feed the model the data it truly needs. Extra noise will reduce accuracy and increase processing time.

2. Preprocess Whenever Possible

Do as much preprocessing as you can using traditional programming before giving data to the model. Sorting, filtering, and grouping can reduce the model's cognitive load.
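For instance, sorting and grouping are exactly the kind of work worth doing in plain code before the model sees anything. The sketch below (names illustrative) orders violations newest-first and splits critical from non-critical ones, so the model never has to order or classify anything itself.

```typescript
interface Violation {
  date: string; // YYYY-MM-DD
  description: string;
  critical: boolean;
}

// Sort newest-first, then group by severity, before prompting the model.
function prepare(violations: Violation[]): { critical: Violation[]; other: Violation[] } {
  const sorted = [...violations].sort((a, b) => b.date.localeCompare(a.date));
  return {
    critical: sorted.filter((v) => v.critical),
    other: sorted.filter((v) => !v.critical),
  };
}
```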

3. Structure Matters

SLMs perform best when given clear, explicit formats like XML. Providing well-defined scaffolding allows the model to focus on reasoning rather than interpreting messy data.