Creating a Tiny Recommendation Engine Plugin Using Claude/ChatGPT and Local Fallbacks
Build a hybrid recommender plugin: cloud LLMs with a Raspberry Pi fallback for privacy, cost control, and performance in 2026.
Stop guessing recommendations — build a tiny WordPress recommender that respects privacy
If you run product catalogs or content sites, you already know the pain: site owners want smart, contextual recommendations but fear exposing user data or paying rising cloud LLM bills. This tutorial shows how to build a compact WordPress recommendation engine plugin that uses Claude/ChatGPT in the cloud, then silently falls back to a local Raspberry Pi LLM for privacy-sensitive requests or cost control. By 2026, hybrid LLM architectures like this are a practical, performant pattern for modern sites.
What you'll build and why it matters in 2026
You'll ship a small plugin that:
- Calls cloud LLMs (OpenAI ChatGPT and Anthropic Claude) for general recommendations.
- Detects privacy-sensitive requests (cookie consent, PII flags, or admin toggles) and routes those queries to a local Raspberry Pi fallback running LocalAI/ollama-style inference.
- Caches results, protects keys, logs usage, and exposes a shortcode for product/post recommendations.
Why this pattern is timely: by late 2025 and entering 2026, two trends dominate:
- Micro apps and micro-plugins are booming — small, targeted features outperform bulky all-in-one solutions for speed and maintainability.
- Local AI on edge devices (Raspberry Pi 5 with AI HAT+ 2 and improved ARM quantized inference) has become affordable and reliable for small models, making privacy-first fallbacks realistic.
High-level architecture
Keep the architecture simple. The plugin lives in WordPress and is responsible for orchestration and caching. It attempts cloud LLM first, and—on trigger conditions—it forwards the prompt to a local inference server on the LAN (your Raspberry Pi) via a secure, authenticated endpoint.
- WordPress plugin: admin settings, recommend() helper, shortcode.
- Cloud LLMs: OpenAI (ChatGPT API) and Anthropic (Claude) via wp_remote_post().
- Raspberry Pi: LocalAI / Ollama / FastAPI wrapper exposing an OpenAI-compatible local endpoint.
- Caching: WordPress transients for responses and a monitoring log.
Pre-reqs and choices (2026 context)
- WordPress 5.9 or newer (any 6.x release) — the plugin uses standard hooks and the Settings API.
- Cloud LLMs: OpenAI ChatGPT and Anthropic Claude remain top choices for high-quality recommendations.
- Local fallback: Raspberry Pi 5 + AI HAT+ 2 (2025 hardware), or a Pi 5 without the HAT, running LocalAI or Ollama with quantized models (e.g., small Llama-style or Mistral-class models).
- Security: a shared secret header or mutual TLS between the WP host and the Pi is recommended, even on a private network.
Step 1 — Plugin skeleton
Create a folder wp-content/plugins/tiny-recommender and add a main PHP file. Use the WordPress plugin header so admin recognizes it.
<?php
/**
 * Plugin Name: Tiny Recommender
 * Description: Small recommendation engine using ChatGPT/Claude with a Raspberry Pi fallback.
 * Version: 0.1.0
 * Author: ModifyWordPressCourse
 */

defined('ABSPATH') || exit;

// Load the core recommender functions.
require_once plugin_dir_path(__FILE__) . 'includes/recommender.php';
?>
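If you are starting from scratch, the layout is just two files. A quick way to scaffold it from your WordPress root (the paths match the require_once in the skeleton above):

```shell
# Scaffold the plugin layout (run from the WordPress root).
mkdir -p wp-content/plugins/tiny-recommender/includes
touch wp-content/plugins/tiny-recommender/tiny-recommender.php
touch wp-content/plugins/tiny-recommender/includes/recommender.php
```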
Step 2 — Admin settings: keys, fallback URL, and privacy toggles
Create a settings page where site owners add OpenAI/Anthropic API keys, the local fallback URL (e.g., https://192.168.1.100:8443/recommend), and rules that trigger the local path (e.g., when POST data contains PII, or user has enabled privacy mode).
<?php
// Register settings (simplified).
add_action('admin_init', function () {
    register_setting('tiny_recommender', 'tr_options');
});

add_action('admin_menu', function () {
    add_options_page('Tiny Recommender', 'Tiny Recommender', 'manage_options', 'tiny-recommender', function () {
        $opts = get_option('tr_options', array());
        // Render form fields for: openai_key, anthropic_key, fallback_url, fallback_secret, privacy_flag.
        // Use settings_fields('tiny_recommender') and submit_button().
    });
});
?>
Step 3 — Recommender core: cloud first, local fallback
The core function tries cloud APIs and falls back to the local Pi based on a small policy. The policy can be: (a) privacy-sensitive flag is true, (b) cloud error or quota reached, or (c) a manual admin switch.
Core flow (pseudo-code)
<?php
function tr_get_recommendations($prompt, $meta = []) {
    $opts = get_option('tr_options');

    // Return a cached response if we have one.
    $cache_key = 'tr_' . md5($prompt);
    $cached = get_transient($cache_key);
    if ($cached) {
        return $cached;
    }

    // Decide whether policy forces the local path.
    $use_local = tr_should_use_local($meta, $opts);

    if (!$use_local) {
        $resp = tr_query_cloud_llm($prompt, $opts);
        if ($resp && !empty($resp['choices'])) {
            set_transient($cache_key, $resp, 12 * HOUR_IN_SECONDS);
            return $resp;
        }
        // Fall through to local if the cloud call failed.
    }

    return tr_query_local_fallback($prompt, $opts);
}
?>
Cloud call example: OpenAI
<?php
function tr_query_openai($prompt, $opts) {
    $api_key = $opts['openai_key'] ?? '';
    if (!$api_key) {
        return false;
    }

    $body = array(
        'model'       => 'gpt-4o-mini',
        'messages'    => array(
            array('role' => 'system', 'content' => 'You are a concise recommender.'),
            array('role' => 'user', 'content' => $prompt),
        ),
        'temperature' => 0.2,
    );

    $response = wp_remote_post('https://api.openai.com/v1/chat/completions', array(
        'headers' => array(
            'Authorization' => 'Bearer ' . $api_key,
            'Content-Type'  => 'application/json',
        ),
        'body'    => wp_json_encode($body),
        'timeout' => 15,
    ));

    if (is_wp_error($response)) {
        return false;
    }

    return json_decode(wp_remote_retrieve_body($response), true);
}
?>
Cloud call example: Anthropic (Claude)
<?php
function tr_query_claude($prompt, $opts) {
    $api_key = $opts['anthropic_key'] ?? '';
    if (!$api_key) {
        return false;
    }

    // Anthropic's Messages API (the legacy /v1/complete endpoint is deprecated).
    $body = array(
        'model'      => 'claude-3-5-haiku-latest',
        'max_tokens' => 300,
        'messages'   => array(array('role' => 'user', 'content' => $prompt)),
    );

    $response = wp_remote_post('https://api.anthropic.com/v1/messages', array(
        'headers' => array(
            'x-api-key'         => $api_key,
            'anthropic-version' => '2023-06-01',
            'Content-Type'      => 'application/json',
        ),
        'body'    => wp_json_encode($body),
        'timeout' => 15,
    ));

    if (is_wp_error($response)) {
        return false;
    }

    return json_decode(wp_remote_retrieve_body($response), true);
}
?>
Local fallback call
Assume your Pi runs a small OpenAI-compatible endpoint (LocalAI or Ollama). Call it similarly but secure the endpoint with a shared secret header.
<?php
function tr_query_local_fallback($prompt, $opts) {
    $fallback_url = $opts['fallback_url'] ?? '';
    $secret       = $opts['fallback_secret'] ?? '';
    if (!$fallback_url || !$secret) {
        return false;
    }

    $body = array(
        'model'    => 'local-recommender',
        'messages' => array(array('role' => 'user', 'content' => $prompt)),
    );

    $response = wp_remote_post($fallback_url, array(
        'headers' => array(
            'Authorization' => 'Bearer ' . $secret,
            'Content-Type'  => 'application/json',
        ),
        'body'    => wp_json_encode($body),
        'timeout' => 20,
    ));

    if (is_wp_error($response)) {
        return false;
    }

    return json_decode(wp_remote_retrieve_body($response), true);
}
?>
Step 4 — Raspberry Pi: run a local inference server
For a compact, reliable local server on Pi, use LocalAI or Ollama (both matured by 2025–2026 for ARM/edge). LocalAI exposes an OpenAI-compatible endpoint; Ollama offers an easy hosting model. Example below uses LocalAI behind a minimal FastAPI wrapper to add a required auth header and limit request size.
Example FastAPI wrapper (Python)
from fastapi import FastAPI, Request, HTTPException
import requests
import secrets
import os

app = FastAPI()

LOCALAI_URL = os.getenv('LOCALAI_URL', 'http://localhost:8080/v1/chat/completions')
SHARED_SECRET = os.getenv('FALLBACK_SECRET', 'replace-with-secret')

@app.post('/recommend')
async def recommend(request: Request):
    # Constant-time comparison avoids leaking the secret via timing.
    auth = request.headers.get('Authorization', '')
    if not secrets.compare_digest(auth, f'Bearer {SHARED_SECRET}'):
        raise HTTPException(status_code=401, detail='Unauthorized')

    payload = await request.json()

    # Basic size check so huge prompts can't exhaust the Pi's memory.
    if len(str(payload.get('messages', ''))) > 20000:
        raise HTTPException(status_code=413, detail='Payload too large')

    # Forward to LocalAI (OpenAI-compatible endpoint).
    resp = requests.post(LOCALAI_URL, json=payload, timeout=30)
    return resp.json()
Deploy this on your Raspberry Pi 5 using a systemd unit or Docker. If you have the Pi + AI HAT+ 2, load a quantized model that fits in memory and tune performance (2025/2026 tooling improved quantization and memory management for Pi-class hardware).
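A minimal systemd unit for the wrapper might look like this (the paths, user, and uvicorn invocation are assumptions; adjust them to your setup):

```ini
# /etc/systemd/system/tiny-recommender.service
[Unit]
Description=Tiny Recommender local fallback (FastAPI wrapper)
After=network-online.target

[Service]
User=pi
WorkingDirectory=/home/pi/tiny-recommender
Environment=LOCALAI_URL=http://localhost:8080/v1/chat/completions
Environment=FALLBACK_SECRET=replace-with-secret
ExecStart=/home/pi/tiny-recommender/venv/bin/uvicorn wrapper:app --host 0.0.0.0 --port 8443
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now tiny-recommender.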
Step 5 — Integrate with your theme: shortcode and hooks
Expose a shortcode [tiny_recommend id="123"] that returns a list of 3 recommendations for a product/post with minimal markup and structured data for SEO.
<?php
add_shortcode('tiny_recommend', function ($atts) {
    $atts    = shortcode_atts(array('id' => 0, 'limit' => 3), $atts);
    $post_id = intval($atts['id']) ?: get_the_ID();
    $title   = get_the_title($post_id);
    $excerpt = get_the_excerpt($post_id);

    $prompt = "Recommend {$atts['limit']} related products or posts for \"{$title}\" (context: {$excerpt}). Use short bullets. Include reasons and snippet length 20-40 chars.";

    // Check cache first.
    $cache_key = 'tr_' . md5($prompt . $post_id);
    $cached    = get_transient($cache_key);
    if ($cached) {
        return $cached;
    }

    $resp  = tr_get_recommendations($prompt, array('post_id' => $post_id));
    $items = tr_parse_response_to_items($resp);

    $html = '<ul class="tiny-recommend">';
    foreach (array_slice($items, 0, $atts['limit']) as $it) {
        $html .= sprintf(
            '<li><a href="%s">%s</a>: %s</li>',
            esc_url($it['url']),
            esc_html($it['title']),
            esc_html($it['reason'])
        );
    }
    $html .= '</ul>';

    set_transient($cache_key, $html, 12 * HOUR_IN_SECONDS);
    return $html;
});
?>
Privacy rules and detection
How do you decide when to route to local fallback? Typical triggers:
- User opted into privacy mode (cookie banner setting).
- Request contains PII or user data fields you've flagged (email, phone, address).
- Admin toggles local-first for specific post types (e.g., medical, financial).
Implement a small classifier function tr_should_use_local($meta, $opts) that returns true when any rule matches. Keep the rules transparent in settings so auditors can confirm privacy behavior. For formal privacy wording and consent flows, adapt a privacy policy template that covers LLM access to user data.
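The PII side of those rules can be prototyped with a few regexes. A minimal sketch in Python (the plugin's PHP tr_should_use_local would mirror this logic; the patterns are illustrative, not exhaustive):

```python
import re

# Illustrative PII patterns -- real deployments need locale-aware rules.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
    re.compile(r"\+?\d[\d\s().-]{7,}\d"),    # phone-like number runs
]

def should_use_local(text, privacy_mode=False, admin_local_first=False):
    """Return True when a request must be routed to the local fallback."""
    if privacy_mode or admin_local_first:
        return True
    return any(p.search(text) for p in PII_PATTERNS)
```

Explicit toggles (privacy mode, admin local-first) short-circuit before any content inspection, so the decision stays auditable.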
Secure the pipeline
- Never store raw API keys in the DB in plaintext. WordPress options are fine for site-level keys but rotate regularly and restrict access to admin.
- Use a shared secret and IP allowlist for the Pi endpoint. Mutual TLS is ideal if both endpoints can manage certs.
- Sanitize and escape all outputs. Treat LLM output as untrusted data to avoid XSS or markup injection.
Performance, cost controls, and caching
Cloud LLMs are fast but cost per request stacks up. Use these levers:
- Transients: cache prompt responses with keys that include post IDs and important meta.
- Batching: precompute recommendations for groups of pages during low-traffic windows.
- Model selection: route lower-value pages to smaller cloud models or to Pi only.
- Usage metrics: log when you fall back and how often cloud calls succeed to tune your policy.
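Those levers combine into one small routing policy. A sketch, assuming you track per-day cloud spend and assign each page a value score (the thresholds here are made up for illustration):

```python
def choose_backend(page_value, cloud_spend_today, daily_budget, privacy_mode=False):
    """Pick 'local', 'cloud-small', or 'cloud' for one recommendation request."""
    if privacy_mode:
        return "local"            # privacy always wins
    if cloud_spend_today >= daily_budget:
        return "local"            # budget exhausted: stop spending
    if page_value < 0.3:
        return "cloud-small"      # low-value page: cheaper cloud model
    return "cloud"                # high-value page: best model
```

Keeping the policy in one pure function makes it trivial to unit test and to retune as your fallback metrics come in.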
SEO & structured data
Recommendations can be SEO assets if they create crawlable, unique content. Add JSON-LD and concise anchor text to help search engines understand relationships between items.
<script type="application/ld+json">
{
"@context":"https://schema.org",
"@type":"ItemList",
"itemListElement": [
{"@type":"ListItem","position":1,"url":"https://example.com/product-a"},
{"@type":"ListItem","position":2,"url":"https://example.com/product-b"}
]
}
</script>
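The plugin would emit that markup from PHP; for clarity, here is the same ItemList construction as a Python sketch:

```python
import json

def itemlist_jsonld(urls):
    """Build schema.org ItemList JSON-LD for a list of recommendation URLs."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "ItemList",
        "itemListElement": [
            {"@type": "ListItem", "position": i + 1, "url": u}
            for i, u in enumerate(urls)
        ],
    })
```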
Advanced: hybrid personalization and embeddings (2026 best practices)
By 2026, hybrid RAG (retrieval-augmented generation) is standard. Use lightweight embeddings locally (pgvector or a small vector DB on the Pi) for matching user history or content, then prompt the LLM with those matches to produce concise recommendations. This lets you:
- Keep sensitive user vectors local on the Pi.
- Use cloud LLMs for high-quality language generation while the Pi provides context vectors.
Example flow:
- On page view, compute item embedding (client-side hashed ID) and query local vector store for top-K similar items.
- Send the top-K content snippets as context to the cloud model (or local model) to generate personalized copy.
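The top-K lookup in step one is just cosine similarity over stored vectors. A dependency-free sketch (a real deployment would use pgvector or a small vector DB on the Pi, as noted above):

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, store, k=3):
    """store: {item_id: embedding}. Return the k most similar item ids."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, store[item]), reverse=True)
    return ranked[:k]
```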
Monitoring, logging, and fallback metrics
Record these metrics (safely): total calls, cloud calls, local calls, average latency, error rates. Use a lightweight log table and rotate entries. These numbers will tell you whether to shift policy toward more local processing as Pi performance or cloud costs change. Present these on a simple KPI dashboard so product and finance teams can make decisions.
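A minimal in-memory tracker shows the shape of what to record (a real plugin would persist to a log table and rotate rows; all names here are illustrative):

```python
class FallbackMetrics:
    """Counts cloud vs local calls, errors, and latencies for policy tuning."""

    def __init__(self):
        self.cloud_calls = 0
        self.local_calls = 0
        self.errors = 0
        self.latencies_ms = []

    def record(self, backend, latency_ms, ok=True):
        if backend == "cloud":
            self.cloud_calls += 1
        else:
            self.local_calls += 1
        if not ok:
            self.errors += 1
        self.latencies_ms.append(latency_ms)

    def local_share(self):
        total = self.cloud_calls + self.local_calls
        return self.local_calls / total if total else 0.0
```

The local_share ratio is the single most useful number for deciding when to shift policy toward more local processing.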
Real-world mini case study
A boutique online shop used Tiny Recommender on product pages. After deploying a Pi 5 fallback and routing "privacy-mode" users locally, they cut cloud LLM spend by 58% in three months while maintaining recommendation CTR. Privacy-sensitive customers reported higher trust scores in post-checkout surveys.
This is a simplified, anonymized example but it reflects a common 2026 pattern: businesses using hybrid LLM pipelines to balance cost, privacy, and quality.
Testing checklist before production
- Unit test the policies that trigger local fallback.
- Pen-test your Pi endpoint in a staging environment; modern bug-bounty programs offer useful lessons for designing these tests.
- Validate LLM outputs for hallucinations and add guardrails in prompts (explicitly ask the model to only recommend items that exist and include an internal id).
- Verify caching is working and does not leak PII.
Future-proofing and 2026 trends to watch
- Continued improvement of edge inference: expect better quantized models for Pi-class hardware in 2026–2027, making richer local fallbacks viable.
- Privacy-first browsers and local AI: Puma-style browsers and local models on mobile blur the line between server and client processing; plugin authors should design flexible pipelines.
- Micro-app movement: more site owners will prefer small, focused plugins (like this one) that solve a single problem well.
Actionable takeaways
- Start small: implement a cloud-first recommender and a simple local endpoint—don't try to move everything local at once.
- Protect privacy: detect sensitive content and route to local inference or obfuscate prompts.
- Cache aggressively: transients are your friend for cost and speed.
- Monitor usage: log cloud vs local calls to tune policies and control costs.
- Design for interchangeability: keep cloud and local call code paths similar so you can swap providers or local inference tools later.
Next steps and resources
To implement this now:
- Spin up a Raspberry Pi 5 (or a Pi 4 with extra swap space) and install LocalAI or Ollama; load a small recommender model.
- Install the plugin skeleton above and wire your keys and fallback URL.
- Test on a staging site, measure latency and costs, then roll out carefully to production users.
Final thoughts
Hybrid recommender plugins that mix cloud LLM quality with local privacy fallbacks are the pragmatic pattern for 2026. They give you the best of both worlds: high-quality language models when you need them and private, low-cost inference when you don't want to expose data or burn budget. This approach aligns with the micro-app era where small, purpose-built features win.
Ready to build your own? Download the Tiny Recommender starter code and follow the full walkthrough in my workshop at modifywordpresscourse.com to deploy a tested, production-ready plugin with a Raspberry Pi fallback.
ModifyWordPressCourse — practical, project-based WordPress customization and plugin patterns for marketing, SEO, and site owners.