The Shrine | RAG Architecture
v2.0 RefactorInstead of relying on a heavy database, I designed a lightweight, split-architecture system. When a user asks a question, the Flask server first pings the Google Cloud API to convert that text into vector data.
It then uses NumPy to perform a high-speed mathematical search against my local document store. Finally, it combines that retrieved context with a custom prompt to generate a unique, in-character response.