In the tutorial, we describe one stack for building a specific app. But the stack is made of building blocks that you can swap for others if you need to.
- Airbyte has two self-hosted options: OSS & Enterprise
- LangChain: OSS
- OpenAI: you can host an OSS model if you want to
- Pinecone: there are OSS/self-hosted alternatives
Isn't it the dream? Today, a lot of the stack still needs to be built to enable what you're describing. That is actually what we are doing with this post: working out what foundations we need to build so that the UX for the end user is what you're describing. It will take some time to get there :)
Airbyte comes in 3 flavors: OSS, Cloud, Enterprise.
For OSS & Enterprise, data doesn't leave your infra since Airbyte is running in your infrastructure.
For Cloud, you would have to allowlist some of our IPs so that we can access your local db.
For the purpose of the tutorial that we built, it really comes down to the type of data that you're using.
If you have data with PII:
One option would be to use Airbyte to bring the data into files or a local db rather than directly into the vector store, add an extra step that strips all PII from the data, and then configure Airbyte to move the clean files/records to the vector store.
The option that jmorgan mentioned is also relevant here: using a "self-hosted" model.
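To make the PII-stripping step in the first option more concrete, here is a rough sketch of what it could look like. It is not part of the tutorial or of Airbyte: the patterns and the names `PII_PATTERNS`, `redact_pii` and `clean_record` are hypothetical, and regexes alone will miss a lot of real-world PII (names, addresses, IDs), so treat it as a starting point only.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs far more than this.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace every match of each pattern with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def clean_record(record: dict) -> dict:
    """Redact string fields of a record; non-string fields are passed through unchanged."""
    return {
        key: redact_pii(value) if isinstance(value, str) else value
        for key, value in record.items()
    }
```

You would run something like `clean_record` over the intermediate files or local db rows, and only then let the second sync push the cleaned records to the vector store.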
Thanks! I agree with your point. There is a lot of tuning that needs to happen, including context-aware splitting and any other kind of transformation, before the unstructured data gets indexed. This is one of the big challenges of productionizing LLM apps with external data. So far we are using it internally, since the team has experience building these connectors, and that makes it a great co-pilot.
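As a simple illustration of what context-aware splitting can mean, the sketch below packs whole paragraphs into chunks instead of cutting at a fixed character offset. It is an assumption for the sake of example, not what the tutorial does: `split_by_paragraph` and `max_chars` are hypothetical names, and real pipelines usually split on richer structure (headings, code blocks, issue comments).

```python
def split_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A paragraph is never cut in the middle; one that is longer than
    max_chars on its own is kept whole in this sketch.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            # Still fits: keep growing the current chunk.
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk and start a new one.
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded and indexed separately, so a retrieved chunk always contains complete paragraphs.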
The great thing we get by plugging this whole stack together is that we get all the refreshed data as more issues/connectors get created.
hmm, as a person of low technical savvy, do you expect there will be a point at which I can upload a large text file and have you do all the work to let me chat with it? I'd pay for that today if it exists, but can't put a ton of effort into building/implementing something myself.