
mtricot · on Sept 24, 2024

Talk about going down memory lane :) The initial name of the project was "conduit"...

mtricot · on Aug 8, 2023

Not at the moment but let me bring that to the team so we can brainstorm what it could look like.

mtricot · on Aug 8, 2023

In the tutorial, we describe one stack to build a specific app. But the stack is made of building blocks that you can swap for others if you need to.

- Airbyte has two self-hosted options: OSS & Enterprise

- Langchain: OSS

- OpenAI: you can host an OSS model if you want to

- Pinecone: there are OSS/self-hosted alternatives
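To make the "swap a building block" idea concrete, here is a minimal sketch that assumes each block is hidden behind a small interface. `Embedder`, `VectorStore`, `HashEmbedder`, `InMemoryStore` and `index_and_search` are hypothetical names for illustration, not the LangChain, OpenAI or Pinecone APIs. The toy hashing embedder and in-memory store only stand in for a hosted or self-hosted model and vector database.

```python
import math
import zlib
from typing import Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


class VectorStore(Protocol):
    def upsert(self, doc_id: str, vector: list[float], text: str) -> None: ...

    def query(self, vector: list[float], k: int) -> list[str]: ...


class HashEmbedder:
    """Hypothetical toy embedder standing in for OpenAI or a self-hosted model."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Hash each token into a bucket: a crude bag-of-words vector.
            vec[zlib.crc32(token.encode()) % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]  # unit length (or all zeros for empty text)


class InMemoryStore:
    """Hypothetical toy store standing in for Pinecone or a self-hosted vector DB."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[list[float], str]] = {}

    def upsert(self, doc_id: str, vector: list[float], text: str) -> None:
        self.rows[doc_id] = (vector, text)

    def query(self, vector: list[float], k: int) -> list[str]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = sorted(
            self.rows.values(),
            key=lambda row: sum(a * b for a, b in zip(vector, row[0])),
            reverse=True,
        )
        return [text for _, text in scored[:k]]


def index_and_search(embedder: Embedder, store: VectorStore,
                     docs: dict[str, str], question: str, k: int = 1) -> list[str]:
    # The pipeline depends only on the two interfaces, so either block can be swapped.
    for doc_id, text in docs.items():
        store.upsert(doc_id, embedder.embed(text), text)
    return store.query(embedder.embed(question), k)
```

Replacing `HashEmbedder` with a wrapper around a real (hosted or self-hosted) embedding model, or `InMemoryStore` with a real vector database client, would not require touching `index_and_search`.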

samspenc · on Aug 9, 2023

> - OpenAI: you can host an OSS model if you want to

Just to confirm: you mean models like Facebook's Llama 2 and variants right? Since OpenAI hasn't released any OSS models.

mtricot · on Aug 9, 2023

correct

zarazas · on Aug 10, 2023

What about the embedding?

mtricot · on Aug 8, 2023

No good reason. Does "it made the post's title too long" work?

replwoacause · on Aug 9, 2023

Works for me!

mtricot · on Aug 8, 2023

Isn't it the dream? Today, a lot of the stack still needs to be built to enable what you're describing. That is actually what we are working toward with this post: what foundations do we need to build so that the UX for the end user is what you're describing? It will take some time to get there :)

mtricot · on Aug 8, 2023

It depends.

Airbyte comes in 3 flavors: OSS, Cloud, Enterprise.

For OSS & Enterprise, data doesn't leave your infrastructure, since Airbyte runs in your infrastructure. For Cloud, you would have to allowlist some of our IPs so that we can access your local db.

mtricot · on Aug 8, 2023

For the purpose of the tutorial that we built, it really comes down to the type of data that you're using.

If you have data with PII:

One option would be to use Airbyte to bring the data into files or a local db rather than directly into the vector store, add an extra step that strips all PII from the data, and then configure Airbyte to move the cleaned files/records to the vector store.
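A minimal sketch of that extra PII-stripping step. It assumes the staged records land in a file with one JSON object per line and that you know which fields hold free text; the file layout, field names and the `redact`/`clean_jsonl` helpers are hypothetical. The regexes are deliberately crude: they only catch emails, US-style SSNs and phone-like digit runs, and will also flag some non-PII such as dates. Real PII detection needs much broader coverage.

```python
import json
import re

# Hypothetical, deliberately crude patterns. Order matters: SSN runs before
# PHONE so that "123-45-6789" is labeled [SSN] rather than [PHONE].
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a placeholder like [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def clean_jsonl(src_path: str, dst_path: str, fields: list[str]) -> int:
    """Read staged records (one JSON object per line), redact the given text
    fields, and write the cleaned records for the vector-store sync."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for name in fields:
                if isinstance(record.get(name), str):
                    record[name] = redact(record[name])
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count
```

The cleaned file would then be the source for the second sync into the vector store, so raw PII never reaches the index.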

The option that jmorgan mentioned is also relevant here: using a "self-hosted" model.

mtricot · on Aug 8, 2023

Thanks! I agree with your point. There is a lot of tuning that needs to happen, including context-aware splitting and other kinds of transformation before the unstructured data gets indexed. This is one of the big challenges of productionizing LLM apps with external data. So far we are using it internally, since the team has experience building these connectors, and it has become a great co-pilot.
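One way to read "context-aware splitting": split on the document's own structure (Markdown headings here) instead of fixed character windows, and carry the heading path with each chunk so it keeps its context once embedded. The sketch below is a hypothetical helper, not what Airbyte or LangChain actually ship; it assumes Markdown input and ignores edge cases such as `#` lines inside code fences.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    context: str  # heading path, e.g. "Guide > Setup"
    text: str


def split_markdown(doc: str, max_chars: int = 500) -> list[Chunk]:
    """Hypothetical splitter: one or more chunks per section, tagged with its headings."""
    chunks: list[Chunk] = []
    path: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        buffer.clear()
        if not body:
            return
        context = " > ".join(path)
        # Pack whole paragraphs into chunks of at most max_chars; a single
        # oversized paragraph is kept whole rather than cut mid-sentence.
        current = ""
        for para in re.split(r"\n\s*\n", body):
            para = para.strip()
            if current and len(current) + 2 + len(para) > max_chars:
                chunks.append(Chunk(context, current))
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(Chunk(context, current))

    for line in doc.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before changing the path
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            buffer.append(line)
    flush()
    return chunks
```

At indexing time, one option is to embed `f"{chunk.context}\n{chunk.text}"` so that a short chunk like "Step two." still carries the section it came from.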

The great thing about plugging this whole stack together is that we get refreshed data as more issues/connectors get created.
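The refresh comes from syncing incrementally rather than re-indexing everything. Roughly, Airbyte's incremental modes track a cursor field and deduplicate on a primary key; the sketch below mimics that idea in plain Python. `SyncState`, `incremental_sync` and the `id`/`updated_at`/`text` record fields are hypothetical, and the in-memory dict stands in for the vector store.

```python
from dataclasses import dataclass, field


@dataclass
class SyncState:
    # Highest updated_at seen so far. ISO-8601 strings sort chronologically
    # only if they all share the same format and timezone (assumed here).
    cursor: str = ""
    index: dict[str, str] = field(default_factory=dict)  # doc id -> indexed text


def incremental_sync(state: SyncState, records: list[dict]) -> int:
    """Upsert only records newer than the stored cursor; return how many changed."""
    fresh = [r for r in records if r["updated_at"] > state.cursor]
    for record in sorted(fresh, key=lambda r: r["updated_at"]):
        # Keyed by primary key, so an edited issue replaces its old entry
        # instead of being indexed twice. A real pipeline would re-embed here.
        state.index[record["id"]] = record["text"]
    if fresh:
        state.cursor = max(r["updated_at"] for r in fresh)
    return len(fresh)
```

The strict `>` means a record written with exactly the cursor timestamp after a sync would be skipped; comparing with `>=` and relying on the idempotent upsert avoids that, at the cost of reprocessing a few rows each run.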

mtricot · on Aug 8, 2023

I am sure we can build something around that. Going to take a look at it. Thanks for mentioning it.

mtricot · on Aug 8, 2023

Shouldn't have any limits here. Can you let us know how it goes?

johndhi · on Aug 8, 2023

hmm, as a person of low technical savvy, do you expect there will be a point at which I can upload a large text file and have you do all the work to let me chat with it? I'd pay for that today if it exists, but can't put a ton of effort into building/implementing something myself.

tomr75 · on Aug 9, 2023

chatpdf..?

johndhi · on Aug 17, 2023

chatpdf doesn't support my volume -- files are too big.