Hacker News

Congratulations on the launch! Best wishes! Would absolutely love to dive into it soon.

Here are some high-level questions:

- How does it handle failure of individual tasks in the pipeline?

- What if the underlying jobs (e.g. training, dataset extraction, or metrics evaluation) need to run outside the k8s cluster (e.g. on bare metal, Slurm, SageMaker, or even a separate k8s cluster)?

- How does caching work if multiple pipelines can share some common components (e.g. dataset extraction)?



> - How does it handle failure of individual tasks in the pipeline?

At this time there is no handling of failures (Sematic is six weeks old :). In the near future we will add fault-tolerance mechanisms: retries and try/except.

> - What if the underlying jobs need to run outside the k8s cluster?

You are free to launch jobs on third-party platforms from one of your pipeline steps. This is a common pattern: for instance, launching a Spark job, or a training job on a dedicated GPU cluster. In this case, the pipeline step that launches the job (the Sematic function) needs to wait for the third-party job to complete, or pass a reference to the job to a downstream step that does the waiting.
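The launch-then-wait pattern can be sketched in plain Python. `ExternalJob`, `launch_training_job`, and `wait_for_job` are hypothetical stand-ins for a real third-party client (Spark, Slurm, SageMaker, ...), not Sematic or platform APIs.

```python
import time

class ExternalJob:
    """Stand-in handle for a job running on a third-party platform."""
    def __init__(self, ticks_until_done=3):
        self._ticks = ticks_until_done

    def is_done(self):
        # Simulate a status endpoint that eventually reports completion.
        self._ticks -= 1
        return self._ticks <= 0

def launch_training_job():
    """Pipeline step: kick off the external job and return a reference to it."""
    return ExternalJob()

def wait_for_job(job, poll_interval=0.01):
    """Downstream step: poll the external platform until the job completes."""
    while not job.is_done():
        time.sleep(poll_interval)
    return "completed"

# The launching step hands the job reference to the waiting step.
job = launch_training_job()
status = wait_for_job(job)
```

Passing the job reference downstream keeps the launching step short-lived, while the waiting step owns polling and can feed the finished job's outputs to the rest of the pipeline.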

> - How does caching work?

At this time there is no caching (as mentioned, Sematic is very new :). We will implement memoization soon. What you can do is run a data processing pipeline separately and then use the generated dataset as input to other pipelines. This is a common pattern: having a number of sub-pipelines (e.g. a data processing loop, a train/eval loop, a testing/metrics loop) that you can run independently, but also combine into an end-to-end pipeline for automation. Sematic lets you nest pipelines in arbitrary ways, and each sub-pipeline can still have its own entry point for independent execution.
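The planned memoization amounts to keying a step's result on a hash of its inputs, so pipelines that share a component (e.g. dataset extraction) pay its cost once. A minimal sketch, assuming hashable/JSON-serializable inputs; `memoize` and `extract_dataset` are hypothetical names, not Sematic's API.

```python
import hashlib
import json

_cache = {}

def memoize(fn):
    """Cache results keyed on the function name plus a hash of its arguments."""
    def wrapper(*args, **kwargs):
        payload = json.dumps([fn.__name__, args, kwargs], sort_keys=True, default=str)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in _cache:
            _cache[key] = fn(*args, **kwargs)
        return _cache[key]
    return wrapper

calls = {"n": 0}

@memoize
def extract_dataset(source):
    """Hypothetical shared step: expensive dataset extraction."""
    calls["n"] += 1
    return f"dataset-from-{source}"

# Two pipelines requesting the same dataset only trigger one extraction.
a = extract_dataset("s3://bucket/raw")
b = extract_dataset("s3://bucket/raw")
```

A production version would persist the cache (e.g. in the pipeline's metadata store) rather than in-process memory, so memoized results survive across runs.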



