I use Emacs for most of my work related to coding and technical writing.
I've been running phind-v2-codellama and openhermes using ollama and gptel, as well as GitHub's Copilot. I like how you can send an arbitrary region to an LLM and ask questions about it. Of course the UX is at an early stage, but just imagine if a foundation model could take all the context (i.e. your orgmode files and open file buffers) and use tools like LSP.
I just stopped worrying and succumbed to https://github.com/emacs-evil/evil.
Now I mostly just fiddle with orgmode configs to generate nice-looking HTML and PDFs.
I've been using Spack for a while to manage my machine learning package dependencies. It lets me quickly spin up projects with complex dependencies (my current environment has 329 packages built ...). It's pretty easy to use with containers, and it makes evaluating and migrating to different PyTorch/CUDA versions easy.
Congratulations on the launch! Best wishes!
Would absolutely love to dive into it soon.
Here are some high level questions:
- How does it handle failure of individual tasks in the pipeline?
- What if the underlying jobs (e.g. training or dataset extraction or metrics evaluation) need to run outside the k8s cluster (e.g. running bare-metal, slurm, sagemaker, or even a separate k8s cluster)?
- How does caching work if multiple pipelines can share common components (e.g. dataset extraction)?
> - How does it handle failure of individual tasks in the pipeline?
At this time there is no handling of failures (Sematic is 6 weeks old :). In the near future we will add fault-tolerance mechanisms: retries and try/except.
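To illustrate the retry semantics mentioned above, here is a minimal generic sketch of a retry wrapper with backoff. This is not Sematic's actual API; `with_retries` and its parameters are illustrative names.

```python
import time

def with_retries(fn, max_attempts=3, backoff_seconds=1.0):
    """Call fn(), retrying on any exception with linear backoff.

    A generic sketch of retry-style fault tolerance, not Sematic's API.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait a bit longer after each failed attempt.
                time.sleep(backoff_seconds * attempt)
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

A real orchestrator would typically also let you restrict which exception types are retried and cap the total wait time.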
> - What if the underlying jobs need to run outside the k8s cluster?
You are free to launch jobs on third-party platforms from one of your pipeline steps. This is a pretty common pattern, for instance launching a Spark job, or a training job on a dedicated GPU cluster. In this case, the pipeline step that launches the job (the Sematic function) needs to wait for the third-party job to complete, or pass a reference to the job to a downstream step that will do the waiting.
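The submit-and-wait pattern described above can be sketched as follows. `submit_job` and `get_job_status` are hypothetical client calls standing in for whatever third-party API you are targeting (Spark, Slurm, SageMaker, etc.); they are not part of Sematic.

```python
import time

# Hypothetical client calls standing in for a third-party platform's API.
# In a real pipeline these would come from e.g. a Spark or Slurm client.
def submit_job(config):
    return "job-0"

def get_job_status(job_id):
    return "SUCCEEDED"

def run_external_training(config, poll_seconds=30):
    """A pipeline step that hands work off to an external scheduler
    and blocks until the job reaches a terminal state."""
    job_id = submit_job(config)
    while True:
        status = get_job_status(job_id)
        if status in ("SUCCEEDED", "FAILED"):
            break
        time.sleep(poll_seconds)
    if status == "FAILED":
        raise RuntimeError(f"external job {job_id} failed")
    # Return a reference to the job so downstream steps can locate its outputs.
    return job_id
```

Alternatively, as noted above, the step can return the job reference immediately and let a downstream step do the polling.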
> - How does caching work?
At this time there is no caching (as mentioned, Sematic is very new :). We will implement memoization soon. What you can do is run a data processing pipeline separately and then use the generated dataset as input to other pipelines.
This is a pretty common pattern: having a number of sub-pipelines (e.g. a data processing loop, a train/eval loop, a testing/metrics loop, etc.) that you can run independently, but that you can also put together in an end-to-end pipeline for automation. Sematic lets you nest pipelines in arbitrary ways, and each sub-pipeline can still have its own entry point for independent execution.
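The nesting pattern above can be sketched with plain functions. The step bodies are placeholders and this is not Sematic's decorator API; the point is only that each sub-pipeline is independently callable while the end-to-end pipeline composes them.

```python
def process_data(raw_path):
    """Sub-pipeline: turn raw data into a dataset (placeholder logic)."""
    return {"source": raw_path, "rows": 100}

def train_and_eval(dataset):
    """Sub-pipeline: train on a dataset and report a metric (placeholder)."""
    return {"model": "m-0", "accuracy": 0.9, "rows_used": dataset["rows"]}

def end_to_end(raw_path):
    """End-to-end pipeline composed by nesting the sub-pipelines.

    process_data and train_and_eval remain usable as standalone entry
    points, e.g. to reuse a previously generated dataset elsewhere.
    """
    dataset = process_data(raw_path)
    return train_and_eval(dataset)
```

Running `process_data` on its own and feeding its output into other pipelines is exactly the interim caching workaround described above.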
Is there a plan for tighter integration with k8s, potentially in a multi-cluster/federated setting? It's a lot easier to get buy-in for Ray adoption from infra teams when k8s is the centralized compute substrate.
Great work and kudos to the Ray team!
It's definitely a fresh take with a lot of lessons learned from previous generations (e.g. Spark).
There are a few nice features I wish Ray would eventually get to.
On the user experience side, it would be nice to have task-level logs: it's often easier for users to reason at the task level, especially when the task is a facade that triggers other complicated library/subprocess calls.