Hacker News | kpw94's comments

> Basically, I was told to make it so that my phone's camera could see something on the screen and my desk at the same time without washing out

+1. The low-tech version of this I've heard and I've been doing is:

Hold a printed sheet of white paper right next to your monitor, and adjust the monitor's brightness until it matches the sheet.

This of course requires good overall room lighting, where the printed paper would be pleasant to read in the first place, whether it's daytime or evening/night.


I think this was what I was told the first time. The advantage of taking a picture with my phone's camera is that it kind of made it obvious just how much brighter the screen was than the paper.

Which, fair enough, it may be obvious to others just from scanning their eyes between screen and paper. I've been surprised at how readily people accept the time their eyes need to adjust to a super bright screen. It's almost like it doesn't register with them.


There's some overlap with bias lighting here - good overall room lighting works if you've got good daylight, but it's much easier to get bright bias lighting at night than to light up the entire room.

Per https://github.com/QwenLM/Qwen3.5, more are coming:

> News

> 2026-02-16: More sizes are coming & Happy Chinese New Year!


> However Germany and it's infrastructure can not be compared to the Netherlands. I refuse to take trains through that country anymore.

In which country are the trains bad, the Netherlands or Germany? Care to elaborate why? Is it punctuality? Strikes? Decaying infrastructure?


Yeah I see now how that was unclear.

I was talking about Germany's infrastructure. Last year I had three separate trips turn into chaos because of how broken their system is. Broken trains, broken track infrastructure, etc. Think multiple hours of delay on each trip rather than just 10 minutes.

The Dutch system is very reliable in contrast.


Very cool! And important for sure, thank you.

Few questions:

- is the stack to index those open source?

- is there some standardized APIs each municipality provides, or do you go through the tedious task of building a per-municipality crawling tool?

- how often do you refresh the data? I checked a city: it has meeting minutes up to 6/17, but the official website has more recent minutes (up to 12/2 at least)


Thanks for asking!

- The framework for crawling is open-source. https://github.com/civicband

- There is absolutely not a standardized API for nearly any of this. I build generalized crawlers when I can, and custom crawlers when I need to.

- Can you let me know which city? The crawlers run for every municipality at least once every day, so that's probably a bug
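The generalized-crawler-with-per-municipality-overrides pattern described above might look roughly like this sketch (all names here are made up for illustration; none of them come from CivicBand's actual code):

```python
# Hypothetical sketch: dispatch to a custom crawler when one exists,
# otherwise fall back to the generic one.
def generic_crawler(url):
    """Fallback: scrape a standard agenda/minutes listing page."""
    return f"generic scrape of {url}"

def oakland_crawler(url):
    """Custom crawler for a site that needs special handling."""
    return f"custom scrape of {url}"

# Per-municipality overrides; everything else uses the generic path.
CUSTOM = {"oakland": oakland_crawler}

def crawl(municipality, url):
    return CUSTOM.get(municipality, generic_crawler)(url)

print(crawl("oakland", "https://example.gov"))    # custom path
print(crawl("smallville", "https://example.gov")) # generic fallback
```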


> I've personally decided to just rent systems with GPUs from a cloud provider and setup SSH tunnels to my local system.

That's a good idea!

Curious about this, if you don't mind sharing:

- what's the stack? (Do you run llama.cpp on that rented machine, for example?)

- what model(s) do you run there?

- what's your rough monthly cost? (Does it come out much cheaper than calling the equivalent paid APIs?)


I ran ollama first because it was easy, but now I download the source and build llama.cpp on the machine. I don't bother persisting a file system between runs on the rented machine; I build llama.cpp every time I start up.

I am usually just running gpt-oss-120b or one of the Qwen models. Sometimes Gemma? These are mostly "medium" sized in terms of memory requirements; I'm usually trying unquantized models that will easily run on a single 80-ish GB GPU, because those are cheap.

I tend to spend $10-$20 a week. But I am almost always prototyping or testing an idea for a specific project that doesn't require me to run 8 hrs/day. I don't use the paid APIs for several reasons but cost-effectiveness is not one of those reasons.


I know you say you don't use the paid APIs, but renting a GPU is something I've been thinking about, and I'd be really interested in knowing how this compares with paying by the token. I think gpt-oss-120b is $0.10/input and $0.60/output per million tokens on Azure. In my head this could go a long way, but I haven't used gpt-oss agentically long enough to really understand usage. Just wondering if you know / would be willing to share your typical usage and token spend on that dedicated hardware?
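Some back-of-the-envelope math on that comparison (the per-token prices are the Azure figures quoted above, and the ~$2/hr GPU rate is from elsewhere in this thread; both are assumptions, not authoritative pricing):

```python
# Hypothetical break-even: rented GPU vs. pay-per-token for gpt-oss-120b.
IN_PRICE = 0.10 / 1e6   # $ per input token (Azure figure quoted above)
OUT_PRICE = 0.60 / 1e6  # $ per output token
GPU_HOURLY = 2.00       # $ per hour for a rented 80 GB-class GPU (assumed)

def api_cost(tokens_in, tokens_out):
    return tokens_in * IN_PRICE + tokens_out * OUT_PRICE

# Example: a heavy agentic month like the sibling comment's November
# Codex usage (214M tokens in, 4M out):
monthly = api_cost(214e6, 4e6)
print(f"API cost: ${monthly:.2f}")                          # ~$23.80
print(f"GPU hours for the same money: {monthly / GPU_HOURLY:.1f}")  # ~11.9
```

So at these (assumed) rates, even a very heavy pay-per-token month buys only a dozen or so GPU-hours; the crossover depends entirely on how many hours a day the box sits running.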


For comparison, here's my own usage with various cloud models for development:

  * Claude in December: 91 million tokens in, 750k out
  * Codex in December: 43 million tokens in, 351k out
  * Cerebras in December: 41 million tokens in, 301k out
  * (obviously those figures above are so far in the month only)
  * Claude in November: 196 million tokens in, 1.8 million out
  * Codex in November: 214 million tokens in, 4 million out
  * Cerebras in November: 131 million tokens in, 1.6 million out
  * Claude in October: 5 million tokens in, 79k out
  * Codex in October: 119 million tokens in, 3.1 million out
As for Cerebras in October, I don't have the data because they no longer show the Qwen3 Coder model that was deprecated, but it was way more: https://blog.kronis.dev/blog/i-blew-through-24-million-token...

In general, I'd say that for the stuff I do my workloads are extremely read heavy (referencing existing code, patterns, tests, build and check script output, implementation plans, docs etc.), but it goes about like this:

  * most fixed cloud subscriptions will run out really quickly and will be insufficient (Cerebras being an exception)
  * if paying per token, you *really* want the provider to support proper caching, otherwise you'll go broke
  * if you have local hardware, that's great, but it will *never* compete with the cloud models; your best bet is to run something good enough to cover all of your autocomplete needs. With tools like KiloCode, an advanced cloud model can do the planning, a simpler local model can do the implementation, and the cloud model can then validate the output
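To put a number on the caching point: for a read-heavy workload, cached input tokens dominate the bill. A rough illustration (the $3/M input price and 90% cached-read discount are assumptions for illustration only; real prices and discounts vary by provider):

```python
# Rough illustration of why provider-side prompt caching matters for
# read-heavy agentic workloads.
PRICE_IN = 3.00 / 1e6   # $ per uncached input token (assumed)
CACHE_DISCOUNT = 0.90   # cached input billed at 10% of full price (assumed)

def input_cost(tokens_in, cached_fraction):
    cached = tokens_in * cached_fraction
    fresh = tokens_in - cached
    return fresh * PRICE_IN + cached * PRICE_IN * (1 - CACHE_DISCOUNT)

# November-scale input volume from the usage table above: 196M tokens.
print(f"no caching: ${input_cost(196e6, 0.0):.0f}")   # $588
print(f"90% cached: ${input_cost(196e6, 0.9):.2f}")   # $111.72
```

Under these assumed rates, a provider without proper caching costs roughly 5x more for the same input volume, which is the "you'll go broke" point above.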


This is the perfect use case for local models. It's why we set out to create cortex.build! A local LLM


Sorry, I don't much track or keep up with those specifics other than knowing I'm not spending much per week. My typical scenario is to spin up an instance that costs less than $2/hr for 2-4 hours. It's all just exploratory work really. Sometimes I'm running a script that is making a call to the LLM server api, other times I'm just noodling around in the web chat interface.


I don't suppose you have (or would be interested in writing) a blog post about how you set that up? Or maybe a list of links/resources/prompts you used to learn how to get there?


No, I don't blog. But I just followed the docs for starting an instance on lambda.ai and the llama.cpp build instructions. Both are pretty good resources. I had already set up an SSH key with lambda, and the lambda OS images are Linux pre-loaded with CUDA libraries on startup.

Here are my lazy notes + a snippet of the history file from the remote instance for a recent setup where I used the web chat interface built into llama.cpp.

I created a gpu_1x_gh200 instance (96 GB, ARM) at lambda.ai.

Connected from the terminal on my box at home and set up the SSH tunnel:

ssh -L 22434:127.0.0.1:11434 ubuntu@<ip address of rented machine - can see it on lambda.ai console or dashboard>
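(Aside, not from the parent: the same tunnel can also live in ~/.ssh/config so reconnecting is just `ssh lambda-gpu`. The host alias and IP below are placeholders; substitute the instance IP from the lambda.ai dashboard.)

```
# ~/.ssh/config -- hypothetical entry for the rented instance
Host lambda-gpu
    HostName 203.0.113.10
    User ubuntu
    LocalForward 22434 127.0.0.1:11434
```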

  Started building llama.cpp from source, history:    
     21  git clone   https://github.com/ggml-org/llama.cpp
     22  cd llama.cpp
     23  which cmake
     24  sudo apt list | grep libcurl
     25  sudo apt-get install libcurl4-openssl-dev
     26  cmake -B build -DGGML_CUDA=ON
     27  cmake --build build --config Release 
MISTAKE on 27: single-threaded and slow to build; see -j 16 below for a faster build.

     28  cmake --build build --config Release -j 16
     29  ls
     30  ls build
     31  find . -name "llama.server"
     32  find . -name "llama"
     33  ls build/bin/
     34  cd build/bin/
     35  ls
     36  ./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja
MISTAKE: didn't specify the port number for llama-server.

     37  clear;history
     38  ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking -c 0 --jinja --port 11434
     39  ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking.gguf -c 0 --jinja --port 11434
     40  ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking-GGUF -c 0 --jinja --port 11434
     41  clear;history
I switched to qwen3 vl because I need a multimodal model for that day's experiment. Lines 38 and 39 show me not using the right name for the model. I like how llama.cpp can download and run models directly off of huggingface.

Then pointed my browser at http://localhost:22434 on my local box and had the normal browser window where I could upload files and use the chat interface with the model. That also gives you an OpenAI API-compatible endpoint. It was all I needed for what I was doing that day. I spent a grand total of $4 that day doing the setup and running some NLP-oriented prompts for a few hours.


Thanks, much appreciated.


> $1 and $2 coins in wide circulation (instead of worn-out $1 bills).

This has its own pros/cons...

One advantage of $1 bills over coins is that most people in the US don't need a wallet with a zipper to hold coins. Five $1 bills are much less bulky and much lighter than five CAD $1 or five €1 coins.


Of course everything has its pros and cons, but not all of them are worth considering.

The number of wallets with zippers in a country is not worth considering when discussing which coins should be minted.


I would contend that 5 bills are more bulky than 5 coins. The only upside of dealing with US bills when travelling in the US is that you feel like a millionaire when you pull out the massive wad of bills from your pocket.


> one can absolutely check the text to remove all occurrences of Indiana Jones

How do you handle this kind of prompt:

“Generate an image of a daring, whip-wielding archaeologist and adventurer, wearing a fedora hat and leather jacket. Here's some back-story about him: With a sharp wit and a knack for languages, he travels the globe in search of ancient artifacts, often racing against rival treasure hunters and battling supernatural forces. His adventures are filled with narrow escapes, booby traps, and encounters with historical and mythical relics. He’s equally at home in a university lecture hall as he is in a jungle temple or a desert ruin, blending academic expertise with fearless action. His journey is as much about uncovering history’s secrets as it is about confronting his own fears and personal demons.”

Try copy-pasting it into any image-generation model. The output looks awfully like Indiana Jones in all my attempts, yet I've not referenced Indiana Jones even once!


Emmmm, sure, but throw this at a human artist who has never heard of Indiana Jones and see if they draw something alike.


Nice!

So the cache check tries to find if a previously existing text embedding has >0.8 match with the current text.

If you get a cache hit here, IIUC, you return that matched text's label right away. But do you also insert a text embedding of the current text into the text-embeddings table? Or do you only insert it on a cache miss?

From reading the GitHub readme it seems you only "store text embedding for future lookups" in the case of a cache miss. Is this by design, to keep the text-embedding table from growing too big?


OP here. Yes, that's right. We also insert the current text embedding on misses, to expand the boundaries of the cluster.

For instance: I love McDonalds (1). I love burgers. (0.99) I love cheeseburgers with ketchup (?).

This is a contrived example, but in this case the last text could end up right at the boundary of similarity to that first label if we did not store the second, which could cause a cluster miss we don't want.

We only store the text on cache misses, though you could do both. I had not considered that idea, but it makes sense. I'm not very concerned about the dataset size, because vector storage is generally cheap (~$2/mo for 1M vectors), and the savings in money not spent generating tokens cover that expense generously.
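For anyone following along, the hit/miss logic described above can be sketched roughly like this (the toy 2-D "embeddings" and in-memory list are stand-ins for the real embedding model and vector store; the 0.8 cutoff is the threshold discussed above):

```python
import math

SIM_THRESHOLD = 0.8  # cosine-similarity cutoff from the discussion above

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache: return a stored label on a near match,
    and let the caller store the new embedding on a miss."""
    def __init__(self):
        self.entries = []  # list of (embedding, label) pairs

    def lookup(self, emb):
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best is not None and cosine(emb, best[0]) >= SIM_THRESHOLD:
            return best[1]  # cache hit: reuse the stored label
        return None         # cache miss: caller classifies, then insert()s

    def insert(self, emb, label):
        self.entries.append((emb, label))

cache = SemanticCache()
cache.insert([1.0, 0.0], "positive")
print(cache.lookup([0.9, 0.1]))  # near the stored vector -> "positive"
print(cache.lookup([0.0, 1.0]))  # orthogonal -> None (miss)
```

Inserting on misses, as described, is what keeps growing the cluster's boundary outward over time.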


A workaround: Long press on image, "open image in a new tab"


Yeah, the landscape back when there were many more search engines must have been exactly the same...

I think the eng teams behind those were just more competent / more frugal with their processing.

And since there wasn't any AWS equivalent, they had to be better citizens: banning a crawler's well-known IP range was trivial for the crawled websites.


It's worth noting that search engines back then (and now? except the AI ones) generally tended to follow robots.txt, which meant that if there were heavy areas of your site that you didn't want them to index you could filter them out and let them just follow static pages. You could block off all of /cgi-bin/ for example and then they would never be hitting your CGI scripts - useful if your guestbook software wrote out static files to be served, for example.
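For example, a robots.txt as simple as this was (and still is, for well-behaved crawlers) enough to keep them off your CGI scripts:

```
# Keep crawlers out of dynamic scripts; static pages stay indexable.
User-agent: *
Disallow: /cgi-bin/
```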

The search engines were also limited in resources, so they were judicious about what they fetched, when, and how often; optimizing their own crawlers saved them money, and in return it also saved the websites too. Even with a hundred crawlers actively indexing your site, they weren't going to index it more than, say, once a day, and 100 requests in a day isn't really that much even back then.

Now, companies are pumping billions of dollars into AI; budgets are infinite, limits are bypassed, and norms are ignored. If a company thinks it can benefit from indexing your site 30 times a minute, then it will; and even if it doesn't benefit, there's no reason for it to stop, because the crawling costs it nothing. They cannot risk being anything other than up to date, because if users come asking about current events, like why Space Force is moving to Alabama, and your AI doesn't know but someone else's does, then you're behind the times.

So in the interests of maximizing short-term profit above all else - which is the only thing AI companies are doing in any way shape or form - they may as well scrape every URL on your site once per second, because it doesn't cost them anything and they don't care if you go bankrupt and shut down.


Bandwidth cost more back then, so the early search engines had an incentive not to massively increase their own costs, if nothing else.


The blekko search engine index was only 1 billion pages, compared to Common Crawl Foundation's crawl of 3 billion webpages per month.

