c1ccccc1's comments | Hacker News

Radiators can shadow each other, so that puts some kind of limit on the size of the individual satellite (which limits the size of the training run it can be used for, but I guess the goal for these is mostly inference anyway). More seriously, heat conduction is an issue: if the radiator is too long, heat won't get from its base to its tip fast enough. Using fluid is possible, but adds another system that can fail. If nothing else, increasing the size of the radiator means more mass that needs to be launched into space.

Please check my didactic example here: https://news.ycombinator.com/item?id=46862869

"Radiators can shadow each other," this is precisely why I chose a convex shape, that was not an accident, I chose a pyramid just because its obvious that the 4 triangular sides can be kept in the shade with respect to the sun, and their area can be made arbitrarily large by increasing the height of the pyramid for a constant base. A convex shape guarantees that no part of the surface can appear in the hemispherical view of any other part of the surface.

The only size limit is technological / economic.

In practice h = 3L, where L is the side length of the square base, suffices to keep the temperature below 300 K.

If heat conduction can't be managed with thermosiphons / heat pipes / cooling loops on the satellite, why would it be possible on Earth? Think of a small-scale satellite with pyramidal sats of roughly h = 3L, but where L could be much smaller: do you actually see any issue with heat conduction? Scaling up just means placing more of the small pyramidal sats.
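
To make the numbers concrete, here is a rough sketch of the equilibrium-temperature calculation for such a pyramid. The ~2 MW heat load and 0.9 emissivity are assumed values of mine (not taken from the linked example), and solar / Earth IR input on the shaded faces is ignored:

    import math

    # Pyramidal radiator: square base of side L, height h = 3L, radiating
    # only from the four shaded triangular faces. Heat load and emissivity
    # are assumptions for illustration.
    SIGMA = 5.670e-8          # Stefan-Boltzmann constant, W / (m^2 K^4)
    L, h = 30.0, 90.0         # m
    P = 2.0e6                 # assumed waste heat, W
    eps = 0.9                 # assumed surface emissivity

    slant = math.sqrt(h**2 + (L / 2)**2)   # apex to midpoint of a base edge
    area = 4 * 0.5 * L * slant             # four triangular faces

    # Balance P = eps * SIGMA * area * T^4  =>  T = (P / (eps * SIGMA * area))^(1/4)
    T = (P / (eps * SIGMA * area)) ** 0.25
    print(round(area), round(T))           # ~5474 m^2, ~291 K: below 300 K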


Kudos for giving a concrete example, but the square-cube law means that scaling the area A results in A^(3/2) scaling for the mass of material used, and thus for launch costs. If you make the pyramid hollow to avoid this, you're back to having to worry about heat conduction. You assumed infinite thermal conductivity for your pyramid material, a good approximation if it's solid aluminum, but that's going to be very expensive (mainly in launch costs).
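
To put rough numbers on that, a toy sketch under the (deliberately extreme) assumption of a solid aluminum pyramid with h = 3L:

    # Square-cube illustration: solid aluminum pyramid, square base of side L,
    # height h = 3L. Doubling L quadruples the radiating area but multiplies
    # the mass by 8, i.e. mass grows like area^(3/2).
    RHO_AL = 2700.0                        # kg / m^3

    def radiating_area(L):                 # four triangular faces
        slant = (9 * L**2 + (L / 2)**2) ** 0.5
        return 4 * 0.5 * L * slant

    def mass(L):                           # solid pyramid: V = (1/3) * L^2 * h
        return RHO_AL * L**2 * (3 * L) / 3

    for L in (30.0, 60.0):
        print(L, round(radiating_area(L)), round(mass(L)))
    # 30 m: ~5,474 m^2 and ~7.3e7 kg; 60 m: ~21,900 m^2 and ~5.8e8 kg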

In reality, radiator designs would probably rely on fluid cooling to move heat all the way along the radiator, rather than thermal conduction. This avoids the above problem. The issue there is that we now need to design this system, with its pipes and pumps, in such a way that it can run reliably for years with zero maintenance. Doable? Yes. Easy or cheap? No. The reason cooling on Earth is easier is that we can transfer heat to air / water instead of having to radiate it away ourselves. Doing this basically allows us to use the entire surface of the planet as our radiator. But this is not an option in space, where we need to supply the radiator ourselves.

In terms of scaling by instead making many very small sats, I agree that this will scale well from a cooling perspective as long as you keep them far enough apart from each other. This is not as great from the perspective of many things we actually want to use a compute cluster for, which require high-bandwidth communication between GPUs.

In any case, another very big problem is the fact that space has a lot of ionizing radiation in it, which means we also have to add a lot of radiation shielding.

Keep in mind that the on-the-ground alternative that all this extra fooling around has to compete with is just using more solar panels and making some batteries.


At no point did I propose a massive block of solid aluminum. I describe the heated surface and a radiating surface so that programmers understand the concept of balancing energy flows and how to calculate the equilibrium temperature with the Stefan-Boltzmann law. If they want to explore the details, they now have enough information to generalize: they can use RMAD and run actual calculations to optimize for different scenarios.

Radiation hardening:

While there is some state information on the GPU, for ML applications the occasional bit flip isn't that critical, so most of the GPU area can be used as efficiently as before, and only the critical state information on the GPU die or host CPU needs radiation hardening.

Scaling: the didactic, unoptimized 30m x 30m x 90m pyramid would train a 405B model in 17 days, and it would have 23 TB of RAM (so it can continue training larger and larger state-of-the-art models at comparatively slower rates). Not sure what's ridiculous about it. At some point people piss on didactic examples because they want somebody to hold their hand and calculate everything for them?


̶I̶ ̶t̶h̶i̶n̶k̶ ̶h̶e̶ ̶a̶c̶t̶u̶a̶l̶l̶y̶ ̶g̶e̶n̶e̶t̶i̶c̶a̶l̶l̶y̶ ̶e̶n̶g̶i̶n̶e̶e̶r̶e̶d̶ ̶h̶i̶s̶ ̶g̶u̶t̶ ̶b̶a̶c̶t̶e̶r̶i̶a̶ ̶r̶a̶t̶h̶e̶r̶ ̶t̶h̶a̶n̶ ̶a̶n̶y̶ ̶o̶f̶ ̶h̶i̶s̶ ̶o̶w̶n̶ ̶c̶e̶l̶l̶s̶ ̶t̶h̶e̶r̶e̶,̶ ̶r̶i̶g̶h̶t̶?̶

EDIT: Above is false. Went back and checked and I had mis-remembered the video.


From the video, it sounds like he engineered his own cells. Using a virus that is known for transferring genetic material into other organisms, he added a gene for producing lactase to the virus, and then ate it. I suppose that would affect both his gut bacteria and his own cells. But the effect lasted for ~1.5 years, which probably indicates that it truly was his own cells. Also, he seems to know what he's talking about, and he claims it was his own cells.


Could be placebo though?


You're right, thanks!


Is there a reason why I'm seeing squares between each of the characters in your message? It's making it pretty hard to read...

I'm using Chrome on Android


The text is crossed out using Unicode combining strikethrough characters. This allows it to display without any specific formatting support, but it does require that the font support those characters. The font you're using doesn't support them, so it displays boxes instead.
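
For the curious, a small Python sketch of how that kind of strikethrough is typically produced (here using U+0336, COMBINING LONG STROKE OVERLAY; the original comment may have used a slightly different combining character):

    # Strike text out by following every character with a combining overlay.
    # No markup support is needed, but the reader's font must be able to
    # render the combining character; otherwise boxes appear.
    def strike(text):
        return "".join(ch + "\u0336" for ch in text)

    print(strike("some struck-out text"))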


Firefox on iOS: I see the strikethrough, but the strikes are at varying heights.


Name some of the contradictory possibilities you have in mind?

Also, do you actually think the core idea is wrong, or is this more of a complaint about how it was presented? Say we do an experiment where we train an AlphaZero-style RL agent in an environment where it can take actions that replace it with an agent that pursues a different goal. Do you actually expect to find that the original agent won't learn to prevent this from happening, even paying some cost to do so?


A contradictory possibility is that agents with different ultimate objectives can have different, disjoint sets of goals that are instrumental towards those objectives.

I do think the core idea of instrumental convergence is wrong. In the hypothetical scenario you describe, the behavior of the agent, whether it learns to replace itself or not, will depend on its goal, its knowledge of and ability to reason about the problem, and the learning algorithm it employs. These are just some of the variables that you’d need to fill in to get the answer to your question. Instrumental convergence theoreticians suggest one can just gloss over these details and assume any hypothetical AI will behave certain ways in various narratively described situations, but we can’t. The behavior of an AI will be contingent on multiple details of the situation, and those details can mean that no goals instrumental to one agent are instrumental to another.


Why is that? My guess would be that you could always adjoin an i to the p^n field and get the p^(2n) field, as long as you had p = 4k + 3. But that's admittedly based on approximately zero thinking.

EDIT: Looking things up indicates that if n is even, there's already a square root of -1 in the field, so we can't add another. So now I believe the 1/4 of the time thing you mentioned, and can't see how that's wrong.


Spitballing here, but I suspect it's a density thing. If you are considering all prime powers up to some bound N, then the density of prime powers (edit: of size p^n with n > 1) approaches 0 as N tends to infinity. So rather than things being 1/4 like our intuition says, it should unintuitively be 1/2. I haven't given this much thought, but I suspect this based on checking some examples in Sage.
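
A quick numeric check of the density point (a sketch assuming sympy is available; this isn't the Sage code I actually ran):

    # Among odd prime powers q <= N, the proper powers (q = p^n with n >= 2)
    # become negligible, so the fraction of fields F_q containing a square
    # root of -1 (i.e. q = 1 mod 4) tends to 1/2 rather than 1/4.
    from sympy import primerange, integer_nthroot, isprime

    def prime_powers(N):
        qs = set(primerange(2, N + 1))            # n = 1
        n = 2
        while 2 ** n <= N:
            for p in primerange(2, integer_nthroot(N, n)[0] + 1):
                qs.add(p ** n)
            n += 1
        return sorted(qs)

    N = 10 ** 6
    qs = [q for q in prime_powers(N) if q % 2 == 1]
    proper = sum(1 for q in qs if not isprime(q))  # n >= 2
    with_i = sum(1 for q in qs if q % 4 == 1)      # -1 is a square in F_q
    print(proper / len(qs), with_i / len(qs))      # first -> 0, second -> ~1/2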


Oh, so just a probability density thing where we sample q and check if it's p^n (retrying if not), rather than sampling p and n separately and computing q = p^n? I guess that's probably what they were going for, yeah.


exactly


You should basically assume they are pulled from thin air. (Or more precisely, from the brain and world model of the people making the prediction.)

The point of giving such estimates is mostly an exercise in getting better at understanding the world, and a way to keep yourself honest by making predictions in advance. If someone else consistently gives higher probabilities to events that ended up happening than you did, then that's an indication that there's space for you to improve your prediction ability. (The quantitative way to compare these things is to see who has lower log loss [1].)

[1] https://en.wikipedia.org/wiki/Cross-entropy
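
As a made-up illustration of such a comparison (the outcomes and forecasts below are invented purely to show the metric):

    import math

    # Two forecasters' probabilities for events that happened (1) or didn't (0).
    # Lower average log loss is better; always saying 50% scores log(2) ~ 0.69.
    outcomes = [1, 0, 1, 1, 0]
    forecaster_a = [0.7, 0.2, 0.6, 0.9, 0.4]
    forecaster_b = [0.5, 0.5, 0.5, 0.5, 0.5]

    def log_loss(probs, outcomes):
        return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(probs, outcomes)) / len(outcomes)

    print(log_loss(forecaster_a, outcomes))   # ~0.34
    print(log_loss(forecaster_b, outcomes))   # ~0.69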


> If someone else consistently gives higher probabilities to events that ended up happening than you did, then that's an indication that there's space for you to improve your prediction ability.

Your inference seems ripe for scams.

For example-- if I find out that a critical mass of participants aren't measuring how many participants are expected to outrank them by random chance, I can organize a simplistic service to charge losers for access to the ostensible "mentors."

I think this happened with the stock market: you work out how many mutual fund managers would be expected to beat the market by random chance over a given period. Then you find roughly that same (small) number of mutual fund managers who did beat the market and switched to a more lucrative career of giving speeches about how to beat the market. :)


Is there some database where you can see predictions of different people and the results? Or are we supposed to rely on them keeping track and keeping themselves honest? Because that is not something humans do generally, and I have no reason to trust any of these 'rationalists'.

This sounds like a circular argument. You started explaining why them giving percentage predictions should make them more trustworthy, but when looking into the details, I seem to come back to 'just trust them'.


Yes, there is: https://manifold.markets/

People's bets are publicly viewable. The website is very popular with these "rationality-ists" you refer to.

I wasn't in fact arguing that giving a prediction should make people more trustworthy; please explain how you got that from my comment. I said that the main benefit of making such predictions is as practice for the predictor themselves. If there's a benefit for readers, it is just that they could come along and say "eh, I think the chance is higher than that". Then they also get practice and can compare how they did when the outcome is known.


Would you also get triggered if you saw people make a bet at, say, $24 : $87 odds? Would you shout: "No! That's too precise, you should bet $20 : $90!"? For that matter, should all prices in the stock market be multiples of $1, (since, after all, fluctuations of greater than $1 are very common)?

If the variance (uncertainty) in a number is large, the correct thing to do is to also report the variance, not to round the mean to a whole number.

Also, in log odds, the difference between 5% and 10% is about the same as the difference between 40% and 60%. So using an intermediate value like 8% is less crazy than you'd think.
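
A quick check of that log-odds claim:

    import math

    def log_odds(p):
        return math.log(p / (1 - p))

    # Distance between 5% and 10% vs. between 40% and 60%, in log odds.
    print(log_odds(0.10) - log_odds(0.05))   # ~0.75
    print(log_odds(0.60) - log_odds(0.40))   # ~0.81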

People writing comments in their own little forum where they happen not to use sig-figs to communicate uncertainty is probably not a sinister attempt to convince "everyone" that their predictions are somehow scientific. For one thing, I doubt most people are dumb enough to be convinced by that, even if it were the goal. For another, the expected audience for these comments was not "everyone", it was specifically people who are likely to interpret those probabilities in a Bayesian way (i.e. as subjective probabilities).


> Would you also get triggered if you saw people make a bet at, say, $24 : $87 odds? Would you shout: "No! That's too precise, you should bet $20 : $90!"? For that matter, should all prices in the stock market be multiples of $1, (since, after all, fluctuations of greater than $1 are very common)?

No.

I responded to the same point here: https://news.ycombinator.com/item?id=44618142

> correct thing to do is to just also report the variance

And do we also pull this one out of thin air?

Using precise numbers to convey extremely imprecise and ungrounded opinions is imho wrong and, to me, unsettling. I'm pulling this purely out of my ass, and maybe I am making too much out of it, but I feel this is in part what is causing the many cases of very weird, borderline asocial/dangerous behaviours of some people associated with the rationalist movement. When you try to precisely quantify what cannot be quantified, and start trusting those numbers too much, you can easily be led to trust your conclusions way too much. I am 56% confident this is a real effect.


I mean, sure, people can use this to fool themselves. I think usually the cause of someone fooling themselves is "the will to be fooled", and not so much the fact that they used precise numbers in their internal monologue as opposed to verbal buckets like "pretty likely" or "very unlikely". But if you estimate a 56% chance that it sometimes actually makes a difference, then who am I to argue? Sounds super accurate to me. :)

In all seriousness, I do agree it's a bit harmful for people to use this kind of reasoning but only practice it on things like AGI that will not be resolved for years and years (and maybe we'll all be dead when it does get resolved). Like ideally you'd be doing hand-wavy reasoning with precise probabilities about whether you should bring an umbrella on a trip, or apply for that job, etc. Then you get to practice with actual feedback and learn how not to make dumb mistakes while reasoning in that style.

> And do we also pull this one out of thin air?

That's what we do when training ML models sometimes. We'll have the model make a Gaussian distribution by supplying both a mean and a variance. (Pulled out of thin air, so to speak.) It has to give its best guess of the mean, and if the variance it reports is too small, it gets penalized accordingly. Having the model somehow supply an entire probability distribution is even more flexible (and even less communicable by mere rounding). Of course, as mentioned by commenter danlitt, this isn't relevant to binary outcomes anyway, since the whole distribution is described by a single number.
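
For example, a minimal sketch of the Gaussian negative log-likelihood idea (the numbers are made up):

    import math

    # The model reports a mean and a variance; the loss penalizes it if the
    # reported variance is too small for the error it actually makes.
    def gaussian_nll(mean, var, target):
        return 0.5 * (math.log(2 * math.pi * var) + (target - mean) ** 2 / var)

    target = 3.0
    print(gaussian_nll(mean=2.5, var=1.0, target=target))    # ~1.04, calibrated
    print(gaussian_nll(mean=2.5, var=0.01, target=target))   # ~11.1, overconfident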


> and not so much that fact that they used precise numbers in the their internal monologue as opposed to verbal buckets like "pretty likely", "very unlikely"

I am obviously only talking from my personal anecdotal experience, but I have been on a bunch of coffee chats in the last few months with people in the AI safety field in SF, a lot of them LessWrong-ers, and I experienced a lot of those discussions with random percentages being thrown out in succession to estimate the final probability of some event. Even though I have worked in ML for 10+ years (so I would guess I'm more constantly aware of what a Bayesian probability is than the average person), I often find myself swayed by whatever number comes out at the end and have to consciously take a step back and stop myself from instinctively trusting this random number more than I should. I would not need to pull myself back, I think, if we were using words instead of precise numbers.

It could just be a personal mental weakness with numbers on my part that is not general, but looking at my interlocutors' emotional reactions to their own numerical predictions, I do feel quite strongly that this is a general human trait.


> It could just be a personal mental weakness with numbers on my part that is not general, but looking at my interlocutors' emotional reactions to their own numerical predictions, I do feel quite strongly that this is a general human trait.

Your feeling is correct; anchoring is a thing, and good LessWrongers (I hope to be in that category) know this and keep track of where their prior and not just posterior probabilities come from: https://en.wikipedia.org/wiki/Anchoring_effect

They probably don't in practice, but they should. That "should" is what puts the "less" into "less wrong".


Ah, thanks for the link. Yes, this is precisely the bias I feel myself falling victim to if I don't make an effort to counter it.


> If the variance (uncertainty) in a number is large, correct thing to do is to just also report the variance

I really wonder what you mean by this. If I put my finger in the air and estimate the emergence of AGI as 13%, how do I get at the variance of that estimate? At face value, it is a number, not a random variable, and does not have a variance. If you instead view it as a "random sample" from the population of possible estimates I might have made, it does not seem well defined at all.


I meant in a general sense that it's better when reporting measurements/estimates of real numbers to report the uncertainty of the estimate alongside the estimate, instead of using some kind of janky rounding procedure to try and communicate that information.

You're absolutely right that if you have a binary random variable like "IMO gold by 2026", then the only thing you can report about its distribution is the probability of each outcome. This only makes it even more unreasonable to try and communicate some kind of "uncertainty" with sig-figs, as the person I was replying to suggested doing!

(To be fair, in many cases you could introduce a latent variable that takes on continuous values and is closely linked to the outcome of the binary variable. Eg: "Chance of solving a random IMO problem for the very best model in 2025". Then that distribution would have both a mean and a variance (and skew, etc), and it could map to a "distribution over probabilities".)


Cryptographers are clever and have figured out a way to let you know your vote was counted without being able to prove it to a third party!

https://www.youtube.com/watch?v=BYRTvoZ3Rho


Can the average person trust this though?


This comment is interesting in that it itself exemplifies principle #5.

(And also principles #6, #7, #9 to some extent.)


Yeah, it's kinda contradictory. It's part critique and part self-reflection on an ideology that I have mixed feelings towards. In the same vein as the Programming Language Checklist for language designers.

https://www.mcmillen.dev/language_checklist.html


"Below" in this context means "directly below and connected by a line" (the lines are the edges of the cube). So you can have a blue vertex that is vertically below a white vertex, so long as they are not connected by an edge. The first time this can happen is for a 3 dimensional cube. You can have blue at the top, then 2 blue and 1 white below that, and then 1 blue (under and between the 2 blues in the layer above) and 2 white in the layer below that, and then white for the bottom vertex. This configuration can be rotated 3 ways and this takes us from 17 to 20.


The post assumes a 2% annual rate of growth in energy consumption. So, due to the nature of exponential functions, most of the energy loss would be concentrated towards the end of the 1000 years, as energy consumption approaches 400 million times present-day usage. The first two centuries of use would not have a noticeable impact.
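
A quick check of those figures:

    # 2% annual growth, as assumed in the post.
    growth = 1.02
    print(growth ** 1000)   # ~4.0e8, i.e. ~400 million times today's usage
    print(growth ** 200)    # ~52x after the first two centuries
    # Share of the total 1000-year energy use that falls in the last 200 years:
    print(sum(growth ** t for t in range(800, 1000)) /
          sum(growth ** t for t in range(1000)))   # ~0.98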

