Book 4 of the Sequences Highlights

While far better than what came before, "science" and the "scientific method" are still crude, inefficient, and inadequate to prevent you from wasting years of effort on doomed research directions.

The Wikipedia articles on the VNM theorem, Dutch Book arguments, money pump, Decision Theory, Rational Choice Theory, etc. are all a horrific mess. They're also completely disjoint, without any kind of Wikiproject or wikiboxes for tying together all the articles on rational choice. It's worth noting that Wikipedia is the place where you—yes, you!—can actually have some kind of impact on public discourse, education, or policy. There is just no other place you can get so many views with so little barrier to entry. A typical Wikipedia article will get more hits in a day than all of your LessWrong blog posts have gotten across your entire life, unless you're @Eliezer Yudkowsky. I'm not sure if we actually "failed" to raise the sanity waterline, like people sometimes say, or if we just didn't even try. Given even some very basic low-hanging fruit interventions like "write a couple good Wikipedia articles" still haven't been done 15 years later, I'm leaning towards the latter. edit me senpai
Self-modelling in NNs: is this good news for mech interpretability? If the model makes itself easily predictable, then that really seems to limit the possibilities for deceptive alignment.
Terminology proposal: scaffolding vs tooling. I haven't seen these terms consistently defined with respect to LLMs. I've been using, and propose standardizing on: * Tooling: affordances for LLMs to make calls, eg ChatGPT plugins. * Scaffolding: an outer process that calls LLMs, where the bulk of the intelligence comes from the called LLMs, eg AutoGPT. If the scaffolding itself becomes as sophisticated as the LLMs it calls, we should start focusing on the system as a whole rather than just describing it as a scaffolded LLM. There's another possible category, of LLMs calling each other as peers without a central outer process, but I'm not aware of this being a type of system that actually exists. Thanks to @Andy Arditi for helping me nail down the distinction.
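A minimal Python sketch of the proposed distinction, under loud assumptions: `fake_llm` is a hypothetical stand-in for a real model call, and `TOOLS`, `run_with_tooling`, and `run_scaffold` are illustrative names, not the ChatGPT plugin or AutoGPT APIs. Tooling is the model asking the runtime to execute an affordance; scaffolding is the outer process that decides when to call the model.

```python
from typing import Callable, Dict, List

# Tooling: affordances the LLM can ask the runtime to execute.
# "upper" is a hypothetical toy tool, not a real plugin.
TOOLS: Dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call.

    Requests the "upper" tool when the prompt starts with "shout:",
    otherwise reports the step as done.
    """
    if prompt.startswith("shout:"):
        return "CALL upper " + prompt[len("shout:"):].strip()
    return "done: " + prompt


def run_with_tooling(prompt: str) -> str:
    """One LLM turn; if the model requests a tool, the runtime executes it."""
    reply = fake_llm(prompt)
    if reply.startswith("CALL "):
        _, tool_name, arg = reply.split(" ", 2)
        return TOOLS[tool_name](arg)
    return reply


def run_scaffold(tasks: List[str],
                 llm: Callable[[str], str] = run_with_tooling) -> List[str]:
    """Scaffolding: a thin outer loop that calls the LLM once per subtask.

    The loop itself does no reasoning; the intelligence lives in `llm`.
    """
    return [llm(task) for task in tasks]
```

In this toy the scaffold is a bare list comprehension; once the outer loop starts doing substantial reasoning of its own, the proposal above suggests analyzing the system as a whole instead.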
random idea for a voting system (i'm a few centuries late. this is just for fun.) instead of voting directly, everyone is assigned to a discussion group of x (say 5) of themself and others near them. the group meets to discuss at an official location (attendance is optional). only if those who showed up reach consensus does the group cast one vote. many of these groups would not reach consensus, say 70-90%. that's fine. the point is that most of the ones which do would be composed of people who make and/or are receptive to valid arguments. this would then shift the memetic focus of politics towards rational arguments instead of being mostly rhetoric/bias reinforcement (which is unlikely to produce consensus when repeated in this setting). possible downside: another possible equilibrium is memetics teaching people how to pressure others into agreeing during the group discussion, when e.g. it's 3 against 2 or 4 against 1. possible remedy: have each discussion group be composed of a proportional amount of each party's supporters. or maybe have them be 1-on-1 discussions instead of groups of x>2 because those tend to go better anyways. also, this would let misrepresented minority positions be heard correctly. i don't think this would have saved humanity from ending up in an inadequate equilibrium, but maybe would have at least been less bad.
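To make the counting rule concrete, here is a hedged Python sketch of one election round. The names and parameters (`group_vote`, `consensus_election`, `group_size`, `attend_prob`, and grouping voters by list order as a proxy for "near them") are assumptions for illustration. It deliberately does not model persuasion: each voter's position is a fixed label, so it only shows how consensus groups turn into votes, not whether discussion produces better arguments.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def group_vote(attendees: Sequence[str]) -> Optional[str]:
    """Return the group's single vote if every attendee agrees, else None."""
    if not attendees:
        return None  # nobody showed up, so no consensus and no vote
    first = attendees[0]
    return first if all(a == first for a in attendees) else None


def consensus_election(prefs: List[str], group_size: int = 5,
                       attend_prob: float = 1.0, seed: int = 0) -> Counter:
    """Tally one vote per group whose attendees reach consensus.

    Illustrative assumptions: voters are grouped in list order (standing in
    for geographic proximity), each attends independently with probability
    `attend_prob`, and positions never change during discussion.
    """
    rng = random.Random(seed)
    tally: Counter = Counter()
    for start in range(0, len(prefs), group_size):
        group = prefs[start:start + group_size]
        attendees = [p for p in group if rng.random() < attend_prob]
        vote = group_vote(attendees)
        if vote is not None:
            tally[vote] += 1
    return tally
```

Setting `group_size=2` gives the 1-on-1 variant floated above as a remedy for pressure in larger groups.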
I went through and updated my 2022 “Intro to Brain-Like AGI Safety” series. If you already read it, no need to do so again, but in case you’re curious for details, I put changelogs at the bottom of each post. For a shorter summary of major changes, see this twitter thread, which I copy below (without the screenshots & links):

> I’ve learned a few things since writing “Intro to Brain-Like AGI safety” in 2022, so I went through and updated it! Each post has a changelog at the bottom if you’re curious. Most changes were in one of the following categories: (1/7)
>
> REDISTRICTING! As I previously posted ↓, I booted the pallidum out of the “Learning Subsystem”. Now it’s the cortex, striatum, & cerebellum (defined expansively, including amygdala, hippocampus, lateral septum, etc.) (2/7)
>
> LINKS! I’ve written 60 posts since first finishing that series. Many of them elaborate and clarify things I hinted at in the series. So I tried to put in links where they seemed helpful. For example, I now link my “Valence” series in a bunch of places. (3/7)
>
> NEUROSCIENCE! I corrected or deleted a bunch of speculative neuro hypotheses that turned out wrong. In some early cases, I can’t even remember wtf I was ever even thinking! Just for fun, here’s the evolution of one of my main diagrams since 2021: (4/7)
>
> EXAMPLES! It never hurts to have more examples! So I added a few more. I also switched the main running example of Post 13 from “envy” to “drive to be liked / admired”, partly because I’m no longer even sure envy is related to social instincts at all (oops) (5/7)
>
> LLMs! … …Just kidding! LLMania has exploded since 2022 but remains basically irrelevant to this series. I hope this series is enjoyed by some of the six remaining AI researchers on Earth who don’t work on LLMs. (I did mention LLMs in a few more places though ↓ ) (6/7)
>
> If you’ve already read the series, no need to do so again, but I want to keep it up-to-date for new readers.
Again, see the changelogs at the bottom of each post for details. I’m sure I missed things (and introduced new errors)—let me know if you see any!

Popular Comments

Recent Discussion

This post is written in a spirit of constructive criticism. It's phrased fairly abstractly, in part because it's a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them.

Claim 1: The AI safety community is structurally power-seeking.

By “structurally power-seeking” I mean: tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry; or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an...

Said Achmiz
Er… yes, I am indeed familiar with that usage of the term “Friendly”. (I’ve been reading Less Wrong since before it was Less Wrong, you know; I read the Sequences as they were being posted.) My comment was intended precisely to invoke that “semi-technical term of art”; I was not referring to “friendliness” in the colloquial sense. (That is, in fact, why I used the capitalized term.) Please consider the grandparent comment in light of the above.
Eli Tyre
In that case, I answer flatly "no". I don't expect many existing governmental institutions to be ethical or legitimate in the eyes of CEV, if CEV converges at all. Factory Farming is right out.
Said Achmiz
You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?

I think a majority of nations would support dismantling their governments in favor of benevolent superintelligence, especially given the correct framework. And ASI can simply solve the problem of meat by growing brainless bodies.


I believe the relevant phrase is "aged like milk".

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

I think I saw a LW post that was discussing alternatives to the vNM independence axiom. I also think (low confidence) it was by Rob Bensinger and in response to Scott's geometric rationality (e.g. this post). For the life of me, I can't find it. Unless my memory is mistaken, does anybody know what I'm talking about?

Hey everyone! Long time lurker, first time poster. 

My name is Yovel Rom. I'm an Israeli data scientist. I served in the IDF in an elite program called Talpiot, which consists of a physics undergrad and broad military education (basically West Point for technology people, but more elitist), then as a data scientist, and finished my service as a captain. I'm currently at home, since I finished my service two years ago and they don't need many data scientists right now.

Ask me anything. I'm as updated on the situation as a person with no formal position can be (which might be less than you expect).

EDIT: I will try to cite sources as much as possible, but I might do that in Hebrew since there's so much more of it. I will only cite Wikipedia, major newspapers and state-funded think tanks, which are reliable.

Hi, just saw the old thread. Anyway, as an Israeli, my answer is strongly 2, though it depends what you mean by ideology. The maximum that most Israelis would be willing to give due to national security considerations is less than the minimum that Palestinians are willing to accept due to national pride and ethos, in terms of land, degree of autonomy, and, mostly, a solution for the descendants of the 1948-9 refugees inside Israel.

This post was inspired by some talks at the recent LessOnline conference including one by LessWrong user “Gene Smith”.

Let’s say you want to have a “designer baby”. Genetically extraordinary in some way — super athletic, super beautiful, whatever.

6’5”, blue eyes, with a trust fund.

Ethics aside[1], what would be necessary to actually do this?

Fundamentally, any kind of “superbaby” or “designer baby” project depends on two steps:

1.) figure out what genes you ideally want;

2.) create an embryo with those genes.

It’s already standard to do a very simple version of this two-step process. In the typical course of in-vitro fertilization (IVF), embryos are usually screened for chromosomal abnormalities that would cause disabilities like Down syndrome, and only the “healthy” embryos are implanted.

But most (partially) heritable traits and disease risks are...

I haven't read much at all about the Iterated Meiosis proposal, but I'd be pretty concerned, based on the current state of polygenic models, that you end up selecting for a bunch of non-causal sites which were only selected as markers due to linkage disequilibrium (and iterated recombination makes the PRS a weaker and weaker predictor of the phenotype).
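For intuition about the linkage concern: the textbook two-locus result under random mating is that linkage disequilibrium decays geometrically, D_t = (1 − r)^t · D_0, where r is the recombination fraction between a marker and the causal site. The sketch below assumes, as a simplification not taken from the Iterated Meiosis proposal, that each round of meiosis acts like one generation of random mating and that allele frequencies stay fixed, so the marker-causal correlation shrinks by the same factor as D. The function name `ld_after_rounds` is hypothetical.

```python
def ld_after_rounds(d0: float, r: float, rounds: int) -> float:
    """Two-locus linkage disequilibrium D after `rounds` rounds of recombination.

    Assumes random mating and constant allele frequencies; r is the
    recombination fraction between marker and causal locus (0 <= r <= 0.5).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    # Each round, a fraction r of gametes are recombinant, which breaks the
    # marker-causal association: D_{t+1} = (1 - r) * D_t.
    return d0 * (1.0 - r) ** rounds


if __name__ == "__main__":
    for r in (0.01, 0.1, 0.5):
        kept = ld_after_rounds(1.0, r, 10)
        print(f"r={r}: fraction of original LD remaining after 10 rounds = {kept:.3f}")
```

Under these assumptions a tightly linked marker (r = 0.01) keeps about 90% of its association after ten rounds, while a loosely linked one (r = 0.1) keeps roughly 35%, which is why marker-based PRS weights would lose predictive power as rounds accumulate.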

I know little about Rob Henderson except that he wrote a well-received memoir and that he really really really wants you to remember that he invented the concept of “luxury beliefs”. In his own words, these are:

ideas and opinions that confer status on the upper class at very little cost, while often inflicting costs on the lower classes

The concept has metastasized and earned widespread adoption — particularly among social conservatives and right-wing populists. It might sound sophisticated, but it’s fundamentally flawed. Its vague and inconsistent definitions necessitate a selective application, and it’s ultimately used to launder mundane political preferences into something seemingly profound and highbrow.[1]

It’ll be most useful to break down Henderson’s concept into parts and go through it step-by-step.

1. Fashionable beliefs are always in style

First, there’s...

Haven't read his book but have read enough of his tweets to understand what he's getting at. IMU, a belief is a "luxury" one if:

  • Adhering to it is considered "harmful" to the individual/society (whether rightly or wrongly; Rob would say rightly, presumably)
  • Expressing support for the belief is "trendy," manifesting in particular as the "high-class" people popularizing the belief
  • The proponent's "class" allows them to avoid the consequences of the said belief held at scale, whereas the "regular" folk suffer

So, as an example, a "high-class" person comes out with a "hot take" like police abolition, which over time picks up steam and gains more support among the broader populace. When the consequences of higher crime rates hit, the broader populace suffers. But the "high-class" person, by virtue of living a life removed from crime's consequences, avoids them. The high-class (unlike the "regular") person could afford the belief; hence, it's a luxury.

For an own-behavior-relating belief (so, think things like polyamory/drug use, not open borders/police abolition), the steps are similar except the "high class protecting the luxury believer" part takes the form of the "high class" person's safety net, ability to delay gratification, "knowing when to stop," etc. saving them from the negative consequences. Whereas someone poor and divorced from the tacit knowledge behind these behaviors (as elaborated on in more detail by Viliam) is more likely to suffer.

With this in mind, "classmate ... a Republican oil tycoon who extolled the virtues of going to church but didn’t go himself" seems unrelated:

  • Attending the church isn't (widely) considered harmful to the society in the US, I'd think, more like meh.
  • The tycoon's avoidance of church themselves isn't because they say it's good but secretly know "when to stop" or something. It's just because they're lazy, or something?
  • Finally, in terms of long-term effects, going to church is probably not actually going
Sure, polyamory is bizarre and unconventional, but that only further undermines Henderson's assertion that it was widely adopted (enough to have an impact) by both the upper and lower class of society circa 1960-1970s. I didn't present the oil tycoon story as a luxury belief example, but rather as an example of a story that carried the same "saying but not doing" lesson. I did present "support for a harsh criminal justice system" as an example of a luxury belief that Henderson would contest, even though it perfectly fits his template.

Sure, polyamory is bizarre and unconventional, but that only further undermines Henderson’s assertion that it was widely adopted (enough to have an impact) by both the upper and lower class of society circa 1960-1970s.

He's not asserting that the upper class rejected monogamy in a way that was widely adopted. He does say this about his classmates, but his classmates aren't the entire upper class.

You may be assuming that if the lower classes did it, and the upper classes promote it, that implies that the upper classes must be responsible for the lower cl...


A few months ago, Rob Bensinger made a rather long post (that even got curated) in which he expressed his views on several questions related to personal identity and anticipated experiences in the context of potential uploading and emulation. A critical implicit assumption behind the exposition and reasoning he offered was the adoption of what I have described as the "standard LW-computationalist frame." In response to me highlighting this, Ruben Bloom said the following:

I differ from Rob in that I do think his piece should have flagged the assumption of ~computationalism, but think the assumption is reasonable enough to not have argued for in this piece.

I do think it is interesting philosophical discussion to hash it out, for the sake of rigor and really pushing for clarity.

clone of saturn
There seems to generally be a ton of arbitrary path-dependent stuff everywhere in biology that evolution hasn't yet optimized away, and I don't see a reason to expect the brain's implementation of consciousness to be an exception.
the gears to ascension
Agreed about its implementation of awareness, as opposed to being unaware but still existing. What about its implementation of existing, as opposed to nonexistence?
clone of saturn
Based on this comment I guess by "existing" you mean phenomenal consciousness and by "awareness" you mean behavior? I think the set of brainlike things that have the same phenomenal consciousness as me is a subset of the brainlike things that have the same behavior as me.

Well I'd put it the other way round. I don't know what phenomenal consciousness is unless it just means the bare fact of existence. I currently think the thing people call phenomenal consciousness is just "having realityfluid".

Separately from the more meta discussion about norms, I believe the failure mode I mentioned is quite different from yours in an important respect that is revealed by the potential remedy you pointed out ("have each discussion group be composed of a proportional amount of each party's supporters. or maybe have them be 1-on-1 discussions instead of groups of x>2 because those tend to go better anyways"). Together with your explanation of the failure mode ("when e.g. it's 3 against 2 or 4 against 1"), it seems to me like you are thinking of a situation where one Republican, for instance, is in a group with 4 Democrats, and thus feels pressure from all sides in a group discussion because everyone there has strong priors that disagree with his/hers. Or, as another example, when a person arguing for a minority position is faced with 4 others who might be aggressively conventional-minded and instantly disapprove of any deviation from the Overton window. (I could very easily be misinterpreting what you are saying, though, so I am less than 95% confident of your meaning.) In this spot, the remedy makes a lot of sense: prevent these gang-up-on-the-lonely-dissenter spots by making the ideological mix of the group more uniform or by encouraging 1-on-1 conversations in which each ideology or system of beliefs will only have one representative arguing for it. But I am talking about a failure mode that focuses on the power of one single individual to swing the room towards him/her, regardless of how many are initially on his/her side from a coalitional perspective. Not because those who disagree are initially in the minority and thus cowed into staying silent (and fuming, or in any case not being internally convinced), but rather because the "combination of charisma, social skills, and assertiveness in dialogue" would take control of the conversation and turn the entire room in its favor, likely by getting the others to genuinely believe that they are being persuaded for rati

I think this is a good object-level comment.

Meta-level response about "did you mean this or rule it out/not have a world model where it happens?":

Some senses in which you're right that it's not what I was meaning:

  • It's more specific/detailed. I was not thinking in this level of detail about how such discussions would play out.
  • I was thinking more about pressure than about charisma (where someone genuinely seems convincing). And yes, charisma could be even more powerful in a 1-on-1 setting.

Senses in which it is what I meant:

  • This is not something my world mode
...
I disagree that the speculation was unfounded. I checked your profile before making that comment (presumably written by you, and thus a very well-founded source) and saw "~ autistic." I would not have made that statement, as written, if this had not been the case (for instance the part of "including yourself"). Then, given my past experience with similar proposals that were written about on LW, in which other users correctly pointed out the problems with the proposal and it was revealed that the OP was implicitly making assumptions that the broader community was akin to that of LW, it was reasonable to infer that the same was happening here. (It still seems reasonable to infer this, regardless of your comment, but that is beside the point.) In any case, I said "think" which signaled that I understood my speculation was not necessarily correct. I have written up my thoughts before on why good moderation practices should not allow for the mind-reading of others, but I strongly oppose any norm that says mere speculation, explicitly labeled as such through language that signals some epistemic humility, is inherently bad. I even more strongly oppose a norm that other users feeling pressured to respond should have a meaningful impact on whether a comment is proper or not. I expect your comment to not have been a claim about the norms of LW, but rather a personal request. If so, I do not expect to comply (unless required to by moderation).
I don't agree that my bio stating I'm autistic[1] is strong/relevant* evidence that I assume the rest of the world is like me or LessWrong users; I'm very aware that this is not the case. I feel a lot of uncertainty about what happens inside the minds of neurotypical people (and most others), but I know they're very different in various specific ways, and I don't think the assumption you inferred is one I make; it was directly implied in my shortform that neurotypicals engage in politics in a really irrational way, are influenceable by such social pressures as you (and I) mentioned, etc. *Technically, being a LessWrong user is some Bayesian evidence that one makes that assumption, if that's all you know about them, so I added the hedge "strong/relevant", i.e. enough to reasonably cause one to write "I think you are making [clearly-wrong assumption x]" instead of using more uncertain phrasings. I agree that there are cases where feeling pressured to respond is acceptable. E.g., if someone writes a counterargument which one thinks misunderstands one's position, one might feel some internal pressure to respond to correct this; I think that's okay, or at least unavoidable. I don't know how to define a general rule for determining when making-someone-feel-pressured is okay or not, but this seemed like a case where it was not okay: in my view, it was caused by an unfounded confident expression of belief about my mind. If you internally believe you had enough evidence to infer what you wrote at the level of confidence to just be prefaced with 'I think', perhaps it should not be against LW norms, though; I don't have strong opinions on what site norms should be, or how norms should differ when the subject is the internal mind of another user. More on norms: the assertive writing style of your two comments here also seems possibly norm-violating. Edit: I'm flagging this for moderator review.
1. ^ the "~ " you quoted is just a separator from the previ

[Metadata: crossposted from]

Exploring the ruins of an alien civilization, you find what appears to be a working computer: it's made of plastic and metal, wires connect it to various devices, and you see arrays of capacitors that maintain charged or uncharged states and that sometimes rapidly toggle in response to voltages from connected wires. You can tell that the presumptive RAM is activating in complex but structured patterns, but you don't know their meanings. What strategies can you use to come to understand what the underlying order is, what algorithm the computer is running, that explains the pattern of RAM activations?

Thanks to Joss Oliver (SPAR) for entertaining a version of this koan. Many of B's ideas come from Joss.

Real data about minds

Red: If we want to understand...


Thinking about it more, I want to poke at the foundations of the koan. Why are we so sure that this is a computer at all? What permits us this certainty, that this is a computer, and that it is also running actual computation rather than glitching out?

B: Are you basically saying that it's a really hard science problem?

From a different and more conceit-cooperative angle: it's not just that this is a really hard science problem, it might be a maximally hard science problem. Maybe too hard for existing science to science at! After all, hash functions are ...