Keltham's lecture on Science, in, as is usual for him, Cheliax
Permalink

Message to her:  Don't smile or anything, but:  Correct.  Though two significant digits would've been fine there.

...and she was first of the newcomers, so probably everyone is trying to compute needlessly many digits, which, okay, fine.

Permalink

Keltham will wait until everyone's raised their hand, and then go to the whiteboard and show how he'd have approximated it:

0.   1/3 * 0.00729^2  +  1/3 * 0.02048^2  +  1/3 * 0.03456^2   vs.   0.03125^2
1.   ~ (2^2 + 3.5^2)/3   vs.   3.1^2
2.   ~ (4 + 12.25) / 3   vs.   9.61
3.   ~ 5.4 vs. 9.6   (/6 * 10 => proportional to 9 vs. 16)
4.   ~ .00054 vs. .00096

"...where step 1 is dropping the first term that's obviously going to end up insignificant, multiplying both sides of everything by 100, and rounding to two significant digits.  I mean, we wouldn't do that in Civilization because we have 'computers' to do the calculation for us, and here you might not do it in a Science! report, but it's definitely fine in my classes."

"And step 4 is dividing again by 10,000 to put that factor back in, because your experimental report should summarize the absolute likelihood of the data, not just the relative likelihood of the data.  Somebody who wants to compare some completely different hypothesis's likelihood, one you didn't even consider, needs the absolute likelihoods to do that - they need the .00054 and .00096 version, not the 9 vs. 16 version."
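A sketch, in Python, of the exact values Keltham's mental arithmetic is approximating (variable names are illustrative, not from the text):

```python
# Exact version of the whiteboard arithmetic.  The "less than 50% propensity"
# metahypothesis averages three equally weighted point-masses; each term is
# squared, per the exercise on the board.
lt50 = (0.00729**2 + 0.02048**2 + 0.03456**2) / 3   # ~ 0.00056
p50_sq = 0.03125**2                                  # ~ 0.00098
```

Keltham's two-significant-digit shortcut gave .00054 and .00096 - off by a few percent from the exact values, which is the expected cost of the approximation.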

Permalink

"Imagine that we've got two separate research groups both testing the hypothesis that all-10s solve 2-4-6 with 50% propensity, or alternatively, less than 50% propensity.  Neither group knows the other exists; however, they both use the - hypothetically, for this thought experiment - universally standard rule that 'less than 50% propensity' is of course best-modeled in practice by three equally weighted 'probability point-masses' on 0.1, 0.2, and 0.4."

"The first group reports a likelihood of 0.02 on the less-than-50% metahypothesis, and a likelihood of 0.031 on the 50% hypothesis."

"The second group reports the same thing."

"The way we've set up the hypotheses being reported on, we cannot just multiply the two likelihoods together.  The task of combining evidence from different 'published-experimental-reports' is now a big complicated deal requiring us to recheck their original data and redo all their calculations."

"Alternatively, if they had both reported likelihoods of 0.007, 0.02, 0.035, and 0.031, on the distinct hypotheses of 10%, 20%, 40%, and 50% propensity respectively, we could have just multiplied the likelihoods together from both groups, and our ability to accumulate data from across multiple experiments would be vastly simplified."
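A hypothetical sketch of why exact hypotheses combine so easily (Python; the dictionary layout is mine, not anything from the lecture):

```python
# Each report lists one likelihood per exact propensity hypothesis.
report_1 = {0.1: 0.00729, 0.2: 0.02048, 0.4: 0.03456, 0.5: 0.03125}
report_2 = dict(report_1)  # the second group saw the same thing

# Exact hypotheses: just multiply, likelihood by likelihood.
combined = {h: report_1[h] * report_2[h] for h in report_1}

# The "less than 50%" bucket must be rebuilt from the exact entries;
# squaring the bucket's own averaged 0.02 gives the wrong answer.
bucket = (combined[0.1] + combined[0.2] + combined[0.4]) / 3
wrong = ((report_1[0.1] + report_1[0.2] + report_1[0.4]) / 3) ** 2
```

Multiplying the bucketed averages understates the true combined likelihood, because the first experiment's data shifts the odds within the bucket toward the 40% hypothesis before the second experiment is scored.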

"Of which it is said out of dath ilan, to those dath ilani children who need to hear it:  Different 'effect-sizes' are different hypotheses."

"That, Carissa, Pilar, is why we can't just have the hypothesis that all-14s have at least five times the propensity of all-10s to solve 2-4-6 in 30 minutes.  We can look at the data and see if that actually happened or not.  But as soon as we try to figure out the exact likelihood that it happened, we are cast into a nightmarish multiverse of different ways the world could be, such that the statement 'all-14s are more than five times as likely to solve in thirty as all-10s' is true about worlds like that, all of which have different likelihoods of yielding the data we saw."

"Like, just on this breakdown, that could be because the chances were .2 and 1.0, or .1 and .5, or .1 and .6, or .1 and .8.  And every one of those hypothetical propensity '2-tuples' will yield a different likelihood for whatever data we saw.  So you'd have to put a prior on their relative odds inside that metahypothesis bucket, before you could calculate the likelihood for the whole bucket."

"And then, actually seeing any data, would update the odds inside that bucket, which would change the likelihood for any future experiments, even if the replicators saw exactly the same data you did."

Permalink

"And that takes us to the principle I wrote earlier on my todo list:"

#2 - Separate experiments are usually supposed to avert 'conditional-dependencies', watch out for when that isn't true

"What I mean here is that - when you are otherwise doing things correctly - it should usually be the case that, for the likelihoods that the 'published-experimental-report' is summarizing for different hypotheses, if some replicators came along and did their own experiment, their likelihoods should be something they can calculate independently of your data.  It shouldn't be the case that they have to look at your data, to figure out the likelihoods given their hypotheses."

"This in fact is the property that lets us compute the joint likelihood of a hypothesis across two experiments, by multiplying the likelihoods together from the separate experiments.  Symbolically:"

P( data from first and second experiments ◁ the hypothesis )
= P( data from 1st experiment ◁ the hypothesis ) * P( data from 2nd experiment ◁ the hypothesis)
    if and only if
P ( data from 2nd experiment ◁ the hypothesis ) = P( data from 2nd experiment ◁ the hypothesis & data from 1st experiment)

"When you say, 'maybe all-14s have 60% propensity to solve in time, and all-10s have 10% propensity to solve in time', you're describing a way reality can be, where the likelihood of my found pattern of YES and NO responses, if that's true, is just the same no matter what data you found in your own experiment.  Maybe your data made that world look incredibly improbable, but that doesn't matter; I can still answer the question of how likely my data would be, if that world was the case, without looking at your data."

"When you say, 'maybe all-14s have at least five times the propensity of all-10s to solve 2-4-6 in 30 minutes', that is a way the world can be; but it's a way the world can be, where calculating the likelihood of my data in that world, requires me to make up a bunch of prior-probabilities, and then those probabilities change depending on the data that you got."

"Which makes it immensely complicated to quickly look over the summaries of what different people's experiments said about different worlds, and combine them together into a joint summary of what reality has told us about them all."

"It would, in fact, be possible to combine a lot of little experiments all of which suggested that - if you wrote the summary this way - the data was more likely if all-10s had 50% propensity to solve, versus less-than-50% propensity to solve, and get out a new update that the combined data was more likely if all-10s had less-than-50% to solve.  If you multiplied enough 0.035 likelihoods from the 40%-propensity hypothesis, compared to the 0.031 likelihoods from the 50%-propensity hypothesis, then eventually the 40%-propensity hypothesis would come to dominate the predictions of its bucket, and then its bucket would start to dominate the other hypothesis."
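A toy check of that claim, using the lecture's rounded likelihoods and assuming, for simplicity, that every experiment reports identical numbers:

```python
def bucket_vs_50(n):
    """Likelihood ratio (less-than-50% bucket : exact 50% hypothesis)
    after n identical experiments, each reporting likelihoods of
    0.007 / 0.02 / 0.035 on the 10% / 20% / 40% hypotheses, vs 0.031 on 50%."""
    bucket = (0.007**n + 0.02**n + 0.035**n) / 3
    return bucket / 0.031**n
```

One experiment favors the 50% hypothesis (ratio below 1), but by around ten repetitions the 40%-propensity term dominates its bucket and the ratio crosses above 1 - the paradoxical-seeming flip Keltham describes.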

"Which paradoxical-seeming combination, again, doesn't happen if you consider the 40%-propensity hypothesis separately, because then it's clear from the start that 40% propensity is gaining on 50% propensity in each experiment."

"Hence again the proverb:  Different effect sizes are different hypotheses.  Which argues again against thinking that 'all-14s are at least five times as likely to solve as all-10s' is a good way to split up the world for purposes of SCIENCE!  Even though, in terms of 'truth-functional' scaffolds, it is a way the world can be.  It could even be the metafact that is useful and that we're interested in.  We should still ask the Science! question first, what are the exact real effectsizes, and then check the useful metafact afterwards."

Permalink

"Likewise if you started thinking that 'this coin isn't random' or 'this coin is biased to favor Queen' was a good way to describe the hypothesis you were considering.  If two experiments show that the same coin is probably biased to Queen by notably different amounts, they're pointing at incompatible ways the world can be, and something is wrong, some condition has changed between experiments, at least one group is screwing up."

"You definitely wouldn't say, 'Well, our hypothesis was that this coin was biased to favor Queen, and group one spun it a bunch of times and found that it came up Queen 900 times out of 1000, and group two spun it a bunch of times and it came up Queen 520 times out of 1000, and both of those results are instances of the coin coming up Queen more often than Text, so both confirm the hypothesis that the coin is biased Queen, and the experiment has reproduced.'  You are actually less confident after two apparent confirmations of your original statement than you were after one confirmation, because in the details of the particular worlds, it's clear that something was wrong with at least one experimental setup."

"But that apparent paradox is just an artifact of bucketing together different ways the world could be, that yield very different likelihoods on the exact data observed, into one metahypothesis of 'this coin is biased to favor Queen'.  If you said instead 'the coin yields 90% Queens' or 'the coin yields 52% Queens', there would be no illusion of the first experimental result agreeing with the second result, there would be no illusion that the result had 'reproduced'.  Fix a local hypothesis, a single effect-size in this case, that makes the data have independent likelihoods between one experiment and another, that fully specifies the likelihood of the data as a matter of logic, and doesn't change when we read other experimental reports.  Summarize the likelihoods for hypotheses like that, and it would be clear that the data from one experiment was compatible with an exact hypothesis, and the data from the other experiment was not."

"Which, uh, yeah, the lesson is, there are these careful precise details about how to do SCIENCE! correctly, and those details actually matter a lot for making your whole Civilization's SCIENCE! output fit together and have the whole thing make any sense.  Even for a small project like ours, it's still probably best to do it that way, if we want things to make any sense."

Permalink

Korva, who was the last to raise her hand for the last exercise, and who has been calculating through another panic attack for the past twenty minutes, has now realized that the horrible feeling of wasting lots of time making errors that she still doesn't understand on stupidly complicated problems was the point of all of this, that this incredibly painful classroom experience has all been an illustration specifically for the benefit of Carissa Sevar about why they shouldn't do things the way she suggested, even though there is no outward indication that the Chosen of Asmodeus herself experienced any distress about it at all at any point.

Korva thinks that, probably, if she were in charge, she would have experimenters report their data, and then everyone who wanted to see how well that data fit some specific hypothesis could CALCULATE IT THEIR OWN GODSDAMNED FUCKING SELVES.

Permalink

There are, even in dath ilan, children who will agreeably acquiesce in doing SCIENCE! the same way everybody else does it, without carefully detailed demonstrations of exactly what grimdark dooms will befall them if they try to make up their own methodologies that violate the coherence constraints.

Tiny Keltham was not one of those children.

 

And besides, even if you report the raw data like a sane person, people do need to know which calculations to do after they've got all the data, right?  There are some local calculations that do tessellate together neatly into global calculations, and you might as well summarize those when they exist.  Which requires you to know how to set up the kind of calculation that modularizes well, and distinguish it from calculations that don't.

Permalink

Willa feels like she's following well, but she did think she was before, and then she still managed to get herself tied up somehow. In front of everyone, as usual.

Though in retrospect she thinks she had it but there was just a lot of confusion about before-chances priors and some miscommunication? But overall she's happy that this particular weird mess of logic is going to stay away from SCIENCE! The bad buckets were hurting it.

She doesn't know SCIENCE(!) well enough to feel protective of it yet, but she expects she will. Except that expecting yourself to be convinced of something in the future doesn't make any sense. So she might as well feel protective of it already.

Permalink

Alexandre is apparently quite good at probability theory, since he was following for all of that! He is therefore happy that his knowledge (a) is growing and (b) will probably continue to grow, since he's not permanently behind everyone and can therefore continue taking the MOST IMPORTANT CLASS IN THE WORLD so he can obtain MASTERY OF REALITY. Alexandre's power and knowledge continue to grow, all hail Lord Asmodeus.

Permalink

"Also, just to be clear, while you wouldn't necessarily get paid very much for it except in very special cases, there is such a thing as just running out and trying a bunch of stuff that you didn't preregister and aren't going to be especially careful about analyzing if you even report the results at all.  That's definitely still 'science', the individual kind, even if it's not 'SCIENCE!' the project of all Civilization.  It is not considered foolish to do noncareful nonrigorous experiments to figure out which careful rigorous experiments to do, before you spend a lot of effort on writing up something to preregister."

"That's, uh, basically what I did the previous day when I was experimenting on boiling acid and having it explode a lot.  I wasn't exactly trying to lay a firm foundation for future Golarion Science!, just poking around rapidly and trying to get a handle on how 'chemistry' worked with magic in the mix."

"This, too, is something of a dath ilani stereotype.  Stereotypically, it sometimes involves an Ione-like person who tries to prevent the maniacal-scientist from accidentally killing himself.  They are even, often, a girl and a boy respectively, and it is not unknown for them to end up romantically involved.  Though I have to ask again, do I really need one of those when I already have several girlfriends and my new universe comes neatly equipped with healing and resurrection spells?"

Permalink

"YES because resurrections are EXPENSIVE and they are MUCH MORE EXPENSIVE when the person has COMPLETELY DISSOLVED THEMSELVES IN ACID."

Permalink

"Oh.  Huh.  I wasn't actually aware that it got more expensive if I completely dissolved myself.  Good thing to watch out for, then."

Permalink

Why is her life like this.

Permalink

Even Nethys can't have been this bad.  He wouldn't have survived to become a god.

Permalink

...there may possibly have been several romantic interests involved in his continued mortal existence.

Permalink

"I'm actually a little surprised there's not more specialization.  If I were imagining how to do this I'd have some people write detailed experiment-specifications based on what they wanted to learn, other people follow the experiment-specifications, and then the first people or other specialists do analysis of the results to figure out how to bet."

Permalink

"It's specialized out to the Fourthplanet colony and back again.  There's going to be literally hundreds of people in Civilization who do nothing but talk to the big chemistry outfits to figure out exactly what they need and write up exact experimental specs for experiments involving 'sulfuric acid' in particular."

"Thing is, everybody in that process needs to understand the rules on how the greater Science! process works and why.  You can't write up a good experimental spec unless you know how a replicator will carry it out and what an analyst will do with the data.  You can't be a good analyst unless you know what was really going on in the experiment, what the impact-buyers were originally asking for, and what the report-writers need from you."

"Good specialists don't have to be good generalists, they can be mediocre generalists, but they do have to be generalists.  You don't need to be able to write a pro-level saleable experimental-report before you can hire a writer to do that for you.  You do need to be able to write a terrible, unreadable report that is nonetheless judged by a professional skill-evaluator to contain all of the necessary details."

"There's literally nobody in the process who doesn't need to understand the Law of Probability well enough to know the difference between hypotheses-that-fully-imply-likelihoods and semantically-well-formed-propositions-that-underspecify-likelihoods and word-sequences-that-don't-even-have-clear-truth-functionals-yet."

Permalink

"Remember, I am not in fact, by Civilization's standards, a great experimentalist, a famous analyst, the one who brings the light of cause-selection to the world."

"I'm a teenager what got tossed into another universe, is what I am."

"Everything I'm teaching you is what Civilization makes sure to teach all of the eight-year-olds, so they'll be able to read the newspapers when they grow up."

"I mean, not literally.  Not literally actually.  Not every eight-year-old has read a couple of novels in which the protagonist has to improvise scalable production of sulfuric acid, in which, unfortunately, most of the memorable drama was about safety precautions obviated for us by Resist Energy (Acid).  I know more than the bare basics, in a few places, because I thought they seemed cool to know."

"But definitely everything we've been covering today has been like that."

"There's a saying out of dath ilan, 'Professional specialization is what grownups do when they actually want results.'  But don't go setting your eyes on a specialist's glory just yet.  Let's all get to the eight-year-old generalist level first."

"If, like... we can successfully do even that, without me having actually been a professional specialist in teaching eight-year-old Law of Probability."

Permalink

Keltham's pocketwatch shows he's running out of time on his third and final Communal Share Language (Baseline), which provides something of a helpful natural time limit on the morning's lecture, so he'll now try to race through some of his remaining whiteboarded pending-topics.

Permalink

#3 - If every obvious hypothesis has unexpectedly low 'likelihood' over all the combined data, it means the true theory wasn't in your starting set, often that different experiments had different hidden conditions

Let's say you're testing (only) the hypotheses for 20% propensity-to-solve and 60% propensity-to-solve.

Suppose you test 300 subjects, and 100 of them solve it.

At this point, obviously, they're not going to be multiplying things out by hand any more, and are going to be working with logarithms...

log2 (0.2) = -2.322
log2 (0.4) = -1.322
log2 (0.6) = -.737
log2 (0.8) = -.322

(bonus points, Keltham declares, for anybody who already noticed that 2 entries in this table are blatantly redundant, but he's writing them out anyways for ease of reference)

2s lost, 20% propensity:
-2.32 * 100 + -.32 * 200 = -296
2s lost, 60% propensity:
-.74 * 100 + -1.32 * 200 = -338

Now, if you are a naive six-year-old being led astray by an older child trolling you, you might look at this, and proclaim that the 20%-propensity hypothesis did much better than the 60%-propensity hypothesis, by a likelihood factor of over a trillion.

But really what this data is yelling at you is "The true hypothesis was not in your explicit hypothesis set!"

Why?  Consider how well the two theories say they ought to do, loss-of-2s-wise, in the worlds where they are actually true:

Expected 2s lost given 20% propensity:
-2.32 * 60  +  -.32 * 240  ~  -2&1/3 * 60 + -1/3 * 240 = -140 + -80 = -220
Expected 2s lost given 60% propensity:
-.74 * 180  +  -1.32 * 120  ~  -3/4 * 180 + -1&1/3 * 120 = -135 + -160 = -295

...which is to say that both hypotheses lost way more than they expected to lose.

This is a sort of hint that tells you to look for a new hypothesis, like "1/3 propensity", say.  The same sort of hint appears, more subtly, when some hypothesis is doing way better than it expected to do, losing fewer twos than it said it should lose.
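The twos-lost bookkeeping above can be sketched as (Python; function names are mine):

```python
import math

def bits_lost(p, solves, fails):
    """Actual log2-likelihood of the data ('twos lost') under propensity p."""
    return solves * math.log2(p) + fails * math.log2(1 - p)

def expected_bits_lost(p, n):
    """How many twos the hypothesis expects to lose over n subjects,
    in the world where it is actually true."""
    return n * (p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

With 100 solvers out of 300, the 20% hypothesis loses about 297 twos against an expectation of about 217, and the 60% hypothesis loses about 338 against about 291 - both lose far more than they promised.  The 1/3-propensity hypothesis loses exactly what it expects to, since the observed frequency sits right on it.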

There's obviously ways to think about all the possible propensities at once, but Keltham doesn't think he can get to those this morning, given that they don't have calculus yet.

Permalink

There's a similar phenomenon that would suggest two experiments were working under different conditions, even if the data got all mixed together before you looked at it.  Say you were gathering data to find out the average Intelligence in Cheliax, which you expect to be the sum of lots of tiny factors and hence distributed along a central distribution.

Actually, however, your data was gathered for you by a professional data-gathering firm, though, uh, you might possibly have not done a lot of due diligence before hiring them.  They were cheap, though!

This data-gathering firm immediately subcontracted out your job to two even cheaper subcontractors.

What these data-gatherers should have found - at least if the data told to Keltham himself was correct, and a couple of other facts seem to have borne it out - was that Golarion has a mean Intelligence of 10, with a square-root-of-average-squared-deviation-from-the-mean of 2.  (Baseline:  'Deviation' of 2.)

One subcontractor, however, didn't spell-check their survey, and the spelling errors turned off the smarter and more perfectionist people reading it, so their biased sample of respondents had average Intelligence 8 and deviation 1.

Another subcontractor went where it was very convenient for them to find survey respondents, which was, it turned out, people standing in line to apply to a wizard academy.  That subgroup had average Intelligence 12 and deviation 1.

If both datasets are completely mixed together before you get them, when you compute the average, you'll find it's around Intelligence 10, and the deviation... will not be exactly 2, but it will be around 2.

But the hypothesis "This is a central distribution with average 10 and deviation 2" would predict that 10 is the most likely Intelligence score you can find.  Intelligence-10s will actually be relatively rare if your distribution is the sum of two subdistributions with deviation 1 and averages 8 and 12.  6% or so of subjects will have Intelligence 10, instead of the 20% or so the hypothesis predicted.  You don't need to notice that particular deficiency by looking at Intelligence-10 subjects specifically.  It'll show up in the combined likelihood of all the data being much lower than expected, even if the whole thing is calculated by a 'computer' that wasn't 'programmed' to detect that exact kind of anomaly.
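The anomaly can be checked directly (Python; this assumes integer Intelligence scores, binning the continuous curve at the half-points):

```python
import math

def normal_cdf(x, mu, sd):
    """Cumulative probability of a central (normal) distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def p_score(k, mu, sd):
    """Chance of an integer score k: mass between k - 0.5 and k + 0.5."""
    return normal_cdf(k + 0.5, mu, sd) - normal_cdf(k - 0.5, mu, sd)

hypothesis = p_score(10, 10, 2)                        # one curve: mean 10, dev 2
mixture = 0.5 * p_score(10, 8, 1) + 0.5 * p_score(10, 12, 1)  # the real blend
```

Under this binning the single-curve hypothesis puts roughly 20% of subjects at Intelligence 10, while the actual mixture puts roughly 6% there - so the hypothesis's total likelihood over the data comes out far below what it promised itself.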

You can calculate what kind of score you'd expect to get, if any of your hypotheses were actually true; and if all the hypotheses score much lower than they expect, they're all - in Baseline colloquialism - 'stupid with respect to the data'.  This doesn't always happen when different experimenters are working under secretly different conditions and measuring actually different phenomena, and it is not always obvious just from the likelihoods, especially if you mix all the data together before checking it; but it is an example of a pattern suggesting that the true hypothesis wasn't anything you were considering.

One should always keep in mind, though, that the 'fair coin' hypothesis never looks stupid no matter how much pattern it's missing out on.  If you spin a coin 1000 times, and it comes up Queen 1000 times, the fair-coin hypothesis expected to lose 1000 twos and that's exactly what it loses.  In a case like that, you have to think of the specific better hypothesis - 'this coin has a 100% propensity to Queen' - or perform some more general test that implicitly takes into account the possibility of lower-'entropy' hypotheses like that - before you can see the problem.
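The fair coin's bookkeeping, concretely (Python): every spin costs it exactly one 2, so its actual score always equals its promised score, no matter the data.

```python
import math

n = 1000
# Any specific 1000-spin sequence has likelihood 2**-1000 under the fair coin,
# so the fair coin loses exactly n twos on the all-Queens data...
actual = n * math.log2(0.5)
# ...and n twos is also exactly what it expected to lose.
expected = n * (0.5 * math.log2(0.5) + 0.5 * math.log2(0.5))
# The 100%-propensity-to-Queen hypothesis loses zero twos on this data.
rival = n * math.log2(1.0)
```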

If it's just never occurred to you that coins might be biased, if you haven't invented any tests to detect biases, then contemplating the fair-coin hypothesis alone is not going to show you that hypothesis doing any more poorly than it promised you it would do.

Permalink

#4 - How to specially process the special meta-hypothesis 'all-other-hypotheses'

Okay so according to his pocketwatch Keltham has two minutes left to tackle this one before Share Language runs out, and that is not really a lot of time for what is actually the deepest question they've come across so far.

There are always better hypotheses than the hypotheses you're using.  Even if you could exactly predict the YES and NO outcomes, can you exactly predict timing?  Facial expressions?

The space of possible hypotheses is infinite.  The human brain is bounded, and can only consider very few possible hypotheses at a time.  Infinity into finity does not go.

The thing about all the possible hypotheses you're not considering, though, is that you are not, in fact, considering them.  So even if - in some sense - they ought to occupy almost-1.0 of your probability mass, what good does it do you to know that?  What advice does it give you for selecting actions?

And yet there is advice you can derive, if you go sufficiently meta.  You could run that test to see if all of your hypotheses are scoring lower than they promised to score, for example.  That test is not motivated by any particular hypothesis you already did calculations for.  It is motivated by your belief, in full generality, in 'the set of all hypotheses I'm not considering'.

All that Keltham can really say, in the thirty seconds remaining according to his watch, is that in the end people don't usually assign an explicit probability there.  They steer by the relative odds of those models they actually have of the world.  And also put some quantity of effort into searching for better hypotheses, or better languages in which to speak them, proportional to how much everything is currently going horrifyingly wrong and how disastrously confused they are and how much nothing they try is working.

And also you'd maybe adjust some of your probability estimates towards greater 'entropy' if anybody here knew what 'entropy' was.  Or adjust in the direction of general pessimism and gloom about achieving preferred outcomes, if you were navigating a difficult problem where being fundamentally ignorant was not actually going to make your life any easier.

Permalink

Well, that's it on the Communal Share Language (Baseline).  Time for lunch.  Not one of his best lectures ever, but not all of his lectures can be.

He did have three extra hours left over on that last Communal Share Language, though, and gave those extra hours to Korva Tallandria, Willa Shilira, and Alexandre Esquerra, if they happen to want to ask him any questions over lunch that could benefit from their being able to still speak Baseline, or if they want to go off and think on their own or review their notes in a way that uses Baseline.  They have priority on sitting next to him if they so choose.

There is a discipline out of dath ilan of learning to optimize reality first and appearances second.  Yes and indeed, appearances cannot be neglected in human interactions, especially commerce.  But if you want to look competent, you should first ask 'How can I actually become competent?' and then 'How can I signal that real competence in a way that's hard to fake?'  Any other pathway is one where you're just getting into an arms race between trying to fake how things look, and other people knowing that and not being stupid and guessing what you're faking.

At least, so it is in dath ilan.  In Golarion, Keltham gets the impression, there are more complicated things going on.  Complicated silly things.  It's not going to be like that around him.

Yeah, asking your stupid questions can make you look stupid.  It doesn't make you actually any more stupid.  Asking your stupid questions does, in fact, make you actually more competent.  And benefits Keltham in his teaching, by giving him any idea of what anyone is thinking.

Keltham is not particularly under the impression that the people who didn't ask any questions, if he could read their minds, were not keeping many equally stupid questions to themselves.  If he could read their minds, he would update in a predictable direction, so he is just going ahead and updating that way now; he is reminding his emotions as well as his thoughts to update that way.

Whatever it is that would make silence somehow seem like more of a good look in regular Cheliax, in a way that's a stable equilibrium of incentives - maybe because what you're really showing off is your prudence or your self-discipline? - it's not a signal that Keltham knows how to read off from their quiet and their controlledly cheerful expressions.

Welcome to Civilization.  The equilibrium of your incentives here will not be silence.

Permalink

 

...okay, there may be a problem here where the most vocal question-askers were thus selected to be bad at being a normal Chelish person.

Permalink

"Proposal for next time: we oblige everyone to attend class with shadowy hoods that conceal their faces and distort their voices, so they can ask stupid questions in perfect anonymity."
