Comments on Rationally Speaking: Provably Nonsense, Part II

Joshua Gottlieb-Miller (2010-01-14 15:57):
And just to clarify, I don't mean that you're completely wrong. But surely other people have made good points that clash with the points you've made...

Joshua Gottlieb-Miller (2010-01-14 15:56):
Jonathan, well put.

I think the problem of the unsympathetic reading of post-modernism extends beyond the initial blog post into Julia and Massimo's responses in the comment thread. The problem isn't just the first unsympathetic reading; it's the fact that y'all are completely uninterested in what anyone else has to say.

What's the point of communicating your ideas if you (think you) are never wrong?

Scott (2010-01-14 15:33):
ClockBackward,

"It is absolutely impossible to generate a text using a random process that truly has maximum entropy, and have the text have intentional meaning."

Of course! This seems like a tautology to me. How could any random process generate a text with "intentional" meaning? At best, a random process could generate accidental meaning, but as you say, that's not very likely. But what if the process isn't random? What if it's pseudo-random? Here again, you seem to be assuming the very fact we're trying to test for. Or is there something I'm misunderstanding about the term "random process"?

"I agree with you in the sense that, for a given fixed estimation algorithm, it may be possible (though extremely difficult and tedious) to design a meaningful text that appears to that algorithm to have maximum entropy."

I'm glad I'm not entirely off base, then. But would it really be more difficult and tedious than composing, say, a sestina or a sonnet cycle? Many poetic forms involve manipulating the entropy of the text, usually lowering it by using predictable rhyme or meter, repeating words, and so on. Moving in the opposite direction might actually be easier. This seems like a proposition that would need to be tested empirically.

In any case, I take your point to be that an unusually high-entropy text would be more likely to be meaningless, in which case this test would provide evidence, but not quite proof -- which was, after all, Julia's original ambition.

Alex SL (2010-01-14 14:08):
Hey, exactly what I thought, and I do not even have a whiff of education in information theory. Thanks for the explanation, CB.

Unknown (ClockBackward) (2010-01-13 22:45):
Hi Scott,

In response to your criticism of my explanation of why maximum-entropy-per-word texts do not have intentional meaning:

"Your three propositions read, to me, as an argument that a randomly-generated text will not have much meaning. I agree! The question is whether it is possible to non-randomly generate a high-entropy text with meaning. This would be a 'pseudo-random' text."

Unfortunately, responding to this requires delving a little into the subtleties of information theory.

While it can be convenient to use the shorthand phrase "the entropy of a body of text," you cannot actually measure the entropy of a text; you can only measure the entropy of a probability distribution or, more generally, of a random process (for example, one that generates words, one at a time). What you can do is use a text to estimate the characteristics of the random process from which that text was created, and then use that knowledge to estimate the entropy of the random process (which, for convenience, we'll call the "entropy of the text").

If the random process from which a text is drawn has maximum entropy, then the text really was generated uniformly at random, and the generator of that text had amnesia (or might as well have had it) in the sense that each word written required no knowledge of the words written beforehand. It is absolutely impossible to generate a text using a random process that truly has maximum entropy and have the text have intentional meaning.

However, as mentioned, since all we can do from any particular text is ESTIMATE the random process used to generate it, at best all we can produce are entropy-per-word estimates (hopefully with confidence intervals). This procedure requires a choice of estimation algorithm (used to estimate the entropy of the underlying process from the text itself), and hence we might get slightly different results depending on our choice of algorithm. Fortunately, this may well still be sufficient to perform a test like the one Julia proposed.

I agree with you in the sense that, for a given fixed estimation algorithm, it may be possible (though extremely difficult and tedious) to design a meaningful text that appears to that algorithm to have maximum entropy. But this is not actually as damning as it seems. If the text under consideration is very large, and the estimation algorithm used is good, it is EXCEEDINGLY unlikely that a genuine English text created with intentional meaning would produce an estimated entropy per word very close to the maximum possible (even if it is indeed hypothetically possible). For that to happen, the author's use of each word would have to appear to be independent of the words that came before it, which is (approximately) never true of texts with intentional meaning.

Hence, Julia's test as I interpret it (when made specific and formal) is a one-way, statistical test for very large texts (or bodies of them). If a text produces an estimated entropy per word that is almost maximal, then you can conclude that (the majority of) the text was almost certainly not created with intentional meaning (though a few intentionally meaningful sentences could always have been thrown in here or there, and perhaps an incredibly crafty foe could have purposely defeated your entropy estimation algorithm). On the other hand, if the test shows that the estimated entropy per word is similar to that of other large texts drawn from sources known to have intentional meaning, then there is not much that can be concluded (hence my calling the test "one-way").
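To make the kind of estimate ClockBackward describes concrete, here is a minimal sketch (illustrative only, not anything proposed in the post): a plug-in unigram estimate and a conditional bigram estimate of bits per word, compared against the maximum possible for the text's vocabulary. The function names and the 0.95 cutoff are invented for illustration, and plug-in estimates like these are badly biased on small texts, so this shows only the shape of a one-way test, not a calibrated one.

```python
import math
from collections import Counter

def unigram_entropy(words):
    """Plug-in estimate of entropy per word, ignoring word order entirely."""
    counts, n = Counter(words), len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(words):
    """Plug-in estimate of H(next word | previous word), in bits per word."""
    pair_counts = Counter(zip(words, words[1:]))
    prev_counts = Counter(words[:-1])
    n = len(words) - 1
    h = 0.0
    for (prev, _nxt), count in pair_counts.items():
        h -= (count / n) * math.log2(count / prev_counts[prev])
    return h

def looks_like_noise(text, cutoff=0.95):
    """One-way flag: True only when the estimate sits near the maximum possible
    for this text's vocabulary (the 0.95 cutoff is an arbitrary placeholder)."""
    words = text.lower().split()
    h_max = math.log2(len(set(words)))   # uniform, independent choice over the vocabulary
    h_uni, h_cond = unigram_entropy(words), conditional_entropy(words)
    print(f"unigram {h_uni:.2f} bits/word, conditional {h_cond:.2f}, maximum {h_max:.2f}")
    return h_cond > cutoff * h_max
```

A near-maximal conditional estimate would say the words look independent of their context, which is the condition ClockBackward argues intentional writing essentially never satisfies; an unremarkable estimate says nothing either way.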
"The technical details of ho...ClockBackward: <br /><br />"The technical details of how one would actually [normalize entropy to account for grammar], and of how one would estimate the actual entropy of the process underlying a large text... are tricky and would require some serious thought."<br /><br />Yes, that's the point. The obvious cases on the ends of the spectrum are insufficient to prove anything about the messy middle. The trvial disproof I wrote earlier still holds against this grammar restriction, with minor modification: <br /><br />Take any "meaningful" text, and swap all instances of a given word or phrase for another which does not violate grammar rules but renders the text obviously "meaningless". Replace "text" with "camel" in your own post, for instance. Now we have two texts, one "meaningful" and one "meaningless", with identical entropy.<br /><br />-----<br /><br />ppnl: "This verbal excess reduces the entropy of the text. It has no effect on meaning." I disagree, on the grounds that because text is descended from spoken language, cadence, rhyme, emphasis, cadence, repetition and other such characteristics which are not strictly information-carrying ("information" here in the conventional sense) still affect how the text will be received by a reader. I know this has nothing to do with the main thrust of the argument, but I couldn't resist a quick defense of rhetoric. Please excuse my pedantry. :)<br /><br />-----<br /><br />Jonathan: "There isn't any short-cut to figuring out if a text is meaningful, let alone important." Yep.lahhttps://www.blogger.com/profile/01628706119417816324noreply@blogger.comtag:blogger.com,1999:blog-15005476.post-79151560390287579032010-01-13T13:14:56.147-05:002010-01-13T13:14:56.147-05:00ClockBackward,
Scott (2010-01-13 13:14):
ClockBackward,

It seems to me that your argument is moving in the wrong direction. Your three propositions read, to me, as an argument that a randomly-generated text will not have much meaning. I agree! The question is whether it is possible to non-randomly generate a high-entropy text with meaning. This would be a "pseudo-random" text.

1) "...each word that was chosen while generating the text is independent from the words that were just written before..."

I assume you mean statistically independent, right?

"...it is as if the author got amnesia after writing each word (or, might as well have)..."

Quite so. But say the author didn't get amnesia. Let's say (for argument's sake) that the author has decided to defeat our vacuity-testing algorithm. After every word, she gets a list of those words that are statistically independent of the previous words she has written and intentionally chooses one that generates a meaningful sentence.

2) "Since words are independent of each other, that implies that the author would have been just as likely to write the given text with the order of its words completely scrambled, as he would be to write the original text."

Again, I agree -- if the text is randomly generated! But you still haven't ruled out the possibility of a high-entropy text being generated in a non-random way.

3) "How intentionally meaningful could a writer's work be if the writer could be effectively replaced with an incredibly simple algorithm that just strings together words at random?"

Again, this is a question-begging formulation. There's no reason to imagine that the writer could be effectively replaced with a random algorithm. What we have here is a situation in which, yes, there are many possible random orderings of words, but a few are meaningful as a whole, and our author manages to find one of them.

The matter of English being inherently redundant complicates these issues somewhat. But I'm still fairly sure that the process I've described here would generate an English text that would approach the theoretical maximum entropy for an English text.

People might feel that the constraint of having to choose only statistically independent words would be too severe to allow for meaning, but I would be surprised if that were so. There's a novel (written first in French, translated into English as <i>A Void</i>) that contains not a single 'e'! I understand it reads quite smoothly.

My final remark is on statistical independence. As I understand it, that's how entropy is measured. So the sequences generated by pseudo-random number generators are still high-entropy. The difference is that they have low <a href="http://en.wikipedia.org/wiki/Kolmogorov_Complexity" rel="nofollow">algorithmic entropy</a>, which is a horse of a different color altogether.
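Scott's closing distinction can be illustrated with a small sketch (invented for illustration, not taken from any commenter): a sequence produced by a seeded pseudo-random generator looks essentially maximal to a frequency-based entropy estimate, and even a real compressor can't shrink it, yet the whole sequence is reproducible from a few lines of code plus a seed, which is the sense in which its algorithmic (Kolmogorov) complexity is tiny. Kolmogorov complexity itself is uncomputable, so only the statistical side can be shown directly.

```python
import math
import random
import zlib
from collections import Counter

def entropy_per_byte(data: bytes) -> float:
    """Plug-in estimate of entropy per byte, from byte frequencies alone."""
    counts, n = Counter(data), len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

rng = random.Random(12345)                        # everything below is fixed by this seed
pseudo = bytes(rng.getrandbits(8) for _ in range(100_000))

print(entropy_per_byte(pseudo))                   # ~7.99 bits/byte, near the maximum of 8
print(len(zlib.compress(pseudo)) / len(pseudo))   # ~1.0: a real compressor finds no slack either
```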
lah (2010-01-13 12:18):
This comment has been removed by the author.

ppnl (2010-01-13 08:49):
Jonathan,

Dude, I think you have nailed everything that needs to be said on this subject.

ppnl (2010-01-13 08:32):
Scott,

You should be skeptical about intuitions about information and entropy. Fortunately, we don't need intuition about them. We have exact mathematical definitions. And the implications of those definitions are very counterintuitive.

The problem with meaning is that we have no quantitative definition. All we have is intuition, and that is often very, very wrong.

Machine code is an example of very high-entropy information. If by meaning we mean the complexity of the logical connections that determine what a given bit does, then arguably machine code is much more meaningful than any English text of a similar size.

But then a random string of binary digits would have at least as much information. But meaning? I just don't see a correlation between entropy and meaning.

If anything, I would argue that pomo texts have lower entropy than expected. Take the example:

"...neither stable nor unstable, but rather metastable..."

You have "stable" appearing three times with two different prefixes and some connecting words. You could replace the whole mess with one word, "metastable", and the sentence would have the same meaning -- if it means anything at all. This verbal excess reduces the entropy of the text. It has no effect on meaning. And this kind of verbal excess is pretty much what defines pomo texts.

Another way to think about it is to think of a pomo text as a verbal ink blot used in a kind of linguistic Rorschach test. You don't need to draw a high-resolution picture of something specific. You only need to use a little bit of information and a great deal of style to make it seem like you said something profound.

Yet I don't think low entropy is diagnostic of meaningless crap any more than high entropy is. You can say something meaningful with low-entropy text. It just takes longer.
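ppnl's claim that verbal padding lowers entropy can be sanity-checked with the same kind of plug-in estimate used above, with one caveat worth making explicit: the comparison only makes sense between samples of the same length, since a longer passage with more distinct words scores higher on a unigram estimate regardless of padding. The two eight-word "texts" below are invented purely for illustration.

```python
import math
from collections import Counter

def entropy_per_word(words):
    """Plug-in unigram entropy estimate in bits per word."""
    counts, n = Counter(words), len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two invented eight-word passages of equal length:
repetitive = "stable unstable metastable stable unstable metastable stable unstable".split()
varied = "lizards free frantic dead curves drift entropy noise".split()

print(entropy_per_word(repetitive))   # about 1.56 bits/word: three word types, heavily reused
print(entropy_per_word(varied))       # 3.00 bits/word: eight word types, each used once
```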
Anonymous (2010-01-13 08:30):
Jonathan,

while I agree with you that "postmodernism" is a vague term, frankly so is "analytic philosophy," and one can find gibberish, or at least irrelevance, in both camps.

But I don't think it is fair to say that there is *no* postmodernism, or that some authors aren't clearly more representative than others of that way of thinking, or that some of these authors aren't quasi-nonsensical and/or irrelevantly obfuscatory.

Yes, Foucault has written plenty of interesting things. He has also written things that make little sense, and he has done both using language that rarely should be seen in a philosophical essay.

As Witty put it: "Philosophy is a battle against the bewitchment of our intelligence by means of language." (Of course, he himself was rarely a good example of clear language, but that's another story...)

Unknown (ClockBackward) (2010-01-13 00:49):
(…continued from last comment)

1. Since each word that was chosen while generating the text is independent of the words that were just written before, it is as if the author got amnesia after writing each word (or might as well have). How, in any intentionally meaningful writing, could it possibly be the case that the next word an author writes does not require knowledge of the word that he just wrote beforehand?

2. Since words are independent of each other, that implies that the author would have been just as likely to write the given text with the order of its words completely scrambled as he would be to write the original text. But since word order genuinely matters in languages, and since most word orders lead to gibberish (try scrambling the word order of a few English sentences), that means that the author would be just as likely to write any particular gibberish reordering of the given text as they would be to write the original text itself. And since there are many more gibberish reorderings than meaningful ones, we should expect the author's process to generate mostly gibberish.

3. Since the author was essentially selecting words uniformly at random, they were as likely to write any particular sentence as they were to write any other particular sentence. How intentionally meaningful could a writer's work be if the writer could be effectively replaced with an incredibly simple algorithm that just strings together words at random?

Hopefully, you now believe me that if a large real-world English text truly had maximum entropy per word, it would indeed be meaningless (just like maximum-entropy Moof texts). On the other hand, maximum-entropy-per-word English also violates the rules of grammar. To use such a technique to evaluate grammatically correct (but possibly still "meaningless") texts, the technique would need modification. One would have to somehow normalize the entropy to deal with the extra structure imposed by grammar. Then you would be measuring how much entropy a text has compared to that of grammatically correct (but not intentionally meaningful) text. One thought about how to do this is to compare the entropy per word of the text under consideration to the entropy that would be achieved if you generated random grammatically correct sentence structures (according to the frequency with which they occurred in the text) and filled them, like a madlib, with words of the proper parts of speech (according to the frequencies with which each of these words occurred in the text, but without ever considering what came before a given word). The technical details of how one would actually pull this off, and of how one would estimate the actual entropy of the process underlying a large text (i.e. essentially the entropy per word of the text... I've been sloppy with terminology throughout this comment for convenience), are tricky and would require some serious thought.

Now, on the flip side, why would a 0-entropy-per-word Moof (or English) text be uninteresting? Well, if according to person X a text has 0 entropy, that implies that before reading each word, X can predict exactly what that word is going to be. That doesn't imply that the text is truly meaningless (in the sense of actually having no meaning), but it does mean that it conveys no information to X (it tells X nothing that X didn't already know). Essentially, this is only the case if each word in the text is completely determined by the words that came before it (from X's perspective). A real-world example of this would be if the text consisted of a poem that person X already knew by heart.

I hope that this helps to clarify things!

Unknown (ClockBackward) (2010-01-13 00:46):
Hello everyone. I'm a mathematician who has spent some time studying information theory in the past, and I wanted to share my thoughts about this post. Unfortunately, I think a number of commenters have misunderstood Julia's proposal. I would like to try to clarify (for anyone who is interested) why texts with maximal entropy and texts with 0 entropy are both, in some loose sense, "meaningless" (in the latter case, "informationless" would be more accurate), in accordance with her claims.

Real-world languages are tricky to deal with for a variety of reasons, so for illustration purposes I will limit myself to a fictitious language called Moof, which contains only ten possible words (in other respects, though, I will assume it is like other ordinary languages).

Now, let's say that we analyze a very large text written in Moof (of, say, a million words) and discover that this text has (to close approximation) maximal entropy per word (i.e. the number of bits we learn about what is in the actual text with each word that we are shown is as large as it can possibly be). In this case, that implies that the text was (to close approximation) generated by picking words one by one completely (uniformly) at random, and that each new word was selected (to close approximation) without regard for what words came before it. This is so because the only random process (outputting a word at each time step, say) that has maximum entropy is one that assigns equal probability to each word, and for which each word is independent of what came before. Note that this would require a decent method (for very large texts... it's essentially impossible to do well for small texts) of estimating the entropy per word of the underlying generating process.

Okay, but what does that REALLY mean? Does it mean that every sentence in the text is meaningless? No, absolutely not, because even randomly generated sentences occasionally have meaning. In fact, it may even be the case that a great many of the sentences are meaningful to human beings (perhaps as bizarre, poetic statements, like "lizards free frantic dead"). The important conclusion that we can draw from the fact that it has maximum entropy per word is that the text (taken as a whole) is not comprised of intentional meaning. There are a few ways one can think about why this is the case:

(see my next comment for continuation…)
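The two extremes in this pair of comments are easy to reproduce for a toy language. The sketch below is illustrative only; the ten Moof "words" and the estimator are made up for the purpose. It generates a uniform, memoryless Moof text whose conditional entropy estimate comes out near log2(10), roughly 3.32 bits per word, and a perfectly cyclic text in which every next word is determined by the previous one, so the same estimate comes out at 0 bits per word even though several different words appear.

```python
import math
import random
from collections import Counter

MOOF = [f"moof{i}" for i in range(10)]   # a made-up ten-word vocabulary

def conditional_entropy_per_word(words):
    """Plug-in estimate of H(next word | previous word), in bits."""
    pair_counts = Counter(zip(words, words[1:]))
    prev_counts = Counter(words[:-1])
    n = len(words) - 1
    h = 0.0
    for (prev, _nxt), count in pair_counts.items():
        h -= (count / n) * math.log2(count / prev_counts[prev])
    return h

rng = random.Random(0)
max_entropy_text = [rng.choice(MOOF) for _ in range(1_000_000)]   # uniform and memoryless
zero_entropy_text = MOOF[:4] * 250_000                            # the same four-word cycle forever

print(conditional_entropy_per_word(max_entropy_text))    # close to log2(10), about 3.32 bits/word
print(conditional_entropy_per_word(zero_entropy_text))   # 0.0: every next word is fully predictable
```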
Jonathan (2010-01-12 23:32):
It is very odd for someone trained, as I was, as an analytic philosopher to be in the position of "defending postmodern philosophy." And the funny thing is, I'm really not. I have no temptation to read and try to understand Deleuze -- I agree with ppnl that my time is limited and valuable, and I'm not going to use it trying to understand texts that seem to me to be unclear and confused.

So I'm not arguing that pomo philosophy is coherent, or meaningful, or deep, or anything like that. Rather, I'm suggesting that there really isn't any such thing as "post-modern philosophy," and that lumping together a bunch of French post-structuralists, a bunch of strong-sociology-of-science people, and a bunch of random philosophers & historians who you happen to disagree with is intellectually dishonest.

And again, not everyone tarred with the "post-modern" brush deserves the criticism. Is Baudrillard often a bit of a prick? Sure. Does he argue for points that are neither trivial nor obviously false? Sometimes at least, yes. Lyotard? Similarly. Habermas? Not only isn't he pomo, I think he's a full-blown modernist. I may not love his writing style, but he's making clear, focused arguments (his exchange with John Rawls is illuminating in this respect). Foucault? As I already mentioned, lots of arguments for conclusions that are neither trivial nor crazy.

The main point I want to make is that there isn't any short-cut to figuring out if a text is meaningful, let alone important. I am inclined to agree that if you make a 'good faith' effort, and no one can convince you that something important is going on (and they are making a good faith effort), there probably isn't anything important going on. But that's not a "test" of meaningfulness.

Sigh.

Jonathan

Eric (2010-01-12 14:58):
Scott,

First, most of my last comment wasn't directed completely at you. More to Dr. Pigliucci than anybody.

And I agree with you that high-entropy sentences deliver more information (they have more uncertainty), but whether those sentences are meaningful has nothing to do with information theory.

"Perhaps you could say that 'THTHTHTHT,' placed in the correct context, is 'about' the coin; I think that might be accurate."

But how is that different from any other text (in English or any other language)? In the sequence "HTHTHT...", "H" means "I flipped a coin and it landed heads" and "T" means "I flipped a coin and it landed tails." But this is no different from what we do in the English language. In the sentence "The tiger is running", "tiger" means "a mammal with orange and black stripes and a tail and ...". You could go on and define all the other words too.

I guess I just don't see the distinction you are making.

Scott (2010-01-12 14:33):
ppnl:

"Until you define meaning the question has no meaning."

I half-agree with your above post, but I don't think "meaning" is the problem. Although it's hard to pin down precisely, I think we have pretty good intuitions about what "meaning" is: a meaningful sentence is a sentence that is <i>about</i> something else -- a sentence that has a definite referent.

That's not to say that we'll all agree on whether a given sentence is meaningful! But that's because meaning is very context-dependent.

I'm more skeptical of our intuitions about information and entropy...

Scott (2010-01-12 14:24):
Eric:

I'm actually quite aware of Shannon's opinions on the subject, and I agree with both him and you: the measure of the entropy of a sentence is independent of its meaning. What I'm saying is that there's a case to be made that a high-entropy sentence can deliver more information, and therefore potentially more "meaning" -- as long as all the information delivered is meaningful.

It's like saying a high-bandwidth connection can deliver more content, faster, although of course it can also just deliver more noise.

Still, a low-entropy sentence can carry some meaning.

My objection to your heads/tails example had more to do with the fact that the string "THTHTHTH" isn't really <i>about</i> anything, in the way we think of meaningful sentences as being about other things. Perhaps you could say that "THTHTHTHT," placed in the correct context, is "about" the coin; I think that might be accurate. So in that sense, a low-entropy sentence could have meaning.

But in your original example you used "meaning" in a slightly different, and I think rather more metaphorical, sense -- as in: "The palm trees are swaying, which means it's windy." That's a more metaphorical use of the term -- related, certainly, but not identical to the use we're talking about here.

Vanitas (2010-01-12 09:38):
<b>So when Deleuze talks about singularities that "possess a process" and of "differences" being "distributed" in an "energy," those are arrangements of words which you would not normally see in coherent English writing.</b>

Julia, this proposal is, for philosophy, downright dangerous. That is because -- assuming it does what you say it will do -- it will privilege as "meaningful" only those texts which literally say the same sorts of things that previous texts have.

Forget the "pomos": suppose an author comes along who thinks that our patterns of speech are inculcating false and contradictory sets of ideas in our minds. The author concludes that a new way of speaking is precisely what is needed. Such a work would undoubtedly count as relatively meaningless under this mode of analysis.

Yet, relative to its own conventions and definitions, it may well be both incredibly rich and philosophically fruitful.

For this reason alone, I think the idea that philosophy should be shackled to linguistic convention is badly mistaken.

DaveS (2010-01-12 09:20):
From the sidelines, I notice that the arguments against a high correlation between info entropy and meaninglessness appear to be stronger than those in favor. As expected.

Alex SL (2010-01-12 09:08):
I guess Julia made a mistake in mentioning Shannon and information theory without writing something along the lines of "I know that this is not the same, but it might be comparable" in flashing, bold letters. Maybe she does not know enough about information theory -- fine, neither do I, admittedly -- but again, please consider carefully the following things:

1. This is about words, not about letters.

2. This would be a one-way test (whatever the technical term for that is in English), allowing you to say "this is definitely noise"; but if it does not show up as noise, it could still be noise, only you cannot show it with this test. There are other examples in "real science" where a successful test shows a certain genetic structure to be hybridization, but if the test fails, it can still be either hybridization or incomplete lineage sorting. So to all who say that this would not be decisive: yes, we know, but at least it would be in one direction.

3. And yes, in one direction it would be. We are not talking about a hypothetical number of bits that could be compressed; we are talking about an English (or French) text. This whole thread about compressibility is irrelevant. <a rel="nofollow">This</a> is clearly a less meaningful text than <a rel="nofollow">this one</a>, and the same goes at the other (in this discussion more interesting) end, where we only have randomly selected terms stuffed into an obfuscating sentence, and this would have to show up in word entropy if the text sample is sufficiently long*. The question whether you can compress English, or whether a toss of a coin is meaningful under certain circumstances, simply does not apply, because we would be comparing different <i>uncompressed English texts</i> supposedly meant to convey ideas.

* For the decision on what is long enough, I would suggest a calibration with saturation curves: select random parts of varying lengths from a larger text to see at which lengths the predictability is close enough to the overall value for the text in its entirety. It seems obvious that one paragraph is not long enough for that kind of analysis.
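Alex SL's calibration idea can be sketched in a few lines. The following is an illustration under assumed choices (a plug-in unigram estimator, contiguous excerpts, and arbitrary excerpt lengths): it averages the entropy estimate over random excerpts of increasing length, and the length at which the curve stops drifting toward the whole-text value gives a rough sense of how much text the test needs.

```python
import math
import random
from collections import Counter

def entropy_per_word(words):
    """Plug-in unigram entropy estimate in bits per word."""
    counts, n = Counter(words), len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def saturation_curve(words, lengths=(100, 300, 1000, 3000, 10000), trials=20, seed=0):
    """Average the entropy estimate over random excerpts of each length; the curve
    flattening out toward the whole-text value suggests the sample is long enough."""
    rng = random.Random(seed)
    curve = []
    for length in lengths:
        if length > len(words):
            break
        estimates = [
            entropy_per_word(words[start:start + length])
            for start in (rng.randrange(len(words) - length + 1) for _ in range(trials))
        ]
        curve.append((length, sum(estimates) / trials))
    return curve

# Usage idea: compare saturation_curve(big_text.split()) against
# entropy_per_word(big_text.split()) for the full text.
```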
ppnl (2010-01-12 09:03):
Massimo,

English text can be compressed by about 60%-70%, if I remember correctly. That's pretty substantial.

You said:

"I find the claim by ppnl that entropy has *nothing* to do with meaning hard to swallow. After all, entropy is related to information; are we going to say that information and meaning are unrelated?"

Until you define meaning, the question has no meaning. You feel that there must be a definition of the term "meaning" out there for us to discover. But you are being Platonistic here. Meaning is a word that we define for some purpose. There may be no useful definition that fits your intended purpose here at all. Or maybe there is. Unless you invent such a definition, the question is ill-formed.

You depend far too much on intuition for things like meaning, information, data processing and such. This is where philosophy goes wrong. Without precise definitions we can't even agree on what we disagree on.
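ppnl's compressibility figure is easy to check against any sizeable English file. The sketch below is illustrative: "sample.txt" is a placeholder path, and a general-purpose compressor like zlib on a short excerpt will typically save somewhat less than the 60%-70% ppnl recalls, a figure usually quoted for large texts and stronger models.

```python
import zlib

def space_saved(text: str) -> float:
    """Fraction of the raw UTF-8 byte length removed by DEFLATE compression."""
    raw = text.encode("utf-8")
    return 1 - len(zlib.compress(raw, 9)) / len(raw)

# "sample.txt" is a placeholder; point it at any reasonably long English text.
with open("sample.txt", encoding="utf-8") as f:
    print(f"{space_saved(f.read()):.0%} of the bytes compressed away")
```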
"Low entropy texts can't contain...Scott, <br /><br />"Low entropy texts can't contain a lot of meaning, because they don't contain a lot of information. "<br /><br />Meaning is irrelevant to information theory. Do I really have to quote Shannon on this? Very well. This is from the 2nd paragraph of his seminal article:<br /><br />"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have <i>meaning</i>; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem."<br /><br />You will find absolutely no mention of meaning in the standard text books on information theory (Cover and Thomas, for example). You guys and gals can protest all you want, but it just shows your ignorance of the subject. <br /><br />Shannon information theory is concerned with the transmission of signals. The content of those signals is irrelevant to information theory.Erichttps://www.blogger.com/profile/01133946325994766958noreply@blogger.comtag:blogger.com,1999:blog-15005476.post-59317408846741464732010-01-11T22:54:48.472-05:002010-01-11T22:54:48.472-05:00Should we even be talking about meaning without ha...Should we even be talking about meaning without having established a concrete definition for it?lahhttps://www.blogger.com/profile/01628706119417816324noreply@blogger.com