Mind - Brief Introduction to the Philosophy of AI

Overview: What is AI? What are the issues?

What is artificial intelligence? Seemingly: an intelligence, an intelligent being, that is artificially created. The question is: could there be such a thing? Could we build an intelligent being?

Why is this question interesting? Simply this: if we could build an intelligent being, presumably we would know how it worked, and so we would have explained, demystified, intelligence - which is certainly one of the most remarkable aspects of mentality.

However, the question “Could an intelligent machine be built?” isn’t very interesting. If any form of materialism is true, then we are intelligent machines: we are composed of molecular machinery. It is certainly possible to build molecular machines like us - even if we lack the technology. Machines such as these would be intelligent.

A more interesting question is this: could a computer be intelligent?

The field of “AI” is devoted to this question: researchers in AI spend their time trying to devise computer programs. Their goal: giving computers the ability to do some or all of the things we associate with “intelligence”:

e.g. solve problems, play games, understand and use language, make sense of their environment on the basis of “perceptual input” (e.g. information received from TV cameras)

In his best-selling book The Age of Spiritual Machines (1999), Ray Kurzweil suggests the following timescale for AI over the coming century:

1999: Computers are far below humans in abilities, but can nonetheless perform a large number of important functions.

2009: A $1000 PC can perform a trillion calculations per second. Most routine business transactions take place between a human and a virtual personality. Traditional classroom teaching still exists, but intelligent courseware is a common means of learning.

2019: A $1000 computing device is now equivalent to the computational power of the human brain. Computers are now largely invisible, and embedded everywhere. Communication with computers is largely through gesture and spoken word. High-resolution virtual reality enables people to do virtually anything, with anybody, anywhere. The vast majority of transactions involve simulated personalities. Automated driving systems are installed on most roads. Virtual artists with their own reputations are emerging in all the arts. There are widespread reports of computers passing the Turing Test.

2029: A $1000 unit of computation has the power of about 1000 human brains. Implants in eyes and ears link the human user to the world-wide computing network. High-bandwidth direct neural pathways have been developed to enhance perception, interpretation, memory and reasoning. Computers now learn on their own, and have absorbed all human arts and sciences. There is a growing discussion about the legal rights of computers, and about what constitutes being "human". Computers routinely pass valid forms of the Turing Test. Machines claim to be conscious; these claims are largely accepted.

2099: There is a strong trend toward a merger of human thinking with the world of machine intelligence. There is no longer a clear distinction between humans and computers. Most conscious entities no longer have a permanent physical presence. The number of software-based humans vastly exceeds those still using native neuron-cell-based computation. "Life expectancy" is no longer a viable term in relation to intelligent beings.

The philosophical implications are clear: if a properly programmed computer can reason, possess intelligence, then it may well be that our intelligence is a product of the same sort of processes that go on in computers - maybe our brains (or minds) are computers, running programs. Perhaps our reasoning is a form of computation …

- so it may well be that the success of AI would demystify intelligence: we would know what intelligence requires

- on the other hand, if it can be shown that AI is bound to fail, then we also learn something about intelligence: something other than computation is required.

A host of issues: to understand the debate surrounding AI properly a variety of issues and arguments have to be considered, e.g.

(a) The actual progress in the AI field is relevant. However, some people have argued on a priori grounds that computers cannot possibly be made intelligent, while others have argued on a priori grounds that computers can be intelligent. These arguments need to be considered.

(b) A reasonably clear understanding of “computer” and “computation” is required. It turns out that there are different sorts of computer, and the “programs” these run, the sorts of computation they do, are quite different.

- relevant here is the dispute within AI between “classical” or “good old fashioned AI”, and the “connectionists”

- a “classical” computer is a device like a typical PC; a connectionist computer is a cluster of interconnected neuron-like nodes, which works in a very different way

The issues: (a) how are these two forms of computation similar/different? (b) which form of computation is most likely to be capable of intelligence?

A point to note: the question is not “Is an artificial mind possible?”, nor is it “Could we build an entity that is conscious?” - the question is solely about intelligence.

- intelligence is an ability - to come up with fast and effective solutions to a wide range of problems - which may not involve or require consciousness

- it may well be that a zombie could be intelligent without being conscious

The question of whether intelligence requires consciousness is an interesting one. Most philosophers these days say it doesn’t - no one has yet put forward a compelling case for thinking that you have to be conscious to be intelligent. Perhaps one day they will … But: there are two separate issues here which must be kept separate.

- so don’t argue like this “AI is impossible because a machine or computer wouldn’t be conscious”

- or, only argue like this if you are prepared to argue like this: “The reason why AI is impossible is that a machine or computer wouldn’t be conscious, and I have a good argument for the claim that intelligence without consciousness is impossible.”

____________________

Before getting down to details, it is worth looking at one well-known issue connected with AI ….

The Turing Test

Nearly everyone who has heard of AI has heard of the Turing Test. The general idea: if a machine passes the Turing Test, then we would be justified in taking the machine to be intelligent.

- the test derives from a 1950 paper by the AI pioneer Alan Turing, "Computing Machinery and Intelligence";

- Turing realized that there was a lot of controversy concerning words such as “think”, “reason”, “intelligence”

- so, to clarify the situation, Turing proposed to replace the question "Could a machine think?" with the question "Could a machine pass the Turing Test?"

Simplifying somewhat, a machine would pass the Turing Test if it could pass itself off as a normal human being more than half the time

- “pass itself off” doesn’t mean “look and act like a human” - it’s easy to tell that a PC isn’t a human being; more generally, we’re interested in the possibility of machine minds rather than the possibility of androids

- so, to “pass itself off” as human, the machine will communicate via screen, and you talk to it via keyboard

Running the Test involves having conversations with the machine, and with humans. You can ask what you want: you can tell jokes, inquire about political opinions, ask for verdicts on poets and novelists - nothing is ruled out.

If the machine’s “conversation” is sufficiently human-like that people “talking” to it fail to recognize that they are talking to a machine more than 50% of the time, the machine is deemed to have passed: it is intelligent.

- it should be noted that so far no machine has come close to passing the Test; however, when Tests are run, quite a few humans get taken to be machines!

EVALUATION

In thinking about the Test, remember: the question is not "Is the machine conscious?" but "Is the machine intelligent?" …

The test can be criticized: e.g. couldn’t a machine be quite intelligent yet have a distinctly non-human personality? If so, it would fail the test, despite being intelligent.

This seems a reasonable point. Yet, on the other hand, if a machine did pass the test, wouldn’t it be reasonable to regard it as intelligent? Arguably, it would.

This suggests the following compromise: passing the test isn’t a necessary condition for intelligence, but it is sufficient.

But even this restricted position is suspect.

Imagine a library (like Borges' Library of Babel) which contains every possible book (i.e. a book consists of some permutation of the letters of the alphabet). Among the books in the library will be all the books containing meaningful 2-party (or 3-party) dialogues between human beings. Every possible conversation is in there, in a book.

Suppose we had a very powerful computer which could contain all these conversation-books in its memory, and quickly find any book it wanted.

Now suppose that you are playing Turing’s imitation game with this machine.

You begin your conversation any way you like (or you allow the machine to begin, with any of the possible conversational opening lines in its vast repertoire - including periods of silence). But suppose you start off. In response to your initial statement, e.g. "Hi!", the machine quickly finds all the books containing reasonable conversations with this opening line, and randomly selects one: perhaps the next line is:

And Hi to you. How’s your sore head?

In response you say:

What makes you think I’ve got a sore head?

The computer then quickly isolates the billions of books that contain these three opening lines of conversation, and from these many billions, randomly selects one. The fourth line in this conversation is:

I’m sorry, I thought you were my friend Bill, who told me half an hour ago that he had a sore head.

And so it goes. You have a perfectly sensible conversation - no matter what gambit you adopt, no matter what you say, the computer comes up with a sensible, human-like, response.

The computer can pass the Turing test as often as an average human being.

But is it intelligent? Arguably not: all it’s doing is mindlessly, automatically, selecting books from its memory banks which repeat the conversation you’ve had so far and continue it. This requires great speed, but it’s a fully automated process - there’s no thought or intelligence required whatsoever.
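The process just described can be sketched in a few lines of Python (a toy illustration: the two "books" below stand in for the imagined library's endless supply):

```python
import random

# Each "book" is a complete conversation, stored as a list of alternating lines.
# In the imagined library there is a book for every possible sensible conversation;
# here there are just two, for illustration.
BOOKS = [
    ["Hi!",
     "And Hi to you. How's your sore head?",
     "What makes you think I've got a sore head?",
     "I'm sorry, I thought you were my friend Bill, who told me half an hour ago that he had a sore head."],
    ["Hi!",
     "Hello there. Lovely day, isn't it?",
     "It certainly is.",
     "Shame to be stuck indoors, really."],
]

def next_line(transcript):
    """Find every book that begins with the conversation so far, pick one at
    random, and read off its next line - no understanding required."""
    matches = [b for b in BOOKS
               if b[:len(transcript)] == transcript and len(b) > len(transcript)]
    return random.choice(matches)[len(transcript)] if matches else None

transcript = ["Hi!"]
transcript.append(next_line(transcript))
print(transcript[-1])   # e.g. "And Hi to you. How's your sore head?"
```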

Possible objections:

(i) You might say: Yes, but intelligence was required to select the books that go into the machine's memory banks! Maybe: but this is wholly irrelevant.

- Intelligence is needed to build and program any computer. The question is: once built, if such a computer passes the Turing test, are we right to regard it as intelligent?

- If the Test is valid, then any programmed machine which can successfully pass itself off as a human being should be regarded as possessing intelligence.

The case we've been considering suggests this can't be right.

(ii) You might say: “But the machine you are considering isn’t a physical possibility; there are so many possible conversational possibilities that they couldn’t all be stored in a computer’s memory, and it’s not humanly possible to write down all these conversations in the first place, and put them in memory”.

- perhaps this is so (perhaps it isn’t); in which case, if a machine passes the test, it may well be reasonable to think it is intelligent

- BUT: this is not because it passes the Test, but because of practical limitations on computer memory and programming; it remains the case that conceptually speaking, passing the Test is not sufficient for intelligence; it is only sufficient when combined with contingent limitations on the way a computer could be programmed

In which case, behaviour isn't conceptually sufficient for intelligence. Merely producing intelligent-seeming output isn't enough - we doubt that the library-computer is intelligent because although it produces intelligent-seeming responses, what's going on inside it seems to be the wrong sort of process for intelligence.

Lesson: it does matter what goes on inside you. Black-box behaviourism (only the input-outputs are relevant) is false even for intelligence.

————————————————

Computers and AI

Let’s take a slightly closer look at the way computers work. We’ll start with classical or digital computers - also known as Universal Computers, Universal Turing Machines, Von Neumann Computers.

Ordinary PCs are examples of these machines. They are called “universal” because in an important sense they are all equally powerful:

(a) they can all run the same programs (although the programs will need modifying to suit the particularities of a given machine)

(b) they can all run every possible program (when “program” is suitably defined)

This is important: it means that if intelligence can be programmed (in the relevant sense of “program”), then we already have the technology to create an intelligent machine: an ordinary PC could run the AI program, even if only slowly. What matters is software, not hardware.

Well, how do computers work? One quick answer is that they process information. Another is that they store and shuffle symbols, according to fixed and precise rules.

Both answers contain an element of truth, but both need correctly interpreting.

It would be better to say that computers store and manipulate very simple patterns.

- patterns come in all forms, e.g. a sequence of letters from the alphabet, a line drawing, a collection of coloured shapes, an arrangement of shaped objects

- these are all quite complex patterns, involving many shapes, colours, modes of arrangement

Perhaps the simplest form of pattern consists of a sequence of two elements. The pattern consists of the way these elements are organized. E.g. call these elements “presence” and “absence”. A particular pattern would be:

presence, presence, absence, presence, absence, presence

The same pattern could be represented in other ways - for instance, by letting "absence" play the role that "presence" played:

absence, absence, presence, absence, presence, absence

Or we could use different representations for the two elements:

X, X, Y, X, Y, X

1 1 0 1 0 1

This last way of representing a pattern, using "1"s and "0"s, is how a computer is usually thought of as storing patterns - though it would be more accurate to think of patterns of electrical charge: "charge present", "charge absent".

- This is one reason why a computer is sometimes thought of as “manipulating numbers”. (Another reason: base 2 arithmetic consists of 1s and 0s, and any base 10 number can be represented in base 2, e.g. 1 = 1, 10 = 2, 11 = 3, 100 = 4, 101 = 5, 110 = 6, 111 = 7 …… 110101 = 53)

- by convention, the sequences of 1’s and 0’s that a computer works with are called bit-strings

However, this connection with numbers is, in a way, accidental. What matters is the simple pattern, which can be represented in any number of ways.
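A small sketch of the point in Python: one and the same six-element pattern under three different labels, with numbers entering only if we choose to read the bits as a base-2 numeral.

```python
# One simple pattern, three representations (presence/absence, X/Y, 1/0).
pattern = [True, True, False, True, False, True]        # presence/absence

as_xy   = "".join("X" if p else "Y" for p in pattern)    # 'XXYXYX'
as_bits = "".join("1" if p else "0" for p in pattern)    # '110101'

# Numbers only come in if we decide to read the bit-string as base-2 notation:
print(as_xy, as_bits, int(as_bits, 2))                   # XXYXYX 110101 53
```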

So: a computer is a device which can store (remember) and manipulate simple binary (2-element) patterns, or bit-strings.

Bit-strings are stored in what are called registers. Each register, a location in the computer's memory, has an address, which is itself a bit-string.

The "manipulations" performed on bit-strings involve a number of simple or primitive processes. These operations are carried out by the computer's hardware - each primitive process is activated by its own bit-string. There are only a few of these primitive processes. They include operations like the following:

- save a bit-string to a given register

- compare the contents of two specified registers; if they are the same, copy a 1 (otherwise a 0) to a further specified register

- delete all the contents of a specified register

- look at a specified register; if it contains a specified bit-string then go to another specified register

- copy the content of one specified register to another

There are typically only about 10 primitive processes such as these that a computer can carry out - there are different sets of processes, all of which are equivalent (they can perform the same kinds of manipulation)

This is about it: a computer doesn’t do anything other than perform simple operations on bit-strings: it stores them, receives them as inputs, reacts to them in accord with the stored program, and so generates more bit-strings out as output.
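Here is a toy sketch of such a machine in Python (not any real instruction set): a few registers holding bit-strings, and a handful of primitive operations of the kind just listed.

```python
# Four registers, addressed by bit-strings, each holding a bit-string.
registers = {"00": "", "01": "", "10": "", "11": ""}

def save(addr, bits):          # save a bit-string to a given register
    registers[addr] = bits

def copy(src, dst):            # copy the content of one register to another
    registers[dst] = registers[src]

def compare(a, b, dst):        # if two registers match, write 1 to a third; else 0
    registers[dst] = "1" if registers[a] == registers[b] else "0"

def clear(addr):               # delete the contents of a register
    registers[addr] = ""

# Everything the machine "does" is a sequence of such primitive steps:
save("00", "1101")
copy("00", "01")
compare("00", "01", "10")
print(registers)               # {'00': '1101', '01': '1101', '10': '1', '11': ''}
```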

Syntax v. Semantics

I noted earlier that it can be misleading to say that a computer works by manipulating or processing data, or information.

In one sense, “information” is what is carried in speech or writing; it is what is held in beliefs: it involves meaning. BUT: it seems clear that the patterns that are manipulated during the running of a simple computer program aren’t meaningful in any ordinary sense.

- we might well interpret the patterns as meaningful (e.g. if we have programmed the computer to multiply numbers then we will interpret certain bit-strings as the numbers that are to be multiplied)

- but it seems clear that the patterns have no meaning for the computer - whereas when you use words, the words have meaning for you (not just other people)

It is often said that computers manipulate symbols.

Again, this has to be interpreted in the right sort of way. There are two points:

  1. The “symbols” in question shouldn’t be thought of as meaningful pictures or words: they are merely patterns that are no more meaningful in themselves than the ripples of sand on a beach.
  2. The “manipulations” involved in a computational process shouldn’t (at this stage) be viewed as reasoning or calculation (or cognition). Rather, they are simple mechanical processes, e.g. moving the symbol about, or copying it, or doing something to one symbol in response to another.

A useful analogy: computation involves the manipulation of shapes or syntax, rather than items with semantic properties.

Here “semantic” means: “possesses meaning”.

Consider these two ways of bringing about the same result. We have the following:

(S1) The cat is on the mat.

This is a meaningful sequence of words. I could say to you: “Use the same words to depict a very different situation.” And you might well produce:

(S2) The mat is on the cat.

In performing this manipulation, you were aware of the meanings of (1) the individual words, (2) the sentences they compose, and (3) the sentence I uttered - the command I gave. You were operating at the semantic level.

But I could easily create a machine which would produce the same effect, i.e. on being given (S1) it would produce (S2). The machine I have in mind doesn’t understand the meanings of the sentences at all; it doesn’t understand anything whatsoever. All it does is:

  1. Recognize certain shapes on the page, i.e. letters of the alphabet.
  2. Shift these shapes from one pattern to another.

This is what computers do: they perform simple mechanical operations on items with certain distinctive shapes. They work on a syntactic rather than a semantic level - with letters (forms, shapes) rather than meanings.
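A sketch of such a machine in Python: it detects the shapes in the pattern "The ... is on the ..." and rearranges them, with no grasp of what any word means.

```python
import re

def swap(sentence):
    """Purely syntactic manipulation: match letter-shapes, rearrange them."""
    m = re.match(r"The (\w+) is on the (\w+)\.", sentence)
    if m is None:
        return sentence            # shape not recognized; do nothing
    a, b = m.groups()
    return f"The {b} is on the {a}."

print(swap("The cat is on the mat."))   # The mat is on the cat.
```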

Does this mean that AI is doomed from the start? Not necessarily. The claim made is this: genuine intelligence, perhaps meaning and understanding, depend upon patterns being manipulated in the right sort of way. Given the right program, the meaningless patterns “come alive”, as it were. Given enough complexity, semantics emerges from syntax.

But: what reason do we have to think this is true?

Turing’s Insight: the Mechanization of Rationality

Computers started to be designed on paper in the 1930s, and first built in the 1940s. At first, they were intended purely to do maths and arithmetic - in essence, they were intended to do what humans could do, but more quickly. In fact, computers were named after people: the people who spent their time in banks and offices doing arithmetic and book-keeping were called computers.

Given this, it may seem surprising that early pioneers such as Turing were even then speculating that computers could one day be intelligent, given the right program, and that maybe human intelligence was really just computation.

In fact, there was a rationale for these brave speculations …

Familiar with recent developments in logic, Turing saw clearly how relatively sophisticated human reasoning could be mechanized – i.e. consist of nothing more than syntactic operations. Take a simple argument:

  1. Tim is a bat or John is a human.
  2. Tim is not a bat.

Clearly, we can safely conclude that:

(3) Therefore: John is a human.

You might think you need to be pretty smart (or at least be a little bit smart) to recognize that the conclusion follows from the premises. Any reasonable person would conclude (3) given the premises. But in fact, this reasoning is easily mechanizable.

The argument is of the form:

P or Q

Not-P

Therefore Q

It’s easy to see that a machine could detect the sequences of letters:

P or Q

Not-P

And then be constructed so as to generate the sequence of letters:

Q

You simply set up the machine so that it recognizes the following forms:

[P] OR [Q]

not-[P]

And produces: [Q]
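A toy Python sketch of such a shape-detecting set-up (the "or"/"not-" premise format is just an illustrative convention):

```python
def disjunctive_syllogism(premises):
    """From the shapes '[P] or [Q]' and 'not-[P]', produce '[Q]' -
    no attention is paid to what P and Q mean."""
    for p in premises:
        if " or " in p:
            left, right = p.split(" or ", 1)
            if "not-" + left in premises:
                return right
            if "not-" + right in premises:
                return left
    return None

print(disjunctive_syllogism(["Tim is a bat or John is a human",
                             "not-Tim is a bat"]))
# John is a human
```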

This is a simple example of logical reasoning or logical argument. The conclusion is bound to be true if the premises are true. The argument is valid because of its structure or form, not because of the contents of the individual sentences or words. Substitute any sentences you like for “P” and “Q”, and the argument is still valid:

Water is silly or David is a mouse

Water is not silly

So: David is a mouse

What we have here is a semantically evaluable process which has been reduced to a purely mechanical sequence of steps

We know that long and complex arguments can be dealt with in the same manner. Any argument to which formal logic can be applied can be handled in a purely mechanical manner. That is, given the premises, what follows from these premises (the consequences) can be generated mechanically.

Why? Because “formal logic” is a system of rules which operates at the level of syntax. It consists of a set of mechanical rules for manipulating symbols, or inscriptions. And as such, it can be handled by a machine which works at the purely syntactic level.

The important point: because the rules are syntactic, they can be mechanized. Reasoning involves logic, and so to this extent, reasoning can be mechanized.

This is what Turing appreciated. Fodor often waxes eloquent on the subject:

Beyond any doubt, the most important thing that has happened in cognitive science was Turing's invention of the notion of mechanical rationality. Here's a quick, very informal, introduction …

It's a remarkable fact that you can tell, just by looking at it, that any sentence of the syntactic form P and Q ('John swims and Mary drinks', as it might be) is true only if P and Q are both true. 'You can tell just by looking' means: to see that the entailments hold, you don't have to know anything about what either P or Q means and you don't have to know anything about the non-linguistic world. This really is remarkable since, after all, it's what they mean, together with how the non-linguistic world is, that decide whether P or Q is itself true. This line of thought is often summarised by saying that some inferences are rational in virtue of the syntax of the sentences that enter into them; metaphorically, in virtue of the 'shapes' of these sentences.

Turing noted that, wherever an inference is formal in this sense, a machine can be made to execute the inference. This is because, although machines are awful at figuring out what's going on in the world, you can make them so that they are quite good at detecting and responding to syntactic relations among sentences. Give it an argument that depends just on the syntax of the sentences that it is couched in and the machine will accept the argument if and only if it is valid. To that extent, you can build a rational machine. Thus, in chrysalis, the computer and all its works. Thus, too, the idea that some, at least, of what makes minds rational is their ability to perform computations on thoughts; where thoughts, like sentences, are assumed to be syntactically structured and where 'computations' means formal operations in the manner of Turing. It's this theory that Pinker has in mind when he claims that 'thinking is a kind of computation'. It has proved to be a simply terrific idea. Like Truth, Beauty and Virtue, rationality is a normative notion; the computational theory of mind is the first time in all of intellectual history that a science has been made out of one of those. If God were to stop the show now and ask us what we've discovered about how we think, Turing's theory of computation is far the best thing that we could offer. (“The Trouble with Psychological Darwinism”)

All this may not seem so remarkable now. If so, the fact that it once struck people as a remarkable advance just goes to show how little understanding of the nature of thought and reasoning we used to have.

Semantics from Syntax?

We are now in a better position to see how semantics might emerge from syntax: if the shape-shuffling is of the right form and complexity, then the symbols become meaningful, in virtue of being manipulated in the right ways.

Again we have the point that the rational (i.e. logico-semantic) relations between linguistic or conceptual items (thoughts) are mirrored by the causal properties of the physical symbols in the mechanical mind.

To put it another way: meaning is use. Use a symbol in the right way, and it becomes a genuine symbol: i.e. it comes to have meaning, or representational properties. It ceases to be a meaningless shape. Semantics emerges from or consists in, syntax.

- Not everyone would agree with this. Some people would say that syntax is necessary for meaning, but not sufficient: to actually be meaningful, the symbols have to be able to enter into the right sorts of causal relationship with the world. E.g. they are caused by perceptual experience, they lead to behaviour, which involves the subject interacting with things in the world.

- Others aren’t convinced: they say that a newly created brain in a vat can think, even though none of its mental states are in (the right or normal) sort of causal interaction with a body or a perceptual system. So maybe mind-world links aren’t necessary for meaning; semantics requires syntax and nothing else.

We needn't enter this particular dispute. The key point is that for the classical cognitivist, syntax plays an essential role in the generation of meaning, and may be sufficient for it.

Reason as Computation: Evaluation

The general thesis, then, is this: intelligence and rationality are products of pattern-shuffling.

But to be more precise, we need to distinguish two theses.

Weak Symbol-System thesis: a computer with the right program would be intelligent; i.e. it is possible to create an intelligent device by ensuring that it can manipulate patterns in the right sorts of ways. Symbol-shuffling capabilities (of the right kind) are sufficient for intelligence.

Strong Symbol-System thesis: symbol-shuffling capabilities (of the right kind) are necessary and sufficient for intelligence. The only way to be intelligent is by being a pattern-manipulator.

How can we decide whether either of these theses is true? There are a number of possibilities. The most obvious are these:

(a) The Weak Thesis is susceptible to an existence proof. If we could find a symbol-system which manifested human-levels of reason and intelligence, then we would know the Weak Thesis is true.

- the Strong Thesis isn’t susceptible to an existence proof. The fact that some intelligent systems are symbol shufflers doesn’t prove that all actual or possible intelligent systems are

(b) The Strong Thesis is easier to disprove: if we can find an intelligent system which isn't a symbol shuffler, or which doesn't possess intelligence in virtue of its symbol-shuffling capabilities, then we know the Strong Thesis is false. There's at least one other way of creating intelligence.

These are both empirical proofs/disproofs. Are there any non-empirical or philosophical arguments which can be brought to bear?

(c) If we can establish a priori that symbol shuffling cannot possibly be enough for intelligence or rationality, then we refute both theses at once.

Searle’s famous “Chinese Room” argument is an attempt to pursue avenue (c). So is the Lucas/Penrose argument based on Godel’s theorem.

An a priori proof of either the Weak or the Strong theses is harder to envisage. One possible route would be this:

(d) A proof that all complex physical systems are symbol shufflers, and that those that are intelligent are so in virtue of their symbol-shuffling abilities.

Some people think a thesis in mathematical logic - the Church-Turing Thesis - proves this. We’ll take a look at this in due course.

Let’s consider some of these avenues in a little more detail, starting with (b).

(i) Does the brain refute the Strong Thesis?

It might be thought that we have a quick and easy refutation of the Strong Thesis:

We are intelligent, and we owe our intelligence to our complex brains; but our brains are not symbol shuffling computers.

If we could be sure that our brains weren’t computers, then assuming that our brains are responsible for our intelligence, we would know that the Strong Thesis was false. But is this the case? There are several points to note.

(1) It’s true that our brains aren’t made of silicon; they don’t contain transistorized logic gates, or magnetic memory registers, or hard-drives. But this is irrelevant. We saw earlier that primitive processors are multiply realizable; they can be made of anything.

- moreover, in a famous 1943 paper, "A logical calculus of the ideas immanent in nervous activity", McCulloch and Pitts showed that systems of interconnected neurons could in fact be organized into working logic-gates (a toy sketch follows)
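The core of their idea can be sketched in a few lines (an illustration, not McCulloch and Pitts' own formalism): a "neuron" that fires when the weighted sum of its inputs reaches a threshold behaves as a logic gate.

```python
def neuron(inputs, weights, threshold):
    """Fire (output 1) iff the weighted sum of inputs reaches the threshold."""
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

def AND(a, b): return neuron([a, b], [1, 1], threshold=2)
def OR(a, b):  return neuron([a, b], [1, 1], threshold=1)
def NOT(a):    return neuron([a], [-1], threshold=0)

print(AND(1, 1), OR(0, 1), NOT(1))   # 1 1 0
```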

(2) More relevantly, it does seem very likely that our brains are not structured in the same way as an ordinary digital computer. There is no central processing unit working away on registers of binary bit-strings.

A digital computer, of the kind outlined earlier, has what is known as a Von Neumann architecture, after the Hungarian mathematician who invented it. A machine with a Von Neumann architecture works sequentially, with the central processing unit working on one line of code at a time.

The idea that our brains work in this way is hard to accept, given what is known about the way our neurons are structured. There’s another problem:

We know that a brain can understand a sentence or recognize a face in about a 10th of a second. We also know that brain cells, neurons, take about 1/1000 of a second to respond to an incoming signal. This is about a million times slower than a typical PC, whose flip-flops change state in about a thousand-millionth of a second. Since the brain can only process information in 1/1000-of-a-second steps, in doing any of the things it does in a 10th of a second it can only be following about 100 instructions. This is not many at all: a typical AI program - and such programs are far less efficient than brains - consists of hundreds of thousands of instructions. It is very implausible to suppose only 100 computational steps are involved in pattern recognition or sentence understanding.
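The arithmetic behind this "100-step" point, using the rough figures just given (a back-of-the-envelope sketch):

```python
neuron_response_ms = 1      # a neuron takes ~1 millisecond to respond
task_time_ms       = 100    # ~a tenth of a second to parse a sentence or recognize a face
pc_gate_time_ns    = 1      # a PC flip-flop switches state in ~a billionth of a second

serial_steps = task_time_ms // neuron_response_ms                     # steps available to the brain
speed_ratio  = (neuron_response_ms * 1_000_000) // pc_gate_time_ns    # neuron vs PC gate

print(serial_steps, speed_ratio)   # 100 1000000
```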

But again, this isn’t necessarily of much significance. Defenders of the Symbol System Theses don’t claim that brains are structured like digital computers. There are indefinitely many ways of making a symbol shuffling device; digital computers with Von Neumann architecture are one way, maybe brains are another.

Also, most defenders of the Symbol System Theses – such as most workers in “cognitive science” - would argue that the brain doesn’t contain just one computer, but many, operating in parallel, and organized into modules and hierarchies.

Modules: these are relatively self-contained systems with particular functions. E.g. separate modules for speech recognition, facial recognition, vision, short-term memory, long-term memory; separate memory modules for faces, names, things; a module for controlling bodily movements, and so on. These modules work in parallel, i.e. simultaneously. Although they are interconnected, they also possess a good deal of autonomy (e.g. the Müller-Lyer illusion shows the independence of visual processing from the general belief system).

Hierarchies: within each module, there will usually be a complex hierarchy of computational processes.

E.g. in understanding speech, a number of computational operations are carried out: (a) detecting phonemes amid the incoming auditory input; (b) working out which words the phonemes are likely to be expressing; (c) finding the concepts which correspond to the words, (d) organizing the words into sentences, (e) interpreting the meaning of the sentences. A few comments on these operations:

These different computational processes are carried out by different bits of the brain, working in parallel, or simultaneously; by different neural computers. Some of the “dumber” processes have been recreated on digital computers.

So: the claim is that the brain consists of a number of interconnected modules, working simultaneously, each of which consists of several levels of computational processing/processors. These basic computational processes are implemented by networks of neurons, the primitive processors, which we know can do computational work.

Given our current ignorance of the brain, it’s hard to see that this picture, drawn from contemporary cognitive science, can be refuted by empirical evidence. The jury is out.

(ii) A Priori Proof: the Church-Turing thesis

A good many writers believe that we have good a priori grounds for thinking that human intelligence must be replicable by a universal computer, or Turing Machine. They base this belief on a widely accepted mathematical doctrine: the Church-Turing thesis.

Consider these claims:

Sterelny asserts "Astonishingly, Turing was able to show that any procedure that can be computed at all can be computed by a Turing machine. ... Despite their simple organisation, Turing machines are, in principle, as powerful as any other mode of organizing computing systems" (1990: 37, 238)

Paul Churchland writes: "The interesting thing about a universal Turing machine is that, for any well-defined computational procedure whatever, a universal Turing machine is capable of simulating a machine that will execute those procedures. It does this by reproducing exactly the input/output behaviour of the machine being simulated" (1988:105). Also: Turing's "results entail something remarkable, namely that a standard digital computer, given only the right program, a large enough memory and sufficient time, can compute any rule-governed input-output function. That is, it can display any systematic pattern of responses to the environment whatsoever" (Paul and Patricia Churchland 1990: 26).

Guttenplan (1994) in his A Companion to the Philosophy of Mind writes "we can depend on there being a Turing machine that captures the functional relations of the brain", for so long as "these relations between input and output are functionally well-behaved enough to be describable by ... mathematical relationships ... we know that some specific version of a Turing machine will be able to mimic them"

There are several different claims to distinguish:

(a) Anything that can be computed by any machine can be computed by a Turing machine

(b) Any possible functional system (involving certain internal states, and a given input-output patterns) can be reproduced on a Turing Machine

(c) Turing's results entail that the brain, and indeed any biological or physical system whatever, can be simulated by a Turing machine.

If all these claims were provably true, then we might have good reason - especially if we believe our intelligence is wholly due to the physical goings-on within the brain - to believe that our intelligence could be simulated or reproduced by a computer. However, there is no proof of any of these theses. (I rely here on Copeland's article on the Church-Turing thesis in the Stanford Encyclopedia of Philosophy.)

The Church-Turing thesis concerns the notion of an effective or mechanical method in logic and mathematics.

A method, or procedure, M, for achieving some desired result is called an "algorithm", or "effective", or "mechanical", just in case:

1. M is set out in terms of a finite number of exact instructions (each instruction being expressed by means of a finite number of symbols);

2. M will, if carried out without error, always produce the desired result in a finite number of steps;

3. M can (in practice or in principle) be carried out by a human being unaided by any machinery save paper and pencil;

4. M demands no insight or ingenuity on the part of the human being carrying it out.

A well-known example of an effective method is the truth table test for tautologousness.
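That test can itself be written out as a short mechanical procedure; here is a minimal Python sketch (eval is used purely for brevity) that grinds through every row of the truth table, with no insight required:

```python
from itertools import product

def is_tautology(formula, variables):
    """Truth-table test: try every assignment of True/False to the variables."""
    for values in product([True, False], repeat=len(variables)):
        if not eval(formula, dict(zip(variables, values))):
            return False
    return True

print(is_tautology("P or not P", ["P"]))               # True
print(is_tautology("(P and Q) or not P", ["P", "Q"]))  # False
```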

The problem is this: although the notion of an effective procedure is meant to be precise, the above formulation isn’t: it’s a set of rules which can be carried out with no “insight or ingenuity”. But what does this amount to?

Turing made a suggestion: a procedure is effective iff it can be carried out by a Turing Machine:

- the notion of Turing Machine computability replaces the notion of “can be carried out by a human unaided by insight or ingenuity”
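For concreteness, here is a toy simulator of a Turing-Machine-style device (a sketch, not Turing's own formulation), running a trivial machine that flips every bit on its tape and then halts:

```python
def run(tape, rules, state="start"):
    """Repeatedly look up (state, symbol), write, move the head, change state."""
    tape, head = list(tape), 0
    while state != "halt":
        symbol = tape[head] if head < len(tape) else "_"   # '_' marks blank squares
        write, move, state = rules[(state, symbol)]
        if head < len(tape):
            tape[head] = write
        else:
            tape.append(write)
        head += 1 if move == "R" else -1
    return "".join(tape)

flip = {("start", "0"): ("1", "R", "start"),
        ("start", "1"): ("0", "R", "start"),
        ("start", "_"): ("_", "R", "halt")}

print(run("110101", flip))   # 001010_
```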

The Church-Turing thesis amounts to this:

Any function which is effectively computable can be computed by a Turing Machine

Or as Turing himself put it:

LCMs [logical computing machines: Turing's expression for Turing machines] can do anything that could be described as "rule of thumb" or "purely mechanical". (Turing 1948:7.)

This is called the “Church-Turing Thesis” because Church made a similar proposal (not involving the notion of a Turing Machine) at the same time - 1936.

Although there’s no proof of this Thesis, it is widely believed to be true by logicians.

What follows?

Well, the Thesis only makes the claim that whatever could be calculated in a finite time by a human working mechanically, without insight or ingenuity, could also be computed by a Turing Machine. The ramifications of this are not that far-reaching:

(A) It doesn’t follow that anything which can be computed by any possible machine can be computed by a Turing Machine.

Why? Two main reasons:

- it is known (proven) that there are logically possible (but maybe not physically possible) machines which can give the right answers to questions which are not effectively decidable (and so not computable by a Turing Machine)

- the basic processes which some machines employ may be beyond the capacities of any human computer who is unaided by machinery

Turing himself recognized the possibility of devices which go beyond Turing Machines in their computing power - he called them Oracles (the idea being that whenever a Turing Machine got stuck and couldn’t come up with the answer, it would consult the Oracle device, and get the right answer!)

- an Oracle could take the form of a machine which generates a particular infinite number sequence; these numbers would represent, in coded form, the answers to all the non-Turing-computable problems that a given Turing Machine couldn't solve

- such a device is logically possible, but may well not be something we could actually build

(B) It doesn’t follow that any given physical system, or functional system (pattern of input-output responses) can be simulated by a Turing Machine

- most mathematical functions are non-computable (the answers for any given input cannot be computed by Turing Machine)

- many physical systems (and input-output patterns) are described by non-computable functions

- these systems cannot be simulated by a Turing Machine

So: it’s at least possible that the human brain, or mind, does things which cannot be computed by a Turing Machine. Certainly, nothing that follows from the Church-Turing thesis entails that the operations of our brains can be fully simulated by a Turing Machine.

(iii) A Priori Refutations: Searle and Lucas-Godel-Penrose

(a) Searle’s famous “Chinese Room” thought experiment is meant to refute the idea that pattern-manipulation could ever be sufficient for intelligence, and so refute both the Strong and Weak Symbol Systems theses.

We’ve already discussed this argument, so I will be brief.

- the basic argument: Searle manipulates Chinese symbols in accordance with complex rules; to the outside world it appears as though he is giving intelligent answers, in Chinese, to questions put to "the room" in Chinese; but Searle says: I don't understand Chinese, and nothing else in the room does, therefore there's no understanding going on; I'm doing exactly what a computer does, so no computer could ever understand what it is doing. Therefore, the idea that semantics could emerge from purely syntactic operations is wrong.

In reply, the advocate of AI could say:

“The room as a whole is in fact a rational and intelligent entity. The fact that Searle doesn’t understand what is going on doesn’t mean that the room as a whole (of which Searle is a functioning part) doesn’t. It does.

Searle’s claim may be convincing, but only when interpreted in this way: there is no conscious understanding. Maybe there isn’t. At the very least, the room as a whole is an intelligent zombie. It has the understanding and intelligence that a zombie can have. The fact that it isn’t conscious doesn’t matter - you don’t need consciousness for intelligence or reason. Even a zombie has an understanding of a sort.”

Provided the claims made for the room don’t go beyond this, Searle’s objection has little force.

(b) Lucas & Penrose on Godel’s Theorem

The arguments in this area are interesting, but nearly always difficult and technical. So I will be brief (plenty of on-line papers available from Chalmers’ website).

In 1961 Lucas published a paper “Minds, Machines and Godel” which used a famous result from logic - Gödel's incompleteness theorem - to refute the idea that a computer could ever fully replicate the power of the human mind.

Gödel's First Incompleteness Theorem states that in any "formal system" F sufficient to formalize a modest portion of the arithmetic of the integers and which is assumed to be sound there is an arithmetical sentence that is true but not provable in the system F.

Some quotes from Lucas:

Gödel's theorem seems to me to prove that Mechanism is false, that is, that minds cannot be explained as machines

Gödel's theorem states that in any consistent system which is strong enough to produce simple arithmetic there are formulae which cannot be proved-in-the-system, but which we can see to be true.

Gödel's theorem must apply to cybernetical machines, because it is of the essence of being a machine, that it should be a concrete instantiation of a formal system. It follows that given any machine which is consistent and capable of doing simple arithmetic, there is a formula which it is incapable of producing as being true---i.e., the formula is unprovable-in-the-system-but which we can see to be true. It follows that no machine can be a complete or adequate model of the mind, that minds are essentially different from machines.

In more recent years, the argument has been taken up by Roger Penrose, in The Emperor’s New Mind, and Shadows of the Mind. Penrose takes the argument to demonstrate the following:

- Godel has shown that there are mathematical truths which cannot be proven by any algorithm (a fixed and precise list of rules)

- a computer running a theorem-proving algorithm couldn’t recognize the truth of these maths theorems

- but we can recognize their truth

- therefore, our minds are not wholly algorithmic: we have a mental ability which cannot be explained in terms of computational rules - insight, intelligence - call it what you will

Penrose extends the usual Godel argument, which says there is some abstruse theorem of arithmetic which we can’t prove but know to be true - he gives plenty of examples of mathematical theorems which are non-computable (cannot be proved by an algorithm) but which we can see to be true by “insight”

- he goes so far as to claim that consciousness plays a crucial role here:

As Dennett says: “The function of consciousness, in Penrose's view, is to leapfrog the limits of (practical) computability by conjuring up appropriate judgments in circumstances in which "enough information is in principle available for the relevant judgment to be made, but the process of formulating the appropriate judgment, by extracting what is needed from the morass of data, may be something for which no clear algorithmic process exists--or even where there is one, it may not be a practical one." (p.412)”

So: the claim is that Godel’s theorem shows that our minds have powers that a computer couldn’t possess.

But: in a review of Penrose’s recent book, Hilary Putnam (Harvard professor of mathematical logic) berates Penrose for relying on an argument which every competent logician has long recognized as being invalid. This is striking, for Penrose is widely recognized as being one of the world’s most brilliant mathematical physicists (his field is not mathematical logic, however).

There are, in fact, several ways in which the Lucas/Penrose argument can be undermined:

(a) For the argument to work, we have to be able to know that our own reasoning powers are sound or consistent (i.e. that we only ever derive truths from our axioms, and never derive contradictions). Perhaps they are sound, but why think we can know this? Perhaps it is beyond our powers to know that our reasoning powers are sound.

Lucas’ response to this objection is interesting (and amusing). I quote:

"A man's untutored reaction if his consistency is questioned is to affirm it vehemently: but this, in view of Gödel's second theorem, is taken by some philosophers as evidence of his actual inconsistency. Professor Putnam has suggested that human beings are machines, but inconsistent machines. If a machine were wired to correspond to an inconsistent system, then there would be no well-formed formula which it could not produce as true; and so in no way could it be proved to be inferior to a human being. Nor could we make its inconsistency a reproach to it---are not men inconsistent too? Certainly women are, and politicians; and even male non-politicians contradict themselves sometimes, and a single inconsistency is enough to make a system inconsistent.

The fact that we are all sometimes inconsistent cannot be gainsaid, but from this it does not follow that we are tantamount to inconsistent systems. Our inconsistencies are mistakes rather than set policies. They correspond to the occasional malfunctioning of a machine, not its normal scheme of operations. Witness to this that we eschew inconsistencies when we recognize them for what they are. If we really were inconsistent machines, we should remain content with our inconsistencies, and would happily affirm both halves of a contradiction. Moreover, we would be prepared to say absolutely anything---which we are not. It is easily shown that in an inconsistent formal system everything is provable, and the requirement of consistency turns out to be just that not everything can be proved in it---it is not the case that "anything goes." This surely is a characteristic of the mental operations of human beings: they are selective: they do discriminate between favoured---true---and unfavoured---false---statements: when a person is prepared to say anything, and is prepared to contradict himself without any qualm or repugnance, then he is adjudged to have "lost his mind". Human beings, although not perfectly consistent, are not so much inconsistent as fallible."

Comment: this sounds quite reasonable – but does Lucas imply here that women’s intellects could be programmable?

(b) Another objection to Lucas runs thus: Perhaps our intellectual abilities depend upon an imperfect program. Some of us are good at maths, but no human mind is perfect – mistakes will be made, inconsistencies will go unnoticed. So even if there are some truths that every human mind will fail to recognize (our own personal “Godel sentences”), it doesn’t mean that our minds are not in fact programs.

In arguing that Penrose’s argument may rest upon a mistake, Dennett argues thus:

- computers can play chess very well

- yet there is no (practical) algorithm for finding a guaranteed route to checkmate from any given position: a computer would have to be able to search through a vast number of possibilities, and there's just not enough time

“And yet programs--algorithms--that achieve checkmate with very impressive reliability in very short periods of time are abundant. The best of them will achieve checkmate almost always against almost any opponent, and the "almost" is sinking fast. You could safely bet your life, for instance, that the best of these programs would always beat me. But still there is no logical guarantee that the program will achieve checkmate, for it is not an algorithm for checkmate, but only an algorithm for playing legal chess--one of the many varieties of legal chess that does well in the most demanding environments. The following argument, then, is simply fallacious:

(1) X is superbly capable of achieving checkmate.

(2) There is no (practical) algorithm guaranteed to achieve checkmate.

therefore

(3) X does not owe its power to achieve checkmate to an algorithm.

[The computer is following an algorithm, but a fallible one: an efficient but imperfect chess-winning program - known as a heuristic. Heuristic programs are still algorithms]

Dennett continues: “So even if mathematicians are superb recognizers of mathematical truth, and even if there is no algorithm, practical or otherwise, for recognizing mathematical truth, it does not follow that the power of mathematicians to recognize mathematical truth is not entirely explicable in terms of their brains executing an algorithm. Not an algorithm for intuiting mathematical truth--we can suppose that Penrose has proved that there could be no such thing. What would the algorithm be for, then? Most plausibly it would be an algorithm--one of very many--for trying to stay alive, an algorithm that, by an extraordinarily convoluted and indirect generation of byproducts, "happened" to be a superb (but not foolproof) recognizer of friends, enemies, food, shelter, harbingers of spring, good arguments--and mathematical truths!”

As responses to the Lucas/Penrose arguments go, this is quite convincing.

Does AI constitute an existence proof for the Weak Thesis?

If an intelligent computer were created, we would have proof of the Weak Thesis: symbol shuffling is one way in which intelligence can be realized. People have been working on AI for several decades now. How successful have they been?

The short answer: not as successful as they would have liked. Progress has been so slow, and so far success has been so limited, that we have no reason to think anything like human-level intelligence can be programmed into a computer. On the other hand, it may just be a very difficult thing to do …. We just don’t know as yet.

In more detail ….

The field of AI really took off at a conference in Dartmouth in 1956. A year later, two pioneers, Newell and Simon, made the following predictions:

  1. Within ten years, a digital computer would be the world chess champion.
  2. Within ten years, a digital computer would discover and prove an important new mathematical theorem.
  3. Within ten years, a digital computer would write music that would be accepted by critics as possessing considerable aesthetic value.

Ten years later, none of these predictions had come true. But enthusiasm was still the order of the day. Simon issued a further prediction in 1965:

4. Within twenty years, machines will be capable of doing any work that a man can do.

More than 30 years later, we’re still waiting. In 1982, scientists working on the Japanese “Fifth Generation” project claimed that within 10 years their computers would possess common sense and would be able to understand human conversation, with a vocabulary of 10,000 words. The project was later abandoned – progress was far slower than had been hoped.

But the predictions still come. In the early 1990s, Hans Moravec claimed that within 50 years robots with human intelligence would be commonplace. More recently, there are the Kurzweil projections mentioned earlier.

Given the fate of previous predictions of success in AI, these latest predictions, though more cautious than their predecessors, should no doubt be treated with caution!

The truth is: we haven’t yet made much progress in AI. No program has yet come anything near to displaying human-like levels of intelligence.

There have been some successes:

- Kasparov has been beaten by a computer (computer games generally have made great leaps forward!)

- some maths theorems have been proved

- some “expert systems” have proved better (in their limited fields) than their human counterparts

- decent speech/handwriting recognition programs are just becoming available

- programs for facial recognition have improved

So: progress has been made on some of the computation that is involved in our own sensory processes

- robots which can move in a realistic (animal-like) fashion have recently been made

But the failures are more impressive:

- no machine can produce conversations of any length or complexity that show any sense or intelligence

- more generally, we are a long way short of producing any computer that possesses a general understanding of the world: common sense has proved very hard to program

- given that we haven't managed to find a way of representing our common sense knowledge about how the world works, it's not surprising that common sense reasoning remains far beyond current machines – the raw materials for intelligent reasoning (knowledge) are not in place

A few brief illustrations/explanations of why common-sense knowledge is such a problem.

(1) There's a lot of it, and we aren't aware of it. Consider the following:

Read the following passage and then answer the two questions:

Tommy had just been given a new set of blocks. He was opening the box when Jimmy came in.

Question 1: Who was opening the box?

Question 2: What was in the box?

The answers are obvious to us: we know that “he” usually refers to a person; that people who have been given a present open it themselves; they open presents after receiving them; that blocks are the kind of thing a child gets as a present; that “Tommy” may well be the name of a child; we know that blocks are not people.

This mass of background knowledge is obvious to us, but it’s not obvious to a computer. It all has to be programmed in. This means finding a way of representing the data in a way the computer can understand, and finding a way of representing the interconnections between pieces of data: the inference patterns we all unconsciously adopt. This too isn’t easy. Another example:

What does the word “they” refer to in each of the following sentences?

The police refused to give the students a permit to demonstrate because they feared violence.

The police refused to give the students a permit to demonstrate because they advocated revolution.

This example illustrates the way background common sense is necessary to understand even the basic grammar of a sentence: the structure of both sentences is the same, and the only difference is the final two words; yet in the first case "they" is naturally read as referring to the police, and in the second to the students.

The sorts of information we're concerned with here consist of mundane facts of the kind just illustrated: facts about presents and boxes, permits and demonstrations, police and students.

Clearly, there are millions of bits of information of this sort. Itemizing and summarizing it is going to be a monumental task.

(2) Worse still, these different bits of information have to be connected up if they are to yield intelligence. But the "rules" of common sense are hard to set down in a formal, systematic way.

E.g. If there’s a bag in your car, and a gallon of milk in the bag, there’s a gallon of milk in your car. But if there’s a person in your car, and a gallon of blood in a person, it would be strange to conclude that there’s a gallon of blood in your car.
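A toy sketch of the difficulty: a naive "containment is transitive" rule gets the milk case right but cheerfully delivers the odd conclusion in the blood case; writing the rule so that it yields only the sensible answers is surprisingly hard.

```python
facts = {("bag", "car"), ("milk", "bag"), ("person", "car"), ("blood", "person")}

def contains(inner, outer):
    """Naive rule: x is in z if x is directly in z, or is in something that is."""
    if (inner, outer) in facts:
        return True
    return any((inner, mid) in facts and contains(mid, outer)
               for mid, _ in facts)

print(contains("milk", "car"))    # True - as common sense says
print(contains("blood", "car"))   # True - but this conclusion sounds distinctly odd
```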

Dennett has provided a nice illustration of the difficulties involved in anticipating the side-effects of a given action, and a more general problem.

The scenario is this: A robot is designed to fetch a spare battery from a room that also contains a time bomb. The robot must get the battery before the bomb goes off.

Version 1 of the robot saw that the battery was on a wagon and that if it pulled the wagon out of the room, the battery would come out too. Unfortunately, the bomb was also on the wagon, and the robot failed to deduce that pulling out the wagon would bring the bomb out as well.

Version 2 was programmed to consider all the side effects of its actions. It had just finished computing that pulling the wagon would not change the colour of the room’s walls, and was proving that the wheels would turn more revolutions than there are wheels on the wagon, when the bomb went off.

Version 3 was programmed to distinguish between relevant implications and irrelevant ones. It sat there cranking out millions of implications and putting all the relevant ones onto a list to consider and all the irrelevant ones on a list to ignore, as the bomb went off.

Dennett points out that it’s not enough to deduce the implications of what one knows; what’s needed is to deduce only the relevant implications. This problem is sometimes known as the frame problem. We don’t yet know how to solve it; that is, we don’t know how to program a machine to solve it, even though we ourselves solve it all the time.

Conclusions: AI hasn’t done very well. But do these failures prove anything? Unfortunately (or fortunately) not. All they show is that AI is hard. Much harder than was envisaged. It doesn’t mean that intelligence can’t be programmed. And so the Weak Thesis remains intact – albeit lacking overwhelming support.

A Way out of the Impasse: Evolving Intelligent Machines:

Some workers in AI have come up with an interesting way of making progress … rather than trying to invent AI programs, we let them evolve by natural selection.

[Note: because computers are very fast, natural selection on a computer takes seconds or minutes, rather than millions of years!]

Some very successful programs have been created like this. What is interesting: although the instructions are clearly visible, and contain nothing mysterious, how the program does what it does is often completely incomprehensible! The shortest and simplest description of how the program works is the instruction sequence itself …
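
By way of illustration only, here is a minimal sketch of the selection-and-mutation loop involved. It is a toy: real work of this kind evolves programs rather than bit-strings, and the target and fitness function here are made up.

```python
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # stand-in for some "desired behaviour"

def fitness(genome):
    """Number of positions where the genome matches the target behaviour."""
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    """Flip each bit with a small probability."""
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    survivors = population[:10]                                        # selection: keep the fittest half
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

# prints the best genome found and how closely it matches the target
print(generation, population[0], fitness(population[0]))
```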

The moral is obvious: it may well be possible to evolve an intelligent piece of software; but it may also be that we won’t understand how it works.

If this is right, then a good deal of the interest of AI evaporates: creating an intelligent device wouldn’t tell us anything informative about the mechanisms underlying intelligence – intelligence would be as mysterious as ever (even though we have a list of the instructions for programming an intelligent computer in front of us!)

The Challenge from Connectionism

The Strong and the Weak Theses make the claim that symbol manipulation is either necessary or sufficient for intelligence. Both these claims have come under attack from enthusiasts for a new and different kind of computing device: PDP (Parallel Distributed Processing), or connectionist, devices.

- the label connectionism is used in a number of ways, but usually to refer to the general idea that PDP devices have more to tell us about the nature of the mind, or intelligence, than classical or Von Neumann or digital computers

NOTE: it’s not really true that work on this broad class of computational machines is new. Research was being done in the 1950s. But after an influential critique by Minsky, interest waned, until the 1980s, when it suddenly picked up again. It turned out that Minsky’s criticisms were only valid against a particular class of connectionist devices. Influential figures: McClelland & Rumelhart, Hinton, Smolensky.

So-called connectionist devices are in many ways unlike digital (or “Von Neumann”) computers. They consist of networks or webs of simple nodes; in some ways they look like expanses of neural tissue, and so are often called “neural nets”.

These nets aren’t programmed by running sequences of instructions. They don’t store sequences of instructions, or bit-strings, in memory banks. They don’t have central processing units. Most significantly, they don’t shuffle symbols: they don’t store, pass around, and work on discrete patterns or bit-strings. In this sense, they are non-symbolic or sub-symbolic computational machines.

Before proceeding, some important points to note:

(A) if it could be shown that connectionist or PDP systems possess intelligence, then the Strong Thesis is refuted. Symbol shuffling isn’t the only way intelligence can be realized in a physical system. (Actually, this is an oversimplification, in that to refute the Strong Thesis, it would need to be shown that a PDP system which didn’t run a virtual symbol-processor could possess intelligence.)

(B) The Weak Thesis is under threat too: if it could be shown that there are some computations, some forms of cognition, which can only be performed by a PDP system, then it’s not true that symbol shuffling on its own is sufficient for intelligence.

(C) Folk Psychology is under threat. Some people have argued that IF connectionism is true, THEN eliminativism is true.

- the term “connectionism” here can mean either of two things: (i) all possible intelligent systems are connectionist networks, (ii) our brains are connectionist systems.

- the term “eliminativism” here refers to the doctrine that there are no such things as the entities posited by ordinary folk psychology: e.g. beliefs, desires, hopes – propositional attitudes generally.

The argument runs along these general lines:

  1. According to folk psychology, we store a great deal of information in beliefs; beliefs have certain properties, e.g. they are discrete, persisting entities; they have a compositional structure; they have causal powers;
  2. Connectionist systems don’t store information in units with the same properties as beliefs;
  3. So, if we store information in a connectionist way, we don’t have beliefs;
  4. More generally, all our representational states of mind (memories, propositional attitudes other than belief) don’t exist either, in the form alleged by folk psychology.

We’ll take a brief look at this issue.

HOW PDP SYSTEMS WORK – MORE DETAIL

A PDP device is a network of nodes, or artificial neurones: so-called because they work in something like the manner neurones are believed to work. A neuron is a single cell, with several long “wires” leading into it and out of it: dendrites and axons. Neurons are either “on” or “off”, that is, they are either firing or not firing, i.e. they are sending electrical discharges up through the axon, in response to the electrical activity coming in through the dendrites.

Here’s a simple example, to illustrate how a neuron works:

[Diagram: three inputs X (weight 1), Y (weight 2) and Z (weight 3) feeding into neurone N, which sends out a single output.]

X, Y and Z are inputs into our neurone N; they are themselves outputs from other neurons. These inputs have different “connection strengths” or “weights”, depending upon the chemistry of the links, or the thickness of the neural strands. X has strength 1, Y strength 2 and Z has strength 3. This means the effect on N of Z firing is 3 times the effect of X firing.

N itself has a certain threshold, i.e. the minimum input which causes it to start firing. If N has a threshold of 4, then it will fire if Z and X are both active (3 + 1 = 4), or Z and Y are both active (3 + 2 = 5). It won’t fire if X and Y are active but not Z (1 + 2 = 3 falls short), and it won’t fire if only one input is active.
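
A minimal sketch of this worked example, assuming nothing beyond the weights and threshold just given:

```python
def unit_fires(active_inputs, weights, threshold):
    """A unit fires when the summed weights of its active inputs reach its threshold."""
    return sum(weights[name] for name in active_inputs) >= threshold

weights = {"X": 1, "Y": 2, "Z": 3}

print(unit_fires({"Z", "X"}, weights, 4))   # True:  3 + 1 = 4 reaches the threshold
print(unit_fires({"Z", "Y"}, weights, 4))   # True:  3 + 2 = 5
print(unit_fires({"X", "Y"}, weights, 4))   # False: 1 + 2 = 3 falls short
print(unit_fires({"Z"}, weights, 4))        # False: no single input reaches 4
```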

A PDP network is built up of simple devices, call them units, which work along these lines. These units are usually arranged in layers:

[Diagram: units arranged in layers, with connections running between units in adjacent layers.]

Each unit is connected to others. There may be two layers, or more than two, in which case the inner units are called the “hidden layers”. Each unit has a firing threshold; input connections can be excitatory or inhibitory; connections can vary in strength, i.e. the degree to which an active connection influences the unit it feeds into. If the inputs to a unit exceed the firing threshold, the unit is active; if they fall below, the unit is dormant.

The “edge” of the network (one of the outer layers) is chosen as the input layer, and another is chosen as the output layer. To enter data, the units of the input layer are forced or clamped into a particular activation pattern (e.g. in a simple case, a sequence of on’s and off’s). Forcing (and keeping) the input units in this pattern initiates a mass of activity: the units in the next layer start firing, causing some units in the layer after that to start firing, and so on. Changes spread across the network in a fluctuating pattern of activity. But eventually the network settles into a stable state. At this point, we read off the output: the pattern of on’s and off’s at the output units.
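
Here is a minimal sketch of such a layered pass, with hand-chosen (not learned) weights; it omits the feedback and “settling” described next, doing a single sweep from the input layer to the output layer. As it happens, these particular weights make the little net compute exclusive-or.

```python
def layer_output(inputs, weights, thresholds):
    """Each unit fires (1) if its weighted input meets its threshold, else stays off (0)."""
    return [1 if sum(w * x for w, x in zip(unit_weights, inputs)) >= threshold else 0
            for unit_weights, threshold in zip(weights, thresholds)]

hidden_weights, hidden_thresholds = [[1, 1], [-1, -1]], [1, -1.5]   # an OR-ish unit and a NAND-ish unit
output_weights, output_thresholds = [[1, 1]], [2]                    # fires only if both hidden units fire

for input_pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):               # clamp the input layer to a pattern
    hidden = layer_output(input_pattern, hidden_weights, hidden_thresholds)
    output = layer_output(hidden, output_weights, output_thresholds)
    print(input_pattern, "->", output)                               # read off the output layer
```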

When a network has reached a stable condition, changing the condition of just one unit (e.g. clamping it on or off) often has dramatic repercussions: ripples of change spread back and forth throughout the network, until a new stable state is reached. The network is a very sensitive, highly dynamic and unstable system. It is very difficult to predict how such a system will behave in response to any particular change.

SOME APPEALING PROPERTIES OF NETWORKS

There are a number of these, I’ll mention just a few, and quite briefly.

(1) Learning by Example

Given the chaotic unpredictability of many networks, explicit programming isn’t an option. Instead, we have to find a way of getting the network to learn what to do: by showing it sample problems, or inputs, then seeing what the system does with these inputs, then modifying the system to improve its behaviour.

The simplest method: randomly varying the connection weights, until the system performs in the desired way.

But this is time consuming, and there’s no guarantee that you’ll ever find the right combination of weights: there are just too many possibilities.

A better solution: use reinforcement by feedback, or “back-propagation”. PDP programmers have designed various training methods for nets. These are mechanical (performed by a machine, a computer, or a network) and non-random ways of altering the connection weights so that after each “run”, the output of the system moves closer to the desired output.

Training is not programming: the “training algorithms” that have been designed by PDP engineers are not programs in the traditional sense. They aren’t sequences of instructions for performing manipulations on bit-strings. They are simply systematic ways of adjusting the strengths of connection weights in the light of discrepancies between actual outputs and desired outputs. The designer doesn’t know (or care) what the learning algorithm actually does in response to any given input: the aim is to get the right outputs; the precise way the network achieves this is entirely unplanned – the particular connection-weight changes that occur aren’t foreseen or intended. The network finds them itself.
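
As an illustration of the general idea (not back-propagation itself, which extends the idea to multi-layer nets), here is a sketch of the simplest feedback-driven training rule, applied to a single threshold unit; the target behaviour (logical AND), learning rate and initial weights are arbitrary choices of mine.

```python
import random

examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]   # desired behaviour: logical AND

weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)
rate = 0.2

def output(inputs):
    """The unit fires (1) if its weighted input plus bias reaches zero, else stays off (0)."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0

for epoch in range(100):                      # repeated "runs" through the training examples
    for inputs, desired in examples:
        error = desired - output(inputs)      # discrepancy between desired and actual output
        weights = [w + rate * error * x for w, x in zip(weights, inputs)]   # nudge each weight
        bias += rate * error

print([output(inputs) for inputs, _ in examples])   # typically [0, 0, 0, 1] after training
```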

The important point: back-propagation is learning by example, just as we do. Networks aren’t programmed, nor are we. We learn intelligent responses by trial, error, and feedback, just like networks.

(2) Generalization

A somewhat remarkable thing: when a net has been trained up on a given set of inputs (e.g. it has learned to give the right output to thirty different input patterns, those used in its training), it will also (often) give the right outputs for different input patterns, i.e. inputs it has never “seen” before. In effect, the network generalizes from the sample inputs to new ones.

- these are two characteristics that are very familiar from human behaviour; we learn by example, and can correctly “get the idea” or generalize, from a limited number of inputs

One noted success: the past tense converter; it takes English present tense verbs as inputs and delivers their past-tense forms as outputs. The system was trained with a few hundred regular and irregular forms (e.g. train/trained, walk/walked, bake/baked, run/ran, make/made, is/was …). [In fact, 400 verbs, and 190 training cycles to master them]. After this comparatively brief exposure, the system could correctly convert regular verbs it hadn’t seen before, and irregular verbs it hadn’t seen before: it gave wept for weep, clung for cling, and dripped for drip.

This is an impressive achievement, but it was only successful in about 70% of cases. E.g. it gave membled for mail, shipped for shape, and squawked for squat.

(3) Storage by Superposition

The same network, the same units, can be used - at the same time - to store data about different things.

[example: ten propositions correctly stored and evaluated by same network]

Note 1: the same units are involved in representing all the propositions. Each proposition is represented by activity spread or distributed across a number of (probably all) the units.

Note 2: this is why this mode of storage is referred to as “storage by superposition”: the information pertinent to the evaluation of all the propositions is stored throughout the same network of units

Note 3: This distributed mode of representation is very economical, and corresponds well with what we know of how memories are stored in brains. There’s no one part of the brain which is associated with any given memory.
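
A minimal sketch of storage by superposition, using a classic linear associative memory rather than the particular network mentioned above; the input and output patterns are made up, and the inputs are chosen to be orthogonal so that recall is exact.

```python
inputs  = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   # three (orthogonal) input patterns
outputs = [[1, 0, 1],    [0, 1, 1],    [1, 1, 0]]      # the responses we want to store

# One weight matrix stores all three associations at once: each pair simply adds
# its contribution to the same weights, so no weight "belongs" to any one item.
weights = [[0] * 4 for _ in range(3)]
for inp, out in zip(inputs, outputs):
    for i in range(3):
        for j in range(4):
            weights[i][j] += out[i] * inp[j]

def recall(inp):
    """Retrieve a response by passing an input pattern through the shared weights."""
    return [sum(weights[i][j] * inp[j] for j in range(4)) for i in range(3)]

for inp in inputs:
    print(recall(inp))    # each stored response is recovered from the very same weights
```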

(4) Graceful Degradation

When a network is damaged (units removed, weights disturbed) or given noisy input, its performance worsens gradually rather than collapsing all at once, which is also how brains tend to respond to damage.

(5) Pattern Recognition, Pattern Completion

We know we are good at recognizing objects: think of the speed at which you recognize a familiar face. We can recognize thousands, maybe millions of different faces, almost instantly. How do we manage it?

Think of the range of forms a chair can take, yet usually we can tell that something is a chair, or a table, without any delay or thought.

Handwriting is another good example: people write in very different styles, they often write very badly, yet we can usually read what they have written.

Understanding speech involves recognizing words and sentences, despite the fact that people’s voices and accents are very different.

Another auditory example: recognizing a tune, when sung, played by an orchestra, played very badly by the teenager learning piano next door.

These activities can all be classed as pattern recognition problems. Conventional AI has proved very bad at programming for pattern recognition. Connectionist systems are very good at it.

So much so that “pattern recognizer” or “pattern completer” is a good way of describing a PDP system.

Some writers see something very deep and significant here. Consider some of the different ways the letter “a” can be written, in very different fonts. What do all the “a”’s have in common? It’s very hard to spell out precisely, in terms of geometrical properties that all and only versions of the letter have – there may be no such properties. Yet, we can quickly recognize that the letters do have something in common: they are all versions of a!

They have the same spirit or essence.

There’s another complicating factor: not only do different a’s vary in shape and geometry, in recognizing some letters in some fonts, we often need to pay attention to how other letters in the same font are recognized. E.g. the letter h in some fonts may be similar to the letter b or k. To tell if a letter really is an h, you need to look and see how b and k are represented in the same font. This process of making judgments in a holistic manner, paying attention to how different letters are depicted, entertaining several tentative hypotheses at once, and doing the job very quickly – all very difficult to program in a conventional way. Yet relatively easy for a PDP network.

CONNECTIONISM: WHAT TO MAKE OF IT?

The power of PDP systems to perform certain types of task has had a significant impact on recent cognitive science, but there is little consensus as to what exactly their significance is. Do they legitimise adopting a fundamentally new view of cognition? Some people say yes, others say no. I’ll give a brief overview of the current controversies.

(1) ELIMINATIVE CONNECTIONISM

Probably the most dramatic interpretation of the significance of connectionist systems: it may well be that our minds are very different from what we take them to be. Folk psychology is false: there are no beliefs or desires; there are no propositional attitudes.

Ramsey/Stich/Garon’s Argument

A more detailed and specific argument for eliminativism has been provided by RSG.

They begin by discussing some general issues concerning the circumstances in which changes in scientific theory legitimise ontological revision, i.e. altering our view of the entities which really exist in the world.

- in some cases the revision is conservative: the old theory is superseded, but we carry on believing in the entities it posited; in other cases the old theory is so wide of the mark that we no longer believe the entities it posited exist. E.g. witches, phlogiston.

RSG claim that if connectionism is true, if our brains are connectionist nets, then folk psychology is so wide of the mark that there is good reason to believe that entities posited by folk psychology do not exist. In particular, there are no beliefs!

In more detail, the argument runs thus.

According to RSG, folk psychology is committed to the doctrine that we store information in propositional form (e.g. beliefs). Moreover, folk psychology is committed to a doctrine they call propositional modularity. This doctrine amounts to the following picture of what beliefs are:

(a) They are semantically interpretable (they can be assigned meanings, be evaluated for truth/falsity)

(b) They are functionally discrete (beliefs can be lost or gained one by one; losing one belief can – in principle – leave the rest of the mind unchanged)

(c) They are causally active (beliefs causally influence actions, and other beliefs)

(d) They are projectible (belief-states can be the object of theoretical generalizations or laws, e.g. if someone has the property of “believing that dogs have fur”, we can make predictions about how they will act, etc., because they have this belief)

RSG then examine a common form of connectionist network, of the sub-symbolic kind, one in which information is widely distributed, and argue that this mode of information storage is incompatible with propositional modularity.

They consider a network, Network A, which has been trained to evaluate sixteen sentences (e.g. “Dogs have fur”).

They describe the way this system works thus:

In the semantic network [memory as conceived by folk psychology] there is a functionally discrete sub-part associated with each proposition, and thus it makes perfectly good sense to ask, for any probe of the network, whether or not the representation of a specific proposition played a causal role. In the connectionist network, by contrast, there is no distinct state or part of the network that serves to represent any particular proposition. The information encoded in Network A is stored holistically and distributed throughout the network. Whenever information is extracted from Network A, by giving it an input string and seeing whether it computes a high or a low value for the output unit, many connection strengths, many biases and many hidden units play a role in the computation. And any particular weight or unit or bias will help to encode information about many different propositions. It simply makes no sense to ask whether or not the representation of a particular proposition plays a causal role in the network’s computation.

The point made here is a simple one: in Network A, the units and connection weights which are causally responsible for the “Yes” answer to the query “Do dogs have fur?” are also responsible for the network’s answers to the other 15 questions. The system as a whole encodes the information about all 16 propositions. It is impossible to isolate one part of the network as representing “dogs have fur”; indeed, the question “which part of the network represents a given proposition?” doesn’t have any sense.

Given this, it makes no sense to suppose that the representation of a particular proposition has any distinctive causal role.

Note also: The holistic mode of information storage also means that it is unlikely that the network could lose just one item of information. Changing the weights in a way which would cause the system to answer “No” to “dogs have fur” would also change the way the system responded to the other propositions.

Summing Up: connectionist representations don’t have functional discreteness, they aren’t causally active, they aren’t projectible. Propositional modularity fails – hence the case for saying that if our minds store information in distributed connectionist systems, there are no such things as beliefs, or other propositional attitudes.

(2) IMPLEMENTATIONISM

A very different attitude to connectionism is to be found in defenders of the classical symbol system hypothesis, such as Fodor, Pylyshyn and McLaughlin.

In an influential article, F & P argue that it’s a mistake to suppose connectionist systems offer a new and alternative model of cognition. In reality, all they do is provide us with a new way of implementing symbol processing systems. They claim any cognitive system possessing general intelligence will be a classical symbol processor. Whether this symbol system is implemented in a serial silicon computer or a parallel web of nodes and weights is irrelevant. Cognition still, ultimately, consists of rule-governed symbol processing.

F & P’s attack is rooted in the claim that while connectionist systems are good at some things, they are very bad at others, and the things they aren’t suited for (or just can’t do at all) are essential components of genuine intelligence. The abilities in question fall under two headings: productivity and systematicity.

Productivity: How many different thoughts can you think and understand? Answer: there’s probably no limit. You can effortlessly come up with sentences, or propositions, or thoughts, which you haven’t previously entertained.

“Periwinkles are green or livid.”

“Nice trees don’t dance”

“Numbers are OK for some people, but I prefer hot chicken.”

This boundless productivity is easily explained in classical terms. It’s easy to see how a finite number of symbols, and a finite number of rules for manipulating these symbols (so as to form grammatical sentences) can yield the capability for generating an infinite number of different sentences. All that’s required is that the rules allow the endless re-arrangement of symbolic units:

e.g. the embedding of one sentence within another

It’s true that grass is green.

It’s true that “it’s true that grass is green”

… and so on

e.g. an unlimited capacity for forming new sentences by putting old ones together:

Grass is green.

Grass is green and water is wet.

Grass is green and water is wet or dogs have paws.

…. And so on.

It’s easy to see that an indefinite number of new sentences, not previously seen, can be understood by the same procedure: working backwards, applying the rules and decomposing the complex sentence into simpler combinations of symbols.
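
A minimal sketch of the classical point (a toy grammar of my own, not a serious model of English): a handful of atomic sentences plus two recursive rules already generate an unbounded set of novel sentences.

```python
ATOMS = ["grass is green", "water is wet", "dogs have paws"]

def embed(sentence):
    """Rule 1: embed a whole sentence inside another."""
    return f"it's true that {sentence}"

def conjoin(s1, s2):
    """Rule 2: join two sentences into a longer one."""
    return f"{s1} and {s2}"

# applying the rules repeatedly generates ever longer, previously unseen sentences
s = ATOMS[0]
for _ in range(3):
    s = embed(s)
print(s)   # it's true that it's true that it's true that grass is green

print(conjoin(conjoin(ATOMS[0], ATOMS[1]), ATOMS[2]))   # grass is green and water is wet and dogs have paws
```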

But it’s not easy to see how a pattern associator, a system whose outputs aren’t created by rule-governed symbol manipulation, could have the same productive powers. Indeed, it is plausible to think that boundless productivity could only be a property of a system which manipulates symbols in accord with recursive rules.

Systematicity: Fodor & Pylyshyn also point out that any rational person who understands “Jill loves Jack” will also understand “Jack loves Jill”. More generally, when we can understand a sentence with a given structure, we can understand any other sentences of the same structure, but with different “fillings”.

Jack jumped over the moon.

The dolphin jumped over the moon.

The moon jumped over the moon.

The moon jumped over Jack.

Fodor and Pylyshyn claim that systematicity is more than coincidental: in saying that thought is systematic, they are claiming that the ability to think one thought is intrinsically connected to the ability to think others.

“You don’t, for example, find native speakers who know how to say in English that John loves Mary but don’t know how to say in English that Mary loves John. If you did find someone in such a fix, you’d take that as presumptive evidence that he’s not a native English speaker but some sort of tourist.”

Assuming that genuine thinkers are systematic (that is, their thought processes exhibit systematicity), this is evidence that thoughts aren’t stored as wholes.

If individual thoughts, such as “John loves Mary”, were stored as wholes, there could easily be thinkers who could think that John loves Mary but were unable to think or understand the thought that “Mary loves John”.

- if thoughts aren’t stored as wholes, how are they stored? Obvious answer: we have the ability to generate new thoughts by re-arranging parts of sentences in accord with certain rules

Now, it’s easy to see that a connectionist network could learn to say “Yes” to “John loves Mary”, but fail to say “Yes” to “Mary loves John” – we simply stop training the network as soon as it has learned to say “Yes” to “John loves Mary”. This suggests that networks aren’t systematic: their ability to “understand” one sentence is not intrinsically linked to their ability to “understand” other sentences with the same structure.

If F & P are right, and genuine thought is necessarily systematic, then it may well be that pure connectionist systems, i.e. systems which do not manipulate symbols in accord with rules, will not be capable of genuine thought.

Fast, flexible updating

There’s a further distinctive feature of genuine thinkers, one which F & P don’t set such great store by. For certain kinds of problem-solving, it really helps to work with explicit and precise rules. Cognitive systems which don’t (or can’t) work with explicit rules are severely handicapped.

Here’s an example due to Andy Clark. Connectionist systems can be trained to come up with the answers to simple electrical circuitry problems which rely on Ohm’s Law, V = IR, and they do so without having any explicit grasp of the Law itself. Humans, on the other hand, usually use the rule. Suppose you give a human being a new kind of problem to solve, concerning deviant circuits, where Ohm’s Law doesn’t hold but the variant V = I/R does. It’s a simple matter to calculate voltages by using this rule: the human’s performance is nearly perfect straight away. This performance is brought about by a single instruction: use this rule, not that one. Could a PDP system achieve the same result, so quickly and easily? No, it would have to be retrained from scratch, using lots of examples …

A nice example of how working with explicit symbolic rules is essential to smooth and efficient problem-solving.
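
A minimal sketch of Clark’s contrast (the function names and the “deviant” rule V = I/R are just illustrative): for a rule-using solver, switching to the new law is a one-line change, whereas a trained network would need a fresh batch of examples.

```python
def ohms_law(current, resistance):
    return current * resistance          # V = IR, the standard rule

def deviant_law(current, resistance):
    return current / resistance          # V = I/R, the hypothetical variant

def solve_voltage(current, resistance, rule=ohms_law):
    """Solve a circuit problem by explicitly applying whichever rule is supplied."""
    return rule(current, resistance)

print(solve_voltage(2.0, 5.0))                      # standard circuit: V = 10.0
print(solve_voltage(2.0, 5.0, rule=deviant_law))    # deviant circuit: follow the new rule at once, V = 0.4
```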

Summing Up: F & P are quite prepared to grant that some low-level cognitive tasks (such as pattern recognition and data storage) may be performed in connectionist ways. But they insist that the bulk of higher-level cognition requires and involves the rule-governed processing of symbols. If this is the case, then if our brains are vast connectionist systems, these systems must implement classical symbol-processing systems. Symbol systems are virtual machines, run on connectionist hardware.

The important point: in understanding higher cognition, the relevant algorithms concern rule-governed symbol manipulation. Purely connectionist algorithms, concerning the numerical values of connection weights etc., have nothing to contribute to our understanding of higher cognitive processes.

So: connectionism is irrelevant to the cognitive science of higher cognition, for cognitive science is concerned with the kinds of information processing which intelligence requires, not the hardware that can be used to implement this processing. These hardware questions are the concern of engineers.

BETWEEN IMPLEMENTATIONISM AND ELIMINATIVISM: PTC & ICS

In an influential 1988 paper, “On the Proper Treatment of Connectionism”, Paul Smolensky argued that the right lesson to draw from PDP systems differs from both eliminativism and implementationism.

- there is a middle way, which recognizes both the power of classical symbolic computation and the novelty of PDP systems, but which doesn’t say that classical computation (or folk psychology) doesn’t exist or occur, and which doesn’t say that PDP systems with capabilities similar to our own must simply be classical symbol processors.

In broad strokes, Smolensky’s PTC picture is this: cognition involves two levels of processing. There is a “conscious rule processor”, which handles explicit, linguistically expressible knowledge (the sort of thing we can state and deliberately follow as rules), and an intuitive, sub-symbolic processor, which handles perception, skilled action and most everyday thinking.

This two-level view might seem indistinguishable from the ecumenical position of Fodor and Pylyshyn, save for a difference in emphasis: both camps accept that some intelligence is the product of symbol manipulation while other intelligent behaviour is the product of sub-symbolic processing; it’s just that Smolensky gives the sub-symbolic processor more far-reaching responsibilities than F & P do.

But this would be mere implementationism, which Smolensky rejects.

PTC differs from implementationism in this crucial respect: according to Smolensky, our cognitive activity is connectionist through and through, or all the way down. Although part of the network can be interpreted as a symbolic processor, it isn’t really processing symbols in accord with rules, in the ways specified by the classical paradigm.

- connectionists are proposing a new cognitive architecture for all aspects of cognition, symbol processing included.

- certain parts of a PDP system mimic symbol processing systems, they approximate to classical systems, and this is why we can interpret them as doing symbol processing

- but connectionist systems only ever approximate classical systems, they never actually become or implement them

Why is this? Smolensky argues that for a system to implement a classical architecture, it would

  1. have to perform the same functions (its inputs/outputs could be interpreted as consisting of symbol manipulations, e.g. addition), and
  2. accomplish these functions by using the same algorithms (follow the same computational processes, i.e. their causal processes would be structurally analogous)

Smolensky argues that in a connectionist system, although its inputs/outputs and some of its internal states can be interpreted semantically, these “semantic constituents” do not, and cannot, feature in the precise algorithms which describe/explain how the system actually behaves, how it does what it does. The constituents which do figure in these complete algorithms are not semantically evaluable, i.e. they cannot be interpreted as standing for common sense concepts or propositions.

The complete behaviour of the system is explicable in connectionist terms; the connectionist theory posits certain items, which possess certain causal roles, and the behaviour of the system emerges from the way these items interact. However, these items cannot be semantically interpreted.

This explains (in general terms) why PTC differs from implementationism, but why does it differ from eliminativism?

Smolensky is clear on this point: he sometimes calls PTC limitivism to make the difference from eliminativism explicit.

PTC isn’t eliminativist because it recognizes features in a cognitive system which have some, but not all, of the features of propositional attitudes.

Recall the RSG case for eliminativism. They argued that folk psychology is committed to propositional modularity, according to which a belief is (a) semantically evaluable, (b) functionally discrete, (c) has a causal role in the production of behaviour, and (d) is a genuine projectible kind.

Smolensky suggests that the connectionist beliefs have three of these four properties: they are semantically evaluable, functionally discrete, and are genuine projectible kinds. They lack only one thing: they don’t have any causal roles. This is because all the causal work is done by sub-symbolic constituents: in the precise algorithmic description of how the system manages to do what it does there is no mention of constituents which are semantically evaluable.

I’ll come back to why this is meant to be the case shortly. So far as the eliminativism/limitivism distinction is concerned, the important point is that Smolensky thinks that there is a level of description of connectionist systems at which it does make sense to recognize entities which are very similar to beliefs, and these are also (in an abstract sense) functionally discrete (roughly speaking, individual beliefs can be identified with dispositions of networks to enter particular activation patterns when presented with certain inputs).

Smolensky, and other connectionists, are fond of an analogy from physics. Quantum theory is what we currently believe; we used to accept Newtonian (classical) theory. Both theories are about the same entities: what makes up the physical world, particles, forces, etc. But the quantum theory is entirely different from Newtonian theory, it offers a fundamentally different picture of the world. But the world recognized by the two theories is much the same in one sense: it’s the world of planets, people and stars.

In analogous fashion, the mind is connectionist or sub-symbolic all the way down, but under certain circumstances, it approximates a classical system: it behaves as if it is processing symbols. In fact, it isn’t: the classical theory is false.

But this doesn’t mean we ought to go eliminativist. A connectionist system’s “conscious” rule processor has enough of the properties of a classical system for it to be legitimate to regard it as handling beliefs, and the other entities familiar from folk psychology.

Although Newtonian physics provides accurate predictions under many circumstances, there are circumstances where it breaks down and gives predictions which are wide of the mark. In similar fashion, the classical picture of cognition provides a reasonably accurate picture of how our minds work under some conditions, but there are conditions under which it breaks down – and when it does break down, the predictions of the true, connectionist, theory come into their own. As Smolensky puts it:

The rich behaviour displayed by cognitive systems has the paradoxical character of appearing on the one hand tightly governed by complex systems of hard rules, and on the other to be awash with variance, deviation, exception and a degree of flexibility and fluidity that has quite eluded our attempts at simulation. Homo Sapiens is the rational animal, with a mental life ruled by the hard laws of logic, but real human behaviour is riddled with strong non-rational tendencies that display a systematicity of their own. Human language is an intricate crystal defined by tight sets of intertwining constraints, but real linguistic behaviour is remarkably robust under deviations from those constraints. This ancient paradox has produced a deep chasm in both the philosophy and science of mind: on the one side, those placing the essence of intelligent behaviour in the hardness of mental competence; on the other, those placing it in the subtle softness of human performance

The sub-symbolic paradigm suggests a solution to this paradox. It provides a formal framework for studying how a cognitive system can possess knowledge which is fundamentally soft, but at the same time, under ideal circumstances, admit good higher-level descriptions that are undeniably hard. The passage from the lower, sub-symbolic level of analysis to the higher, conceptual level naturally and inevitably introduced changes in the character of the sub-symbolic system: the computation that emerges at the higher level incorporates elements with a nature profoundly different from that of the fundamental computational processes.

Why does Smolensky deny that we really process symbols?

Why does he say: “Symbols, and the rules governing them, therefore have a most curious status: they are real in the sense of governing the semantics and the functions computed, but not real in the sense of participating in a causal story, capturable as an algorithm.” (“Integrated Connectionist/Symbolic Architecture”, p.225)

An adequate account of this would require a lot of detail; this isn’t the place. But the basic point is quite simple.

  1. The large-scale behaviour of PDP systems (their input-output patterns over time) can be interpreted in common sense (semantic) terms
  2. At a lower level, of individual units, activation values, weights, etc., we have a complete and precise algorithmic description of what the system does, how it behaves, what causes what, etc.
  3. The activity at this lower level cannot be interpreted semantically. The activities of individual units, and of small collections of units, are meaningless.
  4. These algorithms cannot be extended to the large-scale patterns; the way the individual units behave at the small scale is so complex that there is no way of extending the precise account upward, to the large-scale behaviour of the system
  5. So, there is no algorithmic account of the way the system behaves when described in high-level terms.
  6. Hence we have an intrinsically split-level architecture: there is no account of the architecture in which the same elements carry both the semantics and the syntax.
  7. “Thus we have a fundamentally new candidate for the cognitive architecture which is simply not an implementation of the classical one.” (Where syntax and semantics have the same vehicles.)

____________________

How can we sum up this debate between classicists and connectionists?

(1) There is something approaching a consensus on the need for a mixed architecture: some sub-symbolic processing is needed, but some symbolic processing is also needed. This point remains even if the symbolic processor is really connectionist all the way down - the connectionist sub-system closely approximates a traditional computer.

- this means the eliminativist consequences of connectionism are not so drastic; it may well be that language-based propositional attitudes will play an important role in advanced cognition

(2) This said, if connectionist processing does play an important role in cognition, which seems quite likely, then the classicist idea that cognition consists in the manipulation of symbols is at least partially undermined. The phrase “manipulation of symbols” in this context should be taken in a quasi-linguistic way.

- one of the classicists’ core ideas is that reasoning is (i) language-based, (ii) akin to logical argument - remember the inspiration provided by the syntactic nature of formal logic.

- but: at a fundamental level, connectionist systems do not work in this way; they do not process discrete symbols in accord with syntactical rules. (Though, as we have seen, connectionist systems can “implement” classical processors.)

So: it seems quite possible that a good deal of cognition is pre-linguistic, pre-conceptual. There may be a “language of thought”, but it may also be that a good deal of what we call “intelligence” does not involve or require this language.

Note: the need for sub-symbolic processing receives support from another direction. A number of impressive AI research projects adopt an intermediate position: neither purely connectionist nor classicist. They work with normal programs, but the programs govern the manipulation of sub-symbolic ingredients, and in a stochastic (probabilistic) manner. E.g. Douglas Hofstadter’s Fluid Analogies Research Group (see the recent collection of papers, in Penguin, Fluid Concepts and Creative Analogies)

(3) Assuming that some cognition is connectionist, where does this leave the Weak and Strong Symbol System theses? Answer: there is no easy, straightforward, answer. It depends on what we mean by “symbol”.

- if by “symbol” we mean something like a word, which is manipulated in accord with syntactical rules, then both theses are weakened. At best only a part of cognition consists of symbol-manipulations

- if by “symbol” we mean pattern in a very broad sense, then a connectionist system is a pattern manipulator: the patterns in question are the distributions of connection weights and patterns of activation triggered by different inputs.

(4) It’s also worth pointing out that, as things stand, connectionist systems are fully computable: they don’t do anything which can’t be replicated on a classical machine. In fact, nearly all experiments with connectionist systems are carried out on classical machines - most neural-network-type systems exist only as virtual machines.

(5) Perhaps the most important philosophical lesson that connectionism has to teach is this: if a significant part of high-level cognition relies upon sub-symbolic processing, then there is a limit to the extent to which the workings of cognition are cognitively transparent: i.e. some of the mechanisms underlying intelligence are inexplicable - there is a sense in which we don’t understand how or why they work …

- recall the idea mentioned earlier: we might produce programs with high-level abilities by a process of natural selection; this does work, but the resulting programs are often unintelligible: the scientists can’t work out why they work

- of course, they also know exactly what is going on when the program runs - in this sense, nothing is hidden. So they do know how the machine works - in a sense.

- the claim that the workings of the program are “unintelligible” really means this: a certain model of explanation cannot be applied - the workings cannot be explained in a certain way

- what is this mode of explanation? Well, you might say it is a reductive explanation, or an engineering mode of explanation …

- Think of how an engineer designs a complex machine such as a plane: there are various components, doing various jobs, all organized and connected in a certain way. We’re happy with this sort of explanation: we can understand the functioning of the whole in terms of the interactions between simpler parts.

- A classical computer program works like this: there are recurrent and combinable elements (word-like symbols) which are manipulated in accord with certain rules. The workings of such a program are transparent: we feel we understand exactly what is going on

- Sub-symbolic processing systems, such as connectionist devices, don’t work in this fashion, and so can’t be explained in engineering/reductive terms. As a result, we feel that we cannot understand what’s going on. But of course we can: the program (the distribution of connection weights) is perfectly clear and intelligible. Our problem is that we are accustomed to, and happiest with, a certain style of explanation, a style which is simply inapplicable to certain sorts of machine.

____________________

So, where does this leave things?

(1) As things stand at the moment, there is no conclusive empirical evidence that a computer of any kind will soon be capable of human levels of intelligence. Connectionist machines can do some things that traditional computers find hard, but research is still in its infancy.

(2) There is no a priori argument which guarantees the success of AI (not even an appeal to the Church-Turing thesis); but likewise, there’s no a priori argument which guarantees that it will fail - or at least, there’s no argument (e.g. Searle, Lucas-Penrose) which is widely accepted as showing this. The question remains controversial.

(3) There are a few considerations which do support AI:

- it’s true that as soon as a computer can perform a task as well as humans, we tend to say “Ah, we always knew that that task didn’t require real intelligence”. Fifty years ago, the ability to do fast and accurate arithmetic in one’s head, or to play chess better than anyone else in the world, would have been taken as paradigms of what intelligence was all about. Now that computers can do these things, we take a different view. So: because of this bias, it could well be that computers have already made more progress than AI’s critics like to admit

- it’s early days in the field, and computer processing power is increasing at an exponential rate (i.e. after a slow start, it will get much more powerful very quickly)

(4) We should bear in mind the possibility that even if we do build an intelligent machine, we may well not understand how it works, in any clear detail. The workings of cognition may be opaque.

(5) It’s still an open question whether or not consciousness plays a crucial role in intelligence. Most deny that it does, but if they are wrong, then success in AI will require the creation of a conscious machine - a different proposition altogether.