*Note: some facts, dates, or historical events might not be accurate on this page. Don't use the information given here for homework or anything like that. Not to mention that the English is far from excellent...*

Sudoku is very popular. I don't know why. Stuff happens.

Someone I know began to play it. Then I read an article in a "scientific" journal (*IEEE Spectrum*, to name it) which explained some things about the game.

I said to myself: *hey! let's be original and solve this little game! And let's do it using a SAT solver!*

Of course, some people already did it. This is my nightmare: whenever I have a funny idea, someone somewhere has already done it. But that's life.

What does not really exist is a web page that explains all the little details about this SAT solving of sudoku.

So let me explain what I know about this affair.

And let me do it in great detail.

Everyone knows about sudoku. You have a 9x9 grid divided into small squares of 3x3. The goal is to put the numbers 1 to 9 in the grid so that each line, each column and each 3x3 square contains each of the numbers exactly once.

Here is an example:

    8 3 7 | 5 9 2 | 1 4 6
    5 6 1 | 4 7 8 | 9 3 2
    4 9 2 | 1 3 6 | 7 8 5
    ------+-------+------
    3 8 9 | 7 6 5 | 4 2 1
    2 4 5 | 8 1 9 | 3 6 7
    7 1 6 | 3 2 4 | 8 5 9
    ------+-------+------
    9 5 3 | 2 8 1 | 6 7 4
    1 7 4 | 6 5 3 | 2 9 8
    6 2 8 | 9 4 7 | 5 1 3

I love to draw pictures with text. It gives me some kind of weird philosophical feeling. But that's not the point.

You can easily check that each line, column, and 3x3 square contains the numbers 1 to 9 exactly once.
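If you are lazy, here is a little Python sketch (an illustration of mine, not part of the solver) that checks the example grid above:

```python
# Check that each line, column, and 3x3 square of a 9x9 grid
# contains the numbers 1 to 9 exactly once.
GRID = [
    [8, 3, 7, 5, 9, 2, 1, 4, 6],
    [5, 6, 1, 4, 7, 8, 9, 3, 2],
    [4, 9, 2, 1, 3, 6, 7, 8, 5],
    [3, 8, 9, 7, 6, 5, 4, 2, 1],
    [2, 4, 5, 8, 1, 9, 3, 6, 7],
    [7, 1, 6, 3, 2, 4, 8, 5, 9],
    [9, 5, 3, 2, 8, 1, 6, 7, 4],
    [1, 7, 4, 6, 5, 3, 2, 9, 8],
    [6, 2, 8, 9, 4, 7, 5, 1, 3],
]

def is_valid(grid):
    units = [row[:] for row in grid]                                 # lines
    units += [[grid[r][c] for r in range(9)] for c in range(9)]      # columns
    units += [[grid[br + r][bc + c] for r in range(3) for c in range(3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]               # 3x3 squares
    return all(sorted(unit) == list(range(1, 10)) for unit in units)

print(is_valid(GRID))  # True
```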

Of course, when you play the game, the grid is not given to you fully filled. Some numbers are hidden. And your goal is to put the right number at the right position.

By using some weird possibilities of your brain, you find the numbers.

Normally, a given grid has only one solution. But I have seen some grids with more than one solution. Shame on their authors! (They will remain anonymous.)

The game is said to have been invented in the 1970s by an American, if anyone cares. Well, Euler is also involved. Anyway, go here for some very good historical background on the game.

There are variations on the game. Some people propose bigger grids (16x16, 25x25, ...) or weird ones like Samurai Sudoku (scroll down to see it), and there are things like Duplex Plex Sudoku, Duplex Sums Sudoku, Duplex Difference Sudoku, and probably billions of others. When something is popular, this kind of derived work becomes the norm. (Of course, I cannot prove this claim.)

Note: it is not really sudoku that is NP-complete. It is the ASP version of sudoku that is. *ASP* stands for *Another Solution Problem*. We want to know if, given a solution to a sudoku, there is another solution or not. The answer will be *yes* or *no*, so we have here a *decision* problem. These are the problems for which NP-completeness is defined. But more on that very soon...

Ok, we have a big thing to explain here.

What is 'NP-complete'?

Well, let me say it in a few words.

You have a problem, with certain unknown stuff. And you want to know if there is a solution to this problem. For example you have a bag and objects. And you want to know if, given the geometrical dimensions of the bag and the objects, you can put the objects in the bag.

So, how do you do that? Well, that's simple, you try to put the objects in the bag. If you can put everything in you're done.

Well, in real life it is not that easy. It might be that the bag is very, very big and the objects are very, very heavy. So you would like to know if you can do it without actually doing it. You want to *plan* the packing. Because what happens if you cannot fit everything in? Maybe the first object was not placed correctly? Then you would need to take everything out and start again from the beginning. Woo, no one would do that!

So, some people began to dig into problem solving, so that you can decide how you would pack your bag before doing it. And they found it was damn hard to get a good method to do it.

Later came automation. Algorithms, machines (not yet computers at the time) and the like.

So people began to program, you know. So they said: hey! let's use an algorithm to solve our problem. So they tried. And could not do it.

One guy came along one day and said: hmm, there is a problem here. We want to solve some problems (by then they wanted to do more than packing bags) but we can't find good algorithms and programs; are we stupid or what?

So he created an abstract model of a computer to think about all this stuff. This is the Turing machine and the guy is, well, Turing. He was gay and killed himself after World War 2, because England criminalized homosexuality at that time. They didn't care that this guy had worked hard to help defeat the Nazis. To them, he was just a criminal and nothing else.

You can read an electronic version of his paper *On computable numbers, with an application to the Entscheidungsproblem* (I don't know if it is accurate or not, especially the figures) if you want to know in detail what a Turing machine is. Yeah, the guy seems obsessed with mathematics, but it's just to look intelligent. What he really cares about is how to pack his bag for his week-end with his (boy)friend.

And then people used this model, which looked fun.

They invented the P stuff, which has the following definition. A problem is said to be in P if you have a Turing machine that can find a solution in polynomial time. Polynomial means that the number of steps is bounded by a polynomial in the size of the problem. If you increase the problem *a bit*, you also increase the number of steps *a bit*. There exist problems for which you have to increase the number of steps *a lot* to check for a solution. Yeah, it looks weird, but these people are weird.

And then the NP stuff, which has the following definition. A problem is said to be in NP if you can find a Turing machine that, given a solution to the problem, can check it in polynomial time.

Why did they care? Because they had problems for which they could not find a Turing machine in P, but they had one in NP. So they formalized these notions a bit, for the fun of it. (Yeah, it's fun. At least for them.)

Now, for the NP-complete stuff. You have a problem that you know is in NP (you have this nice Turing machine that can check the validity of a solution in polynomial time) and you would like to compare it with other problems. So that maybe, if you can solve one (remember that NP means you can check whether a solution is good or not, but you don't know how to find one), you can solve the others too. So, a problem is NP-complete if all the other problems in NP can be transformed into it in polynomial time.

That would be great, no? You just need to solve one (NP-complete) problem, and all the other (NP) problems are solved, as by magic!

That's where NP-completeness comes in. A problem is NP-complete if it is in NP and you can transform any NP problem into it in polynomial time.

Does this kind of problem exist? Damn, yes! Otherwise no one would care about this stupid idea of NP and NP-completeness. That's where the fun arises. (See below for the paper by Cook demonstrating that SAT is NP-complete. It was the first paper proving such a result. Weirdos in computer science find it very, very important.)

And sudoku (well, ASP-sudoku) is one of those! Read the paper *Complexity and Completeness of Finding Another Solution and Its Application to Puzzles* by Takayuki Yato and Takahiro Seta if you don't believe me (as of 2008-04-10 the link is dead, try this one instead). (If you understand the stuff, good for you. I can't say I don't, I wouldn't look very serious, would I?)

Anyway, Wikipedia has some more serious information about it. Look for P, NP, NP-complete, Turing, Turing machines and the like on it if you want.

What you really need to know is that people don't know how to solve some problems efficiently. So they invented formalisms to classify these problems. NP-completeness is such a classification. If you know that a problem is NP-complete, you get the intuition that it's hard to solve. But you are not absolutely sure. Maybe there is a nice algorithm waiting to be discovered. This is the P versus NP problem. Is there a problem in NP that is not in P? We don't know. If you find the answer, tell the world and then take some vacation, because your name will be famous for a few thousand years, so you won't have to work like hell to put food on the table for your kids.

But that's enough. Let's move on.

So, I did a solver. Can I prove it? Yes I can!

*2014-04-02: the following code may fail to compile with
modern systems. Only msudoku2 is known to work.*

Here is some code:

- zsudoku.0

  the first version of my solver, pretty barbarian. It uses zchaff as SAT solver. Read `sudoku.c` (very undocumented, I won't explain its workings further).

- zsudoku

  the previous code was too hairy and encoded the problem in a very bad way, leading to very big CNF formulae (see below to know what a CNF formula is). So I did this version, much better.

- msudoku (msudoku2, released 2014-04-02, works with modern systems)

  I wanted to try other SAT solvers. So I checked satcompetition.org to see what the state of the art of the day was. It appeared to be minisat. So here is a version using minisat as SAT solver. It's faster than zchaff (for the empty grid, where you must guess all cells, zchaff needs more than 10 seconds and minisat not even 0.1 second).

  Note that I wrote this webpage in March 2006; maybe things are different when you read it. Maybe minisat is out of the game or does not exist anymore. But I don't control the evolution of other webpages, you know. So don't blame me for not trying the SAT solver of the (your) day. I can't spend my time checking the latest stuff that appears on earth.

- mdudidoku

  This one solves Duplex Difference Sudoku. Yes, I wanted to be a little original, so I made this. (I won't explain its workings.)

After writing some code (yes, *after*, shame on me), I wondered whether other people had done the same (you know, I wanted to be original).

Of course, there are billions of solvers out there.

Here is a small list:

- SuViSa

  SAT based, not the same encoding as mine: mine is logarithmic, this one is linear.

- magictour.free.fr

  SAT based. You can find there programs to generate and rate sudokus.

- Robert Sandilands' solver

  I don't know how it works. There is some "thinking" in there; it's definitely not SAT-based. This program can solve NxN sudokus.

- more stuff

  Where you find solvers in Prolog, Haskell, JavaScript, etc.

And just for the sake of it, a Sudoku Puzzle Generator.

But enough with those links.

Let me explain to you how the solver works. You will learn what a CNF formula is, what SAT is, what the DPLL procedure is.

Ok, so you have a sudoku problem and you want to use a computer to solve it. How do you give the sudoku problem to the computer (more specifically the SAT solver)?

Well, that's simple. You transform the grid into a SAT instance in CNF form and then run the SAT solver on it to get a solution out of it.

You don't get the point?

Ok, let's enter the details.

First of all, let me explain what a CNF formula and SAT are.

You have a sudoku, and you need to encode it as a bunch of clauses, what we call a CNF formula (CNF means conjunctive normal form). Go to Wikipedia for a definition of CNF.

To be short, a CNF formula is a set of clauses. For the CNF formula to be true, each clause has to be true.

A clause is a bunch of atomic propositions. These atomic propositions are made of variables that can be true or false. For the clause to be true, one of the atomic propositions has to be true (several of them can be true; we need *at least* one).

It's not very clear, so let me give you an example of a clause:

A or B or !C

This clause is true if A is true. It is also true if B is true. And still true if C is false.

(You are familiar with boolean variables, aren't you? These are variables that can be true or false, ie. can be 0 or 1, light on or light off, binary. Here A, B and C are variables of this kind; they can take only two values.)
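To make the "at least one" business concrete, here is a tiny Python sketch (the representation of literals is my own, just for illustration):

```python
# A literal is a pair (negated?, variable); a clause is true as soon as
# at least one of its literals is true under the given assignment.
def clause_true(clause, assignment):
    return any(assignment[var] != negated for negated, var in clause)

# The clause "A or B or !C":
clause = [(False, "A"), (False, "B"), (True, "C")]

print(clause_true(clause, {"A": True, "B": False, "C": True}))    # True: A is true
print(clause_true(clause, {"A": False, "B": False, "C": False}))  # True: !C is true
print(clause_true(clause, {"A": False, "B": False, "C": True}))   # False: no literal is true
```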

It looks simple, no? It should not be that hard to find a solution, should it?

In reality you may have a lot of clauses, and some variables may appear in several clauses. So you say: ok, this clause is true if A is true, so I set A to true. Now, this other clause is true if A is false or B is false, so because A is now true I must have B false. Now I have this third clause that is true if B is true or C is true, but because... etc.

As you see, you apply a reasoning. In your reasoning you sometimes have to *choose* values for variables. And a choice might be the wrong one. In that case you would have to undo all your reasoning and start again with the other choice.

You could say: hey! that's simple! let's *not* choose at all. Let's pick the right value directly, as imposed by the problem.

Sure, it would be nice to have that. But for the moment, no one knows how to do it. And a lot of people think it is not possible. I am on their side. I think there is no such magic method to know whether a given CNF formula is satisfiable. It's the famous P versus NP problem.

So you now have your CNF formula. And you want to know if there is a solution for it, ie. if you can give values to the variables so that all the clauses are true.

That's the SAT problem.

Why give a name to such a simple problem?

Because it has been proven that SAT is NP-complete. In fact, SAT was the first problem proven NP-complete. That's why it has a name, and that's why so many people waste their time trying to create fast solvers for it.

Go read Stephen A. Cook's paper, *The Complexity of Theorem-Proving Procedures*, written in 1971, which demonstrates that SAT is NP-complete. It's a very important paper. Cook has only written around 60 papers in his whole academic life (which is not over yet), which is not a lot compared to others. But, hey, he wrote this one, so no one cares that much that he does not produce much other stuff.

What is funny is that he says he also tried to prove that PRIME is NP-complete, but failed.

We now know (demonstrated in 2002) that PRIME is in P (the article was written by Manindra Agrawal, Neeraj Kayal and Nitin Saxena), so it's not a big surprise that Cook could not prove it NP-complete (serious people (me for example) think that P is not equal to NP).

By the way, PRIME is the problem of knowing whether a given number is prime, ie. divisible only by itself and by 1. These numbers are important in electronic security, for example; that's why we care so much about them. Otherwise, they are no sexier than the others.
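For completeness, here is the naive way to decide PRIME (trial division; fine for small numbers, useless for the big numbers cryptography cares about):

```python
# PRIME as a decision problem: is n divisible only by 1 and itself?
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:       # a divisor pair always has one member <= sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True

print([n for n in range(2, 30) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```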

Let me now explain how to translate a sudoku into a CNF formula.

You have your grid. And the rules that each cell must obey.

Here is how I encode a sudoku problem into a CNF formula.

First of all I need to encode a cell. I choose to use 4 boolean variables per cell for that purpose. That gives me 16 possible values. For example, to encode 5 you have the binary value 0101, which states that two variables will be 0 and the two others will be 1. So each cell gives 4 boolean variables. We have 9x9 cells, so all in all 9x9x4 = 324 boolean variables. Got the point?
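In Python, the binary encoding of a cell looks like this (a sketch of mine for illustration; the real thing lives in the C code):

```python
# Each cell gets 4 boolean variables holding its value in binary, MSB first.
def cell_bits(value):
    return tuple(bool((value >> i) & 1) for i in (3, 2, 1, 0))

print(cell_bits(5))   # (False, True, False, True), ie. 0101
print(9 * 9 * 4)      # 324 boolean variables for the whole grid
```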

After that, the rules. Here they are:

- each cell must have value 1, 2, 3, 4, 5, 6, 7, 8 or 9
- all the cells of a column must have different values
- all the cells of a line must have different values
- all the cells of a square must have different values

To encode the first rule, I state that a cell must be different from 0 AND different from 10 AND different from 11 AND different from 12 AND different from 13 AND different from 14 AND different from 15 (the 4 bits can also encode these values, which do not correspond to any digit).

Why that? Because I want clauses, and using the other way (the cell is 1 OR the cell is 2 OR ...) is not in CNF form.

Then, to encode the fact that a cell is different from a value x, I simply say that vb_{1} is different from x_{1} OR vb_{2} is different from x_{2} OR vb_{3} is different from x_{3} OR vb_{4} is different from x_{4}, where the vb_{i} are the boolean variables of the cell and the x_{i} are the boolean values of the number x to compare to.

To encode vb_{i} different from x_{i}, it's simple.
If x_{i} is 0 then we want vb_{i} to be 1, so we have
the atomic proposition vb_{i} and if x_{i} is 1 then
we want vb_{i} to be 0, so we have the atomic
proposition !vb_{i}.

So we have a set of clauses to encode the first rule. (Just glue all the previous stuff together and you're done.)
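Here is how that gluing can look in Python (an illustrative sketch; the variable numbering is mine, not the one used in `sudoku.c`). Literals are integers: a positive variable id means "the bit is 1", a negative one means "the bit is 0":

```python
# "cell != x" is a single clause: for each of the 4 bits, take the literal
# that disagrees with x's bit (if x's bit is 1 we need vb = 0, ie. !vb).
def neq_clause(cell_vars, x):
    clause = []
    for i, v in zip((3, 2, 1, 0), cell_vars):  # MSB first
        bit = (x >> i) & 1
        clause.append(-v if bit else v)
    return clause

cell = [1, 2, 3, 4]  # the 4 boolean variables of one cell
# Rule 1: forbid the values that are not digits 1..9 (0 and 10..15).
rule1 = [neq_clause(cell, x) for x in [0] + list(range(10, 16))]
for c in rule1:
    print(c)
```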

For the second rule, let's name the cells cell_{1} to cell_{9}. We want all these cells to be different, which means we want them pairwise different.

Here is what I wrote in the code to document this case (cells X and Y have to be different, noted X != Y):

    we have 2 vars: X and Y that have 4 bits, so we have
    X1, X2, X3, X4 and Y1, Y2, Y3, Y4

    X != Y means:
      X1 != Y1 OR X2 != Y2 OR X3 != Y3 OR X4 != Y4

    X1 != Y1 means (X1 AND !Y1) OR (!X1 AND Y1) is true

    We want a CNF form for that, so we apply the wonderful rule:
      (A AND B) OR C = (A OR C) AND (B OR C)
    which gives an exponential blow up, but we don't care here.

    We finally get, by multiple applications of the rule, that
      X1 != Y1 means (X1 OR Y1) AND (!X1 OR !Y1)
    We have the same for X2 != Y2 etc.

    Now we have:
      ((X1 OR Y1) AND (!X1 OR !Y1)) OR
      ((X2 OR Y2) AND (!X2 OR !Y2)) OR
      ((X3 OR Y3) AND (!X3 OR !Y3)) OR
      ((X4 OR Y4) AND (!X4 OR !Y4))

    We still have to apply the same rule to take out the AND
    and take in the OR. We finally find 16 clauses, of shape:
      T1i(X1, Y1) OR T2i(X2, Y2) OR T4i(X3, Y3) OR T8i(X4, Y4)
    where Tji is true if bit j of integer i is 1
    and when Tji is true  then Tji(Xn, Yn) = Xn OR Yn
    and when Tji is false then Tji(Xn, Yn) = !Xn OR !Yn

Is it clear? It gives 16 clauses of 8 atomic propositions each. It's barbarian, that's for sure. But it works!

Let me write down the 16 clauses for you to fully understand:

    X1 OR Y1 OR X2 OR Y2 OR X3 OR Y3 OR X4 OR Y4
    X1 OR Y1 OR X2 OR Y2 OR X3 OR Y3 OR !X4 OR !Y4
    X1 OR Y1 OR X2 OR Y2 OR !X3 OR !Y3 OR X4 OR Y4
    X1 OR Y1 OR X2 OR Y2 OR !X3 OR !Y3 OR !X4 OR !Y4
    X1 OR Y1 OR !X2 OR !Y2 OR X3 OR Y3 OR X4 OR Y4
    X1 OR Y1 OR !X2 OR !Y2 OR X3 OR Y3 OR !X4 OR !Y4
    X1 OR Y1 OR !X2 OR !Y2 OR !X3 OR !Y3 OR X4 OR Y4
    X1 OR Y1 OR !X2 OR !Y2 OR !X3 OR !Y3 OR !X4 OR !Y4
    !X1 OR !Y1 OR X2 OR Y2 OR X3 OR Y3 OR X4 OR Y4
    !X1 OR !Y1 OR X2 OR Y2 OR X3 OR Y3 OR !X4 OR !Y4
    !X1 OR !Y1 OR X2 OR Y2 OR !X3 OR !Y3 OR X4 OR Y4
    !X1 OR !Y1 OR X2 OR Y2 OR !X3 OR !Y3 OR !X4 OR !Y4
    !X1 OR !Y1 OR !X2 OR !Y2 OR X3 OR Y3 OR X4 OR Y4
    !X1 OR !Y1 OR !X2 OR !Y2 OR X3 OR Y3 OR !X4 OR !Y4
    !X1 OR !Y1 OR !X2 OR !Y2 OR !X3 OR !Y3 OR X4 OR Y4
    !X1 OR !Y1 OR !X2 OR !Y2 OR !X3 OR !Y3 OR !X4 OR !Y4

It should be clearer now. If you want, you can take the X != Y stuff and do all the calculations yourself. It's fun. (Yes it is.) (And no, I am not a computer scientist, because I failed to finish my PhD.)
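If you don't want to do the calculation by hand, here is a Python sketch that enumerates the 16 clauses (my own illustration, with integer literals where a negative id means a negated variable):

```python
from itertools import product

# All 16 clauses encoding X != Y, for two cells of 4 boolean variables each.
# For each bit pair we pick either (Xn OR Yn) or (!Xn OR !Yn).
def neq_cells(X, Y):
    clauses = []
    for signs in product((1, -1), repeat=4):
        clause = []
        for s, x, y in zip(signs, X, Y):
            clause.extend([s * x, s * y])
        clauses.append(clause)
    return clauses

clauses = neq_cells([1, 2, 3, 4], [5, 6, 7, 8])
print(len(clauses))  # 16
print(clauses[0])    # [1, 5, 2, 6, 3, 7, 4, 8], ie. X1 OR Y1 OR ... OR X4 OR Y4
```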

And you are done. You can use this method for rules 2, 3 and 4, by choosing the right cells.

You are left with a bunch of clauses that you give to the SAT solver.
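Concretely, the clauses are handed to the solver in the DIMACS CNF format, which both zchaff and minisat read: a header line `p cnf <#variables> <#clauses>`, then each clause as integers (a negative integer is a negated variable) terminated by 0. A quick sketch:

```python
# Serialize a CNF formula in the DIMACS format that SAT solvers read.
def to_dimacs(num_vars, clauses):
    lines = ["p cnf %d %d" % (num_vars, len(clauses))]
    for clause in clauses:
        lines.append(" ".join(str(lit) for lit in clause) + " 0")
    return "\n".join(lines)

# (A or B or !C) and (!A or C), with A=1, B=2, C=3:
print(to_dimacs(3, [[1, 2, -3], [-1, 3]]))
```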

Now, let me explain a bit how the SAT solver works.

First of all, there is the resolution procedure. Go to Wikipedia to get the full story.

To be short, the algorithm takes the clauses and creates new ones, until a contradiction is reached or no more new clauses can be found.
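A single resolution step takes two clauses, one containing a variable and the other containing its negation, and produces their resolvent. A tiny Python sketch:

```python
# From (A OR x) and (B OR !x) derive the resolvent (A OR B).
# Clauses are sets of integer literals; negation is the negative id.
def resolve(c1, c2, var):
    assert var in c1 and -var in c2
    return (c1 - {var}) | (c2 - {-var})

print(sorted(resolve({1, 2}, {-1, 3}, 1)))  # [2, 3]
print(resolve({1}, {-1}, 1))                # set(): the empty clause, a contradiction
```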

This is more or less what we want, but not exactly. We want to
find *explicit* values, so we need more than just that.

Then there is the Davis-Putnam algorithm, the beloved one. In fact it's not exactly that one that we want; it's the Davis-Putnam-Logemann-Loveland algorithm. This is not the same.

See on Wikipedia, the Davis-Putnam algorithm and the Davis-Putnam-Logemann-Loveland algorithm to get the point.

To be short, the Davis-Putnam-Logemann-Loveland algorithm uses resolution. When it cannot use it anymore, it decides to assign a value to a variable and does it all again. The key point here is the choice of the variable. If you choose well, you may be fast. If you choose badly, you will be slow like hell. (But see 4.3 below: there are cases where, whatever choice you make, you will *always* be slow like hell. People wanted to know if this resolution stuff was good for everything or not. It turned out it's not. Back to your quest, my dear computer scientists; resolution sucks.)
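To fix the ideas, here is a toy DPLL in Python (unit propagation plus a naive branching choice; nothing like a real solver such as minisat, which adds clause learning, smart heuristics and much more):

```python
# A bare-bones DPLL sketch. Clauses are lists of integer literals
# (a negative id is a negated variable).
def dpll(clauses, assignment=None):
    assignment = dict(assignment or {})

    def simplify(clauses):
        out = []
        for c in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in c):
                continue                     # clause already satisfied
            c = [l for l in c if abs(l) not in assignment]
            if not c:
                return None                  # empty clause: conflict
            out.append(c)
        return out

    while True:
        clauses = simplify(clauses)
        if clauses is None:
            return None                      # conflict
        if not clauses:
            return assignment                # all clauses satisfied
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        for l in units:                      # unit propagation
            assignment[abs(l)] = l > 0

    lit = clauses[0][0]                      # naive choice of branching variable
    for choice in (lit > 0, lit <= 0):
        result = dpll(clauses, {**assignment, abs(lit): choice})
        if result is not None:
            return result
    return None

# (A or B) and (!A or B) and (!B or C), with A=1, B=2, C=3:
print(dpll([[1, 2], [-1, 2], [-2, 3]]))
```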

For your enjoyment, you can read the article describing the Davis-Putnam algorithm here. It was published in 1960. You can also read the article describing the Davis-Putnam-Logemann-Loveland algorithm here. It was published in 1962.

For some modernity, you can read articles describing minisat (published in 2003 and 2005), for you to see how to optimize stuff. On the webpage are also some papers describing other SAT solvers. It's a good reading I think.

Let me raise some issues, to add a bit more value to this webpage.

I encode each cell with 4 boolean variables. Some people use 9 (read *Sudoku as a SAT Problem* by Inês Lynce and Joël Ouaknine, for example). Why not? Which is best?
What is best?

I should compare, both in terms of clause size, number of clauses, time to solve the problem, and the like. But, hey, it's just sudoku! I won't waste too much of my time on it, will I?

But anyway, maybe we can encode the problem better, with fewer clauses, fewer variables, and the like.

Some people wrote scientific papers about sudoku. Let me introduce two of them.

The first one is called *A SAT-based Sudoku Solver* (written by Tjark Weber) and was published in December 2005 at the 12th International Conference on Logic for Programming, Artificial Intelligence and Reasoning, which took place in Montego Bay, Jamaica. You had to pay from 300 to 500$ (not including transportation and hotel) to attend the conference. I guess the guy didn't pay it himself; his university did. It is often like that in academia. (By the way, researchers spend some of their time travelling (for free most of the time), so if you like to travel, go and be a researcher!) So you write some stuff about sudoku and you get a few days of vacation in Jamaica. Nice life, no?

But wait! In January 2006, two guys (Inês Lynce and Joël Ouaknine) published *Sudoku as a SAT Problem*, presenting their research at the Ninth International Symposium on Artificial Intelligence and Mathematics, held in Fort Lauderdale, Florida. The registration was between 150 and 340$. Once again, I guess the guys did not pay that themselves; their institution did (if they both went; maybe only one did). What a fun life. You write a few pages about sudoku so that people have some fun at the conference, and then you are off for a few days in Florida. Kewl!

The research community is great, definitely. (Don't come and say it's a waste of resources. Their work is important and deserves that the society supports it. Yes it does!)

Since with the DPLL procedure you have to make choices, it's good to make the *right* choice. Of course, no one knows what the *right* choice is, but by studying the clauses, maybe you can find a *better* one, no?

Some people work on this.

Check the satcompetition.org website for the state of the art in SAT solvers.

The use of DPLL and resolution is far from optimal.

In 1985, Armin Haken wrote the paper *The Intractability of Resolution* (sorry, I could not find it online; if you have it, send it to me). It shows that for the pigeonhole problem, which says that you cannot put *n+1* pigeons into *n* holes with at most one pigeon per hole, there is no short resolution proof.
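The pigeonhole formula is easy to generate, which makes the result even more cruel: a few lines of code produce a CNF that resolution provably cannot refute quickly. A Python sketch (the variable numbering is my own):

```python
# The pigeonhole principle as CNF: variable p(i, j) means "pigeon i sits
# in hole j", for n+1 pigeons and n holes.  The formula is unsatisfiable,
# yet (Haken 1985) every resolution refutation of it is exponentially long.
def pigeonhole_cnf(n):
    var = lambda i, j: i * n + j + 1          # pigeon i in 0..n, hole j in 0..n-1
    clauses = []
    for i in range(n + 1):                    # each pigeon is in some hole
        clauses.append([var(i, j) for j in range(n)])
    for j in range(n):                        # no two pigeons share a hole
        for i1 in range(n + 1):
            for i2 in range(i1 + 1, n + 1):
                clauses.append([-var(i1, j), -var(i2, j)])
    return clauses

print(len(pigeonhole_cnf(3)))  # 22 clauses: 4 pigeon clauses + 3 * C(4,2) hole clauses
```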

This was bad news. Resolution is bad. There are some problems for which you absolutely cannot decide quickly, using resolution, whether they have a solution or not. Everyone was sad and some people even committed suicide. Where do you think the economic crisis of 1987 came from, huh? (Yeah, the world needs some time to digest what computer scientists produce. Two years is damn short, when you think of it, by the way.)

Anyway, there we are. Resolution is bad.

By the way, if you want to read some stuff about how to prove lower bounds for resolution and various problems, you can find tons of articles online.

There is this, which explains (but whether you will understand it, I don't know) Haken's paper. Haken wrote an article with Cook (the one who proved SAT NP-complete), *An Exponential Lower Bound for the Size of Monotone Real Circuits*, in 1995, using the same bottleneck-counting technique he introduced in 1985. You can read *Hard examples for resolution* by Alasdair Urquhart, written in 1987. A lot of papers may also be found here. Enjoy. A last one to finish: *Propositional Proof Complexity: Past, Present, and Future* by Paul Beame and Toniann Pitassi, from 1998. Nice reading, with some open (as of 1998) problems. I don't know if some of those open issues are now closed or not. Anyway, nice reading.

Now, let me express some ideas I've got concerning this SAT solving stuff. (I won't give pointers, maybe some stuff exists, I don't know. If you know, tell me.)

**Direct manipulation of clauses without finding an explicit instance**.
The SAT problem is to know whether there is a solution or not. We don't care
about an explicit solution. Is there a method that would directly manipulate
clauses and answer *yes* if there is a solution and *no* otherwise,
but without providing any explicit solution? (No, I am not talking about
*hiding* a solution, you can do it with minisat for example, I talk about
a program that would *never* create a solution but still give a real
answer to the question whether the problem has a solution or not.)

**Use mathematical methods** (with real or complex numbers for example).
Let's imagine you create an equation using your SAT problem. Maybe you
can solve this equation with some mathematical tools in a quick way,
and thus know if your SAT problem has a solution or not? Maybe
by using some calculus or I don't know. Maybe...

**Go into 'another' domain**. If you run along a Koch snowflake, you can hardly get the surface it covers, because its length is infinite. For some given iteration you can do it. You know that you move straight for a length *n*, then turn right at angle θ and move straight for another length *n*, etc. You can deduce from that the surface covered. But for the limit of the series you cannot do it, because the length of the curve is infinite. So, if you want to know the surface covered by a Koch snowflake, you need another way than running around. You will probably use some mathematics and calculate the limit of the series, which will give you the analytical solution, which (if you are lucky) will be an easy-to-calculate number. (I think it is, I did it when I was a kid but don't remember exactly.) Maybe there is some similar stuff for this SAT problem. Maybe it is a mathematical formalism, maybe something different that needs to be invented.

**Use 'some' physical devices** to do the calculation (biological
molecules, atoms, quantum properties of nature, etc.). Why not?
What we really care about is if there is a solution. What we want
to do is to encode the SAT problem *in a way or another*, let the stuff
we encoded it into go to *some state* that we would check
later in time. Sure, stated this way it's very vague, but why not?
A computer is just that, but with a lot of atoms and each step
of the calculation full of redundant tests that slow down everything,
but provide a reliable device to calculate stuff. But it's hellishly slow. Maybe we can have some better physical device?

Anyway, some people think about it. And some people try to know what is the best we can have out of the universe and the laws we know about it. There is a paper, Computational capacity of the universe, published in 2002, written by Seth Lloyd if you want some ideas about it. I don't know if it is reliable or what, but hey, there you are. Judge by yourself.

In fact, the question people (of computer science) are interested in is the following one. Can we find a Turing machine program that solves a SAT problem (or equivalent problems, ie. any problem that is NP-complete) in a quick way (both computationally and in terms of time)? It's the P versus NP question.

But what do we care? The real question is: can we find a
*practical way* to solve a SAT problem quickly, whatever this
way is. If we need to pop bubbles to do it, let's pop bubbles!

For example, there is the paper Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer by Peter W. Shor, published in 1994. No one (when I write this, back in March 2006) knows of a good algorithm to factorize numbers on a classical computer (a Turing machine, to state it shortly), but we have stuff using quantum properties of nature. See? We don't really care about the Turing stuff and the complexity zoo and the like. We want practical ways to handle problems, whatever these ways are.

Let me quickly present you a nice data structure that might be of some interest to solve a sudoku.

Some people wrote research articles about sudoku and SAT, but I could not find anything about sudoku and BDD. Maybe you can do something? Do the analysis, write some code to prove it works, and then write a paper for a conference somewhere in Tahiti or Australia or anywhere you would like to go for free (I suppose you work into an institution that would pay for your travel of course).

So what is a BDD? Go to Wikipedia, their article is nice. You can also go read the original paper *Graph-Based Algorithms for Boolean Function Manipulation* by Randal E. Bryant, where you get it first-hand and with no deformation. It was published in 1986.

To describe it shortly, a BDD is a (sometimes) compact representation of a boolean formula. You can do conjunction (AND) and disjunction (OR) of boolean formulae directly on the BDDs.
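Here is a toy Python sketch of the idea (hash-consed nodes and the classic *apply* operation from Bryant's paper; a real BDD package adds memoization, complement edges, and so on):

```python
# A toy reduced ordered BDD: a node is True, False, or (var, low, high).
# Nodes are interned so that identical subgraphs are shared.
_nodes = {}

def mk(var, low, high):
    if low is high:                     # reduction: both branches identical
        return low
    key = (var, id(low), id(high))
    if key not in _nodes:
        _nodes[key] = (var, low, high)
    return _nodes[key]

def bdd_var(i):
    return mk(i, False, True)

def apply_op(op, u, v):
    # Bryant's apply: combine two BDDs with a boolean operation.
    if isinstance(u, bool) and isinstance(v, bool):
        return op(u, v)
    uv = float("inf") if isinstance(u, bool) else u[0]
    vv = float("inf") if isinstance(v, bool) else v[0]
    top = min(uv, vv)
    u0, u1 = (u, u) if uv > top else (u[1], u[2])
    v0, v1 = (v, v) if vv > top else (v[1], v[2])
    return mk(top, apply_op(op, u0, v0), apply_op(op, u1, v1))

def evaluate(node, env):
    while not isinstance(node, bool):
        var, low, high = node
        node = high if env[var] else low
    return node

# Build the BDD of (x1 AND x2) OR x3:
AND = lambda a, b: a and b
OR = lambda a, b: a or b
f = apply_op(OR, apply_op(AND, bdd_var(1), bdd_var(2)), bdd_var(3))

print(evaluate(f, {1: True, 2: True, 3: False}))   # True
print(evaluate(f, {1: True, 2: False, 3: False}))  # False
```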

It's sometimes compact, but not always. There are boolean formulae that have no compact BDD (formulae that describe multiplication, for instance). I don't know for sudoku. I tried a bit but could not find anything useful. Since sudoku (well, ASP-sudoku) is NP-complete, I am not surprised at all. But even for 3x3 sudoku I could not find anything. Maybe I did not construct the BDD the right way? I take some small BDDs and combine them together with AND and OR; maybe I don't do it in the proper order? I don't know. Dig into it!

By the way, some people wanted to compare resolution and BDD.
Well, it seems these are two different beasts. There are some
problems that are easily solved by resolution but not by BDD and
some problems that are easily solved by BDD but not by resolution.
Read
*Resolution
and Binary Decision Diagrams cannot simulate each other polynomially*
by Jan Friso Groote and Hans Zantema, written in 2000. It explains the stuff.
(Yeah, it was just before the economic crisis of 2002.)

But, hey, someone should do the analysis for sudoku, no?

Some solvers out there are less brutal. Some people encode their (brainy) reasoning into a program and let the program systematically apply the rules they invented until a solution is found.

It seems better than SAT, no? The program is *more aware* of
sudoku than a barbarian SAT solver.

But what about grids that are not solvable with some thinking?
What about grids where you need some choice at some point to get the
solution? *Ok, I can't find a solution. Let's suppose there is
a 7 in this cell and keep going.* See what I mean?

Anyway, this kind of solver is good in that it lets you measure the difficulty of a sudoku. If you need only the simple rules to solve it, it's an easy sudoku. If you need the very weird and tricky ones you just discovered yesterday in a dream where a sudoku grid was fighting George W. Bush to let the great green elephant escape to Mars, then it's a hard one.

How to design a sudoku is another interesting question. Some people make money with this activity.

Anyway, I don't know how to design a sudoku. If you know some free resources on the great Internet, contact me and I'll put the links right here.

Ok, so let me present some philosophical implications of SAT solving of sudoku. Yes, there are some.

Is SAT solving thinking? I mean, we have this resolution
procedure, this DPLL procedure, but does it *think*? It
clearly solves a problem. But is it *aware* that it
solves a sudoku? Or is it just trying to find a point
of equilibrium where it cannot manipulate clauses anymore
and gets an answer to the question whether this set has a
solution or not? What is thinking after all? What happens
in your brain when you solve problems? Why wouldn't a computer
program be said to *think* when it clearly finds a solution
to a problem?

Same for BDDs: is that thinking? The technique is different there, but still, the computer gives an answer to a question, so doesn't it think? Hum?

What about "intelligent" solvers? Do they think? Here it's
even worse, because the computer *mimics* the way we humans
solve this sudoku game. But you could tell me that the program
just follows the rules, it does not create new ones. So, thinking
would be to create rules to solve problems? What if someone somewhere
makes a program that invents rules to solve a sudoku? You would reply
that the program has been especially designed to find rules, so it
does not think. So what do you call thinking? Just what *you* do,
you and your brain, and nothing else? Or do you accept that some other
device might also be able to think, hum? Can something that is not
based on neurons think? Or maybe you think there is something special,
something more, that is needed to think, that neurons are not enough.
Maybe you think that there is a need for some kind of *soul*.
But let me tell you that souls do not exist, so it's not a good
answer.

Anyway, what about you? What about me? Are we thinking? Do you even exist? Hum?

Ok. Enough about what I did. It's a useless way to spend some time. What can you do in this business?

Well, prove that P is not equal to NP and you are done.

It is probably the case. Why? Well, I *feel* it this
way. What is a proof if not a path along a certain road?
And when you follow a road, you cannot go anywhere you want;
you just follow the road.

So you can imagine very, very long roads, specially designed
and built so that anyone will need a huge amount
of time to go from A to B (think of the Koch snowflake).
In other words, you can probably
define problems such that, for *any* way of thinking, a
huge amount of time is required to solve them. Yeah, you
can try some random values and with luck hit the right
one. But if the search space is damn huge, this is not practical.

Imagine that you throw a 1.01cm-long stick into the middle of one billion 1cm-long sticks. Then you mix the stuff and ask someone to pick out the 1.01cm-long stick. No one will be able to do it in a reasonable amount of time. Your brain is probably convinced that picking up sticks at random is useless here. It's the same for the values of the digits.

Of course, one can argue here. Why do we care about a thinking method at all? We want a device that gives answers, without caring what's going on inside. Sure. But such a device does not exist. Just like those philosophers who wasted their time trying to prove that movement cannot exist or that there is a God down here, you may spend the rest of your life on this idea. It would not advance the P versus NP business, though.

No, I tell you. Try to find some theoretical framework in which it is easy to see that P is not equal to NP. Yeah, it's tricky. But you would be famous for this one.

Unless, of course, P equals NP. But I doubt it.
Thinking is a process. Data is manipulated over time, transformed
into other data. Steps *must* be followed, in a certain
order. I am pretty convinced that there are problems out there
for which, whatever thinking you use, the number of steps is
too huge.

One needs to prove it, though. Maybe we must step outside of
thinking for that? Maybe we need to know more about the
thinking process itself? Maybe we need to study the
universe and conclude from its nature that we absolutely
need this step-by-step approach and cannot bypass it?
But how could we do it without an *absolute* knowledge
of the nature of the universe? And is such a knowledge possible
to acquire? I guess not. So we are doomed.

But we are going a bit too far here. P and NP are defined for Turing machines, not for the universe as a whole.

So prove it using Turing machines and you will still have done something really big and important. (And if you can prove something like that outside the Turing machine framework, your result may still be meaningful in the particular case of Turing machines.)

Or maybe we just cannot prove that P is not equal to NP, nor that P is equal to NP. Think about that guy, Gödel. What do you think, hum?

Contact: sed@free.fr

Creation time:
Fri, 10 Mar 2006 14:03:16 +0100

Last update:
Wed, 02 Apr 2014 12:51:59 +0200

Powered by a human brain, best viewed with your eyes (or your fingers if you are blind).