Takes the square root of a 16-bit unsigned integer, returning the square root and a remainder. Note that:

number = root * root + remainder

Similar to (and inspired by) Lee Davison's square root routine (there is also a version in 6502.org source code repository), but this routine is faster and requires no temporary RAM storage.

- Input:
- NUMH = bits 15-8 of the number
- NUML = bits 7-0 of the number

- Output:
- ROOT = square root of NUM
- REM = bits 7-0 of the remainder
- carry flag = bit 8 of the remainder

```
;
LDA #0
STA ROOT
STA REM
LDX #8
L1 SEC
LDA NUMH
SBC #$40
TAY
LDA REM
SBC ROOT
BCC L2
STY NUMH
STA REM
L2 ROL ROOT
ASL NUML
ROL NUMH
ROL REM
ASL NUML
ROL NUMH
ROL REM
DEX
BNE L1
```

## How it works

Calculating an integer square root digit-by-digit is similar to shift-and-subtract integer division (which itself is a binary version of long division). Integer division produces two results: the quotient, and the remainder. Thus, when top is divided by bottom, the following identity holds:

top = bottom * quotient + root

Similarly, calculating the square root of n digit-by-digit also produces two results: the root and the remainder, and the following identity holds:

n = root * root + remainder

The main differences between the division and square root calculations are:

- For division, one digit of the partial dividend is shifted in (or "brought down" to use a term from long division) for each digit of the quotient; for a square root, two digits of the partial dividend are shifted in.
- For division, the divisor is the same for each iteration; for a square root the divisor is updated for each iteration.

An example may help to make this clearer; to keep the example short, the 8-bit number $AB (10101011 in binary) will be used. Thus, n = $AB and the square root of n will be calculated.

First, note that since $10 * $10 = $100, the root must be less than 16, and thus the square root of an 8-bit number will be 4-bits wide. Second, note that 8 * 8 = $40, so if n >= $40, the first digit will be 1, and if n < $40 the first digit will be 0. Therefore, the first first partial dividend is the upper two bits of n, The first digit of the root will be 0 if the partial dividend is 0; the first digit of the root will be 1 if the partial dividend is greater than 0. The first divisor is square of the first digit of the root; however, since 0 squared is 0 and 1 squared is 1, in this case the first divisor is the same as the first digit of the root. The divisor is subtracted from the partial dividend and the next two digits are brought down to produce a new partial dividend.

```
:
1 <- root
------------
\/ 10 10 10 11 <- n
10 <- first partial dividend
1 <- divisor
1 10 <- new partial dividend
```

The second divisor is where things get slightly more complicated.

Let A be the upper 2 bits of the n

Let B be the first digit of the root

Let C be the upper 4 bits of the n

Let D be the second digit of the root

From above, the upper two bits of the second partial dividend are:

A - B*B

and the root so far (after one digit) is:

B

So, the second partial dividend (all four bits) is:

C - 4*B*B

Then, the root (so far) after the second digit is found is 2*B+D, so the upper 4 bits of the third partial dividend are:

C - (2*B+D)*(2*B+D)

= C - 4*B*B - 4*B*D - D*D

= (C - 4*B*B) - (4*B + D)*D

Due to the mathematical identity:

(2*x+y)*(2*x+y) - (2*x)*(2*x) = 4*x*y + y*y

the logic above can be applied to show that:

new partial dividend = old partial dividend - divisor

where divisor is 4 * (root so far + new root digit) * new root digit

Note that (1) new partial dividend refers to the partial dividend **before** the next two digits are brought down (shifted in), and (2) the divisor is 0 when the new digit is 0.

Therefore, the algorithm for a square root of a 16-bit number is:

```
square_root (n):
dividend = n
root = 0
LOOP 8 TIMES
dividend << 2
NOTE: discard the lower 16 bits to produce the partial dividend
partial_dividend = dividend >> 16
divisor = (4 * root + 1) * 1
IF partial_dividend >= divisor
partial_dividend -= divisor
root = root * 2 + 1
ELSE
root = root * 2
remainder = partial_dividend
RETURN (root, remainder)
```

Since shifting left twice is the equivalent of multiplying by 4, the assembly routine above does the comparison first, and shifts the dividend later. Thus, the loop becomes:

```
divisor = root * $10000 + $4000
IF partial_dividend >= divisor
dividend -= divisor
root = root * 2 + 1
ELSE
root = root * 2
dividend << 2
```

since the lower 8 bits of the comparison (and subtraction when necessary) are zero, they can be omitted.

The original divisor and comparison was:

divisor = (4 * root + 1) * 1

IF partial_dividend >= divisor

which is equivalent to

divisor = (4 * root + 1) * 1 * $10000

IF dividend >= divisor

which is the same as

divisor = 4 * root * $10000 + $10000

IF dividend >= divisor

Thus, when the dividend is not shifted; it should be compared to divisor » 2 which is root * $10000 + $4000.

To finish the example above:

partial dividend = 110 in binary

(4 * root so far + 1) * 1 = (4 * 1 + 1) * 1 = 5 = 101 in binary

Since 0110 >= 0101, the second root digit is 1

```
:
11 <- root
------------
\/ 10 10 10 11 <- n
10
1
1 10 <- partial dividend
1 01 <- divisor
1 10 <- new partial dividend
```

(4 * root so far + 1) * 1 = (4 * 3 + 1) * 1 = 13 = 1101 in binary

Since 110 < 1101, the next root digit is zero

```
:
11 0 <- root
------------
\/ 10 10 10 11 <- n
10
1
1 10
1 01
1 10 <- partial dividend
0 <- divisor
1 10 11 <- new partial dividend
```

(4 * root so far + 1) * 1 = (4 * 6 + 1) * 1 = 25 = 11001 in binary

Since 11011 >= 11001, the final root digit is 1

```
:
11 01 <- root
------------
\/ 10 10 10 11 <- n
10
1
1 10
1 01
1 10
0
1 10 11 <- partial dividend
1 10 01 <- divisor
10 <- remainder
```

Thus, the result is

root = $0D

remainder = 2

And note that:

$0D * $0D + 2 = $AB