## Elliptic Curve Digital Signature Algorithm - Bitcoin & Haskell

The previous posts covered mathematical curiosities. Now we can leave that realm, find an application and turn them into tools to work with elliptic curve cryptography. That means finding and choosing which particular values of the parameters bring us into a curve we can use for cryptography.

## The Bitcoin curve

The Bitcoin curve uses the secp256k1 curve parameters.

- Given the elliptic curve \(y^2 = x ^ 3 + a x +b\), Bitcoin uses \(a=0\) and \(b=7\).
- Defined over the finite field of prime number \(p=2^{256} - 2 ^ {32} - 977\)
- Using the generator point \(G\), with \(x\) and \(y\) coordinates:
- \(G_x = \tiny 0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798\)
- \(G_y = \tiny 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8\)

- Which has a group order \(n=\tiny 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141\)

\(G\) and \(n\) expressed here as hexadecimal numbers are despite the small font rather large numbers. The prime number \(p\) is almost \(2^{256}\), which means most numbers under \(2^{256}\) are also in the prime field. What is also impressive is that \(n\) is also really close to \(2^{256}\). This means we can apply scalar multiplication to \(G\) as much as \(n\) times before we end at the point at infinity.

It escapes my current knowledge: how are these particular parameters found? They are publicly documented everywhere, yet I could not easily find which is the method to find them, that being the principal question of its security. Cryptography bases its security on the infeasibility of practicing some mathematical operations in a short time.

Testing that \(p\) is prime is already a challenge, naively you have to test the
divisors up to \(\sqrt{p}\), and that is already a lot to verify and unfeasible
under that approach. Yet, once validated that \(p\) is prime, you can work on that
large finite field. Where the difficult problem arises is then choosing
parameters \(a\) and \(b\). I have found the comment: for Bitcoin they are so
simple, there shouldn’t be any backdoor in them, but there is no proof of that.
With that choice of parameters, you keep compounding the problem. Now, you have
the task to find the generator or basis point \(G\). In essence, it could be any
point on the curve. It doesn’t seem like an easy task to find it, \(x^3\) grows
much faster than \(y^2\), thus you need to start with a large number, before it
shrinks through the modulo operation to and verifies the equality of our
elliptic curve. Once you have that point, under scalar multiplication at a
certain multiple of this point you reach the point at infinity, forming a
`finite cyclic group`

, that is the set:

\[ \{ G, 2G, 3G, 4G, …, nG \} \text{ where only } nG=0\]

\(n\) for Bitcoin is also a large number, and it is prime too. I keep wondering how this came out to be, testing that no other \(n\) takes you to the point at infinity is equivalent to having tested all Bitcoin private keys, which we assume to be computationally unfeasible. Also, how is it that they exactly found \(G\), the first point, and not stumbled into \(42G\) or any other point first. Even starting with that point, you can’t find \(n\) and give an order until you scan the entire space, which is unfeasible. Under the assumption that the parameters are secure, with the hint of evidence that the Bitcoin network has survived years of attacks. I, for the moment, move forward.

The numbers below \(n\) will become your private keys. Since \(n\) is large, there are a lot of numbers to pick your private keys from, and if you randomly pick a number(with an appropriate source of randomness), you’ll end up with a key, that nobody else has come up to choose by coincidence. That’s why you instantiate your wallet with that randomly chosen key, it is empty.

There have been some stories of hacked Bitcoin wallets, mainly hacks to your computing device, then guessing brain wallets which by construction have a low entropy, then hacks because some devices had poor randomness. After all those hacks, the hint of the safety of Bitcoin is that year after year, there has not been a large moment of funds from old wallets. Those old large funded wallets remain the bounty for which Bitcoin is constantly attacked and its security constantly audited. As the beginner I am, I still need to catch up on experience to convince my healthy skepticism that it is safe.

### Specializing the code

The code presented in the previous posts is generic, and now we need to provide some auxiliary functions and some specialization to use the Bitcoin curve easily while programming. I introduce the new type aliases, function constructors and constants:

```
1type S256Field = FieldElement (2 ^ 256- 2^ 32 - 977)
2type NField
3 = FieldElement
4 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141
5type S256Point = ECPoint S256Field
6
7s256point :: S256Field -> S256Field -> S256Point
8s256point x y =
9 let p = ECPoint x y 0 7
10 in if validECPoint p then p else error "Invalid point"
11
12ncons = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141
13gcons = s256point
14 0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
15 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
```

For some reason, the compiler does not allow me to define `type Nfield = FieldElement ncons`

, thus I have to explicitly declare that constant repeatedly.

Now we can also test that the order of \(G\) is \(n\), that is \(nG=0\):

```
1scalarProduct ncons gcons -- ECPoint(Infinity)
```

## Digital signature and verification

The key relationship of public-key cryptography is \(P=eG\). It is easy to calculate \(P\) from \(e\) and \(G\), but we cannot easily find \(e\), given \(P\) and \(G\). \(P\) will be your public key, and \(e\) is your private key. \(e\) is an integer bellow \(n\) and \(P\) is a coordinate \((x,y)\) on the elliptic curve. \(P=eG\) is rather boring, you have only gone from private to public key, and know that such calculation is asymmetric. Easy from the scalar multiplication side, infeasible to divide.

We can thus build upon that relationship and construct a linear combination and have the equation

\[ uG + vP = kG \]

Re arranging the terms \[ vP = (k-u)G \]

Now requiring that \(v\neq0\)

\[ P = eG = v^{-1}(k-u) G \]

That implies for any combination of \((k, u, v)\) that satisfies the equation, we are also in knowledge of the private key \(e\). What we need is to put constraints into \((k, u, v)\) in such a way that they are meaningful. Also, revealing \((k,u,v)\) as shown now, reveals your secret key, thus it is important to provide these constants in a way that you don’t reveal your key.

The goal of a digital signature is to certify that a message originates from the owner of a specific key and nobody else. Thus, we need to mix information of the public key, the message and some additional arbitrary target to make it infeasible to forge the signature.

We start selecting the arbitrary target: \(R=kG\). That is an arbitrary point on
the curve, with *key*: \(k\). However, we are only interested in the \(x\)
coordinate of this point and call that number \(r\). The goal of the digital
signature is that having \(r\) on the start, then mixing it with the message and
the signer’s public key, we can find again the point in the curve with \(x\)
coordinate being \(r\).

The message(\(m\)) we want to sign, is first fingerprinted through a hash function(\(HASH\)), which is also an asymmetric function that converts any arbitrary data into a fingerprint(\(z\)) (the hash) of fixed length. That equation resumes to \(z = HASH(m)\)

With that, we define our constants to include those values

\[ u = z / s , v = r / s \]

Here \(s\) is the signature we need to calculate. It is also important to realize that the division takes place on the prime field defined by the group order \(n\). That confused me for a while, but since all these constants are the factors we use to multiply the basis point \(G\), then it makes sense to define them over the group order and not over the prime field of the elliptic curve.

Calculating \(s\) from our previous equations:

\begin{align} uG &+ vP & = & kG \\\ uG &+ veG & = & kG \\\ u &+ ve & = & k \\\ z/s &+ re/s & = & k\\\ (z &+re)/s & = & k\\\ \end{align}

Finally, we obtain: \[s = (z +re)/k\]

A signature is then described by the pair of numbers \((r,s)\). That is the random target and the signature itself. The person willing to validate our signature needs those plus our public key and the signed message to calculate its hash. The verification of the signature results from using those values to calculate \(R=uG+vP\) and verifying that the point obtained has its \(x\) coordinate equal to \(r\).

It is important that we only reveal \(r\) and not \(k\), because as stated earlier, knowing \((k,u,v)\) reveals our private key. It is also important to never reuse \(k\) for different messages, because that will lead the signatures to leak the private key.

## Implementing the digital signature algorithm

After such a long introduction it’s time to go back to Haskell code. First, I define a new datatype for the signature to hold the pair of numbers \((r,s)\)

```
1data Signature = Signature
2 { r :: S256Field
3 , s :: NField
4 } deriving (Show)
```

I’m really happy how Haskell made me aware that \(r\) and \(s\) are elements of different fields. In all the previous explanations, I superficially treat them as integers, but it is important when calculating division to be sure in which finite field you are working, and realizing that all the relevant calculations for multipliers of the point \(G\) happen on the finite field of the group order \(n\).

To use the `scalarProduct`

defined in the previous posts, you also needed to
convert back from the Finite Field to an integer. I tried to first implement
Haskell’s standard `toInteger`

, but I couldn’t. So I defined my auxiliary
function:

```
1asInt :: KnownNat n => FieldElement n -> Integer
2asInt (FieldElement n) = n
```

With that, I have enough to implement the signature verification algorithm.

```
1verifySignanture :: NField -> Signature -> S256Point -> Bool
2verifySignanture z (Signature r s) pub = x target == r
3 where
4 target =
5 scalarProduct (asInt $ z / s) gcons
6 <> scalarProduct (asInt $ fromInteger (asInt r) / s) pub
```

There is a some back and forth conversion of types with `asInt`

and
`fromInteger`

depending on each use case. I like that in Haskell I need to be
explicit about it. It is more work, and certainly some noise, but I became aware
of what I was doing, by understanding why the compiler complained and fixing it.

### Using Haskell’s *packaging*

Everything up to this point, you could do by firing up your text editor and
ghci. It all worked. However, to continue implementing the digital signature
algorithm, additional modules become necessary and managing those with a
*package manager* of some sort. I selected cabal because it seemed simple enough
to get started.

I start by creating a cabal file with the following contents:

```
1cabal-version: 2.4
2name: programmingbitcoin
3version: 0.1.0.0
4author: Óscar Nájera
5maintainer: [email protected]
6
7extra-source-files:
8 CHANGELOG.md
9 readme.org
10
11library
12 hs-source-dirs: src
13 exposed-modules: Ecc
14 build-depends: base ^>=4.14.1.0
15 , bytestring >= 0.10.12.0
16 , cryptohash-sha256
17 default-language: Haskell2010
```

This is only to include the dependencies `cryptohash-sha256`

and `bytestring`

. I
get the feeling `base`

is what I was already using directly loaded in ghci. I
must also move my Haskell source file into `src/Ecc.hs`

for it to be correctly
found and compiled. That file now needs some extra things on its header:

```
1module Ecc where
2
3import qualified Crypto.Hash.SHA256 as SHA256
4import qualified Data.ByteString as BS
```

That defines the `Ecc`

module and exports everything I have defined. Then it
imports the 2 packages that are in my new dependencies.

### Hashing the message

The first step to sign a message is to calculate the hash of that message. In
Bitcoin, the hash function is two rounds of the sha256 hash. I use the hash
function from the package I imported `cryptohash-sha256`

.

```
1hash256 :: BS.ByteString -> BS.ByteString
2hash256 = SHA256.hash . SHA256.hash
```

This function works on `ByteString`

thus it is also a good idea, to put the
language extension `{-# LANGUAGE OverloadedStrings #-}`

in the header of the
file, to be able to directly use Strings on the source code and have the
compiler transform them to `ByteString`

.

This hash function returns a `ByteString`

, however we need \(z\) as a number,
because we operate mathematically with it to calculate the signature. Thus the
auxiliary function:

```
1fromBytes :: BS.ByteString -> Integer
2fromBytes = BS.foldl' f 0 where f a b = a `shiftL` 8 .|. fromIntegral b
```

as the function signature indicates, `fromBytes`

takes me from `ByteString`

to
`Integer`

. Then I can calculate the hash of the message directly to a number in
the finite field defined by the group order with:

```
1sighash :: BS.ByteString -> NField
2sighash = fromInteger . fromBytes . hash256
```

### Getting the random target *deterministically*

To create a signature we use a random number \(k\), which should be a cryptographically secure random number. Even if you had a good source of randomness, it seems to be a better practice to generate this number deterministically following the specification defined in RFC 6979 .

I first need the inverse function `fromBytes`

to transform numbers into
`ByteString`

, which an extra constrain to always have them be `256 Bits`

(`32 Bytes`

) long, for this I include the `zeroPad`

function and use the greatness of
function composition in Haskell.

```
1integerToBytes :: Integer -> BS.ByteString
2integerToBytes = BS.pack . go
3 where
4 go c = case c of
5 0 -> []
6 c -> go (c `div` 256) ++ [fromIntegral (c `mod` 256)]
7
8zeroPad :: Integer -> BS.ByteString -> BS.ByteString
9zeroPad n s = BS.append padding s
10 where
11 padding = BS.pack (replicate (fromIntegral n - fromIntegral (BS.length s)) 0)
12
13toBytes32 :: Integer -> BS.ByteString
14toBytes32 = zeroPad 32 . integerToBytes
```

The RFC describes the algorithm to generate \(k\) in an imperative way, and it reuses variables. Haskell forbids that, because everything is a constant. Thus each new transformation of a value gets a new name, until I reach some form of recursion. Have a look at my implementation.

```
1deterministicK :: NField -> NField -> NField
2deterministicK (FieldElement priv) (FieldElement z) = fromInteger
3 $ candidate k2 v2
4 where
5 k = BS.pack $ replicate 32 0
6 v = BS.pack $ replicate 32 1
7 zbs = toBytes32 z
8 skbs = toBytes32 priv
9 k1 = SHA256.hmac k $ v `BS.append` "\NUL" `BS.append` skbs `BS.append` zbs
10 v1 = SHA256.hmac k1 v
11 k2 = SHA256.hmac k1 $ v1 `BS.append` "\SOH" `BS.append` skbs `BS.append` zbs
12 v2 = SHA256.hmac k2 v1
13
14 candidate k v =
15 let vNew = SHA256.hmac k v
16 can = fromBytes vNew
17 in if can >= 1 && can < ncons
18 then can
19 else
20 let kp = SHA256.hmac k $ vNew `BS.append` "\NUL"
21 vp = SHA256.hmac kp vNew
22 in candidate kp vp
```

Here I use also the `SHA256.hmac`

from the library I imported and `BS.pack`

transforms a list of 8bit integers into a `ByteString`

. `\NUL`

and `\SOH`

are
the zero and the one transformed into a `ByteString`

, I just copied that
directly from the ghci repl.

### Signing the message

Signing the message follows the equations described in the previous section . The only thing extra is requiring as integers that \(s<n/2\).

```
1signMessage :: NField -> BS.ByteString -> Signature
2signMessage priv mesg =
3 let z = sighash mesg
4 k = deterministicK priv z
5 rm = scalarProduct (asInt k) gcons
6 FieldElement sm = (z + fromInteger (asInt (x rm)) * priv) / k
7 ss = if sm > div ncons 2 then ncons - sm else sm
8 in Signature (x rm) (fromInteger ss)
```

## Conclusion

This was another leap learning Haskell. Figuring out how to manage dependencies in a project. Learning how to convert between types, it is really good that I must do that explicitly in Haskell, and that it is really strict preventing you from mixing objects. I became aware of what I was trying to do by those simple type enforcements.

The next step is to worry about testing. At the moment all logic from these blog posts fits in 187 lines of code. I can still read it, and I have my manual test cases. However, as the project grows in complexity, keeping track of it becomes really hard and exhausting, test help to track features and document how to use the software.

##### Dr. Óscar Nájera

###### Software archeologist – Recovering Physicist – Dancer

As scientist I studied the physics of the very small quantum world. As a computer hacker I distill code. Software is eating the world, and less code means less errors, less problems. Millions of lines of legacy code demand attention and have to be understood and simplified for future reliable operation.