# Benford’s Wallet

One criticism of using a global public ledger such as Bitcoin SV is that privacy may be sacrificed as all transactions are visible to everyone. For example, if a merchant always sells a model of bicycle for 3 BSV then it would be relatively easy to identify the potential transactions that belong to the merchant.

In this blog I will address this problem by explaining how to randomly distribute an invoice of BSV over outputs in a way that maximises privacy but maintains off-chain auditability. The idea is to randomly partition the invoice value over several outputs such that it blends into the wider pool of transactions. At the same time, an off-chain cryptographic link is provided between each output address and the invoice for book-keeping purposes.

This blog is based on work in collaboration with Chloe Tartan, Craig Wright, and Wei Zhang from the nChain research team which was presented by Craig at the CoinGeek conference in Seoul in October 2019.

## Invoice value distribution

I will first explain how the invoice value may be split over outputs in a way that promotes on-chain privacy. (The number of outputs may be chosen randomly by the merchant or wallet software.)

The first task is to uniformly randomly select variables in the range (in units of satoshi). These are ordered from smallest to largest and labelled . We use these values to split into partitions . This is perhaps best visualised by drawing these values on a number line:

Notice that

We will see below that the resulting partitions satisfy a property known as* Benford’s law.*

Benford’s law states that in many real-world financial data sets the leading significant digit is most likely to be 1, the second most likely digit is 2, then 3 and so-on up to 9. Large values in datasets that span many orders of magnitude tend to adhere to this rule. In Bitcoin SV we expect to see values of a few hundred satoshi through to a few billion satoshi, so it is reasonable to suspect that larger output values may display Benfordian behaviour. Of course, this can easily be checked: the graph below displays the count of leading digits for outputs over 0.1 BSV on 20 October 2019 (yesterday, at the time of writing).

This means that an invoice split into partitions that follow Benford’s law will tend to blend into the general distribution of transactions, which promotes on-chain privacy. (Note that the invoice must be split over at least two transactions otherwise it would be easy to identify the outputs that sum up to .

We are left to prove the following:

Proposition

The leading digits of defined above follow Benford’s law.

Sketch of Proof (Credit to my nChain colleague Wei Zhang)

Here I will sketch a proof for the special case although the same arguments hold for arbitrary .

The partitions defined above follow a beta distribution with probability density function

This follows immediately for from arguments found in eg *[David, H. A.; Nagaraja, H. N. (2003). Order Statistics] *and with a bit more effort it can be shown to also hold for the endpoints .

The plot of below tells us that that the value of each partition is more likely to be small than large.

Note that the cumulative distribution function is

We are left to prove that the expectation value of a random variable following a beta distribution satisfies Benford’s law.

Write in scientific notation as where and is an integer (which can be negative). Then

This implies that

This infinite series converges for any fixed value of . In the table below are some calculated examples and their comparison with true Benford’s law.

We conclude that the partitions follow Benford’s law for any .

## Linking output addresses to an invoice for off-chain auditability

In the previous section we saw how an invoice value can be split over outputs. This presents a book-keeping challenge as there now needs to be separate output addresses used for a single invoice. It therefore makes sense to link these addresses directly to the invoice in order to facilitate accounting and auditing. We will explain how this can be achieved using a provable cryptographic link that is only apparent to those with an off-chain knowledge of the wallet and invoice.

Consider a BIP32 HD wallet where there is a parent keypair and chain code . This information can be used to derive a collection of child keypairs given by

where is the order of the elliptic curve group and

Note that the corresponding public keys are given by

Suppose there is an invoice message that is of the form

= ‘*Request payment of 3 BSV for the purchase of bicycle with frame number ZL57846 at timestamp 13:06:23 27/08/2019.’*

In this case we could instead define the child private keys to be

with corresponding public keys

Linking back with the previous section, we use the addresses to receive BSV each in order to obtain a total of BSV needed to complete the invoice. To an outside observer watching the blockchain there is nothing that links each output to one-another. However, there is a provable off-chain link to anyone with knowledge of the parent public/private key (depending on whether the child isn’t/is hardened), the chain code , the invoice message , and the index .

Note that the merchant may choose to register the hash of the invoice message with a tax authority. In this case, if they are audited in the future there is a provable link between the invoice held by the tax office and the addresses that have been used to receive funds.

## Conclusion

I have explained how a random partition model can be used to split an invoice value over many outputs in accordance with Benford’s law. Since this creates multiple outputs it is advantageous to be able to provably link each output addresses used to the invoice for internal and external auditing purposes. This was achieved by modifying a BIP32 process to incorporate the invoice message into each output address.

*© 2019 nChain Limited. All rights reserved. This article is provided without any warranties whatsoever and shall not result in the grant of any license, whether implied or otherwise. nChain Limited shall not be liable in any way for the use of the information provided herein.*