Part 1: Transaction Basics

Introduction

In my quest to understand transaction malleability[1] better, it became clear that one prerequisite was a thorough understanding of transactions themselves. What is a Bitcoin transaction? What does a transaction consist of? What is the format or structure of a transaction? How is it put together? This two-part post represents documentation of my probes into the innards of the typical Bitcoin transaction.

When I first became interested in Bitcoin I had looked into transactions in some detail, but I soon realized that a proper understanding of all the various ways in which transaction malleability can be effected required that I do a second, more thorough study of Bitcoin transactions. This time around, I decided that transactions are best understood if one attempts to build a transaction “by hand”. Clearly, this is not the first account of such an effort, but during the course of my study, I found that information relating to the task of building a transaction by hand is inconveniently scattered across several learning resources. I am attempting to bring all the relevant information together in one place in this write-up. Before we get down to building a transaction by hand (in Part 2), for the sake of newcomers to Bitcoinland, I thought I’d attempt this introduction to Bitcoin transactions.

Satoshi, describing Bitcoin:

image2

Getting Started with Transactions

Transactions are pretty much at the heart of the Bitcoin protocol.

image7

Think of a bitcoin as a collection of bytes, in computer storage, that is replicated and distributed all over the globe. It does not matter how many copies of a bitcoin there are; or where in the world the computers that store them are located. So you never actually “take possession” of bitcoin. They are not in your Bitcoin wallet; they are just “out there”.

When you spend a bitcoin, you magically spend every copy of that bitcoin no matter where in the world it exists. Spending a bitcoin means extracting that bitcoin’s value in its entirety – whatever that value may be, perhaps 0.1 BTC or 5 BTC – by unlocking it and reassigning that value to one or more new locked bitcoin. So, while you don’t actually take possession of bitcoin, you can possess the information required to unlock the value of, and spend, specific bitcoin.

Transactions are what enable this unlocking of existing bitcoin and the reassignment of their value to new locked bitcoin, each of which will be associated with a specific Bitcoin address.

If you want a technical overview of how Bitcoin works, Andreas Antonopoulos’ book Mastering Bitcoin has become the de facto reference. But when it comes to dissecting transactions, I must join the chorus of praise for the excellent blog post by Ken Shirriff on this very subject.

image6

Structure of the Typical Bitcoin Transaction

The following figure illustrates the general structure of a Bitcoin transaction, in this case the type of transaction that is used most often, known as a P2PKH (Pay to Public Key Hash) transaction. A P2PKH transaction is the type of transaction where Alice makes a bitcoin payment to Bob’s bitcoin address.

Figure 1: A standard Pay-To-Public-Key-Hash (P2PKH) transaction.

image5

What the above figure shows is a properly formatted raw P2PKH transaction that is ready to be fed into the Bitcoin network. The complete serialized transaction as a stream of bytes would look like this:

0100000001416e9b4555180aaa0c417067a46607bc58c96f0131b2f41f7d0fb665eab03a7e000000006a47304402201c3be71e1794621cbe3a7adec1af25f818f238f5796d47152137eba710f2174a02204f8fe667b696e30012ef4e56ac96afb830bddffee3b15d2e474066ab3aa39bad012103bf350d2821375158a608b51e3e898e507fe47f2d2e8c774de4a9a7edecf74edaffffffff01204e0000000000001976a914e81d742e2c3c7acd4c29de090fc2c4d4120b2bf888ac00000000

So if we are building a similar transaction by hand (which we actually do in Part 2), this is what the end product should look like.
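To make the byte stream above less opaque, here is a minimal Python sketch that splits it into its fields. The helper names (read_varint, parse_tx) are my own, and the parser only handles the legacy, pre-SegWit layout used by this example; it is an illustration, not a full transaction decoder.

```python
import io

# The example transaction from above, split by field for readability.
RAW_TX = (
    "01000000"                                                          # version
    "01"                                                                # input count
    "416e9b4555180aaa0c417067a46607bc58c96f0131b2f41f7d0fb665eab03a7e"  # previous tx hash
    "00000000"                                                          # previous output index
    "6a"                                                                # scriptSig length (106 bytes)
    "47304402201c3be71e1794621cbe3a7adec1af25f818f238f5796d47152137eba7"
    "10f2174a02204f8fe667b696e30012ef4e56ac96afb830bddffee3b15d2e474066"
    "ab3aa39bad012103bf350d2821375158a608b51e3e898e507fe47f2d2e8c774de4"
    "a9a7edecf74eda"                                                    # scriptSig
    "ffffffff"                                                          # sequence
    "01"                                                                # output count
    "204e000000000000"                                                  # value in satoshi
    "19"                                                                # scriptPubKey length (25 bytes)
    "76a914e81d742e2c3c7acd4c29de090fc2c4d4120b2bf888ac"                # scriptPubKey
    "00000000"                                                          # locktime
)


def read_varint(stream):
    """Read a variable-length integer (used for counts and script lengths)."""
    first = stream.read(1)[0]
    if first < 0xfd:
        return first
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    return int.from_bytes(stream.read(width), "little")


def parse_tx(raw_hex):
    """Split a legacy (pre-SegWit) serialized transaction into its fields."""
    stream = io.BytesIO(bytes.fromhex(raw_hex))
    tx = {"version": int.from_bytes(stream.read(4), "little"),
          "inputs": [], "outputs": []}
    for _ in range(read_varint(stream)):
        tx["inputs"].append({
            # Stored byte-reversed relative to how block explorers display it.
            "prev_txid": stream.read(32)[::-1].hex(),
            "index": int.from_bytes(stream.read(4), "little"),
            "script_sig": stream.read(read_varint(stream)).hex(),
            "sequence": int.from_bytes(stream.read(4), "little"),
        })
    for _ in range(read_varint(stream)):
        tx["outputs"].append({
            "value": int.from_bytes(stream.read(8), "little"),
            "script_pubkey": stream.read(read_varint(stream)).hex(),
        })
    tx["locktime"] = int.from_bytes(stream.read(4), "little")
    if stream.read():
        raise ValueError("unexpected trailing bytes")
    return tx


tx = parse_tx(RAW_TX)
print("output value:", tx["outputs"][0]["value"], "satoshi")
print("spends output", tx["inputs"][0]["index"], "of", tx["inputs"][0]["prev_txid"])
```

Note that multi-byte numbers are little-endian: the value bytes 204e000000000000 read as 0x4e20, that is 20,000 satoshi (0.0002 BTC).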

After you’ve got your raw transaction in this form, you can quite easily broadcast it into the Bitcoin network using either a full node or a transaction broadcasting service like the one provided by Blockchain.info or Insight. Once in the bitcoin peer-to-peer network, individual nodes will verify that all is in order with the transaction and then pass it on until it reaches a mining node and gets incorporated into the bitcoin blockchain. When a transaction gets “mined” in this way, the value transfer intended by that transaction is actualized.

To keep our example simple, the transaction illustrated in Figure 1 has only one Input and one Output, though in reality transactions can have almost any number of Inputs and Outputs. For each additional Input, there would be an entire additional segment like the one in magenta. And for each additional Output, an entire additional yellow segment would be present.

Figure 1 is meant to demarcate the three parts of a transaction:

  • There is the main component comprising both the Input and Output segments.
  • Then there are the fields outside the Input and Output segments.
  • The third component of a transaction is actually contained within the Inputs and Outputs. This third part comprises the scriptSig and scriptPubKey segments which are in the input and output respectively.

I believe that understanding transactions can be easier if we think of them in terms of these three components.

You can skip the following sidebar on keys and addresses if you are already familiar with the subject.


Bitcoin Keys and Addresses

I am assuming the reader has a working knowledge of how Bitcoin keys and addresses work. Nevertheless, we’ll do a quick summary of the important aspects.

While the Bitcoin address is what we deal with all the time, the really important component is the private key. It all begins with the private key. Perhaps the first thing you learn when getting started with Bitcoin is that the private key must be kept secret and securely backed up. Lose the private key, and you’ve lost your bitcoin.

Most bitcoiners may never have reason to work directly with a private key. Private keys and the public keys derived from them are typically generated and managed internally by wallet software, leaving us to work only with the familiar Bitcoin address like this one:

1F1fXXbXH9PX1RZuP4aSBcAro9uSUi5tsh

A private key is nothing more than a randomly generated, typically 256-bit number. In hexadecimal format it would comprise 64 hex digits or 32 bytes and look like this example:

3cd0560f5b27591916c643a0b7aa69d03839380a738d2e912990dcc573715d2c

If you’ve ever seen a private key, you’d be saying this looks nothing like it. Well, that’s because private keys are usually encoded in the base58check format and end up looking more like this:

KyFvbums8LKuiVFHRzq2iEUsg7wvXC6fMkJC3PFLjGVQaaN9F1Ln

The above example is a WIF–compressed private key (Wallet Import Format). Private keys in this format start with a ‘K’ or ‘L’ and are now widely used in place of uncompressed private keys which start with a ‘5’.

A private key and its associated public key and Bitcoin address are mathematically related. The public key is generated from the private key using Elliptic Curve Cryptography. This is a one-way process, meaning it is not possible to get the private key from the public key. The WIF–compressed private key above will deterministically generate the following compressed public key:

03bf350d2821375158a608b51e3e898e507fe47f2d2e8c774de4a9a7edecf74eda

Getting from the public key to the Bitcoin address involves a series of cryptographic hashing operations which are, again, irreversible. The result of these operations is the public key hash, which for our example would be this:

99b1ebcfc11a13df5161aba8160460fe1601d541

The public key hash (PUB-KEY-HASH) is something we’ll frequently encounter in these posts. Only one step, base58check encoding, remains before we arrive at our familiar Bitcoin address:

1F1fXXbXH9PX1RZuP4aSBcAro9uSUi5tsh

For a fabulous interactive walk-through of the entire process of getting from a private key to a Bitcoin address, check out this post in The Royal Fork blog. Should you want to generate your own set of private/public keys and address, you could use Bitaddress.org.
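The base58check step mentioned above can be sketched in a few lines of Python using only SHA256 from the standard library. The function names are my own. The version bytes are the standard mainnet ones: 0x00 for a P2PKH address and 0x80 for a WIF private key, with a trailing 0x01 byte marking the key as "compressed". The elliptic curve and RIPEMD160 steps are deliberately left out, so this starts from the public key hash rather than from the private key.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def checksum(data: bytes) -> bytes:
    """First four bytes of a double SHA256, appended to catch typing errors."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: bytes, payload: bytes) -> str:
    raw = version + payload
    raw += checksum(raw)
    num = int.from_bytes(raw, "big")
    encoded = ""
    while num > 0:
        num, remainder = divmod(num, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is written as a leading '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def base58check_decode(text: str):
    """Return (version byte, payload), raising if the checksum is wrong."""
    num = 0
    for char in text:
        num = num * 58 + ALPHABET.index(char)
    leading_ones = len(text) - len(text.lstrip("1"))
    raw = b"\x00" * leading_ones + num.to_bytes((num.bit_length() + 7) // 8, "big")
    data, check = raw[:-4], raw[-4:]
    if checksum(data) != check:
        raise ValueError("bad checksum")
    return data[:1], data[1:]


# Values quoted in this section.
pubkey_hash = bytes.fromhex("99b1ebcfc11a13df5161aba8160460fe1601d541")
private_key = bytes.fromhex(
    "3cd0560f5b27591916c643a0b7aa69d03839380a738d2e912990dcc573715d2c")

address = base58check_encode(b"\x00", pubkey_hash)
wif = base58check_encode(b"\x80", private_key + b"\x01")
print(address)
print(wif)
```

If the values quoted in this section are mutually consistent, the two printed strings should match the address and the WIF–compressed key shown earlier. Decoding works in reverse, which is how a wallet recovers the public key hash from an address when it builds a locking script.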

A few final points on keys and addresses:

  • Do not generate your own private key to hold any value of bitcoin other than very trivial amounts, unless you are absolutely sure you know what you are doing.
  • A public key is not the same as a Bitcoin address. From a Bitcoin address you cannot derive the public key and from the public key you cannot derive the private key. From the private key you can derive both the public key and the Bitcoin address.
  • You may encounter a Bitcoin address that starts with a ‘3’ rather than a ‘1’. These are Pay To Script Hash (P2SH) addresses, typically used as multi-signature addresses.

Inputs and Outputs

The Output segment of a transaction specifies two properties: firstly, the value of the new locked bitcoin, and secondly, the nature of this ‘lock’. The lock, or “locking script” – a snippet of code in the Bitcoin Script[2] programming language – dictates the type of authorization that is needed for a future transaction to spend this new bitcoin. With the typical transaction, as in our example, the authority to unlock and spend this new bitcoin requires possession of a unique, secret private key. The locking script is also known as the scriptPubKey.

The Input segment of a transaction identifies the specific bitcoin – the output of a previous transaction – that will be unlocked and spent by the current transaction. Also included in this segment is the part that does the unlocking, the “unlocking script”, again a snippet of code in the Bitcoin Script language. The unlocking script is also referred to as the scriptSig.

The locking and unlocking scripts are dealt with in greater detail later.

To avoid a common point of confusion, note that the unlocking script(s) in a transaction do NOT unlock (or interact in any way with) the locking script(s) in that same transaction. (This may seem like an odd point to make for those who already get it, but you would be surprised how often people are confused by this issue.)

  • The unlocking script of the Input segment of a transaction interacts with the locking script of the Output segment from a previous transaction.
  • The locking script(s) in the current transaction interact with the unlocking script(s) in a future transaction.

Figure 2 below should help clarify this relationship between previous, current, and future Outputs and Inputs and their scripts.

Inputs are nothing more than Outputs of previous transactions that have not yet been spent; that is, they are “unspent transaction outputs” (also known as UTXO). If you are thinking even further back and wondering where the “original” bitcoin came from, then we have to delve into the subject of Bitcoin mining, which is outside the scope of this article. For now, it should suffice to know that the mining process adds a supply of fresh, never-used-before bitcoin every time a block is created, roughly every 10 minutes. At the time of writing, this influx stood at 12.5 bitcoin per block; the amount halves every 210,000 blocks, or approximately every four years.

Figure 2: Bitcoin value moves from one owner (the person controlling the private key to the bitcoin) to another owner in a chain of transactions where Outputs become Inputs to create new Outputs, and so on.

image4

The first two fields of the Input segment identify the specific unspent previous output, or UTXO, that we want to spend in the current transaction. The first Input field (Input’s Transaction Hash) serves as the identifier of the transaction that includes the output we now want to utilise as an Input. Since transactions can have more than one Output, the second field (Input’s index) specifies which particular output in the identified previous transaction we now want to spend. In our example transaction, the Input’s index is 0 (serialized as the four bytes 00000000), meaning it’s the first output in that previous transaction, with the second output being 1, etc. (Note that an Output does not carry its own index number anywhere in the transaction format. The index is inferred from the position of the Output within its transaction.)

UTXO

You should by now be comfortable with the term UTXO, as we will be using it frequently. As a reminder: a UTXO, or unspent transaction output, is the Output segment of a transaction, specifically the value and scriptPubKey parts. For as long as this transaction Output remains unspent, that is, not utilised as the Input of a subsequent transaction, it is a UTXO or, in fact, what we understand to be some discrete value of bitcoin. So while bitcoin may be collections of bytes in computer storage, they really are UTXO as defined within transactions, and those transactions are recorded unencrypted in the Bitcoin ledger, or blockchain. (In more recent versions of the Bitcoin client, the software that executes the Bitcoin protocol, the entire collection of UTXOs, called the UTXO set, is held in a database.) Every transaction that gets recorded in the blockchain modifies the UTXO set: one or more existing UTXO are utilized as Inputs, get spent and are hence deleted from the UTXO set, while the transaction’s Outputs create new UTXO which get added to the UTXO set.
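The bookkeeping described above can be modelled with a plain dictionary. This is only a toy sketch: the transaction ids, values and lock labels are made up for illustration, and a real node also runs the scripts and checks the values before it touches the UTXO set.

```python
def apply_transaction(utxo_set, txid, inputs, outputs):
    """Spend the referenced UTXO and add the transaction's outputs as new UTXO.

    utxo_set: dict mapping (txid, output index) -> output
    inputs:   list of (previous txid, output index) pairs
    outputs:  list of dicts with "value" and "script_pubkey"
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("the same output is referenced twice")
    for outpoint in inputs:
        if outpoint not in utxo_set:
            raise ValueError(f"{outpoint} does not exist or is already spent")
    for outpoint in inputs:
        del utxo_set[outpoint]                # spent, so no longer a UTXO
    for index, output in enumerate(outputs):
        utxo_set[(txid, index)] = output      # index comes from the position


# Hypothetical starting state: one unspent output worth 50,000 satoshi.
utxo_set = {("aa" * 32, 0): {"value": 50_000, "script_pubkey": "<lock for Alice>"}}

apply_transaction(
    utxo_set,
    txid="bb" * 32,
    inputs=[("aa" * 32, 0)],
    outputs=[
        {"value": 20_000, "script_pubkey": "<lock for Bob>"},
        {"value": 29_000, "script_pubkey": "<lock for Alice (change)>"},
    ],
)
for outpoint, output in utxo_set.items():
    print(outpoint[0][:8], outpoint[1], output)
```

Trying to apply the same transaction a second time raises an error, because the output it spends is no longer in the set. That check is what prevents a double spend.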

Change and Fees

The transaction format requires that the number of inputs and the number of outputs be explicitly stated. In our example, there is clearly only one input and one output. Note, however, that most transactions will have more than one output. Why is this the case? Well, since a bitcoin (or UTXO) must be spent in its entirety, the likelihood of there being “change” is very high. This change is typically sent back to the spender by creating an additional Output that can be unlocked by the spender. So in a typical transaction, an entire bitcoin/UTXO is spent, with part of the value going to the intended recipient, say Joe’s Coffee Shop, and the change going back to the spender.

There is another part to a transaction we need to take note of: Executing a bitcoin transaction is not quite free; there is a cost in the form of a transaction fee. (There was a time when this fee was trivial, but not necessarily these days.) You will notice that nowhere in the transaction format is a fee specified. Transaction fees are implied; the sum of the value of the inputs minus the value of the outputs will be the transaction fee. Miners, who do important work to keep the bitcoin system working, get to keep the transaction fee. Clearly one needs to be careful when constructing a transaction by hand because we do not want to inadvertently send a whole bunch of money in the form of a huge fee to some lucky miner.
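Since the fee is implied rather than stated, it is worth computing it explicitly before signing anything. The sketch below uses made-up amounts (a 100,000 satoshi input, a 20,000 satoshi payment and a 79,000 satoshi change output); only the formula itself comes from the text above.

```python
def implied_fee(input_values, output_values):
    """Fee in satoshi: total input value minus total output value."""
    fee = sum(input_values) - sum(output_values)
    if fee < 0:
        raise ValueError("outputs exceed inputs, so the transaction is invalid")
    return fee


# Hypothetical amounts, in satoshi.
inputs = [100_000]
payment, change = 20_000, 79_000

print(implied_fee(inputs, [payment, change]))   # fee with a change output
print(implied_fee(inputs, [payment]))           # change output forgotten
```

With the change output the fee is 1,000 satoshi. Leave the change output out and the miner collects 80,000 satoshi, which is exactly the kind of mistake to watch for when building a transaction by hand.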

Bitcoin Scripts

Demarcating a transaction into three areas in the manner I have intentionally draws attention to the scriptSig and scriptPubKey components, that is, the unlocking and locking scripts. You may have heard Bitcoin described as “programmable money”. Well, it is these components of a transaction that give Bitcoin its “programmable” characteristic.

In this section, I will briefly describe how the Bitcoin Script programming language and associated scripts are implemented in transactions.

When a UTXO is created via a transaction, it will incorporate a locking script. The locking script imposes a condition that must be met before the UTXO can be spent. If this UTXO is subsequently used as the Input in a future transaction, the Input segment of this future transaction must provide the unlocking script that will make the UTXO spendable, and consequently the transaction itself a valid transaction.

Bitcoin uses a simple, “stack-based” programming language called Script to implement the locking and unlocking scripts. Using a programming language, even a limited one, to code the locking script means that it is possible to impose a wide variety of conditions (or encumbrances) on the UTXO[3]. The Input’s unlocking script, also written in the Script language, must satisfy the condition imposed on the UTXO that is to be spent.

Referring again to our example transaction in Figure 1, consider the Input segment: The unlocking script is clearly indicated. But what about the locking script (scriptPubKey) of the UTXO we are trying to spend? It is nowhere to be seen! Similarly, the value of this UTXO is not indicated anywhere in the transaction structure. The way this works is as follows: The transaction verification software in a full node copies the unlocking script from the Input segment… pretty straightforward. Then, the verification software will use the first two fields of the Input segment to identify and retrieve the locking script and value[4] of the Input UTXO. (This data was originally retrieved directly from the blockchain but now comes from a UTXO-specific database in the full node.)

The UTXO used as the Input in Figure 1 is one Output of this previous transaction. The partially displayed Block Explorer page shown below identifies the Input UTXO as is recorded in the blockchain.

Figure 3: Partial Insight block explorer page of the previous transaction.

PoloTxLabelled

Stack-Based Execution of Scripts

So how exactly does the unlocking script work its magic on the locking script? This is where the Script language’s stack-based execution comes in.

To understand how this works, let’s execute two very simple scripts to simulate the execution of an unlocking and locking script. For easy human consumption, the scripts can initially be expressed as follows:

scriptTablePart1a

When executed by the verification software, in sequence as shown above, these scripts add two numbers, 3 and 11, and then evaluate whether the total equals the value in the locking script. Two Script language opcodes, OP_ADD and OP_EQUAL, are shown above, though there are also required data-pushing opcodes which are not shown for now. A full list of available Script opcodes can be found here.

As we become acquainted with stack-based execution note that push adds an item to the top of the stack while pop removes the top item. Also, stack operations progress from the left to the right of a script sequence.

Execution of Simple Example Script

3  11  OP_ADD  14  OP_EQUAL

  1. Stack-based execution of this simple script would start with the constant value 3 being pushed onto the stack.
  2. Next the constant 11 is pushed to the top of the stack.
  3. As execution continues to the right, the OP_ADD arithmetic opcode will pop the top two items off the stack (3 and 11), add them together and push the result (in our example 14) to the top of the stack.
  4. Moving right again, now the constant 14 is pushed to the top of the stack.
  5. The final operation is to execute the OP_EQUAL comparison opcode. This opcode pops the top two items off the stack (14 and 14), compares them, then pushes 1, for TRUE, onto the stack since they match exactly. If they do not match, 0 for FALSE is pushed to the stack[5].
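The five steps above can be reproduced with a toy interpreter that works on the human-readable form of the script. This is only a sketch covering the two opcodes used here; the function name run_script is my own, and real Script operates on bytes rather than on Python integers.

```python
def run_script(tokens):
    """Execute a list of tokens from left to right on a stack."""
    stack = []
    for token in tokens:
        if token == "OP_ADD":
            b, a = stack.pop(), stack.pop()      # pop the top two items
            stack.append(a + b)                  # push their sum
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)     # 1 for TRUE, 0 for FALSE
        else:
            stack.append(int(token))             # anything else is a constant
        print(f"{token:<9} stack: {stack}")
    return stack


result = run_script("3 11 OP_ADD 14 OP_EQUAL".split())
```

The last line of the trace shows a single 1 on the stack. Changing the 14 to any other number leaves a 0 instead.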

I have entered the above example script into the Bitcoin Script IDE called Hashmal. Hashmal is super useful for writing, testing and learning about Script (see Hashmal Wiki, Bitcointalk thread for more info). The following screenshot shows our simple script after execution in Hashmal:

Figure 4: Screenshot of a Hashmal window showing the executed example script.

image3

Several aspects of Bitcoin scripting become clearer in this screenshot. At the top of the window, we see our script expressed in “human readable” form with numbers expressed in hex format (3 in hex is 0x03 or just 03, while 11 is 0x0b or 0b).

The next text box shows our sample script as a sequence of bytes in hex notation. This is the byte data that is acted on by the Script execution engine. Let’s break down this data into individual bytes and see what they represent:

Figure 5: Byte stream that is acted on by Script execution engine

image1

The numbers 3, 11, and 14 we already know about. The opcodes OP_ADD and OP_EQUAL are denoted by the hex codes 93 and 87 respectively (as listed here). So, that leaves us with the three 01 bytes; these are actually opcodes which push data onto the stack. As I mentioned previously, these data-pushing opcodes are typically not specified when describing scripts but are really very necessary. And in the context of transaction malleability they have special significance, as you will see in Part 3 of this series.

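The data-push opcode (0x01) denotes the number of following bytes that are to be pushed to the stack. In this case, only the following one byte should be pushed to the stack. In fact, any hex value between 0x01 and 0x4b inclusive (decimal 1 to 75) is a data-push opcode just like the 01 opcode in our example, meaning the opcode itself denotes the number of following bytes to be pushed. (Strictly speaking, these opcodes have no individual names; the named opcodes OP_PUSHDATA1, OP_PUSHDATA2 and OP_PUSHDATA4, at 0x4c to 0x4e, are separate opcodes that take the length from the byte or bytes that follow them.)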

If we now look back at Figure 4, the Hashmal screenshot, we can better understand the stack operations listed in the third pane. Step 0 executes the 01 data-push opcode, which pushes the next one byte (03) to the stack… and so on.
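To see the same thing at the byte level, here is a small Python sketch that executes the raw script bytes. The byte stream 01 03 01 0b 93 01 0e 87 is reconstructed from the description of Figure 5 (three 01 push bytes, the values 03, 0b and 0e, and the opcodes 93 and 87). The function names are my own, and only the opcodes needed for this example are handled.

```python
OP_ADD, OP_EQUAL = 0x93, 0x87


def decode_num(data: bytes) -> int:
    """Script numbers are little-endian, with the sign in the top bit of the last byte."""
    if not data:
        return 0
    value = int.from_bytes(data, "little")
    if data[-1] & 0x80:
        value &= ~(0x80 << (8 * (len(data) - 1)))
        return -value
    return value


def encode_num(number: int) -> bytes:
    if number == 0:
        return b""
    negative, magnitude = number < 0, abs(number)
    out = bytearray()
    while magnitude:
        out.append(magnitude & 0xff)
        magnitude >>= 8
    if out[-1] & 0x80:                    # top bit taken, so add a sign byte
        out.append(0x80 if negative else 0x00)
    elif negative:
        out[-1] |= 0x80
    return bytes(out)


def execute(script: bytes):
    stack, pc = [], 0
    while pc < len(script):
        opcode = script[pc]
        pc += 1
        if 0x01 <= opcode <= 0x4b:        # push the next `opcode` bytes
            data = script[pc:pc + opcode]
            if len(data) != opcode:
                raise ValueError("push runs past the end of the script")
            stack.append(data)
            pc += opcode
        elif opcode == OP_ADD:
            b, a = decode_num(stack.pop()), decode_num(stack.pop())
            stack.append(encode_num(a + b))
        elif opcode == OP_EQUAL:
            b, a = stack.pop(), stack.pop()
            stack.append(b"\x01" if a == b else b"")
        else:
            raise ValueError(f"opcode 0x{opcode:02x} is not handled in this sketch")
        print(f"after 0x{opcode:02x}: {[item.hex() for item in stack]}")
    return stack


final_stack = execute(bytes.fromhex("0103010b93010e87"))
```

Stack items are byte strings here, which is closer to how the real interpreter works: OP_EQUAL compares bytes, and TRUE is the single byte 01 while FALSE is an empty byte string.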

Executing Scripts to Validate Transactions

We have just seen in some detail how a simple script is executed on the stack. In this section, we examine in very general terms how a combined unlocking and locking script sequence would execute. A detailed byte-by-byte example, involving a real-world transaction will have to wait until Part 2 after we construct a transaction by hand.

The Input’s unlocking script and the locking script it interacts with are executed one after the other on a shared[6] stack in much the same way as the simple scripts in the example above. The result of this process will either allow or reject the intended spend of the Input UTXO depending on whether TRUE or FALSE remains as the top stack item at the end of execution. This verification process is carried out by the Bitcoin client software running on all full nodes on the Bitcoin peer-to-peer network through which the transaction is broadcast. If verification fails, the transaction will be considered invalid and will not be propagated. Hence, a transaction will never get mined and incorporated into the Bitcoin blockchain unless the script execution returns a TRUE result.

This is the sequence in which the unlocking script (scriptSig) and locking script (scriptPubKey) will be executed progressing from left to right:

Figure 6: Combined unlocking and locking scripts as they will be executed.

Figure4Part1

We know that the scriptPubKey is retrieved from the Input UTXO. The scriptSig to unlock/spend the Input UTXO is typically built by the spender’s wallet app. In a P2PKH transaction, the scriptSig would comprise a signature and a public key. The signature is generated by a secret private key which will be securely stored in the spender’s wallet app. The public key, derived from the private key, makes up the second part of the scriptSig.

The scriptPubKey uses four opcodes (or functions/commands) from the Bitcoin Script language. We know that the scriptPubKey is the locking script of the Input UTXO. This UTXO was itself an Output of a previous transaction. We also know that all Outputs are associated with a bitcoin address. So now we know where the public key hash in the middle of the scriptPubKey comes from; it is derived from the bitcoin address by decoding the base58check encoding, used with all bitcoin addresses, to expose the underlying public key hash.
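To make the shape of these two scripts concrete, the following Python sketch disassembles the scriptSig and the scriptPubKey taken from the raw example transaction near the top of this post. Keep in mind that the scriptPubKey in that transaction is the lock on its new Output, not the lock on the UTXO being spent, but both have the same P2PKH layout. The disassemble function is my own helper and only knows the four opcodes discussed here.

```python
OPCODE_NAMES = {
    0x76: "OP_DUP",
    0xa9: "OP_HASH160",
    0x88: "OP_EQUALVERIFY",
    0xac: "OP_CHECKSIG",
}


def disassemble(script_hex):
    """Turn raw script bytes into opcode names and pushed data (as hex)."""
    script = bytes.fromhex(script_hex)
    parts, pc = [], 0
    while pc < len(script):
        opcode = script[pc]
        pc += 1
        if 0x01 <= opcode <= 0x4b:               # direct push of `opcode` bytes
            parts.append(script[pc:pc + opcode].hex())
            pc += opcode
        else:
            parts.append(OPCODE_NAMES.get(opcode, f"0x{opcode:02x}"))
    return parts


SCRIPT_PUBKEY = "76a914e81d742e2c3c7acd4c29de090fc2c4d4120b2bf888ac"
SCRIPT_SIG = (
    "47304402201c3be71e1794621cbe3a7adec1af25f818f238f5796d47152137eba7"
    "10f2174a02204f8fe667b696e30012ef4e56ac96afb830bddffee3b15d2e474066"
    "ab3aa39bad012103bf350d2821375158a608b51e3e898e507fe47f2d2e8c774de4"
    "a9a7edecf74eda"
)

locking = disassemble(SCRIPT_PUBKEY)
unlocking = disassemble(SCRIPT_SIG)
print(" ".join(locking))
print("signature :", unlocking[0])
print("public key:", unlocking[1])
```

The locking script comes out as OP_DUP OP_HASH160, a 20-byte public key hash, then OP_EQUALVERIFY OP_CHECKSIG. The unlocking script is just two pushes: a 71-byte signature followed by the 33-byte compressed public key.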

In essence, with a P2PKH transaction, the locking script on the Input UTXO is saying to the wannabe spender: If you can sign this transaction with the private key from which the address in the locking script is derived, then you prove that you are the real owner of this address and so are entitled to spend the associated UTXO.

Generalized Execution of unlocking and locking scripts (P2PKH Transaction):

Part1GeneralizedScripts

  1. Execution begins with a push operation that sends the data comprising the signature to the stack.
  2. Next to be pushed to the top of the stack, above the signature, is the data comprising the public key.
  3. Moving to the right again, the OP_DUP opcode is executed. This opcode will duplicate the current topmost item in the stack and push it to the top.
    The result is that we now have two copies of the public key as the top two items on the stack.
  4. The OP_HASH160 operation that follows will remove the topmost item (the public key) and hash it twice: first with SHA256 and then with RIPEMD160. The resulting PUB-KEY-HASH is pushed to the top of the stack.
  5. The data comprising the PUB-KEY-HASH in the scriptPubKey is now pushed to the top of the stack. So we now have two RIPEMD160(SHA256) hashed public keys as the top two items.
  6. The comparison opcode OP_EQUALVERIFY checks if the PUB-KEY-HASH generated from the public key in the unlocking script matches exactly the PUB-KEY-HASH from the locking script. If they do match, then both copies of the PUB-KEY-HASH are removed and execution proceeds.
    The purpose of this step becomes apparent after we execute the last opcode in the next step.
  7. OP_CHECKSIG does the real heavy lifting during script verification.
    It will check whether the signature and public key in the unlocking script match. That is, OP_CHECKSIG will verify that the signature and accompanying public key were both generated by the same private key, and it will accomplish this while the private key remains entirely secret.
    Now we see the relevance of Step 6: Step 6 and Step 7 taken together can confirm that the signer/spender of the transaction is indeed in possession of the private key that is associated with the address (the encumbrance) in the locking script.
    Importantly, OP_CHECKSIG also verifies that the transaction itself has not been altered since it was signed. How this is accomplished will become apparent in Part 2.
    If all OP_CHECKSIG checks pass, TRUE is returned as the final item on the stack thus verifying the scripting and validating the transaction.
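The seven steps above can be traced with a short Python sketch. Two loud caveats: the cryptography is replaced by stand-ins so that the example runs with the standard library alone. stand_in_hash160 uses a truncated SHA256 instead of the real RIPEMD160(SHA256(x)), and the signature check is a stub function supplied by the caller rather than real ECDSA verification. The placeholder signature is hypothetical as well. Only the stack movements are meant to be faithful, including the detail that the scriptSig runs first and the scriptPubKey then runs on the stack it leaves behind.

```python
import hashlib


def stand_in_hash160(data: bytes) -> bytes:
    # STAND-IN: the real OP_HASH160 is RIPEMD160(SHA256(data)).
    return hashlib.sha256(data).digest()[:20]


def run_p2pkh(script_sig, script_pubkey, check_sig):
    """Run scriptSig, then scriptPubKey, on one stack; return True if the spend is allowed.

    Scripts are lists whose items are either bytes (data to push) or opcode names.
    check_sig(signature, pubkey) is a caller-supplied stub for OP_CHECKSIG.
    """
    stack = []
    for script in (script_sig, script_pubkey):
        for item in script:
            if isinstance(item, bytes):
                stack.append(item)                          # steps 1, 2 and 5
            elif item == "OP_DUP":
                stack.append(stack[-1])                     # step 3
            elif item == "OP_HASH160":
                stack.append(stand_in_hash160(stack.pop())) # step 4
            elif item == "OP_EQUALVERIFY":
                if stack.pop() != stack.pop():              # step 6
                    return False                            # script fails at once
            elif item == "OP_CHECKSIG":
                pubkey, signature = stack.pop(), stack.pop()
                ok = check_sig(signature, pubkey)           # step 7
                stack.append(b"\x01" if ok else b"")
            else:
                raise ValueError(f"{item} is not handled in this sketch")
    # Simplified truth test: the top item must contain a non-zero byte.
    return bool(stack) and any(stack[-1])


pubkey = bytes.fromhex(
    "03bf350d2821375158a608b51e3e898e507fe47f2d2e8c774de4a9a7edecf74eda")
signature = b"placeholder-signature"                        # hypothetical


def stub_check_sig(sig, key):
    # STUB: accepts only the placeholder pair above; real nodes verify ECDSA.
    return sig == signature and key == pubkey


locking_script = ["OP_DUP", "OP_HASH160", stand_in_hash160(pubkey),
                  "OP_EQUALVERIFY", "OP_CHECKSIG"]

print(run_p2pkh([signature, pubkey], locking_script, stub_check_sig))
print(run_p2pkh([signature, b"\x02" + bytes(32)], locking_script, stub_check_sig))
```

The second call presents a different public key, so the hashes differ and the script already fails at OP_EQUALVERIFY, before the signature is ever looked at.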

Footnotes:

[1] If you are not familiar with transaction malleability, suffice it to say for now that it is generally considered a benign bug within the bitcoin protocol. However, with proposed (Layer 2) enhancements to Bitcoin, transaction malleability now takes on greater significance and is no longer just the nuisance it once was. This article is a good introduction to malleability and some of its consequences.

[2] Bitcoin’s Script programming language is one of only two aspects of the Bitcoin protocol that are unique to Bitcoin. All other elements of the protocol already existed and were, of course, then put together in the most fascinating way. The other unique aspect would be the base58check encoding used by Bitcoin addresses.

[3] While full of possibilities, it should be noted that, to date, Bitcoin’s scripting has not seen any significant use case besides multi-sig.

[4] While the locking script will impose the condition that must be met to make the UTXO spendable and the transaction valid, the value associated with this UTXO also determines the validity of the transaction. Clearly, the total value of all Inputs must be greater than or equal to total Output value for the transaction to be valid.

[5] Note that any return value other than an explicit FALSE is considered a TRUE. From the Bitcoin Wiki: “Byte vectors are interpreted as Booleans where False is represented by any representation of zero, and True is represented by any representation of non-zero.”

[6] To remedy a bug in early versions of the Bitcoin client, the unlocking and locking scripts are no longer concatenated and run together. Instead, the scriptSig is run, then deleted, leaving the stack as is. Then the scriptPubKey of the Input UTXO is run.