Advisory Details
### Summary
`concat` built-in can write over the bounds of the memory buffer that was allocated for it and thus overwrite existing valid data. The root cause is that the `build_IR` for `concat` doesn't properly adhere to the API of copy functions (for `>=0.3.2` the `copy_bytes` function).
A contract search was performed and no vulnerable contracts were found in production.
Tracked in issue https://github.com/vyperlang/vyper/issues/3737
### Details
The `build_IR` allocates a new internal variable for the concatenation: https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/builtins/functions.py#L534-L550
Notice that the buffer is allocated for the `maxlen` + 1 word to actually hold the length of the array.
Later the `copy_bytes` function is used to copy the actual source arguments to the destination: https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/builtins/functions.py#L569-L572
The `dst_data` is defined via:
- `data ptr` - to skip the 1 word that holds the length
- `offset` - to skip the source arguments that were already written to the buffer
- the `offset` is increased via: `["set", ofst, ["add", ofst, arglen]]`, ie it is increased by the length of the source argument
Now, the `copy_bytes` function has multiple control flow paths, the following ones are of interest:
1) https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/codegen/core.py#L270-L273
2) https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/codegen/core.py#L301-L320
Note that the function itself contains the following note:
https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/codegen/core.py#L245-L247
That is we can ask for a copy of `1B` yet a whole word is copied.
Consider the first interesting path - if the `dst_data`'s distance to the end of the concat data buffer is `< 32B`, the `copy_op = STORE(dst, LOAD(src))` from `copy_bytes` will result in buffer overflow as it essentially will `mstore` to `dst_data` the `mload` of the source (mload will load whole word and the distance of the `dst_data` to the word boundary is `<32B`).
From the two mentioned paths in `copy_bytes` it can be seen that both sources from memory and storage can cause the corruption.
### PoC
The main attack vector that was found was when the `concat` is inside an `internal` function. Suppose we have an `external` function that calls `internal` one. In such case the address space is divided such that the memory for the internal function is in _lower_ portion of the adr space. As such the buffer overflow can overwrite _valid_ data of the caller.
Here is a simple example:
```python
#@version ^0.3.9
@internal
def bar() -> uint256:
sss: String[2] = concat("a", "b")
return 1
@external
def foo() -> int256:
a: int256 = -1
b: uint256 = self.bar()
return a
```
`foo` should clearly return `-1`, but it returns `452312848583266388373324160190187140051835877600158453279131187530910662655`
`-1` was used intentionally due to its bit structure but the value here is fairly irelevant. In this example during the second iteration of the for loop in the `build_IR` `mload` to `dst+1` will be executed (because len('a') == 1), thus the function will write `1B` over the bounds of the buffer. The string 'b' is stored such that its right-most byte is a zero byte. So a zero byte will be written over the bounds. So when `-1` is considered it's left-most B will be overwritten to all 0. Therefore it can be seen: `452312848583266388373324160190187140051835877600158453279131187530910662655 == (2**248-1)` will output `True`.
#### IR
If we look at the contract's IR (vyper --no optimize -f ir), we see:
```
# Line 30
/* a: int256 = -1 */ [mstore, 320, -1 <-1>],
```
And for the second iteration of the loop in concat:
```
len,
[mload, arg],
[seq,
[with,
src,
[add, arg, 32],
[with,
dst,
[add, [add, 256 <concat destination>, 32], concat_ofst],
[mstore, dst, [mload, src]]]],
[set, concat_ofst, [add, concat_ofst, len]]]]],
[mstore, 256 <concat destination>, concat_ofst],
256 <concat destination>]],
```
So the address of the `int` is 320.
The `dst` is defined as: `[add, [add, 256 <concat destination>, 32], concat_ofst],`.
In the second iteration the `concat_ofst` will be 1 because `len('a)==1` so `256+32+1 = 289`. Now this address will be `mstored` to - so the last mstored B will have the address `289+32=320` which clearly overlaps with the address of the `int a`.
#### PoC 2
Due to how `immutables` are handled, they can be corrupted too:
```python
#@version ^0.3.9
i: immutable(int256)
@external
def __init__():
i = -1
s: String[2] = concat("a", "b")
@external
def foo() -> int256:
return i
```
Output of calling `foo()` = `452312848583266388373324160190187140051835877600158453279131187530910662655`.
### Impact
The buffer overflow can result in the change of semantics of the contract. The overflow is length-dependent and thus it might go unnoticed during contract testing.
However, certainly not all usages of `concat` will result in overwritten valid data as we require it to be in an `internal` function and close to the `return` statement where other memory allocations don't occur.
### Concluding remarks
The bug based on the fast path in `copy_bytes` was likely introduced in: `548d35d720fb6fd8efbdc0ce525bed259a73f0b9`. `git bisect` was used between v0.3.1 and v0.3.2, `forge test` was run and the test asserted that the function indeed returns -1.
For the general case, `0.3.0` and `0.3.1` are also affected.