A Primer on Live Proxy Detection (2016) – Thomas Mannfred Carlsson

Introduction

Most readers who arrive at this post through a search engine or referral will probably be familiar with open proxies. This article therefore does not cover the basics of open proxy protocols, since such information is abundant on the web. [1,2,3] Instead it focuses on the lesser-known and less obvious nuances of confirming the existence of an open proxy on a remote system. The intended audience is IT specialists involved in mitigating fraud, spam, hacking and other open proxy-related abuse.

Achieving a high detection rate of remote open proxies requires many considerations. The resources required tend to increase exponentially with the desired success rate, and perfect detection certainty for arbitrary external IPs can never be achieved due to numerous complicating factors. This article will first discuss the general dynamics of detecting open proxies along with associated statistical research, and will then discuss specific software considerations.

General Considerations

The proxy lifecycle

When an open proxy service appears on the Internet it is initially unknown to the Internet at large. It may appear for many reasons: a system administrator may accidentally install a web service or connection gateway with inappropriate access permissions, a computer virus or malicious script may enable it on an exploitable system, or the system owner may even install it on purpose. The open proxy can remain in this “active but unknown to the Internet at large” state for a considerable amount of time, and it’s reasonable to assume that many such open proxies are never exposed or publicised.

As the Internet is increasingly being scanned and mapped by various actors the open proxy may eventually be detected by somebody, which could subsequently lead to the proxy IP and port being published on freely available Internet proxy lists. [4] Once an open proxy becomes well-known its connection capacity is quickly overwhelmed by demand (see Fig 1), and it will start to appear on various open proxy blacklists. By this point a responsible system or network administrator will hopefully become aware of the issue and take steps to secure the exposed service. It’s nonetheless worth noting that in a sample set of 4 million proxy detections (HTTP CONNECT, HTTP POST, SOCKS4, SOCKS5) between March 2015 and April 2016 the median lifespan of a publicised proxy was 3 days, whereas the average lifespan was 21.39 days (see Fig 2).

When considering what level of proxy detection reliability is required for a specific use case it’s helpful to weigh the proxy lifecycle against the amount of resources required. To detect hitherto undiscovered proxies a larger number of ports should be scanned, but fewer connection attempts per port will typically be required. For well-known proxy IPs it’s probably only necessary to focus on the publicised ports, but as a tradeoff substantially more connection attempts are required to compensate for the high load that the proxy is likely under. If at all operationally possible, open proxy DNS blacklists should be consulted for a second opinion regarding the reputation of any IP of interest.
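
The blacklist second opinion can be sketched in a few lines of Python. DNS blacklists are conventionally queried by reversing the IPv4 octets and prepending them to the blacklist zone; an answer in 127.0.0.0/8 means the IP is listed and a failed lookup means it is not. The zone name `dnsbl.example.org` used in the usage note is a placeholder and not a real blacklist, and the function names are illustrative assumptions rather than part of any existing tool.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name for an IPv4 address (octets reversed)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with a 127.0.0.0/8 address for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN or resolver failure: treated as "not listed"
    return answer.startswith("127.")
```

For example, `dnsbl_query_name("192.0.2.1", "dnsbl.example.org")` yields `1.2.0.192.dnsbl.example.org`. Note that this sketch cannot tell a resolver failure apart from a genuine "not listed" answer, which matters when the resolver in use is filtered by the blacklist operator.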

Factors affecting proxy detection

In the past proxy detection has relied on varying degrees of anecdotal information and gut instinct, and as a potentially significant information security topic it has been under-researched. To improve the current understanding of the topic a study was carried out between March and April 2016 with the aim of gathering additional hard facts and statistics on the state of open proxies and the requirements for detecting them successfully. As the author has actively harvested open proxy data since 2009 for an established open proxy DNS blacklist (EFnet RBL), an abundance of complementary statistical data was also available for further research.

To create a more detailed open proxy detection dataset Internet search engines were dynamically consulted for publicly available lists of open proxies. In addition a static set of 27 known active websites which publish open proxy data was periodically polled. Through this methodology an average of 11612 suspected proxy IP-port pairs were collected every hour, which (after filtering for duplicates) resulted in an average of 31890 unique IP-port pairs for every 12-hour caching period. For every IP-port pair acquired up to 20 connection attempts were made with the objective of establishing whether it hosted an operational SOCKS4, SOCKS5, HTTP CONNECT, HTTP POST or HTTP PUT proxy. This relatively large test pool yielded an average of 664.42 actual detected open proxies per hour. For successfully detected proxy IPs some further fingerprinting of the affected host was carried out (DNS and identd). Finally, before syncing the data to EFnet RBL a blacklist audit was carried out to determine how many of the open proxy-specific blacklists were already aware of the IP.
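
To illustrate what a single connection attempt involves at the protocol level, the following Python sketch builds the opening request for a SOCKS4, SOCKS5 and HTTP CONNECT probe and interprets the proxy's first reply. It is not the code used in the study; the function names are assumptions made for illustration, and the destination would be a service that the tester controls.

```python
import socket
import struct

# SOCKS5 greeting: version 5, one method offered, method 0 (no authentication)
SOCKS5_GREETING = b"\x05\x01\x00"


def socks4_request(dest_ip: str, dest_port: int) -> bytes:
    """SOCKS4 CONNECT: version 4, command 1, port, IPv4 address, empty user ID."""
    return struct.pack("!BBH", 4, 1, dest_port) + socket.inet_aton(dest_ip) + b"\x00"


def socks4_granted(reply: bytes) -> bool:
    """The second byte of a SOCKS4 reply is 90 when the request was granted."""
    return len(reply) >= 2 and reply[0] == 0 and reply[1] == 90


def socks5_request(dest_ip: str, dest_port: int) -> bytes:
    """SOCKS5 CONNECT to an IPv4 destination (address type 1)."""
    return b"\x05\x01\x00\x01" + socket.inet_aton(dest_ip) + struct.pack("!H", dest_port)


def socks5_granted(reply: bytes) -> bool:
    """Covers both the method selection reply and the CONNECT reply (05 00 ...)."""
    return len(reply) >= 2 and reply[0] == 5 and reply[1] == 0


def http_connect_request(dest_host: str, dest_port: int) -> bytes:
    """Minimal HTTP CONNECT request for the given destination."""
    target = f"{dest_host}:{dest_port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def http_connect_granted(reply: bytes) -> bool:
    """True when the status line carries a 2xx status code."""
    parts = reply.split(b"\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1][:1] == b"2"
```

A granted reply on its own is weak evidence, since some hosts answer positively without relaying anything. Scanners therefore usually also wait for a known string from the destination service before counting the attempt as a detection.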

One of the initial questions the study sought to answer was how many detection attempts are actually required to successfully route a connection through one of the publicly listed open proxies, and whether this number changes as the proxy becomes more widely known. As HTTP CONNECT and SOCKS proxies offer the most versatility and utility to potential abusers these were focused on initially, resulting in a final dataset of approximately 45,000 SOCKS4, SOCKS5 and HTTP CONNECT proxies. One of the key findings of this part of the research was that the longer a listed proxy remained active the more difficult it became to detect: the proxy publication age and the detection cost (number of connection attempts required) were strongly correlated (see Fig 3). In addition, it became apparent that the single open proxy detection attempt that is commonly employed by network scanners (e.g. BOPM, HOPM, nmap’s proxy nse scripts, proxycheck, etc.) is insufficient for detecting the average publicly listed open proxy (see Fig 4). Of all tested HTTP CONNECT, SOCKS4 and SOCKS5 proxies only 48.23% were detectable on the first connection attempt. Of all proxies detected within a maximum of 20 connection attempts only 83.93% were discoverable within 4 attempts, and even with 10 attempts the coverage was 97.29%, leaving a further 2.71% undiscovered. Given the diminishing detection returns beyond 15 attempts it may be reasonable to conclude that 20 attempts “should be enough for everyone” (™) but the author intends to carry out further research to validate the claim in numerical terms.
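
The repeated-attempt approach can be sketched as a small Python helper. This is a minimal illustration rather than the study's implementation: `probe` is a hypothetical callable that performs one connection attempt and returns True on a confirmed detection, and the default of 20 attempts mirrors the maximum used above.

```python
import time
from typing import Callable, Optional


def detect_with_retries(
    probe: Callable[[], bool],
    max_attempts: int = 20,
    delay_seconds: float = 0.0,
) -> Optional[int]:
    """Call probe() until it succeeds; return the 1-based attempt number or None."""
    for attempt in range(1, max_attempts + 1):
        try:
            if probe():
                return attempt
        except OSError:
            # Refused, reset or timed-out connections count as a failed attempt;
            # a saturated proxy can fail in any of these ways.
            pass
        if delay_seconds and attempt < max_attempts:
            time.sleep(delay_seconds)
    return None
```

Returning the attempt number rather than a plain boolean makes it possible to record the detection cost per proxy, which is the kind of data that a distribution such as the one in Fig 4 is built from.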

Sociocultural aspects also stood out in the proxy detection dataset. When comparing the average detection cost with the emergence and attrition of proxy IPs there were discernible differences between the standard workweek and weekend (measured in UTC, see Fig 5). The middle of the workweek (Tuesday, Wednesday and Thursday) was relatively stable, and volatility increased around the weekend. The author is not currently offering any technical explanations for this pattern (the number of publicised proxies remains relatively consistent from one day to another with a slight uptick on Sundays and Mondays) but notes that various hypothetical sociocultural causes for the weekday differences are conceivable.

Summary of conclusions

The discoveries of the current research can be summarised by the following bullet points:

The current demand for open proxies significantly outstrips the supply.
When an open proxy is listed on the Internet it will very quickly become saturated.
Once made public, open proxies have a short lifespan. The median proxy IP-port is active for 3 days, and over one third vanish within 24 hours.
Currently used open proxy scanner implementations fail to detect the majority of publicised proxies because one connection test is insufficient due to proxy saturation.
Good quality open proxy blacklists are strongly recommended as a second opinion on IP reputation.

Proxy detection software

Most software designed for detecting open proxies comes with sufficient documentation, so that information is not replicated here. Instead, “best practices” recommendations for commonly used or usable programs are offered in light of the above research.

BOPM and HOPM

The well-established BOPM project has been forked and is actively developed as HOPM. This powerful proxy scanner was primarily designed for use with IRC but also comes with a reusable scanner library (libopm).

The author’s current “best practices” recommendations for employing HOPM / libopm are:

The fact that very few proxies support identd (see Fig 6) can be utilised as a distinguishing feature when optimising FD expenditure.
Although HOPM does not currently support staggered rechecks one can use redundant user blocks for the scanner blocks to increase the number of checks per IP and thus improve the detection yield. Better still, consider running multiple HOPM bots on different networks to redundantly scan target IPs.
Good quality blacklists are an essential complement to active HOPM scan checks, but an often overlooked aspect of mitigating open proxy abuse on IRC is the ircd configuration; consider setting appropriate access limits for unidented and non-DNSed hosts, and if your ircd software supports ping cookies please consider enabling them.
If your ircd software treats the client command “HTTP” as synonymous with “QUIT” (e.g. Ratbox) you can in theory skip checking for HTTP POST proxies (but note that a misconfigured HTTP POST server on a host could also be indicative of other exploitable configuration problems).
Note that not all blacklists will work well with popular ISP or public DNS servers. Some blacklists filter high volume DNS servers, and some DNS servers may override the TTLs. For best results it’s worth considering using a dedicated local DNS resolver such as unbound or the low-footprint dnscache program on the HOPM system.
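
The identd observation in the list above can be sketched as a cheap pre-check for applications that drive their own scans. The function below only tests whether the ident port (113) accepts a TCP connection; the function name is an assumption, and whether identd-enabled hosts are skipped or merely given a lower scan priority is a policy choice left to the operator.

```python
import socket


def has_open_identd(ip: str, port: int = 113, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the ident port succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable or timed out
```

On IRC the ircd has usually performed the ident lookup already by the time the scanner is notified of the connection, in which case that result can be reused instead of opening another socket.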

nmap

The legendary network scanner nmap also comes with user-contributed nse scripts for detecting open SOCKS and HTTP CONNECT proxies. The project is actively developed by a large number of contributors.

These scripts perform three connection attempts per proxy and as such are a better starting point than the average proxy scanner’s single connection attempt, but even with three connection attempts more than 20% of publicly listed proxies will be undetectable. As such, the author’s “best practices” for employing nmap for open proxy scanning are:

If scanning for unpublicised open proxies the default scripts with their three connection attempts should be sufficient. The script portrules should however be amended to use a recent list of the most common proxy ports to improve the yield.
If using nmap to confirm the existence of publicised open proxies it is suggested that the nse scripts are amended to perform more than three connection attempts (20 attempts suggested) per suspected proxy IP-port.

proxycheck

proxycheck is an older Linux/Unix command line program which performs configurable proxy scans of an IP and returns results via STDOUT. At the time of writing the software has not been developed since 2004 and lacks support for IPv6 and HTTPS.

Some aspects of note when using proxycheck are:

proxycheck does not support the rechecking of IP-port pairs (beyond basic reconnect logic in case the attempt times out) so this logic should be handled by the host application to ensure the desired number of connection checks is carried out.
If the concurrent scan job count is significant it is essential for the host application to manage the threading and FD load.
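
Both points can be handled together in the host application. The Python sketch below repeats each check up to a fixed number of times and caps the number of checks in flight with a thread pool, which in turn bounds the number of sockets or child processes open at once. The `check` callable is a hypothetical wrapper around a single scanner invocation (for example one proxycheck run); its command line is deliberately not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

Target = Tuple[str, int]  # (IP, port)


def scan_targets(
    targets: Iterable[Target],
    check: Callable[[str, int], bool],
    attempts: int = 20,
    max_workers: int = 64,
) -> Dict[Target, bool]:
    """Check each IP-port pair up to `attempts` times with bounded concurrency."""

    def check_one(target: Target) -> bool:
        ip, port = target
        for _ in range(attempts):
            try:
                if check(ip, port):
                    return True
            except Exception:
                continue  # any scanner failure counts as a failed attempt
        return False

    unique_targets = list(dict.fromkeys(targets))  # drop duplicates, keep order
    # max_workers caps how many checks run at once, bounding FD usage.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check_one, unique_targets))
    return dict(zip(unique_targets, results))
```

The worker count of 64 is an arbitrary illustrative default; a suitable value depends on the FD limit of the host and on how many descriptors each check consumes.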

socat

socat was not specifically developed for detecting open proxies, but this versatile network tool makes for a usable IPv6 and HTTPS-capable command line proxy detection tool which provides results via STDOUT. The project is actively developed and well supported.

As such socat can reasonably be used for open proxy detection via the command-line with the following caveats:

Just like with proxycheck the socat program does not support the rechecking of IP-port pairs (beyond timeout handling) so this logic should be managed by the host application to ensure the desired number of connection attempts is carried out. Overloaded proxy servers may fail to process the connection attempt in a number of different ways.
As with proxycheck, if the concurrent scan task count is significant it is essential for the host application to manage the threading and FD load.
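
As a rough sketch of how a host application might drive socat, the Python below builds a command line using socat's `PROXY` (HTTP CONNECT) and `SOCKS4` address types and checks STDOUT for an expected string. The function names, the protocol labels and the `expected` banner are assumptions for illustration; the target would be a service under the tester's control that sends a recognisable greeting on connect.

```python
import subprocess
from typing import List


def socat_command(protocol: str, proxy_ip: str, proxy_port: int,
                  target_host: str, target_port: int,
                  eof_wait: int = 10) -> List[str]:
    """Build a socat command that relays STDIO through the suspected proxy."""
    if protocol == "http-connect":
        address = f"PROXY:{proxy_ip}:{target_host}:{target_port},proxyport={proxy_port}"
    elif protocol == "socks4":
        address = f"SOCKS4:{proxy_ip}:{target_host}:{target_port},socksport={proxy_port}"
    else:
        raise ValueError(f"unsupported protocol: {protocol}")
    # -t keeps socat waiting for data from the target after STDIN reaches EOF
    return ["socat", "-t", str(eof_wait), "-", address]


def socat_probe(command: List[str], expected: bytes, timeout: float = 15.0) -> bool:
    """Run the command once and report whether `expected` appears on STDOUT."""
    try:
        done = subprocess.run(command, stdin=subprocess.DEVNULL,
                              stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                              timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        return expected in (exc.stdout or b"")
    except OSError:
        return False  # socat is not installed or could not be started
    return expected in done.stdout
```

Each call performs exactly one connection attempt, so the recheck logic and the concurrency limits described in the caveats above still have to be wrapped around it by the caller.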

References

1. Open proxy en.wikipedia.org (Web encyclopaedia)
2. SOCKS en.wikipedia.org (Web encyclopaedia)
3. HTTP en.wikipedia.org (Web encyclopaedia)
4. “Open Proxy List” duckduckgo.com (Search engine query)

Open proxy utilisation and demand

Subset of 421 HTTP CONNECT proxies sorted by utilisation percentage

Fig 1: Open proxy utilisation and demand. Due to configuration mistakes certain proxy servers reveal connection information. In a sampling of 421 affected open HTTP CONNECT proxy servers a majority showed utilisation close to the configured maximum connection limit. Whereas 60.1% of the sample had a utilisation of over 95%, a remarkable 54.6% had a utilisation load in excess of 99%. Only 14.2% of the examined proxy servers had a connection load of between 5% and 95%. Typical maximum connection limits varied between 600 and 2600.

Open proxy attrition

Number of days before an open proxy deactivates after becoming public

Fig 2: Open proxy attrition. Most open proxies disappear soon after they appear on public proxy lists. In a dataset of over 4 million open proxies the median lifespan was 3 days and the average lifespan (due to the long tail) was 21.39 days. Over 80% of proxies did not survive the first 15 days. NB: This dataset includes significant botnet-based proxy activity in Jan-Mar 2016 which was a period of increased volatility.

The cost of publicity

Proxy publication age (days) vs Number of attempts required for successful connection

Fig 3: The cost of publicity. Open proxies become increasingly unresponsive following an appearance on public proxy lists. It is arguable that this is due to excessive demand (as per Fig 1) and related effects.

The cost of detection

Number of connection attempts required for successful detection

Fig 4: The cost of detection. Due to congestion successful detection of a proxy usually requires more than one connection attempt. A single proxy scan will fail to detect the average proxy, and even 10 scan attempts will fail to detect about 2.7% of the proxies which were detectable with 20 attempts.

Weekday detection differences

How the day of the week influences the proxy detection landscape

Fig 5: Weekday detection differences in April 2016. The number of new proxies entering the public proxy lists increases noticeably around Fridays and Saturdays. Existing proxies tend to deactivate the most on Mondays and Fridays. Saturdays are a form of peak day with many new proxies, little attrition and relatively high detection cost (potentially signifying high proxy loads).

Profiling proxy hosts

Statistical prevalence of IRC-relevant characteristics

Fig 6: Profiling proxy hosts. IRC servers typically perform basic fingerprinting of incoming connections which includes validation of DNS and identd. A subset of 45,000 SOCKS4, SOCKS5 and HTTP CONNECT proxies were fingerprinted accordingly. Very few (0.27%) open proxies had an open identd port but nearly 1/3 had forward-confirmed reverse DNS as per RFC 1912.

Common proxy ports

List of the most common ports used by proxies per protocol

Fig 7: Common proxy ports. Note that ports can appear or disappear overnight due to malware changes etc. and therefore port configurations should be amended often for optimal results.

LinkedIn	Thomas M. Carlsson
GitHub	MannfredCom
Geo	Helsinki, Finland
IRC	Beige@EFnet