You can quickly get the most relevant information just by looking at the graphs on the right and reading the associated graph descriptions. The rest of this page's text is mainly intended as a description of background information, methodology, and detection techniques for anyone looking to replicate the results along with requisite "whitepaper filler."
Most people who stumble on this web page through a search engine or referral will probably be familiar with open proxies. As such this article will not focus on the basics of open proxy protocols since there is an abundance of such information available on the web. [1,2,3] Instead this article will focus on the lesser-known and less obvious nuances of confirming the existence of an open proxy on a remote system, and the intended audience is IT specialists involved in mitigating fraud, spam, hacking and other open proxy-related abuse.
Achieving a high detection rate of remote open proxies requires many considerations. The resources required tend to increase exponentially with the desired success rate, and perfect detection certainty for arbitrary external IPs can never be achieved due to numerous complicating factors. This article will first discuss the general dynamics of detecting open proxies along with associated statistical research, and will then discuss specific software considerations.
When an open proxy service appears on the Internet it is initially unknown to the Internet at large. There are many possible reasons for its appearance: a system administrator may have accidentally installed a web service or connection gateway with inappropriate access permissions, a computer virus or malicious script may have enabled it on an exploitable system, or the system owner may even have installed it on purpose. The open proxy can remain in this "active but unknown to the Internet at large" state for a considerable amount of time, and it's reasonable to assume that many such open proxies are never exposed or publicised.
As the Internet is increasingly being scanned and mapped by various actors the open proxy may eventually be detected by somebody, which could subsequently lead to the proxy IP and port being published on freely available Internet proxy lists. Once an open proxy becomes well-known its connection capacity is quickly overwhelmed by demand (see Fig 1), and it will start to appear on various open proxy blacklists. By this point a responsible system or network administrator will hopefully become aware of the issue and take steps to secure the exposed service. It is nonetheless worth noting that in a sample set of 4 million proxy detections (HTTP CONNECT, HTTP POST, SOCKS4, SOCKS5) between March 2015 and April 2016 the median lifespan of a publicised proxy was 3 days, whereas the average lifespan was 21.39 days (see Fig 2).
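The gap between a median and an average lifespan of this kind is what a long-tailed distribution produces: most publicised proxies disappear within days, while a minority survive for months and pull the average upward. The following minimal sketch illustrates the effect. The lifespans in it are invented for illustration and are not taken from the study's dataset.

```python
from statistics import mean, median

# Hypothetical lifespans in days (NOT the study's data): most proxies are
# short-lived, while a couple of long-lived outliers dominate the average.
lifespans_days = [1, 1, 2, 2, 3, 3, 4, 5, 30, 163]

typical = median(lifespans_days)   # middle of the sorted sample
average = mean(lifespans_days)     # sensitive to the long-lived outliers

print(f"median lifespan: {typical} days")
print(f"average lifespan: {average} days")
```

For capacity planning the median is arguably the better guide to how long a typical listing stays useful, while the average hints at how long stale entries may linger on a list.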
When considering what level of proxy detection reliability is required for a specific use case it's helpful to consider the proxy lifecycle vis-a-vis the amount of resources required. In order to detect hitherto undiscovered proxies a larger number of ports should be scanned, but fewer connection attempts per port will typically be required. For well-known proxy IPs it's probably only necessary to focus on the publicised ports, but as a tradeoff substantially more connection attempts are required to compensate for the high load that the proxy is likely under. If at all operationally possible, high quality open proxy DNS blacklists should be consulted for a second opinion regarding the reputation of any IP of interest (see section on Blacklist Review).
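The tradeoff between port breadth and retry depth can be expressed as a simple planning function. The sketch below is only an illustration of the idea: the names `ScanPlan`, `plan_scan` and `COMMON_PROXY_PORTS`, the port list, and the attempt counts of 3 and 20 are all assumptions chosen for the example rather than values prescribed by the research.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ScanPlan:
    ports: Tuple[int, ...]
    attempts_per_port: int


# Hypothetical list of ports often associated with proxy services.
COMMON_PROXY_PORTS = (80, 1080, 3128, 8080, 8118)

# Assumed attempt budgets: shallow for unknown hosts, deep for listed ones.
ATTEMPTS_UNKNOWN = 3
ATTEMPTS_PUBLICISED = 20


def plan_scan(publicised_ports: Iterable[int] = ()) -> ScanPlan:
    """Pick breadth (ports) versus depth (attempts) for one IP."""
    ports = tuple(sorted(set(publicised_ports)))
    if ports:
        # Well-known proxy: few ports, many attempts to cope with load.
        return ScanPlan(ports, ATTEMPTS_PUBLICISED)
    # Undiscovered proxy: many ports, few attempts per port.
    return ScanPlan(COMMON_PROXY_PORTS, ATTEMPTS_UNKNOWN)
```

Calling `plan_scan()` yields the broad, shallow plan, whereas `plan_scan([8080])` yields a narrow plan with the deeper retry budget.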
In the past proxy detection has relied on various degrees of anecdotal information and gut instinct. As a potentially significant information security topic it has been under-researched. In a bid to improve the current understanding of the topic a study was carried out between March and April 2016 with the aim of gathering additional hard facts and statistics on the state of open proxies and the requirements for successful detection. As the author himself has actively harvested open proxy data since 2009 for an established open proxy DNS blacklist (EFnet RBL) an abundance of complementary statistical data was also available for further research.
To create a more detailed open proxy detection dataset Internet search engines were dynamically consulted for publicly available lists of open proxies. In addition a static set of 27 known active websites that publish open proxy data was periodically polled. Through this methodology an average of 11612 suspected proxy IP-port pairs were collected every hour which (after filtering for duplicates) resulted in an average of 31890 unique IP-port pairs for every 12-hour caching period. For every IP-port pair acquired up to 20 connection attempts were made with the objective of establishing whether it hosted an operational SOCKS4, SOCKS5, HTTP CONNECT, HTTP POST or HTTP PUT proxy. This relatively large test pool yielded an average of 664.42 actual detected open proxies per hour. For successfully detected proxy IPs some further fingerprinting of the affected host was carried out (DNS and identd). Finally, before syncing the data to EFnet RBL a blacklist audit was carried out to determine how many of the open proxy-specific blacklists were already aware of the IP (for further information about the RBL testing results please see the section on Blacklist Review).
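The deduplication and bounded-retry steps of this methodology could be sketched roughly as follows. This is not the author's harvesting code: `unique_pairs`, `attempts_needed` and the `probe` callable are hypothetical names, and the actual protocol check (SOCKS4, SOCKS5, HTTP CONNECT and so on) is deliberately left to whatever `probe` the caller supplies.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Pair = Tuple[str, int]  # (IP address, port)


def unique_pairs(candidates: Iterable[Pair]) -> List[Pair]:
    """Drop duplicate IP-port pairs while keeping first-seen order."""
    return list(dict.fromkeys(candidates))


def attempts_needed(
    pair: Pair,
    probe: Callable[[Pair], bool],
    max_attempts: int = 20,
) -> Optional[int]:
    """Return the attempt number on which `probe` first succeeded.

    `probe` is a caller-supplied (hypothetical) function that returns True
    when a connection was successfully routed through the proxy. Socket
    errors and timeouts count as failed attempts. Returns None if every
    attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if probe(pair):
                return attempt
        except OSError:
            pass  # refused, reset or timed out: try again
    return None
```

Recording the attempt number rather than a plain yes/no is what makes it possible to study detection cost afterwards.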
One of the initial questions the study sought to answer was how many detection attempts are actually required to successfully route a connection through one of the publicly listed open proxies, and whether this number changes as the proxy becomes more widely known. As HTTP CONNECT and SOCKS proxies offer the most versatility and utility to potential abusers these were focused on initially, resulting in a final dataset of approximately 45,000 SOCKS4, SOCKS5 and HTTP CONNECT proxies. One of the key findings of this part of the research was that the longer a listed proxy remained active the more difficult it became to detect: the proxy publication age and the detection cost (number of connection attempts required) were strongly correlated (see Fig 3). In addition, it became apparent that the single open proxy detection attempt that is commonly employed by network scanners (e.g. BOPM, HOPM, nmap's proxy NSE scripts, proxycheck, etc.) is insufficient for detecting the average publicly listed open proxy (see Fig 4). Of all tested HTTP CONNECT, SOCKS4 and SOCKS5 proxies only 48.23% were detectable on the first connection attempt. Of all proxies detected with a maximum of 20 connection attempts only 83.93% were discoverable within 4 attempts, and even with 10 attempts the total coverage was 97.29%, leaving a further 2.71% undiscovered. Given the diminishing detection returns beyond 15 attempts it may be reasonable to conclude that 20 attempts "should be enough for everyone" (™) but the author intends to carry out further research to validate the claim in numerical terms.
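Coverage figures of this kind can be derived from a record of how many attempts each detected proxy required. The sketch below shows one way to compute the cumulative percentage, assuming such a record exists. The function name `cumulative_coverage` and the sample values are hypothetical and do not reproduce the study's data.

```python
from collections import Counter
from typing import Dict, Sequence


def cumulative_coverage(attempts: Sequence[int], max_attempts: int) -> Dict[int, float]:
    """Percentage of detected proxies found within n attempts, for each n.

    `attempts` holds, for every proxy that was eventually detected, the
    attempt number on which detection first succeeded.
    """
    if not attempts:
        raise ValueError("need at least one detected proxy")
    counts = Counter(attempts)
    covered = 0
    coverage = {}
    for n in range(1, max_attempts + 1):
        covered += counts[n]
        coverage[n] = 100.0 * covered / len(attempts)
    return coverage


# Hypothetical sample: attempt number of first success for ten proxies.
sample = [1, 1, 1, 1, 2, 3, 4, 7, 12, 20]
coverage = cumulative_coverage(sample, max_attempts=20)
for n in (1, 4, 10, 20):
    print(f"within {n:2d} attempts: {coverage[n]:.1f}%")
```

Note that the percentages are relative to the proxies detected within the attempt ceiling, so coverage at the ceiling is 100% by construction and says nothing about proxies that would have needed more attempts.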
Sociocultural aspects also stood out in the proxy detection dataset. When comparing the average detection cost with the emergence and attrition of proxy IPs there were discernible differences between the standard workweek and the weekend (measured in UTC, see Fig 5). The middle of the workweek (Tuesday, Wednesday and Thursday) was relatively stable, and volatility increased around the weekend. The author is not currently offering any technical explanations for this pattern (the number of publicised proxies remains relatively consistent from one day to another with a slight uptick on Sundays and Mondays) but notes that various hypothetical sociocultural causes for the weekday differences are conceivable.
The discoveries of the current research can be summarised by the following bullet points:
Most software designed for detecting open proxies tends to come with sufficient documentation and as such any replication of said information will not be attempted. Instead, "best practices" recommendations for commonly used or usable programs will be offered in light of the above research.
The well-established BOPM project has been forked and is actively developed as HOPM. This powerful proxy scanner was primarily designed for use with IRC but also comes with a reusable scanner library (libopm).
The author's current "best practices" recommendations for employing HOPM / libopm are:
These scripts perform three connection attempts per proxy and as such are a better starting point than the average proxy scanner's single connection attempt, but even with three connection attempts more than 20% of publicly listed proxies will be undetectable. As such, the author's "best practices" for employing nmap for open proxy scanning are:
proxycheck is an older Linux/Unix command line program which performs configurable proxy scans of an IP and returns results via STDOUT. At the time of writing the software has not been developed since 2004 and lacks support for IPv6 and HTTPS.
Some aspects of note when using proxycheck are:
socat was not specifically developed for detecting open proxies, but this versatile network tool makes for a usable IPv6 and HTTPS-capable command line proxy detection tool which provides results via STDOUT. The project is actively developed and well supported.
As such socat can reasonably be used for open proxy detection via the command-line with the following caveats: