
Polipo and the GFW

Polipo is a web proxy. Combined with ssh, it lets you bypass a firewall to reach websites that cannot be visited directly, or browse the web more safely from an untrusted network. It is a very useful tool.

However, behind China's GFW (Great Firewall), a rare failure condition can occur.

Typical usage

You are on machine a and want to access a website running on b, but a firewall is blocking you from doing so. Luckily, you have access to another machine c on the other side of the firewall that can reach b. The only access you have on c is ssh. With ssh and polipo you can set up a proxy on c and expose it securely through ssh, so that you, and only you, can use it.

Polipo setup on c

There is nothing special; the default settings should be fine. You only need it to listen on a local port; the default is 8123.

SSH setup on c

Again, nothing special. For your safety, it is strongly recommended to use public key authentication, and public key authentication only.

Client side setup

All you need to do is:

    ssh -L 8123:localhost:8123 MACHINEC

You now have a web proxy on local port 8123 that can visit every site that c can visit.
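With the ssh session open, client programs still have to be told to use the forwarded port. A minimal sketch, assuming a POSIX shell and curl on machine a; `use_tunnel_proxy` is a hypothetical helper name, not part of polipo or ssh:

```shell
# Hypothetical helper: route this shell's HTTP(S) clients through the
# port forwarded by "ssh -L 8123:localhost:8123 MACHINEC".
use_tunnel_proxy() {
    port="${1:-8123}"                       # default matches polipo's port
    export http_proxy="http://localhost:${port}"
    export https_proxy="http://localhost:${port}"
}

use_tunnel_proxy 8123

# curl reads these variables; the proxy can also be given explicitly:
#   curl -x http://localhost:8123 -I http://example.com/
```

Browsers usually have their own proxy setting; point the HTTP and HTTPS proxy at localhost, port 8123.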

HTTP and tunnel

Polipo is an HTTP proxy. It understands HTTP and can do a lot behind the scenes.

  • caching. It caches files so that it does not need to fetch them again the next time.
  • multiplexing and pipelining. It reuses connections to the server across multiple clients to increase efficiency.

So, web performance is boosted and bandwidth is saved.

However, there is another way to access the web: HTTPS. HTTPS is HTTP over SSL, so the communication is protected end to end. Because it is end to end, none of the caching, multiplexing, or pipelining can work. Polipo is a faithful HTTPS proxy as well; it does nothing clever for HTTPS and simply forwards a raw TCP connection blindly. Polipo calls this a tunnel. The performance-boosting features are gone: there is no caching, and client side and server side connections are mapped one to one and spliced together.
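To make the tunnel concrete: an HTTPS client asks an HTTP proxy for a raw connection with the HTTP CONNECT method, and after a successful reply the proxy only copies bytes in both directions. The sketch below merely prints such a request; `connect_request` is a hypothetical helper name and example.com:443 is a placeholder destination.

```shell
# Print the request a client sends to an HTTP proxy to open a tunnel.
# Arguments: destination host, destination port.
connect_request() {
    printf 'CONNECT %s:%s HTTP/1.1\r\nHost: %s:%s\r\n\r\n' "$1" "$2" "$1" "$2"
}

connect_request example.com 443

# To try it against the forwarded proxy (assumes nc is installed):
#   connect_request example.com 443 | nc localhost 8123
```

Everything after the proxy's reply is opaque to it, which is why it cannot cache or multiplex this traffic.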

China's GFW

China blocks a lot of content on the web. It does so by:

  • DNS tricks
  • dropping connections, at various stages

If polipo runs inside the GFW, it is subject to the same limitations as a normal web client such as a browser. However, it behaves slightly differently. We have observed 2 kinds of dropped connections caused by the GFW:

  1. connection dropped at the beginning. To the client this looks the same as the server being offline. This is the common behavior for the majority of blocked sites.
  2. connection dropped at the last stage of the TCP 3 way handshake. Some blocked sites, curiously Google's sites, are blocked this way.

Combined with HTTP and HTTPS, this gives a 2x2 matrix. For HTTP, polipo makes the connection asynchronously and returns errors of its own; for HTTPS, polipo makes the connection synchronously and the client sees the same effect as without the proxy:

  • case 1 HTTP. Polipo returns an error saying the remote cannot be found.
  • case 2 HTTP. Polipo returns a remote connection timeout error.
  • case 1 HTTPS. No message is returned, but the client will eventually see that the remote site cannot be found.
  • case 2 HTTPS. Your client will hang for a while and then time out locally.

From the client's point of view, case 1 and case 2 HTTPS look the same, but for polipo they are not. In case 2 HTTPS, polipo sets up a tunnel that is only half established. When the client side times out, the server side is still at the SYN_WAIT stage and stays stuck there forever; there is no timeout. This differs from case 2 HTTP, where polipo acts as a full HTTP and TCP client with timeout protection. The end result is leaked file descriptors. When enough of them accumulate, polipo runs out of file descriptors and stops functioning.
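One way to watch such a leak build up is to count a process's open file descriptors over time. A minimal sketch, assuming Linux (it reads /proc); `fd_count` is a hypothetical helper name, and it is demonstrated on the current shell rather than on polipo:

```shell
# Count the open file descriptors of a process (Linux only: uses /proc).
# Prints 0 if the process does not exist or cannot be inspected.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Demonstrated on this shell. For polipo, pass its PID instead, e.g.
#   fd_count "$(pidof polipo)"     (assumes pidof and a single polipo process)
echo "open fds: $(fd_count $$), per-process limit in this shell: $(ulimit -n)"
```

A count that keeps climbing toward the limit while the proxy is mostly idle would be consistent with the leak described above. Note that polipo's own limit may differ from the one printed for the shell.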

Remedy

It is a polipo bug. Upstream may take some time to fix it; in the meantime, all I can do is:

  • block Google within polipo, so that case 2 HTTPS does not happen as often.
  • restart polipo periodically.
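The restart could be automated so that it only happens when descriptors are actually running low. This is a sketch under several assumptions, none of which come from polipo itself: Linux /proc, a single polipo process that `pidof` can find, `service polipo restart` as the restart command, and an arbitrary threshold of 800. The function names are hypothetical.

```shell
# Succeeds when the descriptor count has reached the limit.
over_limit() {
    [ "$1" -ge "$2" ]
}

# One watchdog step: restart polipo if it holds too many descriptors.
# Argument: threshold (defaults to 800, an arbitrary illustrative value).
check_polipo() {
    pid=$(pidof polipo) || return 0          # not running: nothing to do
    count=$(ls "/proc/$pid/fd" | wc -l)
    if over_limit "$count" "${1:-800}"; then
        service polipo restart               # assumption: adapt to your init system
    fi
}

# Run check_polipo from cron or a sleep loop, for example every few minutes.
```

Pick a threshold comfortably below the descriptor limit polipo actually runs with.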

It is sad that the GFW causes yet more grief.