WonderProxy Blog

June 22, 2011

Increasing Max-Connection time for VPN

Filed under: Uncategorized — Paul Reinheimer @ 4:13 pm

A few weeks ago we launched a new VPN Service aimed at workers on the go. Since launch we’ve improved the service twice (without touching the cost). We’ve adjusted the service so that everyone has access to both a North American server and a European one, and we’ve increased the max connection time from 4 to 12 hours.

The original 4 hour limit stems from how our customers were originally using the VPN service: to test their websites, including Flash and Silverlight content. For that kind of testing the four hour limit was reasonable.

Since launching the VPN service on its own, we’ve received an email as well as a comment or two from friends on that limit. We weren’t sold on increasing it (as there are other repercussions of that limit) until our friend Helgi at orchestra.io pointed out:

But here is my main problem, the VPN disconnected and everything I had running automagically connected on an unprotected network and thus negating the point of a VPN – This could happen while I go to the toilet or nip out for lunch/coffee and thus exposing my data longer than I’d want (e.g. until I come back).

Leaving our customers with an insecure connection, especially one where they might not be in a position to notice the change, is definitely a problem. Accordingly, we’ve now raised the maximum connection time to 12 hours.

March 30, 2011

Obtaining an Extended Validation SSL Certificate

Filed under: Uncategorized — Paul Reinheimer @ 4:41 pm

We decided to obtain an Extended Validation (EV) SSL certificate for WonderProxy and start running our website entirely through it (no standard http:// pages, just https:// for everything). Despite lots of regular SSL experience, the process was rather foreign to us. We decided to obtain the certificate through GoDaddy for cost reasons.

Steps

  1. Register with GoDaddy and purchase an EV certificate token
  2. Flip over to their Certificate system, use the token to initiate a request
  3. Do the fun bits with OpenSSL to generate a Certificate Signing Request
  4. Hand that data off to GoDaddy

    Now this is the part where I thought the extra fees I was paying for the certificate would come into play, and GoDaddy’s team would leap into action researching my request. Not so much. In fact, what occurs is that your own highly paid lawyers or accountants leap into action, and bill you by the minute.

  5. Receive instructions from GoDaddy detailing the steps your Lawyer or Registered Accountant needs to follow. You need either a legal or accounting(?) opinion about the validity of your company and registration. The opinion letter has eight key elements:
    1. Your corporation is a valid, active, legal entity.
    2. You conduct business under this corporate name, and it is duly registered with the appropriate government agency
    3. The person signing & submitting the request is authorized to do so on behalf of the company
    4. The person approving the request is also authorized to do so (these were both me, it’s a small company)
    5. The company has a physical place of business and that address
    6. The company has a phone number and that phone number
    7. The company has an active bank account
    8. The company owns the domain in question

    Number 7 there caused us a few issues. Due to the official Quebec registrar being closed we hadn’t obtained a Quebec registration. We were registered federally, and had a provincial tax number, just not an official enterprise number. Without this enterprise number we were unable to obtain a bank account (or verify our PayPal account), so several things were delayed all for the want of a number.

  6. Submit opinion letter to GoDaddy
  7. Fill out a few forms from GoDaddy confirming the request, including the signer and approver, file with GoDaddy
  8. GoDaddy phones the lawyer who issued the opinion letter (using the phone number in some sort of lawyer registry; in the US this would be the Bar) to confirm the information and that they in fact issued the opinion letter
  9. GoDaddy phones the signer and possibly the approver (I was both people, so there was only one phone call) to confirm the details on their forms
  10. An internal GoDaddy “Audit” department reviews the data (this isn’t the person you deal with while completing the steps)
  11. Certificate Issued

Total cost was probably ~$400 in professional services and GoDaddy fees. Our goal, clearly, is to have this cost outweighed by the level of trust and security the average user has for an EV certificate. Now that we’re offering dedicated VPN plans, protecting our users’ privacy from start to finish is even more important.

March 21, 2011

Usability Testing

Filed under: Uncategorized — Paul Reinheimer @ 1:21 pm

Several weeks ago, while attending the fantastic Webstock conference, I also attended a full day tutorial by Christine Perfetti of Perfetti Media on Usability Testing. The tutorial was fantastic. I’ve been interested in usability for years, and my shelf has several books on the subject (Designing Web Usability, Prioritizing Web Usability, and Don’t Make Me Think), but I still learned a lot.

During the tutorial we performed actual usability tests on our fellow attendees on our own websites. One of the tasks I assigned to my victims/volunteers was simple (or so I thought):

You work for a small company with a total of 5 developers and testers and require access to seven servers, mostly in the US but one or two in Europe would be great. Which plan will meet your needs?

Having developed the site, this is a question I can answer in seconds. It took the testers, however, several frustrating tries to actually find the information; generally they found it by exhausting all other options. The problem I discovered was the text of the link to the details page, which read “Sign Up”.

The testers read that link as being a “checkout” link, as opposed to a way to get more information. Thinking about it more critically, it was rather silly. The process of links you follow to actually purchase read: Sign Up -> Details -> Buy Now. The intermediate step seems like a step in the wrong direction. Placing Sign Up on the front page is a great call to action, but it’s not what the link does, and it hides critical details from enquiring minds.


Accordingly, we’ve now changed the text of the link to “Service Plans” (and tragically lost the snowman which has been with us since the start). This provides a much more sensible series of links “Service Plans” -> Details -> Buy Now. The Details button itself is still quite ugly, but that’s a problem for a future post.

I don’t expect this to have a radical and immediate effect on sales, but I hope I have made the site less confusing for the prospective client. I’d highly recommend usability testing (and Christine in particular as a trainer/facilitator) for any developer seeking to improve their site.

February 9, 2011

Miles per Millisecond, a Look at the WonderProxy Network

Filed under: Uncategorized — Paul Reinheimer @ 2:33 pm

Update: This post has been updated to account for pings being round trip times while distances are only one way. The original post failed to account for this in the last two tables; thanks to Steve for pointing this out.

Update 2: You can now view updated data with ping times between cities at our new WonderNetwork site.

Running a global network of servers for GeoIP application testing leaves you with a lot of servers and some interesting questions, and occasionally an interesting combination of the two. I found myself asking if we could compare ping times to physical distances, to see how efficient the internet was, and to confirm my suspicion that transferring data between Sydney, Australia and Fremont, California would be faster, mile per mile, than transferring between Boston and Fremont. My reasoning was that Australia → United States is one long cable, whereas within the US traffic would be switched frequently, which is slower.

First we generated a script to ping every city in our network from every other city in our network (Boston → New York and New York → Boston would both be executed). This took something like ten hours (we’ve since smartened up and now execute the test in parallel). Our results look something like this:
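As a rough illustration of the "every city to every other city, in parallel" idea, here is a minimal Python sketch. It is not our actual script: `ping_matrix` and the `measure` callable are hypothetical names, and `measure(source, target)` stands in for whatever mechanism asks the source server to ping the target and report the time.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def ping_matrix(cities, measure, max_workers=8):
    """Return {(source, target): result} for every ordered pair of distinct cities.

    `measure` is a hypothetical callable taking (source, target); in practice it
    would ask the source server to ping the target and return the time in ms.
    """
    # permutations() yields ordered pairs, so (A, B) and (B, A) are both tested
    pairs = list(permutations(cities, 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with pairs
        results = list(pool.map(lambda pair: measure(*pair), pairs))
    return dict(zip(pairs, results))
```

With ten cities this produces 90 measurements, which is why running them one after another took so long.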

Ping between cities (ms, round trip)

          Baltimore  Brisbane    Dallas   Fremont     Milan    Moscow  New York     Paris    Sydney    Zurich
Baltimore         -    239.38     35.64     79.68    104.70    141.15    174.71     97.71    251.07    114.11
Brisbane     238.32         -    221.60    174.38    350.12    357.21    245.89    335.79     33.45    346.92
Dallas        34.53    220.14         -     44.00    140.00    168.00     36.80    130.94    221.72    127.74
Fremont       79.39    176.07     44.57         -    167.85    214.59     78.43    176.22    184.91    167.16
Milan        104.84    339.54    139.86    170.73         -     67.08    112.17     21.61    344.70     11.17
Moscow       140.82    366.42    169.04    209.00     67.07         -    131.29     56.74    387.09     60.29
New York     174.61    246.87     39.46     77.49    111.60    131.34         -     78.33    319.33    101.03
Paris         97.75    337.47    131.79    175.19     21.29     56.13     77.31         -    348.84    108.06
Sydney       262.50     42.98    222.25    191.96    345.21    376.57    261.24    354.16         -    358.75
Zurich       102.18    346.71    127.59    173.61     11.25     60.32    100.24    108.25    345.05         -

This table shows us what we expected. Sydney is far away from most of our other servers, so the ping time is high. Our Fremont server is incredibly well connected with a top tier provider, so it has good routes. The Baltimore → New York ping may raise your eyebrows; it certainly caught our attention. A quick look at the traceroute shows:

traceroute to newyork.wonderproxy.com (69.147.239.239), 30 hops max, 60 byte packets
 1  173.246.103.253 (173.246.103.253)  0.551 ms  0.557 ms  0.604 ms
 2  4xe-pc400.vcore1-dc1.balt.gandi.net (173.246.96.33)  0.606 ms  0.661 ms  0.693 ms
 3  xe3-4-core4-d.paris.gandi.net (217.70.176.233)  97.342 ms  97.325 ms  97.272 ms
 4  p251-core3-d.paris.gandi.net (217.70.176.253)  123.541 ms  123.547 ms  123.528 ms
 5  linx.ge1-0.cr01.lhr01.mzima.net (195.66.225.15)  119.101 ms  119.085 ms  119.017 ms
 6  te0-5.cr1.nyc2.us.packetexchange.net (69.174.120.89)  181.881 ms  176.377 ms  176.349 ms

Our provider in Baltimore seems to be routing all of their traffic through their central datacenter in Paris, which is rather sub-optimal (we’ve opened a ticket).

Next we needed to determine how far apart these cities were, that being the other half of the equation. We worked with the city center when more specific information on location wasn’t available, and generated the approximate latitude and longitude for every server in our network using publicly available sources. We then used the Excel calculation script from Movable Type Scripts to determine distances, and came up with this:

Distance between cities (miles)

          Baltimore  Brisbane    Dallas   Fremont     Milan    Moscow  New York     Paris    Sydney    Zurich
Baltimore         -      9553      1216      2448      4211      4853       171      3819      9845      4122
Brisbane       9553         -      8375      7138     10162      8795      9688     10349       457     10143
Dallas         1216      8375         -      1466      5355      5788      1376      4956      8638      5255
Fremont        2448      7138      1466         -      5983      5914      2564      5596      7481      5857
Milan          4211     10162      5355      5983         -      1429      4040       400     10348       137
Moscow         4853      8795      5788      5914      1429         -      4694      1554      9060      1371
New York        171      9688      1376      2564      4040      4694         -      3648      9993      3952
Paris          3819     10349      4956      5596       400      1554      3648         -     10600       304
Sydney         9845       457      8638      7481     10348      9060      9993     10600         -     10355
Zurich         4122     10143      5255      5857       137      1371      3952       304     10355         -
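For readers who would rather not use a spreadsheet, a great-circle distance can be computed from two latitude/longitude pairs with the haversine formula. This is only a sketch: whether the spreadsheet we used applies exactly this formula is an assumption, as is the Earth radius constant, and `great_circle_miles` is a name made up for this example.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in miles; an assumed constant, the post does not state one.
EARTH_RADIUS_MILES = 3958.8


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    # min() guards against floating point error pushing sqrt(a) just above 1
    return 2 * EARTH_RADIUS_MILES * asin(min(1.0, sqrt(a)))
```

Because this treats the Earth as a sphere, the results are approximations, which is fine for a comparison at this scale.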

Finally, we merged the two tables, did some math and came up with this:

Miles per Millisecond

          Baltimore  Brisbane    Dallas   Fremont     Milan    Moscow  New York     Paris    Sydney    Zurich
Baltimore         -     79.81     68.24     61.44     80.44     68.76      1.96     78.17     78.43     72.25
Brisbane      80.17         -     75.59     81.87     58.05     49.24     78.80     61.64     27.33     58.48
Dallas        70.42     76.09         -     66.63     76.50     68.90     74.78     75.70     77.92     82.28
Fremont       61.67     81.08     65.78         -     71.29     55.12     65.38     63.51     80.92     70.08
Milan         80.33     59.86     76.58     70.09         -     42.61     72.03     37.02     60.04     24.52
Moscow        68.92     48.00     68.48     56.59     42.61         -     71.51     54.78     46.81     45.48
New York       1.96     78.49     69.75     66.18     72.40     71.48         -     93.14     62.59     78.24
Paris         78.14     61.33     75.21     63.88     37.58     55.37     94.38         -     60.77      5.63
Sydney        75.01     21.26     77.73     77.94     59.95     48.12     76.50     59.86         -     57.73
Zurich        80.68     58.51     82.37     67.47     24.36     45.46     78.85      5.62     60.02         -

Here things start to look a bit better for several of the connections. In the first chart, which looked at raw ping times, Sydney fared poorly. Here, accounting for the extreme distance between Sydney and the majority of our network, we see that, mile for mile, it’s actually doing quite well. Other links like Paris → Milan that were looking quite good previously are now exposed as being rather inefficient.

While we’re examining the efficiency of our network, this is the chart we’ll use. Simple ping times tell a story (we use SmokePing to monitor consistency and packet loss), but not the whole story. From this we get a more realistic view of how our connections are performing.
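The math behind the miles-per-millisecond figures is small enough to show. Since a ping is a round trip and the distance is one way, the distance is doubled before dividing. A minimal sketch (the function name is made up for this example):

```python
def miles_per_ms(distance_miles, round_trip_ms):
    """Effective speed of a ping: the packet covers the distance twice per round trip."""
    return 2 * distance_miles / round_trip_ms


# Baltimore to Brisbane: 9553 miles, 239.38 ms round trip
print(round(miles_per_ms(9553, 239.38), 2))  # 79.81
```

This reproduces the Baltimore to Brisbane cell. Some other cells differ in the last digit when recomputed this way, presumably because the distances shown above are rounded to whole miles.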

One last experiment

Networks are fast, but just how fast? In a vacuum light travels roughly 186,000 miles/second, or 186 miles/millisecond. Light travels slower through fiber optics, on average about 35% slower, which gives us ~120.9 miles/millisecond. Let’s look at the speed of pings across our network, as a percentage of that theoretical maximum:

Network speed as a percentage of the speed of light

          Baltimore  Brisbane    Dallas   Fremont     Milan    Moscow  New York     Paris    Sydney    Zurich
Baltimore         -     66.02     56.44     50.82     66.53     56.88      1.62     64.66     64.87     59.76
Brisbane      66.31         -     62.52     67.72     48.01     40.73     65.18     50.98     22.60     48.37
Dallas        58.25     62.93         -     55.12     63.27     56.99     61.85     62.61     64.45     68.05
Fremont       51.01     67.06     54.41         -     58.96     45.59     54.08     52.53     66.93     57.96
Milan         66.44     49.51     63.34     57.97         -     35.24     59.58     30.62     49.66     20.28
Moscow        57.01     39.71     56.64     46.81     35.25         -     59.15     45.31     38.72     37.62
New York       1.62     64.92     57.69     54.74     59.88     59.12         -     77.04     51.77     64.71
Paris         64.63     50.73     62.21     52.84     31.08     45.80     78.06         -     50.27      4.65
Sydney        62.04     17.59     64.29     64.47     49.59     39.80     63.28     49.51         -     47.75
Zurich        66.74     48.40     68.13     55.81     20.15     37.60     65.22      4.65     49.65         -

For this comparison to be fair cables would need to be run directly between cities, which is clearly not the case. We still think it’s interesting. Hitting 68% of the theoretical best speed is quite remarkable.
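For anyone wanting to repeat the comparison with their own numbers, the percentage is the round trip speed divided by the speed of light in fiber. A small sketch using the figures quoted above (186 miles/ms in a vacuum, roughly 35% slower in fiber); the constant and function names are made up for this example:

```python
LIGHT_VACUUM_MILES_PER_MS = 186.0  # figure quoted in the post
FIBER_SLOWDOWN = 0.35              # "on average about 35% slower"


def percent_of_fiber_light_speed(distance_miles, round_trip_ms):
    """Ping speed as a percentage of the speed of light in fiber (~120.9 miles/ms)."""
    fiber_speed = LIGHT_VACUUM_MILES_PER_MS * (1 - FIBER_SLOWDOWN)
    observed = 2 * distance_miles / round_trip_ms  # round trip covers the distance twice
    return 100 * observed / fiber_speed


# Baltimore to Brisbane: 9553 miles, 239.38 ms round trip
print(round(percent_of_fiber_light_speed(9553, 239.38), 2))  # 66.02
```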

January 20, 2011

Improving Site Performance

Filed under: Uncategorized — Paul Reinheimer @ 9:48 am

Our site hasn’t really been our focus over the past months; instead I’ve been concentrating on acquiring new network locations, while Will has been improving our server setup and maintenance architecture (we’ve blogged about Setting up Proxy Servers and Managing 30+ servers previously). More recently we’ve been taking a harder look at how the site performs, both for us and for our users, and found it lacking.

Examining our main page’s performance with XHGui quickly revealed that a considerable amount of time was being spent generating the information displayed in the footer (server list, server country list, and proxied traffic). This data was shuffled off to APC’s user storage mechanism, removing it from the average page load entirely. Google Webmaster Tools still reported a startlingly high average pageload time:

Google Webmaster Tools analysis of site performance

This was quite surprising as the site seemed pretty snappy overall. Further investigation showed that server specific pages loaded more slowly (3-5 seconds!). Since our goal is to provide proxies for GeoIP testing, having server specific pages load slowly is sub-optimal. Looking at the pages with YSlow and Page Speed reveals that the real culprit is the embedded Google Map. Switching to use a static map greatly reduced page load time (to ~800ms). This also reduced functionality, as the map is no longer dynamic, but we plan on switching to a combined static & dynamic system in the future.

If you’re interested in front end performance, High Performance Web Sites and Even Faster Web Sites are invaluable.

Reading through the suggestions from YSlow a bit more closely, then diving into the Apache documentation I also managed to find a few quick gains by configuring our web server to do a bit more work for us:

  ExpiresActive On
  ExpiresByType image/png "access plus 1 month"
  ExpiresByType text/css "access plus 1 month"
  ExpiresByType image/jpeg "access plus 1 month"
  <Location />
    SetOutputFilter DEFLATE
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    BrowserMatch ^Mozilla/4\.0[678] no-gzip
    BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
  </Location>

YSlow will tell you to turn off ETags under the default rule-set. If you’re running with a single web server this is bad advice. You may want to select the Small Site or Blog ruleset to get the most out of the tool. Moving forward we may decide to re-organize our JavaScript code to make expiry rules for it easy (we can’t set distant expiry for all JavaScript documents as our live status bar relies on it); for now we’ll leave them as is. We’re happy with our new scores:

Screenshot showing our A grade with YSlow

YSlow - A Grade

Screenshot showing our grade of 93 within Page Speed

Page Speed - 93/100

Having sorted out the low hanging fruit on the front end, I looked at the account pages and the administrative tools we’re using. Performance there was abysmal, with some pages taking at least 10 seconds to load. The pages with the worst performance were the ones displaying any sort of usage statistic; the very worst being ones that displayed aggregate statistics for all users. Our usage table, built from the Squid logs, has nearly a million rows. Despite being indexed, there’s still a lot of data to aggregate and sum.

With an eye toward improving performance I decided to build some summary tables. The first one aggregates usage by user, by server, by day. This summary table was roughly 1/23rd the size of the original usage table, which makes sense since it rolls up the 24 hourly reports into one. This table is considerably quicker to query, and I started rolling it out to various portions of the admin section immediately.

Table indicating much higher performance using summary tables

While these numbers are still rather pathetic, remember that these are admin actions, not forward facing pages. Optimizing for these would be folly; the time would be much better spent working on outward facing pages read by users and search engines alike. The significant increase here will simply make managing the system a speedier task.
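To make the roll-up concrete, here is a small in-memory sketch of what the summary table holds: hourly rows collapsed to one row per user, per server, per day. The real work happens in MySQL; the tuple layout, the function name, and the use of UTC for day boundaries are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_summary(hourly_rows):
    """Collapse hourly usage rows into per-day totals.

    hourly_rows: iterable of (user_id, server_id, unix_timestamp, bytes).
    Returns {(user_id, server_id, date): total_bytes}. Days are cut at UTC
    midnight, which is an assumption for this sketch.
    """
    totals = defaultdict(int)
    for user_id, server_id, ts, nbytes in hourly_rows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        totals[(user_id, server_id, day)] += nbytes
    return dict(totals)
```

Up to 24 hourly rows for the same user and server fold into a single entry, which is where the roughly 24-fold reduction in table size comes from.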

Knowing that the summary table is going to be useful, we need to keep it up to date. To accomplish this task we’re running this query after every log rotation (a process described in our post Squid log parsing for proxy billing). Luckily I’ve got friends at Percona (authors of the MySQL Performance Blog) who gave me a hand with crafting the query:

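-- Roll hourly usage rows up into one row per user, group, server and day
INSERT INTO sum_usage_daily
SELECT
	`user_id`,
	`group_id`,
	`server_id`,
	date(`timestamp`) AS `date`,
	sum(`bytes`) AS `bytesSum`
FROM
	`usage`
WHERE
	`server_id` IS NOT NULL
	-- whole days only: from midnight two days ago up to midnight today
	AND `timestamp` BETWEEN date_sub(date(NOW()), INTERVAL 2 DAY) AND date(NOW())
GROUP BY
	`date`,
	`user_id`,
	`server_id`,
	`group_id`
ORDER BY
	NULL
-- overwrite the stored total when a row for this key already exists
ON DUPLICATE KEY UPDATE bytes = VALUES(bytes);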

Note: ON DUPLICATE KEY UPDATE had numerous bugs prior to MySQL 5.0.38; ensure you’re on a recent version before mimicking this query with your own data.

This query looks at the past two days of traffic, either inserting new records or updating existing ones. The AND timestamp BETWEEN date_sub(date(NOW()), INTERVAL 2 DAY) AND date(NOW()) clause ensures we’re looking at full days, rather than the last 48 hours (the latter would result in incorrect summaries for entire days). This keeps the summary table up to date throughout the day and ensures that yesterday’s data is correct as well.
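The difference between "full days" and "the last 48 hours" is easy to see with a few lines of code. This sketch mirrors the two bounds in the BETWEEN clause, date_sub(date(NOW()), INTERVAL 2 DAY) and date(NOW()); the function name is made up for this example.

```python
from datetime import datetime, timedelta


def summary_window(now):
    """Return (start, end) matching the query's BETWEEN bounds.

    Both bounds are truncated to midnight, so the window always covers
    whole days rather than a sliding 48 hour period.
    """
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight - timedelta(days=2), today_midnight
```

Run at 9:48 am on January 20, this gives midnight on January 18 through midnight on January 20, whereas a plain "now minus 48 hours" would start at 9:48 am on the 18th and cut that day's summary short.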

My only regret was changing some of the column names in the summary table. While “date” represents the contents of the column better than “timestamp”, it did mean that column references in code had to be changed rather than just switching the table reference. Other than that the conversion has been quite quick and painless.

Having reined in the performance of the site, it’s time to look at adding new site features, and a few new products. More on those later.

January 6, 2011

Squid Log Parsing for Proxy Billing

Filed under: Uncategorized — Paul Reinheimer @ 7:31 pm

Parsing logs from Squid is a routine task for us: we do it on a regular basis (currently hourly) to bill our customers for used traffic. The process involves a script on our central server connecting to each remote server in turn, rotating the logs on that machine, then pulling the old one down to the local system. The system silently ignores servers that are down, since we’re already kept up to date about outages by Nagios. The script will warn if it’s able to connect but anything goes awry (permission errors, premature disconnect, etc.). By rotating the logs using a central script, rather than doing it automatically on each machine, we mitigate the case of a log being rotated but not transferred back to the central server (due to a down or unreachable server).

WonderProxy Log Handling
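The control flow of that central script can be sketched as follows. This is an illustration of the behaviour described above, not our actual script: `collect_logs`, `connect`, and `rotate_and_fetch` are hypothetical names, with the two callables standing in for whatever remote access mechanism is used.

```python
import logging


def collect_logs(servers, connect, rotate_and_fetch):
    """Rotate and fetch logs from each server in turn.

    connect(server) is assumed to raise ConnectionError when a server is down;
    rotate_and_fetch(session) is assumed to return the local path of the
    fetched log. Both are hypothetical stand-ins for this sketch.
    """
    fetched = {}
    for server in servers:
        try:
            session = connect(server)
        except ConnectionError:
            # Server is down: skip silently, monitoring already reports outages.
            continue
        try:
            fetched[server] = rotate_and_fetch(session)
        except Exception as exc:
            # We connected but something went awry: warn and move on.
            logging.warning("log collection failed on %s: %s", server, exc)
    return fetched
```

Because rotation only happens once the central script has connected, an unreachable server keeps writing to its current log until the next successful run.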

The log parser is called with a single parameter, the path to the log in question; from that it determines the name of the server from which the logs originated. Keeping most of the scripts largely oblivious to the source of their data has allowed us to expand our network significantly without any changes to the scripts (the most recent change was moving from a 4 hour rotation cycle to a 1 hour cycle). The logs we receive from Squid look like this:

1294011828.214   1309 184.163.123.123 TCP_MISS/200 10032 GET http://wonderproxyblog.com/ paul DIRECT/76.74.254.120 text/html
1294011828.414    581 184.163.123.123 TCP_MISS/404 5564 GET http://wonderproxyblog.com/.well-known/host-meta paul DIRECT/72.233.2.58 text/html
1294011828.784      0 184.163.123.123 TCP_MEM_HIT/200 943 GET http://s0.wp.com/wp-content/themes/h4/global.css? paul NONE/- text/css
1294011828.787      0 184.163.123.123 TCP_MEM_HIT/200 791 GET http://s2.wp.com/wp-includes/js/l10n.js? paul NONE/- application/x-javascript
1294011828.795      0 184.163.123.123 TCP_HIT/200 9024 GET http://s.gravatar.com/js/gprofiles.js? paul NONE/- text/javascript
1294011828.858     66 184.163.123.123 TCP_MISS/200 2158 GET http://b.scorecardresearch.com/beacon.js paul DIRECT/96.17.156.19 application/x-javascript
1294011828.871      0 184.163.123.123 TCP_MEM_HIT/200 1511 GET https://ssl-stats.wordpress.com/w.js? paul NONE/- application/x-javascript
1294011828.928    139 184.163.123.123 TCP_MISS/200 2749 GET http://edge.quantserve.com/quant.js paul DIRECT/64.94.107.11 application/x-javascript
1294011828.997    205 184.163.123.123 TCP_HIT/200 27230 GET http://s1.wp.com/wp-includes/js/jquery/jquery.js? paul NONE/- application/x-javascript
1294011829.091    334 184.163.123.123 TCP_MISS/200 414 GET http://wordpress.com/remote-login.php? paul DIRECT/74.200.247.60 text/html

We then parse that output with a regular expression (regex):

(?P<timestamp>\d+)\.\d{3}\s+-?\d+ (?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (?P<cache>\w+)\/(?P<httpresponse>\d+) (?P<size>\d+) 
	(?P<uri>.+) (?P<user>\S+) (?P<method>[A-Z]+\/\S+).+!\n

The regular expression names its parameters, giving us an easy set of data to work with inside our billing script. The regex isn’t perfect: the system occasionally runs into a line it doesn’t parse properly, which it emails back in a report. Originally this happened frequently, so the regex was tweaked (inside the RxToolkit of Komodo IDE); it’s now quite rare.

As you can see, there’s an incredibly large number of lines for just a partial page load (me loading the blog front page), and it doesn’t include the numerous requests required to populate a Google Map widget from a recent post, or any of the images used on the page. We don’t require this level of granularity for billing, nor do we necessarily want it to be easily accessible. To turn this into a slightly more manageable log entry we merge requests: all requests through a given proxy server, for a given hour, are merged into a single piece of data to be inserted into the database containing the user’s ID, the user’s account number, the server ID (from our network), the traffic in bytes, and the timestamp of the hour.

This gives us a more manageable usage table. The raw log files are merged with other files from the same server on the same day and archived. While we’d like to just get rid of them (they’re just gathering dust and occupying hard drive space), many jurisdictions require record keeping from providers, and we’d like to be able to fully account for usage for our customers upon request.
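Here is a minimal Python sketch of the parse-and-merge step for a single server's log. It reuses the regular expression shown above, minus the trailing `!\n`, on the assumption that those characters are delimiter residue rather than part of the pattern. The function name and return shape are made up for this example, and the account number lookup is left out.

```python
import re
from collections import defaultdict

# The pattern from the post, without the trailing "!\n" (assumed to be
# delimiter residue). Group names are kept exactly as in the original.
LOG_LINE = re.compile(
    r"(?P<timestamp>\d+)\.\d{3}\s+-?\d+ "
    r"(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) "
    r"(?P<cache>\w+)\/(?P<httpresponse>\d+) (?P<size>\d+) "
    r"(?P<uri>.+) (?P<user>\S+) (?P<method>[A-Z]+\/\S+).+"
)


def hourly_usage(lines):
    """Merge log lines into {(user, hour_start): bytes}; also return unparsed lines."""
    totals = defaultdict(int)
    unparsed = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            unparsed.append(line)  # these would go into the emailed report
            continue
        # Truncate the Unix timestamp to the start of its hour
        hour_start = int(match.group("timestamp")) // 3600 * 3600
        totals[(match.group("user"), hour_start)] += int(match.group("size"))
    return dict(totals), unparsed
```

Fed the ten sample lines above, this yields a single entry for paul, since they all fall within the same hour.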

To come back to the log file for a moment, I’d like to look at one piece of data in particular:

1294011828.871      0 184.163.123.123 TCP_MEM_HIT/200 1511 GET https://ssl-stats.wordpress.com/w.js? paul NONE/- application/x-javascript
1294011828.928    139 184.163.123.123 TCP_MISS/200 2749 GET http://edge.quantserve.com/quant.js paul DIRECT/64.94.107.11 application/x-javascript
1294011828.997    205 184.163.123.123 TCP_HIT/200 27230 GET http://s1.wp.com/wp-includes/js/jquery/jquery.js? paul NONE/- application/x-javascript

What our regular expression terms cache, the Squid docs term Squid Result Codes, and it generally indicates where Squid got the resource. In the case of a miss it had to retrieve the resource from the URL in question. A TCP_HIT indicates it was cached, while a TCP_MEM_HIT indicates it was in cache and still in memory, avoiding hitting the disk. In our experience the list from the Squid docs is non-exhaustive (for example, TCP_REFRESH_UNMODIFIED doesn’t appear), so some research or testing was necessary as we put the system into use. We use this information to determine whether to record the size at its original value, or to double it (as a server proxying a 100Kb resource must download it from the source server, then upload it to the end user).
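The doubling rule can be sketched in a couple of lines. Treating any result code containing "MISS" as "fetched from the origin, so count it twice" is an assumption made for this example; a real implementation needs an explicit decision for every code it encounters, including ones like TCP_REFRESH_UNMODIFIED.

```python
def billable_bytes(result_code, size):
    """Bytes of traffic to record for one request.

    Assumption for this sketch: a miss means the proxy downloaded the resource
    and then uploaded it to the user, so the size counts twice; anything else
    is treated as served from cache and counts once.
    """
    if "MISS" in result_code:
        return 2 * size
    return size
```

For the sample lines above, the TCP_MISS request of 2749 bytes would be recorded as 5498, while the TCP_HIT request of 27230 bytes stays at 27230.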

We’ve considered technologies like Spread to give us near real time logs from across the network, but we haven’t really seen the need. None of our customers are asking for it, and the need to go read old log files for detailed information has only come up twice, both times well after the fact.

January 2, 2011

Buying VPS Systems

Filed under: Uncategorized — Paul Reinheimer @ 1:44 pm

One thing we’ve managed to accrue a lot of experience with is dealing with VPS providers. In an ideal world we’d probably be able to buy thirty servers from a single provider, but that’s not the case. Most of our providers give us a single server to work with, so we’re managing lots of accounts in parallel. We’ve got some great providers that month after month give us zero issues to deal with, sketchy providers where we had to get PayPal or our credit card issuer involved, and some odd cases, like the one where the provider went down between our deciding to order and finding the credit card to pay with.

Finding Candidates

Finding a VPS in a specific location can be somewhat challenging. The hosting market is extremely competitive and is flooded with Google ads on any relevant term; a few sites have been so well optimized for search engines that they appear even when irrelevant. Searching for generic phrases like “hosting in denmark” is likely to yield results containing a lot of what are basically spam sites, which link to companies that pay a referral fee regardless of their actual location. The most effective method we’ve found is to use Google Maps to search for hosting providers in the appropriate location.

Due Diligence

Having already been burned numerous times, we then do a few searches to look for reviews of the provider. Web Hosting Talk has a popular set of forums and a lot of active users. It’s a great place to start. Beware, though: there are also a lot of… less experienced users there, posting based more on their own incompetence than on the provider. One thing I do watch out for is hosting providers posting details of a user’s account in response to a complaint. While I understand their desire to defend themselves, I generally feel this shows a lack of professionalism on the part of the provider, so I steer clear.

It’s also a good idea to look at how long a company has been in business. While every company does need to start somewhere, we’ve already been around long enough to see several hosts come and go. Finding some history of posts in WHT is helpful; finding a relatively short period of heavily discounted coupon posts is probably an indicator that they’re desperate to get their first customers (and revenue) through the door.

Looking for coupons

Lots of providers have sales nearly constantly, so we’ll do a few quick searches on the net as a whole, and on Web Hosting Talk specifically. We’ve managed to knock some costs down significantly by signing up with coupons.

Signing Up

A pretty basic process that a few providers do manage to make difficult (with non-refilling forms after an error, multiple step processes, or email based validation mid-registration). We’ll take a bare bones Debian 5 64-bit machine whenever we can get it.

Favorite Providers

It doesn’t take much for us to like you: Give us the box quickly, give us the operating system we asked for, keep an accurate clock, and don’t have serious downtime. That said, our list of favourite providers isn’t terribly long (we’re only listing providers we’ve had for at least 6 months, and have no serious issues with).

VPSVille – Our VPS in Toronto has been a solid part of the network, no issues since we procured it.
GPLHost – We have several locations with GPLHost, they’re all reliable, we appreciate being able to set up our box with a minimal Debian install.
Slicehost – Solid provider, easy to set up and get going.
Cool Housing – Reliable provider.
Gandi – Their payment system is tricky, and leaves a bunch to be desired but the VPS has been solid.
MyHost.ie – Reliable provider in Ireland.

Providers We’ve left with prejudice

Spry – Migrated our systems to a different city unapologetically, and without notice. They then blocked ports that we used to monitor our system’s health, rendering our monitoring infrastructure useless.
Delimiter – Stopped answering support tickets, cities would just go dark either for days at a time or permanently. If you compare their offered locations six months ago to the present you’ll notice a lot of omissions.
Enotch – Completely failed to actually set up our VPS, we ended up requesting a refund and using a different provider.

December 28, 2010

Proxies and GeoIP

Filed under: Uncategorized — Paul Reinheimer @ 1:44 pm

GeoIP is a technology gaining in popularity. Simply put, it attaches a geographical location to a user based on their IP address. Depending on the service you use and the user’s geographical location, this data could range from simply the user’s country to granular detail like which neighbourhood in which city. MaxMind is currently the leader in GeoIP information, providing both free and commercial databases. Other popular providers include IP2Location and InfoSniper.

You can check to see how your IP geo locates with this test.

Websites are using GeoIP information to customize all manner of services to their end users. Google selects the correct local version (google.com vs google.co.uk), Amazon may offer a more appropriate local store, Hulu refuses to display content to users not within the United States. More commonly merchants use a user’s location to select pricing points, display currency, or the most appropriate billing system.

Testing these systems presents a challenge, especially when multiple systems are involved (you may control your site, but your billing service might be independently run). A common technique for testing these systems is to force the system to accept your location as being elsewhere (e.g. example.org?ip=127.0.0.1). The inherent problem is that systems you interact with will ignore your hacks, and you’re left with a difficult problem when you move to production: you either lose the ability to test the GeoIP in production, or leave these hacks enabled (while you can attach these hacks to specific IPs to only allow them to work from your office, you’ve now added two additional layers on top of your codebase, that need to be maintained and bug free).
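To show what such a hack typically looks like, and why it becomes something extra to maintain, here is a minimal sketch. The function and parameter names are hypothetical, and the `ip` query parameter simply follows the example.org?ip=... example above.

```python
def effective_ip(remote_addr, query_params, trusted_testers):
    """Pick the IP address to hand to the GeoIP lookup.

    The override is honoured only for requests coming from trusted addresses
    (for example the office), which is the "attach these hacks to specific IPs"
    layer described above. All names here are hypothetical.
    """
    override = query_params.get("ip")
    if override and remote_addr in trusted_testers:
        return override
    return remote_addr
```

Note that only your own code sees the override. A third party billing service still sees the real address, which is exactly the gap this approach cannot close.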

Proxies can help. By using a proxy server you’re able to route all of your web traffic through a specific location, so you appear to both your application and whatever systems it interacts with to be there. At WonderProxy we’ve got over 35 servers in over 25 countries, perfect for GeoIP application testing, without the hacks.

There are limitations to this, however, as systems that use Flash will detect the user’s original IP (Flash generally ignores proxy settings). In those environments a VPN is needed. More on those later…

December 27, 2010

Lessons Learned

Filed under: Uncategorized — Paul Reinheimer @ 5:04 pm

We’ve learned a lot over the past year building our network of GeoIP proxies. Bad hosting providers, software roll-outs, and crazy customers. This blog is where we hope to share some of what we’ve learned while building WonderProxy.
