How to Scale a Paywall by Proxy

The most touted feature of the public Internet is, and has always been, free access to information.  Whether that information is porn, what your friends are eating for lunch, or why your government is choosing to drop bombs on funeral goers is irrelevant.  The point is that the information is out there, you can get to it, and it’s as free as used condoms at a Greyhound bus station.  Except when it isn’t.

Since Capitalism can’t let anything go un-capitalized, there was a need to stop the ravenous masses from devouring the free lunch buffet of data swirling around in the tube-y netherworld that is the Internet.  And so Capitalist God created the paywall, and it was annoying as all fucking hell because you really want to read that article but they won’t fucking let you and nothing is fucking fair any more.  Calm down and fear not my child, for I shall show you the way up and over this devious machination of free market sourced technology.  Well probably anyway….

See there are two major types of paywalls: hard ones and soft ones.  Hard ones block access to all users until you pay the cyber-toll.  These are rare because this heavy-handed approach will decimate your online audience and is worthless unless your content is worth something to enough people willing to pay for it.  For example, if I did this with The Daily Segfault even my own mother wouldn’t read it.  So of course, the only entities willing to do this are large media outlets, typically well-read newspapers like the New York Times or the Wall Street Journal, which have brand recognition, content people are willing to pay for, and steadily declining revenue streams.  Occasionally very desperate non-profits like Wikileaks do this, but it usually doesn’t end well.  As a side note: do donate to Wikileaks; they do good work and really do deserve whatever money you can chuck at them.

Those hard paywalls I can’t help you with, and it should also be noted that the approach I will show you to scale paywalls may not work in every instance as not all paywalls are built the same way.  Thankfully though, most paywalls are of the soft variety and most of these use the same method for blocking us e-rabble.  A “soft” paywall is one that, instead of blocking you outright, will allow you to make a certain number of visits to the cherished content before it gets all bitchy and demands that you pony up the cash if you want to keep reading.  The key vulnerability in this system is that the site has to remember who you are, or else it might just block someone who isn’t a mooching, good-for-nothing, sack-o-shit, 47%-er.

So obviously the key here is to just stop being you.  Basically the way most sites determine if you are you is by checking the IP address of where your request to view the content is coming from.  The content protecting server then checks its database to see if you have consumed the dataz too frequently and need to be blocked; otherwise it lets you view the content for free this time and increments the number of times you visited the site in the database by 1.  Now you could scale a soft paywall by bypassing the organization’s firewall, executing an escalation of privilege attack, accessing the server’s database file, and resetting the value for the block counter to 0.  However, this is a lot of work and more than just a tad illegal for trying to view a side-boob article on the Huffington Post.

BREAKING NEWS: Look at that side boob!
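To make the counting scheme described above a bit more concrete, here is a minimal sketch in Python of what such a metering check might look like.  Everything in it is an assumption for illustration: the class name, the limit of three free views, and the in-memory dictionary standing in for the site’s database.  No real site’s code is being shown here.

```python
from collections import defaultdict


class SoftPaywall:
    """Toy model of a metered paywall keyed on the visitor's IP address."""

    def __init__(self, free_views=3):  # assumed limit, purely illustrative
        self.free_views = free_views
        self.visits = defaultdict(int)  # stands in for the site's database

    def request_article(self, ip_address):
        """Return True if the article is served, False if the wall goes up."""
        if self.visits[ip_address] >= self.free_views:
            return False  # allowance used up: demand payment
        self.visits[ip_address] += 1  # free this time, but we are counting
        return True


wall = SoftPaywall(free_views=3)
my_ip = "198.51.100.7"    # documentation-range address, not a real visitor
proxy_ip = "203.0.113.5"  # a different address, e.g. a proxy's

results = [wall.request_article(my_ip) for _ in range(4)]
print(results)                         # [True, True, True, False]
print(wall.request_article(proxy_ip))  # True: a new address starts from zero
```

The only thing tying the counter to you is the dictionary key, which is exactly the weakness the rest of this post leans on.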

The answer is a lot simpler than that, although admittedly less sexy.  In the words of Mr. McGuire in “The Graduate”, “I just want to say one word to you. Just one word…proxies”.  A proxy is a very simple but powerful concept for hackers and anyone who just wants to be more anonymous on the internet.  Basically a proxy is just a server that makes a request of another server on your behalf.  So for example, if your work blocks facebook.com but not efreeproxyip.com (or one of the bijillion public proxy servers like it), you can go to said proxy and ask it to go to facebook for you.  It does this and then sends your computer the stuff you wanted from facebook.  Since this data is coming from the proxy and not the blocked site, your company’s router has no idea that you’re wasting time on facebook when you should be wasting time working.  Word of warning: obviously if you are going to be proxying personal information (like passwords, home address, etc.) you better damn well trust your proxy, because they will have access to everything you send through the proxy.
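The proxy used in this post is the web-based kind, where you type a URL into a page.  A close cousin is the plain HTTP proxy, which a program can be pointed at directly.  As a rough sketch, assuming you have the address of a proxy you trust (the one below is a placeholder from a documentation-only range, and the helper name is made up), Python’s standard library can route requests through it like so:

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address from a documentation-only range; swap in a proxy you trust.
opener = make_proxy_opener("http://203.0.113.5:8080")

# Nothing is sent until open() is called, for example:
# page = opener.open("http://www.example.com/", timeout=10).read()
```

The target site then sees the request arriving from the proxy’s address rather than yours, which is the whole trick.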

So let’s try this out.  Our vict…err…helpful assistant today will be the website of my local media institution, orlandosentinel.com.  Unless you have your own proxy server set up, you’re going to need a proxy.  Thankfully the helpful folks over at Public Proxy Servers have a database of public proxy servers.  As you can see from the picture here, the Orlando Sentinel will block you after you have reached “your allowance of free articles”.  So to re-up your allowance it is time to become Mr. efreeproxyip.com by going to said site and typing in the URL of the article you want to see, such as http://www.orlandosentinel.com/news/local/breakingnews/os-teresa-jacobs-texts-memo-20121010,0,6681231.story.

Before and after the proxy pwn’ing…do note how the URL on the second picture is NOT the orlandosentinel.com.

It really is that simple.  Go surf the free web seas!

Doing Useful Stuff with the Wireshark Network Analyzer

Anyone interested in network programming and general hackery will eventually come into contact with Wireshark.  This is because Wireshark is an insanely useful and well built tool for network analysis.  Essentially what Wireshark does is something called “packet sniffing”,  but to describe this very useful function requires that we understand just a little bit of background knowledge.

The International Organization for Standardization’s Open Systems Interconnection (ISO/OSI) reference model, better known as simply the “OSI model”, describes the seven theoretical layers of network organization.

Whenever you send data over a network there is a necessary and complex process of encoding additional meta-data (data that describes data) which is used by routers to determine how and where to send the data.  Furthermore, additional meta-data is needed to instruct the receiving computer on how to handle the data, such as displaying the data as a web page, or a flash video, etc. etc.  There is also meta-data and special protocols required to negotiate between the computer sending the data and the computer receiving, to make sure that the data intended to be sent was in fact sent, and how to go about sending that data.  It is not uncommon for there to be more additional meta-data generated on how to send data, than the actual data that the user intends to send!

However, I think it is better to leave the course in data networks to the far better educators at MIT, who can teach you this for free and much better than me!  Basically the point I’m trying to make is that when you send even a small message over a network, it requires a lot of additional data to describe it.  Obviously all of this data cannot be sent in one pass, because it would be a real strain on the network (AND WOULD BREAK THE INTERNETS).  Also, what would happen if you sent all that crap and it wound up getting lost along the way?  That would suck, right?  So basically your computer chops up all this data into multiple pieces called datagrams or more informally, packets.
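Here is a toy sketch of that chopping-up process in Python.  It is not how any real protocol lays out its headers; the field names (seq, total, payload) and the 8-byte payload size are assumptions picked to keep the idea visible.

```python
def packetize(message, payload_size):
    """Split message (bytes) into packets, each carrying a small header."""
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)]
    return [{"seq": n, "total": len(chunks), "payload": chunk}
            for n, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the message, tolerating out-of-order arrival."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("a packet went missing; ask the sender to resend")
    return b"".join(p["payload"] for p in ordered)


message = b"BREAKING NEWS: look at that packet!"
packets = packetize(message, payload_size=8)
shuffled = list(reversed(packets))  # pretend the network reordered them
print(len(packets))
print(reassemble(shuffled))
```

Notice that every packet drags its own little header along, which is where all that extra meta-data comes from.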

Wireshark basically reads all the packets received by your computer’s NIC (Network Interface Card) and displays all the information in a well-organized form allowing you to monitor your computer’s communications with other computers on your LAN (Local Area Network) and the Internet.  Often this traffic is conversations between your computer and another computer on your network or a server on the internet, although it can also include broadcast transmissions such as ARP packets (Address Resolution Protocol).  These packets are sent out to all computers in a network, for example ARP packets are sent out so a computer can determine where another computer is on the network.  Not unlike a person entering an office and asking each person “Hey are you Bob?”, until they find Bob. Obviously computers are a lot more patient than people.  With wireless communication the medium requires that all communications are broadcast to all computers within the range of the wireless access point.
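To give a flavor of what “displaying the information in a well-organized form” involves, here is a small Python sketch that decodes the first 14 bytes of an Ethernet frame, which hold the destination address, the source address, and a type code.  The frame is hand-built for the example (the source address is made up), and the helper names are my own; Wireshark itself decodes vastly more than this.

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}


def format_mac(raw):
    """Render 6 raw bytes in the usual colon-separated hex form."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame):
    """Decode the 14-byte Ethernet II header at the start of frame."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "type": ETHERTYPES.get(ethertype, hex(ethertype)),
    }


# A hand-built frame: broadcast destination, made-up source, ARP type code,
# followed by 28 zero bytes standing in for the ARP body.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0806") + b"\x00" * 28
header = parse_ethernet_header(frame)
print(header)
```

The all-ff destination is the broadcast address, the network equivalent of shouting “Hey, are you Bob?” at the whole office.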

Typically your NIC drops any packets it overhears that are not intended for your computer.  This can be changed by setting your NIC into promiscuous mode (or as I like to call it “whore mode”).  Whore mode is particularly useful on wireless networks as you can monitor all traffic between each computer on the network and the wireless access point.  This is why you should always be careful on unencrypted networks, because any unencrypted packets you send can be read by any dick with Wireshark.  Unfortunately, some of the more prudish OSes, such as Windows, do not support whore mode with some wifi NICs.

Wireshark. For when you need to find out how you broke your network or just because you want to be a dick at Starbucks but can’t bring yourself to pretend to write a screenplay.

With that out of the way, let’s crack open a wee bit of the shark and see what we can do with it.  You can download Wireshark binaries for Windows and Apple, as well as source code here.  With Linux you can either directly compile the source code or use a package manager such as yum or aptitude, although I’m pretty sure Linux users already know this.  The Wireshark page is a vast trough of information on all things Wireshark, and I’d recommend going there for Wireshark related questions.

So once you install Wireshark, you should be ready to go.  Now you need to start capturing packets, which you can do by going to Capture -> Interfaces.  At the menu you can see all the NICs present on your computer, including two special interfaces, “any” and “lo”.  As the name suggests, “any” captures traffic from all of your NICs at once, which could come in handy if you want to sniff traffic on a wireless and wired network at the same time, among other possibilities.  “lo” stands for “loopback” and doesn’t actually capture packets on the network.  Instead lo captures traffic your computer sends to itself, essentially allowing you to view programs talking to other programs on your local machine.  You can also set options such as capture filters, which can be used to block out certain types of traffic that you aren’t interested in, but that is a subject for its own post.

Here is an example of where Wireshark can be used for good.  You know when you’re connecting to an encrypted wireless network and it gives you a bunch of options for authentication like PEAP, TLS, WTF, ETC?  I was having trouble connecting to my university’s (go Knights!) wireless network and wasn’t sure why it was being such a bitch.  This is where Wireshark comes in.

MAC Addresses redacted to protect me from the NSA.

So, I started up Wireshark and started capturing network packets on my wireless card.  What you can see from the image to the left (when enlarged anyway) is my computer’s NIC (“TwinhanT”) attempting to negotiate my use of the university’s WPA encrypted network with the wireless access point (“Cisco_16:e7:12”).  As you can see by the packets marked “Failure”, my NIC is not having much success getting our friend Cisco to let us onto the network so we can be snarky on Facebook.  This shall not stand!

What is happening in this exchange is that TwinhanT and Cisco are trying to negotiate which authentication protocol to use.  The clue to what is going wrong can be seen in the “info” portion of packet numbers 3, 8, and 13; namely the “Request, PEAP”.  We only need to dig slightly deeper to find the problem.

The reason for the aforementioned problem can be seen in the highlighted portion of the image to the right.  Basically, Cisco is all like, “Yo TwinhanT mang, I’m down with the PEAP let’s use that for authenticating our secret convo”.  However, TwinhanT is saying, “You know I’m really more into EAP-TTLS, myself”.  So then Cisco is all like, “NAH BRAH, REJECTED GTFO!!1!”.  Long story short, I switched the authentication method to PEAP and everything was sunshine and puppies.

Pictured: sunshine, puppies, and the beginning of a beautiful friendship.
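For the curious, the lovers’ quarrel above can be caricatured in a few lines of Python.  This is a heavily simplified sketch of the method-selection step only, with a made-up function name and made-up log strings; the real exchange goes on to do the actual authentication once a method is agreed on.

```python
def negotiate(access_point_offers, client_accepts):
    """Toy model of the exchange: the AP proposes, the client agrees or balks."""
    log = []
    for method in access_point_offers:
        log.append(f"AP -> client: Request, {method}")
        if method in client_accepts:
            log.append(f"client -> AP: Response, {method}")
            log.append("AP -> client: Success")
            return True, log
        wanted = ", ".join(client_accepts)
        log.append(f"client -> AP: Response, Nak (wants {wanted})")
    log.append("AP -> client: Failure")
    return False, log


# Before the fix: the client insists on EAP-TTLS, the AP only offers PEAP.
before, before_log = negotiate(["PEAP"], ["EAP-TTLS"])
# After the fix: the client is switched over to PEAP.
after, after_log = negotiate(["PEAP"], ["PEAP"])

print("\n".join(before_log))
print(before, after)
```

The “Failure” line at the end of the first log is the same kind of thing Wireshark was flagging in the capture.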

That pretty much wraps up what I hope was a useful introduction to what Wireshark is, and the good / evil that can be done with it.  If you fractured pieces of humanity in the screaming void of madness and trolls that people refer to as “the Internet” appreciated this rambling nonsense I would be more than happy to waste more of my time fabricating more rambling nonsense for you to jam into your dirty mind holes.  Please feel free to point out all the mistakes I made in this column and how much it sucks, but be forewarned that I will almost certainly ignore you.

Which Programming Language Should You Choose?

Introduction: My Hesitation on the Path to the Dork Side

When I first became interested in programming one of the first things I agonized over was which programming language to start with.  I knew very little about programming (and computers in general) and I was deathly afraid of pouring time and energy into learning a language that would turn out to be on the verge of obsolescence.  I worried that if I learned an obsolescent language I would have to spend even more energy unlearning everything I had picked up from it.  After some searching on the internet and talking to fellow students, I finally decided on learning C++.  From what I heard it was an in-demand language so that was that.

In retrospect, I’m surprised that I wasn’t aware of how silly that mindset was; I did know how to use a search engine, right?  Something I came to realize is that once you learn the fundamentals of programming, it really is not that difficult to learn additional languages.  As long as you don’t choose a very obscure or antiquated language (which you probably wouldn’t come across as a novice anyway), you should be fine.  Despite what I thought at the time, programming languages have a much longer lifespan than most computer technology I was familiar with.  As a child I was aware mostly of computer hardware and video games, both of which have very short lifespans in comparison to a programming language.  Whereas your iPod from five years ago is ancient, the C++ programming language dates back to 1983.  Furthermore, C++ is an extension of the older (and still non-obsolescent) C programming language, which dates back to the early 1970s, when it was developed by Dennis Ritchie at Bell Labs.

Something else to consider is the relative similarities between many programming languages in terms of syntax, usage, and philosophy.  Many languages use the same (or quite similar) symbols and usage conventions.  Also it is quite common for languages to have extensions or options to allow the language to adopt paradigms not commonly used by that language by default, and allow it to replicate features of other languages.  While there are definitely clear differences between programming languages, it is usually true that becoming proficient in one programming language significantly lowers the bar in terms of difficulty of learning a new programming language.  This is completely the opposite of my fear that I might have to ‘unlearn’ techniques if I picked the “wrong language”.

Despite the preceding bit of reassurance it is important to point out that programming languages do have significant differences.  It is important to know a few things about computing in general before picking a programming language to learn.  Some programming languages have specialized uses and may not be suitable for some tasks, while being invaluable for others.  Other programming languages are very broad and can be applied to many uses with varying degrees of success.  This article is an attempt to provide the guide that I wish I had all those years ago when I began my journey to the dork side.

Low Level: Bit by Bit, Byte by Byte

The first thing to know is the layers of programming levels into which all languages fall, albeit not very neatly.  All languages are subjectively ranked from lowest to highest, with a clear delineation at the extremes and a lot of ambiguity in between.  The levels refer to how close the language is to the hardware, or how abstract the language is.  Lower level languages have the most control over hardware, but are not very portable (meaning that programs are less likely to work on a wide variety of systems) and take longer to write.  On the other end of the spectrum, the higher level languages are very portable and easy to write, but have much less control over the hardware.

At the lowest level is machine code, the strings of 0’s and 1’s that are the digital lingua franca of modern computing machines.  Machine code indicates exactly which circuits on a computer or device are switched on and which are switched off.  This allows for microscopic level control over a computer’s hardware, but is very specific to the computer or device for which it is intended.  Since computer hardware is typically proprietary technology that varies dramatically from manufacturer to manufacturer, machine code is the least portable of all computer languages.  It has very little likelihood of being compatible with anything but the exact type of device it is written for.  Furthermore, machine code is very difficult to program in directly and thankfully virtually no one does.  Regardless of these deficiencies, all other programming languages are eventually translated into this language when executed.

Slightly higher level is assembly code.  Assembly code incorporates simple human language with very basic commands such as LOAD, STORE, and ADD.  This language was more common before the development of C, as it was much easier to work with than machine language and more portable.  Assembly language is translated into machine code by a program called an assembler, and is still focused on manipulating data on a very low level, focusing on basic calculations and storage of data in computer registers.  Despite its high level of control over hardware, it is rarely used today outside of developing simple hardware drivers and working on devices with very little available memory (although higher level languages are now taking over the latter).  Working on a typical application with assembly language or machine code is akin to building a car atom by atom, or molecule by molecule.  Obviously neither machine languages nor assembly languages are a good choice for a first language.  Indeed, many modern programmers know little or none of either.

If you're looking at this and thinking "hey this is obvious", call your local mental health professional and/or priest. To see what this does, click the image.
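If you want a feel for what working at the LOAD / STORE / ADD level is like without the trip to the priest, here is a toy one-register machine sketched in Python.  The instruction set and memory layout are invented for this example and do not correspond to any real processor.

```python
def run(program, memory):
    """Execute a list of (opcode, operand) pairs on a one-register machine."""
    accumulator = 0
    for opcode, operand in program:
        if opcode == "LOAD":      # copy a memory cell into the register
            accumulator = memory[operand]
        elif opcode == "ADD":     # add a memory cell to the register
            accumulator += memory[operand]
        elif opcode == "STORE":   # copy the register back to memory
            memory[operand] = accumulator
        else:
            raise ValueError(f"unknown instruction: {opcode}")
    return memory


# Equivalent of the one-liner `total = a + b` in a higher level language.
memory = {"a": 2, "b": 40, "total": 0}
program = [("LOAD", "a"), ("ADD", "b"), ("STORE", "total")]
print(run(program, memory)["total"])  # prints 42
```

Three instructions to add two numbers is the atom-by-atom car building in miniature.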

Mid-level Languages: A Better Way to Build a Car

The first language that is a viable choice is C.  C is an older language that is still in use in many legacy applications, and whose concepts underlie many modern programming languages.  While C frees the programmer from much of the micromanagement of data, it still allows the programmer a great deal of access to the underlying hardware.  This, combined with the lack of modern programming concepts such as garbage collection, object oriented development, and exception handling, still places a lot of responsibility on the shoulders of the programmer.  While it is inevitable that most programmers will learn at least a little C, it is probably not the first language you should start with as it assumes that the programmer is capable of avoiding things like buffer overflows and handling other nitty-gritty aspects of programming.  In 1983, Bjarne Stroustrup developed C++ to provide modern features and object-oriented concepts such as classes.  While C++ is still in heavy use in modern programs, I would also argue that many of the responsibilities inherited from C keep it from being the best choice for a first language.

C will not hold your hand. C will not clean up after you. C obeys without question. (See more angry C by worldgnat by clicking the image)

Higher up the continuum from C/C++, things start to become a little cloudier (for me especially).  The next language to consider, in my opinion would be Java.  While it shares some similarities to C, in terms of syntax and conceptualization, Java is especially designed to be object oriented and has many modern features built into the standard language.  What separates Java from C++, is both its automatic management of many programming tasks and how the language is translated into lower level languages.

While C++ compiles several source files and then links them into an executable binary (i.e. a program written in machine code), Java compiles into something called bytecode.  Bytecode is an intermediate representation, essentially a boiled down depiction of the Java code, which is then fed to a program called the Java Virtual Machine (JVM).  The JVM finishes the job the compiler started: it either interprets the bytecode one instruction at a time or translates it on the fly into machine code that the computer can execute.  The advantage of this approach is portability.  Java code compiled on one machine is able to run on any machine that has a JVM, because each JVM generates the right instructions for its specific machine from the bytecode that is fed into it.  By working as a middleman the JVM clears out a lot of the portability problems of earlier languages, but sacrifices some execution time in the process.  C/C++ programs typically run more quickly than Java programs because they are compiled directly into machine code ahead of time, whereas Java’s bytecode still has to be translated while the program runs (although the eventual products of both C/C++ and Java are machine code).

Dynamic Languages: Learn By Doing

While Java is definitely a good potential first language due to its portability and enhanced ease of use, I would recommend similarly high level languages such as Python, Ruby and Perl (personally in that order).  Up until this point, most of the languages (C/C++, Java) are what are known as statically typed languages, meaning the type of every variable is fixed and checked before the program runs.  They are also compiled ahead of time: you write a program, compile the various files into bytecode or machine code, link them together, and then run the result.  On the other hand languages such as Python are dynamically typed, so types are checked while the program runs, and they are usually interpreted as well.  Such languages typically do not make standalone executable files; instead the source is translated to bytecode and interpreted every time the program is run.  While this incurs a performance penalty, it allows code to be written on the fly.  This means that you can type in programs a fragment at a time, with the results being shown as they are entered.  These interpreters allow you to test expressions and code fragments in these languages and see the results, typically immediately.
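As a small taste of that fragment-at-a-time workflow, here is a sketch you can paste into a Python prompt.  It uses only built-ins; the variable names are arbitrary and the bytecode listing it prints will differ between Python versions.

```python
import dis

thing = 42                # thing currently refers to an int
first_type = type(thing).__name__
thing = "forty-two"       # rebinding the same name to a str is perfectly legal
second_type = type(thing).__name__
print(first_type, second_type)

# compile() turns a source fragment into a bytecode object without writing
# any executable file; eval() then hands that bytecode to the interpreter.
fragment = compile("6 * 7", "<fragment>", "eval")
print(eval(fragment))

# dis shows the bytecode instructions; the exact listing varies by version.
dis.dis(fragment)
```

In C++ or Java the reassignment of `thing` from a number to a string would be rejected before the program ever ran.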

Such a feature is great for novice programmers as it allows them to experiment with new language conventions and concepts.  Many more advanced programmers use these languages to rapidly prototype applications and see the results, then rewrite the slow parts of the larger program in a more efficient language such as C++.  However, with advancements in interpreter technology and computer hardware, many programmers write fully functional programs directly in languages such as Python and Ruby.  Python, in particular, trims some of its startup cost by caching “compiled” versions of imported source files: the standard interpreter (which is itself written in C) converts each module into bytecode, saves it as a .pyc file, and reuses that file the next time the module is loaded.  This does not make the code itself run as fast as C, but many heavily used Python libraries are implemented in C, so programs that lean on them can run much faster than pure Python would.

A further advantage of a language like Python is that it handles memory management, gratis.  Therefore, programmers are freed from low level concerns and allowed to focus on more pressing issues.  These modern languages typically manage memory at least as well as a competent low level programmer.  Python and Ruby are also particularly suited for object oriented programming as all programming constructs are treated as objects.  Furthermore, these languages are typically very elegant and easy to read.  As such I would highly recommend them to anyone who wants to learn the programming trade.
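Both claims are easy to poke at from a Python prompt.  The sketch below is only a demonstration under a couple of assumptions: the Article class is invented for the example, and the weak reference is just a way of peeking at whether the object is still alive.

```python
import gc
import weakref


class Article:
    """Arbitrary example class; the name is made up for this sketch."""

    def __init__(self, title):
        self.title = title


# Numbers, functions, and classes are all objects with attributes and methods.
bits = (255).bit_length()                      # even a plain int has methods
function_is_object = isinstance(len, object)   # a built-in function is an object
class_is_object = isinstance(Article, object)  # so is the class itself
print(bits, function_is_object, class_is_object)

# No malloc/free: once the last reference goes away, Python reclaims the memory.
draft = Article("Free web seas")
watcher = weakref.ref(draft)  # lets us observe the object without keeping it alive
del draft
gc.collect()                  # optional nudge; collection normally just happens
print(watcher() is None)
```

Nowhere in there did anyone have to ask for memory or hand it back, which is the “gratis” part.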

Top Floor: Databases, Browsers, and Video Editors

Beyond what I have laid out on the programming level continuum are even higher level languages.  These languages have very specific functionality, such as operating within web browsers or video editing software; examples include JavaScript, SQL, and the ActionScript behind Flex (Ajax, often listed alongside them, is a technique built on JavaScript rather than a language of its own).  Languages such as JavaScript can be very helpful to learning, and are also very useful to know in their own right, but are not broad enough to be as helpful as other languages.

The highest level languages are arguably not really programming languages, per se.  Languages such as SQL, QUEL, and D are better described as query or database languages as they are focused on retrieving and storing records in computer databases.  Languages such as HTML/CSS, XML, and TeX, meanwhile, are markup languages, or as I like to call them, formatting languages.  These formatting languages eschew commands and control structures and simply serve as a template for the storage and arrangement of data.  The common thread running amongst these high level computing languages, and what essentially separates them from programming languages, is that such languages do not detail computation directly but rather direct the arrangement, storage, and loading of data.
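To see the difference in flavor, here is a short sketch that runs a SQL query from Python using the sqlite3 module in the standard library.  The table name and its columns are invented for the example, and the database lives only in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database that lives in RAM
conn.execute("CREATE TABLE languages (name TEXT, first_appeared INTEGER)")
conn.executemany(
    "INSERT INTO languages VALUES (?, ?)",
    [("C", 1972), ("C++", 1983), ("Python", 1991), ("Java", 1995)],
)

# The query states which records are wanted, not how to loop over them.
rows = conn.execute(
    "SELECT name FROM languages WHERE first_appeared < 1990 "
    "ORDER BY first_appeared"
).fetchall()
print(rows)  # [('C',), ('C++',)]
conn.close()
```

The SQL says what to fetch and leaves the looping, comparing, and sorting to the database engine, which is what I mean by directing data rather than detailing computation.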

In conclusion, I would recommend that the reader start with Python (or Ruby) as it facilitates experimentation, has a well-designed syntax and a sound programming philosophy, and handles the low level minutiae so that the novice can better focus on the more immediate concerns of modern programming.

Mantra #13-14: "There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch."

Once you have a good enough grasp on programming in python it would make a great deal of sense to move on to a lower level language such as C/C++/C# or Java.  With a solid grasp of the underlying principles of programming a student can place more concentration on the lower level elements of modern programming, and allow for a richer understanding and skill set.  Most of all I heartily suggest that any potential new students of programming learn from my lesson and take every opportunity to experiment with new languages and technologies.  For such is the kingdom of code; disassemble, guess, hack, reassemble, repeat.  Do not fear the unknown!