These days almost all web development is done with frameworks. Whether you use Rails, Sinatra, or Lotus, you don't really have to think about how cookies and other headers pass from nginx or Apache to the application server and into your app. They just do.
We're going to examine this journey in a little more depth, because it turns out that the story of headers contains a lot of interesting information about the history of the web.
What are HTTP headers anyway?
Whenever a web browser makes a request, it sends along these things called HTTP headers. They contain cookies, information about the user agent, caching info, and a whole lot of other really useful stuff.
You can see what headers are being sent by looking at a request in your browser's development tools. Here's an example. As you can see, the headers aren't anything magical. They're just text formatted in a certain way.
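To make the "just text" point concrete, here is a minimal sketch that splits a raw request into its request line and a hash of headers. The request text and the parse_request_head helper are made up for illustration, and the parsing is deliberately naive: it ignores repeated headers and other edge cases a real server has to handle.

```ruby
# A hypothetical raw request, roughly as a browser would put it on the wire.
RAW_REQUEST = "GET / HTTP/1.1\r\n" \
              "Host: localhost:9000\r\n" \
              "Connection: keep-alive\r\n" \
              "User-Agent: ExampleBrowser/1.0\r\n" \
              "\r\n"

# Split the head of a request into the request line and a hash of headers.
# Illustrative helper only, not part of any library.
def parse_request_head(raw)
  # Everything before the first blank line is the request line plus headers.
  head = raw.split("\r\n\r\n", 2).first
  request_line, *header_lines = head.split("\r\n")

  headers = header_lines.each_with_object({}) do |line, hash|
    # Split on the first colon only, since values may contain colons too.
    name, value = line.split(":", 2)
    hash[name.strip] = value.to_s.strip
  end

  [request_line, headers]
end

request_line, headers = parse_request_head(RAW_REQUEST)
puts request_line
headers.each { |name, value| puts "#{name} => #{value}" }
```

Each header is one line, with the name and the value separated by the first colon. A blank line marks the end of the headers.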
How headers aren't passed to your app
If you've ever written a Rack app, you've probably seen the env hash, which contains the app's environment variables. If you take a look inside of it, you'll see that in addition to normal system environment variables, it also contains all the request headers.
# config.ru
run lambda { |env| [200, {"Content-Type" => "text/plain"}, [env.inspect]] }
# Outputs:
# { "HTTP_HOST"=>"localhost:9000", "HTTP_CONNECTION"=>"keep-alive", "HTTP_PRAGMA"=>"no-cache", "HTTP_CACHE_CONTROL"=>"no-cache", "HTTP_ACCEPT"=>"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8", "HTTP_UPGRADE_INSECURE_REQUESTS"=>"1", "HTTP_USER_AGENT"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36", ... }
This is not how nginx passes headers to your app. :)
Application servers
Nowadays, most Ruby web apps run in application servers like Unicorn. Since the app servers aren't spawned by nginx, nginx can't set their environment variables.
How do the headers travel from nginx to Unicorn then? Simple. When nginx forwards the request to the app server, it sends the entire request, headers and all.
To demonstrate this, I built a simple application server that dumps everything nginx sends it to STDOUT.
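require "socket"
SOCKET_PATH = "/tmp/socktest.sock"
# Remove a stale socket file left over from a previous run
File.unlink(SOCKET_PATH) if File.exist?(SOCKET_PATH)
# Create the socket and "save it" to the file system
server = UNIXServer.new(SOCKET_PATH)
# Wait for a connection (by nginx)
socket = server.accept
# Read everything from the socket. gets returns nil at end of file,
# where readline would raise EOFError.
while (line = socket.gets)
  puts line.inspect
end
socket.close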
If you configure nginx to connect to this server instead of Unicorn, you'll see exactly what information is being sent to the app server: just a normal HTTP request. Headers and all.
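If you don't have nginx handy, you can get a feel for this with a small stand-in. The sketch below is an assumption-laden illustration: build_request is a made-up helper, the header values are invented, and UNIXSocket.pair is used in place of a real socket file so that the example runs by itself.

```ruby
require "socket"

# Build the text of a minimal HTTP request. Illustrative helper only;
# the path and header values passed in below are made up.
def build_request(path, headers)
  lines = ["GET #{path} HTTP/1.0"]
  headers.each { |name, value| lines << "#{name}: #{value}" }
  # A blank line terminates the header block.
  lines.join("\r\n") + "\r\n\r\n"
end

request = build_request("/", { "Host" => "localhost", "X-Demo" => "hello" })

# UNIXSocket.pair returns two connected sockets, so no nginx and no
# socket file are needed for this demonstration.
client, server = UNIXSocket.pair
client.write(request)
client.close

# The "app server" end sees plain HTTP text, one line at a time.
while (line = server.gets)
  puts line.inspect
end
server.close
```

To talk to the dump server instead, you could replace the pair with UNIXSocket.new("/tmp/socktest.sock") and write the same request to it.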
For more information on how to write a simple upstream app server, check out my post on unix sockets.
Why bother with environment variables then?
In 1993, the NCSA published a spec for something called the "Common Gateway Interface" or CGI for short.
It was a way for servers like Apache to run arbitrary programs on disk to generate dynamic webpages. A user would request a page, and Apache would literally shell out and run a program to generate the results. Since Apache spawned the apps directly, it could set their environment variables.
The CGI standard specifies that HTTP headers be passed in as environment variables. To avoid naming collisions with existing environment variables, it specifies that each header name be uppercased, have its hyphens replaced with underscores, and have "HTTP_" prepended.
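The naming rule is small enough to sketch in a few lines. The cgi_variable_name helper below is made up for illustration. It also covers the one exception in the standard: Content-Type and Content-Length get their own variables with no "HTTP_" prefix.

```ruby
# Convert an HTTP header name into its CGI-style variable name.
# Illustrative helper only, not part of any library.
def cgi_variable_name(header_name)
  normalized = header_name.upcase.tr("-", "_")
  # Content-Type and Content-Length are passed as CONTENT_TYPE and
  # CONTENT_LENGTH, without the HTTP_ prefix.
  return normalized if %w[CONTENT_TYPE CONTENT_LENGTH].include?(normalized)

  "HTTP_#{normalized}"
end

puts cgi_variable_name("User-Agent")      # HTTP_USER_AGENT
puts cgi_variable_name("Accept-Language") # HTTP_ACCEPT_LANGUAGE
puts cgi_variable_name("Content-Type")    # CONTENT_TYPE
```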
So you wind up with a bunch of environment variables that look like this:
HTTP_ACCEPT="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
HTTP_ACCEPT_CHARSET="ISO-8859-1,utf-8;q=0.7,*;q=0.7"
HTTP_ACCEPT_ENCODING="gzip, deflate"
HTTP_ACCEPT_LANGUAGE="en-us,en;q=0.5"
HTTP_CONNECTION="keep-alive"
HTTP_HOST="example.com"
HTTP_USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
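From the program's point of view, these are ordinary environment variables. Here is a rough sketch of what a CGI script might look like in Ruby. The script and its cgi_response helper are hypothetical, and the helper takes the environment as an argument only so that it is easy to try out with a plain hash.

```ruby
#!/usr/bin/env ruby
# A hypothetical CGI script. The web server would run this file once per
# request, with the request headers already set in the process environment.

# Build the full CGI response: a header block, a blank line, then the body.
def cgi_response(env = ENV)
  user_agent = env.fetch("HTTP_USER_AGENT", "unknown")
  language   = env.fetch("HTTP_ACCEPT_LANGUAGE", "unknown")
  body = "User agent: #{user_agent}\nLanguage: #{language}\n"
  "Content-Type: text/plain\r\n\r\n#{body}"
end

# A CGI program replies by writing to standard output.
print cgi_response
```

You can imitate the web server from a shell by setting the variable yourself, for example by running the script with HTTP_USER_AGENT set to some test value.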
Nowadays hardly anyone uses CGI for new development, but it's still very common to see HTTP headers stored as environment variables — even though sometimes they're fake environment variables.
How app servers fake it
Application servers parse headers out of the raw HTTP request. So how do they get in the environment? Well, the app server puts them there.
I dug around a little in WEBrick and was able to find the smoking gun:
self.each{|key, val|
  next if /^content-type$/i =~ key
  next if /^content-length$/i =~ key
  name = "HTTP_" + key
  name.gsub!(/-/o, "_")
  name.upcase!
  meta[name] = val
}
Eventually, these "fake" environment variables are merged in with other, more real environment variables and passed into your Rack app and on to Rails, which takes them back out of the environment hash. :)
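That last step, taking the headers back out, can be sketched roughly like this. It is not Rails' actual implementation. The header_name and headers_from_env helpers and the sample env hash are made up, and the conversion is lossy: the original capitalization of a header name can't be recovered from its uppercased form.

```ruby
# Turn "HTTP_USER_AGENT" back into "User-Agent". Illustrative helper only.
def header_name(env_key)
  env_key.sub(/\AHTTP_/, "").split("_").map(&:capitalize).join("-")
end

# Collect the request headers from a Rack-style env hash, skipping
# everything that isn't a header (PATH_INFO, REQUEST_METHOD, and so on).
def headers_from_env(env)
  env.each_with_object({}) do |(key, value), headers|
    is_header = key.start_with?("HTTP_") ||
                %w[CONTENT_TYPE CONTENT_LENGTH].include?(key)
    next unless is_header

    headers[header_name(key)] = value
  end
end

# A made-up env hash, much smaller than what a real app server builds.
env = {
  "HTTP_HOST" => "localhost:9000",
  "HTTP_USER_AGENT" => "ExampleBrowser/1.0",
  "CONTENT_TYPE" => "text/plain",
  "PATH_INFO" => "/"
}

p headers_from_env(env)
```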