By Martin Heller
Sometimes, reality grabs you by the shirt collar and shakes you until you pay attention. That's what happened recently when I got involved with some performance problems at WINDOWS Magazine's Web site (http://www.winmag.com).
Bandwidth was our biggest problem, or so we thought. So we replaced a 56Kbps line with a T1, only to find that pages were still taking 20 seconds to load. We determined the delays were in the client/server software we use to insert ads on our Web pages. Taking out the ads brought our page load times down by a factor of 10, to two seconds.
But we had to put the ads back in, and we had to show specific groups of ads in rotation on certain pages. For instance, if you bring up the magazine's home page, you'll find ads at the top and bottom of the page. If we're running ad rotation software, the ads will change every time you refresh the page, while the editorial remains the same.
Our ad rotation software had a client that used sockets to query a server, which kept track of the ads in a database. The server kept tabs on how many times ads were displayed, and how many times users clicked on the ads to link to the sponsors. My first task was to try to optimize the client software.
I received what turned out to be a consultant's UNIX version of the client, which wouldn't even compile on Windows NT. Because I didn't know another version was already available for NT, I started porting the UNIX version to a Win32 version (see sidebar). I wanted to try out the optimization of removing some extra socket calls. On UNIX, it's considered good practice to make programs as general and robust as possible, and the extra calls dynamically determined the IP address and port number of the server.
I thought this porting would be a trivial effort because Microsoft's propaganda implies UNIX sockets programs should run with minimal changes when compiled as Winsock programs. Of course, nothing in software is a trivial effort, but we always forget that when we're starting a project.
I started by removing the UNIX #includes and inserting Winsock #includes, and changing the type of the variable s from int to SOCKET. That got the program to compile but not link, so I replaced the bcopy call with a memcpy call to enable the program to link. When I tried to run the program, it ended with an error message after the first gethostbyname call.
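The two source changes described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual client code: `socket_t` and `copy_bytes` are hypothetical names invented here. The detail worth noticing is that `bcopy` takes the source first, while `memcpy` takes the destination first, so the arguments swap places during the port.

```c
#include <string.h>   /* memcpy */

/* Winsock represents a socket as SOCKET; UNIX uses a plain int. */
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;   /* hypothetical portable alias */
#else
typedef int socket_t;      /* hypothetical portable alias */
#endif

/*
 * Hypothetical stand-in for the BSD call bcopy(src, dst, n).
 * memcpy expects the destination first, so the first two
 * arguments are swapped relative to bcopy.
 */
void copy_bytes(const void *src, void *dst, size_t n)
{
    memcpy(dst, src, n);
}
```

A call such as `bcopy(hp->h_addr, &addr.sin_addr, hp->h_length)` would then become `memcpy(&addr.sin_addr, hp->h_addr, hp->h_length)`, where `hp` and `addr` are assumed variable names.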
I thought about this for a while, slapped my forehead and tested on a machine that was actually configured for an Internet provider and a DNS server that could resolve the gethostbyname lookup. I decided this would be a better test than configuring a hosts file on my development machine.
The same thing happened. I re-read the sockets section of my own Win32 programming book, and right there in black and white, it said to initialize Winsock with WSAStartup and close it down with WSACleanup. They say the mind is the second thing to go!
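A minimal sketch of that startup and shutdown pair might look like this. The wrapper names `net_init` and `net_cleanup` are hypothetical, and the `#ifdef` guard is an assumption added so the same source still builds on UNIX, where no such calls are needed.

```c
#ifdef _WIN32
#include <winsock2.h>   /* WSAStartup, WSACleanup, WSADATA; link with ws2_32.lib */
#endif

/* Hypothetical wrapper: returns 0 on success, nonzero on failure. */
int net_init(void)
{
#ifdef _WIN32
    WSADATA wsaData;
    WORD wVersionRequested = MAKEWORD(1, 1);   /* ask for Winsock 1.1 */
    return WSAStartup(wVersionRequested, &wsaData);
#else
    return 0;   /* UNIX sockets need no startup call */
#endif
}

/* Hypothetical wrapper: call once for each successful net_init. */
void net_cleanup(void)
{
#ifdef _WIN32
    WSACleanup();
#endif
}
```

Until WSAStartup has succeeded, Winsock calls such as gethostbyname fail, which matches the symptom described above.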
Once I added those socket calls and set the wVersionRequested variable for Winsock 1.1, I got the program to the point where it would dial up the Internet, but it wouldn't connect the socket to the server. I went about my business of optimizing the program, removing the unneeded gethostbyname and getservbyname lookups and wiring in the correct IP address and port for the server, but the connection still failed. I had just discovered I hadn't converted from host byte order to network byte order when I received the Winsock version of the client via e-mail (see our Web site, www.winmag.com).
With LOOKUPBYNAME defined as 0, the Winsock client had already implemented the optimizations on which I was working. It had also converted the port and IP address from host byte order to network byte order, using addr.sin_port = htons((u_short)Port) and addr.sin_addr.s_addr = htonl(ServerIP). That's important because the Net is big-endian, while Intel machines and even RISC machines running NT are little-endian.
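To make the byte-order step concrete, here is a small sketch built around the two assignments quoted above. The function name `fill_server_addr` and its fixed-width parameter types are assumptions; only the htons and htonl calls come from the client.

```c
#include <stdint.h>
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <arpa/inet.h>    /* htons, htonl */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <sys/socket.h>   /* AF_INET */
#endif

/*
 * Hypothetical helper: ServerIP and Port arrive in host byte order
 * and are stored in network (big-endian) byte order, which is what
 * connect() expects to find in the address structure.
 */
void fill_server_addr(struct sockaddr_in *addr, uint32_t ServerIP, uint16_t Port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(Port);
    addr->sin_addr.s_addr = htonl(ServerIP);
}
```

On a little-endian machine, skipping the conversion leaves the bytes reversed, so the client tries to reach a different port and address than the ones intended. That would be consistent with a connection that never succeeds.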
Knowing the client software was already completely optimized, I tested it on my own machine. With nothing else running, it still took 10 to 20 seconds to get a response from the server, and sometimes it couldn't even get a socket connection to the server. I would see a packet go out by watching the blinking of the transmit light, then seconds would go by before a packet would finally come in from the server. I witnessed the same network behavior when I used telnet to connect to the correct TCP port on the server and send the inquiry string.
Even I get the message when I'm shaken by the collar or hit over the head with a two-by-four: The client was not our problem. It was the server. We fixed the site temporarily by inserting static instead of rotating ads, and I started looking into server-side execution technology.
WinMag's Web server uses O'Reilly & Associates' WebSite software, which supports server-side #include and #exec directives as HTML comments. We were using the #exec directive to run our ad rotation client as a CGI program.
Of course, I could have written a new ad rotation system as another CGI program. But I didn't want to, for a variety of reasons. The main reason is that CGI programs use a program instance for each client, and the magazine's Web site is starting to have a lot of simultaneous clients.
I decided to use ISAPI, and the powers that be agreed to try out Microsoft's Internet Information Server (IIS), one of two Web servers that currently support ISAPI (the other is Process Software's Purveyor). As I discussed in my June column, ISAPI programs are DLLs rather than .EXE files; IIS loads them as needed and then keeps them in memory as long as possible. In the ISAPI model, each client gets a separate thread of execution in a shared DLL.
Actually, there are two kinds of ISAPI DLLs: extensions and filters. Extensions have two required entry points (GetExtensionVersion and HttpExtensionProc) and are called explicitly as URLs. Filters have two required entry points (GetFilterVersion and HttpFilterProc) and are called automatically for any of seven events: when the server is reading raw data, authenticating the client, mapping the URL to a physical location, sending raw data back to the client, writing to the log file, finishing its preprocessing of the request headers, or ending the session with the client. Because the entry points are different for ISAPI extensions and ISAPI filters, the two may be combined into a single DLL.
I thought it would be simplest to replace our existing #exec directives, which call our CGI ad client software, with #exec directives that call our new ISAPI extension. Unfortunately, IIS doesn't currently support the #exec directive, even though it supports the #include directive. It doesn't yet support server-side scripting either, which took care of my second idea.
In fact, supporting and enabling #exec and #include isn't always the best way to run a Web server. Both directives can introduce security holes if not properly managed, and both introduce some inefficiencies.
My third idea was to write a URL mapping filter that would redirect requests for HTM files to an ISAPI-extension DLL that would preprocess the file and return the expanded text. That may work, but it's more complicated than necessary. You can do what you want with an ISAPI filter that responds to the SF_NOTIFY_SEND_RAW_DATA event. When this particular notification occurs, the entire raw data buffer of the requested URL is present and available for modification. The existing data buffer may not be big enough to hold the expanded content, but the filter is able to point the server at its own buffer.
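The buffer-swapping idea can be sketched in portable C, leaving the ISAPI plumbing aside. Everything here is an assumption made for illustration: the function name `expand_marker`, the idea of marking the ad slot with a placeholder comment, and the use of malloc. In a real filter the new buffer would more likely come from the server (the filter context offers an AllocMem callback), and the filter would then repoint the pvInData and cbInData fields of the HTTP_FILTER_RAW_DATA structure at it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy `in` into a freshly allocated buffer,
 * replacing the first occurrence of `marker` with `ad`.
 * Returns NULL if the marker is absent or allocation fails.
 * *out_len receives the new length; the result is also
 * NUL-terminated for convenience. The caller frees it.
 */
char *expand_marker(const char *in, size_t in_len,
                    const char *marker, const char *ad, size_t *out_len)
{
    size_t mlen = strlen(marker);
    size_t alen = strlen(ad);
    size_t i;
    char *out;

    if (mlen == 0 || mlen > in_len)
        return NULL;

    /* Raw data is not NUL-terminated, so search by length, not strstr. */
    for (i = 0; i + mlen <= in_len; i++)
        if (memcmp(in + i, marker, mlen) == 0)
            break;
    if (i + mlen > in_len)
        return NULL;                        /* marker not found */

    *out_len = in_len - mlen + alen;
    out = malloc(*out_len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, in, i);                     /* text before the marker */
    memcpy(out + i, ad, alen);              /* the inserted ad markup */
    memcpy(out + i + alen, in + i + mlen,   /* text after the marker */
           in_len - i - mlen);
    out[*out_len] = '\0';
    return out;
}
```

Because the output lives in its own buffer, the expanded page can be longer than the original without overrunning the server's storage. A production filter would also have to deal with details this sketch ignores, such as response headers that state the content length.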
I'm still working on this filter. I'm finding ISAPI filters rather difficult to debug, but I'm getting the hang of it. There's a technical note, TN063, in the Visual C++ documentation about debugging Internet extension DLLs, and some of the techniques are applicable to filter DLLs.
Despite the pain, I'm glad to be hacking code. Writing about this stuff has its moments, but making one dumb mistake after another is just the sort of reality check I need to keep my writing honest.
Senior Contributing Editor Martin Heller writes about and does Windows programming from Andover, Mass. Martin Heller's e-mail ID is: mheller@cmp.com