By Martin Heller
Your home page is a one-way document. People browsing it see your information, but you don't know much about them. Although there are technologies that track hits made on your images and links, these activities are tied only to an electronic log-on ID.
With a little design work, however, you can get the user to tell you about himself or herself. The easiest, most common way to do this is to add e-mail links to your Web pages. For instance, I have a chunk of HTML code at the bottom of my main page. As source code, it looks like this:
<b>The doctor is IN.
Click here to send him e-mail: </b>
<a href="mailto:mheller@cmp.com">
mheller@cmp.com</a>
Here's how it looks on the browser:
The doctor is IN.
Click here to send him e-mail:
mheller@cmp.com
The text entries in angle brackets are HTML tags. The <b> tag starts bold text, and the </b> tag ends it. The tag <a href="mailto:mheller@cmp.com"> starts an anchor, which specifies a hypertext link, in this case to a mail address.
More commonly, hypertext links start with http://, prompting jumps to other Web pages. The </a> tag ends the anchor. Anchor text (the text between the <a href="..."> tag and the </a> end tag) is usually displayed underlined and in blue, although some browsers let you change the anchor text style. You can also use images as anchors; these are usually displayed with a blue border.
When someone clicks on the underlined blue area in this example, their mail application will start (if it isn't already running) and become the current application, ready for a new message with my e-mail address already inserted in the "To:" field. But an e-mail link doesn't force anyone to tell me who they are before they download my files and read my articles, so I can't automatically capture names and addresses for mailings.
By using HTML forms, however, I could require visitors to register before I allow access to certain pages, and I could also automatically store that registration information in a database. This process has two parts: the forms themselves and the programs that accept the information sent from the forms.
Although designing an HTML form is not an intuitive process, you can learn to do it in a few hours with a word processing program. As a shortcut, you can use an HTML add-on for your favorite word processor, but even that is unnecessary.
We saw before that some HTML tags bracket their subject. Forms are no different: They start with a <form> tag and end with a </form> tag. The <form> tag takes three attributes: an action, a method and an encoding type (enctype).
Let's analyze the following sample of HTML forms code.
<html> <head> <TITLE> Form Test</TITLE> </head>
<body> <h1>Form Test </h1> <hr>
<form method=post action="cgitest.exe">
My name is: <input type="text" name="name" value="" size=50 maxlength=50><br>
My e-mail address is: <input type="text" name="email" value="" size=80 maxlength=80><br>
Please add me to your mailing list: <input type="checkbox" name="checkbox"><br>
My preferred color is: <select size=5 name="select">
<option selected>red <option>blue <option>green
</select><br>
<input type="submit" value="Submit">
<input type="reset" value="Reset">
</form> <hr>
<address>
This Form Last Updated: Monday, February 19, 1996, 3:34:45 PM </address> </body> </html>
Here's how it looks on the browser:
[Screenshot: Form Test]
After coding our header and title, we specified an action in the third line. The action is a program name, CGITEST.EXE. We also could have specified a URL as an action, or we could have omitted the action, in which case the output would be sent to the current document's URL.
Next, we specify a method. Here, we're using the post method, which means that the form's output will be sent to the action URL as the standard input stream, with its length held in the environment variable CONTENT_LENGTH. The alternative is the get method, which appends the same data to the action URL after a question mark; the server passes that string to the program in the environment variable QUERY_STRING. Either way, the data arrives as one long string, with blanks changed to plus signs, other special characters written as %xx hexadecimal escapes, and ampersands between fields.
Using the get method, the request our sample form sends would look like this: cgitest.exe?name=Martin+Heller&email=mheller%40cmp.com&checkbox=on&select=red.
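To see where a string like that comes from, here is a small Python sketch that encodes the sample form's fields roughly the way a browser does for the get method. It relies on the standard library's urllib.parse.urlencode; the field values are illustrative. Note that the @ in the e-mail address comes out as %40, just as browsers send it.

```python
from urllib.parse import urlencode

# Name/value pairs as a browser would collect them from the sample form.
# A checked checkbox with no value attribute is sent as "on"; an unchecked
# one is left out of the data entirely.
fields = [
    ("name", "Martin Heller"),
    ("email", "mheller@cmp.com"),
    ("checkbox", "on"),
    ("select", "red"),
]

# urlencode joins the pairs with "&" and, via quote_plus, turns blanks into
# "+" and reserved characters such as "@" into %xx escapes.
query = urlencode(fields)
url = "cgitest.exe?" + query
print(url)
# cgitest.exe?name=Martin+Heller&email=mheller%40cmp.com&checkbox=on&select=red
```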
The get method is usually used for searching, and the post method is usually used for processing the contents of HTML forms.
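From the program's side, the two methods differ mainly in where the data shows up. The Python sketch below shows one way a CGI program might fetch the raw, still-encoded data; the function name read_form_data is hypothetical, and the dictionaries in the usage lines simulate the environment variables a Web server would set.

```python
import io
import os
import sys


def read_form_data(environ=None, stdin=None):
    """Return the raw, still-encoded form data passed to a CGI program."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # post: the data is on standard input and CONTENT_LENGTH gives its
        # size. Read exactly that much rather than waiting for end-of-file.
        # (Encoded form data is plain ASCII, so characters equal bytes here.)
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # get: the server puts everything after the "?" into QUERY_STRING.
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    # Simulated requests; a real Web server would set these variables itself.
    post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "18"}
    print(read_form_data(post_env, io.StringIO("name=Martin+Heller\n")))
    get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Martin+Heller"}
    print(read_form_data(get_env))
```

Reading exactly CONTENT_LENGTH characters matters: the server is not obliged to close standard input after the data, so a program that waits for end-of-file may hang.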
The form's actual content goes in fields (sometimes called widgets). In the sample form we've used four fields and two buttons: text fields for the name and e-mail address, a check box for follow-up mailings, a selection field for a color, a submit button and a reset button. Check boxes restrict answers to yes or no. A selection field limits the answer to a fixed list of options. The reset button returns all fields to their default values, while the submit button sends the fields' current contents to the form's action destination.
The sample has two <input> tags with type=text, an <input> tag with type=checkbox and a <select> tag. Between the <select> and </select> tags are three <option> tags, with the first marked as selected to establish a default value of red. Two more <input> tags correspond to the submit and reset buttons. Several <br> tags break lines.
Making sense of a form's output is only a little more difficult, but you do need to program. The standard for the interface between an HTML form and a program is called the Common Gateway Interface (CGI). CGI is not a programming language; in fact, almost any programming or scripting language can be used to write a CGI program. The only requirements are that the CGI program be able to read environment variables and standard input, process the strings it reads into something it can understand, and write its results to standard output (usually formatted as HTML so they can be displayed in a browser).
To process my name and e-mail address form, a program would get the form data (from the QUERY_STRING environment variable for the get method, or CONTENT_LENGTH bytes of standard input for post) and break the retrieved string up into fields by looking for ampersands.
Once it had the individual fields, it would convert plus signs to blanks (and %xx escapes back to the characters they stand for) and process each field by name, taking name=Martin+Heller and setting the name string variable to "Martin Heller." When it had processed all the fields passed, it might enter the completed record into a database and then write a confirmation to standard output: "Thank you, Martin Heller. Your information has been entered. We'll be sending your notices to your e-mail address, mheller@cmp.com."
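The parsing and confirmation steps just described might look like this in Python. This is a sketch, not the author's program: parse_form_data and confirmation_page are hypothetical names, the standard library's urllib.parse.unquote_plus does the plus-sign and %xx decoding, and the database step is left as a comment. It also passes the user's input through html.escape before echoing it, so a visitor can't slip HTML tags into the confirmation page.

```python
import html
from urllib.parse import unquote_plus


def parse_form_data(data):
    """Split encoded form data into a dict mapping field name to value.

    If a field name repeats, the last value wins (enough for this form).
    """
    fields = {}
    for pair in data.split("&"):  # ampersands separate the fields
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # unquote_plus turns "+" back into blanks and decodes %xx escapes.
        fields[unquote_plus(name)] = unquote_plus(value)
    return fields


def confirmation_page(fields):
    """Build the program's standard output: a header, a blank line, then HTML."""
    # Escape anything the user typed before echoing it back into a page.
    name = html.escape(fields.get("name", ""))
    email = html.escape(fields.get("email", ""))
    return (
        "Content-Type: text/html\n\n"
        f"<html><body><p>Thank you, {name}. Your information has been entered. "
        f"We'll be sending your notices to your e-mail address, {email}.</p>"
        "</body></html>\n"
    )


if __name__ == "__main__":
    record = parse_form_data(
        "name=Martin+Heller&email=mheller%40cmp.com&checkbox=on&select=red"
    )
    # A real program would store `record` in a database at this point.
    print(confirmation_page(record), end="")
```

The blank line after the Content-Type header is what tells the Web server where the CGI headers end and the HTML begins.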
Capturing information from the user is one of many things you can do with forms. You can also add search capabilities to your pages. That's easy to do if a search engine is already installed at your site; the best way to find out is to ask your system administrator.
Ordinary full-text search of a site's files is often done with utilities like grep, fgrep and egrep, all of which originated on UNIX systems. These programs don't accept CGI or produce HTML output, but a number of wrapper programs allow them to work with Web servers. In addition, searching text directly is slow, and a busy Web site can't afford to allow constant searching of its full text. Several indexing programs and search engines for index files have been written to solve that problem.
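To make the idea of a wrapper concrete, here is a minimal Python sketch of what such a program does, with the grep-style scan written in Python rather than calling grep itself. The function names are hypothetical, and a dict of documents stands in for the site's files. Note that scan_documents reads every line of every document on every query, which is exactly why unindexed searching gets slow on a busy site.

```python
import html


def scan_documents(documents, term):
    """Brute-force, case-insensitive full-text search in the style of grep.

    `documents` maps a file name to its text. Every line of every document
    is examined on every query, so the cost grows with the size of the site.
    """
    needle = term.lower()
    hits = []
    for name in sorted(documents):
        for lineno, line in enumerate(documents[name].splitlines(), start=1):
            if needle in line.lower():
                hits.append((name, lineno, line.strip()))
    return hits


def hits_to_html(term, hits):
    """Turn plain-text hits into HTML, the part a CGI wrapper adds."""
    items = "".join(
        f"<li>{html.escape(name)}, line {lineno}: {html.escape(text)}</li>"
        for name, lineno, text in hits
    )
    return f"<h1>Results for {html.escape(term)}</h1><ul>{items}</ul>"


if __name__ == "__main__":
    site = {
        "forms.htm": "HTML forms collect input.\nCGI programs process it.",
        "search.htm": "Search engines index a site.",
    }
    print(hits_to_html("cgi", scan_documents(site, "cgi")))
```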
For instance, CMP's TechWeb uses a commercial index and search engine, WAIS (which can handle natural-language queries with Boolean operators), and a version of WAISGATE, a Web-to-WAIS gateway program. Here's a sample search option, along with the HTML source for the search. The action for the form is given as a full URL to a program called CMP_WAISGATE; the default query for this particular form uses the Boolean AND operator.
<FORM METHOD=POST ACTION="http://techweb.cmp.com/techweb/programs/cmp_waisgate">
<P><STRONG>More articles on: </STRONG>
<INPUT NAME="search_term" VALUE="Microsoft AND Internet" SIZE=25>
<INPUT TYPE="submit" VALUE="Submit Query">
Publication: <SELECT NAME="field.pub">
<OPTION>Windows Magazine
<OPTION>All CMP Publications </SELECT>
</FORM>
Here's how it looks on the browser:
[Screenshot: More Articles On: search form]
WAIS ranks the documents it finds by relevance to your query, and WAISGATE generates a formatted HTML document that gives you titles of and links to the documents found and lets you narrow the search further. Other versions of WAIS, such as FREEWAIS, are available for download. You can also find other free search engines and gateways on the Net, for instance SWISH, HARVEST and WWWWAIS. Fittingly, the best way to find them is with a search engine: Go to http://www.yahoo.com/ or http://www.webcrawler.com/ to search.
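The reason an index helps is that a query can be answered without rereading the documents. The Python sketch below builds a simple inverted index and evaluates a Boolean AND query such as the TechWeb form's default, "Microsoft AND Internet". It is an illustration under stated assumptions, not how WAIS works internally: the tokenizer is crude, the function names are hypothetical, and ranking by raw occurrence counts is only a stand-in for real relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cased words made of letters and digits (a deliberately crude rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: word -> {document name: occurrence count}."""
    index = defaultdict(dict)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word][name] = index[word].get(name, 0) + 1
    return index


def search_and(index, query):
    """Answer a query such as 'Microsoft AND Internet' from the index alone.

    Only documents containing every term match. They are ranked by how often
    the terms occur in them, a crude stand-in for relevance.
    """
    terms = [t for t in tokenize(query) if t != "and"]  # "AND" is the operator
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    score = {doc: sum(p[doc] for p in postings) for doc in matches}
    return sorted(matches, key=lambda doc: (-score[doc], doc))


if __name__ == "__main__":
    docs = {
        "a.htm": "Microsoft ships Internet Explorer. Internet tools from Microsoft.",
        "b.htm": "Microsoft Windows news.",
        "c.htm": "Internet access for Microsoft users.",
    }
    print(search_and(build_index(docs), "Microsoft AND Internet"))  # ['a.htm', 'c.htm']
```

The index is built once, ahead of time; each query then touches only the entries for its own terms rather than the full text of the site.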
Because searching an index file is really just a specialized database search, some sites use general-purpose database programs to maintain and search their indexes. It's not uncommon to find SQL Server or Access in use at a Web site, holding lists of one sort or another.
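Here is a rough idea of what that looks like. In this Python sketch, SQLite (through Python's built-in sqlite3 module) stands in for SQL Server or Access; the postings table, its word, doc and hits columns, and the function names are all hypothetical. Each document's word counts become rows, and a search is an ordinary SQL query.

```python
import re
import sqlite3


def make_index_db(documents):
    """Load word counts for each document into an in-memory SQL table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE postings (word TEXT, doc TEXT, hits INTEGER)")
    # An index on the word column keeps lookups from scanning the whole table.
    db.execute("CREATE INDEX postings_word ON postings (word)")
    for doc, text in documents.items():
        counts = {}
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            counts[word] = counts.get(word, 0) + 1
        db.executemany(
            "INSERT INTO postings (word, doc, hits) VALUES (?, ?, ?)",
            [(word, doc, n) for word, n in counts.items()],
        )
    db.commit()
    return db


def search_word(db, word):
    """Documents containing `word`, most occurrences first."""
    rows = db.execute(
        "SELECT doc FROM postings WHERE word = ? ORDER BY hits DESC, doc",
        (word.lower(),),
    )
    return [doc for (doc,) in rows]


if __name__ == "__main__":
    db = make_index_db({
        "a.htm": "Microsoft ships Internet Explorer. Internet tools from Microsoft.",
        "b.htm": "Microsoft Windows news.",
        "c.htm": "Internet access for Microsoft users.",
    })
    print(search_word(db, "Internet"))  # ['a.htm', 'c.htm']
```

The query uses a ? placeholder rather than pasting the search term into the SQL text, which keeps whatever a visitor types from being interpreted as SQL.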
CGI is a platform-independent standard, and so is HTML. But CGI has its limits. One problem is that both the post and get interfaces are awkward. Additionally, CGI programs take significant resources on the server. They also present a security threat to many servers.
Many free code libraries can help you deal with messy strings and awkward interfaces; find them on the Web by searching for CGI. The resource problem occurs because CGI programs are executable files: The server must spawn a separate copy of the program for each request, and that can chew up a lot of memory.
Microsoft's remedy for this ailment is ISAPI, a dynamic link library interface to its IIS Web server. ISAPI's principal advantage is that every client using the DLL shares a single copy of it, which reduces both the memory consumed and the time required to load the program. Another interface, the Internet Database Connector (IDC), ties SQL Server databases to the IIS Web server in a way that's easy to set up and maintain.
The security problem is sticky. Webmasters need to control what CGI programs are allowed to run on the server by restricting access to the directories used for programs. They must also thoroughly examine and test all CGI programs before making them publicly available.
Web forms add too much to a site to pass up. Yes, it takes some programming to handle the material you collect in forms; yes, you have to keep security in mind when you test those programs; yes, it's work. But it's work you'll have to do to get the most out of your site.
Martin Heller surfs the Net and hacks code from Andover, Mass. Contact Martin via his Web page at http://www.winmag.com/people/mheller or via e-mail at mheller@cmp.com.