Under the Hood — The Basics of HTML: Part One

By: John Gallant, Holly Bergevin

This article series is not intended as a hand-coding manual. Rather, it is aimed at people who use web page editors such as Dreamweaver and would like a good general idea of what is going on "under the hood" when they create web pages. We will start from zero and walk through the source document point by point, in easy-to-understand, mostly non-technical language.

Before we're through, you will at least be able to follow the average HTML discussion without suffering from MEGO (My Eyes Glaze Over).

The Rendering Engine

That sounds like something you might find in a slaughterhouse, doesn't it? But no, the term "rendering engine" refers to the core programming of every web browser. That core is designed to read a web document's source and translate the content and its surrounding code, "rendering" a result on the user's screen that, with luck, resembles what the author intended.

A browser reads the source in a strictly linear fashion, from beginning to end, skipping nothing. That means any character in the source can have a big effect on the finished display, although many do not: an extra character in a block of text won't break the page, but a stray space or character inside a tag can radically alter the display. As far as the rendering engine is concerned, the entire document is one long string of characters and spaces, and carriage returns and line feeds count as spaces. So if you press the Enter key to begin a new line of text in the source, that text won't necessarily start a new line on the screen. The last word of the first line and the first word of the second line will simply be separated by a space, rather than running together.

If there are two or more spaces in a row, the browser automatically collapses them into a single space. If you need more than one space between words, you must add special coded spaces. The most common is the non-breaking space, written &nbsp; in the source.
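To see both rules at work, here is a small HTML fragment of our own. The sample words are made up, and we assume the fragment sits inside an otherwise complete page. Comments, the text between <!-- and -->, are ignored by the browser.

```html
<!-- Typed on three lines, with five spaces after "Hello": -->
<p>Hello     there,
world.
Nice day.</p>
<!-- If the window is wide enough, this displays on one line as:
     Hello there, world. Nice day. -->

<!-- Non-breaking spaces are NOT collapsed, so all three show: -->
<p>Hello&nbsp;&nbsp;&nbsp;there.</p>
```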

A Word About Wrapping

When a browser is rendering a string of words on the screen and runs out of room on a line, it must do something called "wrapping". The browser picks a spot in the string, "breaks" it there, and resumes rendering the rest of the string one line lower. This rendering and wrapping continues until the whole text string has been displayed.

Usually, the spot chosen for wrapping is a keyboard space, like the ones found between most words. In fact, the W3C specs say that in Western languages, text should only be wrapped at white space. Several white space characters are available to authors, the ordinary space and the tab among them. In practice, though, browsers may disagree about where else wrapping is allowed, for example after a hyphen or some other character. If a text string contains no spaces at all, such as a very long URL, the browser has no place to wrap it, and the string may push past the edge of its area on the page, disrupting the layout.
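Here is a rough illustration of the difference, again using invented sample text. Exactly how the second paragraph behaves depends on the browser and on the page around it.

```html
<!-- Plenty of spaces: the browser can wrap at any of them. -->
<p>This sentence has plenty of spaces where the browser may choose to wrap it.</p>

<!-- No spaces at all: there is nowhere to wrap, so in a narrow
     window this may run past the edge of the page. -->
<p>ThisStringHasNoSpacesAtAllSoTheBrowserHasNowhereToWrapIt</p>
```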

Providing Structure

The page text and any images are called the "content". Images are normally treated much like text, sitting in the line right alongside the words, unless the browser is instructed to handle them differently. "Instructed, you say?" Yes.

Let's think about this "content". If the source document for a page contained only text and images, that content would simply fill the screen from the top down as it was rendered by the browser. Sure, it would be in the same order and nothing would be missing, but it would be less than beautiful. Not only that, but everything would be in one big heap, and users might grow very old trying to skim such a document for just the part they are interested in. Clearly, some page structure is in order (no pun intended).

Structure is provided by "enclosing" parts of the content in special HTML "tags", which tell the browser what each section of content is and, in some cases, what to do with it. Suppose, for example, that you have a title you want to appear at the top of the page. You would place a pair of tags (a start tag and an end tag) on either end of the text string that is your title. This tells the browser, "Here is a page title. Separate it from the other content, and (perhaps) make it bigger." Another tag pair tells the browser that a text string is a paragraph. Typically, paragraphs are separated from other content sections but not enlarged. Each type of tag tells the browser what special things (if any) to do with the enclosed content. Together, the tags and the content they enclose make up an "element".

HTML stands for HyperText Markup Language. "Hypertext" refers to text containing links to other documents, though people often use the word loosely for web code in general. A "markup language" is just that: a predefined language, understood by browsers, that "marks up" content to give it structure. Without it, we're back to the "big heap".

Element Types

In HTML, element types represent structure or desired behavior. Each type of element has a different name, but all follow the same basic construction, or "syntax". The defining part is the element name, which in the case of the page title above is a "heading", shortened to "h". There are actually six levels of heading, "h1" through "h6", and that top title, being the most important, is number one, or "h1". Subtitles might be "h2", sub-subtitles "h3", and so forth. That's how it is supposed to be done, anyway. Generally, as the heading numbers get bigger, the size of the heading text gets smaller. A paragraph element is a simple "p", and thankfully there are no other kinds of paragraphs (whew!).

But we're not done with the syntax just yet. Elements generally have three parts: a start tag, content, and an end tag. All start tags MUST have angle brackets on both ends, pointing outward. So the "h1" start tag looks like this: <h1>. The paragraph start tag looks like <p>. ALL TAGS must have these angle brackets, or the browser won't treat them as real tags.

Nearly all start tags should have a paired closing tag, or end tag, so the browser knows which part of the content is inside the element and which part is not. A closing tag repeats the element name from the start tag, with one extra character added so browsers can tell start tags from closing tags. That character is a forward slash, /, and it is always placed directly after the opening angle bracket of the closing tag. So, in the code, a complete heading element might look like <h1>My Wonderful Web Page</h1>.
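Putting the heading levels and the tag syntax together, the source for the top of a page might look something like this sketch. Only the first heading comes from the text above; the rest of the wording is invented for illustration.

```html
<!-- Level 1 heading: the main page title -->
<h1>My Wonderful Web Page</h1>
<p>A paragraph introducing the whole page.</p>

<!-- Level 2 heading: a subtitle -->
<h2>A Subtitle</h2>
<p>A paragraph that belongs under the subtitle.</p>

<!-- Level 3 heading: a sub-subtitle -->
<h3>A Sub-subtitle</h3>
<p>A paragraph that belongs under the sub-subtitle.</p>
```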

In the early days of the web, HTML was very loose, and many elements did not require closing tags. When another start tag came along, the browser was expected to figure out that the earlier element ended there. For geeky reasons, this is considered a "Bad Thing," so authors are now strongly encouraged to close all their elements (which also saves the browser some guesswork).
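The difference looks like this in the source (a minimal sketch with made-up text). Both versions usually display the same way, but only the second spells out where each paragraph ends.

```html
<!-- Old, loose style: no closing tags. The browser has to guess
     that each new <p> start tag ends the paragraph before it. -->
<p>First paragraph.
<p>Second paragraph.

<!-- Preferred style: each paragraph is closed explicitly. -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
```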

A few element types never take closing tags, because they have no content to enclose. These are called "empty" elements. Two of them are the image element, <img>, and the "horizontal rule," <hr>. The image element tells the browser to fetch an image file from the server and display it, and the <hr> tag tells the browser to paint a horizontal line across the screen to separate blocks of text.
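In an ordinary HTML page, the two empty elements might be written as in this sketch. The file name mypic.gif is hypothetical; the src setting names the file to fetch, and alt supplies text to show if the image can't be displayed.

```html
<p>A block of text above the line.</p>

<!-- Horizontal rule: an empty element, so no closing tag -->
<hr>

<!-- Image: also empty. src names the (hypothetical) file to fetch,
     alt gives replacement text if the image can't be shown. -->
<img src="mypic.gif" alt="My Picture">
```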

With certain modern types of page code (for example, pages written in XHTML, or eXtensible HyperText Markup Language), even these empty elements must be closed. This is done by placing a forward slash at the END of the tag, just before the closing angle bracket, rather than at the beginning: <img src="mypic.gif" alt="My Picture" />. For most pages, though, this will not be necessary.
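For comparison, here are the same two empty elements as they might appear on an XHTML page. The space before the slash isn't strictly required, but the XHTML 1.0 specification's compatibility guidelines recommend it so that older HTML browsers don't stumble over the tag.

```html
<!-- XHTML: empty elements close themselves with a slash
     at the END of the tag, just before the ">" -->
<hr />
<img src="mypic.gif" alt="My Picture" />
```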

Sometimes authors will speak of "p-tags" or "h1-tags" when they actually mean a paragraph element or an h1 heading element. It may seem picky, but there is a distinct difference between a "tag" (the angle brackets and the element name they enclose) and the element itself (start tag, content, and end tag together), whose type represents a structure or desired behavior.
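Here is one paragraph element taken apart, as a quick sketch with invented content:

```html
<!-- Everything on the last line is ONE paragraph element:
       <p>   is the start tag
       </p>  is the end tag
       the words between them are the content -->
<p>This is the content of the paragraph.</p>
```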
