Sensitive information exposure

Web page content, even when robot exclusion has been requested, may be indexed or stored by search engines (or users) that have access to the content. Simple removal of the pages does not eliminate the content that may be accessible to users of a search engine. Some search engines have “archival caches” of pages in case the page is no longer accessible. Most search engines include the first lines of pages in their results response (use of the “description” meta attribute can provide some control over this). Inclusion of sensitive data should be considered in this context. Efforts to eliminate data errors or expired pages may require replacement with other content at that URI and re-indexing of that content to flush out archival caches. Digital signature or fingerprinting of pages to assure content integrity can reduce the risk of user modification of sensitive data. It is not possible, however, to assure the elimination of all copies of specific content.
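As an illustration of the fingerprinting mentioned above, the following minimal sketch records a SHA-256 digest of a page at publication and later checks a copy against it. The function names and the choice of SHA-256 are assumptions made for this example, not requirements of this recommended practice. A plain digest only detects modification; a digital signature would additionally require a signing key.

```python
import hashlib


def page_fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of the exact bytes of a page."""
    return hashlib.sha256(content).hexdigest()


def is_unmodified(content: bytes, published_fingerprint: str) -> bool:
    """Check a copy of a page against the fingerprint recorded at publication."""
    return page_fingerprint(content) == published_fingerprint
```

Any change to the page bytes, including whitespace, yields a different fingerprint, so the digest should be computed over the page exactly as served.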

Intellectual property rights (IPR)

Web pages may contain intellectual property that belongs to the owner of the Web page or to a third party. Usage of intellectual property should be reviewed by appropriate counsel.

Copyright information

Every Web page has an implicit copyright, subject to the legal jurisdiction in which the work was created or claimed and any contractual arrangements between the developer and other interested parties. Every well-engineered Web page should include a specific copyright statement eliminating any ambiguity about this (which might be kept in metadata if the visible presentation is deemed objectionable). Even if the intention is to make material available in the public domain, the wording to be used should be reviewed with experts familiar with the relevant jurisdiction(s). Well-engineered Web pages shall not knowingly include copyright-protected information without appropriate permission from the copyright holder. Well-engineered Web pages should include a entry (see Annex D).

Trademark information

Well-engineered Web pages and well-engineered Web sites may use trademarks that are the property of either the site owner or another party. These trademarks may be used within the scope of the site or used within the domain name, metadata, or a dynamic database that generates the well-engineered Web page. Because the international trademark system is both industry- and geographically-oriented, this inherently presents the potential for conflicts between Web site owners and trademark holders. Well-engineered Web pages should include information, including applicable Rfield designations, that helps resolve these conflicts. This could include metatags, explanations, and links to the appropriate information regarding the trademark owner.

Security designations

In an Intranet environment, pages should include an RMfield identified by the XML tag set … indicating the organizational security characteristic of the page content. For HTML, use: … The exact wording will vary in different organizations, and may have legal implications (which will vary by country). Typical security “banners” include:

- XYZ Corp. Confidential
- Internal Use Only
- Public Information

Be aware that pages without appropriate security designations may be implicitly public information (even though protected by copyright) or lacking in essential legal protections, depending on the legal jurisdictions from which they may be accessible. Be aware also that a security designation does not by itself assure automated enforcement of that designation.

In an Extranet environment, pages should include similar banners in a way that is consistent with the associated Extranet community. Collaboration may permit sharing of confidential information, and such pages would carry corporate-specific banners; or collaboration may generate confidential information within the collaboration, and have designations specific to that arrangement.

Declaration of security designation should not be considered sufficient to provide security control. Site design should include evaluation of passwords, encryption, and other techniques to provide additional security controls. Each page with a security designation should be reviewed by a person qualified to assess the adequacy of the security indicators and security protection for the page. The review should be conducted before the page is initially placed on the Web.

The review will consider both the code for the page and the displayed page. Consideration should be given to viewing the page with all possible browsers. Subsequent reviews will be required to ensure that the security policy continues to be properly implemented. Reviews may be at regularly scheduled intervals, as a result of a review-triggering event (e.g., page change), or when major architecture changes are to be implemented (e.g., expanding to the Internet or adding Extranet components).

Dates and time

A well-engineered Web page shall include a page date as an RMfield (, or <… class="pagedate">). This indicates the most recent date when a change considered to be of value to the target-user communities has occurred. Each well-engineered Web page shall include an expiration date as an Mfield or RMfield (, or <… class="expirationdate">). This date indicates the earliest date that the page information may be deleted. The page information can be changed during this period, but the type of information presented on the page should remain constant or the user redirected to the new location of the information. The expiration date serves at least three functions:

a) A basis for automated deletion or archiving of the page,
b) An indication that can be used by pages linking to this page of its expected life span, and
c) A basis for exclusion of the page from indexing or search query processes.
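Function a) above can be sketched as follows. This is a minimal illustration, assuming that expiration dates are written as YYYY-MM-DD and that a page becomes eligible for deletion or archiving on the expiration date itself. The function names and the mapping of URIs to dates are hypothetical.

```python
from datetime import date


def is_expired(expiration_date: str, today: date) -> bool:
    """True once the expiration date (YYYY-MM-DD) has been reached."""
    return today >= date.fromisoformat(expiration_date)


def pages_to_archive(expirations: dict, today: date) -> list:
    """Return the URIs whose expiration date has been reached, in sorted order.

    `expirations` maps a page URI to its expiration date string.
    """
    return sorted(uri for uri, day in expirations.items() if is_expired(day, today))
```

The same test could be applied by an indexing process to exclude expired pages, as in function c).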

The value “archival” may be used to indicate that the page contents are not expected to change; some form of persistent URI should be considered for archival pages where ongoing reference is expected. Well-engineered Web pages should include applicable dates from this list:

a) Date of last modification, represented as an Mfield (, <… class="datemodified">). Changes in this date may occur without substantive changes in the content of the page. (Mfield is suggested since this date is considered only to be of use in page management, but not for target-user communities.)
b) Content date, represented as an Mfield or RMfield (, <… class="contentdate">), which is used to indicate that the content was current as of this date. This may not reflect changes in content from a previous content date.
c) Date of next content review, represented as an Mfield or RMfield (, <… class="nextupdate">), is used to indicate when a review is scheduled. Substantive changes might occur prior to this date, and some form of user notification may be needed in certain business situations. (See 7.8 on active links also.)
d) Date of retirement, represented as an Mfield or RMfield (, < … class="date retired">), may be used to indicate when a page has been archived and is no longer considered active. Organizations with requirements for archiving some or all information may want to include use of this date in their well-engineered Web site project plan.
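A page-management tool might read these date fields back out of the HTML. The sketch below uses Python's standard `html.parser` module and collects the text of any element whose class is one of the date classes named above. The class `DateFieldParser` and the function `extract_dates` are hypothetical names, and the sketch assumes that date elements contain plain text with no nested elements. The retirement date is left out because its class name appears with an embedded space in the text above.

```python
from html.parser import HTMLParser

# Class names as given in the text above.
DATE_CLASSES = {"pagedate", "expirationdate", "datemodified", "contentdate", "nextupdate"}


class DateFieldParser(HTMLParser):
    """Collect the text of elements whose class attribute names a date field."""

    def __init__(self):
        super().__init__()
        self.dates = {}
        self._current = None  # date class of the element being read, if any

    def handle_starttag(self, tag, attrs):
        for name in (dict(attrs).get("class") or "").split():
            if name in DATE_CLASSES:
                self._current = name
                self.dates[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.dates[self._current] += data

    def handle_endtag(self, tag):
        # Assumes date elements are not nested, so any end tag closes the field.
        self._current = None


def extract_dates(html: str) -> dict:
    """Return a mapping of date class name to its text value."""
    parser = DateFieldParser()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.dates.items()}
```

The values are returned as written on the page, so a value such as “archival” is passed through unchanged for the caller to interpret.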

If the purpose of the above dates is for internal maintenance rather than use by the target-user community, it may be appropriate to maintain the information independently from the page content. All dates, including the above, shall be presented with four-digit years. Designers should use ISO 8601:2000 [B25] format: YYYY-MM-DD (all digits) for dates. Dates should include time, and time-zone, such as one based upon Coordinated Universal Time (UTC), if this is relevant to the usage (HH:MM:SS, should be 24-hour format if machine-readable). If time is included, the time zone shall be specified. Because local time in this context may be ambiguous, time-zone designators are recommended (UTC or UTC-offset) when indicating the time.

The recommended ISO 8601:2000 [B25] time designation format is: YYYY-MM-DDThh:mm:ssTZD

where:

YYYY is year
MM is month (01–12)
DD is day (01–31)
The letter “T” is required if time is present
hh is hour (00–23)
mm is minute (00–59)
ss is second (00–59) (decimal fractional extensions may be incorporated)
TZD is time-zone designator; the value should be “Z” for UTC, or +hh:mm for positive (east) displacement from UTC, or -hh:mm for negative (west) displacement from UTC
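The format above can be produced with Python's standard `datetime` module, as in the following minimal sketch. The function name `to_iso8601` is hypothetical. The sketch rejects values without a time zone, reflecting the rule that the time zone shall be specified whenever time is included, and it writes whole seconds only.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(moment: datetime) -> str:
    """Format a time-zone-aware datetime as YYYY-MM-DDThh:mm:ssTZD."""
    if moment.utcoffset() is None:
        raise ValueError("a time zone is required when time is included")
    text = moment.isoformat(timespec="seconds")
    # isoformat() writes a zero offset as +00:00; the format above uses "Z" for UTC.
    if text.endswith("+00:00"):
        text = text[:-6] + "Z"
    return text
```

For example, 14:30:00 UTC on 1 March 2002 is written as 2002-03-01T14:30:00Z, and the same instant at five hours west of UTC is written as 2002-03-01T09:30:00-05:00.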

This format should be used in any machine-readable fields where date is included in the field. For date-independent (time-only) machine-readable fields, the time subset should be used. The ISO 8601:2000 [B25] date format is the format preferred by the HTML recommendations and by this recommended practice. IETF RFC 1123:1989 [B18] defines the format as exemplified by Sun, 06 Nov 1994 08:49:37 GMT, and this format is required by HTTP 1.1 in response fields.
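The RFC 1123 form used in HTTP response fields can be generated with the standard `email.utils.format_datetime` function, as in this minimal sketch. The wrapper name `to_http_date` is hypothetical, and the input is assumed to be a time-zone-aware datetime, which is converted to UTC before formatting.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_http_date(moment: datetime) -> str:
    """Format a time-zone-aware datetime in the RFC 1123 form used by HTTP."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)
```

Applied to 08:49:37 UTC on 6 November 1994, this yields the example given above: Sun, 06 Nov 1994 08:49:37 GMT.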