Quotations in Network Messages

J. K. Buda

Introduction

One of the characteristic features of public messages posted on electronic networks is the extensive use of quotations.

The first study in the present series identified this use of quotations as a measure to counteract the effects of topic fragmentation and drift. [1] The second study in the series analysed a corpus of 500 network messages, and found, among other things, that 5% of messages contained quotations.[2]

The present study seeks to define the format of network quotations, and to address some of the doubts raised by the second study.

Object

The object of the present study was a corpus of one thousand consecutive Internet messages posted between June 1 and July 6, 1994. The bulk of the messages represent an ongoing discussion about religion and science cross-posted to the following newsgroups:

alt.atheism
alt.meditation.transcendental
sci.logic
sci.philosophy.meta
sci.skeptic

A number of messages were also cross-posted to alt.drugs, alt.magick, and alt.rave.

The size of the corpus was twice that of the one used in the previous study. An analysis of the first corpus (Corpus A) produced some unexpected results, and raised doubts about the representative nature of the sample, particularly the bias generated by frequent posters, and the influence of topic matter upon form and content.

It was hoped that the size of Corpus B would address some of these doubts. In contrast to the methodology of the second study, in which a statistical analysis was attempted only after the data had been collected and processed, the present study made use of real-time analysis (described later), allowing statistical fluctuations to be noted as they occurred.

Method

The 1,000 messages of Corpus B were downloaded from the Internet using the standard Unix rn newsreader program, and saved in text format. Each message was analysed according to the following criteria, and the results entered into a customized FileMaker Pro database.

Message size
Attribution
Quotation size
Quotation style
Nesting
Editing

1. Message size

Word processing software was used to count the number of words in each message. Message headers were not included in the count, neither were message signatures. Most signatures were set off from the body of the message by some kind of graphic separator such as a string of hyphens or asterisks, but occasionally there was no clear dividing line between message and signature. In such instances, complimentary closes and sign-offs were included in the count, and network addresses and other contact information were not.
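The counting rule described above can be sketched in Python. This is only an illustration, not the procedure used in the study (which relied on word processing software). It assumes that the header ends at the first blank line and that a signature begins at a line made up solely of hyphens, asterisks, or underscores. The name count_body_words and the separator pattern are hypothetical.

```python
import re

# Assumed separator: "-- " or a line consisting only of hyphens, asterisks or underscores.
SIG_SEPARATOR = re.compile(r"^\s*(--\s*|[-*_]{3,})\s*$")

def count_body_words(article: str) -> int:
    """Count the words in the body, skipping the header and any signature."""
    # Assumption: the header is everything before the first blank line.
    _, _, body = article.partition("\n\n")
    kept = []
    for line in body.splitlines():
        if SIG_SEPARATOR.match(line):
            break  # everything from the separator onwards is treated as signature
        kept.append(line)
    return len(" ".join(kept).split())
```

Messages with no clear dividing line between text and signature would still require the manual judgment described above.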

2. Attribution

The categories for attribution were created to cover all five forms encountered in the corpus: Message; Author; Message & Author; Multiple; None.

The 'Multiple' category covered messages containing quotations from several sources, each identified separately.

The 'None' category covered messages containing quotations without any attribution. It should be noted that some unattributed messages were, in fact, sections of longer postings that had been divided into a number of shorter messages to avoid length constraints imposed by some network systems. A number of messages contained indirect attributions (e.g. author or topic thread) within the body of the text, but these did not apply to specific quotations, and were thus not counted.

3. Quotation Size

Each quotation block within a message was counted separately. Ellipses and deletions (see Editing below) were ignored.

4. Quotation Style

Thirteen different styles were encountered in the corpus, most of them using symbols at the beginning of each line of quotation. Nesting (see below) produced quotations containing a mixture of styles, but the only one counted was that used for the primary quotation. In a small number of messages, more than one style was used. In these cases, the style adopted for the first quotation was the one entered in the database.

5. Nesting

Quotations can often contain quotations, and such nesting was recorded in the database according to the maximum depth of the inclusion. A quotation within a quotation was recorded as 'x2', a quotation within a quotation within a quotation was recorded as 'x3', and so on.
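As a rough illustration of how the maximum depth might be determined mechanically, the following Python sketch counts repeated quotation markers at the start of each line. It assumes a single, known marker string per message (here '>' by default). The function names are hypothetical, and the study itself recorded nesting by inspection.

```python
def quote_depth(line: str, marker: str = ">") -> int:
    """Number of times the marker is repeated at the start of a line."""
    if not marker:
        raise ValueError("marker must be a non-empty string")
    depth = 0
    rest = line.lstrip()
    while rest.startswith(marker):
        depth += 1
        rest = rest[len(marker):].lstrip()
    return depth

def nesting_label(message: str, marker: str = ">") -> str:
    """Return 'x2', 'x3', ... for nested quotations, or '' when there is no nesting."""
    depth = max((quote_depth(line, marker) for line in message.splitlines()), default=0)
    return f"x{depth}" if depth >= 2 else ""
```

Messages mixing several marker styles, which nesting often produces, would need a more elaborate prefix pattern than this sketch provides.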

6. Editing

Although most quotations consisted of unedited text, some exhibited one or both of two types of editing: deletion and emphasis. The presence or absence of each was noted, as was the style used in each case. As with quotation style, editing included in quotations (as opposed to editing of quotations) was ignored.

Initial data input was performed using Claris FileMaker Pro software running on a Macintosh Centris 660AV computer. Each message in the corpus was assigned its own record in the database, each record containing fifty-six data fields. Of these fifty-six data fields, twenty were used for actual input, and the remainder were used to perform a number of calculations upon the data contained in them. For example, some fields were set up to keep count of incidences of nesting, editing, etc., whilst others displayed cumulative totals of word counts for messages and quotations. Most entry fields required input of numbers, and this had to be done manually from the keypad. Entry of repetitive data, however, was facilitated by the use of custom-made pop-up menus containing a selection of options. Consecutive numbering of records was done automatically, and a specific tab order was assigned to the most frequently used fields, enabling automatic movement from field to field. In other words, hitting the return key after each entry automatically moved the cursor to the next appropriate field.

Many of the calculation fields were hidden so as not to clutter up the computer screen, but running totals, averages, and counts of the main criteria were displayed to provide a rough guide to the influence, if any, of possible bias factors. Significant fluctuations in the main parameters were noted only during input of the first six hundred records, with little or no variation during the input of the subsequent four hundred.

When data input had been completed, the FileMaker Pro file was saved in SYLK format, and transferred to a Microsoft Excel database. Unnecessary fields were excluded during the transfer process, and the order of fields optimized for the Excel software. Whereas the order and positioning of fields in the FileMaker Pro database had been optimized for rapid data input and onscreen readability, that of the Excel database was optimized for data processing and manipulation.

Results

The analysis of the Excel database was carried out in two parts: a brief preliminary analysis of all messages in the corpus, and a detailed analysis of those containing quotations.

Results of the preliminary analysis were as follows:

A1. Message Size (in words)

Maximum	6217
Minimum	0
Average	364

• Some of the longest postings in the corpus were not messages as such, but Frequently Asked Question (FAQ) sheets. FAQs are a unique feature of network newsgroups, and consist of prepared text files reposted at regular intervals. The size of the Internet system, the rapidity of its growth, and the high turnover rate of users, mean that there is a constant stream of new users visiting newsgroups for the first time. Such users tend to ask the same basic questions, and answering them can become a nuisance for regular contributors. FAQs are designed to answer such questions in the most efficient way possible. The longest message in the corpus was, in fact, a FAQ, and it was reposted twice during the five-week period covered by the sample.

The shortest message in the corpus was an aborted posting containing no words at all. The poster immediately followed it up with an apology and a successful reposting. The second shortest message consisted of a single word: 'test'. The author was presumably unaware of the newsgroups (misc.test and alt.test, for example) set up specifically to allow new users to practise posting messages.

A2. Incidence of Quotations

Messages Containing Quotations	877	(88%)
Messages Without Quotations	123	(12%)

• This result was in complete contrast to that obtained in the second study, in which only 5% of messages contained quotations.

The following are the results of an analysis of only those messages containing quotations:

B1. Message Size (in words)

Maximum	3289
Minimum	29
Average	373

B2. Attributions

Message and Author	596
Author	223
No quotation	123
No attribution	53
Multiple attribution	4
Message	1

• Attribution of quotations is usually automatic, and the high incidence of Message and Author attributions is probably a reflection of the default setting for many of the most popular newsreaders.

B3. Quotation Size (in words)

Maximum	1897
Minimum	0
Average	52

• The shortest quotation (if it can be called that) in the corpus consisted of a standard message and author attribution followed by the deletion marker < ALL DELETED >. There were several instances of single-word orphan quotes (see below), but the shortest quotations proper were three words long.

B3. Number of Quotations per Message

No. of Quotes	Messages	No. of Quotes	Messages
1	523	6	17
2	118	7	14
3	83	8	12
4	55	9	5
5	35	10	5

A further seven messages contained more than ten quotations, the largest number in any single message being thirty-five.[3]

B4. Quotation Style

>	520
Initials	95
:	94
]	83
=	35
\|	17
Separator	9
None	7
Indent	7
#	4
"	4
}	1
$	1

• The thirteen kinds of quotation marker encountered in the corpus can be divided into two categories: markers inserted at the beginning of each line of the quotation, and markers inserted before and after the quotation as a whole. 'Separator' and '"' were the only markers belonging to the latter category.

As with attributions, the high incidence of the '>' style is probably a reflection of the default setting of the newsreaders used.

'Initials' refers to a style of quotation format in which the initials of the original poster are inserted at the beginning of each line of the quotation. A quotation from author Jane Doe would, for example, appear thus:

JD>  The pair hacked off thousands of computer nerds worldwide when they
JD>dumped one of the first junk mailings in cyberspace last April, posting
JD>a bulletin advertising their law firm on more than 5,000 newsgroups on
JD>the so-called infohighway (See "Spam Jam," _Tucson Weekly_, May 11).
JD>Netheads complained that such indiscriminate postings would eventually
JD>lead to so much clutter on the net that it would no longer be a useful
JD>communication tool.
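The 'Initials' style can be reproduced with a few lines of Python. The sketch below is hypothetical: it assumes that the initials are simply the first letters of each word in the author's name, followed by '>'. Newsreaders offering this style may derive the prefix differently.

```python
def quote_with_initials(author: str, text: str) -> str:
    """Prefix every line of text with the author's initials and '>'."""
    initials = "".join(word[0].upper() for word in author.split())
    prefix = initials + ">"
    return "\n".join(prefix + line for line in text.splitlines())
```

For example, quote_with_initials("Jane Doe", "first line\nsecond line") yields two lines, each beginning with 'JD>'.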

'Separator' refers to the device used to differentiate otherwise unformatted quotations from the body of the message. One message used verbal introductions to both quotation and response; the remaining eight messages used either a row of plus marks, or a row of hyphens framing a short phrase such as 'begin/end quote' or 'begin/end former article'.

B5. Nesting

x2	246
x3	137
x4	34
x5	12
x6	2

• Nesting proliferates when posters choose to respond to a reply to a message, rather than to the original message itself.

B6. Editing

Deletion	22
Emphasis	15

• Deletion of unnecessary parts of quotations has to be done manually, and the format chosen is therefore at the whim of the poster. It came as something of a surprise, therefore, to find that all instances of deletion in the corpus were indicated by comments within parentheses. In two messages the comment consisted of ellipsis marks; in the remaining twenty messages a verbal comment was enclosed: either a remark such as 'deletia' or 'text deleted', or a brief summary of the text deleted.

A similar uniformity was seen in the format chosen to emphasize text. In all but one of the relevant messages, words were emphasized by the insertion of repeated circumflex marks in the line below, as in:

: What is phased? Do you mean fazed? Is enlightenment knowing how to spell?
                              ^^^^^^

The only exception used a series of 'equals' marks in place of the circumflex marks, as in:

: Should what I underlined be in quotes? I sure hope I never make a mistatke
                                                                    ========

Discussion

Before examining the results of the analysis, it might be useful to look at the mechanics of posting Internet messages, as these have a direct bearing on the way messages and quotations are formatted.

With several thousand newsgroups generating a torrent of new messages each day, some kind of interface is needed to filter the enormous amount of information available. For most users, this interface takes the form of a newsreader program, an item of software that facilitates the reading, writing, and processing of network messages. The two most common newsreaders are rn and nn, available on most Unix systems. Other comparable newsreaders are available for other hardware and software platforms. Although individual newsreaders may appear different when viewed onscreen, and may require different commands to run them, they all perform the same basic functions; namely, enabling the user to choose which newsgroups and which messages he wishes to read, and then giving him the option of responding to each message as it is presented.

For example, if, after reading a particular message, a user wishes to compose a reply, the newsreader will automatically summon up a text editor,[4] and insert the relevant information in the header. Included in the header will be such items as the name and Net ID of the sender, the name of the subject thread to which it belongs, and the IDs of other messages in the thread. The newsreader will then allow the user to enter the text of the reply. At this stage, most newsreaders will present the option of inserting the text of the original message. If the user selects this option, the newsreader will insert the entire message, formatted as a quotation, at the beginning of the text window.

It is then up to the user to edit the quotation accordingly, deleting any unnecessary parts or juxtaposing his own text with sections of the quotation.

This subtractive method of formatting quotations is in contrast to the additive method offered by some newsreaders. By allowing the user to work with two or more text windows, such newsreaders make it possible to 'cut' or 'kill' parts of the original message and 'paste' or 'yank' them into the text being composed.

It is also possible to duplicate these functions with a simple stand-alone text editor or word processor, and then transfer the completed text to a newsreader. Many text editors are designed specifically for use in telecommunications environments, and contain quotation formatting functions equivalent to those of newsreaders. A number of quotation-formatting utilities are also available to supplement text editors and word processors without such functions.

Exactly how the quotation will be formatted will depend on the program used. Some programs only offer one option; others allow the user a choice of several, or permit customizing of the choices available.

Once the message has been completed, the newsreader will automatically post it to the relevant newsgroup, often adding a pre-registered 'signature' containing contact information (ID, phone number, address, etc.) about the author.

As the above description may show, there are basically three stages to the process of posting a network message. The first stage begins with the reading of previous messages in the newsgroup, and ends with the user deciding to post his own message. The second stage consists of the composition of the message itself, and this includes the formatting of any quotations inserted. The final stage consists of posting the completed message to the newsgroup.

Newsreaders help to automate the first and third stages of this process, but the second remains fundamentally manual. If we compare network messages to traditional letters, computer software can do the work of both postman and stationer—can even address an envelope or append a signature—but it cannot write the text of the letter itself.

For all the convenience of newsreader software, posting a network message remains a daunting task, both technically and psychologically, and this is one of the main reasons for the gross imbalance between the number of users posting to a newsgroup, and the number restricting themselves to reading only. It is not unusual for a newsgroup with an active membership of several dozen to be read by several tens of thousands of users.

Turning now to a consideration of the results of the analysis, the initial data for message size show an apparent uniformity between the average size of all messages in the corpus and those containing quotations; likewise between those with quotations and those without. The average size of all messages was 364 words, that of messages with quotations was 373, and that of messages without quotations was 298. The extraordinarily large size of some of the largest messages in the corpus was, however, sufficient to influence these averages, and a further analysis of size distribution was attempted. The results of this analysis are shown in the following chart, the upper graph representing messages with quotations, and the lower those without.

Chart 1 – Message Size Distribution

From this chart it can be seen that the majority of messages containing quotations (530 messages, 60% of the total) lay in the 50–350 word range, with the 100–150 word range accounting for almost 120. Of the 123 messages without quotations, on the other hand, 70 (57%) were less than 150 words long, 31 (25%) being less than 50 words in length.
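A size distribution of this kind can be tabulated by grouping word counts into fixed ranges. The following Python sketch assumes 50-word bins, matching the ranges quoted above. The function name size_distribution is hypothetical, and the analysis in the study was carried out in Excel, not with code of this kind.

```python
from collections import Counter

def size_distribution(word_counts, bin_width=50):
    """Map the lower bound of each bin to the number of messages falling in it."""
    bins = Counter((n // bin_width) * bin_width for n in word_counts)
    return dict(sorted(bins.items()))
```

Each key is the lower bound of a range, so the key 100 stands for messages of 100 to 149 words.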

The same phenomenon was not evident in the analysis of quotation length. The initial results indicated an average size of 52 words for the 1982 quotations found in the corpus.[5] The following chart shows that the overwhelming majority of quotations (1452, 77%) were, indeed, fewer than 100 words in length.

Chart 2 – Quotation Size Distribution

It is not clear why so many messages should be in the 50–350 word range, nor why the most common size for quotations should be just under 60 words. Although some commercial Internet providers may limit the size of incoming messages, as far as most users are concerned, message length is not a factor they have to worry about, and the corpus does indeed contain several messages over 1,000 lines and 5,000 words in length.

Leaving aside the possibility that ±200 words represents an optimum length for written communication, a slight correlation does exist between this figure and computer screen size. The most common size of terminal is 24 rows of 80 characters, though most text editing programs will automatically wrap each line at closer to 70 or 72 characters. Assuming use of block style formatting,[6] with blank lines in lieu of indentation, and assuming a line length of 70 characters, we can derive a rough figure of 20 lines of approximately 14 words per screen. Allowing further for shortened lines (ends of paragraphs, indented quotations, attributions, etc.), a typical one-screen display would contain about 170–240 words of text, which is close to the average size of most messages in the corpus.
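The arithmetic behind this estimate can be set out explicitly. The Python sketch below assumes an average of five characters per word, including the following space. This figure is not stated above, but it is implied by the estimate of 14 words to a 70-character line. The fill ratios are derived here for illustration only.

```python
line_length = 70        # characters per wrapped line (from the text)
chars_per_word = 5      # assumption: average word plus one space
usable_lines = 20       # text lines per screen after blank lines (from the text)

words_per_line = line_length // chars_per_word   # 14
full_screen = usable_lines * words_per_line      # 280 words if every line were full

# The 170-240 word range corresponds to these proportions of a completely full screen.
fill_low = 170 / full_screen
fill_high = 240 / full_screen
```

On these assumptions, the 170–240 word estimate corresponds to a screen filled to roughly 60–85% of its theoretical capacity of 280 words.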

It is difficult to assess the significance of this apparent correlation. Perhaps there exists a measure of psychological resistance to going beyond a single screenful of text (in the same way that many letter writers prefer cramming in extra lines at the bottom of a sheet of letter paper to crossing over to a new page). It could also be that the technical skill needed to scroll within a multi-screen message is a constraining factor. Although many graphic user interfaces make such scrolling, and the cutting and pasting of text between windows, relatively easy, it can be a daunting procedure for those using standard Unix text editors such as vi or emacs.

An indication that screen display size and the difficulty of manoeuvring through longer messages are significant factors in message formatting is provided by the existence of orphan quotations. These are irrelevant quotations included at the end of messages, presumably by mistake. Fifty orphan quotations were found in the corpus, ranging in size from 1 to 221 words, the average being 41 words.

An orphan quotation is generated when a user forgets to delete part of a quotation inserted automatically into a message. As described earlier, when a user decides to compose a reply to a newsgroup message, he is given the option of including that message in his own as a formatted quotation. The entire message is inserted, including any signature that might have been appended. It is considered good manners, or good 'netiquette', to then delete any parts of the quotation that do not bear directly upon the response. Leaving quotations untouched wastes valuable data transmission time (or 'bandwidth'), not to mention data storage space, and makes it difficult for readers to follow discussion threads. For these reasons, a user responding to a long message will customarily first scroll back to the beginning of the inserted quotation, and then systematically delete irrelevant sections, inserting text of his own where appropriate, resulting in a multi-layer sandwich of quotation and response. The following example from the corpus may serve to illustrate this procedure:

[John A. Stanley wrote (quoting ??):]
JS>>Why not keep this on sci.skeptic as well? This was originally from there,
JS>>as I recall, and I cross-posted it over to here...

JS>The problem here is that Judy is accessing Internet via a BBS using
JS>an offline mail reader (BTW, Judy, dump the test drive version and
JS>get the registered version; it's lightyears better.)

I have the registered version, but it has more bells and whistles
than I need or want.

JS>She has no way to crosspost unless her BBS picks up
JS>sci.skeptic and she manually posts there as well.

It's not at all difficult for me to crosspost "manually" (it's an
automatic feature of OLX), and my BBS just started carrying
sci.skeptic.

JS>My first exposure to Internet was through a BBS, but I was harshly
JS>criticised for the non-conforming format. That's one reason I
JS>switched to win.net.

Unfortunately, this is the only access available to me at the
moment.  If it's problematic for other users, I'll have to drop
out.  (I'm not sure why it should merit "harsh criticism,"
though.  Complaints, maybe.  But flames??)

As the user moves through the included message, deleting parts of the quotation and inserting his own text, more and more of the original message will scroll into view. When the user comes to the end of the inclusion, and has made the final deletion or insertion, he will then invoke the relevant newsreader command to send the message out to the Internet. If, by chance, any included text remains off-screen when the send command is given, this text will also be sent as part of the message. An orphan quotation is, therefore, the result of the user forgetting that a part of the original message remains off-screen, or, more often, assuming that no more of the original quotation remains to be dealt with.

All of the orphan quotations found in the corpus consisted of the end segment of the original message, usually the signature or part of the signature, though occasionally with part of the preceding text.

An example of a typical orphan quotation follows:

Are you intentionally trying to be an idiot? If you think that "true
scientists" must re-verify every single thing which had been previously
discovered, then there can be no such thing as scientific advancement.
We would all be repeating Faraday's experiments instead of enjoying
the benefits of electronic communication.

>--
>                                       David Gudeman
>gudeman@cs.arizona.ed

--
_____________________________________________________________________________
Mark Rupright           | "Contrariwise, if it was so, it might be; and if it
UNC Physics             | were so, it would be; but as it isn't, it ain't.
rupright@physics.unc.edu| That's logic."                        Lewis Carroll

At first glance, it would appear that 'David Gudeman' is the author of the message, but the 'greater-than' quotation symbols at the left indicate that this signature is, in fact, an inclusion, the actual author being the 'Mark Rupright' listed in the signature immediately below.

In the absence of a concluding signature, as in the example below, an orphan quotation can easily lead to a misattribution of author.

>There are MANY things still to be revealed by science. Know this, they
>existed long beforehand and will exist long after science has come and
>gone on this Rock (tm).

What existed before what?  What the hell(tm) are you talking about?

Nobody said science has all the answers. Try addressing some real
issues.

>john markey
>jmarkey@freenet.columbus.oh.us

The author of this message was not 'John Markey', and any reader noticing the quotation symbols and wishing to identify the real author would have to scroll back to the header of the message.
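Candidates for orphan quotations of this kind could be flagged automatically by looking for a run of quoted lines at the very end of a message body. The Python sketch below is a hypothetical illustration, not the method used in the study. It assumes the '>' marker and a body from which the poster's own signature has already been removed. Whether a flagged block is truly irrelevant remains a matter for human judgment.

```python
def trailing_quote(body: str, marker: str = ">") -> list[str]:
    """Return the quoted lines that close a message body, if any."""
    lines = [line for line in body.splitlines() if line.strip()]
    orphan = []
    for line in reversed(lines):
        if line.startswith(marker):
            orphan.append(line)
        else:
            break  # reached the poster's own text
    return orphan[::-1]
```

Applied to the second example above, the function would return the two quoted signature lines; a message ending with the poster's own text yields an empty list.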

The database figures for quotation editing also seem to confirm that the technical difficulties of using on- or off-line text editors influence the formatting of network quotations. Two examples of editing were found: deletion and emphasis. Deletion here refers to deletions within quotation blocks, as opposed to deletion of the blocks themselves. Such deletion usually calls for some indication of the extent of the deletion, and the most common method found in the corpus was the insertion of a short comment or summary offset by brackets and set on a separate line, as in:

MI>: The spiritualistic, psychic, or occult side of TM was
MI>: covered up in the great scientification of TM in the 70s.
(deletia)
MI>: this was inside knowledge for a new initiate of the 70's.
MI>: My attention definitely began to focus on the spiritualistic or
MI>: occult, in addition to relaxation or enlightenment.

In two corpus messages the short comment or summary was replaced by traditional ellipsis marks, but no example was found of ellipsis marks used within the body of text. In other words, all deletion was performed on the level of whole lines of text, often resulting in truncated sentences, as in the example above.

Whereas the deletion and insertion of text in units of lines is a relatively simple matter (some text editors are optimized for handling text in this manner), deleting parts of lines and inserting ellipses calls for more familiarity with text editor functions. The main problem is that deletion of line sections necessitates a re-formatting of the text, and if the text has already been automatically formatted as a quotation, this formatting will be destroyed.

For example, if the relevant part of a message has been included as a quotation formatted thus:

=>Both doctrines contain inherent internal logical contradictions of the
=>immovable object vs. the irresistable force type. Both continue to be
=>widely believed be certain religions. The concepts themselves are
=>invalid, and it is almost trivially easy to derive logical
=>impossibilities from them.

and if a user attempts to replace the second sentence with ellipses, the quotation may end up looking like this:

=>Both doctrines contain inherent internal logical contradictions of the
=>immovable object vs. the irresistable force type.... The concepts themselves are
=>invalid, and it is almost trivially easy to derive logical
=>impossibilities from them.

The solution to this is to first strip the quotation indicators ('=>' in this case), perform the deletion of text and insertion of ellipses, and then reformat the resulting text as a quotation. Whilst not technically impossible, this procedure is clearly daunting enough to put off most users, as illustrated by the absence from the corpus of any quotations thus formatted.
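The three steps just described (strip, edit, reformat) can be sketched in Python using the standard textwrap module. The function name requote, the 72-character width, and the assumption that every quoted line carries the same marker are illustrative choices and do not describe any particular newsreader.

```python
import textwrap

def requote(quoted: str, old: str, new: str, marker: str = "=>", width: int = 72) -> str:
    """Strip quotation markers, replace `old` with `new`, then re-wrap and re-mark."""
    # Step 1: strip the quotation indicator from each line and rejoin the text.
    lines = [l[len(marker):] if l.startswith(marker) else l for l in quoted.splitlines()]
    text = " ".join(s.strip() for s in lines)
    # Step 2: perform the deletion, inserting ellipses (or other text) in its place.
    text = text.replace(old, new)
    # Step 3: re-wrap so that marker plus text fits the width, and restore the markers.
    return textwrap.fill(text, width=width, initial_indent=marker, subsequent_indent=marker)
```

Called with the second sentence of the quotation above as old (including its leading space) and '...' as new, the function returns an evenly wrapped block in which every line again begins with '=>'.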

The insertion of emphasis marks does not present the same kind of technical challenge, but the extremely low incidence of such formatting indicates that it, too, is not favored by most users.

The traditional method of adding emphasis is the use of italics, underlining, or bold characters. None of these options are available to most network users, limited as they are to the standard set of ASCII characters.[7] These traditional options have been replaced by the use of uppercase letters or the addition of bracketing symbols.[8] The use of the latter will, of course, result in the same destruction of quotation formatting described above. With monospaced display fonts standard on most networks, the replacement of lowercase characters with uppercase equivalents presents no such problem, but does involve the difficulty of manoeuvring around text in single-character steps, and then adds the obligation of acknowledging the addition of emphasis.

This disinclination to tamper with previously formatted quotations, together with a preference for handling text in units of lines, results in the method of emphasis found universally in the corpus: the insertion of an extra line containing circumflex marks (in one instance 'equals' marks) positioned under the relevant text. It would appear, however, that even the relatively simple procedure of positioning the string of circumflex marks by the insertion of repeated spaces was sufficient to put off many users, and hence result in the extremely low incidence of such emphasis.
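The positioning that users apparently found tedious is trivial to automate. The following Python sketch generates the extra line of emphasis marks for a given phrase. It assumes a monospaced display and that the phrase occurs in the quoted line. The function name underline is hypothetical.

```python
def underline(line: str, phrase: str, mark: str = "^") -> str:
    """Return the line followed by a second line of marks under the phrase."""
    start = line.find(phrase)
    if start < 0:
        raise ValueError("phrase not found in line")
    # Pad with spaces up to the phrase, then repeat the mark under each character.
    return line + "\n" + " " * start + mark * len(phrase)
```

Passing mark="=" produces the variant using 'equals' marks.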

Two other aspects of quotation formatting remain to be addressed: attribution and quotation size.

Most quotation attributions are generated automatically when the user selects the option of including the original message. The newsreader will refer to a template stored in the user's personal directory,[9] and replace the variables with the relevant data from the header of the original message.

This template is created along with the user's personal directory when the user first receives a network ID. The form it takes is largely at the whim of the network manager. The default attribution template on the Waseda Unix system is, for example:

'In article %i,\n   %f wrote:\n'

Consequently an inclusion of a message with the header

Path: wsdnws!buda
From: buda@cfi.waseda.ac.jp (Jud Buda)
Newsgroups: alt.meditation.transcendental,sci.skeptic,sci.med
Subject: Re: Neurotic Fundamentalists
Date: 25 Aug 1994 10:44:25 GMT
Organization: Centre for Informatics, Waseda University
Lines: 38
Message-ID: <33hsm9$ohr@news.cfi.waseda.ac.jp>
References: <lisaz.3.00094A6F@spacecom.com> <33c4kk$5rr@news.cfi.waseda.ac.jp>
>
NNTP-Posting-Host: buda@fuji.cfi.waseda.ac.jp
Xref: wsdnws alt.meditation.transcendental:5602 sci.skeptic:71525 sci.med:68564

will be prefaced with the following attribution:

In article <33hsm9$ohr@news.cfi.waseda.ac.jp>, buda@cfi.waseda.ac.jp (Jud Buda) wrote:
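The substitution performed by the newsreader can be imitated with a short Python sketch. It handles only the two variables that appear in the template above, taking %i to stand for the Message-ID field and %f for the From field, as the example suggests. The function names are hypothetical, and the header parsing is deliberately simplistic.

```python
import re

def parse_header(header: str) -> dict:
    """Collect 'Name: value' lines of a message header into a dictionary."""
    fields = {}
    for line in header.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            fields[name] = value
    return fields

def expand_attribution(template: str, header: str) -> str:
    """Replace %i and %f in the template with the Message-ID and From fields."""
    fields = parse_header(header)
    values = {"%i": fields["Message-ID"], "%f": fields["From"]}
    return re.sub(r"%[if]", lambda m: values[m.group(0)], template)
```

In this sketch the '\n' sequences of the template are treated as line breaks, so the expanded attribution is split after the comma.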

The corpus reveals that this form of attribution (by Message ID and Author) is by far the most common, presumably because of its default nature, and users are either reluctant to edit the template, or unaware of its existence.[10] Two hundred and twenty-three messages in the corpus did, however, contain attributions by Author alone.

There is no way to ascertain the reason for this form of attribution—whether it happens to be the default on the user's home system, or whether the user has seen fit to edit the relevant template. In either case, the absence of a Message ID attribution is not significant. As reference to the sample header above will show, message headers list previous messages in the thread under 'References:', and a user wishing to read the original message would be able to locate it in this way (theoretically—most systems store messages in the order in which they reach the system, and assign consecutive numbers to each; locating a previous message by its ID would entail a search of a text archive).

Of the 53 corpus messages containing unattributed quotations, 12 made use of quotations with internal attributions. That is to say, the author chose to respond to a message as quoted in another message. As to why they should do this, two possibilities come to mind: they saw no point in going back to the original message simply for the sake of a primary quotation, or the primary quotation was not available on their system. The enormous volume of messages circulating on the Internet forces most hosts to limit the number of old messages they keep stored. If the traffic on a particular newsgroup is extremely large, or if a user has not accessed the newsgroup for some time, it could be that he will find many unread messages already deleted the next time he accesses it. There is another possible reason for inability to locate the original message in a thread. Because of the intricacies of Internet message distribution, it is not uncommon for messages to arrive out of step,[11] or not arrive at all. It could be that at the time a user reads a quotation in a message, the message containing the source of that quotation has still not arrived on his server. One unattributed message in the corpus does, in fact, excuse itself by noting that the author was unable to find the original.

Six unattributed messages contain indications that they were parts of longer messages. The author clearly felt obliged to keep the length of each message to less than 100 lines. This was probably a constraint imposed by her newsreader software or her Internet provider, and in other messages (not included in the corpus) she did complain about erratic addressing of messages. Assuming that the lack of attribution within this particular string of messages was unavoidable, and removing them from the original list of 53, we are left with only 14 messages containing no form of attribution whatever. Of these, one was a repost of an announcement. We can conclude, therefore, that only 13 out of 877 messages (1.5%) contained unattributed quotations.

Various analyses were performed in an attempt to find possible correlations between message size, quotation size, and number of quotations per message. No significant correlations were found, although a slight anomaly in quotation size was noted. The average size of first quotations was 102 words, and that of all subsequent quotations was 56 words. The average size of first quotations in messages containing only one quotation was 105 words. It is not clear why the first quotation should be, on average, twice as large as any of the others. If the figure for quotations in messages containing only one quotation were significantly larger than that for first quotations in messages containing several, it could be posited that single-quotation messages were more likely to contain entire, undeleted inclusions. This, however, was not the case, and one is left with the assumption that the greater size of first quotations reflects the perceived importance of these quotations in laying the groundwork (if one may call it that) for the response.

The results of the present study do not, in themselves, provide any clue as to why the incidence of quotations should be so much higher in Internet messages than in messages posted on commercial information services. A comparison of the two kinds of networks does, however, yield some indications. On commercial information services such as CompuServe, America OnLine and GEnie, incoming messages are processed in real time, and are available to other users almost instantaneously. This is in sharp contrast to the pattern of message traffic on the Internet, where messages can take anything from several seconds to several days to reach a specific site. As was noted earlier, it is not uncommon for messages to arrive out of step, or for replies to arrive before the original postings. The centralized message handling of information services makes this impossible. It could be argued, therefore, that the higher incidence of quotations in Internet messages reflects a greater necessity for them.

Conclusions

The results of the present study provide a basis for a tentative definition of a typical Internet message. We can, for example, say that a representative Internet message would be approximately 175 words in length, and would contain a single quotation of approximately 70 words.[12] The quotation would be attributed according to Message ID and Author, and would be formatted by use of 'greater-than' symbols along the left margin. A search of the corpus for a message fitting these tentative parameters[13] produced only one hit, and the relevant message is given below in its entirety.

From: lesikar@TIGGER.STCLOUD.MSUS.EDU (arnold v. lesikar)
Subject: Re: Religion and Science
Message-ID: <1994Jun6.002053.696@lamont.ldgo.columbia.edu>
Sender: news@lamont.ldgo.columbia.edu
Organization: ST. CLOUD STATE UNIVERSITY, ST. CLOUD, MN
Date: Mon, 6 Jun 1994 00:20:53 GMT

In article <2stk18$opv@gap.cco.caltech.edu>, carl@SOL1.GPS.CALTECH.EDU (Carl J :

>=>OMNISCIENCE
>=>OMNIPOTENCE
>=>
>=Except that if you have ever argued these points with sophisticated
>=religious believers, you will find that most, at least most Christian
>=believers, do not hold with _absolute_ omnipotence.
>
>The demand was for an example of a religious belief that people continued to
>believe after it had been logically proven false. The fact that some
>religious people don't believe it doesn't address the fact that some do.
Since we are into splitting hairs, the burden is on you to demonstrate
the existence of religious people who subscribe to the doctrine of
absolute omnipotence in the face of a proof of its incoherence. The
existence of uninformed believers who do not fully comprehend the
concept will not do. What you must demonstrate to meet Gudeman's
challenge is the existence of religious believers who hold to the
absolute omnipotence of God fully realizing the contradictory nature
of the concept.

I am eagerly awaiting your citations and evidence.

most sincerely,
arn
lesikar@tigger.stcloud.msus.edu
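
The search for such a message can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the counting rules (a line is treated as quoted if it begins with a 'greater-than' symbol, and words are counted as whitespace-separated strings) are assumptions rather than the procedure actually used in the study. The size limits are those given in note 13.

```python
# Hypothetical helpers; the counting rules are assumptions, not the
# procedure used in the study.

QUOTE_MARK = ">"


def split_body(body):
    """Separate quoted lines (those beginning with '>') from the rest."""
    quoted, own = [], []
    for line in body.splitlines():
        if line.lstrip().startswith(QUOTE_MARK):
            quoted.append(line)
        else:
            own.append(line)
    return quoted, own


def count_words(lines):
    """Count whitespace-separated words, ignoring leading '>' and '=' marks."""
    return sum(len(line.lstrip(">= ").split()) for line in lines)


def fits_typical(message_words, quotation_words):
    """Apply the size criteria of note 13 (170-180 and 65-75 words)."""
    return 170 <= message_words <= 180 and 65 <= quotation_words <= 75


# A three-line fragment modelled on the sample message above.
sample = (
    ">The demand was for an example\n"
    ">of a religious belief\n"
    "Since we are into splitting hairs"
)
quoted, own = split_body(sample)
print(count_words(quoted), count_words(own))
```

Whether the message size should include or exclude the quoted words is not settled by the sketch; fits_typical simply takes the two counts as supplied.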

The results of the study also clearly indicate the effects of constraints imposed upon message composition and quotation formatting by the system itself. A number of formatting characteristics would seem to be less the result of choice or preference, and more the consequence of difficulties experienced by users of the network interface.

Notes

1 Buda, J. K. "Electronic Network Communication". Otsuma Women's University Annual Report: Humanities and Social Sciences XXIII (1991).

2 Buda, J. K. "The Formatting of Network Messages". Waseda University School of Commerce Cultural Review 4 (1993).

3 This gives a total of 867+7 = 874. The three remaining messages contained only an orphan quote.

4 In the case of rn, writing and posting of messages is handled by a separate program called Pnews, but the interface is, to all intents and purposes, seamless.

5 This figure includes 50 orphan quotations.

6 For a definition and example of block formatting, see "The Formatting of Network Messages", p. 76.

7 It should be noted, however, that an increasing number of bulletin boards and information services are now providing sophisticated Graphical User Interfaces (GUIs) which support the use of multiple font sizes, colors, italics, and boldface characters in user messages.

8 For examples, see "The Formatting of Network Messages", pp. 87–9.

9 The user directory is a segment of host storage space set aside for the sole use of the user to whom it is assigned.

10 On most systems the files containing these templates and other variables are invisible to the casual user. Many messages did, however, show signs of editing, with prosaic expressions such as 'wrote' and 'said' being replaced by more fanciful expressions such as 'spake', 'saith', 'sez' or 'quoth'. Two examples follow:

Quoth gudeman@cs.arizona.edu (David Gudeman) (in <GUDEMAN.94Jun5195813@baskervi:

x740902@tiuk.ti.com,Internet, Spake Thus into the Net

11 For a detailed description of this phenomenon, see "Electronic Network Communication", pp. 78–80.

12 Both of these values are derived from the respective medians of the relevant categories.

13 The search criteria for size were Messages: 170–180 words, Quotations: 65–75 words.

Source: Waseda Review No. 26, 1994