Subject: Re: Publishing Scholarly Work on the Web -- opinion anyone?
From: dking@amphissa.com (David N. King)
Date: Sat, 21 Sep 96 07:16:01 GMT
In article <51kk84$a0c@news.esrin.esa.it>, Nick Kew wrote:
>My original suggestion is to hold *abstracts* online, with the provision
>to hold full papers where appropriate. Keeping abstracts in an easily-
>searchable website would surely be a valuable service to researchers,
>while referring them to the traditional publishing media for full papers.
This is a terrific idea. It was first implemented three decades ago with the
MEDLINE system. There are currently several thousand bibliographic databases
that provide citations and abstracts "pointing" to the printed publications.
A few hundred are widely available through "vendors" like Dialog. Some of
them are already migrating to the web. MEDLINE is available on the web through
several sites, the best public access being via the National Library of
Medicine's GratefulMed web-based system. Others are getting there.
>
>My software will index and cross-reference the abstracts,
There are already many systems that do this, but the fact is, 20+ years of
R&D has not yet resulted in a machine indexing system that is satisfying.
Mechanically, you can do it pretty easily; in practical terms, it produces
marginal intellectual access to conceptual content. But maybe your
parsing, weighting, and automated Boolean algorithms are better than anyone
else has conceived yet, and I'd really like to see it, if it is. Have you
published it? If you have, I'm sure you are aware of the large research
literature on the problem of machine indexing of scholarly/technical
literature. If all you are planning to do is parse words from abstracts into
a database searchable with a typical web search engine query mechanism,
thanks but I'll pass.
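Just to show how little that buys you: the "parse words from abstracts into a searchable database" approach amounts to a naive inverted index with Boolean queries, which can be sketched in a few lines (Python, purely illustrative; the abstracts and stop-word list are invented):

```python
import re
from collections import defaultdict

# Toy abstracts -- invented for illustration only.
abstracts = {
    1: "Machine indexing of scholarly literature using weighted terms.",
    2: "A Boolean retrieval system for bibliographic databases.",
    3: "Indexing and retrieval of biomedical literature abstracts.",
}

STOP_WORDS = {"a", "of", "for", "and", "the", "using"}

def build_index(docs):
    """Parse words from each abstract into an inverted index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Naive Boolean AND: documents containing every query term."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

index = build_index(abstracts)
print(sorted(boolean_and(index, "indexing", "literature")))  # [1, 3]
```

This mechanically "works," which is exactly the point above: word matching is easy, but it knows nothing about concepts, synonyms, or relevance, so the intellectual access it provides is marginal.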
Of course, that all assumes you have legal right to use the abstracts to
create a publicly accessible, searchable database and serve up the
abstracts. Have you discussed this idea with publishers and agreed upon an
acceptable framework for putting their copyrighted material up on your web
site? Or were you planning to simply download the abstracts from existing
databases, capitalizing on the work of those who create and maintain those
databases? Have you negotiated the legal aspects of that? Or were you
planning to write and keyboard your own abstracts? That's an option with
fewer legal hurdles, but it sounds like a lot of work.
>and has the option
>to hold any or all of the full papers online according to publisher choice.
Ah, now we are getting to the present. You are interested in creating a
digital library! Comparable to a traditional library, only in electronic
form. Tools for bibliographic control and access (electronic indexes with
abstracts) to a collection of literature in electronic form, all accessible
from one electronic "location." Great idea! There is a substantial
literature on this which I'm sure you are familiar with. ACM devoted a
special issue to it last year. There is an electronic journal on the subject
and of course there is a wealth of literature in traditional paper format.
You can find a bit on the web too. Digital libraries. Great idea!
There are some notable R&D projects under way. National Science Foundation
has funded, I think, 9 major R&D projects to the tune of $25 million at
major institutions: U of Michigan, Berkeley, Illinois, Stanford, etc. Those
projects are getting under way. But a couple of projects got an
earlier start. Perhaps the most impressive to date is the Red Sage project
at UCSF which is now in its 3rd year. A collaboration between the UCSF
Library & Center for Knowledge Management, AT&T Bell Labs, and 20 publishers
of the biomedical literature. It is pretty small-scale: 70 medical and
biomedical research journals, including the major titles in clinical
medicine -- bitmapped images of every printed content page including
graphics, tables, photos, etc. The electronic journal collection is linked
to the MEDLINE database with a top-notch forms-based web search interface
called Medsage. Every UCSF doctor, nurse, researcher, student, etc, with a
network link or web access has access to the electronic library from their
office desktop. Pretty slick! Yes, it is fully operational. (Access is
restricted to UCSF of course. If you are interested, you can find out more
at http://www.library.ucsf.edu)
Make a wild guesstimate of the size of the database. 70 journals, maybe 1000
pages per year in each, abstracts and citations, one per article. 3 years in
the collection. That's, let's see, only 210,000 pages of articles. Not all
that small when you think about it, but manageable. But of course, there are
3500 journals in medicine alone. There are around 6 million records in the
MEDLINE database, most with abstracts. Consider the kind of system required
to manage and serve that up. How about if we just limit the system to the
top 500 journals? Maybe 50,000 articles per year. That's only 50,000
abstracts. Then throw in all the journal pages for those articles. Better
limit the collection to just the last couple of years, I guess. That's,
let's see, maybe around 1,000,000 pages of content, plus 100,000 abstracts
plus a database for searching. But to be a major digital library (a Harvard
or Illinois or Berkeley), expand that to include all of the quality journals
in all areas published; a minimum collection would be 50,000 titles out of
the 200,000+ published worldwide. And they can't limit it to the last year
or two; they have to meet the research and academic needs of their
university. I can't add that high.
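For anyone who wants to check my guesstimates, the back-of-envelope arithmetic above is easy to reproduce (all figures are the rough guesses from this post, not real data):

```python
# Back-of-envelope arithmetic for the collection sizes discussed above.
# All figures are rough guesses from the discussion, not real data.

journals = 70                      # Red Sage journal count
pages_per_journal_per_year = 1000  # rough guess
years = 3                          # years in the collection

red_sage_pages = journals * pages_per_journal_per_year * years
print(red_sage_pages)  # 210000 pages of articles

# Hypothetical "top 500 journals, last two years" collection:
top_journals = 500
articles_per_year = 50_000

scaled_pages = top_journals * pages_per_journal_per_year * 2
scaled_abstracts = articles_per_year * 2
print(scaled_pages)      # 1000000 pages of content
print(scaled_abstracts)  # 100000 abstracts
```

And that is still before expanding to the 50,000-title collection a major research library would need, which is where the numbers run away from you.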
Consider the mess of irrelevant junk you get trying to search using current
web-based search engines, and that the web at present has relatively little
meaningful content. Multiply that by millions of content-rich pages
annually. This is not something one just does overnight and serves up on
a little Indy. One needs equipment and technical staff to deal with the
technology (easy to come by if you can afford it) and needs people
knowledgeable about conceptual design and construction of complex
knowledge-based systems (harder to come by) and needs economic models and
evolutionary development strategies (virtually non-existent).
But the current, more serious obstacles are economic and legal. You might
want to consider those aspects in developing your system. Do you have any
publishers signed up yet to participate in your project? Have you figured
out how you will pay them for the right to provide access to their
copyrighted publications? And how to cover the costs you incur from them?
There are very thorny problems involved in this, and the publishers don't
really know what economic models to work with, what the "marketplace" of
electronic publishing looks like, or how to price their electronic product
yet. But you can bet for sure that they are not going to give away their
product or sit by and watch others distribute it without reimbursing them.
The long tradition of libraries providing free access to the literature
disguises the truth: information is not free, it is very expensive.
>
>As others have pointed out, the peer-review process is an important element
>of academic publishing. I believe web-based collaboration software can
>be used to facilitate this process, providing a forum ("workgroup") whose
>members are a paper's authors together with recognised referees in a
>subject area. Such papers may have readonly access to the general public
>(or subscribers-only if a publisher prefers) while in the review process,
>thus accelerating the publication cycle.
This idea has been floated by a few people. To date, there has not been a
mad rush by authors to abandon the established scholarly publishing
channels. The realm of print publishing is too closely intertwined with
academic and professional recognition, grants and funding, careers and
livelihood. If you give a researcher the choice of publishing in a major
print journal like "Science" or an IEEE journal, or just tossing their paper
(their ideas and work -- their intellectual property) out there on the web
for others to "contribute to" using collaboration software, I don't think
you'd have a hard time guessing which they would choose. This is a nifty idea
conceptually and an attractive one technologically. It will be interesting
to see if it ever catches on. I'd say that chances are very slim in the
short run, but may be marginally better down the road in a very few
specialized areas like law and engineering.
>
>The technology is ready: we need only apply it!
I'd say current technology is not yet ready on the scale that is needed,
although it is getting there. The current crop of typical web search
engines and indexing systems are inadequate for current web content and
completely worthless for anything more substantive. But the web is a very
solid foundation for growth and improvement, and there will be real progress
made over the next 5 years.
I think it likely, in the short term, that we'll see print publications
migrating to the web via digital libraries -- first, university libraries
subscribing to electronic versions of print journals with access limited to
their campus (this is already happening per the Red Sage example), then,
professional societies providing access to the journals they publish to
their members free and to non-members for a fee (this is beginning now too;
IEEE journals are going up now for example), and a few publishers testing
marketing models for publishing on the web (Journal of Biological Chemistry
and a few others are doing that now). Then we'll see commercial sites run by
"vendors" of the literature with professional indexing/abstracting linked to
electronic collections (still a year or two away).
Of course, all of the above is just my personal opinion, and I'd be just as
glad to be wrong about any of my predictions. :-)
David N. King
Subject: STOP REPOSTING SPAM was Re: MAKE MONEY NOW
From: szdefons@boris.ucdavis.edu (Eric DeFonso)
Date: 23 Sep 1996 16:41:26 GMT
In article <3496@dstrip.demon.co.uk>,
Steve Rencontre wrote:
>STOP POSTING THIS CRAP.
[entire repost deleted for brevity]
I have an idea - why don't YOU stop RE-POSTING this crap? Sheesh, you even
reposted the full header - what an incredible waste of bandwidth and disk
space.
People, when this stuff gets posted to your favorite newsgroup, please,
just read the header (the last entry in the "Path" is often a good
indication) and determine exactly where the post originated from. Then
send a polite message to the postmaster at that site, telling him or her
that someone is abusing their account privileges, or at least that the
offending spam appears to have originated from their site, and that they
should be aware of it. You'd be amazed how well that works.
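The header-reading step above can be sketched roughly like this (Python, obviously anachronistic here; the hostnames are invented, and real Path: lines often end in a user name or "not-for-mail" entry, so some human judgment is still needed):

```python
# Minimal sketch of reading the likely origin off a Usenet Path: header.
# The header value and hostnames below are invented for illustration.

def origin_from_path(path_header):
    """Each relay prepends its name to Path:, so the rightmost
    entries are the ones closest to the originating site."""
    hops = path_header.strip().split("!")
    return hops[-1]

path = "news.example.edu!relay.example.net!spamhost.example.com"
site = origin_from_path(path)
print(site)                  # spamhost.example.com
print("postmaster@" + site)  # where to send the polite complaint
```

The point is simply that the origin is usually recoverable from the header itself, no matter what the From: line claims.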
Simply posting angry responses, which themselves are often
crossposted to several groups, does absolutely nothing. And more often
than not, threatening to mail-bomb the *apparent* sender of the message
simply wastes even more bandwidth, especially since spammers often forge
their addresses in the From: line of their messages.
Sorry for the crosspost, but it seems to me that the spam of the past few
weeks has hit many newsgroups hard, and it is
very important to know how to respond productively to these violations.
Now, back to your regularly scheduled discussions....
Eric D
UC Davis