
Overview of the Documents and the Site

Where did these documents come from?

The vast majority of the documents came from the tobacco industry, and are the same documents available on the industry websites (see www.tobaccoarchives.com).

The Bliley collections came from the U.S. House Commerce Committee.

What is indexing, and who does it?

Indexing is the process of associating additional information with a document, such as title, author, subject, and abstract. Some indexing is very complete, listing every person and organization named in a document. Other indexing is very fast, associating just one or two fields with a document.

Many of the documents have had basic indexing done by the tobacco industry, as part of the trial. However, this indexing lacked abstracts and subject categories. More complete indexing is being done now by research teams funded by the National Cancer Institute and other institutions.

Do you have a Daily Document Newsletter?

Searching the Collections

How do I search for more than one term at a time?
Like many other websites, TDO allows users to search by combining terms with "and" and "or". For example, to search for information about airline smoking bans, you can search for

           airline and (restriction or ban)
(Note: turning stemming on will return results for both "airline" and "airlines", "restriction" and "restrictions", "ban" and "bans", etc.)

By default, "and" is inserted between consecutive words. That is, if you search for

           airline smoking ban
you will get the same results as if you searched for
           airline and smoking and ban
To search for a phrase, use quotation marks ("). That is, a search for
           "airline smoking ban"
will find only documents in which those words appear next to each other and in order.
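
The default "and" rule can be illustrated with a short Python sketch. This is not TDO's actual query parser; the function names are hypothetical, and treating w/N as an operator is an assumption made so that proximity searches pass through unchanged.

```python
import re

# Words treated as operators rather than search terms (assumption).
OPERATORS = {"and", "or"}


def is_operator(token):
    """True for and/or and for proximity operators such as w/10."""
    return token.lower() in OPERATORS or re.fullmatch(r"w/\d+", token.lower()) is not None


def tokenize(query):
    """Split a query, keeping quoted phrases whole and parentheses separate."""
    return re.findall(r'"[^"]*"|\(|\)|[^\s()"]+', query)


def insert_default_and(query):
    """Insert an explicit 'and' between consecutive terms or phrases."""
    out = []
    for tok in tokenize(query):
        if out:
            prev = out[-1]
            prev_ends_operand = prev != "(" and not is_operator(prev)
            tok_starts_operand = tok != ")" and not is_operator(tok)
            if prev_ends_operand and tok_starts_operand:
                out.append("and")
        out.append(tok)
    return " ".join(out)


print(insert_default_and("airline smoking ban"))
print(insert_default_and('"airline smoking ban" policy'))
```

The first call prints the expanded form of the three-word query, and the second shows that a quoted phrase is kept together as a single term.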

To find words near each other, use the w/ command. For example

           airline w/10 ban
finds all documents where the word "airline" is within 10 words of "ban".
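
As a rough illustration of what a proximity search checks, here is a minimal Python sketch. It assumes that "within 10 words" means the word positions differ by at most 10 in either direction; the site's exact counting rule is not documented here, and the function name is hypothetical.

```python
import re


def within(text, first, second, distance):
    """Return True if `first` occurs within `distance` words of `second`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every pair of positions; order of the two words does not matter.
    return any(abs(i - j) <= distance
               for i in first_positions
               for j in second_positions)


sample = "The airline announced a ban on smoking"
print(within(sample, "airline", "ban", 10))
```
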
What are stemming, fuzziness, and synonyms?

Each of these broadens your search by taking your search terms and changing them so that you get more results.

Stemming tries variations of your search -- adding "s", "es", "ing", etc. Although it's not perfect, it generally finds most variations of English word forms.

Fuzzy search tolerates small spelling differences. The OCR text of a document image is often slightly incorrect, but close. By turning fuzziness on, you can find words that are close to, but not exactly the same as, your search term.

Synonyms automatically expand your search to include similar terms. For example, if you search for "youth" with synonym expansion on, it will find documents with "youth", "teen", "teenager", and "adolescent". Or if you search for "latino", it will also find "hispanic".
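
The three options can be sketched in Python to show how each one broadens a search term. This is only an illustration under stated assumptions, not the site's implementation: the suffix list, the similarity cutoff of 0.8, and the function names are all hypothetical, and the synonym table holds only the two examples given above.

```python
import difflib

# Suffixes tried by the naive stemming sketch (assumption).
SUFFIXES = ("s", "es", "ing")

# Synonym table limited to the examples in the text above.
SYNONYMS = {
    "youth": ["teen", "teenager", "adolescent"],
    "latino": ["hispanic"],
}


def stem_variants(term):
    """Return the term plus naive suffixed variations (not always real words)."""
    return [term] + [term + suffix for suffix in SUFFIXES]


def fuzzy_matches(term, vocabulary, cutoff=0.8):
    """Return vocabulary words that are close to the term, e.g. OCR misreads."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)


def expand_synonyms(term):
    """Return the term followed by any synonyms listed for it."""
    return [term] + SYNONYMS.get(term.lower(), [])


print(stem_variants("ban"))
print(fuzzy_matches("tobacco", ["tobacco", "tobaceo", "airline"]))
print(expand_synonyms("youth"))
```

Like the stemming option itself, the naive variant list is not perfect: it produces some forms that are not real words, which is harmless when the variants are only used to widen a search.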

What does checking "All Details" in the search form do?

It returns much more detail about the documents you've searched for. If you don't check it, the search returns only basic information -- the title, number of pages, author, recipient, type and characteristics, plus any field that matches your search term (e.g. the Named Organizations field, if any of your search terms show up there).

By default this is off, since these records can be quite long.

What does checking "Show First Page" in the search form do?

It shows the first page of each document alongside your search results. This can save a lot of time, because you don't need to click on the document to see if it's relevant (if you can tell that from the first page).

How can I restrict my search to words that are part of an author's name or a title? How can I search for a document from a particular date?

To search for words within a particular field, precede the word with the field name and a colon, like this:

    author:smith
    title:analysis
    date:19811001
Dates are represented as 8-digit numbers in the format YYYYMMDD. You can use wildcards if you don't have an exact date. For example, to search for documents from October 1981 or from all of 1981, you can use these criteria:
    date:198110*
    date:1981*
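
If you build queries in a script, the date criteria above can be generated with a small Python helper. This is a sketch only; the function name is hypothetical, and it simply formats the YYYYMMDD pattern described above, adding a wildcard when the month or day is left out.

```python
def date_query(year, month=None, day=None):
    """Build a date: criterion, using a trailing * for missing parts."""
    if day is not None and month is None:
        raise ValueError("a day requires a month")
    if month is None:
        return f"date:{year:04d}*"
    if day is None:
        return f"date:{year:04d}{month:02d}*"
    return f"date:{year:04d}{month:02d}{day:02d}"


print(date_query(1981, 10, 1))
print(date_query(1981, 10))
print(date_query(1981))
```
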
These field names are available, though some are rare or not very useful:
    abstract               fileset_code           privilege
    additive               file_code_begin        product_type
    affiliation            file_code_end          project
    alias                  file_number            prototype
    area                   full_text              publication
    attachment             grant_number           quotes
    attendee               hypothesis             rank
    author                 import_collection_code recipient
    bates_begin            import_document_code   referenced_document
    bates_end              indexed_date           region
    box_number             indexer_email          relevant_pages
    brand                  index_status           request
    case_code              intended               resource_code
    changes                issue                  restricted
    characteristic         job_title              results
    client                 keyword                role
    collection_code        language               side
    comment                lawyers_present        site
    company                litigation             smoke_constituent
    component              location               source
    components             location               strategy
    copied                 major_subject          subject
    court_reporter         marketing_type         synonyms
    date                   master_begin           target_market
    date_loaded            master_end             team
    date_produced          master_id              technology
    depository_date        message                testimony_date
    description            minor_subject          thesaurus_term
    document_code          named                  title
    document_file          notes                  tobacco_type
    ending_date            original_file          type
    exhibits               page_count             url
    expertise              page_range             witness
    fact_type              payment                witness_type

Abstracting and Indexing

How do I sign up an indexer?

Administration

How can I change the order of my fields for editing and display?

Go to the "Configure Fields" page under "Administration". You will see a list of the fields that are currently active. The number to the left is the order number (default: 50). Simply edit the fields, one at a time, and pick numbers for the "Edit Order" field that indicate the order.

Note: the order numbers shouldn't be sequential (1,2,3,4), because then it's too hard to insert one field between two others. Instead, pick numbers like 10,20,30, so that if you ever want to move a field between two others you have some room to do so.
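
The spacing advice can be illustrated with a short Python sketch. It is not part of the site; the function names are hypothetical, and the field names are taken from the field list in the search section.

```python
def spaced_order(fields, step=10):
    """Assign order numbers 10, 20, 30, ... to fields in their desired order."""
    return {name: (position + 1) * step for position, name in enumerate(fields)}


def order_between(order, before, after):
    """Pick an order number that places a field between two existing fields."""
    low, high = order[before], order[after]
    if high - low < 2:
        raise ValueError("no room left between these fields; renumber with gaps")
    return (low + high) // 2


order = spaced_order(["title", "author", "date"])
print(order)

# Move "recipient" between "title" and "author" without renumbering the rest.
order["recipient"] = order_between(order, "title", "author")
print(sorted(order, key=order.get))
```

With sequential numbers (1,2,3) the second function would raise an error, which is the situation the note above warns about.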

If I have deposition transcripts that seem to be missing on your site, can I e-mail them to you and have you add them to the DATTA collection?

We'll try. Send them to keith@tobacco.org, and please include where you got the transcript, the case name, etc.

What is the Daily Document Newsletter?