Annotated Bibliography

An annotated bibliography is a list of publications together with a brief explanation of the value and purpose of each item. A good bibliography should serve as a guidepost for further research. It should identify what sort of research topics have been covered in the past, indicate the relationship between related publications, and suggest ideas for further research. If you have no idea what you want to study, then begin by reading up on a very broad topic such as:
  • Distributed Filesystems
  • Virtual Machines
  • Memory Allocation
  • Disk Performance
  • Input/Output (e.g., Audio/Video)
  • Kernel Programming

A bibliography on any of the topics given above would be far too large! As you proceed with your research, look for themes that run through what you are reading. Begin to narrow your topic down to something more specific, such as:
  • Consistency Management in Distributed Filesystems
  • Performance Overhead of Virtual Machines
  • Ensuring Fairness in Memory Allocation
  • Modification of a Wi-Fi Kernel Module to Support Ad-Hoc Networks
  • Design and Implementation of a Real-Time CPU Scheduler
  • Analysis of the Impact of File Systems on Continuous Database Insertions

Ideally, your bibliography will encompass your course project, so you are encouraged to align it as closely as possible with your project topic. However, you are not committed to a particular project at this point.

Requirements

Your annotated bibliography must be a collection of references all reasonably related to your chosen topic. Each entry must be given a complete citation and must be accompanied by one solid paragraph summarizing the paper and its relevance to your topic area. The exact form of the citation is not crucial so long as you are consistent and complete. A citation should give the authors' names, the title of the article, the title of the book/journal/conference, and enough information so that someone else could find it in another library. This means that you should include the publisher, volume and number, page numbers, web address, and other details as appropriate and whenever available. Each entry should explicitly indicate the type of citation: conference article, journal article, and so forth. Your submission must be a PDF file. Writing your bibliography in LaTeX is not required at this point, but it is recommended.

The descriptive paragraph should give enough information to help you or another reader recall its relevance to the scientific community. Describe what the paper is trying to communicate. Is it proposing a new algorithm or architecture, comparing several existing systems, relating experience with an existing system, or something else entirely? What is the main idea expressed in the abstract? How does it relate to previous work? Does it build upon or discredit previous ideas? Either way, chase down several references within the paper, and add them to your bibliography if warranted.

In general, your final annotated bibliography should have:
  • At least 20 items total.
  • At least 15 refereed journal, conference, or workshop articles not on the course reading list.
  • Up to 3 other unrefereed items such as magazine articles, technical reports, and web pages.
  • Up to 3 books, book chapters, monographs, or dissertations.
  • At least 50% of the articles written in 2010 or later.

Note that if you plan to work in a team, no more than 3 papers can overlap between your bibliography and those of your team members!

Publication Types

The following list describes the most common types of publications; these are the ones you should concentrate on:
  • Technical Reports: A technical report is a very preliminary research report that is written internally and then archived at an institution. For example, Notre Dame has its own technical report series. Technical reports are generally written and deposited without being refereed, or sometimes even proofread. However, they are an important vehicle that allows researchers to publicly establish their activities or data without the delay of submitting to a conference or journal. Good technical reports are often revised and submitted to a conference or journal.
  • Conference Articles: A conference is usually a yearly gathering of researchers in the same area of specialization. A conference committee solicits papers for the conference perhaps six months in advance. Papers are refereed by the committee, and those with the best reviews are accepted to the conference. The authors attend the conference and give a short lecture on the paper. After the conference, a book is published, usually called "Proceedings of the Conference on XYZ," containing the accepted papers. The best papers in a conference are often invited to be published in a journal.
  • Journal Articles: Journals are typically published several times a year. Much like a conference, a journal has a primary editor and a committee of reviewers. Papers may be submitted to a journal at any time, and they are generally longer and more polished than those submitted to a conference. If a paper is accepted, the referees may require it to be revised before publication. The whole process from submission to publication may take several years. In computer science, a journal paper is considered slightly more valuable than a conference paper. (In other fields, a journal paper is much more valuable than a conference paper.)
  • Books and Book Chapters: Researchers often write books once they have gained a large amount of experience in a given field. Sometimes, each chapter of an academic book is written by a different author. Books are solely the work of the author(s) and are generally not peer reviewed. Thus, books can serve as an introduction to or overview of a given field, but are not likely to contain any hard research results.
  • Dissertations: A dissertation is the final result of a master's or doctoral degree. In some sense, it is peer reviewed, because it must pass muster with the student's dissertation committee. Dissertations usually collect everything a student has learned over the previous 2-7 years, and thus are long and quite detailed. A good dissertation should point you to shorter, more digestible papers by the same student.

Hints on Research

Tread lightly! You do not need to read each paper thoroughly. In fact, you do not have time to read all of the papers in your bibliography! Begin by reading the abstract. If it is not relevant to your topic, toss it out right away. If it is relevant, then read the introduction and conclusions and skim over the middle parts. Summarize the main points, save a copy or a printout, and move on. If there is a detailed algorithm or idea, jot it down and return to it later if you deem it to be important. Of course, you will have to return and read some of these papers carefully at a later time.
Start in a Known Place. Begin by skimming the papers on the class reading list related to your topic, and then follow the references that seem important. Another approach is to look for tutorials and reviews on a certain topic and look at the references in those documents. Similarly, you can also skim appropriate sections in a relevant textbook to find a good starting point.
Be wary. Journal and conference articles vary widely. The vast majority are mediocre, and only a small number are of great value. Distinguishing between the two may be difficult at first (that's OK!), but you will gain confidence with time. If you are unsure about the value of a paper, there is no harm in mentioning this in the bibliography.
Search Effectively. Google Scholar is an easy place to start searching for papers if you already know the right keywords to search for. For example, searching for "distributed file systems" in Google Scholar turns up a good selection of highly cited papers. However, you should not rely entirely on Google; you should also search the archives of the professional organizations related to computer science and engineering: ACM, IEEE, and USENIX. Here are their library pages:
  • ACM Digital Library
  • IEEE Xplore Digital Library
  • USENIX Conference Proceedings
Going a little deeper, try looking through the tables of contents of well-known conferences and journals. The following are well-known publications specifically about operating systems:
The following are more specialized and may be appropriate, depending on your choice of bibliography topic:
Now, suppose that you come across a reference to an article that is either quite old or otherwise not online. For example, the following paper appeared at the HPDC conference but is not available on the HPDC website:

J. B. Weissman, A. S. Grimshaw, "Network Partitioning of Data Parallel Programs", Proceedings of the Third IEEE Symposium on High Performance Distributed Computing.

Here is where regular Google comes in. Search for the entire title with quotes around it, "Network Partitioning of Data Parallel Programs", and you may find a copy placed online by the authors or other readers. Or you may find nothing. Some publishers charge a fee to download the full paper. Luckily, Notre Dame's library (like most college and university libraries) has agreements with the primary publishers that allow us to download papers without paying a fee; see the ND Library website. If you come across a paper that you need for your bibliography or project but cannot find or obtain a full copy, inform the TA or instructor and we will do our best to help you get access to it.

Example Bibliography


Transaction Support in File Systems
Iam A. Student

(Technical Report) Butler Lampson, Howard Sturgis, "Crash Recovery in a Distributed Data Storage System", Tech Report, Xerox Palo Alto Research Center, 1979.
A complete transaction system is built from the ground up in four layers of abstraction, emphasizing the compositional nature of software. Everything is proven simply by exhaustive case analysis. I didn't totally understand the difference between errors and disasters, so I'll have to go back and read that again. Although it's just a technical report, there are many references to it, so it must be a classic.

(Conference Article) Michael A. Olson, "The Design and Implementation of the Inversion File System", Proceedings of the USENIX Winter 1993 Technical Conference.
A simple idea is proposed: Build a filesystem on top of a database by using tables for metadata and directory structure. Not surprisingly, there is a significant performance hit: only 30-80 percent of NFS throughput. On the other hand, you get vastly increased flexibility, including the possibility of using the server itself for computing. It seems like there should be a more efficient way of getting transactions into files. This paper relies heavily on Margo Seltzer's work below.

(Journal Article) M. Stonebraker, et al., "Mariposa: A Wide-Area Distributed Database System", VLDB Journal 5(1), January 1996, pages 48-63.
This paper proposes that databases distributed over the WAN are fundamentally different from databases distributed over the LAN because of the independence of individual nodes and the expense of moving data over the wide area. Although this is nominally about databases, I think it will apply to filesystems as well, because the same distinction between LAN and WAN is necessary. There is a long section on bidding that will require some careful reading. Stonebraker appears in many database papers.

(Book) Jim Gray and Andreas Reuter, "Transaction Processing: Concepts and Techniques", Morgan Kaufmann, San Francisco, 1993.
This book is an algorithmic bible for building transaction-based systems. Starting with the basics of storage devices, it builds up algorithms for logging, transactions, recovery, and more. Two elements are surprising: first, there is a big section on fault tolerance and the underlying sources of failures; second, although the focus is on databases, there is an entire section on filesystems. Note that Jim Gray was a major figure in the System-R database and the Tandem NonStop system.

(Dissertation) Margo Seltzer, "File System Performance and Transaction Support", Ph.D. Dissertation, University of California at Berkeley, 1992.
This dissertation explores adding transactions to file systems in excruciating detail. The first few chapters focus on simulation of varying system structures and workloads. Once a structure is chosen, a transaction-based filesystem is built and evaluated. I'll have to return to this to see exactly what designs were considered or discarded.