Managing Disk Space in Linux

Over the past two years the SSCC allocated some 400GB of project disk space, compared to 75GB for the prior two-year period. We currently have enough space to meet the research needs of our members, but this space cost about $100,000. Reducing the growth rate of disk usage in order to make that space last as long as possible is a high priority for the SSCC.

Individual members can do their part by managing the files in their home and project directories. Most members have files that can simply be deleted, and others can be archived or compressed. This article will discuss Linux tools for managing your disk space. For managing Windows files, see Managing Disk Space in Windows.

Viewing Your Files

The first task is identifying what files you have and which ones are likely candidates for removal. You can use the good old ls -l command (aliased as ll for most people) but du will give you more options, especially when combined with other tools.

du -ha directory

where directory should be replaced by the name of the directory you want to examine, will give you a list of all files and subdirectories in that directory and their sizes. Sizes will be listed in bytes, kilobytes or megabytes as appropriate for easy reading by humans. Note that this includes all files in all subdirectories, so running this command on high level directories will probably give you more text than you can use.

To view just the biggest files, you can send these results to the sort program and then list only the top results using head. The disadvantage is that a numeric sort compares only the leading number and ignores unit suffixes such as K or M, so we'll have to tell du to list all the file sizes in kilobytes.

du -ka directory | sort -n -r | head -n 20

This will show the twenty biggest files and directories underneath the starting directory (you can choose how many to view by changing the number after head -n). These are the files you should focus on.
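When a directory tree is deep, one total per subdirectory can be easier to scan than a list of every file. The following is a minimal sketch of that variation, not a required procedure. It builds a small scratch directory so it can be tried safely; the names demo, big, and small are hypothetical and exist only for this example.

```shell
# Build a scratch directory with one large and one small subdirectory.
# (demo, big, and small are hypothetical names used only in this sketch.)
demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
dd if=/dev/urandom of="$demo/big/data.bin" bs=1024 count=200 2>/dev/null
dd if=/dev/urandom of="$demo/small/notes.bin" bs=1024 count=4 2>/dev/null

# -s prints a single total per argument and -k reports it in kilobytes,
# so sort -n -r can order the subdirectories from largest to smallest.
totals=$(du -sk "$demo"/* | sort -n -r)
echo "$totals"

# Keep only the largest entry.
largest=$(echo "$totals" | head -n 1)
echo "$largest"
```

In your own space you would replace "$demo"/* with the contents of your home or project directory, for example du -sk directory/* | sort -n -r | head -n 20.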

Options for Large Files

Once you've identified the large files in your home or project directory you can evaluate what to do with them. Here are some questions you might ask yourself:

  • Is this the only copy of this file?
  • Have I used this file recently?
  • Am I likely to need this file in the future?

If the answer to all three questions is "no" then the file can probably be deleted. Deleting files you no longer need is the best way to save disk space and is very easy to do. Simply type:

rm file

where file should be replaced by the name of the file you want to remove.
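To help answer the question of whether you have used a file recently, find can list files that have not been modified in a long time. The sketch below is one possible approach under stated assumptions: the 180-day cutoff is arbitrary, and the file names are hypothetical. It checks modification time rather than access time, because many systems do not record access times reliably.

```shell
# Scratch directory with one old and one new file (hypothetical names).
demo=$(mktemp -d)
echo "old results" > "$demo/old.log"
echo "new results" > "$demo/new.log"

# Back-date old.log so it appears to have been last modified on 1 January 2000.
touch -t 200001010000 "$demo/old.log"

# List regular files last modified more than 180 days ago.
# The 180-day cutoff is an arbitrary choice for this example.
stale=$(find "$demo" -type f -mtime +180)
echo "$stale"

# After reviewing the list, remove a file you are sure you no longer need.
rm "$demo/old.log"
```

Review the list before deleting anything. A file that has not changed in months may still be the only copy of something important.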

On the other hand, if you think you may need the file sometime in the future but don't need it right now, archiving it on a CD or DVD may be the best solution. You can request that SSCC staff archive files for you by visiting our archive request page. If you'd prefer to do it yourself, the SSCC makes DVD writers available in 2470 and 4218 Sewell Social Science Building. The student operator in 4218 will be happy to assist you in using them. For vital files we suggest you make two or even three copies and store them in separate locations.

If you use the file occasionally but not on a regular basis, consider compressing it. That way it remains on the network and you can have it ready for use again in a matter of minutes, but it takes up less disk space--in most cases much less. Using Compressed Data in Linux has instructions.
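As a rough illustration of the savings, the sketch below compresses a sample file with gzip and then restores it. It assumes gzip is installed, which is typical on Linux, though the article referenced above may recommend a different tool. The file name survey.csv and its contents are made up for the example, and real data will usually compress less dramatically than this repetitive sample.

```shell
# Create a sample text file in a scratch directory (hypothetical data).
demo=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 20000; i++) print i ",income,age" }' > "$demo/survey.csv"
before=$(wc -c < "$demo/survey.csv")

# gzip replaces survey.csv with the compressed file survey.csv.gz.
gzip "$demo/survey.csv"
after=$(wc -c < "$demo/survey.csv.gz")
echo "before: $before bytes, after: $after bytes"

# gunzip restores the original file when you need to use it again.
gunzip "$demo/survey.csv.gz"
```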

If you use large Stata data sets, the user-written gzsave and gzuse commands act just like regular save and use, but work with compressed files directly. For details or to install them, start up Stata and type:

findit gzsave

Do not make copies of standard data files archived by CDE or other agencies or individuals. If you are working as a group, keep files needed by everyone in the group in a shared location rather than each member making their own copy.

Using Temporary Space

One easy way to make sure you don't forget to delete a file when you're done with it is to put it in temporary space. In Linux, files stored in /temp30days are deleted after thirty days, but you are welcome to use as much space as you need during that time. Just make yourself a directory there. If you store files you'll only need briefly in /temp30days, you'll never have to worry about going back to delete them. Keep in mind that /temp30days is not backed up.
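Making yourself a directory might look like the sketch below. Naming the directory after your username is an assumption made here for illustration, not a stated SSCC convention, and the file name scratch.dat is hypothetical. The fallback to a scratch location exists only so the example can be tried on a machine that has no /temp30days.

```shell
# /temp30days is the temporary area described above. Fall back to a scratch
# directory when it does not exist so this sketch can be tried elsewhere.
base=/temp30days
[ -d "$base" ] || base=$(mktemp -d)

# Name the directory after your username (an assumed convention).
mydir="$base/$(id -un)"
mkdir -p "$mydir"

# Put a short-lived file there instead of in your home or project directory.
echo "intermediate results" > "$mydir/scratch.dat"
ls -l "$mydir"
```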

Last Revised: 2/2/2006