lokety

Perform Statistical Operations on Columns in CSV Files

by lokety on Mar.07, 2007, under Posts

This is a simple but powerful way to process files in Unix, using the humble program awk.

To calculate the sum or average of a numerical column in a comma-separated file, create a text file like so:

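BEGIN { FS = "," }   # split each line into fields at commas
{ s += $3 }          # add the third field of every line to the running sum
# NR is the number of lines read; the check avoids dividing by zero on empty input
END { if (NR > 0) printf "sum = %.2f, avg = %.2f, hits = %d\n", s, s/NR, NR }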

Use your creative juices to save it with a meaningful name, say, test.awk.

Call awk with this file, like so:

awk -f test.awk mycsvfile.csv
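To try this end to end, here is a small self-contained run. The three-line CSV is made-up sample data, and the script works in a fresh temporary directory so it does not overwrite anything. The program is typed with plain ASCII quotes, since awk does not accept typographic quotes.

```shell
#!/bin/sh
# Work in a scratch directory so existing files are left alone.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: the third column holds the numbers to summarise.
cat > mycsvfile.csv <<'EOF'
apple,red,1.50
banana,yellow,0.25
cherry,red,3.00
EOF

# Same program as above; the quoted 'EOF' stops the shell from expanding $3.
cat > test.awk <<'EOF'
BEGIN { FS = "," }
{ s += $3 }
END { if (NR > 0) printf "sum = %.2f, avg = %.2f, hits = %d\n", s, s/NR, NR }
EOF

awk -f test.awk mycsvfile.csv
# prints: sum = 4.75, avg = 1.58, hits = 3
```

The average is 4.75 / 3 = 1.5833..., which the %.2f format rounds to 1.58.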

You should see the sum, the average and the number of lines processed. This example assumes that the values on each line are separated by commas and that the numerical column is the third one ($3).
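If your file uses a different separator or the numbers sit in another column, you do not have to edit the script. A minimal sketch, assuming a POSIX awk: -F sets the field separator and -v col=N picks the column, since $col means "the field whose number is stored in col". The file name data.csv and its contents are hypothetical. Two caveats apply here and to the original script alike: a header row is counted in NR (its text field adds 0 to the sum), and splitting on a plain separator does not understand quoted fields that themselves contain the separator.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical data: semicolon-separated, numbers in column 2.
printf 'a;10\nb;20\nc;30\n' > data.csv

# -F';' sets FS to ";"; -v col=2 is assigned before any input is read.
awk -F';' -v col=2 '
  { s += $col }   # add the selected column of every line
  END {
    if (NR > 0) printf "sum = %.2f, avg = %.2f, hits = %d\n", s, s/NR, NR
    else print "no input lines"
  }
' data.csv
# prints: sum = 60.00, avg = 20.00, hits = 3
```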

For more information on awk, RTFM or Google it.
