Tuesday, January 29, 2008

Another way of manipulating large amount of data in oracle (Remove duplicate)


Below example explains, 1 way to remove duplicate data. tmp_data table contains 6millions of record, but with duplicate. Table tmp_duplicate contains all duplicates data that exists in tmp_data.

create or replace procedure delete_duplicate is

cursor c is select sn from tmp_data a where a.sn in (select sn from tmp_duplicate);

type c1_type is table of c%rowtype;

l_c1 c1_type;

begin

open c;

loop

fetch c bulk collect into l_c1 LIMIT 1000;

for i in 1 .. l_c1.count

loop

--Expecting to do the delete

dbms_output.put_line('Delete this ' || l_c1(i).sn);

end loop;

dbms_output.put_line('----------------------');

exit when c%notfound;

end loop;

close c;

end delete_duplicate;

Seperate certain data in file to another file.



Let say, i have many raw files, example data files generated from system. Every day the system generates the raw files, with pipe | seperated. At the end of the month i want to extract certain data from each of the files. Another problem arise is, these files contains more 100K of lines.

List of files in my unix server
mixed_raw_data_20080101.csv
mixed_raw_data_20080102.csv
mixed_raw_data_20080103.csv
mixed_raw_data_20080104.csv
mixed_raw_data_20080105.csv
mixed_raw_data_20080106.csv
.....

Content of the file
Hello|amy
Hello|amy
Hello|amy
Hello|amy
Hello|aby
Hello|aby

e.g:
Extract aby from all the listed files. Manually i can do each files like this
more mixed_raw_data_20080101.csv | grep '|aby' > mixed_raw_data_20080101.csv.extract

But what if i have to extract for the whole month which has 30 files....

for f in mixed_raw_data_*.csv; do more $f | grep '|aby' > $f.extract ; done;

Saturday, January 12, 2008

Find all large files on a Linux machine


Finds all files over 20,000KB (roughly 20MB) in size and presents their names and size in a human readable format:

find / -type f -size +20000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

Change file extensions with bash shell



To change all .htm files in a folder to .html files:

for f in *.htm; do mv $f `basename $f .htm`.html; done;