Loading only new files (with logfiles)

An interesting question from the community about loading only files which had not previously been loaded and storing this information into QVDs.

The requirement was:
1- Scan a folder of files
2- load only new files
3- log the files loaded into a log qvd
4- load the data into a qvd
5- This must also work for a ‘first run’ scenario, i.e. where the QVD’s don’t yet exist.

Below is the code used, I’ve commented it enough to explain it and there are test files attached where you can see this working.



Let vFilePath='E:\'; //set the location we want to work in, can be as granular as wanted

//Check for the log qvd
LET logqvdfile = '$(vFilePath)qvds\Processed_ServerLog_FilesList.qvd';
LET baseQVDFileSize = FileSize('$(logqvdfile)'); //test the file size

IF baseQVDFileSize > 0 THEN // if the file exists then it will have a size of > 0

TRACE logfile found, loading it;

//import the existing logfile

log:
LOAD filename, loaded
from [$(vFilePath)qvds\Processed_ServerLog_FilesList.qvd] (QVD);

ELSE
ENDIF

//check for the fact qvd using the same logig

LET factqvdfile = '$(vFilePath)qvds\ServerLog_Full.qvd';
LET baseFactFileSize = FileSize('$(factqvdfile)');

IF baseFactFileSize > 0 THEN // if the file exists then it will have a size of > 0

TRACE factfile found, loading it;

//load in the existing data

fact:
LOAD filename,
[1],
[2],
[3]
from [$(vFilePath)qvds\ServerLog_Full.qvd] (QVD);

for each File in filelist ('$(vFilePath)sourcedata\*.txt') //scan our data folder

test: //preceeding load required for our not exists test
load filename
WHERE NOT EXISTS(filename); //this is the line which means the system will not load files where filename already exists (its checking this to the fact table)

LOAD '$(File)' as filename
AUTOGENERATE 1
;
next File;

LET CKNumRows=NoOfRows('test');
FOR p=0 to $(CKNumRows) -1 // start a loop, peek is zero based so if only one row need to do 0 to 0 hence the -1
LET vFileChecker=Peek('filename',p,'test'); //get the next filename from our test table

//load the fact file
fact:
LOAD '$(vFileChecker)' as filename,
[1],
[2],
[3]
FROM
'$(vFileChecker)'
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

NEXT p; //increment p so that when we peek we are looking at the next row of data and hence next file

concatenate(log) //now add this data into our log table

LOAD filename, now() as loaded
resident test;

drop table test; //don't need this table anymore so drop it

ELSE //our fact QVD doesn't exist already so we can't check filename

TRACE factfile not found so building it;

//our fact doesn't exist so we can just whip through and load all the files

for each File1 in filelist ('$(vFilePath)sourcedata\*.txt')
fact:
LOAD '$(File1)' as filename, [1],
[2],
[3]
FROM
'$(File1)'
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

NEXT File1

TRACE logfile not found so building it;

//the log doesn't exist either so we can just load them all

log:
LOAD filename, now() as loaded
resident fact;

ENDIF

//store the two tables into the qvd files
store log into [$(vFilePath)qvds\Processed_ServerLog_FilesList.qvd];
store fact into [$(vFilePath)qvds\ServerLog_Full.qvd];

//drop the tables as we don't need them any more
drop table log;
drop table fact;

relevant supporting files

sourcedata

daily_load_test

Leave a Reply