Fixing truncated records with a quick Perl hack

A little while back I mentioned a quick hack I had done for Bowker to fix some truncated records in one of the production tables…and I finally found a few extra minutes to go ahead and share that quick/simple hack with you now. First let me give you the quick background. Basically there’s a web service that logs gobs of data submitted by a user (via a front end form on a web site). It logs this information in a CLOB field in the database as a raw XML content. In addition to sending the data off to the web service for storage (and later processing), the fron end code saves a local copy of each XML file (because we are in beta and wanted to be able to handle the specific type of problems this hack fixes). On the backend, once a night, there is a process within Bowker that reads these XML records and does some processing on them (in an attempt to update a handful of other non-integrated systems and products). If the record can be processed by the batch program, the XML record is removed from the table…if it can’t the record remains (and the user is notified via another batch system about required updates). For an unknown reason at some point last week, all of the records remaining in the table were truncated at a certain character length. This of course meant that the XML within the fields was no longer valid XML, and that being the case the batch program and the front end services began to have all kinds of problems dealing with the records. We needed to clear this problem up as quickly as possible! Simply deleting the truncated records from the table would actually solve the problem, but it would also mean that no more alerts would go out to the users and the data the users put in previously would not be saved (and available for fixes/updates) to them! What we really needed was to fix the trunctaion and restore the XML for each record. Luckily we had been storing copies of the XML files on the server making the requests. So all I had to do was download those, loop through them figuring out just which ones we really wanted to re-run, and then update them in the database (that’s what the below script does). The keys to making this really easy are mostly related to the internal knowledge I have/had of the systems in place and the data I’m dealing with. For example, I knew the basic web service takes an XML file and stores it into the table (doing an update or an insert as needed)…so that meant I could just reuse that without having to rewrite that logic or deal with CLOB database issues in my local script. I also knew that inside the truncated xml records in the database there was a tag early on that listed the ISBN, and this value was unique to each file. It was early enough in each file that it had not been truncated, and it was unique enough that I could key off of it. So that meant I could use it to determine which records I really should recall the service for (I didn’t want to put back in records that the batch system had already successfully processed and therefore deleted from the table). With all of that in mind, writing a Perl script to run on my local machine (with an DBI connection to the Oracle database) was really pretty trival…and here it is (names and variables changed for security of course):

use DBI; use HTTP::Request::Common qw(POST); use LWP::UserAgent; $database = “MY_DB”; # the DB we want to work in $username = “MY_USERNAME”; # the username we are accessing the DB with $password = “MY_PASSWORD”; # the password for the user we are accessing the DB with my $dbh = DBI->connect(“dbi:Oracle:$database”, $username, $password); $dbh->{‘LongTruncOk’} = 1; $dbh->{'LongReadLen’} = 255; my @files = glob(“C:\\MY_LOCAL_FILEPATH\\*”); $reccount = 0; foreach my $file(@files){   if ($file =~ /isbnmemberwebservice/) {     # this is a file we want to to check the contents of     $reccount++;     open (SDLFILE, $file);     my $thisfile = “”;     while ($line = <SDLFILE> ) {       $thisfile .= $line;     }     # get the ISBN out of this data     my $isbn = “0”;     if ($thisfile =~ /<isbn13>(\d+)<\/isbn13>/) {       $isbn = $1;       my $sth = $dbh->prepare(“SELECT count(0) FROM MY_SDL_STORAGE_TABLE where xml like ’%” . $isbn . “%’”) or die;       $sth->execute() or die;       while (my @data = $sth->fetchrow_array()) {         if ($data[0] != 0) {           $reccount++;           $ua = LWP::UserAgent->new;           my $req = POST “http://my_web_service_url_to_submit_data_to”, [ xml => $thisfile ];           $response = $ua->request($req)->{’_content’};           print $reccount . “: ” . $isbn . “ record updated in MY_SDL_STORAGE_TABLE\n”;         } else {           print $reccount . “: ” . $isbn . “ NO RECORD\n”;         }       }     }   } }

This post has received 38 loves.


ARCHIVE OF POSTS



This is the personal blog of Kevin Marshall (a.k.a Falicon) where he often digs into side projects he's working on for digdownlabs.com and other random thoughts he's got on his mind.

Kevin has a day job as CTO of Veritonic and is spending nights & weekends hacking on Share Game Tape. You can also check out some of his open source code on GitHub or connect with him on Twitter @falicon or via email at kevin at falicon.com.

If you have comments, thoughts, or want to respond to something you see here I would encourage you to respond via a post on your own blog (and then let me know about the link via one of the routes mentioned above).