Reading data from a text file

Started by Petrus Vorster, July 25, 2018, 04:25:49 PM

Petrus Vorster

Hi all,

Reading data from a text file is easy, but how does one go about getting workable information from a file that has NO structure at all?
There are no delimited lines or anything else one can parse.

I have a report that sometimes has hundreds of serial numbers on it, and I want to see whether I can find some miraculous way to read this file and extract only what I need.
I have attached the file in case anyone can think of a way to read it into parsed fields.

I have been trying various delimiters, but without much success.
-Regards
Peter

Petrus Vorster

Well, luckily it turned out that all the different products have 14-digit serial numbers.
But it's still darn hard to detect certain strings, even with INSTR, to read the different product codes.

But at least in a trial I have managed to filter out the serial numbers.
-Regards
Peter
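With a fixed length known, a single pattern can pick out the serial numbers instead of hunting for them with INSTR. A minimal sketch in Python (not the PowerBASIC used in this thread), assuming each serial is exactly 14 characters drawn from 0-9 and A-F and stands on its own rather than inside a longer word; `find_serials` is a hypothetical name:

```python
import re

# Assumption: a serial is exactly 14 characters from 0-9 or A-F and is not
# part of a longer run of letters or digits (\b marks a word boundary).
SERIAL_RE = re.compile(r"\b[0-9A-F]{14}\b")

def find_serials(text: str) -> list[str]:
    """Return every 14-character serial in text, in order of appearance."""
    return SERIAL_RE.findall(text)

if __name__ == "__main__":
    sample = "ORDER 4471 QTY 2\n0123456789ABCD TO 0123456789ABCD\n"
    print(find_serials(sample))
```

The word boundaries stop a 15-character run from being reported as a 14-character serial.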

David Kenny

You can use REGEXPR. It's pretty powerful and quite fast.

FUNCTION PBMAIN () AS LONG
    GetSerials "Cons1.TXT"
End Function

Sub GetSerials(FileSpec As String)
    Local sText         As String
    Local SerNo         As String
    Local iPos, iLen    As Long
    Local hFile         As Long
    hFile = FreeFile                             'Get an unused file number
    Open FileSpec For Binary As #hFile           'Open in binary mode
    Get$ #hFile, Lof(hFile), sText               'Read it all into one string
    Close #hFile
    iPos = 1                                     'Start the search at the first character
    Do
        RegExpr "[0-9A-F]+ TO" In sText At iPos + iLen To iPos, iLen     'Note: Your example showed each SerNo in duplicate because the
        If iLen = 0 Then Exit Do                                         '      form had SerNo TO SerNo and in each case they were the same.
        SerNo += Mid$(sText, iPos, iLen - 3) & $CrLf                     '      I'm only grabbing the one before the TO (the -3 drops " TO")
    Loop                                                                 '      but still not checking for duplicates.
    MsgBox SerNo                                 'List them
End Sub
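The same read-everything-then-scan approach can be sketched in Python for comparison. This is not David's code: it uses `+` rather than `*` (so an empty run before " TO" is never reported), adds the duplicate check the comments say is missing, and assumes a Latin-1 file encoding; `get_serials` and `get_serials_from_file` are hypothetical names.

```python
import re

# A run of 0-9/A-F characters directly followed by " TO"; group 1 is the run.
TO_RE = re.compile(r"([0-9A-F]+) TO")

def get_serials(text: str) -> list[str]:
    """Collect the run of hex characters before each ' TO', dropping duplicates."""
    seen = set()
    serials = []
    for match in TO_RE.finditer(text):
        serial = match.group(1)          # the part before " TO"
        if serial not in seen:           # keep only the first occurrence
            seen.add(serial)
            serials.append(serial)
    return serials

def get_serials_from_file(path: str) -> list[str]:
    # Read the whole report at once, like GET$ with LOF in the PB version.
    # Assumption: Latin-1 decoding, which never fails on arbitrary bytes.
    with open(path, encoding="latin-1") as f:
        return get_serials(f.read())

if __name__ == "__main__":
    sample = "0123456789ABCD TO 0123456789ABCD\n0000000000FFFF TO 0000000000FFFF\n"
    print(get_serials(sample))
```

Keeping a `set` alongside the list drops repeats while preserving the order in which serials first appear in the report.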


By the way, the regular expression in my example looks for a run of characters, of any length, made up only of digits (0-9) or the upper-case letters A-F, followed by ' TO'. Your sample file works with this regular expression, but you may find situations where it needs to be modified. It would match 'AB TO' in 'TAB TO COL 10', for instance.

David
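One way to rule out hits like the one in 'TAB TO COL 10' is to require the run to start at a word boundary and ' TO' to end at one. A Python sketch of that tightening follows. It uses Python's `re` syntax, and whether PB's REGEXPR accepts `\b` is an assumption to check in the PB help file.

```python
import re

# Loose mask, as in the example above: a run of 0-9/A-F followed by " TO".
LOOSE = re.compile(r"[0-9A-F]+ TO")
# Tightened mask (assumption): the run must begin at a word boundary, so it
# cannot start in the middle of a word like "TAB", and "TO" must be a whole word.
STRICT = re.compile(r"\b[0-9A-F]+ TO\b")

if __name__ == "__main__":
    text = "TAB TO COL 10\n0123456789ABCD TO 0123456789ABCD"
    print(LOOSE.findall(text))   # ['AB TO', '0123456789ABCD TO']
    print(STRICT.findall(text))  # ['0123456789ABCD TO']
```

This is still not bulletproof: an ordinary word made only of A-F letters, as in 'FACE TO FACE', would pass, so combining it with a length check such as `{14}` narrows things further.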

Petrus Vorster

Hi David

Thanks a million, I didn't even know about this command.
I will most definitely go and try this.

-Thanks, Peter
-Regards
Peter

Petrus Vorster

Thanks David, that works perfectly.
Now I will experiment with it and see what else I can retrieve, like order numbers etc.

The system can export to Excel and CSV (now they tell me), but I am now curious to see to what extent I can play with this.

Thanks a million.
-Regards
Peter

David Kenny

It's in the PB help file. There are many other sources of regular expressions out there as well, and each implementation has a slightly different 'dialect'.
There used to be an online REGEXPR tester at PowerBasic.com, but it seems to be missing now. Kevin Voell also wrote one in PB, and it is still available in the forums. You supply a sample file to search, and it shows you what your current expression finds. Then you just copy the expression into your code.

It gets easier after looking at it for a while and trying things.  Very powerful.
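A bare-bones tester in the same spirit takes only a few lines of Python. Give it a pattern and a sample, and it lists every match with its position. This is a hypothetical helper, not Kevin Voell's tool. Python's `re` is its own dialect, so a pattern may need adjusting before it goes into a PB REGEXPR call. Positions are reported 1-based to line up with PB string positions.

```python
import re

def show_matches(pattern: str, sample: str) -> list[tuple[int, str]]:
    """List (1-based position, matched text) for every match of pattern in sample."""
    return [(m.start() + 1, m.group(0)) for m in re.finditer(pattern, sample)]

if __name__ == "__main__":
    sample = "TAB TO COL 10\n0123456789ABCD TO 0123456789ABCD"
    for pos, found in show_matches(r"[0-9A-F]+ TO", sample):
        print(f"{pos:5d}  {found!r}")
```

Swapping in a different pattern and rerunning shows at once which lines of a sample report it would pick up.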