PlanetSquires Forums

Support Forums => General Board => Topic started by: Paul Squires on January 09, 2010, 04:28:54 PM

Title: PARSE$ versus PARSE
Post by: Paul Squires on January 09, 2010, 04:28:54 PM
I always knew that the PARSE$ function was slow on large strings when you call it repeatedly inside a FOR/NEXT loop, but I am about to show you just how dramatic the difference is.

In the FireFly code editor that I wrote, you can paste multiple lines of text at once (as you would expect in any text editor). Here is the kind of code I was using to split the pasted text into lines:


   ' add each $CrLf segment (if any)
   pLineNew = JellyEdit_pLine( ed, @pLine.nLineNum )
   For x = 1 To nCount + 1
     
      st = Parse$( sLine, $CrLf, x )
     
      '  Update our memory structure to hold the new text
      MemString @pLineNew.pText, st
     
      pLineNew = JellyEdit_pLine( ed, @pLine.nLineNum + x )
      If x > 1 Then
         Incr @ed.yCaret
         @ed.fRefresh = %TRUE
      End If
         
   Next



This works perfectly for a few hundred lines of text... but I just had to copy and paste about 32,000 lines of text from Jose's TypeLib Browser for a pretty large OCX control.

When I tried to paste it into the code editor, I thought FF was freezing up due to a bug or something... it took 57 seconds to parse the large string.
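
Why does the loop get so slow? Roger's reply later in this thread explains that each PARSE$ call has to find field x by scanning from the start of the string. The sketch below is a hypothetical PowerBASIC model of that behaviour (NthField is an illustrative name, not PB's runtime code, and a later PB 9 update reportedly sped PARSE$ up). Fetching field x costs x - 1 delimiter searches, so a loop over n fields performs n(n - 1)/2 of them: for 32,000 lines that is roughly 512 million searches, versus about 32,000 for a single pass.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hypothetical model of PARSE$(sText, sDelim, nIndex): find field nIndex
' by skipping nIndex - 1 delimiters, always starting again at position 1.
' Illustration only; this is not PowerBASIC's actual implementation.
FUNCTION NthField(BYVAL sText AS STRING, BYVAL sDelim AS STRING, _
                  BYVAL nIndex AS LONG) AS STRING
    LOCAL nStart, nPos, i AS LONG
    nStart = 1
    FOR i = 1 TO nIndex - 1
        nPos = INSTR(nStart, sText, sDelim)   ' next delimiter after nStart
        IF nPos = 0 THEN EXIT FUNCTION        ' not enough fields: return ""
        nStart = nPos + LEN(sDelim)           ' first character of next field
    NEXT
    nPos = INSTR(nStart, sText, sDelim)
    IF nPos = 0 THEN nPos = LEN(sText) + 1    ' last field runs to the end
    FUNCTION = MID$(sText, nStart, nPos - nStart)
END FUNCTION

FUNCTION PBMAIN() AS LONG
    LOCAL sText AS STRING
    LOCAL x AS LONG
    sText = "one" & $CRLF & "two" & $CRLF & "three"
    ' Same pattern as the paste loop: every call re-scans from the start.
    FOR x = 1 TO 3
        MSGBOX NthField(sText, $CRLF, x)
    NEXT
END FUNCTION
```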

I then changed the code to the following, which first splits the whole string into an array using PB's PARSE statement:


   Dim sLines( 1 To nCount + 1 ) As String
   Parse sLine, sLines(), $CrLf
   
   ' add each $CrLf segment (if any)
   pLineNew = JellyEdit_pLine( ed, @pLine.nLineNum )
   For x = 1 To nCount + 1
     
      st = sLines(x)
     
      '  Update our memory structure to hold the new text
      MemString @pLineNew.pText, st
     
      pLineNew = JellyEdit_pLine( ed, @pLine.nLineNum + x )
      If x > 1 Then
         Incr @ed.yCaret
         @ed.fRefresh = %TRUE
      End If
         
   Next


With the new approach, pasting the same 32,000 lines took 1.4 seconds... what a hell of a difference.

Oh, by the way, this new fix will be in 3.06.
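
The fragment above doesn't show how nCount is computed. Here is a minimal self-contained sketch, assuming the array is sized with PARSECOUNT; SplitLines is a hypothetical helper name. Because the original uses nCount + 1 as the upper bound, its nCount presumably counts the $CrLf separators (TALLY would give that), whereas PARSECOUNT returns the number of fields directly. PARSECOUNT and PARSE should each walk the string once, rather than once per line as in the PARSE$ loop.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hypothetical helper: split sText on $CRLF into a 1-based array and
' return the number of lines. PARSECOUNT sizes the array, then a single
' PARSE statement fills every element in one call.
FUNCTION SplitLines(BYVAL sText AS STRING, sLines() AS STRING) AS LONG
    LOCAL nCount AS LONG
    nCount = PARSECOUNT(sText, $CRLF)
    IF nCount < 1 THEN EXIT FUNCTION          ' nothing to split
    REDIM sLines(1 TO nCount)
    PARSE sText, sLines(), $CRLF
    FUNCTION = nCount
END FUNCTION

FUNCTION PBMAIN() AS LONG
    LOCAL x, n AS LONG
    DIM sLines() AS STRING
    n = SplitLines("alpha" & $CRLF & "beta" & $CRLF & "gamma", sLines())
    FOR x = 1 TO n
        MSGBOX sLines(x)                      ' plain array access, no re-scan
    NEXT
END FUNCTION
```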
Title: Re: PARSE$ versus PARSE
Post by: Roger Garstang on January 11, 2010, 11:28:48 AM
Yeah, Parse$ is pretty useless unless you are using it to parse small strings, like GPS strings over a COM port, command lines, or anything else with maybe 10-20 items, since every call to Parse$ has to scan the string from the beginning up to item x.

How do you get the item count for your DIM? PB's line count also has to run through the whole string once to count the items. You might get an even better result by parsing it yourself: double the array's size each time you run out of room, then resize it to the exact size needed when you are done.

You'd think PB would optimize PARSE$ when it sees it in a FOR loop by remembering the last position. If they were smart, they might even make PARSE skip re-allocating the string data altogether: leave the string in one block as it is, with each array element pointing to where each break was found and its length set so that it excludes the CRLF bytes. It would be interesting to access the data with pointers and see whether it is all still one block containing the CRLFs, etc.
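
The do-it-yourself approach described above can be sketched as a hypothetical helper (SplitLinesFast and the starting capacity of 16 are illustrative choices, not anything from PB). Each INSTR search resumes where the previous one stopped, so the string is scanned only once, and the array is doubled with REDIM PRESERVE whenever it runs out of room, then trimmed to the exact size at the end. Starting at 16, reaching 32,000 lines takes 11 doubling steps plus the final trim.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hypothetical single-pass splitter: each INSTR resumes where the last
' one stopped, and the array grows by doubling instead of being pre-counted.
FUNCTION SplitLinesFast(BYVAL sText AS STRING, sLines() AS STRING) AS LONG
    LOCAL nStart, nPos, nCount, nCap AS LONG
    nCap = 16                                   ' initial capacity (arbitrary)
    REDIM sLines(1 TO nCap)
    nStart = 1
    DO
        nPos = INSTR(nStart, sText, $CRLF)      ' next break after nStart
        IF nPos = 0 THEN nPos = LEN(sText) + 1  ' last line runs to the end
        INCR nCount
        IF nCount > nCap THEN                   ' out of room: double it
            nCap = nCap * 2
            REDIM PRESERVE sLines(1 TO nCap)
        END IF
        sLines(nCount) = MID$(sText, nStart, nPos - nStart)
        nStart = nPos + LEN($CRLF)              ' skip past the CR/LF pair
    LOOP UNTIL nPos > LEN(sText)                ' stop after the final line
    REDIM PRESERVE sLines(1 TO nCount)          ' trim to the exact size
    FUNCTION = nCount
END FUNCTION

FUNCTION PBMAIN() AS LONG
    LOCAL x, n AS LONG
    DIM sLines() AS STRING
    n = SplitLinesFast("alpha" & $CRLF & "beta" & $CRLF & "gamma", sLines())
    FOR x = 1 TO n
        MSGBOX sLines(x)
    NEXT
END FUNCTION
```

A trailing $CRLF produces a final empty line, which matches how the paste loop above treats the text after the last break.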
Title: Re: PARSE$ versus PARSE
Post by: Martin Francom on January 12, 2010, 06:20:15 AM
Interesting... That's a tip I will make a note of.
Title: Re: PARSE$ versus PARSE
Post by: Gary Stout on January 12, 2010, 04:21:48 PM
Here is an excerpt from the history file of the most recent PB 9.0 update... not sure if this addresses the PARSE scenario you are describing.

Changes to existing Statements and Functions:

- Dramatic speedup of INSTR(), TALLY(), PARSE$(), EXTRACT$(), REMAIN$(), RETAIN$(),
  and REMOVE$() in most situations.
Title: Re: PARSE$ versus PARSE
Post by: Roger Garstang on January 12, 2010, 05:05:26 PM
Hey Gary, haven't seen you in a while. That reminds me that I need to get the .03 update... I'm still on .02. They released the updates pretty close together this time, and I hadn't seen a history posted either, so I forgot all about the release. Maybe they finally made these functions remember the last position when used in a loop to speed things up.

Edit:
Hmm, actually, looking at my .02 history, that change was already listed. I'm going to have to remember to back up my history file and compare the two when I update.