Obtaining and Caching a Lot of Words


It is surprising how often I wish I had a lot of words handy. This week it has been because I’ve wanted to play with the AutoCompleteBox (you just set the list of words as the ItemsSource for the control and voila!).

In previous posts I demonstrated how I obtained these from a book through Project Gutenberg and how I used a background worker thread to keep the UI up to date. Today I’ll show how to use Isolated Storage to stash the words locally to dramatically improve performance, and then after I show this nifty trick at DevConnections I’ll write up how to obtain the words on one page and then use them in an AutoCompleteBox on a second page (OK, it’s not that hard).

Isolated Storage

Isolated storage really works well here, because once you go to the bother of getting and sorting these words, it is rather silly to go get them again the next time you run the program. The trick, of course, is just to check whether you’ve already saved them in isolated storage and, if so, reconstitute them. If not, then when you are done using them, stash them away in isolated storage for next time.

You can get all sorts of fancy saving away complex data structures and saving different lists, but to keep things simple, let’s just… well, keep things simple.

When we’re about to ask the user which file to open to grab words from, we’ll do a quick "look aside" to see if we already have words saved:

void Page_Loaded( object sender, RoutedEventArgs e )
{
  worker.WorkerReportsProgress = true;
  worker.DoWork += new DoWorkEventHandler( worker_DoWork );
  worker.ProgressChanged += new ProgressChangedEventHandler( worker_ProgressChanged );
  worker.RunWorkerCompleted += new RunWorkerCompletedEventHandler( worker_RunWorkerCompleted );
  if ( TestIsoStorage() )
  {
    FilePicker.IsEnabled = false;
    if ( worker.IsBusy != true )
      worker.RunWorkerAsync( null );
  }
  else
  {
    FilePicker.Click += new RoutedEventHandler( FilePicker_Click );
  }
}

 

This takes a bit of explanation. I’m still setting up my worker thread, because I’m going to use it whether or not I have the words. It will be the worker thread that takes the single string of words and rebuilds the list of strings that the application expects. And why not? That part is already working. The only change I wanted to make was whether or not to get the file and parse it.

Let’s look at TestIsoStorage().

The logic here is that I call GetUserStoreForApplication, which returns an IsolatedStorageFile at the application level (and since this is a resource I want to give up as quickly as possible, I take advantage of C#’s using construct). With that, I can test whether my isolated storage file exists and, if it does, I open a StreamReader and in one line open the file for reading and suck the entire contents out as a single string, which I place into a StringBuilder.
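A minimal sketch of the logic just described, written with early returns, might look like this. It assumes sb is the page-level StringBuilder used elsewhere in this post and that the file is named "SortedWords"; the WordCache wrapper class is invented only so the sketch compiles on its own. Outside Silverlight, GetUserStoreForApplication may not be available to every kind of application, so treat that call as an assumption too.

```csharp
using System.IO;
using System.IO.IsolatedStorage;
using System.Text;

// Hypothetical wrapper so the sketch stands alone;
// in the post this method lives on the Silverlight page.
public static class WordCache
{
    // Assumed to correspond to the page-level StringBuilder field.
    public static StringBuilder sb = new StringBuilder();

    // Returns true and fills sb if "SortedWords" exists in isolated storage.
    public static bool TestIsoStorage()
    {
        using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )
        {
            if ( !store.FileExists( "SortedWords" ) )
                return false;   // nothing cached yet

            using ( var reader =
                new StreamReader( store.OpenFile( "SortedWords", FileMode.Open ) ) )
            {
                // Read the whole file as one string of space-separated words.
                sb = new StringBuilder( reader.ReadToEnd() );
                return true;
            }
        }
    }
}
```

Both using blocks still dispose their resources on every exit path, which is what makes the early returns safe here.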

NB: I’m of two minds about my ambivalence about having a single return point. One argument is that it is less confusing if you use a flag (retVal) and always exit at the end; the other responds with a word I’m not allowed to write here. Most of the time I would rewrite this as:

private bool TestIsoStorage()
{
  bool retVal = false;
  using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )
  {
    if ( store.FileExists( "SortedWords" ) )
    {
      using ( StreamReader reader =
        new StreamReader( store.OpenFile( "SortedWords", FileMode.Open ) ) )
      {
        sb = new StringBuilder();
        sb.Append( reader.ReadToEnd() );
        retVal = true;
      }
    }
    return retVal;
  }
}

but I don’t get too excited about it.

The key thing to note (and I admit it is almost a hack) is that if we get the words from isolated storage, we never call the dialog box (in fact we disable the open file button) and we kick off the background thread with a null argument:

if ( TestIsoStorage() )
{   
  FilePicker.IsEnabled = false;   
  if ( worker.IsBusy != true )      
    worker.RunWorkerAsync( null );
}

 

The first half of DoWork is encased in a big if statement that basically turns it into a no-op if we have obtained the words from isolated storage. I kinda hate this because the connection is not obvious, but it works, it’s late, and I swear I’ll come back and fix it… really.

void worker_DoWork( object sender, DoWorkEventArgs e )
{
  const long MAXBYTES = 200000;
  BackgroundWorker workerRef = sender as BackgroundWorker;
  if ( workerRef != null )
  {    // begin massive ugly hack      
    if ( e.Argument != null )
    {
      System.IO.FileInfo file = e.Argument as System.IO.FileInfo;
      if ( file != null )
      {
        System.IO.Stream fileStream = file.OpenRead();
        using ( System.IO.StreamReader reader = new System.IO.StreamReader( fileStream ) )
        {
          string temp = string.Empty;
          try
          {
            do
            {
              temp = reader.ReadLine();
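              // ReadLine strips the line break, so add a space to keep
              // the last word of one line apart from the first of the next
              sb.Append( temp ).Append( ' ' );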
            } while ( temp != null && sb.Length < MAXBYTES );
          }
          catch { }
        }     // end using             
        fileStream.Close();
      }        // end if file != null      
    }           // end if argument != null
    string pattern = "\\b";
    allWords = System.Text.RegularExpressions.Regex.Split( sb.ToString(), pattern );
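    long total = allWords.Length;
    long soFar = 0;
    int newPctg = 0;
    int pctg = 0;
    foreach ( string word in allWords )
    {
      // Multiply before dividing so that inputs with fewer than
      // 100 tokens cannot cause a divide-by-zero
      newPctg = (int) ( ( ++soFar ) * 100 / total );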
      if ( newPctg != pctg )
      {
        pctg = newPctg;
        workerRef.ReportProgress( pctg );
      }
      if ( words.Contains( word ) == false )
      {
        if ( word.Length > 0 && !IsJunk( word ) )
        {
          words.Add( word );
        }     
      }       
    }        
  }                      
}
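The second half of DoWork (split, filter, de-duplicate) can be sketched on its own, away from the worker plumbing. This is only an illustration under stated assumptions: the WordExtractor class and PercentDone helper are invented names, the IsJunk shown here is a stand-in for the real one (which isn't listed in this post), and I've swapped in a HashSet for the Contains check on the list.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of the original page code.
public static class WordExtractor
{
    // Stand-in for the post's IsJunk(): treat any token that
    // contains a non-letter (spaces, punctuation, digits) as junk.
    public static bool IsJunk( string token )
    {
        foreach ( char c in token )
        {
            if ( !char.IsLetter( c ) )
                return true;
        }
        return false;
    }

    // Splits on word boundaries and returns the distinct,
    // non-junk tokens in ordinal sort order.
    public static List<string> ExtractWords( string text )
    {
        var seen = new HashSet<string>();
        foreach ( string token in Regex.Split( text, @"\b" ) )
        {
            if ( token.Length > 0 && !IsJunk( token ) )
                seen.Add( token );   // Add silently ignores duplicates
        }
        var result = new List<string>( seen );
        result.Sort( StringComparer.Ordinal );
        return result;
    }

    // Whole-number percentage; reports 100 when there is nothing to do.
    public static int PercentDone( long soFar, long total )
    {
        return total <= 0 ? 100 : (int) ( soFar * 100 / total );
    }
}
```

Splitting on \b hands back the runs between words (spaces, commas and so on) as tokens too, which is why the junk filter matters. The HashSet is a convenience for larger books; the Contains check on a list in the listing above does the same job, just more slowly as the list grows.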

 

Finally, when the thread ends we make sure to save the words for next time if we’ve not done so yet:

private void StoreWords()
{   
  Message.Text = "Storing Words in Isolated Storage...";    
  using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )   
  {      
    if ( ! store.FileExists( "SortedWords" ) )      
    {         
      StringBuilder sb = new StringBuilder();         
      foreach ( string s in words )         
      {            
        sb.Append( s + " " );         
      }         
      using ( StreamWriter writer = 
        new StreamWriter( store.OpenFile( "SortedWords", FileMode.Create ) ) )         
        { 
          writer.Write( sb.ToString() ); 
        }
    }
  }
}
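The file format here is about as simple as it gets: every word followed by a space. A small sketch of that round trip, kept separate from isolated storage so it is easy to try, might look like this. The WordListFormat class and its method names are invented for the illustration, and it uses string.Join rather than the StringBuilder loop above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper showing the space-separated format used for "SortedWords".
public static class WordListFormat
{
    // Joins the words with single spaces (no trailing space).
    public static string Serialize( IEnumerable<string> words )
    {
        return string.Join( " ", words );
    }

    // Splits the stored text back into words, dropping empty entries
    // so a trailing or doubled space does no harm.
    public static List<string> Deserialize( string text )
    {
        return new List<string>(
            text.Split( new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries ) );
    }
}
```

This only works because none of the words contains a space, which the word-boundary split guarantees. The trailing space that StoreWords writes is harmless for the same reason: empty tokens are thrown away when the text is split again.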

 

The result, not surprisingly, is a much faster startup for the program. I do worry just a bit about the detritus of long-forgotten isolated storage files cluttering up the disk. I wonder if we can put in a self-destruct timer? I’ll have to look into that.
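I haven’t looked into it yet, but one possible shape for a "self-destruct" is simply to check the file’s age at startup and delete it if it is too old. The sketch below is a guess at that, not something from the application: the CacheExpiry class, its method names and the maximum-age policy are all invented, and it assumes IsolatedStorageFile.GetLastWriteTime is available in the runtime you are targeting.

```csharp
using System;
using System.IO.IsolatedStorage;

// Hypothetical expiry helper; not part of the original page code.
public static class CacheExpiry
{
    // True when the file was last written longer ago than maxAge.
    public static bool IsExpired( DateTimeOffset lastWrite, DateTimeOffset now, TimeSpan maxAge )
    {
        return now - lastWrite > maxAge;
    }

    // Deletes fileName from the store if it exists and is older than maxAge.
    // Returns true only when a file was actually deleted.
    public static bool DeleteIfExpired( IsolatedStorageFile store, string fileName, TimeSpan maxAge )
    {
        if ( !store.FileExists( fileName ) )
            return false;

        DateTimeOffset lastWrite = store.GetLastWriteTime( fileName );
        if ( !IsExpired( lastWrite, DateTimeOffset.Now, maxAge ) )
            return false;

        store.DeleteFile( fileName );
        return true;
    }
}
```

Calling something like DeleteIfExpired( store, "SortedWords", TimeSpan.FromDays( 30 ) ) before TestIsoStorage would make the next run fall back to the file picker once the cache goes stale. It only helps if the application is run again, of course, so it is a tidy-up rather than a true timer.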

 

-j

About Jesse Liberty

Jesse Liberty has three decades of experience writing and delivering software projects and is the author of 2 dozen books and a couple dozen online courses. His latest book, Building APIs with .NET will be released early in 2025. Liberty is a Senior SW Engineer for CNH and he was a Senior Technical Evangelist for Microsoft, a Distinguished Software Engineer for AT&T, a VP for Information Services for Citibank and a Software Architect for PBS. He is a Microsoft MVP.