Importing CSV files into Excel and LibreOffice Calc

I have a spreadsheet where I first take CSV data and manipulate it into a standardized format for searching, then run calculations to get the answers I need. I’ve been opening the CSV file in Notepad, then manually pasting it into the appropriate tab on the spreadsheet. To get the results I need, it’s pulling data from three CSV files. That’s working fine because this is just for my use at the moment and I can quickly copy the data from Notepad into the tabs…

I need to automate the import of the CSV data when the file is opened so it can be used by more people in the future. You know as well as I do that if I tell them they’ll have to do the copy-and-paste routine, some won’t want to use the spreadsheet no matter how useful it will be to them.

I’m getting stuck due to a variety of factors:

  1. I created the spreadsheet with LibreOffice Calc and I have no idea if the people using it will have Calc, Excel or the web version of Excel on their computer. I’d like to just have one file, so I’m currently saving in XLSX.

  2. The CSV files have to be manually renamed before opening the spreadsheet since they include a timestamp in the filenames. Otherwise, the connection to the CSV files within the spreadsheet has to parse the filename to select the right file. The filenames have a common format of ABC.csv, with A = unique identifier, B = report type, C = timestamp, and I would be parsing to find B.

  3. Anything that opens the CSV files needs to be able to handle relative file paths instead of hard-coded links. The spreadsheet and the CSV files will be in the same folder, but I want the users to be able to place the folder anywhere on their computer’s drive.
    If relative paths are not supported, then I will have to do something like say “create a folder called TEST on Drive C and copy your files to there”.

  4. The CSV files have a variable amount of data in them. The tabs where the CSV data will land have to be blank when the spreadsheet is opened to potentially avoid having new data mixing with old data.
    Example: Report 1 from unit 1 has 50 lines of data, report 1 from unit 2 has 45 lines, and report 1 from unit 3 has 75 lines. If you re-used the spreadsheet where you had imported unit 1’s data and then fed it unit 2’s data, you have 5 lines from unit 1 that don’t get overwritten.

  5. LibreOffice Calc has an External Data function (Sheet > External Links) that does the import pretty good, but as soon as I blank out the tabs where the CSV data landed, that breaks the external link to the files. (When I’m browsing to the file, it shows the full path, but I can edit the path to make it a relative link.)
    The intent is that they will open the spreadsheet, have it import the CSV files, then they will use Save As to make it a unit-specific report. When they want to run the report for a new machine, they go back to the original spreadsheet and open that again. (I may recommend they create a new folder for each new machine.)

 
This is probably a good candidate for using macros, but LibreOffice doesn’t seem to come with a Java Runtime Environment and I won’t know if the other computers will have JRE on them. If they don’t, they may not have local admin rights to install JRE.

Anyone got any ideas how to tackle this?

Assuming it’s Windows, will Powershell be an option?

1 Like

The computers will have Windows and I have PowerShell on my laptop, but I don’t know if anyone else will have that.

Here is the beginning and end of some macro code I wrote to import text files into Excel many years ago for a fix for the library the Dragonlady was working at. There may be some helpful nuggets in there. For example, I found it sped things up to turn calculation and screen updating off until the data was distributed and then on again. Near the end, I select the print area for them in case they want to go straight to print, which was the preferred option for this particular sheet.

Private Sub Workbook_Open()
    Dim Line, SourceFile As String
    Dim FileNum, Count, Start, Pos, RowCounter, InputLine, InputRecordCounter As Integer
    Dim varSheet, varLocation, varCallNumber, varAuthor, varTitle, varBarcode, varType, varBranch As String
    Dim rtn
    Dim CurrWorkBk As Workbook
    varSheet = "Pull List"
    Set CurrWorkBk = ActiveWorkbook
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False
    SourceFile = CurrWorkBk.Path & "\Library Notice.txt"
    If Dir(SourceFile) = "" Then
        rtn = MsgBox("Input file (" & SourceFile & ") not found." & vbCrLf & "Please create input file in this location and try again.", vbExclamation + vbOKOnly, "INPUT ERROR")
        Exit Sub
    End If
    CurrWorkBk.Worksheets(varSheet).Activate
    RowCounter = 3
    InputLine = 1
    FileNum = FreeFile
    Open SourceFile For Input Access Read As FileNum
    CurrWorkBk.Worksheets(varSheet).Range("C1").Value = CStr(FileDateTime(SourceFile))
    Do While (Not EOF(FileNum))
        Line Input #FileNum, Line
.
.
.
    Loop
    Close FileNum
    CurrWorkBk.Worksheets(varSheet).Range("B1").Value = varBranch
    CurrWorkBk.Worksheets(varSheet).Range("A3:F" + Format(RowCounter)).Sort Key1:=Worksheets(varSheet).Range("A3"), Key2:=Worksheets(varSheet).Range("B3")
    CurrWorkBk.Worksheets(varSheet).PageSetup.PrintArea = CurrWorkBk.Worksheets(varSheet).Range("A1:F" + Format(RowCounter - 1)).Address()
    CurrWorkBk.Worksheets(varSheet).Range("A1").Select
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub

Oh, of course it would have to be completely different in Excel than it is in LibreOffice.

In LibreOffice, you can copy the CSV data from Notepad and when you paste it into the spreadsheet, it detects it as CSV data and pops up the Import Text window. You pick how you want the data separation to be handled and a few other options, and OK. If you have your cells already formatted with the type of data they will have, you’re golden. It will remember what you selected from the time before.

If you try to use CTRL-V to paste in the CSV data in Excel, it’s pure text, with the commas and quotation marks included.

To do an import of the data copied from Notepad, it’s click on the v under “Paste” in the Home menu, select Use Text Import Wizard, make sure it’s set for delimited data, click Next, change the delimiters from a space to a comma because it always assumes you will be dealing with spaces as your delimiter no matter how many times you’ve used this before, and then pick from one of the three simplified data formats:

  • General: convert anything that looks like a number to a number, anything that looks like a date to a date and everything else is text.
  • Force it to be text.
  • Force it to be a date.

That’s all fine and dandy until you have a data of -007 that you need to be treated as text because Excel decides that’s -7.

I think I can get it back into a four-character text string by converting it to a string, stripping out the dash, pasting in two zeroes in front of the number, grabbing the three right-most characters and finally, putting the dash back in the front.

All of which is unnecessary in LibreOffice because it can handle -007 as text if you want it to.

Yes. It would have to have a single quote in front to be handled as a text field in that instance.

A coworker who has done similar analysis on external spreadsheets gave me a lot of help on this. It now uses Excel’s Get and Transform to do the import. It’s almost completely automated.

  1. I wanted a single spreadsheet that could be used in Excel or LibreOffice:
    Due to import differences, this is only going to work in Excel. It didn’t immediately work when I opened the spreadsheet in LO Calc, so I’m going to switch to Excel for further updates.

  2. Have to rename files ahead of time due to timestamps in the filenames:
    No longer required. The G&T import parses the file names and selects the relevant ones for importing.

  3. I wanted relative file paths so the spreadsheet and its data files could be located anywhere:
    Had to be a compromise. The import has to have a hard-coded path, but the spreadsheet could be anywhere. So, we’re doing “create a TEST folder on Drive C and put the CSV files in there you want processed before opening the spreadsheet”.

  4. CSV files have variable amounts of data in them so some tabs have to be blank before the import to avoid mixing new data in with any existing old data:
    No longer a problem. The import creates a new table that completely replaces the previous table every time the import is refreshed.

  5. LibreOffice’s External Data function: no longer applicable.
     

I think we got this down to as user-friendly as we can. The people that will be using this really should only have to do three things:

  1. Only put one set of data files in the folder at a time. (Get and Transform could match the same report from multiple sets of data if they are in the same parent folder.)
  2. Fill in some info on the customer that owns the device.
  3. Hit refresh to have the spreadsheet update with that device’s data set.

They just save the spreadsheet with a new name and they’re ready to go with a comprehensive report on the health and performance of the device, with recommendations for future work to be done on it.
 

We learned that if you try to put the folder with the CSV files on the Desktop, you’d have to do a kind of lookup in Windows to create a variable with the username of whoever’s logged in right now for the import path. The hard-coded path is further altered by Windows itself due to the assumption that your Desktop is a part of OneDrive instead of being a local folder on your own computer.

Pointing the import to a folder at the root of Drive C avoids that headache. The other people use their laptops to perform diagnostics on the devices, so they should have enough rights to create the folder and move files in and out of it.
 

The proof of concept is done. Now I can begin adapting it to be more device-specific, customized for the features of each one. In car terms, it would be like adjusting it from being for a Chevy Camaro with the V6 engine and either the 6-speed manual transmission or the 10-speed automatic transmission, to being for the Chevy Malibu with its 4-cylinder engine and CVT.

1 Like

I will definitely need to learn the new version of Excel. I can’t get some conditional formatting to work, so I’m having to copy cells out of the LibreOffice Calc version and paste them into Excel so the conditional formatting carries over, then fix what needs to be in that cell.

Oh, joy. A lot of the links between tabs on the spreadsheet are no longer links. They’re showing static values. I will have to check every cell on every tab and manually fix the ones that broke.

2 Likes

This is getting stupid. Excel is preventing me from formatting cells if they have a link or a VLOOKUP to another tab in the spreadsheet. It can be as simple as saying the cell has the number format with the thousands separator.

It’s breaking one of the simplest formulas there is. =SUM(A1:A3)
If I use =A1+A2+A3, it works.

1 Like

For anyone else that runs into it, the answer is whether the number in the cell is an actual number or text being processed as a number. SUM can’t handle the latter. A formula that manually specifies the cell(s) invokes “process text that looks like a number as a number” and can handle it.

Use INT or VALUE to force numbers stored as text to become actual numbers. This also fixed why the cell formatting wasn’t working.

Before anyone asks, I have no control over how the reports are generated. I can’t have them changed to output actual numbers and output the rest as text.

Anyone want to give this a shot?

I found some data in one of the reports that has two sections next to each other with the same non-unique identifiers. We’ll use a simple 3x7 range as an example:

Dogs /              /
Dog1 / Spaniel / Spotted
Dog2 / Collie / White
Dog3 / Pitbull / Gray
Dog1 / Poodle / Fluffy
Dog2 / Terrier / Brindle
Dog3 / Labrador / Tan

I can find “Dog1” easily enough and get “Spotted” with VLOOKUP. But I actually need to jump down and over to get the second Dog1’s “Terrier” and the second Dog3’s “Labrador”.

I tried wrapping the VLOOKUP in the ROW command so it would tell me what row the initial “Dog1” is in so I could feed row+4 into a new VLOOKUP. No go on the row show.

I started digging into INDEX, MATCH and OFFSET, but they can’t seem to cope with the VLOOKUP referencing a different sheet.

This particular search can be shelved if I can’t figure it out. It’s a nicety that would say, “if you see an error, look at this item”. But I’d like to include it as a kind of nudge in the right direction. “You know this update process you don’t like to do? Well, here’s something that can be fixed by running updates.”

In one instance, I found it easiest to import all data into a hidden tab and deal with it locally.
My god. That was 25+ years ago. I may still have the code around somewhere, though.
Maybe I was having my slave transcribe the numbers onto a second wax tablet so I didn’t have to travel all the way to the market to do the archaic spreadsheets.

It has been imported and I’m working with the data inside the spreadsheet. I guess this is kind of like what I’ve had to do before when a complex formula or lookup just doesn’t want to work. I have to split it up into several steps and brute force finding what I need.

Here’s a more accurate example of what needs to be done.

“Dog1” is found with VLOOKUP in cell E1239 in the Pets sheet. Now that it’s been found, I need to link to the the Pets sheet for H1269:H1298 put a copy of the values there into a sheet called Results. On the Results sheet, I can link to the ten specific items I need since at that point, they’re at a fixed range of cells on the Results sheet and no longer free-floating inside the Pets sheet with its variable amount of data.

Using a bit of pseudo-code, it would be something like this to pick out the ten values without grabbing a range of cells to copy from first:
=VLOOKUP(“Dog1”,Pets!F1:H3000,R+30 and C+3,0)
R and C are the row and column “Dog1” was found at. Shift down 30 rows and right 3 columns. Repeat for the other 9 values I need with appropriate + to R and C.

The brute force method would be something like this, which is what isn’t working.
=ROW(VLOOKUP(“Dog1”,Pets!F1:F3000,1,0)
This is what would return row 1239 in the above example.
(from there, )
cell1 in the Results sheet:   =Pets!$H$row+30
cell2:   =Pets!$H$row+33
cell4:   =Pets!$H$row+36
(repeat for cell4 to cell10)