- 0 0
In the context of the recent controversy surrounding one of the reporting sources for the various Coronavirus Positive tests having been miss-counting data due to truncation in Excel.
What's not mentioned in the news stories , or poorly understood is what went wrong and why.
It's nothing to do with databases, more a lack of trainging and/or using the tools incorrectly.
Their reporting was predicated upon reading the last ( most recent ) row(s) of a 'csv' file ( A large text file, with data/variables in rows, separated by a comma. )
This went wrong when they tried using excel to open up the entire file ( as opposed to using an appropriate tool to read just the last row. ).
This could have been achieved reliably by opening in -
Notepad ( yes - the data is just a text file, and depending on the complexity of said data, it could have simply been opened in the default windows text editor ).
SQL Server ( or similar ) - Import the CSV into an actual database, use Select statment with a filter to return only the last/most recent row.
Python - Python can read the CSV into a data-frame ( similar to a database table ) using Pandas - again the data could be filtered to present the last row.
Excel - Instead of trying to open the entire file into excel , and getting truncation ( Excel can only show 1,048,756 rows - so if a file had 1,048,757 rows, you'd never see the last row) , the file should have been opened as a data connection and transformed / filtered with PowerQuery.