Ideas/Recommendations for Data Delivery
From RAJAR Wiki
1. General comments on the format of documentation files
It is proposed that all documentation files be sent in tab-separated values format (.tsv).
All files would be sent in this format for consistency, and in some cases to avoid duplication of files in differing formats which has an overhead in terms of maintenance.
- A number of different file formats are currently used, and a common format for all the files is a good thing --John Shorter (HallettArendt)
- Is there any good reason why tsv format is proposed rather than csv (comma-separated)? We would prefer csv every time for all files. This has already been used for many files for several years. A change now to tsv would involve substantial re-programming on our part. Will there ever be a release that does not feature some format change? -- [Paul Ely - IMS (UK) Ltd]
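As a rough illustration of the re-programming cost discussed above, the following minimal sketch reads the same record from tsv and csv text using Python's standard csv module. The function name read_rows and the sample rows are made up for this example and are not taken from any RAJAR file. It suggests that, where a bureau's reader is built on a general delimited-text parser, the change may come down to the delimiter setting.

```python
import csv
import io

def read_rows(text, delimiter="\t"):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Hypothetical sample rows: a report number followed by a station name.
tsv_text = "1900\tBBC Radio 1\n"
csv_text = "1900,BBC Radio 1\n"

# The two formats yield identical rows once the delimiter is set.
tsv_rows = read_rows(tsv_text)
csv_rows = read_rows(csv_text, delimiter=",")
assert tsv_rows == csv_rows
```

One practical difference is that a name containing a comma must be quoted in csv, whereas in tsv it needs no special treatment as long as names never contain tabs.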
2. Specific comments about the Check Tables
As the name suggests, this data is provided in the form of tables, and therefore includes a lot of formatting text aimed at producing readable printouts. The main use of this data however is for input into computerised systems, and therefore this additional text is actually a hindrance.
These are the recommendations:
- All titling, page throw characters, extra lines of whatever nature are removed.
- There is just one line per station or group, with the 4 digit Ipsos report number as the first field. Exact duplicates are not necessary.
- The station name and reporting period will be in separate columns.
- Weighted and unweighted check figures should be provided in separate files or separate columns. Currently extra coding or manual work is required to remove/ignore the unweighted portion of the data.
- The actual data is provided to the same level of precision as individual sample points, which is currently in units of 1/10th of a person. [Cy Booker]
- Data is also provided for 'All people' as well as 'Adults'. [Cy Booker]
- Data will be tab separated. The plain text versions of these files may be surplus to requirements.
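The recommendations above could be consumed along the following lines. This is only a sketch under stated assumptions: the column order (report number, station name, reporting period, weighted figure, unweighted figure), the period label and the figures in the sample are all invented for illustration, and CheckRow and parse_check_table are hypothetical names. Decimal is used so that figures in tenths of a person are held exactly.

```python
import csv
import io
from decimal import Decimal
from typing import NamedTuple

class CheckRow(NamedTuple):
    report_number: str   # 4 digit Ipsos report number, kept as text
    station: str
    period: str
    weighted: Decimal
    unweighted: Decimal

def parse_check_table(text):
    """Parse one-line-per-station tab separated check figures (assumed layout)."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        number, station, period, weighted, unweighted = fields
        if not (len(number) == 4 and number.isdigit()):
            raise ValueError(f"unexpected report number: {number!r}")
        rows.append(CheckRow(number, station, period,
                             Decimal(weighted), Decimal(unweighted)))
    return rows

# Invented sample line; the figures are not real check figures.
sample = "1900\tBBC Radio 1\tQ2 2005\t1234.5\t321\n"
rows = parse_check_table(sample)
```

With titling and page throws removed, no pre-processing step is needed before the parse, and the unweighted column can simply be ignored by systems that do not use it.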
3. Segment definition files
There are a few recommendations regarding the provision of these files, which are used to define TSA definitions.
- We recommend that every station code which is live in the current database must have a corresponding entry. All national stations are currently excluded from the seg files.
- National stations to have code '*' to signify 'all'? - Alan M (Telmar)
- It is better that a TSA is specified for each station code, as it removes the need to make assumptions --John Shorter (HallettArendt)
- Each entry must be confined to a single line. Currently long lines are split across 2 or 3 lines, making it very difficult to process safely.
- We currently edit this file manually before processing to resolve this issue. This change would remove the need to do so. --John Shorter (HallettArendt)
- As well as providing segment lists for individual stations, lists should be provided for each 4 digit Ipsos report number, which will cover all groups. This will make it easier for bureaux to match group totals.
- This will help greatly when balancing back to check tables. -- John Shorter (HallettArendt)
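A minimal sketch of how a bureau might check the first two recommendations on receipt of a segment file. The layout assumed here (a station code, a tab, then a comma-separated segment list on a single line) is hypothetical, as are the codes in the sample and the function names parse_segment_file and missing_entries; the real seg file layout may differ.

```python
def parse_segment_file(text):
    """Return {station code: [segment, ...]} from one-entry-per-line text."""
    definitions = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        code, separator, segment_field = line.partition("\t")
        if not separator:
            # A line without a tab is likely the tail of a split entry.
            raise ValueError(f"line {line_number}: entry is not on a single line")
        definitions[code] = segment_field.split(",")
    return definitions

def missing_entries(live_codes, definitions):
    """List live station codes that have no segment definition."""
    return sorted(set(live_codes) - set(definitions))

# Invented sample: two stations defined, one live code left undefined.
sample = "1001\t12,13,14\n1002\t20\n"
definitions = parse_segment_file(sample)
missing = missing_entries(["1001", "1002", "1900"], definitions)
```

If every live code, national stations included, had an entry, the missing list would always be empty and no assumptions would be needed.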
4. Out of area listening
In recent issues of RAJAR data, it has become more common for stations to appear listed outside of their TSA definition. This should only occur when it is required to match the correct figures for a station group total. We therefore recommend that all occurrences where stations are listed outside of their TSA be provided as part of the documentation. This will remove any uncertainty about whether an occurrence of this type is deliberate or a mistake in the data.
- this will help with problem solving for the small number of stations that are involved.--John Shorter (HallettArendt)
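If such a list of deliberate occurrences were supplied, a check along the following lines would be possible. This is a sketch only: the data structures, the sample codes and the function name unexplained_out_of_area are assumptions for illustration and do not reflect any existing file.

```python
def unexplained_out_of_area(tsa, listings, documented):
    """Return (station, segment) listings outside the TSA that are not documented.

    tsa        -- {station code: set of segments in its TSA}
    listings   -- iterable of (station code, segment) pairs found in the data
    documented -- set of (station code, segment) pairs declared as deliberate
    """
    unexplained = []
    for code, segment in listings:
        inside_tsa = segment in tsa.get(code, set())
        if not inside_tsa and (code, segment) not in documented:
            unexplained.append((code, segment))
    return unexplained

# Invented example data.
tsa = {"1001": {"12", "13"}}
listings = [("1001", "12"), ("1001", "20"), ("1001", "21")]
documented = {("1001", "20")}
problems = unexplained_out_of_area(tsa, listings, documented)
```

Anything left in the result would be worth querying as a possible mistake in the data.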
5. 4 Digit Ipsos Report Number
It would be possible to use RSL's own station/group codes to improve matching between new data from Ipsos and stations/groups as they exist in bureaux systems. These would be contained in all documentation. Any changes to these codes will be advised in DUG meetings. A change might arise, for example, when a station's TSA changes so considerably that it can reasonably be held to constitute a new station.
- I notice that in the check tables this time (eg tablesq.txt), there is a serial number before each station name (eg 1900:BBC Radio 1). Is this going to be a regular feature from now on or can they be removed please? [Paul Ely - IMS (UK) Ltd]
- Roland (Broadnet) also noticed these numbers. Requested that another file is sent without them as it affects his processing. Ideal format is as per LookUp3, 123_name_period_pop_weighted_unweighted--Emilyp 09:59, 11 Aug 2005 (BST)
- Having looked through previous check figure files: previously the csv was sent with these codes, but the txt was not. This time around they are in sync, hence the problems at IMS and Broadnet.--Emilyp 10:10, 11 Aug 2005 (BST)
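Until the format settles, a reader that accepts station names with or without the serial number may be a workable stopgap. The sketch below assumes the prefix is always four digits followed by a colon, which is inferred only from the single example quoted above (1900:BBC Radio 1); split_serial is a hypothetical helper name.

```python
import re

# Assumed prefix shape: exactly four digits, then a colon.
_SERIAL_PREFIX = re.compile(r"^(\d{4}):(.*)$")

def split_serial(label):
    """Return (serial, name); serial is None when no prefix is present."""
    match = _SERIAL_PREFIX.match(label)
    if match:
        return match.group(1), match.group(2)
    return None, label

with_prefix = split_serial("1900:BBC Radio 1")
without_prefix = split_serial("BBC Radio 1")
```

Existing processing can then use the name part unchanged, while systems that want the code can pick it up from the first element.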
6. Nested Data
It may be possible to create a single documentation file containing stations/groups with their RSL codes and their segment lists, either using XML or a tsv file with extra delimiters.
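To give a feel for the XML option, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element and attribute names (stations, station, code, name, segment) and the segment values are invented for illustration; no schema has been proposed.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested layout: each station carries its own segment list.
document = """<stations>
  <station code="1900" name="BBC Radio 1">
    <segment>12</segment>
    <segment>13</segment>
  </station>
</stations>"""

def load_segments(xml_text):
    """Return {station code: [segment, ...]} from the assumed XML layout."""
    root = ET.fromstring(xml_text)
    return {
        station.get("code"): [seg.text for seg in station.findall("segment")]
        for station in root.findall("station")
    }

segments_by_code = load_segments(document)
```

A tsv file with an extra delimiter inside one column could carry the same information, at the cost of a second parsing step for the nested list.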
7. Postcode Data
There is a possibility that this could be sent.
- We get this info in the form of the Karadana file --John Shorter (HallettArendt)
