Bug #602

TMQ invalid prepare files shall be dropped (i.e if size is bellow the must have size)

Added by Madars 6 months ago. Updated 6 days ago.

Status:ClosedStart date:11/01/2020
Priority:NormalDue date:
Assignee:-% Done:

100%

Category:-
Target version:-

Description

N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523340925:tmq_msg_add :qspace.c:0795:Adding to G_msgid_hash [4Ez+WCeaRqKbVqIPs0H7iAEAZAAAAAAAAAAAAAAAAAA=]
N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523340937:id_serialize:common.c:0308:Original MSGID (nr bytes: 32)
  0000  e0 4c fe 58 27 9a 46 a2 9b 56 a2 0f b3 41 fb 88  .L.X'.F..V...A..
  0010  01 00 64 00 00 00 00 00 00 00 00 00 00 00 00 00  ..d.............
N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523340981:id_serialize:common.c:0315:MSGID after serialize: [4Ez+WCeaRqKbVqIPs0H7iAEAZAAAAAAAAAAAAAAAAAA=]
N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523340993:tmq_msg_add :qspace.c:0840:Message with id [4Ez+WCeaRqKbVqIPs0H7iAEAZAAAAAAAAAAAAAAAAAA=] successfully enqueued to [MYQ1] queue (DEBUG: locked 0)
N:NDRX:3:9211474f:17251:7f1c15841940:001:20201031:205523344478:e_get_blocks:isk_xa.c:1266:Loading [/home/user1/modules/endurox-go/tests/07_basic_durable_queue/conf/../var/qspace1/prepared/YZT3oUBAYJXzDI3LSkK+IuQwL6YhHgEAAQAB-001]
N:NDRX:2:9211474f:17251:7f1c15841940:001:20201031:205523344538:ead_tx_block:isk_xa.c:0995:ERROR! Failed to read tx file (tmq_storage_get_blocks: /home/user1/modules/endurox-go/tests/07_basic_durable_queue/conf/../var/qspace1/prepared/YZT3oUBAYJXzDI3LSkK+IuQwL6YhHgEAAQAB-001): req_read=696, read=0: Success
N:NDRX:2:9211474f:17251:7f1c15841940:001:20201031:205523344649:e_get_blocks:isk_xa.c:1287:Failed to read [/home/user1/modules/endurox-go/tests/07_basic_durable_queue/conf/../var/qspace1/prepared/YZT3oUBAYJXzDI3LSkK+IuQwL6YhHgEAAQAB-001]: File exists
N:NDRX:4:9211474f:17251:7f1c15841940:001:20201031:205523344670:mq_load_msgs:qspace.c:1590:tmq_load_msgs return -1
N:NDRX:2:9211474f:17251:7f1c15841940:001:20201031:205523344686:ndrx_main   :rvmain.c:0991:tpsvrinit() fail
N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523344764:_tpsvrdone__:mqueue.c:0539:tpsvrdone called - requesting background thread shutdown...
N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523344789:_close_entry:tmi/xa.c:0423:atmi_xa_close_entry
N:NDRX:2:9211474f:17251:7f1c15841940:001:20201031:205523344803:_close_entry:isk_xa.c:0632:xa_close_entry() called
N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523344845:ndrx_tpterm :i/init.c:0628:_tpterm called
N:NDRX:5:9211474f:17251:7f1c15841940:001:20201031:205523344888:rx_svq_close:ys_svq.c:0110:close 0x1538c20 mqd

Such cases might appear if disk is full while performing the enqueue.

The other option would be to avoid the prepare phase, if disk is full

History

#1 Updated by Madars 2 months ago

Seems that tmqueue in case if tmsrv is missing, ignore fact transaction is started:

Log:

N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699719:_shm_get_svc:mi/shm.c:0852:ndrx_shm_get_svc [@TM-1] - result: 1, iterations: 0, pos: 15268, install: 0
N:NDRX:2:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699724:_shm_get_svc:mi/shm.c:0409:Service @TM-1 not available, count of servers: 0
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699728:_shm_get_svc:mi/shm.c:0616:ndrx_shm_get_svc returns -1
N:NDRX:4:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699732:ndrx_tpacall:tpcall.c:0486:Service is not available @TM-1 by shm
N:NDRX:3:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699738:et_error_fmt:perror.c:0341:ndrx_TPset_error_fmt: 6 (TPENOENT) [ndrx_tpacall: Service is not available @TM-1 by shm]
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699743:ndrx_tpacall:tpcall.c:0702:ndrx_tpacall return -1
N:NDRX:2:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699747:ndrx_tpcall :tpcall.c:1168:_tpacall to @TM-1 failed
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699751:ndrx_tpcall :tpcall.c:1226:ndrx_tpcall: return -1 cd 0
N:NDRX:2:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699756:m_generic_fb:autils.c:0771:@TM-1 failed: 6:TPENOENT (last error 6: ndrx_tpacall: Service is not available @TM-1 by shm)
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699760:m_generic_fb:autils.c:0776:got response from [@TM-1]
N:NDRX:4:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699764:bug_dump_UBF:/utils.c:0251:Response buffer:
TMCALLERRM      1
TMTXTOUT        30
TMTXFLAGS       0
TMCMD   b
TMPROCESSID     srv,tmqueue,60,7322,19,2
N:NDRX:2:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699774:m_generic_fb:autils.c:0790:Failed to call RM: 6:[6:TPENOENT (last error 6: ndrx_tpacall: Service is not available @TM-1 by shm)] 
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699779:ndrx_tpfree :ed_buf.c:0383:_tpfree buf=0x7f0b780b1e90
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699784:m_generic_fb:autils.c:0823:atmi_xa_call_tm_generic returns (nil)
N:NDRX:2:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699788:ndrx_tpbegin:tmi/xa.c:0990:Failed to execute TM command [b]
N:NDRX:2:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699792:tmq_enqueue :tmqapi.c:0117:Failed to start global tx!
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699819:tmq_enqueue :tmqapi.c:0161:Got message for Q:  (nr bytes: 68)
  0000  00 01 55 42 46 31 00 00 00 00 00 00 00 00 00 00  ..UBF1..........
  0010  00 00 00 00 00 00 00 00 00 00 00 00 18 00 00 00  ................
  0020  44 00 00 00 00 00 00 00 44 00 00 00 59 02 00 0a  D.......D...Y...
  0030  0e 00 00 00 48 65 6c 6c 6f 20 2f 51 20 39 31 38  ....Hello /Q 918
  0040  21 00 23 23                                      !.##
N:NDRX:2:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699871:mq_msgid_gen:qspace.c:0175:MSGID: struct size: 20
N:NDRX:4:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699876:tmq_enqueue :tmqapi.c:0229:Messag prepared ok, about to enqueue to [ERR] Q...
N:NDRX:3:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699881:with_default:qspace.c:0349:Q config [ERR] not found, trying to default to [@]
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699886:id_serialize:common.c:0308:Original MSGID (nr bytes: 32)
  0000  4f 57 e2 ad cf 62 4a bd ac 18 a7 44 c0 f5 32 07  OW...bJ....D..2.
  0010  02 00 3c 00 00 00 00 00 00 00 00 00 00 00 00 00  ..<.............
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699901:id_serialize:common.c:0315:MSGID after serialize: [T1firc9iSr2sGKdEwPUyBwIAPAAAAAAAAAAAAAAAAAA=]
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699910:tmq_msg_add :qspace.c:0795:Adding to G_msgid_hash [T1firc9iSr2sGKdEwPUyBwIAPAAAAAAAAAAAAAAAAAA=]
N:NDRX:5:d5d3db3a:07322:7f0bc6bcc700:019:20210309:182600699915:e_cmd_newmsg:isk_xa.c:1152:Writing new message to disk (nr bytes: 764)

Code:

    if (!tpgetlev())
    {
        NDRX_LOG(log_debug, "Not in global transaction, starting local...");
        if (EXSUCCEED!=tpbegin(G_tmqueue_cfg.dflt_timeout, 0))
        {
            NDRX_LOG(log_error, "Failed to start global tx!");
            userlog("Failed to start global tx!");
        }
        else
        {
            NDRX_LOG(log_debug, "Transaction started ok...");
            local_tx = EXTRUE;
        }

#3 Updated by Madars 19 days ago

Fixed in 7.5.26+

Release notes:

- Any file created and not finished full write, file will be deleted and error properly is reported.
- During the startup if tmqueue fails to read the files, startup of proces shall ignore invalid files and proceed. This means that particular operation was unsuccessful, thus transaction was rolled back.

#4 Updated by Madars 19 days ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

#5 Updated by Madars 19 days ago

  • Status changed from Resolved to Closed

#6 Updated by Madars 11 days ago

  • Status changed from Closed to In Progress

Additional things to fix:

1. Check dequeue write err (shall terminate with failure) -> QMEOS return error, when dequeue fails.
2. Ensure that messages are written to disk in transaction fashion (write header with 0x00.., write data, write header with real data)
2.1. Any active message for which header is bad (so probably was power loss or process terminate in write middle), shall be unlinked.
3. Update message headers with reserved disk space, to simplify future upgrades.
4. Make command blocks smaller in size (to store only required data)

#7 Updated by Madars 6 days ago

  • Status changed from In Progress to Resolved

Release notes

Tmqueue has been upgraded for better file handling, corruption detection, housekeeping of corrupted files. Internal format of the queue files has been changed. Thus to upgrade to given Enduro/X version, any in progress messages must be completed prior the upgrade.

#8 Updated by Madars 6 days ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF